Can this be used to look for genes? #9

Closed
marcucio opened this issue Oct 19, 2016 · 6 comments

Comments

@marcucio

Can this be used to look for a particular gene? For example I have this gene I am looking for:

http://www.snpedia.com/index.php/KCNQ2

How would I go about writing a query to see if this gene exists?

@yocontra
Member

@marcucio You would look up each SNP associated with the gene - there are 136 of them (listed in the sidebar on the SNPedia page you linked).

And when you're done, publish it as a module!

You might not need gql for this since there is no conditional logic. The genome.js format is a JS object where each key is a SNP rsid and each value is the genotype for that SNP, so if you're trying to extract the sequence you can do:

const getKCNQ2 = (dna) =>
  dna.rs118192185 + dna.rs118192186 // + ...and so on for the remaining SNPs

@yocontra
Member

yocontra commented Oct 19, 2016

I think it would be great to have a meta-module that maps each gene to its associated SNPs - let me know if you're interested in helping out with that.

It could be as simple as:

{
  "KCNQ2": [ array of rsids ]
}

Then you can iterate over them to extract sequences
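
A minimal sketch of that iteration, under stated assumptions: the two rsids are the ones mentioned earlier in this thread (a real KCNQ2 entry would list every SNP from SNPedia), and the `genes` object, the `extractGenotypes` helper, and the sample `dna` values are hypothetical names and data used only for illustration.

```javascript
// Hypothetical gene -> rsid map; a real entry would list every SNP for the gene.
const genes = {
  KCNQ2: ['rs118192185', 'rs118192186']
};

// Collect the genotype for each rsid of a gene from a genome.js-style object.
// SNPs missing from the data are reported as null instead of being
// silently concatenated as the string "undefined".
const extractGenotypes = (dna, geneName) =>
  (genes[geneName] || []).map((rsid) => ({
    rsid,
    genotype: rsid in dna ? dna[rsid] : null
  }));

// Made-up sample data, only to show the shape of the input.
const sampleDna = { rs118192185: 'CC' };

console.log(extractGenotypes(sampleDna, 'KCNQ2'));
```

Returning null for absent SNPs makes gaps in the raw data visible, which plain string concatenation would hide.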

@marcucio
Author

Sorry, I'm super new to this gene and DNA area, so I don't fully understand.

I agree it would be nice to just define it in an array if there are no conditions. Ideally there should be a repo somewhere defining all the genosets and genes in the format that can be used by gql. I found 20 or so genosets defined in gql but I'm guessing there isn't a repo out there defining more.

Do the SNPs associated with the gene have to be in a particular order to determine whether the gene exists? Or, if I can find all the SNPs for the gene in the dna object, do we then know the gene exists?

@yocontra
Member

yocontra commented Oct 19, 2016

@marcucio Yep, if you have any genosets you are passionate about you should publish them as modules! It's really easy to do.

I think for this gene, if all of the SNPs exist, the gene exists. I'm not sure whether there are certain values to look for; the 23andMe page might be more helpful.

If it's an existence check you can do this:

module.exports = gql.and([
  gql.exists('rswhatever'),
  gql.exists('rswhatever')
  // ... and so on, one gql.exists() per rsid
])

or given an array of rsids

module.exports = gql.and(rsids.map((id) => gql.exists(id)))

Then, with that module required as hasKCNQ2, you can call hasKCNQ2(dna) // true or false
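
For comparison, the same existence check can be sketched in plain JavaScript without gql. This is not gql's implementation: `hasAllSnps` and the sample genomes are hypothetical, and the sketch assumes a SNP "exists" when its rsid is a key with a non-empty string value in the dna object.

```javascript
// Build a predicate that is true only when every rsid is present in the data.
// Assumption: "exists" means the key is there and its value is a non-empty string.
const hasAllSnps = (rsids) => (dna) =>
  rsids.every((id) => typeof dna[id] === 'string' && dna[id].length > 0);

// Only two of KCNQ2's rsids are shown; a real list would be much longer.
const hasKCNQ2 = hasAllSnps(['rs118192185', 'rs118192186']);

// Made-up sample genomes.
const complete = { rs118192185: 'CC', rs118192186: 'GG' };
const partial = { rs118192185: 'CC' };

console.log(hasKCNQ2(complete), hasKCNQ2(partial));
```

Note that raw genotyping data typically covers only a subset of known SNPs, so a check that requires every rsid for a gene may return false simply because some SNPs were not measured.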

@marcucio
Author

OK, I understand, thanks! I will play around with it some more. I was thinking of creating one repo with all the SNP and gene definitions. Ideally I would like to check for 200+ genes, so I think 200 modules in one repo would be better than 200 separate repos.

I was thinking of writing a script to convert the JSON data from snpedia [http://www.snpedia.com/index.php?title=Special:Ask&offset=0&limit=500&q=%5B%5BCategory%3AIs+a+snp%5D%5D+%5B%5BIn+gene%3A%3AKCNQ2%5D%5D&p=mainlabel%3D%2Fformat%3Dtable&po=%3FMax+Magnitude%0A%3FChromosome+position%0A%3FSummary%0A] into a format we can use, maybe save to a JSON array like you previously suggested.
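
A rough sketch of what such a conversion could look like. The response shape below is an assumption based on the usual Semantic MediaWiki "ask" JSON layout (results keyed by page title); the actual SNPedia output has not been checked here and may differ. `toGeneEntry` and `sampleResponse` are hypothetical names.

```javascript
// Assumed shape of a Semantic MediaWiki "ask" JSON result; the real SNPedia
// response may differ, so inspect it before relying on this.
const sampleResponse = {
  query: {
    results: {
      Rs118192185: { fulltext: 'Rs118192185' },
      Rs118192186: { fulltext: 'Rs118192186' }
    }
  }
};

// Turn one response into { GENE: [rsids] }, lowercasing page titles so they
// match the rsid keys used in genome.js data and dropping non-rsid pages.
const toGeneEntry = (geneName, response) => {
  const results = (response.query && response.query.results) || {};
  const rsids = Object.keys(results)
    .map((title) => title.toLowerCase())
    .filter((id) => /^rs\d+$/.test(id));
  return { [geneName]: rsids };
};

console.log(JSON.stringify(toGeneEntry('KCNQ2', sampleResponse), null, 2));
```

Entries for many genes could then be merged into a single JSON file with `Object.assign`.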

I was also thinking of changing the format of the SNP modules to better fit my needs. It currently looks something like this:

var gql = require('gql');

module.exports = gql.and([
    gql.or([gql.exact('rs6311', 'C'), gql.exact('rs6311', 'CT')]),
    gql.or([gql.exact('rs1328674', 'A'), gql.exact('rs1328674', 'AG')]),
    gql.or([gql.exact('rs6313', 'C'), gql.exact('rs6313', 'CT')]),
    gql.or([gql.exact('rs6314', 'G'), gql.exact('rs6314', 'AG')])
])

But I think it might be helpful to have more info, maybe define it like this (not tested btw!) so that we can programmatically get the description and what a match might mean (just brainstorming):

var gql = require('gql');

(function(exports){
    // Query that decides whether the genoset matches
    exports.exists = gql.and([
        gql.or([gql.exact('rs6311', 'C'), gql.exact('rs6311', 'CT')]),
        gql.or([gql.exact('rs1328674', 'A'), gql.exact('rs1328674', 'AG')]),
        gql.or([gql.exact('rs6313', 'C'), gql.exact('rs6313', 'CT')]),
        gql.or([gql.exact('rs6314', 'G'), gql.exact('rs6314', 'AG')])
    ]);

    // Human-readable explanation of a match or a non-match
    exports.interpreter = function (exists) {
        if (exists) {
            return 'Description of what this exists means';
        }
        return 'Description of what it means that this doesn\'t exist';
    };

    exports.description = 'Description of SNP or gene';

})(exports);
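
To illustrate how modules of this shape could be consumed programmatically, here is a small sketch. It assumes each module exposes `exists`, `interpreter`, and `description` as proposed above; `exampleModule` and `report` are hypothetical, and a plain function stands in for the gql query so the sketch runs without dependencies.

```javascript
// Stand-in for a module shaped like the proposal above. A plain function
// replaces the gql query here so the sketch has no dependencies.
const exampleModule = {
  exists: (dna) => dna.rs6311 === 'C' || dna.rs6311 === 'CT',
  interpreter: (exists) => (exists ? 'Match found' : 'No match found'),
  description: 'Example SNP check'
};

// Run every module against the data and collect a readable report.
const report = (modules, dna) =>
  Object.keys(modules).map((name) => {
    const mod = modules[name];
    const matched = Boolean(mod.exists(dna));
    return {
      name,
      description: mod.description,
      matched,
      meaning: mod.interpreter(matched)
    };
  });

console.log(report({ example: exampleModule }, { rs6311: 'CT' }));
```

With 200+ modules in one repo, the `modules` object could be built by requiring each file from a directory listing.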

BTW, feel free to close this if you want, you answered my initial question.

@yocontra
Member

yocontra commented Oct 19, 2016

@marcucio Yeah, there isn't really a "spec" for what the modules should look like per se. As long as a module works off the dna data, it can be considered compatible with any other genome.js module; it doesn't even have to use gql. LMK what you end up publishing.
