pull mutation data from GDC #80
Comments
Looks like GDC has an API that we can use to pull the data in directly: https://docs.gdc.cancer.gov/API/Users_Guide/Getting_Started/#getting-started We don't need to use the CDA library to bring this data in. We can write a Python client/script that pulls data by interacting with the above API. |
I can work on modifying the code here: https://github.com/monarch-initiative/oncoexporter/blob/develop/src/oncoexporter/cda/cda_mutation_factory.py to bring in mutation data directly from GDC. |
great @sujaypatil96! |
My recollection from what Brian & Matt said at the hackathon was that the CRDC-H model did not have mutation information included in it yet. If you look at Appendix A for the GDC API, I do not see any of the fields that are available in the CDA mutation endpoint. My view is that we do not need to be restricted to using CDA if it doesn't do what we want, but that we should not recreate existing capabilities unnecessarily. The CDA has some process for producing the mutation table, and it makes sense to me to at least try to understand how they did that before trying to build our own from scratch. However, if Sujay can figure out an easy way to get the mutation data directly from GDC, I certainly don't have any objections to doing things that way. |
Let's consider a subject/case ( My understanding is that the mutation information that we want to pull for cases from GDC is what's available under the "MOST FREQUENT SOMATIC MUTATIONS" section/table on the above webpage say. To obtain this data we would need to query the "SSM (Simple Somatic Mutation)" endpoint. The GDC mutation data can be found at |
Thanks all. I'd suggest that Sujay does a first pass at collecting MOST FREQUENT SOMATIC MUTATIONS using the GDC API, and then we can have a closer look. Does that sound reasonable? |
Ahh, my problem was I did not look at the "Data Analysis" page, which describes the mutation endpoints: https://docs.gdc.cancer.gov/API/Users_Guide/Data_Analysis/ |
I've experimented with pulling mutation data from GDC directly here: https://gist.github.com/sujaypatil96/5659f766abeed7adf52fb6ce771e5552 |
I was looking at the list of mutation fields to pull in (from CDA) in the oncoexporter code and saw this: https://github.com/monarch-initiative/oncoexporter/blob/develop/src/oncoexporter/cda/cda_mutation_factory.py#L18-L50 Based on that list (and the full list of fields that we can pull mutation information for here), I wrote a quick Python script to demo/illustrate how we can use the GDC API (specifically the |
There's more information available at the |
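A minimal sketch of the kind of direct GDC query discussed above. It is not the code from the gist. It assumes the public API base `https://api.gdc.cancer.gov`, the `ssm_occurrences` endpoint, and the `filters`/`fields`/`size` query parameters described in the GDC API user guide. The field names in `DEFAULT_FIELDS` and the helper names `build_ssm_params` and `fetch_ssm_occurrences` are illustrative assumptions and should be checked against the GDC field listing.

```python
import json
import urllib.parse
import urllib.request

GDC_API = "https://api.gdc.cancer.gov"

# Assumed field names; verify against the GDC field listing before relying on them.
DEFAULT_FIELDS = (
    "ssm.ssm_id",
    "ssm.genomic_dna_change",
    "ssm.chromosome",
    "ssm.start_position",
    "ssm.reference_allele",
    "ssm.tumor_allele",
)


def build_ssm_params(case_submitter_id, fields=DEFAULT_FIELDS, size=100):
    """Build query parameters selecting the SSM occurrences of one case."""
    filters = {
        "op": "in",
        "content": {"field": "case.submitter_id", "value": [case_submitter_id]},
    }
    return {
        "filters": json.dumps(filters),  # the filter is sent as a JSON string
        "fields": ",".join(fields),      # comma-separated list of fields
        "format": "JSON",
        "size": str(size),               # maximum number of hits to return
    }


def fetch_ssm_occurrences(case_submitter_id, timeout=30):
    """Send the query to GDC and return the list of hits (needs network access)."""
    query = urllib.parse.urlencode(build_ssm_params(case_submitter_id))
    url = f"{GDC_API}/ssm_occurrences?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["data"]["hits"]
```

Keeping the parameter builder separate from the network call makes the query easy to inspect and test offline. A case with more hits than `size` would additionally need paging, which this sketch leaves out.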
Thanks @sujaypatil96! We'll take a look hopefully today |
cc: @pnrobinson |
@ielis could you have a go at incorporating Sujay's code into oncoexporter? I think we just need to put Sujay's code in cda_mutation_factory and also write a bit of code to translate the mutation JSON into phenopacket items - glad to hack on this with you |
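A rough sketch of the translation step mentioned above, under stated assumptions: the input is one hit shaped like `{"ssm": {...}}` with the keys used below (an assumed GDC response shape), and the output is a plain dict laid out like the JSON form of a Phenopacket Schema v2 `VariantInterpretation`. The function name `ssm_hit_to_variant_interpretation` is hypothetical, and this is not the oncoexporter implementation.

```python
def ssm_hit_to_variant_interpretation(hit, genome_assembly="GRCh38"):
    """Map one GDC SSM hit (assumed shape) to a VariantInterpretation-like dict."""
    ssm = hit["ssm"]
    return {
        # GDC somatic calls carry no ACMG class or actionability, so use the defaults.
        "acmgPathogenicityClassification": "NOT_PROVIDED",
        "therapeuticActionability": "UNKNOWN_ACTIONABILITY",
        "variationDescriptor": {
            "id": ssm["ssm_id"],
            "moleculeContext": "genomic",
            "vcfRecord": {
                "genomeAssembly": genome_assembly,
                "chrom": ssm["chromosome"],
                # protobuf JSON encodes 64-bit integers as strings
                "pos": str(ssm["start_position"]),
                "ref": ssm["reference_allele"],
                "alt": ssm["tumor_allele"],
            },
        },
    }


# Hypothetical hit with made-up identifiers, for illustration only.
example_hit = {
    "ssm": {
        "ssm_id": "hypothetical-ssm-id",
        "chromosome": "chr7",
        "start_position": 140753336,
        "reference_allele": "A",
        "tumor_allele": "T",
    }
}
interpretation = ssm_hit_to_variant_interpretation(example_hit)
```

One caveat: if insertions and deletions arrive with `-` as a placeholder allele, they would need normalizing before they can form a valid VCF-style record, which this sketch does not attempt.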
@justaddcoffee @sujaypatil96 The class is, however, not hooked up to the rest of the framework yet. Unfortunately, I cannot work on that this week, I'm taking 3 days off starting with Wed. Do you guys think you can look into this? Probably use it instead of the |
great @ielis ! thanks @sujaypatil96 do you have any time this week to hook up Daniel's code into |
@ielis thanks for working on #81, it looks really good! @justaddcoffee I'm mostly working on some high priority NMDC tasks for the rest of the week, but if I get done with them early I can take a look at hooking it up with the rest of the framework. |
@sujaypatil96 okay, no worries at all - NMDC I think should take precedence |
@justaddcoffee happy to take a look at hooking up the code from @ielis in #81 with the rest of the framework tomorrow if no one else is working on it. |
Okay great @sujaypatil96 I don't think anyone else is currently working on this |
Sounds good! I'll work on it sometime today/tomorrow. |
We'd like to pull mutation data from GDC directly if possible
@sujaypatil96 pointed us to some code that might help here - see cell 16:
https://github.com/cancerDHC/example-data/tree/main/cptac2-subject-09CO022
This code might also be useful for extracting things from what the code above gets from GDC:
https://github.com/cancerDHC/example-data/tree/main
cc @sujaypatil96 @msierk @ielis