Pull mutation data from GDC #81
…tion Start using `GdcMutationService` and associated methods in `CdaTableImporter`
Are you all seeing this error on your end? I'm on branch `fetch-mutations-from-gdc`:
(venv) ~/PythonProject/oncoexporter fetch-mutations-from-gdc $ python3 scripts/run_bone.py
Creating cached dataframe as /Users/jtr4v/PythonProject/oncoexporter/.oncoexporter_cache/Bone_mutation_df.pkl
individual dataframe: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 743/743 [00:00<00:00, 9890.31it/s]
merged diagnosis dataframe: 0%| | 0/670 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/Users/jtr4v/PythonProject/oncoexporter/scripts/run_bone.py", line 11, in <module>
p = table_importer.get_ga4gh_phenopackets(Tsite, cohort_name=cohort_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jtr4v/PythonProject/oncoexporter/venv/lib/python3.11/site-packages/oncoexporter/cda/cda_table_importer.py", line 174, in get_ga4gh_phenopackets
disease_message = self._disease_factory.to_ga4gh(row)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jtr4v/PythonProject/oncoexporter/venv/lib/python3.11/site-packages/oncoexporter/cda/cda_disease_factory.py", line 105, in to_ga4gh
primary_site = self._uberon_mapper.get_ontology_term(row)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jtr4v/PythonProject/oncoexporter/venv/lib/python3.11/site-packages/oncoexporter/cda/mapper/op_uberon_mapper.py", line 56, in get_ontology_term
raise ValueError(f"Could not find UBERON term for primary_site=\"{primary_site}\"")
ValueError: Could not find UBERON term for primary_site="Bones, joints and articular cartilage of other and unspecified sites"
Were you able to resolve the above issue you were running into, @justaddcoffee?
@sujaypatil96 yes, this problem went away. Possibly it was a problem with a stale cache file, because it went away when I deleted the old cache.
@sujaypatil96 @ielis btw, can we merge this PR?
Awesome!! This PR is ready to be merged then 🚀
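For anyone hitting the same error, the cache cleanup described above can be sketched roughly as follows. This is only an illustration: `clear_stale_cache` is a hypothetical helper, not part of oncoexporter, and it assumes that every `*.pkl` file in the `.oncoexporter_cache` directory shown in the log is a cached dataframe that is safe to regenerate.

```python
from pathlib import Path


def clear_stale_cache(cache_dir: Path, pattern: str = "*.pkl") -> list[str]:
    """Delete cached dataframe pickles and return the names of removed files.

    Hypothetical helper: assumes everything matching `pattern` in `cache_dir`
    is a cache artifact that will be rebuilt on the next run.
    """
    removed: list[str] = []
    if not cache_dir.is_dir():
        # Nothing cached yet, so there is nothing to remove.
        return removed
    for path in sorted(cache_dir.glob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed


# Example (run from the repository root; directory name taken from the log):
# clear_stale_cache(Path(".oncoexporter_cache"))
```

After clearing the cache, re-running `python3 scripts/run_bone.py` rebuilds the pickles from scratch.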
The PR creates `GdcMutationService` to retrieve variants of CDA subjects from GDC. See the test for example usage.
The logic is based on the gist written by @sujaypatil96 here.
There are still some TODOs left. The mapping of the VCF coordinates and functional annotations should be complete; however, we may still need to explore GDC to add the read depths, the gene, and the mutation status.
When ready, we can use `GdcMutationService`, e.g. within `CdaTableImporter`, to get the variants for the subjects. #80
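To give a rough idea of the kind of request involved, here is a minimal sketch of building a query for the public GDC `ssm_occurrences` endpoint. It is not the code in this PR: `build_ssm_query` is a hypothetical helper, and the filter field (`case.submitter_id`) and the returned fields are assumptions about the GDC data model that should be checked against the GDC API documentation.

```python
import json

# Public GDC endpoint for simple somatic mutation occurrences.
GDC_SSM_OCCURRENCES_URL = "https://api.gdc.cancer.gov/ssm_occurrences"

# Assumed field names; verify against the GDC API field listing.
SSM_FIELDS = [
    "ssm.ssm_id",
    "ssm.chromosome",
    "ssm.start_position",
    "ssm.end_position",
    "ssm.reference_allele",
    "ssm.tumor_allele",
]


def build_ssm_query(subject_id: str, size: int = 100) -> dict:
    """Build query parameters selecting the variants of a single subject.

    Hypothetical helper: assumes the subject can be matched on
    `case.submitter_id`.
    """
    filters = {
        "op": "in",
        "content": {"field": "case.submitter_id", "value": [subject_id]},
    }
    return {
        "filters": json.dumps(filters),  # GDC expects the filter as a JSON string
        "fields": ",".join(SSM_FIELDS),
        "format": "JSON",
        "size": size,
    }


# The parameters could then be sent with, e.g.:
# requests.get(GDC_SSM_OCCURRENCES_URL, params=build_ssm_query("<subject id>"))
```

Keeping query construction separate from the HTTP call makes the mapping from a CDA subject to a GDC filter easy to test without network access.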