Fix_Aug2017 #41

XiaoleiZ · 2017-08-29T09:16:39Z

Fix the bugs reported in the issues and in the referees' reports

The key changes are:

In parse_clinvar_xml.py:
1. Adding columns start, stop and strand for variant representation. Fix Feature request: Including strand info and genomic start and end coordinates #36
2. Adding columns pathogenic, likely_pathogenic, uncertain_significance, likely_benign and benign (the standard terms used by the ACMG guidelines) to record the number of individual submissions that reported the variant as "Pathogenic", "Likely pathogenic", "Uncertain significance", "Likely benign" and "Benign" respectively (case-insensitive). Note that these replace the previous pathogenic and benign columns, which encoded binary information. Fix Improper labelling of conflicting variants as pathogenic #40
3. Adding column scv to list the SCV accession numbers of all individual submissions.
4. Changing column names: every column name with the prefix measureset now uses variation instead, since the latter is more familiar to ClinVar users.
5. Changing the way the gene symbol is extracted: using the symbol that appears in the variant name/title. Fix Wrong gene mapping #37 and Sometimes "symbol" disagrees with primary hgvs gene annotation #31
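For readers who want a concrete picture of items 2 and 5, here is a minimal sketch. It is not the code in parse_clinvar_xml.py: the function names `count_significance` and `symbol_from_title` are hypothetical, and the title layout `NM_...(SYMBOL):c....` is an assumption about how ClinVar variant names usually look.

```python
import re

# The five ACMG terms from the description, mapped to the new column names.
TERM_COLUMNS = {
    "pathogenic": "pathogenic",
    "likely pathogenic": "likely_pathogenic",
    "uncertain significance": "uncertain_significance",
    "likely benign": "likely_benign",
    "benign": "benign",
}


def count_significance(submissions):
    """Count submissions per ACMG term, ignoring case; other terms are skipped."""
    counts = dict.fromkeys(TERM_COLUMNS.values(), 0)
    for term in submissions:
        column = TERM_COLUMNS.get(term.strip().lower())
        if column is not None:
            counts[column] += 1
    return counts


def symbol_from_title(title):
    """Return the first parenthesised token followed by ':' in the title, or None.

    Assumes titles shaped like 'NM_000059.3(BRCA2):c.68-7T>A' (hypothetical example).
    """
    match = re.search(r"\(([^()]+)\):", title)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(count_significance(["Pathogenic", "pathogenic", "Likely benign"]))
    print(symbol_from_title("NM_000059.3(BRCA2):c.68-7T>A"))
```

Keeping per-term counts rather than a single binary flag means a variant with both "Pathogenic" and "Benign" submissions stays visibly mixed instead of being labelled one way or the other.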

In group_by_allele.py:
6. Adding the counts for each term in pathogenic, likely_pathogenic, uncertain_significance, likely_benign and benign

When joining the variant_summary.txt file:
7. Replacing the R script with a Python equivalent. Fix #35
8. Changing how the conflicted column is encoded: following the updated terms used in ClinVar aggregated variation reports, conflicted now indicates whether the variation is reported in aggregate as "Conflicting interpretations of pathogenicity". Fix #40
9. Propagating columns such as last_evaluated and submitters_ordered. Fix #38
10. Removing the duplicated records in variant_summary before joining: the variant_summary file is not allele_id-specific. Variants with alternative loci (e.g. in the PAR) or complex variations such as translocations can have more than one genomic coordinate but the same allele_id, and each alternative locus is recorded as a separate entry in the variant_summary file. I simply remove the duplicated records after extracting the columns of interest from variant_summary. Currently, only one of the sequence locations of these variants is kept after parsing the XML file. There are still problems in handling these types of variants with the current pipeline: e.g. variants in the PAR are represented on the Y chromosome, so their variant info cannot be found in ExAC and gnomAD. And for complex variations like translocations, just one allele is represented in the final output files. Since these are rare cases, I am not sure how to deal with them uniformly. For the variants with alternative loci, there is a separate VCF file available for download on the ClinVar FTP. Fix #39
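To illustrate items 8 and 10, a rough sketch follows. It is not the join script from this PR: the helper names `dedupe_rows` and `is_conflicted` and the column names `allele_id`, `clinical_significance` and `chromosome` are hypothetical, and the substring check is only one plausible way to encode conflicted.

```python
def dedupe_rows(rows, columns):
    """Keep only the columns of interest and drop rows that become identical."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[column] for column in columns)
        if key not in seen:
            seen.add(key)
            unique.append(dict(zip(columns, key)))
    return unique


def is_conflicted(clinical_significance):
    """True if the aggregate term mentions conflicting interpretations (case-insensitive)."""
    phrase = "conflicting interpretations of pathogenicity"
    return phrase in clinical_significance.lower()


if __name__ == "__main__":
    # Hypothetical PAR variant: one allele_id listed once per locus (X and Y).
    rows = [
        {"allele_id": "1001", "chromosome": "X", "clinical_significance": "Benign"},
        {"allele_id": "1001", "chromosome": "Y", "clinical_significance": "Benign"},
    ]
    print(dedupe_rows(rows, ["allele_id", "clinical_significance"]))
```

Rows are only collapsed after the locus-specific columns have been dropped, which is why the column selection has to happen before the duplicate check.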

In add_gnomad_field.py and add_exac_field.py:
11. Adding DP (approximate read depth) so that users can query the coverage info
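As a rough idea of what pulling DP out of a record involves, here is a small sketch that reads one key from a VCF INFO string. The helper name `info_value` and the sample INFO string are hypothetical; the actual add_gnomad_field.py and add_exac_field.py scripts may read the field differently.

```python
def info_value(info, key):
    """Return the value of `key` in a VCF INFO string, or None if it is absent.

    INFO entries are ';'-separated and are either 'KEY=value' or a bare flag.
    """
    for field in info.split(";"):
        name, sep, value = field.partition("=")
        if name == key and sep:
            return value
    return None


if __name__ == "__main__":
    # Made-up INFO string, for illustration only.
    print(info_value("AC=3;AN=246000;DB;DP=7412345", "DP"))
```

The value comes back as a string, so a caller that needs a number would convert it with `int()` after checking for None.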

konradjk · 2017-09-13T01:13:02Z

FWIW we just ran this code as-is and it worked totally fine! Might want to merge since master definitely does not work on the current clinvar xml

bw2 · 2017-09-26T12:04:08Z

@XiaoleiZ should we merge this into master?

kristjaneerik and others added 27 commits May 5, 2017 10:38

make configargparse an explicit requirement

87bbe85

save submitters and the dates of the assertions as ordered fields, be a bit less verbose

9bfec79

propagate ordered fields through join_data.R

b97cb3b

propagate ordered fields through group_by_allele.py

57d8229

clinvar VCF header should take into account the correct reference genome; add the ordered fields

67edbeb

gnomad doesn't have Y chromosome, causing add_gnomad_fields.py to raise a ValueError

1e3bf60

add --output-prefix to enable creating multiple outputs; enable skipping reference genomes if desired; use gunzip -c rather than zcat to enable compatibility with os x; add new ordered columns

6a95226

centralize definition of clinvar table header

b6df8e8

make testing more programmatic and easier to follow

d89f17d

add join_variant_summary_with_clinvar_alleles.py to replace join_data.R

a29971c

rm join_data.R

e50bcc6

add 'uncertain' column, no need to mess around with e.g. 'non-pathogenic'

ec390e3

tweak 'uncertain', 'conflicted' messages a bit

547f234

add --tmp-dir flag for changing output_tmp directory name

34d0e52

separate out clinvar_alleles_stats.py

1399be2

simple differ for clinvar_alleles.tsv.gz

cea79f9

helper script to grab interesting measuresets from the master XML into a smaller one for e.g. testing

24b8afb

remove tqdm

b74f281

change how ; in allele_id is handled

a04e531

--output-prefix and --tmp-dir were in the mutually exclusive group and shouldn't have been

a0bd60c

Fix_Aug

03aa390

Fix the issues reported in ISSUES and referee reports

Update README

461c47e

change measureset to variation; update the doi link

Update README

da572ef

add back test_group_by_allele

f76db12

add back test_group_by_allele

6f644a1

Change test_group_by_allele with new header

cf7b3cb

Adding Read depth from ExAC and GNOMAD

1bdcf56

XiaoleiZ mentioned this pull request Aug 29, 2017

[WIP] Propagating ordered fields #33

Merged

XiaoleiZ added 4 commits September 28, 2017 09:47

2017 Sept release

b230803

Merge branch 'master' into pr/33

50ba4e1

resolve conflicts

9628f0c

resolve conflict

6a8327e

XiaoleiZ merged commit f52ba1c into master Sep 28, 2017

XiaoleiZ deleted the pr/33 branch September 28, 2017 11:38

kristjaneerik mentioned this pull request Dec 4, 2017

Clinical Significance Order Lost in Allele Grouping #48

Closed
