Adds the option for processing ALLELE_NUM. #141

bashir2 · 2018-03-16T17:25:55Z

Check the documentation of --allele_number option of VEP.

Tested:
Unit tests added; also ran the pipeline on a sample of gnomAD with --use_allele_num.

Issue #81
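For readers unfamiliar with the option: with --allele_number, VEP adds an ALLELE_NUM field to each annotation, where 1 is the first ALT allele, 2 the second, and so on. The following is only a rough sketch of how such a pipe-separated annotation could be turned into a dictionary and a 0-based ALT index. The field list and the helper names `parse_annotation` and `alt_index_from` are hypothetical and are not taken from this PR.

```python
from typing import Dict, List

# Assumption: the annotation field carrying the allele number is named
# 'ALLELE_NUM', as in VEP output produced with --allele_number.
ALLELE_NUM_KEY = 'ALLELE_NUM'


def parse_annotation(names: List[str], raw: str) -> Dict[str, str]:
    """Splits one pipe-separated annotation string into a name -> value dict."""
    values = raw.split('|')
    if len(values) != len(names):
        raise ValueError(
            'Expected %d fields, got %d' % (len(names), len(values)))
    return dict(zip(names, values))


def alt_index_from(annotation: Dict[str, str]) -> int:
    """Converts the 1-based ALLELE_NUM value into a 0-based ALT index."""
    return int(annotation[ALLELE_NUM_KEY]) - 1


# Hypothetical field names and annotation string, for illustration only.
names = ['Allele', 'Consequence', 'ALLELE_NUM']
annotation = parse_annotation(names, 'T|missense_variant|2')
second_alt_index = alt_index_from(annotation)
```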

coveralls · 2018-03-16T17:29:43Z

Pull Request Test Coverage Report for Build 483

47 of 51 (92.16%) changed or added relevant lines in 3 files are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage increased (+0.01%) to 90.341%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
gcp_variant_transforms/options/variant_transform_options.py	0	1	0.0%
gcp_variant_transforms/libs/processed_variant.py	28	31	90.32%

Files with Coverage Reduction	New Missed Lines	%
gcp_variant_transforms/options/variant_transform_options.py	1	53.97%

Totals
Change from base Build 479:	0.01%
Covered Lines:	3470
Relevant Lines:	3841

💛 - Coveralls

bashir2 · 2018-03-16T17:38:44Z

BTW, stating the obvious: this PR is on top of PR #131, so the first commit is actually that PR. To see the changes for this PR only, please check the second commit.

arostamianfar

Thanks for finding this feature! Looks good. Just one suggestion about whether we even need the new flag.

arostamianfar · 2018-03-16T20:18:03Z

gcp_variant_transforms/libs/processed_variant.py

+  def _add_annotations_by_allele_num(
+      self, proc_var, annotation_dict, annotation_field_name):
+    # type: (ProcessedVariant, Dict[str, str], str) -> None
+    if _ALLELE_NUM_ANNOTATION not in annotation_dict:


Can we always use this check so that we no longer need the use_allele_num flag? That is, the logic would be to look for allele_num first and, if it doesn't exist, use minimal or exact match depending on the --minimal flag. Or are you concerned about cases where allele_num is incorrect?

Yes, when I was adding this feature I also preferred not to add an extra flag, but then I thought it is better to be explicit, so that if there are problems with ALLELE_NUM we don't force them on the user. Of course, we could try to use ALLELE_NUM and, if it fails for whatever reason (e.g., bad indices), fall back on matching (whether exact or minimal).

One problem with the latter approach is that in ALLELE_NUM mode, annotation lists are processed one by one regardless of their alt_bases, whereas in non-ALLELE_NUM mode we map all annotation lists that have the same alt_bases to a single ALT (check the new unit test I have added and note that both lists have 'T' as their alt_bases but are mapped to two different ALTs). Of course, we can fix this problem too if we strongly prefer not to have the extra flag.

So, with all this in mind, I decided that being explicit and either always using ALLELE_NUM or always ignoring it is the clearer approach. What do you think?

I see. Sounds good. Let's keep the flag then.
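To make the difference discussed in this thread concrete, here is a minimal sketch of the two mapping modes. It is not the code from this PR: the function names, the 'Allele' key, and the sample data are hypothetical, and only the exact-match variant of the non-ALLELE_NUM mode is shown.

```python
from collections import defaultdict
from typing import Dict, List

Annotation = Dict[str, str]

# Assumed key names; the real constants in processed_variant.py may differ.
ALLELE_NUM_KEY = 'ALLELE_NUM'
ALLELE_KEY = 'Allele'


def map_by_allele_num(
    alts: List[str], annotations: List[Annotation]) -> Dict[int, List[Annotation]]:
    """Assigns each annotation to the ALT at index ALLELE_NUM - 1."""
    result = defaultdict(list)
    for annotation in annotations:
        index = int(annotation[ALLELE_NUM_KEY]) - 1  # ALLELE_NUM is 1-based.
        if not 0 <= index < len(alts):
            raise ValueError('Bad ALLELE_NUM: %s' % annotation[ALLELE_NUM_KEY])
        result[index].append(annotation)
    return dict(result)


def map_by_alt_bases(
    alts: List[str], annotations: List[Annotation]) -> Dict[int, List[Annotation]]:
    """Assigns each annotation to the first ALT whose bases equal its allele."""
    result = defaultdict(list)
    for annotation in annotations:
        for index, alt in enumerate(alts):
            if alt == annotation[ALLELE_KEY]:
                result[index].append(annotation)
                break
    return dict(result)


# Hypothetical record: two ALTs, and two annotation lists that both report 'T'.
alts = ['CT', 'T']
annotations = [
    {ALLELE_KEY: 'T', ALLELE_NUM_KEY: '1'},
    {ALLELE_KEY: 'T', ALLELE_NUM_KEY: '2'},
]
by_num = map_by_allele_num(alts, annotations)
by_bases = map_by_alt_bases(alts, annotations)
```

In this sketch `by_num` places the two annotations on two different ALTs, while `by_bases` collapses both onto the single ALT whose bases are 'T'. That is the mismatch that makes a silent fallback from one mode to the other awkward.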

arostamianfar · 2018-03-16T20:25:35Z

gcp_variant_transforms/libs/processed_variant.py

+    index_str = annotation_dict[_ALLELE_NUM_ANNOTATION]
+    try:
+      alt_index = int(index_str) - 1
+      alt_list = proc_var._alternate_datas


Consider using the public getter?

I have explicitly chosen not to use the public attributes when the intention is to mutate (you can find other examples almost everywhere in this module). If you remember, we even wanted to make the attributes read-only, so that they could be read but not altered, but then we decided that was probably over-designing. So I prefer to treat the public attributes as read-only getters.
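The convention described here can be illustrated roughly as follows. This is a simplified sketch, not the actual classes from processed_variant.py: `AlternateData`, `Variant`, and `add_annotation` are hypothetical stand-ins, and the public `alternate_datas` property is assumed from the reviewer's mention of a public getter.

```python
from typing import Dict, List


class AlternateData(object):
    """Hypothetical stand-in for the per-ALT data held by a variant."""

    def __init__(self, bases: str) -> None:
        self.bases = bases
        self.annotations = []  # type: List[Dict[str, str]]


class Variant(object):
    """Hypothetical stand-in for ProcessedVariant."""

    def __init__(self, alts: List[str]) -> None:
        self._alternate_datas = [AlternateData(bases) for bases in alts]

    @property
    def alternate_datas(self) -> List[AlternateData]:
        # Public getter: by convention, callers only read through this.
        return self._alternate_datas


def add_annotation(
    variant: Variant, alt_index: int, annotation: Dict[str, str]) -> None:
    """Mutates the variant, so it goes through the private attribute."""
    # Code in the same module that intends to mutate uses _alternate_datas,
    # which makes the mutation visible at the call site.
    alt_list = variant._alternate_datas
    alt_list[alt_index].annotations.append(annotation)


variant = Variant(['T', 'G'])
add_annotation(variant, 1, {'Consequence': 'missense_variant'})
bases_read = [alt.bases for alt in variant.alternate_datas]  # read-only use
```

Nothing here enforces read-only access, since the property returns the same list object. The distinction is purely a naming convention that signals intent.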

bashir2

PTAL; this commit has the main changes since last review (others are merge related).

bashir2 · 2018-03-16T23:27:38Z

BTW, thanks for the quick review.

arostamianfar

LGTM! Thanks!

bashir2 added 2 commits March 13, 2018 18:13

Adds support for --minimal option of VEP with related schema changes.

dcbc206

Adds the option for processing ALLELE_NUM.

a0a66c8

bashir2 requested a review from arostamianfar March 16, 2018 17:25

bashir2 added 2 commits March 16, 2018 13:55

Merge branch 'master' into alt_annotation_fix_minimal_review

ee34cbc

review comments

b357283

arostamianfar reviewed Mar 16, 2018

bashir2 added 4 commits March 16, 2018 17:04

review comments round two

e2f7b68

Merge branch 'alt_annotation_fix_minimal_review' into alt_annotation_…

edb8202

…fix_minimal_review-allele_num

added unit-tests

b5ad461

Merge branch 'master' into alt_annotation_fix_minimal_review-allele_num

1f5a03e

bashir2 commented Mar 16, 2018

arostamianfar approved these changes Mar 19, 2018

Merge branch 'master' into alt_annotation_fix_minimal_review-allele_num

763e036

bashir2 merged commit e9310dd into googlegenomics:master Mar 20, 2018

bashir2 deleted the alt_annotation_fix_minimal_review-allele_num branch March 20, 2018 19:05
