@bashir2 (Member) commented Mar 16, 2018

See the documentation of the --allele_number option of VEP.

Tested:
Added unit tests; also ran the pipeline on a sample of gnomAD with --use_allele_num.

Issue #81

@bashir2 bashir2 requested a review from arostamianfar March 16, 2018 17:25

coveralls commented Mar 16, 2018

Pull Request Test Coverage Report for Build 483

  • 47 of 51 (92.16%) changed or added relevant lines in 3 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.01%) to 90.341%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| gcp_variant_transforms/options/variant_transform_options.py | 0 | 1 | 0.0% |
| gcp_variant_transforms/libs/processed_variant.py | 28 | 31 | 90.32% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| gcp_variant_transforms/options/variant_transform_options.py | 1 | 53.97% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 479 | 0.01% |
| Covered Lines | 3470 |
| Relevant Lines | 3841 |

💛 - Coveralls

@bashir2 (Member, Author) commented Mar 16, 2018

BTW, stating the obvious: This PR is on top of PR #131 so the first commit is actually that PR. To see changes for this PR only, please check the second commit.

@arostamianfar (Contributor) left a comment

Thanks for finding this feature! Looks good. Just one suggestion about whether we even need the new flag.

  def _add_annotations_by_allele_num(
      self, proc_var, annotation_dict, annotation_field_name):
    # type: (ProcessedVariant, Dict[str, str], str) -> None
    if _ALLELE_NUM_ANNOTATION not in annotation_dict:
arostamianfar (Contributor):

Can we always use this check so that we no longer need the use_allele_num flag? That is, the logic would first look for allele_num and, if it doesn't exist, use minimal or exact matching depending on the --minimal flag. Or are you concerned about cases where allele_num is incorrect?

bashir2 (Member, Author):

Yes, when I was adding this feature I also preferred not to add an extra flag, but then I thought it is better to be explicit, so that if there are problems with ALLELE_NUM we don't force them on the user. Of course, we could try ALLELE_NUM first and, if it fails for whatever reason (e.g., bad indices), fall back on matching (whether exact or minimal).

One problem with that fallback approach is that in ALLELE_NUM mode, annotation lists are processed one by one regardless of their alt_bases, whereas in non-ALLELE_NUM mode we map all annotation lists that have the same alt_bases to a single ALT (check the new unit test I added and note that both lists have 'T' as their alt_bases but are mapped to two different ALTs). Of course, we can fix this problem too if we strongly prefer not to have the extra flag.

Given all this, I decided that being explicit, either always using ALLELE_NUM or always ignoring it, is the clearer approach. What do you think?
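
To make the difference between the two modes concrete, here is a minimal sketch. It is not the code in processed_variant.py: the `Alt` class, the `'allele'` key, and both helper functions are hypothetical names used only for illustration, and the matching helper ignores the exact/minimal distinction. Only the ALLELE_NUM annotation name and its 1-based indexing are taken from the diff under review.

```python
ALLELE_NUM = 'ALLELE_NUM'  # annotation key, as in the diff under review


class Alt(object):
  """Hypothetical stand-in for one ALT and the annotations attached to it."""

  def __init__(self, alt_bases):
    self.alt_bases = alt_bases
    self.annotations = []


def add_by_allele_num(alts, annotation_dicts):
  """Attaches each annotation to the ALT named by its 1-based ALLELE_NUM."""
  for ann in annotation_dicts:
    alt_index = int(ann[ALLELE_NUM]) - 1
    if not 0 <= alt_index < len(alts):
      raise ValueError('Invalid ALLELE_NUM: %s' % ann[ALLELE_NUM])
    alts[alt_index].annotations.append(ann)


def add_by_alt_bases(alts, annotation_dicts):
  """Attaches each annotation to the first ALT whose bases match (simplified)."""
  by_bases = {}
  for alt in alts:
    # Keep only the first ALT seen for a given bases string.
    by_bases.setdefault(alt.alt_bases, alt)
  for ann in annotation_dicts:
    # 'allele' is an assumed key holding the annotation's alt_bases.
    alt = by_bases.get(ann['allele'])
    if alt is not None:
      alt.annotations.append(ann)
```

With two ALTs and two annotation lists that both carry 'T' as their allele, `add_by_allele_num` places one annotation on each ALT, while `add_by_alt_bases` places both on the single ALT that matches 'T'. A silent fallback from one mode to the other would therefore change the shape of the output, which is the concern raised above.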

arostamianfar (Contributor):

I see. Sounds good. Let's keep the flag then.

    index_str = annotation_dict[_ALLELE_NUM_ANNOTATION]
    try:
      alt_index = int(index_str) - 1
      alt_list = proc_var._alternate_datas
arostamianfar (Contributor):

Consider using the public getter?

bashir2 (Member, Author):

I have explicitly chosen not to use the public attributes when the intention is to mutate (this holds almost everywhere in this module; you can find other examples too). If you remember, we even wanted to make the attributes read-only, so that they can be read but not altered, but then we decided that was probably over-designing. So I prefer to treat the public attributes as read-only getters.
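
A rough sketch of that convention follows. `ProcessedVariantSketch`, `read_alt_count`, and `attach_annotation` are hypothetical names, and whether the real `alternate_datas` is implemented as a property is an assumption made here for illustration; only the `_alternate_datas` attribute name comes from the diff.

```python
class ProcessedVariantSketch(object):
  """Hypothetical, simplified stand-in for ProcessedVariant."""

  def __init__(self, alternate_datas):
    self._alternate_datas = list(alternate_datas)

  @property
  def alternate_datas(self):
    # Public accessor: by convention, callers only read through this.
    return self._alternate_datas


def read_alt_count(proc_var):
  # Read-only use goes through the public property.
  return len(proc_var.alternate_datas)


def attach_annotation(proc_var, alt_index, annotation):
  # Mutation deliberately goes through the private attribute, so the
  # call site makes it visible that internal state is being changed.
  proc_var._alternate_datas[alt_index].append(annotation)
```

Note that the property only blocks rebinding the attribute; the returned list is still mutable, so the read-only treatment is a convention, not something the class enforces.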

@bashir2 (Member, Author) left a comment

PTAL; this commit has the main changes since last review (others are merge related).

@bashir2 (Member, Author) commented Mar 16, 2018

BTW, thanks for the quick review.

@arostamianfar (Contributor) left a comment

LGTM! Thanks!

@bashir2 bashir2 merged commit e9310dd into googlegenomics:master Mar 20, 2018
@bashir2 bashir2 deleted the alt_annotation_fix_minimal_review-allele_num branch March 20, 2018 19:05