Add an option to allow records with malformed Number=A fields. #406
Pull Request Test Coverage Report for Build 1428
💛 - Coveralls
allieychen left a comment
It looks awesome! I only have some minor comments.
  minimal_match=False, # type: bool
  infer_annotation_types=False, # type: bool
- counter_factory=None # type: metrics_util.CounterFactoryInterface
+ counter_factory=None, # type: metrics_util.CounterFactoryInterface,
nit: remove the trailing comma at the end of the line.
nit: I'd prefer to swap the order of counter_factory and allow_alternate_allele_info_mismatch so that all the bool flags stay together, which improves readability.
Great suggestion! I actually moved it all the way up to be close to split_alternate_allele_info_fields as they're related.
gcp_variant_transforms/vcf_to_bq.py (outdated)
  known_args.minimal_vep_alt_matching,
  known_args.infer_annotation_types,
- counter_factory)
+ counter_factory,
This may not be related to this PR, but is there any reason not to create the counter inside ProcessedVariantFactory?
Good question. There are two main reasons: to share the factory across all classes (currently only one class uses counters, but we have some open issues to add counters to other classes too), and to let tests override it with the NoOpCounterFactory (or any other custom metrics factory).
Good point! Thanks!
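The design choice discussed in this thread (pass a counter factory in rather than create counters inside each class) can be sketched as follows. This is a minimal, self-contained illustration and not the project's metrics_util module: Counter, CounterFactory, InMemoryCounterFactory, and VariantProcessor are hypothetical names, and the NoOpCounterFactory below is only a stand-in that plays the same role as the one mentioned in the review.

```python
import abc
from typing import Any, Dict, Optional


class Counter(abc.ABC):
  """Minimal counter interface (hypothetical)."""

  @abc.abstractmethod
  def inc(self, n: int = 1) -> None:
    """Increments the counter by n."""


class CounterFactory(abc.ABC):
  """Hypothetical stand-in for metrics_util.CounterFactoryInterface."""

  @abc.abstractmethod
  def create_counter(self, name: str) -> Counter:
    """Returns a counter registered under the given name."""


class _NoOpCounter(Counter):

  def inc(self, n: int = 1) -> None:
    pass  # Increments are deliberately discarded.


class NoOpCounterFactory(CounterFactory):
  """Factory for tests that do not care about metrics."""

  def create_counter(self, name: str) -> Counter:
    return _NoOpCounter()


class _InMemoryCounter(Counter):

  def __init__(self) -> None:
    self.value = 0

  def inc(self, n: int = 1) -> None:
    self.value += n


class InMemoryCounterFactory(CounterFactory):
  """Keeps one counter per name so tests can inspect the totals."""

  def __init__(self) -> None:
    self.counters: Dict[str, _InMemoryCounter] = {}

  def create_counter(self, name: str) -> Counter:
    # Reuse an existing counter so every class sharing this factory
    # increments the same total.
    if name not in self.counters:
      self.counters[name] = _InMemoryCounter()
    return self.counters[name]


class VariantProcessor:
  """Hypothetical consumer that receives its factory instead of building one."""

  def __init__(self, counter_factory: Optional[CounterFactory] = None) -> None:
    cfactory = (counter_factory if counter_factory is not None
                else NoOpCounterFactory())
    self._variant_counter = cfactory.create_counter('variant')

  def process(self, variant: Any) -> Any:
    self._variant_counter.inc()
    return variant
```

Two processors built with the same InMemoryCounterFactory increment one shared 'variant' counter, while VariantProcessor() with no argument falls back to the no-op factory, which is the kind of test override described above.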
self._variant_counter = cfactory.create_counter(
    _CounterEnum.VARIANT.value)
self._alternate_allele_info_mismatche_counter = cfactory.create_counter(
    _CounterEnum.ALTERNATE_ALLELE_INFO_MISMATCH)
nit: s/ALTERNATE_ALLELE_INFO_MISMATCH/ALTERNATE_ALLELE_INFO_MISMATCH.value/, since the _variant_counter above uses VARIANT.value.
Nice catch! Done.
arostamianfar left a comment
Thanks for the detailed review!!
This option gracefully handles Number=A fields with either too many or too few values (such a field should have exactly one value per alternate allele). For now, the behavior is enabled via allow_malformed_records to avoid having too many flags. We can consider adding a separate flag if users want to control this behavior separately from other malformed conditions.

Tested:
unit + integration tests
Fixes #129
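Per the VCF specification, a Number=A field carries exactly one value per alternate allele. The PR description does not spell out the exact handling of a mismatch, so the sketch below assumes one plausible policy: when malformed records are allowed, excess values are dropped and missing values become None; otherwise a mismatch raises. split_number_a_values is a hypothetical helper, not a function from gcp_variant_transforms.

```python
from typing import Any, List, Optional, Sequence, Tuple


def split_number_a_values(
    field_name: str,
    values: Sequence[Any],
    alternate_bases: Sequence[str],
    allow_malformed_records: bool = False,
) -> List[Tuple[str, Optional[Any]]]:
  """Pairs each alternate allele with its value for a Number=A field.

  Hypothetical policy when the counts disagree and malformed records are
  allowed: excess values are dropped and missing values become None.
  """
  num_alts = len(alternate_bases)
  if len(values) != num_alts and not allow_malformed_records:
    raise ValueError(
        'Number=A field %s has %d value(s) but the record has %d alternate '
        'allele(s).' % (field_name, len(values), num_alts))
  # Truncate excess values, then pad with None if there were too few.
  aligned = list(values[:num_alts])
  aligned += [None] * (num_alts - len(aligned))
  return list(zip(alternate_bases, aligned))
```

For example, split_number_a_values('AF', [0.1, 0.2, 0.3], ['A', 'T'], allow_malformed_records=True) returns [('A', 0.1), ('T', 0.2)], and the same call with [0.1] returns [('A', 0.1), ('T', None)]. Returning a list of pairs rather than a dict keeps the result well defined even if two alternate alleles had the same bases.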