New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[vds/combiner] Teach VDS combiner how to handle custom call fields #13385
Conversation
Add the call_fields argument. Much like refrence entries, when passing a VDS to the combiner, we check it's call fields to ensure that there is a match between the passed argument and the VDS's call fields.
@@ -156,7 +157,7 @@ class VariantDatasetCombiner: # pylint: disable=too-many-instance-attributes | |||
for all GVCF files. Finer partitioning yields more parallelism but less work per task. | |||
gvcf_info_to_keep : :class:`list` of :class:`str` or :obj:`None` | |||
GVCF ``INFO`` fields to keep in the ``gvcf_info`` entry field. By default, all fields are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to remove the are
on the previous line
call_fields_tmp = (call_fields_tmp - {'LPGT'}) | {'PGT'} | ||
if set(call_fields) != call_fields_tmp: | ||
warning("Mismatch between 'call_fields' and VDS call fields. " | ||
"Overwriting with types from supplied VDS") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include the mismatch. In particular, the user should know the call_fields
argument but might not know what's in the dataset.
What do you mean by "overwriting with types ..."? It's not the types that are taken from the VDS but the call fields that are taken from it, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. Also will clarify.
@@ -701,6 +709,16 @@ def maybe_load_from_saved_path(save_path: str) -> Optional[VariantDatasetCombine | |||
"entry types. Overwriting with types from supplied VDS.") | |||
gvcf_reference_entry_fields_to_keep = ref_entry_tmp | |||
|
|||
all_entry_types = chain(vds.reference_data._type.entry.items(), | |||
vds.variant_data._type.entry.items()) | |||
call_fields_tmp = {name for name, typ in all_entry_types if typ == hl.tcall} - {'LGT', 'GT'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the call fields from the vds, right? Let's use a descriptive name. Maybe vds_call_fields
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
vds.variant_data._type.entry.items()) | ||
call_fields_tmp = {name for name, typ in all_entry_types if typ == hl.tcall} - {'LGT', 'GT'} | ||
if 'LPGT' in call_fields_tmp: | ||
call_fields_tmp = (call_fields_tmp - {'LPGT'}) | {'PGT'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this is here. If the VDS has LPGT, why would we rename that to PGT? Why doesn't that cause a problem later when the VDS is lacking PGT (it has LPGT instead)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We transform certain fields from their 'normal' gvcf versions to the VDS 'local' ones, PGT is one of them and is also the only VCF field we parse as a call field by default.
Mostly me not knowing how this code works, so, my apologies for slowing you down. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
Add the call_fields argument.
Much like refrence entries, when passing a VDS to the combiner, we check it's call fields to ensure that there is a match between the passed argument and the VDS's call fields.