Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vds/combiner] Teach VDS combiner how to handle custom call fields #13385

Merged
merged 3 commits into from Aug 7, 2023

Conversation

chrisvittal
Copy link
Collaborator

Add the call_fields argument.

Much like refrence entries, when passing a VDS to the combiner, we check it's call fields to ensure that there is a match between the passed argument and the VDS's call fields.

Add the call_fields argument.

Much like refrence entries, when passing a VDS to the combiner, we check
it's call fields to ensure that there is a match between the passed
argument and the VDS's call fields.
@@ -156,7 +157,7 @@ class VariantDatasetCombiner: # pylint: disable=too-many-instance-attributes
for all GVCF files. Finer partitioning yields more parallelism but less work per task.
gvcf_info_to_keep : :class:`list` of :class:`str` or :obj:`None`
GVCF ``INFO`` fields to keep in the ``gvcf_info`` entry field. By default, all fields are
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to remove the are on the previous line

call_fields_tmp = (call_fields_tmp - {'LPGT'}) | {'PGT'}
if set(call_fields) != call_fields_tmp:
warning("Mismatch between 'call_fields' and VDS call fields. "
"Overwriting with types from supplied VDS")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include the mismatch. In particular, the user should know the call_fields argument but might not know what's in the dataset.

What do you mean by "overwriting with types ..."? It's not the types that are taken from the VDS but the call fields that are taken from it, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Also will clarify.

@@ -701,6 +709,16 @@ def maybe_load_from_saved_path(save_path: str) -> Optional[VariantDatasetCombine
"entry types. Overwriting with types from supplied VDS.")
gvcf_reference_entry_fields_to_keep = ref_entry_tmp

all_entry_types = chain(vds.reference_data._type.entry.items(),
vds.variant_data._type.entry.items())
call_fields_tmp = {name for name, typ in all_entry_types if typ == hl.tcall} - {'LGT', 'GT'}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the call fields from the vds, right? Let's use a descriptive name. Maybe vds_call_fields?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

vds.variant_data._type.entry.items())
call_fields_tmp = {name for name, typ in all_entry_types if typ == hl.tcall} - {'LGT', 'GT'}
if 'LPGT' in call_fields_tmp:
call_fields_tmp = (call_fields_tmp - {'LPGT'}) | {'PGT'}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is here. If the VDS has LPGT, why would we rename that to PGT? Why doesn't that cause a problem later when the VDS is lacking PGT (it has LPGT instead)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We transform certain fields from their 'normal' gvcf versions to the VDS 'local' ones, PGT is one of them and is also the only VCF field we parse as a call field by default.

@danking
Copy link
Collaborator

danking commented Aug 7, 2023

Mostly me not knowing how this code works, so, my apologies for slowing you down.

Copy link
Collaborator

@danking danking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@danking danking merged commit b7442e8 into hail-is:main Aug 7, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants