There are some scenarios that the fill-gaps (and fill-missing) operation currently skips and that should be handled.
Pending scenarios
Gap found in archive.
The original file may leave a region uncovered. This situation is normal in VCFs, but really uncommon in gVCFs. Currently, the FillGapsTask does not write anything for such a region, which is then understood as HOM_REF (0/0). In this scenario we should write ?/? to represent that the information at this locus is unknown.
When executing the fill-missing operation (aka aggregate) we may find a lot of gaps, because we are not reading the reference blocks with HOM_REF genotypes. This should be taken into account.
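The gap rule above can be sketched as follows. This is only an illustrative sketch in Python, not the FillGapsTask code: `ArchiveRecord` and `genotype_at` are hypothetical names, and the archive is assumed to be a plain list of per-sample records with 1-based inclusive coordinates.

```python
from typing import List, NamedTuple

UNKNOWN_GENOTYPE = "?/?"  # notation used in this issue for "no information"


class ArchiveRecord(NamedTuple):
    """Hypothetical stand-in for one archive record of a single sample."""
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    genotype: str   # e.g. "0/0" for a reference block, "0/1" for a variant


def genotype_at(records: List[ArchiveRecord], start: int, end: int) -> str:
    """Return the genotype of the first record overlapping [start, end].

    When no record overlaps, the archive has a gap at this locus, so the
    unknown genotype is returned instead of silently implying HOM_REF.
    """
    for record in records:
        if record.start <= end and record.end >= start:
            return record.genotype
    return UNKNOWN_GENOTYPE


archive = [ArchiveRecord(100, 199, "0/0"), ArchiveRecord(300, 300, "0/1")]
print(genotype_at(archive, 150, 150))  # covered by the reference block
print(genotype_at(archive, 250, 250))  # falls in the gap between records
```

For fill-missing, where reference blocks are not read, the same lookup would report a gap for every HOM_REF region, which is why that operation needs separate treatment.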
Insertion not overlapping with any variant.
This scenario is quite similar to the previous one: no overlapping variant can be found in the archive. This may happen because we are trying to complete an insertion variant that lies between two variants. In this scenario we can do two things:
Try to overlap with the variant in the previous position (if any)
Write a 0/0 indicating that the insertion does not happen for these samples
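To compare the two options, a minimal sketch could look like this. All names (`ArchiveRecord`, `InsertionStrategy`, `resolve_insertion`) are hypothetical, and the sketch assumes the insertion sits between `insertion_pos - 1` and `insertion_pos`. The coordinates of the synthetic 0/0 record are illustrative only.

```python
from enum import Enum
from typing import List, NamedTuple, Optional


class ArchiveRecord(NamedTuple):
    """Hypothetical stand-in for one archive record of a single sample."""
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    genotype: str


class InsertionStrategy(Enum):
    PREVIOUS_POSITION = "previous-position"  # option 1 in the list above
    HOM_REF = "hom-ref"                      # option 2 in the list above


def resolve_insertion(records: List[ArchiveRecord], insertion_pos: int,
                      strategy: InsertionStrategy) -> Optional[ArchiveRecord]:
    """Pick the record used to complete an insertion with no direct overlap."""
    if strategy is InsertionStrategy.PREVIOUS_POSITION:
        previous = insertion_pos - 1
        for record in records:
            if record.start <= previous <= record.end:
                return record
        return None  # nothing at the previous position either
    # HOM_REF: state that the insertion is absent in this sample.
    return ArchiveRecord(insertion_pos, insertion_pos, "0/0")


archive = [ArchiveRecord(100, 100, "0/1"), ArchiveRecord(105, 105, "1/1")]
print(resolve_insertion(archive, 101, InsertionStrategy.PREVIOUS_POSITION))
print(resolve_insertion(archive, 103, InsertionStrategy.PREVIOUS_POSITION))
print(resolve_insertion(archive, 103, InsertionStrategy.HOM_REF))
```

Note that option 1 can still come back empty, so a fallback is needed whichever option is chosen.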
Multiple overlaps
In this scenario, one variant overlaps with multiple positions in the archive. This may happen for several reasons:
Deletion from sample A overlapping with N smaller variants from sample B
Inconsistent input VCF with overlapping variants
In this scenario, we should mark that there is something at this position, but that we cannot determine what. For this, we should use the special allele <*> from the VCF spec v4.3 (known as <NON_REF> in GATK).
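A rough sketch of this rule, assuming the overlapping archive records of one sample have already been collected. `OverlapRecord`, `overlapping` and `alternate_to_report` are hypothetical names, and what genotype to write next to `<*>` is left out because the issue does not specify it.

```python
from typing import List, NamedTuple, Optional

NON_REF_ALLELE = "<*>"  # symbolic allele from VCF v4.3, <NON_REF> in GATK


class OverlapRecord(NamedTuple):
    """Hypothetical archive record: a region plus its alternate allele."""
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    alternate: str


def overlapping(records: List[OverlapRecord], start: int, end: int) -> List[OverlapRecord]:
    """Collect every record that overlaps the region [start, end]."""
    return [r for r in records if r.start <= end and r.end >= start]


def alternate_to_report(overlaps: List[OverlapRecord]) -> Optional[str]:
    """Choose the alternate allele to report for the variant being filled."""
    if not overlaps:
        return None                   # gap: handled by the unknown-genotype rule
    if len(overlaps) == 1:
        return overlaps[0].alternate  # unambiguous: a single overlapping record
    return NON_REF_ALLELE             # something is there, but we cannot tell what


# Sample A has a deletion covering 100-104; sample B has two SNVs inside it.
sample_b = [OverlapRecord(101, 101, "T"), OverlapRecord(103, 103, "G")]
print(alternate_to_report(overlapping(sample_b, 100, 104)))
```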
Deletion from sample A overlapping with N reference blocks from sample B
PENDING SCENARIO
Overlap with a split multi-allelic variant
In this scenario, a variant from file A may overlap with many variants produced by splitting a multi-allelic variant from file B. The information in these split variants from B is consistent, so we know exactly what is at this position. This is the case when all the overlapping variants share the same FileEntry.call. We should just take any of them.
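The check described above could be sketched like this. `SplitRecord` and `pick_from_split` are hypothetical names, and `call` is treated as an opaque string standing in for FileEntry.call. Its real format is not assumed here, and the values in the example are made up.

```python
from typing import List, NamedTuple, Optional


class SplitRecord(NamedTuple):
    """Hypothetical overlapping record from file B."""
    start: int
    alternate: str
    call: Optional[str]  # stand-in for FileEntry.call, compared as an opaque value


def pick_from_split(overlaps: List[SplitRecord]) -> Optional[SplitRecord]:
    """Return any record when all overlaps come from the same original call.

    Returns None when the overlaps do not share one call, in which case the
    generic multiple-overlap handling applies.
    """
    calls = {record.call for record in overlaps}
    if overlaps and len(calls) == 1 and None not in calls:
        return overlaps[0]  # all records are equivalent for this purpose
    return None


# Two records split from one multi-allelic variant (illustrative call value).
same_origin = [SplitRecord(100, "C", "call-1"), SplitRecord(100, "T", "call-1")]
# Two unrelated records that merely overlap the same region.
unrelated = [SplitRecord(100, "C", "call-1"), SplitRecord(102, "G", "call-2")]
print(pick_from_split(same_origin))
print(pick_from_split(unrelated))
```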
Structural variants
We should not try to merge structural variants with other smaller variants.
Rename operation
After some deliberation, we decided that these operations should be called "aggregate" and "aggregate-family". Therefore, the command and internal classes should be renamed to match these new names.
Tasks
Write unknown genotype ?/? when gaps are found in the archive file
Applies only to the fill-gaps (aka aggregate-family) operation, when reading all archive records.
Use the NON_REF allele for multiple overlaps.
Handle overlap with symbolic ref blocks (<*>)
Handle multiple overlaps
Special scenario: Overlap with multiple reference blocks
Special scenario: Overlap with multiple variants from the same multi-allelic variant
Decide what to do with insertions between two variants
Ensure structural variants are not being merged
Rename operation
Rename command line
Internal rename