Description
I am working with @sam-widmayer on a GRCm39-mapped WGS sample. We are running the google-deepvariant-1.9.0 docker image via Apptainer on our Slurm HPC. We have successfully run multiple samples, split by chromosome; however, one particular sample returns an error when we run chromosome MT with the following command:
apptainer exec --no-home google-deepvariant-1.9.0.img run_deepvariant --model_type WGS --ref GRCm39_masked.fa --reads MT_2.bam --output_vcf MT.vcf.gz --sample_name MT01 --num_shards 4 --regions MT --output_gvcf MT.gvcf.gz --verbosity 1
The error trace is as follows:
W0814 11:04:48.035857 23456246502016 postprocess_variants.py:624] Alt allele indices found from call_variants_outputs for variant reference_bases: "TG"
alternate_bases: "CG"
alternate_bases: "T"
calls {
info {
key: "AD"
value {
values {
int_value: 2
}
values {
int_value: 79
}
values {
int_value: 91
}
}
}
info {
key: "DP"
value {
values {
int_value: 172
}
}
}
info {
key: "MID"
value {
values {
string_value: "small_model"
}
}
}
info {
key: "VAF"
value {
values {
number_value: 0.45930232558139533
}
values {
number_value: 0.5290697674418605
}
}
}
genotype: -1
genotype: -1
call_set_name: "MT01"
}
end: 55
reference_name: "MT"
start: 53
is [[0, 1]], which is invalid.
I0814 11:04:48.051477 23456246502016 postprocess_variants.py:1603] VCF and gVCF creation took 0.0002703746159871419 minutes.
I0814 11:04:48.052498 23456246502016 postprocess_variants.py:1603] VCF and gVCF creation took 0.0004706700642903646 minutes.
I0814 11:04:48.055323 23456246502016 postprocess_variants.py:1603] VCF and gVCF creation took 0.0004635254542032878 minutes.
Checking the headers of 4 files.
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
[E::naive_concat_check_headers] Failed to parse header: /tmp/tmpxpvturbb.gz
Fatal Python error: Segmentation fault
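As a sanity check on the record in the warning, the short sketch below recomputes the allele fractions from the logged AD and DP values. It only uses the numbers printed above and assumes VAF is simply alt depth divided by DP; it is not taken from the DeepVariant source.

```python
# Values copied from the warning record at MT:53-55 (ref "TG", alts "CG" and "T").
ad = [2, 79, 91]   # allele depths: ref, alt 1 ("CG"), alt 2 ("T")
dp = 172           # total depth reported in the same record

# Assumption: VAF for each alt allele is alt depth / DP.
vaf = [count / dp for count in ad[1:]]

print("AD sums to DP:", sum(ad) == dp)
print("recomputed VAF:", vaf)
```

The recomputed fractions agree with the logged VAF values (about 0.459 and 0.529), so nearly all reads support one of the two alternate alleles and only 2 of 172 support the reference.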
In attempting to debug this, we have zeroed in on a region around base positions 53-56 that appears to contain an insertion. We have also restricted the target calling region, and find that the command completes successfully under a set of conditions that does not make immediate sense to us:
- --regions MT:1-100: Successful run with call:
  MT 53 . T TC 47.2 PASS . GT:GQ:DP:AD:VAF:MID:PL 1/1:47:946:12,931:0.984144:small_model:47,99,0
- --regions MT:1-500: Successful run with call:
  MT 55 . G T 30.4 PASS . GT:GQ:DP:AD:VAF:MID:PL 1/1:23:155:0,149:0.96129:small_model:30,23,0
- --regions MT:1-1000: Fails with segfault error above.
- --regions MT:50-16299: Successful run with call:
  MT 54 . TG T 37.6 PASS . GT:GQ:DP:AD:VAF:MID:PL 1/1:37:145:2,143:0.986207:small_model:37,45,0
- --regions MT:40-16299: Fails with segfault error above.
- --regions MT:40-1800: Fails with segfault error above.
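For anyone trying to reproduce the region tests, the sketch below shows one way the runs above could be scripted. The file names and flags are the ones from our command; the helper names (`build_command`, `region_prefix`, `scan_regions`) are hypothetical, and it assumes a non-zero exit status from `run_deepvariant` is enough to flag a failed run.

```python
import subprocess

# Image name taken from the command in this report.
IMAGE = "google-deepvariant-1.9.0.img"

# The six regions tested above.
REGIONS = [
    "MT:1-100", "MT:1-500", "MT:1-1000",
    "MT:50-16299", "MT:40-16299", "MT:40-1800",
]


def region_prefix(region):
    """Turn 'MT:1-100' into a file-name-safe prefix such as 'MT_1_100'."""
    return region.replace(":", "_").replace("-", "_")


def build_command(region, out_prefix):
    """Return the argv list for one run_deepvariant call restricted to `region`."""
    return [
        "apptainer", "exec", "--no-home", IMAGE,
        "run_deepvariant",
        "--model_type", "WGS",
        "--ref", "GRCm39_masked.fa",
        "--reads", "MT_2.bam",
        "--output_vcf", f"{out_prefix}.vcf.gz",
        "--output_gvcf", f"{out_prefix}.gvcf.gz",
        "--sample_name", "MT01",
        "--num_shards", "4",
        "--regions", region,
        "--verbosity", "1",
    ]


def scan_regions(regions, run=subprocess.run):
    """Run each region and map it to True (exit status 0) or False.

    `run` is injectable so the loop can be exercised without Apptainer.
    """
    results = {}
    for region in regions:
        completed = run(build_command(region, region_prefix(region)))
        results[region] = completed.returncode == 0
    return results
```

Calling `scan_regions(REGIONS)` on a node with Apptainer available would run the six cases in sequence, writing each output to its own prefix so that results from different regions do not overwrite each other.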
There is clearly something about the region around MT:53-56 that the algorithm is struggling with, and we are hoping for some guidance on the underlying cause of this and how to account for such samples moving forward.
Thanks.