
Fatal Python error: Segmentation fault --- With log reporting: is [[0, 1]], which is invalid. #1001

@MikeWLloyd

I am working with @sam-widmayer on a GRCm39-mapped WGS sample. We are running the google-deepvariant-1.9.0 Docker image via Apptainer on our Slurm HPC. We have successfully run multiple samples, split by chromosome; however, one particular sample returns an error when we run chromosome MT with the following command:

apptainer exec --no-home google-deepvariant-1.9.0.img run_deepvariant --model_type WGS --ref GRCm39_masked.fa --reads MT_2.bam --output_vcf MT.vcf.gz --sample_name MT01 --num_shards 4 --regions MT --output_gvcf MT.gvcf.gz --verbosity 1

The error trace is as follows:

W0814 11:04:48.035857 23456246502016 postprocess_variants.py:624] Alt allele indices found from call_variants_outputs for variant reference_bases: "TG"
alternate_bases: "CG"
alternate_bases: "T"
calls {
  info {
    key: "AD"
    value {
      values {
        int_value: 2
      }
      values {
        int_value: 79
      }
      values {
        int_value: 91
      }
    }
  }
  info {
    key: "DP"
    value {
      values {
        int_value: 172
      }
    }
  }
  info {
    key: "MID"
    value {
      values {
        string_value: "small_model"
      }
    }
  }
  info {
    key: "VAF"
    value {
      values {
        number_value: 0.45930232558139533
      }
      values {
        number_value: 0.5290697674418605
      }
    }
  }
  genotype: -1
  genotype: -1
  call_set_name: "MT01"
}
end: 55
reference_name: "MT"
start: 53
 is [[0, 1]], which is invalid.
I0814 11:04:48.051477 23456246502016 postprocess_variants.py:1603] VCF and gVCF creation took 0.0002703746159871419 minutes.
I0814 11:04:48.052498 23456246502016 postprocess_variants.py:1603] VCF and gVCF creation took 0.0004706700642903646 minutes.
I0814 11:04:48.055323 23456246502016 postprocess_variants.py:1603] VCF and gVCF creation took 0.0004635254542032878 minutes.
Checking the headers of 4 files.
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
[E::naive_concat_check_headers] Failed to parse header: /tmp/tmpxpvturbb.gz

Fatal Python error: Segmentation fault

In attempting to debug this, we have zeroed in on a region around base position 53-56 that appears to have an insertion:

Image

We then restricted the target calling region and found that the command succeeds or fails under a set of conditions that does not make immediate sense to us:

  • --regions MT:1-100: Successful run with call:
    MT 53 . T TC 47.2 PASS . GT:GQ:DP:AD:VAF:MID:PL 1/1:47:946:12,931:0.984144:small_model:47,99,0

  • --regions MT:1-500: Successful run with call:
    MT 55 . G T 30.4 PASS . GT:GQ:DP:AD:VAF:MID:PL 1/1:23:155:0,149:0.96129:small_model:30,23,0

  • --regions MT:1-1000: Fails with segfault error above.

  • --regions MT:50-16299: Successful run with call:
    MT 54 . TG T 37.6 PASS . GT:GQ:DP:AD:VAF:MID:PL 1/1:37:145:2,143:0.986207:small_model:37,45,0

  • --regions MT:40-16299: Fails with segfault error above.

  • --regions MT:40-1800: Fails with segfault error above.
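
For anyone wanting to repeat the region tests listed above, here is a minimal sketch that generates one command line per region. The flags are copied from the command in this report; the per-region output file names are hypothetical and only there so the runs do not overwrite each other. It prints the commands rather than executing them.

```python
import shlex

# Flags copied from the command in this report. Only --regions and the
# output paths change between runs.
BASE = [
    "apptainer", "exec", "--no-home", "google-deepvariant-1.9.0.img",
    "run_deepvariant",
    "--model_type", "WGS",
    "--ref", "GRCm39_masked.fa",
    "--reads", "MT_2.bam",
    "--sample_name", "MT01",
    "--num_shards", "4",
    "--verbosity", "1",
]

# Regions tested in this report.
REGIONS = [
    "MT:1-100", "MT:1-500", "MT:1-1000",
    "MT:50-16299", "MT:40-16299", "MT:40-1800",
]


def build_command(region):
    """Return the argument list for one region-restricted run."""
    tag = region.replace(":", "_")  # hypothetical output naming scheme
    return BASE + [
        "--regions", region,
        "--output_vcf", f"{tag}.vcf.gz",
        "--output_gvcf", f"{tag}.gvcf.gz",
    ]


def render(region):
    """Return the command as a single shell-quoted string."""
    return shlex.join(build_command(region))


if __name__ == "__main__":
    for region in REGIONS:
        print(render(region))
```

Each printed line can be pasted into a Slurm job script as-is.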

There is clearly something about the region around MT:53-56 that the algorithm is struggling with, and we are hoping for some guidance on the underlying cause of this and how to account for such samples moving forward.
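
To make the coordinates easier to compare, the sketch below computes the reference span of each call from the successful runs and checks it against the candidate in the warning. It assumes the `start: 53` / `end: 55` fields in the log are 0-based and half-open, which would correspond to 1-based positions 54-55. That assumption is consistent with `reference_bases: "TG"` being two bases long and with the `MT 54 . TG T` call, but we have not confirmed it. This is only bookkeeping, not a diagnosis of the failure.

```python
def ref_span(pos, ref):
    """1-based inclusive interval covered by a VCF record's REF allele."""
    return pos, pos + len(ref) - 1


def overlaps(a, b):
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


# (POS, REF, ALT) from the successful runs reported above.
calls = {
    "MT:1-100": (53, "T", "TC"),
    "MT:1-500": (55, "G", "T"),
    "MT:50-16299": (54, "TG", "T"),
}

# Candidate from the warning: start: 53, end: 55, reference_bases "TG".
# Assumption: 0-based half-open coordinates, so the 1-based span is 54-55.
failing_span = (53 + 1, 55)

if __name__ == "__main__":
    for region, (pos, ref, alt) in calls.items():
        span = ref_span(pos, ref)
        print(region, f"{ref}>{alt}", span, "overlaps candidate:",
              overlaps(span, failing_span))
```

Under that assumption, the `MT:1-500` and `MT:50-16299` calls fall inside the candidate's span, while the `MT:1-100` insertion is anchored on the base immediately before it.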

Thanks.
