gnomAD v4.1 SV database contains a mix of SVs and CNVs - svdb problem #615

@fa2k

Description of the bug

The gnomAD SV v4.1 release (https://gnomad.broadinstitute.org/news/2023-11-v4-structural-variants/) contains some CNV records that have no AC or AF values in the INFO field of the sites VCF (gnomad.v4.1.sv.sites.vcf.gz).

svdb --query refuses to annotate the VCF when the INFO tags passed to --in_occ or --in_frq (here AC and AF) are missing for some variants in the database. It then produces an output VCF that contains no variants at all.
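To see which database records trigger this error before running svdb, a small pre-flight check can tally the records that lack a usable AC or AF value. The sketch below is plain awk, not part of svdb. The function name `count_missing_freq` is hypothetical; it assumes each record carries an SVTYPE INFO tag (records without one are counted as NA) and treats an empty or `.` value as missing, which may be stricter than svdb's own check.

```shell
# Sketch: count records without a usable AC or AF INFO value, grouped by SVTYPE.
# Reads an uncompressed VCF on stdin; prints "<SVTYPE><TAB><count>" lines, sorted.
count_missing_freq() {
  awk -F '\t' '
    /^#/ { next }                          # skip header lines
    {
      has_ac = 0; has_af = 0; svtype = "NA"
      n = split($8, fields, ";")           # INFO is the 8th column
      for (i = 1; i <= n; i++) {
        eq = index(fields[i], "=")
        if (eq == 0) continue              # flag-type entries carry no value
        key = substr(fields[i], 1, eq - 1)
        val = substr(fields[i], eq + 1)
        if (key == "AC" && val != "" && val != ".") has_ac = 1
        if (key == "AF" && val != "" && val != ".") has_af = 1
        if (key == "SVTYPE") svtype = val
      }
      if (!(has_ac && has_af)) missing[svtype]++
    }
    END { for (t in missing) print t "\t" missing[t] }
  ' | sort
}
```

For example, `gzip -dc gnomad.v4.1.sv.sites.vcf.gz | count_missing_freq` lists, per SVTYPE, how many records would be expected to trigger the error above.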

It works if I remove the lines without AC/AF from the gnomAD file, but that discards this part of the gnomAD data. Would it make sense to integrate the CNV information into the SV annotation instead?
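The workaround above can be done without throwing the CNV records away: split the database into one file whose records all carry AC and AF (usable with `--in_occ AC --in_frq AF`) and a second file holding everything else. This is a hedged awk sketch, not an svdb feature. The function name `split_by_freq_tags` and the output file names are hypothetical, and whether svdb can make use of the set-aside records is not addressed here.

```shell
# Sketch: split an uncompressed VCF (stdin) by presence of usable AC and AF values.
#   stdout: header + records that have both AC and AF
#   $1:     header + records lacking AC or AF (e.g. CNVs), kept instead of discarded
split_by_freq_tags() {
  local no_freq_out="$1"
  awk -F '\t' -v aside="$no_freq_out" '
    /^#/ { print; print > aside; next }    # copy the header to both outputs
    {
      has_ac = 0; has_af = 0
      n = split($8, fields, ";")           # INFO is the 8th column
      for (i = 1; i <= n; i++) {
        eq = index(fields[i], "=")
        if (eq == 0) continue              # flag-type entries carry no value
        key = substr(fields[i], 1, eq - 1)
        val = substr(fields[i], eq + 1)
        if (key == "AC" && val != "" && val != ".") has_ac = 1
        if (key == "AF" && val != "" && val != ".") has_af = 1
      }
      if (has_ac && has_af) print
      else print > aside
    }
  '
}

# Example (file names are placeholders):
#   gzip -dc gnomad.v4.1.sv.sites.vcf.gz \
#     | split_by_freq_tags gnomad.v4.1.sv.no_AC_AF.vcf \
#     | bgzip > gnomad.v4.1.sv.with_AC_AF.vcf.gz
```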

Command used and terminal output

NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_STRUCTURAL_VARIANTS:SVDB_QUERY_DB command.sh:

#!/bin/bash -euo pipefail
svdb \
    --merge \
    --pass_only --same_order \
    --priority tiddit,manta,cnvnator \
    --vcf  NA12878_tiddit.vcf.gz:tiddit NA12878_manta.diploid_sv.vcf.gz:manta NA12878_cnvnator.vcf.gz:cnvnator \
    > NA12878_sv.vcf
bgzip NA12878_sv.vcf

cat <<-END_VERSIONS > versions.yml
"NFCORE_RAREDISEASE:RAREDISEASE:CALL_STRUCTURAL_VARIANTS:SVDB_MERGE":
    svdb: $( echo $(svdb) | head -1 | sed 's/usage: SVDB-\([0-9]\.[0-9]\.[0-9]\).*/\1/' )
    samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
END_VERSIONS

--------


Output (command.log): 

INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Error: frequency or hit tag not found! Make sure to set the --in_occ AND --in_frq to the number and frequency of alleles/individuals as presented in the INFO column of the input db

database variants not having the --in_occ or --in_frq tag must be removed
you may also skip these parameters and cluster based on the GT entry of the format column (if such exists)

Relevant files

No response

System information

No response

Metadata

Labels: bug (Something isn't working)