Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CNVkit can't read its own target.bed during batch command #750

Closed
bozbezbozzel opened this issue Jul 29, 2022 · 4 comments
Closed

CNVkit can't read its own target.bed during batch command #750

bozbezbozzel opened this issue Jul 29, 2022 · 4 comments

Comments

@bozbezbozzel
Copy link

bozbezbozzel commented Jul 29, 2022

CNVkit is able to read the .bed file with targets that I give it, it makes a target.bed, it processes my normal samples which appears to go well.
Then I get the following error:

  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 199, in bedcov
    raw = pysam.bedcov(*cmd, split_lines=False)
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/pysam/utils.py", line 69, in __call__
    raise SamtoolsError(
pysam.utils.SamtoolsError: "samtools returned with error 2: stdout=, stderr=Errors in BED line 'chr1\t14620\t14712\t-'

followed by "Errors in BED line" for every single line in the file,
followed by


Traceback (most recent call last):
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/batch.py", line 157, in batch_write_coverage
    cnarr = coverage.do_coverage(bed_fname, bam_fname, by_count, 0, processes, fasta)
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 27, in do_coverage
    cnarr = interval_coverages(bed_fname, bam_fname, by_count, min_mapq,
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 57, in interval_coverages
    table = interval_coverages_pileup(bed_fname, bam_fname, min_mapq,
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 154, in interval_coverages_pileup
    table = bedcov(bed_fname, bam_fname, min_mapq, fasta)
  File "/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/.snakemake/conda/cc4932bedf8a5d696834bf7bea1f4764/lib/python3.9/site-packages/cnvlib/coverage.py", line 201, in bedcov
    raise ValueError("Failed processing %r coverages in %r regions. "
ValueError: Failed processing '/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/output/mapped/PD52898b_merged_hg38_clean_dedup_recal.bam' coverages in '/DATA/a.vliet/BMS/Melanoma_patients/DNA/Set2/output/CNVkit/results_normal/targets_exome_agilent_v5_chromnum.target.bed' regions. PySAM error: "samtools returned with error 2: stdout=, stderr=Errors in BED line 'chr1\t14620\t14712\t-'\n

with again every.single.line.

I can run samtools bedcov just fine on my original bed file. CNVkit can read, process, and write out this target bed. So what's throwing me for a loop is that CNVkit cannot adequately process the file it wrote out itself.

What I've done:

  • used bed from a different source (USCS genomebrowser vs AstraZeneca github)
  • changed chromosome names in the bed file to match the chromosome names in my bams (removed "chr")
  • checked if the bed was purely tab-separated with cat -t
1^I65509^I65625^I.
1^I65831^I65973^I.
1^I69481^I69600^IOR4F5
1^I786001^I786139^IRP11-206L10.9
1^I786150^I786426^IRP11-206L10.9
1^I786471^I786562^IRP11-206L10.9
1^I817536^I817655^IFAM87B
1^I826715^I826895^ILINC00115
1^I826900^I827034^ILINC00115
1^I827040^I827185^ILINC00115
  • verified that my file had no carriage returns:
(snakemake) a.vliet@pasteur:/DATA/a.vliet/reference_files/cnvkit_beds$ cat -e targets_exome_agilent_v5_chromnum.bed | head
1	65509	65625	.$
1	65831	65973	.$
1	69481	69600	OR4F5$
1	786001	786139	RP11-206L10.9$
1	786150	786426	RP11-206L10.9$
1	786471	786562	RP11-206L10.9$
1	817536	817655	FAM87B$
1	826715	826895	LINC00115$
1	826900	827034	LINC00115$
1	827040	827185	LINC00115$

I'm out of ideas. I have no idea what's causing this-- surely if my original bed had something wrong with it, CNVkit would fail on the first step of trying to write out the target.bed file?

I'm running CNVkit 0.9.9 in python 3.9 with snakemake 7.8.5. pysam 0.19.1 and samtools 1.10.

@bozbezbozzel
Copy link
Author

will try this suggestion: #678 and update

@bozbezbozzel
Copy link
Author

My reference genome uses 1, 2, 3 etc... for the chromosomes and so do my sample bams:

less Homo_sapiens.GRCh38.dna.primary_assembly.fa | grep ">"
>1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF
>10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF
>11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF
@HD     VN:1.6  GO:none SO:coordinate
@SQ     SN:1    LN:248956422    M5:2648ae1bacce4ec4b6cf337dcae37816  
@SQ     SN:10   LN:133797422    M5:907112d17fcb73bcab1ed1c72b97ce68  
@SQ     SN:11   LN:135086622    M5:1511375dc2dd1b633af8cf439ae90cec   

I used the exact same reference genome file for every step and I'm coming from unmapped BAMs.

@bozbezbozzel
Copy link
Author

Solved-- had to remove contigs from my antitarget and target input files.

@JiayangZhou
Copy link

got the same error, and my reference genome uses chr1, chr2..would you suggest I replace all with 1, 2, 3, and try again?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants