-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error in WGBS workflow (invalid 'times' argument) #898
Comments
Hi @katsikora, Thank you for your response. Also, I was wondering it is normal or not, but the mapping rate in bisulfite sequence analysis is incredibly poor, such like 0.8% to 1%. Thank you. |
Hi Kaznt, thanks for checking on our sample data. Best wishes, Katarzyna |
Hi @katsikora, I confirmed the species and re-run with GRCm38 dataset I downloaded from the site here (https://zenodo.org/record/4468065). ff817f24b945f9bd7cd5558b213b8679 SRX202087_R1.fastq.gz 4e004c5032ade9aa37cc795890051f7c genome.fa (GRCm38) Command I issue is below. Best regards, |
Additionally, when I mapped manually by using bwameth with SRX202087 fastq files above, map rate is >99% indicating that the dataset is fine. Best, |
I've checked bwameth log. Seems reffering c2t fa though... Found BWA MEM index
|
Hi Kaznt, thanks for looking into the details. Indeed it looks like a potential issue with bwameth. Either it's the parsing of chromosome names by bwameth (as samtools is complaining about empty strings in chromosome names), or indeed the reads were unmapped to start with. Best wishes, Katarzyna |
Hi katsikora, Sorry for a bit delay for my response. <As I shown below, I use independently prepared bwameth via conda> ln -s ../input/* . ### It is preparing input fastq files as to use same files of input for snakepipes 533342 + 16516 in total (QC-passed reads + QC-failed reads) Best regards, |
Hi Kaznt, thanks for providing your command, it all looks good. I rerun WGBS with the dataset you pointed to (https://www.ncbi.nlm.nih.gov/sra/?term=SRR17642783 ). I aligned to the human genome and obtained a mapping rate of Independently of that, I could reproduce your original error with the produce report rule:
I suspect it's because I only submitted one sample and then the PCA plot (understandably) failed. I'll add some code that will skip PCA when there is only one sample, but the other QC metrics will be returned. I hope this helps, Best wishes, Katarzyna |
Thank you for your help. I'll keep this topic open until when you check the zenodo reference. Best, |
Hi Kaznt, I get
when mapping In that sense, I cannot reproduce your observation related to mapping rates <1%. Best wishes, Katarzyna |
I'll close the issue for now, please feel free to reopen if needed. |
Hi team,
I ran WGBS workflow with publically available WGBS dataset and met an error.
I checked the issuses but couldn't find same error I met.
Below, I attached the logs.
Thank you for your help.
/home/user/miniconda3/envs/snakePipes/bin/WGBS -i input -o output GRCh38_gencode_release29 --local -j 30
TMPDIR=/tmp PYTHONNOUSERSITE=True /home/user/miniconda3/envs/snakePipes/bin/snakemake --use-conda --conda-prefix /home/user/miniconda3/envs --latency-wait 300 --snakefile /home/user/miniconda3/envs/snakePipes/lib/python3.11/site-packages/snakePipes/workflows/WGBS/Snakefile --jobs 30 --directory /home/user/230523bisulfite_test/output --configfile /home/user/230523bisulfite_test/output/WGBS.config.yaml --keep-going
/home/user/miniconda3/envs/snakePipes/lib/python3.11/site-packages/google/protobuf/internal/api_implementation.py:110: UserWarning: Selected implementation cpp is not available.
warnings.warn(
/home/user/miniconda3/envs/snakePipes/lib/python3.11/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
---- This analysis has been done using snakePipes version 2.7.3 ----
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /bin/bash
Provided cores: 30
Rules claiming more threads will be scaled down.
Job stats:
job count min threads max threads
DepthOfCov 1 20 20
DepthOfCovGenome 1 20 20
FASTQ1 1 1 1
FASTQ2 1 1 1
all 1 1 1
bedGraphToBigWig 1 1 1
bwameth 1 20 20
calcCHHbias 1 10 10
calc_Mbias 1 10 10
conversionRate 1 1 1
getRandomCpGs 1 1 1
get_flagstat 1 1 1
indexMarkDupes 1 1 1
index_bam 1 1 1
link_deduped_bam 1 1 1
markDupes 1 10 10
methyl_extract 1 10 10
multiQC 1 1 1
origFASTQ1 1 1 1
origFASTQ2 1 1 1
produceReport 1 1 1
total 21 1 20
Select jobs to execute...
[Wed May 24 19:17:31 2023]
rule getRandomCpGs:
output: QC_metrics/randomCpG.bed
jobid: 12
reason: Missing output files: QC_metrics/randomCpG.bed
resources: tmpdir=/tmp
[Wed May 24 19:17:31 2023]
rule origFASTQ1:
input: /home/user/230523bisulfite_test/input/SRR17642783_R1.fastq.gz
output: originalFASTQ/SRR17642783_R1.fastq.gz
jobid: 5
reason: Missing output files: originalFASTQ/SRR17642783_R1.fastq.gz
wildcards: sample=SRR17642783
resources: tmpdir=/tmp
[Wed May 24 19:17:31 2023]
rule origFASTQ2:
input: /home/user/230523bisulfite_test/input/SRR17642783_R2.fastq.gz
output: originalFASTQ/SRR17642783_R2.fastq.gz
jobid: 7
reason: Missing output files: originalFASTQ/SRR17642783_R2.fastq.gz
wildcards: sample=SRR17642783
resources: tmpdir=/tmp
[Wed May 24 19:17:31 2023]
Finished job 5.
1 of 21 steps (5%) done
Select jobs to execute...
[Wed May 24 19:17:31 2023]
rule FASTQ1:
input: originalFASTQ/SRR17642783_R1.fastq.gz
output: FASTQ/SRR17642783_R1.fastq.gz
jobid: 4
reason: Missing output files: FASTQ/SRR17642783_R1.fastq.gz; Input files updated by another job: originalFASTQ/SRR17642783_R1.fastq.gz
wildcards: sample=SRR17642783
resources: tmpdir=/tmp
[Wed May 24 19:17:31 2023]
Finished job 7.
2 of 21 steps (10%) done
Select jobs to execute...
[Wed May 24 19:17:31 2023]
rule FASTQ2:
input: originalFASTQ/SRR17642783_R2.fastq.gz
output: FASTQ/SRR17642783_R2.fastq.gz
jobid: 6
reason: Missing output files: FASTQ/SRR17642783_R2.fastq.gz; Input files updated by another job: originalFASTQ/SRR17642783_R2.fastq.gz
wildcards: sample=SRR17642783
resources: tmpdir=/tmp
[Wed May 24 19:17:31 2023]
Finished job 4.
3 of 21 steps (14%) done
[Wed May 24 19:17:31 2023]
Finished job 6.
4 of 21 steps (19%) done
Select jobs to execute...
[Wed May 24 19:17:31 2023]
rule bwameth:
input: FASTQ/SRR17642783_R1.fastq.gz, FASTQ/SRR17642783_R2.fastq.gz
output: bwameth/SRR17642783.bam
log: bwameth/logs/SRR17642783.map_reads.err, bwameth/logs/SRR17642783.map_reads.out
jobid: 3
reason: Missing output files: bwameth/SRR17642783.bam; Input files updated by another job: FASTQ/SRR17642783_R1.fastq.gz, FASTQ/SRR17642783_R2.fastq.gz
wildcards: sample=SRR17642783
threads: 20
resources: tmpdir=/tmp
Activating conda environment: ../../miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d
/home/user/miniconda3/envs/snakePipes/lib/python3.11/site-packages/google/protobuf/internal/api_implementation.py:110: UserWarning: Selected implementation cpp is not available.
warnings.warn(
/home/user/miniconda3/envs/snakePipes/lib/python3.11/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
---- This analysis has been done using snakePipes version 2.7.3 ----
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 30
Rules claiming more threads will be scaled down.
Select jobs to execute...
[Wed May 24 19:21:47 2023]
Finished job 12.
5 of 21 steps (24%) done
[Fri May 26 05:46:31 2023]
Finished job 3.
6 of 21 steps (29%) done
Select jobs to execute...
[Fri May 26 05:46:31 2023]
rule index_bam:
input: bwameth/SRR17642783.bam
output: bwameth/SRR17642783.bam.bai
log: bwameth/logs/SRR17642783.index_bam.err, bwameth/logs/SRR17642783.index_bam.out
jobid: 8
reason: Missing output files: bwameth/SRR17642783.bam.bai; Input files updated by another job: bwameth/SRR17642783.bam
wildcards: sample=SRR17642783
resources: tmpdir=/tmp
Activating conda environment: ../../miniconda3/envs/081ae7e483778bba002ffd0e803c7dac
[Fri May 26 06:00:18 2023]
Finished job 8.
7 of 21 steps (33%) done
Select jobs to execute...
[Fri May 26 06:00:18 2023]
rule markDupes:
input: bwameth/SRR17642783.bam, bwameth/SRR17642783.bam.bai
output: Sambamba/SRR17642783.markdup.bam
log: Sambamba/logs/SRR17642783.rm_dupes.err, Sambamba/logs/SRR17642783.rm_dupes.out
jobid: 2
reason: Missing output files: Sambamba/SRR17642783.markdup.bam; Input files updated by another job: bwameth/SRR17642783.bam, bwameth/SRR17642783.bam.bai
wildcards: sample=SRR17642783
threads: 10
resources: tmpdir=/tmp
Activating conda environment: ../../miniconda3/envs/a2a5d9f83fe05e456d4c23bc00993556
[Fri May 26 10:03:51 2023]
Finished job 2.
8 of 21 steps (38%) done
Removing temporary output bwameth/SRR17642783.bam.
Removing temporary output bwameth/SRR17642783.bam.bai.
Select jobs to execute...
[Fri May 26 10:04:05 2023]
rule indexMarkDupes:
input: Sambamba/SRR17642783.markdup.bam
output: Sambamba/SRR17642783.markdup.bam.bai
log: Sambamba/logs/SRR17642783.indexMarkDupes.err, Sambamba/logs/SRR17642783.indexMarkDupes.out
jobid: 9
reason: Missing output files: Sambamba/SRR17642783.markdup.bam.bai; Input files updated by another job: Sambamba/SRR17642783.markdup.bam
wildcards: sample=SRR17642783
resources: tmpdir=/tmp
Warning: the following output files of rule indexMarkDupes were not present when the DAG was created:
{'Sambamba/SRR17642783.markdup.bam.bai'}
Activating conda environment: ../../miniconda3/envs/081ae7e483778bba002ffd0e803c7dac
[Fri May 26 10:22:17 2023]
Finished job 9.
9 of 21 steps (43%) done
Select jobs to execute...
[Fri May 26 10:22:17 2023]
rule link_deduped_bam:
input: Sambamba/SRR17642783.markdup.bam, Sambamba/SRR17642783.markdup.bam.bai
output: filtered_bam/SRR17642783.filtered.bam, filtered_bam/SRR17642783.filtered.bam.bai
jobid: 1
reason: Missing output files: filtered_bam/SRR17642783.filtered.bam.bai, filtered_bam/SRR17642783.filtered.bam; Input files updated by another job: Sambamba/SRR17642783.markdup.bam, Sambamba/SRR17642783.markdup.bam.bai
wildcards: sample=SRR17642783
resources: tmpdir=/tmp
[Fri May 26 10:22:17 2023]
Finished job 1.
10 of 21 steps (48%) done
Select jobs to execute...
[Fri May 26 10:22:18 2023]
rule calc_Mbias:
input: filtered_bam/SRR17642783.filtered.bam, filtered_bam/SRR17642783.filtered.bam.bai
output: QC_metrics/SRR17642783.Mbias.txt
log: QC_metrics/logs/SRR17642783.calc_Mbias.out
jobid: 15
reason: Missing output files: QC_metrics/SRR17642783.Mbias.txt; Input files updated by another job: filtered_bam/SRR17642783.filtered.bam.bai, filtered_bam/SRR17642783.filtered.bam
wildcards: sample=SRR17642783
threads: 10
resources: tmpdir=/tmp
Activating conda environment: ../../miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d
[Fri May 26 10:22:18 2023]
rule DepthOfCov:
input: filtered_bam/SRR17642783.filtered.bam, filtered_bam/SRR17642783.filtered.bam.bai, QC_metrics/randomCpG.bed
output: QC_metrics/CpGCoverage.txt, QC_metrics/CpGCoverage.png, QC_metrics/CpGCoverage.coverageMetrics.txt
log: QC_metrics/logs/DepthOfCov.err
jobid: 11
reason: Missing output files: QC_metrics/CpGCoverage.coverageMetrics.txt, QC_metrics/CpGCoverage.txt, QC_metrics/CpGCoverage.png; Input files updated by another job: filtered_bam/SRR17642783.filtered.bam.bai, QC_metrics/randomCpG.bed, filtered_bam/SRR17642783.filtered.bam
threads: 20
resources: tmpdir=/tmp
Activating conda environment: ../../miniconda3/envs/081ae7e483778bba002ffd0e803c7dac
[Fri May 26 10:27:33 2023]
Finished job 15.
11 of 21 steps (52%) done
Select jobs to execute...
[Fri May 26 10:27:33 2023]
rule calcCHHbias:
input: filtered_bam/SRR17642783.filtered.bam, filtered_bam/SRR17642783.filtered.bam.bai
output: QC_metrics/SRR17642783.CHH.Mbias.txt
log: QC_metrics/logs/SRR17642783.calcCHHbias.err
jobid: 17
reason: Missing output files: QC_metrics/SRR17642783.CHH.Mbias.txt; Input files updated by another job: filtered_bam/SRR17642783.filtered.bam.bai, filtered_bam/SRR17642783.filtered.bam
wildcards: sample=SRR17642783
threads: 10
resources: tmpdir=/tmp
Activating conda environment: ../../miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d
[Fri May 26 10:29:01 2023]
Finished job 11.
12 of 21 steps (57%) done
Removing temporary output QC_metrics/randomCpG.bed.
Select jobs to execute...
[Fri May 26 10:29:01 2023]
rule DepthOfCovGenome:
input: filtered_bam/SRR17642783.filtered.bam, filtered_bam/SRR17642783.filtered.bam.bai
output: QC_metrics/genomeCoverage.txt, QC_metrics/genomeCoverage.png, QC_metrics/genomeCoverage.coverageMetrics.txt
log: QC_metrics/logs/DepthOfCovGenome.err
jobid: 10
reason: Missing output files: QC_metrics/genomeCoverage.coverageMetrics.txt, QC_metrics/genomeCoverage.png, QC_metrics/genomeCoverage.txt; Input files updated by another job: filtered_bam/SRR17642783.filtered.bam.bai, filtered_bam/SRR17642783.filtered.bam
threads: 20
resources: tmpdir=/tmp
Activating conda environment: ../../miniconda3/envs/081ae7e483778bba002ffd0e803c7dac
[Fri May 26 10:32:40 2023]
Finished job 10.
13 of 21 steps (62%) done
Select jobs to execute...
[Fri May 26 10:32:40 2023]
rule get_flagstat:
input: filtered_bam/SRR17642783.filtered.bam
output: QC_metrics/SRR17642783.flagstat
log: QC_metrics/logs/SRR17642783.get_flagstat.err
jobid: 18
reason: Missing output files: QC_metrics/SRR17642783.flagstat; Input files updated by another job: filtered_bam/SRR17642783.filtered.bam
wildcards: sample=SRR17642783
resources: tmpdir=/tmp
Activating conda environment: ../../miniconda3/envs/081ae7e483778bba002ffd0e803c7dac
[Fri May 26 10:32:41 2023]
rule methyl_extract:
input: filtered_bam/SRR17642783.filtered.bam, filtered_bam/SRR17642783.filtered.bam.bai, QC_metrics/SRR17642783.Mbias.txt
output: MethylDackel/SRR17642783_CpG.bedGraph
log: MethylDackel/logs/SRR17642783.methyl_extract.err, MethylDackel/logs/SRR17642783.methyl_extract.out
jobid: 14
reason: Missing output files: MethylDackel/SRR17642783_CpG.bedGraph; Input files updated by another job: filtered_bam/SRR17642783.filtered.bam.bai, QC_metrics/SRR17642783.Mbias.txt, filtered_bam/SRR17642783.filtered.bam
wildcards: sample=SRR17642783
threads: 10
resources: tmpdir=/tmp
Activating conda environment: ../../miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d
[Fri May 26 10:39:04 2023]
Finished job 14.
14 of 21 steps (67%) done
Select jobs to execute...
[Fri May 26 10:39:04 2023]
rule bedGraphToBigWig:
input: MethylDackel/SRR17642783_CpG.bedGraph, /home/user/genome/GRCh38_gencode_release29/genome_fasta/genome.fa.fai
output: MethylDackel/SRR17642783_CpG.methylation.bw, MethylDackel/SRR17642783_CpG.coverage.bw
log: MethylDackel/logs/SRR17642783_bedGraphToBigWig.stderr
jobid: 20
reason: Missing output files: MethylDackel/SRR17642783_CpG.methylation.bw, MethylDackel/SRR17642783_CpG.coverage.bw; Input files updated by another job: MethylDackel/SRR17642783_CpG.bedGraph
wildcards: sample=SRR17642783
resources: tmpdir=/tmp
Activating conda environment: ../../miniconda3/envs/081ae7e483778bba002ffd0e803c7dac
[Fri May 26 10:39:43 2023]
Finished job 17.
15 of 21 steps (71%) done
Select jobs to execute...
[Fri May 26 10:39:43 2023]
rule conversionRate:
input: QC_metrics/SRR17642783.CHH.Mbias.txt
output: QC_metrics/SRR17642783.conv.rate.txt
log: QC_metrics/logs/SRR17642783.conversionRate.log
jobid: 16
reason: Missing output files: QC_metrics/SRR17642783.conv.rate.txt; Input files updated by another job: QC_metrics/SRR17642783.CHH.Mbias.txt
wildcards: sample=SRR17642783
resources: tmpdir=/tmp
[Fri May 26 10:39:43 2023]
Finished job 16.
16 of 21 steps (76%) done
Removing temporary output QC_metrics/SRR17642783.CHH.Mbias.txt.
[Fri May 26 10:43:20 2023]
Finished job 20.
17 of 21 steps (81%) done
[Fri May 26 10:45:34 2023]
Finished job 18.
18 of 21 steps (86%) done
Select jobs to execute...
[Fri May 26 10:45:34 2023]
rule multiQC:
input: QC_metrics/SRR17642783.flagstat
output: multiQC/multiqc_report.html
log: multiQC/multiQC.out, multiQC/multiQC.err
jobid: 19
reason: Missing output files: multiQC/multiqc_report.html; Input files updated by another job: QC_metrics/SRR17642783.flagstat
resources: tmpdir=/tmp
Activating conda environment: ../../miniconda3/envs/081ae7e483778bba002ffd0e803c7dac
[Fri May 26 10:45:34 2023]
rule produceReport:
input: MethylDackel/SRR17642783_CpG.bedGraph, QC_metrics/genomeCoverage.txt, QC_metrics/genomeCoverage.png, QC_metrics/genomeCoverage.coverageMetrics.txt, QC_metrics/CpGCoverage.txt, QC_metrics/CpGCoverage.png, QC_metrics/CpGCoverage.coverageMetrics.txt, QC_metrics/SRR17642783.conv.rate.txt, QC_metrics/SRR17642783.Mbias.txt, QC_metrics/SRR17642783.flagstat
output: QC_metrics/QC_report.html
jobid: 13
reason: Missing output files: QC_metrics/QC_report.html; Input files updated by another job: QC_metrics/genomeCoverage.png, QC_metrics/CpGCoverage.coverageMetrics.txt, QC_metrics/genomeCoverage.coverageMetrics.txt, QC_metrics/CpGCoverage.txt, QC_metrics/SRR17642783.conv.rate.txt, QC_metrics/SRR17642783.Mbias.txt, QC_metrics/CpGCoverage.png, MethylDackel/SRR17642783_CpG.bedGraph, QC_metrics/SRR17642783.flagstat, QC_metrics/genomeCoverage.txt
resources: tmpdir=/tmp
Activating conda environment: ../../miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d
[Fri May 26 10:45:42 2023]
Finished job 19.
19 of 21 steps (90%) done
Quitting from lines 197-212 (tmpwvy1gkwq.WGBS_QC_report_template.Rmd)
Error in rep(1, nrow(V)) : invalid 'times' argument
Calls: ... eval -> PCA -> svd.triplet -> as.vector -> crossprod
Execution halted
[Fri May 26 10:47:49 2023]
Error in rule produceReport:
jobid: 13
input: MethylDackel/SRR17642783_CpG.bedGraph, QC_metrics/genomeCoverage.txt, QC_metrics/genomeCoverage.png, QC_metrics/genomeCoverage.coverageMetrics.txt, QC_metrics/CpGCoverage.txt, QC_metrics/CpGCoverage.png, QC_metrics/CpGCoverage.coverageMetrics.txt, QC_metrics/SRR17642783.conv.rate.txt, QC_metrics/SRR17642783.Mbias.txt, QC_metrics/SRR17642783.flagstat
output: QC_metrics/QC_report.html
conda-env: /home/user/miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d
RuleException:
CalledProcessError in line 302 of /home/user/miniconda3/envs/snakePipes/lib/python3.11/site-packages/snakePipes/shared/rules/WGBS.snakefile:
Command 'source /home/user/miniconda3/bin/activate '/home/user/miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d'; set -euo pipefail; Rscript --vanilla -e 'rmarkdown::render("/home/user/230523bisulfite_test/output/.snakemake/scripts/tmpwvy1gkwq.WGBS_QC_report_template.Rmd", output_file="/home/user/230523bisulfite_test/output/QC_metrics/QC_report.html", quiet=TRUE, knit_root_dir = "/home/user/230523bisulfite_test/output", params = list(rmd="/home/user/230523bisulfite_test/output/.snakemake/scripts/tmpwvy1gkwq.WGBS_QC_report_template.Rmd"))'' returned non-zero exit status 1.
File "/home/user/miniconda3/envs/snakePipes/lib/python3.11/site-packages/snakePipes/shared/rules/WGBS.snakefile", line 302, in __rule_produceReport
File "/home/user/miniconda3/envs/snakePipes/lib/python3.11/concurrent/futures/thread.py", line 58, in run
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-05-24T191728.932387.snakemake.log
!!! ERROR in WGBS workflow! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
The text was updated successfully, but these errors were encountered: