Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in WGBS workflow (invalid 'times' argument) #898

Closed
kaznt opened this issue May 29, 2023 · 12 comments · Fixed by #944
Closed

Error in WGBS workflow (invalid 'times' argument) #898

kaznt opened this issue May 29, 2023 · 12 comments · Fixed by #944

Comments

@kaznt
Copy link

kaznt commented May 29, 2023

Hi team,

I ran WGBS workflow with publically available WGBS dataset and met an error.
I checked the issuses but couldn't find same error I met.
Below, I attached the logs.

Thank you for your help.

/home/user/miniconda3/envs/snakePipes/bin/WGBS -i input -o output GRCh38_gencode_release29 --local -j 30

TMPDIR=/tmp PYTHONNOUSERSITE=True /home/user/miniconda3/envs/snakePipes/bin/snakemake --use-conda --conda-prefix /home/user/miniconda3/envs --latency-wait 300 --snakefile /home/user/miniconda3/envs/snakePipes/lib/python3.11/site-packages/snakePipes/workflows/WGBS/Snakefile --jobs 30 --directory /home/user/230523bisulfite_test/output --configfile /home/user/230523bisulfite_test/output/WGBS.config.yaml --keep-going

/home/user/miniconda3/envs/snakePipes/lib/python3.11/site-packages/google/protobuf/internal/api_implementation.py:110: UserWarning: Selected implementation cpp is not available.
warnings.warn(
/home/user/miniconda3/envs/snakePipes/lib/python3.11/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')

---- This analysis has been done using snakePipes version 2.7.3 ----
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /bin/bash
Provided cores: 30
Rules claiming more threads will be scaled down.
Job stats:
job count min threads max threads


DepthOfCov 1 20 20
DepthOfCovGenome 1 20 20
FASTQ1 1 1 1
FASTQ2 1 1 1
all 1 1 1
bedGraphToBigWig 1 1 1
bwameth 1 20 20
calcCHHbias 1 10 10
calc_Mbias 1 10 10
conversionRate 1 1 1
getRandomCpGs 1 1 1
get_flagstat 1 1 1
indexMarkDupes 1 1 1
index_bam 1 1 1
link_deduped_bam 1 1 1
markDupes 1 10 10
methyl_extract 1 10 10
multiQC 1 1 1
origFASTQ1 1 1 1
origFASTQ2 1 1 1
produceReport 1 1 1
total 21 1 20

Select jobs to execute...

[Wed May 24 19:17:31 2023]
rule getRandomCpGs:
output: QC_metrics/randomCpG.bed
jobid: 12
reason: Missing output files: QC_metrics/randomCpG.bed
resources: tmpdir=/tmp

[Wed May 24 19:17:31 2023]
rule origFASTQ1:
input: /home/user/230523bisulfite_test/input/SRR17642783_R1.fastq.gz
output: originalFASTQ/SRR17642783_R1.fastq.gz
jobid: 5
reason: Missing output files: originalFASTQ/SRR17642783_R1.fastq.gz
wildcards: sample=SRR17642783
resources: tmpdir=/tmp

[Wed May 24 19:17:31 2023]
rule origFASTQ2:
input: /home/user/230523bisulfite_test/input/SRR17642783_R2.fastq.gz
output: originalFASTQ/SRR17642783_R2.fastq.gz
jobid: 7
reason: Missing output files: originalFASTQ/SRR17642783_R2.fastq.gz
wildcards: sample=SRR17642783
resources: tmpdir=/tmp

[Wed May 24 19:17:31 2023]
Finished job 5.
1 of 21 steps (5%) done
Select jobs to execute...

[Wed May 24 19:17:31 2023]
rule FASTQ1:
input: originalFASTQ/SRR17642783_R1.fastq.gz
output: FASTQ/SRR17642783_R1.fastq.gz
jobid: 4
reason: Missing output files: FASTQ/SRR17642783_R1.fastq.gz; Input files updated by another job: originalFASTQ/SRR17642783_R1.fastq.gz
wildcards: sample=SRR17642783
resources: tmpdir=/tmp

[Wed May 24 19:17:31 2023]
Finished job 7.
2 of 21 steps (10%) done
Select jobs to execute...

[Wed May 24 19:17:31 2023]
rule FASTQ2:
input: originalFASTQ/SRR17642783_R2.fastq.gz
output: FASTQ/SRR17642783_R2.fastq.gz
jobid: 6
reason: Missing output files: FASTQ/SRR17642783_R2.fastq.gz; Input files updated by another job: originalFASTQ/SRR17642783_R2.fastq.gz
wildcards: sample=SRR17642783
resources: tmpdir=/tmp

[Wed May 24 19:17:31 2023]
Finished job 4.
3 of 21 steps (14%) done
[Wed May 24 19:17:31 2023]
Finished job 6.
4 of 21 steps (19%) done
Select jobs to execute...

[Wed May 24 19:17:31 2023]
rule bwameth:
input: FASTQ/SRR17642783_R1.fastq.gz, FASTQ/SRR17642783_R2.fastq.gz
output: bwameth/SRR17642783.bam
log: bwameth/logs/SRR17642783.map_reads.err, bwameth/logs/SRR17642783.map_reads.out
jobid: 3
reason: Missing output files: bwameth/SRR17642783.bam; Input files updated by another job: FASTQ/SRR17642783_R1.fastq.gz, FASTQ/SRR17642783_R2.fastq.gz
wildcards: sample=SRR17642783
threads: 20
resources: tmpdir=/tmp

Activating conda environment: ../../miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d
/home/user/miniconda3/envs/snakePipes/lib/python3.11/site-packages/google/protobuf/internal/api_implementation.py:110: UserWarning: Selected implementation cpp is not available.
warnings.warn(
/home/user/miniconda3/envs/snakePipes/lib/python3.11/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')

---- This analysis has been done using snakePipes version 2.7.3 ----
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 30
Rules claiming more threads will be scaled down.
Select jobs to execute...
[Wed May 24 19:21:47 2023]
Finished job 12.
5 of 21 steps (24%) done
[Fri May 26 05:46:31 2023]
Finished job 3.
6 of 21 steps (29%) done
Select jobs to execute...

[Fri May 26 05:46:31 2023]
rule index_bam:
input: bwameth/SRR17642783.bam
output: bwameth/SRR17642783.bam.bai
log: bwameth/logs/SRR17642783.index_bam.err, bwameth/logs/SRR17642783.index_bam.out
jobid: 8
reason: Missing output files: bwameth/SRR17642783.bam.bai; Input files updated by another job: bwameth/SRR17642783.bam
wildcards: sample=SRR17642783
resources: tmpdir=/tmp

Activating conda environment: ../../miniconda3/envs/081ae7e483778bba002ffd0e803c7dac
[Fri May 26 06:00:18 2023]
Finished job 8.
7 of 21 steps (33%) done
Select jobs to execute...

[Fri May 26 06:00:18 2023]
rule markDupes:
input: bwameth/SRR17642783.bam, bwameth/SRR17642783.bam.bai
output: Sambamba/SRR17642783.markdup.bam
log: Sambamba/logs/SRR17642783.rm_dupes.err, Sambamba/logs/SRR17642783.rm_dupes.out
jobid: 2
reason: Missing output files: Sambamba/SRR17642783.markdup.bam; Input files updated by another job: bwameth/SRR17642783.bam, bwameth/SRR17642783.bam.bai
wildcards: sample=SRR17642783
threads: 10
resources: tmpdir=/tmp

Activating conda environment: ../../miniconda3/envs/a2a5d9f83fe05e456d4c23bc00993556
[Fri May 26 10:03:51 2023]
Finished job 2.
8 of 21 steps (38%) done
Removing temporary output bwameth/SRR17642783.bam.
Removing temporary output bwameth/SRR17642783.bam.bai.
Select jobs to execute...

[Fri May 26 10:04:05 2023]
rule indexMarkDupes:
input: Sambamba/SRR17642783.markdup.bam
output: Sambamba/SRR17642783.markdup.bam.bai
log: Sambamba/logs/SRR17642783.indexMarkDupes.err, Sambamba/logs/SRR17642783.indexMarkDupes.out
jobid: 9
reason: Missing output files: Sambamba/SRR17642783.markdup.bam.bai; Input files updated by another job: Sambamba/SRR17642783.markdup.bam
wildcards: sample=SRR17642783
resources: tmpdir=/tmp

Warning: the following output files of rule indexMarkDupes were not present when the DAG was created:
{'Sambamba/SRR17642783.markdup.bam.bai'}
Activating conda environment: ../../miniconda3/envs/081ae7e483778bba002ffd0e803c7dac
[Fri May 26 10:22:17 2023]
Finished job 9.
9 of 21 steps (43%) done
Select jobs to execute...

[Fri May 26 10:22:17 2023]
rule link_deduped_bam:
input: Sambamba/SRR17642783.markdup.bam, Sambamba/SRR17642783.markdup.bam.bai
output: filtered_bam/SRR17642783.filtered.bam, filtered_bam/SRR17642783.filtered.bam.bai
jobid: 1
reason: Missing output files: filtered_bam/SRR17642783.filtered.bam.bai, filtered_bam/SRR17642783.filtered.bam; Input files updated by another job: Sambamba/SRR17642783.markdup.bam, Sambamba/SRR17642783.markdup.bam.bai
wildcards: sample=SRR17642783
resources: tmpdir=/tmp

[Fri May 26 10:22:17 2023]
Finished job 1.
10 of 21 steps (48%) done
Select jobs to execute...

[Fri May 26 10:22:18 2023]
rule calc_Mbias:
input: filtered_bam/SRR17642783.filtered.bam, filtered_bam/SRR17642783.filtered.bam.bai
output: QC_metrics/SRR17642783.Mbias.txt
log: QC_metrics/logs/SRR17642783.calc_Mbias.out
jobid: 15
reason: Missing output files: QC_metrics/SRR17642783.Mbias.txt; Input files updated by another job: filtered_bam/SRR17642783.filtered.bam.bai, filtered_bam/SRR17642783.filtered.bam
wildcards: sample=SRR17642783
threads: 10
resources: tmpdir=/tmp

Activating conda environment: ../../miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d

[Fri May 26 10:22:18 2023]
rule DepthOfCov:
input: filtered_bam/SRR17642783.filtered.bam, filtered_bam/SRR17642783.filtered.bam.bai, QC_metrics/randomCpG.bed
output: QC_metrics/CpGCoverage.txt, QC_metrics/CpGCoverage.png, QC_metrics/CpGCoverage.coverageMetrics.txt
log: QC_metrics/logs/DepthOfCov.err
jobid: 11
reason: Missing output files: QC_metrics/CpGCoverage.coverageMetrics.txt, QC_metrics/CpGCoverage.txt, QC_metrics/CpGCoverage.png; Input files updated by another job: filtered_bam/SRR17642783.filtered.bam.bai, QC_metrics/randomCpG.bed, filtered_bam/SRR17642783.filtered.bam
threads: 20
resources: tmpdir=/tmp

Activating conda environment: ../../miniconda3/envs/081ae7e483778bba002ffd0e803c7dac
[Fri May 26 10:27:33 2023]
Finished job 15.
11 of 21 steps (52%) done
Select jobs to execute...

[Fri May 26 10:27:33 2023]
rule calcCHHbias:
input: filtered_bam/SRR17642783.filtered.bam, filtered_bam/SRR17642783.filtered.bam.bai
output: QC_metrics/SRR17642783.CHH.Mbias.txt
log: QC_metrics/logs/SRR17642783.calcCHHbias.err
jobid: 17
reason: Missing output files: QC_metrics/SRR17642783.CHH.Mbias.txt; Input files updated by another job: filtered_bam/SRR17642783.filtered.bam.bai, filtered_bam/SRR17642783.filtered.bam
wildcards: sample=SRR17642783
threads: 10
resources: tmpdir=/tmp

Activating conda environment: ../../miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d
[Fri May 26 10:29:01 2023]
Finished job 11.
12 of 21 steps (57%) done
Removing temporary output QC_metrics/randomCpG.bed.
Select jobs to execute...

[Fri May 26 10:29:01 2023]
rule DepthOfCovGenome:
input: filtered_bam/SRR17642783.filtered.bam, filtered_bam/SRR17642783.filtered.bam.bai
output: QC_metrics/genomeCoverage.txt, QC_metrics/genomeCoverage.png, QC_metrics/genomeCoverage.coverageMetrics.txt
log: QC_metrics/logs/DepthOfCovGenome.err
jobid: 10
reason: Missing output files: QC_metrics/genomeCoverage.coverageMetrics.txt, QC_metrics/genomeCoverage.png, QC_metrics/genomeCoverage.txt; Input files updated by another job: filtered_bam/SRR17642783.filtered.bam.bai, filtered_bam/SRR17642783.filtered.bam
threads: 20
resources: tmpdir=/tmp

Activating conda environment: ../../miniconda3/envs/081ae7e483778bba002ffd0e803c7dac
[Fri May 26 10:32:40 2023]
Finished job 10.
13 of 21 steps (62%) done
Select jobs to execute...

[Fri May 26 10:32:40 2023]
rule get_flagstat:
input: filtered_bam/SRR17642783.filtered.bam
output: QC_metrics/SRR17642783.flagstat
log: QC_metrics/logs/SRR17642783.get_flagstat.err
jobid: 18
reason: Missing output files: QC_metrics/SRR17642783.flagstat; Input files updated by another job: filtered_bam/SRR17642783.filtered.bam
wildcards: sample=SRR17642783
resources: tmpdir=/tmp

Activating conda environment: ../../miniconda3/envs/081ae7e483778bba002ffd0e803c7dac
[Fri May 26 10:32:41 2023]
rule methyl_extract:
input: filtered_bam/SRR17642783.filtered.bam, filtered_bam/SRR17642783.filtered.bam.bai, QC_metrics/SRR17642783.Mbias.txt
output: MethylDackel/SRR17642783_CpG.bedGraph
log: MethylDackel/logs/SRR17642783.methyl_extract.err, MethylDackel/logs/SRR17642783.methyl_extract.out
jobid: 14
reason: Missing output files: MethylDackel/SRR17642783_CpG.bedGraph; Input files updated by another job: filtered_bam/SRR17642783.filtered.bam.bai, QC_metrics/SRR17642783.Mbias.txt, filtered_bam/SRR17642783.filtered.bam
wildcards: sample=SRR17642783
threads: 10
resources: tmpdir=/tmp

Activating conda environment: ../../miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d
[Fri May 26 10:39:04 2023]
Finished job 14.
14 of 21 steps (67%) done
Select jobs to execute...

[Fri May 26 10:39:04 2023]
rule bedGraphToBigWig:
input: MethylDackel/SRR17642783_CpG.bedGraph, /home/user/genome/GRCh38_gencode_release29/genome_fasta/genome.fa.fai
output: MethylDackel/SRR17642783_CpG.methylation.bw, MethylDackel/SRR17642783_CpG.coverage.bw
log: MethylDackel/logs/SRR17642783_bedGraphToBigWig.stderr
jobid: 20
reason: Missing output files: MethylDackel/SRR17642783_CpG.methylation.bw, MethylDackel/SRR17642783_CpG.coverage.bw; Input files updated by another job: MethylDackel/SRR17642783_CpG.bedGraph
wildcards: sample=SRR17642783
resources: tmpdir=/tmp

Activating conda environment: ../../miniconda3/envs/081ae7e483778bba002ffd0e803c7dac
[Fri May 26 10:39:43 2023]
Finished job 17.
15 of 21 steps (71%) done
Select jobs to execute...

[Fri May 26 10:39:43 2023]
rule conversionRate:
input: QC_metrics/SRR17642783.CHH.Mbias.txt
output: QC_metrics/SRR17642783.conv.rate.txt
log: QC_metrics/logs/SRR17642783.conversionRate.log
jobid: 16
reason: Missing output files: QC_metrics/SRR17642783.conv.rate.txt; Input files updated by another job: QC_metrics/SRR17642783.CHH.Mbias.txt
wildcards: sample=SRR17642783
resources: tmpdir=/tmp

[Fri May 26 10:39:43 2023]
Finished job 16.
16 of 21 steps (76%) done
Removing temporary output QC_metrics/SRR17642783.CHH.Mbias.txt.
[Fri May 26 10:43:20 2023]
Finished job 20.
17 of 21 steps (81%) done
[Fri May 26 10:45:34 2023]
Finished job 18.
18 of 21 steps (86%) done
Select jobs to execute...

[Fri May 26 10:45:34 2023]
rule multiQC:
input: QC_metrics/SRR17642783.flagstat
output: multiQC/multiqc_report.html
log: multiQC/multiQC.out, multiQC/multiQC.err
jobid: 19
reason: Missing output files: multiQC/multiqc_report.html; Input files updated by another job: QC_metrics/SRR17642783.flagstat
resources: tmpdir=/tmp

Activating conda environment: ../../miniconda3/envs/081ae7e483778bba002ffd0e803c7dac

[Fri May 26 10:45:34 2023]
rule produceReport:
input: MethylDackel/SRR17642783_CpG.bedGraph, QC_metrics/genomeCoverage.txt, QC_metrics/genomeCoverage.png, QC_metrics/genomeCoverage.coverageMetrics.txt, QC_metrics/CpGCoverage.txt, QC_metrics/CpGCoverage.png, QC_metrics/CpGCoverage.coverageMetrics.txt, QC_metrics/SRR17642783.conv.rate.txt, QC_metrics/SRR17642783.Mbias.txt, QC_metrics/SRR17642783.flagstat
output: QC_metrics/QC_report.html
jobid: 13
reason: Missing output files: QC_metrics/QC_report.html; Input files updated by another job: QC_metrics/genomeCoverage.png, QC_metrics/CpGCoverage.coverageMetrics.txt, QC_metrics/genomeCoverage.coverageMetrics.txt, QC_metrics/CpGCoverage.txt, QC_metrics/SRR17642783.conv.rate.txt, QC_metrics/SRR17642783.Mbias.txt, QC_metrics/CpGCoverage.png, MethylDackel/SRR17642783_CpG.bedGraph, QC_metrics/SRR17642783.flagstat, QC_metrics/genomeCoverage.txt
resources: tmpdir=/tmp

Activating conda environment: ../../miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d
[Fri May 26 10:45:42 2023]
Finished job 19.
19 of 21 steps (90%) done
Quitting from lines 197-212 (tmpwvy1gkwq.WGBS_QC_report_template.Rmd)
Error in rep(1, nrow(V)) : invalid 'times' argument
Calls: ... eval -> PCA -> svd.triplet -> as.vector -> crossprod
Execution halted
[Fri May 26 10:47:49 2023]
Error in rule produceReport:
jobid: 13
input: MethylDackel/SRR17642783_CpG.bedGraph, QC_metrics/genomeCoverage.txt, QC_metrics/genomeCoverage.png, QC_metrics/genomeCoverage.coverageMetrics.txt, QC_metrics/CpGCoverage.txt, QC_metrics/CpGCoverage.png, QC_metrics/CpGCoverage.coverageMetrics.txt, QC_metrics/SRR17642783.conv.rate.txt, QC_metrics/SRR17642783.Mbias.txt, QC_metrics/SRR17642783.flagstat
output: QC_metrics/QC_report.html
conda-env: /home/user/miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d

RuleException:
CalledProcessError in line 302 of /home/user/miniconda3/envs/snakePipes/lib/python3.11/site-packages/snakePipes/shared/rules/WGBS.snakefile:
Command 'source /home/user/miniconda3/bin/activate '/home/user/miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d'; set -euo pipefail; Rscript --vanilla -e 'rmarkdown::render("/home/user/230523bisulfite_test/output/.snakemake/scripts/tmpwvy1gkwq.WGBS_QC_report_template.Rmd", output_file="/home/user/230523bisulfite_test/output/QC_metrics/QC_report.html", quiet=TRUE, knit_root_dir = "/home/user/230523bisulfite_test/output", params = list(rmd="/home/user/230523bisulfite_test/output/.snakemake/scripts/tmpwvy1gkwq.WGBS_QC_report_template.Rmd"))'' returned non-zero exit status 1.
File "/home/user/miniconda3/envs/snakePipes/lib/python3.11/site-packages/snakePipes/shared/rules/WGBS.snakefile", line 302, in __rule_produceReport
File "/home/user/miniconda3/envs/snakePipes/lib/python3.11/concurrent/futures/thread.py", line 58, in run
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-05-24T191728.932387.snakemake.log

!!! ERROR in WGBS workflow! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

@kaznt kaznt changed the title Error in WGBS workflow Error in WGBS workflow (invalid 'times' argument) May 29, 2023
@katsikora
Copy link
Contributor

Hi @kaznt,

thanks for submitting your error log.
I cannot reproduce your issue with snakePipes 2.7.3 on our WGBS test data from zenodo.

Could you try rerunning with that test dataset?

Best wishes,

Katarzyna

@kaznt
Copy link
Author

kaznt commented Jun 3, 2023

Hi @katsikora,

Thank you for your response.
I've tested the samples you mentioned in zenodo, and yes, it passed all analysis without any problem, indicating that the issue is derived from the samples I throw-in.
I used publically available bisulfite sequence dataset from SRA, https://www.ncbi.nlm.nih.gov/sra/?term=SRR17642783
Do you have any idea why it fails to in PCA analysis?

Also, I was wondering it is normal or not, but the mapping rate in bisulfite sequence analysis is incredibly poor, such like 0.8% to 1%.
I'm not familier with bisulfite analysis, so if it is usual, then it is OK, but if it is not expected result, how I can improve the mapping rate?

Thank you.
Best regards,

@katsikora
Copy link
Contributor

Hi Kaznt,

thanks for checking on our sample data.
If the mapping rates are so low, are you sure you're mapping to the expected organism?
Also, did you dump the files with --split-files ?

Best wishes,

Katarzyna

@kaznt
Copy link
Author

kaznt commented Jun 8, 2023

Hi @katsikora,

I confirmed the species and re-run with GRCm38 dataset I downloaded from the site here (https://zenodo.org/record/4468065).
Not sure what does --split-files mean, but I listed the md5 hash below.

ff817f24b945f9bd7cd5558b213b8679 SRX202087_R1.fastq.gz
05998326f8083072b1ca3db14b10e976 SRX202087_R2.fastq.gz
3337cf4890af2944430cacd6bf90304a SRX202088_R1.fastq.gz
c9c63d241ba20431edbde8716b2eb209 SRX202088_R2.fastq.gz
d4b1eb33c1a10ab245bce2ba504c3fd4 SRX271141_R1.fastq.gz
dfd414db40067ef4e3f2d33045c79dea SRX271141_R2.fastq.gz
d2b31698889055fd49ee131c4161095c SRX271142_R1.fastq.gz
5fdbb6cc94b3d3e4cb5f17364c239d76 SRX271142_R2.fastq.gz

4e004c5032ade9aa37cc795890051f7c genome.fa (GRCm38)

I attached the map rate.
スクリーンショット 2023-06-08 10 54 04

Command I issue is below.
$ WGBS -i input -o output GRCm38_gencode_release19 --local -j 30

Best regards,

@kaznt
Copy link
Author

kaznt commented Jun 8, 2023

Additionally, when I mapped manually by using bwameth with SRX202087 fastq files above, map rate is >99% indicating that the dataset is fine.
I wonder if the pipeline refers genome.fa instead of genome.fa.bwameth.c2t

Best,

@kaznt
Copy link
Author

kaznt commented Jun 8, 2023

I've checked bwameth log. Seems reffering c2t fa though...


Found BWA MEM index
running: /home/user/miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d/bin/python /home/user/miniconda3/envs/e6ce94c5f4ccaba458228e8445f6d20d/bin/bwameth.py c2t FASTQ/SRX202087_R1.fastq.gz FASTQ/SRX202087_R2.fastq.gz |bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R '@rg\tID:SRX202087_R\tSM:SRX202087_R' -t 20 /home/user/genome/GRCm38_gencode_release19/BWAmethIndex/genome.fa.bwameth.c2t /dev/stdin

converting reads in FASTQ/SRX202087_R1.fastq.gz,FASTQ/SRX202087_R2.fastq.gz
[M::bwa_idx_load_from_disk] read 0 ALT contigs
WARNING: 20833 reads with length < 80
: this program is designed for long reads
[M::process] read 549664 sequences (53699076 bp)...
[M::process] 0 single-end sequences; 549664 paired-end sequences
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 217688, 2, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (162, 186, 214)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (58, 318)
[M::mem_pestat] mean and std.dev: (189.55, 39.70)
[M::mem_pestat] low and high boundaries for proper pairs: (6, 370)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 549664 reads in 308.283 CPU sec, 15.830 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -T 40 -B 2 -L 10 -CM -U 100 -p -R @rg\tID:SRX202087_R\tSM:SRX202087_R -t 20 /home/user/genome/GRCm38_gencode_release19/BWAmethIndex/genome.fa.bwameth.c2t /dev/stdin
[main] Real time: 33.293 sec; CPU: 314.966 sec
se1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[W::sam_parse1] unrecognized reference name ""; treated as unmapped
[bam_sort_core] merging from 0 files and 4 in-memory blocks...

Best,

@katsikora
Copy link
Contributor

Hi Kaznt,

thanks for looking into the details. Indeed it looks like a potential issue with bwameth. Either it's the parsing of chromosome names by bwameth (as samtools is complaining about empty strings in chromosome names), or indeed the reads were unmapped to start with.
Can you paste the command that you used to map manually and that returned high mapping rates? Did you use the same reference index and the same bwameth version as for snakePipes?
I'll try to reproduce the error on your dataset then.

Best wishes,

Katarzyna

@kaznt
Copy link
Author

kaznt commented Jun 21, 2023

Hi katsikora,

Sorry for a bit delay for my response.
I paste the comand and output below.

<As I shown below, I use independently prepared bwameth via conda>
conda create -n bwameth -c bioconda bwameth
conda activate bwameth
bwameth.py --version
bwa-meth.py 0.2.7

ln -s ../input/* . ### It is preparing input fastq files as to use same files of input for snakepipes
ln -s ~/data/genome/GRCm38_gencode_release19/BWAmethIndex/* . ### It is preparing same reference and index I download from zenodo server
bwameth.py --reference genome.fa SRX202087_R1.fastq.gz SRX202087_R2.fastq.gz > output.sam
samtools view -Sb output.sam > output.bam
samtools sort output.bam > output.sort.bam
samtools flagstat output.sort.bam

533342 + 16516 in total (QC-passed reads + QC-failed reads)
533192 + 16472 primary
150 + 44 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
533059 + 16516 mapped (99.95% : 100.00%)
532909 + 16472 primary mapped (99.95% : 100.00%)
533192 + 16472 paired in sequencing
266596 + 8236 read1
266596 + 8236 read2
531192 + 0 properly paired (99.62% : 0.00%)
532666 + 16472 with itself and mate mapped
243 + 0 singletons (0.05% : 0.00%)
22 + 2 with mate mapped to a different chr
9 + 1 with mate mapped to a different chr (mapQ>=5)

Best regards,

@katsikora
Copy link
Contributor

katsikora commented Jun 23, 2023

Hi Kaznt,

thanks for providing your command, it all looks good.

I rerun WGBS with the dataset you pointed to (https://www.ncbi.nlm.nih.gov/sra/?term=SRR17642783 ). I aligned to the human genome and obtained a mapping rate of 99.54%. In the case where you observed the 1% mapping rates, there is definitely something wrong. My vote would go either to the reads (library prep not WGBS?), or then perhaps the genome doesn't match, or there's something about the bwameth index. I'll double check the genome index we're providing on zenodo.

Independently of that, I could reproduce your original error with the produce report rule:

Quitting from lines 197-212 [unnamed-chunk-10] (tmpqedjdto9.WGBS_QC_report_template.Rmd)
Error in `rep()`:
! invalid 'times' argument
Backtrace:
 1. FactoMineR::PCA(t(methyl[IDX, ]), graph = FALSE)
 2. FactoMineR::svd.triplet(X, row.w = row.w, col.w = col.w, ncp = ncp)
 4. base::crossprod(rep(1, nrow(V)), as.matrix(V))

I suspect it's because I only submitted one sample and then the PCA plot (understandably) failed. I'll add some code that will skip PCA when there is only one sample, but the other QC metrics will be returned.

I hope this helps,

Best wishes,

Katarzyna

@kaznt
Copy link
Author

kaznt commented Jul 5, 2023

Thank you for your help.

I'll keep this topic open until when you check the zenodo reference.

Best,

@katsikora
Copy link
Contributor

Hi Kaznt,

I get

142236831 + 6645913 mapped (94.89% : 99.96%)

when mapping FASTQ/SRR610039_R1.fastq.gz, FASTQ/SRR610039_R2.fastq.gz to our GRCm38_gencode_release19 from zenodo.

In that sense, I cannot reproduce your observation related to mapping rates <1%.

Best wishes,

Katarzyna

@katsikora
Copy link
Contributor

I'll close the issue for now, please feel free to reopen if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants