REFERENCE: Running the pipeline with 50 samples. #4

Closed
genomicsITER opened this issue May 25, 2020 · 11 comments

@genomicsITER
Owner

Hi, the problem is that running the pipeline with 1 sample works perfectly, but my data has 50 samples and it always errors out when I run all 50 samples with the parameter --reads 'my path/*.fastq'.

Originally posted by @HaiyangDu in #1 (comment)

@genomicsITER
Owner Author

Could you provide more details about the error?

I also recommend running the pipeline with the new update and following up in this new issue if the problem persists. I've adapted the output plotting scripts to handle a large number of samples.

@DavidFY-Hub

> Could you provide more details about the error?
>
> I also recommend running the pipeline with the new update and following up in this new issue if the problem persists. I've adapted the output plotting scripts to handle a large number of samples.

Thank you. Yes, I am running the new pipeline now, and I will report the result either way.

@DavidFY-Hub

#######
executor > local (112)
[49/e62e0e] process > QC (15) [100%] 34 of 34 ✔
[ac/29994e] process > fastqc (34) [100%] 34 of 34 ✔
[- ] process > multiqc [ 0%] 0 of 1
[90/a2d271] process > kmer_freqs (34) [100%] 34 of 34 ✔
[3e/2b620e] process > read_clustering (9) [ 18%] 6 of 33, failed: 3, retries: 3
[- ] process > split_by_cluster [ 0%] 0 of 3
[- ] process > read_correction -
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[bb/067704] process > output_documentation [100%] 1 of 1 ✔
[nf-core/nanoclust] Pipeline completed with errors
[dc/639563] NOTE: Process read_clustering (5) terminated with an error exit status (137) -- Execution is retried (1)
WARN: Killing pending tasks (3)
Execution aborted due to an unexpected error
#######

I ran just 34 samples and the problem appeared.
I also wonder: if I run the samples one by one versus all together, will the results be consistent?

@DavidFY-Hub

[NanoCLUST ASCII art banner]
NanoCLUST v1.0dev

Run Name : ridiculous_kare
Reads : 16s/*.fastq
Max Resources : 128 GB memory, 16 cpus, 10d time per job
Container : docker - [:]
Output dir : result_nanopore6
Launch dir : /Volumes/rest/work/16s/0522/13
Working dir : /Volumes/rest/work/16s/0522/13/work
Script dir : /Volumes/rest/work/16s/0522/13/NanoCLUST
User : root
Config Profile : docker

executor > local (1118)
[ed/b232b2] process > QC (4) [100%] 7 of 7, cached: 7 ✔
[31/6f7e04] process > fastqc (7) [100%] 7 of 7, cached: 7 ✔
[f6/04a5fb] process > multiqc [100%] 1 of 1 ✔
[d9/5e7d70] process > kmer_freqs (7) [100%] 7 of 7, cached: 7 ✔
[3f/aac4b5] process > read_clustering (6) [100%] 13 of 13, failed: 6, retries: 6 ✔
[d5/67e0c3] process > split_by_cluster (7) [100%] 7 of 7 ✔
[54/70be5a] process > read_correction (745) [100%] 746 of 746 ✔
[4e/9b2532] process > draft_selection (345) [ 46%] 346 of 746
[1c/0ad410] process > racon_pass (5) [ 1%] 5 of 345, failed: 1
[- ] process > medaka_pass [ 0%] 0 of 2
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[bb/067704] process > output_documentation [100%] 1 of 1, cached: 1 ✔
Execution cancelled -- Finishing pending tasks before exit
Error executing process > 'racon_pass (4)'

Caused by:
Process racon_pass (4) terminated with an error exit status (1)

Command executed:

minimap2 -ax map-ont --no-long-join -r100 -a draft_read.fasta corrected_reads.correctedReads.fasta -o aligned.sam
racon --quality-threshold=9 -w 250 corrected_reads.correctedReads.fasta aligned.sam draft_read.fasta > racon_consensus.fasta

Command exit status:
1

Command output:
(empty)

Command error:
[M::mm_idx_gen::0.039*0.11] collected minimizers
[M::mm_idx_gen::0.040*0.14] sorted minimizers
[M::main::0.041*0.14] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.041*0.14] mid_occ = 2
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.041*0.14] distinct minimizers: 305 (100.00% are singletons); average occurrences: 1.000; average spacing: 5.472
[M::worker_pipeline::0.086*0.09] mapped 7 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -ax map-ont --no-long-join -r100 -a -o aligned.sam draft_read.fasta corrected_reads.correctedReads.fasta
[M::main] Real time: 0.088 sec; CPU: 0.009 sec; Peak RSS: 0.003 GB
[racon::Polisher::initialize] loaded target sequences 0.000883 s
[racon::Polisher::initialize] loaded sequences 0.000965 s
[racon::Polisher::initialize] error: empty overlap set!

Work dir:
/Volumes/rest/work/16s/0522/13/work/42/179274480642c682e93cc5a2f6bbc7

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

[1]+ Stopped sudo nextflow run NanoCLUST/main.nf --reads '16s/*.fastq' -profile docker --db db/16S_ribosomal_RNA --tax db/taxdb/ --polishing_reads 8 --min_cluster_size 20 --outdir result_nanopore6 -resume nanaopore136
(base) dhys-MacBook-Air:13 dhy$ sudo nextflow run NanoCLUST/main.nf --reads '16s/*.fastq' -profile docker --db db/16S_ribosomal_RNA --tax db/taxdb/ --polishing_reads 5 --min_cluster_size 20 --outdir result_nanopore6 -resume nanaopore136
Password:
N E X T F L O W ~ version 20.04.1
Launching NanoCLUST/main.nf [cheesy_morse] - revision: 5166b629f0
Unable to acquire lock on session with ID 26c6a5d3-da6b-462a-b45a-1082dd786649

Common reasons of this error are:

  • You are trying to resume the execution of an already running pipeline
  • A previous execution was abruptly interrupted leaving the session open

You can check what process is holding the lock file by using the following command:

  • lsof /Volumes/rest/work/16s/0522/13/.nextflow/cache/26c6a5d3-da6b-462a-b45a-1082dd786649/db/LOCK
#########
This is the error; I ran just 7 samples.

@genomicsITER
Owner Author

genomicsITER commented May 26, 2020

Hi, thank you for the logs.

I've found some issues when running the pipeline with the parameters min_cluster_size and polishing_reads set to values that are too low. This can cause problems in the canu/racon/medaka processes.

The values assigned in the test profile (50 and 20) are too low and may not be suitable for real samples (i.e. not a mock community) and bigger files. I recommend setting polishing_reads to 500-1000, providing a higher min_cluster_size value (100-300), and seeing whether the error goes away.
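For reference, a run within those ranges would look something like this (paths and database arguments copied from the commands earlier in this thread; the exact values within the recommended ranges and the output directory name are just placeholders):

nextflow run NanoCLUST/main.nf \
    --reads '16s/*.fastq' \
    -profile docker \
    --db db/16S_ribosomal_RNA \
    --tax db/taxdb/ \
    --polishing_reads 500 \
    --min_cluster_size 200 \
    --outdir result_higher_params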

Thank you for your time and testing the tool.

@DavidFY-Hub

OK, thank you, I will try.

@DavidFY-Hub

I have changed the parameters:

nextflow run NanoCLUST/main.nf --reads 'no/*.fastq' -profile docker --db db/16S_ribosomal_RNA --tax db/taxdb/ --polishing_reads 500 --min_cluster_size 200 --outdir result_nanopore22222 -name nanaopore1all2
Password:
N E X T F L O W ~ version 20.04.1
Launching NanoCLUST/main.nf [nanaopore1all2] - revision: 5166b629f0

[NanoCLUST ASCII art banner]

NanoCLUST v1.0dev

Run Name : nanaopore1all2
Reads : no/*.fastq
Max Resources : 128 GB memory, 16 cpus, 10d time per job
Container : docker - [:]
Output dir : result_nanopore22222
Launch dir : /Volumes/rest/work/16s/0522/13
Working dir : /Volumes/rest/work/16s/0522/13/work
Script dir : /Volumes/rest/work/16s/0522/13/NanoCLUST
User : root
Config Profile : docker

executor > local (110)
[97/baa6ed] process > QC (26) [100%] 26 of 26 ✔
[37/6bdebb] process > fastqc (26) [100%] 26 of 26 ✔
[76/65c519] process > multiqc [100%] 1 of 1 ✔
[f8/47682c] process > kmer_freqs (25) [100%] 26 of 26 ✔
[51/3b98cb] process > read_clustering (3) [ 68%] 27 of 40, failed: 17, retries: 16
[0e/3f4833] process > split_by_cluster (1) [ 0%] 0 of 9
[- ] process > read_correction -
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[0e/0abcd4] process > output_documentation [100%] 1 of 1 ✔
[nf-core/nanoclust] Pipeline completed with errors
[8a/9f3abc] NOTE: Process read_clustering (26) terminated with an error exit status (137) -- Execution is retried (1)
WARN: Killing pending tasks (3)
Error executing process > 'read_clustering (1)'

Caused by:
Process read_clustering (1) terminated with an error exit status (137)

Command executed [/Volumes/rest/work/16s/0522/13/NanoCLUST/templates/umap_hdbscan.py]:

#!/usr/bin/env python

import numpy as np
import umap
import matplotlib.pyplot as plt
from sklearn import decomposition
import random
import pandas as pd
import hdbscan

df = pd.read_csv("input.1", delimiter="\t")

#UMAP
motifs = [x for x in df.columns.values if x not in ["read", "length"]]
X = df.loc[:,motifs]
X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)

df_umap = pd.DataFrame(X_embedded, columns=["D1", "D2"])
umap_out = pd.concat([df["read"], df["length"], df_umap], axis=1)

#HDBSCAN
X = umap_out.loc[:,["D1", "D2"]]
umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(200), cluster_selection_epsilon=int(0.5)).fit_predict(X)

#PLOT
plt.figure(figsize=(20,20))
plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=umap_out["bin_id"], cmap='Spectral', s=1)
plt.xlabel("UMAP1", fontsize=18)
plt.ylabel("UMAP2", fontsize=18)
plt.gca().set_aspect('equal', 'datalim')
plt.title("Projecting " + str(len(umap_out['bin_id'])) + " reads. " + str(len(umap_out['bin_id'].unique())) + " clusters generated by HDBSCAN", fontsize=18)

for cluster in np.sort(umap_out['bin_id'].unique()):
    read = umap_out.loc[umap_out['bin_id'] == cluster].iloc[0]
    plt.annotate(str(cluster), (read['D1'], read['D2']), weight='bold', size=14)

plt.savefig('hdbscan.output.png')
umap_out.to_csv("hdbscan.output.tsv", sep=" ", index=False)

Command exit status:
137

Command output:
(empty)

Command error:
.command.run: line 159: 22 Killed /usr/bin/env python .command.sh

Work dir:
/Volumes/rest/work/16s/0522/13/work/c9/fe155a066d9a48a085ce3e826e5aa4

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

######

@genomicsITER
Owner Author

genomicsITER commented May 26, 2020

Does this keep happening when min_cluster_size is 50 and polishing_reads is still 500?

An excessive number of clusters, caused by a min_cluster_size as low as 50, could kill the conda environment. If you are still getting this error, I would try values even higher than 50 for min_cluster_size.
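As a quick way to see how many clusters a given min_cluster_size actually produces, the hdbscan.output.tsv written by the read_clustering step can be summarised with pandas. This is only a sketch, assuming the space-separated output format of the umap_hdbscan.py template pasted above:

# Sketch: summarise HDBSCAN cluster sizes from the read_clustering output.
# Assumes the space-separated hdbscan.output.tsv written by the pasted
# umap_hdbscan.py template (bin_id column; -1 marks noise reads).
import pandas as pd

umap_out = pd.read_csv("hdbscan.output.tsv", sep=" ")
counts = umap_out["bin_id"].value_counts().sort_index()
n_clusters = (counts.index != -1).sum()
print(f"{n_clusters} clusters, {counts.get(-1, 0)} noise reads labelled -1")
print(counts)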

@genomicsITER
Owner Author

genomicsITER commented May 27, 2020

Hi,

I've been inspecting your log, and the exit status 137 you are getting means the process ran out of memory. I've updated the nextflow.config file to start with 8 GB and retry with more RAM if a process fails with this error. We recommend at least 16 GB of RAM in your machine. I hope you don't get memory errors this time.

EDIT: I'm now fixing an issue with that commit and will update here when it's done.
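For context, the usual Nextflow pattern for this kind of fix is an error strategy that retries a killed task with more memory on each attempt. The snippet below is only an illustrative sketch of that pattern, not the exact contents of the updated nextflow.config (the process name and values are taken from this thread):

process {
    withName: read_clustering {
        // Exit status 137 means the task was killed for exceeding its memory
        // limit, so retry it with a larger allocation each time.
        errorStrategy = { task.exitStatus == 137 ? 'retry' : 'finish' }
        maxRetries    = 3
        memory        = { 8.GB * task.attempt }   // 8 GB, 16 GB, 24 GB, ...
    }
}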

@DavidFY-Hub

OK, thank you, I will try.
My RAM is 64 GB. When I run the pipeline I monitor the CPU and RAM at the same time, and the maximum RAM used was 54 GB.

@genomicsITER
Owner Author

genomicsITER commented May 28, 2020

The issue with the commit is fixed and the memory adjustments for these processes have changed.

EDIT: Before the commit, the memory per process was capped at 7 GB, so the process could fail even if your machine had enough memory.

I'm also working on limiting how many processes can run at certain pipeline stages, to better support runs with multiple sample files. We have tested this with 12 samples.
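Until that is released, Nextflow's maxForks directive is the standard way to cap how many instances of a heavy process run in parallel. This is only an illustrative sketch (process names are taken from the logs above; the values are guesses to tune for your machine):

process {
    // Limit how many instances of the memory-heavy steps run at once,
    // so runs with many samples do not exhaust RAM.
    withName: read_clustering { maxForks = 2 }
    withName: read_correction { maxForks = 4 }
}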
