
Error at read-correction step #1

Closed

@fahadkhokhar opened this issue May 21, 2020 · 19 comments

Running the script with the provided test data, it errors at the read-correction stage:

Starting command on Wed May 20 12:05:03 2020 with 39.037 GB free disk space

  cd /home/ubuntu/bin/NanoCLUST/work/0c/cdad9cbbc33ab4512480bf701f33cb
  sbatch \
    --cpus-per-task=1 \
    --mem-per-cpu=4g   \
    -D `pwd` \
    -J 'canu_corrected_reads' \
    -o canu-scripts/canu.01.out  canu-scripts/canu.01.sh

Finished on Wed May 20 12:05:03 2020 (like a bat out of hell) with 39.037 GB free disk space

gzip: corrected_reads.correctedReads.fasta.gz: No such file or directory

@genomicsITER (Owner)

Hi! Thank you for contacting us.

We've updated some Docker images on Docker Hub and also the repository Dockerfiles under conda_envs/ (including the read_correction module) due to problems with the environment path.

The pipeline has now been tested on Ubuntu 18.04 using both conda (v4.8.3) and docker (v19.03.9) with Nextflow v20.01.0 and the test profile. If the problem persists, feel free to contact us again and include the executed command and any information about the configuration used.

nextflow run main.nf -profile test,conda

@genomicsITER (Owner)

A new push has been made with the latest updates.

@fahadkhokhar (Author) commented May 22, 2020

Thanks for your reply.

Still having the same issue though - I ran the command for the test data:

nextflow run main.nf -profile test,conda

On the first run it generated the .fasta.gz file but still gave the same error. I have just run it again with the same error; this time no file was generated.

@genomicsITER (Owner)

If the nextflow, python/pip + conda configuration is OK, it seems that the read_clustering conda environment is not working properly.

Try removing this env (under the work/conda directory) and running the pipeline again to reinstall the environment and retry the process. If that doesn't work, running the pipeline with '-profile test,docker' will automatically use Docker images pulled from Docker Hub that are also tested. Please let us know if the problem persists with the conda and docker profiles.
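A minimal sketch of that reset, assuming the default `work/` directory; the cached env lives under a hashed directory name that differs per machine, so a dummy directory stands in for it here:

```shell
# Nextflow caches conda envs under work/conda with hashed names.
mkdir -p "work/conda/read_clustering-0a1b2c"   # stand-in for the real hashed env dir
ls work/conda                                  # identify the read_clustering env dir
rm -rf work/conda/*read_clustering*            # remove the cached environment
# The next run rebuilds the environment:
# nextflow run main.nf -profile test,conda
```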

@DavidFY-Hub

Could this pipeline run on macOS?

@genomicsITER (Owner)

Nextflow and both conda and Docker are compatible with macOS. We've not tested on a Mac machine, but it may run with the docker profile to avoid compatibility errors.

We've updated the pipeline, and we now include exact version tags in the environment.yml files used for the conda envs. This should fix some errors with conda environments that arise on some machines. The Docker images also include the correct versions of the environments.

@fahadkhokhar (Author) commented May 23, 2020

Having more luck with the docker option, which ran fine with the test data. However, there is still a problem at read correction with my own dataset, even with the docker option:

Error executing process > 'read_correction (15)'

Caused by:
Process read_correction (15) terminated with an error exit status (1)

Command executed:

head -n$(( null*4 )) 20.fastq > subset.fastq
canu -correct -p corrected_reads -nanopore-raw subset.fastq genomeSize=1.5k stopOnLowCoverage=1 minInputCoverage=2
gunzip corrected_reads.correctedReads.fasta.gz
READ_COUNT=$(( $(awk '{print $1/2}' <(wc -l corrected_reads.correctedReads.fasta)) ))
cat 20.log > 20_racon.log
echo -n ";null;$READ_COUNT;" >> 20_racon.log && cp 20_racon.log 20_racon_.log

Command exit status:
1

Command output:
(empty)

Command error:
.command.sh: line 2: null: unbound variable

Work dir:
/home/ubuntu/bin/NanoCLUST/work/92/dbe622ba2c7fd61f3835b2b7a93174

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

Apologies if this is an obvious error on my part!

@fahadkhokhar (Author)

Also, the test-data run errors at the classification step when specifying --db and --tax, as I had originally downloaded these to a separate volume. I haven't got to this step with my own data yet.

Is there an option to change the working directory?

@genomicsITER (Owner)

The first problem you report is due to a typo in the assignment of the default value of --polishing_reads when it is not set on the command line. You can check in conf/test.conf that this value is set to 20 when using the test profile. We've updated the pipeline to fix the typo and to set a default of 100 for --polishing_reads when no profile confs are used at all. If you are running the pipeline with your own data, we strongly recommend manually setting the --polishing_reads and --min_cluster_size parameters in order to compare pipeline outputs, especially at low taxonomic levels such as species.
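The failure mode can be sketched in bash (a hypothetical `polishing_reads` shell variable stands in for the interpolated Nextflow parameter): under `set -u`, the literal word `null` inside `$(( null*4 ))` is treated as an unset variable name, which is exactly the `null: unbound variable` error in the log; a numeric default avoids it:

```shell
# What the broken template expanded to (fails under `set -u`):
#   head -n$(( null*4 )) 20.fastq > subset.fastq
# With a default in place, the arithmetic gets a real number:
polishing_reads="${polishing_reads:-100}"   # 100 is the upstream default after the fix
subset_lines=$(( polishing_reads * 4 ))     # 4 lines per FASTQ record
# head -n"$subset_lines" sample.fastq > subset.fastq
```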

For db and taxdb parameters, you should write the full path using double quotes:
--db "/home/nanoclust_vm/NanoCLUST/db/16S_ribosomal_RNA" --tax "/home/nanoclust_vm/NanoCLUST/db/taxdb/"

According to the Nextflow documentation, you can use -w to set the working directory:

nextflow run <script> -w /some/scratch/dir

Thank you for your time and feedback! We've also modified the documentation to make those issues with paths and parameters clearer to users. Hope you can run NanoCLUST with no issues using your own data.

@DavidFY-Hub

Yes, the docker profile is OK when running the test.

When I use my own data, the same problem as Fahadkhokhar's comes up.

I will try it right now.

@DavidFY-Hub

Hi

Are you running the pipeline on macOS?
Try running it with the docker profile and the test data:

nextflow run main.nf -profile test,docker

According to the Nextflow documentation:

If you are running Docker on Mac OSX, make sure you are mounting your local /Users directory into the Docker VM, as explained in this excellent tutorial: How to use Docker on OSX.

Thank you

@genomicsITER (Owner)

Hi

Are you running the pipeline on macOS?
Try running it with the docker profile and the test data:

nextflow run main.nf -profile test,docker

According to the Nextflow documentation:

If you are running Docker on Mac OSX, make sure you are mounting your local /Users directory into the Docker VM, as explained in this excellent tutorial: How to use Docker on OSX.

PS: We updated the pipeline to avoid Fahadkhokhar's problem with read_correction when using their own data

@DavidFY-Hub

Yes, using

--polishing_reads 60 --min_cluster_size 50

solved the problem. Thank you.

@fahadkhokhar (Author)

Many thanks for the reply. I can now proceed to the classification step, but there is an error at classification with both the test and my own dataset, even without specifying the --db or --tax paths for the test data:

Error executing process > 'consensus_classification (1)'

Caused by:
Process consensus_classification (1) terminated with an error exit status (2)

Command error:
BLAST Database error: No alias or index file found for nucleotide database

@genomicsITER (Owner) commented May 24, 2020

Hi. I don't know exactly what's happening with the classification. The pipeline is working for me on a clean Ubuntu 18.04 VM with the minimum dependencies, just downloading the db using the exact script inside the NanoCLUST dir:

mkdir db db/taxdb
wget https://ftp.ncbi.nlm.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz && tar -xzvf 16S_ribosomal_RNA.tar.gz -C db
wget https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz && tar -xzvf taxdb.tar.gz -C db/taxdb

After that you should have the right directory tree with the db and the taxonomy. Then I manually set those in the command, specifying:

--db "/home/nanoclust_vm/NanoCLUST/db/16S_ribosomal_RNA" --tax "/home/nanoclust_vm/NanoCLUST/db/taxdb/"

It seems you may have downloaded the db in a different way (resulting in a different dir structure), or to a location other than the NanoCLUST dir?

I will try using BLAST databases in different systems and paths to make it more flexible.
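As a quick sanity check for the "No alias or index file found" error, one can verify that BLAST index files sit next to the --db prefix. This is a sketch assuming the db layout produced by the download script (the prefix is passed without an extension):

```shell
db_prefix="db/16S_ribosomal_RNA"   # the value passed via --db (no extension)
# A usable nucleotide db has .nin/.nhr/.nsq index files (or an .nal
# alias) beside the prefix; if none match, BLAST reports the error above.
if ls "${db_prefix}".n* >/dev/null 2>&1; then
    status="db index files present"
else
    status="missing index files: check the --db path and spelling"
fi
echo "$status"
```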

Thanks again

@DavidFY-Hub

Hello, I want to compare the data to get the different species and run alpha and beta analyses.

Where can I get an abundance table like "otutab.txt" (not the rel_abundance)?

@genomicsITER (Owner) commented May 24, 2020

Hi, HaiyangDu. At this time, we do not yet have an option to produce an OTU table like the otutab command does. However, the nanoclust_out.txt file includes the number of reads assigned to each taxonomic ID, so it should not be hard to build an otutab.txt file for alpha and beta analysis.

We will work on an option to output an exact otutab file to make it easier for users to use NanoCLUST output in downstream analyses that require that file structure.
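Until that option lands, here is a rough awk sketch, assuming each sample's nanoclust_out.txt can be reduced to a two-column "taxid,reads" CSV (the real column layout may differ, so adjust the field indices); per-sample files are merged into an otutab-like matrix:

```shell
# Demo inputs stand in for real per-sample NanoCLUST outputs:
printf 'taxid,reads\nEcoli,10\nBsub,5\n' > s1.csv
printf 'taxid,reads\nEcoli,3\n'          > s2.csv

awk -F, '
  # Skip each header line; record sample order as files are first seen.
  FNR==1 { if (!(FILENAME in seen)) { seen[FILENAME]; order[++n]=FILENAME }; next }
  { counts[$1 SUBSEP FILENAME] += $2; taxa[$1] }
  END {
    printf "#OTU_ID"
    for (i = 1; i <= n; i++) printf "\t%s", order[i]
    print ""
    for (t in taxa) {                       # one row per taxonomic ID
      printf "%s", t
      for (i = 1; i <= n; i++) printf "\t%d", counts[t SUBSEP order[i]]
      print ""
    }
  }' s1.csv s2.csv > otutab.txt
```

Missing taxon/sample combinations come out as 0, which is what downstream alpha/beta tools expect.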

Thank you for your time and feedback.

@DavidFY-Hub

OK, thanks for your reply.

@DavidFY-Hub

Hi, the problem is that the pipeline runs perfectly with 1 sample, but my data has 50 samples and it always errors when I run the 50 samples with the parameter "--reads 'my path/*.fastq'".
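One common cause with many samples (a guess, since the exact command line matters) is the glob reaching Nextflow already expanded: the pattern must stay quoted so Nextflow, not the shell, expands it, and only then does it match all samples:

```shell
mkdir -p demo && touch demo/a.fastq demo/b.fastq   # two stand-in samples
unquoted=$(echo demo/*.fastq)   # the shell expands this into two separate words
quoted='demo/*.fastq'           # quoted: stays a single literal pattern
# Nextflow receives the pattern itself and matches every sample:
# nextflow run main.nf -profile docker --reads "$quoted"
```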
