No more space error on classifier #28

Closed

apeltzer opened this issue Nov 8, 2018 · 7 comments

apeltzer commented Nov 8, 2018

Current path   : /home/alex/IDEA/nf-core/rrna-ampliseq
Script dir     : /home/alex/IDEA/nf-core/rrna-ampliseq
Config Profile : test,docker
=========================================
[warm up] executor > local
[8f/1bdbbf] Cached process > get_software_versions
[23/9a0ff7] Cached process > output_documentation
[08/bc4aca] Cached process > metadata_category_all (1)
[f1/6e41af] Cached process > metadata_category_pairwise (1)
[e0/1388c4] Cached process > fastqc (1a_S103)
[14/1981e3] Cached process > trimming (1a_S103)
[3e/3d4ff6] Cached process > trimming (2a_S115)
[f2/c6524d] Cached process > fastqc (2a_S115)
[40/c66b2f] Cached process > trimming (1_S103)
[7f/99415d] Cached process > fastqc (1_S103)
[46/12c4bd] Cached process > fastqc (2_S115)
[1a/047962] Cached process > trimming (2_S115)
[a0/8ebb9c] Cached process > qiime_import
[75/0fc607] Cached process > qiime_demux_visualize
[12/3e642e] Cached process > multiqc
[0a/954abc] Cached process > dada_trunc_parameter
[ec/712c2c] Cached process > dada_single
[3f/19e7c9] Submitted process > classifier (1)
ERROR ~ Error executing process > 'classifier (1)'

Caused by:
  Process `classifier (1)` terminated with an error exit status (1)

Command executed:

  qiime feature-classifier classify-sklearn \
      --i-classifier GTGYCAGCMGCCGCGGTAA-GGACTACNVGGGTWTCTAAT-classifier.qza \
      --p-n-jobs "-1" \
      --i-reads rep-seqs.qza \
      --o-classification taxonomy.qza \
      --verbose

  qiime metadata tabulate \
      --m-input-file taxonomy.qza \
      --o-visualization taxonomy.qzv \
      --verbose

  #produce "taxonomy/taxonomy.tsv"
  qiime tools export taxonomy.qza \
      --output-dir taxonomy

  qiime tools export taxonomy.qzv \
      --output-dir taxonomy

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  Traceback (most recent call last):
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
      results = action(**arguments)
    File "<decorator-gen-292>", line 2, in classify_sklearn
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/qiime2/sdk/action.py", line 232, in bound_callable
      output_types, provenance)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/qiime2/sdk/action.py", line 367, in _callable_executor_
      output_views = self._callable(**view_args)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/q2_feature_classifier/classifier.py", line 215, in classify_sklearn
      confidence=confidence)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/q2_feature_classifier/_skl.py", line 45, in predict
      for chunk in _chunks(reads, chunk_size)) for m in c)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 789, in __call__
      self.retrieve()
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/parallel.py", line 699, in retrieve
      self._output.extend(job.get(timeout=self.timeout))
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/multiprocessing/pool.py", line 644, in get
      raise self._value
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/multiprocessing/pool.py", line 424, in _handle_tasks
      put(task)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/pool.py", line 371, in send
      CustomizablePickler(buffer, self._reducers).dump(obj)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/pool.py", line 240, in __call__
      for dumped_filename in dump(a, filename):
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 484, in dump
      NumpyPickler(f, protocol=protocol).dump(value)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/pickle.py", line 408, in dump
      self.save(obj)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 278, in save
      wrapper.write_array(obj, self)
    File "/opt/conda/envs/nf-core-rrna-ampliseq-1.0dev/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 93, in write_array
      pickler.file_handle.write(chunk.tostring('C'))
  OSError: [Errno 28] No space left on device
  
  Plugin error from feature-classifier:
  
    [Errno 28] No space left on device
  
  See above for debug info.

Work dir:
  /home/alex/IDEA/nf-core/rrna-ampliseq/work/3f/19e7c9e31a840d983cf984979ce1bc

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
Execution cancelled -- Finishing pending tasks before exit
[nf-core/rrna-ampliseq] Pipeline Complete


apeltzer commented Nov 8, 2018

Open an issue at the QIIME2 dev forum asking for more details on what is done in that specific step. It feels weird that neither TMP nor TMPDIR, passed in as environment variables, is picked up by the environment at all.
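For reference, a minimal sketch of the kind of config that was tried (the exact snippet isn't shown in this thread, so treat the layout as an assumption):

// nextflow.config -- sketch of the attempted workaround, not the final fix.
// Export TMP/TMPDIR into every task environment; inside the Docker
// container these point at the container's own /tmp, and the classifier
// step appears to ignore them (or the container /tmp is simply too small).
env {
  TMP    = "/tmp"
  TMPDIR = "/tmp"
}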


apeltzer commented Nov 8, 2018

Sorry @d4straub, I invested ~4 hours today and can't get this to run at all. We need to wait for the QIIME2 folks to help out here.


apeltzer commented Nov 8, 2018

"Fun" fact:

[74/df79da] Cached process > multiqc
[40/71b95b] Cached process > dada_trunc_parameter
[49/3a2962] Cached process > dada_single
[5c/2e4105] Submitted process > classifier (1)
[3f/937df3] Submitted process > filter_taxa (1)

This runs well when using Singularity...


apeltzer commented Nov 8, 2018

It even runs through entirely without a single error message.

=======================================================
Pipeline Name  : nf-core/rrna-ampliseq
Pipeline Version: 1.0dev
Run Name       : cheesy_coulomb
Reads          : data/*_L001_R{1,2}_001.fastq.gz
Data Type      : Paired-End
Max Memory     : 6 GB
Max CPUs       : 2
Max Time       : 2d
Output dir     : ./results
Working dir    : /home/alex/IDEA/nf-core/rrna-ampliseq/work
Container Engine: singularity
Container      : nfcore/rrna-ampliseq:latest
Current home   : /home/alex
Current user   : alex
Current path   : /home/alex/IDEA/nf-core/rrna-ampliseq
Script dir     : /home/alex/IDEA/nf-core/rrna-ampliseq
Config Profile : test,singularity
=========================================
[warm up] executor > local
[11/ed9b34] Cached process > get_software_versions
[8d/ab2a66] Cached process > output_documentation
[7e/7b98fb] Cached process > metadata_category_all (1)
[30/a8b348] Cached process > metadata_category_pairwise (1)
[c7/254f41] Cached process > fastqc (1_S103)
[6f/0948dd] Cached process > fastqc (1a_S103)
[6d/3a1a3c] Cached process > trimming (2_S115)
[03/385ca3] Cached process > trimming (1a_S103)
[7f/648968] Cached process > trimming (2a_S115)
[60/b91fd1] Cached process > fastqc (2a_S115)
[37/35e535] Cached process > trimming (1_S103)
[cc/5e3356] Cached process > fastqc (2_S115)
[9e/71976f] Cached process > qiime_import
[61/037f86] Cached process > qiime_demux_visualize
[74/df79da] Cached process > multiqc
[40/71b95b] Cached process > dada_trunc_parameter
[49/3a2962] Cached process > dada_single
[5c/2e4105] Submitted process > classifier (1)
[3f/937df3] Submitted process > filter_taxa (1)
[05/470625] Submitted process > RelativeAbundanceASV (1)
[6e/30bc88] Submitted process > RelativeAbundanceReducedTaxa (1)
[f3/3619b3] Submitted process > export_filtered_dada_output (1)
[82/49c195] Submitted process > tree (1)
[e3/b2432f] Submitted process > ancom (1)
[1a/f08022] Submitted process > barplot (1)
[da/aad654] Submitted process > combinetable (1)
[a3/20e214] Submitted process > report_filter_stats (1)
[01/69a2c5] Submitted process > diversity_core (1)
[5f/af8a16] Submitted process > alpha_rarefaction (1)
[47/28fc6c] Submitted process > beta_diversity_ordination (1)
[33/35de42] Submitted process > beta_diversity (1)
[f4/e7c281] Submitted process > alpha_diversity (1)
[nf-core/rrna-ampliseq] Pipeline Complete


apeltzer commented Nov 9, 2018

Found the problem on the Docker side as well, after ~10 hours of debugging and without any help from the QIIME2 user forum.

docker.temp = 'auto'

env {
  JOBLIB_TEMP_FOLDER = "/tmp"
}

Nobody tells you about this here:

https://forum.qiime2.org/t/error-no-28-out-of-memory/5758
https://forum.qiime2.org/t/error-no-28-no-space-left-on-device-with-feature-classifier-sklearn/5961
https://forum.qiime2.org/t/error-in-filtering-using-dada2/3543

@KochTobi and I found this here:

https://scikit-learn.org/stable/modules/generated/sklearn.utils.Parallel.html

and learned that the scikit-learn code apparently looks for the env variable JOBLIB_TEMP_FOLDER first. Setting this to /tmp and using Nextflow's docker.temp = 'auto' resolved the issue.
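Putting the pieces together, a minimal sketch of how the fix can sit in a nextflow.config (the two settings themselves come from this thread; the profile layout around them is an assumption):

// nextflow.config -- sketch; only docker.temp and JOBLIB_TEMP_FOLDER
// are from this issue, the profile wrapper is illustrative.
profiles {
  docker {
    docker.enabled = true
    // 'auto' makes Nextflow create a fresh host directory and mount it
    // as /tmp inside the container, so /tmp is backed by host disk space.
    docker.temp = 'auto'
  }
}

env {
  // joblib (vendored in scikit-learn) checks JOBLIB_TEMP_FOLDER before
  // falling back to its default temp location, so the classifier's
  // memory-mapped arrays land on the mounted /tmp instead of the
  // container's own writable layer.
  JOBLIB_TEMP_FOLDER = "/tmp"
}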

That means we'll have working CI tests today!
