MemoryError #304

Closed
qbdong122 opened this issue Apr 26, 2023 · 4 comments

Comments

@qbdong122

Hi guys, how can I solve this?

Traceback (most recent call last):
  File "~/miniconda3/envs/picrust2/bin/picrust2_pipeline.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "~/soft/picrust2/scripts/picrust2_pipeline.py", line 280, in <module>
    main()
  File "~/soft/picrust2/scripts/picrust2_pipeline.py", line 242, in main
    func_outfiles, pathway_outfiles = full_pipeline(study_fasta=args.study_fasta,
  File "~/soft/picrust2/picrust2/pipeline.py", line 120, in full_pipeline
    check_overlapping_seqs(study_fasta, input_table, verbose)
  File "~/soft/picrust2/picrust2/pipeline.py", line 367, in check_overlapping_seqs
    in_table = read_seqabun(in_tab)
  File "~/soft/picrust2/picrust2/util.py", line 331, in read_seqabun
    input_seqabun = biom.load_table(infile).to_dataframe(dense=True)
  File "~miniconda3/envs/picrust2/lib/python3.8/site-packages/biom/table.py", line 4261, in to_dataframe
    mat = self.matrix_data.toarray()
  File "~/miniconda3/envs/picrust2/lib/python3.8/site-packages/scipy/sparse/_compressed.py", line 1051, in toarray
    out = self._process_toarray_args(order, out)
  File "~/miniconda3/envs/picrust2/lib/python3.8/site-packages/scipy/sparse/_base.py", line 1298, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
numpy.core._exceptions.MemoryError: Unable to allocate 592. GiB for an array with shape (7839949, 10131) and data type float64
@gavinmdouglas
Member

Hi there,

What command are you running and how large are your input files?

That error means you don't have enough memory for the object: converting the table to a dense array would need 592 GiB.

Cheers,

Gavin
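
For context, the 592 GiB figure in the traceback can be reproduced from the array shape and dtype alone. The sketch below is only a rough estimate, assuming 8 bytes per float64 element and ignoring any other overhead; the helper name `dense_size_gib` is made up for illustration and is not part of PICRUSt2, biom, or NumPy.

```python
def dense_size_gib(n_rows: int, n_cols: int, bytes_per_element: int = 8) -> float:
    """Estimate the size in GiB of a dense array (float64 uses 8 bytes per element)."""
    return n_rows * n_cols * bytes_per_element / 2**30


# Shape reported in the MemoryError: (7839949, 10131), dtype float64.
size = dense_size_gib(7_839_949, 10_131)
print(f"{size:.1f} GiB")
```

The result is just under 592 GiB, which matches the error message. The same helper gives a quick check of whether a filtered or split table would fit in the memory available.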

@qbdong122
Author

Thank you for your reply!
The command I ran is:
picrust2_pipeline.py -s dna-sequences.fasta -i feature-table.biom -o picrust2_out -p 28
Given that the FASTA file is 3.1 GB and the data contain 10,131 samples, how can I run the entire pipeline in one execution? I want to avoid running it multiple times on different subsets of the data.

@gavinmdouglas
Member

You can look into pre-filtering your data to remove sequences and samples with low prevalence. Otherwise, unfortunately, you would need to split up your data manually to reduce the memory usage.
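
As a rough illustration of the pre-filtering idea, here is a minimal sketch in plain Python. It is not PICRUSt2's or biom's own filtering code: the function name, the thresholds, and the nested-dict layout (feature ID mapped to nonzero per-sample counts) are all assumptions made for the example. It drops sequences seen in too few samples, then drops samples left with too few sequences, without ever building a dense matrix.

```python
from collections import Counter


def filter_low_prevalence(counts, min_feature_prevalence=2, min_sample_richness=1):
    """Drop rare features, then samples left with too few features (single pass).

    counts maps feature ID -> {sample ID: count}, storing nonzero entries only.
    """
    # Keep features (sequences) observed in enough samples.
    kept = {
        feature: dict(samples)
        for feature, samples in counts.items()
        if sum(1 for c in samples.values() if c > 0) >= min_feature_prevalence
    }

    # Count how many retained features each sample still contains.
    richness = Counter(
        sample
        for samples in kept.values()
        for sample, c in samples.items()
        if c > 0
    )
    good_samples = {s for s, n in richness.items() if n >= min_sample_richness}

    # Remove entries from dropped samples, and any feature emptied by that.
    filtered = {}
    for feature, samples in kept.items():
        remaining = {s: c for s, c in samples.items() if s in good_samples and c > 0}
        if remaining:
            filtered[feature] = remaining
    return filtered


# Toy data with hypothetical IDs: seq3 occurs in a single sample and is dropped,
# which leaves sample C with only one sequence, so C is dropped as well.
toy = {
    "seq1": {"A": 10, "B": 3, "C": 1},
    "seq2": {"A": 5, "B": 2},
    "seq3": {"C": 7},
}
result = filter_low_prevalence(toy, min_feature_prevalence=2, min_sample_richness=2)
print(sorted(result))
```

The thresholds here are arbitrary. On real data the same filter would be applied to the BIOM table, and the FASTA file would be reduced to the retained sequence IDs, before running the pipeline.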

@qbdong122
Author

Thanks! I got it!
