MemoryError #304

qbdong122 · 2023-04-26T02:28:17Z

Hi guys, how can I solve this?

Traceback (most recent call last):
  File "~/miniconda3/envs/picrust2/bin/picrust2_pipeline.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "~/soft/picrust2/scripts/picrust2_pipeline.py", line 280, in <module>
    main()
  File "~/soft/picrust2/scripts/picrust2_pipeline.py", line 242, in main
    func_outfiles, pathway_outfiles = full_pipeline(study_fasta=args.study_fasta,
  File "~/soft/picrust2/picrust2/pipeline.py", line 120, in full_pipeline
    check_overlapping_seqs(study_fasta, input_table, verbose)
  File "~/soft/picrust2/picrust2/pipeline.py", line 367, in check_overlapping_seqs
    in_table = read_seqabun(in_tab)
  File "~/soft/picrust2/picrust2/util.py", line 331, in read_seqabun
    input_seqabun = biom.load_table(infile).to_dataframe(dense=True)
  File "~miniconda3/envs/picrust2/lib/python3.8/site-packages/biom/table.py", line 4261, in to_dataframe
    mat = self.matrix_data.toarray()
  File "~/miniconda3/envs/picrust2/lib/python3.8/site-packages/scipy/sparse/_compressed.py", line 1051, in toarray
    out = self._process_toarray_args(order, out)
  File "~/miniconda3/envs/picrust2/lib/python3.8/site-packages/scipy/sparse/_base.py", line 1298, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
numpy.core._exceptions.MemoryError: Unable to allocate 592. GiB for an array with shape (7839949, 10131) and data type float64

gavinmdouglas · 2023-04-26T11:14:33Z

Hi there,

What command are you running and how large are your input files?

That error means you don't have enough memory for the object: converting the table to a dense array of shape (7839949, 10131) with float64 values needs 592 GiB.
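As a rough check of where that figure comes from: a dense array needs rows × columns × 8 bytes for float64. A minimal sketch using the shape reported in the traceback (the helper name `dense_size_gib` is just for illustration):

```python
# Shape and dtype are taken from the MemoryError message in the traceback.
n_rows, n_cols = 7_839_949, 10_131
bytes_per_float64 = 8


def dense_size_gib(rows, cols, itemsize=bytes_per_float64):
    """Return the memory needed for a dense rows x cols array, in GiB."""
    return rows * cols * itemsize / 2**30


required_gib = dense_size_gib(n_rows, n_cols)
print(f"{required_gib:.0f} GiB")
```

The same helper can be used to estimate how far the number of sequences or samples would have to drop before the dense table fits in the available RAM.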

Cheers,

Gavin

qbdong122 · 2023-04-26T13:00:51Z

Thank you for your reply!
The command I ran is:
picrust2_pipeline.py -s dna-sequences.fasta -i feature-table.biom -o picrust2_out -p 28
The fasta file is 3.1 GB and the dataset contains 10,131 samples. How can I run the entire pipeline in one execution? I want to avoid running it multiple times on different subsets of the data.

gavinmdouglas · 2023-04-26T13:15:59Z

You can look into pre-filtering your data to remove sequences and samples with low prevalence. Otherwise, you would need to manually split up your data to reduce the memory usage, unfortunately.
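To illustrate the two options, here is a minimal sketch in plain Python. It assumes a hypothetical sparse layout `{sequence_id: {sample_id: count}}` rather than a real BIOM table, and the function names and thresholds are made up for the example. It is not part of PICRUSt2.

```python
def filter_by_prevalence(table, min_samples):
    """Keep sequences observed (count > 0) in at least `min_samples` samples.

    `table` is a hypothetical sparse layout: {sequence_id: {sample_id: count}}.
    """
    return {
        seq_id: counts
        for seq_id, counts in table.items()
        if sum(1 for c in counts.values() if c > 0) >= min_samples
    }


def split_samples(sample_ids, chunk_size):
    """Split a list of sample ids into consecutive chunks of at most `chunk_size`."""
    return [
        sample_ids[i:i + chunk_size]
        for i in range(0, len(sample_ids), chunk_size)
    ]


# Toy data, for illustration only.
toy = {
    "ASV1": {"S1": 5, "S2": 3, "S3": 1},
    "ASV2": {"S1": 2},
    "ASV3": {"S2": 7, "S3": 0},
}
filtered = filter_by_prevalence(toy, min_samples=2)
chunks = split_samples(["S1", "S2", "S3"], chunk_size=2)
print(sorted(filtered), chunks)
```

On a real table, the same prevalence rule would need to be applied to the .biom file (and the matching sequences removed from the fasta file) before running the pipeline. What threshold is reasonable depends on the dataset.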

qbdong122 · 2023-04-26T13:31:09Z

Thanks! I got it!

R-Wright-1 closed this as completed Mar 15, 2024
