Huge Memory consumption #3

Closed
Colorstorm opened this issue Apr 27, 2020 · 6 comments

@Colorstorm (Collaborator)

Hi,

I have tried to run some tests and ran into a lot of trouble with the huge amount of memory required.

@konnosif which settings do you use?

Maybe we could make the settings with the lowest memory usage the default and add a note that other settings take a lot of memory.

@konnosif (Collaborator) commented Apr 27, 2020

@Colorstorm

--filter_sTF 1 --filter_sStart 3 --filter_sEnd 4 --suffix_label 0

--genus (should not affect memory)

--corr (should not affect memory)
Those should be the settings with the lowest memory usage.

If there is still a memory problem, it might be because of the memory needed to create the huge images. The input itself should not be affecting memory, but image creation is.
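A rough sketch of that cost (the figure dimensions here are hypothetical, and this assumes matplotlib's Agg backend, whose rendered canvas is an RGBA buffer of width × height × 4 bytes):

```python
# Back-of-the-envelope canvas memory, assuming matplotlib's Agg backend:
# the rendered image is an RGBA buffer of (width_px * height_px * 4) bytes.
width_in, height_in, dpi = 200, 200, 300   # hypothetical heatmap size
width_px, height_px = width_in * dpi, height_in * dpi
gib = width_px * height_px * 4 / 1024**3
print(f"{width_px} x {height_px} px canvas needs ~{gib:.1f} GiB for the pixel buffer alone")
```

If image creation is indeed the bottleneck, halving the figure size or dpi cuts that buffer by a factor of four.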

@Colorstorm (Collaborator, Author)

Thanks, I am trying the options now and will keep you up to date.

@Colorstorm (Collaborator, Author)

How much memory did you have allocated in Slurm?

rcug_lw@hpc05:/working2/rcug_lw/konstantinos/fabian_test$ srun python3 heat5.py --batch_files marie.txt.filt.sort.csv.forpython.txt --filter_sTF 1 --filter_sStart 3 --filter_sEnd 4 --suffix_label 0 --where ../cov_test_mariep/
srun: job 3367604 queued and waiting for resources
srun: error: Lookup failed: Unknown host
srun: job 3367604 has been allocated resources
/working2/rcug_lw/miniconda3/lib/python3.7/site-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
  import pandas.util.testing as tm
####Current directory is : /working2/rcug_lw/konstantinos/fabian_test
####Current file.txt with filenames to open is:marie.txt.filt.sort.csv.forpython.txt
KGCF01_S4_R1 dataset
Successfully created the directory /working2/rcug_lw/konstantinos/fabian_test/KGCF01_S4_R1
Successfully created the directory /working2/rcug_lw/konstantinos/fabian_test/KGCF01_S4_R1/heatmap
Traceback (most recent call last):
  File "heat5.py", line 211, in <module>
    result = pd.merge(leftPANDA, rightPANDA, how='outer', on=['colA'])
  File "/working2/rcug_lw/miniconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 88, in merge
    return op.get_result()
  File "/working2/rcug_lw/miniconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 668, in get_result
    self._maybe_add_join_keys(result, left_indexer, right_indexer)
  File "/working2/rcug_lw/miniconda3/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 821, in _maybe_add_join_keys
    result[name] = key_col
  File "/working2/rcug_lw/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2938, in __setitem__
    self._set_item(key, value)
  File "/working2/rcug_lw/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 3000, in _set_item
    value = self._sanitize_column(key, value)
  File "/working2/rcug_lw/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 3645, in _sanitize_column
    value = value.copy(deep=True)
  File "/working2/rcug_lw/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 861, in copy
    new_index = self._shallow_copy(self._data.copy())
MemoryError: Unable to allocate 41.4 GiB for an array with shape (5556669383,) and data type object
srun: error: hpc-rc05: task 0: Exited with exit code 1
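(On the Slurm side, memory can be requested explicitly, e.g. `srun --mem=64G ...`.) The failure is in the outer pd.merge at heat5.py line 211. As a minimal sketch of how such a merge can balloon to billions of rows (toy frames below, not the script's actual data): an outer merge on a key column with repeated values produces the Cartesian product of the matching rows.

```python
import pandas as pd

# Toy illustration of merge blow-up: every left row with key "x"
# pairs with every right row with key "x", so two 1,000-row inputs
# yield a 1,000,000-row result.
left = pd.DataFrame({"colA": ["x"] * 1000, "val_l": range(1000)})
right = pd.DataFrame({"colA": ["x"] * 1000, "val_r": range(1000)})
merged = pd.merge(left, right, how="outer", on=["colA"])
print(len(merged))  # 1000000
```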

@colindaven (Collaborator)

@Colorstorm I haven't tried the script out, but I saw it use up to 480 GB of RAM on one server this morning.

@colindaven (Collaborator) commented Apr 28, 2020

Konstantinos' working command:

The command I used on Ubuntu (Ubuntu on Windows 10!) on an 8 GB desktop with the toy dataset (from GitHub):

python3 heat5.py --batch_files sample.txt --filter_sTF 1 --filter_sStart 0 --filter_sEnd 4 --suffix_label A --genus 1 --corr 0

@colindaven (Collaborator)

Huge RAM use was due to the coverage window file, not the bam.txt files, being used as input.
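A cheap pre-flight check for that mix-up (file names follow the toy-dataset command above and are otherwise hypothetical) is to count the rows of each listed input before running; a genome-wide coverage-window table will be orders of magnitude longer than a per-sample bam.txt summary:

```python
# Hypothetical sanity check: report the row count of every file listed
# in the batch file, so an oversized coverage-window input stands out.
batch_file = "sample.txt"  # name taken from the toy-dataset command above

with open(batch_file) as fh:
    for name in (line.strip() for line in fh if line.strip()):
        with open(name) as data:
            rows = sum(1 for _ in data)
        print(f"{name}: {rows:,} rows")
```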
