Memory issues with RiboDetector #9
Upon looking into this further, it may be a SLURM configuration issue, at least in part. I will keep you posted if I figure it out!
Hi, thank you for reporting this issue. I have tested it on SGE and it worked without any problem; I guess SLURM should behave similarly. Are you able to run it interactively on a computer/server without using SLURM? A chunk size of 256 should use less than 10 GB of memory.
Writing gzipped output will increase the runtime (compression takes time) but not the memory use. Looking forward to your updates. Best,
Thanks for getting back to me! I tried running a single sample and got a `MemoryError`. I was using a call to `ribodetector_cpu` with defined variables in place of the literal paths and parameter values.
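For context, a CPU-mode call typically has roughly the following shape; the paths, read length, and thread/chunk values here are placeholders, and the option names should be double-checked against `ribodetector_cpu --help`:

```bash
# Sketch of a paired-end run: 20 threads, average read length 100, chunk size 256.
# Writing plain (uncompressed) fastq output avoids the extra compression runtime.
ribodetector_cpu -t 20 -l 100 \
  -i sample_R1.fastq.gz sample_R2.fastq.gz \
  --chunk_size 256 \
  -o sample_nonrrna_R1.fastq sample_nonrrna_R2.fastq
```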
Which OS are you using? I think the multiprocessing code does not work on Windows.
This is on a CentOS 7 Linux system.
Could you send me example fastq files that can reproduce this error? How many bases and reads are in your input files?
@akrinos Could you also check how many RiboDetector processes and threads are actually running when you submit the job with SLURM?
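One way to check this on the compute node while the job is running, assuming the process command line contains `ribodetector`:

```bash
# List RiboDetector processes (one line per process)
ps -ef | grep "[r]ibodetector"

# Count threads across all matching processes (ps -eLf prints one line per thread)
ps -eLf | grep -c "[r]ibodetector"

# Live per-thread CPU view of the first matching process
top -H -p "$(pgrep -f ribodetector | head -n 1)"
```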
Hi, thanks for all the help! The files are quite large, 5-8 GB per paired-end gzipped fastq file (meta-omic data). I've been trying to get it to run with a single thread, but I am confused as to why the total size of the file would cause the memory issue so early on. The single-thread run has been going for several hours, so I'll let you know if I have any luck with that (it has not failed, though). I also just tried another parallelized run with the relevant environment variables set before running. I will also see if I can figure out a test data set that is possible for me to send over!
The large input files could be the reason; you can confirm this with a subset of your data. Before the sequences are converted to features, they are all loaded into memory. The current chunk size only controls how much memory is used to convert sequences into features (numbers). Usually, the size of the input sequence data (uncompressed) should not exceed the total RAM. 5-8 GB of compressed data per end is unusually large, although 180 GB of RAM should still be enough. I guess the free memory on your server is much smaller than 180 GB because of other running processes? I will add chunk-size support for loading the sequences into memory in a future release.
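A quick way to compare the uncompressed input size against the memory that is actually free on the node (file names are placeholders):

```bash
# Uncompressed size of one input file, in bytes
zcat sample_R1.fastq.gz | wc -c

# Number of reads (4 lines per fastq record)
echo $(( $(zcat sample_R1.fastq.gz | wc -l) / 4 ))

# RAM currently free on the node, in GB
free -g
```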
Another user also reported a similar memory issue while running RiboDetector with PBS. It would be great if you could post the job script you used to submit the job; it will help us locate the root cause of the issue.
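For comparison, a minimal SLURM job script for a run like this might look as follows; the directives, paths, and option values are only illustrative, not the submitter's actual script:

```bash
#!/bin/bash
#SBATCH --job-name=ribodetector
#SBATCH --cpus-per-task=20
#SBATCH --mem=180G
#SBATCH --time=24:00:00

# Use the CPU count granted by SLURM; all paths and option values are placeholders
ribodetector_cpu -t "$SLURM_CPUS_PER_TASK" -l 100 \
  -i sample_R1.fastq.gz sample_R2.fastq.gz \
  --chunk_size 256 \
  -o sample_nonrrna_R1.fastq sample_nonrrna_R2.fastq
```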
Hi, thanks so much for the work you're putting into this! I would just like to note that the job appears to be killed for lack of memory during a write-out step. As far as SLURM goes, I'm using Snakemake integration; the cluster parameters I most recently used for the submission requested really high memory (the general shape of that setup is sketched below).
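For readers following along, Snakemake's SLURM integration usually passes per-rule resources through a cluster command along these lines; the values shown are placeholders rather than the actual settings used here:

```bash
# Hypothetical Snakemake invocation; {threads} and {resources.mem_mb} are filled in per rule
snakemake --jobs 10 \
  --cluster "sbatch --cpus-per-task={threads} --mem={resources.mem_mb}M --time=24:00:00" \
  --latency-wait 60
```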
Depending on the SLURM setup, cluster jobs might be executed within the context of a Linux cgroup, which limits the memory available to the job to what was requested at submission, regardless of how much RAM the node physically has.
Does your cluster's SLURM configuration enforce memory limits through cgroups?
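From inside a running job, one can check whether such a limit applies with something like the following (cgroup v1 layout, as on CentOS 7; the exact paths can differ between sites):

```bash
# Memory SLURM granted to this job
scontrol show job "$SLURM_JOB_ID" | grep -iE "mem|tres"

# Hard memory limit of the job's cgroup (cgroup v1; path varies by site)
cat "/sys/fs/cgroup/memory$(awk -F: '$2 ~ /memory/ {print $3}' /proc/self/cgroup)/memory.limit_in_bytes"
```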
@dawnmy Trying to increase the number of threads still doesn't totally work, though, probably because of the large file size.
What do you mean by "doesn't totally work"? It only uses two CPUs when loading the input paired-end files into memory. After loading, the encoding and prediction will utilize all of the specified CPUs.
I say "doesn't totally work" because I started with 1 thread (which worked), tried 2 (which didn't work on all samples), and 20 threads did not work regardless of chunk size. I'm still doing some testing.
Did you run all samples at once or one by one? It is better to run only one sample at a time with multiple CPUs (e.g. via the `-t` option).
One by one, not at the same time |
This issue should have been solved with v0.2.4. Please update with:
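Presumably the standard pip upgrade, assuming an installation from PyPI under the package name `ribodetector`:

```bash
# Upgrade RiboDetector to the latest release (v0.2.4 or newer)
pip install --upgrade ribodetector
```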
This issue seems to be solved, so I will close it. Feel free to reopen if it is still not working.
Hi, thanks for the awesome tool! I have been trying to run it, and keep running into issues with memory in CPU mode. I am running with 10 threads and a chunk size of 256 on paired-end sequence read files. The job appears to fail due to over-consumption of memory when trying to write out non-rRNA sequences, even though the node should have 180 GB of RAM allocated. This is not within RiboDetector (though previously I did get a `MemoryError` within Python), but is rather the job scheduler cancelling the job. I have checked the read length and tried reducing the threads to 5 and the chunk size to 128. Are there other things that can be tried, and does writing the output as gz change both the memory and the time taken?
Thanks in advance!