Error in repeat binary #18
Hi Zac, Thanks for the detailed report! This seems to be a strange error, given that all memory/performance bottlenecks have been passed... In the second run, the error says "Resource temporarily unavailable", which might mean many things. My first suspicion would be that abruijn can't create an output file for contigs - this might happen because of a lack of space or other system policies. It seems that this error might also arise from exceeding memory/thread limits (but that sounds less likely to me). Could you send me the full log file? It might help to figure out where exactly the error is happening. In the meantime, I suggest you try the latest version from the devel branch (it contains a few updates that might be important). You should still be able to --resume (however, if it turns out to work, I would suggest restarting from scratch later). Also check that there are no problems with creating files in your filesystem (maybe it's full?). Best, |
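For anyone following along, here is a minimal Python sketch of the filesystem check suggested above. It is not part of ABruijn; the function name `check_output_dir` is made up for illustration. It reports free space and tries to create a scratch file in the working directory. "Resource temporarily unavailable" is the message Linux prints for `EAGAIN`, so the sketch reports the symbolic errno name if file creation fails.

```python
import errno
import os
import shutil
import tempfile


def check_output_dir(path):
    """Report free space and whether a new file can be created under path.

    Hypothetical helper for illustration; not part of ABruijn.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    report = {"free_gib": usage.free / 2**30, "can_create": False, "error": None}
    try:
        # Create a scratch file in the target directory; it is removed on close.
        with tempfile.NamedTemporaryFile(dir=path) as handle:
            handle.write(b"test")
            handle.flush()
        report["can_create"] = True
    except OSError as exc:
        # errno.EAGAIN is the code behind "Resource temporarily unavailable".
        report["error"] = errno.errorcode.get(exc.errno, str(exc.errno))
    return report


if __name__ == "__main__":
    print(check_output_dir(os.getcwd()))
```

Run it from inside the job script, in the same working directory, so that it sees the same quotas and mounts as the assembler.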
Thanks for the response Mikhail, I'll re-run it with the latest commit - it took about 11 hrs to reach the point where it crashed when I resumed the run so it will take me a bit of time to see if the latest update changes anything. With regards to the filesystem, I should have no problems with file creation/filesystem limits, other assemblers that are currently running aren't having any issues. The HPC environment I use doesn't place limits on file use, but they would email me if I was abusing that. Full log should be attached here. |
Thanks! I am pretty sure the problem is related to the generation of output files (it might not be the filesystem though). If you get a chance, could you also do 'ls -l' of the working directory? |
Okay, so I resumed the run using the latest version of ABruijn, and this time the program ends after 1 hr (instead of 11 hrs). The stderr says this:
Also, ls -l of the working directory is below.
I'm assuming the "Graph is not symmetric" line is relatively important, but I'm not sure how that relates to my input data. Perhaps I need to re-run the program from start to finish and see what changes? edit: Should have included the log. These are the new lines appended to the log I attached above.
|
Interesting. This error indicates an issue in the algorithm, rather than input data, so there is no need to re-run for now - I will try to fix this problem shortly. |
By the way, are you sure that you got the latest revision from devel branch (not the master)? This log looks a bit outdated.. |
You're right, I updated it with the master branch. I'll run the devel branch and give an update regarding what happens. |
Error appears to be the same, but the output is different. Stderr:
Appended to abruijn.log
|
Thank you! I got this error in one of my datasets as well - will try to fix shortly. |
Please try the latest version from the master branch, it should be resolved now. |
The original problem has been fixed now, but I unfortunately seem to have encountered another error. I'll rerun the program to see if it happens again, but because it took 12 hours to get to this point it will be a while before I can verify that it is repeatable/occurs at roughly the same time. This is the stderr:
ls -l:
And the details from the latest run in the log file are attached. The main thing I notice is that the program crashed at the same time the graph_before_rr.dot file was created, so it's possibly related. |
Thank you, Best, |
I have to admit I'm not 100% certain how the HPC system is set up, but to my knowledge it works by allocating the number of CPU cores and memory within the submission script, so I am not sure how that relates to the underlying kernel/threads. Previously, I was running the program with 12 abruijn threads and asking PBS for 12 cores. I'll try it with 12 threads but 16 cores (asking for more cores means a significantly longer queue time, and running it with fewer threads means a longer run time), and if the same error occurs I'll lower the threads to 8 and ask for 16 cores again. Will update with results, and thanks for your help thus far. |
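A rough way to see how many cores a job is actually allowed to use, as opposed to how many were requested, is to query the CPU affinity of the running process. The Python sketch below assumes a Linux node. Whether PBS enforces the allocation through affinity masks or cpusets depends on the site configuration, so treat the result as a hint only. The helper names are hypothetical.

```python
import os


def usable_cpus():
    """Return the number of CPUs this process may run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: reflects affinity masks/cpusets applied to this process.
        return len(os.sched_getaffinity(0))
    # Fallback for platforms without sched_getaffinity.
    return os.cpu_count() or 1


def thread_budget(requested_threads):
    """Cap a requested thread count at the CPUs actually available."""
    return max(1, min(requested_threads, usable_cpus()))


if __name__ == "__main__":
    print("usable CPUs:", usable_cpus())
    print("threads to use for a request of 12:", thread_budget(12))
```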
I've tried running the program with 16 cores/12 threads as well as 16 cores/8 threads with the same result as above. As soon as the program either starts writing the graph_before_rr.dot file or begins the step after writing this file, it crashes. At this point I'm not sure how to proceed other than by reducing the number of abruijn threads to 1 (which will take approximately 100-140 hours to run to the point where it is currently crashing), but I will try this to exhaust the possibilities that it's related to threading requirements. Below is the result of ulimit -a if that might reveal some obvious problem with the HPC setup, although I've never encountered any problems with this previously.
|
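For reference, the limits most relevant to this kind of failure can also be read from inside the job with Python's standard `resource` module. This is only a sketch: the labels mirror the usual `ulimit` flags and the function name is made up. On Linux, threads count toward the "max user processes" limit, and `pthread_create` fails with `EAGAIN` ("Resource temporarily unavailable") when that limit is reached.

```python
import resource


def limit_summary():
    """Return (soft, hard) limits relevant to thread and file creation."""
    names = {
        "max user processes (ulimit -u)": resource.RLIMIT_NPROC,
        "open files (ulimit -n)": resource.RLIMIT_NOFILE,
        "stack size in bytes (ulimit -s reports KiB)": resource.RLIMIT_STACK,
    }
    summary = {}
    for label, key in names.items():
        soft, hard = resource.getrlimit(key)
        # RLIM_INFINITY corresponds to "unlimited" in ulimit output.
        summary[label] = tuple(
            "unlimited" if value == resource.RLIM_INFINITY else value
            for value in (soft, hard)
        )
    return summary


if __name__ == "__main__":
    for label, (soft, hard) in limit_summary().items():
        print(f"{label}: soft={soft} hard={hard}")
```

Limits inside a batch job can differ from those in a login shell, so it is worth running this within the submission script itself.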
Ok, my current best guess is that the system might not like threads being created/destroyed too often. I tried to address this in the latest devel commit - could you check? Otherwise, is there any chance I can take a look at the data - it would be much faster if I could reproduce it on my machine.. Best, |
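To illustrate the general idea of avoiding frequent thread creation (this is not the actual change in the devel commit, which the thread does not show), here is a small Python sketch that processes many work items with a fixed pool of reusable worker threads instead of starting a new thread per item. `process_chunk` is a hypothetical stand-in for one unit of work.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Hypothetical stand-in for one unit of work."""
    return sum(chunk)


def run_with_pool(chunks, num_threads):
    """Process all chunks with a fixed set of worker threads.

    The executor starts at most num_threads threads and reuses them
    for every chunk, so the number of thread creations stays bounded
    no matter how many chunks there are.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() preserves the input order of the results.
        return list(pool.map(process_chunk, chunks))


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5], [6]]
    print(run_with_pool(data, num_threads=2))
```

With a pool, the thread count stays at or below `num_threads` for the whole run, which keeps the process well clear of per-user thread limits.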
I'm currently running the devel branch now, and will let you know how it goes. If it doesn't work, I'll retry the single threaded master branch and discuss with my supervisors if they have any problems with sharing the data (data is still private, but they would probably not have any major problems with this). |
Let me know if the devel branch does not work - I will prepare a special single-threaded version. |
Success! The changes to the devel branch have allowed the program to make it through the point where it was normally crashing. It's currently running BLASR after the repeat processing step has completed. I'll let you know how the final output looks, and thanks again for all your help. |
Yay! Now that it's working I would actually suggest re-running the assembly module as well (if you haven't done so yet after using results from v2.1) :) Also, do you expect your genome to be highly diploid? I see some patterns of this in the log.. Currently ABruijn is a bit conservative and will not span alternative alleles if the structural difference is high, which might lead to fragmentation. |
I'll make sure to rerun it with the latest updates once this current run completes. With regards to its diploid nature... I'm not entirely certain since the animal is very distantly related to any relatives that have had a genome previously sequenced, and the organism was taken directly from the wild. I'll keep that in mind regarding ABruijn's assembly - perhaps something like Redundans might help? At the moment I'm trying everything I can get my hands on. |
Final update before I close this issue. I was able to run the assembly to completion, and will now begin to reassemble it completely using the latest updates. The final stats appear to reflect what you have suggested with respect to the genome having some heterozygosity, as the final genome size is a fair bit smaller after the final repeat and bubble correction compared to the polished_0.fasta file. I have been experimenting with programs like quickmerge which may make the ABruijn output useful as a 'backbone' assembly which can be filled in using other assemblies assuming that its repeat regions are better resolved than other assemblers. It will take me some time to figure out the optimal configuration of things. Thanks again for fixing these issues, best of luck with the future development of this program. Zac. |
Hi,
I've attempted to assemble a genome using PacBio Sequel reads, and encountered an error on my first run and when I attempt to --resume the run. I have been running these as jobs on a PBS system on SUSE. I don't think it is a memory error since the job would be "killed" if I tried to use more than the amount I allotted.
I am using github commit 9c3f166 (v 2.1b) to assemble this.
For further information, I'm assembling a genome from a eukaryotic organism that does not have closely related species (< 100mya) genomes previously sequenced, so I don't have a strong idea of the exact genome size. I have used kmer-based genome estimates on corrected reads and assembled this genome with about 6 assemblers, so the consensus seems to be a genome of roughly 295-360MB in size (kmer estimates provide the lower range, many assemblers including abruijn's polished assembly provide the upper range). Using the lower range of that estimate, I have roughly 115x coverage including all the reads in my subreads. The stats of my raw reads are below (just in case you need this information to track down why this is occurring).
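As a quick sanity check of the coverage figure, the arithmetic can be sketched in Python. It assumes "MB" above means megabases and uses only the numbers quoted in this post (115x at the 295 Mb lower estimate). The function names are made up for illustration.

```python
def total_bases_from_coverage(coverage, genome_size):
    """Total sequenced bases implied by a coverage and a genome size."""
    return coverage * genome_size


def coverage(total_bases, genome_size):
    """Average coverage: total sequenced bases divided by genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    low_estimate = 295e6   # lower genome size estimate, in bases (assumed Mb)
    high_estimate = 360e6  # upper genome size estimate, in bases (assumed Mb)

    total = total_bases_from_coverage(115, low_estimate)
    print(f"implied total bases: {total / 1e9:.1f} Gb")
    print(f"coverage at upper estimate: {coverage(total, high_estimate):.0f}x")
```

So if the genome is closer to the upper estimate, the same read set corresponds to noticeably lower coverage than 115x.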
Below are the stderr from the first run and the --resume run
This is an excerpt from the log file (it's quite large), I've tried to just get the relevant portions and have used ellipses to abbreviate repetitive sections. If you want to see the whole log file I can do that.
If you could give me a hand to find out what is causing this issue that would be really appreciated.
Thanks,
Zac.