quast generating empty files #38
By the logs, it seems you're trying to run without selecting any of the available configuration profiles for the tools, so the pipeline is trying to load and execute the tools from your machine, whereas it is written to read them from a pre-built docker/singularity/conda profile.
Take a look here to select the profile that best suits your needs: https://github.com/fmalmeida/MpGAP/tree/master#selecting-between-profiles
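Concretely, profile selection happens at launch time. A singularity run would look roughly like this; everything besides `-profile` is a placeholder for your own options, not the pipeline's exact flags:

```shell
# Illustrative invocation; replace the bracketed part with your input/output options.
nextflow run fmalmeida/MpGAP -profile singularity [your input/output options]
```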
Silly me. There was even a warning in the log. Thanks! So it seems to be running with no errors now... will let you know if it finishes. Cool! Thanks for the kind support
Hi, a little update: I've been trying to run the pipeline including the profile as you mentioned (and as clearly stated in the instructions, sorry about that). I settled for the singularity profile because singularity is installed system-wide on our HPC (I tried docker with errors, probably because it is not installed on my HPC, and tried conda also with errors, although I'd say I followed the instructions). Anyway, with singularity it looked good at the beginning: after ~7 hours running (slurm job), wtdbg2, raven and flye seem to have run. But it has also accumulated errors: quast seems to have failed definitively after 3 attempts (the errors seem to have been ignored, though, thanks to the code I added to the conf), and unicycler on the first attempt. Could you maybe already take a look to see how we can avoid the errors? I've tried to gather the relevant logs in the attached zip; please let me know if I've missed any or if you need anything else. It may be a mess, so please tell me if you have any suggestions on how to better share the logs or results. Thanks!!
Hi @josruirod, By default, the assemblers first try to run with 6 CPUs and 14 GB of memory, and if that fails, they retry with the maximum values you've configured. But I am not sure: many of the directories you've sent in your zip file are empty. About the profiles: in general docker is not installed on HPCs, and singularity is the way to go. The conda profile is very tricky to get working, and in general I'd say to avoid it. In the meantime, while you send me these files and I take a look, please try to run again making sure to:
This may help us find the source. I don't think the error is in the pipeline itself, but maybe in the profile or in this resource allocation 😄
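The retry-with-escalation behavior described above is typically expressed in Nextflow config like this. This is only a sketch: the label name and the starting values below are assumptions for illustration, not MpGAP's exact settings.

```groovy
// Sketch of a retry-with-escalation policy in Nextflow config.
// Selector name and starting values are illustrative, not MpGAP's exact ones.
process {
    withLabel: process_assembly {
        cpus          = { 6 * task.attempt }            // 6, then 12, ...
        memory        = { 14.GB * task.attempt }        // 14 GB, then 28 GB, ...
        errorStrategy = { task.attempt <= 2 ? 'retry' : 'finish' }
        maxRetries    = 2
    }
}
```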
Hi, thank you as always for your time and the fast support! Thanks.
Now it worked, and I found something: Process exceeded running time limit (1h). This is in your log. Try setting a higher time limit for that process. Hope this is what's needed 😄
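For reference, such limits usually live in a small custom config. A sketch follows; the `max_*` parameter names mirror the convention mentioned later in this thread and are an assumption here, and the values are examples only:

```groovy
// custom.config -- hypothetical override of the pipeline's resource caps
params {
    max_cpus   = 16
    max_memory = '64.GB'
    max_time   = '72.h'
}
```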
Got it, but the config already included for this run:
That's the parameter you were referring to, right? It was already 72h (and did not run for 72 hours before failing). I see it's recorded in the nextflow.log too. So that shouldn't be it? I doubled it just in case, maybe it's the sum for all the processes? I spotted somewhere in the logs the error:
Probably totally unrelated, but just in case: could it be sending something to the background and not setting the time right? Something similar was happening with canu, and that's why I turned the grid off. Anyway, it's running again with errors not ignored, so maybe we can see if there's any more info when it fails again. I'm running the hybrid strategy too, so let's see
Interesting. Yes, this is the parameter I was referring to. And yes, I see it now in the log as well. I may have read too fast haha, sorry 😅 About the slurm config you mentioned: I think that if a conflict had happened, the job may not even have been launched. But in general, if you're not allowed to run for more than 48.h, then setting 72.h will probably not override that (I think). This tput message is more of a warning than an error; I don't think it has anything to do with it. Good idea to run both! I hope not ignoring the errors helps us get more info to debug. The way you're running now seems to already have everything set. Fingers crossed on this run 🤞🏼
So with this run I already have errors in unicycler and quast... regarding quast, I see again the
So maybe the max_time argument is not working?
Is it still running? As I explained for the assemblers, quast also tries to run with low resources first, and then retries with full power if the first try fails. But maybe I am telling it to start with too little; I can try to increase it, but I need to check whether this is really the issue. If the pipeline is still running, it will still retry the ones that failed, because their first try is with low resources.
I see. It seems that quast is not set to retry in the way I described. I will quickly change the config; I'll tell you when I commit.
That's great, thanks for such availability!
Anyhow, if you mention it should restart with full power and more time, then I guess it's going to be done on the second try. Waiting for the commit then, thanks!
Yes, all these parameters can be manually modified by the user. If you take my configs, overwrite the resource definitions (making sure to rewrite the withName and withLabel definitions) and supply your modified copy when launching, it will be used. That being said, the ones I set in the configs are just some sensible defaults so it can run in most cases and we can get the most out of parallelization 😄 But I agree, 1h is too low, and I changed that. And it was good that you encountered these errors, because in your logs I could see that, even though I wanted it to, quast was not being retried with more power: it was failing on its first fail. That may be the case ... I hope so 🤞🏼 I just committed. You can try it with this new config (with more resource allocation on the first try, and making sure that quast retries). Thank you for reporting, and for your patience in giving feedback and troubleshooting this. About your comment:
Yes. On the second try it should go full power. At least that's what I tried to set up, but maybe it was not working for quast :)
Super! Running again, will let you know. Thanks to you for being so available! Happy to help
One note. Remember to use
Oh, noted. Indeed, I was not doing that; I'll do it in the future. It's still running to finish the remaining things, but could you maybe already check? I don't understand the errors regarding canu and pilon (something to do with my HPC file system or mounting?). Hope you can provide any insights, thanks!
Hi @josruirod, Going step by step.
Which makes me think that either /proc was suddenly/quickly unmounted, or something like that. Then the directories and everything were not accessible anymore, causing them to fail. It would be good to:
Hi there, The problem seems to be that the default folder for docker/singularity is /tmp/ (guessing). The HPC uses another directory for tmp files, scratch ($TMPDIR or $LOCAL_SCRATCH), so I have to try and change that. The singularity environment variables should allow me to change the directory, right? Anyway, if you have any comments I would be grateful, but since the test run in the quickstart is working perfectly and you have already spent quite some time on this, we can close this issue related to quast, which is already working. Hopefully I'll get this to work with my data eventually. Thanks for the great support, a pleasure! Best.
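SINGULARITY_TMPDIR and SINGULARITY_CACHEDIR are the standard Singularity environment variables for relocating temporary and cache storage; a minimal sketch, with the scratch location below as a placeholder to adapt to your HPC:

```shell
# Point Singularity's temp and cache areas at scratch storage.
# The paths here are placeholders; use your cluster's scratch directory.
export SINGULARITY_TMPDIR="${TMPDIR:-/tmp}/singularity_tmp"
export SINGULARITY_CACHEDIR="${TMPDIR:-/tmp}/singularity_cache"
mkdir -p "$SINGULARITY_TMPDIR" "$SINGULARITY_CACHEDIR"
```

These exports would go in the job script (or shell profile) before launching nextflow, so the containers are unpacked on scratch instead of /tmp.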
Hi @josruirod, Great to hear that it is not a problem in the code itself and that you could properly execute everything with the quickstart dataset. So it seems the solution will be rather simple once your IT team manages to understand how to set these environments. Unfortunately, I cannot help you much with that, for two reasons: (1) HPC configurations tend to differ quite a bit between setups, and (2) I don't have much experience with it. But I really hope you manage to solve it and can use the pipeline with your dataset. I will keep the issue open until I merge the modifications we made, both for this issue and for issue #36. Once I merge, I'll close both. Many thanks for the feedback, for reporting the issues, and for the kind words. Best regards.
Hi @josruirod, Glad to see it is going, and hopefully it will succeed. Indeed, what you said makes sense, and this is very straightforward with nextflow. By default, it has a list of priorities, and anything you set in a custom config file overwrites the pipeline's defaults. That being said, you can see in this file (https://github.com/fmalmeida/MpGAP/blob/master/conf/base.config) how I am setting resource allocation. And even these resource selectors have priorities, as you can see here: https://www.nextflow.io/docs/latest/config.html#selector-priority So, in order to adjust this, you can either change the resource allocation for the label used by the process, or target the process by name. For example, you could:

```groovy
process {
    withName: strategy_2_pilon {
        cpus   = { params.max_cpus }
        memory = { params.max_memory }
        time   = { params.max_time }
    }
}
```

This should set this specific process to allocate everything you have set in the params. You can understand more about this in the nextflow manual: configuration explanation and all available directives for processes
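Launching with such an override then looks roughly like this; the config file name and the bracketed flags are placeholders, not fixed names:

```shell
# -c layers the custom config on top of the pipeline's defaults
nextflow run fmalmeida/MpGAP -profile singularity -c custom_resources.config [your options]
```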
Thank you so much! Nextflow is indeed awesome; I'm looking forward to learning more and maybe even preparing some silly pipelines of my own. So I'll try, and hopefully will get it to work. So close... it's frustrating to see the nice commands in the ".command.sh" files and the input files ready, and that it fails due to some issue with my HPC and singularity/nextflow. Last question: since the command is provided there, is it crazy to just execute that .sh "manually" (outside nextflow) to get the final results? I guess the pipeline -resume won't detect them, but maybe I could make it work this way?
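For context on the question above: in a Nextflow work directory, .command.sh holds the bare command while .command.run is the wrapper that handles input staging and (if used) the container. A manual re-run sketch, with the hash path as a placeholder for the real directory reported in the error message:

```shell
cd work/ab/cd1234ef...   # placeholder; the real path appears in the error message
bash .command.run        # re-executes the task with its staged inputs and environment
```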
Hi @josruirod, Even if you are able to generate the results manually, the pipeline will most probably not be able to see them. One question: how are things going? Did you manage to execute it? There are already tickets that I can close once I merge 😄
Hi, I'm sorry it took me so long to get back; I did not get the notification. I'm afraid I'm still trying to figure it out with our dataset, but it's definitely due to issues with the HPC and nextflow/singularity. The test data worked, and almost all steps are done. So I'd say you can close the tickets, and that's all for now. If I keep struggling and need anything from you, I'll let you know and open new issues. Thanks for the great support, and sorry for the delay!
So, I've consistently observed that the quast step fails, and even though the pipeline points to the work directory so the files can be checked, these appear to be empty.
This is the folder:
I attach the logs, and the files that were not 0-sized.
I'll let you know if it keeps failing during my tests, and whether the fix you provided in the config file allows bypassing the error and avoids crashing the pipeline. Thank you so much
nextflow.log.txt
output.log.txt
quast_files.zip