eggNOG_annotation takes very long #351
Hey @Sofie8, I put this question online as it might also help others. So yes, you can use a virtual disk in memory for the eggNOG-mapper step. Scratch should be fine, but the default /dev/shm is even better if it exists on your server. If you set the corresponding option, the eggNOG db gets copied to the virtual disk, which speeds up access.
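What "copying the db to the virtual disk" amounts to can be sketched in a few lines. This is a minimal illustration, not atlas's actual code: the function name `stage_db`, the `eggnog_db` subdirectory, and the size-based "already copied" check are all assumptions made for the example.

```python
import os
import shutil
import tempfile


def stage_db(db_files, virtual_disk="/dev/shm", fallback=None):
    """Copy database files into a RAM-backed directory when one is usable.

    Falls back to `fallback` (e.g. a scratch directory) or the system temp
    directory if `virtual_disk` does not exist or is not writable.
    Returns the directory that now holds the copies.
    """
    if os.path.isdir(virtual_disk) and os.access(virtual_disk, os.W_OK):
        root = virtual_disk
    elif fallback is not None:
        root = fallback
    else:
        root = tempfile.gettempdir()

    # Hypothetical subdirectory name; on a shared node you would want a
    # per-user or per-job name to avoid clashes.
    target = os.path.join(root, "eggnog_db")
    os.makedirs(target, exist_ok=True)

    for path in db_files:
        dest = os.path.join(target, os.path.basename(path))
        # Only copy when no copy of the same size exists yet (a cheap check,
        # not a checksum).
        if not os.path.exists(dest) or os.path.getsize(dest) != os.path.getsize(path):
            shutil.copy2(path, dest)
    return target
```

The returned directory can then be handed to eggNOG-mapper (for example via its `--data_dir` option). Keep in mind that files in /dev/shm occupy RAM, so the node needs enough memory to hold the database on top of what the job itself uses.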
Why does it use 20 GB of RAM? Because I hard-coded the memory, which is not good. I'll try to fix this.
Response to eggnogdb/eggnog-mapper#249 (comment). What you can do:
Hi Silas, Thanks for your fast reply!
eggNOG_annotation:
I don't think the mmseqs version has been officially released yet?
Hi Silas, It finally finished the annotate_hits_table step. I have several partial subsets now (next time I will specify smaller subsets from the beginning). Now I did:
Which gives me partial outputs like:
I will share the files with you via Drive, because they probably contain an error at the point where it tries to merge them: a header line? Or more items on one line? Thanks,
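Splitting the gene catalog into smaller subsets from the beginning, as mentioned above, can be sketched as follows. This is a hypothetical helper, not part of atlas (which has its own subset setting); the `subset{i}.faa` file names only mimic the subset naming seen in this thread.

```python
from pathlib import Path


def split_fasta(lines, max_seqs):
    """Group FASTA lines into chunks holding at most `max_seqs` records."""
    if max_seqs < 1:
        raise ValueError("max_seqs must be at least 1")
    chunks, current, n_records = [], [], 0
    for line in lines:
        if line.startswith(">"):
            # Start a new chunk once the current one is full.
            if n_records == max_seqs:
                chunks.append(current)
                current, n_records = [], 0
            n_records += 1
        current.append(line)
    if current:
        chunks.append(current)
    return chunks


def write_subsets(fasta_path, out_dir, max_seqs):
    """Write subset1.faa, subset2.faa, ... (hypothetical names) into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    lines = Path(fasta_path).read_text().splitlines()
    paths = []
    for i, chunk in enumerate(split_fasta(lines, max_seqs), start=1):
        path = out / f"subset{i}.faa"
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

Smaller subsets mean each annotation job finishes sooner, and a single broken subset is cheaper to rerun.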
In theory you can just cat them. The header gets added during the …

Unfortunately, in subset3 (and maybe in the others as well) there are some errors that cause the error in atlas. E.g., in line 23052 of subset3 you have an unfinished line combined with the next line. I don't know a way to make sure that you got all annotations. I fear you would need to rerun the annotation step for subset3. The others are fine. I'm sorry you had to take this detour and run eggNOG-mapper yourself. But if I understand correctly, everything could be done inside atlas, couldn't it? atlas allows you to define smaller subsets and also to use the ramdisk to speed up the process.
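The concatenation described above can be combined with a check for broken lines such as the one in subset3. A minimal sketch, assuming the partial annotation files are tab-separated tables whose header and comment lines start with `#` (check this against your own eggNOG-mapper output); the expected column count is taken as the most common one unless given. Like the manual inspection, this only catches malformed lines; it cannot prove that no annotation is missing.

```python
from collections import Counter


def merge_annotations(parts, n_columns=None):
    """Concatenate tab-separated annotation tables and flag suspicious lines.

    `parts` is a list of (name, text) pairs. Lines starting with '#' (headers
    and comments) are skipped. If `n_columns` is not given, the most common
    column count across all data lines is taken as the expected width.
    Returns (good_lines, bad) where bad lists (name, line_number, n_fields).
    """
    rows = []
    for name, text in parts:
        for lineno, line in enumerate(text.splitlines(), start=1):
            if not line or line.startswith("#"):
                continue
            rows.append((name, lineno, line.split("\t")))

    if n_columns is None:
        counts = Counter(len(fields) for _, _, fields in rows)
        n_columns = counts.most_common(1)[0][0] if counts else 0

    good, bad = [], []
    for name, lineno, fields in rows:
        if len(fields) == n_columns:
            good.append("\t".join(fields))
        else:
            # Too many fields usually means an unfinished line ran into the
            # next one; too few means a line was cut off.
            bad.append((name, lineno, len(fields)))
    return good, bad
```

Reading the files is left to the caller, e.g. `parts = [(p, Path(p).read_text()) for p in paths]`. The reported line numbers count every line of the original file, headers included, so they can be compared directly with positions like "line 23052 of subset3".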
OK, I deleted that one line for now, and I was able to merge the others.
Let me know when you have a new release that perhaps also fixes this memory usage. But you don't need to release a new version just for this, because I changed it in the rule for now. Or do I need to change it somewhere else, and if so, where? :-) Thanks,
I scanned the file and found other problematic lines, but if you managed to merge them, maybe it's OK. What could be optimized?
Yes, in the end I think it was 2 lines in my subset3 that threw the error; the others passed without error. What could be optimized? ==> the hard-coded 20 GB of memory; I changed it to read the memory from the config file.
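Reading the memory from the config instead of hard-coding it can be sketched as a small helper. The config key names (`mem`, `runtime`) and the function name are hypothetical and may not match atlas's real config; the defaults only reproduce the values that were previously fixed in the rule. In a Snakefile, the same idea would be written directly in the rule's `resources:` directive, e.g. `mem=config.get("mem", 20)`.

```python
# Hypothetical config keys; atlas's real key names may differ.
DEFAULT_MEM = 20   # the value that used to be hard-coded in the rule
DEFAULT_TIME = 5


def annotation_resources(config, mem_key="mem", time_key="runtime"):
    """Build the resources for the annotation rule from the config dict,
    falling back to the old hard-coded values only when a key is missing."""
    mem = int(config.get(mem_key, DEFAULT_MEM))
    time = int(config.get(time_key, DEFAULT_TIME))
    if mem <= 0 or time <= 0:
        raise ValueError("mem and time must be positive")
    return {"mem": mem, "time": time}
```

With `{"mem": 160}` in the config this yields `mem=160` rather than the fixed 20.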
@Sofie8 Thank you for your suggestions.
For your cluster problem, try this out https://snakemake.readthedocs.io/en/stable/project_info/faq.html#how-can-i-run-snakemake-on-a-cluster-where-its-main-process-is-not-allowed-to-run-on-the-head-node
So, when the pipeline is running other tools, I see something like this:
Whereas when running eggNOG-mapper, this is the command I see:
No sbatch anywhere. Also, it does not produce a Slurm output file like I see for the rest of the tools (I don't think this file is created only when the job is finished, right? It should be created as soon as the job starts).
@MalbertR I get your point. It seems eggNOG_homology_search is not submitted to the cluster. I don't know why. You don't have an old version of atlas, do you?
From the code, there is no reason why this rule should be executed locally while the others are not. eggNOG_homology_search
I will check my version and update to the latest (just in case). Then I'll re-run and get back to you.
Hi Silas, the issue seems to be solved now. At first I thought it was probably thanks to the updated version of the pipeline, but then I checked whether there were differences between how I was running the pipeline before and this last time, and realized that I wasn't adding the cluster profile before. So, that was probably the issue... Oops!
Below is how it runs according to the log. Do you know why mem is only 20, even though I specified 160 in the config?
[Sun Jan 3 13:44:22 2021]
rule eggNOG_annotation:
input: /ddn1/vol1/site_scratch/leuven/314/vsc31426/db/atlas/EggNOGV2/eggnog.db, /ddn1/vol1/site_scratch/leuven/314/vsc31426/db/atlas/EggNOGV2/eggnog_proteins.dmnd, /ddn1/vol1/site_scratch/leuven/314/vsc31426/db/atlas/EggNOGV2/checksum_checked, Genecatalog/subsets/genes/subset3.emapper.seed_orthologs
output: Genecatalog/subsets/genes/subset3.emapper.annotations
log: Genecatalog/subsets/genes/logs/subset3/eggNOG_annotate_hits_table.log
jobid: 1075
wildcards: folder=Genecatalog/subsets/genes, prefix=subset3
threads: 36
resources: mem=20, time=5
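To check which resources Snakemake actually assigned to each job, the `rule ...:` and `resources: ...` lines of a log like the one above can be parsed. This is a hypothetical diagnostic helper, assuming the log layout shown here (one `rule NAME:` line followed by an indented `resources: key=value, ...` line); other Snakemake versions may format these lines differently.

```python
import re

RULE_RE = re.compile(r"^rule (\S+):$")
RESOURCES_RE = re.compile(r"^resources: (.*)$")


def _to_value(text):
    """Convert numeric strings to int, leave anything else unchanged."""
    try:
        return int(text)
    except ValueError:
        return text


def resources_by_rule(log_text):
    """Map each rule name to the list of resource dicts logged for its jobs."""
    result = {}
    rule = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = RULE_RE.match(line)
        if match:
            rule = match.group(1)
            continue
        match = RESOURCES_RE.match(line)
        if match and rule is not None:
            resources = {}
            for item in match.group(1).split(", "):
                if "=" in item:
                    key, value = item.split("=", 1)
                    resources[key] = _to_value(value)
            result.setdefault(rule, []).append(resources)
    return result


def jobs_with_unexpected_mem(log_text, rule, expected_mem):
    """Return the logged resource dicts of `rule` whose mem differs from expected_mem."""
    return [r for r in resources_by_rule(log_text).get(rule, [])
            if r.get("mem") != expected_mem]
```

For the log above, `jobs_with_unexpected_mem(log, "eggNOG_annotation", 160)` reports the job with `mem=20`, which points to the value being set in the rule rather than taken from the config.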