Run snakemake on Cluster #23

Closed
orangelc1221 opened this issue Jul 24, 2022 · 3 comments

Comments

@orangelc1221

Hello,

I have been trying to run this pipeline on my university's cluster. In general, it works (sometimes): I was able to run it through and got the final output, all.vcf.gz, although I needed to rerun it at some point, likely due to time limits on the variant calling step.

However, there is one error that keeps popping up:

Traceback (most recent call last):
  File "/net/cephfs/data/XXX/grenepipe-master/profiles/slurm/slurm-submit.py", line 10, in <module>
    from snakemake.utils import read_job_properties
ModuleNotFoundError: No module named 'snakemake'

I searched online a bit and I found this similar issue: https://stackoverflow.com/questions/59493422/snakemake-cluster-script-importerror-snakemake-utils

As described there, this issue seems to come and go randomly. However, I was not able to solve it based on the answers, since I am not sure where I should change the #PATH. Luckily, in my successful trial, this error somehow disappeared at some point, so I was able to finish the run. But now I am setting it up to run with another dataset and am getting this error again. With this error, it seems that no jobs are submitted to the cluster. In the output directory, two folders, "logs" and "contig-groups", are generated, and both contain some empty files for the samples.

Any suggestions? Do you think this is something I should rather figure out with my university's cluster settings? I already had a meeting with our IT, but he couldn't tell me much since he didn't really know the pipeline settings... But he offered to set up the pipeline on his env if needed :)

Thank you very much!

Best,
lc

@orangelc1221
Author

Update: I tried changing the --conda-prefix setting, i.e., I created a new folder for the new project to store the environments, and the pipeline started to run without the "ModuleNotFoundError" shown above. Could this be the issue? Do we have to create the environments again for each project?

@lczech
Member

lczech commented Jul 28, 2022

Hey @orangelc1221,

this sounds to me like an issue with some of the cluster nodes being set up differently from others, or with conda not being set up properly. I had an issue a while ago on our cluster that would also come and go "randomly", but it turned out that it was always the same node that caused trouble, because it had a broken config script or something similar, which IT was then able to fix.
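To narrow down whether one particular node is the culprit, a small probe like the following might help. It is only a sketch and not part of grenepipe: the function name `probe_node` is made up for illustration, and how the script gets executed on each node (for example through the scheduler) depends on the cluster and is not shown here. It reports the host name, the Python interpreter in use, and whether that interpreter can find the `snakemake` module, which is what the traceback above complains about.

```python
import importlib.util
import socket
import sys


def probe_node(module: str = "snakemake") -> dict:
    """Report which host and interpreter we are on, and whether `module` can be found.

    Hypothetical helper for debugging; only top-level module names are supported.
    """
    return {
        "host": socket.gethostname(),
        "python": sys.executable,
        "module": module,
        # find_spec returns None when a top-level module is not installed
        # for the interpreter that is running this script.
        "importable": importlib.util.find_spec(module) is not None,
    }


if __name__ == "__main__":
    report = probe_node()
    for key, value in report.items():
        print(f"{key}: {value}")
```

If the output differs between nodes (a different `python` path, or `importable: False` on only some hosts), that would point towards a node-specific setup problem rather than the pipeline itself.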

For that particular reason, grenepipe prints all kinds of information to the log at the very start of a run, so that we can debug these kinds of issues. Can you check the output or log files of your runs, and see if the conda/python/snakemake versions are what you expect them to be?
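If the log files are not at hand, the same kind of information can be collected manually. The following is a rough sketch, not what grenepipe itself does: the helper names `tool_version` and `collect_versions` are invented here, and it assumes that `conda` and `snakemake` are meant to be found as executables on the `PATH` and that both accept `--version`.

```python
import shutil
import subprocess
import sys


def tool_version(command: str) -> str:
    """Return the output of `<command> --version`, or a note if that is not possible."""
    path = shutil.which(command)
    if path is None:
        return "not found on PATH"
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=60,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"failed to run: {exc}"
    # Some tools print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output or f"no output (exit code {result.returncode})"


def collect_versions() -> dict:
    """Gather interpreter and tool versions for comparison between runs or nodes."""
    return {
        "python": sys.version.split()[0],
        "python executable": sys.executable,
        "conda": tool_version("conda"),
        "snakemake": tool_version("snakemake"),
    }


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Running this once on the login node and once inside a cluster job should make it visible if the two see different installations.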

Also, are you using the grenepipe conda environment when starting snakemake, as described in the wiki here? Basically, what you want to do in order to run the pipeline is

conda activate grenepipe
snakemake ...

to make sure that the correct conda version is used.
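As a quick sanity check before starting a run, one could also verify which conda environment is active. This is a minimal sketch under the assumption that the environment was activated by name, so that conda has set the `CONDA_DEFAULT_ENV` variable to `grenepipe`; if an environment is activated by path instead, the variable holds that path and the comparison below would need to be adapted. The function names are hypothetical.

```python
import os

# Environment name as used in this thread; adjust if yours is named differently.
EXPECTED_ENV = "grenepipe"


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV")


def is_expected_env(expected: str = EXPECTED_ENV, environ=None) -> bool:
    """Check whether the active conda environment matches the expected name."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    if is_expected_env():
        print(f"OK: conda environment '{EXPECTED_ENV}' is active")
    else:
        print(f"Active conda environment is {active_conda_env()!r}, expected '{EXPECTED_ENV}'")
```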

Let me know if that helps. If it doesn't, I'd suggest talking with IT again ;-)

Cheers
Lucas

@lczech
Member

lczech commented Aug 19, 2022

Hey @orangelc1221,

it seems you solved the issue? Would you mind sharing how, so that users with the same issue in the future can learn from your solution as well?

Cheers and thanks
Lucas
