Run snakemake on Cluster #23

orangelc1221 · 2022-07-24T11:04:46Z

Hello,

I have been trying to run this pipeline on my university's cluster. In general it works (sometimes): I was able to get through it and obtain the final output, all.vcf.gz, although I had to rerun it at some point, probably because of time limits on the variant calling step.

However, there is one error that keeps popping up:

Traceback (most recent call last):
  File "/net/cephfs/data/XXX/grenepipe-master/profiles/slurm/slurm-submit.py", line 10, in <module>
    from snakemake.utils import read_job_properties
ModuleNotFoundError: No module named 'snakemake'

I searched online a bit and I found this similar issue: https://stackoverflow.com/questions/59493422/snakemake-cluster-script-importerror-snakemake-utils

As described there, this issue seems to come and go randomly. However, I was not able to solve it based on the answers, since I am not sure where I should change the $PATH. Luckily, in my successful trial the error disappeared at some point, so I was able to finish the run. But now that I am running the pipeline on another dataset, I am getting this error again. Whenever it occurs, it seems that no job is submitted to the cluster. In the output directory, two folders, "logs" and "contig-groups", are generated, both containing some empty files for the samples.
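One way to see which interpreter the submit script actually picks up, assuming slurm-submit.py is launched through a shebang such as #!/usr/bin/env python3 (check the first line of the script in your installation): such a shebang runs the first python3 found on $PATH. The minimal sketch below is a hypothetical helper, not part of grenepipe or snakemake. It lists every python3 on $PATH in lookup order. If the first one belongs to an environment without snakemake installed, the import on line 10 fails with the ModuleNotFoundError shown above.

```python
import os


def executables_on_path(name, path_value=None):
    """Return every executable file called `name`, in $PATH lookup order.

    The first entry is the one `#!/usr/bin/env <name>` would run.
    Duplicate and empty $PATH entries are skipped.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in found:
                found.append(candidate)
    return found


if __name__ == "__main__":
    for i, exe in enumerate(executables_on_path("python3")):
        marker = "  <- picked by /usr/bin/env" if i == 0 else ""
        print(exe + marker)
```

What matters is the $PATH the job itself sees, so this is most informative when run from inside a job on a compute node rather than only on the login node.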

Any suggestions? Do you think this is something I should rather figure out with my university's cluster support? I already had a meeting with our IT, but he couldn't tell much since he didn't know the pipeline settings... But he offered to set up the pipeline in his environment if needed :)

Thank you very much!

Best,
lc

orangelc1221 · 2022-07-25T14:28:18Z

Update: I tried changing the --conda-prefix setting, i.e., I created a new folder for the new project to store the environments, and the pipeline started to run without the ModuleNotFoundError shown above. Could this be the cause? Does that mean we have to create the environments again for each project?

lczech · 2022-07-28T03:41:13Z

Hey @orangelc1221,

this sounds to me like an issue with some of the cluster nodes being set up differently from others, or with conda not being set up properly. A while ago I had an issue on our cluster that would also come and go "randomly", but it turned out that it was always the same node causing the trouble, because it had a broken config script or something similar, which IT was then able to fix.
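To check whether failures concentrate on one node, a minimal sketch like the following can tally job outcomes per node. The (node, succeeded) input pairs are a hypothetical format you would assemble yourself, for example from Slurm's sacct with --format=JobID,State,NodeList; grenepipe does not produce this list.

```python
from collections import Counter


def failures_per_node(job_records):
    """Count failed jobs per compute node.

    `job_records` is an iterable of (node_name, succeeded) pairs. This input
    format is hypothetical: assemble it yourself, e.g. from sacct output.
    """
    return Counter(node for node, succeeded in job_records if not succeeded)


def suspicious_nodes(job_records):
    """Return, sorted, the nodes on which every recorded job failed."""
    records = list(job_records)  # allow generators, since we iterate twice
    totals = Counter(node for node, _ in records)
    failed = failures_per_node(records)
    return sorted(node for node, total in totals.items() if failed[node] == total)


if __name__ == "__main__":
    # Made-up example records, purely for illustration.
    records = [
        ("node01", True), ("node02", False), ("node01", True),
        ("node02", False), ("node03", False), ("node03", True),
    ]
    print(suspicious_nodes(records))  # ['node02']
```

A node that fails every time while others succeed points to a per-node setup problem, which is exactly the kind of thing IT can fix.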

For exactly that reason, grenepipe prints all kinds of information to its log at the very start of a run, so that we can debug these kinds of issues. Can you check the output or log files of your runs and see whether the conda/python/snakemake versions are what you expect them to be?
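grenepipe already logs similar information at startup. If you want to compare nodes independently of the pipeline, a hypothetical stand-alone script along these lines can be run on the login node and inside a job on a compute node, and the two outputs compared. It assumes Python 3.8+ (for importlib.metadata) and relies on the CONDA_DEFAULT_ENV and CONDA_PREFIX variables that conda activate sets.

```python
import importlib.metadata
import os
import platform
import socket
import sys


def environment_report():
    """Collect the details worth comparing between login and compute nodes."""
    try:
        snakemake_version = importlib.metadata.version("snakemake")
    except importlib.metadata.PackageNotFoundError:
        snakemake_version = None  # snakemake not installed for this interpreter
    return {
        "host": socket.gethostname(),
        "python_executable": sys.executable,
        "python_version": platform.python_version(),
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        "snakemake_version": snakemake_version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

A snakemake_version of None on a compute node, together with a python_executable outside the expected environment, would match the error above.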

Also, are you using the grenepipe conda environment when starting snakemake, as described in the wiki? Basically, to run the pipeline you want to do

conda activate grenepipe
snakemake ...

to make sure that the correct conda environment, and with it the snakemake installed in that environment, is used.
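To double-check that the activation worked before launching, a small hypothetical guard like this one verifies that the snakemake found on PATH lives inside the active conda environment. The environment name grenepipe is taken from the commands above; the check is an assumption-based sketch relying on the CONDA_DEFAULT_ENV and CONDA_PREFIX variables that conda activate sets, not something grenepipe ships.

```python
import os
import shutil


def snakemake_from_active_env(env_name="grenepipe"):
    """Return (ok, message): does the snakemake on PATH come from `env_name`?"""
    active = os.environ.get("CONDA_DEFAULT_ENV")
    prefix = os.environ.get("CONDA_PREFIX")
    if active != env_name or not prefix:
        return False, f"active conda env is {active!r}, expected {env_name!r}"
    exe = shutil.which("snakemake")
    if exe is None:
        return False, "no snakemake executable found on PATH"
    # Resolve directory symlinks on both sides so they compare consistently.
    exe_dir = os.path.realpath(os.path.dirname(exe))
    real_prefix = os.path.realpath(prefix)
    if os.path.commonpath([exe_dir, real_prefix]) != real_prefix:
        return False, f"snakemake at {exe} is outside {prefix}"
    return True, f"snakemake at {exe} comes from {prefix}"


if __name__ == "__main__":
    ok, message = snakemake_from_active_env()
    print(("OK: " if ok else "PROBLEM: ") + message)
```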

Let me know if that helps. If it doesn't, I'd suggest talking to IT again ;-)

Cheers
Lucas

lczech · 2022-08-19T16:32:38Z

Hey @orangelc1221,

it seems you solved the issue? Would you mind sharing how, so that users with the same issue in the future can learn from your solution as well?

Cheers and thanks
Lucas

orangelc1221 closed this as completed Aug 19, 2022
