fix: improved error handling for cluster status scripts and smarter job selector choice in case of cluster submission (use greedy for single jobs). #1142
Please format your code with black.
…e/snakemake into cluster-statusscript-robustness
@vsoch do you have any idea why the GLS test suddenly fails here? I did not expect my changes to have any side effects on GLS.
It looks like the last step in the GLS test is now using this greedy selector, so that's the change.
This is the step in question. IIRC, it was added to make sure that we can handle directory creation.
I think it probably has to do with the fact that this new function tries to calculate a size for the input directory, which won't have one (it possibly hasn't been created yet): snakemake/snakemake/scheduler.py, line 897 in 3530f32.
This ultimately gets us here: line 394 in 3530f32,
which then calls the size_local function on it: snakemake/snakemake/remote/GS.py, line 220 in 3530f32.
So perhaps it's ultimately this call that is wrong and isn't aware of directories? Line 574 in cd5c58d.
If I remember correctly, we had to add a bunch of new logic to handle directories, and I think it would need to be added here somewhere so that we don't stat an input that doesn't exist yet. That's my best guess without interacting with it directly. I would add directory handling, meaning some check using the code at line 372 in 3530f32.
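To make the suggested directory handling concrete, here is a minimal sketch for local paths only (the GS remote path would need the equivalent check against the bucket). The helper names `input_size` and `job_input_size` are hypothetical and not part of Snakemake's API; the idea is simply to skip inputs that do not exist yet instead of stat-ing them, and to sum the files inside a directory rather than relying on the directory entry's own size.

```python
import os
from typing import Iterable, Optional


def input_size(path: str) -> Optional[int]:
    """Hypothetical helper: size of a local input in bytes, or None if it
    does not exist yet (e.g. a directory() output an upstream job will create)."""
    if not os.path.exists(path):
        # Not created yet: don't stat it, report "unknown" to the caller.
        return None
    if os.path.isdir(path):
        # A directory's own st_size is not meaningful here; sum the regular
        # files beneath it instead (symlinks are skipped to avoid double counting).
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                fp = os.path.join(root, name)
                if not os.path.islink(fp):
                    total += os.path.getsize(fp)
        return total
    return os.path.getsize(path)


def job_input_size(paths: Iterable[str]) -> int:
    """Hypothetical helper: total input size of a job, counting inputs that
    do not exist yet as 0 so the scheduler can still rank the job."""
    return sum(s for s in (input_size(p) for p in paths) if s is not None)
```

Treating a missing input as "unknown" (and then as 0 when summing) keeps the size-based ranking usable without raising on inputs that a pending job has not produced yet.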
…f this is the reason for the failure in GLS).
Kudos, SonarCloud Quality Gate passed!
Thanks a lot @vsoch for figuring out the issue!
I'm not sure whether it's caused by these changes, but I've noticed recently that Snakemake's status checking for cluster jobs is very delayed. For example, in the LSF profile, if we can't get the job status using … I hope this makes sense? Basically, I'm asking whether there is some kind of new priority between submitting and status checking.
Sort of. It looks at the last 100 lsb.events files, which means potentially opening 100 files. This is much slower than our current solution of looking in the job's cluster log file. Anyway, that's not really the point of my question. I just wanted to know whether the submission/status priority has changed.
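For the status-checking side, Snakemake's `--cluster-status` contract is that the script receives the job ID as its argument and prints `success`, `failed`, or `running`. The sketch below shows one way a caller could harden that call against a misbehaving script; `query_status`, the retry count, and the choice to fall back to `running` when the script keeps erroring are assumptions for illustration, not what this PR implements.

```python
import subprocess
import time
from typing import List

VALID_STATES = {"success", "failed", "running"}


def query_status(status_cmd: List[str], job_id: str,
                 retries: int = 3, delay: float = 1.0) -> str:
    """Hypothetical helper: run a cluster status script (argv prefix plus the
    job ID) and return its reported state, retrying on errors."""
    err = "no attempts made"
    for attempt in range(retries):
        try:
            proc = subprocess.run([*status_cmd, job_id], capture_output=True,
                                  text=True, timeout=60)
        except (OSError, subprocess.TimeoutExpired) as e:
            # Script missing, not executable, or hanging.
            err = str(e)
        else:
            state = proc.stdout.strip()
            if proc.returncode == 0 and state in VALID_STATES:
                return state
            # Non-zero exit or unexpected output: remember why and retry.
            err = proc.stderr.strip() or f"unexpected output {state!r}"
        if attempt < retries - 1:
            time.sleep(delay)
    # Assumed policy: treat a persistently failing script as "still running"
    # so the job is simply checked again in the next polling round.
    print(f"status script failed for job {job_id}: {err}")
    return "running"
```

Falling back to `running` keeps a transient scheduler hiccup from failing the whole workflow; a real implementation would likely also cap how many polling rounds a job may spend in that state.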
Description
See title. Could help with issue #759.
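Regarding the selector change in the title: when only a single job is ready, there is nothing for an ILP to optimize, so a greedy pass should give the same answer with less overhead. A minimal sketch of that dispatch, assuming a simplified job model (`Job`, `select_greedy`, and `select_jobs` are hypothetical names, not Snakemake's scheduler code):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Job:
    # Hypothetical stand-in for a ready job: name, priority, resource needs.
    name: str
    priority: int = 0
    resources: Dict[str, int] = field(default_factory=dict)


def select_greedy(jobs: List[Job], available: Dict[str, int]) -> List[Job]:
    """Pick jobs by descending priority while they fit into the remaining
    resources. Resources not listed in `available` are unconstrained."""
    remaining = dict(available)
    selected = []
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        needs = {r: v for r, v in job.resources.items() if r in remaining}
        if all(v <= remaining[r] for r, v in needs.items()):
            for r, v in needs.items():
                remaining[r] -= v
            selected.append(job)
    return selected


IlpSolver = Callable[[List[Job], Dict[str, int]], List[Job]]


def select_jobs(jobs: List[Job], available: Dict[str, int],
                ilp_solver: Optional[IlpSolver] = None) -> List[Job]:
    # With zero or one candidate there is nothing to optimize: skip the ILP.
    if len(jobs) <= 1 or ilp_solver is None:
        return select_greedy(jobs, available)
    return ilp_solver(jobs, available)
```

Here `ilp_solver` stands in for whatever optimizing selector is configured; the greedy path is also used when none is given.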
QC
The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change modifies neither the language nor the behavior or functionalities of Snakemake).