snakemake waiting for files much longer than latency-wait (slurm, NFS) #2739
Comments
Please list the plugins and the plugin versions you are using.
We were finally able to resolve the delay. It happened because --max-status-checks-per-second was set to 0.02, so it took ~5 hours just to check the status of the swarm of ~400 jobs. I still do not understand why the filesystem timestamp differs from both the slurm-reported job end time and snakemake's own job end time, but that is not so important (at least for me). No plugins were used, only snakemake's native slurm support. snakemake v7.30.1
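The ~5 hour figure can be checked with a rough calculation. The sketch below assumes one status check per job and that the rate limit is the only bottleneck; the function name `status_check_duration` is hypothetical and not part of snakemake.

```python
def status_check_duration(n_jobs: int, checks_per_second: float) -> float:
    """Seconds needed to check every job once at the given rate limit.

    Assumption: one status check per job, nothing else slows the loop.
    """
    if checks_per_second <= 0:
        raise ValueError("checks_per_second must be positive")
    return n_jobs / checks_per_second


# Values from the comment above: ~400 jobs, --max-status-checks-per-second 0.02
seconds = status_check_duration(400, 0.02)
print(f"{seconds:.0f} s, about {seconds / 3600:.1f} h for one pass over all jobs")
```

At 0.02 checks per second each check takes 50 seconds, so a single pass over 400 jobs needs roughly 20,000 seconds, which is consistent with the delay observed.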
Ah, that's a good one: I will make a note for the SLURM executor plugin documentation. Thank you.
Possibly related to: #2496 vvvv
@conchoecia no: overwriting job-name will cause the job state testing to fail. What is needed is a mechanism to prevent this overwrite by users.
When running a large swarm of jobs on an NFS cluster, I sometimes see snakemake waiting for files far longer than the time set by the latency-wait option.
This is likely related to the NFS problem first mentioned in issue #39, with the difference that I do not get an IncompleteFilesException. After a few hours snakemake 'realizes' that the job is finished and execution continues.
For example, for one of the jobs sacct reports an end time of 07:18:42, ls reports a last-modified time of 07:42 for the output files, and the snakemake log reports an end time of 12:22:34 (same day).
Could it be related to this place in the code, https://github.com/snakemake/snakemake/blob/977951ea541bceb97b6a77709fde863f6c638352/snakemake/io.py#L895C7-L895C8 ? As far as I can see, snakemake just waits until the file appears to exist in _IOCache, regardless of latency-wait.
--latency-wait is set to 30 seconds, so the expected behavior in such a case would be a crash. Is it intentional that this does not happen?
And is there a known workaround to avoid this problem without running ls in a separate window or modifying snakemake's code (as mentioned in #39)?
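To make the expected behavior concrete, here is a minimal sketch of a bounded wait: poll for the output file and fail once the latency-wait window has passed. This is not snakemake's actual implementation; `wait_for_file` and `poll_interval` are hypothetical names used only for illustration.

```python
import os
import time


def wait_for_file(path: str, latency_wait: float = 30.0, poll_interval: float = 1.0) -> float:
    """Wait until `path` exists, but never longer than `latency_wait` seconds.

    Returns the number of seconds waited, or raises TimeoutError.
    Hypothetical helper, not taken from snakemake's code base.
    """
    start = time.monotonic()
    deadline = start + latency_wait
    while True:
        if os.path.exists(path):
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{path} did not appear within {latency_wait} s"
            )
        time.sleep(poll_interval)
```

With a loop like this, a missing output file would produce an error after about 30 seconds instead of a wait of several hours.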