Replies: 3 comments
-
Have enlarged the targets to make them fewer and slower. Will see if that works. |
Beta Was this translation helpful? Give feedback.
-
Sounds like a |
Beta Was this translation helpful? Give feedback.
-
wasn't reproducible so was likely hitting an IO resource limit. The only thing targets could consider doing is a few retries upon encountering saving errors? |
Beta Was this translation helpful? Give feedback.
-
Help
Description
I tried running a 600-worker crew on a slurm cluster and got this at the end:
I am using:
and
seconds_reporter=15, seconds_meta_append=15
.Any ideas for troubleshooting? I got the error quite quickly (~10 mins in) first try running, then added the
seconds_reporter=15, seconds_meta_append=15
, it ran fine for about 35 minutes and I interrupted, then it happened again the third time about 35 minutes in.Beta Was this translation helpful? Give feedback.
All reactions