Make ReturnnSearchJobV2 resumable? #329

Open · albertz opened this issue Oct 22, 2022 · 3 comments
albertz (Member) commented Oct 22, 2022

Basically, add resume="run" to the Task, but then also make sure the search output file is deleted at the beginning of the run, because otherwise RETURNN fails with an exception.
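
A rough sketch of what that could look like in the job (the attribute names, output file name, and the elided RETURNN call are placeholders for illustration, not necessarily the real ReturnnSearchJobV2 internals):

```python
import os

from sisyphus import Job, Task


class ReturnnSearchJobV2(Job):
    """Sketch; only the parts relevant to this issue are shown."""

    def __init__(self, returnn_config, **kwargs):
        # placeholder requirements and output; the real job defines more
        self.rqmt = {"cpu": 1, "mem": 8, "time": 4, "gpu": 1}
        self.out_search_file = self.output_path("search_out.py")

    def tasks(self):
        # resume="run" tells Sisyphus to restart the same step after a crash
        yield Task("run", resume="run", rqmt=self.rqmt)

    def run(self):
        # Remove a partially written search output from a previous attempt;
        # otherwise RETURNN fails with an exception because the file already exists.
        out = self.out_search_file.get_path()
        if os.path.exists(out):
            os.remove(out)
        # ... then launch the actual RETURNN search as before ...
```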

As I understand it, the Sisyphus resume logic would automatically increase the resource requirements when the job crashed due to a timeout or out-of-memory error. This could be helpful, or maybe not?
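
To illustrate what is meant by increasing the requirements, roughly something like this (a standalone sketch with assumed failure reasons and scaling factors, not the actual Sisyphus code):

```python
# Illustrative only: scale up the requested resources when a retry follows
# an out-of-memory or timeout failure.
def escalate_rqmt(rqmt: dict, failure_reason: str) -> dict:
    new_rqmt = dict(rqmt)
    if failure_reason == "out_of_memory":
        new_rqmt["mem"] = rqmt.get("mem", 4) * 2    # e.g. 4 GB -> 8 GB
    elif failure_reason == "timeout":
        new_rqmt["time"] = rqmt.get("time", 4) * 2  # e.g. 4 h -> 8 h
    return new_rqmt
```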

JackTemaki (Contributor) commented

I never considered this for the search task: whenever there was a timeout or out-of-memory issue for me, it was always a more substantial problem that required changing the job parameters anyway. I am also not a big fan of applying the resume logic to jobs that are not actually resumable (the search always starts again from the beginning), as relying on it leads to wasted computation in the long term.

albertz (Member, Author) commented Oct 25, 2022

But I don't quite understand the argument. If you configured too little memory for some job, why is it okay if the job is resumable, but not okay if it has to start again from scratch?

JackTemaki (Contributor) commented

> If you configured too little memory for some job, why is it okay if the job is resumable

For me this is not "okay" in either case. If I notice that there are memory or time issues, I fix the setup right away and do not rely on this mechanism to keep things running. Training jobs should be resumable because they can actually continue from a checkpoint (and independent of any resource issues, there can be other reasons for a process to be killed).

For me, this automatic adjustment of resources is a relic of the GMM pipeline, which distributed the work over e.g. 100 parallel jobs, where you want the 2-3 of them that accidentally need a lot more resources (which can happen depending on the assigned segments) to restart with new requirements. But even there I would argue it is better to fix your search settings (e.g. max pruning) than to rely on that. I do not see the benefit for RETURNN recognition jobs...
