For large batches, a cancellation timeout is very painful: preparing the cancellation (a topological sort of all jobs) takes a long time, and a timeout forces us to redo all of that work when we retry. Better to put some simple retries around the actual `bulk_cancel` call instead.
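The idea above can be sketched as a small retry wrapper that retries only the cheap `bulk_cancel` call, never the expensive preparation step. The `attempts`, `base_delay`, and `batch` names here are hypothetical, not from the actual codebase:

```python
import time


def bulk_cancel_with_retries(batch, attempts=3, base_delay=0.5):
    """Retry only the bulk_cancel call itself.

    The expensive preparation (e.g. the topological sort of all
    jobs) happens once, before this is called; a transient failure
    of bulk_cancel no longer throws that work away.
    """
    for attempt in range(attempts):
        try:
            return batch.bulk_cancel()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            # Simple exponential backoff between attempts.
            time.sleep(base_delay * (2 ** attempt))
```

The key design point is where the retry boundary sits: wrapping only the final call keeps retries cheap, whereas a whole-operation timeout forces the sort to be recomputed on every attempt.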
It never belonged there.
We want it to match the return values a caller would have received before switching to this new API.
Previously, each time we enumerated a JobBatchList, we started at 1 and counted up to the current counter value. This has gotten slower over time as the counter has grown. When we enumerate the list and find the first job batch at number n, we know that there will never again be a job batch at numbers 1 through (n - 1), so we can safely skip those checks on future enumerations. This is a safe assumption because:

* New job batches get their number by incrementing the counter, so no new job batch can ever be created with a number lower than one an existing job batch already has.
* Once a job batch is no longer in redis (whether through expiration or a manual delete), it will never come back.
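A minimal sketch of the skip-ahead enumeration, with a plain dict standing in for redis; the class shape and names (`store`, `first_live`, `add`, `enumerate`) are illustrative assumptions, not the real implementation:

```python
class JobBatchList:
    """Sketch: enumerate job batches, remembering a lower bound.

    Batch numbers only ever grow (new batches increment the counter),
    and a batch that has expired or been deleted never returns, so the
    first live number we find is a safe starting point for all future
    enumerations.
    """

    def __init__(self):
        self.store = {}       # batch number -> batch; stands in for redis
        self.counter = 0      # incremented for each new batch
        self.first_live = 1   # lowest number still worth checking

    def add(self, batch):
        self.counter += 1
        self.store[self.counter] = batch
        return self.counter

    def enumerate(self):
        found_first = False
        for n in range(self.first_live, self.counter + 1):
            batch = self.store.get(n)
            if batch is None:
                continue  # expired or deleted; gone for good
            if not found_first:
                # Numbers below n can never hold a batch again,
                # so skip them on every later enumeration.
                self.first_live = n
                found_first = True
            yield n, batch
```

After the first enumeration finds its earliest live batch, later enumerations start there instead of at 1, so the cost tracks the number of live batches rather than the all-time counter value.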