Use Rails executor if code reloading is disabled #3221
Conversation
This is getting laughably complex. Can something simpler work?
I considered adding the middleware in this case instead, but it felt weird to use a different mechanism to release the connection in development vs. production on the same version of Rails. If we reenqueued the job when the reloader raises, we could go back to enabling it in all environments. I think this is achievable, but it would involve more extensive changes than this patch (and I was discouraged from going in this direction by this warning!)
Always invoke the reloader. #3212 was the right response for [the Rails 5 part of] #3154 (and incidentally seems suggestive of the problem on Rails 4 too: there are apparently places an exception can occur that will silently kill a thread, even when it has a job in flight). The reloader can definitely raise -- with eager load disabled, it'll load the whole application, and thus raise for any syntax errors, etc., for example. I didn't realise we'd put it outside the error handling mechanism; we should instead do Not That.
(It apparently didn't occur to anyone in #3154 😢, but if the reloader is in fact eating work / raising on things that aren't genuine application-level problems, I'd like to know about it, so we can fix it -- I'm not just suggesting "retry until it goes through", but until we know what was being raised, there isn't much to do.)
@matthewd I think the name Reloader is a very poor name for what it is and the source of my confusion. I figured that if we don't want to reload code, we can just not use it. Now we learn that it has other responsibilities: releasing connections, running callbacks, etc. The reloader is not just a reloader, it does a lot more and needs to always be running in Rails 5. I would suggest it be renamed. |
Force-pushed from f9bc299 to 31f87a3:
We have to release the current thread's Active Record connection after performing each job, in case another thread is waiting to use it.

On Rails 4 and earlier, this is handled with middleware. On Rails 5 in development mode, the reloader does it by delegating to the executor. However, on Rails 5 in production mode, we're not adding the middleware or enabling the reloader, so connections will never be released.

We can call the executor directly to have it release the connection for us in this case. By calling it inside the middleware stack, the job will be retried if the executor raises, avoiding the problem with lost jobs that led to the reloader being disabled in production.
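A minimal sketch in plain Ruby of why calling the executor *inside* the middleware stack matters. The names here (`RetryMiddleware`, `FakeExecutor`) are hypothetical stand-ins for Sidekiq's retry middleware and Rails' executor, not the real classes: the point is that an exception raised during the executor's block now propagates through the retry middleware, so the job is re-enqueued instead of lost.

```ruby
# Hypothetical stand-in for Sidekiq's retry middleware: re-enqueue on error.
class RetryMiddleware
  def initialize(retries)
    @retries = retries
  end

  def call(job)
    yield
  rescue => ex
    @retries << job   # record the job for retry instead of losing it
    raise ex
  end
end

# Hypothetical stand-in for Rails' executor: run the block, then clean up
# (in Rails, the cleanup releases the thread's Active Record connection).
class FakeExecutor
  def wrap
    yield
  ensure
    # connection release would happen here
  end
end

retries = []
chain = RetryMiddleware.new(retries)
executor = FakeExecutor.new

begin
  chain.call("job-1") do
    executor.wrap { raise "boom during job execution" }
  end
rescue RuntimeError
  # the error still surfaces, but the retry middleware saw it first
end

retries  # the failed job was captured for retry
```

Because the executor runs inside `RetryMiddleware#call`, its exception is observed by the retry logic before propagating, which is exactly the lost-job failure mode this change avoids.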
Force-pushed from 31f87a3 to 3b5a469.
The problem with #3212 is that it doesn't call the Reloader within the RetryJobs middleware, so if it raises an exception, it'll be logged but the job will be lost. This PR fixes that, but at the expense of tracking and invoking the reloader at two different places based on the current environment. I'm wondering if the Reloader should be integrated as a new middleware which runs first in the chain. Notice that the klass and worker variables here aren't used by the Processor but are actually made available for the middleware. We shouldn't constantize the job_class outside of the reloader block, but we shouldn't run the reloader block outside of the middleware block -- we have a Catch-22. One solution is to refactor the retry behavior so it is no longer a middleware. Something like this pseudocode might work:

```ruby
worker = nil
begin
  reload do
    worker = job.constantize.new
    execute_middleware(worker, ...) do
      execute_job
    end
  end
rescue => ex
  retry_failures(job_hash, worker, ex)
end
```
That seems like the best long term approach alright. Would you consider it a breaking change? The retry middleware is mostly an internal concern, but there's some documentation on how to configure the global max retry attempts that will no longer work, and middleware that is currently being inserted before/after RetryJobs will end up in a different position if it's no longer in the default middleware stack.
You are correct that there's potential for breakage here. I don't know that this warrants Sidekiq 5, but I also have other breaking changes I want to get into Sidekiq 5, which I was planning on gathering together over the next six months. I don't want to wait that long to fix this issue, so I'm leaning towards 4.3.0.
The only extra thing the reloader does is invoke the executor... so doing something along the lines of this PR would also be a viable option (though it doesn't need to be conditional; the executor correctly handles nested invocations as no-ops). "Always invoke the reloader" is a simplification, in that if you're using the reloader, you can/should just always call it, thereby saving you needing to deal with the executor as well. (And because that way the reloader can disable itself, under appropriate circumstances, without you needing to reproduce its conditional checks.) Most significantly for the original issue, though (and as suggested by the above claim that the reloader skips itself in production): the reported behaviour is coming from the executor... and while it wouldn't be as catastrophic, you don't really want that in development either. |
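The "nested invocations are no-ops" property mentioned above can be illustrated with a plain-Ruby sketch. `NestableExecutor` is a hypothetical toy, not Rails' actual implementation: it shows the shape of the behaviour, where only the outermost `wrap` on a given thread runs the teardown (which, in Rails, is where connections get released).

```ruby
# Hypothetical sketch of an executor whose nested invocations are no-ops:
# only the outermost wrap per thread triggers setup/teardown.
class NestableExecutor
  def initialize
    @active = {}        # threads currently inside a wrap
    @completions = 0    # how many times teardown actually ran
  end

  attr_reader :completions

  def wrap
    if @active[Thread.current]
      yield                          # nested call: plain pass-through
    else
      @active[Thread.current] = true
      begin
        yield
      ensure
        @active.delete(Thread.current)
        @completions += 1            # teardown runs exactly once
      end
    end
  end
end

ex = NestableExecutor.new
ex.wrap { ex.wrap { ex.wrap { :work } } }  # three nested wraps...
ex.completions                             # ...but teardown ran only once
```

This is why calling the executor unconditionally (even when the reloader has already invoked it) is safe: the inner invocation simply yields.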
IOW, yes, the reloader should be wrapped by the error/retry handler, and almost certainly logging too. Whether that means pulling more things out of middleware a la #3235, or pushing the reloader down the stack as a middleware, is an architectural call for you. We obviously implement it as a (Rack) middleware, but as we don't control the top of the stack, we don't have much choice. |
@matthewd Thanks for the note. I can't implement the Reloader as a middleware since the middleware api is |
@mperham what do you think of making a final release in the 4.2.x series that uses this approach, and then removing it in 4.3.0? While #3235 is a more thorough fix, it's also a much bigger change. Sidekiq 4.1.4 was the last version that worked properly with Rails 5 - 4.2.0 to 4.2.2 could lose jobs, and 4.2.3 to 4.2.5 don't release connections. If someone with a Rails 5 app is waiting for a version without those problems, upgrading from 4.1 to 4.3 will involve dealing with the Sinatra removal, the reloader replacing the previous middleware approach, and the overhauled retry mechanism at the same time. That's a lot of stuff changing at once... |
I think that's a good idea. |
master should be finished for 4.2.6. Care to verify it works for you? |
Looks good. 👍 I ran a simple test to verify that Active Record connections are released properly: on a new Rails 5 app with a connection pool size of 5 and Sidekiq concurrency set to 25, I enqueued 100 jobs. On 4.2.5, the 20 threads that couldn't get a connection raised. The main application I work on that uses Sidekiq is still on Rails 4.2, so I don't have a good way of verifying this behaviour under a production workload, if that's what you meant.
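The pool-exhaustion scenario described above can be simulated in plain Ruby, using a `SizedQueue` as a hypothetical stand-in for the Active Record connection pool (this is an illustration of the failure mode, not the actual Rails pool): 25 worker threads share 5 "connections", and as long as each worker releases its connection after the job (as the executor does), none of them starve.

```ruby
require "timeout"

# Hypothetical stand-in for a connection pool of size 5.
POOL_SIZE = 5
pool = SizedQueue.new(POOL_SIZE)
POOL_SIZE.times { |i| pool << i }

timeouts = Queue.new
threads = 25.times.map do
  Thread.new do
    begin
      conn = Timeout.timeout(1) { pool.pop }  # check a "connection" out
      begin
        sleep 0.01                            # perform the "job"
      ensure
        pool << conn                          # release it, as the executor would
      end
    rescue Timeout::Error
      timeouts << true                        # starved: a connection was never released
    end
  end
end
threads.each(&:join)

timeouts.size  # no thread starves when connections are released
```

If the `ensure`/release step is deleted, only the first 5 threads ever get a connection and the other 20 time out, which mirrors the 4.2.3-4.2.5 behaviour described above.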
4.2.6 is released with this change. I'll merge my redesign for 4.3.0 later this week unless plans change. |
* Rework job processing in light of Rails 5's Reloader, see #3221
* Ignore built gems
* Documentation, testing
* Add fallback for 'retry' value server-side, fixes #3234
* Fix job hash reporting in stats
* cleanup
* Add job for AJ testing
* Push jobs with invalid JSON immediately to Dead set, fixes #3296
* Break retry logic into global and local parts, fixes #3306
* fix heisentest