
Use Rails executor if code reloading is disabled #3221

Merged
merged 1 commit into sidekiq:master on Nov 12, 2016

Conversation

eugeneius
Contributor

We have to release the current thread's Active Record connection after performing each job, in case another thread is waiting to use it.

On Rails 4 and earlier, this is handled with middleware. On Rails 5 in development mode, the reloader does it by delegating to the executor. However on Rails 5 in production mode, we're not adding the middleware or enabling the reloader, so connections will never be released.

We can call the executor directly to have it release the connection for us in this case. By calling it inside the middleware stack, the job will be retried if the executor raises, avoiding the problem with lost jobs that led to the reloader being disabled in production.
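
For reference, a minimal sketch of the idea (not the exact patch): the middleware class is hypothetical, while Rails.application.executor.wrap and the Sidekiq server middleware API are real.

# Hypothetical middleware illustrating the approach: wrap each job in
# Rails' executor so its complete callbacks run afterwards, releasing
# the thread's Active Record connection back to the pool.
class ExecutorMiddleware
  def call(worker, job, queue)
    Rails.application.executor.wrap { yield }
  end
end

# Only needed when code reloading is disabled (e.g. production), where
# Sidekiq is not already wrapping jobs in the reloader:
if Rails.application.config.cache_classes
  Sidekiq.configure_server do |config|
    config.server_middleware do |chain|
      chain.add ExecutorMiddleware
    end
  end
end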

@mperham
Collaborator

mperham commented Nov 6, 2016

This is getting laughably complex. Can something simpler work?

@eugeneius
Contributor Author

I considered adding the middleware in this case instead, but it felt weird to use a different mechanism to release the connection in development vs. production on the same version of Rails.

If we re-enqueued the job when the reloader raises, we could go back to enabling it in all environments. I think this is achievable, but it would involve more extensive changes than this patch (and I was discouraged from going in this direction by this warning!)

@eugeneius
Contributor Author

@matthewd / @sj26: do you have any feedback here, or ideas for a less convoluted approach?

@matthewd

matthewd commented Nov 6, 2016

Always invoke the reloader. #3212 was the right response for [the Rails 5 part of] #3154 (and incidentally seems suggestive of the problem on Rails 4 too: there are apparently places where an exception can occur that will silently kill a thread, even when it has a job in flight).

The reloader can definitely raise -- with eager load disabled it'll load the whole application, and thus raise for any syntax errors, for example. I didn't realise we'd put it outside the error handling mechanism; we should instead do Not That.
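
For context, the development/production split being described here follows from the standard Rails 5 environment defaults (the generated defaults are shown; individual apps may differ):

# config/environments/development.rb
config.cache_classes = false   # code reloading on: the reloader does real work
config.eager_load    = false   # app code loads lazily, so a syntax error can
                               # surface inside the reloader, as described above

# config/environments/production.rb
config.cache_classes = true    # no code reloading: the reloader skips itself
config.eager_load    = true    # everything loads at boot, before any job runs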

@matthewd

matthewd commented Nov 6, 2016

(It apparently didn't occur to anyone in #3154 😢, but if the reloader is in fact eating work / raising on things that aren't genuine application-level problems, I'd like to know about it, so we can fix it -- I'm not just suggesting "retry until it goes through", but until we know what was being raised, there isn't much to do.)

@mperham
Collaborator

mperham commented Nov 6, 2016

@matthewd I think Reloader is a very poor name for what it is, and it's the source of my confusion. I figured that if we don't want to reload code, we can just not use it. Now we learn that it has other responsibilities: releasing connections, running callbacks, etc. The reloader is not just a reloader; it does a lot more and needs to always be running in Rails 5. I would suggest it be renamed.

@mperham
Collaborator

mperham commented Nov 7, 2016

The problem with #3212 is that it doesn't call the Reloader within the RetryJobs middleware, so if it raises an exception it'll be logged but the job will be lost. This PR fixes that, but at the expense of tracking and invoking the reloader in two different places based on the current environment.

I'm wondering if the Reloader should be integrated as a new middleware which runs first in the chain. Notice that the klass and worker variables here aren't used by the Processor but are actually made available to the middleware. We shouldn't constantize the job_class outside of the reloader block, but we also shouldn't run the reloader block outside of the middleware block -- a Catch-22.

One solution is to refactor the retry behavior so it is no longer a middleware. Something like this pseudocode might work:

worker = nil
begin
  reload do
    worker = job.constantize.new
    execute_middleware(worker, ...) do
      execute_job
    end
  end
rescue => ex
  retry_failures(job_hash, worker, ex)
end
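
A slightly more concrete rendering of that flow, using the real Rails.application.reloader.wrap API and the standard 'class'/'args' keys of a Sidekiq job hash; execute_middleware and retry_failures remain the hypothetical helpers from the pseudocode above:

def process(job_hash, queue)
  worker = nil
  begin
    # Constantize and run the job inside the reloader, so stale code is
    # never executed and the executor releases the connection afterwards.
    Rails.application.reloader.wrap do
      worker = job_hash['class'].constantize.new
      execute_middleware(worker, job_hash, queue) do
        worker.perform(*job_hash['args'])
      end
    end
  rescue => ex
    # Retry handling sits outside the reloader, so a raise during code
    # loading still re-enqueues the job instead of losing it.
    retry_failures(job_hash, worker, ex)
  end
end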

@eugeneius
Contributor Author

That seems like the best long-term approach, alright.

Would you consider it a breaking change? The retry middleware is mostly an internal concern, but there's documentation on how to configure the global max retry attempts that will no longer work, and any middleware currently inserted before/after RetryJobs will end up in a different position if RetryJobs is no longer in the default middleware stack.
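
For example, the retry configuration documented at the time looked roughly like the following (option names from memory): it re-adds the RetryJobs middleware with a max_retries option, which is exactly the kind of code that breaks once retries leave the middleware chain.

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    # Replace the default RetryJobs middleware with one configured for
    # 7 attempts instead of the default 25; this stops working if
    # RetryJobs is removed from the default chain in 4.3.0.
    chain.remove Sidekiq::Middleware::Server::RetryJobs
    chain.add Sidekiq::Middleware::Server::RetryJobs, max_retries: 7
  end
end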

@mperham
Collaborator

mperham commented Nov 8, 2016

You are correct that there's potential for breakage here. I don't know that this warrants Sidekiq 5, but I also have other breaking changes I want to get into Sidekiq 5, and I was planning on gathering those together over the next 6 months. I don't want to wait that long to fix this issue, so I'm leaning towards 4.3.0.

@mperham
Collaborator

mperham commented Nov 9, 2016

#3235

@matthewd

The only extra thing the reloader does is invoke the executor... so doing something along the lines of this PR would also be a viable option (though it doesn't need to be conditional; the executor correctly handles nested invocations as no-ops).

"Always invoke the reloader" is a simplification, in that if you're using the reloader, you can/should just always call it, thereby saving you needing to deal with the executor as well. (And because that way the reloader can disable itself, under appropriate circumstances, without you needing to reproduce its conditional checks.)

Most significantly for the original issue, though (and as suggested by the above claim that the reloader skips itself in production): the reported behaviour is coming from the executor... and while it wouldn't be as catastrophic, you don't really want that in development either.
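
To illustrate the nested-invocation point: only the outermost wrap triggers the executor's run/complete callbacks, so wrapping an already-wrapped block is harmless (HardWorker here is just a placeholder job class).

Rails.application.executor.wrap do
  # Already inside the executor: this inner wrap simply yields, because
  # the executor is active on this thread; the callbacks (including the
  # Active Record connection release) run once, at the outer level.
  Rails.application.executor.wrap do
    HardWorker.new.perform(42)
  end
end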

@matthewd

IOW, yes, the reloader should be wrapped by the error/retry handler, and almost certainly logging too. Whether that means pulling more things out of middleware a la #3235, or pushing the reloader down the stack as a middleware, is an architectural call for you. We obviously implement it as a (Rack) middleware, but as we don't control the top of the stack, we don't have much choice.

@mperham
Collaborator

mperham commented Nov 11, 2016

@matthewd Thanks for the note. I can't implement the Reloader as a middleware since the middleware API is call(worker, msg, queue), where worker is an instance that needs to be loaded by Rails: a Catch-22. To break that cycle, I've pulled the retry logic out of middleware and modified it to not require a worker instance. #3235 contains this work.
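
For readers who haven't seen it, the server middleware API in question has this shape, which is why the worker must already be constantized before the chain runs:

class MyServerMiddleware
  # `worker` is an *instance* of the job class, so msg['class'] has to be
  # constantized (i.e. the application code loaded) before the middleware
  # chain is invoked -- the reloader therefore can't live inside the chain.
  def call(worker, msg, queue)
    yield
  end
end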

@eugeneius
Contributor Author

eugeneius commented Nov 12, 2016

@mperham what do you think of making a final release in the 4.2.x series that uses this approach, and then removing it in 4.3.0? While #3235 is a more thorough fix, it's also a much bigger change.

Sidekiq 4.1.4 was the last version that worked properly with Rails 5: 4.2.0 to 4.2.2 could lose jobs, and 4.2.3 to 4.2.5 don't release connections. If someone with a Rails 5 app is waiting for a version without those problems, upgrading from 4.1 to 4.3 will involve dealing with the Sinatra removal, the reloader replacing the previous middleware approach, and the overhauled retry mechanism all at the same time. That's a lot of stuff changing at once...

@mperham
Collaborator

mperham commented Nov 12, 2016

I think that's a good idea.

@mperham mperham reopened this Nov 12, 2016
@mperham mperham merged commit f49b4f1 into sidekiq:master Nov 12, 2016
@mperham
Collaborator

mperham commented Nov 12, 2016

master should be finished for 4.2.6. Care to verify it works for you?

@eugeneius
Contributor Author

Looks good. 👍

I ran a simple test to verify that Active Record connections are released properly: on a new Rails 5 app with a connection pool size of 5 and Sidekiq concurrency set to 25, I enqueued 100 jobs. On 4.2.5, the 20 threads that couldn't get a connection raised ActiveRecord::ConnectionTimeoutError when they tried to access the database; on master, all of the jobs completed successfully.
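
A rough reconstruction of that test setup (job name and query are illustrative):

# config/database.yml  -> pool: 5
# config/sidekiq.yml   -> :concurrency: 25

class DbTouchWorker
  include Sidekiq::Worker

  def perform(i)
    # Checks out a connection; without the executor releasing it after the
    # job, only 5 of the 25 processor threads ever get one and the rest
    # raise ActiveRecord::ConnectionTimeoutError.
    ActiveRecord::Base.connection.execute("SELECT 1")
  end
end

# Enqueue 100 jobs, e.g. from the Rails console:
100.times { |i| DbTouchWorker.perform_async(i) }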

The main application I work on that uses Sidekiq is still on Rails 4.2, so I don't have a good way of verifying this behaviour under a production workload, if that's what you meant.

@mperham
Collaborator

mperham commented Nov 14, 2016

4.2.6 is released with this change. I'll merge my redesign for 4.3.0 later this week unless plans change.

mperham added a commit that referenced this pull request Jan 17, 2017
* Rework job processing in light of Rails 5's Reloader, see #3221

* Ignore built gems

* Documentation, testing

* Add fallback for 'retry' value server-side, fixes #3234

* Fix job hash reporting in stats

* cleanup

* Add job for AJ testing

* Push jobs with invalid JSON immediately to Dead set, fixes #3296

* Break retry logic into global and local parts, fixes #3306

* fix heisentest