
Use Rails executor if code reloading is disabled #3221

Merged
merged 1 commit into sidekiq:master on Nov 12, 2016

Conversation

eugeneius
Contributor

We have to release the current thread's Active Record connection after performing each job, in case another thread is waiting to use it.

On Rails 4 and earlier, this is handled with middleware. On Rails 5 in development mode, the reloader does it by delegating to the executor. However on Rails 5 in production mode, we're not adding the middleware or enabling the reloader, so connections will never be released.

We can call the executor directly to have it release the connection for us in this case. By calling it inside the middleware stack, the job will be retried if the executor raises, avoiding the problem with lost jobs that led to the reloader being disabled in production.
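
For reference, a minimal sketch of the idea (not the exact patch): the middleware class is hypothetical, while Rails.application.executor.wrap and the Sidekiq server middleware API are real.

# Hypothetical middleware illustrating the approach: wrap each job in
# Rails' executor so its complete callbacks run afterwards, releasing
# the thread's Active Record connection back to the pool.
class ExecutorMiddleware
  def call(worker, job, queue)
    Rails.application.executor.wrap { yield }
  end
end

# Only needed when code reloading is disabled (e.g. production), where
# Sidekiq is not already wrapping jobs in the reloader:
if Rails.application.config.cache_classes
  Sidekiq.configure_server do |config|
    config.server_middleware do |chain|
      chain.add ExecutorMiddleware
    end
  end
end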

@mperham
Collaborator

mperham commented Nov 6, 2016

This is getting laughably complex. Can something simpler work?

@eugeneius
Contributor Author

I considered adding the middleware in this case instead, but it felt weird to use a different mechanism to release the connection in development vs. production on the same version of Rails.

If we re-enqueued the job when the reloader raises, we could go back to enabling it in all environments. I think this is achievable, but it would involve more extensive changes than this patch (and I was discouraged from going in this direction by this warning!)

@eugeneius
Contributor Author

@matthewd / @sj26: do you have any feedback here, or ideas for a less convoluted approach?

@matthewd

matthewd commented Nov 6, 2016

Always invoke the reloader. #3212 was the right response for [the Rails 5 part of] #3154 (and incidentally seems suggestive of the problem on Rails 4 too: there are apparently places where an exception can occur that will silently kill a thread, even when it has a job in flight).

The reloader can definitely raise -- with eager load disabled it'll load the whole application, and thus raise for any syntax errors, for example. I didn't realise we'd put it outside the error handling mechanism; we should instead do Not That.
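
For context, the development/production split being described here follows from the standard Rails 5 environment defaults (the generated defaults are shown; individual apps may differ):

# config/environments/development.rb
config.cache_classes = false   # code reloading on: the reloader does real work
config.eager_load    = false   # app code loads lazily, so a syntax error can
                               # surface inside the reloader, as described above

# config/environments/production.rb
config.cache_classes = true    # no code reloading: the reloader skips itself
config.eager_load    = true    # everything loads at boot, before any job runs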

@matthewd

matthewd commented Nov 6, 2016

(It apparently didn't occur to anyone in #3154 😢, but if the reloader is in fact eating work / raising on things that aren't genuine application-level problems, I'd like to know about it, so we can fix it -- I'm not just suggesting "retry until it goes through", but until we know what was being raised, there isn't much to do.)

@mperham
Collaborator

mperham commented Nov 6, 2016

@matthewd I think Reloader is a very poor name for what it is, and it's the source of my confusion. I figured that if we don't want to reload code, we can just not use it. Now we learn that it has other responsibilities: releasing connections, running callbacks, etc. The reloader is not just a reloader; it does a lot more and needs to always be running in Rails 5. I would suggest it be renamed.

@mperham
Collaborator

mperham commented Nov 7, 2016

The problem with #3212 is that it doesn't call the Reloader within the RetryJobs middleware, so if it raises an exception it'll be logged but the job will be lost. This PR fixes that, but at the expense of tracking and invoking the reloader in two different places based on the current environment.

I'm wondering if the Reloader should be integrated as a new middleware which runs first in the chain. Notice that the klass and worker variables here aren't used by the Processor but are actually made available to the middleware. We shouldn't constantize the job_class outside of the reloader block, but we also shouldn't run the reloader block outside of the middleware block -- a Catch-22.

One solution is to refactor the retry behavior so it is no longer a middleware. Something like this pseudocode might work:

worker = nil
begin
  reload do
    worker = job.constantize.new
    execute_middleware(worker, ...) do
      execute_job
    end
  end
rescue => ex
  retry_failures(job_hash, worker, ex)
end
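
A slightly more concrete rendering of that flow, using the real Rails.application.reloader.wrap API and the standard 'class'/'args' keys of a Sidekiq job hash; execute_middleware and retry_failures remain the hypothetical helpers from the pseudocode above:

def process(job_hash, queue)
  worker = nil
  begin
    # Constantize and run the job inside the reloader, so stale code is
    # never executed and the executor releases the connection afterwards.
    Rails.application.reloader.wrap do
      worker = job_hash['class'].constantize.new
      execute_middleware(worker, job_hash, queue) do
        worker.perform(*job_hash['args'])
      end
    end
  rescue => ex
    # Retry handling sits outside the reloader, so a raise during code
    # loading still re-enqueues the job instead of losing it.
    retry_failures(job_hash, worker, ex)
  end
end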

@eugeneius
Contributor Author

That seems like the best long-term approach, alright.

Would you consider it a breaking change? The retry middleware is mostly an internal concern, but there's documentation on how to configure the global max retry attempts that will no longer work, and any middleware currently inserted before/after RetryJobs will end up in a different position if RetryJobs is no longer in the default middleware stack.
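
For example, the retry configuration documented at the time looked roughly like the following (option names from memory): it re-adds the RetryJobs middleware with a max_retries option, which is exactly the kind of code that breaks once retries leave the middleware chain.

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    # Replace the default RetryJobs middleware with one configured for
    # 7 attempts instead of the default 25; this stops working if
    # RetryJobs is removed from the default chain in 4.3.0.
    chain.remove Sidekiq::Middleware::Server::RetryJobs
    chain.add Sidekiq::Middleware::Server::RetryJobs, max_retries: 7
  end
end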

@mperham
Collaborator

mperham commented Nov 8, 2016

You are correct that there's potential for breakage here. I don't know that this warrants Sidekiq 5, but I also have other breaking changes I want to get into Sidekiq 5, and I was planning on gathering those together over the next 6 months. I don't want to wait that long to fix this issue, so I'm leaning towards 4.3.0.

@mperham
Collaborator

mperham commented Nov 9, 2016

#3235

@matthewd

The only extra thing the reloader does is invoke the executor... so doing something along the lines of this PR would also be a viable option (though it doesn't need to be conditional; the executor correctly handles nested invocations as no-ops).

"Always invoke the reloader" is a simplification, in that if you're using the reloader, you can/should just always call it, thereby saving you needing to deal with the executor as well. (And because that way the reloader can disable itself, under appropriate circumstances, without you needing to reproduce its conditional checks.)

Most significantly for the original issue, though (and as suggested by the above claim that the reloader skips itself in production): the reported behaviour is coming from the executor... and while it wouldn't be as catastrophic, you don't really want that in development either.
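
To illustrate the nested-invocation point: only the outermost wrap triggers the executor's run/complete callbacks, so wrapping an already-wrapped block is harmless (HardWorker here is just a placeholder job class).

Rails.application.executor.wrap do
  # Already inside the executor: this inner wrap simply yields, because
  # the executor is active on this thread; the callbacks (including the
  # Active Record connection release) run once, at the outer level.
  Rails.application.executor.wrap do
    HardWorker.new.perform(42)
  end
end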

@matthewd

IOW, yes, the reloader should be wrapped by the error/retry handler, and almost certainly logging too. Whether that means pulling more things out of middleware a la #3235, or pushing the reloader down the stack as a middleware, is an architectural call for you. We obviously implement it as a (Rack) middleware, but as we don't control the top of the stack, we don't have much choice.

@mperham
Collaborator

mperham commented Nov 11, 2016

@matthewd Thanks for the note. I can't implement the Reloader as a middleware since the middleware API is call(worker, msg, queue), where worker is an instance that needs to be loaded by Rails: a Catch-22. To break that cycle, I've pulled the retry logic out of middleware and modified it to not require a worker instance. #3235 contains this work.
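
For readers who haven't seen it, the server middleware API in question has this shape, which is why the worker must already be constantized before the chain runs:

class MyServerMiddleware
  # `worker` is an *instance* of the job class, so msg['class'] has to be
  # constantized (i.e. the application code loaded) before the middleware
  # chain is invoked -- the reloader therefore can't live inside the chain.
  def call(worker, msg, queue)
    yield
  end
end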

@eugeneius
Contributor Author

eugeneius commented Nov 12, 2016

@mperham what do you think of making a final release in the 4.2.x series that uses this approach, and then removing it in 4.3.0? While #3235 is a more thorough fix, it's also a much bigger change.

Sidekiq 4.1.4 was the last version that worked properly with Rails 5: 4.2.0 to 4.2.2 could lose jobs, and 4.2.3 to 4.2.5 don't release connections. If someone with a Rails 5 app is waiting for a version without those problems, upgrading from 4.1 to 4.3 will involve dealing with the Sinatra removal, the reloader replacing the previous middleware approach, and the overhauled retry mechanism all at the same time. That's a lot of stuff changing at once...

@mperham
Collaborator

mperham commented Nov 12, 2016

I think that's a good idea.

@mperham mperham reopened this Nov 12, 2016
@mperham mperham merged commit f49b4f1 into sidekiq:master Nov 12, 2016
@mperham
Collaborator

mperham commented Nov 12, 2016

master should be finished for 4.2.6. Care to verify it works for you?

@eugeneius
Contributor Author

Looks good. 👍

I ran a simple test to verify that Active Record connections are released properly: on a new Rails 5 app with a connection pool size of 5 and Sidekiq concurrency set to 25, I enqueued 100 jobs. On 4.2.5, the 20 threads that couldn't get a connection raised ActiveRecord::ConnectionTimeoutError when they tried to access the database; on master, all of the jobs completed successfully.
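
A rough reconstruction of that test setup (job name and query are illustrative):

# config/database.yml  -> pool: 5
# config/sidekiq.yml   -> :concurrency: 25

class DbTouchWorker
  include Sidekiq::Worker

  def perform(i)
    # Checks out a connection; without the executor releasing it after the
    # job, only 5 of the 25 processor threads ever get one and the rest
    # raise ActiveRecord::ConnectionTimeoutError.
    ActiveRecord::Base.connection.execute("SELECT 1")
  end
end

# Enqueue 100 jobs, e.g. from the Rails console:
100.times { |i| DbTouchWorker.perform_async(i) }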

The main application I work on that uses Sidekiq is still on Rails 4.2, so I don't have a good way of verifying this behaviour under a production workload, if that's what you meant.

@mperham
Collaborator

mperham commented Nov 14, 2016

4.2.6 is released with this change. I'll merge my redesign for 4.3.0 later this week unless plans change.

mperham added a commit that referenced this pull request Jan 17, 2017
* Rework job processing in light of Rails 5's Reloader, see #3221

* Ignore built gems

* Documentation, testing

* Add fallback for 'retry' value server-side, fixes #3234

* Fix job hash reporting in stats

* cleanup

* Add job for AJ testing

* Push jobs with invalid JSON immediately to Dead set, fixes #3296

* Break retry logic into global and local parts, fixes #3306

* fix heisentest