
Redesign Sidekiq internals #2593

Merged
merged 58 commits into master from internal_rewrite on Oct 28, 2015

Conversation

@mperham
Owner

mperham commented Oct 7, 2015

My recent experiment to remove Celluloid has reaped huge rewards. This branch does several things:

  1. Rewrite Sidekiq actors to use raw Threads and data structures.
  2. Make Processor#stats asynchronous, which removes two Redis round trips per job.
  3. Redesign job fetch so each Processor fetches its own job in parallel; this (along with (2)) makes the system much more resilient to higher Redis latency.

Processing 100,000 no-op jobs with one process and 25 threads on MRI 2.2:

  • Master: 125 sec
  • Without Celluloid: 57 sec
  • Without Celluloid and Async stats: 20 sec
  • With 1ms of latency, sequential fetch: 7 min
  • With 1ms of latency, parallel fetch: 20 sec

This work would be released as Sidekiq 4.0, with an ETA of Thanksgiving.
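A rough sketch of what point 3 implies. All names here are invented for illustration (this is not the branch's code), and a Queue stands in for the Redis fetch:

```ruby
require "thread"

# Hypothetical sketch: each processor runs its own blocking
# fetch-then-execute loop, so 25 processors keep 25 fetches in flight
# and one slow Redis round trip stalls only one of them.
class Processor
  def initialize(source)
    @source = source
  end

  def start
    @thread = Thread.new do
      # Blocking fetch, one per processor; a nil job means "shut down".
      while (job = @source.pop)
        job.call
      end
    end
  end

  def join
    @thread.join
  end
end
```

With a central fetcher, every job also pays a hand-off between the fetcher thread and a worker thread; fetching inside the worker removes that context switch and overlaps the Redis latency across all processors.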

@jonhyman

Collaborator

jonhyman commented Oct 7, 2015

😮

# long the job will take to finish. Instead we
# provide a `kill` method to call after the shutdown
# timeout passes.
@thread.raise ::Sidekiq::Shutdown


@ryansch

ryansch Oct 7, 2015

Collaborator

I've read in a number of places that Thread#raise and Thread#kill are unsafe. Has that changed? If so, what would the minimum Ruby version be?


@mperham

mperham Oct 7, 2015

Owner

It's not safe, but it's as safe as I can make it. Processor#kill is only called when the hard shutdown timeout has passed and the process must exit.
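To make the trade-off concrete, here is a minimal hypothetical sketch of the pattern being discussed (invented names, not Sidekiq's source): Thread#raise is unsafe in general, so it is reserved for the last resort, after the hard shutdown timeout has already passed.

```ruby
# Exception injected into the job thread on hard shutdown.
class Shutdown < Interrupt; end

class Processor
  def initialize(&work)
    @thread = Thread.new do
      begin
        work.call
        :finished
      rescue Shutdown
        :killed  # job interrupted mid-flight; the process is exiting anyway
      end
    end
  end

  # Called only once the hard shutdown timeout has expired.
  def kill
    @thread.raise Shutdown
    @thread.value
  end
end

processor = Processor.new { sleep }  # a job that never finishes on its own
sleep 0.1                            # stand-in for the shutdown timeout
result = processor.kill
```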


@ryansch

ryansch Oct 7, 2015

Collaborator

Gotcha. The whole building is coming down anyway.

mperham added some commits Oct 7, 2015

Move fetching into the processor
This removes thread context switching and network delay.
@uhoh-itsmaciek


uhoh-itsmaciek Oct 8, 2015

Looks great! Did you by chance measure memory usage as well?


@mperham


Owner

mperham commented Oct 8, 2015

8x less garbage generated!

On Oct 7, 2015, at 22:15, Maciek Sakrejda notifications@github.com wrote:

Looks great. Did you by chance measure memory usage as well?



end
def start
@thread ||= safe_thread("scheduler") do


@tenderlove

tenderlove Oct 9, 2015

Is start meant to be thread safe? There's a read-check-write race here along with a race in terminate. Also since @thread isn't initialized, it will cause warnings.
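For illustration, a hypothetical Mutex-guarded version that closes the read-check-write window and pre-initializes the ivar; this is a sketch of the general fix, not a proposed patch:

```ruby
require "thread"

# Two threads can both see @thread as nil in `@thread ||= ...` and both
# spawn a poller. A Mutex makes the check-and-assign atomic, and setting
# @thread in initialize avoids the uninitialized-instance-variable warning.
class Poller
  def initialize
    @lock = Mutex.new
    @thread = nil
  end

  def start
    @lock.synchronize do
      @thread ||= Thread.new { loop { sleep 0.1 } }
    end
  end

  def terminate
    @lock.synchronize do
      @thread.kill if @thread
      @thread = nil
    end
  end
end
```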


@tenderlove

tenderlove Oct 9, 2015

(probably should also raise an exception if @done is true when this method gets called)


@mperham

mperham Oct 9, 2015

Owner

Poller#start should only ever be called once by the main thread when calling Launcher#start, so thread safety is not necessary here. There's definitely room for more defensive design.


@tenderlove

tenderlove Oct 9, 2015

Presumably the allocated thread will just sleep while the queue is empty, so it might be possible to just make start private and call it from initialize. Then you can eliminate the conditional here and in terminate. < /armchairengineering >


@mperham

mperham Oct 9, 2015

Owner

I like to be able to test the object's APIs without a background thread spinning up in the test suite.

@tarcieri


Collaborator

tarcieri commented Oct 9, 2015

Have you considered using a ThreadPoolExecutor? This will give you a very fast and robust implementation on JRuby (now):

https://ruby-concurrency.github.io/concurrent-ruby/Concurrent/ThreadPoolExecutor.html
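For readers unfamiliar with the abstraction, a stdlib-only toy sketch of what a thread-pool executor provides (the real Concurrent::ThreadPoolExecutor adds pool sizing, queue bounds, and fallback policies on top of this):

```ruby
require "thread"

# A fixed set of workers draining one shared queue, instead of spawning
# a thread per task. On JRuby, concurrent-ruby's version delegates to
# java.util.concurrent.ThreadPoolExecutor.
class TinyPool
  def initialize(size)
    @queue = Queue.new
    @workers = size.times.map do
      Thread.new do
        # Each worker blocks on the shared queue; a nil "poison pill"
        # tells it to exit.
        while (task = @queue.pop)
          task.call
        end
      end
    end
  end

  def post(&task)
    @queue << task
  end

  def shutdown
    @workers.size.times { @queue << nil }
    @workers.each(&:join)
  end
end
```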

@mperham


Owner

mperham commented Oct 9, 2015

@tarcieri I don't understand how a TPE would help vs me spinning up threads.

@tarcieri


Collaborator

tarcieri commented Oct 9, 2015

@mperham on JRuby at least, it's backed by a java.util.concurrent.ThreadPoolExecutor and should have lower thread-coordination and memory overhead than anything that can be done in pure Ruby.

@drewblas


drewblas Oct 14, 2015

@mperham What happens if the status updater/heartbeat thread dies? Will it be holding status info about finished jobs that never gets sent back to Redis? It seems like buffering stats creates a hole for data to get lost in a crash, resulting in even more double-run or stuck jobs.


@nviennot


nviennot Oct 14, 2015

@mperham Glad to see that you are dropping Celluloid :) -- good move!


@mperham


Owner

mperham commented Oct 14, 2015

@drewblas I'm not sure what you are referring to. If you can be more explicit about a particular failure case, I can explain what should happen.

@Magicdream


Magicdream Oct 15, 2015

@mperham

8x less garbage generated!

Is it because Celluloid was removed?


@mperham


Owner

mperham commented Oct 22, 2015

I did a quick Resque 1.25.2 benchmark for comparison. Time to process 10,000 noop jobs:

  • 1 process: 96 sec
  • 25 processes: 31 sec

Extrapolating to 100,000 jobs would be ~300 seconds. Sidekiq does the same in 20 seconds, so Sidekiq's per-job overhead appears to be ~15x less.
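The arithmetic behind that estimate, spelled out (numbers taken from this comment and the benchmark at the top of the thread):

```ruby
# Best Resque time (25 processes, 10,000 jobs), scaled linearly to 100,000 jobs.
resque_100k  = 31.0 * (100_000 / 10_000)  # 310 sec, i.e. ~300
sidekiq_100k = 20.0                       # Sidekiq 4.0 branch, same workload
ratio        = resque_100k / sidekiq_100k # ~15x
```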

mperham added some commits Oct 28, 2015

@mperham mperham merged commit 6ad6a3a into master Oct 28, 2015

0 of 2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build is in progress
continuous-integration/travis-ci/push: The Travis CI build is in progress
@halorgium


Collaborator

halorgium commented Oct 28, 2015

👍

@narapon


narapon commented Oct 28, 2015

Excited to try this, good work!
On 29 Oct 2015 03:05, "Tim Carey-Smith" notifications@github.com wrote:

👍



@hecbuma


hecbuma commented Oct 29, 2015

pretty nice 👍

@esbanarango


esbanarango Oct 29, 2015

@mperham Can we start using v4.0 now?

Thank you for all this work!! 👍


@mperham mperham deleted the internal_rewrite branch Oct 29, 2015

@take-five


take-five Nov 17, 2015

@mperham I guess the Processor:: namespace is missing here and below


@fedenusy fedenusy referenced this pull request May 12, 2016

Closed

Battle Testing #11
