New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak with Sidekiq 5. #3824

Closed
bvirlet opened this Issue Apr 13, 2018 · 7 comments

Comments

4 participants
@bvirlet
Copy link

bvirlet commented Apr 13, 2018

I'm running into a memory leak (Sidekiq on Heroku) where the sidekiq process memory grows to more than 1 GB.

I can also reproduce the issue locally on my machine.

I'm using ActiveRecord so I have tried cleaning the cache at the end of every job, but that didn't help.

I have removed everything from my worker (an empty perform function…), and removed newrelic_rpm, honeybadger gems in case they were leaking.

I have used memory_profiler (as setup below in the Sidekiq initializer) to try to find out what the leak could be but I don't see anything suspicious and certainly not hundreds of megabytes of leaked data:

retained memory by class
-----------------------------------
  17844968  Thread
     95192  Hash
     23672  String
     15192  Thread::Backtrace
      8960  Array
      8432  File
      8400  Class
      7120  Proc
      6800  Redis::Connection::TCPSocket
      5280  Thread::Mutex
      2600  Redis::Client
      2400  Redis
      2000  Redis::Namespace
      1680  Sidekiq::Processor
      1440  Time
      1152  Method
      1008  IO::EAGAINWaitReadable
      1000  Redis::Client::Connector
      1000  Redis::Connection::Ruby
       600  Sidekiq::BasicFetch
       600  Sidekiq::JobLogger
       600  Sidekiq::JobRetry
       560  FaxWorker
       560  Sidekiq::BasicFetch::UnitOfWork
       528  Module
       510  Regexp
       120  Thread::ConditionVariable
        96  ConnectionPool::TimedStack
        96  Sidekiq::Launcher
        80  Concurrent::AtomicFixnum
        80  Sidekiq::Manager
        80  Sidekiq::Scheduled::Poller
        40  Concurrent::Map
        40  Set
        40  Sidekiq::ProcessSet
        40  Sidekiq::Scheduled::Enq

This points to me to, maybe, a leak in a C extension but at this point I would be interested in getting some advice on how to proceed forward. I'm not sure there are any C extensions involved in running a sidekiq process? Or maybe in the redis client?

Here is also my Gemfile.lock:

GEM
  remote: https://rubygems.org/
  specs:
    ZenTest (4.11.1)
    actionmailer (5.1.6)
      actionpack (= 5.1.6)
      actionview (= 5.1.6)
      activejob (= 5.1.6)
      mail (~> 2.5, >= 2.5.4)
      rails-dom-testing (~> 2.0)
    actionpack (5.1.6)
      actionview (= 5.1.6)
      activesupport (= 5.1.6)
      rack (~> 2.0)
      rack-test (>= 0.6.3)
      rails-dom-testing (~> 2.0)
      rails-html-sanitizer (~> 1.0, >= 1.0.2)
    actionview (5.1.6)
      activesupport (= 5.1.6)
      builder (~> 3.1)
      erubi (~> 1.4)
      rails-dom-testing (~> 2.0)
      rails-html-sanitizer (~> 1.0, >= 1.0.3)
    activejob (5.1.6)
      activesupport (= 5.1.6)
      globalid (>= 0.3.6)
    activemodel (5.1.6)
      activesupport (= 5.1.6)
    activerecord (5.1.6)
      activemodel (= 5.1.6)
      activesupport (= 5.1.6)
      arel (~> 8.0)
    activesupport (5.1.6)
      concurrent-ruby (~> 1.0, >= 1.0.2)
      i18n (>= 0.7, < 2)
      minitest (~> 5.1)
      tzinfo (~> 1.1)
    addressable (2.5.2)
      public_suffix (>= 2.0.2, < 4.0)
    andand (1.3.3)
    arel (8.0.0)
    ast (2.4.0)
    autotest (4.4.6)
      ZenTest (>= 4.4.1)
    aws-sdk (2.11.28)
      aws-sdk-resources (= 2.11.28)
    aws-sdk-core (2.11.28)
      aws-sigv4 (~> 1.0)
      jmespath (~> 1.0)
    aws-sdk-resources (2.11.28)
      aws-sdk-core (= 2.11.28)
    aws-sigv4 (1.0.2)
    backports (3.11.1)
    bcrypt (3.1.11)
    bcrypt-ruby (3.1.5)
      bcrypt (>= 3.1.3)
    benchmark-ips (2.7.2)
    builder (3.2.3)
    commander (4.4.4)
      highline (~> 1.7.2)
    concurrent-ruby (1.0.5)
    connection_pool (2.2.1)
    crack (0.4.3)
      safe_yaml (~> 1.0.0)
    crass (1.0.3)
    database_cleaner (1.6.2)
    derailed_benchmarks (1.3.4)
      benchmark-ips (~> 2)
      get_process_mem (~> 0)
      heapy (~> 0)
      memory_profiler (~> 0)
      rack (>= 1)
      rake (> 10, < 13)
      thor (~> 0.19)
    diff-lcs (1.3)
    domain_name (0.5.20170404)
      unf (>= 0.0.5, < 1.0.0)
    erubi (1.7.1)
    factory_bot (4.8.2)
      activesupport (>= 3.0.0)
    factory_bot_rails (4.8.2)
      factory_bot (~> 4.8.2)
      railties (>= 3.0.0)
    fakeredis (0.7.0)
      redis (>= 3.2, < 5.0)
    faraday (0.14.0)
      multipart-post (>= 1.2, < 3)
    fuubar (1.3.3)
      rspec (>= 2.14.0, < 3.1.0)
      ruby-progressbar (~> 1.4)
    get_process_mem (0.2.1)
    global_phone (1.0.1)
    globalid (0.4.1)
      activesupport (>= 4.2.0)
    haml (5.0.4)
      temple (>= 0.8.0)
      tilt
    hashdiff (0.3.7)
    heapy (0.1.3)
    highline (1.7.10)
    houston (2.2.4)
      commander (~> 4.4)
      json
    http-cookie (1.0.3)
      domain_name (~> 0.5)
    httparty (0.16.2)
      multi_xml (>= 0.5.2)
    i18n (1.0.0)
      concurrent-ruby (~> 1.0)
    jmespath (1.3.1)
    json (2.1.0)
    loofah (2.2.2)
      crass (~> 1.0.2)
      nokogiri (>= 1.5.9)
    mail (2.7.0)
      mini_mime (>= 0.1.1)
    memory_profiler (0.9.10)
    method_source (0.9.0)
    mime-types (3.1)
      mime-types-data (~> 3.2015)
    mime-types-data (3.2016.0521)
    mini_mime (1.0.0)
    mini_portile2 (2.3.0)
    minitest (5.11.3)
    multi_json (1.13.1)
    multi_xml (0.6.0)
    multipart-post (2.0.0)
    mustermann (1.0.2)
    netrc (0.11.0)
    nokogiri (1.8.2)
      mini_portile2 (~> 2.3.0)
    parallel (1.12.1)
    parser (2.5.0.5)
      ast (~> 2.4.0)
    pdf-core (0.7.0)
    pg (1.0.0)
    powerpack (0.1.1)
    prawn (2.2.2)
      pdf-core (~> 0.7.0)
      ttfunk (~> 1.5)
    prawn-table (0.2.2)
      prawn (>= 1.3.0, < 3.0.0)
    public_suffix (3.0.2)
    puma (3.11.3)
    pushmeup (0.3.0)
      httparty
      json
    rack (2.0.4)
    rack-protection (2.0.1)
      rack
    rack-test (1.0.0)
      rack (>= 1.0, < 3)
    rack-timeout (0.4.2)
    rails-dom-testing (2.0.3)
      activesupport (>= 4.2.0)
      nokogiri (>= 1.6)
    rails-html-sanitizer (1.0.4)
      loofah (~> 2.2, >= 2.2.2)
    railties (5.1.6)
      actionpack (= 5.1.6)
      activesupport (= 5.1.6)
      method_source
      rake (>= 0.8.7)
      thor (>= 0.18.1, < 2.0)
    rainbow (3.0.0)
    rake (10.5.0)
    redis (3.3.5)
    redis-namespace (1.6.0)
      redis (>= 3.0.4)
    rest-client (2.0.2)
      http-cookie (>= 1.0.2, < 2.0)
      mime-types (>= 1.16, < 4.0)
      netrc (~> 0.8)
    rspec (2.99.0)
      rspec-core (~> 2.99.0)
      rspec-expectations (~> 2.99.0)
      rspec-mocks (~> 2.99.0)
    rspec-core (2.99.2)
    rspec-expectations (2.99.2)
      diff-lcs (>= 1.1.3, < 2.0)
    rspec-its (1.0.1)
      rspec-core (>= 2.99.0.beta1)
      rspec-expectations (>= 2.99.0.beta1)
    rspec-mocks (2.99.4)
    rubocop (0.54.0)
      parallel (~> 1.10)
      parser (>= 2.5)
      powerpack (~> 0.1)
      rainbow (>= 2.2.2, < 4.0)
      ruby-progressbar (~> 1.7)
      unicode-display_width (~> 1.0, >= 1.0.1)
    rubocop-rspec (1.24.0)
      rubocop (>= 0.53.0)
    ruby-progressbar (1.9.0)
    safe_yaml (1.0.4)
    sidekiq (5.1.3)
      concurrent-ruby (~> 1.0)
      connection_pool (~> 2.2, >= 2.2.0)
      rack-protection (>= 1.5.0)
      redis (>= 3.3.5, < 5)
    sinatra (2.0.1)
      mustermann (~> 1.0)
      rack (~> 2.0)
      rack-protection (= 2.0.1)
      tilt (~> 2.0)
    sinatra-activerecord (2.0.13)
      activerecord (>= 3.2)
      sinatra (>= 1.0)
    sinatra-contrib (2.0.1)
      backports (>= 2.0)
      multi_json
      mustermann (~> 1.0)
      rack-protection (= 2.0.1)
      sinatra (= 2.0.1)
      tilt (>= 1.3, < 3)
    temple (0.8.0)
    thor (0.20.0)
    thread_safe (0.3.6)
    tilt (2.0.8)
    timecop (0.9.1)
    ttfunk (1.5.1)
    tzinfo (1.2.5)
      thread_safe (~> 0.1)
    unf (0.1.4)
      unf_ext
    unf_ext (0.0.7.5)
    unicode-display_width (1.3.0)
    webmock (3.3.0)
      addressable (>= 2.3.6)
      crack (>= 0.3.2)
      hashdiff
    yajl-ruby (1.3.1)

Below is the requested information about Sidekiq.

Thanks for any advice!


Ruby version: 2.4.4
Sidekiq / Pro / Enterprise version(s): Sidekiq 5.1.3

Please include your initializer and any error message with the full backtrace.

redis_uri = ENV['REDIS_URL'] || ENV['OPENREDIS_URL']
Sidekiq.configure_server do |config|
  config.redis = { url: redis_uri, namespace: 'sidekiq' }

  # Configure Sidekiq as suggested in the Heroku configuration
  # https://devcenter.heroku.com/articles/forked-pg-connections#sidekiq
  database_url = ENV['DATABASE_URL']
  if database_url
    ENV['DATABASE_URL'] = "#{database_url}?pool=25"
    ActiveRecord::Base.establish_connection(ENV['DATABASE_URL'])
  end

  config.on(:startup) do
    MemoryProfiler.start #(trace: [Thread])
  end

  config.on(:quiet) do
    puts "Got QUIET..."
      report = MemoryProfiler.stop
      report.pretty_print(to_file: "log/report.txt")
  end
end

Sidekiq.configure_client do |config|
  config.redis = { url: redis_uri, namespace: 'sidekiq', size: 5 }
end

Are you using an old version? No

Have you checked the changelogs to see if your issue has been fixed in a later version? Yes, but I'm running the latest version.

@mperham

This comment has been minimized.

Copy link
Owner

mperham commented Apr 13, 2018

I have no idea what the leak might be. Sidekiq is pure Ruby but you have many native gems that could have issues. There are many other things in MRI that can cause excessive memory usage:

https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html

@bvirlet

This comment has been minimized.

Copy link

bvirlet commented Apr 16, 2018

Thanks for your reply.

The problem appeared when switching to Sidekiq 5/Ruby 2.4 from Sidekiq 3/Ruby 2.1.4.

I'm not sure other gems would be responsible for this issue:

  • The process where memory increases is the Sidekiq process
  • The perform method is empty
    So the only part where the memory could increase for me is within Sidekiq.

I will investigate the MRI part but I'm using Ruby 2.4.4 which is fairly "standard" and I would have expected to see this problem in other places.

@bvirlet

This comment has been minimized.

Copy link

bvirlet commented Apr 17, 2018

For now it seems that reducing the Sidekiq concurrency from 25 to 5 seem to have helped contain the memory growth.

However, I don't understand why because while we had 25 threads for Sidekiq, we never have more than 2-3 jobs running at the same time and as far as I understand, a Sidekiq thread will not require much memory by itself and should share most of the memory with the other threads. So it seems that just having the ability to dispatch to more threads leads to an increase in memory.

@bvirlet

This comment has been minimized.

Copy link

bvirlet commented Apr 20, 2018

@mperham problem solved. Thanks for pointing us in the right direction.

The initial issue wasn't really a leak but just a memory bloat (meaning that too much memory was used) as you had suggested.

So that this might help others:

The issue appeared when upgrading Heroku from cedar-14 to heroku-16. We had to bump Ruby from 2.1 to 2.4. We also moved from Sidekiq 3 to Sidekiq 5. Lots of moving pieces so hard to pinpoint the issue.

First we checked that there was no leak using the memory_profiler gem.

We reduced the concurrency of Sidekiq from 25 to 5 and this clearly helped reduce the memory usage considerably even though we are not running at full concurrency at all (most of the time we only have a couple jobs running). But we could still see the memory slowly increase over time. Just "having" the threads by themselves seemed to have created a memory bloat even though they were not active at the same time; this is a bit counterintuitive because you would think Ruby requests more memory when two threads request are active at the same time and increase the contention.

Then we read the article you suggested above https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html and it made entirely sense with the behavior we were seeing: not a memory leak but slow memory increase because of memory fragmentation.

We decided to switch the memory allocator to jemalloc as described at the end of the article. Basically this reduced memory usage by 4 (so our sidekiq workers run well below 512 MB of memory now) and also stopped the slow increase in memory.

It's worth investing some time in understanding this issue, but in the end the solution is very simple and doesn't require anything complex.

@bvirlet bvirlet closed this Apr 20, 2018

@abinoda

This comment has been minimized.

Copy link

abinoda commented Apr 20, 2018

@bvirlet What tools did you use to monitor / debug memory on Heroku? Just the built-in Heroku web GUI? New Relic?

@bvirlet

This comment has been minimized.

Copy link

bvirlet commented Apr 20, 2018

@abinoda the built-in Heroku web gui was enough to see the memory trend.

@schneems

This comment has been minimized.

Copy link

schneems commented Apr 20, 2018

For anyone coming across this thread. I would recommend upgrading Ruby versions separately from migrating stacks, so that way it's easier to isolate if memory growth (or any other issues) happen due to the version changes or the stack change.

@bvirlet thanks for the detailed write up. Glad you got it working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment