
Memory Leak after update from 3.0.9 to 3.1.2 #401

Closed
nickooolas opened this issue Jul 9, 2017 · 21 comments

@nickooolas

nickooolas commented Jul 9, 2017

Hey @phstc, thanks again for your work building and maintaining Shoryuken; I've been getting great use out of it!

After an unintended bundle update the other day, which bumped shoryuken from 3.0.9 to 3.1.2, I noticed my workers failing on Heroku, and a quick look at the stats indicated a significant memory leak (~10 MB/minute).

[screenshot: Heroku memory usage metrics]

After ruling out other gem updates as the cause of the increase, I narrowed it down to the update to 3.1.2 itself. I did this by reverting to an existing working branch with only the shoryuken version in the Gemfile.lock changed from 3.0.9 to 3.1.2, which caused the worker to leak memory quite quickly.

I don't know much about memory debugging, so I can't provide more info there, but I'd be happy to run any recommended debugging tools to get to the bottom of why this started occurring in 3.1.2.

Cheers,

Nick

@phstc
Collaborator

phstc commented Jul 9, 2017

Hi @nickooolas

Thanks for reporting this, I will look into it. Could you share your shoryuken.yml and how you start Shoryuken?

@phstc
Collaborator

phstc commented Jul 9, 2017

@nickooolas also, could you tell me more about your workers: are they busy all the time, and do they do any heavy processing? I have an app running on Heroku (low SQS activity) and I haven't seen any spike so far.

@nickooolas
Author

nickooolas commented Jul 10, 2017

@phstc thanks for getting back so quickly, here are the details:

config/shoryuken.yml

concurrency: 5
delay: 5

My workers are actually often not busy: they mostly poll queues (generally empty receives) and sit idle for probably 80% of the time, but I still keep the delay quite low (5) so they check regularly and stay responsive when there is work to be done.

The nature of the worker is quite straightforward: a bit of XML parsing, a couple of database writes, then handing off to a service class for some PDF generation and finally some Slack notifications, etc.

What else could I run to debug the cause of this with the shift to 3.1.2?

@phstc
Collaborator

phstc commented Jul 10, 2017

hm @nickooolas one of the biggest changes between 3.1.x and 3.0.x is the use of Concurrent.global_io_executor instead of a FixedThreadPool. If the issue is easy to reproduce, it would be great if you could try 3.1.2 again with this monkey patch:

module Shoryuken
  class Launcher
    private

    def executor
      Concurrent::FixedThreadPool.new(5, max_queue: 5) # given that 5 is your concurrency
    end
  end
end

@phstc
Collaborator

phstc commented Jul 11, 2017

@nickooolas actually make that Concurrent::FixedThreadPool.new(10, max_queue: 10) for concurrency: 5, since the manager also uses the same executor. It's just to check whether the issue is with Concurrent.global_io_executor.
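
In other words, the adjusted patch would look something like this (a sketch based on the snippet above; the pool is sized above the configured concurrency of 5 because the manager shares the executor with the processors):

module Shoryuken
  class Launcher
    private

    # 10 threads / queue slots for concurrency: 5 - the manager posts its own
    # work to this executor in addition to the message processors.
    def executor
      Concurrent::FixedThreadPool.new(10, max_queue: 10)
    end
  end
end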

@nickooolas
Author

Hey @phstc - I ran the monkey patch Concurrent::FixedThreadPool.new(10, max_queue: 10) against the Shoryuken gem while it was on 3.1.2. To me it still appears the memory leak is occurring (also confirmed locally in Activity Monitor on OS X): after running for 30 minutes, memory usage on the worker went from 126 MB to 324 MB.

Let me know if there's anything else you'd like me to try. Otherwise, if it's alright with you, what else could I do to isolate an issue with my environment?

Cheers!

@ydkn

ydkn commented Jul 12, 2017

Hi,

I'm having the same problem after the update to 3.1.2.
I'll try to investigate further but I'm currently not sure when I'll have time for this.

Cheers,
Florian

@phstc
Collaborator

phstc commented Jul 12, 2017

also confirmed locally in Activity Monitor on OS X

@nickooolas I'm trying to reproduce that locally, let's see what I can find.

bundle exec ./bin/shoryuken -r ./tmp/test.rb -q test -L ./tmp/shoryuken.log -P ./tmp/shoryuken.pid -d
top -pid $(cat tmp/shoryuken.pid)
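
For reference, a minimal worker that ./tmp/test.rb could define for the test queue might look like this (the class name and body here are illustrative, not the actual contents of that file):

# tmp/test.rb (illustrative)
class TestWorker
  include Shoryuken::Worker

  shoryuken_options queue: 'test', auto_delete: true

  def perform(sqs_msg, body)
    # Deliberately trivial work, so any memory growth can be attributed to
    # the polling/dispatching itself rather than the job.
    Shoryuken.logger.info "processed: #{body}"
  end
end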

@phstc
Collaborator

phstc commented Jul 13, 2017

@nickooolas I couldn't reproduce it locally. I'm wondering if it may be a conflict with another gem or even with the workers themselves. If I create a repo with my test project, would you be able to test it? I just want to make sure you can still reproduce it with the same project (same gems).

@nickooolas @ydkn what version of the aws-sdk are you running?

bundle list
aws-sdk-core (2.10.9)

@nickooolas
Author

Hi @phstc - no worries, happy to help out there; let me know once you've set up the test repo.

In terms of aws-sdk, I've got the following: aws-sdk-core (2.10.9)

phstc pushed a commit to phstc/test-memory-leak-401 that referenced this issue Jul 14, 2017
@phstc
Collaborator

phstc commented Jul 14, 2017

Hi @nickooolas

Awesome! Could you try this https://github.com/phstc/test-memory-leak-401?

@nickooolas
Author

Hi @phstc

I tested that out today - it looks like the memory leak is still occurring when using that project as well:

In 6 minutes:

START
[screenshot: memory usage at start]

FINISH
[screenshot: memory usage after 6 minutes]

Let me know what else you'd like me to do to debug. Cheers

@phstc
Collaborator

phstc commented Jul 17, 2017

@nickooolas hm, I couldn't reproduce it locally on OS X. I kept it running for 1 hour and the memory went from 22M to 28M.

What OS/MEM/CPU are you testing on?

@phstc
Collaborator

phstc commented Jul 17, 2017

What OS/MEM/CPU are you testing on?

@ydkn would you mind sharing yours too?

@ethangunderson

We had this happen on a project as well.

64-bit Amazon Linux 2017.03, 8 GB RAM, 2 CPUs

I've already rolled back to our previous 2.x version, but I'm happy to provide any other information you might need.

@ydkn

ydkn commented Jul 18, 2017

Hi @phstc

First of all, I want to make clear that I'm currently not sure the results below are really pointing in the right direction - please keep that in mind.

My test setup was my existing app polling 3 empty queues every 30 seconds. The long interval is to avoid dumping the heap while a queue is being polled.

Using the following as the base for the debugging process:
http://blog.skylight.io/hunting-for-leaks-in-ruby/

The files needed for running the same debugging are located here:
https://gist.github.com/ydkn/11dc3458377aa93f7cb900dfea5d25d1
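
For anyone following along, the core of that heap-dump approach (as described in the Skylight post; this is a simplified sketch, not the exact contents of the gist) boils down to:

require 'objspace'

# Record allocation call sites so each dumped object carries the file:line
# where it was allocated - that's what produces the "at: ..." paths below.
ObjectSpace.trace_object_allocations_start

def dump_heap(path)
  GC.start # collect garbage first so only live objects are dumped
  File.open(path, 'w') { |io| ObjectSpace.dump_all(output: io) }
end

# Take three dumps with the app idling in between; objects that show up in
# the second dump but not the first, and are still alive in the third, are
# the leak candidates, grouped by allocation site.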

So far the best result I was able to come up with is this:

["/tmp/rh_1500409671.json", "/tmp/rh_1500410764.json", "/tmp/rh_1500411862.json"]
Leaked 149 DATA objects of size 0/23840 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/synchronization/mri_lockable_object.rb:28
Leaked 144 OBJECT objects of size 0/13552 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/synchronization/object.rb:56
Leaked 143 STRUCT objects of size 0/5720 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/synchronization/mri_lockable_object.rb:29
Leaked 115 ARRAY objects of size 0/4600 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/synchronization/mri_lockable_object.rb:29
Leaked 57 HASH objects of size 0/12384 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/promise.rb:212
Leaked 54 HASH objects of size 0/2160 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/collection/copy_on_write_observer_set.rb:105
Leaked 51 IMEMO objects of size 0/2040 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/promise.rb:456
Leaked 48 ARRAY objects of size 0/1920 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/promise.rb:460
Leaked 46 ARRAY objects of size 0/1840 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/concern/obligation.rb:135
Leaked 42 DATA objects of size 0/3360 at: /app/shared/bundle/ruby/2.4.0/gems/shoryuken-3.1.4/lib/shoryuken/manager.rb:32
Leaked 35 DATA objects of size 0/2800 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/promise.rb:456
Leaked 30 DATA objects of size 0/2400 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/promise.rb:458
Leaked 24 IMEMO objects of size 0/960 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/promise.rb:455
Leaked 21 IMEMO objects of size 0/840 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/promise.rb:312
Leaked 17 DATA objects of size 0/1360 at: /app/shared/bundle/ruby/2.4.0/gems/shoryuken-3.1.4/lib/shoryuken/manager.rb:30
Leaked 14 DATA objects of size 0/1120 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/promise.rb:455
Leaked 13 IMEMO objects of size 0/520 at: /app/shared/bundle/ruby/2.4.0/gems/shoryuken-3.1.4/lib/shoryuken/manager.rb:30
Leaked 11 DATA objects of size 0/880 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/promise.rb:312
Leaked 7 IMEMO objects of size 0/280 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/promise.rb:458
Leaked 3 HASH objects of size 0/672 at: /app/shared/bundle/ruby/2.4.0/gems/aws-sdk-core-2.10.12/lib/aws-sdk-core/param_converter.rb:135
Leaked 3 HASH objects of size 0/576 at: /app/shared/bundle/ruby/2.4.0/gems/aws-sdk-core-2.10.12/lib/seahorse/client/net_http/connection_pool.rb:212
Leaked 1 IMEMO objects of size 0/40 at: /usr/local/rvm/rubies/ruby-2.4.1/lib/ruby/2.4.0/openssl/buffering.rb:300
Leaked 1 IMEMO objects of size 0/40 at: /usr/local/rvm/rubies/ruby-2.4.1/lib/ruby/2.4.0/net/protocol.rb:100
Leaked 1 HASH objects of size 0/192 at: /usr/local/rvm/rubies/ruby-2.4.1/lib/ruby/2.4.0/timeout.rb:76
Leaked 1 HASH objects of size 0/192 at: /app/shared/bundle/ruby/2.4.0/gems/aws-sdk-core-2.10.12/lib/aws-sdk-core/xml/parser/frame.rb:26
Leaked 1 IMEMO objects of size 0/40 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:326
Leaked 1 ARRAY objects of size 0/40 at: /app/shared/bundle/ruby/2.4.0/gems/shoryuken-3.1.4/lib/shoryuken/queue.rb:43
Leaked 1 DATA objects of size 0/96 at: /app/shared/bundle/ruby/2.4.0/gems/aws-sdk-core-2.10.12/lib/seahorse/client/net_http/connection_pool.rb:329
Leaked 1 STRING objects of size 3/40 at: /usr/local/rvm/rubies/ruby-2.4.1/lib/ruby/2.4.0/net/http/response.rb:43
Leaked 1 ARRAY objects of size 0/40 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/executor/ruby_thread_pool_executor.rb:190
Leaked 1 IMEMO objects of size 0/40 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/promise.rb:530
Leaked 1 DATA objects of size 0/80 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/promise.rb:501
Leaked 1 IMEMO objects of size 0/40 at: /app/shared/bundle/ruby/2.4.0/gems/concurrent-ruby-1.0.5/lib/concurrent/promise.rb:501
Leaked 1 ARRAY objects of size 0/40 at: /app/shared/bundle/ruby/2.4.0/gems/shoryuken-3.1.4/lib/shoryuken/polling/weighted_round_robin.rb:41
Leaked 1 DATA objects of size 0/96 at: /app/shared/bundle/ruby/2.4.0/gems/activesupport-5.1.2/lib/active_support/core_ext/time/calculations.rb:257


Total Size: 3/84840

The lines indicating a leak in application.rb can be ignored - those are generated from the debugging code itself.

I'm not familiar with the inner workings of concurrent-ruby so I can't really make anything of it right now. Maybe someone else can interpret it.

Cheers,
Florian

@bahaa

bahaa commented Jul 22, 2017

I had the same issue too. I resolved it by reverting to v3.0.9. The issue happens on my development macOS machine and on EC2. Shoryuken is started using the -R option. The issue happens even when I have no jobs defined at all.

phstc pushed a commit that referenced this issue Jul 23, 2017
Seems like the dispatch loop through `then {}.then {}` hold the
referenced threads
@phstc phstc mentioned this issue Jul 23, 2017
phstc added a commit that referenced this issue Jul 23, 2017
@phstc
Collaborator

phstc commented Jul 23, 2017

Thanks a lot to all of you; the guidance in the comments helped a ton. I managed to reproduce it by creating an Amazon Linux instance 🎉 and the good news is that it's fixed in 3.1.5. Could you give it a try?

Basically, the dispatch loop was implemented through Promise.execute { dispatch }.then { dispatch_loop } and it seems it was preventing threads from being garbage collected. I haven't troubleshot that enough yet, but to keep it simple I reverted to the previous dispatch_loop, and all seems to be good now.
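
For anyone curious, the difference roughly amounts to the following (a simplified sketch, not the actual Shoryuken source; method names are illustrative):

# Leaky shape: every iteration chains .then onto the promise created by the
# previous one, so the ever-growing chain keeps references to earlier
# promises (and their associated state) from being garbage collected.
def dispatch_loop_chained
  Concurrent::Promise.execute(executor: executor) { dispatch }
                     .then { dispatch_loop_chained }
end

# Reverted shape: each iteration is an independent task posted to the
# executor; once it finishes, nothing keeps a reference to it.
def dispatch_loop_plain
  executor.post do
    dispatch
    dispatch_loop_plain
  end
end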

@bahaa

bahaa commented Jul 24, 2017

@phstc I've been running v3.1.5 since you pushed it, and since then the memory growth has been virtually non-existent. I've deployed it to our staging environment and it will be used heavily over the next few days. I'll let you know if we face any issues. Thanks a lot for the great work.

@phstc
Collaborator

phstc commented Jul 24, 2017

@bahaa if you are using --delay, would you mind updating again to 3.1.6? Otherwise it may hang depending on the delay.

I've been running v3.1.5 since you pushed it

😍

@bahaa

bahaa commented Jul 25, 2017

@phstc sure, I'll update you if we face any issues.
