dj: memory leak #978

Open
yhaenggi opened this issue Aug 20, 2015 · 81 comments

@yhaenggi
Contributor

delayed_job is using more and more memory over time. It starts at ~150M and rises forever, slowly though. As I have 16 workers, I'm affected much faster.
Any way to debug this? I'm currently just restarting the workers every X hours.
I'm using the Web, Mail, MailDigest, XMPP, and Trigger agents, with Web and Trigger being the most used.

Can foreman manage the memory usage of DJ? I.e., restart a DJ worker if its memory is higher than X and no jobs are currently running? Restarting a worker after X jobs might also be an option.
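Something like this is what I have in mind (untested sketch; the 250 MB limit and the pidfile location are guesses):

LIMIT_KB = 250_000  # restart a worker above ~250 MB resident

Dir.glob('tmp/pids/delayed_job.*.pid').each do |pidfile|
  pid = File.read(pidfile).to_i
  rss_kb = `ps -o rss= -p #{pid}`.to_i  # resident set size in KB
  next unless rss_kb > LIMIT_KB
  worker_id = pidfile[/delayed_job\.(\d+)\.pid/, 1]
  # restart sends TERM; DJ normally finishes the job in progress before exiting
  system('script/delayed_job', '-i', worker_id, 'restart')
end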

@cantino
Member

cantino commented Aug 21, 2015

Hey @K1773R, that's frustrating! I haven't noticed that. It's possible that upstart is killing and restarting them for me when RAM runs low, but I haven't observed that. Are you running any unusual Agents? I suspect it's some Agent that I don't use often.

@yhaenggi
Contributor Author

Are the ones I listed unusual agents?

@cantino
Member

cantino commented Aug 21, 2015

XMPP seems the most likely. If you disable it, does memory stabilize?

@yhaenggi
Contributor Author

It leaks even when there are no XMPP agents...

@cantino
Member

cantino commented Aug 22, 2015

That's very odd. @dsander, have you seen memory issues with DJ?

@dsander
Collaborator

dsander commented Aug 22, 2015

@cantino I did not really monitor the memory usage on my server, but I doubt the issue is DJ itself.

@K1773R Debugging memory leaks in Ruby is a bit annoying. If you notice the leak within a few minutes, the quickest way would probably be to disable all but one agent type, restart your worker, and check if the leak still exists.

If that is not an option and you are using Ruby 2.1+, I will try to add some sort of memory profiling option tomorrow.
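The 2.1+ requirement is because of the objspace allocation-tracing API; the profiling option would do roughly this (a sketch, not the final code):

require 'objspace'

# begin recording the file/line and GC generation of every new allocation
ObjectSpace.trace_object_allocations_start

# ... let the worker process jobs for a while ...

# write the whole heap as newline-delimited JSON, one object per line
File.open('/tmp/heap.json', 'w') do |f|
  ObjectSpace.dump_all(output: f)
end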

@yhaenggi
Contributor Author

I'm using Ruby 2.2.2p95.

@virtadpt
Collaborator

I may have just run into this (where 'just' == sometime early last evening). Two virtual CPUs, four job runners, and both the jobs and the web user interface died with a signal 137 after staying online for something like four days. I had to kill the delayed_job processes and restart from scratch to get Huginn running again. As an experiment I've cut my delayed_job fleet to three to see how long it takes to run out of RAM.

Ruby v2.1.5p273.

@cantino
Member

cantino commented Aug 23, 2015

Did this issue start recently?

@virtadpt
Collaborator

The hang-up happened some time last night (maybe 1900 hours PDT). I last updated my running install on 19 August (commit c3e9545), after which memory utilization started creeping up until something broke. Going through log/development.log, there isn't anything in there, because the error message was emitted by the shell Huginn was started from. It didn't say which process received the out-of-memory signal, though. I can fire up a few more DJs to artificially use up RAM and see what part falls over, if it would help.

@cantino
Member

cantino commented Aug 23, 2015

What commit were you running before this, that seemed more stable? And are you running in RAILS_ENV=production or RAILS_ENV=development?

@virtadpt
Collaborator

I was running commit 9e8f9bf in development mode. Still in development mode.

@virtadpt
Collaborator

I have a stack trace from two days ago, from approximately the time of the crash. I don't know if it'll help debug this issue (I'm working to reproduce it independently, which is why I jumped into this ticket's discussion thread) but I can throw it on pastebin if it'll help.

@cantino
Member

cantino commented Aug 23, 2015

Sure, let's see it.

@cantino
Member

cantino commented Aug 23, 2015

Well, we shouldn't be running Huginn in development mode for production use, but there was really very little code change between your two commits.

The only things we did that I could imagine affecting RAM were removing spring and adding letter_opener_web.

You should ensure spring isn't being used (run spring stop), and you could try commenting out gem 'letter_opener_web' in the Gemfile, config.action_mailer.delivery_method = :letter_opener_web in development.rb, and mount LetterOpenerWeb::Engine, at: "/letter_opener" in routes.rb, and see if that helps.
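That is, comment out these three lines (assuming the standard Rails file locations):

# Gemfile
# gem 'letter_opener_web'

# config/environments/development.rb
# config.action_mailer.delivery_method = :letter_opener_web

# config/routes.rb
# mount LetterOpenerWeb::Engine, at: "/letter_opener"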

@virtadpt
Collaborator

Here it is: http://pastebin.com/TcBR22tq

Removing Spring has done a world of good - things are much more stable now! I've been keeping an eye on it, and it appears to be gone for good.

@cantino
Member

cantino commented Aug 23, 2015

That looks like the box running out of RAM or disk. Was this correlated with ballooning RAM? If so, do you know how high it got?

@virtadpt
Collaborator

That was my analysis also - the amount of free RAM hit zero and the OOM killer took out a process. I don't know which one because this is all the kernel message buffer has in it:

[3813894.197850] Killed process 11626 (ruby) total-vm:1596164kB, anon-rss:610824kB, file-rss:0kB

So, some aspect of Huginn got killed but it's not clear which one it was. I'm trying to recreate this problem on my instance to help debug it, so I just captured the ruby processes running right now, and if one mysteriously dies I can compare the lists and see which one was the victim.

@dsander
Collaborator

dsander commented Aug 24, 2015

@K1773R I updated my memory profile branch. If you import this scenario, the MemoryProfileAgent will count the allocated objects every 5 minutes and send the data to a PeakDetectorAgent; when a peak is detected, a memory dump is written to the configured directory. It works with multiple background workers, but the results will probably look a bit strange, because the job will run on a random worker process. To get a general overview (to see how many objects are leaked over time), it would probably be best to run just one worker for a few hours.

To analyze a memory dump, a small script is included in the bin directory:

To get an overview of how many objects have been allocated (and are still live) in each garbage-collection generation, run ruby bin/analyze.rb <pathto.json>. To show which file and line allocated the objects, add the generation as a second parameter: ruby bin/analyze.rb <pathto.json> <generation number>.
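The first mode boils down to tallying the dump (newline-delimited JSON, as produced by ObjectSpace.dump_all) by generation; a minimal sketch of the idea, not the actual script:

require 'json'

counts = Hash.new(0)
File.foreach(ARGV[0]) do |line|
  obj = JSON.parse(line)
  counts[obj['generation']] += 1  # GC generation in which the object was allocated
end

counts.sort_by { |gen, _| gen.to_i }.each do |gen, n|
  puts "#{gen || 'pre-trace'}: #{n}"
end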

I ran the branch overnight on my production server and also see a small leak of objects, but I was not able to find the source. If you are running more agents, the results might be more conclusive.

@yhaenggi
Contributor Author

@dsander Switching to a single worker is currently not really an option.
How can I migrate from cantino/huginn master to your branch and back afterwards without breaking things?

@dsander
Collaborator

dsander commented Aug 24, 2015

My branch is based on the current master branch and just adds the memory profile agent. As long as you delete that agent before you switch back, there should not be any problems.

@virtadpt
Collaborator

Data!

06:28:11 dj1.1      | script/delayed_job:5:in `<main>'
06:28:52 schedule.1 | Queuing event propagation
06:28:52 schedule.1 | Queuing schedule for every_1m
06:29:51 schedule.1 | Queuing schedule for every_2m
06:29:54 schedule.1 | Queuing event propagation
06:29:54 schedule.1 | Queuing schedule for every_1m
06:31:17 schedule.1 | Queuing event propagation
06:31:44 twitter.1  | Man in stable condition after East Oakland shooting http://t.co/owRD0aeFBa
06:31:49 twitter.1  |  -> Lifeline: Monitor Twitter for keywords
06:34:13 dj1.1      | exited with code 137
06:34:13 dj1.1      | Killed
06:34:13 system     | sending SIGTERM to all processes
06:34:13 web.1      | terminated by SIGTERM
06:34:13 schedule.1 | terminated by SIGTERM
06:34:13 twitter.1  | terminated by SIGTERM
06:34:13 dj0.1      | terminated by SIGTERM
06:34:13 dj2.1      | terminated by SIGTERM

dj1.1 was killed with a signal 137 (out of memory). The system process detected this and sent SIGTERMs to the web server, scheduler, Twitter subsystem, and dj0.1 and dj2.1. However, not everything is responding to SIGTERM correctly. From my currently running process list:

14106 pts/5    Sl    13:20 /home/drwho/.rvm/rubies/ruby-2.1.5/bin/ruby bin/rails server -b127.0.0.1
14114 pts/5    Sl   132:16 ruby script/delayed_job -i 0 run
14120 pts/5    Rl   136:16 ruby script/delayed_job -i 2 run

@cantino
Member

cantino commented Aug 25, 2015

@virtadpt can you try @dsander's MemoryProfilingAgent?

@virtadpt
Collaborator

virtadpt commented Sep 4, 2015

How up to date is @dsander's fork of the codebase?

@virtadpt
Collaborator

virtadpt commented Sep 4, 2015

I'm trying to force the OOM problem to recur by manually starting extra DJs. I'm capturing stdout to a logfile. When I have something, I'll comment on this issue.

@virtadpt
Collaborator

virtadpt commented Sep 4, 2015

When jobs aren't running, memory utilization doesn't change (e.g., 485 MB free and holding).

I'm up to 9 delayed job runners now with my usual agent load of approximately 750 agents.

@virtadpt
Collaborator

virtadpt commented Sep 4, 2015

Results! I'll post what I got in separate Screen terminals in successive comments.

22:25:22 schedule.1 |   69896720     Mysql2::Error: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111): BEGIN
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:299:in `query'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:299:in `block in execute'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_adapter.rb:473:in `block in log'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_adapter.rb:467:in `log'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:299:in `execute'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/mysql2_adapter.rb:231:in `execute'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:315:in `begin_db_transaction'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/transaction.rb:144:in `initialize'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/transaction.rb:170:in `new'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/transaction.rb:170:in `begin_transaction'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/transaction.rb:187:in `within_new_transaction'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/database_statements.rb:213:in `transaction'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/transactions.rb:220:in `transaction'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/transactions.rb:344:in `with_transaction_returning_status'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/transactions.rb:286:in `block in save'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/transactions.rb:301:in `rollback_active_record_state!'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/transactions.rb:285:in `save'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/backend/base.rb:35:in `block (2 levels) in enqueue'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:61:in `call'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:61:in `block in initialize'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:66:in `call'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:66:in `execute'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:40:in `run_callbacks'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/backend/base.rb:33:in `block in enqueue'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/backend/base.rb:32:in `tap'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/backend/base.rb:32:in `enqueue'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/message_sending.rb:12:in `method_missing'
22:25:22 schedule.1 |   69896720       /home/drwho/huginn/lib/huginn_scheduler.rb:153:in `block in run_schedule'
22:25:22 schedule.1 |   69896720       /home/drwho/huginn/lib/huginn_scheduler.rb:192:in `block (2 levels) in with_mutex'
22:25:22 schedule.1 |   69896720       /home/drwho/huginn/lib/huginn_scheduler.rb:191:in `synchronize'
22:25:22 schedule.1 |   69896720       /home/drwho/huginn/lib/huginn_scheduler.rb:191:in `block in with_mutex'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/connection_pool.rb:292:in `with_connection'
22:25:22 schedule.1 |   69896720       /home/drwho/huginn/lib/huginn_scheduler.rb:190:in `with_mutex'
22:25:22 schedule.1 |   69896720       /home/drwho/huginn/lib/huginn_scheduler.rb:151:in `run_schedule'
22:25:22 schedule.1 |   69896720       /home/drwho/huginn/lib/huginn_scheduler.rb:129:in `block (2 levels) in run!'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:234:in `call'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:234:in `do_call'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:258:in `do_trigger'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:300:in `block (3 levels) in start_work_thread'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:303:in `call'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:303:in `block (2 levels) in start_work_thread'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:289:in `loop'
22:25:22 schedule.1 |   69896720       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:289:in `block in start_work_thread'
22:25:22 schedule.1 |   69896720   tz:
22:25:22 schedule.1 |   69896720     ENV['TZ']:
22:25:22 schedule.1 |   69896720     Time.now: 2015-09-04 22:25:21 +0000
22:25:22 schedule.1 |   69896720   scheduler:
22:25:22 schedule.1 |   69896720     object_id: 87855320
22:25:22 schedule.1 |   69896720     opts:
22:25:22 schedule.1 |   69896720       {:frequency=>"1.0"}
22:25:22 schedule.1 |   69896720       frequency: 1.0
22:25:22 schedule.1 |   69896720       scheduler_lock: #<Rufus::Scheduler::NullLock:0x0000000a7a7100>
22:25:22 schedule.1 |   69896720       trigger_lock: #<Rufus::Scheduler::NullLock:0x0000000a7a70d8>
22:25:22 schedule.1 |   69896720     uptime: 21669.938076734543 (6h1m9s938)
22:25:22 schedule.1 |   69896720     down?: false
22:25:22 schedule.1 |   69896720     threads: 5
22:25:22 schedule.1 |   69896720       thread: #<Thread:0x0000000a7a6ed0>
22:25:22 schedule.1 |   69896720       thread_key: rufus_scheduler_87855320
22:25:22 schedule.1 |   69896720       work_threads: 4
22:25:22 schedule.1 |   69896720         active: 3
22:25:22 schedule.1 |   69896720         vacant: 1
22:25:22 schedule.1 |   69896720         max_work_threads: 28
22:25:22 schedule.1 |   69896720       mutexes: {}
22:25:22 schedule.1 |   69896720     jobs: 44
22:25:22 schedule.1 |   69896720       at_jobs: 0
22:25:22 schedule.1 |   69896720       in_jobs: 0
22:25:22 schedule.1 |   69896720       every_jobs: 16
22:25:22 schedule.1 |   69896720       interval_jobs: 0
22:25:22 schedule.1 |   69896720       cron_jobs: 28
22:25:22 schedule.1 |   69896720     running_jobs: 3
22:25:22 schedule.1 |   69896720     work_queue: 0
22:25:22 schedule.1 | } 69896720 .
22:25:22 schedule.1 | { 69995240 rufus-scheduler intercepted an error:
22:25:22 schedule.1 |   69995240   job:
22:25:22 schedule.1 |   69995240     Rufus::Scheduler::EveryJob "2m" {}
22:25:22 schedule.1 |   69995240   error:
22:25:22 schedule.1 |   69995240     69995240
22:25:22 schedule.1 |   69995240     ActiveRecord::StatementInvalid
22:25:22 schedule.1 |   69995240     Mysql2::Error: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111): BEGIN
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:299:in `query'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:299:in `block in execute'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_adapter.rb:473:in `block in log'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_adapter.rb:467:in `log'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:299:in `execute'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/mysql2_adapter.rb:231:in `execute'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:315:in `begin_db_transaction'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/transaction.rb:144:in `initialize'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/transaction.rb:170:in `new'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/transaction.rb:170:in `begin_transaction'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/transaction.rb:187:in `within_new_transaction'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/database_statements.rb:213:in `transaction'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/transactions.rb:220:in `transaction'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/transactions.rb:344:in `with_transaction_returning_status'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/transactions.rb:286:in `block in save'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/transactions.rb:301:in `rollback_active_record_state!'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/transactions.rb:285:in `save'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/backend/base.rb:35:in `block (2 levels) in enqueue'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:61:in `call'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:61:in `block in initialize'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:66:in `call'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:66:in `execute'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:40:in `run_callbacks'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/backend/base.rb:33:in `block in enqueue'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/backend/base.rb:32:in `tap'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/backend/base.rb:32:in `enqueue'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/message_sending.rb:12:in `method_missing'
22:25:22 schedule.1 |   69995240       /home/drwho/huginn/lib/huginn_scheduler.rb:153:in `block in run_schedule'
22:25:22 schedule.1 |   69995240       /home/drwho/huginn/lib/huginn_scheduler.rb:192:in `block (2 levels) in with_mutex'
22:25:22 schedule.1 |   69995240       /home/drwho/huginn/lib/huginn_scheduler.rb:191:in `synchronize'
22:25:22 schedule.1 |   69995240       /home/drwho/huginn/lib/huginn_scheduler.rb:191:in `block in with_mutex'
22:25:22 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/connection_pool.rb:292:in `with_connection'
22:25:23 schedule.1 |   69995240       /home/drwho/huginn/lib/huginn_scheduler.rb:190:in `with_mutex'
22:25:23 schedule.1 |   69995240       /home/drwho/huginn/lib/huginn_scheduler.rb:151:in `run_schedule'
22:25:23 schedule.1 |   69995240       /home/drwho/huginn/lib/huginn_scheduler.rb:129:in `block (2 levels) in run!'
22:25:23 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:234:in `call'
22:25:23 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:234:in `do_call'
22:25:23 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:258:in `do_trigger'
22:25:23 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:300:in `block (3 levels) in start_work_thread'
22:25:23 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:303:in `call'
22:25:23 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:303:in `block (2 levels) in start_work_thread'
22:25:23 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:289:in `loop'
22:25:23 schedule.1 |   69995240       /home/drwho/.rvm/gems/ruby-2.1.5/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:289:in `block in start_work_thread'

@virtadpt
Collaborator

virtadpt commented Sep 4, 2015

Nothing emitted on stderr. Nothing in ~/stderr.log.

@virtadpt
Collaborator

virtadpt commented Sep 4, 2015

In one of the extra workers I'd spawned:

Agent#563: Exception during check. Mysql2::Error: Lost connection to MySQL server during query: SELECT  agent_logs.id FROM `agent_logs` WHERE `agent_logs`.`agent_id` = 563  ORDER BY agent_logs.id desc LIMIT 1 OFFSET 249: /home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:299:in `query'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:299:in `block in execute'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_adapter.rb:473:in `block in log'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_adapter.rb:467:in `log'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:299:in `execute'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/mysql2_adapter.rb:231:in `execute'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/mysql2_adapter.rb:235:in `exec_query'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/database_statements.rb:347:in `select'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/database_statements.rb:32:in `select_all'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/connection_adapters/abstract/query_cache.rb:70:in `select_all'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activerecord-4.2.2/lib/active_record/relation/calculations.rb:180:in `pluck'
/home/drwho/huginn/app/models/agent_log.rb:22:in `log_for_agent'
/home/drwho/huginn/app/models/agent.rb:212:in `log'
/home/drwho/huginn/app/concerns/dry_runnable.rb:61:in `log'
/home/drwho/huginn/app/models/agents/imap_folder_agent.rb:358:in `each_unread_mail'
/home/drwho/huginn/app/models/agents/imap_folder_agent.rb:196:in `check'
/home/drwho/huginn/app/jobs/agent_check_job.rb:7:in `perform'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activejob-4.2.2/lib/active_job/execution.rb:32:in `block in perform_now'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:117:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:117:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:555:in `block (2 levels) in compile'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:505:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:505:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:498:in `block (2 levels) in around'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:343:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:343:in `block (2 levels) in simple'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activejob-4.2.2/lib/active_job/logging.rb:23:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activejob-4.2.2/lib/active_job/logging.rb:23:in `block (4 levels) in <module:Logging>'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/notifications.rb:164:in `block in instrument'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/notifications.rb:164:in `instrument'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activejob-4.2.2/lib/active_job/logging.rb:22:in `block (3 levels) in <module:Logging>'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activejob-4.2.2/lib/active_job/logging.rb:43:in `block in tag_logger'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/tagged_logging.rb:68:in `block in tagged'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/tagged_logging.rb:26:in `tagged'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/tagged_logging.rb:68:in `tagged'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activejob-4.2.2/lib/active_job/logging.rb:43:in `tag_logger'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activejob-4.2.2/lib/active_job/logging.rb:19:in `block (2 levels) in <module:Logging>'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:441:in `instance_exec'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:441:in `block in make_lambda'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:342:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:342:in `block in simple'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:497:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:497:in `block in around'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:505:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:505:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:92:in `_run_callbacks'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:776:in `_run_perform_callbacks'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activesupport-4.2.2/lib/active_support/callbacks.rb:81:in `run_callbacks'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activejob-4.2.2/lib/active_job/execution.rb:31:in `perform_now'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activejob-4.2.2/lib/active_job/execution.rb:21:in `execute'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/activejob-4.2.2/lib/active_job/queue_adapters/delayed_job_adapter.rb:34:in `perform'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/backend/base.rb:94:in `block in invoke_job'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:61:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:61:in `block in initialize'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:66:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:66:in `execute'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:40:in `run_callbacks'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/backend/base.rb:91:in `invoke_job'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/worker.rb:204:in `block (2 levels) in run'
/home/drwho/.rvm/rubies/ruby-2.1.5/lib/ruby/2.1.0/timeout.rb:91:in `block in timeout'
/home/drwho/.rvm/rubies/ruby-2.1.5/lib/ruby/2.1.0/timeout.rb:101:in `call'
/home/drwho/.rvm/rubies/ruby-2.1.5/lib/ruby/2.1.0/timeout.rb:101:in `timeout'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/worker.rb:204:in `block in run'
/home/drwho/.rvm/rubies/ruby-2.1.5/lib/ruby/2.1.0/benchmark.rb:294:in `realtime'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/worker.rb:203:in `run'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/worker.rb:280:in `block in reserve_and_run_one_job'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:61:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:61:in `block in initialize'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:66:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:66:in `execute'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:40:in `run_callbacks'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/worker.rb:280:in `reserve_and_run_one_job'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/worker.rb:187:in `block in work_off'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/worker.rb:186:in `times'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/worker.rb:186:in `work_off'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/worker.rb:150:in `block (4 levels) in start'
/home/drwho/.rvm/rubies/ruby-2.1.5/lib/ruby/2.1.0/benchmark.rb:294:in `realtime'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/worker.rb:149:in `block (3 levels) in start'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:61:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:66:in `execute'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:40:in `run_callbacks'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/worker.rb:148:in `block (2 levels) in start'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/worker.rb:147:in `loop'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/worker.rb:147:in `block in start'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/plugins/clear_locks.rb:7:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/plugins/clear_locks.rb:7:in `block (2 levels) in <class:ClearLocks>'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:79:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:79:in `block (2 levels) in add'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:61:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:61:in `block in initialize'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:79:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:79:in `block in add'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:66:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:66:in `execute'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/lifecycle.rb:40:in `run_callbacks'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/worker.rb:146:in `start'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/command.rb:124:in `run'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/command.rb:112:in `block in run_process'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/daemons-1.1.9/lib/daemons/application.rb:255:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/daemons-1.1.9/lib/daemons/application.rb:255:in `block in start_proc'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/daemons-1.1.9/lib/daemons/application.rb:264:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/daemons-1.1.9/lib/daemons/application.rb:264:in `start_proc'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/daemons-1.1.9/lib/daemons/application.rb:296:in `start'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/daemons-1.1.9/lib/daemons/controller.rb:73:in `run'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/daemons-1.1.9/lib/daemons.rb:197:in `block in run_proc'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/daemons-1.1.9/lib/daemons/cmdline.rb:109:in `call'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/daemons-1.1.9/lib/daemons/cmdline.rb:109:in `catch_exceptions'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/daemons-1.1.9/lib/daemons.rb:196:in `run_proc'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/command.rb:110:in `run_process'
/home/drwho/.rvm/gems/ruby-2.1.5/gems/delayed_job-4.0.6/lib/delayed/command.rb:86:in `daemonize'
script/delayed_job:5:in `<main>'
Mysql2::Error: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111): UPDATE `delayed_jobs` SET `delayed_jobs`.`locked_by` = NULL, `delayed_jobs`.`locked_at` = NULL WHERE `delayed_jobs`.`locked_by` = 'delayed_job.6 host:exocortex pid:25535'

@virtadpt
Collaborator

virtadpt commented Sep 4, 2015

The same stack trace appeared in all of the extra DJs I spawned to force this to happen. It killed almost all of the spares; a few survived because they were running at the time the others died, but now that the scheduler's died they're sitting there waiting for things to do.

Looking at the process list:

...
8881 pts/3    Sl     6:52 /home/drwho/.rvm/rubies/ruby-2.1.5/bin/ruby bin/rails server -b127.0.0.1
8884 pts/3    Sl     1:20 bin/schedule.rb
...

@dsander
Collaborator

dsander commented Oct 4, 2015

You need to watch the disk usage, though; by default, the PeakDetectorAgent in the scenario will trigger a dump for every detected peak. I ran out of space on my root disk once 😄

@cantino
Member

cantino commented Oct 4, 2015

Thanks! Just added a monitor to keep an eye on that.

@cantino
Member

cantino commented Oct 28, 2015

I'm seeing the memory leak too after a few days of running. I suspect it may be the cause of some of my crashes. According to #1114, this happens with a completely empty agent too, which makes me suspect DJ or the scheduler.

@dsander
Collaborator

dsander commented Oct 28, 2015

@cantino My test instance crashed for the same reason today. I recently added some more debugging information and found that multiple instances of TwitterStreamAgent::Worker were still in memory. I applied a fix and am now waiting to see if that stops the crashes.

This does not explain #1114 though.

@virtadpt
Collaborator

My instance (now running with the multithreaded scheduler) is doing the same thing. I've had two lockups in as many days. Until I purged the pending jobs this morning, I had a backlog of 7,900 jobs waiting to run.

I'm interested to see if commit 07e2d9c fixes it.

@jn11

jn11 commented Nov 13, 2015

I'm reading in the Heroku discussion forum that some people were experiencing memory leaks running the latest versions of Ruby. Has anybody tried running Huginn with Ruby 2.0.0?

More info:
https://discussion.heroku.com/t/memory-leaks-with-ruby-2-1-2/644/23

@virtadpt
Collaborator

The leak's happening in v2.1.5, too.

@cantino
Member

cantino commented Nov 14, 2015

Are you still seeing a steady climb of memory?

@virtadpt
Collaborator

Yes. However, JabberAgent has been locking up Huginn far more often than OOMs have been killing it lately (the past two weeks or thereabouts).

@yhaenggi
Contributor Author

@virtadpt Thanks for the heads-up; I was just about to update Huginn. A lockup due to JabberAgent would be horrible for me. I'll upgrade as soon as the problem with JabberAgent is fixed.

@dsander
Collaborator

dsander commented Nov 16, 2015

Pretty sure it is not an issue with the Jabber agent in general; as far as I can remember, we have not changed it for at least a few months, and I am using it as a receiver (JabberAgent::Worker) without any issues.

@virtadpt
Collaborator

As for the memory leak, I think the article that @jn11 linked to has a lot to do with it.

I'm using JabberAgent as a sender, which might be the problem. But that's a separate issue (literally).

@brianpetro
Contributor

I thought of this issue when I read "How Ruby 2.2 can cause an out-of-memory server crash".

@yhaenggi
Contributor Author

@brianpetro Versions below 2.2 are affected too.

@cantino
Member

cantino commented Dec 26, 2015

I think this would only happen if recursion was interrupted by a signal, so the GC got disabled? Then we wouldn't see a slow climb in memory, but probably a fairly quick one? But I could totally be wrong. If you're having memory issues, would you mind trying the newer version of Ruby and seeing if it helps? It'd be awesome if this fixed our issues.

@wasafiri
Contributor

@cantino Jumping in here to help troubleshoot. As I was fiddling around with Huginn today, I noticed that Celluloid 0.16.0 gets installed during setup. There is a known memory leak in that version of Celluloid (see celluloid/celluloid#455). You might look at requiring Celluloid version 0.15.2 or 0.17.2; see https://gist.github.com/gazay/3b518f72266b5a7e88ff

I also bookmarked a list of "leaky gems", and delayed_job and therubyracer, both of which are in the Huginn Gemfile, are on it. See https://github.com/ASoftCo/leaky-gems

When I set up Huginn today, I noticed that delayed_job version 4.0.6 was installed; that version is known to have memory leaks: collectiveidea/delayed_job#776

therubyracer was installed at gem version 0.12.2, which fixes the memory leak that was previously reported, AND ~> 0.12.2 is specified in the Gemfile, so that one should be fine. rubyjs/therubyracer#336
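In Gemfile terms, something like this (hypothetical pins, assuming the newer Celluloid and delayed_job releases actually fix the leaks):

gem 'celluloid', '~> 0.17.2'     # or '0.15.2'; 0.16.0 leaks (celluloid/celluloid#455)
gem 'delayed_job', '> 4.0.6'     # 4.0.6 is known to leak (collectiveidea/delayed_job#776)
gem 'therubyracer', '~> 0.12.2'  # already pinned like this, which is fine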

Hope that helps!

@cantino
Member

cantino commented Jan 14, 2016

Thanks @wasafiri, this is very helpful! I'll try updating the Gemfile.

@wasafiri
Contributor

wasafiri commented Feb 3, 2016

Does the memory leak persist, @virtadpt @cantino?

@cantino
Member

cantino commented Feb 3, 2016

I haven't had issues recently, but it wasn't affecting me much. @K1773R?

@virtadpt
Collaborator

virtadpt commented Feb 3, 2016

I think so. I'm running the threaded job runner (have been since I got Huginn into production mode), and it's been remarkably stable since I disabled all of my JabberAgents in favor of BeeperAgent instances.

@yhaenggi
Contributor Author

yhaenggi commented Feb 4, 2016

@cantino I should have time for an upgrade at the weekend; I'll report back.
What is the issue with the JabberAgent? I'm relying on it, and upgrading to a broken version would be bad for me.

@dsander dsander mentioned this issue Apr 7, 2016
@HyShai
Contributor

HyShai commented Jan 6, 2017

I've been having memory leak issues for the past couple of months. I'm using the latest codebase on an Ubuntu Xenial box with 1 GB of memory.
Huginn runs fine for a day, and then the server gets very sluggish: the threaded.rb process starts at 5% of memory and steadily increases, reaching around 16% after a day.

The logs don't seem to show any errors.

@cantino
Member

cantino commented Jan 7, 2017

That's not good. I've seen my RAM usage climb once in the last month, I think. It seems like there's still some subtle issue.

@dsander
Collaborator

dsander commented Jan 21, 2017

I did some monitoring of my installation, and it looks like the memory usage stabilized after a few days. The amount of RAM used by the Twitter stream process still surprises me a bit.

CONTAINER ID        IMAGE                            COMMAND                  CREATED             STATUS              PORTS                      NAMES
8c4a99ed0f4c        cantino/huginn-single-process    "/scripts/init bin/tw"   4 minutes ago       Up 4 minutes        3000/tcp                   huginndocker_huginn_tw_1
6a9a28389370        cantino/huginn-single-process    "/scripts/init bin/sc"   4 minutes ago       Up 4 minutes        3000/tcp                   huginndocker_huginn_scheduler_1
2065a5122238        cantino/huginn-single-process    "/scripts/init"          4 minutes ago       Up 4 minutes        127.0.0.1:3000->3000/tcp   huginndocker_huginn_web_1
a859b884a9a4        cantino/huginn-single-process    "/scripts/init script"   4 minutes ago       Up 4 minutes        3000/tcp                   huginndocker_huginn_dj_1
cdd35eae5260        postgres:9.5                     "/docker-entrypoint.s"   4 minutes ago       Up 4 minutes        5432/tcp                   huginndocker_postgres_1
Mon Jan 16 17:47:37 CET 2017
CONTAINER           CPU %               MEM USAGE / LIMIT       MEM %               NET I/O               BLOCK I/O             PIDS
8c4a99ed0f4c        0.12%               190.6 MiB / 31.32 GiB   0.59%               65.05 MB / 2.116 MB   0 B / 1.769 MB        0
6a9a28389370        0.07%               150.9 MiB / 31.32 GiB   0.47%               1.17 MB / 253.7 kB    0 B / 1.769 MB        0
2065a5122238        0.00%               262.1 MiB / 31.32 GiB   0.82%               1.513 MB / 602.9 kB   376.8 kB / 1.769 MB   0
a859b884a9a4        0.27%               162.6 MiB / 31.32 GiB   0.51%               1.355 MB / 450.1 kB   2.335 MB / 1.786 MB   0
cdd35eae5260        0.11%               69.21 MiB / 31.32 GiB   0.22%               2.09 MB / 61.83 MB    2.31 MB / 12.53 MB    0
Mon Jan 16 21:32:11 CET 2017
CONTAINER           CPU %               MEM USAGE / LIMIT       MEM %               NET I/O               BLOCK I/O             PIDS
8c4a99ed0f4c        3.84%               225.7 MiB / 31.32 GiB   0.70%               862 MB / 148.5 MB     0 B / 1.769 MB        0
6a9a28389370        0.08%               190 MiB / 31.32 GiB     0.59%               1.717 MB / 981.1 kB   0 B / 1.769 MB        0
2065a5122238        0.00%               287.9 MiB / 31.32 GiB   0.90%               1.665 MB / 728.1 kB   376.8 kB / 1.769 MB   0
a859b884a9a4        0.12%               187.9 MiB / 31.32 GiB   0.59%               37.75 MB / 22.31 MB   2.335 MB / 1.786 MB   0
cdd35eae5260        1.38%               89 MiB / 31.32 GiB      0.28%               161.5 MB / 719.2 MB   2.31 MB / 1.092 GB    0
Tue Jan 17 22:15:49 CET 2017
CONTAINER           CPU %               MEM USAGE / LIMIT       MEM %               NET I/O               BLOCK I/O             PIDS
8c4a99ed0f4c        2.10%               313.1 MiB / 31.32 GiB   0.98%               3.673 GB / 986.3 MB   0 B / 1.769 MB        0
6a9a28389370        0.08%               206.1 MiB / 31.32 GiB   0.64%               4.839 MB / 5.647 MB   0 B / 1.769 MB        0
2065a5122238        0.00%               448.4 MiB / 31.32 GiB   1.40%               16.38 MB / 12.57 MB   557.1 kB / 1.769 MB   0
a859b884a9a4        0.12%               189.3 MiB / 31.32 GiB   0.59%               274.9 MB / 165.4 MB   2.335 MB / 1.786 MB   0
cdd35eae5260        0.84%               113.2 MiB / 31.32 GiB   0.35%               1.05 GB / 1.598 GB    3.015 MB / 7.56 GB    0
Fri Jan 20 13:27:54 CET 2017
CONTAINER           CPU %               MEM USAGE / LIMIT       MEM %               NET I/O               BLOCK I/O             PIDS
8c4a99ed0f4c        0.14%               307.8 MiB / 31.32 GiB   0.96%               1.023 GB / 2.93 GB    0 B / 1.769 MB        0
6a9a28389370        0.09%               209.5 MiB / 31.32 GiB   0.65%               12.73 MB / 17.55 MB   0 B / 1.769 MB        0
2065a5122238        0.00%               455.7 MiB / 31.32 GiB   1.42%               18.2 MB / 14.03 MB    557.1 kB / 1.769 MB   0
a859b884a9a4        0.12%               188.7 MiB / 31.32 GiB   0.59%               884.1 MB / 531.6 MB   2.335 MB / 1.786 MB   0
cdd35eae5260        0.06%               122.7 MiB / 31.32 GiB   0.38%               3.147 GB / 3.795 GB   3.727 MB / 23.39 GB   0

@cantino
Member

cantino commented Jan 22, 2017

How high does your RAM usage get, @HyShai? I know that Ruby's generational garbage collector can take a while to reach a steady state, which is always higher than where it starts.
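One cheap way to tell a real leak from a not-yet-steady heap is to log the GC counters over time; a sketch (the exact GC.stat key names vary slightly between Ruby versions):

loop do
  s = GC.stat
  # an old_objects count that only ever grows across many GC cycles suggests a real leak
  puts "live slots: #{s[:heap_live_slots]}  old objects: #{s[:old_objects]}"
  sleep 300
end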
