Eye severe memory leak #204

Open · innovia opened this issue Nov 29, 2017 · 19 comments

innovia commented Nov 29, 2017

Hi Konstantin,

I have Eye 0.9.2 installed on Ubuntu 14.04.

The monitored processes are:

reDash
  celery_scheduler ................ up  (Nov26, 0%, 74Mb, <1290>)
    child-1303 .................... up  (Nov26, 0%, 67Mb, <1303>)
    child-25635 ................... up  (15:33, 0%, 87Mb, <25635>)
    child-25953 ................... up  (15:37, 0%, 80Mb, <25953>)
  celery_worker ................... up  (Nov26, 1%, 72Mb, <1127>)
    child-25875 ................... up  (15:36, 0%, 85Mb, <25875>)
    child-25946 ................... up  (15:37, 0%, 85Mb, <25946>)
    child-25947 ................... up  (15:37, 0%, 85Mb, <25947>)
    child-25948 ................... up  (15:37, 0%, 80Mb, <25948>)
  flower .......................... up  (Nov26, 0%, 197Mb, <1132>)
  gunicorn ........................ up  (Nov26, 0%, 14Mb, <1304>)
    child-10761 ................... up  (12:30, 0%, 80Mb, <10761>)
    child-12652 ................... up  (12:54, 0%, 85Mb, <12652>)
    child-13920 ................... up  (13:10, 0%, 81Mb, <13920>)
    child-15800 ................... up  (13:34, 0%, 81Mb, <15800>)
    child-23143 ................... up  (15:08, 0%, 79Mb, <23143>)
    child-27819 ................... up  (09:14, 0%, 96Mb, <27819>)
    child-28854 ................... up  (09:28, 0%, 83Mb, <28854>)
    child-29043 ................... up  (09:30, 0%, 81Mb, <29043>)
    child-29362 ................... up  (09:34, 0%, 81Mb, <29362>)
  nginx ........................... up  (Nov26, 0%, 4Mb, <1372>)
    child-1374 .................... up  (Nov26, 0%, 4Mb, <1374>)
    child-1375 .................... up  (Nov26, 0%, 4Mb, <1375>)
    child-1376 .................... up  (Nov26, 0%, 4Mb, <1376>)
    child-1377 .................... up  (Nov26, 0%, 4Mb, <1377>)
  postgresql ...................... up  (15:28, 0%, 188Mb, <25069>)
    child-25071 ................... up  (15:28, 0%, 53Mb, <25071>)
    child-25072 ................... up  (15:28, 0%, 50Mb, <25072>)
    child-25073 ................... up  (15:28, 0%, 1Mb, <25073>)
    child-25074 ................... up  (15:28, 0%, 2Mb, <25074>)
    child-25075 ................... up  (15:28, 0%, 1Mb, <25075>)
    child-25181 ................... up  (15:29, 0%, 9Mb, <25181>)
    child-25200 ................... up  (15:29, 0%, 9Mb, <25200>)
    child-25233 ................... up  (15:30, 0%, 9Mb, <25233>)
    child-25268 ................... up  (15:30, 0%, 11Mb, <25268>)
    child-25371 ................... up  (15:30, 0%, 9Mb, <25371>)
    child-25377 ................... up  (15:30, 0%, 9Mb, <25377>)
    child-25540 ................... up  (15:32, 0%, 9Mb, <25540>)
    child-25541 ................... up  (15:32, 0%, 9Mb, <25541>)
    child-25624 ................... up  (15:33, 0%, 9Mb, <25624>)
    child-25639 ................... up  (15:33, 0%, 14Mb, <25639>)
    child-25876 ................... up  (15:36, 0%, 13Mb, <25876>)
    child-25949 ................... up  (15:37, 0%, 13Mb, <25949>)
    child-25950 ................... up  (15:37, 0%, 13Mb, <25950>)
  redis ........................... up  (Nov26, 128%, 10Mb, <1159>)

They add up to ~1500 MB in total,

but Eye itself is taking 18 GB!

root 1087 0.5 54.3 25626424 17915500 ? Sl Nov26 26:40 eye monitoring v0.9.2 [reDash] (in /home/ubuntu)

You can see in this screenshot that the memory is leaking:

(screenshot: Eye memory usage climbing steadily, taken 2017-11-29, 6:16 PM)

Any idea how to track down this memory leak?

kostya commented Nov 29, 2017

Can you show the eye x output when it is at 18 GB?

innovia commented Nov 29, 2017

OK, I had to restart it; it will happen again.

Here's the eye x info now, after the restart:

about:     Eye v0.9.2 (c) 2012-2016 @kostya
resources: 16:19, 0%, 115Mb, <18658>
ruby:      ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]
gems:      ["Celluloid=0.17.3", "Celluloid::IO=0.17.3", "StateMachines=0.5.0", "NIO=2.1.0", "Timers=4.1.2", "Sigar=1.7.0.0"]
logger:    /opt/redash/logs/eye.log
home:      /root
dir:       /var/run/eye
pid_path:  /var/run/eye/pid
sock_path: /var/run/eye/sock
actors:    [["Eye::ChildProcess", 33], ["Eye::Process", 7], ["Eye::Group", 1], ["Eye::Controller", 1], ["Eye::SystemResources::Cache", 1], ["Eye::Server", 1], ["Celluloid::Supervision::Service::Public", 1], ["Celluloid::IncidentReporter", 1], ["Celluloid::Notifications::Fanout", 1], ["Celluloid::Supervision::Service::Root", 1]]

kostya commented Nov 29, 2017

By the way, I don't think you need to monitor children for postgres or nginx; that's quite a strange use case.

innovia commented Nov 29, 2017

I just wanted to see how much memory each of the processes is taking, but now it seems like Eye itself is the cause of the memory leak.

Once we stabilize it I'll remove those children.

innovia commented Nov 30, 2017

Here it is after a day: 6 GB!

root@reDash:~# eye x
about:     Eye v0.9.2 (c) 2012-2016 @kostya
resources: Nov29, 0%, 6093Mb, <18658>
ruby:      ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]
gems:      ["Celluloid=0.17.3", "Celluloid::IO=0.17.3", "StateMachines=0.5.0", "NIO=2.1.0", "Timers=4.1.2", "Sigar=1.7.0.0"]
logger:    /opt/redash/logs/eye.log
home:      /root
dir:       /var/run/eye
pid_path:  /var/run/eye/pid
sock_path: /var/run/eye/sock
actors:    [["Eye::ChildProcess", 32], ["Eye::Process", 7], ["Eye::Group", 1], ["Eye::Controller", 1], ["Eye::SystemResources::Cache", 1], ["Eye::Server", 1], ["Celluloid::Supervision::Service::Public", 1], ["Celluloid::IncidentReporter", 1], ["Celluloid::Notifications::Fanout", 1], ["Celluloid::Supervision::Service::Root", 1]]

kostya commented Nov 30, 2017

This is really strange; I don't see any leak in the actors,
so it's hard to say where it could be.
Can you show your full config? Maybe you have some global variables that never get cleaned up (a sketch of that kind of pattern follows below).
You have only 7 processes; I have an eye process that has been running for half a year with 156 processes.
There is also a difference in the Ruby version, and in the StateMachines, NIO, and Timers gems (which could potentially leak).

about:     Eye v0.9.1 (c) 2012-2016 @kostya
resources: Apr26, 0%, 106Mb, <32291>
ruby:      ruby 1.9.3p484 (2013-11-22) [x86_64-linux]
gems:      ["Celluloid=0.17.3", "Celluloid::IO=0.17.1", "StateMachines=0.4.0", "NIO=1.1.1", "Timers=4.1.1", "Sigar=1.7.0.0"]
logger:    /projects/client/log/eye.log
home:      /home/deploy
dir:       /home/deploy/.eye
pid_path:  /home/deploy/.eye/pid
sock_path: /home/deploy/.eye/sock
actors:    [["Eye::Process", 156], ["Eye::Group", 29], ["Eye::Server", 1], ["Eye::SystemResources::Cache", 1], ["Eye::Controller", 1], ["Celluloid::Supervision::Service::Public", 1], ["Celluloid::IncidentReporter", 1], ["Celluloid::Notifications::Fanout", 1], ["Celluloid::Supervision::Service::Root", 1]]
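
To illustrate the "global variables" point above, here is a minimal, purely hypothetical sketch; none of these names appear in the config in this issue. Config files loaded with Eye.load are evaluated inside the eye daemon, so any top-level object that keeps growing inflates the daemon's own memory rather than the monitored processes'.

# Hypothetical illustration only; $sent_alerts and record_alert are made-up
# names, not taken from the real config. Top-level state in an eye config
# lives for the whole lifetime of the daemon process.
$sent_alerts = []                  # global array; never garbage-collected

def record_alert(message)
  $sent_alerts << message          # appended to on every call, never trimmed,
end                                # so the daemon's RSS climbs over time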

innovia commented Nov 30, 2017

I'm trying to reinstall the server on Ubuntu 16.04; I'll let you know.

kostya commented Nov 30, 2017

I re-ran my eye with:

ruby:      ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
gems:      ["Celluloid=0.17.3", "Celluloid::IO=0.17.3", "StateMachines=0.5.0", "NIO=2.1.0", "Timers=4.1.2", "Sigar=1.7.0.0"]

so we'll see whether there is a leak or not.

kostya commented Dec 1, 2017

I don't see any leaks with the new gems and Ruby 2.2.2, so maybe the problem is in your config.

innovia commented Dec 4, 2017

Here's my config:

#!/usr/bin/env ruby
Eye.load '/etc/eye/mailer.rb' # mailer set params (like variables)
Eye.load '/etc/eye/cloudwatch.rb'
Eye.load '/etc/eye/config.rb' # config assign params values

Eye.application :reDash do
  working_dir "/opt/redash/current"

  load_env "/root/.env"

  trigger :flapping, times: 3, within: 1.minute, retry_in: 5.minutes
  notify :by_email, :info
  notify :cloudwatch, :info

  process(:postgresql) do
    pid_file "/var/run/postgresql/9.3-main.pid"
    stdall "/opt/redash/logs/postgresql"
    start_command "service postgresql start"
    stop_command "service postgresql stop"
    restart_command "service postgresql restart"
  end

  process(:nginx) do
    depend_on :gunicorn
    pid_file "/var/run/nginx.pid"
    stdall "/opt/redash/logs/nginx.log"
    start_command "/usr/sbin/nginx"
    stop_signals [:QUIT, 30.seconds, :TERM, 15.seconds, :KILL]
    restart_command "kill -HUP {PID}"
    daemonize true
  end

  process(:redis) do
    pid_file "/var/run/redis.pid"
    stdall "/opt/redash/logs/redis.log"
    start_command "/usr/local/bin/redis-server /etc/redis/6379.conf"
    stop_signals [:TERM, 30.seconds, :QUIT]
    restart_command  "kill -HUP {{PID}}"
    daemonize true
  end

  process(:gunicorn) do
    uid 'redash'
    gid 'nogroup'
    depend_on :redis
    pid_file "/var/run/gunicorn/gunicorn.pid"
    stdall "/opt/redash/logs/gunicorn.log"
    start_command "gunicorn -b unix:///var/run/gunicorn/gunicorn.sock --name redash -w 9 --max-requests 1000 redash.wsgi:app"
    stop_signals [:TERM, 30.seconds, :QUIT]
    restart_command  "kill -HUP {{PID}}"
    daemonize true
    monitor_children do
      stop_command "kill -TERM {PID}"
      check :cpu, :every => 30, :below => 80, :times => 3
      check :memory, :every => 30, :below => 350.megabytes, :times => [3,5]
    end
  end

  process(:flower) do
    uid 'redash'
    gid 'nogroup'
    pid_file "/var/run/celery/flower.pid"
    stdall "/opt/redash/logs/flower.log"
    start_command "celery flower -A redash.worker --logging=debug --broker=redis://127.0.0.1:6379/0 --broker_api=redis://127.0.0.1:6379/0 --address=0.0.0.0 --port=5555 --persistent"
    stop_signals [:TERM, 30.seconds, :QUIT]
    restart_command  "kill -HUP {{PID}}"
    check :cpu, :every => 30, :below => 80, :times => 3
    check :memory, :every => 30, :below => 250.megabytes, :times => [3,5]
    daemonize true
  end

  process(:celery_scheduler) do
    uid 'redash'
    gid 'nogroup'
    pid_file "/var/run/celery/celery_schedule_worker.pid"
    stdall "/opt/redash/logs/celery_schedule_worker.log"
    start_command "celery worker --app=redash.worker --beat --concurrency=4 --queues=queries,celery --maxtasksperchild=100 --events -Ofair --autoscale=10,4 -n redash_celery_scheduled@%h"
    stop_signals [:TERM, 30.seconds, :QUIT]
    restart_command  "kill -HUP {{PID}}"
    daemonize true
    monitor_children do
      stop_command "kill -TERM {PID}"
      check :cpu, :every => 30, :below => 80, :times => 3
      check :memory, :every => 30, :below => 512.megabytes, :times => [3,5]
    end
  end

  process(:celery_worker) do
    uid 'redash'
    gid 'nogroup'
    pid_file "/var/run/celery/celery_worker.pid"
    stdall "/opt/redash/logs/celery_worker.log"
    start_command "celery worker --app=redash.worker --concurrency=4 --queues=scheduled_queries --maxtasksperchild=100 --events -Ofair  --autoscale=10,4 -n redash_celery_worker@%h"
    stop_signals [:TERM, 30.seconds, :QUIT]
    restart_command  "kill -HUP {{PID}}"
    daemonize true
    monitor_children do
      stop_command "kill -TERM {PID}"
      check :cpu, :every => 30, :below => 80, :times => 3
      check :memory, :every => 30, :below => 400.megabytes, :times => [3,5]
    end
  end
end

kostya commented Dec 4, 2017

I don't see anything bad in the config. I think we should try 3 things:

  1. I'll try Ruby 2.4.1 (since I only tried 2.2.2).
  2. You try removing monitor_children and see if the leak is still there.
  3. You try removing depend_on and see if the leak is still there.

By the way, using restart_command together with daemonize true is quite strange. For example, unicorn is also driven with restart_command, but it runs with daemonize false, because the combination can potentially cause problems with restarts; a sketch of that pattern follows.
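
A minimal sketch of the pattern described above, with assumed paths, app name, and signals (this is not the reDash config): the process daemonizes itself and writes its own pid file, eye supervises it with daemonize false, and restart_command only sends a graceful-reload signal.

# Sketch only; every path and command here is an assumption, not from this issue.
Eye.application :example_app do
  process(:unicorn) do
    working_dir "/var/www/example_app"
    pid_file "/var/www/example_app/tmp/pids/unicorn.pid"
    start_command "bundle exec unicorn -D -c config/unicorn.rb"
    stop_signals [:QUIT, 30.seconds, :TERM, 10.seconds, :KILL]
    restart_command "kill -USR2 {PID}"   # graceful reload handled by unicorn itself
    daemonize false                      # unicorn daemonizes itself via -D
  end
end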

innovia commented Dec 4, 2017

I'm trying to downgrade to 2.2.2.

I'll do what you suggested.

innovia commented Dec 6, 2017

Downgrading to 2.2.2 and disabling all of these dependencies and children stabilized it at 100 MB.

I'll try adding back the children monitoring.

kostya commented Dec 7, 2017

How is it going?

innovia commented Dec 7, 2017

It's growing.

It's now at ~700 MB:

eye x
about: Eye v0.9.2 (c) 2012-2016 @kostya
resources: Dec04, 0%, 776Mb, <1089>
ruby: ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
gems: ["Celluloid=0.17.3", "Celluloid::IO=0.17.3", "StateMachines=0.5.0", "NIO=2.1.0", "Timers=4.1.2", "Sigar=1.7.0.0"]
logger: /opt/redash/logs/eye.log
home: /home/ubuntu
dir: /var/run/eye
pid_path: /var/run/eye/pid
sock_path: /var/run/eye/sock
actors: [["Eye::ChildProcess", 18], ["Eye::Process", 7], ["Eye::Notify::Mail", 1], ["Eye::Group", 1], ["Eye::Controller", 1], ["Eye::SystemResources::Cache", 1], ["Eye::Server", 1], ["Celluloid::Supervision::Service::Public", 1], ["Celluloid::IncidentReporter", 1], ["Celluloid::Notifications::Fanout", 1], ["Celluloid::Supervision::Service::Root", 1]]

kostya commented Dec 7, 2017

OK. Can you test monitor_children and depend_on separately, to see where the bug is?

innovia commented Dec 7, 2017

I didn't add depend_on at all.

kostya commented Dec 7, 2017

So the bug is in monitor_children; remove all of it (see the example below) and it just works without leaks.
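
For reference, this is the gunicorn entry from the config posted above with the monitor_children block removed (and depend_on left out, as innovia had already done); everything else is unchanged:

  process(:gunicorn) do
    uid 'redash'
    gid 'nogroup'
    pid_file "/var/run/gunicorn/gunicorn.pid"
    stdall "/opt/redash/logs/gunicorn.log"
    start_command "gunicorn -b unix:///var/run/gunicorn/gunicorn.sock --name redash -w 9 --max-requests 1000 redash.wsgi:app"
    stop_signals [:TERM, 30.seconds, :QUIT]
    restart_command  "kill -HUP {{PID}}"
    daemonize true
    # monitor_children block removed; if limits are still wanted, a check can be
    # attached to the parent process instead, as the flower entry already does:
    # check :memory, :every => 30, :below => 350.megabytes, :times => [3, 5]
  end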

kostya commented Dec 8, 2017

I confirm that the leak is in monitor_children; I'll try to fix it.
