Crashing with hundreds of processes #233

Open
rgaufman opened this issue Mar 8, 2020 · 13 comments


@rgaufman

rgaufman commented Mar 8, 2020

I'm trying to monitor around 400 processes. eye jumps to 100% CPU on one core, and then threads start crashing:

2020-03-08 20:19:12.603496 E [91602:70218487656520 logger.rb:53] eye -- [celluloid] thread crashed
Celluloid::TaskTerminated: task was terminated
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:35:in `terminate'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:323:in `block in cleanup'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:321:in `each'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:321:in `cleanup'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:307:in `shutdown'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:169:in `run'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:131:in `block in start'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-essentials-0.20.5/lib/celluloid/internals/thread_handle.rb:14:in `block in initialize'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor/system.rb:78:in `block in get_thread'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/group/spawner.rb:50:in `block in instantiate'
	(celluloid):0:in `remote procedure call'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/call/sync.rb:45:in `value'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/proxy/sync.rb:22:in `method_missing'
	/data/deployer/timeagent/eye/helpers.rb:195:in `process_started?'
	/data/deployer/timeagent/eye/helpers.rb:205:in `block in wait_for_process'
	/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/trigger.rb:103:in `instance_exec'
	/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/trigger.rb:103:in `exec_proc'
	/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/trigger/starting_guard.rb:28:in `block in check_start'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/calls.rb:28:in `public_send'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/calls.rb:28:in `dispatch'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/call/sync.rb:16:in `dispatch'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/future.rb:18:in `block in new'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-essentials-0.20.5/lib/celluloid/internals/thread_handle.rb:14:in `block in initialize'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor/system.rb:78:in `block in get_thread'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/group/spawner.rb:50:in `block in instantiate'
2020-03-08 20:19:12.603970 E [91602:70216306749380 logger.rb:53] eye -- [celluloid] thread crashed

I'm also seeing errors like this:

2020-03-08 10:25:15.579624 E [42676:70134578180800 logger.rb:53] eye -- [celluloid] Actor crashed!
Celluloid::DeadActorError: attempted to call a dead actor: proc_cpu
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/proxy/sync.rb:9:in `method_missing'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/system_resources.rb:26:in `start_time'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/system.rb:44:in `compare_identity'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/monitor.rb:84:in `check_identity'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:16:in `block in add_watchers'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:48:in `block in add_watcher'
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'

and:

2020-03-08 10:25:15.601780 E [42676:70134578814120 logger.rb:53] eye -- [celluloid] Actor crashed!
Celluloid::DeadActorError: attempted to call a dead actor: proc_cpu
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/proxy/sync.rb:9:in `method_missing'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/system_resources.rb:26:in `start_time'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/system.rb:44:in `compare_identity'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/monitor.rb:84:in `check_identity'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:16:in `block in add_watchers'
        /data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:48:in `block in add_watcher'
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'
        /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'

This is an AMD EPYC 7451 24-core CPU, so it can handle a lot more; with all processes running it's only 10% loaded. Are there any parameters I can tune in eye to handle this many processes?

@rgaufman rgaufman changed the title Issue with hundreds of processes Crashing with hundreds of processes Mar 8, 2020
@kostya
Owner

kostya commented Mar 8, 2020

I have not tested with this many processes, but ~100 was OK. Ruby uses one core for its concurrency, so 100% CPU usage is possible, though it still should not happen. Does the error repeat if you restart eye? As a workaround, you can try running multiple eyes, each with its own group of processes, in different folders (local eye - leye).
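
As a rough illustration of the workaround above, the process list could be partitioned into fixed-size groups, one group per eye instance. This is only a sketch: `split_into_groups`, the `recorder_N` names and the group size of 100 (taken from the "~100 was OK" remark) are assumptions, and it does not show how each leye instance would be configured.

```ruby
# Hypothetical helper: split process names into fixed-size groups,
# one group per local eye (leye) instance.
def split_into_groups(process_names, group_size)
  raise ArgumentError, 'group_size must be positive' unless group_size.positive?

  process_names.each_slice(group_size).to_a
end

# Placeholder names standing in for the real process list.
names = (1..400).map { |i| "recorder_#{i}" }
groups = split_into_groups(names, 100)

groups.each_with_index do |group, index|
  puts "eye instance #{index}: #{group.size} processes"
end
```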

@rgaufman
Author

rgaufman commented Mar 8, 2020

It's just constantly hovering at 100% and not responding to bundle exec eye i. I can't really try multiple eye processes without re-architecting how the app works.

Is there something I can adjust to reduce how often each process is checked? Anything else you can think of to help reduce the load with this many processes?

I temporarily reduced the number of processes to 300 and now it's taking 17 to 30% CPU - it seems it hits some kind of threshold and then everything stops working.

@kostya
Owner

kostya commented Mar 8, 2020

To minimize load, you can remove the cpu and memory checks. Also disable the identity check with check_identity: false, and increase the check_alive period: check_alive_period: 30.seconds

You can also try increasing the expiry of the cache that holds the cpu and memory info read from the OS:

Eye::SystemResources.cache.setup_expire(30)
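
To show why a longer expiry reduces load, here is a minimal sketch of an expiring cache. It is not eye's implementation; the `ExpiringCache` class and its `clock:` parameter are made up for illustration. A value is recomputed only when the cached entry is older than the expiry, so a larger expiry means fewer reads from the OS.

```ruby
# Illustrative expiring cache; NOT eye's actual implementation.
class ExpiringCache
  def initialize(expire_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @expire_seconds = expire_seconds
    @clock = clock
    @entries = {}
  end

  # Return the cached value for key, running the block again only when
  # the entry is missing or older than expire_seconds.
  def fetch(key)
    now = @clock.call
    entry = @entries[key]
    return entry[:value] if entry && now - entry[:at] < @expire_seconds

    value = yield
    @entries[key] = { value: value, at: now }
    value
  end
end

expensive_reads = 0
cache = ExpiringCache.new(30)
2.times { cache.fetch(:cpu) { expensive_reads += 1 } }
puts expensive_reads # prints 1: the second fetch was served from the cache
```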

@rgaufman
Author

I tried all of those things and I'm still getting this:

2020-03-11 23:40:36.466493 E [79975:70006573067020 logger.rb:53] eye -- [recorder_5e65f1c2a79dfa42c5f0b87b:5e65f1c2a79dfa42c5f0b87b:live] check:cpu(<100%) Exception: attempted to call a dead actor: proc_cpu ["/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/proxy/sync.rb:9:in `method_missing'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/system_resources.rb:15:in `cpu'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker/cpu.rb:10:in `get_value'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:136:in `get_value_safe'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:108:in `check'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:66:in `watcher_tick'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:48:in `block in add_watcher'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'"]
2020-03-11 23:40:36.545344 E [79975:70006572737640 logger.rb:53] eye -- [recorder_5e65f1cfa79dfa42c5f0b8aa:5e65f1cfa79dfa42c5f0b8aa:proxy] check:memory(<300Mb) Exception: undefined method `proc_mem' for nil:NilClass ["/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/system_resources.rb:9:in `memory'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker/memory.rb:10:in `get_value'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:136:in `get_value_safe'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:108:in `check'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:66:in `watcher_tick'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:48:in `block in add_watcher'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'"]
2020-03-11 23:40:36.655136 E [79975:70006577492940 logger.rb:53] eye -- [recorder_5e63325716a95a181c898f6c:5e63325716a95a181c898f6c:detector] process <17218> not found, it may have crashed (you should check the process logs ["/data/deployer/timeagent/log/recorder/detector-5e63325716a95a181c898f6c.log", "/data/deployer/timeagent/log/recorder/detector-5e63325716a95a181c898f6c.log"])
2020-03-11 23:40:36.655316 E [79975:70006577492940 logger.rb:53] eye -- [recorder_5e63325716a95a181c898f6c:5e63325716a95a181c898f6c:detector] process <17218> failed to start (:not_really_running)
2020-03-11 23:40:36.664231 E [79975:70006572911200 logger.rb:53] eye -- [recorder_5e65f1c7a79dfa42c5f0b889:5e65f1c7a79dfa42c5f0b889:detector] check:cpu(<100%) Exception: undefined method `proc_cpu' for nil:NilClass ["/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/system_resources.rb:15:in `cpu'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker/cpu.rb:10:in `get_value'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:136:in `get_value_safe'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:108:in `check'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:66:in `watcher_tick'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:48:in `block in add_watcher'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'"]
2020-03-11 23:40:36.668949 E [79975:70006573728220 logger.rb:53] eye -- [recorder_5e633a3a4caa415b13ddccef:5e633a3a4caa415b13ddccef:proxy] check:cpu(<30%) Exception: undefined method `proc_cpu' for nil:NilClass ["/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/system_resources.rb:15:in `cpu'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker/cpu.rb:10:in `get_value'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:136:in `get_value_safe'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:108:in `check'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:66:in `watcher_tick'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:48:in `block in add_watcher'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'"]
2020-03-11 23:40:36.673219 E [79975:70006573105240 logger.rb:53] eye -- [recorder_5e6349775f439f29fd773901:5e6349775f439f29fd773901:recorder] check:cpu(<100%) Exception: undefined method `proc_cpu' for nil:NilClass ["/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/system_resources.rb:15:in `cpu'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker/cpu.rb:10:in `get_value'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:136:in `get_value_safe'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/checker.rb:108:in `check'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:66:in `watcher_tick'", "/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/process/watchers.rb:48:in `block in add_watcher'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:339:in `block in task'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task.rb:44:in `block in initialize'", "/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:14:in `block in create'"]
2020-03-11 23:40:36.722170 E [79975:70006578352780 logger.rb:53] eye -- [celluloid] thread crashed
Celluloid::TaskTerminated: task was terminated
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/task/fibered.rb:35:in `terminate'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:323:in `block in cleanup'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:321:in `each'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:321:in `cleanup'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:307:in `shutdown'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:169:in `run'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor.rb:131:in `block in start'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-essentials-0.20.5/lib/celluloid/internals/thread_handle.rb:14:in `block in initialize'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor/system.rb:78:in `block in get_thread'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/group/spawner.rb:50:in `block in instantiate'
	(celluloid):0:in `remote procedure call'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/call/sync.rb:45:in `value'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/proxy/sync.rb:22:in `method_missing'
	/data/deployer/timeagent/eye/helpers.rb:195:in `process_started?'
	/data/deployer/timeagent/eye/helpers.rb:205:in `block in wait_for_process'
	/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/trigger.rb:103:in `instance_exec'
	/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/trigger.rb:103:in `exec_proc'
	/data/deployer/timeagent/vendor/cache/eye-7b8f32d1cdd4/lib/eye/trigger/starting_guard.rb:28:in `block in check_start'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/calls.rb:28:in `public_send'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/calls.rb:28:in `dispatch'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/call/sync.rb:16:in `dispatch'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/future.rb:18:in `block in new'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-essentials-0.20.5/lib/celluloid/internals/thread_handle.rb:14:in `block in initialize'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/actor/system.rb:78:in `block in get_thread'
	/data/deployer/timeagent/vendor/bundle/ruby/2.6.0/gems/celluloid-0.17.4/lib/celluloid/group/spawner.rb:50:in `block in instantiate'
2020-03-11 23:40:36.722753 E [79975:70006578844600 logger.rb:53] eye -- [celluloid] thread crashed
Celluloid::TaskTerminated: task was terminated

This is what's in top:

Tasks: 1609 total,   8 running, 1070 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  6.1 sy, 12.3 ni, 80.6 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st
KiB Mem : 65649456 total,  8356892 free, 15504224 used, 41788340 buff/cache
KiB Swap:  8388604 total,  8375792 free,    12812 used. 49448092 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
18519 deployer  35  15 45.117g 158928  12396 S 101.9  0.2   1:26.30 eye monitoring v0.10.1.pre [recorder_5e63325316a95a181c898f5d, recorder_5e63325416a95
13808 deployer  30  10 3291700 342060  14968 S  43.2  0.5   6:21.70 puma 4.3.1 (tcp://127.0.0.1:3001) [timeagent]
69395 deployer  35  15  171752  23540   5984 S  32.6  0.0   0:01.01 ruby2.6 /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/bin/eye xinfo
69712 deployer  35  15  171756  23584   6008 S  31.6  0.0   0:00.98 ruby2.6 /data/deployer/timeagent/vendor/bundle/ruby/2.6.0/bin/eye i -j
42865 deployer  35  15 8825944 495948  15276 S  20.3  0.8   3:40.19 sidekiq 6.0.5 timeagent [2 of 8 busy]
92068 deployer  35  15 13.847g 228424  35692 S  19.7  0.3   0:25.49 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e633a384caa415b13ddcce6 --listen_port 10738 --brand Te+
96621 deployer  35  15 13.846g 231320  35640 S  19.4  0.4   0:24.70 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e636e34a79dfa1d11329522 --listen_port 10810 --brand Te+
96309 deployer  35  15 13.846g 228176  35408 S  18.7  0.3   0:24.56 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e636e33a79dfa1d1132951c --listen_port 10807 --brand Te+
95581 deployer  35  15 13.846g 226416  35760 S  17.7  0.3   0:27.16 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e6349755f439f29fd7738f7 --listen_port 10792 --brand Te+
91260 deployer  35  15 13.847g 230768  35684 S  17.4  0.4   0:25.22 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e633a364caa415b13ddccd9 --listen_port 10723 --brand Te+
93139 deployer  35  15 13.847g 224536  36116 S  17.4  0.3   0:24.98 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e6349715f439f29fd7738d6 --listen_port 10759 --brand Te+
96870 deployer  35  15 13.846g 230900  35404 S  15.5  0.4   0:24.11 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e636e37a79dfa1d11329529 --listen_port 10813 --brand Te+
92506 deployer  35  15 13.847g 235220  35480 S  14.8  0.4   0:23.46 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e633a3b4caa415b13ddccf6 --listen_port 10756 --brand Te+
93298 deployer  35  15 7508252 183024  35376 S  14.5  0.3   0:19.63 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e6349725f439f29fd7738df --listen_port 10765 --brand Te+
96120 deployer  35  15 13.847g 228884  35860 S  14.5  0.3   0:26.14 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e6349775f439f29fd773901 --listen_port 10801 --brand Te+
90826 deployer  35  15 13.847g 224616  35204 S  13.9  0.3   0:24.04 python3 /data/deployer/timeagent/vendor/xanDetector/xanDetector.py --camera_id 5e633a364caa415b13ddccd5 --listen_port 10717 --brand Te+
...

I've added

Eye::SystemResources.cache.setup_expire(30)
Eye.application "recorder_#{camera[:camera_id]}" do
   ...
  check_identity false
  check_alive_period 30.seconds
end

Any other ideas/suggestions?

@kostya
Owner

kostya commented Mar 12, 2020

Hard to say; it looks like it may be a celluloid bug. The error undefined method 'proc_cpu' for nil:NilClass should be impossible, because proc_cpu is called on a global object that cannot be nil: it is created when eye starts. Maybe there was an earlier error in the log, after which undefined method 'proc_cpu' for nil:NilClass started to appear.

@rgaufman
Author

I have been trying to play with this without success :(

I think I am going to have to come up with another strategy. Maybe I'll create a simple layer over systemd, where adding an application creates .service files and reloads the systemd daemon, kind of like what the whenever gem does with cron.

From there, the monitor will just be a single loop that queries systemd, checks CPU/RAM and performs the required systemd action. This way, each iteration will just take a bit longer as the number of processes grows, without adding additional load.
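
A minimal sketch of one pass of such a loop, under some assumptions: it reads the `ActiveState` and `MemoryCurrent` properties via `systemctl show` (the latter is only populated when memory accounting is enabled for the unit), and the helper names, the memory-only policy and the injected `runner` are all hypothetical rather than anything from this thread.

```ruby
# Parse the KEY=VALUE lines printed by `systemctl show <unit> --property=...`.
def parse_show_output(text)
  text.each_line.with_object({}) do |line, props|
    key, value = line.chomp.split('=', 2)
    props[key] = value unless key.nil? || value.nil?
  end
end

# Decide what to do with a unit given its properties and a memory limit.
def action_for(props, max_memory_bytes)
  return :start unless props['ActiveState'] == 'active'

  # MemoryCurrent may be missing or "[not set]"; treat that as unknown.
  memory = Integer(props['MemoryCurrent'].to_s, exception: false)
  return :restart if memory && memory > max_memory_bytes

  :none
end

# One pass over all units. `runner` takes a command as separate arguments
# and returns its stdout, so the pass can be exercised without systemd.
def check_units(units, max_memory_bytes, runner:)
  units.map do |unit|
    output = runner.call('systemctl', 'show', unit, '--property=ActiveState,MemoryCurrent')
    [unit, action_for(parse_show_output(output), max_memory_bytes)]
  end.to_h
end
```

On a real host the runner could be something like `->(*cmd) { IO.popen(cmd, &:read) }`, with the returned actions mapped to `systemctl start` or `systemctl restart` calls and the pass repeated after a sleep.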

@kostya
Owner

kostya commented Mar 15, 2020

Yeah, Ruby is just bad at high concurrency. Try to split the processes across multiple eyes or something else; I don't know what to fix here.

@TeresaP

TeresaP commented Jul 20, 2021

Have you made progress on this by chance @kostya?

@kostya
Owner

kostya commented Jul 20, 2021

No, try all the advice in this thread.

@rgaufman
Author

rgaufman commented Jul 13, 2022

I ended up switching to systemd with a simple ruby script to manage the processes. It was actually relatively easy, easier than I expected. It's just an erb template for the .service file, then adding/removing the files as needed and doing a systemctl enable and systemctl daemon-reload. It's doing most of the monitoring eye does, but taking up virtually 0% CPU vs the 100%+ eye was taking. It is also easy to add/remove dependencies, and systemd just handles it all for you.
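
For readers who have not generated unit files this way, the erb template mentioned above might look roughly like the sketch below. The template contents, `render_service` and the example command are assumptions for illustration, not the code from this application.

```ruby
require 'erb'

# Hypothetical unit template; real units would likely set more options
# (User=, WorkingDirectory=, dependencies, resource limits).
SERVICE_TEMPLATE = <<~UNIT
  [Unit]
  Description=<%= description %>

  [Service]
  ExecStart=<%= exec_start %>
  Restart=always

  [Install]
  WantedBy=multi-user.target
UNIT

# Render the unit file text for one process.
def render_service(description:, exec_start:)
  ERB.new(SERVICE_TEMPLATE).result_with_hash(description: description, exec_start: exec_start)
end

puts render_service(description: 'example worker', exec_start: '/usr/bin/env ruby /path/to/worker.rb')
```

The rendered text would then be written to a .service file before running systemctl daemon-reload and systemctl enable.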

The code is specific to my application, but some examples of how it works are:

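# Stop and disable the unit, delete its .service file, then reload systemd.
# Arguments are passed to system separately so they never go through a shell.
def remove(service_name, service_path)
  system('sudo', 'systemctl', 'stop', service_name)
  system('sudo', 'systemctl', 'disable', service_name)
  system('sudo', 'rm', service_path)
  system('sudo', 'systemctl', 'daemon-reload')
end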

It's similar for add, except you need to generate the file from an erb template. For checking status, you parse the output of systemctl status process-name. It is working like a dream, and starting/stopping/restarting is lightning fast.
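
One possible way to do the status parsing is to pull the state out of the "Active:" line. This is a sketch only: `parse_active_line` is a hypothetical helper, and the input in the usage line is a hand-written sample rather than captured output.

```ruby
# Extract the state from the "Active:" line of `systemctl status` output.
# Returns e.g. ["active", "running"], or nil when no such line is found.
def parse_active_line(status_output)
  match = status_output.match(/^\s*Active:\s+(\S+)(?:\s+\(([^)]+)\))?/)
  match && [match[1], match[2]]
end

# Hand-written sample line, for illustration only.
p parse_active_line('   Active: active (running) since ...')
```

The `systemctl status` output is formatted for humans, so `systemctl is-active` or `systemctl show` may be steadier inputs for a script.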

@grimm26
Contributor

grimm26 commented Jul 13, 2022

@rgaufman With systemd, why use eye at all?

@rgaufman
Author

rgaufman commented Jul 13, 2022

I removed eye from our application. It's fantastic for managing a smaller number of processes, but it seems celluloid didn't work well when it came to 100+ processes. Even with a smaller number of processes, we now just script systemd, as it's much more resource efficient and has some other advantages like better dependency handling.

@grimm26
Contributor

grimm26 commented Jul 13, 2022

@rgaufman do you have anything to share on this on GitHub?
