[telemetry] Limit resource usage of ST and disable OOM killer #10017

Closed
3 changes: 3 additions & 0 deletions rules/docker-telemetry.mk
@@ -29,6 +29,9 @@ endif

$(DOCKER_TELEMETRY)_CONTAINER_NAME = telemetry
$(DOCKER_TELEMETRY)_RUN_OPT += --privileged -t
$(DOCKER_TELEMETRY)_RUN_OPT += --memory 450m
Contributor

Please make sure the following scenario cannot happen:

  1. memory_checker seems to run inside the telemetry container, and its threshold is 400 MB: https://github.com/sonic-net/sonic-buildimage/blob/master/dockers/docker-sonic-telemetry/base_image_files/monit_telemetry

  2. On its first run, memory_checker sees 390 MB in use, so the check passes. The next check runs 1 minute later.

  3. Before that next check runs, telemetry tries to use more than 450 MB of memory.

  4. Because memory can no longer be allocated, memory_checker fails. The telemetry container cannot be restarted, so it hangs and its memory is never released.

https://docs.docker.com/engine/reference/run/#user-memory-constraints
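
The race in steps 1 to 4 can be sketched as a small timeline simulation. This is only an illustration under assumptions: the function name `first_event` and the linear growth model are hypothetical, while the 400 MB threshold, 450 MB hard limit, 390 MB starting point, and 1-minute check interval are the figures from the steps above. It reports which happens first: a periodic check that sees usage above the threshold, or usage reaching the hard limit.

```python
def first_event(start_mb, growth_mb_per_s, check_interval_s=60,
                threshold_mb=400, hard_limit_mb=450, horizon_s=3600):
    """Return (event, second) for whichever happens first.

    Hypothetical model: usage grows linearly from start_mb, and the
    checker samples usage every check_interval_s seconds from t=0.
    """
    for t in range(horizon_s + 1):
        usage = start_mb + growth_mb_per_s * t
        # Reaching the hard limit is evaluated first: once allocations
        # fail, the checker itself may no longer be able to run.
        if usage >= hard_limit_mb:
            return ("hard_limit", t)
        # A periodic check only fires on interval boundaries.
        if t % check_interval_s == 0 and usage > threshold_mb:
            return ("memory_checker", t)
    return ("none", horizon_s)


if __name__ == "__main__":
    # Fast growth: the limit is reached between two checks.
    print(first_event(390, 2))    # ('hard_limit', 30)
    # Slow growth: the next check catches it in time.
    print(first_event(390, 0.5))  # ('memory_checker', 60)
```

In this model, any growth of 1 MB/s or more from 390 MB reaches the 450 MB limit before the second check, which is the window the scenario describes.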

Contributor

With #19179, we're using monit to monitor the runtime memory usage of the container. @qiluo-msft, do we still need this Docker hard limit as double insurance?

$(DOCKER_TELEMETRY)_RUN_OPT += --cpus 0.3
$(DOCKER_TELEMETRY)_RUN_OPT += --oom-kill-disable
$(DOCKER_TELEMETRY)_RUN_OPT += -v /etc/sonic:/etc/sonic:ro
$(DOCKER_TELEMETRY)_RUN_OPT += -v /usr/share/sonic/scripts:/usr/share/sonic/scripts:ro
$(DOCKER_TELEMETRY)_RUN_OPT += -v /var/run/dbus:/var/run/dbus:rw