Commit
1 parent
3c05a27
commit d6fe7d6
Showing 17 changed files with 321 additions and 2 deletions.
Binary file added
BIN
+191 KB
source/_static/kubernetes/concepts/configuration/k8s_limits_and_requests_cpu.png
Binary file added
BIN
+167 KB
...ce/_static/kubernetes/concepts/configuration/k8s_limits_and_requests_figure.png
Binary file added
BIN
+201 KB
...ce/_static/kubernetes/concepts/configuration/k8s_limits_and_requests_memory.png
24 changes: 24 additions & 0 deletions
.../concepts/configuration/resource_management_for_pods_containers/deployment_resources.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
kind: Deployment | ||
apiVersion: apps/v1 | ||
… | ||
  template: | ||
    spec: | ||
      containers: | ||
      - name: redis | ||
        image: redis:5.0.3-alpine | ||
        resources: | ||
          limits: | ||
            memory: 600Mi | ||
            cpu: 1 | ||
          requests: | ||
            memory: 300Mi | ||
            cpu: 500m | ||
      - name: busybox | ||
        image: busybox:1.28 | ||
        resources: | ||
          limits: | ||
            memory: 200Mi | ||
            cpu: 300m | ||
          requests: | ||
            memory: 100Mi | ||
            cpu: 100m |
15 changes: 15 additions & 0 deletions
.../concepts/configuration/resource_management_for_pods_containers/namespace_limitrange.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
apiVersion: v1 | ||
kind: LimitRange | ||
metadata: | ||
  name: cpu-resource-constraint | ||
spec: | ||
  limits: | ||
  - default: | ||
      cpu: 500m | ||
    defaultRequest: | ||
      cpu: 500m | ||
    min: | ||
      cpu: 100m | ||
    max: | ||
      cpu: "1" | ||
    type: Container |
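Reviewer sketch: the LimitRange above sets a default CPU limit and request and bounds them with `min` and `max`. The Python below is a simplified model of that defaulting and validation for one container, not the Kubernetes admission code. The function names are hypothetical, and it assumes that a container with an explicit limit but no request gets a request equal to its limit.

```python
from typing import Optional, Tuple

# Values copied from the LimitRange above, expressed in millicores.
DEFAULT_LIMIT = 500
DEFAULT_REQUEST = 500
MIN_CPU = 100
MAX_CPU = 1000


def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def apply_limit_range(request: Optional[str], limit: Optional[str]) -> Tuple[int, int]:
    """Return (request, limit) in millicores after defaulting, or raise ValueError."""
    lim = parse_cpu(limit) if limit is not None else DEFAULT_LIMIT
    if request is not None:
        req = parse_cpu(request)
    elif limit is not None:
        req = lim  # assumption: a missing request follows an explicit limit
    else:
        req = DEFAULT_REQUEST
    if req < MIN_CPU:
        raise ValueError(f"request {req}m is below min {MIN_CPU}m")
    if lim > MAX_CPU:
        raise ValueError(f"limit {lim}m is above max {MAX_CPU}m")
    if req > lim:
        raise ValueError(f"request {req}m exceeds limit {lim}m")
    return req, lim
```

For example, `apply_limit_range("100m", "300m")` models the `busybox` container from `deployment_resources.yaml`, and `apply_limit_range(None, None)` models a container that declares no CPU resources at all.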
10 changes: 10 additions & 0 deletions
...ncepts/configuration/resource_management_for_pods_containers/namespace_resourcequota.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
apiVersion: v1 | ||
kind: ResourceQuota | ||
metadata: | ||
  name: mem-cpu-demo | ||
spec: | ||
  hard: | ||
    requests.cpu: 2 | ||
    requests.memory: 1Gi | ||
    limits.cpu: 3 | ||
    limits.memory: 2Gi |
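Reviewer sketch: a rough way to see how this quota interacts with the two-container pod in `deployment_resources.yaml` is to total the pod's requests and limits and divide them into the quota. The Python below is illustrative arithmetic only. The helper names are hypothetical, it handles only the `m`, `Mi`, and `Gi` suffixes used in this commit, and it assumes the namespace holds nothing else.

```python
from typing import Dict, List

# Container resources copied from deployment_resources.yaml (redis, busybox).
CONTAINERS: List[Dict[str, Dict[str, str]]] = [
    {"requests": {"cpu": "500m", "memory": "300Mi"},
     "limits": {"cpu": "1", "memory": "600Mi"}},
    {"requests": {"cpu": "100m", "memory": "100Mi"},
     "limits": {"cpu": "300m", "memory": "200Mi"}},
]

# Hard limits copied from the ResourceQuota above.
QUOTA = {"requests.cpu": "2", "requests.memory": "1Gi",
         "limits.cpu": "3", "limits.memory": "2Gi"}


def to_number(key: str, quantity: str) -> int:
    """CPU becomes millicores, memory becomes MiB."""
    if key.endswith("cpu"):
        if quantity.endswith("m"):
            return int(quantity[:-1])
        return int(quantity) * 1000
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported quantity: {quantity}")


def pod_usage(containers: List[Dict[str, Dict[str, str]]]) -> Dict[str, int]:
    """Sum requests and limits over all containers of one pod."""
    totals = {key: 0 for key in QUOTA}
    for container in containers:
        for kind in ("requests", "limits"):
            for resource in ("cpu", "memory"):
                key = f"{kind}.{resource}"
                totals[key] += to_number(key, container[kind][resource])
    return totals


def replicas_that_fit(usage: Dict[str, int], quota: Dict[str, str]) -> int:
    """Largest replica count whose totals stay within every quota entry."""
    return min(to_number(key, quota[key]) // usage[key] for key in quota)
```

Under these assumptions one pod totals 600m CPU and 400Mi memory in requests, and 1300m CPU and 800Mi memory in limits, so `replicas_that_fit(pod_usage(CONTAINERS), QUOTA)` is 2: memory and the CPU limit run out before the CPU request does.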
1 change: 1 addition & 0 deletions
...cepts/configuration/resource_management_for_pods_containers/namespace_resourcequota_apply
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
kubectl apply -f resourcequota.yaml --namespace=mynamespace |
1 change: 1 addition & 0 deletions
...oncepts/configuration/resource_management_for_pods_containers/namespace_resourcequota_get
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
kubectl get resourcequota -n mynamespace |
59 changes: 59 additions & 0 deletions
source/kubernetes/in_action/oom_in_k8s/k8s_exit_code_137.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
.. _k8s_exit_code_137: | ||
|
||
================================================== | ||
Kubernetes Pod Exits Abnormally (Exit Code 137) | ||
================================================== | ||
|
||
In production environments, ``OOM kill`` (:ref:`linux_oom`) is very common. On Kubernetes, the ``kubelet`` log records an entry similar to the following: | ||
|
||
.. literalinclude:: k8s_exit_code_137/kubelet_oom_killed | ||
   :caption: ``kubelet`` log entry showing that pod ``XXXXX`` was ``OOMKilled`` | ||
   :emphasize-lines: 1 | ||
|
||
Exit Code 137 | ||
================ | ||
|
||
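Exit code 137 means the process was terminated by ``SIGKILL``; in Kubernetes this most commonly happens because the container used too much memory. | ||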
|
||
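On Linux, every process returns an exit code when it terminates, including processes that exit unexpectedly or are killed. The exit code tells users, the system, and applications why the process stopped. It is a value between ``0`` and ``255``: | ||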
|
||
- ``0`` means the shell command completed successfully; a **non-zero** exit status indicates failure | ||
- When a command terminates on a **fatal signal** numbered ``N``, Bash uses ``128+N`` as the exit status | ||
|
||
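  - The fatal signal delivered to an ``OOMKilled`` process is ``9``, that is, ``SIGKILL (signal 9)``, a forced kill | ||
  - ``128+9 = 137`` means the pod's process was killed directly by the operating system | ||
  - If the ``ExitCode`` recorded by ``kubelet`` is ``143``, the container was terminated **gracefully** by ``SIGTERM (signal 15)``, since ``128+15 = 143`` | ||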
|
||
- In Bash, the exit status of the last command is available in the special parameter ``$?`` (``echo $?``) | ||
|
||
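- When Kubernetes records that a container or pod was terminated with ``ExitCode 137`` because of excessive memory use, investigate in detail whether the program has a memory leak or poorly written code that consumes resources excessively | ||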
|
||
Common Memory Issues | ||
==================== | ||
|
||
Container Exceeds Its Configured Memory Limit | ||
--------------------------------------------- | ||
|
||
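When a container exceeds its memory limit (:ref:`k8s_limits_and_requests`), the operating system's OOM killer is triggered. If code review finds no **memory leak** and no inefficient code, adjust :ref:`resource_management_for_pods_containers` according to business needs to avoid triggering the OOM kill. | ||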
|
||
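If containers have no reasonable memory limits, and there is no timely alerting and handling before a container's memory use reaches its ``limits``, an operating-system-level OOM kill may be triggered on the physical server node. Which process gets killed in that case is hard to predict, and a well-behaved pod that stayed within its expected settings may be killed as collateral damage. | ||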
|
||
Application Memory Leaks | ||
------------------------ | ||
|
||
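A memory leak occurs when an application allocates memory but does not release it after the operation completes; memory gradually fills up until all available capacity is exhausted. A diagnostic tool such as :ref:`valgrind` can help track down memory leaks. | ||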
|
||
Load Monitoring | ||
--------------- | ||
|
||
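As the business grows, memory consumption may increase (memory-intensive applications). Monitor broadly over the long term (:ref:`prometheus`) so that you can observe trends, receive alerts promptly, and adopt an autoscaling solution. | ||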
|
||
|
||
|
||
References | ||
========== | ||
|
||
- `How to Fix Exit Code 137 | Kubernetes Memory Issues <https://foxutech.com/how-to-fix-exit-code-137-kubernetes-memory-issues/>`_ | ||
- `Bash: Exit Status <https://www.gnu.org/software/bash/manual/html_node/Exit-Status.html>`_ | ||
- `SIGKILL: Fast Termination Of Linux Containers | Signal 9 <https://komodor.com/learn/what-is-sigkill-signal-9-fast-termination-of-linux-containers/>`_ | ||
- `How to fix exit code 137 <https://www.airplane.dev/blog/exit-code-137>`_ |
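Reviewer sketch: the ``128+N`` rule described in this file can be reversed to recover the signal from a recorded exit code. The Python below is a minimal illustration under the assumption of a POSIX platform; `signal_from_exit_code` is a hypothetical helper name, not part of any Kubernetes tooling.

```python
import signal
from typing import Optional


def signal_from_exit_code(exit_code: int) -> Optional[str]:
    """Return the signal name encoded in a 128+N exit code, or None."""
    if not 128 < exit_code <= 255:
        return None  # ordinary exit status, no signal encoded
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        return None  # number does not map to a named signal on this platform
```

With this helper, `signal_from_exit_code(137)` gives `"SIGKILL"` and `signal_from_exit_code(143)` gives `"SIGTERM"`, matching the two cases discussed in the document.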
3 changes: 3 additions & 0 deletions
source/kubernetes/in_action/oom_in_k8s/k8s_exit_code_137/kubelet_oom_killed
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
...ContainerStatuses:[{Name:XXXXX State:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:137,Signal:0,Reason:OOMKilled, | ||
Message:,StartedAt:2023-08-29 21:51:09 +0800 CST,FinishedAt:2023-08-30 06:09:36 +0800 CST, | ||
ContainerID:pouch://83887b90809b1d48e2c00d5e870061d29ad15d214cf90963e02903011d15e3d7,}}}]... |
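Reviewer sketch: when scanning many kubelet log lines for entries like the one above, a small regular expression can pull out the exit code and reason. The pattern below is an assumption based only on the sample line in this commit; other kubelet versions may format the entry differently, and `parse_terminated` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional, Union

# Matches the "ExitCode:137,Signal:0,Reason:OOMKilled" fragment of the sample line.
TERMINATED = re.compile(
    r"ExitCode:(?P<exit_code>\d+),Signal:(?P<signal>\d+),Reason:(?P<reason>\w+)"
)


def parse_terminated(log_line: str) -> Optional[Dict[str, Union[int, str]]]:
    """Extract exit code, signal, and reason from a terminated-container entry."""
    match = TERMINATED.search(log_line)
    if match is None:
        return None
    return {
        "exit_code": int(match["exit_code"]),
        "signal": int(match["signal"]),
        "reason": match["reason"],
    }
```

Applied to the first line of the sample above, this yields exit code 137 with reason `OOMKilled`.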
42 changes: 42 additions & 0 deletions
...tes/monitor/prometheus/prometheus_exporters/node_exporter_killed_by_sigpipe.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
.. _node_exporter_killed_by_sigpipe: | ||
|
||
=================================== | ||
Node Exporter Killed by ``SIGPIPE`` | ||
=================================== | ||
|
||
.. note:: | ||
|
||
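   I am recording this issue mainly for reference and have not actually hit it myself. However, I have seen similar production components exit because of ``SIGPIPE``; the mechanism and circumstances are similar. | ||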
|
||
This ``SIGPIPE``-induced exit affected early versions of Node Exporter. Node Exporter now handles ``SIGPIPE`` internally, so exits caused by a broken ``PIPE`` no longer occur. This page is kept for reference only. | ||
|
||
Node Exporter runs under :ref:`systemd`. Occasionally it may exit abnormally, and ``systemctl status node_exporter`` may show output similar to: | ||
|
||
.. literalinclude:: node_exporter_killed_by_sigpipe/systemd_service_killed_sigpipe | ||
   :caption: Example of a service process killed by the ``PIPE`` signal | ||
   :emphasize-lines: 5 | ||
|
||
The reason the process was killed by the ``PIPE`` **signal**: | ||
|
||
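When a process writes to a pipe whose reader no longer exists, the writer receives ``SIGPIPE``. By default this terminates the process. If the signal is ignored, the write instead fails with the error ``EPIPE``. This happens regardless of how the ``reader`` died. | ||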
|
||
The ``node_exporter`` service is configured as follows: | ||
|
||
.. literalinclude:: node_exporter_killed_by_sigpipe/node_exporter.service | ||
   :caption: ``node_exporter.service`` configuration | ||
   :emphasize-lines: 13 | ||
|
||
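The unit is configured with ``Restart=on-failure``. Early versions did not handle ``SIGPIPE``, so the process exited on receiving the ``PIPE`` signal. Because systemd does not count termination by ``SIGPIPE`` as a failure, the service was not restarted automatically. | ||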
|
||
For software that exits because it cannot handle ``SIGPIPE``, change the :ref:`systemd` configuration to:: | ||
|
||
   Restart=always | ||
|
||
Or add:: | ||
|
||
   RestartForceExitStatus=SIGPIPE | ||
|
||
References | ||
========== | ||
|
||
- `Is SIGPIPE signal received when reader is killed forcefully(kill -9)? <https://stackoverflow.com/questions/70648067/is-sigpipe-signal-received-when-reader-is-killed-forcefullykill-9>`_ |
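Reviewer sketch: the two outcomes described in this file (death by ``SIGPIPE`` under the default disposition, an ``EPIPE`` error when the signal is ignored) can be reproduced with a short experiment. The Python below is a minimal illustration assuming a POSIX system; it is unrelated to Node Exporter's own code, and `write_without_reader` is a hypothetical helper name. Each case runs in a child interpreter so the caller's signal settings are untouched.

```python
import signal
import subprocess
import sys
from typing import Tuple

# Child program: set the SIGPIPE disposition, then write to a pipe with no reader.
CHILD_TEMPLATE = """
import errno, os, signal
signal.signal(signal.SIGPIPE, signal.{disposition})
read_fd, write_fd = os.pipe()
os.close(read_fd)             # the reader goes away
try:
    os.write(write_fd, b"x")  # the writer keeps writing
except OSError as exc:
    print(errno.errorcode[exc.errno])
"""


def write_without_reader(disposition: str) -> Tuple[int, str]:
    """Run the child with SIG_DFL or SIG_IGN; return (returncode, stdout)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_TEMPLATE.format(disposition=disposition)],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()
```

With `"SIG_DFL"` the child is killed and `subprocess` reports a negative return code equal to `-signal.SIGPIPE`; with `"SIG_IGN"` the child survives, prints `EPIPE`, and exits with status 0.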
19 changes: 19 additions & 0 deletions
...tor/prometheus/prometheus_exporters/node_exporter_killed_by_sigpipe/node_exporter.service
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
[Unit] | ||
Description=node_exporter | ||
Wants=network-online.target | ||
After=network-online.target | ||
|
||
StartLimitIntervalSec=500 | ||
StartLimitBurst=5 | ||
|
||
[Service] | ||
User=prometheus | ||
Group=prometheus | ||
Type=simple | ||
Restart=on-failure | ||
RestartSec=5s | ||
ExecStart=/usr/local/bin/node_exporter \ | ||
--collector.textfile.directory=/var/lib/node_exporter/textfile_collector | ||
|
||
[Install] | ||
WantedBy=multi-user.target |
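Reviewer sketch: the reason ``Restart=on-failure`` in this unit does not bring the service back after ``SIGPIPE`` is that systemd treats termination by ``SIGHUP``, ``SIGINT``, ``SIGTERM``, or ``SIGPIPE`` as a clean exit. The Python below is a simplified model of that decision for signal-caused exits only. It covers just three ``Restart=`` values, ignores cases such as a manual ``systemctl stop``, and uses a hypothetical function name.

```python
from typing import FrozenSet

# Signals that systemd counts as a clean termination of the main process.
CLEAN_SIGNALS = frozenset({"HUP", "INT", "TERM", "PIPE"})


def _normalize(name: str) -> str:
    """Accept both "SIGPIPE" and "PIPE"."""
    name = name.upper()
    return name[3:] if name.startswith("SIG") else name


def restarts_after_signal(
    restart: str,
    signal_name: str,
    force_signals: FrozenSet[str] = frozenset(),
) -> bool:
    """Model whether the unit restarts after its main process dies from a signal."""
    sig = _normalize(signal_name)
    if sig in {_normalize(s) for s in force_signals}:
        return True  # RestartForceExitStatus= applies regardless of Restart=
    if restart == "always":
        return True
    if restart == "on-failure":
        return sig not in CLEAN_SIGNALS
    if restart == "no":
        return False
    raise ValueError(f"Restart={restart} is not modelled in this sketch")
```

In this model `restarts_after_signal("on-failure", "PIPE")` is false, while both `Restart=always` and `RestartForceExitStatus=SIGPIPE` turn it true.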
5 changes: 5 additions & 0 deletions
...theus/prometheus_exporters/node_exporter_killed_by_sigpipe/systemd_service_killed_sigpipe
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
● node_exporter.service - Prometheus exporter for machine metrics, written in Go with pluggable metric collectors. | ||
Loaded: loaded (/usr/lib/systemd/system/node_exporter.service; enabled; vendor preset: disabled) | ||
Active: inactive (dead) since Sat 2017-05-27 04:54:39 EDT; 1 day 20h ago | ||
Docs: https://prometheus.io | ||
Main PID: 27203 (code=killed, signal=PIPE) |