Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

간헐적 서비스 중단 발생 (2021-04-23 -) #249

Closed
lens0021 opened this issue Apr 23, 2021 · 29 comments
Closed

간헐적 서비스 중단 발생 (2021-04-23 -) #249

lens0021 opened this issue Apr 23, 2021 · 29 comments
Assignees

Comments

@lens0021
Copy link
Member

lens0021 commented Apr 23, 2021

Local time:
image

UTC:
image

@lens0021 lens0021 added this to To do in femiwiki.com via automation Apr 23, 2021
@lens0021
Copy link
Member Author

journalctl -eu nomad:

07:40:31.094Z [WARN]  nomad.heartbeat: node TTL expired: node_id=eef378d3-8278-c9d2-fbee-951065ce759e
07:40:47.947Z [ERROR] client.alloc_runner.task_runner.task_hook: CSI Plugin fingerprinting failed: alloc_id=9b777362-1940-c7fe-de7b-7e5cd2b99ef7 task=plugin error="failed to create csi client: failed to open grpc connection to addr: /opt/nomad/client/csi/controller/aws-ebs0/csi.sock, err: context deadline exceeded"
07:40:48.637Z [ERROR] client.alloc_runner.task_runner.task_hook: CSI Plugin fingerprinting failed: alloc_id=698ef4ef-cb54-8837-dbd7-5751adf68638 task=plugin error="failed to create csi client: failed to open grpc connection to addr: /opt/nomad/client/csi/node/aws-ebs0/csi.sock, err: context deadline exceeded"
07:41:05.760Z [WARN]  nomad.heartbeat: node TTL expired: node_id=eef378d3-8278-c9d2-fbee-951065ce759e
07:41:22.769Z [INFO]  client.aws-ebs0: finished client unary call: grpc.code=DeadlineExceeded duration=31.945803691s grpc.service=csi.v1.Node grpc.method=NodeGetCapabilities
07:41:23.153Z [INFO]  client.aws-ebs0: finished client unary call: grpc.code=DeadlineExceeded duration=32.057172067s grpc.service=csi.v1.Controller grpc.method=ControllerGetCapabilities
07:41:51.364Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: connection write timeout" rpc=Node.UpdateStatus server=172.31.22.50:4647
07:42:35.165Z [ERROR] nomad.client: status update failed: error="timed out enqueuing operation"
07:49:39.756Z [ERROR] client: yamux: keepalive failed: i/o deadline reached
07:52:19.311Z [ERROR] nomad.rpc: yamux: keepalive failed: i/o deadline reached
08:01:54.261Z [ERROR] client: yamux: established stream without inflight SYN (didn't have semaphore)
08:01:49.677Z [ERROR] nomad.rpc: multiplex_v2 conn accept failed: error="connection write timeout"
08:01:52.798Z [WARN]  client.driver_mgr.docker: failed to get Docker system info: driver=docker error="Get "http://unix.sock/info": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
08:01:54.365Z [WARN]  client.driver_mgr.docker: failed to remove dangling containers: driver=docker error="failed to find untracked containers: failed to list containers: context deadline exceeded"
08:01:52.798Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: connection write timeout" rpc=Node.UpdateStatus server=172.31.22.50:4647
08:01:54.511Z [ERROR] nomad.heartbeat: update node status failed: error="timed out enqueuing operation"
08:01:55.041Z [ERROR] client.rpc: error performing RPC to server: error="rpc error: EOF" rpc=Node.GetClientAllocs server=172.31.22.50:4647
08:01:55.053Z [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: EOF" rpc=Node.GetClientAllocs server=172.31.22.50:4647
08:01:55.370Z [ERROR] client: error heartbeating. retrying: error="failed to update status: rpc error: connection write timeout" period=29.806185786s
08:01:55.489Z [ERROR] client: error querying node allocations: error="rpc error: EOF"
08:01:59.336Z [ERROR] client: error discovering nomad servers: error="client.consul: unable to query Consul datacenters: Get "http://127.0.0.1:8500/v1/catalog/datacenters": dial tcp 127.0.0.1:8500: connect: connection refused"
08:02:06.470Z [INFO]  client: node registration complete
08:02:10.930Z [INFO]  client.gc: marking allocation for GC: alloc_id=07ce6d74-4d2e-6fa4-5411-cdc49b869d97
08:02:11.050070 [INFO] (runner) stopping
08:02:11.051316 [INFO] (runner) received finish
08:02:12.546Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=5bf763e3-76d6-f6bb-673c-ed9f73e950db task=wait-for-backend @module=logmon path=/opt/nomad/alloc/5bf763e3-76d6-f6bb-673c-ed9f73e950db/alloc/logs/.wait-for-backend.stdout.fifo timestamp=2021-04-23T08:02:12.546Z
08:02:12.546Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=5bf763e3-76d6-f6bb-673c-ed9f73e950db task=wait-for-backend @module=logmon path=/opt/nomad/alloc/5bf763e3-76d6-f6bb-673c-ed9f73e950db/alloc/logs/.wait-for-backend.stderr.fifo timestamp=2021-04-23T08:02:12.546Z
08:02:12.568Z [INFO]  client.driver_mgr.exec: starting task: driver=exec driver_cfg="{Command:sh Args:[-c while ! ncat --send-only 127.0.0.1 3306 < /dev/null; do sleep 1; done;while ! ncat --send-only 127.0.0.1 11211 < /dev/null; do sleep 1; done] ModePID: ModeIPC:}"
08:02:12.891Z [ERROR] client.driver_mgr.exec: error receiving stream from Stats executor RPC, closing stream: alloc_id=5bf763e3-76d6-f6bb-673c-ed9f73e950db driver=exec task_name=wait-for-backend error="rpc error: code = Unavailable desc = transport is closing"
08:02:12.893Z [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=5bf763e3-76d6-f6bb-673c-ed9f73e950db task=wait-for-backend reason="Restart unnecessary as task terminated successfully"
08:02:12.906Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=5bf763e3-76d6-f6bb-673c-ed9f73e950db task=fastcgi @module=logmon path=/opt/nomad/alloc/5bf763e3-76d6-f6bb-673c-ed9f73e950db/alloc/logs/.fastcgi.stdout.fifo timestamp=2021-04-23T08:02:12.906Z
08:02:12.907Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=5bf763e3-76d6-f6bb-673c-ed9f73e950db task=fastcgi @module=logmon path=/opt/nomad/alloc/5bf763e3-76d6-f6bb-673c-ed9f73e950db/alloc/logs/.fastcgi.stderr.fifo timestamp=2021-04-23T08:02:12.907Z
08:02:13.044772 [INFO] (runner) creating new runner (dry: false, once: false)
08:02:13.045445 [INFO] (runner) creating watcher
08:02:13.045555 [INFO] (runner) starting
08:02:13.048959 [INFO] (runner) rendered "(dynamic)" => "/opt/nomad/alloc/5bf763e3-76d6-f6bb-673c-ed9f73e950db/fastcgi/local/Hotfix.php"
08:02:13.183Z [INFO]  client.driver_mgr.docker: created container: driver=docker container_id=919b2def8141d3bb740d87a8558e28794fc580153f29d042e62c7c171696db1c
08:02:13.580Z [INFO]  client.driver_mgr.docker: started container: driver=docker container_id=919b2def8141d3bb740d87a8558e28794fc580153f29d042e62c7c171696db1c
08:02:34.482Z [INFO]  client: node registration complete
09:13:58.192Z [INFO]  client.gc: garbage collecting allocation: alloc_id=07ce6d74-4d2e-6fa4-5411-cdc49b869d97 reason="forced collection"

@lens0021 lens0021 moved this from To do to In Progress in femiwiki.com Apr 23, 2021
@lens0021
Copy link
Member Author

journalctl -eu docker

"2021-04-22T02:55:07.945355124Z" level=warning msg="Health check for container 9cef2c3d1aff4f40fa35d0070642f4c0c3ceae1ebdbc97115863ef823983dc42 error: context deadline exceeded: unknown"
"2021-04-22T02:55:07.906417630Z" level=warning msg="Health check for container 25bb0ccca33ddd0075020032e35e8a567895d0a2d12471bd9e06411b6feb2e83 error: context deadline exceeded: unknown"
"2021-04-22T02:55:11.158859113Z" level=error msg="stream copy error: reading from a closed fifo"
"2021-04-22T02:55:11.106677874Z" level=error msg="stream copy error: reading from a closed fifo"
"2021-04-22T02:55:11.315779346Z" level=warning msg="Health check for container 3db5744a0a7694454dcacaf856ddc6bc120a878de1284f397e7844036c861061 error: context deadline exceeded: unknown"
"2021-04-23T07:40:23.434306765Z" level=warning msg="Health check for container 9cef2c3d1aff4f40fa35d0070642f4c0c3ceae1ebdbc97115863ef823983dc42 error: context deadline exceeded"
"2021-04-23T07:40:27.453973845Z" level=warning msg="Ignoring Exit Event, no such exec command found" container=9cef2c3d1aff4f40fa35d0070642f4c0c3ceae1ebdbc97115863ef823983dc42 exec-id=f36d4e990edd2c81a2cb1d7c77a5100a8c0f64b8ba2afdacc57422386008872f exec-pid=3337
"2021-04-23T07:40:28.102952744Z" level=error msg="stream copy error: reading from a closed fifo"
"2021-04-23T07:40:28.156263187Z" level=error msg="stream copy error: reading from a closed fifo"
"2021-04-23T07:40:28.494677518Z" level=warning msg="Health check for container 3db5744a0a7694454dcacaf856ddc6bc120a878de1284f397e7844036c861061 error: context deadline exceeded: unknown"
"2021-04-23T07:40:28.494957902Z" level=error msg="stream copy error: reading from a closed fifo"
"2021-04-23T07:40:28.514087911Z" level=error msg="stream copy error: reading from a closed fifo"
"2021-04-23T07:40:28.514199426Z" level=warning msg="Health check for container 25bb0ccca33ddd0075020032e35e8a567895d0a2d12471bd9e06411b6feb2e83 error: context deadline exceeded: unknown"

$ docker container ls -a

CONTAINER ID        IMAGE                                                  COMMAND                  CREATED             STATUS                    PORTS               NAMES
919b2def8141        ghcr.io/femiwiki/mediawiki:2021-04-20T08-41-c3cea3e5   "/tini -- /usr/local…"   2 hours ago         Up 2 hours                                    fastcgi-5bf763e3-76d6-f6bb-673c-ed9f73e950db
760753ac85e5        ghcr.io/femiwiki/backupbot:2021-04-17T09-31-5ce8638c   "/tini -- /usr/local…"   43 hours ago        Up 43 hours (unhealthy)                       backupbot-bce54ada-616e-0a13-58c2-53508dbfb444
567c8b62e91d        mysql/mysql-server:8.0.23                              "/entrypoint.sh --de…"   2 days ago          Up 2 days (healthy)                           mysql-7869f0a0-cb08-da7f-01c4-8a7434bddbf6
60308676482c        gcr.io/google_containers/pause-arm64:3.1               "/pause"                 2 days ago          Up 2 days                                     nomad_init_7869f0a0-cb08-da7f-01c4-8a7434bddbf6
3db5744a0a76        ghcr.io/femiwiki/mathoid:2021-04-17T11-11-4e1427b4     "docker-entrypoint.s…"   3 days ago          Up 3 days (healthy)                           mathoid-e64df37b-c0a2-0441-2808-24d14d059b1e
25bb0ccca33d        ghcr.io/femiwiki/parsoid:2021-04-17T11-06-0e9f9fb2     "docker-entrypoint.s…"   3 days ago          Up 3 days (healthy)                           parsoid-5a2e21b5-9a60-eb21-2b22-2af5c58ebe87
9cef2c3d1aff        ghcr.io/femiwiki/restbase:2021-04-17T11-25-e5b25017    "docker-entrypoint.s…"   3 days ago          Up 3 days (healthy)                           restbase-204d0e37-3b54-33bd-03f2-447394863c55
e410da24e883        gcr.io/google_containers/pause-arm64:3.1               "/pause"                 3 days ago          Up 3 days                                     nomad_init_e64df37b-c0a2-0441-2808-24d14d059b1e
129a70b547bc        ghcr.io/femiwiki/mediawiki:2021-04-19T12-14-11fd8960   "/tini -- caddy run"     3 days ago          Up 3 days                                     http-7819ca49-23a9-29f1-2829-2d333c668695
56e836e18396        memcached:1.6.9-alpine                                 "docker-entrypoint.s…"   3 days ago          Up 3 days                                     memcached-e47c24bc-1181-b939-e1d7-0dbdab058589
cd243ece0b0e        gcr.io/google_containers/pause-arm64:3.1               "/pause"                 3 days ago          Up 3 days                                     nomad_init_e47c24bc-1181-b939-e1d7-0dbdab058589
bcc30538d092        amazon/aws-ebs-csi-driver:v0.10.1                      "/bin/aws-ebs-csi-dr…"   3 days ago          Up 3 days                                     plugin-698ef4ef-cb54-8837-dbd7-5751adf68638
03b027334db9        amazon/aws-ebs-csi-driver:v0.10.1                      "/bin/aws-ebs-csi-dr…"   3 days ago          Up 3 days                                     plugin-9b777362-1940-c7fe-de7b-7e5cd2b99ef7
690ed1d067e8        mysql/mysql-server:8.0.23                              "/entrypoint.sh --de…"   3 days ago          Exited (137) 3 days ago                       mysql-d378f9e1-c7f1-8b47-c35b-514661eb1284
094ac7e2479a        ghcr.io/femiwiki/mediawiki:2021-04-19T12-14-11fd8960   "/tini -- caddy run"     3 days ago          Exited (2) 3 days ago                         http-7ff99fd4-362f-70ff-6e1f-245277f66a44

@lens0021 lens0021 changed the title 서버 장애 발생 (2021-04-23 -) 일시 서버 장애 발생 (2021-04-23 -) Apr 23, 2021
@lens0021
Copy link
Member Author

장애가 있던 시간 말고도 이전에도 메모리 사용량이 높아진 때가 있었던 것 같습니다.
image
(시간대: UTC)

atop 설치가 누락되어서 (#248) 당시 기록이 없습니다. 새로 설치했으니 비슷한 상황이 다시 생기면 확인해보겠습니다.

@lens0021
Copy link
Member Author

새 장애 발생 중
nomad alloc logs -job -stderr fastcgi fastcgi

127.0.0.1 -  24/Apr/2021:06:54:59 +0900 "GET /index.php" 200
127.0.0.1 -  24/Apr/2021:06:55:00 +0900 "GET /index.php" 200
[24-Apr-2021 06:55:04] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 7 total children
[24-Apr-2021 06:55:06] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 8 total children
[24-Apr-2021 06:55:07] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 9 total children
[24-Apr-2021 06:55:08] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 10 total children
[24-Apr-2021 06:55:09] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 11 total children
[24-Apr-2021 06:55:10] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 12 total children
[24-Apr-2021 06:55:11] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 13 total children
[24-Apr-2021 06:55:12] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 14 total children
[24-Apr-2021 06:55:13] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 15 total children
[24-Apr-2021 06:55:14] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 16 total children
[24-Apr-2021 06:55:15] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 17 total children
[24-Apr-2021 06:55:16] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 18 total children
[24-Apr-2021 06:55:17] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 19 total children
[24-Apr-2021 06:55:19] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[24-Apr-2021 06:55:47] WARNING: [pool www] child 2923 exited on signal 9 (SIGKILL) after 768.148260 seconds from start
[24-Apr-2021 06:55:47] NOTICE: [pool www] child 3942 started
127.0.0.1 -  24/Apr/2021:06:55:02 +0900 "GET /api.php" 200
[24-Apr-2021 06:55:49] WARNING: [pool www] child 3435 exited on signal 9 (SIGKILL) after 403.967949 seconds from start
[24-Apr-2021 06:55:49] NOTICE: [pool www] child 3945 started
[24-Apr-2021 06:57:19] WARNING: [pool www] child 3751 exited on signal 9 (SIGKILL) after 248.414172 seconds from start
[24-Apr-2021 06:57:19] NOTICE: [pool www] child 3946 started
[24-Apr-2021 06:57:19] WARNING: [pool www] child 3844 exited on signal 9 (SIGKILL) after 160.683849 seconds from start
...

@lens0021
Copy link
Member Author

lens0021 commented Apr 23, 2021

fastcgi 잡의 메모리(119.8MiB)가 소프트 리밋(110MiB)을 넘어가 있길래 150MiB으로 늘렸더니 괜찮아졌는데, 늘리면서 alloc도 새로 만들어진지라 괜찮아진게 재시작의 영향인지 리밋의 영향인지 확실하지가 않습니다

lens0021 added a commit to femiwiki/nomad that referenced this issue Apr 23, 2021
@lens0021
Copy link
Member Author

$ nomad job status -verbose fastcgi
...
Allocations
ID                                    Eval ID                               Node ID                               Node Name                                        Task Group  Version  Desired  Status    Created               Modified
22bd99c4-c0c5-8714-dc4d-6af9286ca73f  a8c2d42b-0744-8542-e3da-99af0f1a12f7  eef378d3-8278-c9d2-fbee-951065ce759e  ip-172-31-22-50.ap-northeast-1.compute.internal  fastcgi     2        run      running   2021-04-23T22:22:09Z  2021-04-23T22:22:23Z
4885ac93-ccf4-6836-baed-c4ff9fbb08eb  fef9a4c1-9c5e-216f-d677-3ec7f308ed0a  eef378d3-8278-c9d2-fbee-951065ce759e  ip-172-31-22-50.ap-northeast-1.compute.internal  fastcgi     1        stop     complete  2021-04-23T21:09:18Z  2021-04-23T22:22:10Z
5bf763e3-76d6-f6bb-673c-ed9f73e950db  01e3c3a5-8447-723a-d65d-0ecc502f6b1c  eef378d3-8278-c9d2-fbee-951065ce759e  ip-172-31-22-50.ap-northeast-1.compute.internal  fastcgi     0        stop     complete  2021-04-23T08:02:09Z  2021-04-23T21:09:18Z

시간대별 정리입니다

  • 21:55Z - fastcgi에서 "seems busy" 경고 출력
  • 22:02Z - 개발팀 채널에 장애 알림이 옴
  • 22:22Z - fastcgi 메모리 소프트 리밋을 110MiB에서 150MiB로 올림

journalctl -u nomad에는 21시 로그가 거의 없이 22시 22분 alloc 재시작 관련 내용만 나옵니다

@lens0021 lens0021 changed the title 일시 서버 장애 발생 (2021-04-23 -) 간헐적 서버 장애 발생 (2021-04-23 -) Apr 23, 2021
@lens0021 lens0021 self-assigned this Apr 23, 2021
@lens0021 lens0021 changed the title 간헐적 서버 장애 발생 (2021-04-23 -) 간헐적 서비스 중단 발생 (2021-04-23 -) Apr 23, 2021
@lens0021
Copy link
Member Author

메인 볼륨의 Burst Balance이 바닥났습니다

image

@lens0021
Copy link
Member Author

lens0021 commented Apr 24, 2021

EBS에 대해서 unlimited 모드에 상응하는 그런 게 없어 보여서, 꺼놨던 t3a 인스턴스를 살려서 MySQL을 제외한 서비스를 옮겼습니다 MySQL을 제외한건 data-dir가 아키텍처에 독립적인 내용이라는 보장이 없어서(옮길 때 한번 실패했으므로[1])입니다

@lens0021
Copy link
Member Author

디스크 입출력 많이했던 건 일단 php-fpm입니다

image

lens0021 added a commit to femiwiki/infra that referenced this issue Apr 24, 2021
@lens0021
Copy link
Member Author

메모리 부족으로 php-fpm이 패닉에 빠지는 것일 수 있을 것 같아 메모리 제한에 여유를 둬서 다시 시도해보려고 합니다

lens0021 added a commit to femiwiki/nomad that referenced this issue Apr 24, 2021
@lens0021
Copy link
Member Author

lens0021 commented Apr 24, 2021

php-fpm의 emergency_restart를 설정하기 전에는 php-fpm이 힘들어지면 정말 느리게 회복했는데(위에 나오듯이 회복하는 도중에 계속 뭐 하나 망가트렸음) 설정했더니 회복이 굉장히 바로 됩니다

emergency_restart_threshold int
If this number of child processes exit with SIGSEGV or SIGBUS within the time interval set by emergency_restart_interval, then FPM will restart. A value of 0 means 'Off'. Default value: 0 (Off).
-- https://www.php.net/manual/en/install.fpm.configuration.php

@lens0021
Copy link
Member Author

8시 30분부터 문제가 생겨 아예 복구되지 않은 채로 있었는데 emergency_restart 설정이 원인일 수 있을 것 같아 revert했습니다

@lens0021
Copy link
Member Author

lens0021 commented Apr 25, 2021

127.0.0.1 -  25/Apr/2021:05:51:26 +0900 "GET /index.php" 404
127.0.0.1 -  25/Apr/2021:05:51:26 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:05:51:28 +0900 "GET /index.php" 404
127.0.0.1 -  25/Apr/2021:05:51:28 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:05:51:28 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:05:51:28 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:05:51:29 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:05:51:29 +0900 "GET /index.php" 404
127.0.0.1 -  25/Apr/2021:05:51:32 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:05:51:33 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:05:51:34 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:05:51:36 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:05:51:37 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:05:51:32 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:05:51:38 +0900 "GET /index.php" 200
[25-Apr-2021 05:51:44] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 16 total children
[25-Apr-2021 05:51:45] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 17 total children
[25-Apr-2021 05:51:46] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 18 total children
[25-Apr-2021 05:51:47] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 19 total children
[25-Apr-2021 05:51:48] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[25-Apr-2021 05:52:18] WARNING: [pool www] child 6310 exited on signal 9 (SIGKILL) after 1517.325675 seconds from start
[25-Apr-2021 05:52:18] NOTICE: [pool www] child 7999 started
[25-Apr-2021 05:52:22] WARNING: [pool www] child 6463 exited on signal 9 (SIGKILL) after 1361.632224 seconds from start
[25-Apr-2021 05:52:22] NOTICE: [pool www] child 8000 started
[25-Apr-2021 05:52:30] WARNING: [pool www] child 6357 exited on signal 9 (SIGKILL) after 1478.078848 seconds from start
[25-Apr-2021 05:52:30] NOTICE: [pool www] child 8002 started
[25-Apr-2021 05:53:48] WARNING: [pool www] child 7944 exited on signal 9 (SIGKILL) after 134.405323 seconds from start
[25-Apr-2021 05:53:48] NOTICE: [pool www] child 8003 started
127.0.0.1 -  25/Apr/2021:05:51:59 +0900 "GET /index.php" 500
[25-Apr-2021 05:55:02] WARNING: [pool www] child 7964 exited on signal 9 (SIGKILL) after 204.801575 seconds from start
[25-Apr-2021 05:55:02] NOTICE: [pool www] child 8004 started
[25-Apr-2021 05:55:02] WARNING: [pool www] child 7973 exited on signal 9 (SIGKILL) after 204.395423 seconds from start
[25-Apr-2021 05:55:02] NOTICE: [pool www] child 8005 started
127.0.0.1 -  25/Apr/2021:05:52:00 +0900 "GET /index.php" 500
127.0.0.1 -  25/Apr/2021:05:52:18 +0900 "GET /index.php" 500
NOTICE: PHP message: PHP Fatal error:  Maximum execution time of 30 seconds exceeded in /srv/femiwiki.com/includes/HookContainer/DeprecatedHooks.php on line 157
NOTICE: PHP message: PHP Fatal error:  Maximum execution time of 30 seconds exceeded in /srv/femiwiki.com/includes/cache/localisation/LocalisationCache.php on line 336
127.0.0.1 -  25/Apr/2021:08:09:45 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:08:11:32 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:08:14:59 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:08:09:21 +0900 "GET /index.php" 200
[25-Apr-2021 08:22:55] WARNING: [pool www] child 8237 exited on signal 9 (SIGKILL) after 972.274869 seconds from start
[25-Apr-2021 08:22:55] NOTICE: [pool www] child 8270 started
NOTICE: PHP message: PHP Notice:  fwrite(): write of 36 bytes failed with errno=32 Broken pipe in /srv/femiwiki.com/extensions/Scribunto/includes/engines/LuaStandalone/LuaStandaloneInterpreter.php on line 446
127.0.0.1 -  25/Apr/2021:08:20:46 +0900 "GET /index.php" 302
[25-Apr-2021 08:24:53] WARNING: [pool www] child 8232 exited on signal 9 (SIGKILL) after 1220.967878 seconds from start
[25-Apr-2021 08:24:53] NOTICE: [pool www] child 8274 started
[25-Apr-2021 08:24:54] WARNING: [pool www] child 8236 exited on signal 9 (SIGKILL) after 1093.665108 seconds from start
[25-Apr-2021 08:24:54] NOTICE: [pool www] child 8275 started
[25-Apr-2021 08:25:06] WARNING: [pool www] child 8246 exited on signal 9 (SIGKILL) after 921.328801 seconds from start
[25-Apr-2021 08:25:06] NOTICE: [pool www] child 8277 started
[25-Apr-2021 08:26:57] WARNING: [pool www] child 8251 exited on signal 9 (SIGKILL) after 708.976851 seconds from start
[25-Apr-2021 08:26:57] NOTICE: [pool www] child 8281 started
NOTICE: PHP message: PHP Notice:  fread(): read of 8192 bytes failed with errno=12 Cannot allocate memory in /srv/femiwiki.com/vendor/wikimedia/cdb/src/Reader/PHP.php on line 150
[25-Apr-2021 08:26:59] WARNING: [pool www] child 8254 exited on signal 9 (SIGKILL) after 707.433404 seconds from start
[25-Apr-2021 08:26:59] NOTICE: [pool www] child 8282 started
127.0.0.1 -  25/Apr/2021:08:23:04 +0900 "GET /index.php" 301
[25-Apr-2021 08:27:01] WARNING: [pool www] child 8240 exited on signal 9 (SIGKILL) after 1213.192694 seconds from start
[25-Apr-2021 08:27:01] NOTICE: [pool www] child 8285 started
[25-Apr-2021 08:29:23] WARNING: [pool www] child 8242 exited on signal 9 (SIGKILL) after 1351.146407 seconds from start
[25-Apr-2021 08:29:23] NOTICE: [pool www] child 8288 started
[25-Apr-2021 08:30:28] WARNING: [pool www] child 8249 exited on signal 9 (SIGKILL) after 1141.189133 seconds from start
[25-Apr-2021 08:30:28] NOTICE: [pool www] child 8290 started
[25-Apr-2021 08:33:47] WARNING: [pool www] child 8262 exited on signal 9 (SIGKILL) after 818.410939 seconds from start
[25-Apr-2021 08:33:47] NOTICE: [pool www] child 8291 started
[25-Apr-2021 08:33:58] WARNING: [pool www] child 8252 exited on signal 9 (SIGKILL) after 1128.778933 seconds from start
[25-Apr-2021 08:33:58] NOTICE: [pool www] child 8292 started
[25-Apr-2021 08:34:03] WARNING: [pool www] child 8248 exited on signal 9 (SIGKILL) after 1358.681613 seconds from start
[25-Apr-2021 08:34:03] NOTICE: [pool www] child 8293 started
127.0.0.1 -  25/Apr/2021:08:22:59 +0900 "GET /index.php" 302
[25-Apr-2021 08:34:08] WARNING: [pool www] child 8255 exited on signal 9 (SIGKILL) after 1081.363525 seconds from start
[25-Apr-2021 08:34:09] NOTICE: [pool www] child 8294 started
127.0.0.1 -  25/Apr/2021:08:29:23 +0900 "GET /index.php" 301
[25-Apr-2021 08:36:06] WARNING: [pool www] child 8274 exited on signal 9 (SIGKILL) after 672.860414 seconds from start
[25-Apr-2021 08:36:06] NOTICE: [pool www] child 8297 started
[25-Apr-2021 08:36:09] WARNING: [pool www] child 8277 exited on signal 9 (SIGKILL) after 662.632057 seconds from start
[25-Apr-2021 08:36:09] NOTICE: [pool www] child 8298 started
127.0.0.1 -  25/Apr/2021:08:34:06 +0900 "GET /index.php" 301
[25-Apr-2021 08:36:19] WARNING: [pool www] child 8267 exited on signal 9 (SIGKILL) after 933.492708 seconds from start
[25-Apr-2021 08:36:19] NOTICE: [pool www] child 8302 started
[25-Apr-2021 08:36:34] WARNING: [pool www] child 8264 exited on signal 9 (SIGKILL) after 953.719445 seconds from start
[25-Apr-2021 08:36:34] NOTICE: [pool www] child 8303 started

@lens0021
Copy link
Member Author

lens0021 commented Apr 25, 2021

25일 8시 문제의 경우 중단 발생 시점에 CPU나 메모리에 특이 사항은 없어 보입니다
image
KST: 08:27, UTC: 23:27

@lens0021
Copy link
Member Author

lens0021 commented Apr 25, 2021

25일 12:37 경 php-fpm 프로세스가 종료되지 않고 수만 증가하여 최대치(20개)를 채우고 멈춰 복구되는 것 같지 않아 nomad alloc stop하였으나 계속 같은 증상, nomad job stop, nomad job run하였더니 괜찮아졌습니다.

127.0.0.1 -  25/Apr/2021:12:36:00 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:12:36:01 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:12:36:04 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:12:36:07 +0900 "GET /index.php" 200
127.0.0.1 -  25/Apr/2021:12:36:10 +0900 "GET /index.php" 404
127.0.0.1 -  25/Apr/2021:12:36:34 +0900 "GET /api.php" 200
127.0.0.1 -  25/Apr/2021:12:36:40 +0900 "GET /index.php" 301
[25-Apr-2021 12:37:02] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
127.0.0.1 -  25/Apr/2021:12:36:29 +0900 "GET /index.php" 302
127.0.0.1 -  25/Apr/2021:12:36:43 +0900 "GET /index.php" 301
127.0.0.1 -  25/Apr/2021:12:36:20 +0900 "POST /api.php" 200
[25-Apr-2021 12:37:17] WARNING: [pool www] child 6008 exited on signal 9 (SIGKILL) after 101.699369 seconds from start
[25-Apr-2021 12:37:17] NOTICE: [pool www] child 6044 started
[25-Apr-2021 12:37:22] WARNING: [pool www] child 6006 exited on signal 9 (SIGKILL) after 109.874433 seconds from start
[25-Apr-2021 12:37:22] NOTICE: [pool www] child 6045 started
[25-Apr-2021 12:37:33] WARNING: [pool www] child 6015 exited on signal 9 (SIGKILL) after 101.449464 seconds from start
[25-Apr-2021 12:37:33] NOTICE: [pool www] child 6046 started
[25-Apr-2021 12:37:43] WARNING: [pool www] child 6034 exited on signal 9 (SIGKILL) after 74.109626 seconds from start
[25-Apr-2021 12:37:44] NOTICE: [pool www] child 6047 started
[25-Apr-2021 12:39:03] WARNING: [pool www] child 6028 exited on signal 9 (SIGKILL) after 162.393026 seconds from start
[25-Apr-2021 12:39:03] NOTICE: [pool www] child 6048 started
127.0.0.1 -  25/Apr/2021:12:37:16 +0900 "GET /index.php" 301

@lens0021
Copy link
Member Author

request-terminate-timeout 설정을 추가했습니다 → femiwiki/nomad@7b8911f

Reference:
https://www.php.net/manual/en/install.fpm.configuration.php#request-terminate-timeout

@lens0021
Copy link
Member Author

이런 로그가 가끔 뜹니다

127.0.0.1 -  25/Apr/2021:17:45:14 +0900 "GET /index.php" 200                                   
[25-Apr-2021 17:45:15] WARNING: [pool www] child 14266, script '/srv/femiwiki.com/index.php' (request: "GET /index.php") execution timed out (36.169007 sec), terminating                     
[25-Apr-2021 17:45:15] WARNING: [pool www] child 14266 exited on signal 15 (SIGTERM) after 183.607478 seconds from start                                                                      
[25-Apr-2021 17:45:15] NOTICE: [pool www] child 14564 started                                  
127.0.0.1 -  25/Apr/2021:17:45:18 +0900 "GET /index.php" 200       

@lens0021
Copy link
Member Author

캐디 로그가 너무 많이 나와서 지나가면 금방 지워져서 until [[ "$(nomad logs -stderr -tail -n 5 -job fastcgi fastcgi)" = *"SIGTERM"* ]]; do sleep 1; done; nomad logs -stderr -tail -n 10 -job http 이런거 시켜놓고 기다리는데 안나오네요 머쓱

@lens0021
Copy link
Member Author

lens0021 commented Apr 25, 2021

지금 문제 생겼었는데 프로세서가 30초 후에 죽기는 죽는데 그래도 20개 꽉 채운데다 여전히 5xx만 내고 있어서 2021-04-25T10:06:00Z에 alloc stop 했더니 괜찮아졌습니다

lens0021 added a commit to femiwiki/nomad that referenced this issue Apr 25, 2021
@lens0021
Copy link
Member Author

리퀘스트가 많아 문제를 생겼을 때 journalctl에 다음과 같이 표시됩니다

 4월 25 15:08:38: php-fpm invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null),  order=0, oom_score_adj=0
 4월 25 15:08:38: php-fpm cpuset=54a263f28a9fc2f0091415c60ae9d162a6a0bbf45ab0cb078d7c05ca948739b5 mems_allowed=0
 4월 25 15:08:38: CPU: 0 PID: 11400 Comm: php-fpm Tainted: G        W       4.14.225-169.362.amzn2.aarch64 #1
 4월 25 15:08:38: Hardware name: Amazon EC2 t4g.small/, BIOS 1.0 11/1/2018
 4월 25 15:08:38: Call trace:
 4월 25 15:08:38:  dump_backtrace+0x0/0x190
 4월 25 15:08:38:  show_stack+0x24/0x30
 4월 25 15:08:38:  dump_stack+0x98/0xc8
 4월 25 15:08:38:  dump_header+0x94/0x1d8
 4월 25 15:08:38:  oom_kill_process+0x2c8/0x560
 4월 25 15:08:38:  out_of_memory+0x108/0x550
 4월 25 15:08:38:  mem_cgroup_out_of_memory+0x68/0xa0
 4월 25 15:08:38:  mem_cgroup_oom_synchronize+0x3cc/0x480
 4월 25 15:08:38:  pagefault_out_of_memory+0x40/0x1e0
 4월 25 15:08:38:  do_page_fault+0x484/0x4a0
 4월 25 15:08:38:  do_translation_fault+0x6c/0x80
 4월 25 15:08:38:  do_mem_abort+0x6c/0xe8
 4월 25 15:08:38:  do_el0_ia_bp_hardening+0x60/0xa0
 4월 25 15:08:38:  el0_ia+0x1c/0x20
 4월 25 15:08:38: Task in /docker/54a263f28a9fc2f0091415c60ae9d162a6a0bbf45ab0cb078d7c05ca948739b5 killed as a result of limit of /docker/54a263f28a9fc2f0091415c60ae9d162a6a0bbf45ab0cb078d7c05ca948739b5
 4월 25 15:08:38: memory: usage 307200kB, limit 307200kB, failcnt 0
 4월 25 15:08:38: memory+swap: usage 307200kB, limit 307200kB, failcnt 943690
 4월 25 15:08:38: kmem: usage 6032kB, limit 9007199254740988kB, failcnt 0
 4월 25 15:08:38: Memory cgroup stats for /docker/54a263f28a9fc2f0091415c60ae9d162a6a0bbf45ab0cb078d7c05ca948739b5: cache:103460KB rss:197400KB rss_huge:4096KB shmem:102696KB mapped_file:72072KB dirty:132KB writeback:0KB swap:0KB inactive_anon:150208KB active_anon:150172KB inactive_file:128KB active_file:68KB unevictable:0KB
 4월 25 15:08:38: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
 4월 25 15:08:38: [ 1149]     0  1149      480       19       5       3        0             0 tini
 4월 25 15:08:38: [ 1174]     0  1174      901       70       6       3        0             0 run
 4월 25 15:08:38: [ 1213]     0  1213      870       53       6       3        0             0 cron
 4월 25 15:08:38: [ 1215]     0  1215    60127     1806      31       5        0             0 php-fpm
 4월 25 15:08:38: [ 6539]    33  6539    84753    23108      90       5        0             0 php-fpm
 4월 25 15:08:38: [ 8931]    33  8931    83458    20251      83       5        0             0 php-fpm
 4월 25 15:08:38: [10177]    33 10177    83793    18642      80       5        0             0 php-fpm
 4월 25 15:08:38: [11387]    33 11387    83754    16529      73       5        0             0 php-fpm
 4월 25 15:08:38: [11400]    33 11400    63972    14240      63       5        0             0 php-fpm
 4월 25 15:08:38: [11425]    33 11425    64533    14681      65       5        0             0 php-fpm
 4월 25 15:08:38: [11449]    33 11449    64005    14285      67       5        0             0 php-fpm
 4월 25 15:08:38: [11453]    33 11453     1116      293       5       3        0             0 lua
 4월 25 15:08:38: [11467]    33 11467    64022    14479      63       5        0             0 php-fpm
 4월 25 15:08:38: [11473]    33 11473     1155      289       6       4        0             0 lua
 4월 25 15:08:38: [11496]    33 11496    83056    15251      68       5        0             0 php-fpm
 4월 25 15:08:38: [11573]    33 11573    63973    13859      63       5        0             0 php-fpm
 4월 25 15:08:38: [11585]    33 11585     1611      782       6       4        0             0 lua
 4월 25 15:08:38: [11587]    33 11587     1615      786       8       4        0             0 lua
 4월 25 15:08:38: [11588]    33 11588    62341     9180      52       5        0             0 php-fpm
 4월 25 15:08:38: [11589]    33 11589     1149      293       7       3        0             0 lua
 4월 25 15:08:38: [11603]    33 11603     1552      721       7       4        0             0 lua
 4월 25 15:08:38: [11606]    33 11606     1155      344       6       4        0             0 lua
 4월 25 15:08:38: [11611]     0 11611    60218     1185      26       5        0             0 php-fpm
 4월 25 15:08:38: [11612]    33 11612      483       17       4       3        0             0 sh
 4월 25 15:08:38: [11620]     0 11620    60218     1185      26       5        0             0 php-fpm
 4월 25 15:08:38: [11623]     0 11623    60218     1185      26       5        0             0 php-fpm
 4월 25 15:08:38: [11737]     0 11737    60218     1184      26       5        0             0 php-fpm
 4월 25 15:08:38: [11740]     0 11740    60218     1182      26       5        0             0 php-fpm
 4월 25 15:08:38: [11799]     0 11799    60127     1182      25       4        0             0 php-fpm
 4월 25 15:08:38: [11858]     0 11858    60127     1182      25       4        0             0 php-fpm
 4월 25 15:08:38: [12028]     0 12028    60127     1183      25       4        0             0 php-fpm
 4월 25 15:08:38: Memory cgroup out of memory: Kill process 6539 (php-fpm) score 302 or sacrifice child
 4월 25 15:08:38: Killed process 6539 (php-fpm) total-vm:339012kB, anon-rss:24956kB, file-rss:0kB, shmem-rss:67476kB
 4월 25 15:08:38: oom_reaper: reaped process 6539 (php-fpm), now anon-rss:0kB, file-rss:0kB, shmem-rss:67476kB

@lens0021
Copy link
Member Author

April 26th 2021, 12:29:24 am femiwiki/nomad@76cd3de applied

@lens0021
Copy link
Member Author

fastcgi job의 메모리 리미트와 서버 중단 차트입니다. 파란색이 서버 중단 상태로 위에 있으면 정상, 아래 있으면 접속 불가입니다,.

image

  • - Burst balance가 떨어져서 t3a로 임시로 돌아간 시점입니다.
  • - 다시 t4g로 돌아온 시점입니다.

명쾌하지는 않지만 차트대로 해석한다면 ① 이전의 문제는 fastcgi의 soft limit이 너무 낮아서 발생한 것이고, 그 이후는 hard limit이 너무 낮아서 발생한 것같습니다.

@lens0021
Copy link
Member Author

lens0021 commented Apr 27, 2021

4월 27 05:57:18 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: caddy invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0

femiwiki/nomad@a6c905f

@lens0021
Copy link
Member Author

lens0021 commented Apr 27, 2021

journalctl:

 4월 27 11:09:32 ip-172-31-22-50.ap-northeast-1.compute.internal dockerd[910]: time="2021-04-27T11:09:32.540165163Z" level=warning msg="Health check for container 436bcb44bb98471118741c455acc87f9e66f1c1090dd6a6eb498e4804a54b799 error: context deadline exceeded"
 4월 27 11:09:34 ip-172-31-22-50.ap-northeast-1.compute.internal dockerd[910]: time="2021-04-27T11:09:34.011527145Z" level=warning msg="Ignoring Exit Event, no such exec command found" container=436bcb44bb98471118741c455acc87f9e66f1c1090dd6a6eb498e4804a54b799 exec-id=ed00abbc3c610acbcad16ea9e382884edc6ddacd8658924e2f8649dce978504f exec-pid=4346
 4월 27 11:09:36 ip-172-31-22-50.ap-northeast-1.compute.internal dockerd[910]: time="2021-04-27T11:09:36.372478668Z" level=warning msg="Health check for container 4822e4c68b65a2bad949437a86ce2dbfbf8f63159d1838054abb755616fb2801 error: context deadline exceeded"
 4월 27 11:09:36 ip-172-31-22-50.ap-northeast-1.compute.internal dockerd[910]: time="2021-04-27T11:09:36.538949449Z" level=warning msg="Ignoring Exit Event, no such exec command found" container=4822e4c68b65a2bad949437a86ce2dbfbf8f63159d1838054abb755616fb2801 exec-id=ec85d1a7f22e14e53c66899ff0d10c2750bb2cb10a04d121456595ecfe77af81 exec-pid=4424
 4월 27 11:09:38 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: php-fpm invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
 4월 27 11:09:38 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: php-fpm cpuset=54445844553ce63a63c1ae031f6ffb773c63633c9c39113e016d76c2f0b86db6 mems_allowed=0
 4월 27 11:09:38 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: CPU: 0 PID: 2639 Comm: php-fpm Tainted: G        W       4.14.225-169.362.amzn2.aarch64 #1
 4월 27 11:09:38 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: Hardware name: Amazon EC2 t4g.small/, BIOS 1.0 11/1/2018
 4월 27 11:09:38 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: Call trace:
 4월 27 11:09:38 ip-172-31-22-50.ap-northeast-1.compute.internal kernel:  dump_backtrace+0x0/0x190
 4월 27 11:09:38 ip-172-31-22-50.ap-northeast-1.compute.internal kernel:  show_stack+0x24/0x30
 4월 27 11:09:38 ip-172-31-22-50.ap-northeast-1.compute.internal kernel:  dump_stack+0x98/0xc8
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel:  dump_header+0x94/0x1d8
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel:  oom_kill_process+0x2c8/0x560
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel:  out_of_memory+0x108/0x550
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel:  mem_cgroup_out_of_memory+0x68/0xa0
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel:  mem_cgroup_oom_synchronize+0x3cc/0x480
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel:  pagefault_out_of_memory+0x40/0x1e0
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel:  do_page_fault+0x484/0x4a0
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel:  do_translation_fault+0x6c/0x80
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel:  do_mem_abort+0x6c/0xe8
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel:  el0_da+0x24/0x28
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: Task in /docker/54445844553ce63a63c1ae031f6ffb773c63633c9c39113e016d76c2f0b86db6 killed as a result of limit of /docker/54445844553ce63a63c1ae031f6ffb773c63633c9c39113e016d76c2f0b86db6
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: memory: usage 511968kB, limit 512000kB, failcnt 0
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: memory+swap: usage 511936kB, limit 512000kB, failcnt 137572
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: kmem: usage 6324kB, limit 9007199254740988kB, failcnt 0
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: Memory cgroup stats for /docker/54445844553ce63a63c1ae031f6ffb773c63633c9c39113e016d76c2f0b86db6: cache:121416KB rss:383952KB rss_huge:2048KB shmem:120236KB mapped_file:57288KB dirty:0KB writeback:132KB swap:0KB inactive_anon:252216KB active_anon:252064KB inactive_file:1012KB active_file:252KB unevictable:0
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [20860]     0 20860      480       19       5       3        0             0 tini
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [20876]     0 20876      901       66       6       3        0             0 run
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [20885]     0 20885      870       54       6       3        0             0 cron
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [20886]     0 20886    60127     2299      32       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2579]    33  2579    64535    16603      76       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2580]    33  2580    83601    17326      80       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2593]    33  2593    83676    17629      78       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2595]    33  2595    65095    17950      78       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2596]    33  2596    64551    16970      74       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2597]    33  2597    64007    16483      73       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2598]    33  2598    65096    17012      74       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2599]    33  2599    65063    17666      75       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2600]    33  2600    64007    15864      72       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2601]    33  2601    82495    14698      69       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2636]    33  2636    64616    17453      78       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2638]    33  2638    64568    16432      76       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2639]    33  2639    65063    16868      74       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2745]    33  2745    83082    15650      71       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 2746]    33  2746    64519    15930      71       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 3069]    33  3069    62983    14798      71       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 3070]    33  3070    64535    16863      76       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 3597]    33  3597    65063    16177      71       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 3598]    33  3598    63462    14678      68       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 3915]    33  3915    83575    15619      74       5        0             0 php-fpm
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 4044]    33  4044     1346      486       7       3        0             0 lua
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 4050]    33  4050     1350      491       7       4        0             0 lua
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 4051]    33  4051     1316      488       8       4        0             0 lua
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 4064]    33  4064     1316      494       6       3        0             0 lua
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 4068]    33  4068     1315      492       6       3        0             0 lua
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 4069]    33  4069     1796      991       7       3        0             0 lua
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 4070]    33  4070     1924     1110       7       4        0             0 lua
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 4073]    33  4073     1772      967       7       4        0             0 lua
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 4127]    33  4127     1773      966       8       4        0             0 lua
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 4216]    33  4216     1149      294       6       3        0             0 lua
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 4262]    33  4262     1183      367       8       4        0             0 lua
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: [ 4447]    33  4447      483       17       4       3        0             0 sh
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: Memory cgroup out of memory: Kill process 2595 (php-fpm) score 140 or sacrifice child
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: Killed process 4044 (lua) total-vm:5384kB, anon-rss:1912kB, file-rss:32kB, shmem-rss:0kB
 4월 27 11:09:39 ip-172-31-22-50.ap-northeast-1.compute.internal kernel: oom_reaper: reaped process 4044 (lua), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

루아 바이너리에 뭐 문제가 있을 수도 있을 것 같구요... php-fpm.log에는 별 내용 없습니다

femiwiki/nomad@88cd8e6

@lens0021
Copy link
Member Author

lens0021 commented Apr 28, 2021

4월 28일 13:42 (아래 남긴 건 초입니다.)

08 kernel: mysqld invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null),  order=0, oom_score_adj=0
08 kernel: mysqld cpuset=1010954e6907628e4c2bc4286fa054860b4ffb0699f6d7091af2b007ec161ae8 mems_allowed=0
08 kernel: CPU: 1 PID: 19686 Comm: mysqld Tainted: G        W       4.14.225-169.362.amzn2.aarch64 #1
08 kernel: Hardware name: Amazon EC2 t4g.small/, BIOS 1.0 11/1/2018
08 kernel: Call trace:
08 kernel:  dump_backtrace+0x0/0x190
08 kernel:  show_stack+0x24/0x30
08 kernel:  dump_stack+0x98/0xc8
08 kernel:  dump_header+0x94/0x1d8
08 kernel:  oom_kill_process+0x2c8/0x560
08 kernel:  out_of_memory+0x108/0x550
08 kernel:  mem_cgroup_out_of_memory+0x68/0xa0
08 kernel:  mem_cgroup_oom_synchronize+0x3cc/0x480
08 kernel:  pagefault_out_of_memory+0x40/0x1e0
08 kernel:  do_page_fault+0x484/0x4a0
08 kernel:  do_translation_fault+0x6c/0x80
08 kernel:  do_mem_abort+0x6c/0xe8
08 kernel:  do_el0_ia_bp_hardening+0x60/0xa0
08 kernel:  el0_ia+0x1c/0x20
08 kernel: Task in /docker/1010954e6907628e4c2bc4286fa054860b4ffb0699f6d7091af2b007ec161ae8 killed as a result of limit of /docker/1010954e6907628e4c2bc4286fa054860b4ffb0699f6d7091af2b007ec161ae8
08 kernel: memory: usage 614400kB, limit 614400kB, failcnt 0
08 kernel: memory+swap: usage 614380kB, limit 614400kB, failcnt 9987286
08 kernel: kmem: usage 3504kB, limit 9007199254740988kB, failcnt 0
08 kernel: Memory cgroup stats for /docker/1010954e6907628e4c2bc4286fa054860b4ffb0699f6d7091af2b007ec161ae8: cache:3124KB rss:607656KB rss_huge:190464KB shmem:0KB mapped_file:132KB dirty:0KB writeback:132KB swap:0KB inactive_anon:304292KB active_anon:303472KB inactive_file:200KB active_file:112KB unevictable:0KB
08 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
08 kernel: [19587]     0 19587      887       76       6       3        0             0 entrypoint.sh
08 kernel: [19632]    27 19632   437587   151239     381       5        0             0 mysqld
08 kernel: [10530]     0 10530      871      535       5       3        0             0 healthcheck.sh
08 kernel: [10547]     0 10547     4463      502      12       3        0             0 mysqladmin
08 kernel: [11348]     0 11348      871      528       5       3        0             0 healthcheck.sh
08 kernel: [11359]     0 11359     4360      158      10       3        0             0 mysqladmin
08 kernel: [12148]     0 12148      871      535       5       3        0             0 healthcheck.sh
08 kernel: [12168]     0 12168     4360       63      10       3        0             0 mysqladmin
08 kernel: [12956]     0 12956      871      529       5       3        0             0 healthcheck.sh
08 kernel: [13017]     0 13017     3178       26       8       3        0             0 mysqladmin
08 kernel: Memory cgroup out of memory: Kill process 19632 (mysqld) score 987 or sacrifice child
08 kernel: Killed process 19632 (mysqld) total-vm:1750348kB, anon-rss:604956kB, file-rss:0kB, shmem-rss:0kB
08 kernel: oom_reaper: reaped process 19632 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
08 dockerd[910]: time="2021-04-28T13:42:08.829560666Z" level=warning msg="Ignoring Exit Event, no such exec command found" container=1010954e6907628e4c2bc4286fa054860b4ffb0699f6d7091af2b007ec161ae8 exec-id=2f1e597a15687ef82d82ee79db7bb72aad818d29e8d352346aefd8bfb3675653 exec-pid=10530
08 dockerd[910]: time="2021-04-28T13:42:08.965413432Z" level=warning msg="Ignoring Exit Event, no such exec command found" container=1010954e6907628e4c2bc4286fa054860b4ffb0699f6d7091af2b007ec161ae8 exec-id=44ba376fbdc5f221e472ce87aa3dad32c6dc6b99ef01fcd0f74ebfadd1a09a70 exec-pid=11348
09 dockerd[910]: time="2021-04-28T13:42:09.136515152Z" level=warning msg="Ignoring Exit Event, no such exec command found" container=1010954e6907628e4c2bc4286fa054860b4ffb0699f6d7091af2b007ec161ae8 exec-id=a5339ae4192ac58c22d2f06e88a750f1f47e10e706caf31150ff5aa2dc714cc4 exec-pid=12148
09 containerd[800]: time="2021-04-28T13:42:09.444623897Z" level=info msg="shim reaped" id=1010954e6907628e4c2bc4286fa054860b4ffb0699f6d7091af2b007ec161ae8
09 dockerd[910]: time="2021-04-28T13:42:09.458832269Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
09 nomad[896]: 2021-04-28T13:42:09.730Z [INFO]  client.alloc_runner.task_runner: restarting task: alloc_id=681acfc8-ff0f-a9c7-89f5-3842454176e3 task=mysql reason="Restart within policy" delay=15.607805408s
25 nomad[896]: 2021-04-28T13:42:25.352Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=681acfc8-ff0f-a9c7-89f5-3842454176e3 task=mysql @module=logmon path=/opt/nomad/alloc/681acfc8-ff0f-a9c7-89f5-3842454176e3/alloc/logs/.mysql.stdout.fifo timestamp=2021-04-28T13:42:25.352Z
25 nomad[896]: 2021-04-28T13:42:25.352Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=681acfc8-ff0f-a9c7-89f5-3842454176e3 task=mysql @module=logmon path=/opt/nomad/alloc/681acfc8-ff0f-a9c7-89f5-3842454176e3/alloc/logs/.mysql.stderr.fifo timestamp=2021-04-28T13:42:25.352Z
25 nomad[896]: 2021-04-28T13:42:25.534Z [INFO]  client.driver_mgr.docker: created container: driver=docker container_id=a31cd33da466c34b61d981af391026476619134fddc6ef521d695326344a3b5c
25 containerd[800]: time="2021-04-28T13:42:25.594890929Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/a31cd33da466c34b61d981af391026476619134fddc6ef521d695326344a3b5c/shim.sock" debug=false pid=134

femiwiki/nomad@f91e6a5

@lens0021
Copy link
Member Author

lens0021 commented May 4, 2021

14:51(KST, 2021-05-04T05:51:46.141+0000)부터 서비스가 중단되었으나 대응이 늦어져 16:26(KST, 2021-05-04T07:26:46.280+0000)에 복구되었습니다.

image

중단된 서비스는 MySQL, Caddy, CSI-plugin-controller, CSI-plugin-node 였고 서비스 복구 때 복구했는데 Parsoid가 복구가 누락되어 있어서 8분 전에 복구했습니다.

@lens0021
Copy link
Member Author

lens0021 commented May 4, 2021

00시 00분 01초 kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
05시 00분 01초     82028   1917792     95.90     14456    410024   8272524    231.55    834752    860560        12
05시 01분 01초     80944   1918876     95.95     14608    414892   8314088    232.71    821036    875944       236
05시 02분 01초     88760   1911060     95.56     13692    411168   8266080    231.37    851140    836472       412
05시 03분 01초     82512   1917308     95.87     14304    399104   8339500    233.42    857080    835392       156
05시 04분 01초     85972   1913848     95.70     14564    412180   8270204    231.48    852404    837656       436
05시 05분 01초     68800   1931020     96.56     14772    419552   8288232    231.99    852800    853928       276
05시 06분 01초     87088   1912732     95.65     14592    407896   8287368    231.97    850996    837808       364
05시 07분 01초    117280   1882540     94.14     14188    388424   8285008    231.90    824884    833644       416
05시 08분 01초     97376   1902444     95.13     14968    400012   8287068    231.96    832036    846268       224
05시 09분 01초     99436   1900384     95.03     15128    398236   8266544    231.38    860324    816236       252
05시 10분 01초     93488   1906332     95.33     15304    402348   8281420    231.80    862800    819640       280
05시 11분 01초     94052   1905768     95.30     15556    406768   8282636    231.83    858124    823520       408
05시 12분 01초     96328   1903492     95.18     15568    398632   8291164    232.07    865132    814940       112
05시 13분 01초     88244   1911576     95.59     16112    405340   8273092    231.57    870276    817732       500
05시 14분 01초    124768   1875052     93.76       364    394136   8414628    235.53    774560    877568       164
05시 15분 01초     75236   1924584     96.24       580    412332   8319792    232.87    811568    891980       188
05시 16분 01초    100328   1899492     94.98       852    423736   8256352    231.10    758632    923624       224
05시 17분 01초     90280   1909540     95.49      1276    431548   8266796    231.39    758152    931864       472
05시 18분 01초     97500   1902320     95.12      1780    425200   8264104    231.31    769420    913268       340
05시 19분 01초     85856   1913964     95.71      2228    431744   8270660    231.50    771068    922924       540
05시 20분 01초    106024   1893796     94.70      2404    413972   8265404    231.35    765460    908616       204
05시 21분 01초     87616   1912204     95.62      2860    429856   8264432    231.32    763792    928812       456
05시 22분 05초     51364   1948456     97.43       176    184880   8818948    246.84    848984    895040         4
05시 23분 06초     49208   1950612     97.54       248    211540   8999176    251.89    794496    941132       724
05시 24분 06초    163124   1836696     91.84       448    222492   8957708    250.73    724592    899416       108
05시 25분 04초     61000   1938820     96.95       868    248932   9035280    252.90    770572    950932        40
05시 26분 05초    223484   1776336     88.82      1016    234496   8937104    250.15    708832    843932       136
05시 27분 04초     73084   1926736     96.35      1204    241204   9068596    253.83    777384    929404        12
05시 28분 04초     86600   1913220     95.67      1496    260592   8728952    244.33    771028    926036         0
05시 29분 04초     69384   1930436     96.53      1688    257968   8690276    243.24    809896    904152        52
05시 30분 03초     74476   1925344     96.28      1860    291200   8286704    231.95    801416    911992        24
05시 31분 04초     88616   1911204     95.57      2036    245428   8518148    238.42    788976    912216        28
05시 32분 02초     82008   1917812     95.90      2256    293472   8135060    227.70    777644    920876        80
05시 33분 02초     76928   1922892     96.15      2384    279952   8171576    228.72    784820    916816        48
05시 34분 05초    100176   1899644     94.99      2560    250412   8459480    236.78    829796    861128        60
05시 35분 03초     72564   1927256     96.37      2776    375276   8337692    233.37    749728    959360         0
05시 36분 04초     56480   1943340     97.18      2924    266304   8643580    241.94    817036    906188        32
05시 37분 03초     91452   1908368     95.43      3228    351036   8443868    236.35    749144    933876       332
05시 38분 03초     82888   1916932     95.86      3484    334016   8572128    239.94    789540    905664        40
05시 39분 03초     48952   1950868     97.55      3712    263368   8994572    251.76    842848    880620         8
05시 40분 05초     55440   1944380     97.23      4028    249196   9159044    256.36    804176    922064        16
05시 41분 04초     87348   1912472     95.63      4448    321516   8872148    248.33    797976    894908       168
05시 42분 04초     65544   1934276     96.72      4592    253980   9141620    255.88    779284    936800        40
05시 43분 04초     91112   1908708     95.44      4772    277112   9074332    253.99    777648    914892        12
05시 44분 04초     74096   1925724     96.29      5076    272724   8962304    250.86    774960    936588        44
05시 45분 02초     69236   1930584     96.54      5232    314552   8615168    241.14    755344    952492        48
05시 46분 03초     81924   1917896     95.90      5360    303136   8764008    245.31    790816    913780         4
05시 47분 06초     77116   1922704     96.14      5952    274648   9050620    253.33    787236    919548        20
05시 48분 05초     77348   1922472     96.13      6072    240160   9119668    255.26    771116    936656        20

05시 49분 01초    824284   1175536     58.78      6188    509756   4964832    138.97    540796    349900       316
05시 50분 01초    531352   1468468     73.43      6188    529756   6063828    169.73    766732    379104       288
05시 51분 01초    527676   1472144     73.61      6188    530764   6063728    169.72    736844    412776       228
05시 52분 01초    509952   1489868     74.50      6188    522636   6066728    169.81    734844    411876       148
05시 53분 01초    529504   1470316     73.52      6188    523508   6065776    169.78    734836    413172       180
05시 54분 01초    528664   1471156     73.56      6188    523796   6064004    169.73    734712    414008       284
05시 55분 01초    528132   1471688     73.59      6188    524180   6065136    169.76    735256    414080       160
05시 56분 01초    516764   1483056     74.16      6188    534708   6066672    169.81    741616    418892       408
05시 57분 01초    504584   1495236     74.77      6188    534772   6214080    173.93    753332    418908       228
05시 58분 01초    517148   1482672     74.14      6188    535464   6062656    169.69    740616    419576       204
05시 59분 01초    512456   1487364     74.37      6188    536008   6222012    174.16    745488    419324       288
06시 00분 01초    521000   1478820     73.95      6188    536096   6066096    169.79    735416    420376       288
06시 01분 01초    513284   1486536     74.33      6188    541108   6086940    170.37    737764    424784       292
06시 02분 01초    507728   1492092     74.61      6188    534436   6105032    170.88    744048    427060       228
06시 03분 01초    505344   1494476     74.73      6188    534804   6107760    170.96    744304    427196       156
06시 04분 01초    533220   1466600     73.34      6152    518680   6103368    170.83    718808    436612       204
06시 05분 01초    530876   1468944     73.45      6152    518724   6104904    170.88    719432    436556        12
06시 06분 01초    530480   1469340     73.47      6152    519212   6106372    170.92    719772    436656       164
06시 07분 01초    529852   1469968     73.51      6152    519532   6105544    170.90    720064    436856       220
06시 08분 01초    529720   1470100     73.51      6152    519612   6103776    170.85    720072    436924       268
06시 09분 01초    528964   1470856     73.55      6152    519688   6105932    170.91    720252    437380       292
06시 10분 01초    487324   1512496     75.63      6152    577508   6085316    170.33    709524    490180       284
06시 11분 01초    478372   1521448     76.08      6152    577836   6246584    174.84    716664    490468       288
06시 12분 01초    486256   1513564     75.69      6152    577996   6086448    170.36    708336    491532       276
06시 13분 01초    485808   1514012     75.71      6152    578376   6086704    170.37    708736    491900       216
06시 14분 01초    483608   1516212     75.82      6152    579740   6087360    170.39    709992    493400       300
06시 15분 01초    483408   1516412     75.83      6152    579760   6090032    170.46    710500    493352       292
06시 16분 01초    482384   1517436     75.88      6152    579948   6086768    170.37    710628    493424       284
06시 17분 01초    482084   1517736     75.89      6152    580100   6087600    170.39    710840    493552       284
06시 18분 01초    479680   1520140     76.01      6152    581380   6086636    170.37    711696    494088       252
06시 19분 01초    478892   1520928     76.05      6152    581536   6086832    170.37    711432    495148       288
06시 20분 01초    479372   1520448     76.03      6152    581812   6087088    170.38    711212    495396       296
06시 21분 01초    510284   1489536     74.48      6152    577748   6085424    170.33    707712    494980       272
06시 22분 01초    507604   1492216     74.62      6152    578044   6087024    170.38    708192    495080       312
06시 23분 01초    506984   1492836     74.65      6152    578056   6086704    170.37    708612    495084       216
06시 24분 01초    506852   1492968     74.66      6152    578264   6086448    170.36    708672    495236       288
06시 25분 01초    507340   1492480     74.63      6152    578392   6087296    170.38    708112    495348       288
06시 26분 01초    506348   1493472     74.68      6152    578500   6086356    170.36    709188    495412       288
06시 27분 01초    506300   1493520     74.68      6152    578544   6084628    170.31    709172    495452       288
06시 28분 01초    506972   1492848     74.65      6152    578756   6086704    170.37    709636    495608       300
06시 29분 01초    456292   1543528     77.18      6152    624012   6093916    170.57    737840    514972       280
06시 30분 01초    456376   1543444     77.18      6152    624116   6093796    170.57    737824    515112       280
06시 31분 01초    455900   1543920     77.20      6152    624528   6089616    170.45    739636    513616       216
06시 32분 01초    456008   1543812     77.20      6152    624532   6089532    170.45    739860    513744       368
06시 33분 01초    454792   1545028     77.26      6152    624664   6091196    170.49    739944    513864       280
06시 34분 01초    454268   1545552     77.28      6152    625664   6090156    170.46    740784    514076       220
06시 35분 01초    454036   1545784     77.30      6152    626180   6087952    170.40    742088    513308       244
06시 36분 01초    453508   1546312     77.32      6152    626448   6091516    170.50    742112    513580       140
06시 37분 01초    453772   1546048     77.31      6152    626624   6092028    170.52    742032    513744       312
06시 38분 01초    453004   1546816     77.35      6152    626652   6091784    170.51    742176    513756       268
06시 39분 01초    453152   1546668     77.34      6152    626692   6094092    170.57    742312    513780       216
06시 40분 01초    453136   1546684     77.34      6152    626768   6091452    170.50    742292    513816       360
06시 41분 01초    450588   1549232     77.47      6152    627724   6080076    170.18    753720    504204       292
06시 42분 01초    449432   1550388     77.53      6152    627876   6080700    170.20    754336    504308       344
06시 43분 01초    448152   1551668     77.59      6152    628092   6079932    170.18    755284    504684       292
06시 44분 01초    448592   1551228     77.57      6152    628232   6081276    170.22    755620    504488       260
06시 45분 01초    447972   1551848     77.60      6152    628280   6080060    170.18    755184    505364       268
06시 46분 01초    447460   1552360     77.62      6152    628384   6081596    170.23    755140    505632       200
06시 47분 01초    448972   1550848     77.55      6152    628484   6081852    170.23    754212    505580       292
06시 48분 01초    448368   1551452     77.58      6152    628568   6079804    170.17    754128    505668       340
06시 49분 01초    448296   1551524     77.58      6152    628584   6082108    170.24    754192    505664       224
06시 50분 01초    447808   1552012     77.61      6152    628768   6080156    170.18    754440    505792       376
06시 51분 01초    447916   1551904     77.60      6152    628860   6079868    170.18    754512    505808       220
06시 52분 01초    447552   1552268     77.62      6152    629176   6080988    170.21    754552    506240       364
06시 53분 01초    447668   1552152     77.61      6152    629168   6085112    170.32    754472    506272       260
06시 54분 01초    446668   1553152     77.66      6152    629312   6081276    170.22    754580    506384       300
06시 55분 01초    447212   1552608     77.64      6152    629332   6081468    170.22    766136    494468       288
06시 56분 01초    447056   1552764     77.65      6152    629416   6081852    170.23    785880    475808       292
06시 57분 01초    446928   1552892     77.65      6152    629592   6082108    170.24    785776    475972       216
06시 58분 01초    446936   1552884     77.65      6152    629708   6083356    170.27    785812    476072       236
06시 59분 01초    448860   1550960     77.55      6152    630644   6097468    170.67    786348    473368       492
07시 00분 01초    451048   1548772     77.45      6152    630788   6098996    170.71    784136    473508       220

07시 00분 59초       LINUX RESTART

07시 02분 01초 kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
07시 03분 01초    278316   1721504     86.08      2120    949960   5754988    161.08    582780    805732       304
07시 04분 01초    275168   1724652     86.24      2120    950004   5753920    161.05    586108    805044       344
07시 05분 01초    716384   1283436     64.18      2120    548888   5754240    161.06    578956    411728       376
07시 06분 01초    737420   1262400     63.13      2120    550828   5752452    161.01    576732    413952       328
07시 07분 01초    736640   1263180     63.16      2120    550812   5752300    161.01    575276    415668       368
07시 08분 01초    736244   1263576     63.18      2120    550916   5751216    160.98    576140    415700       316
07시 09분 01초    748272   1251548     62.58      2120    550956   5751092    160.97    563012    416820       312
07시 10분 01초    747612   1252208     62.62      2120    550996   5751216    160.98    563564    416848       324
07시 11분 01초    747092   1252728     62.64      2120    551156   5753680    161.05    564020    417016       312
07시 12분 01초    744160   1255660     62.79      2120    551552   5758932    161.19    565408    416132       352
07시 13분 01초    743516   1256304     62.82      2120    551728   5757908    161.16    566296    415864       316
07시 14분 01초    743020   1256800     62.85      2120    552900   5760916    161.25    566820    415800       328
07시 15분 01초    742160   1257660     62.89      2120    552988   5757776    161.16    567448    415740       256
07시 16분 01초    740888   1258932     62.95      2120    555176   5787916    162.00    573160    412064       348
07시 17분 01초    720108   1279712     63.99      2120    561136   5951308    166.58    586836    416880       296
07시 18분 02초     74660   1925160     96.27      4984    925408   7455568    208.68    886168    790836     86636
07시 19분 01초    254028   1745792     87.30      5076    725220   7269400    203.47    720940    754284       460
07시 20분 01초    222068   1777752     88.90      5204    727828   7651572    214.17    730496    770984       392
07시 21분 01초    233136   1766684     88.34      5300    728132   7273604    203.59    720100    771732       420
07시 22분 01초    230308   1769512     88.48      5388    728684   7271808    203.54    720388    774296       376
07시 23분 01초    103244   1896576     94.84      5728    663580   7364380    206.13    810424    786832       592
07시 24분 01초     96636   1903184     95.17      5896    658228   7390356    206.86    814900    798052       364
07시 25분 01초     98724   1901096     95.06      6352    633760   7376376    206.47    817584    792396       708
07시 26분 01초     94840   1904980     95.26      6528    601952   7426264    207.86    819340    795940       412
07시 27분 01초    107716   1892104     94.61      6676    607512   7385264    206.71    805296    795744       496
07시 28분 01초    103240   1896580     94.84      6636    596400   7381788    206.62    825404    780048       700

image

@lens0021
Copy link
Member Author

lens0021 commented May 16, 2021

사이트가 느려지다가 2021-05-16T07:05:38Z에 php-fpm alloc이 "Docker container exited with non-zero exit code: 143"라면서 저절로 재시작하고 회복되었습니다.
Nomad client의 gc threshold가 80이고 저희 디스크가 82%를 사용중이라 allocation의 로그가 바로 지워진 것 같습니다. 자세한 로그는 확인할 수 없었습니다.
메모리나 CPU 자원에 문제가 있어 보이지는 않았고 다만 php-fpm 프로세스 수가 4~10개 사이를 오락가락 했는데 10개일 때 사이트가 느려진 것 같습니다.

06:47-07:05 동안 yum-cron과 함께 디스크 입출력이 늘었던 기록이 있습니다.

image
image

비교를 위해 현재 atop

image

@lens0021
Copy link
Member Author

femiwiki/nomad#29 를 제외한 다른 문제는 거의 해결되어 닫습니다.

@lens0021 lens0021 moved this from In Progress to Done in femiwiki.com Jun 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
femiwiki.com
  
Done
Development

No branches or pull requests

1 participant