Description
The pmm-client Docker container has a 32GB RAM limit. At seemingly random intervals, usually around half an hour, the container hits that 32GB limit and the OOM killer terminates postgres_exporter inside it. The container runs on the same host as the monitored PostgreSQL instance.
OS (monitored system): Ubuntu 20.04.4 LTS (Focal Fossa)
Linux kernel (monitored system): Linux HOSTNAME_REMOVED 5.4.0-164-generic #181-Ubuntu SMP Fri Sep 1 13:41:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Docker image of pmm-client: percona/pmm-client:2.39.0 (same problem on 2.33, 2.36, and 2.40.1)
PMM server version: 2.39 (same problem on 2.33, 2.36, and 2.40.1)
Monitored service: postgresql (14.9)
Total RAM on the monitored PostgreSQL server: 128GB
Available memory on the monitored host at the moment the limit is reached is about 59GB:
# free -hw
               total        used        free      shared     buffers       cache   available
Mem:           125Gi        31Gi       841Mi        33Gi       387Mi        93Gi        59Gi
Swap:             0B          0B          0B
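As a side note, the container's configured ceiling and live usage can be checked from the host with standard Docker commands (a minimal sketch; the container name pmm-client is an assumption and should be replaced with the actual name or ID):

# Configured memory limit in bytes; 34359738368 == 32GiB
docker inspect --format '{{.HostConfig.Memory}}' pmm-client
# One-shot snapshot of current usage vs. the limit
docker stats --no-stream pmm-client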
Expected Results
The pmm-client container does not reach its 32GB RAM limit.
Actual Results
The pmm-client container reaches its 32GB memory limit, and postgres_exporter is OOM-killed.
Version
PMM Server v2.39, PMM client 2.39
Steps to reproduce
Create a new PostgreSQL cluster
Create many schemas (about 10k in our case)
Create many empty tables (about 70k in our case)
Deploy pmm-client in Docker and add a PostgreSQL service (see the sketch below)
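A minimal shell sketch of these steps, assuming superuser psql access to the new cluster and a reachable PMM server; every hostname, credential, and container name below is a placeholder, and the environment variables and pmm-admin flags follow the Percona documentation for the containerized client (verify them against your PMM version):

# Steps 2-3: generate ~10k schemas with 7 empty tables each (~70k tables)
for i in $(seq 1 10000); do
  psql -U postgres -d postgres -c "CREATE SCHEMA s$i;"
  for j in $(seq 1 7); do
    psql -U postgres -d postgres -c "CREATE TABLE s$i.t$j (id int);"
  done
done

# Step 4: run pmm-client in Docker with the same 32GB memory limit
docker run -d --name pmm-client --memory=32g \
  -e PMM_AGENT_SERVER_ADDRESS=pmm-server.example.com:443 \
  -e PMM_AGENT_SERVER_USERNAME=admin \
  -e PMM_AGENT_SERVER_PASSWORD=admin \
  -e PMM_AGENT_SETUP=1 \
  percona/pmm-client:2.39.0

# Step 4 (cont.): register the PostgreSQL service with the agent
docker exec pmm-client pmm-admin add postgresql \
  --username=pmm --password=PASSWORD_REMOVED \
  --host=HOST_REMOVED --port=5432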
Relevant logs
pmm-client Docker container logs, from startup until the RAM limit is first reached:
INFO[2023-10-20T08:44:51.625+00:00] Run setup: false Sidecar mode: false component=entrypoint
INFO[2023-10-20T08:44:51.626+00:00] Starting 'pmm-admin run'... component=entrypoint
INFO[2023-10-20T08:44:51.726+00:00] Loading configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml. component=main
INFO[2023-10-20T08:44:51.726+00:00] Using /usr/local/percona/pmm2/exporters/node_exporter component=main
INFO[2023-10-20T08:44:51.726+00:00] Using /usr/local/percona/pmm2/exporters/mysqld_exporter component=main
INFO[2023-10-20T08:44:51.726+00:00] Using /usr/local/percona/pmm2/exporters/mongodb_exporter component=main
INFO[2023-10-20T08:44:51.726+00:00] Using /usr/local/percona/pmm2/exporters/postgres_exporter component=main
INFO[2023-10-20T08:44:51.726+00:00] Using /usr/local/percona/pmm2/exporters/proxysql_exporter component=main
INFO[2023-10-20T08:44:51.726+00:00] Using /usr/local/percona/pmm2/exporters/rds_exporter component=main
INFO[2023-10-20T08:44:51.726+00:00] Using /usr/local/percona/pmm2/exporters/azure_exporter component=main
INFO[2023-10-20T08:44:51.726+00:00] Using /usr/local/percona/pmm2/exporters/vmagent component=main
INFO[2023-10-20T08:44:51.726+00:00] Runner capacity set to 32. component=runner
INFO[2023-10-20T08:44:51.726+00:00] Loading configuration file /usr/local/percona/pmm2/config/pmm-agent.yaml. component=main
INFO[2023-10-20T08:44:51.727+00:00] Using /usr/local/percona/pmm2/exporters/node_exporter component=main
INFO[2023-10-20T08:44:51.727+00:00] Using /usr/local/percona/pmm2/exporters/mysqld_exporter component=main
INFO[2023-10-20T08:44:51.727+00:00] Using /usr/local/percona/pmm2/exporters/mongodb_exporter component=main
INFO[2023-10-20T08:44:51.727+00:00] Using /usr/local/percona/pmm2/exporters/postgres_exporter component=main
INFO[2023-10-20T08:44:51.727+00:00] Using /usr/local/percona/pmm2/exporters/proxysql_exporter component=main
INFO[2023-10-20T08:44:51.727+00:00] Using /usr/local/percona/pmm2/exporters/rds_exporter component=main
INFO[2023-10-20T08:44:51.727+00:00] Using /usr/local/percona/pmm2/exporters/azure_exporter component=main
INFO[2023-10-20T08:44:51.727+00:00] Using /usr/local/percona/pmm2/exporters/vmagent component=main
ERRO[2023-10-20T08:44:52.995+00:00] ts=2023-10-20T08:44:52.932Z caller=diskstats_linux.go:264 level=error collector=diskstats msg="Failed to open directory, disabling udev device properties" path=/run/udev/data agentID=/agent_id/8b47212b-ec90-4c24-9a1d-b8c4cc3eaa63 component=agent-process type=node_exporter
ERRO[2023-10-20T09:18:58.502+00:00] ts=2023-10-20T09:18:58.495Z caller=postgres_exporter.go:750 level=error err="Error opening connection to database (postgres://pmm:PASSWORD_REMOVED@HOST_REMOVED:PORT_REMOVED/postgres?connect_timeout=1&sslmode=disable): driver: bad connection" agentID=/agent_id/ef46e805-0e0a-4246-9b94-d21be2e69ba7 component=agent-process type=postgres_exporter
WARN[2023-10-20T09:38:29.330+00:00] Process: exited: signal: killed. agentID=/agent_id/ef46e805-0e0a-4246-9b94-d21be2e69ba7 component=agent-process type=postgres_exporter
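For anyone reproducing this, logs like the above and the matching kernel records can be gathered on the host with standard tooling (a sketch; the container name pmm-client is an assumption):

docker logs pmm-client            # agent and exporter log lines as shown above
dmesg -T | grep -i -A 5 oom       # kernel OOM-killer events with readable timestamps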
dmesg log:
[Fri Oct 20 09:38:24 2023] pmm-agent invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[Fri Oct 20 09:38:24 2023] CPU: 0 PID: 2403132 Comm: pmm-agent Not tainted 5.4.0-164-generic #181-Ubuntu
[Fri Oct 20 09:38:24 2023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[Fri Oct 20 09:38:24 2023] Call Trace:
[Fri Oct 20 09:38:24 2023] dump_stack+0x6d/0x8b
[Fri Oct 20 09:38:24 2023] dump_header+0x4f/0x1eb
[Fri Oct 20 09:38:24 2023] oom_kill_process.cold+0xb/0x10
[Fri Oct 20 09:38:24 2023] out_of_memory+0x1cf/0x500
[Fri Oct 20 09:38:24 2023] mem_cgroup_out_of_memory+0xbd/0xe0
[Fri Oct 20 09:38:24 2023] try_charge+0x77c/0x810
[Fri Oct 20 09:38:24 2023] mem_cgroup_try_charge+0x71/0x190
[Fri Oct 20 09:38:24 2023] __add_to_page_cache_locked+0x2ff/0x3f0
[Fri Oct 20 09:38:24 2023] ? scan_shadow_nodes+0x30/0x30
[Fri Oct 20 09:38:24 2023] add_to_page_cache_lru+0x4d/0xd0
[Fri Oct 20 09:38:24 2023] pagecache_get_page+0x101/0x300
[Fri Oct 20 09:38:24 2023] filemap_fault+0x6b2/0xa50
[Fri Oct 20 09:38:24 2023] ? unlock_page_memcg+0x12/0x20
[Fri Oct 20 09:38:24 2023] ? page_add_file_rmap+0xff/0x1a0
[Fri Oct 20 09:38:24 2023] ? xas_load+0xd/0x80
[Fri Oct 20 09:38:24 2023] ? xas_find+0x17f/0x1c0
[Fri Oct 20 09:38:24 2023] ? filemap_map_pages+0x24c/0x380
[Fri Oct 20 09:38:24 2023] ext4_filemap_fault+0x32/0x50
[Fri Oct 20 09:38:24 2023] __do_fault+0x3c/0x170
[Fri Oct 20 09:38:24 2023] do_fault+0x24b/0x640
[Fri Oct 20 09:38:24 2023] __handle_mm_fault+0x4c5/0x7a0
[Fri Oct 20 09:38:24 2023] handle_mm_fault+0xca/0x200
[Fri Oct 20 09:38:24 2023] do_user_addr_fault+0x1f9/0x450
[Fri Oct 20 09:38:24 2023] __do_page_fault+0x58/0x90
[Fri Oct 20 09:38:24 2023] do_page_fault+0x2c/0xe0
[Fri Oct 20 09:38:24 2023] do_async_page_fault+0x39/0x70
[Fri Oct 20 09:38:24 2023] async_page_fault+0x34/0x40
[Fri Oct 20 09:38:24 2023] RIP: 0033:0x43730f
[Fri Oct 20 09:38:24 2023] Code: Bad RIP value.
[Fri Oct 20 09:38:24 2023] RSP: 002b:00007f8f94ff84f8 EFLAGS: 00010206
[Fri Oct 20 09:38:24 2023] RAX: ffffffffffffff92 RBX: 0000000000000000 RCX: 0000000000473d63
[Fri Oct 20 09:38:24 2023] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000000000236c030
[Fri Oct 20 09:38:24 2023] RBP: 00007f8f94ff8538 R08: 0000000000000000 R09: 0000000000000000
[Fri Oct 20 09:38:24 2023] R10: 00007f8f94ff8528 R11: 0000000000000206 R12: 00007f8f94ff8528
[Fri Oct 20 09:38:24 2023] R13: 0000000000000013 R14: 000000c0001036c0 R15: 000000c000452000
[Fri Oct 20 09:38:24 2023] memory: usage 33554432kB, limit 33554432kB, failcnt 640017
[Fri Oct 20 09:38:24 2023] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[Fri Oct 20 09:38:24 2023] kmem: usage 90612kB, limit 9007199254740988kB, failcnt 0
[Fri Oct 20 09:38:24 2023] Memory cgroup stats for /docker/cc818c9a63d41a2755ed35aa07d5d06c357b5512972fc76d74941c1983731f9a:
[Fri Oct 20 09:38:24 2023] anon 34263621632
file 1167360
kernel_stack 1216512
slab 20291584
sock 0
shmem 0
file_mapped 0
file_dirty 0
file_writeback 0
anon_thp 4464836608
inactive_anon 0
active_anon 34263457792
inactive_file 0
active_file 0
unevictable 0
slab_reclaimable 11001856
slab_unreclaimable 9289728
pgfault 8587986
pgmajfault 71247
workingset_refault 1611621
workingset_activate 115170
workingset_nodereclaim 0
pgrefill 723458
pgscan 8194323
pgsteal 1630807
pgactivate 480414
pgdeactivate 599413
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 1617
thp_collapse_alloc 0
[Fri Oct 20 09:38:24 2023] Tasks state (memory values in pages):
[Fri Oct 20 09:38:24 2023] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[Fri Oct 20 09:38:24 2023] [2403075] 1002 2403075 178251 210 86016 0 0 pmm-agent-entry
[Fri Oct 20 09:38:24 2023] [2403121] 1002 2403121 349731 2272 274432 0 0 pmm-agent
[Fri Oct 20 09:38:24 2023] [2403137] 1002 2403137 180996 5064 163840 0 0 vmagent
[Fri Oct 20 09:38:24 2023] [2403139] 1002 2403139 181981 2540 159744 0 0 node_exporter
[Fri Oct 20 09:38:24 2023] [2403156] 1002 2403156 8558644 8354434 67321856 0 0 postgres_export
[Fri Oct 20 09:38:24 2023] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=cc818c9a63d41a2755ed35aa07d5d06c357b5512972fc76d74941c1983731f9a,mems_allowed=0,oom_memcg=/docker/cc818c9a63d41a2755ed35aa07d5d06c357b5512972fc76d74941c1983731f9a,task_memcg=/docker/cc818c9a63d41a2755ed35aa07d5d06c357b5512972fc76d74941c1983731f9a,task=postgres_export,pid=2403156,uid=1002
[Fri Oct 20 09:38:24 2023] Memory cgroup out of memory: Killed process 2403156 (postgres_export) total-vm:34234576kB, anon-rss:33417736kB, file-rss:0kB, shmem-rss:0kB, UID:1002 pgtables:65744kB oom_score_adj:0
[Fri Oct 20 09:38:28 2023] oom_reaper: reaped process 2403156 (postgres_export), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
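The task table above is internally consistent: postgres_export's rss of 8354434 pages, at 4KiB per page, is exactly the anon-rss:33417736kB that was reclaimed, i.e. just under the 33554432kB (32GiB) cgroup limit. A quick shell check:

echo $((8354434 * 4))             # 33417736 KiB, matches anon-rss above
echo $((33554432 / 1024 / 1024))  # 32 GiB, the cgroup limit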
Hello @Yuskovich, we are working on fixing this problem. We released some improvements in PMM 2.40.1 and are going to provide more improvements in PMM 2.41.0. Please upgrade to 2.40.1 and let us know whether it helped.
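For reference, a minimal sketch of that upgrade path, assuming the container was started as in the reproduction sketch above (container name and credentials are placeholders):

docker pull percona/pmm-client:2.40.1
docker stop pmm-client && docker rm pmm-client
# Re-create the container with the same flags and environment as before,
# swapping in the newer image tag
docker run -d --name pmm-client --memory=32g \
  -e PMM_AGENT_SERVER_ADDRESS=pmm-server.example.com:443 \
  -e PMM_AGENT_SERVER_USERNAME=admin \
  -e PMM_AGENT_SERVER_PASSWORD=admin \
  -e PMM_AGENT_SETUP=1 \
  percona/pmm-client:2.40.1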