high memory usage in logmon #9858

Closed
anastazya opened this issue Jan 20, 2021 · 55 comments · Fixed by #11261

Comments

@anastazya

I have a cluster of 20 nodes, all running "raw_exec" tasks in PHP.

At random intervals, after a while, I get a lot of OOMs.
I then find the server at 100% swap usage, looking like this:

[Screenshot: htop output, 2021-01-20 17:45:21]

If I restart the Nomad agent, everything goes back to normal for a while.

I also get this in the Nomad log:

2021-01-20T17:52:28.522+0200 [INFO] client.gc: garbage collection skipped because no terminal allocations: reason="number of allocations (89) is over the limit (50)"

That message is extremely ambiguous, as everything is running normally and Nomad had just been restarted.

@tgross
Member

tgross commented Jan 20, 2021

Hi @anastazya! The htop output you have there is displaying all the threads for the Nomad agent process (all the green-colored lines). So that PID 1146 has only ~1GB RSS. Can you provide the rest of the memory stats? ps -eo pid,tid,class,rtprio,stat,vsz,rss,comm will give a reasonable table of the whole host.

@anastazya
Author

anastazya commented Jan 20, 2021

ps -eo pid,tid,class,rtprio,stat,vsz,rss,comm

I can give the stats for another host in the same state, because I restarted Nomad on the host where I took the screenshot.

  PID   TID CLS RTPRIO STAT    VSZ   RSS COMMAND
    1     1 TS       - Ss    52876  4204 systemd
    2     2 TS       - S         0     0 kthreadd
    4     4 TS       - S<        0     0 kworker/0:0H
    6     6 TS       - S         0     0 ksoftirqd/0
    7     7 FF      99 S         0     0 migration/0
    8     8 TS       - S         0     0 rcu_bh
    9     9 TS       - S         0     0 rcu_sched
   10    10 TS       - S<        0     0 lru-add-drain
   11    11 FF      99 S         0     0 watchdog/0
   12    12 FF      99 S         0     0 watchdog/1
   13    13 FF      99 S         0     0 migration/1
   14    14 TS       - S         0     0 ksoftirqd/1
   16    16 TS       - S<        0     0 kworker/1:0H
   17    17 FF      99 S         0     0 watchdog/2
   18    18 FF      99 S         0     0 migration/2
   19    19 TS       - S         0     0 ksoftirqd/2
   21    21 TS       - S<        0     0 kworker/2:0H
   22    22 FF      99 S         0     0 watchdog/3
   23    23 FF      99 S         0     0 migration/3
   24    24 TS       - S         0     0 ksoftirqd/3
   26    26 TS       - S<        0     0 kworker/3:0H
   27    27 FF      99 S         0     0 watchdog/4
   28    28 FF      99 S         0     0 migration/4
   29    29 TS       - S         0     0 ksoftirqd/4
   31    31 TS       - S<        0     0 kworker/4:0H
   32    32 FF      99 S         0     0 watchdog/5
   33    33 FF      99 S         0     0 migration/5
   34    34 TS       - S         0     0 ksoftirqd/5
   36    36 TS       - S<        0     0 kworker/5:0H
   37    37 FF      99 S         0     0 watchdog/6
   38    38 FF      99 S         0     0 migration/6
   39    39 TS       - S         0     0 ksoftirqd/6
   41    41 TS       - S<        0     0 kworker/6:0H
   42    42 FF      99 S         0     0 watchdog/7
   43    43 FF      99 S         0     0 migration/7
   44    44 TS       - S         0     0 ksoftirqd/7
   46    46 TS       - S<        0     0 kworker/7:0H
   47    47 FF      99 S         0     0 watchdog/8
   48    48 FF      99 S         0     0 migration/8
   49    49 TS       - S         0     0 ksoftirqd/8
   51    51 TS       - S<        0     0 kworker/8:0H
   52    52 FF      99 S         0     0 watchdog/9
   53    53 FF      99 S         0     0 migration/9
   54    54 TS       - S         0     0 ksoftirqd/9
   56    56 TS       - S<        0     0 kworker/9:0H
   57    57 FF      99 S         0     0 watchdog/10
   58    58 FF      99 S         0     0 migration/10
   59    59 TS       - S         0     0 ksoftirqd/10
   61    61 TS       - S<        0     0 kworker/10:0H
   62    62 FF      99 S         0     0 watchdog/11
   63    63 FF      99 S         0     0 migration/11
   64    64 TS       - S         0     0 ksoftirqd/11
   66    66 TS       - S<        0     0 kworker/11:0H
   68    68 TS       - S         0     0 kdevtmpfs
   69    69 TS       - S<        0     0 netns
   70    70 TS       - S         0     0 khungtaskd
   71    71 TS       - S<        0     0 writeback
   72    72 TS       - S<        0     0 kintegrityd
   73    73 TS       - S<        0     0 bioset
   74    74 TS       - S<        0     0 bioset
   75    75 TS       - S<        0     0 bioset
   76    76 TS       - S<        0     0 kblockd
   77    77 TS       - S<        0     0 md
   78    78 TS       - S<        0     0 edac-poller
   79    79 TS       - S<        0     0 watchdogd
   85    85 TS       - S         0     0 kswapd0
   86    86 TS       - SN        0     0 ksmd
   87    87 TS       - SN        0     0 khugepaged
   88    88 TS       - S<        0     0 crypto
   96    96 TS       - S<        0     0 kthrotld
   98    98 TS       - S<        0     0 kmpath_rdacd
   99    99 TS       - S<        0     0 kaluad
  101   101 TS       - S<        0     0 kpsmoused
  102   102 TS       - S<        0     0 ipv6_addrconf
  115   115 TS       - S<        0     0 deferwq
  151   151 TS       - S         0     0 kauditd
  359   359 TS       - S<        0     0 ata_sff
  364   364 TS       - S         0     0 scsi_eh_0
  365   365 TS       - S<        0     0 scsi_tmf_0
  366   366 TS       - S         0     0 scsi_eh_1
  367   367 TS       - S<        0     0 scsi_tmf_1
  372   372 TS       - S<        0     0 ttm_swap
  378   378 TS       - S<        0     0 virtscsi-scan
  379   379 TS       - S         0     0 scsi_eh_2
  380   380 TS       - S<        0     0 scsi_tmf_2
  418   418 TS       - S<        0     0 kworker/0:1H
  459   459 TS       - S<        0     0 kdmflush
  460   460 TS       - S<        0     0 bioset
  469   469 TS       - S<        0     0 kdmflush
  470   470 TS       - S<        0     0 bioset
  485   485 TS       - S<        0     0 bioset
  486   486 TS       - S<        0     0 xfsalloc
  487   487 TS       - S<        0     0 xfs_mru_cache
  488   488 TS       - S<        0     0 xfs-buf/dm-0
  489   489 TS       - S<        0     0 xfs-data/dm-0
  490   490 TS       - S<        0     0 xfs-conv/dm-0
  491   491 TS       - S<        0     0 xfs-cil/dm-0
  492   492 TS       - S<        0     0 xfs-reclaim/dm-
  493   493 TS       - S<        0     0 xfs-log/dm-0
  494   494 TS       - S<        0     0 xfs-eofblocks/d
  495   495 TS       - S         0     0 xfsaild/dm-0
  577   577 TS       - Ss   240436 136660 systemd-journal
  597   597 TS       - Ss   198572   428 lvmetad
  613   613 TS       - Ss    44980   780 systemd-udevd
  660   660 TS       - S         0     0 hwrng
  669   669 TS       - S<        0     0 kworker/2:1H
  758   758 TS       - S         0     0 jbd2/sda1-8
  759   759 TS       - S<        0     0 ext4-rsv-conver
  780   780 TS       - S<sl  55532   812 auditd
  786   786 TS       - S<        0     0 rpciod
  787   787 TS       - S<        0     0 xprtiod
  810   810 TS       - Ss    69280   856 rpcbind
  811   811 TS       - Ss    21672   596 irqbalance
  812   812 TS       - Ssl  612364  8324 polkitd
  813   813 TS       - Ss    12032   320 statsd-aggregat
  814   814 TS       - Ss    58232  1468 dbus-daemon
  826   826 TS       - Ssl  195204   708 gssproxy
  834   834 TS       - Ss    44220  1448 qemu-ga
  853   853 TS       - Ss    26828  1664 systemd-logind
  858   858 TS       - Ss    47296  1480 ntpd
  880   880 TS       - Ssl  527804 15492 python
  885   885 TS       - Ss   126388  1028 crond
  994   994 TS       - S<        0     0 kworker/8:1H
 1001  1001 TS       - S<        0     0 kworker/9:1H
 1002  1002 TS       - S<        0     0 kworker/4:1H
 1057  1057 TS       - Ss   17253808 77440 php-fpm
 1059  1059 TS       - Ssl  716956 39420 mcollectived
 1063  1063 TS       - Ss   116740  2380 lldpd
 1066  1066 TS       - Ss   112924  2320 sshd
 1071  1071 TS       - Ssl  261420 48932 puppet
 1073  1073 TS       - S    116740  1412 lldpd
 1076  1076 TS       - Ssl  574300 12680 tuned
 1078  1078 TS       - Ssl  719924 22020 node_exporter
 1080  1080 TS       - Ssl  5099008 1507836 nomad
 1099  1099 TS       - SNl  4806612 391428 java
 1113  1113 TS       - Ssl  793528  3508 collectd
 1186  1186 TS       - Ssl  713480 72144 rsyslogd
 1212  1212 TS       - Ssl  321332 240152 consul-template
 1230  1230 TS       - Ssl  1415096 22308 containerd
 1256  1256 TS       - SNsl 117444  2468 osqueryd
 1282  1282 TS       - Ss    98408 15468 haproxy
 1314  1314 TS       - S    17253808 9048 php-fpm
 1315  1315 TS       - S    17253808 9048 php-fpm
 1318  1318 TS       - S    17253808 9048 php-fpm
 1322  1322 TS       - S    17253808 9048 php-fpm
 1324  1324 TS       - S    17253808 9052 php-fpm
 1330  1330 TS       - SNl  832772 20452 osqueryd
 1344  1344 TS       - Ss+  110204   300 agetty
 1528  1528 TS       - Ssl  1143392 50756 dockerd
 1658  1658 TS       - Ssl  134196 12244 unbound
 1666  1666 TS       - Ss    89708  1352 master
 1690  1690 TS       - S     89992  2100 qmgr
 1810  1810 TS       - S         0     0 kworker/10:1
 1821  1821 TS       - Sl   1692208 22136 sensu-client
 2248  2248 TS       - S<        0     0 kworker/1:1H
 2620  2620 TS       - Ssl  1672444 15528 nomad
 2645  2645 TS       - S    626840 84268 php
 3060  3060 TS       - Sl   1928244 132140 nomad
 3067  3067 TS       - Sl   1729264 66020 nomad
 3078  3078 TS       - Sl   1666356 135940 nomad
 3079  3079 TS       - Sl   1869568 332664 nomad
 3106  3106 TS       - Sl   1869312 324672 nomad
 3112  3112 TS       - Sl   1663728 65752 nomad
 3123  3123 TS       - Sl   1665844 142764 nomad
 3135  3135 TS       - Sl   1943300 323592 nomad
 3144  3144 TS       - Sl   1804288 320004 nomad
 3164  3164 TS       - Sl   1598448 69464 nomad
 3275  3275 TS       - Sl   1875392 254960 nomad
 3288  3288 TS       - Sl   1729264 70084 nomad
 3305  3305 TS       - Sl   1664048 70556 nomad
 3315  3315 TS       - Sl   1945640 424600 nomad
 3333  3333 TS       - Sl   1597680 71896 nomad
 3344  3344 TS       - Sl   2011176 421620 nomad
 3357  3357 TS       - Sl   1741092 418900 nomad
 3358  3358 TS       - Sl   1532976 69468 nomad
 3377  3377 TS       - Sl   1597680 83116 nomad
 3527  3527 TS       - Sl   1664048 68144 nomad
 3528  3528 TS       - Sl   1598192 65324 nomad
 3534  3534 TS       - Sl   1663728 72516 nomad
 3541  3541 TS       - Sl   1794544 76672 nomad
 3564  3564 TS       - Sl   1729072 79336 nomad
 3574  3574 TS       - Sl   1532976 65360 nomad
 3604  3604 TS       - Sl   1663728 81436 nomad
 3628  3628 TS       - Sl   1532720 74684 nomad
 3665  3665 TS       - Sl   1598192 86096 nomad
 3700  3700 TS       - Sl   1466864 62912 nomad
 3721  3721 TS       - Sl   1532400 63456 nomad
 3753  3753 TS       - Sl   1663728 80948 nomad
 3801  3801 TS       - Sl   1598448 75284 nomad
 3829  3829 TS       - Sl   1728496 84708 nomad
 3858  3858 TS       - Sl   1597936 71556 nomad
 3902  3902 TS       - Sl   1663472 89948 nomad
 3944  3944 TS       - Sl   1466608 61384 nomad
 3945  3945 TS       - Sl   1598192 67168 nomad
 3959  3959 TS       - Sl   1598192 64032 nomad
 4004  4004 TS       - Sl   1467376 77692 nomad
 4037  4037 TS       - Sl   1598192 79208 nomad
 4067  4067 TS       - Sl   1467376 69844 nomad
 4098  4098 TS       - Sl   1467120 77180 nomad
 4128  4128 TS       - Sl   1598448 67808 nomad
 4153  4153 TS       - Sl   1532912 64836 nomad
 4188  4188 TS       - Sl   1597680 69220 nomad
 4226  4226 TS       - Sl   1663472 83192 nomad
 4345  4345 TS       - Sl   1597936 60992 nomad
 4369  4369 TS       - Sl   1532656 80740 nomad
 4395  4395 TS       - Sl   1532656 68400 nomad
 4408  4408 TS       - Sl   1598448 61784 nomad
 4444  4444 TS       - Sl   1663792 78628 nomad
 4454  4454 TS       - Sl   1663984 70584 nomad
 4694  4694 TS       - S         0     0 kworker/4:0
 5642  5642 TS       - Ss   158928  5748 sshd
 5723  5723 TS       - Ss+  116472  3052 bash
 7355  7355 TS       - S<        0     0 kworker/3:1H
 7391  7391 TS       - Ssl  1532152 18224 nomad
 7402  7402 TS       - S    624668 88888 php
 7830  7830 TS       - S         0     0 kworker/u192:1
 7980  7980 TS       - S         0     0 kworker/10:2
 8084  8084 TS       - Sl   1595820 10828 nomad
 8441  8441 TS       - Ssl  1597944 18220 nomad
 8452  8452 TS       - S    533228 82400 php
 9335  9335 TS       - Ssl  1532152 18352 nomad
 9345  9345 TS       - S    552968 100960 php
 9714  9714 TS       - S<        0     0 kworker/7:1H
10637 10637 TS       - Ssl  1597944 17968 nomad
10652 10652 TS       - S    521780 70132 php
12436 12436 TS       - S         0     0 kworker/8:1
12490 12490 TS       - S         0     0 kworker/9:1
13161 13161 TS       - S         0     0 kworker/0:0
13252 13252 TS       - S         0     0 kworker/1:1
14079 14079 TS       - S<        0     0 kworker/11:1H
14947 14947 TS       - S         0     0 kworker/0:2
14955 14955 TS       - S<        0     0 kworker/6:1H
14979 14979 TS       - S         0     0 kworker/5:2
15116 15116 TS       - S         0     0 kworker/6:1
15737 15737 TS       - Ssl  1646832 17792 nomad
15747 15747 TS       - S    519732 68740 php
15755 15755 TS       - Ssl  1449968 17824 nomad
15774 15774 TS       - S    519732 68736 php
15803 15803 TS       - Ssl  1597944 18020 nomad
15813 15813 TS       - S    519744 68736 php
15815 15815 TS       - Ssl  1598200 18036 nomad
15825 15825 TS       - S    519732 68736 php
15963 15963 TS       - Ssl  1507052 17860 nomad
15974 15974 TS       - S    519732 68732 php
16056 16056 TS       - Ssl  1507564 18332 nomad
16067 16067 TS       - S    519732 68732 php
16116 16116 TS       - Ssl  1515248 17980 nomad
16126 16126 TS       - S    519732 68732 php
16133 16133 TS       - Ssl  1597176 17888 nomad
16144 16144 TS       - S    519744 68736 php
19972 19972 TS       - Ssl  1573100 18004 nomad
19982 19982 TS       - S    526196 73520 php
20260 20260 TS       - Ssl  1523956 18168 nomad
20271 20271 TS       - S    519732 68720 php
20277 20277 TS       - S         0     0 kworker/7:2
20278 20278 TS       - S         0     0 kworker/7:3
20279 20279 TS       - S         0     0 kworker/7:4
20280 20280 TS       - S         0     0 kworker/7:5
20281 20281 TS       - S         0     0 kworker/7:6
20282 20282 TS       - S         0     0 kworker/7:7
20283 20283 TS       - S         0     0 kworker/7:8
20284 20284 TS       - S         0     0 kworker/7:9
20285 20285 TS       - S         0     0 kworker/7:10
20286 20286 TS       - S         0     0 kworker/7:11
20287 20287 TS       - S         0     0 kworker/7:12
20288 20288 TS       - S         0     0 kworker/7:13
20289 20289 TS       - S         0     0 kworker/7:14
20290 20290 TS       - S         0     0 kworker/7:15
20291 20291 TS       - S         0     0 kworker/7:16
20292 20292 TS       - S         0     0 kworker/7:17
20293 20293 TS       - S         0     0 kworker/7:18
20294 20294 TS       - S         0     0 kworker/7:19
20295 20295 TS       - S         0     0 kworker/7:20
20296 20296 TS       - S         0     0 kworker/7:21
20297 20297 TS       - S         0     0 kworker/7:22
20298 20298 TS       - S         0     0 kworker/7:23
20299 20299 TS       - S         0     0 kworker/7:24
20300 20300 TS       - S         0     0 kworker/7:25
20301 20301 TS       - S         0     0 kworker/7:26
20302 20302 TS       - S         0     0 kworker/7:27
20303 20303 TS       - S         0     0 kworker/7:28
20304 20304 TS       - S         0     0 kworker/7:29
20305 20305 TS       - S         0     0 kworker/7:30
20306 20306 TS       - Ssl  1450224 18360 nomad
20316 20316 TS       - S    519732 68600 php
20318 20318 TS       - Ssl  1523956 18092 nomad
20328 20328 TS       - S    519732 68604 php
20330 20330 TS       - Ssl  1458932 17980 nomad
20337 20337 TS       - Ssl  1524212 18156 nomad
20346 20346 TS       - S    519744 68604 php
20351 20351 TS       - S    519744 68600 php
20424 20424 TS       - S         0     0 kworker/0:3
20425 20425 TS       - S         0     0 kworker/0:4
20426 20426 TS       - S         0     0 kworker/0:5
20427 20427 TS       - S         0     0 kworker/0:6
20428 20428 TS       - S         0     0 kworker/0:7
20429 20429 TS       - S         0     0 kworker/0:8
20430 20430 TS       - S         0     0 kworker/0:9
20432 20432 TS       - S         0     0 kworker/0:10
20433 20433 TS       - S         0     0 kworker/0:11
20434 20434 TS       - S         0     0 kworker/0:12
20435 20435 TS       - S         0     0 kworker/0:13
20437 20437 TS       - S         0     0 kworker/0:14
20438 20438 TS       - S         0     0 kworker/0:15
20440 20440 TS       - S         0     0 kworker/0:16
20441 20441 TS       - S         0     0 kworker/0:17
20442 20442 TS       - S         0     0 kworker/0:18
20443 20443 TS       - S         0     0 kworker/0:19
20446 20446 TS       - S         0     0 kworker/0:20
20461 20461 TS       - S         0     0 kworker/3:0
20483 20483 TS       - Ssl  1597688 18124 nomad
20493 20493 TS       - S    543012 90800 php
20660 20660 TS       - S         0     0 kworker/10:0
20661 20661 TS       - Ssl  1581040 18376 nomad
20671 20671 TS       - S    552692 100416 php
20673 20673 TS       - S         0     0 kworker/8:0
20805 20805 TS       - S<        0     0 kworker/10:1H
23052 23052 TS       - Ssl  1449968 17792 nomad
23064 23064 TS       - S    535252 83784 php
23200 23200 TS       - Ssl  1515760 17536 nomad
23212 23212 TS       - S    521784 69528 php
23633 23633 TS       - Ssl  1450224 17576 nomad
23644 23644 TS       - S    536956 85500 php
23713 23713 TS       - Ssl  1663992 18200 nomad
23723 23723 TS       - S    553080 101248 php
25207 25207 TS       - Ssl  1376492 17964 nomad
25218 25218 TS       - S    526004 72996 php
26538 26538 TS       - Ssl  1467128 18552 nomad
26550 26550 TS       - Ssl  1597688 18736 nomad
26562 26562 TS       - S    550208 98308 php
26568 26568 TS       - S    549868 98472 php
26583 26583 TS       - Ssl  1523700 17432 nomad
26593 26593 TS       - S    519732 68732 php
26640 26640 TS       - Ssl  1433576 17728 nomad
26651 26651 TS       - S    519732 68632 php
26680 26680 TS       - Ssl  1598200 18540 nomad
26690 26690 TS       - S    549624 98304 php
26692 26692 TS       - Ssl  1597944 18688 nomad
26703 26703 TS       - S    549884 97904 php
26784 26784 TS       - Ssl  1663736 18964 nomad
26795 26795 TS       - S    549836 98120 php
26837 26837 TS       - Ssl  1663736 18812 nomad
26848 26848 TS       - S    549648 97464 php
27156 27156 TS       - Ssl  1384432 18416 nomad
27173 27173 TS       - S    523428 72012 php
27694 27694 TS       - Sl   1456552 10712 nomad
27793 27793 TS       - Sl   1660908 10212 nomad
27898 27898 TS       - Ssl  1376236 17868 nomad
27909 27909 TS       - S    521780 69524 php
27946 27946 TS       - Sl   1522088 8816 nomad
28009 28009 TS       - Ssl  1441260 18060 nomad
28019 28019 TS       - S    521780 69524 php
28106 28106 TS       - Sl   1464492 8808 nomad
28130 28130 TS       - Sl   1661100 8676 nomad
28131 28131 TS       - Sl   1595884 10188 nomad
28400 28400 TS       - Sl   1660588 9776 nomad
29624 29624 TS       - S<        0     0 kworker/5:1H
29724 29724 TS       - Ssl  1597944 18808 nomad
29735 29735 TS       - S    553444 101292 php
30323 30323 TS       - Ss   158928  5696 sshd
30326 30326 TS       - D    159240  2600 sshd
30327 30327 TS       - Ss   116476  3024 bash
30532 30532 TS       - S    241328  4712 sudo
30533 30533 TS       - S    191876  2364 su
30534 30534 TS       - S    116492  3052 bash
30544 30544 TS       - Ssl  1597944 18120 nomad
30561 30561 TS       - S    536364 83904 php
31040 31040 TS       - Ssl  1062996 15036 nomad
31049 31049 TS       - Ssl  1128788 15360 nomad
31067 31067 TS       - S    521648 68732 php
31069 31069 TS       - S    521648 68732 php
31080 31080 TS       - Sl   1399212 9732 nomad
31090 31090 TS       - S    113280  1416 bash
31108 31108 TS       - Sl   785904 27808 consul
31119 31119 TS       - R+   153348  1504 ps
31692 31692 TS       - Sl   1399468 9276 nomad
31835 31835 TS       - Sl   1464748 10112 nomad
32327 32327 TS       - Ssl  1663480 19052 nomad
32338 32338 TS       - S    536452 85600 php
32378 32378 TS       - S         0     0 kworker/11:1
32978 32978 TS       - S         0     0 kworker/2:2
33060 33060 TS       - Sl   1398956 10128 nomad
33344 33344 TS       - Sl   1333164 10748 nomad
33367 33367 TS       - Sl   1399468 11320 nomad
34118 34118 TS       - Ssl  1532664 17568 nomad
34129 34129 TS       - S    521648 68736 php
34130 34130 TS       - Ssl  1532664 18948 nomad
34141 34141 TS       - S    521652 68740 php
35483 35483 TS       - Sl   1932732 241616 nomad
35501 35501 TS       - Sl   2002980 408460 nomad
35502 35502 TS       - Sl   1806628 407244 nomad
35504 35504 TS       - Sl   1814824 406064 nomad
35531 35531 TS       - Sl   1945896 411540 nomad
35553 35553 TS       - Sl   2002980 434228 nomad
35555 35555 TS       - Sl   1937700 418228 nomad
35565 35565 TS       - Sl   1806628 424916 nomad
35594 35594 TS       - Sl   1872164 428600 nomad
35599 35599 TS       - Sl   2011176 421008 nomad
35643 35643 TS       - Sl   1598512 65360 nomad
35672 35672 TS       - Sl   1532656 65568 nomad
35691 35691 TS       - Sl   1532464 64748 nomad
35693 35693 TS       - Sl   1729264 67460 nomad
35710 35710 TS       - Sl   1532464 66608 nomad
35778 35778 TS       - Sl   1729520 64968 nomad
35784 35784 TS       - Sl   1598000 66292 nomad
35791 35791 TS       - Sl   1663472 68904 nomad
35798 35798 TS       - Sl   1467120 66256 nomad
35831 35831 TS       - Sl   2002980 414260 nomad
35900 35900 TS       - Sl   1401392 71608 nomad
35912 35912 TS       - Sl   2002980 423752 nomad
35914 35914 TS       - Sl   1467120 66736 nomad
35931 35931 TS       - Sl   1597936 69128 nomad
35961 35961 TS       - Sl   1533232 78188 nomad
36011 36011 TS       - Sl   1598000 74420 nomad
36050 36050 TS       - Sl   1598192 64412 nomad
36727 36727 TS       - S         0     0 kworker/7:0
39137 39137 TS       - S         0     0 kworker/3:1
40264 40264 TS       - Ssl  1597944 18280 nomad
40274 40274 TS       - S    521648 68872 php
42031 42031 TS       - Ssl  1597944 18072 nomad
42042 42042 TS       - S    629892 101148 php
43813 43813 TS       - S         0     0 kworker/6:2
50847 50847 TS       - Ssl  1598200 18432 nomad
50858 50858 TS       - R    564072 112376 php
52182 52182 TS       - S         0     0 kworker/u192:2
52844 52844 TS       - Ssl  1794552 18264 nomad
52854 52854 TS       - S    521652 68760 php
53710 53710 TS       - Ssl  1598200 18836 nomad
53721 53721 TS       - S    611952 75528 php
54540 54540 TS       - Ssl  1729016 18196 nomad
54551 54551 TS       - S    521648 68744 php
56017 56017 TS       - S         0     0 kworker/1:0
56643 56643 TS       - Ssl  1663736 18328 nomad
56654 56654 TS       - S    550656 99024 php
56756 56756 TS       - Ssl  1597432 18024 nomad
56767 56767 TS       - S    521648 68740 php
58710 58710 TS       - Ssl  1598200 19424 nomad
58721 58721 TS       - S    626860 98992 php
58860 58860 TS       - Ssl  1532152 18212 nomad
58872 58872 TS       - S    521648 68840 php
59376 59376 TS       - Ssl  1532152 18200 nomad
59390 59390 TS       - S    521648 68840 php
59444 59444 TS       - Ssl  1466616 18264 nomad
59455 59455 TS       - S    521652 68840 php
60018 60018 TS       - S         0     0 kworker/4:2
60788 60788 TS       - Ssl  1597944 18004 nomad
60799 60799 TS       - S    521648 68736 php
61008 61008 TS       - Ssl  1729272 18008 nomad
61018 61018 TS       - S    521648 68736 php
61020 61020 TS       - Ssl  1663480 17980 nomad
61031 61031 TS       - S    521648 68736 php
63084 63084 TS       - S         0     0 kworker/2:0
63701 63701 TS       - S         0     0 kworker/5:1
64856 64856 TS       - Ssl  1466872 18104 nomad
64866 64866 TS       - S    609924 80952 php
64915 64915 TS       - Ssl  1532408 18484 nomad
64927 64927 TS       - S    609956 80868 php
64928 64928 TS       - Ssl  1605884 18184 nomad
64938 64938 TS       - S    609844 80724 php
65295 65295 TS       - Ssl  1729016 18196 nomad
65306 65306 TS       - S    522124 71028 php
66651 66651 TS       - Ssl  1531896 17784 nomad
66661 66661 TS       - S    523828 71376 php
67059 67059 TS       - Ssl  1401400 18216 nomad
67069 67069 TS       - S    523828 71376 php
67213 67213 TS       - Ssl  1597944 18120 nomad
67224 67224 TS       - S    582592 130464 php
73161 73161 TS       - Sl   912420 18708 haproxy
79202 79202 TS       - Ssl  1597944 18156 nomad
79213 79213 TS       - S    519732 68732 php
80194 80194 TS       - Ssl  1532920 18136 nomad
80204 80204 TS       - S    525876 73156 php
82688 82688 TS       - Ssl  1458676 18476 nomad
82699 82699 TS       - S    522380 71488 php
82845 82845 TS       - Ssl  1532152 17964 nomad
82855 82855 TS       - S    539332 87284 php
82856 82856 TS       - Ssl  1728760 18204 nomad
82866 82866 TS       - S    539144 86672 php
82928 82928 TS       - Ssl  1532408 17992 nomad
82939 82939 TS       - S    539056 86876 php
83027 83027 TS       - Ssl  1524468 17520 nomad
83037 83037 TS       - S    539260 87016 php
83122 83122 TS       - Ssl  1663224 18024 nomad
83132 83132 TS       - S    539368 87136 php
83133 83133 TS       - Ssl  1663736 18792 nomad
83144 83144 TS       - S    539296 86964 php
83146 83146 TS       - Ssl  1729528 18408 nomad
83156 83156 TS       - S    537108 85280 php
83742 83742 TS       - Ssl  1532664 17628 nomad
83753 83753 TS       - S    539288 87332 php
85638 85638 TS       - S         0     0 kworker/8:2
86302 86302 TS       - Ssl  1466616 18236 nomad
86303 86303 TS       - Ssl  1523956 18620 nomad
86322 86322 TS       - S    521804 68748 php
86323 86323 TS       - S    521804 69012 php
86325 86325 TS       - Ssl  1794808 17688 nomad
86332 86332 TS       - Ssl  1532152 17936 nomad
86342 86342 TS       - S    521804 68744 php
86347 86347 TS       - S    521804 68748 php
86349 86349 TS       - Ssl  1597944 18140 nomad
86360 86360 TS       - S    521804 69012 php
86361 86361 TS       - Ssl  1597944 18628 nomad
86372 86372 TS       - S    521808 68748 php
86373 86373 TS       - Ssl  1598200 18700 nomad
86386 86386 TS       - S    521808 68744 php
86387 86387 TS       - Ssl  1532664 17776 nomad
86399 86399 TS       - S    521804 69012 php
86401 86401 TS       - Ssl  1597688 18352 nomad
86411 86411 TS       - S    521804 68744 php
87124 87124 TS       - Ssl  1597432 17920 nomad
87135 87135 TS       - S    523112 70484 php
87779 87779 TS       - S         0     0 kworker/9:2
88651 88651 TS       - Ssl  1597432 17824 nomad
88662 88662 TS       - S    523320 71008 php
89440 89440 TS       - S     89812  4060 pickup
90989 90989 TS       - S         0     0 kworker/11:2
91605 91605 TS       - Ssl  1532472 18860 nomad
91616 91616 TS       - S    524788 72560 php
91825 91825 TS       - S         0     0 kworker/1:2
92695 92695 TS       - Ssl  1523700 18040 nomad
92706 92706 TS       - S    523228 72428 php
94473 94473 TS       - Ssl  1663480 18056 nomad
94482 94482 TS       - S    519732 68672 php
94586 94586 TS       - Ssl  905640  9716 filebeat
94906 94906 TS       - S         0     0 kworker/7:1
95248 95248 TS       - S         0     0 kworker/0:1
95554 95554 TS       - Ssl  1598200 18116 nomad
95562 95562 TS       - Ssl  789232 48752 consul
95574 95574 TS       - S    536040 83960 php
95799 95799 TS       - Ssl  1655284 18664 nomad
95810 95810 TS       - S    559348 109760 php
97514 97514 TS       - S         0     0 kworker/3:2

@tgross
Member

tgross commented Jan 20, 2021

Ok, so the first thing I worried about was leaking Nomad processes. You've got 93 php processes (not counting the php-fpm) and 187 nomad processes. That's 2 nomad processes for each php process, plus 1 more for the Nomad agent. That's what I'd expect to see, as each task has a task executor and a logmon log collector. We'd generally expect that to add up to a couple of GB of RSS just because of the sheer number of processes.

If we do a bit of math we can see that ~7GB of RSS are dedicated to php but a whopping 14.6GB is dedicated to nomad! I plotted a quickie histogram and (if we discard the outlier which is the Nomad agent), we can see there's a group of ~20 processes that are eating up almost half of that total and are individually 100-300MB of RSS.

[Screenshot: histogram of RSS across the nomad processes]
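For anyone who wants to repeat that tally on their own host, here is a minimal Python sketch of the kind of arithmetic described above: it sums RSS per command name and buckets one command's processes into a rough histogram. It assumes the input is the text output of ps -eo pid,tid,class,rtprio,stat,vsz,rss,comm, with RSS reported in KiB. The function names and the 100 MiB bin width are illustrative choices, not something taken from this thread.

```python
from collections import Counter, defaultdict


def parse_ps_rows(ps_output):
    """Yield (command, rss_kib) from `ps -eo pid,tid,class,rtprio,stat,vsz,rss,comm` output."""
    for line in ps_output.splitlines():
        fields = line.split(None, 7)
        # Skip the header, blank lines, and anything that is not a full row.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        yield fields[7].strip(), int(fields[6])


def rss_by_command(ps_output):
    """Total RSS in KiB for each command name."""
    totals = defaultdict(int)
    for command, rss_kib in parse_ps_rows(ps_output):
        totals[command] += rss_kib
    return dict(totals)


def rss_histogram(ps_output, command, bin_kib=100 * 1024):
    """Count processes of one command per RSS bin; keys are each bin's lower bound in KiB."""
    bins = Counter()
    for name, rss_kib in parse_ps_rows(ps_output):
        if name == command:
            bins[rss_kib // bin_kib * bin_kib] += 1
    return dict(sorted(bins.items()))
```

Saving the ps output to a file and passing its contents to rss_by_command gives the per-command totals, and rss_histogram(text, "nomad") shows how the nomad processes are distributed. Note that summing RSS counts shared pages once per process, so the totals are an upper bound rather than an exact figure.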

A couple things that would be useful to help debug this:

  • Are those large processes executor or logmon processes?
  • What exact Nomad version are you using (nomad --version)?

It might also be useful to see if there's a relationship between the etimes and the RSS of the Nomad process. Something like ps -eo pid,comm,args,rss,etimes would give us a table of that relationship, including the args that let us differentiate between executor and logmon, or if you have graphs from whatever you're sending those collectd metrics to. That would let us see if one of the specific processes has a memory leak.
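As a rough illustration of that comparison, the sketch below groups nomad processes by role and lists their elapsed time against RSS. It assumes the input is the text output of ps -eo pid,comm,args,rss,etimes, and it assumes the role can be read from the first argument after the binary path (agent, executor, or logmon). That classification rule and all function names are assumptions made for the example, so check them against the actual args on the host.

```python
from collections import defaultdict


def parse_row(line):
    """Parse one row of `ps -eo pid,comm,args,rss,etimes`; return None for non-data lines."""
    left = line.split(None, 2)  # pid, comm, then everything else
    if len(left) < 3 or not left[0].isdigit():
        return None
    # args may contain spaces, so peel rss and etimes off the right-hand end.
    right = left[2].rsplit(None, 2)
    if len(right) < 3 or not (right[1].isdigit() and right[2].isdigit()):
        return None
    return int(left[0]), left[1], right[0], int(right[1]), int(right[2])


def nomad_role(args):
    """Guess the role from the first argument after the binary path (an assumption)."""
    tokens = args.split()
    sub = tokens[1] if len(tokens) > 1 else ""
    return sub if sub in ("agent", "executor", "logmon") else "other"


def rss_vs_age(ps_output):
    """Map role -> [(elapsed_seconds, rss_kib), ...] sorted by elapsed time."""
    by_role = defaultdict(list)
    for line in ps_output.splitlines():
        row = parse_row(line)
        if row is None or row[1] != "nomad":
            continue
        _pid, _comm, args, rss_kib, etimes = row
        by_role[nomad_role(args)].append((etimes, rss_kib))
    return {role: sorted(points) for role, points in by_role.items()}
```

If RSS climbs steadily with elapsed time within one role, that role is the likelier candidate for a leak. Two caveats: ps may truncate the args column, and a command name containing spaces would confuse the left-hand split.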

I've done some very quick tests here and I don't see any obvious leaks on the executor or logmon for a process that's doing frequent logging. But you may be encountering this issue more seriously because of the large number of tasks on the host. It looks like you're running a task-per-php, so if it's possible to spawn them behind a single php-fpm you might have efficiency gains that will help reduce the problem while we work on figuring this out.

(Also, I hope you don't mind but I edited your last response to wrap that ps dump in a <details> block for readability.)

@anastazya
Author

anastazya commented Jan 21, 2021

Nomad v1.0.0 (cfca640)

I will compile the rest of the details today, as this is a somewhat production environment.

 PID COMMAND         COMMAND                       RSS ELAPSED
    1 systemd         /usr/lib/systemd/systemd --  4184  700688
    2 kthreadd        [kthreadd]                      0  700688
    4 kworker/0:0H    [kworker/0:0H]                  0  700688
    6 ksoftirqd/0     [ksoftirqd/0]                   0  700688
    7 migration/0     [migration/0]                   0  700688
    8 rcu_bh          [rcu_bh]                        0  700688
    9 rcu_sched       [rcu_sched]                     0  700688
   10 lru-add-drain   [lru-add-drain]                 0  700688
   11 watchdog/0      [watchdog/0]                    0  700688
   12 watchdog/1      [watchdog/1]                    0  700688
   13 migration/1     [migration/1]                   0  700688
   14 ksoftirqd/1     [ksoftirqd/1]                   0  700688
   16 kworker/1:0H    [kworker/1:0H]                  0  700688
   17 watchdog/2      [watchdog/2]                    0  700688
   18 migration/2     [migration/2]                   0  700688
   19 ksoftirqd/2     [ksoftirqd/2]                   0  700688
   21 kworker/2:0H    [kworker/2:0H]                  0  700688
   22 watchdog/3      [watchdog/3]                    0  700688
   23 migration/3     [migration/3]                   0  700688
   24 ksoftirqd/3     [ksoftirqd/3]                   0  700688
   26 kworker/3:0H    [kworker/3:0H]                  0  700688
   27 watchdog/4      [watchdog/4]                    0  700688
   28 migration/4     [migration/4]                   0  700688
   29 ksoftirqd/4     [ksoftirqd/4]                   0  700688
   31 kworker/4:0H    [kworker/4:0H]                  0  700688
   32 watchdog/5      [watchdog/5]                    0  700688
   33 migration/5     [migration/5]                   0  700688
   34 ksoftirqd/5     [ksoftirqd/5]                   0  700688
   36 kworker/5:0H    [kworker/5:0H]                  0  700688
   37 watchdog/6      [watchdog/6]                    0  700688
   38 migration/6     [migration/6]                   0  700688
   39 ksoftirqd/6     [ksoftirqd/6]                   0  700688
   41 kworker/6:0H    [kworker/6:0H]                  0  700688
   42 watchdog/7      [watchdog/7]                    0  700688
   43 migration/7     [migration/7]                   0  700688
   44 ksoftirqd/7     [ksoftirqd/7]                   0  700688
   46 kworker/7:0H    [kworker/7:0H]                  0  700688
   47 watchdog/8      [watchdog/8]                    0  700688
   48 migration/8     [migration/8]                   0  700688
   49 ksoftirqd/8     [ksoftirqd/8]                   0  700688
   51 kworker/8:0H    [kworker/8:0H]                  0  700688
   52 watchdog/9      [watchdog/9]                    0  700688
   53 migration/9     [migration/9]                   0  700688
   54 ksoftirqd/9     [ksoftirqd/9]                   0  700688
   56 kworker/9:0H    [kworker/9:0H]                  0  700688
   57 watchdog/10     [watchdog/10]                   0  700688
   58 migration/10    [migration/10]                  0  700688
   59 ksoftirqd/10    [ksoftirqd/10]                  0  700688
   61 kworker/10:0H   [kworker/10:0H]                 0  700688
   62 watchdog/11     [watchdog/11]                   0  700688
   63 migration/11    [migration/11]                  0  700688
   64 ksoftirqd/11    [ksoftirqd/11]                  0  700688
   66 kworker/11:0H   [kworker/11:0H]                 0  700688
   68 kdevtmpfs       [kdevtmpfs]                     0  700688
   69 netns           [netns]                         0  700688
   70 khungtaskd      [khungtaskd]                    0  700688
   71 writeback       [writeback]                     0  700688
   72 kintegrityd     [kintegrityd]                   0  700688
   73 bioset          [bioset]                        0  700688
   74 bioset          [bioset]                        0  700688
   75 bioset          [bioset]                        0  700688
   76 kblockd         [kblockd]                       0  700688
   77 md              [md]                            0  700688
   78 edac-poller     [edac-poller]                   0  700688
   79 watchdogd       [watchdogd]                     0  700688
   85 kswapd0         [kswapd0]                       0  700688
   86 ksmd            [ksmd]                          0  700688
   87 khugepaged      [khugepaged]                    0  700688
   88 crypto          [crypto]                        0  700688
   96 kthrotld        [kthrotld]                      0  700688
   98 kmpath_rdacd    [kmpath_rdacd]                  0  700688
   99 kaluad          [kaluad]                        0  700688
  101 kpsmoused       [kpsmoused]                     0  700688
  102 ipv6_addrconf   [ipv6_addrconf]                 0  700688
  115 deferwq         [deferwq]                       0  700688
  153 kauditd         [kauditd]                       0  700688
  358 ata_sff         [ata_sff]                       0  700688
  362 scsi_eh_0       [scsi_eh_0]                     0  700688
  363 scsi_tmf_0      [scsi_tmf_0]                    0  700688
  364 scsi_eh_1       [scsi_eh_1]                     0  700688
  365 scsi_tmf_1      [scsi_tmf_1]                    0  700688
  370 ttm_swap        [ttm_swap]                      0  700687
  377 virtscsi-scan   [virtscsi-scan]                 0  700687
  378 scsi_eh_2       [scsi_eh_2]                     0  700687
  379 scsi_tmf_2      [scsi_tmf_2]                    0  700687
  414 kworker/0:1H    [kworker/0:1H]                  0  700687
  459 kdmflush        [kdmflush]                      0  700687
  460 bioset          [bioset]                        0  700687
  471 kdmflush        [kdmflush]                      0  700687
  472 bioset          [bioset]                        0  700687
  485 bioset          [bioset]                        0  700687
  486 xfsalloc        [xfsalloc]                      0  700687
  487 xfs_mru_cache   [xfs_mru_cache]                 0  700687
  488 xfs-buf/dm-0    [xfs-buf/dm-0]                  0  700687
  489 xfs-data/dm-0   [xfs-data/dm-0]                 0  700687
  490 xfs-conv/dm-0   [xfs-conv/dm-0]                 0  700687
  491 xfs-cil/dm-0    [xfs-cil/dm-0]                  0  700687
  492 xfs-reclaim/dm- [xfs-reclaim/dm-]               0  700687
  493 xfs-log/dm-0    [xfs-log/dm-0]                  0  700687
  494 xfs-eofblocks/d [xfs-eofblocks/d]               0  700687
  495 xfsaild/dm-0    [xfsaild/dm-0]                  0  700687
  577 systemd-journal /usr/lib/systemd/systemd-jo 38160  700687
  597 lvmetad         /usr/sbin/lvmetad -f         2544  700686
  613 systemd-udevd   /usr/lib/systemd/systemd-ud  1352  700686
  660 hwrng           [hwrng]                         0  700686
  695 kworker/5:1H    [kworker/5:1H]                  0  700686
  747 jbd2/sda1-8     [jbd2/sda1-8]                   0  700685
  748 ext4-rsv-conver [ext4-rsv-conver]               0  700685
  781 auditd          /sbin/auditd                  716  700685
  785 rpciod          [rpciod]                        0  700685
  786 xprtiod         [xprtiod]                       0  700685
  809 rpcbind         /sbin/rpcbind -w              636  700685
  810 statsd-aggregat /usr/bin/statsd-aggregator    436  700685
  813 dbus-daemon     /usr/bin/dbus-daemon --syst  1404  700685
  841 qemu-ga         /usr/bin/qemu-ga --method=v  1496  700685
  863 irqbalance      /usr/sbin/irqbalance --fore   696  700685
  878 ntpd            /usr/sbin/ntpd -u ntp:ntp -  1436  700685
  924 crond           /usr/sbin/crond -n           1120  700685
  995 kworker/8:1H    [kworker/8:1H]                  0  700685
  997 kworker/10:1H   [kworker/10:1H]                 0  700682
  998 kworker/2:1H    [kworker/2:1H]                  0  700682
 1060 node_exporter   /opt/node_exporter/node_exp 25040  700681
 1063 lldpd           /usr/sbin/lldpd              1668  700681
 1064 puppet          /usr/bin/ruby /usr/bin/pupp 37104  700681
 1073 lldpd           /usr/sbin/lldpd              1276  700681
 1079 sshd            /usr/sbin/sshd -D            1572  700681
 1081 tuned           /usr/bin/python2 -Es /usr/s 11940  700681
 1082 php-fpm         php-fpm: master process (/e 10224  700681
 1086 java            /bin/java -Djava.io.tmpdir= 307984 700681
 1102 consul          /usr/bin/consul agent -conf 50516  700681
 1169 rsyslogd        /usr/sbin/rsyslogd -n       24200  700681
 1197 consul-template /usr/bin/consul-template -c 235744 700681
 1207 containerd      /usr/bin/containerd         20388  700681
 1241 osqueryd        /usr/bin/osqueryd --flagfil  1228  700681
 1295 haproxy         /usr/sbin/haproxy -sf 3464  12328  700680
 1312 unbound         /usr/sbin/unbound -d        11540  700680
 1344 osqueryd        /usr/bin/osqueryd           24808  700680
 1418 agetty          /sbin/agetty --noclear tty1   408  700680
 1445 php-fpm         php-fpm: pool www             872  700680
 1447 php-fpm         php-fpm: pool www             896  700680
 1450 php-fpm         php-fpm: pool www             860  700680
 1451 php-fpm         php-fpm: pool www             892  700680
 1452 php-fpm         php-fpm: pool www            2948  700680
 1572 dockerd         /usr/bin/dockerd -H fd:// - 50260  700680
 1640 master          /usr/libexec/postfix/master  1428  700680
 1643 qmgr            qmgr -l -t unix -u           1468  700680
 1801 sensu-client    /opt/sensu/embedded/bin/rub 19284  700680
 2189 kworker/7:1H    [kworker/7:1H]                  0  700676
 2488 nomad           /usr/bin/nomad executor {"L 19744    1099
 2498 php             /usr/bin/php /var/www/worke 71344    1099
 2946 systemd-logind  /usr/lib/systemd/systemd-lo  1728  700666
 3006 nomad           /usr/bin/nomad executor {"L 18824    1089
 3017 php             /usr/bin/php /var/www/worke 87188    1089
 3110 nomad           /usr/bin/nomad executor {"L 10648  700666
 3116 nomad           /usr/bin/nomad executor {"L 16576  700666
 3157 nomad           /usr/bin/nomad executor {"L  5052  700666
 3160 collectd        /usr/sbin/collectd           1548  700666
 3173 nomad           /usr/bin/nomad executor {"L 14712  700666
 3192 nomad           /usr/bin/nomad logmon       13476  700666
 3237 nomad           /usr/bin/nomad executor {"L 12740  700666
 3238 nomad           /usr/bin/nomad logmon       17256  700666
 3241 nomad           /usr/bin/nomad executor {"L 14868  700666
 3258 nomad           /usr/bin/nomad executor {"L  5144  700666
 3288 nomad           /usr/bin/nomad executor {"L  4968  700666
 3304 nomad           /usr/bin/nomad executor {"L  5180  700666
 3308 nomad           /usr/bin/nomad executor {"L  5136  700666
 3842 nomad           /usr/bin/nomad logmon       15536  700658
 3844 nomad           /usr/bin/nomad logmon       17096  700658
 3864 nomad           /usr/bin/nomad logmon        5168  700658
 3870 nomad           /usr/bin/nomad logmon        5152  700658
 3892 nomad           /usr/bin/nomad logmon        5096  700658
 3893 nomad           /usr/bin/nomad logmon        5140  700658
 3894 nomad           /usr/bin/nomad logmon        5132  700658
 3905 nomad           /usr/bin/nomad logmon        4876  700658
 4073 nomad           /usr/bin/nomad agent -confi 1541964 700656
 4179 kworker/6:1H    [kworker/6:1H]                  0  700653
 4718 nomad           /usr/bin/nomad executor {"L 13560  700650
 4934 nomad           /usr/bin/nomad executor {"L 13616  700650
 5862 kworker/4:1H    [kworker/4:1H]                  0  700647
 6023 nomad           /usr/bin/nomad logmon       318864 700644
 6024 nomad           /usr/bin/nomad logmon       57544  700644
 6052 nomad           /usr/bin/nomad logmon       314160 700644
 6064 nomad           /usr/bin/nomad logmon       59340  700644
 6074 nomad           /usr/bin/nomad logmon       316272 700644
 6084 nomad           /usr/bin/nomad logmon       319728 700644
 6109 nomad           /usr/bin/nomad logmon       312672 700644
 6129 nomad           /usr/bin/nomad logmon       56112  700644
 6151 nomad           /usr/bin/nomad logmon       161628 700644
 6169 nomad           /usr/bin/nomad logmon       61172  700644
 6194 nomad           /usr/bin/nomad logmon       58776  700644
 6212 nomad           /usr/bin/nomad logmon       314736 700644
 6226 nomad           /usr/bin/nomad logmon       349440 700644
 6242 nomad           /usr/bin/nomad logmon       315620 700644
 6279 nomad           /usr/bin/nomad logmon       71804  700644
 6300 nomad           /usr/bin/nomad logmon       52312  700644
 6317 nomad           /usr/bin/nomad logmon       72740  700644
 6348 nomad           /usr/bin/nomad logmon       62420  700644
 6363 nomad           /usr/bin/nomad logmon       316572 700644
 6374 nomad           /usr/bin/nomad logmon       310544 700644
 6410 nomad           /usr/bin/nomad logmon       61344  700644
 6436 nomad           /usr/bin/nomad logmon       67696  700644
 6454 nomad           /usr/bin/nomad logmon       48784  700644
 6470 nomad           /usr/bin/nomad logmon       56688  700644
 6503 nomad           /usr/bin/nomad logmon       60448  700644
 6521 nomad           /usr/bin/nomad logmon       320804 700644
 6531 nomad           /usr/bin/nomad logmon       341828 700644
 6557 nomad           /usr/bin/nomad logmon       322156 700644
 6560 nomad           /usr/bin/nomad logmon       154872 700644
 6625 nomad           /usr/bin/nomad logmon       61668  700644
 6634 nomad           /usr/bin/nomad logmon       56748  700644
 6660 nomad           /usr/bin/nomad logmon       64356  700644
 6688 nomad           /usr/bin/nomad logmon       62024  700644
 6691 nomad           /usr/bin/nomad logmon       318896 700643
 6713 nomad           /usr/bin/nomad logmon       318220 700643
 6731 nomad           /usr/bin/nomad logmon       322184 700643
 6745 nomad           /usr/bin/nomad logmon       316588 700643
 6788 nomad           /usr/bin/nomad logmon       54464  700643
 6799 nomad           /usr/bin/nomad logmon       323116 700643
 6813 nomad           /usr/bin/nomad logmon       321448 700643
 6871 nomad           /usr/bin/nomad logmon       321516 700643
 6879 nomad           /usr/bin/nomad logmon       321296 700643
 6895 nomad           /usr/bin/nomad logmon       1399932 700643
 6925 nomad           /usr/bin/nomad logmon       56132  700643
 6931 nomad           /usr/bin/nomad logmon       75528  700643
 6955 nomad           /usr/bin/nomad logmon       58340  700643
 7045 nomad           /usr/bin/nomad logmon       250960 700643
 7051 nomad           /usr/bin/nomad logmon       47116  700643
 7073 nomad           /usr/bin/nomad logmon       1381968 700643
 7074 nomad           /usr/bin/nomad logmon       60984  700643
 7095 nomad           /usr/bin/nomad logmon       57224  700643
 7437 kworker/1:1H    [kworker/1:1H]                  0  700641
 7546 kworker/4:0     [kworker/4:0]                   0   10171
 9119 nomad           /usr/bin/nomad executor {"L 19492    1002
 9131 php             /usr/bin/php /var/www/worke 86944    1002
 9412 nomad           /usr/bin/nomad executor {"L 18588     995
 9422 php             /usr/bin/php /var/www/worke 88152     995
11994 kworker/3:1H    [kworker/3:1H]                  0  700565
12987 kworker/7:2     [kworker/7:2]                   0    2741
13098 nomad           /usr/bin/nomad executor {"L 18188     931
13109 php             /usr/bin/php /var/www/worke 69948     931
13111 nomad           /usr/bin/nomad executor {"L 18956     929
13122 php             /usr/bin/php /var/www/worke 69948     929
14200 nomad           /usr/bin/nomad executor {"L 18988     908
14211 php             /usr/bin/php /var/www/worke 75164     908
14256 nomad           /usr/bin/nomad executor {"L 18760     907
14267 php             /usr/bin/php /var/www/worke 73424     907
14380 nomad           /usr/bin/nomad executor {"L 18832     906
14391 php             /usr/bin/php /var/www/worke 73352     906
14392 nomad           /usr/bin/nomad executor {"L 18612     906
14403 php             /usr/bin/php /var/www/worke 74908     906
14494 nomad           /usr/bin/nomad executor {"L 18672     903
14505 php             /usr/bin/php /var/www/worke 74520     903
14516 nomad           /usr/bin/nomad executor {"L 18124     902
14527 php             /usr/bin/php /var/www/worke 74412     902
14543 nomad           /usr/bin/nomad executor {"L 17880     902
14553 php             /usr/bin/php /var/www/worke 74856     902
14580 nomad           /usr/bin/nomad executor {"L 19328     902
14590 php             /usr/bin/php /var/www/worke 146460    902
14760 kworker/3:1     [kworker/3:1]                   0   59815
15117 nomad           /usr/bin/nomad executor {"L 18272     894
15128 php             /usr/bin/php /var/www/worke 73684     894
15130 nomad           /usr/bin/nomad executor {"L 19116     893
15141 php             /usr/bin/php /var/www/worke 74636     893
15696 nomad           /usr/bin/nomad executor {"L 19588     882
15706 php             /usr/bin/php -dmemory_limit 153804    882
15708 nomad           /usr/bin/nomad executor {"L 18612     882
15718 php             /usr/bin/php -dmemory_limit 141988    882
15851 kworker/11:1H   [kworker/11:1H]                 0  700494
19663 kworker/8:0     [kworker/8:0]                   0    2614
20768 nomad           /usr/bin/nomad executor {"L 18108     789
20779 php             /usr/bin/php /var/www/worke 68896     788
23702 nomad           /usr/bin/nomad executor {"L 18592     729
23713 php             /usr/bin/php /var/www/worke 71808     729
24006 kworker/1:2     [kworker/1:2]                   0     723
24562 nomad           /usr/bin/nomad executor {"L 18972     716
24572 php             /usr/bin/php /var/www/worke 70192     716
25410 nomad           /usr/bin/nomad executor {"L 17784     698
25420 php             /usr/bin/php /var/www/worke 94872     698
27468 kworker/11:1    [kworker/11:1]                  0    7953
27872 kworker/9:1H    [kworker/9:1H]                  0  700273
29344 nomad           /usr/bin/nomad executor {"L 18336     629
29356 php             /usr/bin/php /var/www/worke 82528     629
31148 nomad           /usr/bin/nomad executor {"L 18676     597
31157 php             /usr/bin/php -dmemory_limit 75648     597
32569 filebeat        /usr/share/filebeat/bin/fil 22760  700193
32674 python          /usr/bin/python /usr/share/ 12876  700192
33015 kworker/u192:1  [kworker/u192:1]                0     564
33272 mcollectived    /usr/bin/ruby /usr/sbin/mco 44880  700184
34642 nomad           /usr/bin/nomad executor {"L 18224     538
34652 php             /usr/bin/php /var/www/worke 71508     538
35622 nomad           /usr/bin/nomad executor {"L 18216     522
35633 php             /usr/bin/php -dmemory_limit 107460    522
35986 kworker/5:2     [kworker/5:2]                   0     514
35999 kworker/2:2     [kworker/2:2]                   0     513
37008 nomad           /usr/bin/nomad executor {"L 18772     494
37018 php             /usr/bin/php /var/www/worke 114116    494
39347 kworker/7:0     [kworker/7:0]                   0     453
42596 kworker/11:0    [kworker/11:0]                  0     394
42707 kworker/1:1     [kworker/1:1]                   0     392
44527 kworker/9:1     [kworker/9:1]                   0    2133
46559 nomad           /usr/bin/nomad executor {"L 19412    2095
46581 php             /usr/bin/php /var/www/worke 111400   2095
46920 nomad           /usr/bin/nomad executor {"L 18276     313
46931 php             /usr/bin/php /var/www/worke 130436    313
47307 kworker/10:0    [kworker/10:0]                  0    2081
48697 haproxy         /usr/sbin/haproxy -sf 3464  17848   42357
48996 nomad           /usr/bin/nomad executor {"L 19584    2048
49006 php             /usr/bin/php /var/www/worke 99548    2048
49343 nomad           /usr/bin/nomad executor {"L 18296     269
49353 php             /usr/bin/php /var/www/worke 68636     269
49442 kworker/8:1     [kworker/8:1]                   0     266
49503 nomad           /usr/bin/nomad executor {"L 19036     266
49518 php             /usr/bin/php /var/www/worke 68636     266
49573 nomad           /usr/bin/nomad executor {"L 17940     265
49584 php             /usr/bin/php /var/www/worke 68636     265
49845 nomad           /usr/bin/nomad executor {"L 18140     260
49856 php             /usr/bin/php /var/www/worke 68616     260
52954 nomad           /usr/bin/nomad executor {"L 18788     201
52965 php             /usr/bin/php -dmemory_limit 122648    201
53723 nomad           /usr/bin/nomad executor {"L 18036     187
53734 php             /usr/bin/php /var/www/worke 86884     187
53735 nomad           /usr/bin/nomad executor {"L 17836     187
53743 nomad           /usr/bin/nomad executor {"L 18544     187
53752 php             /usr/bin/php /var/www/worke 87068     187
53757 php             /usr/bin/php /var/www/worke 86832     187
53759 nomad           /usr/bin/nomad executor {"L 17808     186
53769 php             /usr/bin/php /var/www/worke 87200     186
53940 nomad           /usr/bin/nomad executor {"L 17948     184
53951 php             /usr/bin/php /var/www/worke 87256     184
53958 nomad           /usr/bin/nomad executor {"L 18132     183
53968 php             /usr/bin/php /var/www/worke 86516     183
54871 nomad           /usr/bin/nomad executor {"L 18360     165
54882 php             /usr/bin/php /var/www/worke 87096     165
55156 nomad           /usr/bin/nomad executor {"L 17656     159
55166 php             /usr/bin/php /var/www/worke 110572    159
55418 nomad           /usr/bin/nomad executor {"L 18672     156
55428 php             /usr/bin/php /var/www/worke 110208    156
55430 nomad           /usr/bin/nomad executor {"L 18924     156
55440 php             /usr/bin/php /var/www/worke 107664    156
55582 kworker/10:1    [kworker/10:1]                  0     153
56031 nomad           /usr/bin/nomad executor {"L 17920     144
56042 php             /usr/bin/php /var/www/worke 88728     144
56382 kworker/3:0     [kworker/3:0]                   0    7413
57053 nomad           /usr/bin/nomad executor {"L 18364     123
57064 php             /usr/bin/php /var/www/worke 68628     123
57134 kworker/6:1     [kworker/6:1]                   0    1894
57154 kworker/0:1     [kworker/0:1]                   0   29732
57232 nomad           /usr/bin/nomad executor {"L 18888     122
57242 php             /usr/bin/php /var/www/worke 68632     122
57331 nomad           /usr/bin/nomad executor {"L 18160     120
57341 php             /usr/bin/php /var/www/worke 68632     120
57823 nomad           /usr/bin/nomad executor {"L 18276     112
57833 php             /usr/bin/php /var/www/worke 68628     112
57918 nomad           /usr/bin/nomad executor {"L 18364     109
57928 php             /usr/bin/php /var/www/worke 68716     109
58104 nomad           /usr/bin/nomad executor {"L 17604     107
58114 php             /usr/bin/php /var/www/worke 68632     107
58117 nomad           /usr/bin/nomad executor {"L 17604     106
58127 php             /usr/bin/php /var/www/worke 68628     106
58492 nomad           /usr/bin/nomad executor {"L 18096      99
58503 php             /usr/bin/php /var/www/worke 68632      99
58678 nomad           /usr/bin/nomad executor {"L 19344      96
58688 php             /usr/bin/php /var/www/worke 68628      96
58799 nomad           /usr/bin/nomad executor {"L 18508      93
58809 php             /usr/bin/php /var/www/worke 72292      93
59295 kworker/3:2     [kworker/3:2]                   0      86
59642 nomad           /usr/bin/nomad executor {"L 18432      77
59653 php             /usr/bin/php /var/www/worke 68632      77
59902 nomad           /usr/bin/nomad executor {"L 17552      75
59913 php             /usr/bin/php /var/www/worke 68632      75
59918 nomad           /usr/bin/nomad executor {"L 18048      74
59928 php             /usr/bin/php /var/www/worke 68632      74
60027 nomad           /usr/bin/nomad executor {"L 17900      71
60038 php             /usr/bin/php /var/www/worke 68632      71
60300 nomad           /usr/bin/nomad executor {"L 18684      66
60311 php             /usr/bin/php /var/www/worke 73156      66
60485 nomad           /usr/bin/nomad executor {"L 17772      63
60496 php             /usr/bin/php -dmemory_limit 68620      63
62019 kworker/6:2     [kworker/6:2]                   0   24095
62082 crond           /usr/sbin/CROND -n           2564      33
62083 sh              /bin/sh -c /opt/pupstream/p  1196      33
62085 puppet_cron.sh  /bin/sh /opt/pupstream/pupp  1420      33
62092 sleep           sleep 127                     356      33
62337 nomad           /usr/bin/nomad executor {"L 16392      28
62351 php             /usr/bin/php /var/www/worke 72300      28
62889 sshd            sshd: ansible-deploy [priv]  5696      17
62891 sshd            sshd: ansible-deploy@pts/0   2616      16
62892 bash            -bash                        3024      16
63033 nomad           /usr/bin/nomad executor {"L 16768      15
63045 php             /usr/bin/php -dmemory_limit 69160      15
63097 sudo            sudo su                      4716      14
63098 su              su                           2360      14
63099 bash            bash                         2964      14
63630 nomad           /usr/bin/nomad executor {"L 15592       4
63640 php             /usr/bin/php /var/www/worke 68628       4
63730 ps              ps -eo pid,comm,args,rss,et  1504       0
63801 nomad           /usr/bin/nomad executor {"L 18592    1767
63811 php             /usr/bin/php /var/www/worke 70188    1767
67436 nomad           /usr/bin/nomad logmon       251576 617599
67452 nomad           /usr/bin/nomad logmon       47124  617599
67453 nomad           /usr/bin/nomad logmon       48092  617599
67479 nomad           /usr/bin/nomad logmon       66592  617599
67485 nomad           /usr/bin/nomad logmon       58244  617599
67489 nomad           /usr/bin/nomad logmon       61000  617599
67514 nomad           /usr/bin/nomad logmon       275304 617599
67535 nomad           /usr/bin/nomad logmon       54628  617599
67563 nomad           /usr/bin/nomad logmon       51568  617598
67599 nomad           /usr/bin/nomad logmon       52988  617598
67620 nomad           /usr/bin/nomad logmon       48380  617598
67628 nomad           /usr/bin/nomad logmon       44244  617598
67649 nomad           /usr/bin/nomad logmon       69452  617598
67659 nomad           /usr/bin/nomad logmon       62412  617598
67703 nomad           /usr/bin/nomad logmon       50692  617598
67706 kworker/0:14    [kworker/0:14]                  0   36635
67720 nomad           /usr/bin/nomad logmon       68512  617598
67721 nomad           /usr/bin/nomad logmon       71844  617598
67742 nomad           /usr/bin/nomad logmon       74704  617598
67888 nomad           /usr/bin/nomad logmon       45880  617598
67903 nomad           /usr/bin/nomad logmon       52728  617598
67914 nomad           /usr/bin/nomad logmon       59180  617598
67936 nomad           /usr/bin/nomad logmon       48196  617598
67941 nomad           /usr/bin/nomad logmon       51140  617598
67987 nomad           /usr/bin/nomad logmon       45484  617598
67992 nomad           /usr/bin/nomad logmon       57856  617598
68007 nomad           /usr/bin/nomad logmon       46296  617598
68383 nomad           /usr/bin/nomad executor {"L 18524    1678
68394 php             /usr/bin/php /var/www/worke 95760    1678
70201 nomad           /usr/bin/nomad executor {"L 19688    1644
70211 php             /usr/bin/php /var/www/worke 71232    1644
71939 kworker/10:8    [kworker/10:8]                  0   51263
72485 pickup          pickup -l -t unix -u         3816    3454
72619 kworker/u192:2  [kworker/u192:2]                0    1603
73652 kworker/4:1     [kworker/4:1]                   0    1590
74925 kworker/9:2     [kworker/9:2]                   0   14493
79622 nomad           /usr/bin/nomad executor {"L 19084    1481
79636 php             /usr/bin/php /var/www/worke 108536   1481
79950 kworker/5:0     [kworker/5:0]                   0   54877
80027 nomad           /usr/bin/nomad executor {"L 18884    1475
80037 php             /usr/bin/php /var/www/worke 94060    1475
80136 kworker/8:2     [kworker/8:2]                   0    1474
84300 nomad           /usr/bin/nomad executor {"L 19108    1399
84311 php             /usr/bin/php /var/www/worke 76528    1399
84312 nomad           /usr/bin/nomad executor {"L 19280    1399
84324 php             /usr/bin/php /var/www/worke 68896    1399
86573 nomad           /usr/bin/nomad executor {"L 18872    1357
86583 php             /usr/bin/php /var/www/worke 115768   1357
86681 kworker/1:0     [kworker/1:0]                   0    1355
88503 kworker/0:0     [kworker/0:0]                   0   36255
91194 nomad           /usr/bin/nomad executor {"L 18940    1271
91205 php             /usr/bin/php /var/www/worke 90428    1271
91404 nomad           /usr/bin/nomad executor {"L 18188    1268
91414 php             /usr/bin/php /var/www/worke 110592   1267
94313 nomad           /usr/bin/nomad executor {"L 19696    1213
94323 php             /usr/bin/php /var/www/worke 103596   1213
94872 kworker/2:1     [kworker/2:1]                   0   32608
97865 nomad           /usr/bin/nomad executor {"L 18344    1143
97875 php             /usr/bin/php /var/www/worke 134060   1143

@tgross
Member

tgross commented Jan 21, 2021

@anastazya I did some quick rummaging thru those numbers and it looks like there's no relationship between etime and memory (if anything, the executors seem to free up memory a tiny bit the older they are). But the logmon processes are grabbing a lot more RSS than I might expect, and there are two outlier logmon processes that are each eating up 1+GB of memory. The primary way I'd expect to see this is if the logmon processes have queued up a lot of messages that aren't yet committed to disk, although I'm surprised that wouldn't result in dropped messages first... I need to dig into that code. In any case, it might be worth looking at the host's disk IO utilization.

With that clue, I'll see if I can set up a reproduction that'll cause logmon's memory to expand unexpectedly.
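For anyone who wants to repeat that rummaging on their own host, here is a minimal Python sketch that groups RSS by Nomad subcommand. It assumes `ps` output with the same column order as the table above (PID, command name, args, RSS in KiB, elapsed seconds); the function name `summarize_nomad_rss` and the `SAMPLE` rows (copied from the dump above) are only illustrative and are not part of Nomad.

```python
from collections import defaultdict

# A few rows copied from the ps dump above (pid, comm, args, rss, elapsed).
SAMPLE = """\
 3192 nomad           /usr/bin/nomad logmon       13476  700666
 6895 nomad           /usr/bin/nomad logmon       1399932 700643
 4073 nomad           /usr/bin/nomad agent -confi 1541964 700656
 2488 nomad           /usr/bin/nomad executor {"L 19744    1099
 2498 php             /usr/bin/php /var/www/worke 71344    1099
"""


def summarize_nomad_rss(ps_text):
    """Group RSS (KiB) by Nomad subcommand: agent, executor, logmon."""
    stats = defaultdict(lambda: {"count": 0, "total_kib": 0, "max_kib": 0})
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip the header and anything too short to hold all five columns.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        # RSS and elapsed are the last two columns; args may contain spaces.
        if not (fields[-2].isdigit() and fields[-1].isdigit()):
            continue
        args = fields[2:-2]
        if len(args) < 2 or not args[0].endswith("/nomad"):
            continue
        kind = args[1]
        rss_kib = int(fields[-2])
        entry = stats[kind]
        entry["count"] += 1
        entry["total_kib"] += rss_kib
        entry["max_kib"] = max(entry["max_kib"], rss_kib)
    return dict(stats)


for kind, entry in sorted(summarize_nomad_rss(SAMPLE).items()):
    print(kind, entry)
```

Run against a full dump, the `max_kib` value for `logmon` is what makes the 1+GB outliers stand out.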

@tgross tgross self-assigned this Jan 22, 2021
@tgross
Member

tgross commented Jan 22, 2021

The primary way I'd expect to see this is if the logmon processes have queued up a lot of messages that aren't yet committed to disk, although I'm surprised that wouldn't result in dropped messages first... I need to dig into that code.

Well that was totally off-base... we have a fifo with blocking reads with a small buffer, so that's just not something that would happen without also blocking the log writer of your application.
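To illustrate why a pipe-like channel bounds how much can be queued, here is a small Python sketch. It is not Nomad's code: it uses an anonymous pipe from `os.pipe()` rather than a named fifo, and the function name `bytes_until_full` is made up for the example. It fills the pipe with nobody reading and reports how many bytes the kernel accepts before a write would block.

```python
import os


def bytes_until_full(chunk_size=4096):
    """Write into a pipe with no reader draining it; return the bytes accepted."""
    read_fd, write_fd = os.pipe()
    # Non-blocking mode turns "the writer would block" into BlockingIOError,
    # so the limit can be observed instead of hanging the process.
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            try:
                written += os.write(write_fd, b"x" * chunk_size)
            except BlockingIOError:
                break  # pipe buffer is full; a blocking writer would stall here
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written


print(bytes_until_full())
```

The exact number depends on the OS and its pipe settings, but it is a fixed, small amount. Once it is reached, a blocking writer simply stalls until the reader catches up, which is why unbounded queuing on the reader side is not expected.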

I've been trying to get a reproduction here by having a raw_exec task that sends "excessive" logs (~100MB/s worth). At that volume, I get the RSS of the logmon to slowly climb up to ~35MB and then it levels out. I've tried varying the line-length of the log to see if there was a bug in the way we try to span newlines for file rotation and that hasn't panned out. I've tried varying the log file size via logs.max_file_size to see whether more rapid log rotation would expose a leak we'd otherwise miss, but so far I've found nothing there either. I thought I had a promising lead for file handles leaking, but that turned out to be references to the same file handle for log rotation.
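A log generator along these lines is enough to vary line length and total volume in such a test. This is a hypothetical sketch, not the script used for the reproduction above; `write_log_lines` and its parameters are invented for illustration.

```python
import io


def write_log_lines(out, line_length, total_bytes):
    """Write fixed-length lines to `out`, stopping before exceeding total_bytes.

    line_length counts the trailing newline. Returns the bytes written.
    """
    if line_length < 1:
        raise ValueError("line_length must be at least 1 (the newline itself)")
    line = b"x" * (line_length - 1) + b"\n"
    written = 0
    while written + line_length <= total_bytes:
        out.write(line)
        written += line_length
    return written


# Small in-memory demo: ten 100-byte lines.
demo = io.BytesIO()
print(write_log_lines(demo, 100, 1000))  # prints 1000
```

In a raw_exec task the same function could be pointed at `sys.stdout.buffer` with a much larger `total_bytes`, and `line_length` varied between runs to test short versus very wide lines.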

The total RSS does seem proportional to log volume though: if I tune the throughput of the logs by sending small web request logs instead of literally running cat shakespere_macbeth.txt in a loop, the RSS is considerably smaller. Given that the RSS of Go applications is effectively a "high water mark" because the runtime almost never gives memory back to the OS, that suggests the logmon is just creating the various buffers it needs to do Write and rotation, but that there's something unusual about this log workload that I haven't quite figured out yet.

With more information about your log load, I might be able to narrow things down. Do you have a logs configuration? Are the logs unusually high volume? Do they have long lines (maybe they're wide structured events)?

@anastazya
Author

With more information about your log load, I might be able to narrow things down. Do you have a logs configuration? Are the logs unusually high volume? Do they have long lines (maybe they're wide structured events)?

The logs config across all jobs is like this :

  logs {
    max_files = 1
    max_file_size = 1
  } 

I will get back with the log volume, needs investigation.

@tgross
Member

tgross commented Jan 27, 2021

That's very interesting... I'd expect that to result in very rapid log rotation if the volume of logs is reasonably high, which might be triggering buffering while the rotation is happening. That gives me something else to look at, at least, but the log volume would be interesting to see for sure.

May I ask why such a low value for these?

@tgross tgross changed the title Nomad eating up ALL memory after random number of days operating normally. high memory usage in logmon Jan 27, 2021
@anastazya
Author

That's very interesting... I'd expect that to result in very rapid log rotation if the volume of logs is reasonably high, which might be triggering buffering while the rotation is happening. That gives me something else to look at, at least, but the log volume would be interesting to see for sure.

May I ask why such a low value for these?

In the beginning we wanted to log only the bare minimum, to minimise the storage impact and IOPS usage. But if you say this has a great impact on log rotating, we will change those values. Any recommendation ?

@tgross
Member

tgross commented Jan 28, 2021

But if you say this has a great impact on log rotating, we will change those values. Any recommendation ?

That entirely depends on the log volume. If you're only sending a trickle of logs it doesn't matter. But if you're sending >1MB of logs a second then you're rotating the logs continuously. You almost certainly want the interval between rotations to be longer than the time it takes to actually rotate them. We don't compress the logs, so it should be quick unless the disk is getting hammered (which of course it might be if you're writing that many logs!). It's definitely worth comparing the log volume to the overall IOPS usage.
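
As a rough illustration of that comparison, here is a minimal sketch of the arithmetic, assuming a steady write rate and that max_file_size is in MB (as in the Nomad logs block). The function names and the 2 MB/s rate are made up for the example; this is not Nomad code.

```python
def seconds_between_rotations(max_file_size_mb, log_rate_mb_per_s):
    """Approximate time to fill one log file at a steady write rate."""
    if log_rate_mb_per_s <= 0:
        return float("inf")  # no log output means no rotation
    return max_file_size_mb / log_rate_mb_per_s


def retained_log_seconds(max_files, max_file_size_mb, log_rate_mb_per_s):
    """Approximate span of time covered by all retained log files."""
    return max_files * seconds_between_rotations(max_file_size_mb, log_rate_mb_per_s)


if __name__ == "__main__":
    rate = 2.0  # MB/s, a hypothetical figure; measure your own task's output
    for max_files, size_mb in [(1, 1), (10, 10)]:
        interval = seconds_between_rotations(size_mb, rate)
        kept = retained_log_seconds(max_files, size_mb, rate)
        print(f"max_files={max_files} max_file_size={size_mb}: "
              f"rotate every {interval:.1f}s, about {kept:.0f}s of logs kept")
```

If the computed interval comes out close to or below the time a rotation takes on your disks, the task is effectively rotating continuously.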

@anastazya
Author

But if you say this has a great impact on log rotating, we will change those values. Any recommendation ?

That entirely depends on the log volume. If you're only sending a trickle of logs it doesn't matter. But if you're sending >1MB of logs a second then you're rotating the logs continuously. You almost certainly want the interval between rotations to be longer than the time it takes to actually rotate them. We don't compress the logs, so it should be quick unless the disk is getting hammered (which of course it might be if you're writing that many logs!). It's definitely worth comparing the log volume to the overall IOPS usage.

I will be doing in-depth tests next week and will write here if I find something log related. As far as I can see, the logging is not that aggressive.

@Nickleby9

Nickleby9 commented Feb 1, 2021

I've encountered a scenario where Nomad consumed a lot of memory (over 5GB), I eventually found out it happened because of tasks that keep restarting forever due to insufficient resources or bad arguments.
Maybe this will help you.

Edit:
Just to elaborate, fixing the task restarting issue helped to decrease the memory usage to 1GB in some cases; in others it released some memory but usage was still high.
I can't say for sure that the faulty task is what triggers this issue, but I didn't encounter the high memory usage when all of my tasks were valid.

@anastazya
Author

I've encountered a scenario where Nomad consumed a lot of memory (over 5GB), I eventually found out it happened because of tasks that keep restarting forever due to insufficient resources or bad arguments.
Maybe this will help you.

Edit:
Just to elaborate, fixing the task restarting issue helped to decrease the memory usage to 1GB in some cases; in others it released some memory but usage was still high.
I can't say for sure that the faulty task is what triggers this issue, but I didn't encounter the high memory usage when all of my tasks were valid.

I can assure you that all the tasks run well and don't restart with errors.

@Nickleby9

I looked at it a bit more, the issue I described is referring only to when Nomad agent uses high memory.
I also have the same issue with "nomad logmon" without any errors on the Nomad monitor logs.
I'm using Windows server, if you need any additional details that will help you investigate let me know.
Nomad version is 0.12.7

@tgross
Member

tgross commented Feb 1, 2021

@Nickleby9 let's open that as a new issue then, so as not to confuse this issue.

@anastazya
Author

Screenshot 2021-02-04 at 19 59 14

After adding this modification :

  logs {
    max_files = 10
    max_file_size = 10
  }

Today i found half the Nomad nodes like the picture.
Besides restarting the nomad agent, there was nothing i could do.
After nomad agent restart, it started working normal again.

@tgross tgross added this to In Progress in Nomad - Community Issues Triage Feb 12, 2021
@tgross
Member

tgross commented Feb 16, 2021

I still need to come back to review this, but just saw that go1.16 dropped with the following improvement (ref https://golang.org/doc/go1.16#runtime):

On Linux, the runtime now defaults to releasing memory to the operating system promptly (using MADV_DONTNEED), rather than lazily when the operating system is under memory pressure (using MADV_FREE). This means process-level memory statistics like RSS will more accurately reflect the amount of physical memory being used by Go processes. Systems that are currently using GODEBUG=madvdontneed=1 to improve memory monitoring behavior no longer need to set this environment variable.

I suspect this is going to help us out in Nomad quite a bit because logmon is in the allocation resource boundary. I'm not sure if we'll ship that in the upcoming Nomad 1.0.4, but certainly for Nomad 1.1.

@bubejur

bubejur commented May 20, 2021

@tgross sorry, any updates? I can see this problem still not solved in 1.1 version of Nomad?

@tgross
Member

tgross commented May 20, 2021

@tgross sorry, any updates? I can see this problem still not solved in 1.1 version of Nomad?

Hi @bubejur we just shipped Nomad 1.1 a couple days ago, so I'm sure @anastazya hasn't had a chance to try it out in their environment to verify whether MADV_DONTNEED helped us out here. If you were able to reliably reproduce the problem yourself on an earlier version of Nomad and can test it out, we'd love to hear about it.

@bubejur

bubejur commented May 20, 2021

Okay. As i can see it:
In previous versions (I tried all from 1.0.0 - till 1.0.6) there were problems with:
[ERROR] client.driver_mgr.raw_exec: error receiving stream from Stats executor RPC, closing stream: alloc_id=2eba8886-94a2-0afc-8e2b-24e6af270728 driver=raw_exec task_name=Group-1-worker-0 error="rpc error: code = Unavailable desc = transport is closing"
and after some time (about 8h) the logmon process takes >5% memory usage. After 1 day I really need to restart the nomad process to free up node memory.
I had high hopes that this would be fixed in 1.1.0, but I still have this issue...

@tgross
Member

tgross commented May 20, 2021

In previous versions (I tried all from 1.0.0 - till 1.0.6) there were problems with:
[ERROR] client.driver_mgr.raw_exec: error receiving stream from Stats executor RPC, closing stream: alloc_id=2eba8886-94a2-0afc-8e2b-24e6af270728 driver=raw_exec task_name=Group-1-worker-0 error="rpc error: code = Unavailable desc = transport is closing"

Ok, please open a new issue if you're having problems with log monitoring of your tasks rather than piling on this one.

@tgross tgross removed their assignment Jul 7, 2021
@tgross tgross moved this from In Progress to Needs Roadmapping in Nomad - Community Issues Triage Jul 7, 2021
@bubejur

bubejur commented Jul 7, 2021

pmap -xp nomad agent process

9218:   /usr/local/bin/nomad agent -config=/etc/nomad
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000   61404   25724       0 r-x-- /opt/puppet-archive/nomad-1.1.2/nomad
00000000041f6000       4       4       4 r---- /opt/puppet-archive/nomad-1.1.2/nomad
00000000041f7000    1596     516     296 rw--- /opt/puppet-archive/nomad-1.1.2/nomad
0000000004386000     292     148     148 rw---   [ anon ]
0000000004fd7000     132       4       4 rw---   [ anon ]
000000c000000000 1900544 1869948 1869948 rw---   [ anon ]
00007f3f2cd96000   46148   44584   44584 rw---   [ anon ]
00007f3f2faae000    5448    5432    5432 rw---   [ anon ]
00007f3f30000000     132       8       8 rw---   [ anon ]
00007f3f30021000   65404       0       0 -----   [ anon ]
00007f3f34000000     132       4       4 rw---   [ anon ]
00007f3f34021000   65404       0       0 -----   [ anon ]
00007f3f38000000     132       4       4 rw---   [ anon ]
00007f3f38021000   65404       0       0 -----   [ anon ]
00007f3f3c000000     132       4       4 rw---   [ anon ]
00007f3f3c021000   65404       0       0 -----   [ anon ]
00007f3f40000000     132       4       4 rw---   [ anon ]
00007f3f40021000   65404       0       0 -----   [ anon ]
00007f3f44000000     132       4       4 rw---   [ anon ]
00007f3f44021000   65404       0       0 -----   [ anon ]
00007f3f48000000     132       8       8 rw---   [ anon ]
00007f3f48021000   65404       0       0 -----   [ anon ]
00007f3f4c000000     132       8       8 rw---   [ anon ]
00007f3f4c021000   65404       0       0 -----   [ anon ]
00007f3f5000b000    9868    9812    9812 rw---   [ anon ]
00007f3f509b8000    6408    6384    6384 rw---   [ anon ]
00007f3f50ffa000       4       0       0 -----   [ anon ]
00007f3f50ffb000    8192       8       8 rw---   [ anon ]
00007f3f517fb000       4       0       0 -----   [ anon ]
00007f3f517fc000    8192       8       8 rw---   [ anon ]
00007f3f51ffc000       4       0       0 -----   [ anon ]
00007f3f51ffd000    8192       8       8 rw---   [ anon ]
00007f3f527fd000       4       0       0 -----   [ anon ]
00007f3f527fe000    8192       8       8 rw---   [ anon ]
00007f3f52ffe000       4       0       0 -----   [ anon ]
00007f3f52fff000    8192      12      12 rw---   [ anon ]
00007f3f537ff000       4       0       0 -----   [ anon ]
00007f3f53800000    8192       8       8 rw---   [ anon ]
00007f3f54000000     132       4       4 rw---   [ anon ]
00007f3f54021000   65404       0       0 -----   [ anon ]
00007f3f58000000     132       8       8 rw---   [ anon ]
00007f3f58021000   65404       0       0 -----   [ anon ]
00007f3f5c000000     132       8       8 rw---   [ anon ]
00007f3f5c021000   65404       0       0 -----   [ anon ]
00007f3f60000000     132       8       8 rw---   [ anon ]
00007f3f60021000   65404       0       0 -----   [ anon ]
00007f3f64000000     132       8       8 rw---   [ anon ]
00007f3f64021000   65404       0       0 -----   [ anon ]
00007f3f68000000     132       8       8 rw---   [ anon ]
00007f3f68021000   65404       0       0 -----   [ anon ]
00007f3f6c005000    3012    2996    2996 rw---   [ anon ]
00007f3f6c2f7000    5128    5120    5120 rw---   [ anon ]
00007f3f6c7f9000       4       0       0 -----   [ anon ]
00007f3f6c7fa000    8192       8       8 rw---   [ anon ]
00007f3f6cffa000       4       0       0 -----   [ anon ]
00007f3f6cffb000    8192       8       8 rw---   [ anon ]
00007f3f6d7fb000       4       0       0 -----   [ anon ]
00007f3f6d7fc000    8192       8       8 rw---   [ anon ]
00007f3f6dffc000       4       0       0 -----   [ anon ]
00007f3f6dffd000    8192       8       8 rw---   [ anon ]
00007f3f6e7fd000       4       0       0 -----   [ anon ]
00007f3f6e7fe000    8192       8       8 rw---   [ anon ]
00007f3f6effe000       4       0       0 -----   [ anon ]
00007f3f6efff000    8192       8       8 rw---   [ anon ]
00007f3f6f7ff000       4       0       0 -----   [ anon ]
00007f3f6f800000    8192       8       8 rw---   [ anon ]
00007f3f70000000     132       4       4 rw---   [ anon ]
00007f3f70021000   65404       0       0 -----   [ anon ]
00007f3f74000000     132       8       8 rw---   [ anon ]
00007f3f74021000   65404       0       0 -----   [ anon ]
00007f3f78000000     132       8       8 rw---   [ anon ]
00007f3f78021000   65404       0       0 -----   [ anon ]
00007f3f7c000000     132       8       8 rw---   [ anon ]
00007f3f7c021000   65404       0       0 -----   [ anon ]
00007f3f80000000     132       8       8 rw---   [ anon ]
00007f3f80021000   65404       0       0 -----   [ anon ]
00007f3f84000000     132       8       8 rw---   [ anon ]
00007f3f84021000   65404       0       0 -----   [ anon ]
00007f3f88000000     132       8       8 rw---   [ anon ]
00007f3f88021000   65404       0       0 -----   [ anon ]
00007f3f8c000000     132      16      16 rw---   [ anon ]
00007f3f8c021000   65404       0       0 -----   [ anon ]
00007f3f9000d000    1280    1260    1260 rw---   [ anon ]
00007f3f90153000    3908    3876    3876 rw---   [ anon ]
00007f3f90528000    2884    2868    2868 rw---   [ anon ]
00007f3f907f9000       4       0       0 -----   [ anon ]
00007f3f907fa000    8192       8       8 rw---   [ anon ]
00007f3f90ffa000       4       0       0 -----   [ anon ]
00007f3f90ffb000    8192       8       8 rw---   [ anon ]
00007f3f917fb000       4       0       0 -----   [ anon ]
00007f3f917fc000    8192       8       8 rw---   [ anon ]
00007f3f91ffc000       4       0       0 -----   [ anon ]
00007f3f91ffd000    8192       8       8 rw---   [ anon ]
00007f3f927fd000       4       0       0 -----   [ anon ]
00007f3f927fe000    8192       8       8 rw---   [ anon ]
00007f3f92ffe000       4       0       0 -----   [ anon ]
00007f3f92fff000    8192       8       8 rw---   [ anon ]
00007f3f937ff000       4       0       0 -----   [ anon ]
00007f3f93800000    8192      16      16 rw---   [ anon ]
00007f3f94000000     132       8       8 rw---   [ anon ]
00007f3f94021000   65404       0       0 -----   [ anon ]
00007f3f98000000     132       8       8 rw---   [ anon ]
00007f3f98021000   65404       0       0 -----   [ anon ]
00007f3f9c000000     132      12      12 rw---   [ anon ]
00007f3f9c021000   65404       0       0 -----   [ anon ]
00007f3fa0000000     132       8       8 rw---   [ anon ]
00007f3fa0021000   65404       0       0 -----   [ anon ]
00007f3fa4000000     132       8       8 rw---   [ anon ]
00007f3fa4021000   65404       0       0 -----   [ anon ]
00007f3fa800e000     256     256     256 rw---   [ anon ]
00007f3fa8056000    4164    4160    4160 rw---   [ anon ]
00007f3fa8467000       4       0       0 -----   [ anon ]
00007f3fa8468000    8192       8       8 rw---   [ anon ]
00007f3fa8c68000      48      24       0 r-x-- /usr/lib64/libnss_files-2.17.so
00007f3fa8c74000    2044       0       0 ----- /usr/lib64/libnss_files-2.17.so
00007f3fa8e73000       4       4       4 r---- /usr/lib64/libnss_files-2.17.so
00007f3fa8e74000       4       4       4 rw--- /usr/lib64/libnss_files-2.17.so
00007f3fa8e75000     280     244     244 rw---   [ anon ]
00007f3fa8ebb000    8192    2612       0 r--s- /data/nomad/client/state.db
00007f3fa96bb000       4       0       0 -----   [ anon ]
00007f3fa96bc000    8448     264     264 rw---   [ anon ]
00007f3fa9efc000       4       0       0 -----   [ anon ]
00007f3fa9efd000    9216    1032    1032 rw---   [ anon ]
00007f3faa7fd000       4       0       0 -----   [ anon ]
00007f3faa7fe000    8192       8       8 rw---   [ anon ]
00007f3faaffe000       4       0       0 -----   [ anon ]
00007f3faafff000    8192       8       8 rw---   [ anon ]
00007f3fab7ff000       4       0       0 -----   [ anon ]
00007f3fab800000    8192       8       8 rw---   [ anon ]
00007f3fac000000     132       4       4 rw---   [ anon ]
00007f3fac021000   65404       0       0 -----   [ anon ]
00007f3fb0002000    2304    2304    2304 rw---   [ anon ]
00007f3fb0242000       4       0       0 -----   [ anon ]
00007f3fb0243000    8192       8       8 rw---   [ anon ]
00007f3fb0a43000       4       0       0 -----   [ anon ]
00007f3fb0a44000    8192       8       8 rw---   [ anon ]
00007f3fb1244000       4       0       0 -----   [ anon ]
00007f3fb1245000   44100    2172    2172 rw---   [ anon ]
00007f3fb3d56000  263680       0       0 -----   [ anon ]
00007f3fc3ed6000       4       4       4 rw---   [ anon ]
00007f3fc3ed7000  293564       0       0 -----   [ anon ]
00007f3fd5d86000       4       4       4 rw---   [ anon ]
00007f3fd5d87000   36692       0       0 -----   [ anon ]
00007f3fd815c000       4       4       4 rw---   [ anon ]
00007f3fd815d000    4068       0       0 -----   [ anon ]
00007f3fd8556000    1808     416       0 r-x-- /usr/lib64/libc-2.17.so
00007f3fd871a000    2044       0       0 ----- /usr/lib64/libc-2.17.so
00007f3fd8919000      16      16      16 r---- /usr/lib64/libc-2.17.so
00007f3fd891d000       8       8       8 rw--- /usr/lib64/libc-2.17.so
00007f3fd891f000      20      20      20 rw---   [ anon ]
00007f3fd8924000       8       8       0 r-x-- /usr/lib64/libdl-2.17.so
00007f3fd8926000    2048       0       0 ----- /usr/lib64/libdl-2.17.so
00007f3fd8b26000       4       4       4 r---- /usr/lib64/libdl-2.17.so
00007f3fd8b27000       4       4       4 rw--- /usr/lib64/libdl-2.17.so
00007f3fd8b28000      92      60       0 r-x-- /usr/lib64/libpthread-2.17.so
00007f3fd8b3f000    2044       0       0 ----- /usr/lib64/libpthread-2.17.so
00007f3fd8d3e000       4       4       4 r---- /usr/lib64/libpthread-2.17.so
00007f3fd8d3f000       4       4       4 rw--- /usr/lib64/libpthread-2.17.so
00007f3fd8d40000      16       4       4 rw---   [ anon ]
00007f3fd8d44000     136     116       0 r-x-- /usr/lib64/ld-2.17.so
00007f3fd8d68000     576     572     572 rw---   [ anon ]
00007f3fd8df8000     512       0       0 -----   [ anon ]
00007f3fd8e78000       4       4       4 rw---   [ anon ]
00007f3fd8e79000     508       0       0 -----   [ anon ]
00007f3fd8ef8000     400      68      68 rw---   [ anon ]
00007f3fd8f64000       4       4       4 rw---   [ anon ]
00007f3fd8f65000       4       4       4 r---- /usr/lib64/ld-2.17.so
00007f3fd8f66000       4       4       4 rw--- /usr/lib64/ld-2.17.so
00007f3fd8f67000       4       4       4 rw---   [ anon ]
00007ffdfa5f3000     132      16      16 rw---   [ stack ]
00007ffdfa789000       8       4       0 r-x--   [ anon ]
ffffffffff600000       4       0       0 r-x--   [ anon ]
---------------- ------- ------- ------- 
total kB         4783664 1993532 1964348
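
A listing this long is easier to read once it is sorted by RSS. The following is a minimal sketch, assuming the `pmap -x` column layout shown above (Address, Kbytes, RSS, Dirty, Mode, Mapping); `parse_pmap`, `top_regions` and `SAMPLE` are names invented for this example, and the sample is only a three-row excerpt of the output.

```python
SAMPLE = """\
9218:   /usr/local/bin/nomad agent -config=/etc/nomad
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000   61404   25724       0 r-x-- /opt/puppet-archive/nomad-1.1.2/nomad
000000c000000000 1900544 1869948 1869948 rw---   [ anon ]
00007f3f2cd96000   46148   44584   44584 rw---   [ anon ]
total kB         4783664 1993532 1964348
"""


def parse_pmap(text):
    """Return (address, kbytes, rss_kb, dirty_kb, mode, mapping) per region."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # process line, separator and "total kB" line
        addr, kbytes, rss, dirty, mode, mapping = parts
        try:
            int(addr, 16)  # rejects the "Address ..." header row
            row = (addr, int(kbytes), int(rss), int(dirty), mode, mapping.strip())
        except ValueError:
            continue
        rows.append(row)
    return rows


def top_regions(rows, n=5):
    """Regions with the largest resident set size first."""
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]


if __name__ == "__main__":
    for addr, _kb, rss, _dirty, mode, mapping in top_regions(parse_pmap(SAMPLE)):
        print(addr, rss, mode, mapping)
```

In the full listing above, the single anonymous region at 000000c000000000 holds 1869948 kB of the 1993532 kB total RSS, roughly 94%. To my understanding that address range is where the Go runtime places its heap on linux/amd64, which would point at the Go heap of the agent rather than thread stacks or mapped files, but treat that reading as an assumption.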

ps output taken Wed Jul 7 12:50:52 UTC 2021

date
Wed Jul  7 12:50:52 UTC 2021
[root@cron01 ~]# ps -eo pid,comm,args,rss,vsz,etimes
  PID COMMAND         COMMAND                       RSS    VSZ ELAPSED
    1 systemd         /usr/lib/systemd/systemd --  6276 129260 17363171
    2 kthreadd        [kthreadd]                      0      0 17363171
    4 kworker/0:0H    [kworker/0:0H]                  0      0 17363171
    6 ksoftirqd/0     [ksoftirqd/0]                   0      0 17363171
    7 migration/0     [migration/0]                   0      0 17363171
    8 rcu_bh          [rcu_bh]                        0      0 17363171
    9 rcu_sched       [rcu_sched]                     0      0 17363171
   10 lru-add-drain   [lru-add-drain]                 0      0 17363171
   11 watchdog/0      [watchdog/0]                    0      0 17363171
   12 watchdog/1      [watchdog/1]                    0      0 17363171
   13 migration/1     [migration/1]                   0      0 17363171
   14 ksoftirqd/1     [ksoftirqd/1]                   0      0 17363171
   16 kworker/1:0H    [kworker/1:0H]                  0      0 17363171
   17 watchdog/2      [watchdog/2]                    0      0 17363171
   18 migration/2     [migration/2]                   0      0 17363171
   19 ksoftirqd/2     [ksoftirqd/2]                   0      0 17363171
   21 kworker/2:0H    [kworker/2:0H]                  0      0 17363171
   22 watchdog/3      [watchdog/3]                    0      0 17363171
   23 migration/3     [migration/3]                   0      0 17363171
   24 ksoftirqd/3     [ksoftirqd/3]                   0      0 17363171
   26 kworker/3:0H    [kworker/3:0H]                  0      0 17363171
   28 kdevtmpfs       [kdevtmpfs]                     0      0 17363171
   29 netns           [netns]                         0      0 17363171
   30 khungtaskd      [khungtaskd]                    0      0 17363171
   31 writeback       [writeback]                     0      0 17363171
   32 kintegrityd     [kintegrityd]                   0      0 17363171
   33 bioset          [bioset]                        0      0 17363171
   34 bioset          [bioset]                        0      0 17363171
   35 bioset          [bioset]                        0      0 17363171
   36 kblockd         [kblockd]                       0      0 17363171
   37 md              [md]                            0      0 17363171
   38 edac-poller     [edac-poller]                   0      0 17363171
   39 watchdogd       [watchdogd]                     0      0 17363171
   45 kswapd0         [kswapd0]                       0      0 17363171
   46 ksmd            [ksmd]                          0      0 17363171
   47 khugepaged      [khugepaged]                    0      0 17363171
   48 crypto          [crypto]                        0      0 17363171
   56 kthrotld        [kthrotld]                      0      0 17363171
   58 kmpath_rdacd    [kmpath_rdacd]                  0      0 17363171
   59 kaluad          [kaluad]                        0      0 17363171
   61 kpsmoused       [kpsmoused]                     0      0 17363171
   62 ipv6_addrconf   [ipv6_addrconf]                 0      0 17363171
   75 deferwq         [deferwq]                       0      0 17363171
  112 kauditd         [kauditd]                       0      0 17363171
  281 ata_sff         [ata_sff]                       0      0 17363171
  292 scsi_eh_0       [scsi_eh_0]                     0      0 17363171
  293 scsi_tmf_0      [scsi_tmf_0]                    0      0 17363171
  294 scsi_eh_1       [scsi_eh_1]                     0      0 17363171
  295 scsi_tmf_1      [scsi_tmf_1]                    0      0 17363171
  296 virtscsi-scan   [virtscsi-scan]                 0      0 17363171
  299 scsi_eh_2       [scsi_eh_2]                     0      0 17363171
  300 scsi_tmf_2      [scsi_tmf_2]                    0      0 17363171
  303 ttm_swap        [ttm_swap]                      0      0 17363171
  367 kworker/2:1H    [kworker/2:1H]                  0      0 17363170
  427 kdmflush        [kdmflush]                      0      0 17363170
  428 bioset          [bioset]                        0      0 17363170
  446 jbd2/dm-0-8     [jbd2/dm-0-8]                   0      0 17363170
  447 ext4-rsv-conver [ext4-rsv-conver]               0      0 17363170
  547 systemd-journal /usr/lib/systemd/systemd-jo 110216 169968 17363169
  566 rpciod          [rpciod]                        0      0 17363169
  568 xprtiod         [xprtiod]                       0      0 17363169
  569 lvmetad         /usr/sbin/lvmetad -f         3740 274980 17363169
  582 systemd-udevd   /usr/lib/systemd/systemd-ud  2236  48640 17363169
  604 kworker/0:1     [kworker/0:1]                   0      0     370
  606 hwrng           [hwrng]                         0      0 17363169
  659 nfit            [nfit]                          0      0 17363169
  679 jbd2/sda1-8     [jbd2/sda1-8]                   0      0 17363169
  680 ext4-rsv-conver [ext4-rsv-conver]               0      0 17363169
  682 kdmflush        [kdmflush]                      0      0 17363168
  683 bioset          [bioset]                        0      0 17363168
  688 jbd2/dm-1-8     [jbd2/dm-1-8]                   0      0 17363168
  689 ext4-rsv-conver [ext4-rsv-conver]               0      0 17363168
  692 kdmflush        [kdmflush]                      0      0 17363167
  693 bioset          [bioset]                        0      0 17363167
  695 kdmflush        [kdmflush]                      0      0 17363167
  696 bioset          [bioset]                        0      0 17363167
  710 kworker/0:1H    [kworker/0:1H]                  0      0 17363167
  716 jbd2/dm-2-8     [jbd2/dm-2-8]                   0      0 17363167
  717 ext4-rsv-conver [ext4-rsv-conver]               0      0 17363167
  719 jbd2/dm-3-8     [jbd2/dm-3-8]                   0      0 17363167
  720 ext4-rsv-conver [ext4-rsv-conver]               0      0 17363167
  748 auditd          /sbin/auditd                  752  55532 17363167
  770 dbus-daemon     /usr/bin/dbus-daemon --syst  1496  88108 17363167
  772 NetworkManager  /usr/sbin/NetworkManager --  2716 476436 17363167
  773 sssd            /usr/sbin/sssd -i --logger=  1260 268652 17363167
  777 polkitd         /usr/lib/polkit-1/polkitd -  9276 625200 17363167
  778 irqbalance      /usr/sbin/irqbalance --fore   576  21596 17363167
  780 qemu-ga         /usr/bin/qemu-ga --method=v  1400  44220 17363167
  796 gssproxy        /usr/sbin/gssproxy -D         584 269132 17363167
  802 python          /usr/bin/python /usr/share/ 46624 532144 17363167
  817 sssd_be         /usr/libexec/sssd/sssd_be -  8080 419828 17363167
  823 rpc.gssd        /usr/sbin/rpc.gssd            388  40340 17363167
  839 sssd_nss        /usr/libexec/sssd/sssd_nss   2900 277484 17363167
  840 sssd_sudo       /usr/libexec/sssd/sssd_sudo  2320 249444 17363167
  841 sssd_pam        /usr/libexec/sssd/sssd_pam   2476 255632 17363167
  842 sssd_ssh        /usr/libexec/sssd/sssd_ssh   2228 257128 17363167
  843 sssd_pac        /usr/libexec/sssd/sssd_pac   1568 293736 17363167
  873 systemd-logind  /usr/lib/systemd/systemd-lo  1248  37088 17363167
  886 crond           /usr/sbin/crond -n            948 126392 17363167
  889 atd             /usr/sbin/atd -f              208  25908 17363167
  896 agetty          /sbin/agetty --noclear tty1   132 110208 17363167
  899 crond           /usr/sbin/CROND -n           2864 219584     353
  900 crond           /usr/sbin/CROND -n           2864 219584     353
  901 crond           /usr/sbin/CROND -n           2864 219584     353
  902 crond           /usr/sbin/CROND -n           2864 219584     353
  908 php             /bin/php /data/ecogate/cli/ 23528 439100     353
  909 php             /bin/php /data/ecogate/cli/ 23520 439100     353
  910 php             /bin/php /data/ecogate/cli/ 24732 439100     353
  912 php             /bin/php /data/ecogate/cli/ 23524 439100     353
 1063 sshd            /usr/sbin/sshd -D            1272 112936 17363166
 1064 tuned           /usr/bin/python2 -Es /usr/s 13460 586428 17363166
 1066 node_exporter   /usr/local/bin/node_exporte  2780 121856 17363166
 1067 python          /usr/bin/python /usr/bin/go 133652 1103260 17363166
 1069 oddjobd         /usr/sbin/oddjobd -n -p /va   452  54832 17363166
 1195 ossec-execd     /var/ossec/bin/ossec-execd    952  61332 17363166
 1218 ossec-agentd    /var/ossec/bin/ossec-agentd  1252  48764 17363166
 1228 ossec-agentd    /var/ossec/bin/ossec-agentd   760  48668 17363166
 1251 ossec-logcollec /var/ossec/bin/ossec-logcol   940  44176 17363166
 1285 ossec-syscheckd /var/ossec/bin/ossec-sysche 14768  57368 17363166
 1547 trivial-rewrite trivial-rewrite -n rewrite   4384 107864    2033
 1736 kworker/1:1H    [kworker/1:1H]                  0      0 17363165
 1871 master          /usr/libexec/postfix/master  1300  96968 17363164
 1875 qmgr            qmgr -l -t unix -u           1628 108028 17363164
 2022 splunkd         splunkd -p 8089 start       148952 336140 17363163
 2026 splunkd         [splunkd pid=2022] splunkd  11292  84300 17363163
 2169 kworker/3:1H    [kworker/3:1H]                  0      0 17363160
 4694 sshd            sshd: ybubentsov [priv]      5784 189232     150
 4703 sshd            sshd: ybubentsov@pts/0       2512 189232     149
 4708 bash            -bash                        3220 127336     149
 4745 bash            -bash                        1892 127336     149
 4746 tee             tee -ai /home/ybubentsov/.b   668 108056     149
 4747 bash            -bash                        1832 127336     149
 4748 sudo            sudo su -                    4916 280180     149
 4750 su              su -                         2780 237644     149
 4751 bash            -bash                        3164 116620     149
 5033 kworker/1:2     [kworker/1:2]                   0      0    1852
 5530 nomad           /opt/puppet-archive/nomad-1 17008 1120284     91
 5540 php             /usr/bin/php /data/ecogate/ 24188 439100      91
 5541 nomad           /opt/puppet-archive/nomad-1 17020 1054748     90
 5550 php             /usr/bin/php /data/ecogate/ 24180 439100      90
 5559 nomad           /opt/puppet-archive/nomad-1 17204 1120540     90
 5567 php             /usr/bin/php /data/ecogate/ 24192 439100      90
 5688 nomad           /opt/puppet-archive/nomad-1 16996 1186076     86
 5689 nomad           /opt/puppet-archive/nomad-1 17120 1194272     86
 5706 php             /usr/bin/php /data/ecogate/ 24188 439100      86
 5708 php             /usr/bin/php /data/ecogate/ 24188 439100      86
 5721 nomad           /opt/puppet-archive/nomad-1 17376 1055004     85
 5730 php             /usr/bin/php /data/ecogate/ 24192 439100      85
 5731 nomad           /opt/puppet-archive/nomad-1 16736 1120540     85
 5740 php             /usr/bin/php /data/ecogate/ 24188 439100      85
 5741 nomad           /opt/puppet-archive/nomad-1 17300 1259552     85
 5749 php             /usr/bin/php /data/ecogate/ 24188 439100      85
 5751 nomad           /opt/puppet-archive/nomad-1 17312 1186076     85
 5761 php             /usr/bin/php /data/ecogate/ 24188 439100      85
 5763 nomad           /opt/puppet-archive/nomad-1 17156 1194272     85
 5773 php             /usr/bin/php /data/ecogate/ 24188 439100      85
 5774 nomad           /opt/puppet-archive/nomad-1 17200 1120284     85
 5783 php             /usr/bin/php /data/ecogate/ 24188 439100      85
 5784 nomad           /opt/puppet-archive/nomad-1 16740 1120540     85
 5792 php             /usr/bin/php /data/ecogate/ 24188 439100      84
 5794 nomad           /opt/puppet-archive/nomad-1 17308 1055004     84
 5800 nomad           /opt/puppet-archive/nomad-1 17368 1120284     84
 5811 php             /usr/bin/php /data/ecogate/ 24180 439100      84
 5814 php             /usr/bin/php /data/ecogate/ 24188 439100      84
 5822 nomad           /opt/puppet-archive/nomad-1 17244 1120540     82
 5832 php             /usr/bin/php /data/ecogate/ 24176 439100      82
 5837 nomad           /opt/puppet-archive/nomad-1 17456 1120540     82
 5846 php             /usr/bin/php /data/ecogate/ 24192 439100      82
 5847 nomad           /opt/puppet-archive/nomad-1 17192 1120796     82
 5857 php             /usr/bin/php /data/ecogate/ 24180 439100      82
 5858 nomad           /opt/puppet-archive/nomad-1 16884 1128736     82
 5868 php             /usr/bin/php /data/ecogate/ 24192 439100      82
 5869 nomad           /opt/puppet-archive/nomad-1 16992 1055004     81
 5878 php             /usr/bin/php /data/ecogate/ 24188 439100      81
 5879 nomad           /opt/puppet-archive/nomad-1 16980 1055004     81
 5892 php             /usr/bin/php /data/ecogate/ 24188 439100      81
 5893 nomad           /opt/puppet-archive/nomad-1 17152 1120540     80
 5906 php             /usr/bin/php /data/ecogate/ 24188 439100      80
 5907 nomad           /opt/puppet-archive/nomad-1 17392 1120540     80
 5916 php             /usr/bin/php /data/ecogate/ 24188 439100      80
 5918 nomad           /opt/puppet-archive/nomad-1 17176 1120284     80
 5927 php             /usr/bin/php /data/ecogate/ 24176 439100      80
 5928 nomad           /opt/puppet-archive/nomad-1 17048 1186332     80
 5937 php             /usr/bin/php /data/ecogate/ 24176 439100      80
 5939 nomad           /opt/puppet-archive/nomad-1 16788 1185820     80
 5946 nomad           /opt/puppet-archive/nomad-1 17076 1186076     80
 5955 php             /usr/bin/php /data/ecogate/ 24188 439100      80
 5960 php             /usr/bin/php /data/ecogate/ 24188 439100      80
 5961 nomad           /opt/puppet-archive/nomad-1 17132 989468      79
 5970 php             /usr/bin/php /data/ecogate/ 24188 439100      79
 5975 nomad           /opt/puppet-archive/nomad-1 16904 1120540     79
 5985 php             /usr/bin/php /data/ecogate/ 24188 439100      79
 5986 nomad           /opt/puppet-archive/nomad-1 17240 1120540     78
 5989 unbound         /usr/sbin/unbound -d        13344 285396 2858805
 5997 nomad           /opt/puppet-archive/nomad-1 17184 1054748     78
 6003 nomad           /opt/puppet-archive/nomad-1 16720 1185820     78
 6007 php             /usr/bin/php /data/ecogate/ 24188 439100      78
 6015 php             /usr/bin/php /data/ecogate/ 24188 439100      78
 6020 php             /usr/bin/php /data/ecogate/ 24188 439100      78
 6021 nomad           /opt/puppet-archive/nomad-1 17228 989468      78
 6026 nomad           /opt/puppet-archive/nomad-1 17196 1120540     78
 6039 php             /usr/bin/php /data/ecogate/ 24192 439100      78
 6040 php             /usr/bin/php /data/ecogate/ 24176 439100      78
 6041 nomad           /opt/puppet-archive/nomad-1 16936 1120540     78
 6051 php             /usr/bin/php /data/ecogate/ 24188 439100      78
 6052 nomad           /opt/puppet-archive/nomad-1 17364 1251612     78
 6053 nomad           /opt/puppet-archive/nomad-1 17116 1055004     78
 6069 php             /usr/bin/php /data/ecogate/ 24188 439100      78
 6070 php             /usr/bin/php /data/ecogate/ 24188 439100      78
 6073 nomad           /opt/puppet-archive/nomad-1 16788 989468      78
 6079 nomad           /opt/puppet-archive/nomad-1 16628 1120540     78
 6081 php             /usr/bin/php /data/ecogate/ 24192 439100      78
 6083 nomad           /opt/puppet-archive/nomad-1 16968 1120540     78
 6099 php             /usr/bin/php /data/ecogate/ 24176 439100      78
 6100 php             /usr/bin/php /data/ecogate/ 24188 439100      78
 6103 nomad           /opt/puppet-archive/nomad-1 16908 1185820     77
 6111 php             /usr/bin/php /data/ecogate/ 24188 439100      77
 6115 nomad           /opt/puppet-archive/nomad-1 17068 1120540     77
 6124 php             /usr/bin/php /data/ecogate/ 24180 439100      77
 6125 nomad           /opt/puppet-archive/nomad-1 16788 1120540     77
 6134 php             /usr/bin/php /data/ecogate/ 24188 439100      77
 6135 nomad           /opt/puppet-archive/nomad-1 17232 1054812     77
 6144 php             /usr/bin/php /data/ecogate/ 24188 439100      77
 6145 nomad           /opt/puppet-archive/nomad-1 17084 1120284     77
 6155 php             /usr/bin/php /data/ecogate/ 24176 439100      77
 6159 nomad           /opt/puppet-archive/nomad-1 17028 1120540     76
 6168 php             /usr/bin/php /data/ecogate/ 24192 439100      76
 6170 nomad           /opt/puppet-archive/nomad-1 17136 1055004     76
 6180 php             /usr/bin/php /data/ecogate/ 24188 439100      76
 6186 nomad           /opt/puppet-archive/nomad-1 17180 1259808     75
 6193 nomad           /opt/puppet-archive/nomad-1 17056 1186076     75
 6204 php             /usr/bin/php /data/ecogate/ 24188 439100      75
 6205 php             /usr/bin/php /data/ecogate/ 24188 439100      75
 6222 nomad           /opt/puppet-archive/nomad-1 17124 1046808     74
 6238 php             /usr/bin/php /data/ecogate/ 24188 439100      74
 6269 nomad           /opt/puppet-archive/nomad-1 17156 1054492     74
 6286 php             /usr/bin/php /data/ecogate/ 24188 439100      74
 6601 kworker/1:0     [kworker/1:0]                   0      0      53
 7198 nomad           /opt/puppet-archive/nomad-1 14036 1110040      3
 7208 php             /usr/bin/php /data/ecogate/ 23380 439100       3
 7215 nomad           /opt/puppet-archive/nomad-1 14036 1037204      2
 7224 php             /usr/bin/php /data/ecogate/ 23660 439100       2
 7225 nomad           /opt/puppet-archive/nomad-1 14088 1101588      2
 7234 php             /usr/bin/php /data/ecogate/ 23660 439100       2
 7235 nomad           /opt/puppet-archive/nomad-1 13884 1176728      2
 7245 php             /usr/bin/php /data/ecogate/ 23664 439100       2
 7247 nomad           /opt/puppet-archive/nomad-1 14076 971924       2
 7253 nomad           /opt/puppet-archive/nomad-1 14272 1110936      2
 7261 php             /usr/bin/php /data/ecogate/ 23396 439100       2
 7265 php             /usr/bin/php /data/ecogate/ 23404 439100       2
 7266 nomad           /opt/puppet-archive/nomad-1 14076 980376       2
 7276 php             /usr/bin/php /data/ecogate/ 23404 439100       2
 7277 nomad           /opt/puppet-archive/nomad-1 13988 972180       2
 7284 nomad           /opt/puppet-archive/nomad-1 13756 1035796      2
 7286 php             /usr/bin/php /data/ecogate/ 23400 439100       2
 7295 php             /usr/bin/php /data/ecogate/ 23400 439100       2
 7298 bash            -bash                        1856 116620       2
 7299 tee             tee -ai /root/.bash_history   668 108056       2
 7300 bash            -bash                        2140 116620       2
 7306 nomad           /opt/puppet-archive/nomad-1 14172 1110040      2
 7307 nomad           /opt/puppet-archive/nomad-1 13784 1045656      2
 7326 php             /usr/bin/php /data/ecogate/ 23404 439100       2
 7327 php             /usr/bin/php /data/ecogate/ 23400 439100       2
 7334 ps              ps -eo pid,comm,args,rss,vs  1492 153328       0
 9218 nomad           /usr/local/bin/nomad agent  1993868 4783660 85673
 9277 nomad           /opt/puppet-archive/nomad-1 438872 2209776 85664
 9281 nomad           /opt/puppet-archive/nomad-1 67136 1196132  85664
 9291 nomad           /opt/puppet-archive/nomad-1 434460 2078448 85664
 9310 nomad           /opt/puppet-archive/nomad-1 63872 1196388  85664
 9316 nomad           /opt/puppet-archive/nomad-1 67192 1319008  85664
 9325 nomad           /opt/puppet-archive/nomad-1 65280 1187936  85664
 9336 nomad           /opt/puppet-archive/nomad-1 443512 2078704 85664
 9337 nomad           /opt/puppet-archive/nomad-1 60456 1187936  85664
 9341 nomad           /opt/puppet-archive/nomad-1 69016 1187936  85664
 9342 nomad           /opt/puppet-archive/nomad-1 68656 1253472  85664
 9356 nomad           /opt/puppet-archive/nomad-1 66148 1327460  85664
 9367 nomad           /opt/puppet-archive/nomad-1 68524 1261668  85664
 9376 nomad           /opt/puppet-archive/nomad-1 64624 1261924  85664
 9377 nomad           /opt/puppet-archive/nomad-1 73072 1319264  85664
 9387 nomad           /opt/puppet-archive/nomad-1 432132 2144168 85664
 9392 nomad           /opt/puppet-archive/nomad-1 63188 1327204  85664
 9438 nomad           /opt/puppet-archive/nomad-1 66020 1261668  85664
 9448 nomad           /opt/puppet-archive/nomad-1 64396 1122400  85664
 9449 nomad           /opt/puppet-archive/nomad-1 66764 1122656  85664
 9451 nomad           /opt/puppet-archive/nomad-1 69820 1253472  85664
 9452 nomad           /opt/puppet-archive/nomad-1 66664 1187680  85664
 9453 nomad           /opt/puppet-archive/nomad-1 60308 1187936  85664
 9498 nomad           /opt/puppet-archive/nomad-1 443140 2144496 85664
 9500 nomad           /opt/puppet-archive/nomad-1  9968 1111192  85664
 9506 nomad           /opt/puppet-archive/nomad-1 65612 1187936  85664
 9518 nomad           /opt/puppet-archive/nomad-1 443828 2144496 85664
 9531 nomad           /opt/puppet-archive/nomad-1 59832 1130596  85664
 9540 nomad           /opt/puppet-archive/nomad-1 62988 1253728  85664
 9562 nomad           /opt/puppet-archive/nomad-1 68736 1196132  85664
 9577 nomad           /opt/puppet-archive/nomad-1 69740 1122400  85664
 9578 nomad           /opt/puppet-archive/nomad-1 64000 1196132  85664
 9579 nomad           /opt/puppet-archive/nomad-1 14732 1194016  85664
 9580 nomad           /opt/puppet-archive/nomad-1 62664 1261924  85664
 9581 nomad           /opt/puppet-archive/nomad-1 63348 1196388  85664
 9594 nomad           /opt/puppet-archive/nomad-1 62224 1261924  85664
 9604 nomad           /opt/puppet-archive/nomad-1 68504 1261668  85664
 9607 nomad           /opt/puppet-archive/nomad-1 60264 1122656  85664
 9633 nomad           /opt/puppet-archive/nomad-1 64216 1327460  85664
 9642 nomad           /opt/puppet-archive/nomad-1 69144 1253472  85664
 9660 nomad           /opt/puppet-archive/nomad-1 65312 1327204  85664
 9661 nomad           /opt/puppet-archive/nomad-1 442044 2078448 85664
 9712 nomad           /opt/puppet-archive/nomad-1 68068 1253728  85664
 9725 nomad           /opt/puppet-archive/nomad-1 64960 1188192  85664
 9737 nomad           /opt/puppet-archive/nomad-1 65260 1187936  85664
 9746 nomad           /opt/puppet-archive/nomad-1 64292 1261668  85664
 9755 nomad           /opt/puppet-archive/nomad-1 60944 1261924  85664
 9770 nomad           /opt/puppet-archive/nomad-1 72076 1253536  85664
 9778 nomad           /opt/puppet-archive/nomad-1 60224 1122400  85664
 9789 nomad           /opt/puppet-archive/nomad-1 59944 1253472  85664
 9794 nomad           /opt/puppet-archive/nomad-1 66404 1253472  85664
 9802 nomad           /opt/puppet-archive/nomad-1 438492 2209960 85664
 9808 nomad           /opt/puppet-archive/nomad-1 19280 1120028  85664
 9815 nomad           /opt/puppet-archive/nomad-1 444064 2144240 85664
 9824 nomad           /opt/puppet-archive/nomad-1 66100 1122656  85664
 9834 nomad           /opt/puppet-archive/nomad-1 65828 1253472  85664
 9862 nomad           /opt/puppet-archive/nomad-1 68680 1261668  85664
 9872 nomad           /opt/puppet-archive/nomad-1 439576 2078704 85664
 9882 nomad           /opt/puppet-archive/nomad-1 59960 1056864  85664
 9888 nomad           /opt/puppet-archive/nomad-1 65112 1187936  85664
 9889 nomad           /opt/puppet-archive/nomad-1 74252 1188192  85664
 9900 nomad           /opt/puppet-archive/nomad-1 60696 1188192  85664
 9917 nomad           /opt/puppet-archive/nomad-1 63564 1261924  85664
 9957 nomad           /opt/puppet-archive/nomad-1 55468 1327204  85664
10122 nomad           /opt/puppet-archive/nomad-1 440628 2144240 85663
13006 kworker/u32:2   [kworker/u32:2]                 0      0    4912
15222 chronyd         /usr/sbin/chronyd             944 120588 17354725
15733 zabbix_agentd   /usr/sbin/zabbix_agentd -c    808  79076 17354709
15734 zabbix_agentd   /usr/sbin/zabbix_agentd: co  1416  79076 17354709
15735 zabbix_agentd   /usr/sbin/zabbix_agentd: li  2028 100632 17354709
15736 zabbix_agentd   /usr/sbin/zabbix_agentd: li  2032 100632 17354709
15737 zabbix_agentd   /usr/sbin/zabbix_agentd: li  2060 100632 17354709
15738 zabbix_agentd   /usr/sbin/zabbix_agentd: ac  1708 100640 17354709
16776 kworker/2:0     [kworker/2:0]                   0      0    1252
16778 kworker/3:3     [kworker/3:3]                   0      0    1252
21142 agent-linux     /data/agent-linux --collect  6708 122624   35032
21645 nomad           /opt/puppet-archive/nomad-1 18152 1259808  35030
21772 php             /usr/bin/php /data/ecogate/ 62932 709920   35030
22328 rsyslogd        /usr/sbin/rsyslogd -n       58468 612576  120713
22852 nomad           /opt/puppet-archive/nomad-1 17444 1120540    905
22862 php             /usr/bin/php /data/ecogate/ 24504 439100     905
22864 nomad           /opt/puppet-archive/nomad-1 17720 1186076    904
22874 php             /usr/bin/php /data/ecogate/ 24340 670496     904
23033 td-agent-bit    /opt/td-agent-bit/bin/td-ag 10124 145592   34935
23524 bounce          bounce -z -t unix -u         4384 107892     893
27441 kworker/0:2     [kworker/0:2]                   0      0    2411
27774 kworker/3:1     [kworker/3:1]                   0      0     650
29854 kworker/1:1     [kworker/1:1]                   0      0     532
29861 pickup          pickup -l -t unix -u         4380 107856     532
30127 kworker/u32:1   [kworker/u32:1]                 0      0     511
31040 cleanup         cleanup -z -t unix -u        4440 108004    2213
31042 local           local -t unix                4936  95100    2213
31133 kworker/3:2     [kworker/3:2]                   0      0     473
31261 consul          /usr/local/bin/consul agent 28576 785124  430248
31931 kworker/2:1     [kworker/2:1]                   0      0     413
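
A listing this long is easier to compare between snapshots once it is totalled per command name. The following is a minimal sketch, assuming the text comes from `ps -eo pid,comm,args,rss,vsz,etimes` as used in this thread; the function name `rss_by_command` and the `SAMPLE` constant are made up for illustration, and the sample rows are copied from the listing above.

```python
from collections import defaultdict

# Sample rows copied from the listing above. Columns are:
# PID, COMMAND (comm), COMMAND (args), RSS in KiB, VSZ in KiB, ELAPSED in seconds.
SAMPLE = """\
 5773 php             /usr/bin/php /data/ecogate/ 24188 439100      85
 5774 nomad           /opt/puppet-archive/nomad-1 17200 1120284     85
 6601 kworker/1:0     [kworker/1:0]                   0      0      53
 9218 nomad           /usr/local/bin/nomad agent  1993868 4783660 85673
 9277 nomad           /opt/puppet-archive/nomad-1 438872 2209776 85664
"""


def rss_by_command(ps_text):
    """Return {comm: (process_count, total_rss_kib)} for the given ps rows."""
    totals = defaultdict(lambda: [0, 0])
    for line in ps_text.splitlines():
        parts = line.split()
        # Expect pid, comm, at least one args token, rss, vsz, etimes.
        if len(parts) < 6 or not parts[0].isdigit():
            continue  # skips the header row and blank lines
        comm = parts[1]
        # args may contain spaces, so the numeric columns are read from the right.
        rss_kib = int(parts[-3])
        totals[comm][0] += 1
        totals[comm][1] += rss_kib
    return {comm: (count, rss) for comm, (count, rss) in totals.items()}


if __name__ == "__main__":
    summary = rss_by_command(SAMPLE)
    # Largest total RSS first.
    for comm, (count, rss) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
        print(f"{comm:<16} {count:>4} procs {rss / 1024:>10.1f} MiB RSS")
```

Two caveats: a `comm` value that itself contains a space would be split incorrectly, and summing RSS counts shared pages (for example the nomad binary's text segment) once per process, so the per-command total is an upper bound rather than the real footprint.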

@bubejur
bubejur commented Jul 7, 2021

`pmap -xp` output for the nomad agent process, 1h later:

pmap -xp 9218
9218:   /usr/local/bin/nomad agent -config=/etc/nomad
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000   61404   25300       0 r-x-- /opt/puppet-archive/nomad-1.1.2/nomad
00000000041f6000       4       4       4 r---- /opt/puppet-archive/nomad-1.1.2/nomad
00000000041f7000    1596     516     296 rw--- /opt/puppet-archive/nomad-1.1.2/nomad
0000000004386000     292     148     148 rw---   [ anon ]
0000000004fd7000     132       4       4 rw---   [ anon ]
000000c000000000 2031616 1989576 1989576 rw---   [ anon ]
00007f3f2c649000   34200   31936   31936 rw---   [ anon ]
00007f3f2e861000   18712   18320   18320 rw---   [ anon ]
00007f3f2faae000    5448    5432    5432 rw---   [ anon ]
00007f3f30000000     132       8       8 rw---   [ anon ]
00007f3f30021000   65404       0       0 -----   [ anon ]
00007f3f34000000     132       4       4 rw---   [ anon ]
00007f3f34021000   65404       0       0 -----   [ anon ]
00007f3f38000000     132       4       4 rw---   [ anon ]
00007f3f38021000   65404       0       0 -----   [ anon ]
00007f3f3c000000     132       4       4 rw---   [ anon ]
00007f3f3c021000   65404       0       0 -----   [ anon ]
00007f3f40000000     132       4       4 rw---   [ anon ]
00007f3f40021000   65404       0       0 -----   [ anon ]
00007f3f44000000     132       4       4 rw---   [ anon ]
00007f3f44021000   65404       0       0 -----   [ anon ]
00007f3f48000000     132       8       8 rw---   [ anon ]
00007f3f48021000   65404       0       0 -----   [ anon ]
00007f3f4c000000     132       8       8 rw---   [ anon ]
00007f3f4c021000   65404       0       0 -----   [ anon ]
00007f3f5000b000    9868    9812    9812 rw---   [ anon ]
00007f3f509b8000    6408    6384    6384 rw---   [ anon ]
00007f3f50ffa000       4       0       0 -----   [ anon ]
00007f3f50ffb000    8192       8       8 rw---   [ anon ]
00007f3f517fb000       4       0       0 -----   [ anon ]
00007f3f517fc000    8192       8       8 rw---   [ anon ]
00007f3f51ffc000       4       0       0 -----   [ anon ]
00007f3f51ffd000    8192       8       8 rw---   [ anon ]
00007f3f527fd000       4       0       0 -----   [ anon ]
00007f3f527fe000    8192       8       8 rw---   [ anon ]
00007f3f52ffe000       4       0       0 -----   [ anon ]
00007f3f52fff000    8192      12      12 rw---   [ anon ]
00007f3f537ff000       4       0       0 -----   [ anon ]
00007f3f53800000    8192       8       8 rw---   [ anon ]
00007f3f54000000     132       4       4 rw---   [ anon ]
00007f3f54021000   65404       0       0 -----   [ anon ]
00007f3f58000000     132       8       8 rw---   [ anon ]
00007f3f58021000   65404       0       0 -----   [ anon ]
00007f3f5c000000     132       8       8 rw---   [ anon ]
00007f3f5c021000   65404       0       0 -----   [ anon ]
00007f3f60000000     132       8       8 rw---   [ anon ]
00007f3f60021000   65404       0       0 -----   [ anon ]
00007f3f64000000     132       8       8 rw---   [ anon ]
00007f3f64021000   65404       0       0 -----   [ anon ]
00007f3f68000000     132       8       8 rw---   [ anon ]
00007f3f68021000   65404       0       0 -----   [ anon ]
00007f3f6c005000    3012    2996    2996 rw---   [ anon ]
00007f3f6c2f7000    5128    5120    5120 rw---   [ anon ]
00007f3f6c7f9000       4       0       0 -----   [ anon ]
00007f3f6c7fa000    8192       8       8 rw---   [ anon ]
00007f3f6cffa000       4       0       0 -----   [ anon ]
00007f3f6cffb000    8192       8       8 rw---   [ anon ]
00007f3f6d7fb000       4       0       0 -----   [ anon ]
00007f3f6d7fc000    8192       8       8 rw---   [ anon ]
00007f3f6dffc000       4       0       0 -----   [ anon ]
00007f3f6dffd000    8192       8       8 rw---   [ anon ]
00007f3f6e7fd000       4       0       0 -----   [ anon ]
00007f3f6e7fe000    8192       8       8 rw---   [ anon ]
00007f3f6effe000       4       0       0 -----   [ anon ]
00007f3f6efff000    8192       8       8 rw---   [ anon ]
00007f3f6f7ff000       4       0       0 -----   [ anon ]
00007f3f6f800000    8192       8       8 rw---   [ anon ]
00007f3f70000000     132       4       4 rw---   [ anon ]
00007f3f70021000   65404       0       0 -----   [ anon ]
00007f3f74000000     132       8       8 rw---   [ anon ]
00007f3f74021000   65404       0       0 -----   [ anon ]
00007f3f78000000     132       8       8 rw---   [ anon ]
00007f3f78021000   65404       0       0 -----   [ anon ]
00007f3f7c000000     132       8       8 rw---   [ anon ]
00007f3f7c021000   65404       0       0 -----   [ anon ]
00007f3f80000000     132       8       8 rw---   [ anon ]
00007f3f80021000   65404       0       0 -----   [ anon ]
00007f3f84000000     132       8       8 rw---   [ anon ]
00007f3f84021000   65404       0       0 -----   [ anon ]
00007f3f88000000     132       8       8 rw---   [ anon ]
00007f3f88021000   65404       0       0 -----   [ anon ]
00007f3f8c000000     132      16      16 rw---   [ anon ]
00007f3f8c021000   65404       0       0 -----   [ anon ]
00007f3f9000d000    1280    1260    1260 rw---   [ anon ]
00007f3f90153000    3908    3876    3876 rw---   [ anon ]
00007f3f90528000    2884    2868    2868 rw---   [ anon ]
00007f3f907f9000       4       0       0 -----   [ anon ]
00007f3f907fa000    8192       8       8 rw---   [ anon ]
00007f3f90ffa000       4       0       0 -----   [ anon ]
00007f3f90ffb000    8192       8       8 rw---   [ anon ]
00007f3f917fb000       4       0       0 -----   [ anon ]
00007f3f917fc000    8192       8       8 rw---   [ anon ]
00007f3f91ffc000       4       0       0 -----   [ anon ]
00007f3f91ffd000    8192       8       8 rw---   [ anon ]
00007f3f927fd000       4       0       0 -----   [ anon ]
00007f3f927fe000    8192       8       8 rw---   [ anon ]
00007f3f92ffe000       4       0       0 -----   [ anon ]
00007f3f92fff000    8192       8       8 rw---   [ anon ]
00007f3f937ff000       4       0       0 -----   [ anon ]
00007f3f93800000    8192      16      16 rw---   [ anon ]
00007f3f94000000     132       8       8 rw---   [ anon ]
00007f3f94021000   65404       0       0 -----   [ anon ]
00007f3f98000000     132       8       8 rw---   [ anon ]
00007f3f98021000   65404       0       0 -----   [ anon ]
00007f3f9c000000     132      12      12 rw---   [ anon ]
00007f3f9c021000   65404       0       0 -----   [ anon ]
00007f3fa0000000     132       8       8 rw---   [ anon ]
00007f3fa0021000   65404       0       0 -----   [ anon ]
00007f3fa4000000     132       8       8 rw---   [ anon ]
00007f3fa4021000   65404       0       0 -----   [ anon ]
00007f3fa800e000     256     256     256 rw---   [ anon ]
00007f3fa8056000    4164    4160    4160 rw---   [ anon ]
00007f3fa8467000       4       0       0 -----   [ anon ]
00007f3fa8468000    8192       8       8 rw---   [ anon ]
00007f3fa8c68000      48      24       0 r-x-- /usr/lib64/libnss_files-2.17.so
00007f3fa8c74000    2044       0       0 ----- /usr/lib64/libnss_files-2.17.so
00007f3fa8e73000       4       4       4 r---- /usr/lib64/libnss_files-2.17.so
00007f3fa8e74000       4       4       4 rw--- /usr/lib64/libnss_files-2.17.so
00007f3fa8e75000     280     244     244 rw---   [ anon ]
00007f3fa8ebb000    8192    2576       0 r--s- /data/nomad/client/state.db
00007f3fa96bb000       4       0       0 -----   [ anon ]
00007f3fa96bc000    8448     264     264 rw---   [ anon ]
00007f3fa9efc000       4       0       0 -----   [ anon ]
00007f3fa9efd000    9216    1032    1032 rw---   [ anon ]
00007f3faa7fd000       4       0       0 -----   [ anon ]
00007f3faa7fe000    8192       8       8 rw---   [ anon ]
00007f3faaffe000       4       0       0 -----   [ anon ]
00007f3faafff000    8192       8       8 rw---   [ anon ]
00007f3fab7ff000       4       0       0 -----   [ anon ]
00007f3fab800000    8192       8       8 rw---   [ anon ]
00007f3fac000000     132       4       4 rw---   [ anon ]
00007f3fac021000   65404       0       0 -----   [ anon ]
00007f3fb0002000    2304    2304    2304 rw---   [ anon ]
00007f3fb0242000       4       0       0 -----   [ anon ]
00007f3fb0243000    8192       8       8 rw---   [ anon ]
00007f3fb0a43000       4       0       0 -----   [ anon ]
00007f3fb0a44000    8192       8       8 rw---   [ anon ]
00007f3fb1244000       4       0       0 -----   [ anon ]
00007f3fb1245000   44100    2176    2176 rw---   [ anon ]
00007f3fb3d56000  263680       0       0 -----   [ anon ]
00007f3fc3ed6000       4       4       4 rw---   [ anon ]
00007f3fc3ed7000  293564       0       0 -----   [ anon ]
00007f3fd5d86000       4       4       4 rw---   [ anon ]
00007f3fd5d87000   36692       0       0 -----   [ anon ]
00007f3fd815c000       4       4       4 rw---   [ anon ]
00007f3fd815d000    4068       0       0 -----   [ anon ]
00007f3fd8556000    1808     416       0 r-x-- /usr/lib64/libc-2.17.so
00007f3fd871a000    2044       0       0 ----- /usr/lib64/libc-2.17.so
00007f3fd8919000      16      16      16 r---- /usr/lib64/libc-2.17.so
00007f3fd891d000       8       8       8 rw--- /usr/lib64/libc-2.17.so
00007f3fd891f000      20      20      20 rw---   [ anon ]
00007f3fd8924000       8       8       0 r-x-- /usr/lib64/libdl-2.17.so
00007f3fd8926000    2048       0       0 ----- /usr/lib64/libdl-2.17.so
00007f3fd8b26000       4       4       4 r---- /usr/lib64/libdl-2.17.so
00007f3fd8b27000       4       4       4 rw--- /usr/lib64/libdl-2.17.so
00007f3fd8b28000      92      60       0 r-x-- /usr/lib64/libpthread-2.17.so
00007f3fd8b3f000    2044       0       0 ----- /usr/lib64/libpthread-2.17.so
00007f3fd8d3e000       4       4       4 r---- /usr/lib64/libpthread-2.17.so
00007f3fd8d3f000       4       4       4 rw--- /usr/lib64/libpthread-2.17.so
00007f3fd8d40000      16       4       4 rw---   [ anon ]
00007f3fd8d44000     136     116       0 r-x-- /usr/lib64/ld-2.17.so
00007f3fd8d68000     576     572     572 rw---   [ anon ]
00007f3fd8df8000     512       0       0 -----   [ anon ]
00007f3fd8e78000       4       4       4 rw---   [ anon ]
00007f3fd8e79000     508       0       0 -----   [ anon ]
00007f3fd8ef8000     400      68      68 rw---   [ anon ]
00007f3fd8f64000       4       4       4 rw---   [ anon ]
00007f3fd8f65000       4       4       4 r---- /usr/lib64/ld-2.17.so
00007f3fd8f66000       4       4       4 rw--- /usr/lib64/ld-2.17.so
00007f3fd8f67000       4       4       4 rw---   [ anon ]
00007ffdfa5f3000     132      16      16 rw---   [ stack ]
00007ffdfa789000       8       4       0 r-x--   [ anon ]
ffffffffff600000       4       0       0 r-x--   [ anon ]
---------------- ------- ------- ------- 
total kB         4921500 2118376 2089652
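
To see where the resident memory sits without reading every row, the `pmap -x` table can be grouped by mapping. This is a rough sketch, assuming the six-column layout shown above (Address, Kbytes, RSS, Dirty, Mode, Mapping); `parse_pmap`, `rss_by_mapping` and `PMAP_SAMPLE` are illustrative names, and the sample rows are taken from the output above.

```python
import re
from collections import defaultdict

# One mapping row of `pmap -x`: Address Kbytes RSS Dirty Mode Mapping
ROW = re.compile(r"^([0-9a-f]{16})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.+)$")

# A few rows copied from the output above, including non-data lines.
PMAP_SAMPLE = """\
9218:   /usr/local/bin/nomad agent -config=/etc/nomad
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000   61404   25300       0 r-x-- /opt/puppet-archive/nomad-1.1.2/nomad
000000c000000000 2031616 1989576 1989576 rw---   [ anon ]
00007f3f2c649000   34200   31936   31936 rw---   [ anon ]
00007f3f30021000   65404       0       0 -----   [ anon ]
00007f3fa8ebb000    8192    2576       0 r--s- /data/nomad/client/state.db
---------------- ------- ------- ------- 
total kB         4921500 2118376 2089652
"""


def parse_pmap(text):
    """Return (address, kbytes, rss, dirty, mode, mapping) tuples, sizes in KiB."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # command line, header, separator and total rows
        addr, kbytes, rss, dirty, mode, mapping = m.groups()
        rows.append((addr, int(kbytes), int(rss), int(dirty), mode, mapping.strip()))
    return rows


def rss_by_mapping(rows):
    """Sum resident KiB per mapping name."""
    totals = defaultdict(int)
    for _addr, _kbytes, rss, _dirty, _mode, mapping in rows:
        totals[mapping] += rss
    return dict(totals)


if __name__ == "__main__":
    rows = parse_pmap(PMAP_SAMPLE)
    largest = max(rows, key=lambda r: r[2])
    print("largest mapping:", largest[0], largest[2], "KiB resident")
    for name, rss in sorted(rss_by_mapping(rows).items(), key=lambda kv: -kv[1]):
        print(f"{rss:>10} KiB  {name}")
```

In the full output above, the single anonymous mapping at `000000c000000000` holds 1989576 kB of the 2118376 kB total RSS, roughly 94%. That address is where the Go runtime usually places its heap on 64-bit Linux, so this is most likely Go heap memory in the agent; that reading is an inference, not something confirmed in this thread.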

ps output taken Wed Jul 7 14:20:47 UTC 2021:

date
Wed Jul  7 14:20:47 UTC 2021
[root@cron01 ~]# ps -eo pid,comm,args,rss,vsz,etimes
  PID COMMAND         COMMAND                       RSS    VSZ ELAPSED
    1 systemd         /usr/lib/systemd/systemd --  6184 129260 17368570
    2 kthreadd        [kthreadd]                      0      0 17368570
    4 kworker/0:0H    [kworker/0:0H]                  0      0 17368570
    6 ksoftirqd/0     [ksoftirqd/0]                   0      0 17368570
    7 migration/0     [migration/0]                   0      0 17368570
    8 rcu_bh          [rcu_bh]                        0      0 17368570
    9 rcu_sched       [rcu_sched]                     0      0 17368570
   10 lru-add-drain   [lru-add-drain]                 0      0 17368570
   11 watchdog/0      [watchdog/0]                    0      0 17368570
   12 watchdog/1      [watchdog/1]                    0      0 17368570
   13 migration/1     [migration/1]                   0      0 17368570
   14 ksoftirqd/1     [ksoftirqd/1]                   0      0 17368570
   16 kworker/1:0H    [kworker/1:0H]                  0      0 17368570
   17 watchdog/2      [watchdog/2]                    0      0 17368570
   18 migration/2     [migration/2]                   0      0 17368570
   19 ksoftirqd/2     [ksoftirqd/2]                   0      0 17368570
   21 kworker/2:0H    [kworker/2:0H]                  0      0 17368570
   22 watchdog/3      [watchdog/3]                    0      0 17368570
   23 migration/3     [migration/3]                   0      0 17368570
   24 ksoftirqd/3     [ksoftirqd/3]                   0      0 17368570
   26 kworker/3:0H    [kworker/3:0H]                  0      0 17368570
   28 kdevtmpfs       [kdevtmpfs]                     0      0 17368570
   29 netns           [netns]                         0      0 17368570
   30 khungtaskd      [khungtaskd]                    0      0 17368570
   31 writeback       [writeback]                     0      0 17368570
   32 kintegrityd     [kintegrityd]                   0      0 17368570
   33 bioset          [bioset]                        0      0 17368570
   34 bioset          [bioset]                        0      0 17368570
   35 bioset          [bioset]                        0      0 17368570
   36 kblockd         [kblockd]                       0      0 17368570
   37 md              [md]                            0      0 17368570
   38 edac-poller     [edac-poller]                   0      0 17368570
   39 watchdogd       [watchdogd]                     0      0 17368570
   45 kswapd0         [kswapd0]                       0      0 17368570
   46 ksmd            [ksmd]                          0      0 17368570
   47 khugepaged      [khugepaged]                    0      0 17368570
   48 crypto          [crypto]                        0      0 17368570
   56 kthrotld        [kthrotld]                      0      0 17368570
   58 kmpath_rdacd    [kmpath_rdacd]                  0      0 17368570
   59 kaluad          [kaluad]                        0      0 17368570
   61 kpsmoused       [kpsmoused]                     0      0 17368570
   62 ipv6_addrconf   [ipv6_addrconf]                 0      0 17368570
   75 deferwq         [deferwq]                       0      0 17368570
  112 kauditd         [kauditd]                       0      0 17368570
  281 ata_sff         [ata_sff]                       0      0 17368570
  292 scsi_eh_0       [scsi_eh_0]                     0      0 17368570
  293 scsi_tmf_0      [scsi_tmf_0]                    0      0 17368570
  294 scsi_eh_1       [scsi_eh_1]                     0      0 17368570
  295 scsi_tmf_1      [scsi_tmf_1]                    0      0 17368570
  296 virtscsi-scan   [virtscsi-scan]                 0      0 17368570
  299 scsi_eh_2       [scsi_eh_2]                     0      0 17368570
  300 scsi_tmf_2      [scsi_tmf_2]                    0      0 17368570
  303 ttm_swap        [ttm_swap]                      0      0 17368570
  367 kworker/2:1H    [kworker/2:1H]                  0      0 17368569
  427 kdmflush        [kdmflush]                      0      0 17368569
  428 bioset          [bioset]                        0      0 17368569
  446 jbd2/dm-0-8     [jbd2/dm-0-8]                   0      0 17368569
  447 ext4-rsv-conver [ext4-rsv-conver]               0      0 17368569
  547 systemd-journal /usr/lib/systemd/systemd-jo 112088 186772 17368568
  566 rpciod          [rpciod]                        0      0 17368568
  568 xprtiod         [xprtiod]                       0      0 17368568
  569 lvmetad         /usr/sbin/lvmetad -f         3740 274980 17368568
  582 systemd-udevd   /usr/lib/systemd/systemd-ud  2236  48640 17368568
  606 hwrng           [hwrng]                         0      0 17368568
  659 nfit            [nfit]                          0      0 17368568
  679 jbd2/sda1-8     [jbd2/sda1-8]                   0      0 17368568
  680 ext4-rsv-conver [ext4-rsv-conver]               0      0 17368568
  682 kdmflush        [kdmflush]                      0      0 17368567
  683 bioset          [bioset]                        0      0 17368567
  688 jbd2/dm-1-8     [jbd2/dm-1-8]                   0      0 17368567
  689 ext4-rsv-conver [ext4-rsv-conver]               0      0 17368567
  692 kdmflush        [kdmflush]                      0      0 17368566
  693 bioset          [bioset]                        0      0 17368566
  695 kdmflush        [kdmflush]                      0      0 17368566
  696 bioset          [bioset]                        0      0 17368566
  710 kworker/0:1H    [kworker/0:1H]                  0      0 17368566
  716 jbd2/dm-2-8     [jbd2/dm-2-8]                   0      0 17368566
  717 ext4-rsv-conver [ext4-rsv-conver]               0      0 17368566
  719 jbd2/dm-3-8     [jbd2/dm-3-8]                   0      0 17368566
  720 ext4-rsv-conver [ext4-rsv-conver]               0      0 17368566
  748 auditd          /sbin/auditd                  752  55532 17368566
  770 dbus-daemon     /usr/bin/dbus-daemon --syst  1496  88108 17368566
  772 NetworkManager  /usr/sbin/NetworkManager --  2716 476436 17368566
  773 sssd            /usr/sbin/sssd -i --logger=  1260 268652 17368566
  777 polkitd         /usr/lib/polkit-1/polkitd -  9276 625200 17368566
  778 irqbalance      /usr/sbin/irqbalance --fore   576  21596 17368566
  780 qemu-ga         /usr/bin/qemu-ga --method=v  1400  44220 17368566
  796 gssproxy        /usr/sbin/gssproxy -D         584 269132 17368566
  802 python          /usr/bin/python /usr/share/ 46624 532144 17368566
  817 sssd_be         /usr/libexec/sssd/sssd_be -  7944 419828 17368566
  823 rpc.gssd        /usr/sbin/rpc.gssd            388  40340 17368566
  839 sssd_nss        /usr/libexec/sssd/sssd_nss   2864 277484 17368566
  840 sssd_sudo       /usr/libexec/sssd/sssd_sudo  2320 249444 17368566
  841 sssd_pam        /usr/libexec/sssd/sssd_pam   2460 255632 17368566
  842 sssd_ssh        /usr/libexec/sssd/sssd_ssh   2212 257128 17368566
  843 sssd_pac        /usr/libexec/sssd/sssd_pac   1568 293736 17368566
  873 systemd-logind  /usr/lib/systemd/systemd-lo  1244  37088 17368566
  886 crond           /usr/sbin/crond -n            948 126392 17368566
  889 atd             /usr/sbin/atd -f              208  25908 17368566
  896 agetty          /sbin/agetty --noclear tty1   132 110208 17368566
 1063 sshd            /usr/sbin/sshd -D            1272 112936 17368565
 1064 tuned           /usr/bin/python2 -Es /usr/s 13460 586428 17368565
 1066 node_exporter   /usr/local/bin/node_exporte  2780 121856 17368565
 1067 python          /usr/bin/python /usr/bin/go 133652 1103260 17368565
 1069 oddjobd         /usr/sbin/oddjobd -n -p /va   452  54832 17368565
 1195 ossec-execd     /var/ossec/bin/ossec-execd    952  61332 17368565
 1218 ossec-agentd    /var/ossec/bin/ossec-agentd  1252  48764 17368565
 1228 ossec-agentd    /var/ossec/bin/ossec-agentd   760  48668 17368565
 1251 ossec-logcollec /var/ossec/bin/ossec-logcol   932  44176 17368565
 1285 ossec-syscheckd /var/ossec/bin/ossec-sysche 14660  57368 17368565
 1736 kworker/1:1H    [kworker/1:1H]                  0      0 17368564
 1871 master          /usr/libexec/postfix/master  1300  96968 17368563
 1875 qmgr            qmgr -l -t unix -u           1628 108028 17368563
 1892 kworker/2:2     [kworker/2:2]                   0      0     532
 1987 kworker/3:1     [kworker/3:1]                   0      0     521
 2022 splunkd         splunkd -p 8089 start       148752 336140 17368562
 2026 splunkd         [splunkd pid=2022] splunkd  11292  84300 17368562
 2169 kworker/3:1H    [kworker/3:1H]                  0      0 17368559
 3008 kworker/0:1     [kworker/0:1]                   0      0     472
 3009 kworker/0:3     [kworker/0:3]                   0      0     472
 4694 sshd            sshd: ybubentsov [priv]      5784 189232    5549
 4703 sshd            sshd: ybubentsov@pts/0       2512 189232    5548
 4708 bash            -bash                        3220 127336    5548
 4745 bash            -bash                        1892 127336    5548
 4746 tee             tee -ai /home/ybubentsov/.b   668 108056    5548
 4747 bash            -bash                        1832 127336    5548
 4748 sudo            sudo su -                    4916 280180    5548
 4750 su              su -                         2780 237644    5548
 4751 bash            -bash                        3172 116620    5548
 5989 unbound         /usr/sbin/unbound -d        13344 285396 2864204
 6172 bounce          bounce -z -t unix -u         4392 107892     292
 6305 nomad           /opt/puppet-archive/nomad-1 17648 1186076    282
 6315 php             /usr/bin/php /data/ecogate/ 23524 439100     282
 6317 nomad           /opt/puppet-archive/nomad-1 17428 1120540    280
 6327 php             /usr/bin/php /data/ecogate/ 24336 670496     280
 7527 kworker/1:0     [kworker/1:0]                   0      0     232
 7675 kworker/3:2     [kworker/3:2]                   0      0     220
 9218 nomad           /usr/local/bin/nomad agent  2117280 4921496 91072
 9277 nomad           /opt/puppet-archive/nomad-1 474208 2277428 91063
 9281 nomad           /opt/puppet-archive/nomad-1 72676 1196132  91063
 9291 nomad           /opt/puppet-archive/nomad-1 470516 2146612 91063
 9310 nomad           /opt/puppet-archive/nomad-1 68888 1196388  91063
 9316 nomad           /opt/puppet-archive/nomad-1 72992 1319008  91063
 9325 nomad           /opt/puppet-archive/nomad-1 70560 1187936  91063
 9336 nomad           /opt/puppet-archive/nomad-1 480916 2146356 91063
 9337 nomad           /opt/puppet-archive/nomad-1 64924 1187936  91063
 9341 nomad           /opt/puppet-archive/nomad-1 74812 1187936  91063
 9342 nomad           /opt/puppet-archive/nomad-1 73924 1253472  91063
 9356 nomad           /opt/puppet-archive/nomad-1 70872 1327460  91063
 9367 nomad           /opt/puppet-archive/nomad-1 74056 1261668  91063
 9376 nomad           /opt/puppet-archive/nomad-1 69108 1261924  91063
 9377 nomad           /opt/puppet-archive/nomad-1 77812 1319264  91063
 9387 nomad           /opt/puppet-archive/nomad-1 469396 2211892 91063
 9392 nomad           /opt/puppet-archive/nomad-1 67672 1327460  91063
 9438 nomad           /opt/puppet-archive/nomad-1 71024 1261668  91063
 9448 nomad           /opt/puppet-archive/nomad-1 70192 1122400  91063
 9449 nomad           /opt/puppet-archive/nomad-1 72824 1122656  91063
 9451 nomad           /opt/puppet-archive/nomad-1 74300 1253472  91063
 9452 nomad           /opt/puppet-archive/nomad-1 71416 1187680  91063
 9453 nomad           /opt/puppet-archive/nomad-1 64512 1187936  91063
 9498 nomad           /opt/puppet-archive/nomad-1 478992 2212148 91063
 9500 nomad           /opt/puppet-archive/nomad-1 10204 1111192  91063
 9506 nomad           /opt/puppet-archive/nomad-1 71144 1187936  91063
 9518 nomad           /opt/puppet-archive/nomad-1 481268 2212148 91063
 9531 nomad           /opt/puppet-archive/nomad-1 64580 1130596  91063
 9540 nomad           /opt/puppet-archive/nomad-1 67208 1253728  91063
 9562 nomad           /opt/puppet-archive/nomad-1 72956 1196132  91063
 9577 nomad           /opt/puppet-archive/nomad-1 74752 1122400  91063
 9578 nomad           /opt/puppet-archive/nomad-1 69012 1196132  91063
 9579 nomad           /opt/puppet-archive/nomad-1 14992 1194016  91063
 9580 nomad           /opt/puppet-archive/nomad-1 68200 1261924  91063
 9581 nomad           /opt/puppet-archive/nomad-1 68616 1196388  91063
 9594 nomad           /opt/puppet-archive/nomad-1 66692 1261924  91063
 9604 nomad           /opt/puppet-archive/nomad-1 72728 1261668  91063
 9607 nomad           /opt/puppet-archive/nomad-1 64748 1122656  91063
 9633 nomad           /opt/puppet-archive/nomad-1 70280 1327460  91063
 9642 nomad           /opt/puppet-archive/nomad-1 73892 1253472  91063
 9660 nomad           /opt/puppet-archive/nomad-1 71376 1327204  91063
 9661 nomad           /opt/puppet-archive/nomad-1 478432 2146612 91063
 9712 nomad           /opt/puppet-archive/nomad-1 73072 1253984  91063
 9725 nomad           /opt/puppet-archive/nomad-1 69176 1188192  91063
 9737 nomad           /opt/puppet-archive/nomad-1 71056 1187936  91063
 9746 nomad           /opt/puppet-archive/nomad-1 68500 1261668  91063
 9755 nomad           /opt/puppet-archive/nomad-1 65424 1261924  91063
 9770 nomad           /opt/puppet-archive/nomad-1 78396 1253536  91063
 9778 nomad           /opt/puppet-archive/nomad-1 65760 1122400  91063
 9789 nomad           /opt/puppet-archive/nomad-1 64424 1253472  91063
 9794 nomad           /opt/puppet-archive/nomad-1 71416 1253472  91063
 9802 nomad           /opt/puppet-archive/nomad-1 476060 2277684 91063
 9808 nomad           /opt/puppet-archive/nomad-1 20068 1120028  91063
 9815 nomad           /opt/puppet-archive/nomad-1 481516 2211892 91063
 9824 nomad           /opt/puppet-archive/nomad-1 70840 1122656  91063
 9834 nomad           /opt/puppet-archive/nomad-1 70836 1253728  91063
 9862 nomad           /opt/puppet-archive/nomad-1 73676 1261668  91063
 9872 nomad           /opt/puppet-archive/nomad-1 474900 2146356 91063
 9882 nomad           /opt/puppet-archive/nomad-1 62584 1130596  91063
 9888 nomad           /opt/puppet-archive/nomad-1 70648 1188192  91063
 9889 nomad           /opt/puppet-archive/nomad-1 78212 1188192  91063
 9900 nomad           /opt/puppet-archive/nomad-1 65168 1188192  91063
 9917 nomad           /opt/puppet-archive/nomad-1 68840 1261924  91063
 9957 nomad           /opt/puppet-archive/nomad-1 58880 1327204  91063
10122 nomad           /opt/puppet-archive/nomad-1 478596 2212148 91062
10263 kworker/0:0     [kworker/0:0]                   0      0     111
10266 kworker/2:1     [kworker/2:1]                   0      0     111
10690 nomad           /opt/puppet-archive/nomad-1 17548 1186076     74
10700 php             /usr/bin/php /data/ecogate/ 24188 439100      74
10701 nomad           /opt/puppet-archive/nomad-1 16596 1119708     74
10709 php             /usr/bin/php /data/ecogate/ 24192 439100      74
10712 nomad           /opt/puppet-archive/nomad-1 17204 1054748     73
10721 php             /usr/bin/php /data/ecogate/ 24176 439100      73
10730 nomad           /opt/puppet-archive/nomad-1 17036 1120220     69
10740 php             /usr/bin/php /data/ecogate/ 23924 439100      69
10741 nomad           /opt/puppet-archive/nomad-1 16904 1259232     69
10754 php             /usr/bin/php /data/ecogate/ 23924 439100      69
10778 nomad           /opt/puppet-archive/nomad-1 16592 1120220     69
10786 nomad           /opt/puppet-archive/nomad-1 16296 1112024     69
10788 php             /usr/bin/php /data/ecogate/ 23924 439100      69
10798 php             /usr/bin/php /data/ecogate/ 23928 439100      69
10811 nomad           /opt/puppet-archive/nomad-1 16800 1120220     68
10820 php             /usr/bin/php /data/ecogate/ 23924 439100      68
10822 nomad           /opt/puppet-archive/nomad-1 16952 1120220     68
10832 php             /usr/bin/php /data/ecogate/ 23924 439100      68
10833 nomad           /opt/puppet-archive/nomad-1 16412 1120220     68
10839 nomad           /opt/puppet-archive/nomad-1 16352 1054684     68
10863 php             /usr/bin/php /data/ecogate/ 23924 439100      68
10864 php             /usr/bin/php /data/ecogate/ 23924 439100      68
10895 nomad           /opt/puppet-archive/nomad-1 16328 1119964     68
10933 php             /usr/bin/php /data/ecogate/ 23916 439100      68
10949 nomad           /opt/puppet-archive/nomad-1 16760 1119964     68
10959 php             /usr/bin/php /data/ecogate/ 23928 439100      68
10960 nomad           /opt/puppet-archive/nomad-1 16976 1119964     68
10970 php             /usr/bin/php /data/ecogate/ 23928 439100      68
10973 nomad           /opt/puppet-archive/nomad-1 16700 1054684     66
10983 php             /usr/bin/php /data/ecogate/ 23924 439100      65
10984 nomad           /opt/puppet-archive/nomad-1 16972 1054684     65
10994 php             /usr/bin/php /data/ecogate/ 23924 439100      65
10995 nomad           /opt/puppet-archive/nomad-1 16256 1119964     65
11004 php             /usr/bin/php /data/ecogate/ 23912 439100      65
11005 nomad           /opt/puppet-archive/nomad-1 16872 1193696     65
11016 php             /usr/bin/php /data/ecogate/ 23912 439100      65
11017 nomad           /opt/puppet-archive/nomad-1 16584 1185500     64
11027 php             /usr/bin/php /data/ecogate/ 23924 439100      64
11028 nomad           /opt/puppet-archive/nomad-1 16436 1120220     64
11037 php             /usr/bin/php /data/ecogate/ 23924 439100      64
11038 nomad           /opt/puppet-archive/nomad-1 16956 1120540     64
11048 php             /usr/bin/php /data/ecogate/ 23916 439100      64
11049 nomad           /opt/puppet-archive/nomad-1 16348 1046232     64
11058 php             /usr/bin/php /data/ecogate/ 23928 439100      64
11060 nomad           /opt/puppet-archive/nomad-1 16824 1119964     64
11070 php             /usr/bin/php /data/ecogate/ 23924 439100      64
11071 nomad           /opt/puppet-archive/nomad-1 17032 1251292     63
11077 nomad           /opt/puppet-archive/nomad-1 16668 1111768     63
11083 nomad           /opt/puppet-archive/nomad-1 15804 1185500     63
11093 php             /usr/bin/php /data/ecogate/ 23912 439100      63
11097 php             /usr/bin/php /data/ecogate/ 23924 439100      63
11100 php             /usr/bin/php /data/ecogate/ 23928 439100      63
11101 nomad           /opt/puppet-archive/nomad-1 16628 1054428     63
11111 php             /usr/bin/php /data/ecogate/ 23928 439100      63
11112 nomad           /opt/puppet-archive/nomad-1 16436 1054684     62
11121 php             /usr/bin/php /data/ecogate/ 23928 439100      62
11125 nomad           /opt/puppet-archive/nomad-1 16360 1128416     62
11135 php             /usr/bin/php /data/ecogate/ 23924 439100      62
11136 nomad           /opt/puppet-archive/nomad-1 15800 1185756     62
11146 php             /usr/bin/php /data/ecogate/ 23928 439100      62
11147 nomad           /opt/puppet-archive/nomad-1 16356 1193440     62
11156 php             /usr/bin/php /data/ecogate/ 23928 439100      62
11157 nomad           /opt/puppet-archive/nomad-1 16696 1119964     62
11165 nomad           /opt/puppet-archive/nomad-1 16596 1120220     62
11167 php             /usr/bin/php /data/ecogate/ 23916 439100      62
11175 php             /usr/bin/php /data/ecogate/ 23928 439100      62
11176 nomad           /opt/puppet-archive/nomad-1 16128 1054684     61
11186 php             /usr/bin/php /data/ecogate/ 23924 439100      61
11187 nomad           /opt/puppet-archive/nomad-1 16200 1119964     61
11197 php             /usr/bin/php /data/ecogate/ 23924 439100      61
11204 nomad           /opt/puppet-archive/nomad-1 16432 1185500     60
11214 php             /usr/bin/php /data/ecogate/ 23924 439100      60
11215 nomad           /opt/puppet-archive/nomad-1 16160 1185500     60
11224 php             /usr/bin/php /data/ecogate/ 23916 439100      60
11229 nomad           /opt/puppet-archive/nomad-1 16676 1185756     60
11238 php             /usr/bin/php /data/ecogate/ 23924 439100      60
11239 nomad           /opt/puppet-archive/nomad-1 16228 1119964     59
11249 php             /usr/bin/php /data/ecogate/ 23924 439100      59
11255 nomad           /opt/puppet-archive/nomad-1 16412 1185756     59
11264 nomad           /opt/puppet-archive/nomad-1 15904 1259488     59
11271 php             /usr/bin/php /data/ecogate/ 23924 439100      59
11273 php             /usr/bin/php /data/ecogate/ 23916 439100      59
11275 nomad           /opt/puppet-archive/nomad-1 15804 1185756     58
11284 php             /usr/bin/php /data/ecogate/ 23924 439100      58
11285 nomad           /opt/puppet-archive/nomad-1 16156 1119964     58
11295 php             /usr/bin/php /data/ecogate/ 23924 439100      58
11300 nomad           /opt/puppet-archive/nomad-1 15808 1185500     58
11309 php             /usr/bin/php /data/ecogate/ 23924 439100      58
11316 nomad           /opt/puppet-archive/nomad-1 16052 1120220     57
11325 php             /usr/bin/php /data/ecogate/ 23928 439100      57
11327 nomad           /opt/puppet-archive/nomad-1 16276 1120220     57
11335 php             /usr/bin/php /data/ecogate/ 23912 439100      57
11336 nomad           /opt/puppet-archive/nomad-1 16296 1185500     57
11345 php             /usr/bin/php /data/ecogate/ 23924 439100      57
11370 nomad           /opt/puppet-archive/nomad-1 16304 1185244     56
11380 php             /usr/bin/php /data/ecogate/ 23928 439100      56
11403 kworker/u32:1   [kworker/u32:1]                 0      0    1787
11479 nomad           /opt/puppet-archive/nomad-1 15916 1120220     55
11486 php             /usr/bin/php /data/ecogate/ 23924 439100      55
11489 nomad           /opt/puppet-archive/nomad-1 16108 1119964     55
11499 php             /usr/bin/php /data/ecogate/ 23924 439100      55
11780 kworker/1:2     [kworker/1:2]                   0      0    3472
12053 nomad           /opt/puppet-archive/nomad-1 14452 1102996      9
12061 php             /usr/bin/php /data/ecogate/ 24184 439100       9
12063 nomad           /opt/puppet-archive/nomad-1 14556 1036052      9
12075 php             /usr/bin/php /data/ecogate/ 23376 439100       9
12092 nomad           /opt/puppet-archive/nomad-1 14464 1045912      9
12102 php             /usr/bin/php /data/ecogate/ 24188 439100       9
12106 nomad           /opt/puppet-archive/nomad-1 14008 1111192      8
12116 php             /usr/bin/php /data/ecogate/ 24188 439100       8
12117 nomad           /opt/puppet-archive/nomad-1 14092 1111192      8
12125 nomad           /opt/puppet-archive/nomad-1 13784 1036308      8
12131 php             /usr/bin/php /data/ecogate/ 24192 439100       8
12135 php             /usr/bin/php /data/ecogate/ 24192 439100       8
12138 nomad           /opt/puppet-archive/nomad-1 14012 1175320      8
12146 php             /usr/bin/php /data/ecogate/ 24188 439100       8
12148 nomad           /opt/puppet-archive/nomad-1 14304 1111192      8
12154 nomad           /opt/puppet-archive/nomad-1 14156 1102996      8
12158 php             /usr/bin/php /data/ecogate/ 24188 439100       8
12164 nomad           /opt/puppet-archive/nomad-1 14384 1176472      8
12168 php             /usr/bin/php /data/ecogate/ 24192 439100       8
12178 nomad           /opt/puppet-archive/nomad-1 14052 1037204      8
12179 php             /usr/bin/php /data/ecogate/ 24188 439100       8
12188 php             /usr/bin/php /data/ecogate/ 24188 439100       8
12195 bash            -bash                        1860 116620       6
12196 tee             tee -ai /root/.bash_history   668 108056       6
12197 bash            -bash                        2144 116620       6
12210 ps              ps -eo pid,comm,args,rss,vs  1488 153328       0
14254 cleanup         cleanup -z -t unix -u        4440 108004    1612
14256 local           local -t unix                4936  95100    1612
15222 chronyd         /usr/sbin/chronyd             944 120588 17360124
15733 zabbix_agentd   /usr/sbin/zabbix_agentd -c    808  79076 17360108
15734 zabbix_agentd   /usr/sbin/zabbix_agentd: co  1416  79076 17360108
15735 zabbix_agentd   /usr/sbin/zabbix_agentd: li  2028 100632 17360108
15736 zabbix_agentd   /usr/sbin/zabbix_agentd: li  2032 100632 17360108
15737 zabbix_agentd   /usr/sbin/zabbix_agentd: li  2060 100632 17360108
15738 zabbix_agentd   /usr/sbin/zabbix_agentd: ac  1708 100640 17360108
17482 kworker/u32:0   [kworker/u32:0]                 0      0    1447
17662 trivial-rewrite trivial-rewrite -n rewrite   4388 107864    1431
17726 kworker/3:0     [kworker/3:0]                   0      0    1425
18975 kworker/2:0     [kworker/2:0]                   0      0    1372
21142 agent-linux     /data/agent-linux --collect  3888 122624   40431
21482 pickup          pickup -l -t unix -u         4380 107856    2932
21645 nomad           /opt/puppet-archive/nomad-1 18136 1259808  40429
21772 php             /usr/bin/php /data/ecogate/ 68732 716064   40429
22328 rsyslogd        /usr/sbin/rsyslogd -n       59276 621020  126112
23033 td-agent-bit    /opt/td-agent-bit/bin/td-ag  9076 145592   40334
28685 kworker/1:1     [kworker/1:1]                   0      0     832
31261 consul          /usr/local/bin/consul agent 26740 785124  435647
32156 crond           /usr/sbin/CROND -n           2864 219584     652
32157 crond           /usr/sbin/CROND -n           2864 219584     652
32158 crond           /usr/sbin/CROND -n           2864 219584     652
32159 crond           /usr/sbin/CROND -n           2864 219584     652
32167 php             /bin/php /data/ecogate/cli/ 23524 439100     652
32170 php             /bin/php /data/ecogate/cli/ 23524 439100     652
32171 php             /bin/php /data/ecogate/cli/ 23520 439100     652
32172 php             /bin/php /data/ecogate/cli/ 24724 439100     652
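
A listing this long is easier to compare between snapshots when RSS is totalled per command name. The following is only a minimal sketch, assuming the column order of `ps -eo pid,comm,args,rss,vsz,etimes` used in this thread; the function name `rss_by_command` and the embedded sample are illustrative, with the sample rows copied from the output above.

```python
from collections import defaultdict

def rss_by_command(ps_text):
    """Sum RSS (KiB) per COMMAND from `ps -eo pid,comm,args,rss,vsz,etimes` output.

    Returns {comm: (process_count, total_rss_kib)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for line in ps_text.splitlines():
        # The last three columns (rss, vsz, etimes) are numeric, while args
        # may contain spaces, so split those three off from the right.
        parts = line.rsplit(None, 3)
        if len(parts) != 4:
            continue
        head, rss, _vsz, _etimes = parts
        left = head.split(None, 2)  # pid, comm, (truncated) args
        if len(left) < 2 or not left[0].isdigit() or not rss.isdigit():
            continue  # header row or malformed line
        totals[left[1]][0] += 1
        totals[left[1]][1] += int(rss)
    return {comm: tuple(v) for comm, v in totals.items()}

# A few rows copied from the listing above, used as a self-contained sample.
sample = """\
  PID COMMAND         COMMAND                       RSS    VSZ ELAPSED
 9218 nomad           /usr/local/bin/nomad agent  2117280 4921496 91072
 9277 nomad           /opt/puppet-archive/nomad-1 474208 2277428 91063
 6315 php             /usr/bin/php /data/ecogate/ 23524 439100     282
 1736 kworker/1:1H    [kworker/1:1H]                  0      0 17368564
"""

by_rss = sorted(rss_by_command(sample).items(), key=lambda kv: -kv[1][1])
for comm, (count, rss_kib) in by_rss:
    print(f"{comm:<16} {count:>4} procs {rss_kib / 1024:>10.1f} MiB")
```

To run it against a live host, feed it the full `ps` output instead of `sample`. Summing RSS across processes counts shared pages more than once, so treat the totals as an upper bound.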

@bubejur

bubejur commented Jul 7, 2021

pmap -xp output for the nomad agent process, 2h later:

pmap -xp 9218
9218:   /usr/local/bin/nomad agent -config=/etc/nomad
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000   61404   10388       0 r-x-- /opt/puppet-archive/nomad-1.1.2/nomad
00000000041f6000       4       4       4 r---- /opt/puppet-archive/nomad-1.1.2/nomad
00000000041f7000    1596     500     296 rw--- /opt/puppet-archive/nomad-1.1.2/nomad
0000000004386000     292     148     148 rw---   [ anon ]
0000000004fd7000     132       4       4 rw---   [ anon ]
000000c000000000 2162688 2121640 2121640 rw---   [ anon ]
00007f3f2c157000   39264   37120   37120 rw---   [ anon ]
00007f3f2e7b1000   19416   19200   19200 rw---   [ anon ]
00007f3f2faae000    5448    5432    5432 rw---   [ anon ]
00007f3f30000000     132       8       8 rw---   [ anon ]
00007f3f30021000   65404       0       0 -----   [ anon ]
00007f3f34000000     132       4       4 rw---   [ anon ]
00007f3f34021000   65404       0       0 -----   [ anon ]
00007f3f38000000     132       4       4 rw---   [ anon ]
00007f3f38021000   65404       0       0 -----   [ anon ]
00007f3f3c000000     132       4       4 rw---   [ anon ]
00007f3f3c021000   65404       0       0 -----   [ anon ]
00007f3f40000000     132       4       4 rw---   [ anon ]
00007f3f40021000   65404       0       0 -----   [ anon ]
00007f3f44000000     132       4       4 rw---   [ anon ]
00007f3f44021000   65404       0       0 -----   [ anon ]
00007f3f48000000     132       8       8 rw---   [ anon ]
00007f3f48021000   65404       0       0 -----   [ anon ]
00007f3f4c000000     132       8       8 rw---   [ anon ]
00007f3f4c021000   65404       0       0 -----   [ anon ]
00007f3f5000b000    9868    9812    9812 rw---   [ anon ]
00007f3f509b8000    6408    6384    6384 rw---   [ anon ]
00007f3f50ffa000       4       0       0 -----   [ anon ]
00007f3f50ffb000    8192       8       8 rw---   [ anon ]
00007f3f517fb000       4       0       0 -----   [ anon ]
00007f3f517fc000    8192       8       8 rw---   [ anon ]
00007f3f51ffc000       4       0       0 -----   [ anon ]
00007f3f51ffd000    8192       8       8 rw---   [ anon ]
00007f3f527fd000       4       0       0 -----   [ anon ]
00007f3f527fe000    8192       8       8 rw---   [ anon ]
00007f3f52ffe000       4       0       0 -----   [ anon ]
00007f3f52fff000    8192      12      12 rw---   [ anon ]
00007f3f537ff000       4       0       0 -----   [ anon ]
00007f3f53800000    8192       8       8 rw---   [ anon ]
00007f3f54000000     132       4       4 rw---   [ anon ]
00007f3f54021000   65404       0       0 -----   [ anon ]
00007f3f58000000     132       8       8 rw---   [ anon ]
00007f3f58021000   65404       0       0 -----   [ anon ]
00007f3f5c000000     132       8       8 rw---   [ anon ]
00007f3f5c021000   65404       0       0 -----   [ anon ]
00007f3f60000000     132       8       8 rw---   [ anon ]
00007f3f60021000   65404       0       0 -----   [ anon ]
00007f3f64000000     132       8       8 rw---   [ anon ]
00007f3f64021000   65404       0       0 -----   [ anon ]
00007f3f68000000     132       8       8 rw---   [ anon ]
00007f3f68021000   65404       0       0 -----   [ anon ]
00007f3f6c005000    3012    2996    2996 rw---   [ anon ]
00007f3f6c2f7000    5128    5120    5120 rw---   [ anon ]
00007f3f6c7f9000       4       0       0 -----   [ anon ]
00007f3f6c7fa000    8192       8       8 rw---   [ anon ]
00007f3f6cffa000       4       0       0 -----   [ anon ]
00007f3f6cffb000    8192       8       8 rw---   [ anon ]
00007f3f6d7fb000       4       0       0 -----   [ anon ]
00007f3f6d7fc000    8192       8       8 rw---   [ anon ]
00007f3f6dffc000       4       0       0 -----   [ anon ]
00007f3f6dffd000    8192       8       8 rw---   [ anon ]
00007f3f6e7fd000       4       0       0 -----   [ anon ]
00007f3f6e7fe000    8192       8       8 rw---   [ anon ]
00007f3f6effe000       4       0       0 -----   [ anon ]
00007f3f6efff000    8192       8       8 rw---   [ anon ]
00007f3f6f7ff000       4       0       0 -----   [ anon ]
00007f3f6f800000    8192       8       8 rw---   [ anon ]
00007f3f70000000     132       4       4 rw---   [ anon ]
00007f3f70021000   65404       0       0 -----   [ anon ]
00007f3f74000000     132       8       8 rw---   [ anon ]
00007f3f74021000   65404       0       0 -----   [ anon ]
00007f3f78000000     132       8       8 rw---   [ anon ]
00007f3f78021000   65404       0       0 -----   [ anon ]
00007f3f7c000000     132       8       8 rw---   [ anon ]
00007f3f7c021000   65404       0       0 -----   [ anon ]
00007f3f80000000     132       8       8 rw---   [ anon ]
00007f3f80021000   65404       0       0 -----   [ anon ]
00007f3f84000000     132       8       8 rw---   [ anon ]
00007f3f84021000   65404       0       0 -----   [ anon ]
00007f3f88000000     132       8       8 rw---   [ anon ]
00007f3f88021000   65404       0       0 -----   [ anon ]
00007f3f8c000000     132      16      16 rw---   [ anon ]
00007f3f8c021000   65404       0       0 -----   [ anon ]
00007f3f9000d000    1280    1260    1260 rw---   [ anon ]
00007f3f90153000    3908    3876    3876 rw---   [ anon ]
00007f3f90528000    2884    2868    2868 rw---   [ anon ]
00007f3f907f9000       4       0       0 -----   [ anon ]
00007f3f907fa000    8192       8       8 rw---   [ anon ]
00007f3f90ffa000       4       0       0 -----   [ anon ]
00007f3f90ffb000    8192       8       8 rw---   [ anon ]
00007f3f917fb000       4       0       0 -----   [ anon ]
00007f3f917fc000    8192       8       8 rw---   [ anon ]
00007f3f91ffc000       4       0       0 -----   [ anon ]
00007f3f91ffd000    8192       8       8 rw---   [ anon ]
00007f3f927fd000       4       0       0 -----   [ anon ]
00007f3f927fe000    8192       8       8 rw---   [ anon ]
00007f3f92ffe000       4       0       0 -----   [ anon ]
00007f3f92fff000    8192       8       8 rw---   [ anon ]
00007f3f937ff000       4       0       0 -----   [ anon ]
00007f3f93800000    8192      16      16 rw---   [ anon ]
00007f3f94000000     132       8       8 rw---   [ anon ]
00007f3f94021000   65404       0       0 -----   [ anon ]
00007f3f98000000     132       8       8 rw---   [ anon ]
00007f3f98021000   65404       0       0 -----   [ anon ]
00007f3f9c000000     132      12      12 rw---   [ anon ]
00007f3f9c021000   65404       0       0 -----   [ anon ]
00007f3fa0000000     132       8       8 rw---   [ anon ]
00007f3fa0021000   65404       0       0 -----   [ anon ]
00007f3fa4000000     132       8       8 rw---   [ anon ]
00007f3fa4021000   65404       0       0 -----   [ anon ]
00007f3fa800e000     256     256     256 rw---   [ anon ]
00007f3fa8056000    4164    4160    4160 rw---   [ anon ]
00007f3fa8467000       4       0       0 -----   [ anon ]
00007f3fa8468000    8192       8       8 rw---   [ anon ]
00007f3fa8c68000      48       0       0 r-x-- /usr/lib64/libnss_files-2.17.so
00007f3fa8c74000    2044       0       0 ----- /usr/lib64/libnss_files-2.17.so
00007f3fa8e73000       4       4       4 r---- /usr/lib64/libnss_files-2.17.so
00007f3fa8e74000       4       4       4 rw--- /usr/lib64/libnss_files-2.17.so
00007f3fa8e75000     280     244     244 rw---   [ anon ]
00007f3fa8ebb000    8192    2568       0 r--s- /data/nomad/client/state.db
00007f3fa96bb000       4       0       0 -----   [ anon ]
00007f3fa96bc000    8448     264     264 rw---   [ anon ]
00007f3fa9efc000       4       0       0 -----   [ anon ]
00007f3fa9efd000    9216    1032    1032 rw---   [ anon ]
00007f3faa7fd000       4       0       0 -----   [ anon ]
00007f3faa7fe000    8192       8       8 rw---   [ anon ]
00007f3faaffe000       4       0       0 -----   [ anon ]
00007f3faafff000    8192       8       8 rw---   [ anon ]
00007f3fab7ff000       4       0       0 -----   [ anon ]
00007f3fab800000    8192       8       8 rw---   [ anon ]
00007f3fac000000     132       4       4 rw---   [ anon ]
00007f3fac021000   65404       0       0 -----   [ anon ]
00007f3fb0002000    2304    2304    2304 rw---   [ anon ]
00007f3fb0242000       4       0       0 -----   [ anon ]
00007f3fb0243000    8192       8       8 rw---   [ anon ]
00007f3fb0a43000       4       0       0 -----   [ anon ]
00007f3fb0a44000    8192       8       8 rw---   [ anon ]
00007f3fb1244000       4       0       0 -----   [ anon ]
00007f3fb1245000   44100    2180    2180 rw---   [ anon ]
00007f3fb3d56000  263680       0       0 -----   [ anon ]
00007f3fc3ed6000       8       8       8 rw---   [ anon ]
00007f3fc3ed8000  293560       0       0 -----   [ anon ]
00007f3fd5d86000       4       4       4 rw---   [ anon ]
00007f3fd5d87000   36692       0       0 -----   [ anon ]
00007f3fd815c000       4       4       4 rw---   [ anon ]
00007f3fd815d000    4068       0       0 -----   [ anon ]
00007f3fd8556000    1808       4       0 r-x-- /usr/lib64/libc-2.17.so
00007f3fd871a000    2044       0       0 ----- /usr/lib64/libc-2.17.so
00007f3fd8919000      16      16      16 r---- /usr/lib64/libc-2.17.so
00007f3fd891d000       8       8       8 rw--- /usr/lib64/libc-2.17.so
00007f3fd891f000      20      20      20 rw---   [ anon ]
00007f3fd8924000       8       0       0 r-x-- /usr/lib64/libdl-2.17.so
00007f3fd8926000    2048       0       0 ----- /usr/lib64/libdl-2.17.so
00007f3fd8b26000       4       4       4 r---- /usr/lib64/libdl-2.17.so
00007f3fd8b27000       4       4       4 rw--- /usr/lib64/libdl-2.17.so
00007f3fd8b28000      92       4       0 r-x-- /usr/lib64/libpthread-2.17.so
00007f3fd8b3f000    2044       0       0 ----- /usr/lib64/libpthread-2.17.so
00007f3fd8d3e000       4       4       4 r---- /usr/lib64/libpthread-2.17.so
00007f3fd8d3f000       4       4       4 rw--- /usr/lib64/libpthread-2.17.so
00007f3fd8d40000      16       4       4 rw---   [ anon ]
00007f3fd8d44000     136       0       0 r-x-- /usr/lib64/ld-2.17.so
00007f3fd8d68000     576     572     572 rw---   [ anon ]
00007f3fd8df8000     512       0       0 -----   [ anon ]
00007f3fd8e78000       4       4       4 rw---   [ anon ]
00007f3fd8e79000     508       0       0 -----   [ anon ]
00007f3fd8ef8000     400      68      68 rw---   [ anon ]
00007f3fd8f64000       4       4       4 rw---   [ anon ]
00007f3fd8f65000       4       4       4 r---- /usr/lib64/ld-2.17.so
00007f3fd8f66000       4       4       4 rw--- /usr/lib64/ld-2.17.so
00007f3fd8f67000       4       4       4 rw---   [ anon ]
00007ffdfa5f3000     132      16      16 rw---   [ stack ]
00007ffdfa789000       8       4       0 r-x--   [ anon ]
ffffffffff600000       4       0       0 r-x--   [ anon ]
---------------- ------- ------- ------- 
total kB         5058340 2240960 2227788
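
In the map above, the single anonymous region at 000000c000000000 accounts for 2121640 kB of the 2240960 kB total RSS, about 95%. On linux/amd64 the Go runtime normally places its heap arenas starting at that address, so this is most likely Go heap, not thread stacks or mapped files. That reading is an inference, not something the output states. Below is a minimal sketch for picking out the largest mappings from `pmap -x` output; the function name `top_mappings` is illustrative and the sample rows are copied from the output above.

```python
def top_mappings(pmap_text, n=3):
    """Return the n largest mappings by RSS from `pmap -x` output.

    Each result is (rss_kib, start_address, mapping_name).
    """
    rows = []
    for line in pmap_text.splitlines():
        # Data rows have six fields: address, Kbytes, RSS, Dirty, mode, mapping.
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[1].isdigit() or not fields[2].isdigit():
            continue  # header, separator, "total kB" or command line
        try:
            addr = int(fields[0], 16)
        except ValueError:
            continue
        rows.append((int(fields[2]), addr, fields[5].strip()))
    rows.sort(reverse=True)
    return rows[:n]

# A few rows copied from the pmap output above.
sample = """\
9218:   /usr/local/bin/nomad agent -config=/etc/nomad
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000   61404   10388       0 r-x-- /opt/puppet-archive/nomad-1.1.2/nomad
000000c000000000 2162688 2121640 2121640 rw---   [ anon ]
00007f3f2c157000   39264   37120   37120 rw---   [ anon ]
00007f3f30021000   65404       0       0 -----   [ anon ]
total kB         5058340 2240960 2227788
"""

for rss_kib, addr, name in top_mappings(sample):
    print(f"{addr:016x} {rss_kib:>10} KiB  {name}")
```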

ps output taken Wed Jul 7 16:22:56 UTC 2021

ps -eo pid,comm,args,rss,vsz,etimes
  PID COMMAND         COMMAND                       RSS    VSZ ELAPSED
    1 systemd         /usr/lib/systemd/systemd --  5836 129260 17375856
    2 kthreadd        [kthreadd]                      0      0 17375856
    4 kworker/0:0H    [kworker/0:0H]                  0      0 17375856
    6 ksoftirqd/0     [ksoftirqd/0]                   0      0 17375856
    7 migration/0     [migration/0]                   0      0 17375856
    8 rcu_bh          [rcu_bh]                        0      0 17375856
    9 rcu_sched       [rcu_sched]                     0      0 17375856
   10 lru-add-drain   [lru-add-drain]                 0      0 17375856
   11 watchdog/0      [watchdog/0]                    0      0 17375856
   12 watchdog/1      [watchdog/1]                    0      0 17375856
   13 migration/1     [migration/1]                   0      0 17375856
   14 ksoftirqd/1     [ksoftirqd/1]                   0      0 17375856
   16 kworker/1:0H    [kworker/1:0H]                  0      0 17375856
   17 watchdog/2      [watchdog/2]                    0      0 17375856
   18 migration/2     [migration/2]                   0      0 17375856
   19 ksoftirqd/2     [ksoftirqd/2]                   0      0 17375856
   21 kworker/2:0H    [kworker/2:0H]                  0      0 17375856
   22 watchdog/3      [watchdog/3]                    0      0 17375856
   23 migration/3     [migration/3]                   0      0 17375856
   24 ksoftirqd/3     [ksoftirqd/3]                   0      0 17375856
   26 kworker/3:0H    [kworker/3:0H]                  0      0 17375856
   28 kdevtmpfs       [kdevtmpfs]                     0      0 17375856
   29 netns           [netns]                         0      0 17375856
   30 khungtaskd      [khungtaskd]                    0      0 17375856
   31 writeback       [writeback]                     0      0 17375856
   32 kintegrityd     [kintegrityd]                   0      0 17375856
   33 bioset          [bioset]                        0      0 17375856
   34 bioset          [bioset]                        0      0 17375856
   35 bioset          [bioset]                        0      0 17375856
   36 kblockd         [kblockd]                       0      0 17375856
   37 md              [md]                            0      0 17375856
   38 edac-poller     [edac-poller]                   0      0 17375856
   39 watchdogd       [watchdogd]                     0      0 17375856
   45 kswapd0         [kswapd0]                       0      0 17375856
   46 ksmd            [ksmd]                          0      0 17375856
   47 khugepaged      [khugepaged]                    0      0 17375856
   48 crypto          [crypto]                        0      0 17375856
   56 kthrotld        [kthrotld]                      0      0 17375856
   58 kmpath_rdacd    [kmpath_rdacd]                  0      0 17375856
   59 kaluad          [kaluad]                        0      0 17375856
   61 kpsmoused       [kpsmoused]                     0      0 17375856
   62 ipv6_addrconf   [ipv6_addrconf]                 0      0 17375856
   75 deferwq         [deferwq]                       0      0 17375856
  112 kauditd         [kauditd]                       0      0 17375856
  281 ata_sff         [ata_sff]                       0      0 17375856
  292 scsi_eh_0       [scsi_eh_0]                     0      0 17375856
  293 scsi_tmf_0      [scsi_tmf_0]                    0      0 17375856
  294 scsi_eh_1       [scsi_eh_1]                     0      0 17375856
  295 scsi_tmf_1      [scsi_tmf_1]                    0      0 17375856
  296 virtscsi-scan   [virtscsi-scan]                 0      0 17375856
  299 scsi_eh_2       [scsi_eh_2]                     0      0 17375856
  300 scsi_tmf_2      [scsi_tmf_2]                    0      0 17375856
  303 ttm_swap        [ttm_swap]                      0      0 17375856
  367 kworker/2:1H    [kworker/2:1H]                  0      0 17375855
  395 trivial-rewrite trivial-rewrite -n rewrite   1220 107864    2717
  427 kdmflush        [kdmflush]                      0      0 17375855
  428 bioset          [bioset]                        0      0 17375855
  446 jbd2/dm-0-8     [jbd2/dm-0-8]                   0      0 17375855
  447 ext4-rsv-conver [ext4-rsv-conver]               0      0 17375855
  547 systemd-journal /usr/lib/systemd/systemd-jo 31000  80432 17375854
  566 rpciod          [rpciod]                        0      0 17375854
  568 xprtiod         [xprtiod]                       0      0 17375854
  569 lvmetad         /usr/sbin/lvmetad -f         3740 274980 17375854
  582 systemd-udevd   /usr/lib/systemd/systemd-ud  2236  48640 17375854
  606 hwrng           [hwrng]                         0      0 17375854
  659 nfit            [nfit]                          0      0 17375854
  679 jbd2/sda1-8     [jbd2/sda1-8]                   0      0 17375854
  680 ext4-rsv-conver [ext4-rsv-conver]               0      0 17375854
  682 kdmflush        [kdmflush]                      0      0 17375853
  683 bioset          [bioset]                        0      0 17375853
  688 jbd2/dm-1-8     [jbd2/dm-1-8]                   0      0 17375853
  689 ext4-rsv-conver [ext4-rsv-conver]               0      0 17375853
  692 kdmflush        [kdmflush]                      0      0 17375852
  693 bioset          [bioset]                        0      0 17375852
  695 kdmflush        [kdmflush]                      0      0 17375852
  696 bioset          [bioset]                        0      0 17375852
  710 kworker/0:1H    [kworker/0:1H]                  0      0 17375852
  716 jbd2/dm-2-8     [jbd2/dm-2-8]                   0      0 17375852
  717 ext4-rsv-conver [ext4-rsv-conver]               0      0 17375852
  719 jbd2/dm-3-8     [jbd2/dm-3-8]                   0      0 17375852
  720 ext4-rsv-conver [ext4-rsv-conver]               0      0 17375852
  748 auditd          /sbin/auditd                  452  55532 17375852
  770 dbus-daemon     /usr/bin/dbus-daemon --syst  1496  88108 17375852
  772 NetworkManager  /usr/sbin/NetworkManager --  2612 476436 17375852
  773 sssd            /usr/sbin/sssd -i --logger=  1260 268652 17375852
  777 polkitd         /usr/lib/polkit-1/polkitd -  9048 625200 17375852
  778 irqbalance      /usr/sbin/irqbalance --fore   576  21596 17375852
  780 qemu-ga         /usr/bin/qemu-ga --method=v  1400  44220 17375852
  796 gssproxy        /usr/sbin/gssproxy -D         584 269132 17375852
  802 python          /usr/bin/python /usr/share/ 46588 532144 17375852
  817 sssd_be         /usr/libexec/sssd/sssd_be -  7100 419828 17375852
  823 rpc.gssd        /usr/sbin/rpc.gssd            388  40340 17375852
  839 sssd_nss        /usr/libexec/sssd/sssd_nss   2408 277484 17375852
  840 sssd_sudo       /usr/libexec/sssd/sssd_sudo  1216 249444 17375852
  841 sssd_pam        /usr/libexec/sssd/sssd_pam   1308 255632 17375852
  842 sssd_ssh        /usr/libexec/sssd/sssd_ssh   1276 257128 17375852
  843 sssd_pac        /usr/libexec/sssd/sssd_pac   1504 293736 17375852
  873 systemd-logind  /usr/lib/systemd/systemd-lo  1152  37088 17375852
  886 crond           /usr/sbin/crond -n            944 126392 17375852
  889 atd             /usr/sbin/atd -f              208  25908 17375852
  896 agetty          /sbin/agetty --noclear tty1   132 110208 17375852
 1063 sshd            /usr/sbin/sshd -D            1032 112936 17375851
 1064 tuned           /usr/bin/python2 -Es /usr/s 13528 586428 17375851
 1066 node_exporter   /usr/local/bin/node_exporte  2780 121856 17375851
 1067 python          /usr/bin/python /usr/bin/go 136032 1103260 17375851
 1069 oddjobd         /usr/sbin/oddjobd -n -p /va   452  54832 17375851
 1195 ossec-execd     /var/ossec/bin/ossec-execd    952  61332 17375851
 1218 ossec-agentd    /var/ossec/bin/ossec-agentd  1252  48764 17375851
 1228 ossec-agentd    /var/ossec/bin/ossec-agentd   760  48668 17375851
 1251 ossec-logcollec /var/ossec/bin/ossec-logcol   836  44176 17375851
 1285 ossec-syscheckd /var/ossec/bin/ossec-sysche 13860  57368 17375851
 1736 kworker/1:1H    [kworker/1:1H]                  0      0 17375850
 1871 master          /usr/libexec/postfix/master  1296  96968 17375849
 1875 qmgr            qmgr -l -t unix -u           1684 108028 17375849
 2022 splunkd         splunkd -p 8089 start       148160 336140 17375848
 2026 splunkd         [splunkd pid=2022] splunkd  10984  84300 17375848
 2169 kworker/3:1H    [kworker/3:1H]                  0      0 17375845
 2665 kworker/0:2     [kworker/0:2]                   0      0     858
 3853 crond           /usr/sbin/CROND -n           1032 219584     798
 3854 crond           /usr/sbin/CROND -n           1032 219584     798
 3855 crond           /usr/sbin/CROND -n           1032 219584     798
 3856 crond           /usr/sbin/CROND -n           1032 219584     798
 3876 php             /bin/php /data/ecogate/cli/ 12204 439100     798
 3878 php             /bin/php /data/ecogate/cli/ 12204 439100     798
 3879 php             /bin/php /data/ecogate/cli/ 14020 439100     798
 3880 php             /bin/php /data/ecogate/cli/ 12212 439100     798
 4578 nomad           /opt/puppet-archive/nomad-1 20360 1120284   4210
 4694 sshd            sshd: ybubentsov [priv]      1364 189232   12835
 4703 sshd            sshd: ybubentsov@pts/0       1468 189232   12834
 4708 bash            -bash                        1520 127336   12834
 4743 kworker/1:1     [kworker/1:1]                   0      0     737
 4745 bash            -bash                        1448 127336   12834
 4746 tee             tee -ai /home/ybubentsov/.b   104 108056   12834
 4747 bash            -bash                        1448 127336   12834
 4748 sudo            sudo su -                    1224 280180   12834
 4750 su              su -                          716 237644   12834
 4751 bash            -bash                        2240 116620   12834
 5024 kworker/u32:0   [kworker/u32:0]                 0      0     715
 5989 unbound         /usr/sbin/unbound -d        13344 285396 2871490
 6070 kworker/2:1     [kworker/2:1]                   0      0     678
 6115 kworker/3:0     [kworker/3:0]                   0      0     672
 6760 nomad           /opt/puppet-archive/nomad-1 15048 1054492    616
 8302 nomad           /opt/puppet-archive/nomad-1 17872 1186076    536
 8311 php             /usr/bin/php /data/ecogate/ 24504 439100     536
 8321 nomad           /opt/puppet-archive/nomad-1 17768 1120540    534
 8331 php             /usr/bin/php /data/ecogate/ 24336 670496     534
 8799 kworker/0:1     [kworker/0:1]                   0      0     498
 9218 nomad           /usr/local/bin/nomad agent  2239344 5058336 98358
 9277 nomad           /opt/puppet-archive/nomad-1 522560 2412988 98349
 9281 nomad           /opt/puppet-archive/nomad-1 72824 1263784  98349
 9291 nomad           /opt/puppet-archive/nomad-1 513500 2281916 98349
 9310 nomad           /opt/puppet-archive/nomad-1 68960 1264040  98349
 9316 nomad           /opt/puppet-archive/nomad-1 73576 1386660  98349
 9325 nomad           /opt/puppet-archive/nomad-1 71972 1255588  98349
 9336 nomad           /opt/puppet-archive/nomad-1 524796 2282172 98349
 9337 nomad           /opt/puppet-archive/nomad-1 65880 1255588  98349
 9341 nomad           /opt/puppet-archive/nomad-1 76780 1255588  98349
 9342 nomad           /opt/puppet-archive/nomad-1 74592 1321380  98349
 9356 nomad           /opt/puppet-archive/nomad-1 72608 1395112  98349
 9367 nomad           /opt/puppet-archive/nomad-1 74276 1329320  98349
 9376 nomad           /opt/puppet-archive/nomad-1 69140 1329576  98349
 9377 nomad           /opt/puppet-archive/nomad-1 78028 1386916  98349
 9387 nomad           /opt/puppet-archive/nomad-1 516764 2347452 98349
 9392 nomad           /opt/puppet-archive/nomad-1 67076 1395112  98349
 9438 nomad           /opt/puppet-archive/nomad-1 71848 1329320  98349
 9448 nomad           /opt/puppet-archive/nomad-1 70148 1190308  98349
 9449 nomad           /opt/puppet-archive/nomad-1 75044 1190308  98349
 9451 nomad           /opt/puppet-archive/nomad-1 76100 1321380  98349
 9452 nomad           /opt/puppet-archive/nomad-1 70888 1255332  98349
 9453 nomad           /opt/puppet-archive/nomad-1 64392 1255588  98349
 9498 nomad           /opt/puppet-archive/nomad-1 525336 2347452 98349
 9500 nomad           /opt/puppet-archive/nomad-1  2084 1111192  98349
 9506 nomad           /opt/puppet-archive/nomad-1 71268 1255588  98349
 9518 nomad           /opt/puppet-archive/nomad-1 523784 2347452 98349
 9531 nomad           /opt/puppet-archive/nomad-1 65252 1198248  98349
 9540 nomad           /opt/puppet-archive/nomad-1 67344 1321380  98349
 9562 nomad           /opt/puppet-archive/nomad-1 72252 1263784  98349
 9577 nomad           /opt/puppet-archive/nomad-1 75116 1190052  98349
 9578 nomad           /opt/puppet-archive/nomad-1 69916 1263784  98349
 9579 nomad           /opt/puppet-archive/nomad-1  9084 1194016  98349
 9580 nomad           /opt/puppet-archive/nomad-1 69476 1329576  98349
 9581 nomad           /opt/puppet-archive/nomad-1 68520 1264040  98349
 9594 nomad           /opt/puppet-archive/nomad-1 65392 1329576  98349
 9604 nomad           /opt/puppet-archive/nomad-1 74120 1329320  98349
 9607 nomad           /opt/puppet-archive/nomad-1 66088 1190308  98349
 9633 nomad           /opt/puppet-archive/nomad-1 70652 1395112  98349
 9642 nomad           /opt/puppet-archive/nomad-1 75284 1321124  98349
 9660 nomad           /opt/puppet-archive/nomad-1 72360 1394856  98349
 9712 nomad           /opt/puppet-archive/nomad-1 73880 1321636  98349
 9725 nomad           /opt/puppet-archive/nomad-1 69520 1255844  98349
 9737 nomad           /opt/puppet-archive/nomad-1 70408 1255588  98349
 9746 nomad           /opt/puppet-archive/nomad-1 68920 1329576  98349
 9755 nomad           /opt/puppet-archive/nomad-1 66380 1329576  98349
 9770 nomad           /opt/puppet-archive/nomad-1 79428 1321188  98349
 9778 nomad           /opt/puppet-archive/nomad-1 66824 1190308  98349
 9789 nomad           /opt/puppet-archive/nomad-1 64724 1321124  98349
 9794 nomad           /opt/puppet-archive/nomad-1 71328 1321124  98349
 9802 nomad           /opt/puppet-archive/nomad-1 522624 2412988 98349
 9808 nomad           /opt/puppet-archive/nomad-1 14660 1120028  98349
 9824 nomad           /opt/puppet-archive/nomad-1 72420 1190308  98349
 9834 nomad           /opt/puppet-archive/nomad-1 70424 1321380  98349
 9862 nomad           /opt/puppet-archive/nomad-1 73004 1329320  98349
 9872 nomad           /opt/puppet-archive/nomad-1 522116 2281916 98349
 9882 nomad           /opt/puppet-archive/nomad-1 62636 1198248  98349
 9888 nomad           /opt/puppet-archive/nomad-1 71676 1255844  98349
 9889 nomad           /opt/puppet-archive/nomad-1 77836 1255844  98349
 9900 nomad           /opt/puppet-archive/nomad-1 65888 1255844  98349
 9917 nomad           /opt/puppet-archive/nomad-1 69736 1329576  98349
 9957 nomad           /opt/puppet-archive/nomad-1 59940 1394856  98349
10122 nomad           /opt/puppet-archive/nomad-1 523832 2347452 98348
10905 kworker/3:3     [kworker/3:3]                   0      0    2177
12080 kworker/1:2     [kworker/1:2]                   0      0     377
15045 kworker/2:0     [kworker/2:0]                   0      0     197
15047 kworker/3:1     [kworker/3:1]                   0      0     197
15222 chronyd         /usr/sbin/chronyd             472 120588 17367410
15733 zabbix_agentd   /usr/sbin/zabbix_agentd -c    808  79076 17367394
15734 zabbix_agentd   /usr/sbin/zabbix_agentd: co  1416  79076 17367394
15735 zabbix_agentd   /usr/sbin/zabbix_agentd: li  2012 100632 17367394
15736 zabbix_agentd   /usr/sbin/zabbix_agentd: li  2016 100632 17367394
15737 zabbix_agentd   /usr/sbin/zabbix_agentd: li  2012 100632 17367394
15738 zabbix_agentd   /usr/sbin/zabbix_agentd: ac  1696 100640 17367394
16379 kworker/0:0     [kworker/0:0]                   0      0     137
16950 nomad           /opt/puppet-archive/nomad-1 17140 1120540     88
16960 php             /usr/bin/php /data/ecogate/ 24188 439100      88
16961 nomad           /opt/puppet-archive/nomad-1 16916 1120540     88
16969 php             /usr/bin/php /data/ecogate/ 24188 439100      88
16970 nomad           /opt/puppet-archive/nomad-1 16960 1054748     88
16979 php             /usr/bin/php /data/ecogate/ 24192 439100      88
16980 nomad           /opt/puppet-archive/nomad-1 17332 989468      88
16988 php             /usr/bin/php /data/ecogate/ 24176 439100      88
16990 nomad           /opt/puppet-archive/nomad-1 17032 1054748     88
16999 php             /usr/bin/php /data/ecogate/ 24188 439100      88
17000 nomad           /opt/puppet-archive/nomad-1 16896 1251356     88
17006 nomad           /opt/puppet-archive/nomad-1 17008 1120540     88
17016 php             /usr/bin/php /data/ecogate/ 24188 439100      88
17019 php             /usr/bin/php /data/ecogate/ 24192 439100      88
17021 nomad           /opt/puppet-archive/nomad-1 16796 1120284     88
17029 php             /usr/bin/php /data/ecogate/ 24192 439100      88
17030 nomad           /opt/puppet-archive/nomad-1 16776 1055004     88
17035 nomad           /opt/puppet-archive/nomad-1 17096 1186332     88
17042 nomad           /opt/puppet-archive/nomad-1 17000 1055004     88
17053 php             /usr/bin/php /data/ecogate/ 24176 439100      88
17056 php             /usr/bin/php /data/ecogate/ 24188 439100      88
17061 php             /usr/bin/php /data/ecogate/ 24192 439100      88
17062 nomad           /opt/puppet-archive/nomad-1 17704 1054748     88
17072 php             /usr/bin/php /data/ecogate/ 24188 439100      88
17073 nomad           /opt/puppet-archive/nomad-1 16640 1120540     88
17078 nomad           /opt/puppet-archive/nomad-1 17404 1055004     88
17086 php             /usr/bin/php /data/ecogate/ 24188 439100      88
17093 php             /usr/bin/php /data/ecogate/ 24180 439100      88
17094 nomad           /opt/puppet-archive/nomad-1 17620 1054748     87
17098 nomad           /opt/puppet-archive/nomad-1 16816 1054748     87
17111 php             /usr/bin/php /data/ecogate/ 24188 439100      87
17114 php             /usr/bin/php /data/ecogate/ 24188 439100      87
17115 nomad           /opt/puppet-archive/nomad-1 17628 989468      87
17125 php             /usr/bin/php /data/ecogate/ 24188 439100      87
17126 nomad           /opt/puppet-archive/nomad-1 16872 1120540     87
17127 nomad           /opt/puppet-archive/nomad-1 17324 1055004     87
17142 php             /usr/bin/php /data/ecogate/ 24188 439100      87
17147 php             /usr/bin/php /data/ecogate/ 24192 439100      87
17148 nomad           /opt/puppet-archive/nomad-1 17324 989468      87
17153 nomad           /opt/puppet-archive/nomad-1 17168 1186076     87
17165 php             /usr/bin/php /data/ecogate/ 24192 439100      87
17167 php             /usr/bin/php /data/ecogate/ 24176 439100      87
17168 nomad           /opt/puppet-archive/nomad-1 16864 1120540     87
17172 nomad           /opt/puppet-archive/nomad-1 16960 1186076     87
17183 nomad           /opt/puppet-archive/nomad-1 16948 1062944     87
17188 php             /usr/bin/php /data/ecogate/ 24180 439100      87
17199 php             /usr/bin/php /data/ecogate/ 24180 439100      87
17200 nomad           /opt/puppet-archive/nomad-1 17188 1120284     87
17203 php             /usr/bin/php /data/ecogate/ 24188 439100      87
17212 php             /usr/bin/php /data/ecogate/ 24188 439100      87
17213 nomad           /opt/puppet-archive/nomad-1 17060 1120284     87
17222 php             /usr/bin/php /data/ecogate/ 24192 439100      87
17223 nomad           /opt/puppet-archive/nomad-1 17240 1120284     87
17232 php             /usr/bin/php /data/ecogate/ 24188 439100      87
17233 nomad           /opt/puppet-archive/nomad-1 17148 1186076     87
17242 php             /usr/bin/php /data/ecogate/ 24188 439100      87
17243 nomad           /opt/puppet-archive/nomad-1 16976 1185820     87
17251 php             /usr/bin/php /data/ecogate/ 24192 439100      87
17252 nomad           /opt/puppet-archive/nomad-1 16932 1185820     87
17253 nomad           /opt/puppet-archive/nomad-1 17184 1054748     87
17266 nomad           /opt/puppet-archive/nomad-1 17236 1120540     87
17269 php             /usr/bin/php /data/ecogate/ 24188 439100      87
17278 php             /usr/bin/php /data/ecogate/ 24180 439100      87
17281 php             /usr/bin/php /data/ecogate/ 24184 439100      87
17282 nomad           /opt/puppet-archive/nomad-1 17380 1186332     87
17291 php             /usr/bin/php /data/ecogate/ 24192 439100      87
17292 nomad           /opt/puppet-archive/nomad-1 16976 1120540     86
17301 php             /usr/bin/php /data/ecogate/ 24188 439100      86
17308 nomad           /opt/puppet-archive/nomad-1 17448 1055004     86
17318 php             /usr/bin/php /data/ecogate/ 24188 439100      86
17319 nomad           /opt/puppet-archive/nomad-1 17252 1055004     86
17328 php             /usr/bin/php /data/ecogate/ 24188 439100      86
17330 nomad           /opt/puppet-archive/nomad-1 17120 1055004     86
17340 php             /usr/bin/php /data/ecogate/ 24180 439100      86
17349 nomad           /opt/puppet-archive/nomad-1 17076 1186076     84
17350 nomad           /opt/puppet-archive/nomad-1 17348 1112344     84
17366 php             /usr/bin/php /data/ecogate/ 24192 439100      84
17367 php             /usr/bin/php /data/ecogate/ 24188 439100      84
17369 nomad           /opt/puppet-archive/nomad-1 17408 1186076     84
17379 php             /usr/bin/php /data/ecogate/ 24188 439100      84
17380 nomad           /opt/puppet-archive/nomad-1 17436 1120540     84
17389 php             /usr/bin/php /data/ecogate/ 24176 439100      84
17402 nomad           /opt/puppet-archive/nomad-1 17180 1120284     83
17412 php             /usr/bin/php /data/ecogate/ 24188 439100      83
17413 nomad           /opt/puppet-archive/nomad-1 17168 1186076     82
17423 php             /usr/bin/php /data/ecogate/ 24188 439100      82
17424 nomad           /opt/puppet-archive/nomad-1 16840 1128992     82
17430 nomad           /opt/puppet-archive/nomad-1 17164 1194272     82
17439 php             /usr/bin/php /data/ecogate/ 24188 439100      82
17445 php             /usr/bin/php /data/ecogate/ 24176 439100      82
17447 nomad           /opt/puppet-archive/nomad-1 16956 1186076     81
17456 php             /usr/bin/php /data/ecogate/ 24188 439100      81
17460 nomad           /opt/puppet-archive/nomad-1 17240 1120540     80
17469 php             /usr/bin/php /data/ecogate/ 24192 439100      80
17470 nomad           /opt/puppet-archive/nomad-1 16940 1186076     80
17480 php             /usr/bin/php /data/ecogate/ 24192 439100      80
17484 nomad           /opt/puppet-archive/nomad-1 16644 1194272     79
17490 nomad           /opt/puppet-archive/nomad-1 17312 1055004     79
17499 php             /usr/bin/php /data/ecogate/ 24188 439100      79
17503 php             /usr/bin/php /data/ecogate/ 24192 439100      79
17639 kworker/3:2     [kworker/3:2]                   0      0      78
17643 bounce          bounce -z -t unix -u         4388 107892      78
18509 nomad           /opt/puppet-archive/nomad-1 14636 980120       7
18519 php             /usr/bin/php /data/ecogate/ 23376 439100       7
18520 nomad           /opt/puppet-archive/nomad-1 14400 1111448      7
18528 php             /usr/bin/php /data/ecogate/ 23924 439100       7
18531 nomad           /opt/puppet-archive/nomad-1 14436 971924       7
18540 php             /usr/bin/php /data/ecogate/ 23928 439100       7
18541 nomad           /opt/puppet-archive/nomad-1 14640 1045656      7
18548 nomad           /opt/puppet-archive/nomad-1 14216 1045912      7
18558 php             /usr/bin/php /data/ecogate/ 23928 439100       7
18562 php             /usr/bin/php /data/ecogate/ 23928 439100       7
18563 nomad           /opt/puppet-archive/nomad-1 14056 1111192      7
18573 php             /usr/bin/php /data/ecogate/ 23924 439100       7
18574 nomad           /opt/puppet-archive/nomad-1 14024 1111192      7
18584 php             /usr/bin/php /data/ecogate/ 23924 439100       7
18585 nomad           /opt/puppet-archive/nomad-1 14312 1037204      7
18591 nomad           /opt/puppet-archive/nomad-1 14488 1037716      7
18592 nomad           /opt/puppet-archive/nomad-1 14320 971924       7
18607 php             /usr/bin/php /data/ecogate/ 23924 439100       6
18610 php             /usr/bin/php /data/ecogate/ 23924 439100       6
18611 nomad           /opt/puppet-archive/nomad-1 14464 1037460      6
18613 php             /usr/bin/php /data/ecogate/ 23924 439100       6
18622 php             /usr/bin/php /data/ecogate/ 23928 439100       6
18636 bash            -bash                        1860 116620       3
18637 tee             tee -ai /root/.bash_history   668 108056       3
18638 bash            -bash                        1668 116620       3
18639 ps              ps -eo pid,comm,args,rss,vs  1488 153328       0
18770 kworker/u32:3   [kworker/u32:3]                 0      0    1733
21142 agent-linux     /data/agent-linux --collect  3620 122624   47717
21645 nomad           /opt/puppet-archive/nomad-1  9592 1259808  47715
21772 php             /usr/bin/php /data/ecogate/ 64292 724256   47715
22328 rsyslogd        /usr/sbin/rsyslogd -n       20092 527216  133398
23033 td-agent-bit    /opt/td-agent-bit/bin/td-ag  6264 145592   47620
25318 kworker/2:2     [kworker/2:2]                   0      0    1398
28180 pickup          pickup -l -t unix -u         1412 107856    1217
29354 cleanup         cleanup -z -t unix -u        1644 108004    2898
29365 local           local -t unix                1788  95100    2898
31261 consul          /usr/local/bin/consul agent 27072 785124  442933
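
A listing like this is easier to compare across hosts once resident memory is totalled per command. The following is only a sketch: it assumes the columns are PID, COMM, ARGS, RSS (KiB), VSZ, and a trailing elapsed-time column (the exact `ps` invocation is cut off in the listing), and `sumRSSByComm` is a hypothetical helper name, not part of Nomad.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumRSSByComm totals the RSS column (KiB) per command name.
// Assumed row layout: PID COMM ARGS... RSS VSZ ELAPSED, so RSS is the
// third field from the end no matter how many words ARGS contains.
func sumRSSByComm(psOutput string) map[string]int64 {
	totals := make(map[string]int64)
	sc := bufio.NewScanner(strings.NewReader(psOutput))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 6 {
			continue // blank or malformed line
		}
		rss, err := strconv.ParseInt(f[len(f)-3], 10, 64)
		if err != nil {
			continue // header row or non-numeric RSS
		}
		totals[f[1]] += rss
	}
	return totals
}

func main() {
	// Three rows copied from the listing above.
	sample := ` 9277 nomad           /opt/puppet-archive/nomad-1 522560 2412988 98349
 9281 nomad           /opt/puppet-archive/nomad-1 72824 1263784  98349
 3876 php             /bin/php /data/ecogate/cli/ 12204 439100     798`

	totals := sumRSSByComm(sample)
	fmt.Println("nomad RSS (KiB):", totals["nomad"])
	fmt.Println("php RSS (KiB):", totals["php"])
}
```

Because the ARGS column is truncated in the listing, this groups every `nomad` process together; it cannot tell logmon processes apart from executors or the agent itself.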

@bubejur

bubejur commented Jul 12, 2021

@notnoop how are you doing?

@bubejur

bubejur commented Jul 19, 2021

@notnoop @tgross Hi! Any updates?

@notnoop
Contributor

notnoop commented Jul 19, 2021

@bubejur Hey. This ticket is on my list for this week to investigate and follow up. Thank you for your patience.

@bubejur

bubejur commented Aug 3, 2021

@notnoop @tgross any updates?

@anastazya
Author

Nomad v1.1.2 (60638a0)

After updating to v1.1.2, we still see the same issue with nomad logmon.

Screenshot 2021-08-10 at 10 15 31

The only remedy seems to be restarting Nomad (systemctl restart nomad).

@bubejur

bubejur commented Aug 26, 2021

@notnoop @tgross any updates?

@anastazya
Author

I implemented a cron job across all our Nomad clients that executes "systemctl restart nomad && swapoff -a && swapon -a" every 24 hours, at a random minute on each client (so they do not all restart at once).

So far I have managed to keep memory usage under the OOM limit, but this method might have side effects that we are not yet aware of.

@bubejur

bubejur commented Aug 26, 2021

@anastazya I guess that is not really a production solution :-)

@anastazya
Author

Desperate times call for desperate measures. I have far too many nodes and tasks to just wait. I needed a fast solution to this OOM problem, because tasks that fail when the OOM occurs lead to data loss, and that's a huge issue for my setup.

@bubejur

bubejur commented Aug 30, 2021

Got the same on 1.1.4 :-(

@bubejur

bubejur commented Sep 8, 2021

@notnoop @tgross Hi! Any updates?

@rcoder rcoder removed this from Needs Roadmapping in Nomad - Community Issues Triage Sep 20, 2021
@bubejur

bubejur commented Sep 24, 2021

@notnoop @tgross same on 1.1.5

@notnoop
Contributor

notnoop commented Oct 3, 2021

@bubejur Thanks for your patience. I was out of office for a while and am catching up now. I have built a binary from the b-logmon-inspect branch that instruments logmon to emit heap and allocs memory profiles. If possible, can you try running with the binaries generated in https://app.circleci.com/pipelines/github/hashicorp/nomad/17688/workflows/351ab2f1-12c7-4930-b88b-c3eb4f875a70/jobs/175438/artifacts ?

When you notice high memory usage in a logmon process, send it a SIGUSR2 signal (e.g. kill -SIGUSR2 <logmon_pid>). Two memory profiles prefixed with nomad-logmon-pprof- will be generated in the host's temporary directory; the exact names should be noted in a log line.

With the pprof files, it'll be easier to identify the cause of the high memory usage and to test hypotheses for fixing it. Thank you so much again for your patience.

@anastazya
Author

I will free up some time in the next few days and run that binary on some of our servers to get some info.

@notnoop
Contributor

notnoop commented Oct 5, 2021

Thanks @bubejur! The profiles you included highlight a memory leak! There were 7,395 instances of buffered writers, each 64 KB (accounting for 473.28 MB). I was able to reproduce the memory leak and have a fix in #11261.

Can you confirm that you have some tasks that are getting restarted or signaled frequently? That may explain it. I'd be curious whether there is another cause or contributing factor.

@bubejur

bubejur commented Oct 6, 2021

@notnoop Yes, they are cron-like tasks, but they run as a "service". Also, can you tell me how I can fix an error like this:
[ERROR] client.driver_mgr.raw_exec: error receiving stream from Stats executor RPC, closing stream: alloc_id=eddb6331-dcfe-92f8-c49e-eba54ffb68f6 driver=raw_exec task_name=worker-5 error="rpc error: code = Unavailable desc = transport is closing"

@notnoop
Contributor

notnoop commented Oct 6, 2021

Great. The fix went out in yesterday's 1.1.6 release. Give it a try and let us know if you have any questions.

Sadly, the "transport is closing" error messages are a nuisance and not actionable. We should handle this case more gracefully. It's tracked in #10814.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 15, 2022