salt minion task process stuck with eventpoll and in sleeping status #55710

Closed
kk21986 opened this issue Dec 20, 2019 · 7 comments
Labels
Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged stale
Milestone

Comments

@kk21986

kk21986 commented Dec 20, 2019

Description of Issue

On a few servers, I have found a salt-minion task process (not the minion service process) stuck in epoll_wait on an eventpoll descriptor; while debugging, I found that the process is in the sleeping state. Because this process runs forever, restarting salt-minion does not work either: stopping the minion service returns TIMEOUT FAIL. Notably, the issue does not appear every time, only occasionally. Below are my findings; I hope they are helpful for troubleshooting.

From ps

root 3346 1 0 Aug20 ? 02:36:10 /usr/local/python371/bin/python3.7 /usr/local/python371/bin/salt-minion -c /etc/salt -d

From strace

[root@minion-server]# strace -p 3346
Process 3346 attached
clock_gettime(CLOCK_MONOTONIC, {10471782, 373569844}) = 0
epoll_wait(11, {}, 1023, 627)           = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 553361334}) = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 553600012}) = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 553798273}) = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 554004846}) = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 554231923}) = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 554466693}) = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 554604829}) = 0
clock_gettime(CLOCK_MONOTONIC, {10471783, 2487359}) = 0
epoll_wait(11, {}, 1023, 122)           = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 677105953}) = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 677291343}) = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 677481947}) = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 677661570}) = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 677883337}) = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 678074140}) = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 678209044}) = 0
clock_gettime(CLOCK_MONOTONIC, {10471783, 126139004}) = 0
epoll_wait(11, {}, 1023, 874)           = 0
clock_gettime(CLOCK_REALTIME, {1576817711, 553586257}) = 0
clock_gettime(CLOCK_REALTIME, {1576817711, 553797426}) = 0
clock_gettime(CLOCK_REALTIME, {1576817711, 553989700}) = 0
clock_gettime(CLOCK_REALTIME, {1576817711, 554188287}) = 0
clock_gettime(CLOCK_REALTIME, {1576817711, 554432918}) = 0
clock_gettime(CLOCK_REALTIME, {1576817711, 554645227}) = 0
clock_gettime(CLOCK_REALTIME, {1576817711, 554780886}) = 0
clock_gettime(CLOCK_MONOTONIC, {10471784, 2661905}) = 0
epoll_wait(11, {}, 1023, 122)           = 0
clock_gettime(CLOCK_REALTIME, {1576817711, 677355206}) = 0
clock_gettime(CLOCK_REALTIME, {1576817711, 677559981}) = 0
clock_gettime(CLOCK_REALTIME, {1576817711, 677717325}) = 0
clock_gettime(CLOCK_REALTIME, {1576817711, 677884398}) = 0
clock_gettime(CLOCK_REALTIME, {1576817711, 678106342}) = 0
clock_gettime(CLOCK_REALTIME, {1576817711, 678312394}) = 0
clock_gettime(CLOCK_REALTIME, {1576817711, 678535892}) = 0
clock_gettime(CLOCK_MONOTONIC, {10471784, 126423709}) = 0
epoll_wait(11, ^CProcess 3346 detached
 <detached ...>
[root@minion-server]#
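
For readers skimming the trace above: an `epoll_wait` return value of 0 means the timeout expired with no descriptor ready. The following is a minimal sketch, not part of the original report, that counts such calls in pasted strace output. The function name `summarize_epoll` and the regular expression are illustrative assumptions and only cover the line format shown here.

```python
import re

# Matches completed epoll_wait calls as printed by strace, e.g.
#   epoll_wait(11, {}, 1023, 627)           = 0
# Interrupted calls (no closing parenthesis / return value) are skipped.
EPOLL_RE = re.compile(r"epoll_wait\((\d+), .*?, (\d+), (-?\d+)\)\s+= (-?\d+)")


def summarize_epoll(strace_text):
    """Return (fd, timeout_ms, ready_count) for each completed epoll_wait."""
    calls = []
    for line in strace_text.splitlines():
        m = EPOLL_RE.search(line)
        if m:
            fd, _maxevents, timeout_ms, ready = map(int, m.groups())
            calls.append((fd, timeout_ms, ready))
    return calls


# A few lines copied from the trace in this report.
sample = """\
epoll_wait(11, {}, 1023, 627)           = 0
clock_gettime(CLOCK_REALTIME, {1576817710, 553361334}) = 0
epoll_wait(11, {}, 1023, 122)           = 0
epoll_wait(11, {}, 1023, 874)           = 0
epoll_wait(11, {}, 1023, 122)           = 0
epoll_wait(11, ^CProcess 3346 detached
"""

calls = summarize_epoll(sample)
idle = [c for c in calls if c[2] == 0]
print(f"{len(idle)} of {len(calls)} epoll_wait calls returned with no ready descriptors")
```

If every completed call returns 0, as in this sample, the traced thread appears to be waking on timers only and never receiving I/O events. That is one reading of the trace, not a confirmed diagnosis.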

From lsof

[root@minion-server]# lsof -p 3346
COMMAND    PID USER   FD   TYPE             DEVICE SIZE/OFF      NODE NAME
salt-mini 3346 root  cwd    DIR                8,3     4096         2 /
salt-mini 3346 root  rtd    DIR                8,3     4096         2 /
salt-mini 3346 root  txt    REG                8,3 10397924   2104375 /usr/local/python371/bin/python3.7
salt-mini 3346 root  mem    REG                8,3   162416   2359310 /lib64/ld-2.12.so
salt-mini 3346 root  mem    REG                8,3    91096   2363476 /lib64/libz.so.1.2.3
salt-mini 3346 root  mem    REG                8,3   191816   2363518 /lib64/libncursesw.so.5.7
salt-mini 3346 root  mem    REG                8,3   134792   2360945 /lib64/libtinfo.so.5.7
salt-mini 3346 root  mem    REG                8,3    69976   2363494 /lib64/libbz2.so.1.0.4
salt-mini 3346 root  mem    REG                8,3   137264   1971049 /usr/lib64/liblzma.so.0.0.0
salt-mini 3346 root  mem    REG                8,3   217016    919580 /var/db/nscd/hosts
salt-mini 3346 root  mem    REG                8,3   217016    919567 /var/db/nscd/group
salt-mini 3346 root  mem    REG                8,3    51076   2104846 /usr/local/python371/lib/python3.7/lib-dynload/_multiprocessing.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3  5703926   2371108 /usr/local/python371/lib/python3.7/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so
salt-mini 3346 root  mem    REG                8,3    21048   2371107 /usr/local/python371/lib/python3.7/site-packages/cryptography/hazmat/bindings/_constant_time.abi3.so
salt-mini 3346 root  mem    REG                8,3   732513    394102 /usr/local/python371/lib/python3.7/site-packages/msgpack/_cmsgpack.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   374408   1968566 /usr/lib64/libgmp.so.3.5.0
salt-mini 3346 root  mem    REG                8,3    63378   2370310 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Cipher/_raw_des3.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    18917   2370303 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Cipher/_raw_arc2.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    62533   2370309 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Cipher/_raw_des.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    41975   2370375 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Hash/_SHA512.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    41869   2370374 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Hash/_SHA384.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    35830   2370372 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Hash/_SHA224.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    52427   2370302 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Cipher/_raw_aesni.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    54112   2370301 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Cipher/_raw_aes.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    30281   2370312 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Cipher/_raw_ocb.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    34203   2370402 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Hash/_ghash_clmul.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    18719   2370403 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Hash/_ghash_portable.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    10907    393832 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Util/_cpuid_c.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    16518   2370461 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Protocol/_scrypt.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    21734   2370249 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Cipher/_Salsa20.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    25442   2370369 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Hash/_MD5.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    35814   2370373 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Hash/_SHA256.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    84281   2370371 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Hash/_SHA1.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    20815   2370366 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Hash/_BLAKE2s.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    13213    393837 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Util/_strxor.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    23257   2370308 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Cipher/_raw_ctr.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    14751   2370313 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Cipher/_raw_ofb.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    19102   2370307 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Cipher/_raw_cfb.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    17215   2370306 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Cipher/_raw_cbc.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    10901   2370311 /usr/local/python371/lib/python3.7/site-packages/Cryptodome/Cipher/_raw_ecb.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   149064   2370221 /usr/local/python371/lib/python3.7/site-packages/.libs_cffi_backend/libffi-d78936b1.so.6.0.4
salt-mini 3346 root  mem    REG                8,3   849744   2370605 /usr/local/python371/lib/python3.7/site-packages/_cffi_backend.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3  1397246   2104836 /usr/local/python371/lib/python3.7/lib-dynload/_decimal.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    38875    394088 /usr/local/python371/lib/python3.7/site-packages/markupsafe/_speedups.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    18936   2363488 /lib64/libuuid.so.1.3.0
salt-mini 3346 root  mem    REG                8,3    17705   2104865 /usr/local/python371/lib/python3.7/lib-dynload/_uuid.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   226917   2106265 /usr/local/python371/lib/python3.7/lib-dynload/_elementtree.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    57712    399685 /usr/local/python371/lib/python3.7/site-packages/zmq/backend/cython/_proxy_steerable.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    61984    399165 /usr/local/python371/lib/python3.7/site-packages/zmq/backend/cython/_device.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    40144    399686 /usr/local/python371/lib/python3.7/site-packages/zmq/backend/cython/_version.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    79120    399684 /usr/local/python371/lib/python3.7/site-packages/zmq/backend/cython/_poll.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    53440    399180 /usr/local/python371/lib/python3.7/site-packages/zmq/backend/cython/utils.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   166488    399178 /usr/local/python371/lib/python3.7/site-packages/zmq/backend/cython/socket.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    79264    399172 /usr/local/python371/lib/python3.7/site-packages/zmq/backend/cython/context.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   105624    399176 /usr/local/python371/lib/python3.7/site-packages/zmq/backend/cython/message.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    40456    399174 /usr/local/python371/lib/python3.7/site-packages/zmq/backend/cython/error.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    90912   2364961 /lib64/libgcc_s-4.4.7-20120601.so.1
salt-mini 3346 root  mem    REG                8,3   991192   1968744 /usr/lib64/libstdc++.so.6.0.13
salt-mini 3346 root  mem    REG                8,3  1162912    399101 /usr/local/python371/lib/python3.7/site-packages/zmq/.libs/libsodium-72341b7d.so.23.2.0
salt-mini 3346 root  mem    REG                8,3   860656    399682 /usr/local/python371/lib/python3.7/site-packages/zmq/.libs/libzmq-39117701.so.5.2.1
salt-mini 3346 root  mem    REG                8,3    84680    399170 /usr/local/python371/lib/python3.7/site-packages/zmq/backend/cython/constants.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    45533   2104884 /usr/local/python371/lib/python3.7/lib-dynload/termios.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    31720   2371558 /usr/local/lib/libffi.so.5
salt-mini 3346 root  mem    REG                8,3   428352   2104830 /usr/local/python371/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    32596   2104880 /usr/local/python371/lib/python3.7/lib-dynload/resource.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   205421   2104816 /usr/local/python371/lib/python3.7/lib-dynload/_asyncio.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    18547   2104827 /usr/local/python371/lib/python3.7/lib-dynload/_contextvars.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   313874   2104858 /usr/local/python371/lib/python3.7/lib-dynload/_ssl.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   229711   2104832 /usr/local/python371/lib/python3.7/lib-dynload/_curses.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    17578   2370753 /usr/local/python371/lib/python3.7/site-packages/tornado/speedups.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   133855   2106268 /usr/local/python371/lib/python3.7/lib-dynload/_json.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    42194   2104871 /usr/local/python371/lib/python3.7/lib-dynload/fcntl.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   176893   2104867 /usr/local/python371/lib/python3.7/lib-dynload/array.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    42742   2106275 /usr/local/python371/lib/python3.7/lib-dynload/_queue.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   487923   2106273 /usr/local/python371/lib/python3.7/lib-dynload/_pickle.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3  1172976   2106291 /usr/local/python371/lib/python3.7/lib-dynload/unicodedata.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    19537   2106272 /usr/local/python371/lib/python3.7/lib-dynload/_opcode.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   229595   2104856 /usr/local/python371/lib/python3.7/lib-dynload/_socket.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    51864   2106276 /usr/local/python371/lib/python3.7/lib-dynload/_random.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    28765   2106250 /usr/local/python371/lib/python3.7/lib-dynload/_bisect.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   390080   2104854 /usr/local/python371/lib/python3.7/lib-dynload/_sha3.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   207642   2106252 /usr/local/python371/lib/python3.7/lib-dynload/_blake2.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3  3107657   2371557 /usr/local/lib/libcrypto.so.1.1
salt-mini 3346 root  mem    REG                8,3   653962   2371573 /usr/local/lib/libssl.so.1.1
salt-mini 3346 root  mem    REG                8,3    70829   2104839 /usr/local/python371/lib/python3.7/lib-dynload/_hashlib.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   708448   2104878 /usr/local/python371/lib/python3.7/lib-dynload/pyexpat.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   364825   2104834 /usr/local/python371/lib/python3.7/lib-dynload/_datetime.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   114959   2106286 /usr/local/python371/lib/python3.7/lib-dynload/math.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    82405   2106289 /usr/local/python371/lib/python3.7/lib-dynload/select.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    52678   2106274 /usr/local/python371/lib/python3.7/lib-dynload/_posixsubprocess.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    77189   2104869 /usr/local/python371/lib/python3.7/lib-dynload/binascii.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   133359   2106280 /usr/local/python371/lib/python3.7/lib-dynload/_struct.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    34688   2106285 /usr/local/python371/lib/python3.7/lib-dynload/grp.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   104564   2106270 /usr/local/python371/lib/python3.7/lib-dynload/_lzma.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    56720   2106254 /usr/local/python371/lib/python3.7/lib-dynload/_bz2.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   103375   2104887 /usr/local/python371/lib/python3.7/lib-dynload/zlib.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3    45514   2106267 /usr/local/python371/lib/python3.7/lib-dynload/_heapq.cpython-37m-x86_64-linux-gnu.so
salt-mini 3346 root  mem    REG                8,3   217016    919564 /var/db/nscd/passwd
salt-mini 3346 root  mem    REG                8,3 99174448   1970740 /usr/lib/locale/locale-archive
salt-mini 3346 root  DEL    REG               0,16              24091 /dev/shm/sem.VsLN1m
salt-mini 3346 root  DEL    REG               0,16              24090 /dev/shm/sem.P9Ah37
salt-mini 3346 root  DEL    REG               0,16              24089 /dev/shm/sem.2mMM4S
salt-mini 3346 root  DEL    REG                8,3            1048687 /tmp/ffiR8U9jZ
salt-mini 3346 root  mem    REG                8,3    26060   1970842 /usr/lib64/gconv/gconv-modules.cache
salt-mini 3346 root  mem    REG                8,3  1971528   1969361 /usr/lib64/libcrypto.so.1.0.1e
salt-mini 3346 root  mem    REG                8,3    14184   2364960 /lib64/libksplice_helper.so
salt-mini 3346 root  mem    REG                8,3  1925712   2359309 /lib64/libc-2.12.so
salt-mini 3346 root  mem    REG                8,3   601392   2363454 /lib64/libm-2.12.so
salt-mini 3346 root  mem    REG                8,3    44912   2364974 /lib64/librt-2.12.so
salt-mini 3346 root  mem    REG                8,3    15504   2359341 /lib64/libutil-2.12.so
salt-mini 3346 root  mem    REG                8,3    21016   2363452 /lib64/libdl-2.12.so
salt-mini 3346 root  mem    REG                8,3   143736   2359333 /lib64/libpthread-2.12.so
salt-mini 3346 root    0u   CHR                1,3      0t0      3929 /dev/null
salt-mini 3346 root    1u   CHR                1,3      0t0      3929 /dev/null
salt-mini 3346 root    2u   CHR                1,3      0t0      3929 /dev/null
salt-mini 3346 root    3r   CHR                1,9      0t0      3934 /dev/urandom
salt-mini 3346 root    4u   REG                8,3     4096   1048687 /tmp/ffiR8U9jZ (deleted)
salt-mini 3346 root    5r   CHR                1,9      0t0      3934 /dev/urandom
salt-mini 3346 root    6r   CHR                1,9      0t0      3934 /dev/urandom
salt-mini 3346 root    7u   CHR                1,3      0t0      3929 /dev/null
salt-mini 3346 root    8w  FIFO                0,8      0t0     24076 pipe
salt-mini 3346 root    9r  FIFO                0,8      0t0     24088 pipe
salt-mini 3346 root   10w  FIFO                0,8      0t0     24088 pipe
salt-mini 3346 root   11u   REG                0,9        0      3925 [eventpoll]
salt-mini 3346 root   12w   REG                8,3        0    920266 /var/log/salt/minion-20191213 (deleted)
salt-mini 3346 root   13r  FIFO                0,8      0t0     24093 pipe
salt-mini 3346 root   14w  FIFO                0,8      0t0     24093 pipe
salt-mini 3346 root   15u  unix 0xffff880e22ff74c0      0t0     24094 /var/run/salt/minion/minion_event_762b0ac189_pub.ipc
salt-mini 3346 root   16u  unix 0xffff880e263033c0      0t0     24096 /var/run/salt/minion/minion_event_762b0ac189_pull.ipc
salt-mini 3346 root   17u  unix 0xffff880e26303040      0t0     24098 socket
salt-mini 3346 root   18u  unix 0xffff880e299dc500      0t0     26038 socket
salt-mini 3346 root   19u  unix 0xffff880e299dc180      0t0     26039 socket
salt-mini 3346 root   20u  IPv4          555684674      0t0       TCP salt-minion:39038->10.2.4.3:4505 (ESTABLISHED)
salt-mini 3346 root   21u  unix 0xffff880e21514100      0t0     26040 socket
salt-mini 3346 root   22u  unix 0xffff880e2495db40      0t0     26041 socket
salt-mini 3346 root   23u   REG                0,9        0      3925 [eventpoll]
salt-mini 3346 root   24u  unix 0xffff880e2730e040      0t0     26042 socket
salt-mini 3346 root   25u  unix 0xffff880e2730eac0      0t0     26043 socket
salt-mini 3346 root   26u   REG                0,9        0      3925 [eventpoll]
salt-mini 3346 root   27u  unix 0xffff880e25eff180      0t0     26044 socket
salt-mini 3346 root   28u  unix 0xffff880e24658480      0t0     26045 socket
salt-mini 3346 root   29u  unix 0xffff880e22532500      0t0     26054 socket
salt-mini 3346 root   30u  unix 0xffff880e2950db00      0t0     26055 socket
salt-mini 3346 root   31u  unix 0xffff880e25088140      0t0     26056 socket
salt-mini 3346 root   32u  unix 0xffff880e25ed1540      0t0     26057 socket
salt-mini 3346 root   33u   REG                0,9        0      3925 [eventpoll]
salt-mini 3346 root   34u  unix 0xffff880e24658b80      0t0     26058 socket
salt-mini 3346 root   35u  unix 0xffff880e2242a880      0t0     26059 socket
salt-mini 3346 root   36u   REG                0,9        0      3925 [eventpoll]
salt-mini 3346 root   37u  unix 0xffff880e224c6540      0t0     26060 socket
salt-mini 3346 root   38u  unix 0xffff880e25ed4b00      0t0     26061 socket
salt-mini 3346 root   39u  unix 0xffff880e227c7b40      0t0     26062 /var/run/salt/minion/minion_event_762b0ac189_pub.ipc
salt-mini 3346 root   40r  FIFO                0,8      0t0 557169926 pipe
salt-mini 3346 root   41u  FIFO                0,8      0t0 557383644 pipe
salt-mini 3346 root   42r  IPv4          555530208      0t0       TCP salt-minion:58408->10.2.4.2:4505 (ESTABLISHED)
salt-mini 3346 root   43r  FIFO                0,8      0t0 557597871 pipe
salt-mini 3346 root   44r  FIFO                0,8      0t0 557815036 pipe
salt-mini 3346 root   45r  FIFO                0,8      0t0 558029705 pipe
salt-mini 3346 root   46r  FIFO                0,8      0t0 558249938 pipe
salt-mini 3346 root   57r  FIFO                0,8      0t0 505855510 pipe
[root@minion-server]# 

From netstat

[root@minion-server]# netstat -plan | grep 3346
tcp        0      0 10.2.4.5:58408        10.2.4.2:4505         ESTABLISHED 3346/python3.7
tcp        0      0 10.2.4.5:39038        10.2.4.3:4505          ESTABLISHED 3346/python3.7
unix  2      [ ACC ]     STREAM     LISTENING     24094  3346/python3.7      /var/run/salt/minion/minion_event_762b0ac189_pub.ipc
unix  2      [ ACC ]     STREAM     LISTENING     24096  3346/python3.7      /var/run/salt/minion/minion_event_762b0ac189_pull.ipc
unix  3      [ ]         STREAM     CONNECTED     26061  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26060  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26059  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26058  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26057  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26056  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26055  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26054  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26045  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26044  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26043  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26042  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26041  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26040  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26039  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26038  3346/python3.7
unix  3      [ ]         STREAM     CONNECTED     26062  3346/python3.7      /var/run/salt/minion/minion_event_762b0ac189_pub.ipc
unix  3      [ ]         STREAM     CONNECTED     24098  3346/python3.7
[root@minion-server]#

More details from /proc

[root@minion-server]# cat /proc/3346/status
Name:	salt-minion
State:	S (sleeping)
Tgid:	3346
Pid:	3346
PPid:	1
TracerPid:	0
Uid:	0	0	0	0
Gid:	0	0	0	0
Utrace:	0
FDSize:	128
Groups:
VmPeak:	  798944 kB
VmSize:	  783324 kB
VmLck:	       0 kB
VmHWM:	   63008 kB
VmRSS:	   60096 kB
VmData:	  460160 kB
VmStk:	     116 kB
VmExe:	    2140 kB
VmLib:	   18636 kB
VmPTE:	     636 kB
VmSwap:	    5668 kB
Threads:	5
SigQ:	1/225218
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	0000000001001000
SigCgt:	0000000180004203
CapInh:	0000000000000000
CapPrm:	ffffffffffffffff
CapEff:	ffffffffffffffff
CapBnd:	ffffffffffffffff
Cpus_allowed:	ffff
Cpus_allowed_list:	0-15
Mems_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:	0
voluntary_ctxt_switches:	20997583
nonvoluntary_ctxt_switches:	474135
[root@minion-server]# readlink /proc/3346/fd/11
[eventpoll]
[root@minion-server]# cat /proc/3346/stack
[<ffffffff811e480c>] ep_poll+0x2bc/0x350
[<ffffffff811e4965>] sys_epoll_wait+0xc5/0xe0
[<ffffffff815576d6>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
[root@minion-server]# cat /proc/3346/syscall
232 0xb 0x3787b10 0x3ff 0x36b 0x89c038 0xd12 0x7fff328ef4a0 0x7f9b935b5023
[root@minion-server]# cat /proc/3346/sched
salt-minion (3346, #threads: 5)
---------------------------------------------------------
se.exec_start                      :   10471943124.884138
se.vruntime                        :     164196303.312600
se.sum_exec_runtime                :       9363281.844781
se.wait_start                      :             0.000000
se.sleep_start                     :   10471943124.884138
se.block_start                     :             0.000000
se.sleep_max                       :         60060.075826
se.block_max                       :            29.757053
se.exec_max                        :            25.985009
se.slice_max                       :            54.710288
se.wait_max                        :            30.461475
se.wait_sum                        :         19465.521915
se.wait_count                      :             21551630
se.iowait_sum                      :           244.067142
se.iowait_count                    :                  274
sched_info.bkl_count               :                    0
se.nr_migrations                   :               253338
se.nr_migrations_cold              :                    0
se.nr_failed_migrations_affine     :                    0
se.nr_failed_migrations_running    :              1547480
se.nr_failed_migrations_hot        :                10483
se.nr_forced_migrations            :                  581
se.nr_wakeups                      :             20997746
se.nr_wakeups_sync                 :                 5091
se.nr_wakeups_migrate              :               174226
se.nr_wakeups_local                :             20773538
se.nr_wakeups_remote               :               224208
se.nr_wakeups_affine               :                   84
se.nr_wakeups_affine_attempts      :                49057
se.nr_wakeups_passive              :                    0
se.nr_wakeups_idle                 :                    0
avg_atom                           :             0.436072
avg_per_cpu                        :            36.959642
nr_switches                        :             21471835
nr_voluntary_switches              :             20997699
nr_involuntary_switches            :               474136
se.load.weight                     :                 1024
policy                             :                    0
prio                               :                  120
clock-delta                        :                  136
[root@minion-server]# cat /proc/3346/schedstat
9363287610773 19470672806 21471849
[root@minion-server]#
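
The `/proc/3346/syscall` line above can be read field by field: the first number is the syscall number (232 is `epoll_wait` on x86_64 Linux), followed by the argument registers in hexadecimal. The sketch below decodes it; `decode_epoll_wait` is a hypothetical helper written for this illustration, and it assumes an x86_64 kernel as listed in the versions report.

```python
# Syscall number of epoll_wait on x86_64 Linux (differs on other architectures).
EPOLL_WAIT_X86_64 = 232


def decode_epoll_wait(syscall_line):
    """Decode epfd, maxevents and timeout from a /proc/<pid>/syscall line."""
    fields = syscall_line.split()
    nr = int(fields[0])
    if nr != EPOLL_WAIT_X86_64:
        raise ValueError(f"syscall {nr} is not epoll_wait on x86_64")
    epfd, _events_ptr, maxevents, raw_timeout = (int(f, 16) for f in fields[1:5])
    # The timeout is a C int; reinterpret the low 32 bits as signed so that
    # an infinite wait (-1) is not reported as a huge positive number.
    timeout_ms = raw_timeout & 0xFFFFFFFF
    if timeout_ms >= 0x80000000:
        timeout_ms -= 1 << 32
    return {"epfd": epfd, "maxevents": maxevents, "timeout_ms": timeout_ms}


line = "232 0xb 0x3787b10 0x3ff 0x36b 0x89c038 0xd12 0x7fff328ef4a0 0x7f9b935b5023"
print(decode_epoll_wait(line))
```

For the line in this report, the decoded descriptor is 11 (the `[eventpoll]` entry shown by `readlink`), with `maxevents` 1023 and a finite timeout of 875 ms, which matches the arguments seen in the strace output.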

I suspect that this process belongs to a state.apply run, based on the start date of the process, since that is the only task I ran on that day.

Versions Report

Salt Minion:

Salt Version:
           Salt: 2019.2.0

Dependency Versions:
           cffi: 1.12.3
       cherrypy: Not Installed
       dateutil: 2.7.5
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.10.1
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.6.1
   mysql-python: Not Installed
      pycparser: 2.19
       pycrypto: Not Installed
   pycryptodome: 3.8.1
         pygit2: Not Installed
         Python: 3.7.1 (default, Nov 23 2018, 02:59:05)
   python-gnupg: 0.4.4
         PyYAML: 5.1
          PyZMQ: 18.0.1
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.3.1

System Versions:
           dist: redhat 6.5 Santiago
         locale: utf-8
        machine: x86_64
        release: 2.6.32-745.15.1.x86_64
         system: Linux
        version: Red Hat Enterprise Linux Server 6.5 Santiago

Let me know if you need any more details.

@kk21986
Author

kk21986 commented Dec 20, 2019

Below is our setup:

We have more than 10K servers in the Salt infrastructure, geographically spread, with syndic masters. Each location has two syndic masters, and the minions at that location are connected to both of them.

@kk21986
Author

kk21986 commented Dec 20, 2019

While doing the audit, I found one more hung process; when I checked it with strace, it was stuck in futex.

Process 11421 attached
futex(0x396e421950, FUTEX_WAIT_PRIVATE, 2, NULL^CProcess 11421 detached
 <detached ...>

When I ran strace on the threads associated with this process, I saw the same epoll_wait again.

[root@salt-minion ~]# ps -efL|grep 11421
root      6407  7680  6407  0    1 21:36 pts/2    00:00:00 grep 11421
root     11421     1 11421  0    5 Nov11 ?        00:00:00 /usr/local/python371/bin/python3.7 /usr/local/python371/bin/salt-minion -c /etc/salt -d
root     11421     1 11424  0    5 Nov11 ?        00:00:00 /usr/local/python371/bin/python3.7 /usr/local/python371/bin/salt-minion -c /etc/salt -d
root     11421     1 11425  0    5 Nov11 ?        00:00:03 /usr/local/python371/bin/python3.7 /usr/local/python371/bin/salt-minion -c /etc/salt -d
root     11421     1 12078  0    5 Nov11 ?        00:00:00 /usr/local/python371/bin/python3.7 /usr/local/python371/bin/salt-minion -c /etc/salt -d
root     11421     1 12079  0    5 Nov11 ?        00:00:03 /usr/local/python371/bin/python3.7 /usr/local/python371/bin/salt-minion -c /etc/salt -d


[root@salt-minion ~]# strace -p 11424
Process 11424 attached
epoll_wait(63, ^CProcess 11424 detached
 <detached ...>
[root@salt-minion ~]# strace -p 11425
Process 11425 attached
epoll_wait(66, ^CProcess 11425 detached
 <detached ...>
[root@salt-minion ~]# strace -p 12078
Process 12078 attached
epoll_wait(77, ^CProcess 12078 detached
 <detached ...>
[root@salt-minion ~]# strace -p 12079
Process 12079 attached
epoll_wait(80, ^CProcess 12079 detached
 <detached ...>
[root@salt-minion ~]#
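
Attaching strace to each thread by hand, as above, can be complemented by reading per-thread state from `/proc/<pid>/task/`. The sketch below is an illustration only: `thread_states` and its `proc_root` parameter are hypothetical names, and it reports only the `State:` field, not the syscall each thread is waiting in.

```python
import os


def thread_states(pid, proc_root="/proc"):
    """Map each thread ID of `pid` to the State field of its status file."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    states = {}
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(os.path.join(task_dir, tid, "status")) as fh:
                for line in fh:
                    if line.startswith("State:"):
                        states[int(tid)] = line.split(":", 1)[1].strip()
                        break
        except OSError:
            # A thread can exit between listdir() and open(); skip it.
            continue
    return states


# Demo on the current process; only runs where a Linux-style /proc exists.
if os.path.isdir("/proc/self/task"):
    for tid, state in thread_states(os.getpid()).items():
        print(tid, state)
```

On an affected host this would be called with the stuck PID (for example 11421) as root. The `proc_root` argument exists only so the function can be pointed at a copied or mocked directory tree.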


@garethgreenaway garethgreenaway added this to the Blocked milestone Dec 23, 2019
@garethgreenaway garethgreenaway added the Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged label Dec 23, 2019
@garethgreenaway
Contributor

@kk21986 Thanks for the report. Are you able to upgrade to the latest version of the 2019.2.x branch, 2019.2.2? A number of fixes went into that version, and I would be curious whether it resolves the issues you're seeing. @saltstack/team-core thoughts?

@kk21986
Author

kk21986 commented Dec 23, 2019

@garethgreenaway Thanks for your response! Unfortunately, upgrading to the latest version is a very big task for me, as we have more than 10K servers. The problem is that I have no clue what scenario triggers this issue; otherwise I could upgrade just a few servers and test there. The issue does not appear on all servers; it happens on different servers, randomly, a few times. Another strange thing is that when I try to stop the minion service, the stuck processes are not killed; instead I get a TIMEOUT error. If you want me to check anything, I am ready to do that to find the root cause.

FYI, I have kept a couple of servers in this state for troubleshooting in case you want more information, but I can't guarantee how long I can hold them in this state.

@stale

stale bot commented Jan 22, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

@stale stale bot added the stale label Jan 22, 2020
@stale

stale bot commented Jan 22, 2020

Thank you for updating this issue. It is no longer marked as stale.

@stale stale bot removed the stale label Jan 22, 2020
@stale

stale bot commented Feb 21, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.


3 participants