Minion did not return. [No response] appears occasionally, but once it happens, the minion never returns #56467
Comments
How many minions do you have connected to your master? And just to clarify what I'm reading here: do some minions stop connecting to the master after a few days, or do all of them?
Also, I noticed your master is on 2018.3.4, which is a version we are no longer officially supporting. Is there any way you'd be able to try this setup in another environment with your master on 3000, or on 2019.2?
In our test environment we have about six minions across three types of OS: SUSE and two others developed by our company. Only the minions installed on the SUSE servers occasionally hit that "not return" problem; the others run very well.
Yes, we can set up the latest version and see whether it still occurs.
I should point out that this problem happens only very occasionally and we have no idea when it will happen again, but in our production environment we have thousands of minion boxes, so it could be a serious problem.
Thank you very much for your reply.
Are you able to run basic commands such as ip ping <address>, and also check the ports on the SUSE minions? It could be a configuration issue on the SUSE minion end, since it seems your other in-house minions are running fine without any hiccups!
After the minion stops returning, test.ping also doesn't return, but an IP-level ping is OK. When we restart the minion manually, it runs well again.
I didn't check the ports on the SUSE servers, and now I am waiting for it to happen again.
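For reference, a minimal master-side sketch of the two checks described here, Salt-level test.ping versus plain ICMP ping, assuming the documented salt.client.LocalClient API; the minion ID and the 2.2.2.2 address are placeholders:

```python
# Minimal sketch of the triage described above. 'suse-minion-01' and
# '2.2.2.2' are placeholders, not values from the thread.
import subprocess

import salt.client

local = salt.client.LocalClient()

# Salt-level check: a wedged minion is simply missing from the result dict.
ret = local.cmd('suse-minion-01', 'test.ping', timeout=30)
print('salt test.ping:', ret)

# Network-level check: in the failure mode described, plain ICMP still works.
rc = subprocess.call(['ping', '-c', '3', '2.2.2.2'])
print('icmp ping ok' if rc == 0 else 'icmp ping failed')
```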
I found some system logs: at 'Mar 20 14:50:40' the minion's log is: […]. Is there any connection?
I am facing the same issue in Salt 2019.2.5, where the salt master and minion run inside a CentOS 7 container. Roughly the first 24 test.ping calls work; after that, repeated tries only get "Minion did not return. [No response]". If I restart the salt-minion, roughly the next 24 test.ping calls work and then the problem occurs again.
What's your Python version?
Python 2.7.5. I found the observation below in the salt-minion trace logs for a test.ping that got no response: […]
I have the same problem in Salt 3000.3 (master and minions) too, but it's been going on for quite some time across prior versions as well. I've struggled to find any ideas on how to identify it or how to even write a bug report for it. We're running Python 2.7 too, on AWS Linux. What I can determine is that there appear to be two cases:
You can see it play out by using […]. In the happy case, the following happens:
The job is created, the return is received, and all is right with the world. In the unhappy case:
It's broadly the same, but you can see a […]. The minion logs similarly in both cases. In the happy case:
In the unhappy case, there is the additional […].
Note there are two returns: one for the first test.ping and one for the following find_job. For some extra digging I changed lines 389 to 395 (at commit 45efc4c) to: […]
And used this to dump the output in JSON:
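The exact dump helper wasn't preserved; as a rough stand-in, here is a minimal sketch of dumping a decoded payload to JSON for diffing the happy and unhappy cases, assuming the payload is msgpack-encoded (Salt's wire format). The function name and path are placeholders:

```python
# Sketch of a JSON dump for diffing payloads; assumes msgpack-encoded input
# (Salt's wire format). dump_payload is a hypothetical helper name.
import json

import msgpack

def dump_payload(raw_bytes, path):
    """Decode a raw msgpack payload and write it out as pretty-printed JSON."""
    payload = msgpack.unpackb(raw_bytes, raw=False)
    with open(path, 'w') as fh:
        # default=repr keeps non-JSON values (e.g. bytes) from raising.
        json.dump(payload, fh, indent=2, sort_keys=True, default=repr)
```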
Here are the successful and unsuccessful cases for two runs. Successful case:
Unsuccessful case:
Nothing to suggest that the minions are behaving differently.
From running with […], I find that restarting the master almost always resolves the problem, which makes tracking it down really hard. As far as I can tell, my best guess is that something about the socket-receiving logic means that the messages aren't received correctly. I found that the following code: https://github.com/saltstack/salt/blob/master/salt/ext/tornado/iostream.py#L1051-L1054
would print the chunk message from the minion immediately when sending […]. I don't think there's anything unusual about our environment: we have around 50-70 nodes, most static, with ~15 auto-scaling; the rest don't change. It's all in AWS within peered VPCs, and as far as tcpdump shows, the traffic traverses the network perfectly fine. Sometimes it works, sometimes it doesn't, and it's proving impossible to tell why. Any thoughts or recommendations on how best to proceed would be utterly and completely magnificent.
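A minimal sketch of the kind of chunk-logging shim described here, assuming Salt's vendored Tornado 4.5 (where IOStream.read_from_fd() takes no arguments); this is not the author's exact change, just one way to observe whether the minion's reply actually arrives on the socket:

```python
# Sketch of a debug shim, not the commenter's actual patch: log every raw
# chunk the IOStream reads off the socket. Assumes Salt's vendored Tornado
# 4.5, where IOStream.read_from_fd() takes no arguments.
import logging

import salt.ext.tornado.iostream as iostream

log = logging.getLogger(__name__)

_orig_read_from_fd = iostream.IOStream.read_from_fd

def _logged_read_from_fd(self):
    chunk = _orig_read_from_fd(self)
    if chunk:
        log.debug('read_from_fd: %d bytes, head=%r', len(chunk), chunk[:64])
    return chunk

iostream.IOStream.read_from_fd = _logged_read_from_fd
```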
@edhgoose I apologize it took me so long to follow up here. I'm wondering if we should schedule a session with a Core Team member to hammer on this. I can help to schedule something; is there a time that is best for you?
@sagetherage, that'd be great. If you want to take a look at https://calend.ly/edhgoose that's probably a good start? Generally evenings (UK time) are pretty good too.
@sagetherage - hey, I'd really love to get that help. This issue is driving me mad, sorry to chase - can we try and set up that session with a team member?
Yes! I have assigned @DmitryKuzmenko, who can help here.
@DmitryKuzmenko have you been able to get with @edhgoose on this issue? |
@sagetherage @DmitryKuzmenko and I had a conversation about turning on certain debugging, but I'm on holiday at the moment and haven't seen the issue for a little while. Unfortunately it seems extremely intermittent, but if it comes up again I'll be sure to confirm what the logs show.
@edhgoose no worries! I was checking in because I didn't want things to be left undone, thank you for the extra effort and have a nice holiday. |
Unfortunately, I'm getting this same problem, and I manage thousands of servers.
As @edhgoose mentioned, a restart usually fixes it; however, it's been happening more and more lately. The company I work for adds new dedicated servers weekly, and the issue persists even on freshly provisioned, up-to-date servers. We use the Python 3 versions of Salt.
@marilyn6483 is there any chance you could provide the corresponding minion debug log?
We found out that Python 3.5 had a bug that caused the problem; here's the bug report:
https://bugs.python.org/issue29386
Please check your Python version first. And what I want to know is what you were doing: did you use Salt to execute some nested shell scripts?
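To illustrate the "nested shell scripts" pattern being asked about, a minimal sketch assuming the documented salt.client.LocalClient API; the minion ID and script path are hypothetical placeholders, not values from the thread:

```python
# Hypothetical example of the "nested shell scripts" pattern: Salt's cmd.run
# executes an outer script that itself spawns further shells. The target ID
# and script path are placeholders.
import salt.client

local = salt.client.LocalClient()

# If /usr/local/bin/deploy.sh forks more shells internally, the minion has
# to manage nested child processes -- the scenario the question is probing.
ret = local.cmd('suse-minion-01', 'cmd.run', ['/usr/local/bin/deploy.sh'])
print(ret)
```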
Well, I'm running Python 3.6 and the issue persists. I do run a few […].
Just a follow-up, as it's been 10 days without a reply.
I checked this bug and ran: […]
It just waits... You mentioned nested shell scripts. We do something like this: […]
@JustOneMoreBlock I've asked before and I'm asking again: could you please provide a hung minion's debug log?
I've got the same problem; my OS is CentOS and the Python version is 3.6.8.
It seems like we might have a few different issues all in the same ticket here:
@marilyn6483 Just to confirm, was the original issue you posted related to Python 3.5, and has it been resolved with an upgrade?
@JustOneMoreBlock Following on the comments from @DmitryKuzmenko, can you provide a minion log with information from when the issue occurs?
@edhgoose Is the issue you're seeing still persisting? Does it appear to be a timeout issue between master and minion? Have you tried increasing the timeout values? Are you running Salt in the same AWS region, or between regions? From outside AWS into AWS? Thanks!
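For the timeout experiment suggested above, a minimal sketch assuming the documented salt.client.LocalClient API; the related master-config knobs are `timeout` (client wait) and `gather_job_timeout` (how long the master polls minions for a running job) in /etc/salt/master:

```python
# Sketch of raising the client-side wait before "Minion did not return.
# [No response]" is reported. The default client timeout is 5 seconds.
import salt.client

local = salt.client.LocalClient()

# Wait up to 60 seconds instead of the 5-second default.
ret = local.cmd('*', 'test.ping', timeout=60)
print('returned:', sorted(ret))
# Minions that never answered are simply absent from `ret`.
```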
I think the issue still exists. |
I believe I'm seeing this same behavior, though being new to Salt, I'm not sure if it is exactly the same. The issue I'm seeing is that some of our Minions will eventually stop responding to the Master completely. Logging into the Minion, the […]. We are running everything via […]. To start, here's my […]:
Here's my […]:
We have other Minions that have gone into this failure state, and they are running version […]. I believe that if I update all of my Minions so that the […]. Let me know if I should do anything else to assist in tracking this down and hopefully getting it fixed.
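For the "logged into the minion but it no longer responds" state, a quick connectivity sketch that can be run from the wedged minion's host: a healthy minion holds TCP connections to the master's publish port (4505) and return port (4506). The master hostname is a placeholder:

```python
# Probe the master's standard ZeroMQ ports from a wedged minion host.
# 'master.example.com' is a placeholder for your master's address.
import socket

def probe(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        print('%s:%d failed: %s' % (host, port, exc))
        return False

for port in (4505, 4506):  # 4505 = publish, 4506 = returns
    status = 'reachable' if probe('master.example.com', port) else 'unreachable'
    print(port, status)
```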
@marilyn6483 |
Hello, we faced the same issue with our Salt setup.
salt-master
Can this issue be looked at? |
Same here; anything new on this?
Still an issue at our company; we have a restart script as a procedure to resolve this for now. Master: salt --versions-report
Salt Version:
Salt: 3006.3
Python Version:
Python: 3.10.13 (main, Sep 6 2023, 02:11:27) [GCC 11.2.0]
Dependency Versions:
cffi: 1.14.6
cherrypy: unknown
dateutil: 2.8.1
docker-py: Not Installed
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.2
libgit2: Not Installed
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.9.8
pygit2: Not Installed
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.13.10
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: centos 7.9.2009 Core
locale: utf-8
machine: x86_64
release: 3.10.0-1160.36.2.el7.x86_64
system: Linux
version: CentOS Linux 7.9.2009 Core
Minion:
Salt Version:
Salt: 3006.3
Python Version:
Python: 3.10.13 (main, Sep 6 2023, 02:15:03) [GCC 11.2.0]
Dependency Versions:
cffi: 1.14.6
cherrypy: unknown
dateutil: 2.8.1
docker-py: 1.10.6
gitdb: Not Installed
gitpython: Not Installed
Jinja2: 3.1.2
libgit2: Not Installed
looseversion: 1.0.2
M2Crypto: Not Installed
Mako: Not Installed
msgpack: 1.0.2
msgpack-pure: Not Installed
mysql-python: Not Installed
packaging: 22.0
pycparser: 2.21
pycrypto: Not Installed
pycryptodome: 3.9.8
pygit2: Not Installed
python-gnupg: 0.4.8
PyYAML: 6.0.1
PyZMQ: 23.2.0
relenv: 0.13.10
smmap: Not Installed
timelib: 0.2.4
Tornado: 4.5.3
ZMQ: 4.3.4
System Versions:
dist: amzn 2
locale: utf-8
machine: aarch64
release: 4.14.336-257.562.amzn2.aarch64
system: Linux
version: Amazon Linux 2
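Several commenters above fall back on restarting wedged minions. A minimal master-side sketch of the first half of that workaround, finding which accepted minions stopped answering, assuming the standard accepted-keys directory; the actual restart would be done out of band (SSH, orchestration, etc.):

```python
# Sketch of the restart-workaround triage: compare accepted minions against
# those that answered test.ping. /etc/salt/pki/master/minions is the standard
# directory of accepted minion keys on the master.
import os

import salt.client

ACCEPTED_KEYS = '/etc/salt/pki/master/minions'

local = salt.client.LocalClient()
responded = set(local.cmd('*', 'test.ping', timeout=30))
accepted = set(os.listdir(ACCEPTED_KEYS))

for minion in sorted(accepted - responded):
    # These are the wedged minions; restarting salt-minion on them is the
    # workaround this thread keeps landing on.
    print('needs restart:', minion)
```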
Description of Issue
Setup
(Please provide relevant configs and/or SLS files (Be sure to remove sensitive info).)
cat minion | grep -v "^#" | grep -v "^$"
cat master | grep -v "^#" | grep -v "^$"
Steps to Reproduce Issue
(Include debug logs if possible and relevant.)
The following is the debug log; the master and minion IPs are replaced by 1.2.3.4 and 2.2.2.2. We are executing an SLS file, and the salt minion runs very well for a couple of days before it suddenly can't return data to the master. I traced through the source code and found that the minion just stopped at "Connecting the Minion to the Master URI (for the return server)", with no further debug log output.
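That log line is emitted while the minion opens its ZeroMQ REQ channel to the master's return server (tcp://<master>:4506). A minimal sketch of that step with a bounded wait, assuming pyzmq, so a hang is visible instead of silent; the 1.2.3.4 address matches the placeholder above, and the raw b'ping' payload is just a probe, not the minion's real msgpack-framed request:

```python
# Sketch of the "return server" connect step that the minion stalls on.
# This is a raw probe, not the real Salt handshake.
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.setsockopt(zmq.LINGER, 0)  # don't block on close if nothing was sent
sock.connect('tcp://1.2.3.4:4506')

sock.send(b'ping')  # connect() is async in ZeroMQ; send/recv exposes a hang
poller = zmq.Poller()
poller.register(sock, zmq.POLLIN)
if poller.poll(10 * 1000):  # wait up to 10 seconds for any reply
    print('return server answered:', sock.recv())
else:
    print('no reply within 10s -- matches the observed stall')
sock.close()
ctx.term()
```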
Versions Report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
[root@localhost chenyanyan]# salt-master -V
linux-jc57:/var/log # /root/miniconda3/bin/salt-minion -V