Occasionally git_pillar pull fails, causing incorrect results of highstate (when running highstate for multiple minions) #29239
@timwsuqld, thanks for the report.
+1 RHEL 7
Just ran into this problem with a similar error. It seems to happen when I have multiple hosts trying to refresh their pillar at the same time. Can this get escalated to P1? It breaks my boxes at random and is severely annoying.
The problem I was experiencing is that a random pillar file will not be found by the minion, and this error will pop up in the salt master log. While I was able to get this to occur with highstate, I was able to reproduce the issue more consistently with the following command.
But I was able to make it happen less often when using a batch size of 1.
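For context, a reproduction along these lines (the exact command from the comment above is not preserved in this thread, so the targets and functions below are illustrative) might look like:

```sh
# Hypothetical reproduction sketch: have many minions hit git_pillar at
# roughly the same time, then compare with a serialized run.
salt '*' saltutil.refresh_pillar        # all minions re-evaluate pillar concurrently
salt '*' state.highstate test=True      # occasionally renders with empty/stale pillar

# Serializing the calls with a batch size of 1 makes the failure less frequent,
# which is consistent with a concurrency/race problem on the master.
salt --batch-size 1 '*' state.highstate test=True
```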
Hi, I am seeing the same problem where, for some minions, the wrong set of data (old data or no data at all) is returned by git_pillar. I am using pillar data to template sudoers files, and this is resulting in corrupted sudoers files. I tried to trace the error in the log file at debug level. This is the only error I see:
The remote exists and the error seems random (maybe caused by multiple attempts to check out the repo at the same time!). Moreover, the file in the cache is fine and the returned pillar should not be corrupted, but it is. salt '*' pillar.item returns the correct set of data, so I think the problem is happening when highstate is templating the files.
salt --versions-report
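To illustrate the kind of state that is affected (a hypothetical sketch, not the reporter's actual state; the file names and source paths are made up), the template renders fine when pillar data is present but produces a broken file when git_pillar silently returns nothing:

```yaml
# Hypothetical example of a pillar-templated sudoers state.
# If git_pillar returns empty data, the rendered file loses its entries,
# which matches the "corrupted sudoers" symptom described above.
manage_sudoers:
  file.managed:
    - name: /etc/sudoers.d/managed
    - source: salt://sudoers/templates/sudoers.jinja
    - template: jinja
    - check_cmd: /usr/sbin/visudo -cf    # refuse to deploy a syntactically broken file
```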
removing the
This seems to be related to #31293, which was caused by concurrent master funcs attempting to evaluate git_pillar at the same time and hitting a race condition. I have addressed this in this pull request, which was opened last night. Anyone who is willing to test can either use this GitHub walkthrough to check out the pull request into your git clone, or wait until it is merged and install from the head of the 2015.8 branch. Only the master needs to be updated.
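For anyone unfamiliar with testing a pull request directly, the GitHub walkthrough boils down to something like the following hedged sketch (`<PR_NUMBER>` is a placeholder for the pull request linked above, and the local branch name is arbitrary):

```sh
# Hypothetical sketch of checking out a GitHub pull request for local testing.
# <PR_NUMBER> stands in for the pull request referenced above.
cd /path/to/your/salt/clone
git fetch origin pull/<PR_NUMBER>/head:git-pillar-race-fix
git checkout git-pillar-race-fix
```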
I switched back to gitfs in production using 2015.8.8.2 and haven't run into any issues so far.
@anlutro Thanks for confirming, I'll go ahead and close this.
I am seeing a lot of these now instead:
I'll open a separate issue for it; I think I see a pattern.
@anlutro did you find a satisfactory solution to all those
Hello there, I am currently running
Ok, after cleaning up the cache like so:
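(The exact cleanup command is not preserved in this thread; on a default install it would typically amount to something like the following hypothetical sketch.)

```sh
# Hypothetical sketch only -- the original command is not shown above.
# Assumes the default master cache location, /var/cache/salt/master.
systemctl stop salt-master
rm -rf /var/cache/salt/master/git_pillar /var/cache/salt/master/gitfs
systemctl start salt-master
```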
Pillar is now returning good data, but the master's log now shows a worrying error message:
@EvaSDK Please open a new issue, and provide the information requested in the issue template to assist us in troubleshooting. Feel free to link to this issue.
When running state.highstate for a single minion, everything works fine.
When running state.highstate for all minions (5), it sometimes gives incorrect results. (All highstate commands are being run with test=True)
Digging down, it appears that for some minions the git_pillar fails to update, and so the pillar data for that minion is empty, causing the states to give the wrong output. Ideally, if the git_pillar (ext_pillar) fails, it shouldn't try to compile states for the minion, as the data is incorrect. I'm also not sure why the pillar data appears to be empty instead of using the last successful pull.
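For reference, the kind of setup involved is an ext_pillar git entry in the master config, roughly like the hedged sketch below (the repository URL and branch are illustrative, not the actual values from this report):

```yaml
# Hypothetical master config sketch using the 2015.8-style git_pillar syntax.
# The URL and branch are placeholders.
ext_pillar:
  - git:
    - master https://git.example.com/pillar.git
```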
Some of the workarounds that I've seen involve just using cron to pull the pillar repo, and then pointing at that. This would probably speed things up, but I'd expect salt to already do that.
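That workaround, sketched with illustrative paths and repository URL (not something prescribed by this report), would look roughly like:

```sh
# Hypothetical sketch of the cron workaround; paths and repo URL are illustrative.
# 1) Keep a plain clone of the pillar repo on the master:
git clone https://git.example.com/pillar.git /srv/pillar-checkout

# 2) Refresh it periodically, e.g. via /etc/cron.d/update-pillar:
#    */5 * * * * root cd /srv/pillar-checkout && git pull --quiet

# 3) Point the master at the checkout with pillar_roots instead of git_pillar
#    (in /etc/salt/master):
#    pillar_roots:
#      base:
#        - /srv/pillar-checkout
```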
Lines such as the following appear in the logs when this occurs:
My understanding of #22962 and #19994 is that this should have been fixed in 2015.8.0. Maybe this is related, maybe not.
Running on CentOS 7 with 4 GB of RAM.