New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix syndic regression #33021
Fix syndic regression #33021
Conversation
@rallytime this fix is HUGE, it should be merged forward to 2016.3 and back to 2015.5, it improved master performance by about 50% and syndic performance by a few orders of magnitude, this also basically fixes all the syndic bugs |
It is also noteworthy that if the syndic is updated with this fix and the master of masters is NOT updated, then this will cause the master of masters to go SLOWER (not much slower)!!!!! |
I will get it |
@thatch45 I tried back-porting this to 2015.5, but the line that you guys are changing in |
thanks for the heads up, lemme look |
see #33044 |
Very good catch! Thank you @UtahDave ! |
* Add run_on_start docs to schedule.rst (#32958) Fixes #22580 * Backport #33021 manually to 2015.5 (#33044) * Saltfile with pillar tests (#33045) * add file.managed with pillar data tests * do not require git for other tests * Fix minor document error of test.assertion (#33067) * test pillar.items output (#33060) * File and User test fixes for 2015.5 on Fedora23 (#33055) * Fix file_test.test_symlink on 2015.5 * Fix failing user present test * add test for installing package while using salt-call --local (#33025) * add test for installing package while using salt-call --local * fix pylint * ssh docs: install py-2.6 for RHEL 5 * Bugfix: Restore boolean values from the repo configuration * Add test data for repos * Add repo config test * Bugfix (follow-up): setting priority requires non-positive integer * modules.npm: do not log npm --version at info level (#33084) * salt-cloud: fix ipv6-only virtual machines (#32865) * salt-cloud: fix ipv6-only virtual machines * fix hostname for rsync fallback in scp_file function * use 4 spaces instead of 2 * remove global variable, use direct socket call instead * Use saltstack repo in buildpackage.py on CentOS 5 (#33080) * Lower display of msgpack failure msg to debug (#33078) Closes #33074 * cloud.query needs to define mapper.opts (#33098) * clarify docs that map is designed to be run once. is not stateful (#33102) * Moved _finger_fail method to parent class. Method _finger_fail method from SAuth to AsyncAuth class to make method available in both class and fix an issue where _finger_Fail is called inside AsyncAuth. * Fix 33058 (#33099) * Fix servermanager module - Added check for 2008 version of windows - Added Import-Module ServerManager to _pshell_json. Apparently this needs to run each time we issue a servermanager command. * Fix list_available * salt.utils.gitfs: fix formatting for warning messages (#33064) * salt.utils.gitfs: fix formatting for warning messages When git_pillar support was added to salt.utils.gitfs, the recommendation globals had string formatting placeholders added to them, but the locations where these values are referenced do not call ``.format()`` to properly replace them. This commit fixes that oversight. * Remove more gitfs and master-specific wording from log messages * Add a check that the cmdline of the found proc matches (#33129) * Doc mock decorators (#33132) * Add mock function for mocking decorators * Mock the stdlib user module because importing it will open the repl * Fix broken parsing of usermgmt.conf on OpenBSD (#33135) When creating a new user, if a group of the same name already exists, the usermgmt.conf file is consulted to determine the primary group. It's in these cases that the parsing bug is triggered. This code change addresses several of the existing issues: - The previous split statement explicitly specified a single space. Since a config line may have any number of spaces and/or tabs surrounding the entries, the resulting array's elements may be incorrect. - According to the man pages for usermgmt.conf, the "group" config entry accpets a single parameter -- so we shouldn't iterate. - The "val[1]" was returning the 2nd letter of each word and not the second word on the config line as intended. * Move salt-ssh thin dir location to /var/tmp (#33130) * Move salt-ssh thin dir location to /var/tmp Closes #32771 * Remove performance penelty language * If cache_jobs: True is set, populate the local job cache when running salt-call (#33100) * If cache_jobs: True is set, populate the local job cache Fixes #32834 Allows a masterless minion to query the job cache. * Refactor cache_jobs functionality to be DRY * Skipping salt-call --local test * Back-port #31769 to 2015.8 (#33139) * Handle empty acl_name in linux_acl state Calls to setfacl interpret an empty group or user name to mean to be the owner of the file they're operating on. For example, for a directory owned by group 'admin', the ACL 'default:group::rwx' is equivalent to 'default:group:admin:rwx'. The output of the getfacl execution module returns ACLs in the format of 'group:admin:rwx' instead of 'group::rwx'. This commit changes the acl.present state to look for the owner of the file if the acl_name paremeter is empty. * Fix acl.present/acl.absent changing default ACLs The behaviour of the acl.present and acl.absent is to check the data structure returned by getfacl contains a key by the name of acl_type. However, this data structure does not contain any default ACLs if none exist, so this check will fail. We omit the check if a default ACL was passed into the state functions. Unfortunately, the call to modfacl may fail if the user passes in an acl_type such as 'default:random'. In this case the state will appear to succeed, but do nothing. This fixes the state module to allow setting default ACLs on files which have none. * Fix regression in 2016.3 HEAD when version is specified (#33146) Resolves #33013. * Hash fileclients by opts (#33142) * Hash fileclients by opts There was an issue whereby the cache of the fileclient was being overwritten by dueling minion instances in multimaster mode. This protects them by hashing by the id of opts. Closes #25040 * Silly typo! * Remove tests which do not test any actual functionality or are too tightly coupled to the implementation * Strip ldap fqdn (#33127) * Add option to strip off domain names on computer names that come from LDAP/AD * Add strip_domains option for ldap. * Add documentation for auth.ldap.minion_stripdomains. * [2015.5] Update to latest bootstrap script v2016.05.10 (#33155) * [2015.8] Update to latest bootstrap script v2016.05.10 (#33156) * [2016.3] Update to latest bootstrap script v2016.05.10 (#33157) * add 2015.5.11 release notes (#33160) * add 2015.8.9 release notes (#33161) * Pip fix (#33180) * fix pip!! * make it work with old pip as well * Added resiliency * Don't need to check, just get the right name * [2015.5] Update to latest bootstrap script v2016.05.11 (#33185)
time.sleep(0.1) | ||
if passed_jid is None: | ||
return prep_jid(nocache=nocache, recurse_count=recurse_count+1) | ||
if not os.path.isdir(jid_dir_): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be more correct as the following:
# Make sure we create the jid dir, otherwise someone else is using it,
# meaning we need a new jid.
try:
os.makedirs(jid_dir_)
except OSError as exc:
if exc.errno != errno.EEXIST:
err = 'prep_jid failed creating jid directory: {0}'.format(exc)
log.error(err)
raise salt.exceptions.SaltCacheError(err)
if passed_jid is None:
# The JID that was just generated is already in use so get a new one.
time.sleep(0.1)
return prep_jid(nocache=nocache, recurse_count=recurse_count+1)
Someone mentioned this fix to me today and, while it is good, would be more correct if tweaked. Should I submit a P.R. for this tweak? |
except OSError: | ||
time.sleep(0.1) | ||
if passed_jid is None: | ||
return prep_jid(nocache=nocache, recurse_count=recurse_count+1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original bug is that the sleep()
is not inside this conditional (if passed_jid is None
) for just the case where a JID
is generated and we need a new clock tick for a different JID
to affect this recursive call.
yes @plastikos please feel free to send us a PR optimizing this |
What does this PR do?
Fixes syndic and speeds up all of salt's processing of minion returns.
What issues does this PR fix or reference?
Previous Behavior
Syndic slow and inconsistent results
New Behavior
Syndic fast and consistent results.
Tests written?
No