Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix syndic regression #33021

Merged
merged 2 commits into from May 3, 2016
Merged

Fix syndic regression #33021

merged 2 commits into from May 3, 2016

Conversation

UtahDave
Copy link
Contributor

@UtahDave UtahDave commented May 3, 2016

What does this PR do?

Fixes syndic and speeds up all of salt's processing of minion returns.

What issues does this PR fix or reference?

Previous Behavior

Syndic slow and inconsistent results

New Behavior

Syndic fast and consistent results.

Tests written?

No

@thatch45 thatch45 added the bugfix-bckport will be be back-ported to an older release branch by creating a PR against that branch label May 3, 2016
@thatch45
Copy link
Member

thatch45 commented May 3, 2016

@rallytime this fix is HUGE, it should be merged forward to 2016.3 and back to 2015.5, it improved master performance by about 50% and syndic performance by a few orders of magnitude, this also basically fixes all the syndic bugs

@thatch45
Copy link
Member

thatch45 commented May 3, 2016

It is also noteworthy that if the syndic is updated with this fix and the master of masters is NOT updated, then this will cause the master of masters to go SLOWER (not much slower)!!!!!
So we need to be careful, these fixes need to be on both syndic and master of master (minions need to update)

@rallytime
Copy link
Contributor

@thatch45 Sounds good. Thanks for the heads up.

@UtahDave Since this fix is a really good one for all of the reasons Tom mentioned above, can you add something to the release notes about this?

@thatch45
Copy link
Member

thatch45 commented May 3, 2016

I will get it

@cachedout cachedout merged commit 3434f44 into saltstack:2015.8 May 3, 2016
@rallytime
Copy link
Contributor

@thatch45 I tried back-porting this to 2015.5, but the line that you guys are changing in minion.py is in a function that doesn't exist on the 2015.5 branch and I am not sure what the correct solution/conflict resolution is.

@thatch45
Copy link
Member

thatch45 commented May 4, 2016

thanks for the heads up, lemme look

thatch45 added a commit to thatch45/salt that referenced this pull request May 4, 2016
@thatch45
Copy link
Member

thatch45 commented May 4, 2016

see #33044

@rallytime rallytime added ZZZ[Done]-back-ported-bf RETIRED The pull request has been back-ported to an older branch. and removed bugfix-bckport will be be back-ported to an older release branch by creating a PR against that branch labels May 4, 2016
rallytime pushed a commit that referenced this pull request May 4, 2016
cachedout pushed a commit that referenced this pull request May 5, 2016
* Add run_on_start docs to schedule.rst (#32958)

Fixes #22580

* Backport #33021 manually to 2015.5 (#33044)

* Saltfile with pillar tests (#33045)

* add file.managed with pillar data tests

* do not require git for other tests

* Whitespace...
@DmitryKuzmenko
Copy link
Contributor

Very good catch! Thank you @UtahDave !

cachedout pushed a commit that referenced this pull request May 12, 2016
* Add run_on_start docs to schedule.rst (#32958)

Fixes #22580

* Backport #33021 manually to 2015.5 (#33044)

* Saltfile with pillar tests (#33045)

* add file.managed with pillar data tests

* do not require git for other tests

* Fix minor document error of test.assertion (#33067)

* test pillar.items output (#33060)

* File and User test fixes for 2015.5 on Fedora23 (#33055)

* Fix file_test.test_symlink on 2015.5

* Fix failing user present test

* add test for installing package while using salt-call --local (#33025)

* add test for installing package while using salt-call --local

* fix pylint

* ssh docs: install py-2.6 for RHEL 5

* Bugfix: Restore boolean values from the repo configuration

* Add test data for repos

* Add repo config test

* Bugfix (follow-up): setting priority requires non-positive integer

* modules.npm: do not log npm --version at info level (#33084)

* salt-cloud: fix ipv6-only virtual machines (#32865)

* salt-cloud: fix ipv6-only virtual machines

* fix hostname for rsync fallback in scp_file function

* use 4 spaces instead of 2

* remove global variable, use direct socket call instead

* Use saltstack repo in buildpackage.py on CentOS 5 (#33080)

* Lower display of msgpack failure msg to debug (#33078)

Closes #33074

* cloud.query needs to define mapper.opts (#33098)

* clarify docs that map is designed to be run once. is not stateful (#33102)

* Moved _finger_fail method to parent class.

Method _finger_fail method from SAuth to AsyncAuth class to make method available
in both class and fix an issue where _finger_Fail is called inside AsyncAuth.

* Fix 33058 (#33099)

* Fix servermanager module

- Added check for 2008 version of windows
- Added Import-Module ServerManager to _pshell_json.
  Apparently this needs to run each time we issue a
  servermanager command.

* Fix list_available

* salt.utils.gitfs: fix formatting for warning messages (#33064)

* salt.utils.gitfs: fix formatting for warning messages

When git_pillar support was added to salt.utils.gitfs, the
recommendation globals had string formatting placeholders added to them,
but the locations where these values are referenced do not call
``.format()`` to properly replace them. This commit fixes that
oversight.

* Remove more gitfs and master-specific wording from log messages

* Add a check that the cmdline of the found proc matches (#33129)

* Doc mock decorators (#33132)

* Add mock function for mocking decorators

* Mock the stdlib user module because importing it will open the repl

* Fix broken parsing of usermgmt.conf on OpenBSD (#33135)

When creating a new user, if a group of the same name already exists,
the usermgmt.conf file is consulted to determine the primary group.
It's in these cases that the parsing bug is triggered.

This code change addresses several of the existing issues:

- The previous split statement explicitly specified a single space.
  Since a config line may have any number of spaces and/or tabs
  surrounding the entries, the resulting array's elements may be
  incorrect.

- According to the man pages for usermgmt.conf, the "group" config
  entry accpets a single parameter -- so we shouldn't iterate.

- The "val[1]" was returning the 2nd letter of each word and not the
  second word on the config line as intended.

* Move salt-ssh thin dir location to /var/tmp (#33130)

* Move salt-ssh thin dir location to /var/tmp

Closes #32771

* Remove performance penelty language

* If cache_jobs: True is set, populate the local job cache when running salt-call (#33100)

* If cache_jobs: True is set, populate the local job cache

Fixes #32834

Allows a masterless minion to query the job cache.

* Refactor cache_jobs functionality to be DRY

* Skipping salt-call --local test

* Back-port #31769 to 2015.8 (#33139)

* Handle empty acl_name in linux_acl state

Calls to setfacl interpret an empty group or user name to mean to be the
owner of the file they're operating on. For example, for a directory
owned by group 'admin', the ACL 'default:group::rwx' is equivalent to
'default:group:admin:rwx'.

The output of the getfacl execution module returns ACLs in the format of
'group:admin:rwx' instead of 'group::rwx'. This commit changes the
acl.present state to look for the owner of the file if the acl_name
paremeter is empty.

* Fix acl.present/acl.absent changing default ACLs

The behaviour of the acl.present and acl.absent is to check the data
structure returned by getfacl contains a key by the name of acl_type.

However, this data structure does not contain any default ACLs if none
exist, so this check will fail. We omit the check if a default ACL was
passed into the state functions.

Unfortunately, the call to modfacl may fail if the user passes in an
acl_type such as 'default:random'. In this case the state will appear to
succeed, but do nothing.

This fixes the state module to allow setting default ACLs on files which
have none.

* Fix regression in 2016.3 HEAD when version is specified (#33146)

Resolves #33013.

* Hash fileclients by opts (#33142)

* Hash fileclients by opts

There was an issue whereby the cache of the fileclient was being overwritten
by dueling minion instances in multimaster mode. This protects them by hashing
by the id of opts.

Closes #25040

* Silly typo!

* Remove tests which do not test any actual functionality or are too tightly coupled to the implementation

* Strip ldap fqdn (#33127)

* Add option to strip off domain names on computer names that come from LDAP/AD

* Add strip_domains option for ldap.

* Add documentation for auth.ldap.minion_stripdomains.

* [2015.5] Update to latest bootstrap script v2016.05.10 (#33155)

* [2015.8] Update to latest bootstrap script v2016.05.10 (#33156)

* [2016.3] Update to latest bootstrap script v2016.05.10 (#33157)

* add 2015.5.11 release notes (#33160)

* add 2015.8.9 release notes (#33161)

* Pip fix (#33180)

* fix pip!!

* make it work with old pip as well

* Added resiliency

* Don't need to check, just get the right name

* [2015.5] Update to latest bootstrap script v2016.05.11 (#33185)
time.sleep(0.1)
if passed_jid is None:
return prep_jid(nocache=nocache, recurse_count=recurse_count+1)
if not os.path.isdir(jid_dir_):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be more correct as the following:

# Make sure we create the jid dir, otherwise someone else is using it,
# meaning we need a new jid.
try:
    os.makedirs(jid_dir_)
except OSError as exc:
    if exc.errno != errno.EEXIST:
        err = 'prep_jid failed creating jid directory: {0}'.format(exc)
        log.error(err)
        raise salt.exceptions.SaltCacheError(err)
    if passed_jid is None:
        # The JID that was just generated is already in use so get a new one.
        time.sleep(0.1)
        return prep_jid(nocache=nocache, recurse_count=recurse_count+1)

@plastikos
Copy link
Contributor

Someone mentioned this fix to me today and, while it is good, would be more correct if tweaked. Should I submit a P.R. for this tweak?

except OSError:
time.sleep(0.1)
if passed_jid is None:
return prep_jid(nocache=nocache, recurse_count=recurse_count+1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original bug is that the sleep() is not inside this conditional (if passed_jid is None) for just the case where a JID is generated and we need a new clock tick for a different JID to affect this recursive call.

@thatch45
Copy link
Member

yes @plastikos please feel free to send us a PR optimizing this

@mimianddaniel mimianddaniel mentioned this pull request Sep 2, 2016
@UtahDave UtahDave deleted the 2015.8-syndic branch October 28, 2016 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awesome ZZZ[Done]-back-ported-bf RETIRED The pull request has been back-ported to an older branch.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants