
Optimize all membership tests to use constant-time operators #54939

Closed
wants to merge 10 commits

Conversation

arizvisa
Contributor

@arizvisa arizvisa commented Oct 9, 2019

What does this PR do?

This PR is probably offensive to some, especially as it touches a lot of files. What it does is change all membership tests against lists/tuples to use sets. This turns a linear-time O(n) scan into an average-case constant-time O(1) lookup. During the transition to Python3, this can actually be a big deal, especially against large lists.

When testing membership against a list-like object, Python iterates through it one element at a time, comparing as it goes. If the matching element is the last one (or absent), this hits the worst case. A set costs a single hash-table lookup instead, which in most cases is faster when testing membership.

As an example of how much of a difference this optimization can make, here's some example code. It simulates the membership test used in numerous places in salt, where a string is checked against a list of strings (or a list comprehension) for validation. The loop count is set high only to make the difference in timings more distinguishable; real call counts will typically not be that high, except maybe in the chocolatey package (iirc).

import sys
import time

member = sys.argv[1]

tupled = ('require', 'watch', 'prereq', 'onchanges')
listed = ['require', 'watch', 'prereq', 'onchanges']
setted = {'require', 'watch', 'prereq', 'onchanges'}

def closure(member, members):
    return member in members

for name, members in zip(['tuple', 'list', 'set'], [tupled, listed, setted]):
    start = time.time()                  # start the clock
    for i in range(4096 * 4096):
        closure(member, members)
    print(name, time.time() - start)     # stop the clock

Running with Python2 and Python3 using a non-matching string shows the set being faster:

[1580] user@E5570 ~/work/salt/salt$ python2 perf.py nomatch
('tuple', 2.705552101135254)
('list', 2.5032870769500732)
('set', 1.971463918685913)
[1581] user@E5570 ~/work/salt/salt$ python3 perf.py nomatch
tuple 2.7079267501831055
list 2.6025850772857666
set 2.0482747554779053

Running with Python2 and Python3 using the last-matching string again shows the set is faster:

[1584] user@E5570 ~/work/salt/salt$ python2 perf.py onchanges
('tuple', 2.9444751739501953)
('list', 2.5777699947357178)
('set', 2.1840360164642334)
[1585] user@E5570 ~/work/salt/salt$ python3 perf.py onchanges
tuple 2.672790050506592
list 2.6342225074768066
set 2.0565736293792725
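For a more controlled measurement, the same comparison can be sketched with the stdlib timeit module, which repeats the statement many times with garbage collection disabled (this is my own variant of the benchmark above, not code from the PR):

```python
import timeit

tupled = ('require', 'watch', 'prereq', 'onchanges')
listed = ['require', 'watch', 'prereq', 'onchanges']
setted = {'require', 'watch', 'prereq', 'onchanges'}

member = 'onchanges'  # last element: the worst case for tuple/list scans

timings = {}
for name, members in [('tuple', tupled), ('list', listed), ('set', setted)]:
    # time a bare membership test, passing the operands in via globals
    timings[name] = timeit.timeit(
        'member in members',
        globals={'member': member, 'members': members},
        number=1_000_000,
    )

for name, elapsed in timings.items():
    print(name, elapsed)
```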

This patch replaces both "in LIST" and "in TUPLE" with "in SET". A few containment tests were also changed to use any() and all(), since short-circuiting iteration can be faster than generating the whole container and then testing membership. The states/mssql_*.py files were also using membership to check types; those were replaced with the isinstance() builtin.
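To illustrate the two auxiliary changes (with made-up helper names, not the actual salt code):

```python
# Membership against a generated container:
#   if value in [transform(x) for x in items]:     # builds the whole list first
# short-circuits with any() instead:
def contains_transformed(value, items, transform):
    return any(transform(x) == value for x in items)

# Membership-based type checks, e.g.:
#   if type(obj) in (list, tuple):
# become isinstance(), which also honors subclasses:
def is_listlike(obj):
    return isinstance(obj, (list, tuple))
```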

What issues does this PR fix or reference?

This is just a small step towards improving performance. Future steps would be to move closure definitions out of loops, refactor exceptions so the default path is the hot path, etc. Those other suggestions might require changing semantics and so will likely require more work.

Tests written?

No. The original tests should be fine if they're comprehensive.

Commits signed with GPG?

No.

@arizvisa arizvisa requested a review from a team as a code owner October 9, 2019 21:47
@ghost ghost requested a review from cmcmarrow October 9, 2019 21:47
@arizvisa
Contributor Author

arizvisa commented Oct 9, 2019

As mentioned, I'm not sure if you guys want to merge this. But this primitive is probably something you'd want to enforce with some tool at some point.

I noticed while checking each of these that there are a lot of inconsistent checks when trying to determine the "truthiness" of a string. There are cases where .lower() isn't being used ("FalSe" would not be treated as false), or "1" is forgotten about, etc. We might want to add some kind of util to consolidate these tests into, so the caller doesn't need to do explicit string checking.

@cmcmarrow
Contributor

cmcmarrow commented Oct 10, 2019

@arizvisa Thank you for giving us this code. I see one problem with using sets in this situation: unhashable types. For example, ["salt", "stack"] in {"cats"} raises a TypeError. Also, most of these lists and tuples are very small, three elements or fewer, so O(n) vs O(1) is not a big deal here. Only when we run the same piece of code 100,000s of times can we see a difference. The performance boost from going from lists to sets should not be noticeable in a real-world situation.
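The unhashable-type pitfall is easy to demonstrate: set membership must hash the candidate value, so an unhashable candidate raises TypeError even though the same test against a list works fine.

```python
needle = ["salt", "stack"]                      # lists are unhashable

print(needle in [["salt", "stack"], ["cats"]])  # True: sequence membership compares by equality

try:
    needle in {"cats"}                          # set membership hashes the candidate first
except TypeError as exc:
    print("TypeError:", exc)                    # unhashable type: 'list'
```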

Contributor

@cmcmarrow cmcmarrow left a comment

Now, with that being said, speed is speed. If you want to go through and find every "in (...)" that will not hit an unhashable-type error and make them into sets, you can. I will approve it.

@arizvisa
Contributor Author

ok. are you sure you want to merge this tho? this was mostly a proof-of-concept to open discussion.

but sure, i'll go through and explicitly check for these situations. I didn't see any when i first went through it, but i also wasn't going through them explicitly for this.

@arizvisa
Contributor Author

Okay, I hope I got them all. Thanks for reviewing this. I understand it's a huge pita since it covers so much code. Is there anything in particular that you think I've missed?

Really hoping this doesn't break anything, heh.

@arizvisa
Contributor Author

Also, do you or the other maintainers have any opinions on providing utility functions for common constructs? Namely with the intention of de-duplicating common code or improving consistency. Is there an already-existing place for these types of things?

Like as an example, a very common one could be for determining the "truthiness" of a value and similarly for "falseness". There are a number of places where code is checking for '1', 'true', 'True', 'yes', 'on' in order to return True or a value with similar semantics.

There are a number of places in the membership tests where this is done, and some cases are missed. One place is salt/states/cmd.py, where a function _is_true() was implemented. Similarly, there's a function in salt/states/selinux.py named _refine_value() which performs much the same check (but possibly with selinux semantics).
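A minimal sketch of such a shared helper (the name, accepted tokens, and signature here are my suggestion, not an existing salt API):

```python
_TRUE_TOKENS = {'1', 'true', 'yes', 'on'}

def is_true(value):
    """Interpret the common truthy spellings; case-insensitive for strings."""
    if isinstance(value, str):
        return value.strip().lower() in _TRUE_TOKENS
    return bool(value)
```

With .lower() applied in one place, a spelling like "FalSe" can no longer slip through as truthy, and "1" is covered consistently.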

I also noticed that salt/states/macdefaults.py has an interesting function, safe_cast, which seems like it would be duplicated in other places. This function simply passes its first parameter to the constructor given as another parameter (int, str, etc.) and wraps the call in a catch-all exception. If this does turn out to be a common construct, it seems like it should be part of some facility of pure/primitive functions.
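Based on that description, the helper reduces to something like the following sketch (my reconstruction, not the exact macdefaults implementation):

```python
def safe_cast(val, to_type, default=None):
    # hand val to the target constructor; fall back to default on any failure
    try:
        return to_type(val)
    except Exception:
        return default
```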

There was also this weird line in salt/utils/minions.py that follows.

if [x for x in ret if x in {x for y in regex_chars if y in x}]

Actually it's not too weird, but this double comprehension seems reducible. At the very least, it could probably be refactored for the sake of readability. Do you guys accept refactor-type patches like this?
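The inner set comprehension evaluates to {x} exactly when some character from regex_chars occurs in x, so the whole expression is truthy iff any element of ret contains a regex character. That collapses into a nested any(); a sketch (the regex_chars value here is a stand-in, salt defines its own):

```python
regex_chars = set('.^$*+?[](){}|\\')   # placeholder set, not salt's actual definition

def contains_regex_chars(ret):
    # original: if [x for x in ret if x in {x for y in regex_chars if y in x}]
    return any(any(ch in x for ch in regex_chars) for x in ret)
```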

@arizvisa
Contributor Author

I guess I need to rebase this again since there's some massive changes going on in master, right?

Contributor Author

@arizvisa arizvisa left a comment

Okay, rebased to accommodate the deprecation of raet and double-checked.

@thatch45
Member

I am also curious about total test suite time: does this create any measurable reduction in total test suite execution time?

@arizvisa
Contributor Author

Is there a proper way to profile the tests? Like perhaps some way to either increase the resolution of the timestamps, or a good place to integrate cProfile into the unit tests? or should i script a for-loop through each individually and use time when running them?

Actually speaking of modifying the test suite, where's a good place to import modules that the test suite depends on? One of my more "important" PRs is blocked on that question and has been like that for quite a while...

@dwoz
Contributor

dwoz commented Dec 20, 2019

@arizvisa It's a nice optimization and something we should be aware of. However, I'd be surprised if there's much of a measurable performance increase, especially since most of these sets only have a few items in them.

Overall, I'd like to see these changes broken up into several pull requests.

@arizvisa
Contributor Author

Yeah, it's ok. I dropped this anyway. It was more to facilitate discussion on measuring the performance of various components in saltstack, which unfortunately didn't actually happen. Also, until you guys have a means of prioritizing PRs, it's too hard to maintain something like this that touches a large number of files as development progresses.

@sagetherage
Contributor

@arizvisa so do you want to close this PR? Or break it up? I am not sure how to help facilitate movement forward here, so simply asking.

@arizvisa
Contributor Author

Yeah, I'll close this. It didn't get the response I'd have expected. This'd be something to think about in the future as other improvements have been added w/o testing for how it affects performance.

@arizvisa arizvisa closed this Mar 23, 2020