New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize all membership tests to use constant-time operators #54939
Conversation
As mentioned, I'm not sure if you guys want to merge this. But this primitive is probably something you'd want to enforce with some tool at some point. I noticed while checking each of these that there's a lot of inconsistent checks when trying to determine the "truthiness" of a string. Like there's also some cases where .lower() isn't being used ("FalSe" would not be false), or "1" is forgotten about, etc. We might want to include some kind of util for refactoring these tests into so the caller does not need to do explicit string checking. |
@arizvisa Thank you for giving use this code. I see one problem when using sets in this situation. The problems is unhashable types. For example ["salt", "stack"] in {"cats"}. Also most of the list and tuples are very small three or less. So O(n) O(1) is not a big deal here. When we run the same piece of code 100,000 of times we can see a difference. The performance boost from going from list to sets in this situation should not be noticeable in a real world situation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now with that being said speed is speed. If you want to go through and find every "in (...)" that will not have a unshushable type error and make them into a set you can. I will approve it.
ok. are you sure you want to merge this tho? this was mostly a proof-of-concept to open discussion. but sure, i'll go through and explicitly check for these situations. I didn't see any when i first went through it, but i also wasn't going through them explicitly for this. |
Okay, I hope I got them all. Thanks for reviewing this. I understand it's a huge pita since it covers so much code. Is there anything in particular that you think I've missed? Really hoping this doesn't break anything, heh. |
Also, do you or the other maintainers have any opinions on providing utility functions for common constructs? Namely with the intention of de-duplicating common code or improving consistency. Is there an already-existing place for these types of things? Like as an example, a very common one could be for determining the "truthiness" of a value and similarly for "falseness". There are a number of places where code is checking for There's a number of places in the membership tests where this is done where some cases are missed., One case where this is done is in I also noticed that There was also this weird line in
Actually it's not too weird, but this double list-comprehension seems reduceable. At the very least, it could probably be refactored for the sake of readability. Do you guys accept refactor-type patches like this? |
I guess I need to rebase this again since there's some massive changes going on in master, right? |
…h compares in O(1).
…ch compares in O(1).
…ly composed prior to iteration) with the "any" and "all" containment operators.
…lls to isinstance.
…ashable, or comes from the user (and does not have a documented type) where the type may vary.
…alt.cloud.clouds.nova with six.viewkeys() as the author was originally checking for the existence of a key.
… comma when defining a set.
61c35dc
to
1fa7d6c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, rebased to accommodate the deprecation of raet and double-checked.
I am also curious about total test suite time, does this create an at-all measurable reduction in total test suite execution time? |
Is there a proper way to profile the tests? Like perhaps some way to either increase the resolution of the timestamps, or a good place to integrate cProfile into the unit tests? or should i script a for-loop through each individually and use Actually speaking of modifying the test suite, where's a good place to import modules that the test suite depends on? One of my more "important" PRs is blocked on that question and has been like that for quite a while... |
@arizvisa It's a nice optimization and something we should be aware of. However, I'd be surprised if there much of a measurable performance increase. Especially since most of these sets only have a few items in them. Overall, I'd like to see these change broken up into several pull requests. |
Yeah, its ok. I dropped this anyways. It was more to facilitate discussion on measuring performance of various components in saltstack which unfortunately didn't actually happen. Also, until you guys have a mean of prioritizing PRs, it's too hard to maintain something like this that touches a large number of files as development progresses. |
@arizvisa so do you want to close this PR? or break it up? I am not sure how to help facilitate movement forward here so simply asking. |
Yeah, I'll close this. It didn't get the response I'd have expected. This'd be something to think about in the future as other improvements have been added w/o testing for how it affects performance. |
What does this PR do?
This PR is probably offensive to some especially as it touches a lot of files. What it does is change all membership tests against lists/tuples to use sets. This is a linear-time O(n) to constant-time O(1) optimization. During the transition to Python3, this can actually be a big deal especially against large lists.
When testing membership against a list-like object, in most situations this will be iterated through a member at a time whilst comparing. If the last element of the list is the member to be returned, this will hit the worst-case. Using a set will cost a hashtable lookup which in most cases is faster when testing membership.
As an example of how much of a difference this optimization can make, here's some example code. This code is trying to simulate the membership that is used in numerous places by salt where a string is tested against a list of strings or a list comprehension for validation. The loop count is set to a large amount only to make the difference in timings more distinguishable and as such will typically not be as high except for maybe in the chocolatey package (iirc).
Running with Python2 and Python3 using a non-matching string shows the set being faster:
Running with Python2 and Python3 using the last-matching string again shows the set is faster:
This patch converts both "in LIST", and "in TUPLE" with "in SET". A few containment operators were also changed to use "any" and "all" instead of using membership as iteration can be faster rather than generating the whole type and testing for membership. The states/mssql_*.py files also were using membership to check types, and were replaced with usage of the
isinstance
keyword.What issues does this PR fix or reference?
This is just a small step towards improving performance. Some steps in the future would be to move closure definitions out of loops, re-factoring exceptions so the default path is the hot-path, etc. These other suggestions might require changing semantics and so will likely require more work.
Tests written?
No. The original tests should be fine if they're comprehensive.
Commits signed with GPG?
No.