save_ts: fix string concat bottleneck. BZ 1600383 #86
Conversation
(branch updated from b81e8a6 to 330daa8)
First off, as far as I can see the patch should change nothing ... so from a purely "who cares, it's faster" POV you can just push it if you want. However, my understanding was that Python string concat has been fixed significantly since the above text was written (hence the first sentence disputing everything below it in the link, and https://stackoverflow.com/questions/12169839/which-is-the-preferred-way-to-concatenate-a-string-in-python). Given that, I'm somewhat concerned that you didn't replace everything with the list+join hack but still have "normal" string concat in both sections too. It feels like you did either too much or not enough. E.g. I find it hard to believe converting the repo section changes anything significantly. In short, it'd be nice to know a bit more about why the string concat code you changed takes ~20 seconds while lots of other string concat code takes nowhere near that time. Also, from a completely different angle, this code (txmbr dumping) probably shouldn't run at all for anaconda.
That's right. With modern Pythons, the concat performance really should be comparable to that of the (formerly preferable) list+join idiom, at least for the most common cases. Unfortunately, according to my experiments, we seem to be hitting some weird and unoptimized edge cases here. Either that, or there's a bug in Python or Anaconda starting with RHEL-7.5. What I discovered is, when Anaconda calls … None of that happens with the exact same Yum and Anaconda builds on RHEL-7.4. Whatever the root cause, using the list+join idiom helped a great deal when I was testing this. This is a small and safe change (and apparently still quite reasonable), so I thought we could use it as a fast workaround before the root cause gets investigated and possibly fixed in the right places (probably Python or Anaconda).
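For reference, the two patterns under discussion look like this (a hedged sketch; the function names and sample data are illustrative, not the actual save_ts() code):

```python
def dump_members_concat(members):
    # Old pattern: repeated '+=' on a str. This can degrade to O(n^2)
    # copying when the interpreter cannot apply its in-place
    # concatenation optimization (the suspected edge case here).
    msg = ""
    for m in members:
        msg += "  %s\n" % m
    return msg

def dump_members_join(members):
    # New pattern: accumulate fragments in a list, join once at the end.
    parts = []
    for m in members:
        parts.append("  %s\n" % m)
    return "".join(parts)

# Both produce byte-identical output; only the allocation pattern differs.
sample = ["foo-1.0-1.noarch", "bar-2.0-3.x86_64"]
assert dump_members_concat(sample) == dump_members_join(sample)
```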
I wanted to keep the patch as small as possible, so I only replaced those concats in …
Agreed. Actually, configuring the …
Cool, looks like you've looked into it more than enough :). You should probably add a comment in the code linking somewhere someone can see all this data; apart from that minor thing, feel free to push whatever version you want.
(branch updated from 8f5dc0d to ae401df)
Instead of accumulating a long string object, build up a list and join it into a string at the end. Even though modern Pythons are optimized for such string concatenations, we're hitting some edge cases in Anaconda where the save_ts() method becomes really slow when dumping large transactions (~80 seconds for a "Server with GUI" selection including all Add-Ons on RHEL-7.6). Using the list+join idiom here (which is good practice[1] anyway) reduces that time to ~20 seconds for the same transaction. Of course, the proper fix should eventually land in Anaconda/Python[2], but let's use this as a hotfix for now. [1] https://docs.python.org/3/faq/programming.html#what-is-the-most-efficient-way-to-concatenate-many-strings-together [2] https://bugzilla.redhat.com/show_bug.cgi?id=1674016
Previously, the way we built the transaction dump was quite inefficient
(appending to a string in a number of for-loops). This caused a major
delay in Anaconda during the depsolve phase when a large transaction was
selected (~80 seconds for a "Server with GUI" selection including all
Add-Ons in a RHEL-7.6 VM).
This commit uses a more efficient method [1] in those for-loops that
involves appending to a list instead. This cuts down the depsolve time
quite a bit (to ~20 seconds according to my tests).
[1] https://wiki.python.org/moin/PythonSpeed/PerformanceTips#String_Concatenation
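The effect can be sanity-checked outside Anaconda with a rough micro-benchmark (illustrative only; the absolute numbers, and whether concat is slow at all, depend heavily on the interpreter build, which is exactly the edge case discussed above):

```python
import timeit

# Build the same string with both accumulation patterns and time them.
setup = "items = ['pkg-%d' % i for i in range(20000)]"

concat_stmt = (
    "s = ''\n"
    "for it in items:\n"
    "    s += '%s\\n' % it\n"
)
join_stmt = (
    "parts = []\n"
    "for it in items:\n"
    "    parts.append('%s\\n' % it)\n"
    "s = ''.join(parts)\n"
)

t_concat = timeit.timeit(concat_stmt, setup, number=10)
t_join = timeit.timeit(join_stmt, setup, number=10)
print("concat: %.3fs  join: %.3fs" % (t_concat, t_join))
```

On an interpreter where the in-place concat optimization does not kick in, the first timing grows roughly quadratically with the item count while the second stays linear.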