Run binary package generation and file classification in parallel threads #695
Added parallel file classification, which is beneficial in some cases. Most of the time, however, the dependency generator is the biggest bottleneck. That's a much tougher nut to crack: there are huge gains to be made there even without parallelisation, but making that possible needs a massive overhaul.
I just wonder how this influences the console output. Will it still always be readable, or will there be output race conditions that make it unreadable?
Well, why don't you see for yourself?
2e373de to d0f6d5f
These are clearly separate operations that deserve functions of their own, and this makes memory management nicer. In addition, in case the directory creation fails we now actually error out instead of trying to continue, and take care not to fail in case somebody created it behind our back. No other functional changes though.
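The "error out on failure, but tolerate somebody creating it behind our back" behaviour could look roughly like the sketch below. `ensure_dir` is a hypothetical helper written for illustration, not the function from this commit; it uses only POSIX `mkdir()` and `stat()`.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper (not rpm's actual code): create `path` as a directory.
 * Returns 0 if the directory was created or already exists,
 * -1 on any other failure (errno is set). */
static int ensure_dir(const char *path, mode_t mode)
{
    struct stat sb;

    if (mkdir(path, mode) == 0)
        return 0;

    /* Any error other than "already exists" is a real failure. */
    if (errno != EEXIST)
        return -1;

    /* Somebody got there first; that is fine as long as it is a directory. */
    if (stat(path, &sb) != 0)
        return -1;
    if (!S_ISDIR(sb.st_mode)) {
        errno = ENOTDIR;
        return -1;
    }
    return 0;
}
```

Calling `mkdir()` first and inspecting `errno` afterwards avoids the check-then-create race that a `stat()` before `mkdir()` would have when several jobs want the same directory.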
No functional changes and doesn't make much difference here, but we'll need this later.
Now that we can, split the "let's run something on generated packages" check out of packageBinaries(); it doesn't belong there at all. No functional changes other than fixing the attempt to check non-built packages, which return with no filename but RPMRC_OK from packageBinary().
No functional change, but we'll need this later.
The comment has been wrong for more than twenty years... No functional changes here, but this will make a difference in the next commits.
Enable OpenMP use in librpmbuild and set the number of OMP threads from rpm config after spec parsing. The place matters as we want to allow individual specs to control and disable parallel builds.
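As a rough illustration of setting the OpenMP thread count from a configured value, here is a minimal sketch. The helper names and the convention that a value of 0 or less means "use all CPUs" are assumptions made for this example, not rpm's actual configuration semantics. The `_OPENMP` guards let it build without `-fopenmp` as well.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: clamp a configured thread count to [1, ncpus].
 * Assumption for this sketch: configured <= 0 means "use all CPUs". */
static int pick_build_threads(int configured, int ncpus)
{
    if (ncpus < 1)
        ncpus = 1;
    if (configured <= 0 || configured > ncpus)
        return ncpus;
    return configured;
}

/* Hypothetical helper: apply the configured count to the OpenMP runtime.
 * Meant to be called once the spec has been parsed, so that a spec
 * can lower the value or force it to 1. Returns the count in effect. */
static int apply_build_threads(int configured)
{
    int ncpus = 1;
    int n;

#ifdef _OPENMP
    ncpus = omp_get_num_procs();
#endif
    n = pick_build_threads(configured, ncpus);
#ifdef _OPENMP
    omp_set_num_threads(n);
#endif
    return n;
}
```

With this convention a spec that sets the value to 1 effectively disables parallel builds, since every later parallel region runs with a single thread.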
It's worth noting that this really is walking on thin ice, as only a few parts of rpm are thread-protected. The spec is entirely unprotected, so parallel jobs must only ever read it (and we should work towards enforcing that via const-correctness and other protection as needed). Similarly, the package struct and headers are unprotected, so they can only be manipulated within a single thread. Based on initial work by Alexander Kanavin in PR rpm-software-management#226
d0f6d5f to e6c79e5
Our file classification is not exactly fast for large numbers of files. Fedora's kernel-debuginfo-common has circa 25000 files, and classifying them serially takes about a minute on my rusty old T520. With parallelization this goes down to ~24s. It's all remarkably simple, except for the fact that libmagic is not thread-safe, so we need a separate magic handle for each of our threads. This will leak those libmagic handles in error situations; I don't see any obvious, nice way to handle that.
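The one-handle-per-thread pattern can be sketched as follows. To keep the example free of external dependencies, `classifier_t` and `classify_one` are hypothetical stand-ins for a libmagic handle and a lookup on it; real code would create each handle with `magic_open()` and `magic_load()` and release it with `magic_close()`. This is not the code from the commit.

```c
#include <stdlib.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Stand-in for a handle that is not thread-safe (e.g. libmagic's magic_t). */
typedef struct {
    int nused;   /* mutable per-handle state: the reason handles can't be shared */
} classifier_t;

/* Toy classification by file extension; mutates the handle it is given. */
static const char *classify_one(classifier_t *c, const char *path)
{
    const char *dot = strrchr(path, '.');

    c->nused++;
    return (dot != NULL && strcmp(dot, ".so") == 0) ? "library" : "other";
}

/* Classify npaths files into out[]. Returns 0 on success, -1 if the
 * per-thread handles could not be allocated. */
static int classify_all(const char **paths, const char **out, int npaths)
{
    int nthreads = 1;
    classifier_t *handles;

#ifdef _OPENMP
    nthreads = omp_get_max_threads();
#endif
    /* One handle per potential thread, so no handle is ever shared. */
    handles = calloc((size_t)nthreads, sizeof(*handles));
    if (handles == NULL)
        return -1;

    #pragma omp parallel for
    for (int i = 0; i < npaths; i++) {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Each iteration writes only its own out[i] and its thread's handle. */
        out[i] = classify_one(&handles[tid], paths[i]);
    }

    free(handles);
    return 0;
}
```

Because every iteration writes only `out[i]` and touches only the handle indexed by its own thread number, the loop needs no locking. Built without `-fopenmp`, the pragma is ignored and the same code runs serially.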
b42103a to 94e5708
Maybe third time's the charm... While walking the dog I remembered that the hash table added in commit e33045e, which wasn't there when these patches were initially created, needs thread protection now.
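One simple way to protect a shared lookup structure in OpenMP code is a named critical section, sketched below. A small linear array stands in for the hash table, and `cache_intern` is a hypothetical name; this illustrates the general technique and is not necessarily how the table from commit e33045e is protected.

```c
#include <string.h>

#define CACHE_SLOTS 64

/* Shared state touched by all threads; a linear array stands in
 * for the real hash table in this sketch. */
static const char *cache_keys[CACHE_SLOTS];
static int cache_used;

/* Return the index of `key` in the shared cache, inserting it if missing.
 * Returns -1 if the cache is full. The caller keeps ownership of `key`,
 * which must outlive the cache. */
static int cache_intern(const char *key)
{
    int idx = -1;

    /* Lookup and insert must happen as one unit, otherwise two threads
     * could both miss and then both insert the same key. */
    #pragma omp critical(cache_lock)
    {
        for (int i = 0; i < cache_used; i++) {
            if (strcmp(cache_keys[i], key) == 0) {
                idx = i;
                break;
            }
        }
        if (idx < 0 && cache_used < CACHE_SLOTS) {
            idx = cache_used;
            cache_keys[cache_used++] = key;
        }
    }
    return idx;
}
```

A critical section serializes every access, which is fine when lookups are cheap compared to the surrounding work; a finer-grained lock would only be worth it if the table itself became the bottleneck.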
This is a more elaborate version of #226, with various internal issues fixed first for cleaner and simpler resulting code.