Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run binary package generation and file classification in parallel threads #695

Merged
merged 9 commits into from
May 21, 2019

Conversation

pmatilai
Copy link
Member

@pmatilai pmatilai commented May 7, 2019

This is a more elaborate version of #226, with various internal issues fixed first for cleaner and simpler resulting code.

@pmatilai pmatilai changed the title Run binary package generation in parallel threads Run binary package generation and file classification in parallel threads May 8, 2019
@pmatilai
Copy link
Member Author

pmatilai commented May 8, 2019

Added parallel file classification, which is beneficial for some cases.

Most of the time, the dependency generator is the biggest bottle-neck however. That's a much tougher nut to crack though: there are huge gains to be made even without parallelisation there, but it needs a massive overhaul to make that possible.

@voxik
Copy link
Contributor

voxik commented May 9, 2019

I just wonder, how this influences the console output. Will it be still always readable or will there happen some output race conditions and it won't be readable anymore?

@pmatilai
Copy link
Member Author

pmatilai commented May 9, 2019

Well, why don't you see for yourself?

These are clearly separate operations that deserve functions of their
own, and this makes memory management nicer. In addition, in case
the directory creation fails we now actually error out instead
of trying to continue, and take care not to fail in case somebody
created it behind our back. No other functional changes though.
No functional changes and doesn't make much difference here, but we'll
need this later.
Now that we can, split the "lets run something on generated packages"
check out of packageBinaries(), it doesn't belong there at all. No
functional changes other fix attempt to check non-built packages
which return with no filename but RPMRC_OK from packageBinary().
No functional change, but we'll need this later.
The comment has been wrong for more than twenty years...
No functional changes here, but this will make a difference in the
next commits.
Enable OpenMP use in librpmbuild and set the number of OMP threads
from rpm config after spec parsing. The place matters as we want to
allow individual specs to control and disable parallel builds.
It's worth noting that this really is walking on thin ice as only few
parts of rpm are thread-protected. The spec is entirely unprotected so
must be accessed only for read-only purposes from parallel jobs (and
we should work towards enforcing that via const-correctness and other
protection as needed), and similarly the package struct and headers
are unprotected so they can only be manipulated within a single thread.

Based on initial work by Alexander Kanavin in PR rpm-software-management#226
@pmatilai
Copy link
Member Author

@ffesti spotted a bug in the binaryPackages() for loop logic, fixed in the first push but then a regression introduced in commit 0f21bdd caused one test to fail. Now rebased on top of the regression fix... pffft.

Our file classification is not exactly in large quantities. Fedora's
kernel-debuginfo-common has circa 25000 files, and classifying them
serially takes about a minute on my rusty old T520. With parallelization
this goes down to ~24s.

It's all remarkably simple, except for the fact that libmagic is not
thread-safe so we need separate magic handles for each of our threads.
This will leak those libmagic handles on error situations, I don't see
any obvious, nice way to handle that.
@pmatilai
Copy link
Member Author

pmatilai commented May 14, 2019

Maybe third time's the charm... while walking the dog I remembered the hash table added in commit e33045e that wasn't there when these patches were initially created needs thread protection now.

@ffesti ffesti merged commit 41f0e21 into rpm-software-management:master May 21, 2019
@pmatilai pmatilai deleted the parallel-build-pr branch May 21, 2019 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants