
RFE: Move transaction processing to a subprocess #1526

Closed
DemiMarie opened this issue Feb 3, 2021 · 12 comments

Comments

@DemiMarie
Contributor

DemiMarie commented Feb 3, 2021

As described in #1483, performing an RPM transaction from a multithreaded process will very likely result in Undefined Behavior. Furthermore, if RPM performs any database operations with an altered root directory, this will also result in Undefined Behavior, as SQLite will use an incorrect WAL.

This can be fixed by moving all transaction processing to a subprocess. Due to POSIX restrictions on fork() in a multi-threaded process, this subprocess would need to be a separate binary, and would use stdio to communicate with the parent.
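A minimal sketch of the parent side of that design, in C. It uses posix_spawn() rather than fork(), because after fork() in a multi-threaded process the child may only call async-signal-safe functions until it execs. The helper binary, its arguments, and the one-line text request are hypothetical; librpm ships no such helper, and `run_helper()` is an illustrative name. For testing, any program that copies stdin to stdout (such as /bin/cat) can stand in for the helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical: spawn a transaction helper at `path`, send `request` on its
 * stdin, and collect up to reply_len-1 bytes of its stdout into `reply`
 * (reply_len must be >= 1). Returns the helper's exit status, or -1 on error.
 * A production version would use pipe2(O_CLOEXEC) so these fds cannot leak
 * into children spawned concurrently by other threads. */
int run_helper(const char *path, char *const argv[], const char *request,
               char *reply, size_t reply_len)
{
    int to_child[2], from_child[2];
    if (pipe(to_child) != 0)
        return -1;
    if (pipe(from_child) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    /* In the child: pipe ends become stdin/stdout, originals are closed. */
    posix_spawn_file_actions_t fa;
    posix_spawn_file_actions_init(&fa);
    posix_spawn_file_actions_adddup2(&fa, to_child[0], STDIN_FILENO);
    posix_spawn_file_actions_adddup2(&fa, from_child[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&fa, to_child[0]);
    posix_spawn_file_actions_addclose(&fa, to_child[1]);
    posix_spawn_file_actions_addclose(&fa, from_child[0]);
    posix_spawn_file_actions_addclose(&fa, from_child[1]);

    pid_t pid;
    int rc = posix_spawn(&pid, path, &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    close(to_child[0]);    /* parent keeps only the write end of stdin */
    close(from_child[1]);  /* ...and the read end of stdout */
    if (rc != 0) {
        close(to_child[1]);
        close(from_child[0]);
        return -1;
    }

    /* Send the whole request, then close so the helper sees EOF.
     * (A large request could deadlock if the helper fills its stdout pipe
     * first; a real implementation would poll() both descriptors.) */
    size_t len = strlen(request), off = 0;
    while (off < len) {
        ssize_t n = write(to_child[1], request + off, len - off);
        if (n < 0)
            break;
        off += (size_t)n;
    }
    close(to_child[1]);

    /* Read the reply until EOF or until the buffer is full. */
    size_t got = 0;
    while (got + 1 < reply_len) {
        ssize_t n = read(from_child[0], reply + got, reply_len - 1 - got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    reply[got] = '\0';
    close(from_child[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With /bin/cat as the stand-in helper, sending "install foo\n" returns exit status 0 and leaves the same text in the reply buffer.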

@Conan-Kudo
Member

Subprocessing a transaction would make this much more brittle, since it would expose RPM to weaknesses in POSIX itself with respect to replacing system software data. Consider, for example, rpm upgrading rpm: with a separate binary, we would need to design a complex method to handle the rpm binaries being replaced underneath us, rather than relying on the in-memory program footprint that DNF gets by using librpm and holding it in memory while it works through everything.

RPM already has locking semantics to implement a "write once, read many" setup, so I'm not sure we actually need to do much more than beef this up for the SQLite database backend.
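For readers unfamiliar with the pattern, here is a generic sketch of single-writer, many-reader locking across processes using POSIX fcntl() record locks. It is an illustration only, not rpm's actual lock implementation, and `lock_file()`/`unlock_file()` are hypothetical names. One caveat that matters for this thread: fcntl() locks are owned by the process, so they do not exclude two threads of the same process from each other.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: lock the whole file behind fd.
 * shared != 0 requests a read lock (any number of processes may hold one);
 * shared == 0 requests a write lock (exclusive).
 * F_SETLKW blocks until the lock can be granted.
 * Note: the lock belongs to the process, not to the calling thread. */
int lock_file(int fd, int shared)
{
    struct flock fl = {0};
    fl.l_type = shared ? F_RDLCK : F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0; /* 0 means "through end of file", i.e. the whole file */
    return fcntl(fd, F_SETLKW, &fl) == -1 ? -1 : 0;
}

/* Release whatever lock this process holds on the file behind fd. */
int unlock_file(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLK, &fl) == -1 ? -1 : 0;
}
```

A reader would wrap queries in `lock_file(fd, 1)`/`unlock_file(fd)` and a writer would wrap a transaction in `lock_file(fd, 0)`/`unlock_file(fd)`. The descriptor should be opened `O_RDWR`, since a write lock requires a descriptor open for writing.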

@DemiMarie
Contributor Author

If a single subprocess is used for the entire transaction, then I imagine those problems would go away.

@Conan-Kudo
Member

That gets us back to square one, though. Making this MT safe is effectively pointless since we're still constrained to one process no matter what.

@DemiMarie
Contributor Author

Not really. The difference is that the subprocess would be created and managed by librpm itself. That would make librpm itself thread-safe, which is a hard requirement for embedding librpm in certain scenarios.

@Conan-Kudo
Member

In what scenario do you want to embed librpm that both requires this and performs transactions? Pretty much all operations that need to be MT-safe would generally not require doing transactions...

@DemiMarie
Contributor Author

I (and rpm-ostree) want to be able to run a transaction from a multi-threaded parent process. This is only possible if the actual transaction is done in a child process managed by librpm.

@DemiMarie
Contributor Author

Also, Rust, Java, .NET, GLib, and several other languages, runtimes, and frameworks require that all code be thread-safe, full stop. A Java or .NET VM will always have multiple threads running, and a GTK or Qt application must assume that it will. Using the RPM transaction APIs from such a process is currently Undefined Behavior. Not all Rust programs are multi-threaded, but Rust libraries are required to work in multithreaded programs, which means that the librpm.rs bindings are probably unsound.

@Conan-Kudo
Member

> Using the RPM transaction APIs from such a process is currently Undefined Behavior.

All such environments you listed also provide a way to constrain threading behavior when you need to, because it's unrealistic to actually mandate that at the layers below it. Even Python, Perl, and Ruby have this. I know Java definitely does.

The phrase "undefined behavior" (in title case or no) isn't enough in itself to justify breaking the librpm architecture.
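At the C level, one way a caller might enforce such a constraint is to refuse to start a transaction unless the process is currently single-threaded. The sketch below is Linux-specific (it reads the `Threads:` field of /proc/self/status, described in proc(5)), and `current_thread_count()` is a hypothetical helper, not part of librpm. The result is only a snapshot: another library in the process could start a thread immediately afterwards.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, Linux-only helper: return the number of threads in the
 * calling process as reported by /proc/self/status, or -1 on failure. */
int current_thread_count(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    char line[256];
    int threads = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* The relevant line looks like "Threads:\t1". */
        if (strncmp(line, "Threads:", 8) == 0) {
            if (sscanf(line + 8, "%d", &threads) != 1)
                threads = -1;
            break;
        }
    }
    fclose(f);
    return threads;
}
```

A caller could check `current_thread_count() == 1` before calling the transaction APIs and fall back to running the transaction in a separate process otherwise.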

@DemiMarie
Contributor Author

> > Using the RPM transaction APIs from such a process is currently Undefined Behavior.
>
> All such environments you listed also provide a way to constrain threading behavior when you need to, because it's unrealistic to actually mandate that at the layers below it. Even Python, Perl, and Ruby have this. I know Java definitely does.
>
> The phrase "undefined behavior" (in title case or no) isn't enough in itself to justify breaking the librpm architecture.

Java, at least, does not support programs that call chdir(), much less chroot(). #1483 (comment) is an example of this being a problem in the real world.
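To make the chdir()/chroot() point concrete: on POSIX systems the current working directory and the root directory are attributes of the process, not of a thread, so a chdir() or chroot() in one thread changes path resolution for every other thread. The sketch below demonstrates this with chdir(), since chroot() requires privileges; `cwd_seen_by_other_thread()` is a hypothetical test helper. Build with `-pthread`.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int go_pipe[2];
static char seen_by_thread[4096];

/* Worker: wait for the main thread's signal, then record its own cwd. */
static void *report_cwd(void *arg)
{
    char c;
    (void)arg;
    if (read(go_pipe[0], &c, 1) != 1 ||
        getcwd(seen_by_thread, sizeof seen_by_thread) == NULL)
        seen_by_thread[0] = '\0';
    return NULL;
}

/* Start a worker thread, chdir() in the *calling* thread, then let the
 * worker report which directory it sees. Returns 0 on success. */
int cwd_seen_by_other_thread(const char *dir, char *out, size_t out_len)
{
    pthread_t t;
    if (pipe(go_pipe) != 0)
        return -1;
    if (pthread_create(&t, NULL, report_cwd, NULL) != 0) {
        close(go_pipe[0]);
        close(go_pipe[1]);
        return -1;
    }

    /* The worker is already running; change directory only here. */
    int rc = chdir(dir);
    ssize_t w = write(go_pipe[1], "x", 1);
    close(go_pipe[1]); /* also unblocks the worker if the write failed */
    pthread_join(t, NULL);
    close(go_pipe[0]);

    if (rc != 0 || w != 1)
        return -1;
    snprintf(out, out_len, "%s", seen_by_thread);
    return 0;
}
```

The worker thread never calls chdir() itself, yet getcwd() in the worker reports the directory the main thread switched to. Any chroot() performed while other threads are running affects them in the same way.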

@pmatilai
Member

This comes up every now and then. Running the transaction in a sub-process would of course be the sane thing to do, but within the existing rpm architecture it's quite impossible to do in rpm itself. We have no plans to work on this.

@DemiMarie
Contributor Author

@pmatilai what about RPMv6? Asking because RPMv6 can break backward compatibility.

Right now anyone wanting to run RPM transactions from a multithreaded process needs to do this themselves.

@DemiMarie
Contributor Author

> This comes up every now and then. Running the transaction in a sub-process would of course be the sane thing to do, but within the existing rpm architecture it's quite impossible to do in rpm itself. We have no plans to work on this.

Would you mind explaining? I am a bit confused about what you mean by “rpm architecture” here. Could the RPM CLI be expanded to do everything that can be done via the API?
