python bindings: Load all modules with RTLD_GLOBAL #1618
libdnf's python bindings are implemented by a set of C++ shared objects generated by swig. Some generated code is duplicated between modules, in particular the SwigPyIterator class templates, which use exceptions of type swig::stop_iteration to signal an end-of-iteration condition. The modules do not depend on each other and thus belong to different DAGs from the perspective of the runtime linker.
It turns out that this stop_iteration exception can be thrown between modules. This happens at least during dnf startup with python 3.9:
cli.py(935): subst.update_from_etc(from_root, varsdir=conf._get_value('varsdir'))
--- modulename: config, funcname: _get_value
config.py(102): method = getattr(self._config, name, None)
config.py(103): if method is None:
config.py(105): return method().getValue()
--- modulename: conf, funcname: varsdir
conf.py(1183): return _conf.ConfigMain_varsdir(self)
--- modulename: conf, funcname: getValue
conf.py(512): return _conf.OptionStringList_getValue(self)
--- modulename: substitutions, funcname: update_from_etc
substitutions.py(47): for vars_path in varsdir:
--- modulename: module, funcname: __iter__
module.py(557): return self.iterator()
--- modulename: module, funcname: iterator
module.py(555): return _module.VectorString_iterator(self)
--- modulename: transaction, funcname: __next__
transaction.py(94): return _transaction.SwigPyIterator___next__(self)
In particular, the module and transaction modules are somehow both involved: module returns the iterator, and transaction advances the iterator. Both modules contain the same iterator code, so I'm not sure why it works this way. The behaviour is sensitive to import order; for example, if transaction is imported before module, then the code above ends up using module's implementation of SwigPyIterator___next__.
In any case, the use of swig::stop_iteration is broken in the above scenario since the exception is thrown by module with module.so's copy of the swig::stop_iteration type info, and caught by transaction.so using transaction.so's copy of the type info, resulting in an uncaught exception. Work around the problem by loading all modules with RTLD_GLOBAL to ensure that RTTI is unique. This is required when throwing exceptions across DSO boundaries, see https://gcc.gnu.org/faq.html#dso for example.
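For readers unfamiliar with how RTLD_GLOBAL is applied from Python, here is a minimal sketch using the standard sys.setdlopenflags() hook. It is not the actual patch; the helper names global_dlopen_flags and import_with_global_symbols are made up for illustration, and the real change may set the flags differently.

```python
import contextlib
import importlib
import os
import sys


@contextlib.contextmanager
def global_dlopen_flags():
    """Temporarily make extension modules load with RTLD_GLOBAL.

    Hypothetical helper, not part of libdnf. While the context is
    active, any shared object imported by the interpreter is opened
    with RTLD_NOW | RTLD_GLOBAL, so its symbols become visible to
    objects loaded later.
    """
    saved = sys.getdlopenflags()
    sys.setdlopenflags(os.RTLD_NOW | os.RTLD_GLOBAL)
    try:
        yield
    finally:
        # Restore whatever flags were in effect before.
        sys.setdlopenflags(saved)


def import_with_global_symbols(names):
    """Import each named module while RTLD_GLOBAL is in effect."""
    with global_dlopen_flags():
        return [importlib.import_module(name) for name in names]
```

The flags only affect modules that are loaded for the first time inside the block; a module already present in sys.modules is not reopened. This is Unix-only, since sys.setdlopenflags() and the os.RTLD_* constants are not available on Windows.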
Otherwise we can end up with duplicate runtime type information which breaks exception handling in libdnf. Submitted upstream as a pull request: rpm-software-management/libdnf#1618 Sponsored by: Klara, Inc.
Hi Mark, thanks for the proposed fix and detailed description! I was looking into it and it kinda seems to me like it's related just to a specific SWIG / Python version, because I was not able to reproduce it with a standard setup in current Fedora 38. Could you please share more info about your configuration and the reproducer?
Yes, I suspect that this is somewhat toolchain/OS-dependent. I'm running on FreeBSD with python 3.9.17 and swig 4.1.1. The crash can be triggered by running any
Here the abort occurs because the C++ runtime didn't find a matching exception handler due to the fact that the different libdnf python modules bring duplicate RTTI in with them. (Or maybe there is some difference in behaviour with the runtime linker when the python modules are imported.) I'm happy to try and grab additional info, but I don't know the libdnf code well enough to understand what exactly is happening in the stack trace above.
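For anyone trying to gather the same kind of information on another platform: the line trace in the description has the format printed by Python's standard trace module, so I assume it was captured that way. A minimal sketch, where textwrap.dedent is only a stand-in for the real dnf entry point:

```python
import contextlib
import io
import textwrap
import trace


def capture_line_trace(func, *args):
    """Run func under the stdlib tracer and return (result, trace text).

    Each traced call prints a "--- modulename: X, funcname: Y" header
    followed by "file.py(N): source" lines as they execute.
    """
    tracer = trace.Trace(count=False, trace=True)
    buffer = io.StringIO()
    # The tracer writes to stdout, so redirect it into a buffer.
    with contextlib.redirect_stdout(buffer):
        result = tracer.runfunc(func, *args)
    return result, buffer.getvalue()


# Stand-in workload; replace with the call that triggers the abort.
result, text = capture_line_trace(textwrap.dedent, "    hello")
```

The same output is available without any code by running the script with python -m trace --trace. Only Python-level frames appear, so the trace shows which generated wrapper module is entered but not what happens inside the shared objects.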
Thanks. Although I couldn't reproduce this exact problem on Fedora containers, the fix itself seems pretty safe and I haven't noticed any difference when testing it using
Merged commit 8a8548d into rpm-software-management:dnf-4-master