python bindings: Load all modules with RTLD_GLOBAL #1618

Merged
merged 1 commit into from Sep 6, 2023

markjdb
Contributor

@markjdb markjdb commented Aug 24, 2023

libdnf's python bindings are implemented by a set of C++ shared objects generated by swig. Some generated code is duplicated between modules, in particular the SwigPyIterator class templates, which use exceptions of type swig::stop_iteration to signal an end-of-iteration condition. The modules do not depend on each other and thus belong to different DAGs from the perspective of the runtime linker.

It turns out that this stop_iteration exception can be thrown between modules. This happens at least during dnf startup with python 3.9:

cli.py(935):         subst.update_from_etc(from_root, varsdir=conf._get_value('varsdir'))
 --- modulename: config, funcname: _get_value
config.py(102):         method = getattr(self._config, name, None)
config.py(103):         if method is None:
config.py(105):         return method().getValue()
 --- modulename: conf, funcname: varsdir
conf.py(1183):         return _conf.ConfigMain_varsdir(self)
 --- modulename: conf, funcname: getValue
conf.py(512):         return _conf.OptionStringList_getValue(self)
 --- modulename: substitutions, funcname: update_from_etc
substitutions.py(47):         for vars_path in varsdir:
 --- modulename: module, funcname: __iter__
module.py(557):         return self.iterator()
 --- modulename: module, funcname: iterator
module.py(555):         return _module.VectorString_iterator(self)
 --- modulename: transaction, funcname: __next__
transaction.py(94):         return _transaction.SwigPyIterator___next__(self)

In particular, the module and transaction modules are somehow both involved: module returns the iterator, and transaction advances the iterator. Both modules contain the same iterator code, so I'm not sure why it works this way. The behaviour is sensitive to import order; for example, if transaction is imported before module, then the code above ends up using module's implementation of SwigPyIterator___next__.

In any case, the use of swig::stop_iteration is broken in the above scenario since the exception is thrown by module with module.so's copy of the swig::stop_iteration type info, and caught by transaction.so using transaction.so's copy of the type info, resulting in an uncaught exception.
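As a rough analogy only (this is plain Python, not the C++ runtime or SWIG-generated code), the mismatch described above resembles two Python modules that each define their own exception class with the same name. The `make_module` helper and the stand-in modules below are hypothetical and exist purely for illustration; the point is that a handler written against one copy of the type does not match an exception raised with the other copy, even though the names are identical.

```python
import types


def make_module(name):
    """Build a throwaway module that defines its own stop_iteration class."""
    mod = types.ModuleType(name)
    exec("class stop_iteration(Exception):\n    pass\n", mod.__dict__)
    return mod


# Two independent "modules", each carrying a private copy of the same type,
# loosely mirroring module.so and transaction.so (hypothetical stand-ins).
module = make_module("module")
transaction = make_module("transaction")


def advance():
    # The "module" side raises its own copy of the exception type.
    raise module.stop_iteration()


def caught_by_transaction():
    """Return True only if transaction's handler matches the exception."""
    try:
        advance()
    except transaction.stop_iteration:
        return True
    except Exception:
        # Same name, different type object: the specific handler was skipped.
        return False


same_name = module.stop_iteration.__name__ == transaction.stop_iteration.__name__
same_type = module.stop_iteration is transaction.stop_iteration
result = caught_by_transaction()
```

Here `same_name` is true while `same_type` and `result` are false. In the C++ case there is no catch-all fallback, so the unmatched exception ends in an abort rather than a `False` return value.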

Work around the problem by loading all modules with RTLD_GLOBAL to ensure that RTTI is unique. This is required when throwing exceptions across DSO boundaries, see https://gcc.gnu.org/faq.html#dso for example.
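A minimal sketch of the general technique, assuming a Unix platform where `sys.setdlopenflags` and `os.RTLD_GLOBAL` are available. This is not the patch itself: the `global_dlopen_flags` helper is a hypothetical name, and the import shown in the comment is a placeholder rather than libdnf's actual package layout.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def global_dlopen_flags():
    """Temporarily make Python dlopen() extension modules with RTLD_GLOBAL."""
    old_flags = sys.getdlopenflags()
    # RTLD_GLOBAL exports each module's symbols to later-loaded objects, so
    # the runtime linker can resolve duplicate type info to a single copy.
    sys.setdlopenflags(os.RTLD_NOW | os.RTLD_GLOBAL)
    try:
        yield old_flags
    finally:
        # Restore the previous flags so unrelated imports are unaffected.
        sys.setdlopenflags(old_flags)


# Hypothetical usage: import the extension modules inside the block.
# with global_dlopen_flags():
#     from . import module, transaction   # placeholder names
```

The flags only affect shared objects loaded while they are set, so every module that shares the duplicated types has to be imported inside the block. A module that was already loaded with the default flags keeps its private copy.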

markjdb added a commit to markjdb/freebsd-ports that referenced this pull request Aug 24, 2023
Otherwise we can end up with duplicate runtime type information which
breaks exception handling in libdnf.

Submitted upstream as a pull request:
rpm-software-management/libdnf#1618

Sponsored by:	Klara, Inc.
@jan-kolarik jan-kolarik self-assigned this Aug 28, 2023
@jan-kolarik
Member

jan-kolarik commented Sep 4, 2023

Hi Mark, thanks for the proposed fix and detailed description! I was looking into it and it kinda seems to me like it's related just to a specific SWIG / Python version, because I was not able to reproduce it with standard setup in current Fedora 38. Could you please share more info about your configuration and the reproducer?

@markjdb
Contributor Author

markjdb commented Sep 4, 2023

> Hi Mark, thanks for the proposed fix and detailed description! I was looking into it and it kinda seems to me like it's related just to a specific SWIG / Python version, because I was not able to reproduce it with standard setup in current Fedora 38. Could you please share more info about your configuration and the reproducer?

Yes, I suspect that this is somewhat toolchain/OS-dependent. I'm running on FreeBSD with python 3.9.17 and swig 4.1.1. The crash can be triggered by running any dnf subcommand, the stack trace looks like this:

(gdb) bt
#0  thr_kill () at thr_kill.S:4
#1  0x0000000825d0dfb4 in __raise (s=s@entry=6) at /root/freebsd/lib/libc/gen/raise.c:50
#2  0x0000000825dbe8c9 in abort () at /root/freebsd/lib/libc/stdlib/abort.c:65
#3  0x00000008392a7d29 in report_failure(_Unwind_Reason_Code, __cxxabiv1::__cxa_exception*) (err=<optimized out>, thrown_exception=0x32072a054da0)
    at /root/freebsd/contrib/libcxxrt/exception.cc:714
#4  0x000000084f6a501e in swig::SwigPyForwardIteratorClosed_T<std::__1::__wrap_iter<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, swig::from_oper<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >::value() const (this=0x32072ae511c0)
    at /usr/home/markj/src/freebsd-ports/sysutils/libdnf/work/.build/bindings/python/CMakeFiles/_module.dir/modulePYTHON_wrap.cxx:4484
#5  0x00000008500ba56a in swig::SwigPyIterator::next() (this=0x32072ae511c0)
    at /usr/home/markj/src/freebsd-ports/sysutils/libdnf/work/.build/bindings/python/CMakeFiles/_transaction.dir/transactionPYTHON_wrap.cxx:3260
#6  0x00000008500ba5a5 in swig::SwigPyIterator::__next__() (this=0x32072ae511c0)
    at /usr/home/markj/src/freebsd-ports/sysutils/libdnf/work/.build/bindings/python/CMakeFiles/_transaction.dir/transactionPYTHON_wrap.cxx:3269
#7  0x000000085006528c in _wrap_SwigPyIterator___next__(_object*, _object*) (self=<module at remote 0x32072b1fcb80>, args=Python Exception <class 'gdb.error'>: There is no member named ma_keys.
)                                                                                                                                                              
    at /usr/home/markj/src/freebsd-ports/sysutils/libdnf/work/.build/bindings/python/CMakeFiles/_transaction.dir/transactionPYTHON_wrap.cxx:6190
#8  0x000000082143d0ef in  () at /usr/local/lib/libpython3.9.so.1.0                                                                                            
#9  0x00000008214c6c0b in  () at /usr/local/lib/libpython3.9.so.1.0
#10 0x00000008214c3bbf in _PyEval_EvalFrameDefault () at /usr/local/lib/libpython3.9.so.1.0
#11 0x00000008213ff9fa in  () at /usr/local/lib/libpython3.9.so.1.0
#12 0x000000082145f623 in  () at /usr/local/lib/libpython3.9.so.1.0                                                                                            
#13 0x000000082145b950 in  () at /usr/local/lib/libpython3.9.so.1.0
#14 0x000000082141c412 in  () at /usr/local/lib/libpython3.9.so.1.0
#15 0x00000008213e8e97 in PySequence_Fast () at /usr/local/lib/libpython3.9.so.1.0        
#16 0x000000085732f4cf in pySequenceConverter(_object*) (pySequence=Python Exception <class 'gdb.error'>: There is no member named ma_keys.
)                                                                              
    at /usr/home/markj/src/freebsd-ports/sysutils/libdnf/work/libdnf-0.70.2/python/hawkey/iutil-py.cpp:340
#17 0x000000085733b7a7 in filter_add(libdnf::Query*, long, int, _object*) (query=0x32072aa215b0, keyname=8, cmp_type=256, match=Python Exception <class 'gdb.error'>: There is no member named ma_keys.
)                                                                              
    at /usr/home/markj/src/freebsd-ports/sysutils/libdnf/work/libdnf-0.70.2/python/hawkey/query-py.cpp:412
#18 0x000000085733a799 in filter_internal(libdnf::Query*, libdnf::Selector*, _object*, _object*, _object*)
    (query=0x32072aa215b0, sltr=0x0, sack=<Sack at remote 0x32072b9f29a0>, args=(), kwds=Python Exception <class 'gdb.error'>: There is no member named ma_keys.
)                                                                              
    at /usr/home/markj/src/freebsd-ports/sysutils/libdnf/work/libdnf-0.70.2/python/hawkey/query-py.cpp:523
#19 0x000000085733cc31 in filterm(_QueryObject*, _object*, _object*) (self=0x32072ba085d0, args=(), kwds=Python Exception <class 'gdb.error'>: There is no member named ma_keys.
)
    at /usr/home/markj/src/freebsd-ports/sysutils/libdnf/work/libdnf-0.70.2/python/hawkey/query-py.cpp:571
#20 0x000000082143d4fd in  () at /usr/local/lib/libpython3.9.so.1.0
#21 0x00000008213ff1de in _PyObject_MakeTpCall () at /usr/local/lib/libpython3.9.so.1.0
#22 0x00000008214c6cdc in  () at /usr/local/lib/libpython3.9.so.1.0
#23 0x00000008214c3cfd in _PyEval_EvalFrameDefault () at /usr/local/lib/libpython3.9.so.1.0
#24 0x00000008213ff9fa in  () at /usr/local/lib/libpython3.9.so.1.0
#25 0x00000008214c6c0b in  () at /usr/local/lib/libpython3.9.so.1.0
#26 0x00000008214c3c57 in _PyEval_EvalFrameDefault () at /usr/local/lib/libpython3.9.so.1.0
#27 0x00000008214c7806 in  () at /usr/local/lib/libpython3.9.so.1.0
#28 0x00000008213ff8f5 in _PyFunction_Vectorcall () at /usr/local/lib/libpython3.9.so.1.0
#29 0x00000008214c6c0b in  () at /usr/local/lib/libpython3.9.so.1.0
#30 0x00000008214c3ba8 in _PyEval_EvalFrameDefault () at /usr/local/lib/libpython3.9.so.1.0
#31 0x00000008213ff9fa in  () at /usr/local/lib/libpython3.9.so.1.0
#32 0x00000008214c6c0b in  () at /usr/local/lib/libpython3.9.so.1.0
#33 0x00000008214c3c57 in _PyEval_EvalFrameDefault () at /usr/local/lib/libpython3.9.so.1.0
#34 0x00000008214c7806 in  () at /usr/local/lib/libpython3.9.so.1.0
#35 0x00000008213ff8f5 in _PyFunction_Vectorcall () at /usr/local/lib/libpython3.9.so.1.0
#36 0x00000008214c6c0b in  () at /usr/local/lib/libpython3.9.so.1.0
#37 0x00000008214c3c57 in _PyEval_EvalFrameDefault () at /usr/local/lib/libpython3.9.so.1.0
#38 0x00000008214c7806 in  () at /usr/local/lib/libpython3.9.so.1.0
#39 0x00000008213ff8f5 in _PyFunction_Vectorcall () at /usr/local/lib/libpython3.9.so.1.0
#40 0x00000008214c6c0b in  () at /usr/local/lib/libpython3.9.so.1.0
#41 0x00000008214c3cfd in _PyEval_EvalFrameDefault () at /usr/local/lib/libpython3.9.so.1.0
#42 0x00000008214c7806 in  () at /usr/local/lib/libpython3.9.so.1.0
#43 0x00000008214bded1 in PyEval_EvalCode () at /usr/local/lib/libpython3.9.so.1.0
#44 0x0000000821503661 in  () at /usr/local/lib/libpython3.9.so.1.0
#45 0x00000008215037f9 in  () at /usr/local/lib/libpython3.9.so.1.0
#46 0x0000000821501c7b in PyRun_SimpleFileExFlags () at /usr/local/lib/libpython3.9.so.1.0
#47 0x000000082151e7d3 in Py_RunMain () at /usr/local/lib/libpython3.9.so.1.0
#48 0x000000082151ecb5 in  () at /usr/local/lib/libpython3.9.so.1.0
#49 0x000000082151ed2a in Py_BytesMain () at /usr/local/lib/libpython3.9.so.1.0
#50 0x0000000825ce2b5a in __libc_start1 (argc=3, argv=0x8209927a8, env=0x8209927c8, cleanup=<optimized out>, mainX=0x201740)
    at /root/freebsd/lib/libc/csu/libc_start1.c:157
#51 0x00000000002016d0 in _start ()

Here the abort occurs because the C++ runtime didn't find a matching exception handler: the different libdnf python modules each bring their own duplicate RTTI with them. (Or maybe the runtime linker behaves differently when the python modules are imported.)

I'm happy to try and grab additional info, but I don't know the libdnf code well enough to understand what exactly is happening in the stack trace above.

@jan-kolarik
Member

Thanks. Although I couldn't reproduce this exact problem on Fedora containers, the fix itself seems pretty safe and I haven't noticed any difference when testing it using dnf command on various containers ...

@jan-kolarik jan-kolarik merged commit 8a8548d into rpm-software-management:dnf-4-master Sep 6, 2023
3 checks passed