Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash when used in python 3.8 with hmac #851

Closed
eduardomezencio opened this issue Jan 19, 2020 · 13 comments
Closed

Crash when used in python 3.8 with hmac #851

eduardomezencio opened this issue Jan 19, 2020 · 13 comments
Assignees
Labels
Milestone

Comments

@eduardomezencio
Copy link

@eduardomezencio eduardomezencio commented Jan 19, 2020

Edit: I renamed the issue now that I found that bezier is not needed to reproduce the problem. Please, see my next comment.

It's quite difficult to point who's to blame for this issue, but I'm reporting here since panda3d is one of the packages involved.

Using python 3.8.1, panda3d 1.10.5, bezier 2020.1.14 in arch linux.

The problem is that if I import the bezier package before I instantiate ShowBase, nothing wrong happens. Importing it after causes a crash with message "malloc(): invalid size (unsorted)"

The problem happens while importing bezier, so it makes more sense to report to that package, but the fact that it only happens in this specific situation involving panda3d bugs me. It does not happen with python 3.7, so may be something wrong with python 3.8.

Example without error:

import bezier

from direct.showbase.ShowBase import ShowBase
base = ShowBase()
base.run()

Example with error:

from direct.showbase.ShowBase import ShowBase
base = ShowBase()

import bezier

base.run()
@eduardomezencio eduardomezencio changed the title Crash when used in python 3.8 with pypi bezier package Crash when used in python 3.8 with hmac Jan 19, 2020
@eduardomezencio

This comment has been minimized.

Copy link
Author

@eduardomezencio eduardomezencio commented Jan 19, 2020

After getting in touch with the bezier package developer, he found that the problem happened when importing the default python package hmac, thus removing the bezier package from the equation.

He also found that the error cannot be reproduced on macos, but happens in arch linux and ubuntu 18.04. I still have no idea about windows.

Now here's the more minimal error example:

from direct.showbase.ShowBase import ShowBase
base = ShowBase()
import hmac
@dhermes

This comment has been minimized.

Copy link

@dhermes dhermes commented Jan 20, 2020

Howdy, maintainer of bezier here. I saw the report and got interested. I traced the issue down to a single line in hashlib.py in the standard library. The import of hashlib.py boils down to any (or all) of the following 5 lines:

import _hashlib


_hashlib.openssl_md5()
_hashlib.openssl_sha1()
_hashlib.openssl_sha224()
_hashlib.openssl_sha256()
_hashlib.openssl_sha384()
@CFSworks

This comment has been minimized.

Copy link
Member

@CFSworks CFSworks commented Jan 20, 2020

I've been trying very hard to replicate this in an Arch container and have gotten nowhere so far:

# python
Python 3.8.1 (default, Jan  8 2020, 23:09:20) 
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from direct.showbase.ShowBase import ShowBase
>>> base = ShowBase()
Known pipe types:
  glxGraphicsPipe
(all display modules loaded.)
:display:x11display(warning): Couldn't open input method.
:device(error): Error adding inotify watch on /dev/input: No such file or directory
:device(error): Error opening directory /dev/input: No such file or directory
:audio(error): Couldn't open default OpenAL device
:audio(error): OpenALAudioManager: No open device or context
:audio(error):   OpenALAudioManager is not valid, will use NullAudioManager
:audio(error): Couldn't open default OpenAL device
:audio(error): OpenALAudioManager: No open device or context
:audio(error):   OpenALAudioManager is not valid, will use NullAudioManager
>>> 
>>> import bezier
>>> 

Panda does use OpenSSL, so it's not out of the realm of possibility that there's a conflict with how Panda initializes OpenSSL vs. how _hashlib wants OpenSSL to be initialized, but without seeing this malloc error myself, I can't offer much more than that speculation. :(

@eduardomezencio would you be willing to run the problem snippet with Python running under valgrind? This sounds like memory mismanagement and Valgrind's memcheck tool (the default) is very good at sniffing that out.

@rdb

This comment has been minimized.

Copy link
Member

@rdb rdb commented Jan 20, 2020

I can't reproduce this on Arch Linux with my normal source build, but I get this with a pip-installed build of Panda3D:

free(): invalid pointer
Aborted (core dumped)

Did you also use a pip-installed build?

It is worth noting that when installing the manylinux wheel from pip, Panda statically links with an old version of OpenSSL (1.0.2r), which may not match the version of OpenSSL that _hashlib was built against. So there are two different versions of OpenSSL getting loaded at the same time. I wonder if that is the source of our trouble.

@rdb rdb added the bug label Jan 20, 2020
@eduardomezencio

This comment has been minimized.

Copy link
Author

@eduardomezencio eduardomezencio commented Jan 20, 2020

@rdb, yes, I'm using panda3d installed from pip. My process to reproduce the error was simply to create a virtual environment with python 3.8.1, install panda3d (1.10.5) in this environment using pip and run the code above.

@CFSworks, I tried running it with valgrind, and the output is quite big, so I'll attach it here. I don't know exactly what to look for in this output to check it myself, but it does report errors.

valgrind-output.txt

@rdb

This comment has been minimized.

Copy link
Member

@rdb rdb commented Jan 20, 2020

Apparently, according to nm, OpenSSL symbols are being exported from the Panda libraries. I think we have to compile with the -Wl,--exclude-libs,ALL option to isolate our version of OpenSSL.

@dhermes

This comment was marked as off-topic.

Copy link

@dhermes dhermes commented Jan 20, 2020

Is this project using auditwheel when creating manylinux wheels? I expected to see a .libs directory in your packaged wheel (I'm looking at panda3d-1.10.5-cp38-cp38-manylinux1_x86_64.whl).

Also, I recently realized manylinux1 support is effectively discontinued and starting using manylinux2010 (image is CentOS 6 vs. CentOS 5 for manylinux1).


ALSO: libpandaexpress.so.1.10 🤣


$ auditwheel show panda3d-1.10.5-cp38-cp38-manylinux1_x86_64.whl 

panda3d-1.10.5-cp38-cp38-manylinux1_x86_64.whl is consistent with the
following platform tag: "linux_x86_64".

The wheel references external versioned symbols in these system-
provided shared libraries: libgcc_s.so.1 with versions {'GCC_3.4',
'GCC_3.0'}, libm.so.6 with versions {'GLIBC_2.2.5'}, libdl.so.2 with
versions {'GLIBC_2.2.5'}, libpthread.so.0 with versions
{'GLIBC_2.3.2', 'GLIBC_2.2.5', 'GLIBC_2.3.3'}, libc.so.6 with versions
{'GLIBC_2.3.4', 'GLIBC_2.7', 'GLIBC_2.17', 'GLIBC_2.8', 'GLIBC_2.2.5',
'GLIBC_2.3', 'GLIBC_2.9', 'GLIBC_2.4', 'GLIBC_2.25', 'GLIBC_2.3.2',
'GLIBC_2.6', 'GLIBC_2.11', 'GLIBC_2.16', 'GLIBC_2.14', 'GLIBC_2.15',
'GLIBC_2.5', 'GLIBC_2.12', 'GLIBC_2.3.3'}, libstdc++.so.6 with
versions {'CXXABI_1.3.1', 'GLIBCXX_3.4', 'CXXABI_1.3',
'GLIBCXX_3.4.5'}, libcrypt.so.2 with versions {'XCRYPT_2.0'},
libnsl.so.1 with versions {'GLIBC_2.2.5'}, librt.so.1 with versions
{'GLIBC_2.2.5'}, libutil.so.1 with versions {'GLIBC_2.2.5'},
libresolv.so.2 with versions {'GLIBC_2.2.5'}, libz.so.1 with versions
{'ZLIB_1.2.2', 'ZLIB_1.2.9', 'ZLIB_1.2.3.4'}, libmount.so.1 with
versions {'MOUNT_2.19'}, libthai.so.0 with versions {'LIBTHAI_0.1'},
libdatrie.so.1 with versions {'DATRIE_0.2'}, libpng16.so.16 with
versions {'PNG16_0'}, libuuid.so.1 with versions {'UUID_1.0'},
libbsd.so.0 with versions {'LIBBSD_0.2'}, libblkid.so.1 with versions
{'BLKID_1.0', 'BLKID_2.17', 'BLKID_2.15'}

This constrains the platform tag to "linux_x86_64". In order to
achieve a more compatible tag, you would need to recompile a new wheel
from source on a system with earlier versions of these libraries, such
as a recent manylinux image.
@rdb

This comment was marked as off-topic.

Copy link
Member

@rdb rdb commented Jan 20, 2020

We don't use auditwheel to produce our wheels, but have our own script to do effectively the same thing that auditwheel does. The use of a .libs directory is a numpy convention and is not mentioned in PEP 513.

The dependencies you are seeing come from optional tools and third-party plug-ins that are included, that require more recent Linux versions. Most of what you pasted is a result of the inclusion of the pstats tool, which requires GTK and a host of other dependencies that we can't possibly package in a self-contained way (so we don't bother, and include it on an as-is basis). The core libraries, on the other hand, are manylinux1-compliant, so we still tag it manylinux1.

We don't have a compelling reason to upgrade to manylinux2010 at the moment. manylinux1 is still being updated. When building for manylinux1 becomes a hassle, we will probably upgrade.

@dhermes

This comment was marked as off-topic.

Copy link

@dhermes dhermes commented Jan 21, 2020

The use of a .libs directory is a numpy convention and is not mentioned in PEP 513.

I'm not sure when the usage of .libs originated, but it's now "standard" since auditwheel repair is a recommended part of the build process and it has a flag --lib-sdir that defaults to .libs.

@rdb rdb added this to the 1.10.6 milestone Jan 21, 2020
@rdb

This comment has been minimized.

Copy link
Member

@rdb rdb commented Jan 21, 2020

Back on topic: I produced a build of Panda that linked statically to OpenSSL 1.0.2, and indeed, it crashed when running @dhermes' example code.

Then I set the following:

export LDFLAGS="-Wl,--exclude-libs,libssl.a -Wl,--exclude-libs,libcrypto.a"

…and produced another build of Panda. It no longer crashed, which simultaneously proved my guess about the source of the issue and validated a potential solution.

The catch is that building Panda with this setting increased the total size of the Panda libraries and binaries by a whopping 5%, because OpenSSL is used by several components of Panda, and it ends up duplicated a bunch this way.

I'll keep looking to see if there are ways to avoid this, but we may have to just eat the 5% increase until we move things around a bit more within Panda.

@rdb rdb added the build label Jan 21, 2020
@rdb rdb self-assigned this Jan 21, 2020
rdb added a commit that referenced this issue Jan 21, 2020
This results in size savings for thirdparty libraries that are only used once, and a size increase for libraries used more than once (eg. OpenSSL).  More importantly, it prevents conflicts with other versions of the libraries loaded by other Python modules, such as the version of OpenSSL that the hmac module uses.

We need to be careful to only apply this for packages that are either used once, used in a plug-in module, or if we don't need to pass thirdparty library structures across Panda library boundaries.  For example, I haven't done this for Bullet, since the Bullet symbols need to be available through libpandabullet.so due to the fact that pandabullet contains calls to the Bullet libraries in the inline methods.

Fixes #851
@el-dee

This comment was marked as resolved.

Copy link
Contributor

@el-dee el-dee commented Jan 21, 2020

This commit breaks the build on macOS, clang ld does not support option --exclude-libs

@rdb

This comment was marked as resolved.

Copy link
Member

@rdb rdb commented Jan 21, 2020

I noticed the buildbot failure, I'm on it :)

rdb added a commit that referenced this issue Jan 21, 2020
Fixes build regression introduced in e78ce78 (see #851)
rdb added a commit that referenced this issue Jan 21, 2020
Part two of fix for build regression introduced in e78ce78 (see #851)
@rdb

This comment has been minimized.

Copy link
Member

@rdb rdb commented Jan 21, 2020

I managed to offset the loss by using -Wl,--exclude-libs, on other third-party libraries. My change introduces a savings of a few hundred KB. Deployed games will not need to ship make-prc-key of course, so there will be an additional 1384 KB saved there.

Old (KB) New (KB) Difference Library
13916 12208 - 1708 lib/libp3assimp.so
1848 1572 - 276 lib/libp3dtoolconfig.so.1.10
15604 15268 - 336 lib/libp3ffmpeg.so.1.10
996 984 - 12 lib/libp3openal_audio.so
168 364 + 196 lib/libp3vision.so.1.10
328 264 - 64 lib/libp3vrpn.so.1.10
1916 3200 + 1284 lib/libpandaexpress.so.1.10
20400 19320 - 1080 lib/libpanda.so.1.10
152 1536 + 1384 bin/make-prc-key

You can grab a fixed wheel from the buildbot:
https://buildbot.panda3d.org/downloads/4f4b14dd2b13b420f6641181abdab8455a9b1680/

Thanks to all for helping to track down this issue!

@rdb rdb closed this Jan 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.