ENH: strip debug symbols from macOS wheels #51900

Closed
1 of 3 tasks
jameslamb opened this issue Mar 11, 2023 · 3 comments · Fixed by #51971
Labels
Build Library building on various platforms Enhancement Linux Linux OS OS X Related to Mac OS & hardware issues (M1)

Comments

@jameslamb
Contributor

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I found tonight that .so files in pandas macOS wheels contain debug symbols.

By my estimate, these add around 1 MB compressed and around 4 MB uncompressed to the cp311-cp311-macosx_10_9_universal2 wheel.

how I estimated that (click me)

On my macbook tonight (macOS 12.2, Intel CPU), I downloaded the latest CPython 3.11, macOS universal wheel and checked its size with du.

WHEEL_FILENAME='pandas-1.5.3-cp311-cp311-macosx_10_9_universal2.whl'
du -h ${WHEEL_FILENAME}
# 18M

Next, I unzipped it and used dsymutil to check for debug symbols.

mkdir unpack-dir
cp \
    ./${WHEEL_FILENAME} \
    ./unpack-dir

cd unpack-dir
unzip ./${WHEEL_FILENAME}

# check uncompressed size
du -sh pandas
# 65M

du -h pandas/_libs/missing.cpython-311-darwin.so
# 524K

dsymutil -s pandas/_libs/missing.cpython-311-darwin.so | grep N_OSO

That showed some (and, interestingly, they seem to include file paths from a CI system's runner image 👀):

# [   765] 00007670 66 (N_OSO        ) 03     0001   0000000063c8cc76 '/Users/runner/work/1/s/pandas/build/temp.macosx-10.9-x86_64-cpython-311/pandas/_libs/missing.o'
# [   261] 00002a95 66 (N_OSO        ) 00     0001   0000000063c8cef7 '/Users/runner/work/1/s/pandas/build/temp.macosx-11.0-arm64-cpython-311/pandas/_libs/missing.o'

Next, I tried stripping all of the .so files using OSX strip (docs link).

find \
    $(pwd)/pandas \
    -type f \
    -name '*.so' \
    -exec strip -S '{}' \;

Warning

This did produce warnings like the following: /Library/Developer/CommandLineTools/usr/bin/strip: warning: changes being made to the file will invalidate the code signature in: /private/tmp/check-pandas/unpack-dir/pandas/_libs/interval.cpython-311-darwin.so (for architecture arm64)

So maybe that's not the best approach for pandas' actual build pipeline.

Then I packed the wheel back up.

# check uncompressed size again
du -sh pandas
# 61M

rm ./${WHEEL_FILENAME}
zip -r ${WHEEL_FILENAME} .

# check compressed size again
du -h ${WHEEL_FILENAME}
# 17M

Then installed it (making sure the environment didn't have pandas installed before), and ran the tests.

pip uninstall --yes pandas
pip install ${WHEEL_FILENAME}
pip install pytest hypothesis
python -c "import pandas as pd; pd.test()"

Installation worked, and the tests all ran to completion, with the following results:

= 1 failed, 152571 passed, 24052 skipped, 1362 xfailed, 12 xpassed, 1762 warnings, 23 errors in 1144.98s (0:19:04) =

I checked all of the wheels (for all platforms) from the 1.5.3 release (PyPI link) and only found debug symbols in the macOS ones.

I did not repeat the analysis above to estimate the size impact of those symbols for any wheels other than the cp311-cp311-macosx_10_9_universal2 one.
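Because du -h rounds to whole megabytes, the before/after comparison above is fairly coarse. A minimal sketch of a byte-exact measurement is below. `so_total_bytes` is a hypothetical helper name used only for illustration; it is not part of pandas or any packaging tool.

```shell
# Print the combined size, in bytes, of every *.so file under a directory.
# (so_total_bytes is a made-up helper name for this sketch.)
so_total_bytes() {
    # Concatenate all matching files and count the bytes; `tr` removes the
    # padding that some `wc` implementations (e.g. on macOS) add.
    find "$1" -type f -name '*.so' -exec cat {} + | wc -c | tr -d ' '
}

# Possible usage inside the unpacked wheel directory:
#   before=$(so_total_bytes pandas)
#   find pandas -type f -name '*.so' -exec strip -S '{}' \;
#   after=$(so_total_bytes pandas)
#   echo "saved $(( before - after )) bytes"
```

This only measures the uncompressed shared objects; the compressed saving still has to be read off the re-zipped wheel.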

Feature Description

If the inclusion of these symbols is not intentional and if I'm right that they're not necessary, please consider removing them.

This might be accomplished by avoiding them in the first place, e.g.:

Or by stripping those built objects after the fact.

  • e.g. with strip -S _file.so on macOS
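A rough sketch of the after-the-fact approach, as an alternative to the manual unzip/zip steps shown earlier: it assumes the `wheel` package is installed, and uses its `unpack` and `pack` commands so that the wheel's RECORD file is regenerated instead of being left stale. The function names `strip_tree` and `strip_wheel` and the `STRIP` override are hypothetical, chosen for illustration only.

```shell
# Strip debug symbols from every *.so under a directory.
# STRIP can be overridden to point at a different strip binary.
strip_tree() {
    find "$1" -type f -name '*.so' -exec "${STRIP:-strip}" -S {} \;
}

# Unpack a wheel, strip its shared objects, and repack it into out_dir.
# Assumption: `python -m wheel` is available in the current environment.
strip_wheel() {
    sw_wheel_file="$1"
    sw_out_dir="$2"
    sw_work_dir=$(mktemp -d) || return 1
    python -m wheel unpack "$sw_wheel_file" -d "$sw_work_dir" &&
        strip_tree "$sw_work_dir" &&
        mkdir -p "$sw_out_dir" &&
        python -m wheel pack "$sw_work_dir"/* -d "$sw_out_dir"
    sw_status=$?
    rm -rf "$sw_work_dir"
    return "$sw_status"
}

# Possible usage:
#   strip_wheel pandas-1.5.3-cp311-cp311-macosx_10_9_universal2.whl ./stripped
```

This sketch does nothing about the code-signature warning mentioned above, so on arm64 the stripped objects may still need to be re-signed by some later step.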

I'm not familiar enough with pandas' build system and preferred toolchain to offer more specific recommendations, sorry.

Alternative Solutions

Instead of a pandas-specific fix, it might be worth adding support to delocate similar to how auditwheel supports stripping after the fact with auditwheel repair --strip for Linux.

I'm not aware of another such tool that works with macOS wheels containing mach-o format objects.

Additional Context

Relevant Discussions

I can see there was some discussion about stripping this project's wheels back in 2018, although that looks to be mostly about Linux wheels:

This conversation from 2020 contains some details about building pandas on macOS with clang:

Other misc. discussions from similar projects about stripping debug symbols while building Python wheels.

Notes for Reviewers

Thanks very much for your time and consideration!

@jameslamb jameslamb added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 11, 2023
@jameslamb
Contributor Author

One other note for reviewers... I checked all of the wheels in the 2.0.0 release candidate (PyPI link) and found that, in addition to the macOS wheels, the manylinux wheels there also contain debug symbols.

evidence of that (click me)
mkdir /tmp/linux-wheels
cd /tmp/linux-wheels

curl -O \
https://files.pythonhosted.org/packages/3e/55/9212e3cca8c3c2ac8cf6ea85491080e011523bf5a0886782110ef69dfbe5/pandas-2.0.0rc0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

unzip pandas-2.0.0rc0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

# list all symbols
docker run \
    --rm \
    -v $(pwd):/opt/check \
    --workdir /opt/check \
    python:3.11  \
   nm --debug-syms pandas/_libs/algos.cpython-311-x86_64-linux-gnu.so

# show that strip reduces the size
docker run \
    --rm \
    -v $(pwd):/opt/check \
    --workdir /opt/check \
    python:3.11  \
    du -h pandas/_libs/algos.cpython-311-x86_64-linux-gnu.so
# 2.2M

docker run \
    --rm \
    -v $(pwd):/opt/check \
    --workdir /opt/check \
    python:3.11  \
    /bin/bash -c 'strip --strip-unneeded pandas/_libs/algos.cpython-311-x86_64-linux-gnu.so && du -h pandas/_libs/algos.cpython-311-x86_64-linux-gnu.so'
# 2.0M
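One caveat on the check above: strip --strip-unneeded also removes ordinary symbol-table entries, so a size drop by itself does not prove that DWARF debug sections were present. A small sketch for telling the two apart is below, assuming binutils' readelf is available in the container. `lists_debug_sections` is a hypothetical helper name.

```shell
# Read `readelf -S` output on stdin and succeed if it lists any
# DWARF section (.debug_info, .debug_line, ...).
lists_debug_sections() {
    grep -q '[.]debug_'
}

# Possible usage inside the container, from the unpacked wheel directory:
#   find pandas -type f -name '*.so' | while IFS= read -r so; do
#       readelf -S --wide "$so" | lists_debug_sections && echo "debug sections: $so"
#   done
```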

Is that expected? If not, would you like me to open a separate issue documenting it?

@lithomas1
Member

Hi @jameslamb,
Thanks for reporting this.

The macOS issue is expected since we can't pass flags for stripping to the linker.
I'll try to look into this (I think if this is done before delocate is run, delocate might be able to fix the signatures?).

I'll try to look into the manylinux issue more. We migrated to a new build system with cibuildwheel, and currently strip the wheels with -Wl,--strip-debug.

When I added this, I kind of eyeballed the wheel sizes, but I'll try --strip-all to see if that makes a difference.
multi-build/multibuild#162 may also be related.
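For reference, one way such an experiment could be wired up is through cibuildwheel's per-platform environment option. This is only a sketch: where pandas actually sets its linker flags (pyproject.toml, CI configuration, or elsewhere) is not stated in this thread, so the snippet below is an assumption about the setup rather than a description of it.

```shell
# Hypothetical CI snippet: pass --strip-all to the GNU linker for Linux builds only.
export CIBW_ENVIRONMENT_LINUX='LDFLAGS="-Wl,--strip-all"'

# Then, with cibuildwheel installed:
#   python -m cibuildwheel --platform linux
```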

@lithomas1 lithomas1 added Build Library building on various platforms Linux Linux OS OS X Related to Mac OS & hardware issues (M1) and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 13, 2023
@jameslamb
Contributor Author

The macOS issue is expected since we can't pass flags for stripping to the linker

Thanks for looking into it! I'm not that familiar with pandas' build toolchain, so sorry that I can't offer more specific advice. As a general comment about building C/C++ wheels, though: link time doesn't have to be the only place to try to fix this.

You could also investigate whether -g or any of its variants are making it into CXXFLAGS passed to the compiler (and try to avoid that), or try unzipping the wheel after building, stripping objects after-the-fact with the strip utility, and then re-zipping the wheel (like in the example code I provided above).
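A minimal sketch of the first suggestion, checking a flag string for debug-info options: `has_debug_flag` is a hypothetical helper, and it only recognizes a few common spellings (-g, -g1 to -g3, -ggdb*, -gdwarf*), treating a later -g0 as switching debug info off again.

```shell
# Succeed if the given compiler flag string requests debug info.
# Later flags win, mirroring how compilers usually treat repeated options.
has_debug_flag() {
    hdf_result=1
    # Intentionally unquoted: split the flag string on whitespace.
    for hdf_flag in $1; do
        case "$hdf_flag" in
            -g0) hdf_result=1 ;;
            -g|-g[1-3]|-ggdb*|-gdwarf*) hdf_result=0 ;;
        esac
    done
    return "$hdf_result"
}

# Possible usage: include the flags that Python itself was built with,
# since build backends may pick those up as defaults.
#   py_cflags=$(python -c "import sysconfig; print(sysconfig.get_config_var('CFLAGS') or '')")
#   has_debug_flag "$py_cflags $CFLAGS $CXXFLAGS" && echo "debug info is being requested"
```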
