Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manylinux wheel size could be reduced by 54% #20719

Closed
LouisAmon opened this issue Apr 17, 2018 · 1 comment
Closed

Manylinux wheel size could be reduced by 54% #20719

LouisAmon opened this issue Apr 17, 2018 · 1 comment
Labels
Build Library building on various platforms Duplicate Report Duplicate issue or pull request

Comments

@LouisAmon
Copy link

Code Sample

# Build pandas
python3 -m pip install -t build --no-binary pandas pandas==0.22.0

# Measure size
du -sh build/pandas

# Strip binaries
find build/pandas -name "*.so"|xargs strip

# Measure -drastically reduced- size
du -sh build/pandas

Problem description

When building pandas from source and running the command strip, the resulting folder is 54% lighter than when using the manylinux wheel (via pip).

[...] the strip program removes inessential information from executable binary programs and object files, thus potentially resulting in better performance and sometimes significantly less disk space usage
https://en.wikipedia.org/wiki/Strip_(Unix)

This is probably harmless on most systems, yet it is quite important in size-constrained systems (such as AWS Lambda).

Some developers have resorted to distributing their own stripped binaries (e.g. lambda packages for the serverless framework zappa), but it seems like a makeshift solution.

I think the problem should be solved upstream, as each library should be responsible for packaging their own optimized binaries.

An issue has also been opened regarding numpy, as it is a dependency of pandas and they both end up using a lot of unneccesary disk space.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.87-linuxkit-aufs
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 36.2.7
Cython: None
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

How to replicate

System specifications

Every command below has been executed using the amazonlinux docker image.
https://hub.docker.com/_/amazonlinux/

Pandas version: 0.22.0
Python version: 3.6.2

Prepare docker image
docker run -it amazonlinux bash
yum update -y
yum install -y findutils binutils python36-devel gcc gcc-c++
Install wheel & measure package size
python3 -m pip install -t wheel pandas==0.22.0
du -sh wheel/pandas

--> 111 MB

Try strip:

find wheel/pandas -name "*.so"|xargs strip
du -sh wheel/pandas

--> 52 MB

Almost 50% of the binary size can be stripped.

Build from source & measure package size
python3 -m pip install -t build --no-binary pandas pandas==0.22.0
du -sh build/pandas

--> 107 MB

Try strip:

find build/pandas -name "*.so"|xargs strip
du -sh build/pandas

--> 51 MB

@jreback
Copy link
Contributor

jreback commented Apr 17, 2018

duplicate of MacPython/pandas-wheels#24

@jreback jreback closed this as completed Apr 17, 2018
@jreback jreback added Build Library building on various platforms Duplicate Report Duplicate issue or pull request labels Apr 17, 2018
@jreback jreback added this to the No action milestone Apr 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Library building on various platforms Duplicate Report Duplicate issue or pull request
Projects
None yet
Development

No branches or pull requests

2 participants