Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid rebuilding duplicate universal wheels for every CI shard #7258

Closed
Eric-Arellano opened this issue Feb 19, 2019 · 3 comments

Comments

Projects
None yet
2 participants
@Eric-Arellano
Copy link
Contributor

commented Feb 19, 2019

Problem

We currently build 19 Pants wheels for every Travis build wheels shard, as enumerated in packages.py. Only pantsbuild.pants is platform-specific and has a specified ABI; i.e., the other 18 are pure Python. Because they are pure Python, those 18 wheels only need to be built once.

Currently, while we only need to build those 18 pure Python wheels, we always build all 36 wheels (OSX wheels and Linux Wheels). With #7235, we will build 72 wheels, which is 54 more than we need. Once we add Python 3 wheels, we will be building 108 wheels, which is 90 more than we need.

Building these wheels multiple times presents a time and a monetary cost. We waste time in Travis rebuilding something we don't need. We waste money through S3 transfer costs (storage costs are not an issue because of duplicate file names meaning we don't store more than one copy of the file).

Note this also extends to 3rd party wheels, which we build in order to release a Pex. Some of them are platform and/or ABI specific, such as cryptography, but many are universal.

Potential solution - flag to release.sh

Develop a flag, such as -u, for release.sh that specifies the script should only build wheels that are not universal, i.e. are platform and/or abi specific. Then, modify .travis.yml to have only one of the Build Wheels shards build every wheel and have the others use -u to skip this.

This would work well for avoiding rebuilding our first party Pants wheels, as -u would mean skip all Python packages beyond pantsbuild.pants.

However, the solution falls apart when we consider 3rd party wheels. We do not know of an automatic way to know which are universal easily. We could use a whitelist, but this would require a solution to ensure the whitelist does not fall out of date. Maybe we could for now just ignore 3rd party wheels and accept that we will rebuild them every time?

Potential solution - s3 sync

@jsirois suggested using S3 Sync to upload our wheels, rather than dumping them directly to S3 as we seem to do now. The idea is that S3 Sync will check that we have the wheel already, so will avoid initiating a transfer.

See https://github.com/pantsbuild/binaries/blob/master/sync-s3.sh for a binary we have to implement S3 sync.

This appears to be the preferred solution.

@stuhood

This comment has been minimized.

Copy link
Member

commented Feb 19, 2019

We expect that building both 2 and 3 wheels will only last for two stable releases though, so probably good to size the solution based on the assumption that in the future we'll have only ABI differences, and not python differences.

@Eric-Arellano

This comment has been minimized.

Copy link
Contributor Author

commented Feb 19, 2019

Yes, good point @stuhood that the scope becomes smaller once we drop Py2.

We will still have some duplication, though. 18 of our wheels are not platform specific. Even though we thus only need one copy of those wheels, so long as we have the Linux and OSX shards, we will continue to build duplicate wheels for them.

Eric-Arellano added a commit that referenced this issue Mar 9, 2019

Specify ABI for pantsbuild.pants wheel and build with both UCS2 and U…
…CS4 (#7235)

### Problem
We should be marking the [ABI (application binary interface)](https://docs.python.org/3/c-api/stable.html) for the `pantsbuild.pants` wheel because it uses native code. Currently, we mark the ABI as `none`, which is incorrect per https://www.python.org/dev/peps/pep-0513/#ucs-2-vs-ucs-4-builds.

In particular, in Python 2, Python may be installed with either UCS2 (UTF-16) or UCS4 (UTF-8). We should be marking the wheel as either `cp27m` for UCS2 or `cp27mu` for UCS4.

As a result of marking the ABI, we must now produce more wheels. macOS defaults to UCS2. For Linux, "ucs4 is much more widespread among Linux CPython distributions." We do not want to rely on these assumptions, however, when releasing, as some users may not have these default unicode settings. So, instead we must release `pantsbuild.pants` as both a `cp27m` and `cp27mu` wheel, and rely on Pip to resolve which the user should use.

### Solution
At a high level, this PR does two things:
1. Marks that the ABI should be specified, rather than `none`.
1. Sets up 4 Travis shards so that we build both a `cp27m` and `cp27mu` wheel for both Linux and OSX. See https://travis-ci.org/pantsbuild/pants/builds/503639333 for the end result of this.

To setup the new shards, we use Pyenv to install new versions of Python 2 with the appropriate unicode settings, thanks to the env var `PYTHON_CONFIGURE_OPTS=--enable-unicode=ucs{2,4}` (https://stackoverflow.com/a/38930764). 

Because both the OSX UCS4 shard and Linux UCS2 shard already have Python 2.7 installed, we must install a Python 2.7.x version different than what is already there, and use `PANTS_PYTHON_SETUP_INTERPRETER_CONSTRAINTS` to ensure Pants and PEX are using the exact interpreter we want. For this reason, this PR was blocked by #7285 to propagate interpreter constraints to PEX.

Specifically, we make these changes to achieve these two high level goals:

1. Modify [`src/python/pants/BUILD`](https://github.com/pantsbuild/pants/pull/7235/files#diff-3ce39309d74098493a1f3c8107292a8d) so that `bdist_wheel` knows it needs to mark the ABI. This achieves goal 1.
1. Change [`release.sh`](https://github.com/pantsbuild/pants/pull/7235/files#diff-9ed7102b7836807dc342cc2246ec4839) to allow pre-setting `$PY` and to also set `PANTS_PYTHON_SETUP_INTERPRETER_CONSTRAINTS` in order to use the specific Python interpreter we are targeting.
1. Create [`travis_ci_py27_ucs2/Dockerfile`](https://github.com/pantsbuild/pants/pull/7235/files#diff-b90425bcccb6969a98b9d1c4066422a8) to get Python 2 w/ UCS2 onto Linux.
1. Extract out `.travis.yml` `env` entries to get OpenSSL and Homebrew-installed Python working properly, along with launching a Docker image, in order to avoid duplication: [`env_osx_with_pyenv.mustache`](https://github.com/pantsbuild/pants/pull/7235/files#diff-c2ab029a1887e2e99f2391fc7568cc5f), [`launch_docker_image.mustache`](https://github.com/pantsbuild/pants/pull/7235/files#diff-3d4a0cecc373a624f233521f76d66999), and [`generate_travis_yml.py`](https://github.com/pantsbuild/pants/pull/7235/files#diff-c11e2f109e12527d2e1ac2c62161edf6).
1. Modify [`travis.yml.mustache`](https://github.com/pantsbuild/pants/pull/7235/files#diff-88af3146f5cc486b749ed790399bde46) to set up the 4 distinct wheel building shards. Similar to how we created a Dockerfile to use Pyenv to install Python 2 with UCS2 on Linux, we use Pyenv to install Python 2 with UCS4 on OSX. We also move the wheel building shards below unit tests.

#### Ensuring the correct abi is used
We need to ensure the `pants.pex` used by `release.sh` has the correct abi for its dependencies, and that `release.sh` is using the correct Python interpreter. We introduce a new script [`check_pants_pex_abi.py`](https://github.com/pantsbuild/pants/pull/7235/files#diff-8b857b8cee6cb9784bf37950220c587b) that inspects the pex's `PEX-INFO` to ensure the targeted abi was used.

An even better test would test the result of `release.sh` to ensure the built `pantsbuild.pants` wheel has the correct ABI and can be consumed properly. Currently `release.sh` verifies the wheel is valid, but it does not enforce which ABI it was built with. This could be a good followup PR.

### Result
We now properly mark the ABI for Python 2. Beyond the new script `check_pants_pex_abi.sh` proving this, we performed a run of this PR with verbose PEX logging turned on: https://travis-ci.org/pantsbuild/pants/builds/503639333. Inspecting the logs for the wheel building shards and searching for `Using the current platform` proves the 4 wheel building shards are using the correct interpreter and abi.

In addition to correctness for Python 2, this unblocks releasing Python 3 wheels (#7197).

Note this should have no significant impact on the end user, as Pip will resolve to the current ABI for their interpreter. It will change the name of our `pantsbuild.pants` wheel and will prevent using that wheel with an interpreter that uses a different UCS setting, but all users should be able to pull down whichever wheel they need as we provide wheels for both UCS2 and UCS4 on both OSX and Linux.

#### Downside: wheel building explosion
We currently are building more wheels than necessary. For wheels that are universal / platform-independent, we only need them to be built once, but we build them every time. See #7258.

This PR adds two new shards so adds ~30 unnecessary core wheels we build, in addition to 3rd party wheels that are universal / platform-independent.
@Eric-Arellano

This comment has been minimized.

Copy link
Contributor Author

commented Jun 18, 2019

While this goes down in impact when we drop Python 2 wheels via #7888, it will within a week or two become a problem again as we introduce Python 3.7 wheel building shards.

@Eric-Arellano Eric-Arellano self-assigned this Jun 18, 2019

Eric-Arellano added a commit that referenced this issue Jun 21, 2019

Use S3 Sync for CI deploys to avoid recopying files (#7895)
We currently copy all of `dist/deploy` into S3, even if the file is an exact copy of something already in S3. This poses a real cost, both monetary and time.

Instead, here we use `aws s3 sync` to only transfer files that are different from what is currently in the bucket.

Will close #7258.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.