Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark various defaults for PEX compression for internal PEXes #15062

Closed
stuhood opened this issue Apr 7, 2022 · 8 comments · Fixed by #20670
Closed

Benchmark various defaults for PEX compression for internal PEXes #15062

stuhood opened this issue Apr 7, 2022 · 8 comments · Fixed by #20670
Labels
backend: Python Python backend-related issues good first issue

Comments

@stuhood
Copy link
Sponsor Member

stuhood commented Apr 7, 2022

Pants uses lots of PEXes "internally", which are never exported for users to consume. Since these are implementation details, we have free-rein to choose an internal-default PEX compression level... and some values can make a considerable difference in performance.

To experiment/adjust for "internal only", logic similar to the logic for PEX's layout setting could be added, which set a default compression value for internal use:

# Use any explicitly requested layout, or Packed for internal PEXes (which is a much
# friendlier layout for the CAS than Zipapp.)
self.layout = layout or (PexLayout.PACKED if internal_only else PexLayout.ZIPAPP)

@thejcannon thejcannon added the backend: Python Python backend-related issues label Jun 7, 2022
@huonw
Copy link
Contributor

huonw commented Apr 22, 2023

Are there internal PEX that aren't PACKED (grepping for layout=, PexLayout. and internal_only= suggests no)? If all of them are packed, will lowering the compression level do anything?

@jsirois
Copy link
Member

jsirois commented Apr 22, 2023

Yes. A packed layout consists of N + 1 zips. 1 PEX .bootstrap code zip and 1 per installed wheel contained in the PEX under .deps/. By default these N+1 are compressed, but turning off compression turns off their compression just as it does for a single zip zipapp layout.

@cosmicexplorer
Copy link
Contributor

Has anyone tried seeing whether PEX files might start up a little faster too if they're uncompressed?

@jsirois
Copy link
Member

jsirois commented Jul 24, 2023

All PEX files are ~uncompressed (unzipped). Whether --venv or normal (zipapp), and regardless of --layout, A PEX file is extracted in one way or another on 1st run and forever more just re-directs to the extracted location. The re-direct overhead is ~O(50ms) except when using --sh-boot which takes it down to ~1ms.

@cosmicexplorer
Copy link
Contributor

Oh yes! Thanks!

@jsirois
Copy link
Member

jsirois commented Jul 24, 2023

To underscore - the benchmark here is about PEX creation time then, not run time.

@huonw
Copy link
Contributor

huonw commented Dec 14, 2023

With https://github.com/pantsbuild/pex/releases/tag/v2.1.154, potentially we could be considering using --no-pre-install-wheels for internal PEXes, and maybe --max-install-jobs. Quoting pex-tool/pex#2292 (comment), for perf results with PyTorch (i.e. large):

Status quo With --no-pre-install-wheels --no-pre-install-wheels savings
Cold build and run 1st local machine 188.51s 29.80s 84% faster
Cold run 1st remote machine 26.74s 29.60s 11% slower
Cold run 1st remote machine parallel 12.79s 14.06s 10% slower
Size (bytes) 2679282217 2678537958 0.02% smaller

@huonw
Copy link
Contributor

huonw commented Dec 30, 2023

(#20347 updates to pex 2.1.155 by default, but doesn't touch version_constraints, so if we actually start using that flag internally we'll need to either bump the constraints or make it conditional on version.)

huonw added a commit that referenced this issue Jan 9, 2024
All changes:

- https://github.com/pantsbuild/pex/releases/tag/v2.1.153
- https://github.com/pantsbuild/pex/releases/tag/v2.1.154
- https://github.com/pantsbuild/pex/releases/tag/v2.1.155

Highlights:

- `--no-pre-install-wheels` (and `--max-install-jobs`) that likely helps
with:
  - #15062 
  - (the root cause of) #20227
  - _maybe_ arguably #18293, #18965, #19681 
- improved shebang selection, helping with
#19514, but probably not the
full solution (#19925)
- performance improvements
cburroughs added a commit to cburroughs/pants that referenced this issue Mar 29, 2024
Changelog: https://github.com/pex-tool/pex/releases/tag/v2.3.0

 - `sync` of interest for pantsbuild#15704
 - error message clarification regarding pantsbuild#15062
 - fix for explicit flags as implemented pantsbuild#20598

```
Lockfile diff: 3rdparty/python/user_reqs.lock [python-default]

==                    Upgraded dependencies                     ==

  asgiref                        3.7.2        -->   3.8.1
  cryptography                   42.0.3       -->   42.0.5
  pex                            2.2.1        -->   2.3.0
  pyparsing                      3.1.1        -->   3.1.2
  python-dateutil                2.8.2        -->   2.9.0.post0
  sniffio                        1.3.0        -->   1.3.1
```
huonw pushed a commit that referenced this issue Mar 31, 2024
Changelog: https://github.com/pex-tool/pex/releases/tag/v2.3.0

 - `sync` of interest for #15704
 - error message clarification regarding #15062
 - fix for explicit flags as implemented #20598

```
Lockfile diff: 3rdparty/python/user_reqs.lock [python-default]

==                    Upgraded dependencies                     ==

  asgiref                        3.7.2        -->   3.8.1
  cryptography                   42.0.3       -->   42.0.5
  pex                            2.2.1        -->   2.3.0
  pyparsing                      3.1.1        -->   3.1.2
  python-dateutil                2.8.2        -->   2.9.0.post0
  sniffio                        1.3.0        -->   1.3.1
```
huonw added a commit that referenced this issue Apr 15, 2024
…s of internal pexes (#20670)

This has all internal PEXes be built with settings to improve
performance:

- with `--no-pre-install-wheels`, to package `.whl` directly rather than
unpack and install them. (NB. this requires Pex 2.3.0 to pick up
pex-tool/pex#2392)
- with `PEX_MAX_INSTALL_JOBS`, to use more concurrency for install, when
available

This is designed to be a performance improvement for any processing
where Pants synthesises a PEX internally, like `pants run
path/to/script.py` or `pants test ...`.
pex-tool/pex#2292 has benchmarks for the PEX
tool itself.

For benchmarks, I did some more purposeful ones with tensorflow (PyTorch
seems a bit awkward hard to set-up and Tensorflow is still huge), using
https://gist.github.com/huonw/0560f5aaa34630b68bfb7e0995e99285 . I did 3
runs each of two goals, with 2.21.0.dev4 and with `PANTS_SOURCE`
pointing to this PR, and pulled the numbers out by finding the relevant
log lines:

- `pants --no-local-cache --no-pantsd --named-caches-dir=$(mktemp -d)
test example_test.py`. This involves building 4 separate PEXes partially
in parallel, partially sequentially: `requirements.pex`,
`local_dists.pex` `pytest.pex`, and then `pytest_runner.pex`. The first
and last are the interesting ones for this test.
- `pants --no-local-cache --no-pantsd --named-caches-dir=$(mktemp -d)
run script.py`. This just builds the requirements into `script.pex`.

(NB. these are potentially unrealistic in they're running with all
caching turned off or cleared, so are truly a worst case. This means
they're downloading tensorflow wheels and all the others, each time,
which takes about 30s on my 100Mbit/s connection. Faster connections
will thus see a higher ratio of benefit.)

| goal                | period                       | before (s) | after (s) |
|---------------------|------------------------------|-----------:|----------:|
| `run script.py`     | building requirements        |      74-82 |     49-52 |
| `test some_test.py` | building requirements        |      67-71 |     30-36 |
|                     | building pytest runner       |        8-9 |     17-18 |
|                     | total to start running tests |      76-80 |     53-58 |
 
I also did more adhoc ones on a real-world work repo of mine, which
doesn't use any of the big ML libraries, just running some basic goals
once.

| goal                                              | period                                  | before (s) | after (s) |    |
|---------------------------------------------------|-----------------------------------------|-----------:|----------:|----|
| `pants export` on largest resolve                 | building requirements                   |         66 |        35 |    |
|                                                   | total                                   |         82 |        54 |    |
| "random" `pants test path/to/file.py` (1 attempt) | building requirements and pytest runner |          1 |        49 | 38 |

Fixes #15062
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: Python Python backend-related issues good first issue
Projects
None yet
5 participants