Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design mechanism to keep `pants.pex` up-to-date #8209

Closed
Eric-Arellano opened this issue Aug 25, 2019 · 6 comments

Comments

@Eric-Arellano
Copy link
Contributor

commented Aug 25, 2019

Motivation

We must now depend on pants.pex for integration tests, rather than pants, but this introduces the risk of the file becoming out of date and Pants devs forgetting to regenerate it. See
#8183 (comment).

Design objectives

  • Autogeneration only runs when a test is being run. Even better, only when an integration test is being run.
  • Switching between a branch does not require having to bootstrap again if that version was already generated.
    • Just like how we handle bootstrapping the native engine, that we store multiple versions.
  • This works both locally and in CI, where CI uses an uploaded PEX
    • Maybe this can be used to solve the problem in CI of the uploaded PEX being out of date with the actual source code because of how GitHub tries to merge against master whenever a new shard starts

Possible design

  1. We fingerprint all source code, as this is what's used to build the PEX. We include the Python version also. Similar to how we fingerprint the engine:
    function calculate_current_hash() {
    # Cached and unstaged files, with ignored files excluded.
    # NB: We fork a subshell because one or both of `ls-files`/`hash-object` are
    # sensitive to the CWD, and the `--work-tree` option doesn't seem to resolve that.
    #
    # Assumes we're in the venv that will be used to build the native engine.
    (
    cd "${REPO_ROOT}" || exit 1
    (echo "${MODE_FLAG}"
    echo "${RUST_TOOLCHAIN}"
    uname
    python --version 2>&1
    git ls-files -c -o --exclude-standard \
    "${NATIVE_ROOT}" \
    "${REPO_ROOT}/rust-toolchain" \
    "${REPO_ROOT}/src/python/pants/engine/native.py" \
    "${REPO_ROOT}/build-support/bin/native" \
    "${REPO_ROOT}/3rdparty/python/requirements.txt" \
    | grep -v -E -e "/BUILD$" -e "/[^/]*\.md$" \
    | git hash-object -t blob --stdin-paths) | fingerprint_data
    )
    }
  2. We start storing in ~/.cache every pants.pex ever generated, using the fingerprint as the unique ID.
  3. When test is in the command line args, some new script will check if that fingerprint already exists and bootstrap if not. It will symlink the specific pants.pex desired into the build root as pants.pex without the fingerpint in the name.

Open questions

How will this work in CI?

Problem: the test shards won't have a cache with all the generated pants.pex fingerprints, and we don't want them to ever bootstrap. They should use whatever is uploaded.

Possible solution: check for the env var CI.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2019

Possible solution: check for the env var CI.

Prior art for determining in the ./pants script whether we are running in CI or locally is in #7151 and linked PRs, in case that helps anyone.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2019

Also very relevant prior art for thinking about reaching into the command-line goals for pants logic is in #7205!

@Eric-Arellano

This comment has been minimized.

Copy link
Contributor Author

commented Aug 26, 2019

I'm implemented parts 1 and 2 through #8183, i.e. the script to generate and cache pants.pex.

Now, we have to figure out step 3: when to actually trigger bootstrap_pants_pex.sh. Ideally, we will call this script any time we run an integration test.

There appear to be two approaches:

  1. Crude approach of ./pants script, where we check if goal is in $@.
    • Downside is that this runs for unit tests as well.
  2. Use pants-plugins.
    • In V1, we add a line like this:
    def register_goals():
      task(name='prepare-pants-pex', action=BootstrapPantsPex).install('test', before="pytest")
    • Open question: is this possible to do in V2? We want to add a hook before ./pants --no-v1 --v2 test is ran that will call the script. However, we should not modify src code at all to do this.

We could technically do a 3rd approach of modifying src/python/pants to implement this functionality, but we do not want to add any special logic to src because it is solely useful for this repo and no others.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2019

Ideally, we will call this script any time we run an integration test.

Is there a reason we can't do this in pants_run_integration_test.py itself? Specifically here:

pants_script = os.path.join(build_root or get_buildroot(), self.PANTS_SCRIPT_NAME)

It seems like it's perhaps possible to run the pex bootstrap script where we obtain the path to pants_script here?

@stuhood

This comment has been minimized.

Copy link
Member

commented Aug 26, 2019

Is there a reason we can't do this in pants_run_integration_test.py itself? Specifically here:

That code will already be running inside the remoting sandbox, because it's part of what PyTest itself is running.


@Eric-Arellano : Re: Use pants-plugins and v1/v2: the idea behind that suggestion was that if you did something eagerly at register.py execution time (ie, not by registering a task, but by just "having a side effect"), it would happen "during plugin loading", which is very early in the pants process, before options have been parsed, and any tasks have begun. Since it is so early, you won't be able to inspect options without extreme hacks. But one interesting possibility would be that you could directly fingerprinting sys.path, which reduces the hacks in one dimension, at least.

It's also more likely that doing it in this location could evolve into a more native "pants startup hooks" facility (although it's not clear how end users would use it...).

Eric-Arellano added a commit that referenced this issue Sep 2, 2019
Use `./pants.pex`, not `./pants`, to run internal integration tests (#…
…8183)

## Changes to get `--chroot` working
As prework to running integration tests with `--chroot` and then [remoting](#8113), we first must make two major changes to how we handle integration tests:

1) Having integration tests depend on `//:build_root`. This ensures that `get_buildroot()` properly resolves to the chrooted folder in `.pants.d`, rather than traversing all the way up to the original build root.
2) Using `./pants.pex` rather than `./pants` in integration tests. Because V1 strips the prefix of source roots, whereby `src/python/pants` becomes `pants`, we cannot also have a file named `pants` in the same build root. Instead, we can safely depend on `./pants.pex`.

## Autogeneration of `pants.pex`
Now that we depend on `pants.pex`, it is very easy for this file to fall out-of-date and to unintentionally be testing old code.

We work around this via the design in #8209. First, we start caching built PEXes in `~/.cache/pants`, with the ID derived from all the source files used for Pants. This allows us to quickly switch between branches without having to regenerate the PEX every time.

Then, we teach `./pants` to run `bootstrap_pants_pex.sh` whenever `test` is one of the goals and it's not CI. This will ensure that the file is always up-to-date.
@Eric-Arellano

This comment has been minimized.

Copy link
Contributor Author

commented Sep 2, 2019

Closing this out thanks to #8183.

We may still want to formalize the mechanism a bit more, per #8183 (review), but that will be left for a separate issue if we decide to that. The core functionality we need is now implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.