Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up run to no longer rebuild a Pex on source file changes #10410

Merged
merged 2 commits into from Jul 21, 2020

Conversation

Eric-Arellano
Copy link
Contributor

Whereas binary must include source files in the PEX, run does not need to. We get less cache invalidation and generally faster performance by instead having the chroot simply be populated with the source files, similar to how we implement Pytest.

We still use a Pex to handle the entry_point field and to resolve all 3rd party requirements.

Before, with a whitespace change:

▶ /usr/bin/time ./pants run build-support/bin/generate_travis_yml.py > .travis.yml
        2.72 real         0.73 user         0.21 sys

After, with a whitespace change:

▶ /usr/bin/time ./pants run build-support/bin/generate_travis_yml.py > .travis.yml
        1.87 real         0.73 user         0.21 sys

Implements half of #10406.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 0.0% when pulling 2542706 on Eric-Arellano:faster-run into 26c73d8 on pantsbuild:master.

class RunRequest:
digest: Digest
binary_name: str
extra_args: Tuple[str, ...]
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "extra" to me tends to imply that things are added afterward. Maybe prefix_args?

Comment on lines +24 to +26
async def run_python_binary(
field_set: PythonBinaryFieldSet, python_binary_defaults: PythonBinaryDefaults
) -> RunRequest:
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like rather than being binary (or repl) specific, this is connected to TwoStepPex... where rather than creating two pexes (one for requirements, the other for sources, iirc) we instead could use a requirements pex, and a Digest of loose sources.

Repl already uses TwoStepPex, so perhaps starting there would make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. We use TwoStepPex for binary and awslambda, actually. TwoStepPex results in one single Pex file. I think it's important that it continues to behave the same way.

I plan to change repl tomorrow to behave the same way as run. If it helps, I can bundle it into this PR. Only wanted to get this one up as a checkpoint and to make it in time for the release.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. We use TwoStepPex for binary and awslambda, actually. TwoStepPex results in one single Pex file. I think it's important that it continues to behave the same way.

Got it: my mistake.

I plan to change repl tomorrow to behave the same way as run. If it helps, I can bundle it into this PR. Only wanted to get this one up as a checkpoint and to make it in time for the release.

No need to bundle. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No mistake - it was a good question! I was wondering around the same time you wrote the original question "Hm, can we delete TwoStepPex now?"

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 21, 2020

I would maybe edit the title slightly - it's not that we no longer invalidate on source file changes (that would be broken), it's that we no longer rebuild a pex...

@Eric-Arellano Eric-Arellano changed the title Speed up run to no longer invalidate on source file changes Speed up run to no longer rebuild a Pex on source file changes Jul 21, 2020
Copy link
Sponsor Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!



@rule
async def run_python_binary(
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this doesn't actually run the binary: just produces it.

@stuhood stuhood merged commit 64bfcbb into pantsbuild:master Jul 21, 2020
@Eric-Arellano Eric-Arellano deleted the faster-run branch July 21, 2020 20:48
Eric-Arellano added a commit that referenced this pull request Jul 21, 2020
@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 24, 2020

FYI that according to git bisect this broke /pants run build-support/bin/generate_docs.py, because it can no longer see its resources.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 24, 2020

The resources are in the chroot, so maybe some PYTHONPATH issue.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 24, 2020

OK, the problem is that loading relative to __name__ no longer works. Even though AFAICT __name__ is __main__ in both cases.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 24, 2020

I think this is a quirk of the loader system in python, possibly not something we should address here.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 24, 2020

Workaround is to use pkgutil.get_data("generate_docs", f"docs_templates/{name}")instead of pkgutil.get_data(__name__, f"docs_templates/{name}")

@Eric-Arellano
Copy link
Contributor Author

Hm, does that workaround work still if you run ./pants binary and then run the binary? If not, that's concerning.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jul 24, 2020

It does work in that case.

@stuhood
Copy link
Sponsor Member

stuhood commented Jul 28, 2020

It looks like one other visible effect of this change is that there won't be a PEX-INFO file created for the actual binary. We have some code internally that expects it.

@Eric-Arellano
Copy link
Contributor Author

That expects PEX-INFO during run? binary should behave the same as before.

@stuhood
Copy link
Sponsor Member

stuhood commented Jul 28, 2020

That expects PEX-INFO during run? binary should behave the same as before.

It's less that they "expect it during run", and more that they expected the python_binary target to always involve the creation of a pex. I expect that the performance difference between the two implementations makes it worthwhile for them to be different, but it's a consideration.

@Eric-Arellano
Copy link
Contributor Author

Hm, okay. Yeah, I'm thinking about a new user who has never used Pants or Pex. I think they would expect this current implementation. You don't normally need a Pex to run a script, so why would you need it here?

Whereas with binary, we are explicitly saying that the point of it is to create a Pex file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants