Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make a rust ruleset to wrap cargo and build the native engine #6891

Open
wants to merge 19 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@cosmicexplorer
Copy link
Contributor

commented Dec 10, 2018

Problem

It would be great if we could use pants to build the native engine instead of a bunch of (very nice) shell scripts. See the "Result" section for how I plan to obtain a pair of pants in order to run this ruleset.

Solution

  • Create a new backend in src/python/pants/backend/cargo_bootstrap/ with a cargo_dist() target and a BootstrapCargo task.
  • Create a ruleset in rustup.py which attempts to directly translate the logic in build-support/bin/native/bootstrap_rust.sh.
  • Make some changes to the rust code to support doing a yield Get(Snapshot, PathGlobsAndRoot, ...) within the body of an @rule.
  • Add an integration test that builds the native engine using the new ruleset.

Result

After discussion with @stuhood, we think one possible way to use these rules is to modify the ./pants runner script to download the most recent pants release pex, invoke that pex with this backend to build the native engine, then continue as normal (replacing the line that invokes bootstrap_code.sh). We can't do that in this PR because we need the previous release to contain the native engine changes required to support usage of PathGlobsAndRoot in rule bodies.

I could probably use an @console_rule instead of the BootstrapCargo task post #6880.

There are a lot of yield Get(ExecuteProcessResponse, ...) calls which aren't assigned to any variable -- this feels like a code smell and could probably be fixed in a followup PR.

cosmicexplorer added some commits Dec 7, 2018

try

@cosmicexplorer cosmicexplorer requested review from stuhood and illicitonion Dec 10, 2018

@illicitonion
Copy link
Contributor

left a comment

This is really exciting, thanks for putting it together! I think there are ~2 pieces of work we should do before we can merge...

We definitely shouldn't be exposing PathGlobsAndRoot to @rules. There are two reasons this is happening at the moment, as far as I can see:

  1. BinUtils - modifying BinUtil so that it uses UrlToFetch means the engine will do proper hermetic downloading and caching.
  2. Grabbing Snapshots out of ExecuteProcessResults - we should be able to futz with paths enough that we don't need global shared state that side-steps the caching, as long as everything is relative to the pwd.

@memoized_property
def pants_owned_rustup_globs(self):
return PathGlobsAndRoot(PathGlobs(['rustup']), self._cargo_bin_dir)

This comment has been minimized.

Copy link
@illicitonion

illicitonion Dec 10, 2018

Contributor

Doesn't this need a /** at the end?

This comment has been minimized.

Copy link
@stuhood

stuhood Dec 10, 2018

Member

This is the compiled binary I think.

self._cmake.bin_dir(),
self._go_dist.bin_dir(),
self._protoc.bin_dir(),
# TODO: execute this in a sealed path!

This comment has been minimized.

Copy link
@illicitonion

illicitonion Dec 10, 2018

Contributor

Assuming this means drop the env, let's definitely do this before landing :)


@rule(RustupExe, [Select(Rustup)])
def unpack_rustup(rustup):
if not is_executable(rustup.pants_owned_rustup_path):

This comment has been minimized.

Copy link
@illicitonion

illicitonion Dec 10, 2018

Contributor

We shouldn't be building additional non-hermetic caching mechanisms on top of the engine's actual caching...

I suspect the steps we need to follow to make this actually work are:

  1. Modify BinUtil sufficiently that it can under the hood use the UrlToFetch -> Snapshot rule, and so that we can get that Snapshot rather than materialize to a known constant path. The UrlToFetch -> Snapshot rule is cached both in the engine, and in the store, so if it's been run once on a machine, it won't re-download, even if you don't have a daemon.
  2. Have rustup.sh write to (a child of) the pwd, rather than a system global path.

1 doesn't sound too horrible to do as a small separate PR, and then rebase this onto that one.

This comment has been minimized.

Copy link
@stuhood

stuhood Dec 10, 2018

Member

@illicitonion : One important missing bit here is that we do not have a local process execution cache, which means that invoking a process to extract a tarball isn't cached, and would run every time.

I think that with that support, a lot more of this would be cacheable in-engine.

This comment has been minimized.

Copy link
@stuhood

stuhood Dec 11, 2018

Member

Opened #6898 about this: it's worth consider whether that is worth doing as a blocker here... and if not, whether we think it would be useful to allow for lowering-latency/increasing parallelism in the rsc task.

This comment has been minimized.

Copy link
@illicitonion

illicitonion Dec 11, 2018

Contributor

I think a reasonable middle-ground to get this in would be for the is_executable checks to exist in the v1 Task to decide whether to make a product request for a RustupExe from a version, or make a RustupExe from a Snapshot/directory path/something.

Then we'd either block migrating from a v1 Task to a @console_rule on having a local process cache, or acknowledge that this bootstrapping should be rare and not worry about the performance hit of re-extracting a tar file.



class RustUpdateRequest(datatype([
('toolchain_root', text_type),

This comment has been minimized.

Copy link
@illicitonion

illicitonion Dec 10, 2018

Contributor

We should make sure toolchain_root is removed before we we merge :)

env=which_cargo_env,
)
which_cargo_res = yield Get(ExecuteProcessResult, ExecuteProcessRequest, which_cargo_req)
cargo_bin_path = which_cargo_res.stdout.strip()

This comment has been minimized.

Copy link
@illicitonion

illicitonion Dec 10, 2018

Contributor

stdout is a binary_type not a text_type, so for py3-ness there should be a .decode("utf-8") before the .strip()

yield Get(ExecuteProcessResult, ExecuteProcessRequest,
rustup_update_request(['component', 'add', '--toolchain', version] + list(components),
'add components {} to the toolchain'.format(components)))
# NB: We don't use the symlink here, but we keep in the fs for backwards compat for now (see how

This comment has been minimized.

Copy link
@illicitonion

illicitonion Dec 10, 2018

Contributor

Messing with the real filesystem should happen in @console_rules or a v1 Tasks, not in @rules.

Let's pull this into the Task for now, so that from there we get the CargoBin and materialize it, then pass the CargoBin into the next rule we run.


@rule(CargoInstallation, [Select(Rustup), Select(CargoBin)])
def ensure_cargo_installed(rustup, cargo_bin):
if not is_executable(rustup.cargo_ensure_installed_path):

This comment has been minimized.

Copy link
@illicitonion

illicitonion Dec 10, 2018

Contributor

ditto

@@ -89,6 +89,10 @@ def load_plugins(build_configuration, plugins, working_set):
subsystems = entries['global_subsystems'].load()()
build_configuration.register_subsystems(subsystems)

if 'rules' in entries:

This comment has been minimized.

Copy link
@illicitonion

illicitonion Dec 10, 2018

Contributor

Let's pull this change into a separate PR, and add a test to show it works.

'**/*.rs',
exclude=[
zglobs('**/target/*'),
'process_execution/bazel_protos'

This comment has been minimized.

Copy link
@illicitonion

illicitonion Dec 10, 2018

Contributor

Doesn't this need a /**?


# TODO: check that the engine is output to the dist dir, check that the engine works
# somehow?
self.assert_success(pants_run)

This comment has been minimized.

Copy link
@illicitonion

illicitonion Dec 10, 2018

Contributor

Checking that we've produced a .so and that nm or similar claims it has a symbol called _initnative_engine would probably be a reasonable test?

@stuhood
Copy link
Member

left a comment

This is pretty awesome, thanks!

I think that it currently violates a few too many of the Engine's "laws" to be safe to land. The big pieces are: 1) getting things into Snapshots from arbitrary paths, 2) touching the filesystem in @rules 3) checking things out of Snapshots into the workspace.

We should definitely have tickets/design for 1 and 3 before landing this. We should also most likely have implemented one or both of those things.

2 is trickier... we should definitely not be doing this, but we don't currently have a way to enforce that. Design of enforcement is ... larger, and thus probably not a blocker for landing. But we definitely can't land @rules that use import os APIs to access files, so the code just needs changes there. As Daniel said, actually using/expanding the Engine's caching to make poking paths unnecessary for caching feels like a path forward here.

# coding=utf-8
# Copyright 2018 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

This comment has been minimized.

Copy link
@stuhood

stuhood Dec 10, 2018

Member

This should probably be defined under pants-plugins, which would send the signal that this is a backend that is only used in the pantsbuild/pants repo, and not actually shipped (and we shouldn't ship it!).


@memoized_property
def pants_owned_rustup_globs(self):
return PathGlobsAndRoot(PathGlobs(['rustup']), self._cargo_bin_dir)

This comment has been minimized.

Copy link
@stuhood

stuhood Dec 10, 2018

Member

This is the compiled binary I think.

description='execute rustup',
env=rustup.rustup_exec_env(),
)
# TODO: this isn't remotable -- it will modify the local filesystem at CARGO_HOME and

This comment has been minimized.

Copy link
@stuhood

stuhood Dec 10, 2018

Member

This is a blocker... although nothing is preventing access to CWD here, we definitely should not be relying on that access.

So the big pieces here (and I think that they're important to plan/design/hopefully-implement before landing this in order to actually prove out the assumptions of the model):

  1. Using PathGlobsAndRoot from within an @rule is not currently kosher because we cannot invalidate dependencies on files outside of the workdir.
  2. The right way to get files out of a Snapshot and into a local working copy is via https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/scheduler.py#L332-L347... but that is not exposed to @rules, and shouldn't be. It could probably be exposed to @console_rules, with a bit of design work.

This comment has been minimized.

Copy link
@illicitonion

illicitonion Dec 11, 2018

Contributor

For 2, we can just do this in the v1 Task for now, and worry about the design work later :)

This comment has been minimized.

Copy link
@stuhood

stuhood Dec 11, 2018

Member

Fine with that.


@rule(RustupExe, [Select(Rustup)])
def unpack_rustup(rustup):
if not is_executable(rustup.pants_owned_rustup_path):

This comment has been minimized.

Copy link
@stuhood

stuhood Dec 10, 2018

Member

@illicitonion : One important missing bit here is that we do not have a local process execution cache, which means that invoking a process to extract a tarball isn't cached, and would run every time.

I think that with that support, a lot more of this would be cacheable in-engine.

cosmicexplorer added a commit that referenced this pull request Jan 3, 2019

add unit testing for requesting transitively available products (#7015)
### Problem

When working on #6892 (and later #6891) I wasn't sure whether I was seeing an issue in the engine relating to requesting products from transitively linked rules -- I was able to figure out that the engine did not have any such issue, using the `--native-engine-visualize-to` option. However, these tests will help me to avoid worrying about that case in the future.

### Solution

- Add the most basic of testing to `test_scheduler.py`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.