New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local process execution can fetch output files #5784

Merged
merged 5 commits into from May 16, 2018

Conversation

Projects
None yet
3 participants
@illicitonion
Copy link
Contributor

illicitonion commented May 4, 2018

This doesn't yet implement output file fetching for remote process
execution, but that's underway in separate PRs.

Depends on #5782 and #5783

@illicitonion illicitonion requested review from stuhood , ity and dotordogh May 4, 2018

("roland",),
tuple(file.path for file in execute_process_result.snapshot.files),
)
# TODO: Work out why this can't be satisified by the engine

This comment has been minimized.

@illicitonion

illicitonion May 4, 2018

Contributor

I would love some help with this one. If I uncomment the below code, I get told:

Exception: No installed rules can satisfy Select(Collection.of(FileContent)) for a root subject of type Snapshot.

but there's a rule installed by create_fs_rules (files_content_noop) which has literally this signature.

This comment has been minimized.

@stuhood

stuhood May 4, 2018

Member

I believe that this relates to the hacks involved in #5718... I started that one on a plane, and will bump the priority of cleaning it up.

This comment has been minimized.

@stuhood

stuhood May 9, 2018

Member

Confirmed fixed by #5718/#5792.

@stuhood
Copy link
Member

stuhood left a comment

Thanks!

@@ -16,23 +16,24 @@
logger = logging.getLogger(__name__)


class ExecuteProcessRequest(datatype(['argv', 'env', 'input_files_digest', 'digest_length'])):
class ExecuteProcessRequest(datatype(['argv', 'env', 'input_files_digest', 'digest_length', 'output_files'])):

This comment has been minimized.

@stuhood

stuhood May 4, 2018

Member

As mentioned on the other PR, I feel about 75% strongly that this should support globs. I could be convinced otherwise, but my understanding of why the "known output files" restriction exists elsewhere is that it's purely to allow for applicative builds. And it would immediately introduce a bunch of otherwise unnecessary logic for codegen.

This comment has been minimized.

@illicitonion

illicitonion May 8, 2018

Contributor

The remote execution API supports capturing individual files, and capturing whole directories, but nothing in between. My plan is to add directory capturing when we have a concrete use case for it.

If we find this isn't flexible enough, we should draft a change to the API, and standardise it with folks, but I'm hoping we can get by with just files and directories.

This comment has been minimized.

@stuhood

stuhood May 8, 2018

Member

Oof. Yea, standardizing glob matching would be a challenge. Darn.

@@ -57,4 +61,8 @@ pub struct ExecuteProcessResult {
pub stdout: Bytes,
pub stderr: Bytes,
pub exit_code: i32,

// It's unclear whether this should be a Snapshot or a digest of a Directory. A Snapshot is handy,

This comment has been minimized.

@stuhood

stuhood May 4, 2018

Member

I think that in the case of a remote process, materializing a Snapshot with PathStat objects will have a non-zero cost (recursively fetching the directory structure, basically).

... which... probably applies to a bunch of usecases where we're currently talking about passing around Snapshots. Hm. It might be the case that we need to actually reify "digest" into the python API, so that we avoid using Snapshots except when we actually need their listings? Relates to #5502 certainly (EDIT: commented there).

This comment has been minimized.

@illicitonion

illicitonion May 8, 2018

Contributor

Let's continue that discussion there :)

future::ok(fs::Snapshot::empty()).to_boxed()
} else {
// Use no ignore patterns, because we are looking for explicitly listed paths.
let posix_fs = Arc::new(fs::PosixFS::new(workdir, self.fs_pool.clone(), vec![]).unwrap());

This comment has been minimized.

@stuhood

stuhood May 4, 2018

Member

IMO, should handle this error. Failing here might mean that the workdir was on a custom filesystem of some kind, which is not-unlikely.

This comment has been minimized.

@illicitonion

illicitonion May 8, 2018

Contributor

Done.

For reference, the only reasons this can fail are:

  1. The root isn't a directory (which would panic at execution time, unless it was changed between process execution and directory fetching)
  2. The root could not be canonicalised (which is super weird)
  3. The ignore patterns could not be parsed (but they're hard-coded as empty, so can't fail)
argv: owned_string_vec(&["/bin/echo", "-n", "foo"]),
env: BTreeMap::new(),
input_files: fs::EMPTY_DIGEST,
output_files: BTreeSet::new(),

This comment has been minimized.

@stuhood

stuhood May 4, 2018

Member

To have similar semantics to globs, this should be either a Vec or OrderMap... as mentioned elsewhere, I think these should be globs. But if they'll be file paths, we should probably preserve literal order in a similar fashion.

This comment has been minimized.

@illicitonion

illicitonion May 8, 2018

Contributor

Let's come back to this when we've resolved the globs discussion :)

This comment has been minimized.

@illicitonion

illicitonion May 9, 2018

Contributor

Ok, globs resolved, let's talk!

I guess if the Result returns a Directory this should definitely be a Set, but if the Result returns a Snapshot then either may make sense, and a Vec is probably a little more natural.

I'll change this to a Vec for now, and see how we go :)

@@ -230,6 +219,8 @@ impl CommandRunner {
stdout: stdout,
stderr: stderr,
exit_code: execute_response.get_result().get_exit_code(),
// TODO: Populate output directory

This comment has been minimized.

@stuhood

stuhood May 4, 2018

Member

A ticket reference would be great.

This comment has been minimized.

@illicitonion

illicitonion May 8, 2018

Contributor

Done

})
.materialize_directory(dir.clone(), request.input_files)
.and_then(move |()| command_runner.run(request, &dir))
// Force tmpdir not to get dropped until the command is finished being run, because it's

This comment has been minimized.

@stuhood

stuhood May 4, 2018

Member

Would it be worthwhile to have run_command_locally explicitly borrow a TempDir to enforce this? Will we ever be using this outside a tmpdir?

This comment has been minimized.

@illicitonion

illicitonion May 8, 2018

Contributor

I can believe a future world where we're e.g. using a FUSE filesystem rather than a tmpdir, but until we actually have other constraints this is a great suggestion! Done.

@stuhood stuhood referenced this pull request May 4, 2018

Closed

@rules to merge Snapshots #5502

// It is a separate trait so that caching implementations can be written which wrap the Store (used
// to store the bytes) and VFS (used to read the files off disk if needed).
pub trait StoreFileByDigest<Error> {
fn store_by_digest(&self, file: File) -> BoxFuture<Digest, Error>;

This comment has been minimized.

@dotordogh

dotordogh May 8, 2018

Contributor

Why change the signature here? Ie pass in a file vs the reference to the file. Is it required because this is only an interface?

This comment has been minimized.

@illicitonion

illicitonion May 8, 2018

Contributor

This is a "my personal taste thing", but I think a reasonable guideline...

tl;dr: Be explicit about overhead. It should be reasonable to expect that if a method takes a reference, it's not going to allocate memory for its own copy of that reference.

Both implementations of this trait actually need to own a File; the OneOffStoreFileByDigest implementation because it passes it to a thread on a threadpool (so needs a copy, because the calling stack may no longer exist when it's run), and the Context implementation because it creates a DigestFile which itself needs to own a File.

I have a general principle that if you're going to end up taking ownership of something's memory, it's better to be up-front about that and say so in the method signature, than to pretend you just need a reference, and quietly make a copy of it your implementation. It means that when you write a call to the method, you're aware that you're making a copy, and if you think this is expensive (e.g. it's being done in a loop on a large Vec) you can decide whether this is a good idea, or whether it makes sense to e.g. rewrite some code so that you can wrap your type in an Rc so that copying it is cheap.

Here, I'm sure it doesn't matter at all, because File is tiny, and copying it is cheap, but I personally find the take-a-reference-and-copy-it thing un-idiomatic :)

@illicitonion illicitonion force-pushed the twitter:dwagnerhall/local-output-files/2 branch from 2a2c43c to aebe931 May 8, 2018

@stuhood

stuhood approved these changes May 8, 2018

stuhood added a commit to twitter/pants that referenced this pull request May 9, 2018

stuhood added a commit to twitter/pants that referenced this pull request May 9, 2018

@illicitonion illicitonion force-pushed the twitter:dwagnerhall/local-output-files/2 branch from aebe931 to a55b298 May 9, 2018

stuhood added a commit that referenced this pull request May 9, 2018

Improve support for intrinsic tasks (#5792)
### Problem

#5718 describes a case which resulted in an attempt to lift the wrong input type for a process execution, and #5784 demonstrates a case where intrinsic types can't be used as root subject types.

Both of these are caused by a fragile/hacky implementation of intrinsics that the `RuleGraph` was (mostly) not aware of.

### Solution

Made `Intrinsic` a `RuleGraph::Entry` type, and then cleaned up the product requests made by intrinsics so that they will always select the input type before executing.

### Result

Fixes #5718, unblocks testing of #5784, and lays cleaner groundwork for addition of further `Intrinsic` tasks.

@illicitonion illicitonion force-pushed the twitter:dwagnerhall/local-output-files/2 branch 2 times, most recently from 4ae3515 to aa93ec5 May 10, 2018

@stuhood stuhood force-pushed the pantsbuild:master branch from b6bb42d to 9e2fdb5 May 11, 2018

illicitonion added some commits May 4, 2018

Local process execution can fetch output files
This doesn't yet implement output file fetching for remote process
execution, but that's underway in separate PRs.

@illicitonion illicitonion force-pushed the twitter:dwagnerhall/local-output-files/2 branch from aa93ec5 to b6e766f May 15, 2018

illicitonion added some commits May 15, 2018

@illicitonion

This comment has been minimized.

Copy link
Contributor

illicitonion commented May 15, 2018

@stuhood This has changed to return a Digest of a Directory instead of a Snapshot, and accordingly output_files became a BTreeSet again. I also added some unit tests, and the integration tests now all pass - please take another look :)

@stuhood
Copy link
Member

stuhood left a comment

Thanks!

path_globs = PathGlobs.create('', include=files_to_compile)
sources_snapshot = yield Get(Snapshot, PathGlobs, path_globs)

output_files = tuple(

This comment has been minimized.

@stuhood

stuhood May 15, 2018

Member

Is this intentionally exercising multiple outputs? Otherwise you'd want to send these into a subdirectory, and then capture the subdirectory, in order to avoid missing classes generated by annotation processors, etc.

This comment has been minimized.

@illicitonion

illicitonion May 16, 2018

Contributor

Good point. For now, I've just added a comment that this isn't a fully featured rule and we should delete it at some point.

I think it's useful for now to act as documentation for how rules which execute processes can look, but doesn't provide much standalone value other than that (test_write_file tests that the API actually works), and we should just delete it when we have something more real which can act as that documentation.

('stdout', str),
('stderr', str),
('directory_digest', DirectoryDigest),
])): pass


@rule(JavacCompileResult, [Select(JavacCompileRequest)])

This comment has been minimized.

@stuhood

stuhood May 15, 2018

Member

This rule now dupes the above rule.

This comment has been minimized.

@illicitonion

illicitonion May 16, 2018

Contributor

So it does! Removed

@@ -314,5 +376,5 @@ def mk_example_fs_tree(self):
return fs_tree

def mk_scheduler_in_example_fs(self, rules):
rules = list(rules) + create_fs_rules()
rules = list(rules) + create_fs_rules() + [RootRule(ExecuteProcessRequest)]

This comment has been minimized.

@stuhood

stuhood May 15, 2018

Member

Intrinsics are supposed to be ignored for the purposes of unused/unreachable rule checking... are you seeing otherwise?

This comment has been minimized.

@illicitonion

illicitonion May 16, 2018

Contributor

Yep, if I remove this, I see:

$ ./pants test tests/python/pants_test/engine:isolated_process -- -s -k test_write_file

10:45:42 00:00 [main]
               (To run a reporting server: ./pants server)
10:45:43 00:01   [setup]
10:45:49 00:07     [parse]
               Executing tasks in goals: bootstrap -> imports -> unpack-jars -> deferred-sources -> gen -> jvm-platform-validate -> resolve -> resources -> compile -> pyprep -> test
10:45:50 00:08   [bootstrap]
10:45:50 00:08     [substitute-aliased-targets]
10:45:50 00:08     [jar-dependency-management]
10:45:50 00:08     [bootstrap-jvm-tools]
10:45:50 00:08     [provide-tools-jar]
10:45:50 00:08   [imports]
10:45:50 00:08     [ivy-imports]
10:45:50 00:08   [unpack-jars]
10:45:50 00:08     [unpack-jars]
10:45:50 00:08   [deferred-sources]
10:45:50 00:08     [deferred-sources]
10:45:50 00:08   [gen]
10:45:50 00:08     [antlr-java]
10:45:50 00:08     [antlr-py]
10:45:50 00:08     [jaxb]
10:45:50 00:08     [protoc]
10:45:50 00:08     [ragel]
10:45:50 00:08     [thrift-java]
10:45:50 00:08     [thrift-py]
10:45:50 00:08     [wire]
10:45:50 00:08     [avro-java]
10:45:50 00:08     [go-thrift]
10:45:50 00:08     [jax-ws]
10:45:50 00:08     [scrooge]
10:45:50 00:08     [thrifty]
10:45:50 00:08   [jvm-platform-validate]
10:45:50 00:08     [jvm-platform-validate]
10:45:50 00:08   [resolve]
10:45:50 00:08     [ivy]
10:45:50 00:08     [coursier]
10:45:50 00:08     [go]
10:45:50 00:08     [scala-js-compile]
10:45:50 00:08     [scala-js-link]
10:45:50 00:08     [node]
10:45:50 00:08   [resources]
10:45:50 00:08     [prepare]
10:45:50 00:08     [services]
10:45:50 00:08   [compile]
10:45:50 00:08     [node]
10:45:50 00:08     [compile-jvm-prep-command]
10:45:50 00:08       [jvm_prep_command]
10:45:50 00:08     [compile-prep-command]
10:45:50 00:08     [compile]
10:45:50 00:08     [zinc]
10:45:50 00:08     [javac]
10:45:50 00:08     [cpp]
10:45:50 00:08     [errorprone]
10:45:50 00:08     [findbugs]
10:45:50 00:08     [go]
10:45:50 00:08   [pyprep]
10:45:50 00:08     [interpreter]
10:45:50 00:08     [build-local-dists]
10:45:50 00:08     [requirements]
10:45:50 00:08     [sources]
10:45:50 00:08   [test]
10:45:50 00:08     [test-jvm-prep-command]
10:45:50 00:08       [jvm_prep_command]
10:45:50 00:08     [test-prep-command]
10:45:50 00:08     [test]
10:45:50 00:08     [pytest-prep]
10:45:51 00:09     [pytest]
                   Invalidated 2 targets.
                   scrubbed PYTHONPATH=/Users/dwagnerhall/src/github.com/pantsbuild/otherpants/src/python: from py.test environment
10:45:51 00:09       [run]
                     ============== test session starts ===============
                     platform darwin -- Python 2.7.14, pytest-3.4.2, py-1.5.2, pluggy-0.6.0
                     rootdir: /Users/dwagnerhall/src/github.com/pantsbuild/otherpants/.pants.d, inifile: /Users/dwagnerhall/src/github.com/pantsbuild/otherpants/.pants.d/test/pytest-prep/CPython-2.7.14/5a577850dda9b793dbf4d39d014db37ec4327bca/pytest.ini
                     plugins: timeout-1.2.1, cov-2.4.0
                     collected 6 items
                     
                     tests/python/pants_test/engine/test_isolated_process.py F
                     
                      generated xml file: /Users/dwagnerhall/src/github.com/pantsbuild/otherpants/.pants.d/test/pytest/914ece5fb25f3ae64b1832fd03888b46361db439/junitxml/TEST-914ece5fb25f3ae64b1832fd03888b46361db439.xml 
                     ==================== FAILURES ====================
                     ______ IsolatedProcessTest.test_write_file _______
                     
                     self = <pants_test.engine.test_isolated_process.IsolatedProcessTest testMethod=test_write_file>
                     
                         def test_write_file(self):
                           scheduler = self.mk_scheduler_in_example_fs(())
                         
                           request = ExecuteProcessRequest.create_with_empty_snapshot(
                             ("/bin/bash", "-c", "echo -n 'European Burmese' > roland"),
                             dict(),
                             ("roland",)
                           )
                         
                     >     execute_process_result = self.execute_expecting_one_result(scheduler, ExecuteProcessResult, request).value
                     
                     .pants.d/pyprep/sources/25b99150a0661ed71e060bf1d326fcd9be91c377/pants_test/engine/test_isolated_process.py:303: 
                     _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
                     .pants.d/pyprep/sources/25b99150a0661ed71e060bf1d326fcd9be91c377/pants_test/engine/scheduler_test_base.py:79: in execute_expecting_one_result
                         request = scheduler.execution_request([product], [subject])
                     .pants.d/pyprep/sources/25b99150a0661ed71e060bf1d326fcd9be91c377/pants/engine/scheduler.py:374: in execution_request
                         self._scheduler.add_root_selection(native_execution_request, subject, product)
                     _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
                     
                     self = <pants.engine.scheduler.Scheduler object at 0x10fdb5150>
                     execution_request = <cdata 'void *' 0x7fd94ca8a870>
                     subject = ExecuteProcessRequest(argv=(u'/bin/bash', u'-c', u"echo -n 'European Burmese' ...st(fingerprint=e3b0c442, serialized_bytes_length=0), output_files=(u'roland',))
                     product = <class 'pants.engine.isolated_process.ExecuteProcessResult'>
                     
                         def add_root_selection(self, execution_request, subject, product):
                           res = self._native.lib.execution_add_root_select(self._scheduler,
                                                                            execution_request,
                                                                            self._to_key(subject),
                                                                            self._to_constraint(product))
                           if res.is_throw:
                     >       raise self._from_value(res.value)
                     E       Exception: No installed rules can satisfy Select(ExecuteProcessResult) for a root subject of type ExecuteProcessRequest.
                     
                     .pants.d/pyprep/sources/25b99150a0661ed71e060bf1d326fcd9be91c377/pants/engine/scheduler.py:283: Exception
                     =============== 5 tests deselected ===============
                     ===== 1 failed, 5 deselected in 0.60 seconds =====
                     
                   tests/python/pants_test/engine:isolated_process                                 .....   FAILURE
                   tests/python/pants_test/engine:scheduler_test_base                              .....   SUCCESS
FAILURE: FAILURE


               Waiting for background workers to finish.
10:45:52 00:10   [complete]
               FAILURE

Leaving this in for now, and we can follow up

@illicitonion illicitonion merged commit 6a5d77c into pantsbuild:master May 16, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@illicitonion illicitonion deleted the twitter:dwagnerhall/local-output-files/2 branch Aug 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment