New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert usages of the ExecuteProcess helper into simple @rules to simplify snapshot consumption for process execution #5703

Merged
merged 18 commits into from Apr 18, 2018

Conversation

Projects
4 participants
@cosmicexplorer
Copy link
Contributor

cosmicexplorer commented Apr 15, 2018

Problem

I find the use of functools in the ExecuteProcess and SnapshottedProcess helpers very difficult to follow (see here).

Solution

Convert the usages of ExecuteProcess helper into a series of @rules (see one consumption of the new rules here).

Result

Several examples of implementing isolated process execution with @rules are shown in test_isolated_process.py. I think this seems like a better interface than ExecuteProcess.

I'm also going to look into the same kind of change for SnapshottedProcess -- that would probably involve changes to the snapshot manipulation API (see #5502) as well.

cosmicexplorer added some commits Apr 14, 2018

attempt to enable execution of processes with input snapshots
`execute_snapshotted_process()` fails when running
`test_integration_concat_with_snapshots_stdout()` because it's not being run within
the directory containing the files in question.
@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Apr 15, 2018

This is actually done as of f2beed0, but I'm also trying to resolve the fixme I left in 7f9f4ee, and after I remove the execute_process_noop() rule and the RootRule(ExecuteProcessRequest), I can successfully request a yield Get(ExecuteProcessResult, CatExecutionRequest, cat_exe_req) directly in test_isolated_process.py. However, this fails with:

Traceback (most recent call last):
                       File "/Users/dmcclanahan/tools/pants/.pants.d/pyprep/sources/fb6a677f9c16cd0342f766119ee7153cebc7351c/pants/engine/native.py", line 447, in extern_project_multi
                         return c.vals_buf(tuple(c.to_value(p) for p in getattr(obj, field_name)))
                     AttributeError: 'CatExecutionRequest' object has no attribute 'env'

This is because gen_nodes() is assuming that if the product is an ExecuteProcessResult, the subject must be an ExecuteProcessRequest, but the point of resolving this fixme is that you can make product requests which use ExecuteProcessRequest and ExecuteProcessResult as intermediate products, but by specifying something else as the subject (in this case, a CatExecutionRequest) which must have a rule to convert it to an ExecuteProcessRequest. The fixme shows this leads to more concise @rule implementations.

However, I can't figure out how to edit that branch of the if in gen_nodes() to execute a rule to convert whatever the subject is into an ExecuteProcessRequest before calling ExecuteProcess::lift() on the result of that rule. Since this already involves removing RootRule(ExecuteProcessRequest), we can assume the subject is not already an ExecuteProcessRequest, or we could check its type (but I don't see how to do that either, even if I get the subject as a Value with val_for()). How might I do either of those?

This above bit should probably go into a separate PR, so for the time being I'm going to revert to f2beed0 so the point of this PR is clear.

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:unskip-isolated-process-tests branch from 066a7cc to f2beed0 Apr 16, 2018

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Apr 16, 2018

This question is an excellent one... sorry that gen_nodes is so hacky.

we can assume the subject is not already an ExecuteProcessRequest, or we could check its type (but I don't see how to do that either, even if I get the subject as a Value with val_for()). How might I do either of those?

We're currently matching the requested product type and then assuming something about the subject based on that. Instead of assuming the subject type, what we should do instead is first make a request for Select(ExecuteProcessRequest) for the subject, and then and_then the result to create the ExecuteProcess node.

@illicitonion
Copy link
Contributor

illicitonion left a comment

This looks great, thanks for putting it together!!

Feel free to delete absolutely as much of the old code here as you want, e.g. the entirety of SnapshottedProcess and all of the input/output conversion stuff - it's not used by anything, and I don't think anyone intends for it to be used by anything in the future any more :)

cat_file_paths = []
for s in snapshots:
cat_file_paths.extend(f.path for f in s.files)
# TODO(cosmicexplorer): ensure everything is a real path?

This comment has been minimized.

@illicitonion

illicitonion Apr 16, 2018

Contributor

What does this mean?

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 16, 2018

Contributor

If a file is named e.g. -e or another of cat's expected options it would parse it as an option, not a file. This is something that should be addressed in general but this TODO can be removed.

return (self.bin_path,) + tuple(cat_file_paths)


class CatSourceFiles(datatype('CatSourceFiles', ['globs'])):

This comment has been minimized.

@illicitonion

illicitonion Apr 16, 2018

Contributor

Maybe drop this type entirely and just use a PathGlobs directly? As this is just for a toy example, I'm not sure the extra level of indirection is useful, but it does reduce clarity...

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 16, 2018

Contributor

I agree, I wasn't sure I could do that for some reason when I started making these edits but I agree that it's wholly unnecessary / obscuring here.

input_files_digest="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
class JavacVersionCommand(Javac):

def gen_argv(self, snapshots=None):

This comment has been minimized.

@illicitonion

illicitonion Apr 16, 2018

Contributor

If we remove the abstract method, the snapshots argument can disappear :)

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 16, 2018

Contributor

Agreed, I'm not sure why I left the @abstractmethod in.

digest_length=sources_snapshot.digest_length,
env=env)
class JavacVersionOutput(datatype('JavacVersionOutput', [
'exit_code',

This comment has been minimized.

@illicitonion

illicitonion Apr 16, 2018

Contributor

exit_code seems like a strange thing to be modelling here as a field; wouldn't this be more naturally exposed by making the task Throw on a non-zero exit code?

I can imagine tasks where we want to expose exit_code, but I don't think invoking javac is one of them...

(Can happily be a follow-up to experiment separately with failure-related things)

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 16, 2018

Contributor

I hadn't thought of it this way, I think that change would be reasonable to introduce in this PR.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 16, 2018

Contributor

Modified JavacVersionOutput to only contain the stderr field and instead raise if the exit code is nonzero, and caught the Throw in the test!

class JavacCompileCommand(Javac):

def gen_argv(self, snapshots):
snapshot_file_paths = []

This comment has been minimized.

@illicitonion

illicitonion Apr 16, 2018

Contributor

Probably wants to deduplicate; use an OrderedSet?

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 16, 2018

Contributor

Made the change and left a TODO here because deduplication of file paths sounds like something we should handle for users of @rules and Binarys.

self.assertEquals(1, len(results))
concatted = results[0]
self.assertEqual(Concatted('one\ntwo\n'), concatted)

# TODO: Re-write this test to work with non-tar-file snapshots

This comment has been minimized.

@illicitonion

illicitonion Apr 16, 2018

Contributor

You've done this TODO, please delete the old test :)

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 16, 2018

Contributor

Done!

self.assertEquals(1, len(results))
javac_version_output = results[0]
self.assertEqual(0, javac_version_output.exit_code)
self.assertIn('javac', javac_version_output.version_output)

# TODO: Re-write this test to work with non-tar-file snapshots
@unittest.skip

This comment has been minimized.

@illicitonion

illicitonion Apr 16, 2018

Contributor

Delete this one too :)

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 16, 2018

Contributor

Done!

@@ -198,46 +375,40 @@ def test_javac_compilation_example(self):
self.assertTrue(os.path.exists(os.path.join(classpath_entry.path, 'simple', 'Simple.class')))

def test_javac_compilation_example_rust_success(self):

This comment has been minimized.

@illicitonion

illicitonion Apr 16, 2018

Contributor

Drop rust from the name of this test

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 16, 2018

Contributor

Done, and also removed from the other test with _rust_ in the name.

self.assertEquals(1, len(results))
javac_compile_result = results[0]
self.assertEqual(1, javac_compile_result.exit_code)
self.assertIn("NOT VALID JAVA", javac_compile_result.stderr)

def test_failed_command_propagates_throw(self):

This comment has been minimized.

@illicitonion

illicitonion Apr 16, 2018

Contributor

Feel free to delete this, and the entire SnapshottedProcess everything, it feels unnecessary now

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 16, 2018

Contributor

Everything related to SnapshottedProcess was deleted, a TODO was left on JavacCompileResult so it's clear where to edit once we can make process output file snapshots.

def prefix_of_command(self):
return tuple([self.bin_path])
@abstractmethod
def gen_argv(self, snapshots):

This comment has been minimized.

@illicitonion

illicitonion Apr 16, 2018

Contributor

Can you add some documentation on this method explaining what it does and, in particular, what snapshots means? I'm not convinced that "snapshots" is a general-purpose enough abstraction in an interface here... Maybe let's remove the abstract method for now, and use ad-hoc methods on the subclasses for now?

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 16, 2018

Contributor

I was going to delete this from the Binary class but forgot to leave a TODO, totally agree and will do.

cosmicexplorer added some commits Apr 16, 2018

@illicitonion
Copy link
Contributor

illicitonion left a comment

This looks great!

Pretty much just more stuff to delete :)

More symbols to delete:
src/python/pants/engine/isolated_process.py:
_run_command
_snapshot_path
_extract_snapshot
_snapshotted_process
_setup_process_execution
_post_process_execution
SnapshottedProcessRequest
SnapshottedProcessResult
_Snapshots
create_snapshot_rules

tests/python/pants_test/engine/test_isolated_process.py:
tarfile
assertFirstEntryIsReturn
assertFirstEntryIsThrow
assertReturn
assertPathContains

cat_file_paths = []
for s in snapshots:
cat_file_paths.extend(f.path for f in s.files)
# TODO(cosmicexplorer): We should have a structured way to create command

This comment has been minimized.

@illicitonion

illicitonion Apr 17, 2018

Contributor

I'd be tempted to just remove the Binary class entirely and hand-craft the command lines in each thing which is currently a subclass, until we work out what Binary should actually look like. Right now, it's extra cognitive overhead which doesn't provide any value :)

Then we can just address this TODO in this class for now with something along the lines of if any(lambda path: path.startswith("-"), cat_file_paths): raise(...)

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 17, 2018

Contributor

Removed the Binary class for now and sanitized the command line in the way you described. Also made an issue #5719 which touches on this.

raise ValueError('shell_cat_binary should be an instance of ShellCat')
if not isinstance(input_file_globs, PathGlobs):
raise ValueError(
'cat_source_files should be an instance of PathGlobs')

This comment has been minimized.

@illicitonion

illicitonion Apr 17, 2018

Contributor

input_file_globs

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 17, 2018

Contributor

Fixed, thanks. This is why we need simple type checking, which I want to introduce immediately.

def gen_argv(self, snapshots):
# TODO(cosmicexplorer): We use an OrderedSet here to dedup file entries --
# should we be allowing different snapshots to have overlapping file paths
# when exposing them in python?

This comment has been minimized.

@illicitonion

illicitonion Apr 17, 2018

Contributor

I'm not sure, but either (definitely for a future follow-up):

  1. The Python code here should just have a Snapshot, not a collection of Snapshots.
  2. The Python code should have to handle overlapping paths.
    I don't currently have a strong preference between the two, and I suspect both will end up being practical, but I suspect that recommending towards 1 will be nicer for rule writers.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 17, 2018

Contributor

Left a comment on #5707 quoting this, agreed on preferring the 1st approach.

input_files_digest=str(sources_snapshot.fingerprint),
digest_length=sources_snapshot.digest_length,
env=env)
class JavacCompileCommand(Javac):

This comment has been minimized.

@illicitonion

illicitonion Apr 17, 2018

Contributor

I'm not sure how much value this provides separate from the JavacCompileRequest... Is there a reason it needs to exist?

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 17, 2018

Contributor

I don't think so, so I've removed it for now. The right solution here will arise eventually.

if exit_code != 0:
stdout = javac_proc_result.stdout
stderr = javac_proc_result.stderr
raise ProcessExecutionFailure(

This comment has been minimized.

@illicitonion

illicitonion Apr 17, 2018

Contributor

For the future:

Maybe we should expose a error_on_nonzero_exit_code boolean on ExecuteProcessRequest (either in the Rust or in a Python wrapper) to do this automagically? It feels like 90% of ExecuteProcessRequests are probably going to want this behaviour, and it's going to be annoyingly much boilerplate to have to add it each time

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 17, 2018

Contributor

Totally agree, and I broke out #5719 for this.

class ExecuteProcessRequestTest(SchedulerTestBase, unittest.TestCase):
def test_blows_up_on_invalid_args(self):
with self.assertRaises(ValueError):
ExecuteProcessRequest(argv=['1'], env=tuple(), input_files_digest='', digest_length=0)

This comment has been minimized.

@illicitonion

illicitonion Apr 17, 2018

Contributor

Given input_files_digest has a required length and set of characters, it may be worth putting a valid digest in here, because erroring on an invalid digest would be a valid thing for us to do, which would make the rest of these tests pass for the wrong reason

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 17, 2018

Contributor

I've made this test use the empty snapshot I added to fs.py and left a TODO saying we should be validating digests provided to ExecuteProcessRequests.

process_request_from_javac_version,
get_javac_version_output,
])
results = self.execute(scheduler, JavacVersionOutput, JavacVersionCommand())

This comment has been minimized.

@illicitonion

illicitonion Apr 17, 2018

Contributor

New function: self.execute_expecting_one_result?
javac_version_output = self.execute_expecting_one_result(scheduler, JavacVersionOutput, JavacVersionCommand())

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Apr 17, 2018

Contributor

Added exactly this to scheduler_test_base.py, thanks for pointing that out.

cosmicexplorer added some commits Apr 17, 2018

cosmicexplorer added some commits Apr 17, 2018

@illicitonion illicitonion merged commit 3f49af5 into pantsbuild:master Apr 18, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@ity ity added the remoting label Apr 22, 2018

@ity ity added this to To do in Remoting via automation Apr 22, 2018

@ity ity moved this from To do to Done in Remoting Apr 22, 2018

@ity ity moved this from Done to In progress in Remoting Apr 22, 2018

@illicitonion illicitonion moved this from In progress to Done in Remoting Apr 23, 2018

stuhood added a commit that referenced this pull request Apr 27, 2018

type-check specific datatype fields concisely and remove the class na…
…me argument (#5723)

### Problem

See #5716. We had to introduce a ton of boilerplate to check argument types in some very simple usage of `datatype` subclasses in tests in #5703. This is one way to make that easier. 

### Solution

- Move `TypeConstraint` and subclasses from `src/python/pants/engine/addressable.py` to `src/python/pants/util/objects.py`, where they probably should have been anyway, to avoid an import cycle.
- Remove the class name argument from `datatype()` (and update all call sites).
- Allow specifying a tuple `('field_name', FieldType)` in the list of fields to `datatype()` to have the field type-checked by an `Exactly(FieldType)` type constraint at construction.
- Override `__str__()` and `__repr__()` for `datatype` objects, and add testing for the str and repr.
- Add testing for informative error messages in `datatype` construction.

### Result

We can now concisely declare `datatype` objects without having to repeat the class name, and can opt-in to a type check for specific fields.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment