Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New intrinsic InputFileContent -> Digest #8226

Merged
merged 13 commits into from Sep 5, 2019

Conversation

@gshuflin
Copy link
Contributor

commented Aug 30, 2019

Problem

We'd like to have an intrinsic that creates a Digest from an FileContent (or similar type), which would allow us to programatically create files that can be exposed to an isolated process. The immediate use case for this is for porting the cloc goal to the v2 engine, but it's a broadly-useful rule to have engine support for.

Solution

Write that intrinsic, wrapping FileContent inside a new type InputFileContent to avoid a rule graph ambiguity.

Result

This commit just adds the intrinsic and a test for it, so that it can be used in future work. It has no direct user-facing effect yet. This also resolves #7718

@gshuflin gshuflin force-pushed the gshuflin:new-intrinsic branch from af3e6a8 to cf3b13b Aug 30, 2019

@cosmicexplorer
Copy link
Contributor

left a comment

This is great!!! I love how clean this solution is and how easy it is to integrate into v2 cloc!

  1. Could the title contain a few more words describing what the new intrinsic does? It can make using git log later on significantly easier for others.
  2. Can we perhaps link to/create an issue/PR about making cloc v2 in the "Problem" section?
@@ -21,6 +21,10 @@ def __str__(self):
return repr(self)


class InputFileContent(FileContent):
pass

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Aug 31, 2019

Contributor
Suggested change
pass
"""A "newtype" wrapper for `FileContent`.
TODO: This class is currently necessary because the engine
otherwise finds a cycle between FileContent <=> DirectoryDigest.
"""

(and possibly link to a newly-created issue which briefly links to this PR and adds the contents of the TODO?)

This comment has been minimized.

Copy link
@gshuflin

gshuflin Sep 3, 2019

Author Contributor

changed

"""Given a Handle for `obj`, write bytes(obj) and return it."""
c = self._ffi.from_handle(context_handle)
v = c.from_value(val[0])
# Consistently use the empty string to indicate None.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Aug 31, 2019

Contributor

What does "consistently" mean here?

This comment has been minimized.

Copy link
@gshuflin

gshuflin Sep 3, 2019

Author Contributor

I just copied this from the extern_val_to_str definition immediately below

This comment has been minimized.

Copy link
@illicitonion

illicitonion Sep 4, 2019

Contributor

Some context! IIRC if None is encountered here, it will silently fail in a weird way, so we need to not pass None; ideally we would fix this by making the cffi layer not silently fail in weird ways, but given we haven't done that, having a safe-ish behaviour to fall back to seemed reasonable. Probably the same applies here :)

This comment has been minimized.

Copy link
@gshuflin

gshuflin Sep 4, 2019

Author Contributor

added a comment to clarify this

@@ -81,6 +81,7 @@ default-members = [
]

[dependencies]
bazel_protos = { path = "process_execution/bazel_protos" }

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Aug 31, 2019

Contributor

(see below comment on this dependency and how it could probably instead be scoped just to the fs crate)

file_node.set_name(filename);
file_node.set_digest((&digest).into());
file_node
});

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Aug 31, 2019

Contributor

So I didn't realize this when discussing this with you earlier, but it seems like it might be possible to actually keep all of the logic which uses the bazel_protos crate within the fs crate, since storing bytes is a pretty generic thing to want to do, and other subcrates might want to use it, for example.

I think it should be possible to expose a store_file_with_name() (or something) method in the Store, which could replace the above call to store.store_file_bytes() and remove the rest of this and_then()?

This comment has been minimized.

Copy link
@gshuflin

gshuflin Sep 3, 2019

Author Contributor

changed

{
let context = context.clone();
let core = context.core.clone();
self
.select_product(&context, context.core.types.path_globs, "intrinsic")
.select_product(&context, types.path_globs, "intrinsic")

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Aug 31, 2019

Contributor

It tends to make it harder to review with these sorts of changes, especially with rustfmt involved. While I strongly agree with the spirit, I found it difficult when reviewing the previous commits of this you'd shown me to know which lines you'd really changed versus which ones were cosmetic. I think these could stay since I've maybe already reviewed them a couple times, but in general I try to avoid making these sorts of abbreviations personally.

&tasks::Rule::Intrinsic(Intrinsic { product, input })
if product == context.core.types.snapshot
&& input == context.core.types.directory_digest =>
&Rule::Intrinsic(Intrinsic { product, input })

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Aug 31, 2019

Contributor

Same comment as above with abbreviating things -- it can make it more difficult to review. I might actually really appreciate it if you could revert these because I'm finding it a little difficult to read even right now. It's not necessary if you're convinced the improved readability justifies the increase in review difficulty, imho.

file_contents = b'some file contents'

input_file = InputFileContent(path=file_name, content=file_contents)
digest = self.scheduler.product_request(Digest, [input_file])[0]

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Aug 31, 2019

Contributor
Suggested change
digest = self.scheduler.product_request(Digest, [input_file])[0]
digest, = self.scheduler.product_request(Digest, [input_file])

I try to do this because it will also check if there happens to be any more than just one value, which may happen e.g. if the method's API changes for some reason.

This comment has been minimized.

Copy link
@gshuflin

gshuflin Sep 3, 2019

Author Contributor

changed

description='cat the contents of this file',
)

result = self.scheduler.product_request(ExecuteProcessResult, [req])[0]

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Aug 31, 2019

Contributor
Suggested change
result = self.scheduler.product_request(ExecuteProcessResult, [req])[0]
result, = self.scheduler.product_request(ExecuteProcessResult, [req])

(same as above)

This comment has been minimized.

Copy link
@gshuflin

gshuflin Sep 3, 2019

Author Contributor

changed

@@ -253,6 +253,24 @@ def test_create_from_snapshot_with_env(self):
self.assertEqual(req.env, ('VAR', 'VAL'))


class TestInputFileCreation(TestBase, unittest.TestCase):

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Aug 31, 2019

Contributor
Suggested change
class TestInputFileCreation(TestBase, unittest.TestCase):
class TestInputFileCreation(TestBase):

TestBase already subclasses unittest.TestCase!

This comment has been minimized.

Copy link
@gshuflin

gshuflin Sep 3, 2019

Author Contributor

changed

@benjyw benjyw requested review from stuhood and jsirois Aug 31, 2019

@illicitonion
Copy link
Contributor

left a comment

Looks great, thanks!

Would be handy to explain what the intrinsic does in the PR title, and I'm curious about what it's going to be used for :)

...Digging through history, it would be good to close out #7718 when this lands. And also, it looks like I already did something similar in #7739 but never merged it because of #7710... Oops...

self.select_product(&context, types.input_file_content, "intrinsic")
.and_then(move |files_content: Value| {
let filename = externs::project_str(&files_content, "path");
let bytes: bytes::Bytes = externs::project_bytes(&files_content, "content").into();

This comment has been minimized.

Copy link
@illicitonion

illicitonion Sep 2, 2019

Contributor

May be slightly clearer as:

Suggested change
let bytes: bytes::Bytes = externs::project_bytes(&files_content, "content").into();
let bytes = bytes::Bytes::from(externs::project_bytes(&files_content, "content"));

This comment has been minimized.

Copy link
@gshuflin

gshuflin Sep 3, 2019

Author Contributor

done

let digest = store.store_file_bytes(bytes, false)
.map_err(|e| throw(&e));

digest.and_then(move |digest: hashing::Digest| {

This comment has been minimized.

Copy link
@illicitonion

illicitonion Sep 2, 2019

Contributor

This code actually already exists in DownloadedFile::snapshot_of_one_file - probably worth moving it maybe into the store crate and re-using it :)

@@ -130,6 +130,10 @@ impl Tasks {

pub fn intrinsics_set(&mut self, types: &Types) {
let intrinsics = vec![
Intrinsic {
product: types.directory_digest,
input: types.input_file_content,

This comment has been minimized.

Copy link
@illicitonion

illicitonion Sep 2, 2019

Contributor

Does it make sense to have this be an intrinsic from FilesContent rather than FileContent? That would be more general (allow for capturing multiple files at once), and phrasing the singleton case as:

directory_digest = yield Get(Digest, FilesContent([FileContent("blah", "bleh")]))`

doesn't seem too inconvenient... (Though we may want to enforce some kind of path ordering constraint to maximise cache hits)

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Sep 3, 2019

Contributor

I like the idea of using FilesContent for multiple files by default! And a path ordering constraint sounds like exactly the right idea (I believe we'd probably have to do this sorting within the FilesContent constructor in order for the engine to cache it?). I don't believe this would be too difficult to implement.

@@ -253,6 +253,24 @@ def test_create_from_snapshot_with_env(self):
self.assertEqual(req.env, ('VAR', 'VAL'))


class TestInputFileCreation(TestBase, unittest.TestCase):
def test_input_file_creation(self):

This comment has been minimized.

Copy link
@illicitonion

illicitonion Sep 2, 2019

Contributor

Could you also throw in a test for a file in a directory? It's easy to accidentally make the Directory digest look like:

root:
directories = []
files = [somedir/filename]

rather than

root:
directories = [somedir]
files = []

somedir:
directories = []
files = [filename]
@illicitonion

This comment has been minimized.

Copy link
Contributor

commented Sep 2, 2019

...Digging through history, it would be good to close out #7718 when this lands. And also, it looks like I already did something similar in #7739 but never merged it because of #7710... Oops...

One key difference is that #7739 allowed you to set the executable bit on a file - may be worth adding (maybe as a follow-up) :)

@gshuflin gshuflin changed the title New intrinsic New intrinsic InputFileContent -> Digest Sep 3, 2019

@gshuflin gshuflin force-pushed the gshuflin:new-intrinsic branch from cf3b13b to ebb3ca9 Sep 4, 2019

@stuhood stuhood removed their request for review Sep 4, 2019

@stuhood

This comment has been minimized.

Copy link
Member

commented Sep 4, 2019

I'll resign to let @illicitonion look at this one. Thank you!

@illicitonion
Copy link
Contributor

left a comment

Looks good :) Let's re-use one more piece of code, then I'm happy to merge, though still curious about your thoughts on:

  1. FilesContent vs FileContent
  2. Executable bit setting
@@ -21,6 +21,13 @@ def __str__(self):
return repr(self)


class InputFileContent(FileContent):
"""A newtype wrapper for FileContent.
TODO: This class is currently necessary because the engine

This comment has been minimized.

Copy link
@illicitonion

illicitonion Sep 4, 2019

Contributor
Suggested change
TODO: This class is currently necessary because the engine
TODO(7710): This class is currently necessary because the engine
"""Given a Handle for `obj`, write bytes(obj) and return it."""
c = self._ffi.from_handle(context_handle)
v = c.from_value(val[0])
# Consistently use the empty string to indicate None.

This comment has been minimized.

Copy link
@illicitonion

illicitonion Sep 4, 2019

Contributor

Some context! IIRC if None is encountered here, it will silently fail in a weird way, so we need to not pass None; ideally we would fix this by making the cffi layer not silently fail in weird ways, but given we haven't done that, having a safe-ish behaviour to fall back to seemed reasonable. Probably the same applies here :)

let store = self.clone();
store
.store_file_bytes(contents, false)
.and_then(move |digest: hashing::Digest| {

This comment has been minimized.

Copy link
@illicitonion

illicitonion Sep 4, 2019

Contributor

I think this can just be rewritten as:

store.store_file_bytes(contents, true)
  .and_then(move |digest| DownloadedFile::snapshot_of_one_file(store, path, digest))
  .map(|snapshot| snapshot.digest)

subject to moving the DownloadedFile::snapshot_of_one_file implementation to somewhere more convenient :)

This comment has been minimized.

Copy link
@illicitonion

illicitonion Sep 4, 2019

Contributor

(Now that I look at the comment, just moving DownloadedFile::snapshot_of_one_file to live instead on Store, and calling that in nodes.rs seems like a reasonable solution?

This comment has been minimized.

Copy link
@gshuflin

gshuflin Sep 4, 2019

Author Contributor

Yup, this looks like it works (and takes care of the path-munging for me), will commit this shortly.

digest, = self.scheduler.product_request(Digest, [input_file])

req = ExecuteProcessRequest(
argv=('/usr/bin/echo', 'oo'),

This comment has been minimized.

Copy link
@illicitonion

illicitonion Sep 4, 2019

Contributor

Shouldn't this /usr/bin/cat somedir/filename and assert that the stdout is content? :)

@gshuflin gshuflin force-pushed the gshuflin:new-intrinsic branch from ebb3ca9 to b048e9e Sep 4, 2019

@illicitonion
Copy link
Contributor

left a comment

I'm going to quickly push my last review comment to the branch to avoid another cycle of review - hope that's ok!

@@ -238,6 +238,53 @@ impl Store {
.to_boxed()
}

/// Store the given bytes buffer as a file at the given path
pub fn store_file_with_path(&self, path: PathBuf, contents: Bytes) -> BoxFuture<Digest, String> {

This comment has been minimized.

Copy link
@illicitonion

illicitonion Sep 5, 2019

Contributor

I'd probably inline this at the call site - taking a file-like thing (path/bytes) and returning a Digest, it's unclear whether this returns a Digest of a file or a directory, whereas having the caller use something which returns a Snapshot, and then do the manipulation itself to project out the Digest is more clear.

illicitonion added 2 commits Sep 5, 2019

@illicitonion illicitonion merged commit 41e6b39 into pantsbuild:master Sep 5, 2019

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
illicitonion added a commit that referenced this pull request Sep 10, 2019
Use InputFilesContent instead of InputFileContent (#8272)
### Problem

cf. the discussion on #8226 , it makes more sense for the input type for the Digest-yielding intrinsic to be a wrapper around `FilesContent` rather than `FileContent`. This makes the intrinsic more general - it can be used to create a `Digest` of several files at once.

### Solution

Rename the type `InputFileContent` to `InputFilesContent` everywhere it is used, change the v2 engine intrinsic to handle multiple input `FileContent` objects, and update tests accordingly.

### Result

This commit shouldn't change any end-user-facing behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.