New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@rules to merge Snapshots #5502

Closed
illicitonion opened this Issue Feb 22, 2018 · 11 comments

Comments

2 participants
@illicitonion
Copy link
Contributor

illicitonion commented Feb 22, 2018

Python code needs the ability to manipulate Snapshots in order to pass them to (or interpret them from) things like process execution.

This primarily involves the ability to compose/merge/union snapshots (union(toolchain_snapshot, sources_snapshot)).

Merging snapshots is already implemented as a rust function here:

#[no_mangle]
pub extern "C" fn merge_directories(
scheduler_ptr: *mut Scheduler,
directories_value: Handle,
) -> PyResult {
let digests_result: Result<Vec<hashing::Digest>, String> =
externs::project_multi(&directories_value.into(), "dependencies")
.iter()
.map(|v| nodes::lift_digest(v))
.collect();
let digests = match digests_result {
Ok(d) => d,
Err(err) => {
let e: Result<Value, String> = Err(err);
return e.into();
}
};
with_scheduler(scheduler_ptr, |scheduler| {
fs::Snapshot::merge_directories(scheduler.core.store.clone(), digests)
.wait()
.map(|dir| nodes::Snapshot::store_directory(&scheduler.core, &dir))
.into()
})
}

Following the pattern of ExecuteProcess(Request|Result), this ticket should add input/output datatypes, and add an intrinsic rule to satisfy them:

pub fn intrinsics_set(&mut self, types: &Types) {
self.intrinsics = vec![
Intrinsic {
kind: IntrinsicKind::Snapshot,
product: types.snapshot,
input: types.path_globs,
},
Intrinsic {
kind: IntrinsicKind::FilesContent,
product: types.files_content,
input: types.directory_digest,
},
Intrinsic {
kind: IntrinsicKind::ProcessExecution,
product: types.process_result,
input: types.process_request,
},
].into_iter()
.map(|i| (i.product, i))
.collect();
}
... which should expose the equivalent of the existing function.

The result should be an API like:

snapshot = yield Get(MergedSnapshot, Snapshots([snapshot_1, snapshot_2, ..]))
... = snapshot.value.directory_digest
@illicitonion

This comment has been minimized.

Copy link
Contributor

illicitonion commented Mar 27, 2018

The first use of this should probably be to implement #5634 - this is just being able to introspect snapshots.

The second should probably be to remove the explicit PathGlobs calls from https://github.com/pantsbuild/pants/blob/master/tests/python/pants_test/engine/test_isolated_process.py and instead specify the input snapshots using this new API. This covers basic creation of snapshots from files.

The third should be to specify the list of input files for executing cloc remotely. This covers unioning of snapshots (combining a path glob with the cloc script itself). It also covers introspecting snapshots (as the output of cloc is produced as a file in the output snapshot).

After those three things are done, this API is probably sufficiently complete for us to close out this ticket.

@cosmicexplorer I suspect you may be interested in picking this up :)

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Mar 27, 2018

Also, it's possible that this can change a bit in the presence of #5580, which I've been pushing forward on in my spare time. In particular, I think it would remove the question of whether the API needs to be synchronous or not: it could be implemented like any other intrinsic rule, in theory.

It's possible that I will get some non-spare time this week to bring that to a reviewable state.

@illicitonion

This comment has been minimized.

Copy link
Contributor

illicitonion commented Mar 28, 2018

That would make me super happy, and I suspect make whoever's implementing this's life significantly easier; in particular, making it easier to experiment with incremental solutions rather than design the whole thing up front :)

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented May 4, 2018

Mentioned on #5784: it's possible that we should differentiate Snapshot from "snapshot digest", as the former requires having transitively fetched the directory listing? But not very clear how many usecases would not need the listing. IIRC, there are only a handful of locations where we inspect a Snapshot's listing from a @rule.

I think that skipping past cloc and jumping straight to prototyping a chain of compilations would answer some of these questions.

@illicitonion

This comment has been minimized.

Copy link
Contributor

illicitonion commented May 8, 2018

@stuhood Yeah, @dotordogh and I have been discussing this, and will be putting together a more concrete proposal very soon. I think we're leaning towards:

Keep everything as a Snapshot until we've got JVM compiles actually implemented (and maybe another thing or two as well), and then see whether we want to pull out a Directory Digest primitive based on how much we actually inspect the file listing, and what the overhead is in practice.

I can imagine worlds in which either is a reasonable default. Picking the cheapest probably makes sense, but isn't critical yet :)

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented May 8, 2018

Any thoughts on I think that skipping past cloc and jumping straight to prototyping a chain of compilations would answer some of these questions. ?

@stuhood stuhood added the quelea label May 23, 2018

@stuhood stuhood added this to To Do in Python Pipeline Porting via automation Oct 6, 2018

@stuhood stuhood added the P3 - M3 label Oct 6, 2018

@stuhood stuhood changed the title Snapshot manipulation Python API @rules to merge Snapshots Oct 6, 2018

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Oct 6, 2018

I've now added this to P3 - M3 and retitled it, because having the ability to merge multiple Snapshots from within an @rule blocks merging the sources for multiple targets in order to invoke pytest (in #6003).

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Oct 6, 2018

@illicitonion : So, for the purposes of #6003, we really only need the ability to merge Snapshots at the instant when we're about to execute a process. If it would make more sense for this ticket to just change ExecuteProcessRequest to take a list of input Snapshots and then merge them, please let me know.

@illicitonion

This comment has been minimized.

Copy link
Contributor

illicitonion commented Oct 8, 2018

I think I'd prefer an @rule to do the merging, just because otherwise we get cache key divergence in the engine (a Directory is nicely canonical, whereas a union of Directorys isn't)...

@illicitonion

This comment has been minimized.

Copy link
Contributor

illicitonion commented Oct 15, 2018

Do you need to merge Snapshots or are Directorys sufficient?

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Oct 15, 2018

@illicitonion : I think that merging Directorys is sufficient. See the usecase sketch on #6003.

@illicitonion illicitonion moved this from To Do to In Progress in Python Pipeline Porting Oct 16, 2018

Python Pipeline Porting automation moved this from In Progress to Done Oct 17, 2018

stuhood added a commit that referenced this issue Oct 17, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment