New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow arbitrary directories to be Snapshotted #5801

Merged
merged 4 commits into from May 17, 2018

Conversation

Projects
None yet
2 participants
@illicitonion
Copy link
Contributor

illicitonion commented May 10, 2018

We don't cache anything intermediate here, we just dump the Snapshot
into the Store and return it.

This is intended to be a short-term transition path to allow BinaryUtils
to be consumed in the v2 engine.

@illicitonion illicitonion requested a review from stuhood May 10, 2018

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented May 10, 2018

As discussed in slack, I think option (2) is the way to go, via a native method like: PyResult scheduler_capture_snapshots(Scheduler*, char*, Value*, uint64_t), which would take a base path and a list of Values wrapping PathGlobs, and return a value wrapping a list of Snapshots.

@illicitonion illicitonion force-pushed the twitter:dwagnerhall/binutil/snapshotoutsideroot branch from e3a93d4 to e8c3279 May 11, 2018

@stuhood stuhood force-pushed the pantsbuild:master branch from b6bb42d to 9e2fdb5 May 11, 2018

@stuhood
Copy link
Member

stuhood left a comment

Thanks!

Ok(PosixFS {
root: Self::canonicalize(root)?,
pool: self.pool.clone(),
ignore: self.ignore.clone(),

This comment has been minimized.

@stuhood

stuhood May 13, 2018

Member

The ignore patterns make this a bit challenging I think... the patterns used here are relative to the root of the repository/buildroot, and so we construct the Gitignore instance with an empty root. But if you were going to be capturing from a subdirectory of the repository and you still wanted to apply the same patterns, you'd need to pass in the portion of the subdirectory relative to the buildroot (afaik).

This is to account for patterns like:

$ cat .gitignore
/please/ignore/this/long/specific/file.sh

I think that not applying ignore patterns to this type of capture would be reasonable, and would probably be fine for now. But if we're going to apply them, I think we'd need to re-construct them as relative to the buildroot (if the capture was under the buildroot), or ignore the patterns if the capture was outside the buildroot.

This comment has been minimized.

@stuhood

stuhood May 16, 2018

Member

This is still an issue I think.

This comment has been minimized.

@illicitonion

illicitonion May 17, 2018

Contributor

Oops! Done

@@ -216,6 +216,8 @@
PyResult execution_add_root_select(Scheduler*, ExecutionRequest*, Key, TypeConstraint);
RawNodes* execution_execute(Scheduler*, ExecutionRequest*);
PyResult capture_snapshot(Scheduler*, char*, Value);

This comment has been minimized.

@stuhood

stuhood May 13, 2018

Member

In general, for synchronous methods with concurrent/async implementations, I think it's a good idea to expose batch interfaces by default, as that sends the right signal about usage (that a caller should attempt to batch their calls wherever possible). Would recommend making this a batch method.

This comment has been minimized.

@illicitonion

illicitonion May 16, 2018

Contributor

I understand the desire, but will push back a little for now for two reasons:

  1. I don't expect people to actually call this much at all, it's really just a short-term hack; if the do see people calling this, we need to design a better API.
  2. The overhead of taking a list of things, which I think to do nicely would be: Create a new intrinsic type for PathGlobsAndRoot, pass it through everywhere, take a pointer to them with separate length, project out each thing from the tuple, seems like overkill. Happy to do this if we end up actually needing it, but again, if we find ourselves in that place we should probably design an actual API.

This comment has been minimized.

@stuhood

stuhood May 16, 2018

Member

I don't expect people to actually call this much at all, it's really just a short-term hack; if the do see people calling this, we need to design a better API.

It will be used until "all" of pants is ported to the v2 engine in order to transition back and forth between local files and snapshots. And we immediately have a need for it to be concurrent I think, in order to capture ~500 jars into 500 snapshots on every jvm_compile startup. See #5762 (comment)

The overhead of taking a list of things, which I think to do nicely would be: Create a new intrinsic type for PathGlobsAndRoot, pass it through everywhere, take a pointer to them with separate length, project out each thing from the tuple, seems like overkill.

I don't think you need a new intrinsic type, although wrapper datatypes would be useful. It would look like:

  1. project_multi for a field that contains a list of PathGlobsAndRoots
  2. project str for the root
  3. use existing PathGlobs listing code for the rest

This comment has been minimized.

@illicitonion

illicitonion May 17, 2018

Contributor

Before jumping into discussion here: This is now done.

I don't expect people to actually call this much at all, it's really just a short-term hack; if the do see people calling this, we need to design a better API.

It will be used until "all" of pants is ported to the v2 engine in order to transition back and forth between local files and snapshots.

I was expecting this to be a stop-gap until BinUtils is migrated into v2, and somehow exposed to v1 tasks...

And we immediately have a need for it to be concurrent I think, in order to capture ~500 jars into 500 snapshots on every jvm_compile startup. See #5762 (comment)

I think I maybe don't understand #5762 then... What are those 500 jars? My understanding is that we already create snapshots for targets when we parse BUILD files for them (that's why ./pants list :: fills up ~/.cache/pants/lmdb_store), but that we throw away the references to the Snapshots when we create the v1 Target type. I thought that #5762 was basically a "don't throw away those references" task, plus some "clean up some legacy stuff which means we have some convoluted paths" task, rather than a "do all of those snapshots twice" kind of task...?

This comment has been minimized.

@stuhood

stuhood May 17, 2018

Member

I was expecting this to be a stop-gap until BinUtils is migrated into v2, and somehow exposed to v1 tasks...

There are lots of existing tasks that download things that will need to be ported/bootstrapped into the new model in order to be remoted... maybe this is the last integration that we will do starting from "mid graph", and all other task porting will be bottom up (and thus run resolvers in the datacenter with network access...?), but I have been assuming that ivy/coursier/npm/yarn/go/etc would need bootstrapping into Snapshots for now.

This comment has been minimized.

@stuhood

stuhood May 17, 2018

Member

I think I maybe don't understand #5762 then... What are those 500 jars?

The jars fetched by ivy and coursier: the 3rdparty dependencies. The ivy/coursier tasks run during the resolve goal, rather than during build file parsing.

@@ -286,6 +286,14 @@ def run_and_return_roots(self, execution_request):
self._native.lib.nodes_destroy(raw_roots)
return roots

def capture_snapshot(self, root_path, path_globs):
result = self._native.lib.capture_snapshot(self._scheduler, root_path, self._to_value(path_globs))
value = self._from_value(result.value)

This comment has been minimized.

@stuhood

stuhood May 13, 2018

Member

Worth lifting out a little _from_pyresult helper?

This comment has been minimized.

@illicitonion

illicitonion May 16, 2018

Contributor

Done

})
}

fn capture_snapshot_inner(

This comment has been minimized.

@stuhood

stuhood May 13, 2018

Member

Keeping this file slimmer and making this an impl Scheduler method would be preferable I think.

This comment has been minimized.

@illicitonion

illicitonion May 16, 2018

Contributor

Done. Slightly ugly because I can't use ?s in a function returning PyResult, but probably a reasonable separation

@stuhood stuhood force-pushed the twitter:dwagnerhall/binutil/snapshotoutsideroot branch from e8c3279 to 874350f May 13, 2018

@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented May 13, 2018

(rebased this to get a useful travis run)

illicitonion added some commits May 10, 2018

Allow arbitrary directories to be Snapshotted
We don't cache anything intermediate here, we just dump the Snapshot
into the Store and return it.

This is intended to be a short-term transition path to allow BinaryUtils
to be consumed in the v2 engine.

@illicitonion illicitonion force-pushed the twitter:dwagnerhall/binutil/snapshotoutsideroot branch from 874350f to f7ba084 May 16, 2018

illicitonion added some commits May 17, 2018

@stuhood
Copy link
Member

stuhood left a comment

Thanks!

@stuhood stuhood merged commit 777e500 into pantsbuild:master May 17, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@illicitonion illicitonion deleted the twitter:dwagnerhall/binutil/snapshotoutsideroot branch Aug 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment