Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Porting roots goal to v2 #8199

Merged
merged 11 commits into from Aug 24, 2019

Conversation

@gshuflin
Copy link
Contributor

commented Aug 22, 2019

Porting the roots goal from the v1 to v2 engine. The feedback from #8045 and #8176 was substantial enough that I closed those and made a new pull request with much of the same code mentioned there.

@stuhood
Copy link
Member

left a comment

This turned out really nicely. Thanks @gshuflin!

else:
all_paths |= {f"**/{path}/"}

path_globs = [PathGlobs(include=(glob_text,)) for glob_text in all_paths]

This comment has been minimized.

Copy link
@stuhood

stuhood Aug 22, 2019

Member

It's a bit simpler to just do a single expansion here, because you avoid redundant work, and don't have to merge the matches later:

snapshot = yield Get(Snapshot, PathGlobs(include=tuple(all_paths)))
dirs_from_snapshot |= set(snapshot.dirs)

all_source_roots: Set[SourceRoot] = set()
for dir in sorted(list(dirs_from_snapshot)):

This comment has been minimized.

Copy link
@stuhood

stuhood Aug 22, 2019

Member

dir is a great name for this variable, but unfortunately it shadows a python builtin function. Should avoid doing that.

class SourceRoot(datatype([('path', str), 'langs', ('category', str)])):
pass

SourceRootsCollection = Collection.of(SourceRoot)

This comment has been minimized.

Copy link
@stuhood

stuhood Aug 22, 2019

Member

So, I know that we do this in a few other places, but I think that we should do less of it. The issue with "Collection.of(SourceRoot)" is that it doesn't send a good signal of what the collection actually represents. And because types are so important in v2, it's very likely that someone might be tempted to use a SourceRootCollection in some other position... and that would end up confusing. When I request a SourceRootCollection, will I get "all of them"? "some of them"? etc.

So. What I'd recommend is, rather than this line, consider defining:

class AllExistingSourceRoots(datatype([
  (source_roots, Set[SourceRoot]),
])): pass

...instead (or some other name that explains "which ones" you are getting). That sends a better signal about what your rule computes.

This comment has been minimized.

Copy link
@gshuflin

gshuflin Aug 22, 2019

Author Contributor

So is the feedback here that we should generally prefer to create custom types that are wrappers around native python collections, rather than using the Collection.of construction?

This comment has been minimized.

Copy link
@stuhood

stuhood Aug 22, 2019

Member

Yes. And the reason is that it allows you to give it a unique type such that is unambiguously a particular set of SourceRoots (namely "all the ones that exist on disk").

This comment has been minimized.

Copy link
@stuhood

stuhood Aug 22, 2019

Member

So, actually, this would also be fine:

class AllExistingSourceRoots(Collection.of(SourceRoot)): pass

Because unlike AllExistingSourceRoots = Collection.of(SourceRoot) it results in a unique type. This is basically the difference between a typealias and a trait impl in rust. And the datatypey one above would be like a rust newtype.


@console_rule(Roots, [Console, Roots.Options, SourceRootConfig])
def list_roots(console, options, source_root_config):
all_roots = yield Get(SourceRootsCollection, SourceRootConfig, source_root_config)

This comment has been minimized.

Copy link
@stuhood

stuhood Aug 22, 2019

Member

Requesting this type explicitly is fine, but you can also move this directly into your argument list and skip this step:

@console_rule(Roots, [Console, Roots.Options, SourceRootsCollection])
def list_roots(console, options, all_roots):
  ..

The reason this works is just because the same way this @(console_)rule gets the SourceRootConfig directly, your other @rule can get it directly.

@stuhood stuhood requested a review from benjyw Aug 22, 2019

@stuhood
Copy link
Member

left a comment

Thanks!

}
options.update(self.options['']) # We need inherited values for pants_workdir etc.

self.context(for_subsystems=[SourceRootConfig], options={

This comment has been minimized.

Copy link
@stuhood

stuhood Aug 23, 2019

Member

@benjyw : Do you see a way to make the Subsystem creation easier here?

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

Not offhand, this is how we do it in v1 tests, no?

This comment has been minimized.

Copy link
@stuhood

stuhood Aug 24, 2019

Member

Maybe. It's just... a lot.

@benjyw
Copy link
Contributor

left a comment

Nice, I am liking how this is turning out.

all_paths: Set[str] = set()
for path in source_roots.traverse():
if path.startswith("^/"):
all_paths |= {f"{path[2:]}/"}

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

I think all_paths.add() is preferable instead of constructing a throwaway singleton set.


all_source_roots: Set[SourceRoot] = set()
for directory in sorted(snapshot.dirs):
match: SourceRoot = source_roots.trie_find(directory)

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

What is an example of a case where a directory doesn't match in the trie? Didn't we construct all_paths so that they have to match? If there is such a case then it needs to be documented here to explain why we do this extra match.

This comment has been minimized.

Copy link
@gshuflin

gshuflin Aug 23, 2019

Author Contributor

The PathGlobs -> Snapshot rule is picking up subdirectories of the actual source roots (for instance picking up subdirectories of src/rust/*) that would get pruned out by calling find_by_path() (which is what should be here instead of trie_find)

pass


class AllExistingSourceRoots(Collection.of(SourceRoot)):

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

The type should be SourceRootCollection or something. "All existing" is a description of what happens to be inside one specific instance we care about, but it's not a property of the type.

This comment has been minimized.

Copy link
@gshuflin

gshuflin Aug 23, 2019

Author Contributor

@stuhood 's argument was that this type should be a newtype wrapping the more-generic Collection.of(SourceRoot) to indicate what we're using that collection for in this change (and presumably if we needed to do something else with a Collection.of(SourceRoot) somewhere else, we'd create a new wrapper type for that purpose).

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

Oh, hm, yeah, I guess the engine requires those to all be separate so it knows what to return.

Let's just name this AllSourceRoots though. The "existing" is redundant. The concept of a non-existing source root (i.e., a string that matches the pattern but isn't a directory on disk) is a subtlety not normally needed by users of this code.

This comment has been minimized.

Copy link
@stuhood

stuhood Aug 24, 2019

Member

Fine with that.


def trie_find(self, dir_path: str) -> SourceRoot:
return self._trie.find(dir_path)

def all_roots(self):

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

Do we kill this method now?

This comment has been minimized.

Copy link
@gshuflin

gshuflin Aug 23, 2019

Author Contributor

No it's still being used in one place (cf. 29713c0 ) and I don't want to try to get rid of it there in this commit set.

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

Fair enough.

def traverse(self) -> Set[str]:
return self._trie.traverse()

def trie_find(self, dir_path: str) -> SourceRoot:

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

The fact that we use a trie is an implementation detail, so this should have a different name. Ideally one that explains what this does and how it's different from find_by_path...

Also, let's add docstrings to all the new methods.

This comment has been minimized.

Copy link
@gshuflin

gshuflin Aug 23, 2019

Author Contributor

I think I actually duplicated find_by_path's functionality with trie_find, so I'll just remove this and use that instead.

@benjyw
benjyw approved these changes Aug 23, 2019
Copy link
Contributor

left a comment

Mod a clarification to that comment, grammar nit, and possibly renaming a class.

Nice!!

all_source_roots: Set[SourceRoot] = set()
# Get(Snapshot, PathGlobs) picks up subdirectories of the source root
# that we don't want in the final output, so use find_by_path to prune
# those and turn the ones we do want into SourceRoot's

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

s/SourceRoot's/SourceRoots.

(i.e., drop the apostrophe and end sentence with a period).

snapshot = yield Get(Snapshot, PathGlobs(include=tuple(all_paths)))

all_source_roots: Set[SourceRoot] = set()
# Get(Snapshot, PathGlobs) picks up subdirectories of the source root

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

Oh, I see, so if we glob on **/src/*/ then it could pick up src/python/src/foo as a source root for foo? Is that the concern? Or does it match on every subdir? Because if the former then I think it would be better to rephrase as something like "The globs above can match on subdirs of source roots, e.g., src/py/foo/src/bar, so we use find_by_path to verify every candidate source root."

That is, it's important to explain precisely what the failure mode is here.

}
options.update(self.options['']) # We need inherited values for pants_workdir etc.

self.context(for_subsystems=[SourceRootConfig], options={

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

Not offhand, this is how we do it in v1 tests, no?

pass


class AllExistingSourceRoots(Collection.of(SourceRoot)):

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

Oh, hm, yeah, I guess the engine requires those to all be separate so it knows what to return.

Let's just name this AllSourceRoots though. The "existing" is redundant. The concept of a non-existing source root (i.e., a string that matches the pattern but isn't a directory on disk) is a subtlety not normally needed by users of this code.


def trie_find(self, dir_path: str) -> SourceRoot:
return self._trie.find(dir_path)

def all_roots(self):

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

Fair enough.

@gshuflin gshuflin force-pushed the gshuflin:new-listroots-work branch from cffd75c to dbb6034 Aug 23, 2019

all_source_roots: Set[SourceRoot] = set()

# The globs above can match on subdirectories of the source roots.
# For instance, `src/*/` might mach 'src/rust/' as well as

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

s/mach/match/

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

Otherwise, perfect comment! Very clear.

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 23, 2019

Contributor

Oh, but let's add a test for exactly this example, so anyone reading this comment can see it in action.

This comment has been minimized.

Copy link
@gshuflin

gshuflin Aug 23, 2019

Author Contributor

smacks forehead will fix typo and add tests

@benjyw
benjyw approved these changes Aug 23, 2019

@gshuflin gshuflin force-pushed the gshuflin:new-listroots-work branch from dbb6034 to 94280ff Aug 23, 2019

gshuflin added 11 commits Jul 10, 2019
Define v2 rule version of ListRoots
This commit creates a v2 rule that does the same thing as the currently
existing roots task. It is currently named 'rootsv2' so that it doesn't
conflict with the exisiting task, but will be renamed in the next commit
when that task is removed.
Replace v1 roots task with v2 rule
Remove the v1 roots task and rename the v2 rule from "rootsv2" to
"roots" so that it will take over the functionality of the old task.
Sort roots output by path
In the roots goal, sort the output by path before printing it to the
console for consistency's sake
Create all_roots @rule for use in roots goal
The SourceRoots.all_roots function shouldn't be used in a v2 engine
rule, since it uses python standard library functionality to talk to the
filesystem driectly. This commit adds a new @rule `all_roots` which does
the same thing, but in a v2-engine-compatible way.

The basic strategy here is to add a method on the `SourceRoots` class
called `traverse` which gets a list of every source root path string
(with *-globs) that `SourceRoots`. `all_roots` uses this method to
construct `PathGlobs`, then creates `Snapshot`s from these to get a
collection of the paths that actually exist on the file system. It then
tests these against the trie contained within the `SourceRoots` instance
to pare them down to the output we actually want. `list_roots` can then
use the output from this rulelinstead of invoking the
`SourceRoots.all_roots`.

We can't yet get rid of `SourceRoots.all_roots` because go_buildgen.py
is still using it, but once it's replaced there it should be possible to
delete that code entirely.
Add additional tests for roots goal behavior
Currently we detect languages in source roots even if those languages
are not languages pants knows about, and we should have a unit test for
this behavior.
Simplify Get request in all_roots
Instead of requesting a list of PathGlobs request a single PathGlobs
with many includes.

Also rename the variable dir -> directory to avoid shadowing a python
builtin.
Use custom type in all_roots @rule
Instead of assigning a name to Collection.of(SourceRoot), create a
"newtype" AllExistingSourceRoots that wraps it.
Simplify list_roots inputs
We can just ask for AllExistingSourceRoots directly rather than asking
for SourceRootConfig and immediately using it to get
AllExistingSourceRoots.
Code review responses for SourceRoots
- trie_find() does basically the same thing as find_by_path()
so remove the former and use the latter
- add some type annotations
- add a comment clarifying the use of find_by_path in list_roots.py
- add a test that subdirectories of source roots captured by PathGlobs
  don't  show up in the final output
- call all_paths.add instead of creating a set object and immediately
merging it
-rename AllExistingSourceRoots to AllSourceRoots

@stuhood stuhood merged commit 01a384f into pantsbuild:master Aug 24, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@gshuflin gshuflin deleted the gshuflin:new-listroots-work branch Aug 26, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.