Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add traverse method to SourceRootTrie #8176

Closed
wants to merge 2 commits into from

Conversation

@gshuflin
Copy link
Contributor

commented Aug 15, 2019

This will return a set of tuples of all valid paths (with filesystem
globbing included), and the languages each path applies to, that a
given SourceRootTrie instance describes. This will be useful for
implementing a v2 engine-compatible way of getting a list of all source
files, but is a standalone change in and of itself.

@benjyw benjyw requested review from benjyw and stuhood Aug 15, 2019

}, trie.traverse())

trie = make_trie()
trie.add_pattern('*')

This comment has been minimized.

Copy link
@stuhood

stuhood Aug 15, 2019

Member

Just FYI: the pattern format roughly matches git's globs: https://git-scm.com/docs/gitignore#_pattern_format, so a single * matches exactly one path component, and a double star (**) matches zero or more path components.

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 15, 2019

Contributor

The sourceroot patterms are a very restricted form of globbing. They allow a single * to stand in for a known language name. So ** and other full-on globbing idioms aren't pertinent here.

This comment has been minimized.

Copy link
@stuhood

stuhood Aug 15, 2019

Member

Sorry... this comment was intended to be relevant to #8045, which I assume will be consuming these patterns.

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 15, 2019

Contributor

Yes, we've talked about how that will work. Basically, fixed patterns will be matched as-is, and wildcard patterns will be prefixed by "**/" and have the single * replaced by every allowed language.

@@ -361,6 +362,24 @@ def _do_add_pattern(self, pattern, langs, category):
node.category = category
node.is_terminal = True

def traverse(self) -> Set[Tuple[str, Tuple[str, ...]]]:

This comment has been minimized.

Copy link
@stuhood

stuhood Aug 15, 2019

Member

It's not clear that this method needs to be on the Trie: it feels like it could just be on the Config, which can generate all of these patterns directly?

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 15, 2019

Contributor

I think it's neater to have it on the trie, since it's the "front door" that the rest of the code uses to access source roots, whereas config is just one (albeit the main) source of source roots.

We do actually add source roots directly, not via config, in some cases:

self.source_roots.add_source_root(rel_target_base)

This comment has been minimized.

Copy link
@stuhood

stuhood Aug 15, 2019

Member

We do actually add source roots directly, not via config, in some cases:

That will only have an effect on v1... synthetic targets aren't a thing in v2.

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 15, 2019

Contributor

True. Nonetheless it seems like the Trie is the right place for this, at least while that's what code uses to query source roots.

@benjyw
benjyw approved these changes Aug 15, 2019
Copy link
Contributor

left a comment

Let's wait for Stu to sign off on this though.

}, trie.traverse())

trie = make_trie()
trie.add_pattern('*')

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 15, 2019

Contributor

The sourceroot patterms are a very restricted form of globbing. They allow a single * to stand in for a known language name. So ** and other full-on globbing idioms aren't pertinent here.

@@ -361,6 +362,24 @@ def _do_add_pattern(self, pattern, langs, category):
node.category = category
node.is_terminal = True

def traverse(self) -> Set[Tuple[str, Tuple[str, ...]]]:

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 15, 2019

Contributor

I think it's neater to have it on the trie, since it's the "front door" that the rest of the code uses to access source roots, whereas config is just one (albeit the main) source of source roots.

We do actually add source roots directly, not via config, in some cases:

self.source_roots.add_source_root(rel_target_base)

@gshuflin gshuflin force-pushed the gshuflin:traverse_source_root branch from 4bf738e to 485a4ef Aug 15, 2019

@@ -18,6 +19,7 @@ class SourceRootCategories:


SourceRoot = namedtuple('_SourceRoot', ['path', 'langs', 'category'])
SourceRootWithLangDirnames = namedtuple('SourceRootWithLangDirnames', ['path', 'langs', 'category'])

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 15, 2019

Contributor

This name is possibly more confusing than useful. How about UncanonicalizedSourceRoot and add a comment explaining what that is.

@stuhood
Copy link
Member

left a comment

Thanks.

See above re putting this on the Trie vs on config. One of the design goals of v2 is that any mutable additions to the Trie should not be relevant anymore.

}, trie.traverse())

trie = make_trie()
trie.add_pattern('*')

This comment has been minimized.

Copy link
@stuhood

stuhood Aug 15, 2019

Member

Sorry... this comment was intended to be relevant to #8045, which I assume will be consuming these patterns.

@gshuflin gshuflin force-pushed the gshuflin:traverse_source_root branch from 485a4ef to 4f0d075 Aug 15, 2019

Add traverse method to SourceRootTrie
This will return a set of named tuples of all valid paths (with filesystem
globbing included), and the languages/categories each path applies to, that a
given SourceRootTrie instance describes. This will be useful for
implementing a v2 engine-compatible way of getting a list of all source
files, but is a standalone change in and of itself.

@gshuflin gshuflin force-pushed the gshuflin:traverse_source_root branch from 4f0d075 to 1049366 Aug 15, 2019

@benjyw

This comment has been minimized.

Copy link
Contributor

commented Aug 15, 2019

Thanks.

See above re putting this on the Trie vs on config. One of the design goals of v2 is that any mutable additions to the Trie should not be relevant anymore.

True.

@Eric-Arellano
Copy link
Contributor

left a comment

Cool!

My suggestions are a bit nitty - feel free to ignore. After looking at student code past week on Turtle graphics, fun to look at this!

@@ -19,6 +20,10 @@ class SourceRootCategories:

SourceRoot = namedtuple('_SourceRoot', ['path', 'langs', 'category'])

# Named tuple of path/langs/category, where the langs are deliberately not canonicalized in order to match up
# with the actual directory names on disk
UncanonicalizedSourceRoot = namedtuple('UncanonicalizedSourceRoot', ['path', 'langs', 'category'])

This comment has been minimized.

Copy link
@Eric-Arellano
@@ -361,6 +366,25 @@ def _do_add_pattern(self, pattern, langs, category):
node.category = category
node.is_terminal = True

def traverse(self) -> Set[UncanonicalizedSourceRoot]:

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Aug 16, 2019

Contributor

Yay type hints!

@@ -361,6 +366,25 @@ def _do_add_pattern(self, pattern, langs, category):
node.category = category
node.is_terminal = True

def traverse(self) -> Set[UncanonicalizedSourceRoot]:
uncanonicalized_source_roots = set()

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Aug 16, 2019

Contributor

Usually you'll want to type empty collection initializations like this. Otherwise MyPy can only infer Set[Any].

Suggested change
uncanonicalized_source_roots = set()
uncanonicalized_source_roots: Set[UncanonicalizedSourceRoot] = set()
lang_canonicalizations = self._source_root_factory._lang_canonicalizations
all_lang_names = tuple(lang_canonicalizations.keys())

def traverse_helper(node, path_components):

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Aug 16, 2019

Contributor

Type hints encouraged for inner functions. Not required and no one will block if you don't have it, but if you already know the types would be great to add :)

effective_path = '/'.join([*path_components, name])
effective_lang_names = child.langs if len(child.langs) != 0 else all_lang_names
category = child.category
root = UncanonicalizedSourceRoot(effective_path, effective_lang_names, category)

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Aug 16, 2019

Contributor

Nit. kwargs may make this even more readable:

Suggested change
root = UncanonicalizedSourceRoot(effective_path, effective_lang_names, category)
root = UncanonicalizedSourceRoot(path=effective_path, langs=effective_lang_names, category=category)

You could also then inline the line above to have category=child.category for less variables to keep track of.

category = child.category
root = UncanonicalizedSourceRoot(effective_path, effective_lang_names, category)
uncanonicalized_source_roots.add(root)
traverse_helper(child, [*path_components, name])

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Aug 16, 2019

Contributor

Nit. Kwargs would help here too imo:

Suggested change
traverse_helper(child, [*path_components, name])
traverse_helper(node=child, path_components=[*path_components, name])
@@ -91,6 +91,51 @@ def root(path, langs):
self.assertEqual(root('src/go/src', ('go',)),
trie.find('src/go/src/foo/bar/baz.go'))

def test_source_root_trie_traverse(self):

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Aug 16, 2019

Contributor

Return type is -> None. The benefit of doing that is that MyPy will then know to check this code + it's easier to scan what the function does.

This comment has been minimized.

Copy link
@gshuflin

gshuflin Aug 16, 2019

Author Contributor

Should we be putting -> None type hints in test functions that have no meaningful return values?

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Aug 16, 2019

Contributor

Yes, encouraged to do this so that MyPy knows to type check the code and to catch the unlikely but possible bug where we try returning from a unit test.

As close as we can get to 100%, the better. Makes it simpler to always aim for 100% and have less special cases.

@@ -20,6 +21,14 @@ class SourceRootCategories:
SourceRoot = namedtuple('_SourceRoot', ['path', 'langs', 'category'])


class UncanonicalizedSourceRoot(NamedTuple):

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 16, 2019

Contributor

I mean, at this point we may as well make it a @dataclass.

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Aug 16, 2019

Contributor

If we could do that in 3.6 (which we can via backport), I’d agree. They are very similar and I usually default to dataclass.

This comment has been minimized.

Copy link
@benjyw

benjyw Aug 16, 2019

Contributor

Oh, are we not 3.7-ing yet? Never mind. We'll wait.

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Aug 16, 2019

Contributor

I’d love to drop 3.6 so that we could use dataclasses and from __future__ import annotations. Twitter just switched to using 3.7, but I think we have several users like Foursquare still on 3.6 and in general we’re planning on supporting the two (three?) most recent Python versions.

@Eric-Arellano
Copy link
Contributor

left a comment

Thanks!

lang_canonicalizations = self._source_root_factory._lang_canonicalizations
all_lang_names = tuple(lang_canonicalizations.keys())

def traverse_helper(node: SourceRootTrie.Node, path_components: Sequence[str]):

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Aug 16, 2019

Contributor

Good use of Sequence rather than List! That’s great to use the generic duck type when typing parameters.

@gshuflin gshuflin closed this Aug 22, 2019

@gshuflin gshuflin referenced this pull request Aug 22, 2019
stuhood added a commit that referenced this pull request Aug 24, 2019
Porting roots goal to v2 (#8199)
Porting the `roots` goal from the v1 to v2 engine. The feedback from #8045 and #8176 was substantial enough that I closed those and made a new pull request with much of the same code mentioned there.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.