Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A @console_rule for validating source files against regexes. #7515

Merged
merged 6 commits into from Apr 12, 2019

Conversation

Projects
None yet
4 participants
@benjyw
Copy link
Contributor

commented Apr 7, 2019

This can be used e.g., for checking header comments in a uniform way, instead of relying on per-language linters.

The validator configuration contains three elements: A set of path patterns, a set of content patterns, and a set of requirements mapping path patterns to content patterns. Meaning: If a file matches some path pattern, its content must match all the corresponding content patterns.

This change also includes an application of this rule to check for Python and JVM source file headers in the pants codebase, as a demonstration. Not all our source files pass it yet, but most do.

@benjyw benjyw force-pushed the benjyw:regex_lint2 branch 3 times, most recently from 95dcc42 to b712284 Apr 7, 2019

@benjyw

This comment has been minimized.

Copy link
Contributor Author

commented Apr 7, 2019

Note that a big part of the impetus for this change was to get my feet wet on writing v2 rules...

@benjyw benjyw requested review from stuhood, jsirois and cosmicexplorer Apr 7, 2019

@benjyw

This comment has been minimized.

Copy link
Contributor Author

commented Apr 7, 2019

Note that running ./pants --v2 --no-v1 validate :: yields

1550 files matched all required patterns.
188 files failed to match at least one required pattern.
@benjyw

This comment has been minimized.

Copy link
Contributor Author

commented Apr 7, 2019

Possibly a future change may introduce set operations between path patterns, so you can, e.g., exclude certain paths.

@benjyw

This comment has been minimized.

Copy link
Contributor Author

commented Apr 7, 2019

Note also that this change has its tests alongside the source, per #7489

Show resolved Hide resolved src/python/pants/backend/project_info/rules/source_file_validator.py Outdated
#
# Meaning: if a file matches some path pattern, its content must match all the corresponding
# content patterns.
register('--config', type=dict, fromfile=True,

This comment has been minimized.

Copy link
@illicitonion

illicitonion Apr 8, 2019

Contributor

fromfile arguments don't invalidate the daemon; they're read as a side-effect of running some python code, which doesn't declare them as file dependencies.

I've just validated this experimentally; I ran ./pants --v2 --no-v1 --enable-pantsd=True validate testprojects/src/python/python_targets:test_library then edited the regex file so that it should pass, and it still failed. Then I ran ./pants --v2 --no-v1 --enable-pantsd=False validate testprojects/src/python/python_targets:test_library and it correctly passed.

(This is why I'm skeptical that we can just take existing Subsystems as-is and use them in v2... Lots of implicit side effects we can't easily capture...)

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 8, 2019

Member

Yea, this happens as part of Options parsing, but the very first portion of options parsing being integrated into v2 happened in December in #6993. AFAIK, this is the only case of options parsing using external files, but not having it sandboxed remains error prone.

This comment has been minimized.

Copy link
@benjyw

benjyw Apr 8, 2019

Author Contributor

That's important to know. But we'll have to fix this on the options parsing side, presumably, or do away with fromfile altogether. This isn't an issue specific to this change.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 8, 2019

Contributor

I mean, we hydrate targets through a v2 ruleset generating HydratedTarget, would it make sense to have a ruleset producing a HydratedSubsystem which e.g. extracts strings for fromfile options, then undergoes normal options parsing?

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 8, 2019

Member

It's a solvable (albeit large) problem, yea. fromfile is one case, but I just remembered that plugin resolution likely happens here as well.

As @benjyw said, not specific to this issue. I'll open a ticket about it.

Show resolved Hide resolved src/python/pants/backend/project_info/rules/source_file_validator.py Outdated
pass


class SourceFileValidation(Subsystem):

This comment has been minimized.

Copy link
@illicitonion

illicitonion Apr 8, 2019

Contributor

Open style question (definitely not blocking this PR):
It's unclear to me whether the preferred style is "write a subsystem that has all the logic it can" or "write an @rule per language which share some code (which may live in a system)". For regexes, I don't think it makes much of a difference, but if the objective is to replace the lint goal, some things (e.g. scalafix) may want to do more complex things like drive compilation, and doing that in an extensible way feels like it would be easier in a set of per-language @rules than a centralised Subsystem.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 8, 2019

Contributor

I think having it as a continuum makes some sense to me, depending on the degree of logic sharing. Only rules can do things like yield to the engine, so if that's needed it would tend towards making a v2 ruleset, but if rules are more just windows into existing subsystem logic (which they would be at first for existing code), it would tend to have more in the subsystem. I think for things that would normally be helper methods on Subsystem instances, it's probably better to move those into @rules, and in general keep subsystems as a bag of options. Some existing mildly-complex functionality such as CompilerOptionSetsMixin could stay on subsystems too, though.

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 8, 2019

Member

Yea. I expect that (as with testing), we're going to want to factor linting as a @console_rule that consumes an @union LintResult (or something). Unlike with tests (which generally have one test runner), per-language linting will likely have an additional level of composition because one target type (ie PythonLibrary) might have many linters that need composition.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 8, 2019

Contributor

Here might be a good example of taking logic similar to that implemented in this subsystem, spreading it out over multiple v2 rules: #7419. The only reason that PR didn't also create an @console_rule was because I wanted to make sure it ran when any v1 gen task did, and there's currently no way to expose a v2 rule directly as a v1 task except by doing what I did there and wrapping it in a v1 task, which seems good for now (and might also be applicable here?).

.format(e))
except Exception:
logger.warning('Encountered unhandled exception during rule execution!')
logger.warning(traceback.format_exc())

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 9, 2019

Contributor

not really relevant note Was this changed to logger.warning() instead of .warn() because of a deprecation warning when using logger.warn()? Because I thought I saw that happen elsewhere, too, and it seems like the fact that logger.warn{,ing}() are used in except blocks means those might not always show up when running pants. I might do a s/logger\.warn/logger.warning/g on the repo in another PR.

This comment has been minimized.

Copy link
@benjyw

benjyw Apr 9, 2019

Author Contributor

Correct - warn() is deprecated.

@benjyw

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2019

PTAL, thanks.

@stuhood
Copy link
Member

left a comment

Thanks @benjyw. One serious issue, and then should be landable.

Have you considered porting over the pre-commit.sh script to use this now?

files_content = yield Get(FilesContent,
Digest, hydrated_target.adaptor.sources.snapshot.directory_digest)
for file_content in files_content:
yield multi_matcher.check_source_file(file_content.path, file_content.content)

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 10, 2019

Member

Yipes. So, the effect of this line is that you will immediately break out of this loop and @rule with a final value. I'm not sure what the intended effect here was, but we should probably expand our linting for these cases.

This comment has been minimized.

Copy link
@benjyw

benjyw Apr 10, 2019

Author Contributor

Oops, yes, it was late and I was tired... Fixed.



@rule(RegexMatchResults, [HydratedTargets])
def match_regexes(hydrated_targets):

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 10, 2019

Member

You could avoid this intermediate @rule (and extra wrapper type) by inlining these Gets into def validate.

This comment has been minimized.

Copy link
@benjyw

benjyw Apr 10, 2019

Author Contributor

I've inlined it, but I can't get rid of the collection type, as the single-target rule needs to return it.

@@ -197,6 +197,7 @@ def run_console_rules(self, options_bootstrapper, goals, target_roots):
subjects = self._determine_subjects(target_roots)
console = Console()
# Console rule can only have one subject.
# TODO: What about console_rules with no subjects (i.e., no target specs)?

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 10, 2019

Member

IIRC, this is now always a single Specs object containing a (possibly empty) list of Spec objects. Could probably drop/explain the assertion.

This comment has been minimized.

Copy link
@benjyw

benjyw Apr 10, 2019

Author Contributor

All I know is, the assert fails when you run with no targets...

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 10, 2019

Member

Well, that's super annoying! Sorry about that.

I can make sure to address that in #6880.

@@ -69,6 +70,7 @@ def register_options(cls, register):
register('--detail-level', type=DetailLevel, default=DetailLevel.nonmatching,
help='How much detail to emit to the console.')

@memoized_method

This comment has been minimized.

Copy link
@illicitonion

illicitonion Apr 10, 2019

Contributor

Idle question (non-blocking): Should we be avoiding python-side static memoization and favouring rule-graph memoization? It feels like we'll probably end up leaking a bunch of memory if we're storing stuff on python classes, but also, encouraging this kind of thing to be @rules feels like it violates the spirit of "you mostly just write python".

The specific example of memoizing this in the rule graph would be:

@rule(MultiMatcher, [SourceFileValidation])
def get_multi_matcher(sfv):
  return MultiMatcher(sfv.get_options().config)

and then yield Get(MultiMatcher, SourceFileValidation, sfv) where you currently call get_multi_matcher()

This comment has been minimized.

Copy link
@benjyw

benjyw Apr 10, 2019

Author Contributor

Yeah, I don't have a strong opinion either way. Not leaking memory is probably a worthy goal.

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 10, 2019

Member

So, assuming that we fix #6555, I think that in many cases @memoized is totally fine... the primary correctness issue that could get involved would be when global or static variables are memoizing things. And in those cases, the global/static is the real issue.

@stuhood
Copy link
Member

left a comment

Thanks Benjy.

Have you considered porting over the pre-commit.sh script to use this now?

^ would still be good to do in order to start dogfooding this and delete some other scripts.

@benjyw

This comment has been minimized.

Copy link
Contributor Author

commented Apr 11, 2019

Thanks @benjyw. One serious issue, and then should be landable.

Have you considered porting over the pre-commit.sh script to use this now?

Yes, but not in this change...

@benjyw benjyw force-pushed the benjyw:regex_lint2 branch from c7605b8 to 0e0b7f7 Apr 11, 2019

@benjyw benjyw merged commit f555ff2 into pantsbuild:master Apr 12, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@benjyw benjyw deleted the benjyw:regex_lint2 branch Apr 12, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.