Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make pylint run on transitive deps #13918

Merged
merged 1 commit into from
Jan 14, 2022

Conversation

thejcannon
Copy link
Member

@thejcannon thejcannon commented Dec 18, 2021

Pylint works when you just have direct dependencies, but it will underreport issues. That's not ideal.

That behavior is also confusing because it means ./pants lint :: will behave differently than ./pants lint f.py as :: will have all transitive deps in the same batchd run.

The fix is to always use transitive deps. Unfortunately, this does mean that Pylint will be a bit slower.

@thejcannon
Copy link
Member Author

thejcannon commented Dec 18, 2021

Unfortunately this change has some negative side-effects as we'll be making lints with pylint pull in far more dependencies 😭

Unless you're linting with mypy which already requires transitive deps. In that case I'd expect the same PEX of deps to be re-used between both tools.

But also, this means people might see lint errors where there was none previously. 😬

@stuhood
Copy link
Sponsor Member

stuhood commented Jan 3, 2022

@thejcannon : Is it the case that only some lints require the transitive deps? And if so, is it worth putting use of transitive dependencies behind a flag?

But additionally, the "re-export of an import" case is an interesting one: we actually added explicit handling of "exports" for Java compilation (see #13603) to avoid the need for transitive deps, and to instead have inference detect that files are re-exporting something. Unfortunately, it seems like doing the same thing here would require some pretty serious type analysis.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

Is it the case that only some lints require the transitive deps?

I've been lagging on reviewing for this reason - I'm confused how both Toolchain and StackStorm have been using Pylint without ever hitting this issue (cc @cognifloyd). They're large repos.

Your test makes clear that you do sometimes need transitive deps, but it seems like not always.

is it worth putting use of transitive dependencies behind a flag?

This could make sense...this PR has a performance downside and also means you need to use --changed-since-dependees=transitive rather than --changed-since-depndees=direct, which results in much more invalidation.

Unfortunately, it seems like doing the same thing here would require some pretty serious type analysis.

Yeah, probably not feasible for Python imo. Good context though.

@thejcannon
Copy link
Member Author

I would assume the need for transitive deps is woven throughout most of the various error checks. Technically speaking, pylint uses astroid for AST inference, so it could indirectly be getting transitive information by asking astroid to infer something. Pylint does none of the work itself. You could try and tease out which error checks ask for inference, but that's getting needlessly complicated.

The fact that the test in question is a re-export is a red herring per-se. I believe it would also fail if say: transitive_dep exposed a class inheriting from NotImplemented, direct_dep exposed a class that inherited from transitive_dep's class, and then the linted file tried to raise an instance of that class. That's not a re-export and exhibits the behavior.


Now how did no one catch this? Pylint (via astroid) by-nature would rather have a false-negative if it comes across something it can't follow. Python's import system is way to configurable for astroid to be confident a module doesn't exist (as opposed to say, mypy which will error). So you'd never see an "issue" unless you purposefully saw yourself getting a watered-down pylint run. ... I suppose there could be some difference between pylint running in PR if you linted the world but ran --changed-since locally. But the chance of that manifesting in an obvious way is likely slim.

@thejcannon
Copy link
Member Author

On the side, I think --changed-since-dependees needs an "implicit" value, since I want to run pylint/mypy on transitive deps, but flake8/black/etc... only on the files that changed.

@cognifloyd
Copy link
Member

Pylint and astroid caused me serious grief in the StackStorm repo. After many hours (days) of debugging, I traced the issue to a custom pylint extension that used __import__() to extract a dict from the code being linted. Importing clearly requires transitive deps so that caused tons of weird issues.

I then refactored the pylint plugin to extract the dict from the astroid AST. That turned out to be involved because I had to resolve several cases where copy.deepcopy() we're used and then the dict was modified. (The dict in question defined the dynamic attributes on some model classes).

So, now I don't need transitive deps with pylint because I fixed that plugin. Are you using any pylint plugins?

Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm :)

Copy link
Sponsor Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me, but I will defer to Eric for the final word.

@Eric-Arellano
Copy link
Contributor

Apologies for my delay again! To recap our DM message, it sounds like Pylint does not error if you only have direct dependencies. Which is why Toolchain and others have been able to use Pylint just fine for so long. @cognifloyd points out that certain plugins also error if you don't have transitive deps, but I don't think that's what this PR was motivated by.

But, Pylint does underreport issues if transitive deps are missing. Not good.

Merging this PR as-is is going to be disruptive for folks because it may result in multiple lint failures. (@asherf is testing this out on the Toolchain repository to get a sense of how disruptive it is there - I had issues with my M1.) I believe this violates our deprecation policy https://www.pantsbuild.org/docs/deprecation-policy, and either way it may hold back people from upgrading to Pants 2.10 which isn't ideal.

So, Joshua and I discussed adding an option to [pylint] that lets you toggle between direct deps vs transitive deps. It will default to direct deps to avoid breaking anyone. But we deprecate not setting the default. Then in 2.11 we change the default to transitive. Maybe we deprecate the option entirely in 2.11, unclear.

But we deprecate not setting the default.

You can do that by checking if pylint.options.is_default("my_option") and then use pants.base.deprecated.warn_or_error() if so.

@asherf

This comment has been minimized.

@Eric-Arellano

This comment has been minimized.

@asherf

This comment has been minimized.

@asherf

This comment has been minimized.

@Eric-Arellano

This comment has been minimized.

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jan 14, 2022

If I understand the issue correctly, we currently have a bug that causes lint errors to be underreported? I'm not sure fixing that falls under the deprecation policy, even if disruptive...

@thejcannon
Copy link
Member Author

thejcannon commented Jan 14, 2022

This might be somewhat off-topic, but (naming faux pas aside) should pylint be moved to check?
This line in the docs makes me think maybe:

Because most linters do not care about a target's dependencies, we lint all changed targets, but not any dependees of those changed targets.


Even more off-topic (but still relevant). pylint came before mypy, and did the best job it could at finding developer mistakes. Part of it's way of doing this is to follow deps so it could "understand" types decently. Now that mypy exists, and has muuuuuuuch better support for finding a large swath of these mistakes (which is primarily powered by type annotations, which pylint never had the opportunity to really make ubiquitous), the need for pylint has really diminished in the world of mypy. At the very least several codes are just worse-supported versions of mypy ones.

At some point I was hoping to correlate pylint codes to mypy for the sake of not double-reporting in our repo. Perhaps Pants could give a recommendation to turn pylint codes off if mypy is running with the associated codes. Additionally a toggle could be provided so if you're sure you're not using a transitive (or even direct) dep-requiring code, you can make pylint only just run on your file.

Of course, I wish pylint could evolve to provide this info, then Pants could consume it 😞 . Or a newer more-modern linter enters the ecosystem (but stays out of the way of mypy)

@benjyw
Copy link
Sponsor Contributor

benjyw commented Jan 14, 2022

This might be somewhat off-topic, but (naming faux pas aside) should pylint be moved to check? This line in the docs makes me think maybe:

Because most linters do not care about a target's dependencies, we lint all changed targets, but not any dependees of those changed targets.

This is an interesting proposal. The semantic line between linting and checking is blurry. But we necessarily have to sharpen it when we decide what goes in each goal. So, yeah, definitely possible that "check" is for things that find bugs and "lint" is for style?

@thejcannon
Copy link
Member Author

According to Wikipedia "Lint" is for "programming errors, bugs, stylistic errors and suspicious constructs" so there's no winning this war 😂

I guess taking a step back, what the difference between lint and check? If it is to address the "run on dependees" issue, I just filed #14170 to slice that a different way. I don't think that's it though, because the help docs say (for check)

Run type checking or the lightest variant of compilation available for a language.

And pylint isn't that ;)

So maybe stick to the current art (which has a nice clean delineation) and we tackle the dependees behavior separately

@stuhood
Copy link
Sponsor Member

stuhood commented Jan 14, 2022

If it is to address the "run on dependees" issue, I just filed #14170 to slice that a different way.

Yes, at least partially. Some of this is about cost: if you're operating on transitive dependencies, you're going to be a much more expensive check in general. It also makes it easier (...or, possible, at least) to decide what to do locally: i.e. ./pants --changed-dependees=transitive check vs ./pants fmt lint.

And we can certainly tweak the definition/help of check as we gain a better understanding of what we'd like it to do.

@Eric-Arellano
Copy link
Contributor

So what do we think for the deprecation? Should we go add an option to toggle behavior, or call this a "bug fix" and merge as-is? cc @stuhood

@thejcannon, what do you think as the author of this change?

@thejcannon
Copy link
Member Author

I could see it either way. I'm more "move-fast-and-break-things". Also if there were hidden bugs I'd want to know sooner rather than later. But then again I'm not the one "representing" and supporting Pants per-se.

If I'm being honest, I suspect the likelihood of this breaking anyone low, as I expect most people's CI to run on all files, which would then likely catch the hidden bugs. Not impossible, but low.

But an option is also perfectly fine and possibly expected.

@Eric-Arellano
Copy link
Contributor

If I'm being honest, I suspect the likelihood of this breaking anyone low, as I expect most people's CI to run on all files, which would then likely catch the hidden bugs. Not impossible, but low.

Oh!!! You know what, you're totally right there. I was forgetting that we batch all of the run together, meaning transitive deps are present most the time. Great! Okay, let's land this :)

@Eric-Arellano Eric-Arellano merged commit 3b62728 into pantsbuild:main Jan 14, 2022
@Eric-Arellano
Copy link
Contributor

(FYI I updated the PR description to better explain PYlint's expectations)

@stuhood
Copy link
Sponsor Member

stuhood commented Jan 14, 2022

IMO, following up to move pylint to check would probably be a good idea, independent of #14170, since the finest granularity something like #14170 is likely to have is still at the goal level.

@cognifloyd
Copy link
Member

👍 move pylint to check. lint should be much lighter weight things. pylint is not lightweight.

@Eric-Arellano
Copy link
Contributor

cc @asherf re thoughts on moving Pylint to check? You've pointed out that Pylint is by far the slowest thing to run in Toolchain's repo with ./pants lint.

Eric-Arellano added a commit that referenced this pull request Jan 28, 2022
Same as #13918, but for third-party deps. As with first-party code, Pylint doesn't complain if transitive deps are missing, but it is less exhaustive than normal. 

[ci skip-build-wheels]
[ci skip-rust]
Eric-Arellano added a commit that referenced this pull request Jan 28, 2022
Pylint was our only user of this functionality, and @thejcannon found in #13918 that it wasn't even correct to use. It indeed seems very unusual for Python to want just direct deps and not transitive.

If we need this back again, we have Git.

[ci skip-rust]
[ci skip-build-wheels]
@thejcannon thejcannon deleted the pylint-transitive branch August 17, 2022 18:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants