Do not lint ignored file on stdin #7220

Merged
merged 5 commits into from
Sep 3, 2022

Conversation

christoph-blessing
Contributor

Type of Changes

Type
🐛 Bug fix

Description

This PR adds a check to see whether the filename passed with the --from-stdin option is ignored. If it is, the input is not linted. A corresponding test is also included.
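
To make the behaviour concrete, here is a minimal, self-contained sketch of the idea. It is not pylint's implementation: `is_ignored`, `lint_from_stdin`, and the placeholder "line too long" check are hypothetical names invented for illustration, and the three ignore lists only loosely mirror pylint's ignore options.

```python
from __future__ import annotations

import os
import re


def is_ignored(
    path: str,
    ignore_names: list[str],
    ignore_patterns: list[re.Pattern[str]],
    ignore_path_patterns: list[re.Pattern[str]],
) -> bool:
    """Return True if ``path`` matches any of the three ignore settings."""
    basename = os.path.basename(path)
    return (
        basename in ignore_names
        or any(p.match(basename) for p in ignore_patterns)
        or any(p.match(path) for p in ignore_path_patterns)
    )


def lint_from_stdin(path: str, source: str, ignore_names: list[str]) -> list[str]:
    """Hypothetical stand-in for the stdin branch: skip ignored files."""
    if is_ignored(path, ignore_names, [], []):
        return []  # the file is ignored, so nothing is linted or reported
    # Placeholder "lint" (an assumption for the sketch): flag very long lines.
    return [
        f"{path}:{number}: line too long"
        for number, line in enumerate(source.splitlines(), 1)
        if len(line) > 100
    ]
```

The point of the sketch is only the early return: the filename given alongside stdin goes through the same ignore test as any other file before any checker runs.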

Closes #4354

Previously pylint would lint a file passed on stdin even if the user
meant to ignore the file. This commit fixes that issue.
@christoph-blessing christoph-blessing changed the title Issue 4354 Do not lint ignored file on stdin Jul 22, 2022
@Pierre-Sassoulas Pierre-Sassoulas added Bug 🪲 Command line Related to command line interface labels Jul 22, 2022
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Thank you, the test is simple and efficient !

I think we also need to clean up the "old faulty way of doing it" if filtering just before checking the file makes more sense. In particular, for multiprocessing we probably want to do the filtering before forking (?)

@@ -641,6 +641,13 @@ def check(self, files_or_modules: Sequence[str] | str) -> None:

filepath = files_or_modules[0]
with fix_import_path(files_or_modules):
if _is_ignored_file(
Member

Shouldn't we also remove the previous place where we were ignoring some files? If this is the right place, then we don't need to check in the other places where it was checked.

Contributor Author

I am not completely sure I understand. The check was not present previously in this if statement. Do you mean that we should put the check before the second if statement and remove it from all the subsequent if statements?

Member

We're already filtering files somewhere when they are not from stdin, I think there's a kind of refactor to do there so we filter files at the right place and so it's efficient (done only once per run, possibly when using multiprocessing). I would start by checking where _is_ignored_file is used.

Contributor Author

That function is used in two places:

It seems like the expand_modules function is the place where the filtering of ignored files is meant to take place and the use of _is_ignored_file while discovering is more of a "hack". Unfortunately expand_modules does not get called when linting from standard input. Furthermore expand_modules seems to do two things:

  1. Expand modules
  2. Filter ignored files

It might be a good idea to disentangle these two concerns. In any case, I don't see a way to filter only once. Before expand_modules you want to filter once to avoid unnecessarily processing any ignored files, but you cannot filter everything at this stage because you do not have file paths for the modules yet. Therefore you want to filter again after you have those.
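
The two-pass argument can be sketched as follows. This is an illustration under stated assumptions, not pylint code: `ignored`, `resolve`, and `files_to_lint` are hypothetical helpers, and `resolve` is a deliberately naive stand-in for turning a dotted module name into a file path.

```python
from __future__ import annotations

import os


def ignored(path: str, ignore_names: set[str]) -> bool:
    """Hypothetical ignore test: compare the basename with the ignore list."""
    return os.path.basename(path) in ignore_names


def resolve(argument: str) -> str:
    """Naive stand-in for module resolution (an assumption for the sketch)."""
    if argument.endswith(".py"):
        return argument
    return argument.replace(".", "/") + ".py"


def files_to_lint(arguments: list[str], ignore_names: set[str]) -> list[str]:
    # First pass: drop arguments that are recognisably ignored before any
    # expensive work is done on them.
    first_pass = [a for a in arguments if not ignored(a, ignore_names)]
    # Second pass: module names only get a file path after resolution, so the
    # ignore test has to run again on the resolved paths.
    return [p for p in map(resolve, first_pass) if not ignored(p, ignore_names)]
```

With `ignore_names = {"skip.py"}`, the argument `skip.py` is dropped in the first pass, while the module name `pkg.skip` can only be dropped in the second pass, once it has become `pkg/skip.py`.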

So I am not sure how to proceed from here.

Member

Thank you for checking this.

It might be a good idea to disentangle these two concerns. In any case, I don't see a way to filter only once. Before expand_modules you want to filter once to avoid unnecessarily processing any ignored files, but you cannot filter everything at this stage because you do not have file paths for the modules yet. Therefore you want to filter again after you have those.

Hmm, it sounds like a bigger refactor than I thought. If we're going to extract file discovery from PyLinter, we could do module expansion, reading from stdin, and file discovery at the same time, right? Removing concerns from PyLinter would be a good thing, especially for more efficient multiprocessing (easier to map files to threads), but I don't have a clear vision of the issues we'll face when doing that.

Contributor Author

Yes, I think it should be possible to do the discovering, filtering and expanding beforehand and then pass the result to PyLinter.check, but as you said, that is going to be a larger refactor.
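
A rough sketch of what "collect first, then check" could look like. Everything here is hypothetical (`Source`, `collect_sources`, and this `check` are invented for illustration and do not correspond to pylint's classes or signatures); the only point is that stdin and regular paths go through one shared filter before the checker sees them.

```python
from __future__ import annotations

import os
from typing import NamedTuple


class Source(NamedTuple):
    """Hypothetical record describing one thing to lint."""

    path: str
    from_stdin: bool


def collect_sources(
    paths: list[str], stdin_name: str | None, ignore_names: set[str]
) -> list[Source]:
    """Gather every candidate, then apply the ignore filter in one place."""
    candidates = [Source(p, False) for p in paths]
    if stdin_name is not None:
        candidates.append(Source(stdin_name, True))
    return [s for s in candidates if os.path.basename(s.path) not in ignore_names]


def check(sources: list[Source]) -> list[str]:
    """Stand-in checker: it receives an already-filtered list."""
    return [s.path for s in sources]
```

Because the filtered list exists before any checking starts, it could also be split across worker processes, which is the multiprocessing benefit mentioned above.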

Member

Would you be willing to give it a try ?

Contributor Author

I would like to give it a try. That said, I have no idea how long it will take, because it seems like quite a big refactor and I do not have a lot of time to work on this.

Member

Nice to hear. Don't worry about a deadline, there is none :) ! You can open a draft pull request to discuss the refactor with the maintainers; early feedback would probably help you.

@coveralls

coveralls commented Jul 22, 2022

Pull Request Test Coverage Report for Build 2977710138

  • 5 of 5 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.0003%) to 95.319%

Totals Coverage Status
Change from base Build 2977378583: 0.0003%
Covered Lines: 16942
Relevant Lines: 17774

💛 - Coveralls

@Pierre-Sassoulas
Member

Do not mind the pypy failure, main is broken.

@DanielNoord
Collaborator

@Pierre-Sassoulas @cblessing24 Would we be okay with merging this without the refactor? This is breaking the pylint plugin for VS Code, see microsoft/vscode-pylint#169

I think it makes sense to offer "first-class" support for the extension, as a large part of our user base is likely using VS Code. We could potentially get this into 2.15.1.

@Pierre-Sassoulas
Member

Right, but this is technical debt, and if this kind of treatment isn't DRY we're going to have a bad time in the long term. We already have a bad time now: this issue is clearly a problem of not treating all sources the same way. Often new contributors come and fix the issue they care about, but they are not experts in the codebase, so they do not know that there are two places to modify. They test only the one they changed, and we get inconsistencies (new issues are opened, new contributors come, etc.). I think this is a vicious circle.

I'm not saying we should not create this technical debt but we should make sure we think about it.

@DanielNoord
Collaborator

@cblessing24 I'm quite familiar with this code. Would you be okay with me taking a stab at this refactor?

@christoph-blessing
Contributor Author

Sure, I am totally fine with that.

@DanielNoord
Collaborator

My proposed refactor is actually quite small.

See: https://github.com/PyCQA/pylint/pull/6528/files

In that PR we already extracted the ignore checking into a separate function out of _expand_modules so that we could re-use it.
However, I don't think there is a good single place to call this. In _expand_modules we need to call it halfway through the discovery because we need to determine whether we should continue in certain directories. We do this while iterating so there is not really a good way to extract that from the full process.
For from-stdin we don't want to "expand_modules", as we create a FileItem based on an import that astroid's import system gives us. By entering expand_modules we would incur a lot of side effects that can only cause further issues, with the only benefit being one less call to _is_ignored_file.
The third and final call is done for the --recursive option which has a similar issue as expand_modules. It is called while iterating so we can't really extract more than maths already did by creating a separate function for it.
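
To illustrate why the ignore check has to run during iteration rather than before or after it, here is a small sketch using `os.walk`. It is not pylint's discovery code: `discover` and its name-based ignore set are hypothetical, chosen only to show that pruning a directory mid-walk decides which directories are visited next.

```python
import os


def discover(root: str, ignore_names: set) -> list:
    """Walk ``root`` and return .py files, skipping ignored names."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place: os.walk will not descend into directories removed
        # from ``dirnames``, so the ignore test must happen here, mid-walk.
        dirnames[:] = sorted(d for d in dirnames if d not in ignore_names)
        for name in sorted(filenames):
            if name.endswith(".py") and name not in ignore_names:
                found.append(os.path.join(dirpath, name))
    return found
```

Filtering the finished list afterwards would give the same result, but only after walking every ignored directory, which is the cost the in-loop check avoids.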

@github-actions
Contributor

github-actions bot commented Sep 2, 2022

🤖 According to the primer, this change has no effect on the checked open source code. 🤖🎉

This comment was generated for commit 9da9b5d

Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

👍

@Pierre-Sassoulas Pierre-Sassoulas added the Needs backport Needs to be cherry-picked on the current patch version by a pylint's maintainer label Sep 3, 2022
@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.15.1 milestone Sep 3, 2022
@Pierre-Sassoulas Pierre-Sassoulas merged commit cd7761d into pylint-dev:main Sep 3, 2022
@christoph-blessing christoph-blessing deleted the issue-4354 branch September 5, 2022 11:12
@Pierre-Sassoulas Pierre-Sassoulas added Backported and removed Needs backport Needs to be cherry-picked on the current patch version by a pylint's maintainer labels Sep 6, 2022
Pierre-Sassoulas pushed a commit to Pierre-Sassoulas/pylint that referenced this pull request Sep 6, 2022
Previously pylint would lint a file passed on stdin even if the user
meant to ignore the file. This commit fixes that issue.

Co-authored-by: Daniël van Noord <13665637+DanielNoord@users.noreply.github.com>
Pierre-Sassoulas pushed a commit that referenced this pull request Sep 6, 2022
Labels
Backported Bug 🪲 Command line Related to command line interface
Projects
Faster pylint
Awaiting triage
Development

Successfully merging this pull request may close these issues.

Files not ignored when linting from standard input
4 participants