Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebased Doublethefish's bug/3314 duplicate code error only shows up with pylint jobs 1 #4007

Conversation

Pierre-Sassoulas
Copy link
Member

This is #3458 rebased because it's mergeable but stuck as the original one is not directly editable by maintainers. Had to fix some conflict and move the changelog from 2.5 to 2.6.1.

Before adding a new mixin this proves the concept works, adding tests as
examples of how this would work in the main linter.

The idea here is that, because `check_parallel()` uses a multiprocess
`map` function, that the natural follow on is to use a 'reduce`
paradigm. This should demonstrate that.
@coveralls
Copy link

coveralls commented Jan 1, 2021

Coverage Status

Coverage increased (+0.02%) to 90.681% when pulling 6653dd7 on Pierre-Sassoulas:doublethefish-bug/3314_duplicate_code_error_only_shows_up_with_pylint_jobs_1 into bf2985a on PyCQA:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) to 90.673% when pulling 1704b99 on Pierre-Sassoulas:doublethefish-bug/3314_duplicate_code_error_only_shows_up_with_pylint_jobs_1 into bf2985a on PyCQA:master.

This integrate the map/reduce functionality into lint.check_process().

We previously had `map` being invoked, here we add `reduce` support.

We do this by collecting the map-data by worker and then passing it to a
reducer function on the Checker object, if available - determined by
whether they confirm to the `mapreduce_checker.MapReduceMixin` mixin
interface or nor.

This allows Checker objects to function across file-streams when using
multiprocessing/-j2+. For example SimilarChecker needs to be able to
compare data across all files.

The tests, that we also add here, check that a Checker instance returns
and reports expected data and errors, such as error-messages and stats -
at least in a exit-ok (0) situation.

On a personal note, as we are copying more data across process
boundaries, I suspect that the memory implications of this might cause
issues for large projects already running with -jN and duplicate code
detection on. That said, given that it takes a long time to perform
lints of large code bases that is an issue for the [near?] future and
likely to be part of the performance work. Either way but let's get it
working first and deal with memory and perforamnce considerations later
- I say this as there are many quick wins we can make here, e.g.
file-batching, hashing lines, data compression and so on.
@Pierre-Sassoulas Pierre-Sassoulas force-pushed the doublethefish-bug/3314_duplicate_code_error_only_shows_up_with_pylint_jobs_1 branch from 1704b99 to 6653dd7 Compare January 1, 2021 22:50
@Pierre-Sassoulas Pierre-Sassoulas merged commit b41e8d9 into pylint-dev:master Jan 2, 2021
@Pierre-Sassoulas Pierre-Sassoulas deleted the doublethefish-bug/3314_duplicate_code_error_only_shows_up_with_pylint_jobs_1 branch January 2, 2021 08:56
@Pierre-Sassoulas Pierre-Sassoulas linked an issue Jan 2, 2021 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants