New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New: Lint files in parallel #11

Open
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
6 participants
@ilyavolodin
Copy link
Member

ilyavolodin commented Feb 4, 2019

Summary

This change would enable a new option to lint files in parallel. This should improve performance of ESLint on large repositories.

Related Issues

eslint/eslint#3565

@ljharb
Copy link

ljharb left a comment

This sounds awesome - one catch tho; eslint-plugin-import relies on building up a "dependency map", and caching it across various rules.

With this mode, how could that data/processing be shared/reused across parallel rules?

@ilyavolodin

This comment has been minimized.

Copy link
Member Author

ilyavolodin commented Feb 4, 2019

I don't really know how does eslint-plugin-import shares information across various rules. ESLint was specifically designed to not allow that, so what that plugin is doing goes directly against the original design. I would have to take a look at what that plugin does to suggest work-around (if it at all possible).

@ljharb

This comment has been minimized.

Copy link

ljharb commented Feb 4, 2019

The main thing it needs is a dependency graph - a map of imports/requires and exports/module.export values.

@ilyavolodin

This comment has been minimized.

Copy link
Member Author

ilyavolodin commented Feb 5, 2019

Dependency graph isn't a problem. Problem is sharing it between different rules and different instances of the same rule. ESLint was never designed to do that (more then that, we specifically said that rules cannot know about any other rules).

@ljharb

This comment has been minimized.

Copy link

ljharb commented Feb 5, 2019

Noted, but it still allows it at the moment - if there was a way to perhaps explicitly inject a “setup” function (that for the import plugin would populate the graph) that would both work well with this RFC and also would work better with the current setup.

@not-an-aardvark

This comment has been minimized.

Copy link
Member

not-an-aardvark commented Feb 5, 2019

I think a setup function might be a good solution (of course, more details would need to be hashed out). There has been similar discussion about needing to access the entire list of files ESLint is running on in order to run the Typescript compiler only once rather than multiple times.

@ilyavolodin

This comment has been minimized.

Copy link
Member Author

ilyavolodin commented Feb 5, 2019

My original though behind settings was that it could be a mechanize to share information between rules. But it's read-only, so it can only be set manually in the config file, not populated automatically. Maybe we can extend settings to provide a communication medium for shared information?

@not-an-aardvark

This comment has been minimized.

Copy link
Member

not-an-aardvark commented Feb 5, 2019

It's already not very difficult for rules to communicate if they really want to (they can share an object externally even if ESLint doesn't provide that object).

The bigger problem is that a rule has no way to determine which files are being linted; it only knows the current file that it's running on. The original design of rules assumed that rules wouldn't need to know this because a rule would only check things within a single file, but rules in plugins like eslint-plugin-import can provide useful functionality by depending on information outside of the current file.

EDIT: To clarify, this refers to the current state of affairs without parallelism. Communication between rules would indeed be a problem once we add parallelism, although I don't think enhancing settings would be the best way to resolve the issue (see #11 (comment)).

@ilyavolodin

This comment has been minimized.

Copy link
Member Author

ilyavolodin commented Feb 5, 2019

I'm fine with adding a list of all linting files to context, but I think that should be done outside of this proposal. We will just need to make sure that those two proposals work together correctly.

@not-an-aardvark

This comment has been minimized.

Copy link
Member

not-an-aardvark commented Feb 5, 2019

As it relates to this proposal, the problem is more broadly that:

  • Some rules want to perform an expensive operation once across the entire codebase, and have the result be used for multiple different files. (For example, a rule might want to run the TypeScript compiler on an entire codebase once, and use the results when linting each file.)
  • This is currently accomplished by performing the expensive operation when linting the first file, and reading the results from a cache when linting other files.
  • This will be harder to do correctly when files are linted in parallel, because files won't be linted in a total order (i.e. there is no "first file" that gets linted). It will also be more difficult to establish a shared cache between multiple invocations of the same rule. (It would still be possible to accomplish this using global locks, but it seems like this would be messy.)

In other words, having a list of all files would be nice, but it's not directly related to parallelism. The main concern introduced by parallelism is the lack of shared memory between rule invocations, because this makes it more difficult for rules to use a cache for expensive operations. (My comment at #11 (comment) might have been misleading, I've edited it to clarify.) The advantage of a setup/prelint function is that it would run once before running rules on any files, which would eliminate the need for rules to implement locks/parallel caching.

@JamesHenry

This comment has been minimized.

Copy link
Member

JamesHenry commented Feb 5, 2019

Thank you for the advocacy for that use case @not-an-aardvark! Excited about this proposal

@ilyavolodin

This comment has been minimized.

Copy link
Member Author

ilyavolodin commented Feb 6, 2019

Fair enough. I think that could be solved by providing a setup function that would run before linting occurs (on the main thread) and would allow for modification of context that's passed to the rules. That would still prevent rules from talking to each other, though (which I think is fine). Question is, how does plugin author gets access to that setup function? Another config property?

@not-an-aardvark

This comment has been minimized.

Copy link
Member

not-an-aardvark commented Feb 6, 2019

By the way, have you seen #4? It's not necessarily a problem to have two RFCs open about the same topic, although it's not clear how the proposal in this RFC differs from that one.

@ilyavolodin

This comment has been minimized.

Copy link
Member Author

ilyavolodin commented Feb 6, 2019

Oh... I've seen it, but didn't have time to fully read it and assumed that, based on the title, it was talking about making ESLint async, not parallel. I'm fine with incorporating this PR into #4

@nzakas

This comment has been minimized.

Copy link
Member

nzakas commented Feb 6, 2019

@ljharb

This comment has been minimized.

Copy link

ljharb commented Feb 6, 2019

@nzakas definitely it makes sense to make it a different RFC :-) however, it might be seen as a blocker for this one, so that parallel mode doesn't break eslint-plugin-import.

@kaicataldo

This comment has been minimized.

Copy link
Member

kaicataldo commented Feb 6, 2019

Question is, how does plugin author gets access to that setup function? Another config property?

If we did this, I think we would have to make it clear that this setup function can not be used to modify any source files, since I'm not sure how we could guarantee the order in which individual plugins run their respective function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment