Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Lint files in parallel if many files exist #42

Closed
wants to merge 16 commits into from

Conversation

mysticatea
Copy link
Member

@mysticatea mysticatea commented Sep 30, 2019

Summary

This RFC adds the supports for parallel linting by worker_threads/child_process modules.

Related Issues


What are differences from RFC 4 and RFC 11?

  • This RFC does linting in parallel by default if files are many, but people can enforce to not use workers by --concurrency 1 option.
  • This RFC uses read-lint-report loops with non-blocking IO along with worker threads to use CPU resource efficiently.
  • This RFC uses read-lint-report loops with non-blocking IO even in the case that ESLint doesn't make worker threads.

@mysticatea mysticatea added the Initial Commenting This RFC is in the initial feedback stage label Sep 30, 2019
@ljharb
Copy link

ljharb commented Sep 30, 2019

If this happens implicitly, how will it affect cross-file linting and caching like eslint-plugin-import uses?

@mysticatea
Copy link
Member Author

mysticatea commented Sep 30, 2019

The cache will be created in each worker thread individually. It's not ideal because of an overhead, but it should work (each worker should lint over a hundred files with using the cache).

@mysticatea
Copy link
Member Author

And the worker threads are born and died in a executeOnFiles() call. The lifetime of the cache will be shorter.

@ljharb
Copy link

ljharb commented Sep 30, 2019

In the case of the import plugin on large codebases, we’re talking about like a 30s overhead minimum.

I do not think multithreading should ever be implicit or the default, at least not until long after there’s been a cross-thread “caching” stage that plugins can migrate to use so they can insulate themselves from parralelization.

@mysticatea
Copy link
Member Author

mysticatea commented Sep 30, 2019

Doesn't multithreading reduce time over 30s? In my environment, it has reduced over 10s with a little 697 files. I guess it reduces more time on a large codebase.

@ljharb
Copy link

ljharb commented Sep 30, 2019

I'm saying that the caching alone can easily take 30s. Parallelizing the linting (without an explicit pre-lint caching stage for plugins) does not speed up the overall run, it just makes that 30s happen more than once, which taxes machine resources, likely slowing down the overall run.

@mysticatea
Copy link
Member Author

If your environment is single-core, don't worry, the default concurrency is the number of cores at most. This means ESLint doesn't create workers in the environment.

@ljharb
Copy link

ljharb commented Sep 30, 2019

The number of cores isn’t the issue; the number of other things my computer is doing is the issue.

@mysticatea
Copy link
Member Author

So... do you mean that don't want ESLint to use CPU 100% by default to lint?

@mysticatea
Copy link
Member Author

Even if caching stage increases 30s, but if other stages reduce over 30s, then total time is reduced. It's reasonable. However, it needs more CPU and memory resource than single-thread. This is a trade-off.

This RFC is that ESLint uses more CPU resource if many files exist. Indeed, using 100% may be overkill.

@ljharb
Copy link

ljharb commented Sep 30, 2019

The tradeoff is fine. I’m asking to not force a different tradeoff by default until there’s been time for plugins to migrate to a semantic api that makes the tradeoff explicit and makes it possible to avoid duplicate work and race conditions when linting in parallel.

@mysticatea
Copy link
Member Author

I don't think duplicate work is a big problem because it's about efficiency. The result that it reduces time does not change. We can consider adding APIs for more efficiency later.

Are there race condition problems on plugins?

@ljharb
Copy link

ljharb commented Sep 30, 2019

There can always be race conditions when formerly synchronous things become suddenly async.

As for it reducing time, parallelization definitely will not always be faster (although for most cases it will be). However, if the change I’m suggesting happens first, then the switch to async by default can be made safe instead of just “use this command line flag to unbreak your lint run”

@ilyavolodin
Copy link
Member

@mysticatea Just as a background, the issue @ljharb is talking about is that eslint-plugin-import uses some trickery to build a list of all modules so that it can verify that you are only importing existing files. That is done once (I think on the first file linted), and then cached and provided to all of the rest of the files. This is not really intentionally supported functionality by ESLint, but plugin-import relies on it heavily.
If we start doing linting asynchronously without providing plugins with a mechanism of exchanging arbitrary messages between file linting processes, it will completely break plugin-import without a work-around.
When we discussed this issue on other RFCs related to parallel linting, idea was to provide plugins with a way to execute some initialization code (where plugin-import can resolve all modules in the project) and have a way for initializer to pass return value to all rules.

@mysticatea
Copy link
Member Author

mysticatea commented Oct 1, 2019

There can always be race conditions when formerly synchronous things become suddenly async.

But worker_threads cannot share objects with each other except SharedArrayBuffer. I don't see race conditions there since plugins cannot touch the message ports of related threads.

eslint-plugin-import uses some trickery to build a list of all modules so that it can verify that you are only importing existing files.

I know that eslint-plugin-import and @typescript-eslint/eslint-plugin parse the whole codebase and cache it. Especially, @typescript-eslint/eslint-plugin analyzes types and variable/expression types as well and it's slow. But I don't think that it's a problem. Every worker verifies many files and the cache makes it faster.

I measured TypeScript repo that is using both eslint-plugin-import and @typescript-eslint/eslint-plugin.

$ npm i github:eslint/eslint#very-rough-worker-threads-poc
...

$ eslint --rulesdir scripts/eslint/built/rules --ext .ts src --config src/.eslintrc.ci.json --concurrency 1
Number of files: 366
Lint files in the main thread.
Linting complete in: 23200ms

$ eslint --rulesdir scripts/eslint/built/rules --ext .ts src --config src/.eslintrc.ci.json
Number of files: 366
Lint files in 3 worker threads.
Linting complete in: 14810ms

Since the number of files is 366, the estimated proper concurrency was 3. On my PC, it made 35% faster (23200ms → 14810ms) without breakage.

@ljharb
Copy link

ljharb commented Oct 1, 2019

@mysticatea you're assuming that the only changes an eslint rule makes are a context.report, never writing a file to the filesystem or making network calls or anything else that can be done in JS.

@ljharb
Copy link

ljharb commented Oct 1, 2019

Also, sharding the files doesn't fix anything, since the entire codebase must be parsed for any eslint-plugin-import rules that use ExportMap.

@mysticatea
Copy link
Member Author

Ah, I got it. Indeed, writing the same file is race condition. Thank you!

@typescript-eslint also must parse entire codebase. But it was 35% faster after I applied the PoC of this RFC. (there were 3 worker threads)

@mysticatea
Copy link
Member Author

mysticatea commented Oct 1, 2019

Thank you for the feedback.

I have updated this RFC as separating to step 1 and step 2.

  • Step 1 is to add "linting in parallel" behind a flag. People can try it with --concurrency option with auto or 2 or larger integer. The default of --concurrency option is 1. We can do this in a minor release.
  • Step 2 is to change the default value of --concurrency option to auto. This change requires a major release.

Also, I updated the "Drawbacks" section and the "Backwards Compatibility Analysis" section.

@ljharb
Copy link

ljharb commented Oct 1, 2019

I would still strongly recommend Step 0.5 as, release a minor version of eslint that adds an explicit stage for plugins to run code that must be in a single thread, prior to all linting beginning - that way eslint-plugin-import can use that stage, and avoid these race conditions entirely.

@mysticatea
Copy link
Member Author

I think that it should be another RFC because (1) it's an enough rare case, (2) probably it will be large spec, and (3) honestly I have not understood what kind of API can help your case.

Copy link

@voxpelli voxpelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @mysticatea that a step 0.5 as suggested by @ljharb would probably be better of as another RFC.

@mysticatea Have you considered whether a worker approach like this could maybe have some positive security impacts? Can one limit what these workers are allowed to do and that way maybe limit the possible damage a compromised eslint-plugin could do? I know it's not the purpose of this RFC, but it could be an interesting additional reason to push for an approach like this.

designs/2019-worker-threads/README.md Outdated Show resolved Hide resolved
designs/2019-worker-threads/README.md Outdated Show resolved Hide resolved
designs/2019-worker-threads/README.md Show resolved Hide resolved
designs/2019-worker-threads/README.md Outdated Show resolved Hide resolved

This change needs a major release.

ESLint gets using worker threads by default if target files are many.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the fact that in some cases the linting will happen in the main thread and in other in worker threads create a scenario where some plugins work well in just one of the two cases?

To mitigate that, would it maybe be preferable to always use workers in Step 2, even when just a single worker is required?

Copy link
Member Author

@mysticatea mysticatea Oct 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because worker_threads got stable recently, this RFC uses child_process if Node.js is older. Spawning child processes spent time (about 1 sec) on Windows. Anyway, ESLint has to find config files, parse it, and verify it in all of every worker and the main thread, and the cost is not small.

Therefore, I don't think good if ESLint always uses workers.

designs/2019-worker-threads/README.md Outdated Show resolved Hide resolved
Co-Authored-By: Pelle Wessman <pelle@kodfabrik.se>
@aladdin-add
Copy link
Member

aladdin-add commented Feb 7, 2021

can we split the RFC to 2:

  1. support parallel linting(behind an option), and document its potenial limitations(e.g. OOM as @bradzacher mentioned)
  2. address these issues to improve parallel linting

this is as the same strategy as caching - plugins like eslint-plugin-import has some caching issues, but it does not stop other users using cache.

@nzakas
Copy link
Member

nzakas commented Feb 9, 2021

address these issues

which issues are you referring to?

@aladdin-add
Copy link
Member

there are mainly 2 issues mentioned in the discussion:

  1. cross-file linting and caching in eslint-plugin-import: feat: Lint files in parallel if many files exist #42 (comment)
  2. OOM. feat: Lint files in parallel if many files exist #42 (comment)

maybe there are more, but none of these is a blocker - users can choose to not use the option, and it's the default. :)

@bradzacher
Copy link

bradzacher commented Feb 9, 2021

Ultimately it's up to you all as maintainers, but personally I think you need to at least have an early plan of attack for how you might solve these issues before moving on this RFC.

The reason I think this is because eslint-plugin-import users make up ~62%, and @typescript-eslint/parser users make up ~56% of ESLint's userbase (based on raw npm weekly download numbers) - so it's a non-trivial amount of ESLint's userbase which is impacted and unable to use this feature.

@ljharb
Copy link

ljharb commented Feb 9, 2021

Better to not even offer the option, than to inflict frustration on a significant percentage of eslint’s users when they attempt to use the option.

@aladdin-add
Copy link
Member

I'm 👍 to having the option - we cannot take care of all the cases, given eslint is how widely used in the community.

we can get feedback from early adopters, to help us improve the feature. maybe at some point, it can be the default option.

@ljharb
Copy link

ljharb commented Feb 9, 2021

All the cases need not be taken care of - but it seems prudent to first take care of the one case that we all already know about, that arguably constitutes the majority of eslint usage, to provide time for package authors to adapt to the new solution, and also time for users to update to it.

In other words, #42 (comment) :-)

@schmod
Copy link

schmod commented Feb 9, 2021

Also, esprint already exists, if anybody wants to lint files in parallel today (and accept any extra overhead that'll produce if you're using plugins like typescript-eslint and plugin-import).

My $0.02 is that it's probably best for parallel linting to exist in a fork or an alternative CLI, until it's ready for general use.

@ljharb
Copy link

ljharb commented Feb 9, 2021

@btmills
Copy link
Member

btmills commented Feb 11, 2021

TSC Summary: ESLint assumes that each rule and source file can be processed independently. typescript-eslint (ref #42 (comment)) and eslint-plugin-import (ref #42 (comment)) need to do upfront initialization work beyond the scope of a single rule and source file, specifically loading type information and tracing a module graph. Lacking a first-class API, they have inserted these initialization steps into the regular rule linting flow.

If we were to ship parallel linting without supporting this use case, the duplicated initialization could make parallel linting slower than single-threaded linting with these plugins. The large number of ESLint users who also use one of these plugins would not benefit from parallel linting.

This API would need to provide the plugin with the config and list of files to be linted, and parallel linting would need a way to share the result with workers.

TSC Question: Are we interested in an RFC for a plugin initialization API? If so, is that functionality a prerequisite for parallel linting?

@btmills btmills added the tsc agenda Topic for discussion at next TSC meeting label Feb 11, 2021
@bradzacher
Copy link

Are we interested in an RFC for a plugin initialization API? If so, is that functionality a prerequisite for parallel linting?

I don't think it's a problem for ESLint to remain stateless; but I think adding some form of lifecycle hooks/API to help stateful modules interop with ESLint would be a huge win.

I have thought about this and informally put the idea forward a little while back - eslint/eslint#13525.

Unfortunately I haven't had the spare time to spend develop the idea enough to write an RFC.

@nzakas
Copy link
Member

nzakas commented Feb 24, 2021

@btmills can you update this with discussion from last TSC meeting?

@btmills
Copy link
Member

btmills commented Feb 25, 2021

I've opened eslint/eslint#14139 to collect requirements and ideas for a plugin initialization API. @bradzacher, @ljharb, and anyone else who's thought about hooks for plugins to do pre-linting initialization work, please head over there to make sure we're considering all the use cases.

Once we've come up with a direction in that issue, we'll open a separate RFC detailing the design for that API. It will be complementary to this RFC and should explicitly address how it will work with parallel linting.

We agreed in the TSC meeting to complete a plugin initialization API before finalizing parallel linting. That way, when we release parallel linting, there won't be any snags for typescript-eslint or eslint-plugin-import users.

@btmills btmills removed the tsc agenda Topic for discussion at next TSC meeting label Feb 25, 2021
@JamesHenry
Copy link
Member

Awesome, thanks @btmills @nzakas and the rest of the TSC!

@JounQin
Copy link

JounQin commented Apr 26, 2021

Will the rule API be async too? I don't see any discussion about it here.

@nzakas
Copy link
Member

nzakas commented Nov 2, 2022

With all of the internal changes, we are going to need to rethink how to move forward with this, so closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change This RFC contains breaking changes Initial Commenting This RFC is in the initial feedback stage on hold This RFC is awaiting upstream updates.
Projects
None yet
Development

Successfully merging this pull request may close these issues.