Plugin initialization API #14139

btmills · 2021-02-25T07:37:27Z

The problem you want to solve.

ESLint assumes that each rule and source file can be processed independently. typescript-eslint (ref eslint/rfcs#42 (comment)) and eslint-plugin-import (ref eslint/rfcs#42 (comment)) need to do upfront initialization work beyond the scope of a single rule and source file, specifically loading type information and tracing a module graph. Lacking a first-class API, they have inserted these initialization steps into the regular rule linting flow.

If we were to ship parallel linting without supporting this use case, the duplicated initialization could make parallel linting slower than single-threaded linting with these plugins. The large number of ESLint users who also use one of these plugins would not benefit from parallel linting.

Your take on the correct solution to problem.

I'm not familiar with typescript-eslint and eslint-plugin-import internals, so I'm opening this issue to gather requirements.

What I (think I) know:

Plugins have asked for a hook to perform pre-lint initialization.
When we ship parallel linting, currently under discussion in RFC42, plugins will need a mechanism to share the initialization result with workers.

What I want to know:

When before linting should ESLint call plugin initialization hooks?
What data do plugins need from ESLint? This may constrain the answer to the previous question.
Does ESLint need to pass any of the initialization result back to plugin rules during linting (via e.g. the rule context object)?
What data do plugins need to share with workers? For example, is it JSON serializable?
Are there other plugins beside typescript-eslint and eslint-plugin-import whose authors should be part of this discussion?
What am I missing?

Are you willing to submit a pull request to implement this change?

The next step will be to write an RFC, which I will write once we've settled on requirements and hopefully brainstormed some solutions.

Related discussions:

The text was updated successfully, but these errors were encountered:

btmills · 2021-02-25T08:04:17Z

I'll kick off the discussion by relating an idea @nzakas had in the TSC meeting discussion that led to this issue. This is only one possible approach, so please share any other ideas you have.

Right now, eslint-plugin-import uses ESLint's FileEnumerator directly and parses those files to build a module graph. FileEnumerator is intended to be private API, but there's not yet a public alternative. This also causes each file to be parsed multiple times, once by eslint-plugin-import and once by ESLint.

What if, after parsing source files and before running any rules, ESLint provided eslint-plugin-import with the pre-parsed ASTs for all files targeted by the lint run? eslint-plugin-import would no longer need to use FileEnumerator, and we'd avoid re-parsing every file. It would also allow eslint-plugin-import to drop the separate import/parsers configuration in eslintrc settings.

However, this could increase memory usage on larger projects by having full ASTs for all files in memory simultaneously. Right now, I assume that eslint-plugin-import discards the full AST after extracting any exported members, and ESLint currently parses and lints one file at a time then moves on to the next file.

This approach might constrain parallel linting. ESLint's current one-file-at-a-time model maps well to parallel workers that could do their own parsing and traversal. This approach would split that into a parse phase followed by a lint phase and require us to pass ASTs around.

ljharb · 2021-02-25T15:04:54Z

Only some of eslint-plugin-import's rules need its ExportMap, so ideally, each rule should be able to "register" a hook, and only when the rule is going to be invoked (ie, not disabled by config) would the hook need to run. It does, however, eagerly parse every file in the project, regardless of what's being currently linted.

I'm pretty sure we discard AST nodes as soon as we've derived our data structure from them, so I don't anticipate memory being a huge problem. If it were, we could always serialize it to the filesystem (which would also allow us to share it across threads).

bradzacher · 2021-02-26T18:29:47Z

Some background/context on @typescript-eslint:

Our parser is built on top of TypeScript. This means that we don't do any parsing ourselves. Instead we invoke TypeScript and have it parse any given file, and then we transform TS's AST representation to match the ts-estree representation.

In "non-type-aware" mode we work 100% the same as any other ESLint parser; meaning we cache nothing and just produce an AST.
In this mode we invoke TypeScript in "isolated modules" mode (meaning TS does not parse dependencies, etc), and we just convert and return the AST. Any rule require type-information will hard fail.

"Type-aware" mode is where we break from other parsers. Users provide a reference to one or more of their tsconfig.json's via parserOptions.project.
In this mode, when ESLint asks us to parse a file (in a nutshell) we do the following:

for each provided tsconfig in parserOptions.project
- invoke TypeScript in project mode and pass it the path to the tsconfig
  - this causes TypeScript to parse all files referenced by the tsconfig and create a number of data structures to store types
  - note that types are mostly computed lazily, meaning there's only limited type info computed up front during the parse.
- check the TS project to see if it includes the required file
- if it does, return the TS AST, else continue.
convert the TS AST to ts-estree AST and generate a mapping from ts-estree AST node to TS AST node.
Return the AST, and the parser services (the node map and the TS "program" instance).

When a type-aware runs it works as per a normal ESLint rule until it specifically needs type information and then it simply gets the TS AST node via the node map, and then asks TS for type information for the TS AST node.

For @typescript-eslint, it's less about plugin initialisation, and more about parser initialisation.

As above; in type-aware mode all of the work is front-loaded during the parse step. ESLint currently does not provide any separate steps, so the work is specifically performed during the first file that gets parsed.

In regards to an initialisation API, ideally we'd want an API that gets called before parsing which does something along the lines of "broadcast which files are being linted in this run and the config they are being linted with".

This will allow us to do a few things:

figure out which tsconfigs are actually being used (if any).
figure out the mapping of files to tsconfigs.
invoke TS for each tsconfig.
- potentially... probably still better to defer this until it's actually used so we can save time/memory.

(1) and (2) would allow us to understand the workload ahead of time, and plan accordingly.
For example, if we know the entire set of files from a tsconfig that are being linted, we can know exactly when we can dispose of the "program", and free up that memory. Right now we have no idea about that, and instead have to keep all programs in memory always.

One other suggestion would be for the ability for a parser to influence the order in which ESLint processes files.
For example, if ESLint tells us that files X, Y, and Z are being linted, and we determine that X and Y belong to tsconfig A, and Z belongs to tsconfig B, then it'd be great if we could tell ESLint these batches.
Which would further allow us to optimise memory usage as we could ensure we only have one program in memory at any given time.
This would also naturally lend itself to parallelised linting as well, as this would allow us to tell ESLint how to best batch files across threads to prevent OOMs.

One final suggestion would be for ESLint to differentiate between the "persistent" usecase (like IDEs) and the "one and done" usecase (like CLI runs).

In a "one and done" run we don't really need to care about what happens on the filesystem - we can pretty much assume that the state at the start of the lint run is the state at the end of the lint run.
In a "persistent" usecase we need to know to update the state of TS continually as the user adds and removes files, or changes file contents.

Right now there's no mechanism for this so we have to treat all runs as if they're potentially "persistent" runs; meaning we can waste a lot of time on CLI runs.

nzakas · 2021-03-02T19:12:29Z

@bradzacher there's a lot of great info in your comment, thanks for sharing that. To keep this thread focused, would you mind opening separate issues about "allowing a parser to influence the order of files" and "persistent vs one-and-done modes"? Those are both interesting to me, and I'd like to explore those more, but I don't feel they are related enough to the topic of this issue to risk derailing the overall conversation.

btmills · 2021-03-07T00:09:47Z

@ljharb: each rule should be able to "register" a hook, and only when the rule is going to be invoked

Not all rules need these hooks, so hooks should be registered based on enabled rules, got it.

It does, however, eagerly parse every file in the project, regardless of what's being currently linted.

To clarify, eslint-plugin-import needs to do its own filesystem traversal and parsing that is distinct from the list of to-be-linted files. It would ignore that list if ESLint were to provide it. Is that correct?

@bradzacher: broadcast which files are being linted in this run and the config they are being linted with

For @typescript-eslint, the combination of providing tsconfig.json paths and enabling type-aware rules determines whether any hooks are needed. If type-aware linting is used, the hook needs the full list of to-be-linted files and their configs.

High-level requirements:

Before parsing or linting any files, ESLint's main thread (today, the only thread) needs to enumerate files and calculate their configs. If any enabled rules register hooks, ESLint should then call them with the list of to-be-linted files and their configs.

eslint-plugin-import will use shared settings from the configs but will ignore the files list.
@typescript-eslint needs parser settings and the files list.

After hooks have been called, the main thread or worker threads parse and lint each file.

What about hook results?

What should happen with hooks' results, particularly once we have parallel linting? Are those are being shared between rules as in-memory singletons today? Would eslint-plugin-import and @typescript-eslint continue to share results with workers internally, or does ESLint need to provide a mechanism?

@ljharb, you mentioned the filesystem as a fallback if memory is an issue, though I assume sharing results in memory would be preferable. Does eslint-plugin-import have any requirements for how that would be done?
@bradzacher, you mentioned lazy and deferred calculation in type-aware mode, which is very different from the serializable results eslint-plugin-import generates. What does the communication between the plugin and TypeScript look like today? Is it all via synchronous function calls? Would workers require bi-directional communication with a single instance of TypeScript?

ljharb · 2021-03-07T03:31:27Z

To clarify, eslint-plugin-import needs to do its own filesystem traversal and parsing that is distinct from the list of to-be-linted files. It would ignore that list if ESLint were to provide it. Is that correct?

yes, exactly.

you mentioned the filesystem as a fallback if memory is an issue, though I assume sharing results in memory would be preferable. Does eslint-plugin-import have any requirements for how that would be done?

No requirements really; the easiest thing would be to be able to share a fully populated instance of a class (https://github.com/benmosher/eslint-plugin-import/blob/master/src/ExportMap.js#L27). If the requirement were to provide something serializable, I suspect we could, for example, use a Symbol provided by eslint to provide an instance method that produced whatever serialization format eslint required, and then we could deserialize it ourselves later on demand.

bradzacher · 2021-03-07T05:50:29Z

What does the communication between the plugin and TypeScript look like today? Is it all via synchronous function calls?

The parser provides parserServices which includes the "program" object created by TS.
A rule will talk to TS via it's "type checker" class (parserServices.program.getTypeChecker()).
All of this is done synchronously and in one thread.

No part of the type-aware linting is async because ESLint does not (/did not) support asynchrony.
Introducing async into a synchronous ESLint would be a recipe for race condition bugs!

Would workers require bi-directional communication with a single instance of TypeScript?

TS is designed so that any type information is accessed within the same process. I think it would require some serious bespoke engineering to make it work as a singleton type server.

TS has a language server, but the design there is for an IDE to ask specific things: "this file changed, tell me the problems", "what is the hover information to display at this specific range?", "what refactoring operations can be done at this specific location?".
It's not designed to be a singleton server to serve type information requests.
Flow is designed that way, but TS is not.

This is why I mentioned "the ability for a parser to influence the order in which ESLint processes files" (I'll see if I can write up an issue tomorrow).

Ideally you would group files by tsconfig (project), and have one thread per project.
If you split one project over two threads then you'd have to recalculate the types for each thread, and then you'd just be duplicating work.
For medium-large projects this would likely cause memory contention and OOMs.

AbdealiLoKo · 2022-03-20T14:14:55Z

I was looking to create a similar plugin to eslint-plugin-import - and found some difficulties with the plugin architecture too.
My plugin is for checking unused class members - and I found that I would also need to:

Find all files in the project
Parse all the files and check usage in the AST

I found that finding all files seems like a somewhat easier problem ... But the performance hit of parsing the entire project is quite large.
And in my applications, we use eslint-plugin-import - meaning we would be parsing the entire source code 3 times

The naive thought I had:
For the short term - Is there any possibility to simply create a helper function in use-at-your-own-risk which provides the same AST eslint would be creating later on - but it caches the AST ?
That way plugin developers can use that function if the plugin requires the parsing in create() - i.e. before the standard execution order of eslint.
The next time eslint tries to fetch the AST - it will anyway be cached and there is no significant performance hit

This way I can atleast write my plugin without parsing the entire project 3 times in my application

nzakas · 2022-03-23T00:35:02Z

Sorry, we don’t add temporary APIs to ESLint. A better approach for you might be to create a custom parser that wraps Espree and memoizes the parse() function.

dead-claudia · 2023-06-13T15:54:57Z

I wonder if, for modules like typescript-eslint and eslint-plugin-import that need state persisted across files, it'd be a good idea to allow plugins to register something like a worker or just a module to run on the main thread, and communicate to each file worker via a provided message channel or RPC mechanism.

This would provide some independence as well from however ESLint decides to implement parallel linting.

btmills mentioned this issue Feb 25, 2021

feat: Lint files in parallel if many files exist eslint/rfcs#42

Closed

nzakas added this to Planned in Public Roadmap via automation Mar 2, 2021

nzakas added this to Needs Triage in Triage via automation Mar 2, 2021

nzakas moved this from Needs Triage to Evaluating in Triage Mar 2, 2021

nzakas moved this from Planned to Needs RFC in Public Roadmap Mar 2, 2021

bradzacher mentioned this issue May 23, 2021

OOM when many (>10) project configs passed to the parser typescript-eslint/typescript-eslint#1192

Open

btmills self-assigned this May 28, 2021

mizdra mentioned this issue Dec 24, 2021

Change Request: Support ESM plugins #15453

Closed

1 task

six-transit mentioned this issue Mar 17, 2022

Docs: explain whether type-aware linting is safe to use with eslint --cache typescript-eslint/typescript-eslint#4694

Closed

btmills mentioned this issue May 7, 2022

feat!: Support async parsing eslint/rfcs#87

Closed

benmccann mentioned this issue Mar 31, 2023

Change Request: allow asynchronous parser #15475

Open

1 task

voxpelli mentioned this issue Jun 1, 2023

feat: Initial RFC for sharding eslint/rfcs#112

Closed

stevenpetryk mentioned this issue Jun 9, 2023

Lint multiple files in parallel [$500] #3565

Open

bradzacher mentioned this issue Jan 29, 2024

ESLint does not re-compute cross-file information on file changes microsoft/vscode-eslint#1774

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Plugin initialization API #14139

Plugin initialization API #14139

btmills commented Feb 25, 2021

btmills commented Feb 25, 2021

ljharb commented Feb 25, 2021 •

edited

bradzacher commented Feb 26, 2021 •

edited

nzakas commented Mar 2, 2021

btmills commented Mar 7, 2021 •

edited

ljharb commented Mar 7, 2021

bradzacher commented Mar 7, 2021

AbdealiLoKo commented Mar 20, 2022

nzakas commented Mar 23, 2022

dead-claudia commented Jun 13, 2023

Plugin initialization API #14139

Plugin initialization API #14139

Comments

btmills commented Feb 25, 2021

btmills commented Feb 25, 2021

ljharb commented Feb 25, 2021 • edited

bradzacher commented Feb 26, 2021 • edited

nzakas commented Mar 2, 2021

btmills commented Mar 7, 2021 • edited

ljharb commented Mar 7, 2021

bradzacher commented Mar 7, 2021

AbdealiLoKo commented Mar 20, 2022

nzakas commented Mar 23, 2022

dead-claudia commented Jun 13, 2023

ljharb commented Feb 25, 2021 •

edited

bradzacher commented Feb 26, 2021 •

edited

btmills commented Mar 7, 2021 •

edited