Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plugin initialization API #14139

Open
btmills opened this issue Feb 25, 2021 · 10 comments
Open

Plugin initialization API #14139

btmills opened this issue Feb 25, 2021 · 10 comments
Assignees
Labels
core Relates to ESLint's core APIs and features enhancement This change enhances an existing feature of ESLint evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion needs design Important details about this change need to be discussed

Comments

@btmills
Copy link
Member

btmills commented Feb 25, 2021

The problem you want to solve.

ESLint assumes that each rule and source file can be processed independently. typescript-eslint (ref eslint/rfcs#42 (comment)) and eslint-plugin-import (ref eslint/rfcs#42 (comment)) need to do upfront initialization work beyond the scope of a single rule and source file, specifically loading type information and tracing a module graph. Lacking a first-class API, they have inserted these initialization steps into the regular rule linting flow.

If we were to ship parallel linting without supporting this use case, the duplicated initialization could make parallel linting slower than single-threaded linting with these plugins. The large number of ESLint users who also use one of these plugins would not benefit from parallel linting.

Your take on the correct solution to problem.

I'm not familiar with typescript-eslint and eslint-plugin-import internals, so I'm opening this issue to gather requirements.

What I (think I) know:

  • Plugins have asked for a hook to perform pre-lint initialization.
  • When we ship parallel linting, currently under discussion in RFC42, plugins will need a mechanism to share the initialization result with workers.

What I want to know:

  • When before linting should ESLint call plugin initialization hooks?
  • What data do plugins need from ESLint? This may constrain the answer to the previous question.
  • Does ESLint need to pass any of the initialization result back to plugin rules during linting (via e.g. the rule context object)?
  • What data do plugins need to share with workers? For example, is it JSON serializable?
  • Are there other plugins beside typescript-eslint and eslint-plugin-import whose authors should be part of this discussion?
  • What am I missing?

Are you willing to submit a pull request to implement this change?

The next step will be to write an RFC, which I will write once we've settled on requirements and hopefully brainstormed some solutions.

Related discussions:

@btmills btmills added enhancement This change enhances an existing feature of ESLint core Relates to ESLint's core APIs and features evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion needs design Important details about this change need to be discussed labels Feb 25, 2021
@btmills
Copy link
Member Author

btmills commented Feb 25, 2021

I'll kick off the discussion by relating an idea @nzakas had in the TSC meeting discussion that led to this issue. This is only one possible approach, so please share any other ideas you have.

Right now, eslint-plugin-import uses ESLint's FileEnumerator directly and parses those files to build a module graph. FileEnumerator is intended to be private API, but there's not yet a public alternative. This also causes each file to be parsed multiple times, once by eslint-plugin-import and once by ESLint.

What if, after parsing source files and before running any rules, ESLint provided eslint-plugin-import with the pre-parsed ASTs for all files targeted by the lint run? eslint-plugin-import would no longer need to use FileEnumerator, and we'd avoid re-parsing every file. It would also allow eslint-plugin-import to drop the separate import/parsers configuration in eslintrc settings.

However, this could increase memory usage on larger projects by having full ASTs for all files in memory simultaneously. Right now, I assume that eslint-plugin-import discards the full AST after extracting any exported members, and ESLint currently parses and lints one file at a time then moves on to the next file.

This approach might constrain parallel linting. ESLint's current one-file-at-a-time model maps well to parallel workers that could do their own parsing and traversal. This approach would split that into a parse phase followed by a lint phase and require us to pass ASTs around.

@ljharb
Copy link
Sponsor Contributor

ljharb commented Feb 25, 2021

Only some of eslint-plugin-import's rules need its ExportMap, so ideally, each rule should be able to "register" a hook, and only when the rule is going to be invoked (ie, not disabled by config) would the hook need to run. It does, however, eagerly parse every file in the project, regardless of what's being currently linted.

I'm pretty sure we discard AST nodes as soon as we've derived our data structure from them, so I don't anticipate memory being a huge problem. If it were, we could always serialize it to the filesystem (which would also allow us to share it across threads).

@bradzacher
Copy link
Contributor

bradzacher commented Feb 26, 2021

Some background/context on @typescript-eslint:

Our parser is built on top of TypeScript. This means that we don't do any parsing ourselves. Instead we invoke TypeScript and have it parse any given file, and then we transform TS's AST representation to match the ts-estree representation.

In "non-type-aware" mode we work 100% the same as any other ESLint parser; meaning we cache nothing and just produce an AST.
In this mode we invoke TypeScript in "isolated modules" mode (meaning TS does not parse dependencies, etc), and we just convert and return the AST. Any rule require type-information will hard fail.

"Type-aware" mode is where we break from other parsers. Users provide a reference to one or more of their tsconfig.json's via parserOptions.project.
In this mode, when ESLint asks us to parse a file (in a nutshell) we do the following:

  • for each provided tsconfig in parserOptions.project
    • invoke TypeScript in project mode and pass it the path to the tsconfig
      • this causes TypeScript to parse all files referenced by the tsconfig and create a number of data structures to store types
      • note that types are mostly computed lazily, meaning there's only limited type info computed up front during the parse.
    • check the TS project to see if it includes the required file
    • if it does, return the TS AST, else continue.
  • convert the TS AST to ts-estree AST and generate a mapping from ts-estree AST node to TS AST node.
  • Return the AST, and the parser services (the node map and the TS "program" instance).

When a type-aware runs it works as per a normal ESLint rule until it specifically needs type information and then it simply gets the TS AST node via the node map, and then asks TS for type information for the TS AST node.


For @typescript-eslint, it's less about plugin initialisation, and more about parser initialisation.

As above; in type-aware mode all of the work is front-loaded during the parse step. ESLint currently does not provide any separate steps, so the work is specifically performed during the first file that gets parsed.

In regards to an initialisation API, ideally we'd want an API that gets called before parsing which does something along the lines of "broadcast which files are being linted in this run and the config they are being linted with".

This will allow us to do a few things:

  1. figure out which tsconfigs are actually being used (if any).
  2. figure out the mapping of files to tsconfigs.
  3. invoke TS for each tsconfig.
    • potentially... probably still better to defer this until it's actually used so we can save time/memory.

(1) and (2) would allow us to understand the workload ahead of time, and plan accordingly.
For example, if we know the entire set of files from a tsconfig that are being linted, we can know exactly when we can dispose of the "program", and free up that memory. Right now we have no idea about that, and instead have to keep all programs in memory always.


One other suggestion would be for the ability for a parser to influence the order in which ESLint processes files.
For example, if ESLint tells us that files X, Y, and Z are being linted, and we determine that X and Y belong to tsconfig A, and Z belongs to tsconfig B, then it'd be great if we could tell ESLint these batches.
Which would further allow us to optimise memory usage as we could ensure we only have one program in memory at any given time.
This would also naturally lend itself to parallelised linting as well, as this would allow us to tell ESLint how to best batch files across threads to prevent OOMs.


One final suggestion would be for ESLint to differentiate between the "persistent" usecase (like IDEs) and the "one and done" usecase (like CLI runs).

In a "one and done" run we don't really need to care about what happens on the filesystem - we can pretty much assume that the state at the start of the lint run is the state at the end of the lint run.
In a "persistent" usecase we need to know to update the state of TS continually as the user adds and removes files, or changes file contents.

Right now there's no mechanism for this so we have to treat all runs as if they're potentially "persistent" runs; meaning we can waste a lot of time on CLI runs.

@nzakas nzakas added this to Planned in Public Roadmap via automation Mar 2, 2021
@nzakas nzakas added this to Needs Triage in Triage via automation Mar 2, 2021
@nzakas nzakas moved this from Needs Triage to Evaluating in Triage Mar 2, 2021
@nzakas nzakas moved this from Planned to Needs RFC in Public Roadmap Mar 2, 2021
@nzakas
Copy link
Member

nzakas commented Mar 2, 2021

@bradzacher there's a lot of great info in your comment, thanks for sharing that. To keep this thread focused, would you mind opening separate issues about "allowing a parser to influence the order of files" and "persistent vs one-and-done modes"? Those are both interesting to me, and I'd like to explore those more, but I don't feel they are related enough to the topic of this issue to risk derailing the overall conversation.

@btmills
Copy link
Member Author

btmills commented Mar 7, 2021

@ljharb: each rule should be able to "register" a hook, and only when the rule is going to be invoked

Not all rules need these hooks, so hooks should be registered based on enabled rules, got it.

It does, however, eagerly parse every file in the project, regardless of what's being currently linted.

To clarify, eslint-plugin-import needs to do its own filesystem traversal and parsing that is distinct from the list of to-be-linted files. It would ignore that list if ESLint were to provide it. Is that correct?

@bradzacher: broadcast which files are being linted in this run and the config they are being linted with

For @typescript-eslint, the combination of providing tsconfig.json paths and enabling type-aware rules determines whether any hooks are needed. If type-aware linting is used, the hook needs the full list of to-be-linted files and their configs.

High-level requirements:

Before parsing or linting any files, ESLint's main thread (today, the only thread) needs to enumerate files and calculate their configs. If any enabled rules register hooks, ESLint should then call them with the list of to-be-linted files and their configs.

  • eslint-plugin-import will use shared settings from the configs but will ignore the files list.
  • @typescript-eslint needs parser settings and the files list.

After hooks have been called, the main thread or worker threads parse and lint each file.

What about hook results?

What should happen with hooks' results, particularly once we have parallel linting? Are those are being shared between rules as in-memory singletons today? Would eslint-plugin-import and @typescript-eslint continue to share results with workers internally, or does ESLint need to provide a mechanism?

  • @ljharb, you mentioned the filesystem as a fallback if memory is an issue, though I assume sharing results in memory would be preferable. Does eslint-plugin-import have any requirements for how that would be done?
  • @bradzacher, you mentioned lazy and deferred calculation in type-aware mode, which is very different from the serializable results eslint-plugin-import generates. What does the communication between the plugin and TypeScript look like today? Is it all via synchronous function calls? Would workers require bi-directional communication with a single instance of TypeScript?

@ljharb
Copy link
Sponsor Contributor

ljharb commented Mar 7, 2021

To clarify, eslint-plugin-import needs to do its own filesystem traversal and parsing that is distinct from the list of to-be-linted files. It would ignore that list if ESLint were to provide it. Is that correct?

yes, exactly.

you mentioned the filesystem as a fallback if memory is an issue, though I assume sharing results in memory would be preferable. Does eslint-plugin-import have any requirements for how that would be done?

No requirements really; the easiest thing would be to be able to share a fully populated instance of a class (https://github.com/benmosher/eslint-plugin-import/blob/master/src/ExportMap.js#L27). If the requirement were to provide something serializable, I suspect we could, for example, use a Symbol provided by eslint to provide an instance method that produced whatever serialization format eslint required, and then we could deserialize it ourselves later on demand.

@bradzacher
Copy link
Contributor

What does the communication between the plugin and TypeScript look like today? Is it all via synchronous function calls?

The parser provides parserServices which includes the "program" object created by TS.
A rule will talk to TS via it's "type checker" class (parserServices.program.getTypeChecker()).
All of this is done synchronously and in one thread.

No part of the type-aware linting is async because ESLint does not (/did not) support asynchrony.
Introducing async into a synchronous ESLint would be a recipe for race condition bugs!

Would workers require bi-directional communication with a single instance of TypeScript?

TS is designed so that any type information is accessed within the same process. I think it would require some serious bespoke engineering to make it work as a singleton type server.

TS has a language server, but the design there is for an IDE to ask specific things: "this file changed, tell me the problems", "what is the hover information to display at this specific range?", "what refactoring operations can be done at this specific location?".
It's not designed to be a singleton server to serve type information requests.
Flow is designed that way, but TS is not.

This is why I mentioned "the ability for a parser to influence the order in which ESLint processes files" (I'll see if I can write up an issue tomorrow).

Ideally you would group files by tsconfig (project), and have one thread per project.
If you split one project over two threads then you'd have to recalculate the types for each thread, and then you'd just be duplicating work.
For medium-large projects this would likely cause memory contention and OOMs.

@AbdealiLoKo
Copy link

I was looking to create a similar plugin to eslint-plugin-import - and found some difficulties with the plugin architecture too.
My plugin is for checking unused class members - and I found that I would also need to:

  • Find all files in the project
  • Parse all the files and check usage in the AST

I found that finding all files seems like a somewhat easier problem ... But the performance hit of parsing the entire project is quite large.
And in my applications, we use eslint-plugin-import - meaning we would be parsing the entire source code 3 times

The naive thought I had:
For the short term - Is there any possibility to simply create a helper function in use-at-your-own-risk which provides the same AST eslint would be creating later on - but it caches the AST ?
That way plugin developers can use that function if the plugin requires the parsing in create() - i.e. before the standard execution order of eslint.
The next time eslint tries to fetch the AST - it will anyway be cached and there is no significant performance hit

This way I can atleast write my plugin without parsing the entire project 3 times in my application

@nzakas
Copy link
Member

nzakas commented Mar 23, 2022

Sorry, we don’t add temporary APIs to ESLint. A better approach for you might be to create a custom parser that wraps Espree and memoizes the parse() function.

@dead-claudia
Copy link

I wonder if, for modules like typescript-eslint and eslint-plugin-import that need state persisted across files, it'd be a good idea to allow plugins to register something like a worker or just a module to run on the main thread, and communicate to each file worker via a provided message channel or RPC mechanism.

This would provide some independence as well from however ESLint decides to implement parallel linting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Relates to ESLint's core APIs and features enhancement This change enhances an existing feature of ESLint evaluating The team will evaluate this issue to decide whether it meets the criteria for inclusion needs design Important details about this change need to be discussed
Projects
Public Roadmap
  
Needs RFC
Status: Evaluating
Triage
Evaluating
Development

No branches or pull requests

6 participants