- Repo: eslint/eslint
- Start Date: 2022-11-25
- RFC PR: #99
- Authors: Nicholas C. Zakas
This proposal provides a way for plugin authors to define parsers, rules, etc., for any arbitrary programming languages. This would allow ESLint to work on any type of programming language for which there is a plugin defined, not just for JavaScript and its variants.
Over the nine years ESLint has been in existence, it has grown into being not just the most popular JavaScript linter, but arguably the most popular linter for any programming language. Twitter is filled with people lamenting that they don't have an "ESLint for X", where X is some other file type they're working with. Numerous projects have popped up that mimic a lot of what ESLint does at it's core:
In addition, there are numerous projects that hack around other languages to make them work in ESLint, such as:
Instead of forcing others to rewrite what is core linting functionality, or asking people to produce an ESTree format AST for some language other than JavaScript, we can make everyone's lives easier by providing formal language support such that languages can be plugged in to ESLint easily.
We did also previously accept #56 as an extension to parseForESLint()
to support more languages and eslint/eslint#14745 has been open for a couple of years to figure out a way to lint both a file and its virtual parts.
This proposal consists of the following changes:
- Allows plugins to publish languages in addition to rules, processors, etc.
- Allows end users to select which language to use in their config files.
- Defines an API for defining languages.
- Changes the ESLint core to consume language definitions instead of assuming JavaScript is always used.
Prerequisites: This proposal requires the flat config system and cannot be used with the eslintrc config system.
Note: This design is intended to work both in the current version of ESLint and also in the "rewrite" version, which I'll refer to as ESLint X to distinguish between the two. Not all features of this proposal will be available in both versions.
Languages can be defined in plugins by using the languages
key and giving the language a name. This allows multiple languages to be contained in a single plugin if desired. For example:
// inside a plugin called eslint-plugin-lang
export default {
languages: {
lang1: {
// language object
},
lang2: {
// language object
}
},
processors: {
// ..
},
rules: {
// ...
}
};
Here, lang1
and lang2
are languages defined by the plugin. End users can specify which language to use via the language
key in a config object, such as:
// inside eslint.config.js
import lang from "eslint-plugin-lang";
export default [
{
plugins: {
lang
},
files: ["**/*.lang"],
language: "lang/lang1",
languageOptions: {
// options
}
}
];
Here, "lang/lang1"
refers to the lang1
language as exported from eslint-plugin-lang
. Anything inside of languageOptions
is then handled by lang1
rather than by ESLint itself.
Each language definition object must implement the following interface:
interface ESLintLanguage {
/**
* Indicates how ESLint should read the file.
*/
fileType: "text" | "binary";
/**
* First line number returned from the parser (text mode only).
*/
lineStart?: 0 | 1;
/**
* First column number returned from the parser (text mode only).
*/
columnStart?: 0 | 1;
/**
* The property to read the node type from. Used in selector querying.
*/
nodeTypeKey: string;
/**
* The traversal path that tools should take when evaluating the AST
*/
visitorKeys: Record<string,Array<string>>;
/**
* Validates languageOptions for this language.
*/
validateOptions(options: LanguageOptions): void;
/**
* Helper for esquery that allows languages to match nodes against
* class. esquery currently has classes like `function` that will
* match all the various function nodes. This method allows languages
* to implement similar shorthands.
*/
matchesSelectorClass(className: string, node: ASTNode, ancestry: Array<ASTNode>): boolean;
/**
* Parses the given file input into its component parts. This file should not
* throws errors for parsing errors but rather should return any parsing
* errors as parse of the ParseResult object.
*/
parse(file: File, context: LanguageContext): ParseResult | Promise<ParseResult>;
/**
* Creates SourceCode object that ESLint uses to work with a file.
*/
createSourceCode(file: File, input: ParseResult, context: LanguageContext): SourceCode | Promise<SourceCode>;
}
// Supplemental interfaces
type LanguageOptions = Record<string, unknown>;
interface File {
/**
* The path that ESLint uses for this file. May be a virtual path
* if it was returned by a processor.
*/
path: string;
/**
* The path to the file on disk. This always maps directly to a file
* regardless of whether it was returned from a processor.
*/
physicalPath: string;
/**
* Indicates if the original source contained a byte-order marker.
* ESLint strips the BOM from the `body`, but this info is needed
* to correctly apply autofixing.
*/
bom: boolean;
/**
* The body of the file to parse.
*/
body: string | ArrayBuffer
}
interface LanguageContext {
options: LanguageOptions;
}
interface ParseResult {
/**
* Indicates if the parse was successful. If true, the parse was successful
* and ESLint should continue on to create a SourceCode object and run rules;
* if false, ESLint should just report the error(s) without doing anything
* else.
*/
ok: boolean;
/**
* The abstract syntax tree created by the parser.
*/
ast: ASTNode;
/**
* The body of the file, either text or bytes.
*/
body: string | ArrayBuffer;
/**
* Any parsing errors, whether fatal or not.
*/
errors?: Array<ParseError>;
}
interface ParseError {
message: string;
line: number;
column: number;
endLine?: number;
endColumn?: number;
}
interface ASTNode {
// language-defined
}
At a high-level, ESLint uses the methods on a language object in the following way:
- During config validation, ESLint calls
validateOptions()
on thelanguageOptions
specified in the config. It is the expectation that this method throws an error if any of the options are invalid. - When ESLint reads a file, it checks
fileType
to determine whether the language would prefer the file in text or binary format. - When preparing violation messages, ESLint uses
lineStart
andcolumnStart
to determine how to offset the locations. Some parsers use line 0 as the first line but ESLint normalizes to line 1 as the first line (similar for columns). UsinglineStart
andcolumnStart
allows ESLint to ensure a consistently reported output so one file doesn't start with line 0 and another starts at line 1. If not present,lineStart
andcolumnStart
are assumed to be 0. - After reading a file, ESLint passes the file information to
parse()
to create a raw parse result. - The raw parse result is then passed into
createSourceCode()
to create theSourceCode
object that ESLint will use to interact with the code. - The
nodeTypeKey
property indicates the property key in which each AST node stores the node name. Even though ESTree usestype
, it's also common for ASTs to usekind
to store this information. This property will allow ESLint to use ASTs as they are instead of forcing them to usetype
. This is important as it allows for querying by selector.
The intent of this method is to validate the languageOptions
object as specified in a config. With flat config, all of the options in languageOptions
are related to JavaScript and are currently validated inside of the schema. With this proposal, that validation will instead need to be based on the language
specified in the config, which will require retrieving the associated language object and calling the validateOptions()
method inside of FlatConfigArray
.
The validateOptions()
method must throw an error if any part of the languageOptions
object is invalid.
No specific languageOptions
keys will be required for languages. The parser
and parserOptions
keys are unique to JavaScript and may not make sense for other languages.
Inside of esquery
, there are some shortcuts like function
and expression
that will match more than one type of node. These are all specific to ESTree and might not make sense for other languages. However, these shortcut classes are very convenient and other languages might want to implement something similar.
The esquery
External class resolve pull request implements an option that allows us to pass in a function to interpret pseudoclasses from ESLintLanguage#matchesSelectorClass()
.
The parse()
method receives the file information and information about the ESLint context in order to correctly parse the file. This is a raw parse, meaning that parse()
should do nothing more than convert the file into an abstract format without doing any postprocessing. This is important so that ESLint can tell how long parsing takes vs. other operations. (One problem with the current parseForESLint()
approach is that we lose visibility into the true parsing time.)
The parse()
method is expected to return an object with at least ast
and body
properties; the object may also have other properties as necessary for the language to properly create a SourceCode
object later. The return value from parse()
is passed directly into createSourceCode()
.
This method will be called inside of Linter
and will require that the language object be passed into the private parse()
function.
Async: For use with current ESLint, parse()
must be synchronous; ESLint X will allow this method to be asynchronous.
The createSourceCode()
method is intended to create an instance of SourceCode
suitable for ESLint and rule developers to use to work with the file. As such, it may perform postprocessing of a parse result in order to get it into a format that is easier to use.
This method will be called inside of Linter
and will require that the language object be passed into the private parse()
function.
Async: For use with current ESLint, createSourceCode()
must be synchronous; ESLint X will allow this method to be asynchronous.
ESLint already has a SourceCode
class that is used to postprocess and organize the AST data for JavaScript files. This proposal extends the intended purpose of SourceCode
such that it encapsulates all functionality related to a single file, including traversal. As such, SourceCode
objects must implement the following interface:
interface SourceCode {
/**
* Root of the AST.
*/
ast: ASTNode;
/**
* The body of the file that you'd like rule developers to access.
*/
body: string | ArrayBuffer;
/**
* The traversal path that tools should take when evaluating the AST
*/
visitorKeys?: Record<string,Array<string>>;
/**
* Traversal of AST.
*/
traverse(): Iterable<TraversalStep>;
/**
* Return all of the inline areas where ESLint should be disabled/enabled.
*/
getDisableDirectives(): Array<DisableDirective>;
/**
* Return any inline configuration provided via comments.
*/
getInlineConfig(): Array<FlatConfig>;
}
type TraversalStep = VisitTraversalStep | CallTraversalStep;
interface VisitTraversalStep {
type: "visit";
target: ASTNode;
phase: "enter" | "exit";
args: Array<any>;
}
interface CallTraversalStep {
type: "call";
target: string;
phase: string | null | undefined;
args: Array<any>;
}
interface DisableDirective {
type: "enable" | "disable";
ruleId: string;
/**
* The point from which the directive should be applied.
*/
from: Location;
/**
* Location of the comment.
*/
loc: LocationRange;
}
interface LocationRange {
start: Location;
end: Location;
}
interface Location {
line: number;
column: number;
}
Other than these interface members, languages may define any additional methods or properties that they need to provide to rule developers. For instance,
the JavaScript SourceCode
object currently has methods allowing retrieval of tokens, comments, and lines. It is up to the individual language object implementations to determine what additional properties and methods may be required inside of rules. (We may want to provide some best practices for other methods, but they won't be required.)
The JavaScript language allows a parser
option to be passed in, and so the result AST may not be the exact structure represented on the ESLintLanguage
object. In such a case, an additional visitorKeys
property can be provided on SourceCode
that overrides the ESLintLanguage#visitorKeys
property just for this file.
Today, ESLint uses a custom traverser based on data from eslint-visitor-keys
. All of this is specific to an ESTree-compatible AST structure. There are two problems with this:
- We cannot traverse anything other than ESTree structures.
- Because this traversal is synchronous, rules must always be synchronous.
The SourceCode#traverse()
method solves these problems by allowing each SourceCode
instance to specify the traversal order through an iterable. Returning an iterable means several things:
- The method can be a generator or just return an array. This gives maximum flexibility to language authors.
- The ESLint core can move step by step through the traversal, waiting for asynchronous calls to return before moving to the next node. This will ultimately enable async rules in ESLint X.
- It's possible to add method calls into the traversal, such as the current
onCodePathStart()
method for JavaScript, in a standard way that is available to other languages. - If the
SourceCode
object ends up doing more than one traversal of the same AST (which is how ESLint currently work), it can cache the traversal steps in an array to reuse multiple times, saving compute time.
As an example, the current traversal can be rewritten to look like this:
class Traverser {
// methods skipped
*_traverse(node, parent) {
if (!isNode(node)) {
return;
}
this._current = node;
this._skipped = false;
yield {
type: "visit",
target: node,
phase: "enter",
args: [node, parent]
}
if (!this._skipped && !this._broken) {
const keys = getVisitorKeys(this._visitorKeys, node);
if (keys.length >= 1) {
this._parents.push(node);
for (let i = 0; i < keys.length && !this._broken; ++i) {
const child = node[keys[i]];
if (Array.isArray(child)) {
for (let j = 0; j < child.length && !this._broken; ++j) {
yield *this._traverse(child[j], node);
}
} else {
yield *this._traverse(child, node);
}
}
this._parents.pop();
}
}
if (!this._broken) {
yield {
type: "visit",
target: node,
phase: "exit",
args: [node, parent]
}
}
this._current = parent;
}
}
Note that we use this data inside of NodeEventGenerator
to make the calls into the visitor objects.
Using SourceCode#traverse()
, we will be able to traverse an AST like this:
// Simplified example only!! -- not intended to be the final implementation
for (const step of sourceCode.traverse()) {
switch (step.type) {
// visit a node
case "node": {
rules.forEach(rule => {
Object.keys(rule).forEach(key => {
// match the selector
if (sourceCode.match(step.target, key)) {
// account for enter and exit phases
if (key.endsWith(":exit")) {
if (step.phase === "exit") {
// async visitor methods!!
await rule[key](...step.args)
}
} else {
await rule[key](...step.args)
}
}
});
});
break;
}
// call a method
case "call": {
rules.forEach(rule => {
Object.keys(rule).forEach(key => {
if (step.target === key) {
await rule[key](...step.args)
}
});
});
break;
}
default:
throw new Error(`Invalid step type "${ step.type }" found.`);
}
}
In ESLint X, we can swith the for-of
loop for a for await-of
loop in order to allow asynchronous traversal, as well.
ESLint has all kinds of disable directives to use inside of JavaScript. It's impossible for the ESLint core to know how disable directives might be formatted or used in other languages, so we need to abstract this out and delegate it to the SourceCode
object. The SourceCode#getDisableDirectives()
method returns an array of disable directive objects indicating when each rule is disabled and enabled.
The ESLint core will then use the disable directive information to:
- Filter out violations appropriately.
- Report any unused disable directives.
This functionality currently lives in apply-disable-directives.js
and will have to be updated to call SourceCode#getDisableDirectives()
.
This is a method instead of a property because if ESLint is run with noInlineConfig: true
then there is no reason to calculate the disable directives.
In JavaScript, we use /* eslint */
comments to specify inline configuration of rules. Other languages may want to do the same thing, so this method provides a way for a language to extract any inline configuration and return it to the ESLint core.
This functionality currently lives in linter.js
and will have to be updated to call SourceCode#getInlineConfig()
.
This is a method instead of a property because if ESLint is run with noInlineConfig: true
then there is no reason to calculate inline config.
An important part of this process will be extracting all of the JavaScript functionality for the ESLint core and placing it in a language object. As a first step, that language object will live in the main ESLint repository so we can easily test and make changes to ensure backwards compatibility. Eventually, though, I'd like to move this functionality into its own eslint/js
repo.
The default flat config will need to be updated to specify a default language
to use. We would need to hardcode the language for eslintrc configs as there won't be a way to set that in the config itself. (Likely, config.language || defaultLanguage
in Linter
.)
The JavaScript-specific traversal functionality that currently lives in linter.js
will need to move into SourceCode#traverse()
.
ESLint currently uses esquery
to match visitor patterns to nodes. esquery
is defined for use specifically with ESTree-compatible AST structures, which means that it cannot work for other structures without modification.
To make esquery
more generic, I've submitted the following pull requests:
- Allow for custom node type keys that implements an option to specify which key in an AST node contains the node type.
- External class resolve implements an option that allows us to pass in a function to interpret pseudoclasses like
:expression
fromESLintLanguage#matchesSelectorClass()
.
The applyDisableDirectives()
method is called inside of Linter
, and if Linter
calls SourceCode#getDisableDirectives()
, we can likely reuse a lot of the existing functionality with some minor tweaks.
The getDirectiveComments()
function will move into SourceCode
to postprocess the AST with all directive comments and then expose just the disable comments through SourceCode#getDisableDirectives()
.
For JavaScript, we will need to deprecate the following methods and properties and redirect them to the SourceCode
object:
getAncestors()
getDeclaredVariables()
,getScope()
parserServices
The following properties we can remove once we remove the eslintrc config system, so they will still show up for JavaScript but won't be included for non-JavaScript:
parserOptions
parserPath
With a new Rule Context, we should also start converting existing rules over to the new APIs as they move from context
onto SourceCode
.
In order to make all of this work, we'll need to make the following changes in the core:
linter.js
will need to be updated to callSourceCode#traverse()
.FlatConfigArray
will need to be updated to define thelanguage
key and to delegate validation oflanguageOptions
to the language.Linter
will need to be updated to honor thelanguage
key and to use JS-specific functionality where we need to provide backwards compatibility for existing JS rules. There will be a lot of code removed and delegated to the language being used, including filtering of violation via disable directives, traversing the AST, and formulating the final violation list.- The
context
object passed to rules will need to be updated so it works for all languages. FlatRuleTester
will need to be updated to support passing inlanguage
. We should also updatedFlatRuleTester
to warn about the newly-deprecatedcontext
methods and properties.
The rule context object, which is passed into rules as context
, will need to be significantly altered to support other languages. All of the methods that relate specifically to language features will need to move into SourceCode
, leaving us with a rule context object that follows this interface:
interface RuleContext {
languageOptions: LanguageOptions;
settings: object;
options: Array<any>;
id: string;
// method replacements
cwd: string;
filename: string;
physicalFilename: string
sourceCode: SourceCode;
report(violation: Violation): void;
// deprecated methods only available in JS
getCwd(): string;
getFilename(): string;
getPhysicalFilename(): string;
getSourceCode(): SourceCode;
}
interface Violation {
messageId: string;
data: object;
line: number;
column: number;
endLine?: number;
endColumn?: number
suggest: Array<Suggestion>;
fix(fixer: Fixer): Iterable<FixOp>;
}
interface Suggestion {
desc: string;
fix(fixer: Fixer): Iterable<FixOp>;
}
We should strictly use this interface for all non-JavaScript languages from the start.
Note: I have purposely removed the message
property from the Violation
interface. I'd like to move to just one way of supplying messages inside of rules and have that be messageId
. This will also reduce the complexity of processing the violations.
Most existing end-user facing documentation will remain unchanged.
Plugin developer documentation will need to be updated:
- The documentation on writing new rules needs to be updated to include API changes.
- A new page on writing languages for ESLint will need to be written. This will need to be quite extensive.
- Performance. There will likely be a small performance penalty from this refactoring. Moving to iterators from standard loops seems to incur a slowdown. My hope is that we can mitigate the effects of this performance hit by refactoring code in other ways.
- eslintrc Compatibility. Unfortunately, there's no easy way to make this functionality available to eslintrc config users. When implemented, eslintrc users will be left without the ability to use other languages.
This proposal is 100% backwards compatible in the short-term, and all breaking changes are optional and can be phased in over time to avoid disrupting the ecosystem.
This proposal does not require any changes from end-users.
- We could continue using the
parserForESLint()
approach to support other languages. This was proposed and accepted in #56. - We could choose to not support languages other than JavaScript.
The :exit
was left over from before we used esquery
and is a bit of an outlier from how selectors generally work. We could eliminate it and ask people to write their rule visitors like this:
{
"FunctionExpression": {
enter(node, parent) {
// ...
},
exit(node, parent) {
// ...
}
}
}
And if they just wanted to use :exit
, we could ask them to write this:
{
"FunctionExpression": {
exit(node, parent) {
// ...
}
}
}
This would allow the visitor keys to stay strictly as selectors while still allowing both enter and exit phases of traversal to be hit.
N/A
I think there may be a use in validating non-text files such as images, videos, and WebAssembly files. I may be wrong, but by allowing languages to specify that they'd prefer to receive the file in binary, it at least opens up the possibility that people may create plugins that can validate binary files in some way.
At least to start, we won't allow autofix for binary files. While it is technically possible to overwrite particular byte offsets in the same way we do character ranges, it's difficult to know how this should work in practice without an actual use case. My suggestion is that we implement linting for binary files and see what people end up using it for before designing autofix for binary files.
As a first step, yes, the JavaScript language object will still live in the main eslint repo. This will allow us to iterate faster and keep an eye on backwards compatibility issues.
In the future, I anticipate moving all JavaScript-specific functionality, including rules, into a separate repository (eslint/js
) and publishing it as a separate package (@eslint/js
). That would require some thought around how the website should change and be built as rule documentation would also move into a new repo.
It's important that we are able to determine the amount of time it takes to parse a file in order to optimize performance. Combining the parsing with creating a SourceCode
object, which may require additional processing of the parse result, means obscuring this important number.
Additionally, createSourceCode()
allows integrators to do their own parsing and create their own SourceCode
instances without requiring the use of a specific parse()
method.
Rules will be written the same way regardless of the language being used. The only things that will change are the AST node names and the SourceCode
object passed in. Otherwise, rules can be written the same way, including the use of CSS-like selector strings to find nodes.
Yes. The same text-based mechanism we currently use for JavaScript autofixes will also work for other languages.
That is out of scope for this RFC.
Yes. In the short-term, we will create a JSON language plugin and a Markdown language plugin (to coincide with or replace eslint-plugin-markdown
).
We may kickstart other language plugins as well and invite others to maintain them.