Grammars for ES2015-ES2022 #452

pdubroy · 2023-05-12T07:12:39Z

From #451:

I have added preliminary grammars for all JavaScript releases from ES2015 through ES2022:

- generate-ecmascript-grammar.mjs and grammarkdown.ohm have been updated to parse later ES specs
- extract-grammarkdown.mjs and dedent.py have been included to create a clean Grammarkdown grammar from an ES spec
- Extracted & manually cleaned ES grammars for each release
- Per-grammar overrides have been moved from generate-ecmascript-grammar.mjs to an override.json file, along with productions containing reserved words
- Grammarkdown grammars & overrides have been organized into folders for each version of ES
- Generated Ohm grammars for each ES version
- Grammars appear to parse JS programs correctly in the Ohm editor
- Using the esm module loader syntax is only supported if the start symbol is changed to Module

- Grammarkdown grammars for es2015-2022
- Tool to extract & dedent grammarkdown grammars
- `generate-ecmascript-grammar.mjs` now takes an overrides parameter to allow for per-grammar overrides (useful when getting into `async`/`await` in es2017+)

pdubroy · 2023-05-12T07:24:01Z

@elgertam I opened up a new pull request here with your changes, after moving them to a branch in the Ohm repo. (Not sure if there's an easier way to do this such that I can push to the branch.)

pdubroy · 2023-05-12T10:37:23Z

@elgertam One question:

> Extracted & manually cleaned ES grammars for each release

How are these extracted, and what kind of manual cleaning is required? Could this be automated as well? (Not for this PR, but it would be good to document it somewhere in case we want to add that in the future.)

From what I can tell, it seems to be a matter of extracting the contents of the <emu-grammar> blocks from https://raw.githubusercontent.com/tc39/ecma262/main/spec.html. Is that right?

elgertam · 2023-05-12T12:21:06Z

The extraction process I followed is essentially as you described: download the spec files (from https://raw.githubusercontent.com/tc39/ecma262/gh-pages/{YEAR}/index.html, e.g. https://raw.githubusercontent.com/tc39/ecma262/gh-pages/2016/index.html), run them through my extract-grammarkdown.mjs and optionally dedent.py, open the result in VSCode (using two VSCode extensions for Grammarkdown), and remove extraneous productions until the grammar has no errors.
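For readers who want a concrete picture of the extract-and-dedent step, here is a minimal sketch. It is not the code in extract-grammarkdown.mjs or dedent.py; the function names `extractEmuGrammarBlocks`, `decodeEntities`, and `dedent` are hypothetical, and a regular expression stands in for the DOM APIs so the example has no dependencies. It assumes `<emu-grammar>` elements are never nested.

```javascript
// Hypothetical helpers, not the PR's actual tooling.

// Collect the raw text of every <emu-grammar> element in a spec HTML string.
// Assumes the elements are not nested, so a non-greedy match is sufficient.
function extractEmuGrammarBlocks(html) {
  const pattern = /<emu-grammar\b[^>]*>([\s\S]*?)<\/emu-grammar>/g;
  const blocks = [];
  for (const match of html.matchAll(pattern)) {
    blocks.push(decodeEntities(match[1]));
  }
  return blocks;
}

// Decode only the few HTML entities likely to appear in grammar text.
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&'); // last, so "&amp;lt;" becomes "&lt;" and not "<"
}

// Strip the common leading indentation from a block, ignoring blank lines.
function dedent(text) {
  const lines = text.replace(/^\s*\n/, '').replace(/\s+$/, '').split('\n');
  const indents = lines
    .filter((line) => line.trim() !== '')
    .map((line) => line.match(/^[ \t]*/)[0].length);
  const margin = indents.length > 0 ? Math.min(...indents) : 0;
  return lines.map((line) => line.slice(margin)).join('\n');
}
```

Usage would look like `extractEmuGrammarBlocks(specHtml).map(dedent).join('\n\n')`, where `specHtml` is the downloaded index.html as a string. The output still needs the manual removal of extraneous productions described above.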

Automation may be possible, especially for the newer grammars. Older grammars have loads of duplicated productions, which I removed by hand here. These may be extractable in the future with improvements to Ohm's Grammarkdown tooling.

Here's the backstory:

I did quite a bit of digging around ecmarkup and the grammarkdown tools (that's why ecmarkup was still in pkg dependencies) to see if I could extract a grammar from a spec using official tools, and despite a few hints from Ron Buckton that grammars could be extracted, I wasn't able to do so. The grammarkdown tool in particular was difficult to use and understand. Apparently the ecmarkup maintainers feel the same way, because there are comments in the source about how difficult it is to work with.

I wanted to avoid further yak shaving, so I used my own hacked-together tools (extract-grammarkdown.mjs and dedent.py). I then downloaded all the spec HTML files for ES20[16..22] and ran them through the tooling.

Grammars older than ES2018 needed quite a bit of cleanup. The ecmarkup <emu-grammar> tags used in those did not differentiate between instances of snippets or specifications, so many productions, or parts of them, were repeated over and over for commentary & explanatory purposes. I used the two VSCode extensions, published by Ron Buckton himself, to work with the extracted grammars until all of the extraneous productions were removed and there were no more errors. (That is the genesis of the spec.strict.grammar: Grammarkdown has a "strict" mode where all parameters need to be fully specified unless a certain pragma is applied.) In a couple of cases (can't remember which ones at this point), I found grammar summaries in the spec that were subtly wrong because the production in the grammar summary was taken from the wrong section of the spec.
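As a possible first step toward automating that cleanup, a script could at least flag which nonterminals are defined more than once in an extracted grammar. The sketch below is an assumption about how that might look, not something in this PR: `findDuplicateProductions` is a hypothetical name, and it assumes the text has already been dedented so that each production head (a name, optional `[...]` parameters, then `:`, `::`, or `:::`) starts at column 0.

```javascript
// Hypothetical helper, not part of the PR: report nonterminals whose
// production head appears more than once in extracted Grammarkdown text.
function findDuplicateProductions(grammarText) {
  // Head at column 0: Name, optional [params], then :, :: or :::
  const head = /^([A-Za-z_][A-Za-z0-9_]*)(?:\[[^\]]*\])?\s*:{1,3}/;
  const counts = new Map();
  for (const line of grammarText.split('\n')) {
    const match = head.exec(line);
    if (match) {
      counts.set(match[1], (counts.get(match[1]) ?? 0) + 1);
    }
  }
  // Keep only names seen more than once, in first-seen order.
  return [...counts].filter(([, n]) => n > 1).map(([name]) => name);
}
```

This only locates candidates. Deciding which copy of a production is the real definition and which is explanatory repetition would still be a manual judgment.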

Grammars for ES2018 or newer were all simple to extract using basic DOM APIs and IIRC only needed minimal cleanup.

pdubroy · 2023-05-12T13:49:17Z

Thanks for the details. Would you mind adding that description somewhere in the tree? Either in a README, a comment in extract-grammarkdown.mjs, or wherever you think makes sense.

pdubroy · 2023-05-12T13:49:54Z

I'm going to go ahead and merge this; we can continue the work in follow-up PRs.

elgertam and others added 13 commits May 12, 2023 09:04

WIP on ES20[15-22]

bbecc45

- Grammarkdown grammars for es2015-2022
- Tool to extract & dedent grammarkdown grammars
- `generate-ecmascript-grammar.mjs` now takes an overrides parameter to allow for per-grammar overrides (useful when getting into `async`/`await` in es2017+)

Added overrides for es2020

227caf8

Overrides are modular

5a30cf1

ES2015-ES2017, ES2020 all parsing

37ea367

Renamed & reorganized files

97bbdf8

Grammars generated for ES2015-ES2022

2efe3ce

Cleaned up the EmptyStatement override

958d341

Reverted unnecessary change

93ceecc

Reverted identifier override due to fixed script

29275a1

Reference a cleaned value, not a dirty one w/ concrete syntax

baa7527

Remove pkg dependency detritus

0fa6796

Remove npm package lock

28f27d8

New pnpm lockfile

f9c2a7e

pdubroy merged commit 1a88964 into main May 12, 2023
3 checks passed

pdubroy deleted the elgertam-ecmascript branch May 12, 2023 13:51
