Grammars for ES2015-ES2022 #452

Merged
merged 13 commits into from
May 12, 2023


pdubroy
Contributor

@pdubroy pdubroy commented May 12, 2023

From #451:

I have added preliminary grammars for all JavaScript releases from ES2015 through ES2022

  • generate-ecmascript-grammar.mjs and grammarkdown.ohm have been updated to parse later ES specs
  • extract-grammarkdown.mjs and dedent.py have been included to create a clean Grammarkdown grammar from an ES spec
  • Extracted & manually cleaned ES grammars for each release
  • Per-grammar overrides have been moved from generate-ecmascript-grammar.mjs to an override.json file, along with productions containing reserved words
  • Grammarkdown grammars & overrides have been organized into folders for each version of ES
  • Generated Ohm grammars for each ES version
    • Grammars appear to parse JS programs correctly in the Ohm editor
    • Using the esm module loader syntax is only supported if the start symbol is changed to Module

@pdubroy
Contributor Author

pdubroy commented May 12, 2023

@elgertam I opened up a new pull request here with your changes, after moving them to a branch in the Ohm repo. (Not sure if there's an easier way to do this such that I can push to the branch.)

@pdubroy
Contributor Author

pdubroy commented May 12, 2023

@elgertam One question:

"Extracted & manually cleaned ES grammars for each release"

How are these extracted, and what kind of manual cleaning is required? Could this be automated as well? (Not for this PR, but it would be good to document it somewhere in case we want to add that in the future.)

From what I can tell, it seems to be a matter of extracting the contents of the <emu-grammar> blocks from https://raw.githubusercontent.com/tc39/ecma262/main/spec.html. Is that right?

@elgertam
Collaborator

The extraction process I followed is essentially as you described: download the spec files (from https://raw.githubusercontent.com/tc39/ecma262/gh-pages/{YEAR}/index.html e.g. https://raw.githubusercontent.com/tc39/ecma262/gh-pages/2016/index.html), run them through my extract-grammarkdown.mjs and optionally dedent.py, open in VSCode (using two VSCode extensions for grammarkdown), and remove extraneous productions until the grammar has no errors.
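A minimal sketch of the extraction step described above, for readers who want to see its shape. This is not the actual extract-grammarkdown.mjs, whose contents are not shown in this thread; it is a standard-library Python illustration that concatenates the text found inside each `<emu-grammar>` element. The names `EmuGrammarExtractor`, `extract_emu_grammar`, and `spec_url` are hypothetical, and rendered spec pages may need more handling than plain text collection.

```python
from html.parser import HTMLParser


class EmuGrammarExtractor(HTMLParser):
    """Collects the text content of every <emu-grammar> element (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []   # finished grammar snippets, in document order
        self._depth = 0    # greater than 0 while inside <emu-grammar>
        self._parts = []   # text pieces of the snippet being collected

    def handle_starttag(self, tag, attrs):
        if tag == "emu-grammar":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "emu-grammar" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._parts))
                self._parts = []

    def handle_data(self, data):
        # Text inside nested markup is kept too; only the tags are dropped.
        if self._depth > 0:
            self._parts.append(data)


def extract_emu_grammar(html_text):
    """Return the text of each <emu-grammar> block found in html_text."""
    parser = EmuGrammarExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.blocks


def spec_url(year):
    """Build the per-release spec URL using the pattern quoted in this comment."""
    return f"https://raw.githubusercontent.com/tc39/ecma262/gh-pages/{year}/index.html"
```

Downloading is left out on purpose: fetch `spec_url(year)` with whatever HTTP client you prefer and pass the response text to `extract_emu_grammar`.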

Automation may be possible, especially for the newer grammars. Older grammars have loads of duplicated productions, which had to be removed by hand here. With improvements to Ohm's Grammarkdown tooling, these grammars may be extractable automatically in the future.

Here's the backstory:

I did quite a bit of digging around ecmarkup and the grammarkdown tools (that's why ecmarkup was still in pkg dependencies) to see if I could extract a grammar from a spec using official tools, and despite a few hints from Ron Buckton that grammars could be extracted, I wasn't able to do so. The grammarkdown tool in particular was difficult to use and understand. Apparently the ecmarkup maintainers feel the same way, because there are comments in the source about how difficult it is to work with.

I wanted to avoid further yak shaving, so I used my own hacked-together tools (extract-grammarkdown.mjs and dedent.py). I then downloaded all the spec HTML files for ES20[16..22] and ran them through the tooling.
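The contents of dedent.py are not shown in this thread. Assuming its job is to strip the indentation that extracted snippets inherit from the surrounding HTML, the idea can be sketched with Python's standard `textwrap.dedent`; the function name `dedent_block` is hypothetical.

```python
import textwrap


def dedent_block(block):
    """Remove the indentation shared by all non-blank lines and trim blank edge lines."""
    return textwrap.dedent(block).strip("\n")
```

Relative indentation is preserved, which matters here because Grammarkdown places the right-hand sides of a production on indented lines beneath its head.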

Grammars older than ES2018 needed quite a bit of cleanup. The ecmarkup <emu-grammar> tags used in those did not differentiate between instances of snippets or specifications, so many productions, or parts of them, were repeated over and over for commentary & explanatory purposes. I used the two VSCode extensions, published by Ron Buckton himself, to work with the extracted grammars until all of the extraneous productions were removed and there were no more errors. (That is the genesis of the spec.strict.grammar – Grammarkdown has a "strict" mode where all parameters need to be fully specified unless a certain pragma is applied.) In a couple of cases (can't remember which ones at this point), I found grammar summaries in the spec that were subtly wrong because the production in the grammar summary was taken from the wrong section of the spec.
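As a rough illustration of how the repeated productions could be flagged before manual cleanup, the sketch below lists nonterminals whose production head appears more than once in extracted grammar text. It is not part of this PR. It assumes a production head is an unindented name, optional `[parameters]`, then `:`, `::` or `:::`, and it only reports candidates: deciding which copy is the real definition and which is an explanatory snippet still needs a human. The name `find_repeated_productions` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of a production head: unindented nonterminal name,
# optional [parameters], then ':', '::' or ':::'.
HEAD = re.compile(r"^([A-Za-z_]\w*)(?:\[[^\]]*\])?\s*(:{1,3})(?:\s|$)")


def find_repeated_productions(grammar_text):
    """Map each nonterminal defined more than once to the 1-based lines of its heads."""
    seen = defaultdict(list)
    for lineno, line in enumerate(grammar_text.splitlines(), start=1):
        match = HEAD.match(line)
        if match:
            seen[match.group(1)].append(lineno)
    return {name: lines for name, lines in seen.items() if len(lines) > 1}
```

Heads that differ only in their parameter lists are grouped under the same name, since the older specs repeat productions with and without parameters.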

Grammars for ES2018 or newer were all simple to extract using basic DOM APIs and IIRC only needed minimal cleanup.

@pdubroy
Contributor Author

pdubroy commented May 12, 2023

Thanks for the details. Would you mind adding that description somewhere in the tree? Either in a README, a comment in extract-grammarkdown.mjs, or wherever you think makes sense.

@pdubroy
Contributor Author

pdubroy commented May 12, 2023

I'm going to go ahead and merge this; we can continue the work in follow-up PRs.

@pdubroy pdubroy merged commit 1a88964 into main May 12, 2023
3 checks passed
@pdubroy pdubroy deleted the elgertam-ecmascript branch May 12, 2023 13:51

2 participants