Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add chevrotain parser with Unicode support! #3432

Closed
wants to merge 9 commits into from

Conversation

sidharthv96
Copy link
Member

@sidharthv96 sidharthv96 commented Sep 8, 2022

📑 Summary

Introduces chevrotain as a possible parser instead of JISON with Unicode support!

Experimental support only in the least used Pie and Info diagram.

image

Related issues

📏 Design Decisions

The new parser has a compatability layer to act as a drop in replacement for the existing JISON parser. So no changes in mermaid code will be required outside of the corresponding diagram's folder.

  • Unicode support - Emojis!!
  • Completely typed code
  • Easy to debug
  • Friendly logic, written in TS
  • Faster
  • ESM support
  • No need of the loader headache

📋 Tasks

Make sure you

  • 📖 have read the contribution guidelines
  • 💻 have added unit/e2e tests (if appropriate)
  • 🔖 targeted develop branch

@sidharthv96 sidharthv96 changed the title feat: Add chevrotain parser feat: Add chevrotain parser with Unicode support! Sep 8, 2022
@sidharthv96 sidharthv96 self-assigned this Sep 8, 2022
const Text = createToken({ name: 'Text', pattern: /[^\n\r"]+/ });
const StringLiteral = createToken({
name: 'StringLiteral',
pattern: /"(:?[^\\"]|\\(:?[bfnrtv"\\/]|u[0-9a-fA-F]{4}))*"/,

Check failure

Code scanning / CodeQL

Inefficient regular expression

This part of the regular expression may cause exponential backtracking on strings starting with '"' and containing many repetitions of ':!'.
@sidharthv96 sidharthv96 marked this pull request as draft September 9, 2022 04:38
aloisklink added a commit to aloisklink/mermaid that referenced this pull request Sep 15, 2022
In gitGraph, add support branch names that only have a single
character.

The branch regex is checking for a starting character, and an
ending character, so it currently needs at least two characters.

I've wrapped everything except the first character in a `()?` to fix
this.

There are some really complicated regexes that do match what
valid git branches are (see https://stackoverflow.com/a/12093994), but
I'm reluctant to add them in, since it will be a pain to test all
the different edgecases.

Hopefully mermaid-js#3432 might be used
in the future to make a better gitgraph parser!
@weedySeaDragon
Copy link
Contributor

This would be great also because it allows for some inheritance. Could hopefully get rid of all the duplication so much of the lexers and parsers. Would also make it easy to implement things that are common to all diagrams (headers, footers, titles, etc.)

I wonder if we could use the syntax diagrams in the documentation (e.g. to show the syntax for diagrams)

@weedySeaDragon
Copy link
Contributor

FYI - I was intrigued by this, so started learning and working with it.
I'm working on the parser for the ER diagrams (since that's what I've been working with). I'm writing pretty detailed spec/tests for the parser so that we can refactor and re-use the common parts for other diagrams.
I've written it using the CST/Visitor semantics.

When I have it working I'll put up a draft PR. If you want to see what I've got before then, just let me know.

@knsv
Copy link
Collaborator

knsv commented Oct 7, 2022

This will come sometime down the line. We are changing a lot right now and we need to stabilize first, after that I will ot this and do a bigger POC for evaluation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants