enh(gcode): rewrote language for modern gcode support #4040

barthy-koeln · 2024-04-14T10:04:46Z

Complete rework of the gcode language to allow for extended use cases.
Not all scope rules will apply to all implementations of g-code, but many applications of gcode have added a lot on top of the original spec.

This language implementation aims to be more extensive but still flexible.

My research drew on the following documentation:

And countless code examples extracted from GitHub's search:

Changes

More keywords
Stricter number matching (c-style numbers are too broad)
Differentiation between functions (G-Codes, M-Codes, …), Axes (ABC, UVW, XYZ), Parameters (P, Q, R, S, …)
Semicolon comments
Quoted strings
Word commands

Question about code

I have used this pattern to re-use complex regexes:

const NUMBER = /[+-]?((\.\d+)|(\d+)(\.\d*)?)/;
const match = new RegExp(`(?<![A-Z])[FHIJKPQRS]\\s*${NUMBER.source}`)

I would like to use this for certain duplicated parts (e.g. (?<![A-Z])) as well. Is this acceptable?
The performance impact is surely negligible since this only happens during initialisation.
It greatly increases readability and IDE support in my opinion.
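
For illustration, the same composition can be extended to the repeated look-behind prefix. This is a minimal sketch, not the PR's final code: the names `LETTER_BOUNDARY_RE` and `PARAMETER_RE` are assumptions chosen for the example, and the look-behind needs a regex engine that supports it.

```javascript
// Reusable fragments, each written once as a regex literal.
const NUMBER_RE = /[+-]?((\.\d+)|(\d+)(\.\d*)?)/;
// Hypothetical name: "not preceded by an uppercase letter".
const LETTER_BOUNDARY_RE = /(?<![A-Z])/;

// Compose the fragments through .source; this runs once at initialisation.
const PARAMETER_RE = new RegExp(
  `${LETTER_BOUNDARY_RE.source}[FHIJKPQRS]\\s*${NUMBER_RE.source}`
);

console.log('G1 F-12.5'.match(PARAMETER_RE)[0]); // F-12.5
console.log(PARAMETER_RE.test('AF10'));          // false, F follows a letter
```

Because the fragments stay regex literals, editors can still highlight and lint them individually.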

Screenshots

[Before/after screenshots for the "Default" and "Extended" rows; images not preserved]

Checklist

Added markup tests
Updated the changelog at CHANGES.md

joshgoebel · 2024-04-14T20:06:04Z

src/languages/gcode.js

+      variants: [
+        // G General functions: G0, G5.1, G5.2, …
+        // M Misc functions: M0, M55.6, M199, …
+        { match: /(?<![A-Z])[GM]\s*\d+(\.\d+)?/ },


We can't use look-behinds until version 12 (breaking change)... so we'll need another way, or this PR will have to wait till then... no ETA on version 12 yet.

joshgoebel · 2024-04-16T01:29:30Z

src/languages/gcode.js

-  const NUMBER = hljs.inherit(hljs.C_NUMBER_MODE, { begin: '([-+]?((\\.\\d+)|(\\d+)(\\.\\d*)?))|' + hljs.C_NUMBER_RE });
+
+
+  const LETTER_BOUNDARY_RE = /(?<![A-Z])/;


Suggested change

-  const LETTER_BOUNDARY_RE = /(?<![A-Z])/;
+  // TODO: post v12 lets use look-behind for more accuracy, until then \b should suffice
+  // const LETTER_BOUNDARY_RE = /(?<![A-Z])/;
+  const LETTER_BOUNDARY_RE = /\b/;

I was still thinking about that. Unfortunately, I don't think \b is good enough. It completely breaks the spaceless gcode snippet I included in the markup test: \b treats digits [0-9] as word characters, so there is no word boundary between 1 and A in a sequence like G1A2, and G1 and A2 are not matched separately as they should be.

It's pretty common for gcode to be generated by software and not written by humans, and to reduce file size and transmission time to the machines, they strip all whitespace.

I wanted to look into \b and an on:begin filter using response.ignoreMatch() if a digit is found before the first letter.

Would that be fine or discouraged?
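
The \b limitation described in this comment can be reproduced with two small patterns. They are simplified stand-ins for illustration (any uppercase letter followed by a number), not the patterns used in the PR.

```javascript
const spaceless = 'G1A2';

// \b needs a word/non-word transition; "1" and "A" are both word characters.
const withWordBoundary = /\b[A-Z]\s*\d+(\.\d+)?/g;
// The look-behind only rejects a preceding uppercase letter.
const withLookBehind = /(?<![A-Z])[A-Z]\s*\d+(\.\d+)?/g;

console.log(spaceless.match(withWordBoundary)); // [ 'G1' ]
console.log(spaceless.match(withLookBehind));   // [ 'G1', 'A2' ]
```

With whitespace between the words (`G1 A2`) both patterns find both words, which is why the problem only shows up in machine-generated, spaceless gcode.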

You'd have to do that per mode, of course... we don't use that much because it's SO expensive... but I think if we were willing to remove GCODE from the list of auto-detectable languages with disableAutodetect, then on:begin could be used... since we'd then be certain we were at least doing all that work in the service of highlighting actual GCODE.

And let's still leave a comment somewhere that the better solution is look-behind, for the future when we can use that.

barthy-koeln · 2024-04-24T07:36:01Z

@joshgoebel thank you for your patience.

My latest commit added disableAutodetect: true to the language, as well as the aforementioned on:begin filter.
I added the filter as a second variant, alongside the full matches that use \b.

This means that readable gcode with sane spacing will rarely, if ever, run into the callback filter.
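
The two-variant arrangement described above might look roughly like the following. This is a sketch under assumptions: `gcodeSketch`, `CODE_RE`, and the `title.function` scope are placeholders chosen for the example, and only the digit check shown in the review diff is included.

```javascript
// Hypothetical language definition; not the code committed in the PR.
function gcodeSketch() {
  const CODE_RE = /[GM]\s*\d+(\.\d+)?/;

  return {
    name: 'G-code (sketch)',
    // The callback below is expensive, so opt out of auto-detection.
    disableAutodetect: true,
    contains: [
      {
        scope: 'title.function', // assumed scope name
        variants: [
          // Fast path: spaced, readable gcode is caught by \b alone.
          { match: new RegExp(`\\b${CODE_RE.source}`) },
          // Slow path: no boundary in the regex, filtered by the callback.
          // TODO: replace both variants with (?<![A-Z]) once look-behind is allowed.
          {
            match: CODE_RE,
            'on:begin': (match, response) => {
              const charBeforeMatch = match.input[match.index - 1];
              // Accept only codes glued to a preceding digit, e.g. "N10G1".
              if (charBeforeMatch >= '0' && charBeforeMatch <= '9') return;
              response.ignoreMatch();
            }
          }
        ]
      }
    ]
  };
}
```

Registering such a definition would go through `hljs.registerLanguage`, with the language function conventionally receiving `hljs` as its argument; the sketch omits that parameter because it does not use it.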

I've tested both implementations with the existing markup test, as well as some 100 lines of spaceless gcode.
The results are always within 5% of each other, always in favor of the v12 language with lookbehinds.

Running Benchmark & Results

Shell commands to download v12 and install benny

wget https://raw.githubusercontent.com/barthy-koeln/highlight.js/2c55db96e8a0523fdbdd6d4069c7007f75d5288b/src/languages/gcode.js -O src/languages/gcode-v12.js
npm run build gcode gcode-v12
cd build
npm install benny

# create benchmark script here e.g. bench.mjs

node bench.mjs

Benchmark script

import b from 'benny'
import fs from 'fs'
import hljs from 'highlight.js'

const code = fs.readFileSync(import.meta.dirname + '/../test/markup/gcode/extended.txt').toString('utf-8')

b.suite(
  'gcode bench',

  b.add('v11', () => {
    hljs.highlight(code, { language: 'gcode' })
  }),

  b.add('v12', () => {
    hljs.highlight(code, { language: 'gcode-v12' })
  }),

  b.cycle(),
  b.complete(),
  b.save({ file: 'gcode', version: '1.0.0' }),
  b.save({ file: 'gcode', format: 'chart.html' })
)

Results

joshgoebel · 2024-04-25T02:19:10Z

src/languages/gcode.js

+    const charBeforeMatch = matchdata.input[matchdata.index - 1];
+    if (charBeforeMatch >= '0' && charBeforeMatch <= '9') {
+      return;
+    }
+
+    response.ignoreMatch();


If the regex is ![A-Z] why wouldn't the conditional match? ie:

Suggested change

-    const charBeforeMatch = matchdata.input[matchdata.index - 1];
-    if (charBeforeMatch >= '0' && charBeforeMatch <= '9') {
-      return;
-    }
-
-    response.ignoreMatch();
+    const charBeforeMatch = matchdata.input[matchdata.index - 1];
+    if (charBeforeMatch >= 'A' && charBeforeMatch <= 'Z') {
+      response.ignoreMatch();
+    }

Do we need to deal with lowercase a-z also?

If the on:begin variant were the only matching logic, we'd have to check for ![a-zA-Z].

But since we have a variant with \b, we can be sure that any instance preceded by a non-word character (anything outside [a-zA-Z0-9_]) has been found already. So the callback only needs to additionally accept those that follow 0-9 and ignore everything else.

Doing that means we don't have to account for upper/lowercase.

I just noticed the underscore _ isn't handled by that callback, but it should be. I will push another commit (but will wait for feedback on the thing above).

Update: I've committed an additional check for underscores, to align the filter with what \b and ![A-Z] mean.
I've added two lines to the markup test and verified in my v12 branch feat/gcode-v12 that the result is the same.
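
The reasoning in this thread can be checked outside highlight.js with a small simulation. It is only a sketch: the two function names are hypothetical, the pattern is reduced to G and M codes, and the comparison is made on uppercase input only.

```javascript
// Reference behaviour: the look-behind that cannot be used before v12.
function matchWithLookBehind(input) {
  return [...input.matchAll(/(?<![A-Z])[GM]\s*\d+(\.\d+)?/g)].map((m) => m[0]);
}

// Emulation: accept a candidate if \b would hold before it, or if the
// callback's extra check (preceding digit or underscore) passes.
function matchWithBoundaryAndFilter(input) {
  const accepted = [];
  for (const m of input.matchAll(/[GM]\s*\d+(\.\d+)?/g)) {
    const previous = input[m.index - 1];
    const atWordBoundary = previous === undefined || !/\w/.test(previous);
    const afterDigitOrUnderscore = previous !== undefined && /[0-9_]/.test(previous);
    if (atWordBoundary || afterDigitOrUnderscore) accepted.push(m[0]);
  }
  return accepted;
}

for (const sample of ['G1M2', 'N10G1', 'X_G1', 'AG1', 'G28 M104']) {
  console.log(sample, matchWithLookBehind(sample), matchWithBoundaryAndFilter(sample));
}
```

For these uppercase samples both functions return the same lists, including the underscore case `X_G1`, where the word after the underscore is accepted by both.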

barthy-koeln mentioned this pull request Apr 14, 2024

fix(gcode): stricter gcode line number matching #4034

barthy-koeln changed the title ~~Feat/gcode rework~~ feat: gcode rework Apr 14, 2024

barthy-koeln force-pushed the feat/gcode_rework branch 2 times, most recently from e416841 to aaeb9f5 Compare April 14, 2024 12:58

barthy-koeln changed the title ~~feat: gcode rework~~ enh(gcode): rewrote language for modern gcode support Apr 14, 2024

barthy-koeln force-pushed the feat/gcode_rework branch from aaeb9f5 to 423b2e1 Compare April 14, 2024 13:00

barthy-koeln marked this pull request as ready for review April 14, 2024 13:00