Add a parser for the metadata CLI #9908

jrockway · 2024-03-30T06:05:01Z

This replaces an inflexible parser that's 1 line of code with a more complicated one that can parse any valid metadata. I changed the "replace" metadata op to replace in the CLI. set is short, but I can never remember it.

The grammar looks like this:

add and edit take a KV. replace takes a KVs.

Double-quoted strings are the same as Go's, except that \U12345678 syntax isn't supported because the spec doesn't explain how it works and I don't know how it works.

Single-quoted strings are very simple; they can't have any escapes in them.

Our variant of JSON is more liberal than the official JSON spec out of convenience. You can now have vertical linefeeds and \x?? in your JSON. You can also have a trailing comma. Finally, you can probably write {"key":"value""key2":"value2"}.

I have fuzz tested the parser for about 12 hours x 32 cores, because it will panic when the parser selects a path that causes the lexer to not advance. These are mostly caused by @@* ("pick 0 or more of this big subobject") directives; that is why an entirely empty KVs is handled by checking that strings.TrimSpaces() == "". Allowing empty KVs in the grammar is hard to get right without lookahead (which makes error messages bad); lookahead is especially bad for this grammar because how many tokens you need to look ahead depends on the exact content of double-quoted strings (because (escape | double quoted chars)* is the content of the string and different strings have different numbers of tokens). So we avoid lookahead entirely and deal with requring some valid data to parse. The JSON parser is also not the ideal implementation which is why quirks like omitting commas works. Frankly, JSON made its grammar harder to implement than it needed to be to make it more restrictive than it needed to be. With all that in mind, although we allow a bunch of weird stuff, we don't reject or misparse anything valid, so I'm happy.

codecov · 2024-03-30T06:32:23Z

Codecov Report

Attention: Patch coverage is 93.44262% with 8 lines in your changes are missing coverage. Please review.

Project coverage is 58.29%. Comparing base (679e781) to head (221c569).

Files	Patch %	Lines
src/server/metadata/cmds/cmds.go	91.78%	6 Missing ⚠️
src/internal/kvparse/kvparse.go	93.10%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #9908      +/-   ##
==========================================
- Coverage   58.31%   58.29%   -0.02%     
==========================================
  Files         612      614       +2     
  Lines       75112    75214     +102     
==========================================
+ Hits        43798    43843      +45     
- Misses      30761    30821      +60     
+ Partials      553      550       -3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

robert-uhl

I absolutely love those railroad track diagrams!

Our variant of JSON is more liberal than the official JSON spec out of convenience. You can now have vertical linefeeds and \x?? in your JSON. You can also have a trailing comma.

I am kind of ambivalent about being JSON-like rather than a strict subset or superset of JSON (this notation is not a subset, because some things it allows are invalid JSON; nor is itt a superset, because e.g. numbers are invalid metadata notation).

Finally, you can probably write {"key":"value""key2":"value2"}.

I think that this should be forbidden.

I have fuzz tested the parser for about 12 hours x 32 cores, because it will panic when the parser selects a path that causes the lexer to not advance. These are mostly caused by @@* ("pick 0 or more of this big subobject") directives; that is why an entirely empty KVs is handled by checking that strings.TrimSpaces() == "".

I think that with the correct grammar, the parser should be able to run without that sort of special case; it feels like there’s something overlooked there.

With all that in mind, although we allow a bunch of weird stuff, we don't reject or misparse anything valid, so I'm happy.

I think that allowing weirdness is a sign of a bad grammar. This can be un-weird.

Of course, this is client-side, not server-side, so the risks are a lot less.

src/server/metadata/cmds/cmds.go

go.sum

src/internal/kvparse/kvgrammar/kvgrammar.go

robert-uhl · 2024-04-01T16:22:47Z

One final note: I think that the railroad track diagrams should specify where whitespace is allowed or forbidden.

acohen4 · 2024-04-15T18:38:47Z

most of the open feedback right now is focused on the grammar structure and that all seems very design focused to me. The Design included a lot of thought and I think that unless we see major issues, we should move forward with its implementation. If we think that we want to iterate and clarify aspects of the design we can add it to our Backlog to both document and expand the Syntax.

With all that in mind, can we move forward with the review here and get this first version out to users?

cc @robert-uhl @jrockway

acohen4

LGTM

… this helps) also document

@acohen4

‘Better in than out’ per @acohen4

jrockway requested a review from robert-uhl April 1, 2024 14:51

robert-uhl previously requested changes Apr 1, 2024

View reviewed changes

acohen4 approved these changes Apr 22, 2024

View reviewed changes

jrockway force-pushed the jonathan/core-2228-parse-metadata branch from 020831a to 08a4c6a Compare April 22, 2024 17:35

jrockway added 7 commits April 22, 2024 19:31

basic grammar

bca6692

move metadata cli to new parser

06725a4

disallow escaped single quote in double quoted string

dbca444

remove dead code

bf020ec

rename "set" command to "replace" (i can never remember set vs. edit;…

3f1b78c

… this helps) also document

avoid panic in Unquote

1230eb4

fix cmds_test after merge

221c569

jrockway force-pushed the jonathan/core-2228-parse-metadata branch from 14aa0ec to 221c569 Compare April 22, 2024 19:32

robert-uhl requested review from robert-uhl and removed request for robert-uhl April 22, 2024 21:32

acohen4 merged commit 84eca98 into master Apr 24, 2024
21 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a parser for the metadata CLI #9908

Add a parser for the metadata CLI #9908

jrockway commented Mar 30, 2024 •

edited

codecov bot commented Mar 30, 2024 •

edited

robert-uhl left a comment

robert-uhl commented Apr 1, 2024

acohen4 commented Apr 15, 2024 •

edited

acohen4 left a comment

Add a parser for the metadata CLI #9908

Add a parser for the metadata CLI #9908

Conversation

jrockway commented Mar 30, 2024 • edited

codecov bot commented Mar 30, 2024 • edited

Codecov Report

robert-uhl left a comment

Choose a reason for hiding this comment

robert-uhl commented Apr 1, 2024

acohen4 commented Apr 15, 2024 • edited

acohen4 left a comment

Choose a reason for hiding this comment

jrockway commented Mar 30, 2024 •

edited

codecov bot commented Mar 30, 2024 •

edited

acohen4 commented Apr 15, 2024 •

edited