Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a parser for the metadata CLI #9908

Merged
merged 7 commits into from Apr 24, 2024
Merged

Conversation

jrockway
Copy link
Member

@jrockway jrockway commented Mar 30, 2024

This replaces an inflexible parser that's 1 line of code with a more complicated one that can parse any valid metadata. I changed the "replace" metadata op to replace in the CLI. set is short, but I can never remember it.

The grammar looks like this:

kvs

add and edit take a KV. replace takes a KVs.

Double-quoted strings are the same as Go's, except that \U12345678 syntax isn't supported because the spec doesn't explain how it works and I don't know how it works.

Single-quoted strings are very simple; they can't have any escapes in them.

Our variant of JSON is more liberal than the official JSON spec out of convenience. You can now have vertical linefeeds and \x?? in your JSON. You can also have a trailing comma. Finally, you can probably write {"key":"value""key2":"value2"}.

I have fuzz tested the parser for about 12 hours x 32 cores, because it will panic when the parser selects a path that causes the lexer to not advance. These are mostly caused by @@* ("pick 0 or more of this big subobject") directives; that is why an entirely empty KVs is handled by checking that strings.TrimSpaces() == "". Allowing empty KVs in the grammar is hard to get right without lookahead (which makes error messages bad); lookahead is especially bad for this grammar because how many tokens you need to look ahead depends on the exact content of double-quoted strings (because (escape | double quoted chars)* is the content of the string and different strings have different numbers of tokens). So we avoid lookahead entirely and deal with requring some valid data to parse. The JSON parser is also not the ideal implementation which is why quirks like omitting commas works. Frankly, JSON made its grammar harder to implement than it needed to be to make it more restrictive than it needed to be. With all that in mind, although we allow a bunch of weird stuff, we don't reject or misparse anything valid, so I'm happy.

Copy link

codecov bot commented Mar 30, 2024

Codecov Report

Attention: Patch coverage is 93.44262% with 8 lines in your changes are missing coverage. Please review.

Project coverage is 58.29%. Comparing base (679e781) to head (221c569).

Files Patch % Lines
src/server/metadata/cmds/cmds.go 91.78% 6 Missing ⚠️
src/internal/kvparse/kvparse.go 93.10% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9908      +/-   ##
==========================================
- Coverage   58.31%   58.29%   -0.02%     
==========================================
  Files         612      614       +2     
  Lines       75112    75214     +102     
==========================================
+ Hits        43798    43843      +45     
- Misses      30761    30821      +60     
+ Partials      553      550       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jrockway jrockway requested a review from robert-uhl April 1, 2024 14:51
Copy link
Contributor

@robert-uhl robert-uhl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I absolutely love those railroad track diagrams!

Our variant of JSON is more liberal than the official JSON spec out of convenience. You can now have vertical linefeeds and \x?? in your JSON. You can also have a trailing comma.

I am kind of ambivalent about being JSON-like rather than a strict subset or superset of JSON (this notation is not a subset, because some things it allows are invalid JSON; nor is itt a superset, because e.g. numbers are invalid metadata notation).

Finally, you can probably write {"key":"value""key2":"value2"}.

I think that this should be forbidden.

I have fuzz tested the parser for about 12 hours x 32 cores, because it will panic when the parser selects a path that causes the lexer to not advance. These are mostly caused by @@* ("pick 0 or more of this big subobject") directives; that is why an entirely empty KVs is handled by checking that strings.TrimSpaces() == "".

I think that with the correct grammar, the parser should be able to run without that sort of special case; it feels like there’s something overlooked there.

With all that in mind, although we allow a bunch of weird stuff, we don't reject or misparse anything valid, so I'm happy.

I think that allowing weirdness is a sign of a bad grammar. This can be un-weird.

Of course, this is client-side, not server-side, so the risks are a lot less.

src/server/metadata/cmds/cmds.go Show resolved Hide resolved
go.sum Show resolved Hide resolved
src/internal/kvparse/kvgrammar/kvgrammar.go Show resolved Hide resolved
src/internal/kvparse/kvgrammar/kvgrammar.go Show resolved Hide resolved
src/internal/kvparse/kvgrammar/kvgrammar.go Show resolved Hide resolved
@robert-uhl
Copy link
Contributor

One final note: I think that the railroad track diagrams should specify where whitespace is allowed or forbidden.

@acohen4
Copy link
Contributor

acohen4 commented Apr 15, 2024

most of the open feedback right now is focused on the grammar structure and that all seems very design focused to me. The Design included a lot of thought and I think that unless we see major issues, we should move forward with its implementation. If we think that we want to iterate and clarify aspects of the design we can add it to our Backlog to both document and expand the Syntax.

With all that in mind, can we move forward with the review here and get this first version out to users?

cc @robert-uhl @jrockway

Copy link
Contributor

@acohen4 acohen4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jrockway jrockway force-pushed the jonathan/core-2228-parse-metadata branch from 020831a to 08a4c6a Compare April 22, 2024 17:35
@jrockway jrockway force-pushed the jonathan/core-2228-parse-metadata branch from 14aa0ec to 221c569 Compare April 22, 2024 19:32
@robert-uhl robert-uhl requested review from robert-uhl and removed request for robert-uhl April 22, 2024 21:32
@robert-uhl robert-uhl dismissed their stale review April 22, 2024 21:34

‘Better in than out’ per @acohen4

@acohen4 acohen4 merged commit 84eca98 into master Apr 24, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants