Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Grammar readme seems incorrect #7720

Open
thekevinscott opened this issue Jun 3, 2024 · 2 comments
Open

Bug: Grammar readme seems incorrect #7720

thekevinscott opened this issue Jun 3, 2024 · 2 comments
Labels
bug Something isn't working documentation Improvements or additions to documentation low severity Used to report low severity bugs in llama.cpp (e.g. cosmetic issues, non critical UI glitches)

Comments

@thekevinscott
Copy link

thekevinscott commented Jun 3, 2024

What happened?

This bit on the grammar readme states:

Non-terminal symbols (rule names) ... are required to be a dashed lowercase word, like move, castle, or check-mate.

However, the c.gbnf defines a dataType rule (which features a non-lower-case letter) and this grammar appears to be valid.

I'm not sure what the intended behavior is. I would be happy to update the README if uppercase variables are supported.

Name and Version

N/A

What operating system are you seeing the problem on?

No response

Relevant log output

No response

@thekevinscott thekevinscott added bug-unconfirmed low severity Used to report low severity bugs in llama.cpp (e.g. cosmetic issues, non critical UI glitches) labels Jun 3, 2024
@HanClinto HanClinto added bug Something isn't working documentation Improvements or additions to documentation and removed bug-unconfirmed labels Jun 6, 2024
@HanClinto
Copy link
Collaborator

@thekevinscott Thank you for the report!

I just wanted to write and confirm that your report is indeed correct.

Looking through the code, the set of valid identifiers is much larger, and includes anything in the realm of a-zA-Z0-9, and "-". This means that we can have identifiers that are all numbers, pure dashes, identifiers that have the same name but are different cases ("upper-lower-case" and "UPPER-LOWER-CASE" are distinct from each other), etc.

I built an integration test that compiles and runs correctly -- this is a valid grammar (according to the engine) that seems to stretch the rules, and certainly violate the documentation (as you noted):

root ::= simple-identifier
simple-identifier ::= ---dashmaster--- | mIxEd-CaSe | UPPER-LOWER-CASE | upper-lower-case | 12345
---dashmaster--- ::= [-_]+
mIxEd-CaSe ::= [a-z][A-Z][a-z][A-Z]
UPPER-LOWER-CASE ::= [A-Z][A-Z]
upper-lower-case ::= [a-z][a-z]
12345 ::= "67890"

@ochafik / @ggerganov -- feels like a minor issue, so I feel a little bad pinging y'all on this one, but I also struggle with direction-level decisions on my own. Do either of you have opinions? Should we:

  • Tighten up the rules re: identifiers (this might break existing grammars that people have been relying on -- I.E. the example from c.gbnf)
  • Or loosen the documentation to note the (greatly) increased flexibility of identifier naming (compared with the current restrictions from the docs) -- such as permitting numbers and upper-case letters?
  • Or some mix of the above...?

@thekevinscott
Copy link
Author

thekevinscott commented Jun 17, 2024

As far as I can tell, that line of documentation was authored by @ejones - maybe they have some insight into the original intent?

My initial instinct would be to tighten the rules, in case the parser needs to be refactored in the future. In the examples you provided ---dashmaster--- and 12345 seem like particularly egregious rule names that could conflict. (Also, presumably --- is a valid rule too? That seems nasty.)

On the other hand, if it doesn't cause any parsing issues with the current state, maybe it's worth leaving as is and just updating the documentation, particularly since your integration test above now explicitly tests for these edge cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working documentation Improvements or additions to documentation low severity Used to report low severity bugs in llama.cpp (e.g. cosmetic issues, non critical UI glitches)
Projects
None yet
Development

No branches or pull requests

2 participants