Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CQL2 (Text): Alpha and Symbols #787

Closed
eseglem opened this issue Feb 13, 2023 · 3 comments
Closed

CQL2 (Text): Alpha and Symbols #787

eseglem opened this issue Feb 13, 2023 · 3 comments

Comments

@eseglem
Copy link
Contributor

eseglem commented Feb 13, 2023

The current definition of alpha has the following:

alpha = "\x0009" | "\x000A" | "\x000D" | "\x0020".."\x0029" |
        "\x0040".."\xD7FF" | "\xE000".."\xFFFD" | "\x10000".."\x10FFFF"

#707 indicates the usage of https://www.w3.org/TR/REC-xml/#charsets but it is missing some characters. I understand having digit split out but it also skips "\x002A".."\x002F" and "\x003A".."\x003F".

Meaning *+,-./:;<=>? are all excluded from being a character / characterLiteral. Most of those are also defined as their own things, but semicolon and question mark are not. And I would understand if all the additional defines symbols were excluded but brackets, parens, caret, percent, and underscore are all included in the ranges in alpha.

This is causing example8.txt to fail in my parser since it has 'HH+VV+HV+VH' in it. As what I assume is meant to be a characterLiteral but it contains + though so it cannot parse as one. (I also mentioned this over in #783)

The other issue I have ran into is that single quote "\x0027" is included in alpha so there seems to be ambiguity in the grammar. Anything with multiple characterLiteral in it can be parsed weirdly. example16.txt has:

swimming_pool=true AND (floors>5 OR
                        material LIKE 'brick%' OR
                        material LIKE '%brick')

Which is parseable as a single characterLiteral even though that is clearly not the actual intent.

'brick%' OR
                        material LIKE '%brick'

I would think alpha may need to be defined slightly differently. Perhaps just do "\x0020".."\x0026" and "\x0028".."\xD7FF" and only skip the single quote:

alpha = "\x0009" | "\x000A" | "\x000D" | "\x0020".."\x0026" |
        "\x0028".."\xD7FF" | "\xE000".."\xFFFD" | "\x10000".."\x10FFFF"

It could also be reasonable to add whiteSpace and symbol definitions in to capture them separately and then have character = alpha | digit | escapedquote | whiteSpace | symbol;. I don't know if it makes sense to do so though since they aren't going to get used elsewhere. And avoiding overlapping ranges doesn't seem necessary since identifier stuff already overlaps stuff.

@cportele
Copy link
Member

Meeting 2023-02-15: @pvretano will review, thanks for raising this.

@pvretano
Copy link
Contributor

@cportele, @eseglem please review #789.
I fixed the alpha production to add the missing chars and remove the single quote.
Rather than adding more productions for whitespace and symbols I decided to remove a lot of the unnecessary single-character productions like colon, leftParent, etc. and simply use the literals in the gramar. I think (hope) it makes the grammar a litter easier to read.

@cportele
Copy link
Member

Closed by #789

Features Part 3: Filtering / Common Query Language (CQL2) automation moved this from In review to Done Feb 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

3 participants