Lexer bug: e100 mistaken for numeric literal when it should be an identifier #1526

Closed
BH1SCW opened this issue Nov 2, 2017 · 13 comments · Fixed by #2254

@BH1SCW

BH1SCW commented Nov 2, 2017

I found a bug.
version: jq-1.5-1-a5b5cbe
os: Ubuntu 16.04 LTS
testcase1:

{
  "test": {
    "e100": {
      "car9": {
        "enabled": 1
      }
    }
  }
}
command: cat test.json| jq '.test.e100.car9'

jq: error: Invalid numeric literal at EOF at line 1, column 5 (while parsing '.e100') at <top-level>, line 1:
.test.e100.car9
jq: error: syntax error, unexpected LITERAL, expecting $end (Unix shell quoting issues?) at <top-level>, line 1:
.test.e100.car9
jq: 2 compile errors

testcase2:

{
  "test": {
    "e100a": {
      "car9": {
        "enabled": 1
      }
    }
  }
}
command: cat test.json| jq '.test.e100a.car9'
This one works.
Please help to fix this, Thanks!

@pkoppstein
Contributor

I would agree that this is a bug, but you can easily work around it by writing .["e100a"].

The problem arises because jq sees .e100 as part of a numeric literal, and gets confused. That's at least what the error message indicates:

Invalid numeric literal ... (while parsing '.e100') 

The point is that .e100 looks like the start of a floating-point literal.

@BH1SCW
Author

BH1SCW commented Nov 2, 2017

But I really can't change e100 to e100a; it's a car type. Is it possible to fix this?

@pkoppstein
Contributor

Sorry, I meant .["e100"]

@BH1SCW
Author

BH1SCW commented Nov 3, 2017

It does not work with this command:

cat test.json| jq '.test.["e100"].car9'

jq: error: syntax error, unexpected '[', expecting FORMAT or QQSTRING_START (Unix shell quoting issues?) at <top-level>, line 1:
.test.["e100"].car9
jq: 1 compile error

@pkoppstein
Contributor

pkoppstein commented Nov 3, 2017

You really should read the documentation. In the meantime:

$ jq -c '.test | .["e100"] | .car9' test.json
{"enabled":1}

@BH1SCW
Author

BH1SCW commented Nov 3, 2017

Shame on me ~
Anyway many thanks to you, Boss~

@jbristow

jbristow commented Nov 28, 2017

Also good:

$ jq '.test["e100"].car9' test.json

and

$ jq '.test."e100".car9' test.json
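
For scripts that build jq filters from key names, one way to sidestep the ambiguity entirely is to quote every key. The following is only a sketch: the helper name jq_path is hypothetical, it assumes all keys are plain strings, and it relies on jq accepting chained .["key"]["key"] lookups as in the workarounds above.

```python
import json


def jq_path(keys):
    """Build a jq path expression in which every key is quoted."""
    if not keys:
        return "."
    # json.dumps produces a double-quoted, escaped string for each key,
    # so names such as "e100" are never read as part of a number.
    return "." + "".join("[" + json.dumps(key) + "]" for key in keys)


print(jq_path(["test", "e100", "car9"]))
# prints: .["test"]["e100"]["car9"]
```

The result can be passed to jq as the filter argument, for example jq '.["test"]["e100"]["car9"]' test.json.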

@nicowilliams
Contributor

nicowilliams commented Nov 29, 2017

This is the bad regexp in src/lexer.l that is causing this:

[0-9.]+([eE][+-]?[0-9]+)? {
   yylval->literal = jv_parse_sized(yytext, yyleng); return LITERAL;
}

It should be:

[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)? {
   yylval->literal = jv_parse_sized(yytext, yyleng); return LITERAL;
}
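
To see why .e100 fails while .e100a works, the two patterns can be compared outside of flex. This is a rough Python sketch, not jq's lexer: Python's re module stands in for flex, and the FIELD pattern is an assumption about how lexer.l recognises .name tokens. It mimics flex's rule of preferring the longest match and, on a tie, the rule listed first.

```python
import re

# The two numeric patterns quoted above.
OLD_NUM = re.compile(r"[0-9.]+([eE][+-]?[0-9]+)?")
NEW_NUM = re.compile(r"[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")
# Assumed shape of the rule that recognises `.name` field accesses.
FIELD = re.compile(r"\.[a-zA-Z_][a-zA-Z0-9_]*")


def first_token(text, num_pattern):
    """Pick a token flex-style: longest match wins, earlier rule wins ties."""
    rules = [("LITERAL", num_pattern), ("FIELD", FIELD)]  # LITERAL listed first
    best = None
    for name, pattern in rules:
        m = pattern.match(text)
        # Only a strictly longer match replaces an earlier rule's match.
        if m and m.end() > 0 and (best is None or m.end() > len(best[1])):
            best = (name, m.group(0))
    return best


print(first_token(".e100", OLD_NUM))   # ('LITERAL', '.e100')
print(first_token(".e100a", OLD_NUM))  # ('FIELD', '.e100a')
print(first_token(".e100", NEW_NUM))   # ('FIELD', '.e100')
```

With the old pattern, .e100 ties between the two rules and the numeric rule wins, which then fails to parse as a number. For .e100a the field rule matches one more character and wins. The new pattern requires a leading digit, so it never competes for .e100.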

@nicowilliams nicowilliams self-assigned this Nov 29, 2017
@nicowilliams
Contributor

We probably also need a rule that matches invalid numbers and produces an error. E.g., 12.e5 is not a valid number, but with only the above change jq parses that as an attempt to index the number 12 with the key "e5", which is clearly not right:

$ ./jq -n '12.e5'
jq: error (at <unknown>): Cannot index number with string "e5"
$ 

@nicowilliams
Contributor

Can you try this patch:

diff --git a/src/lexer.l b/src/lexer.l
index 6b9164b..999ce37 100644
--- a/src/lexer.l
+++ b/src/lexer.l
@@ -86,10 +86,12 @@ struct lexer_param;
   yylval->literal = jv_string_sized(yytext + 1, yyleng - 1); return FORMAT;
 }

-[0-9.]+([eE][+-]?[0-9]+)? {
+[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)? {
    yylval->literal = jv_parse_sized(yytext, yyleng); return LITERAL;
 }

+[0-9]+\.([eE][+-]?[0-9]+)? { return BADNUM; }
+
 "\"" {
   yy_push_state(IN_QQSTRING, yyscanner);
   return QQSTRING_START;
diff --git a/src/parser.y b/src/parser.y
index 78782dd..f235a7e 100644
--- a/src/parser.y
+++ b/src/parser.y
@@ -47,6 +47,7 @@ struct lexer_param;


 %token INVALID_CHARACTER
+%token BADNUM
 %token <literal> IDENT
 %token <literal> FIELD
 %token <literal> LITERAL
@@ -709,6 +710,10 @@ Term '[' ':' Exp ']' %prec NONOPT {
 LITERAL {
   $$ = gen_const($1);
 } |
+BADNUM {
+  FAIL(@$, "Invalid numeric literal");
+  $$ = gen_noop();
+} |
 String {
   $$ = $1;
 } |

?
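
The lexer half of the patch can be pictured with a small Python sketch. It is an illustration only: the two patterns are taken from the diff, but the classify function is hypothetical and Python's re module stands in for flex's longest-match rule selection.

```python
import re

# Valid number rule and "bad number" rule, as written in the diff above.
NUM = re.compile(r"[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?")
BADNUM = re.compile(r"[0-9]+\.([eE][+-]?[0-9]+)?")


def classify(text):
    """Return (token_name, lexeme) for the longest match at the start of text."""
    candidates = []
    for name, pattern in (("LITERAL", NUM), ("BADNUM", BADNUM)):
        m = pattern.match(text)
        if m:
            candidates.append((len(m.group(0)), name, m.group(0)))
    if not candidates:
        return None
    # max() returns the first maximal element, so the earlier rule wins ties.
    _, name, lexeme = max(candidates, key=lambda c: c[0])
    return name, lexeme


print(classify("12.e5"))   # ('BADNUM', '12.e5')
print(classify("12.5e5"))  # ('LITERAL', '12.5e5')
print(classify("12"))      # ('LITERAL', '12')
```

Because 12.e5 is consumed whole by the BADNUM rule, the parser can report "Invalid numeric literal" instead of treating it as 12 indexed by "e5".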

@nicowilliams nicowilliams changed the title Invalid numeric literal at EOF at line 1, column 5 (while parsing '.e100') Lexer bug: e100 mistaken for numeric literal when it should be an identifier Nov 29, 2017
nicowilliams added a commit to nicowilliams/jq that referenced this issue Nov 29, 2017
@TylerJB

TylerJB commented Aug 9, 2018

I also hit "parse error: Invalid numeric literal at line 1295060, column 909". This seems like the same error; if not, I will open a new issue.

The input came from a file produced by a .py script I was using to combine some logs. That script itself failed with "UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2338: character maps to <undefined>"; adding encoding="utf8" to my open() call allowed it to process the file.
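
A minimal sketch of that kind of decoding failure and its fix. The file name, the file contents, and the choice of cp1252 as the failing default codec are assumptions for illustration; the comment above does not say which codec was in effect.

```python
import os
import tempfile

# "\u2190" (a left arrow) is E2 86 90 in UTF-8, so the file contains byte 0x90.
text = '{"note": "\u2190"}\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "combined.log")  # hypothetical file name
    with open(path, "w", encoding="utf8") as f:
        f.write(text)

    # cp1252 is a "charmap" codec with no character assigned to 0x90,
    # so decoding the UTF-8 bytes with it raises UnicodeDecodeError.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        failed = False
    except UnicodeDecodeError:
        failed = True

    # Naming the encoding explicitly reads the file back intact.
    with open(path, encoding="utf8") as f:
        recovered = f.read()

print(failed, recovered == text)
```

Passing encoding explicitly on both read and write keeps the combined file valid UTF-8, which is what jq expects for its JSON input.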

@dginovker

jq is insanely annoying with number interpretation. I don't use jq often, but whenever I do, this trips me up.

@BH1SCW
Author

BH1SCW commented Dec 6, 2021

Thanks all, this is awesome after such a long time.
