Semgrep Core sets file as "NO FILE INFO YET" when failing to parse C code #1925
We should at least return a more helpful error: something the user can act on to fix the problem or take additional action.
Note that the code in your example is invalid because it uses a Python-style comment (# xxx) in a C file.
…st_pos

This will help semgrep/semgrep#1925

The helper tokenize_all_and_adjust_pos correctly intercepts Lexical_error and adjusts the file position of the token inside the Lexical_error. When I introduced this helper function, I forgot to use it for the C/C++ parser (not sure why, maybe because the code was also handling ExpandedTok).

test plan:
$ semgrep -l c -e 'FOO' /tmp/foo.c
ran 1 rules on 1 files: 0 findings
1 files could not be analyzed; run with --verbose for details or run with --strict to exit non-zero if any file cannot be analyzed

This does not generate a Python backtrace anymore. Same with:

$ /home/pad/semgrep/_build/default/cli/Main.exe -dump_ast /tmp/foo.c
/tmp/foo.c:3:0: Lexical error: unrecognised symbol, in token rule:#
Raised at file "parsing/Parse_code.ml", line 144, characters 24-27
Called from file "parsing/Parse_code.ml", line 236, characters 18-48
Called from file "cli/Main.ml", line 855, characters 6-72
Called from file "pfff/h_program-lang/Error_code.ml", line 388, characters 4-8

No more "NO FILE INFO YET" exn.
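The pattern this commit message describes (intercept the lexical error and fill in the file position before it propagates) can be illustrated outside OCaml. The Python sketch below is not the pfff code: LexicalError, tokenize, and tokenize_and_adjust_pos are hypothetical names chosen for illustration, and the toy lexer simply rejects any word starting with '#'. It only shows why a missing wrapper call leaves the placeholder string in the error.

```python
PLACEHOLDER = "NO FILE INFO YET"  # placeholder string quoted in the issue


class LexicalError(Exception):
    """Hypothetical error type; the file defaults to the placeholder."""

    def __init__(self, message, line, column, file=PLACEHOLDER):
        super().__init__(message)
        self.message = message
        self.line = line
        self.column = column
        self.file = file

    def __str__(self):
        return f"{self.file}:{self.line}:{self.column}: Lexical error: {self.message}"


def tokenize(source):
    """Toy lexer: splits on whitespace and rejects words starting with '#'.

    It only sees a string, so it cannot know which file it came from.
    """
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for word in line.split():
            if word.startswith("#"):
                column = line.index(word)
                raise LexicalError(
                    "unrecognised symbol, in token rule:#", line_no, column
                )
            tokens.append(word)
    return tokens


def tokenize_and_adjust_pos(source, filename):
    """Wrapper: intercept the error and replace the placeholder with the real path."""
    try:
        return tokenize(source)
    except LexicalError as err:
        err.file = filename
        raise


try:
    tokenize_and_adjust_pos("# this is invalid code\nint x = 1", "foo.c")
except LexicalError as err:
    print(err)  # foo.c:1:0: Lexical error: unrecognised symbol, in token rule:#
```

Calling tokenize directly instead of the wrapper would produce an error whose file is still "NO FILE INFO YET", which is the situation the commit message attributes to the C/C++ parser.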
…C code

Fixes #1925

test plan:
$ /home/pad/semgrep/_build/default/cli/Main.exe -dump_ast foo.c
foo.c:3:0: Lexical error: unrecognised symbol, in token rule:#
Raised at file "parsing/Parse_code.ml", line 144, characters 24-27
Called from file "parsing/Parse_code.ml", line 236, characters 18-48

no more NO_FILE_INFO_YET error (which causes the python wrapper to crash).

Also:
$ semgrep -l c -e 'FOO' tests/OTHER/parsing_errors/foo.c
ran 1 rules on 1 files: 0 findings
1 files could not be analyzed; run with --verbose for details or run with --strict to exit non-zero if any file cannot be analyzed
OK, but what? I could put "Bug in lexer, you forgot the call to complete_parse_info somewhere".
I think the problem is that we're putting

# this is invalid code
int x = 1

The
Notice
Describe the bug
When run on invalid C code, semgrep is unable to parse the semgrep-core output. It assumes that the path field in a LexicalError is a path to a file, but instead it is the string "NO FILE INFO YET".
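As a rough illustration of the failure mode on the wrapper side, the sketch below shows one way a caller could guard against the placeholder instead of treating it as a real path. The JSON shape (the "error" and "path" keys) and the function name describe_core_error are assumptions made for this example; they are not taken from the actual semgrep-core output format or from the semgrep Python wrapper.

```python
import json

PLACEHOLDER = "NO FILE INFO YET"  # placeholder string quoted in the issue


def describe_core_error(raw_json):
    """Turn a (hypothetical) core error record into a user-facing message.

    Assumed shape: {"error": "<kind>", "path": "<file path or placeholder>"}.
    """
    error = json.loads(raw_json)
    kind = error.get("error", "UnknownError")
    path = error.get("path", "")
    # Do not treat the placeholder (or an empty value) as a file path.
    if not path or path == PLACEHOLDER:
        return f"parse error ({kind}): semgrep-core did not report a file location"
    return f"parse error ({kind}) in {path}"


print(describe_core_error('{"error": "LexicalError", "path": "NO FILE INFO YET"}'))
print(describe_core_error('{"error": "LexicalError", "path": "foo.c"}'))
```

A guard like this would only keep the wrapper from crashing; the commits referenced in this issue address the cause by making the core report the real file position.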
To Reproduce
Locally run
semgrep --config s/eryd
on the following target: foo.c
Expected behavior
The failure should be reported as a parse error.