Issues when scanning long lines with generic mode #6071
Comments
This issue is synced in Linear at https://linear.app/r2c/issue/PA-1870/issues-when-scanning-long-lines-with-generic-mode. Note: this link is for r2c use only and is not accessible publicly. |
requested by user: https://r2c-community.slack.com/archives/C018NJRRCJ0/p1662577736189749 |
Hi guys, is there an estimated fix timeline for this one? |
The issue here is that the target file is just one very long line. Adding a newline character will restore normal functionality in this particular example: https://semgrep.dev/s/2eP5

Unfortunately, the generic mode engine (spacegrep) will not handle files made of only extremely long lines. This is meant to avoid extremely slow parsing and matching, which could happen if it were to process a minified or a binary file. It's not really a question of file size but rather the lack of a tree-like structure determined by indentation.

To users, I would recommend adding a linter pass that requires lines to be under some reasonable limit, e.g. 80 or 120. The current heuristic used to determine if a file is source code requires an average line length of under 150 bytes. 150 is a lot given that it's an average, except in the case of one-line files.

To rule writers, I would recommend being patient and trying to remember that test cases that are just one long line may run into this problem, for the time being.

The changes we could make in the software include:
Item (2) is relatively easy to do. More investigation is needed to evaluate what needs to be done for (1). |
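For readers who want to check their own files against the heuristic described above, here is a minimal Python sketch. It is not spacegrep's actual implementation: the function names (`average_line_length`, `looks_like_source`, `overlong_lines`) are hypothetical, and the only values taken from this thread are the 150-byte average and the suggested 120-character lint limit.

```python
from typing import List


def average_line_length(data: bytes) -> float:
    """Mean number of bytes per line, line terminators excluded."""
    lines = data.splitlines()
    if not lines:
        return 0.0
    return sum(len(line) for line in lines) / len(lines)


def looks_like_source(data: bytes, max_avg: int = 150) -> bool:
    """Approximate the check: average line length must be under max_avg bytes."""
    return average_line_length(data) < max_avg


def overlong_lines(data: bytes, limit: int = 120) -> List[int]:
    """Return 1-based numbers of lines longer than limit, as a linter pass might."""
    return [
        number
        for number, line in enumerate(data.splitlines(), start=1)
        if len(line) > limit
    ]


if __name__ == "__main__":
    one_line = b"x" * 197                  # a single 197-byte line
    two_lines = one_line + b"\n" + b"y" * 10
    print(looks_like_source(one_line))     # average is 197 bytes
    print(looks_like_source(two_lines))    # average is (197 + 10) / 2 bytes
    print(overlong_lines(two_lines))
```

Under these assumptions, a one-line file is judged entirely by that single line, while each additional short line pulls the average down. That would be consistent with the reports in this issue that adding an extra line makes the rule work again.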
Hi @mjambon |
Hi @mjambon , do you know when this fix will be merged into the cli and the returntocorp/semgrep-agent:v1 image? |
The 500-byte limit will be available in the next semgrep release (0.115.x), next week. This value is not configurable because I was a bit worried about performance (and I was lazy). In retrospect, I think it's a good thing to have a safe default and let users experiment with a value that works for them. |
^ #6162 is the follow-up task. |
@mjambon It appears the implemented fix hasn't worked for this issue. I upgraded to v0.115, ran the rule again, and saw the same misplacement of the inserted autofix code. The originally targeted line was 197 chars long. |
Hi @mjambon, can you review the above? Thanks |
Following up on this @mjambon, issue still exists today. |
Describe the bug
When you run a generic rule, specifically https://semgrep.dev/s/7PnQ, on a one-line snippet of code, it does not work.
When you add an extra line to the snippet, as in https://semgrep.dev/s/E6zN, it works.
The same happens in the CLI.
If you make the first line even longer (https://semgrep.dev/playground/s/7PQ2), it stops working again, even with the one extra line.
But if I add one more extra line (https://semgrep.dev/s/Le26), it works again.
This behavior also seems to affect the autofix feature, which works incorrectly on such lines (though this may be unrelated ✌️): running https://semgrep.dev/s/7PnQ in the CLI incorrectly fixes the last finding.
To Reproduce
https://semgrep.dev/s/7PnQ