Identify strings made of repeated characters as having low entropy #4833
strings directly (but keep the unescaping hack).
@kurt-r2c, if you find a link to the original issue about entropy analysis, we should add it to
Hmm, a test fails in generic mode when binding a metavariable to something that looks like an int, which ends up being represented as (Metavariable.E { e = (L (Int ((Some 123456789012), ()))); e_id = 0; e_range = None }). Trying to figure out how this works.
the value if it's not a string literal.
The new behavior is to unquote the captured string value if possible. If the captured value is not a string, the original source code fragment is used instead.
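A minimal Python sketch of that selection rule, assuming a hypothetical tagged representation of captured values: `StrLit`, `IntLit` and `text_for_analysis` are illustrative names, not Semgrep's API (Semgrep's real types are the OCaml `Metavariable.E` values quoted above). String literals contribute their contents without quotes; anything else, such as the int from the failing generic-mode test, falls back to the matched source text.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class StrLit:
    # Hypothetical: contents of a string literal, quotes already removed.
    value: str


@dataclass
class IntLit:
    # Hypothetical: mirrors `Int (Some 123456789012, ())`; None if unparsable.
    value: Optional[int]


Captured = Union[StrLit, IntLit]


def text_for_analysis(captured: Captured, source_fragment: str) -> str:
    """Pick the text to run entropy analysis on (illustrative only)."""
    if isinstance(captured, StrLit):
        # A string literal: analyze its unquoted contents.
        return captured.value
    # Not a string: use the original source code fragment as-is.
    return source_fragment


print(text_for_analysis(StrLit("xxxxxxxxxxxx"), '"xxxxxxxxxxxx"'))  # xxxxxxxxxxxx
print(text_for_analysis(IntLit(123456789012), "123456789012"))      # 123456789012
```

Falling back to the source fragment means non-string bindings never need a type-specific printer; the analysis just sees what the user wrote.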
🔥 Potential speedup in benchmark semgrep.bench.coinbase.std: -23.6% (-3.137 s) 14 benchmarks, 3.8% faster on average. Individual deviations greater than 20% from the baseline are reported. An individual performance degradation of over 30% or a global degradation of over 7% is an error and will block the pull request. See run output for full results ('Show all checks' > 'Tests / semgrep benchmark tests' 'Details').
Getting what looks like a network error (twice in a row). The URL being reported ( https://github.com/returntocorp/semgrep/runs/5610188578?check_suite_focus=true
@mjambon, we were tracking this internally via Linear (RULES-446); there's no associated GitHub issue. The statement as-is should be fine.
This should work, but I want to add tests to make sure it does. The issue is that the semgrep matcher returns quoted strings, e.g.
"xxxxxxxxxxxx"
and the entropy analysis is supposed to be smart and strip the quotes. This is necessary for the string to be recognized as a run of a single repeated character.
Edit: added tests and fixed the implementation accordingly.
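To illustrate why the quotes matter, here is a small Python sketch. It is an assumption throughout: `unquote`, `shannon_entropy` and `is_repeated_char` are hypothetical helpers, plain per-character Shannon entropy stands in for whatever metric Semgrep's entropy analysis actually uses, and `unquote` only strips one matching pair of quotes without unescaping.

```python
import math
from collections import Counter


def unquote(s: str) -> str:
    """Strip one pair of matching surrounding quotes, if present.
    Hypothetical helper: real string literals would also need unescaping."""
    if len(s) >= 2 and s[0] == s[-1] and s[0] in ("'", '"'):
        return s[1:-1]
    return s


def shannon_entropy(s: str) -> float:
    """Per-character Shannon entropy in bits (stand-in metric; 0.0 if empty)."""
    n = len(s)
    if n == 0:
        return 0.0
    # sum of p * log2(1/p) over the character frequencies, with p = c / n
    return sum(c / n * math.log2(n / c) for c in Counter(s).values())


def is_repeated_char(s: str) -> bool:
    """True if s is non-empty and made of a single repeated character."""
    return len(s) > 0 and len(set(s)) == 1


captured = '"xxxxxxxxxxxx"'  # what the matcher hands back, quotes included
print(is_repeated_char(captured), shannon_entropy(captured) > 0)  # False True
print(is_repeated_char(unquote(captured)), shannon_entropy(unquote(captured)))  # True 0.0
```

With the quotes left in, the string contains two distinct characters and scores above zero; after unquoting it is a single repeated character with zero entropy under this metric.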
PR checklist: