Python: Can't match unicode snowman literals in strings #4336
Comments
Might be a Unicode issue.
@mjambon can you have a look at it? You're the Unicode expert ...
@pad here's what I got:
with the following rule file
The problem is when using a rule file, |
For background, see #2111. The hack that was implemented to work around bad locations replaces each non-ascii byte by a
This match happens because both 😀 and 🚀 are parsed as
A proper fix would involve eliminating the hack that replaces non-ascii bytes by Zs and then doing whatever is necessary to report proper locations. This would be a bit of work and needs to be done right. Alternatively, we could extend the Unicode hack to work with the
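To illustrate why the byte substitution makes distinct emoji indistinguishable, here is a minimal Python sketch. It is not Semgrep's actual code: the function name `substitute_non_ascii` is hypothetical, and the only detail taken from the thread is that non-ascii bytes are replaced by `Z`. It assumes the source is UTF-8 encoded.

```python
def substitute_non_ascii(source: str, filler: bytes = b"Z") -> str:
    """Replace every non-ASCII byte of the UTF-8 encoding with `filler`.

    Hypothetical helper for illustration only. The byte length is
    preserved, so byte offsets stay valid, but the identity of the
    original characters is lost.
    """
    raw = source.encode("utf-8")
    # Bytes >= 0x80 are lead or continuation bytes of a multi-byte character.
    replaced = bytes(filler[0] if b >= 0x80 else b for b in raw)
    return replaced.decode("ascii")


grinning = substitute_non_ascii('x = "😀"')  # U+1F600, 4 bytes in UTF-8
rocket = substitute_non_ascii('x = "🚀"')    # U+1F680, 4 bytes in UTF-8
snowman = substitute_non_ascii('x = "☃"')    # U+2603, 3 bytes in UTF-8

print(grinning)  # x = "ZZZZ"
print(rocket)    # x = "ZZZZ"
print(snowman)   # x = "ZZZ"
```

After substitution the two emoji strings are identical, which is consistent with the spurious match described above. The snowman differs only because its encoding is one byte shorter.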
imo we should hack a solution for -fast. Can we just exclude
@emjin great idea. I was worried about the cost of editing each target but editing the pattern should be fine.
* Work around non-ascii byte substitution which was breaking the -fast (filter irrelevant rules) optimization. Fixes #4336
* Split big test "full rule" into one test case per file pair
* Explain expectations for unicode matching
* Update changelog
* typo
Co-authored-by: Emma Jin <emjin@users.noreply.github.com>
* Work around non-ascii byte substitution which was breaking the -fast (filter irrelevant rules) optimization. Fixes #4336
* Split big test "full rule" into one test case per file pair
* Explain expectations for unicode matching
* Update changelog
* Add tests for unicode hack
* Use correct version of pfff
Describe the bug
I'm trying to match a string in Python that contains a literal snowman character (☃). I can't find a way to do so.
To Reproduce
https://semgrep.dev/s/craigds:unicode-snowman
Expected behavior
I expected to just be able to use the string itself
It didn't match. Neither did any of the other things I tried, e.g.:
"Test \x{FE0F}"
"Test \u2603"
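For reference, in Python itself the literal character and the `\u2603` escape denote the same string, so either spelling could reasonably be expected to match. A quick check in plain Python, independent of Semgrep (the variable names are illustrative):

```python
literal = "Test ☃"
escaped = "Test \u2603"  # U+2603 SNOWMAN written as an escape

# Both spellings produce the same str object contents.
same = literal == escaped

# In UTF-8 the snowman occupies three bytes: e2 98 83.
utf8 = literal.encode("utf-8")
print(same, utf8)
```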
What is the priority of the bug to you?
Environment