ure: Incorrect match offsets with non-ASCII strings #9202

Closed
jepler opened this issue Sep 3, 2022 · 2 comments

Comments

@jepler
Sponsor Contributor

jepler commented Sep 3, 2022

Code:

import re

text1 = "MicroPython loves regular expressions"
text2 = "MicroPython \u2764 regular expressions"

regex1 = re.compile("re")

for text in (text1, text2):
    match = regex1.search(text)
    print(text)
    print("Match:", text[match.start():match.end()])
    print()

Result on standard Python:

$ python3.9 kevin.py 
MicroPython loves regular expressions
Match: re

MicroPython ❤ regular expressions
Match: re

Result on MicroPython v1.19.1-358-g0b26efe73-dirty on 2022-09-03; linux [GCC 10.2.1] version:

./build-coverage/micropython  code.py 
MicroPython loves regular expressions
Match: re

MicroPython ❤ regular expressions
Match: gu

Originally reported at adafruit#6860

My guess (not really an analysis) is that re fails to convert byte offsets to code-point offsets when it searches a UTF-8 string rather than a bytes object.
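
To illustrate that guess, here is a minimal sketch in standard Python that reproduces the "gu" result by deliberately using byte offsets to slice a str. It is not the MicroPython implementation and not the fix that was later committed; the helper byte_to_char_offset is a hypothetical name introduced only for this example.

```python
# Illustration only: shows how byte offsets and code-point offsets diverge
# once a multi-byte UTF-8 character precedes the match.
text = "MicroPython \u2764 regular expressions"
data = text.encode("utf-8")  # U+2764 occupies 3 bytes in UTF-8

# Offsets of the first "re" measured in bytes.
byte_start = data.find(b"re")
byte_end = byte_start + len(b"re")


def byte_to_char_offset(buf, byte_offset):
    """Hypothetical helper: count the code points before a byte offset."""
    return len(buf[:byte_offset].decode("utf-8"))


# The same offsets measured in code points.
char_start = byte_to_char_offset(data, byte_start)
char_end = byte_to_char_offset(data, byte_end)

# Slicing the str with byte offsets lands two characters too far right.
wrong = text[byte_start:byte_end]
right = text[char_start:char_end]

print("byte offsets:", (byte_start, byte_end), "->", wrong)
print("char offsets:", (char_start, char_end), "->", right)
```

The heart takes three bytes but counts as one code point, so every byte offset after it is two larger than the matching code-point offset. That is consistent with the "gu" seen above, although it does not by itself confirm where the conversion is missing.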

@jepler jepler added the bug label Sep 3, 2022
jepler referenced this issue in jepler/circuitpython Sep 5, 2022
.. and add a test.  Closes: adafruit#9202.

Signed-off-by: Jeff Epler <jepler@gmail.com>
@dpgeorge dpgeorge added the extmod label Sep 6, 2022
dpgeorge pushed a commit that referenced this issue Sep 6, 2022
And add a test.

Fixes issue #9202.

Signed-off-by: Jeff Epler <jepler@gmail.com>
@dpgeorge
Member

dpgeorge commented Sep 6, 2022

Fixed by e90b85c

@dpgeorge dpgeorge closed this as completed Sep 6, 2022
@jepler
Sponsor Contributor Author

jepler commented Sep 6, 2022

Thanks!

karfas pushed a commit to karfas/micropython that referenced this issue Apr 23, 2023
And add a test.

Fixes issue micropython#9202.

Signed-off-by: Jeff Epler <jepler@gmail.com>
alphonse82 pushed a commit to alphonse82/micropython-wch-ch32v307 that referenced this issue May 8, 2023
And add a test.

Fixes issue micropython#9202.

Signed-off-by: Jeff Epler <jepler@gmail.com>