UTF8 RegexFix doesn't correct capture's text #170

Open
chekoopa opened this issue Jul 17, 2019 · 2 comments

@chekoopa chekoopa commented Jul 17, 2019

convertMatchText in Text.RE.ZeInternals.Types.Match correctly fixes the captures' offsets and lengths, but capturedText is left untouched (at this very line you can see that it is taken straight from the input). This can cause further problems when using the library, mostly with Text.RE.PCRE.Text.

The workaround is take (captureLength c) $ drop (captureOffset c) $ captureSource c, but it's kind of lame. Incorporating similar code into RegexFix would make the fix transparent to users, but may hurt performance.
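
For illustration, the workaround above could be wrapped in a small helper. This is a minimal sketch, not the library's code: the Capture record below is a hypothetical stand-in that only reuses the field names mentioned in this issue, and fixCapturedText and staleCapture are names invented for the example. It uses String so that it needs nothing beyond base; with Data.Text the same idea would presumably use T.take and T.drop.

```haskell
-- Hypothetical stand-in for the library's capture record. The field names
-- follow the ones used in this issue; this is NOT the library's definition.
data Capture a = Capture
  { captureSource :: a    -- the whole text that was searched
  , capturedText  :: a    -- text of the capture (stale in the reported bug)
  , captureOffset :: Int  -- offset in characters, already corrected
  , captureLength :: Int  -- length in characters, already corrected
  } deriving (Eq, Show)

-- Recompute capturedText from the corrected offset and length
-- (fixCapturedText is a name made up for this sketch).
fixCapturedText :: Capture String -> Capture String
fixCapturedText c =
  c { capturedText = take (captureLength c) (drop (captureOffset c) (captureSource c)) }

-- A capture in the state described above: offset and length are right,
-- but the text is still the raw input. "\380" is the character ż.
staleCapture :: Capture String
staleCapture = Capture
  { captureSource = "\380X"
  , capturedText  = "\380X"
  , captureOffset = 0
  , captureLength = 1
  }
```

Loaded in GHCi, capturedText (fixCapturedText staleCapture) evaluates to "\380", i.e. just ż. The cost is one extra take/drop per capture, which is the performance concern raised above.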

@cdornan cdornan (Contributor) commented Jul 17, 2019

@chekoopa Thanks for the clear analysis. I am far too busy to work on this at the moment, but I will be amenable to carving out some time. The more demand there is, the sooner I am likely to get to this, so please shout if anybody needs this fixed.

@kt0d kt0d commented May 24, 2020

>matchedText $ "żX" ?=~ [re|ż|]
Just "\380X"

It may be easy to work around if you want just one match, but I originally encountered this problem using (*=~/) (search and replace).
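
One possible reading of the output above, sketched in plain Haskell: ż (U+017C) takes two bytes in UTF-8, so a match length counted in bytes and then applied as a character count would take two characters, "\380X", instead of one. That mechanism is an assumption about what happens inside the library, not something confirmed in this thread, and utf8Width, byteLength and takeByByteLength are names invented for the sketch.

```haskell
import Data.Char (ord)

-- Number of bytes a character occupies in UTF-8.
utf8Width :: Char -> Int
utf8Width ch
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord ch

-- Length of a string measured in UTF-8 bytes rather than characters.
byteLength :: String -> Int
byteLength = sum . map utf8Width

-- ASSUMPTION: the match length arrives as a byte count and is applied to
-- the input as if it were a character count. This mimics the suspected
-- bug; it is not code taken from the library.
takeByByteLength :: String -> String -> String
takeByByteLength matched input = take (byteLength matched) input
```

Here byteLength "\380" is 2, so takeByByteLength "\380" "\380X" yields "\380X", the same text as in the report, while a pure ASCII match such as takeByByteLength "a" "aX" yields "a" and hides the problem.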
