missing matches in RE alternative cause UTF-8 decode error #171

Open
kquick opened this issue Oct 20, 2019 · 2 comments

@kquick kquick commented Oct 20, 2019

If a match target appears inside an alternative, an error is thrown when the match is shown:

$ ghci
Prelude> import Text.RE.PCRE.String
Prelude PCRE.String> r = [re|foo(A${here}(.*)B|C${there}(.*)D)|]
Prelude PCRE.String> allMatches ("foobar" *=~ r)
[]
Prelude PCRE.String> allMatches ("fooAoneB" *=~ r)
[ Match {matchSource = "fooAoneB", .... *** Exception: utf8_correct_bs: UTF-8 decoding error
CallStack (from HasCallStack):
  error, called at ./Text/RE/ZeInternals/Types/Match.lhs:248:13 in regex-1.0.2.0-CuYMcTBVvnH4p7K8LCU2iN:Text.RE.ZeInternals.Types.Match
Prelude PCRE.String> allMatches ("fooCtwoD" *=~ r)
[ Match {matchSource = "fooCtwoD", ... [same error]

This seems to be related to the branch where the match is not found:

PCRE.String> r = [re|foo(A${here}(.*)B|CD)|]
PCRE.String> allMatches ("foobar" *=~ r)
[]
PCRE.String> allMatches ("fooAbarB" *=~ r)
... valid match, no error ...
PCRE.String> allMatches ("fooCD" *=~ r)
... error as above...

It's possible this is an invalid usage on my part, but I would expect a different type of error than a UTF-8 decoding error. Additionally, I originally had the same match name on both alternatives and got the same error, so I should have had a valid match regardless of which alternative matched.

regex version 1.0.2.0

@mnn mnn commented Oct 10, 2020

The bug is still present:

> import Text.RE.PCRE.Text

Text.RE.PCRE.Text> urlRegex = [re|^https?:\/\/.+\/(\w+)(?:\.(\w+))?(?:[\?|#].*)?$|]

Text.RE.PCRE.Text> "a" ?=~ urlRegex
Match {matchSource = "a", captureNames = fromList [], matchArray = array (CaptureOrdinal {getCaptureOrdinal = 1},CaptureOrdinal {getCaptureOrdinal = 0}) []}

Text.RE.PCRE.Text> "https://a/b/c/d" ?=~ urlRegex
Match {matchSource = "https://a/b/c/d", captureNames = fromList [], matchArray = array (CaptureOrdinal {getCaptureOrdinal = 0},CaptureOrdinal {getCaptureOrdinal = 2}) [(CaptureOrdinal {getCaptureOrdinal = 0},Capture {captureSource = "https://a/b/c/d", capturedText = "https://a/b/c/d", captureOffset = 0, captureLength = 15}),(CaptureOrdinal {getCaptureOrdinal = 1},Capture {captureSource = "https://a/b/c/d", capturedText = "d", captureOffset = 14, captureLength = 1}),(CaptureOrdinal {getCaptureOrdinal = 2},*** Exception: utf8_correct_bs: UTF-8 decoding error
CallStack (from HasCallStack):
  error, called at ./Text/RE/ZeInternals/Types/Match.lhs:248:13 in regex-1.1.0.0-3PHJg3TXXTf4CPb8VxPErs:Text.RE.ZeInternals.Types.Match

package versions:

$ stack ls dependencies | grep regex
regex 1.1.0.0
regex-base 0.94.0.0
regex-pcre-builtin 0.95.1.2.8.43
regex-tdfa 1.3.1.0
regex-with-pcre 1.1.0.0

@jbash jbash commented Oct 23, 2021

The same thing seems to be triggered if a group is optional and isn't present in the tested string. In the program below, the optional trailing zero isn't there, so I would expect to get a Nothing from captureTextMaybe.

#!/usr/bin/env stack
{- stack
   script
   --resolver lts-18.13
-}
{-# LANGUAGE QuasiQuotes #-}

import Text.RE.PCRE.String ((?=~), cp, re)
import Text.RE.Replace (captureTextMaybe)

main =
  mapM_ putStrLn $ captureTextMaybe [cp|1|] ("foo" ?=~ [re|^[a-z]+(0)?$|])

The actual result is

$ ./retest
retest: utf8_correct_bs: UTF-8 decoding error
CallStack (from HasCallStack):
  error, called at ./Text/RE/ZeInternals/Types/Match.lhs:248:13 in regex-1.1.0.0-FyuON3BA52j97jnO9rbQpX:Text.RE.ZeInternals.Types.Match

This is still present in regex 1.1.0.0, as supplied by Stackage LTS 18.13.
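
To make the expected behaviour concrete, here is a minimal sketch of how a non-participating group could be turned into a Nothing. It relies on the regex-base convention that an unused subexpression is reported with offset -1; whether that is what actually trips up Match.lhs is an assumption, not something confirmed in this thread. The helper captureMaybe is hypothetical and is not part of the regex package.

```haskell
-- Hypothetical helper, NOT part of the regex package's API.
-- Assumption: the backend follows the regex-base convention of reporting
-- a group that did not take part in the match as offset -1, length 0.
captureMaybe :: String -> (Int, Int) -> Maybe String
captureMaybe src (off, len)
  | off < 0   = Nothing                          -- group did not participate
  | otherwise = Just (take len (drop off src))   -- slice out the captured text

main :: IO ()
main = do
  -- Group 0 (the whole match) for "foo" against ^[a-z]+(0)?$
  print (captureMaybe "foo" (0, 3))    -- prints: Just "foo"
  -- Group 1, the optional trailing zero, which is absent here
  print (captureMaybe "foo" (-1, 0))   -- prints: Nothing
```

With a check like this in place, the retest program above would print nothing for "foo" instead of raising the decoding error.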
