Strict Text wrapper unnecessarily decodes to Char for text-2.* #280

@sol

Looking at the code:

type AlexInput = (Char,           -- previous char
                  [Byte],         -- pending bytes on current char
                  Data.Text.Text) -- current input string

This can be replaced by:

type AlexInput = (Int,            -- current offset
                  Data.Text.Text) -- input string

alex/data/AlexWrappers.hs

Lines 98 to 105 in e65958c

alexGetByte :: AlexInput -> Maybe (Byte,AlexInput)
alexGetByte (c,(b:bs),s) = Just (b,(c,bs,s))
alexGetByte (_,[],s) = case Data.Text.uncons s of
                         Just (c, cs) ->
                           case utf8Encode' c of
                             (b, bs) -> Just (b, (c, bs, cs))
                         Nothing ->
                           Nothing

This can be replaced by:

alexGetByte :: AlexInput -> Maybe (Byte,AlexInput)
alexGetByte (cur, input) = case input of
  Text arr off len
    | cur < len -> Just (unsafeIndex arr (off + cur), (cur + 1, input))
    | otherwise -> Nothing
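For reference, here is a self-contained sketch of the offset-based version, assuming text-2.* (where `Data.Text.Array.unsafeIndex` returns a `Word8`) and the `Text` constructor from `Data.Text.Internal`. The `inputBytes` helper is hypothetical and only there to show that stepping through the input yields the UTF-8 bytes directly, with no `Char` round trip.

```haskell
import Data.List (unfoldr)
import Data.Word (Word8)
import qualified Data.Text as T
import qualified Data.Text.Array as A
import Data.Text.Internal (Text (..))

type Byte = Word8

type AlexInput = (Int,    -- current offset, in bytes, relative to the start of the Text
                  T.Text) -- input string

-- Read the byte at the current offset straight out of the UTF-8 array.
alexGetByte :: AlexInput -> Maybe (Byte, AlexInput)
alexGetByte (cur, input@(Text arr off len))
  | cur < len = Just (A.unsafeIndex arr (off + cur), (cur + 1, input))
  | otherwise = Nothing

-- Hypothetical helper (not part of Alex): collect every byte alexGetByte yields.
inputBytes :: T.Text -> [Byte]
inputBytes t = unfoldr alexGetByte (0, t)
```

For example, `inputBytes (T.pack "a\233")` walks over the three bytes of "aé" (`0x61`, `0xC3`, `0xA9`).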

The only "complicated" part is alexInputPrevChar: starting from cur, you step back one byte at a time until you have seen two character boundaries, and then decode the character at that position.
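A minimal sketch of what that could look like, under a few assumptions of mine: text-2.* (UTF-8 array of `Word8`), `cur` between 0 and the text length, and `'\n'` as the "previous char" at the start of input, mirroring what the existing wrappers use as the initial value. It steps back from `cur - 1` over continuation bytes to the lead byte of the character and decodes by hand; `isContinuation` and the local helpers are hypothetical names, not part of Alex.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)
import qualified Data.Text as T
import qualified Data.Text.Array as A
import Data.Text.Internal (Text (..))

type AlexInput = (Int, T.Text) -- (current byte offset, input string)

-- UTF-8 continuation bytes have the bit pattern 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- Decode the character that contains the byte just before the current offset.
alexInputPrevChar :: AlexInput -> Char
alexInputPrevChar (cur, Text arr off _len)
  | cur <= 0  = '\n' -- assumption: same start-of-input default as the old wrapper
  | otherwise = decodeFrom start
  where
    byteAt i = A.unsafeIndex arr (off + i)
    -- Walk backwards until we land on a lead byte (a character boundary).
    start = until (not . isContinuation . byteAt) (subtract 1) (cur - 1)
    decodeFrom i =
      let b0 = fromIntegral (byteAt i) :: Int
          cont k = fromIntegral (byteAt (i + k)) .&. 0x3F :: Int
      in chr $ case () of
           _ | b0 < 0x80 -> b0
             | b0 < 0xE0 -> ((b0 .&. 0x1F) `shiftL` 6) .|. cont 1
             | b0 < 0xF0 -> ((b0 .&. 0x0F) `shiftL` 12)
                              .|. (cont 1 `shiftL` 6) .|. cont 2
             | otherwise -> ((b0 .&. 0x07) `shiftL` 18)
                              .|. (cont 1 `shiftL` 12)
                              .|. (cont 2 `shiftL` 6) .|. cont 3
```

Because `Text` is guaranteed to hold valid UTF-8, the backwards walk stops within at most three steps and never leaves the slice. If `cur` sits in the middle of a multi-byte character, this sketch returns that character, which matches the old wrapper's behaviour of updating the previous char as soon as the first byte is read.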
