Commit

The fix for #32 implied an API change, so document it
Also follow the API change in Alex's own lexer, so that it bootstraps
again.
simonmar committed Nov 11, 2013
1 parent db682c6 commit 4f74772
Showing 4 changed files with 26 additions and 8 deletions.
19 changes: 17 additions & 2 deletions doc/alex.xml
@@ -1105,7 +1105,8 @@ $del = \127 -- ASCII DEL</programlisting>

<programlisting>type AlexInput
alexGetByte :: AlexInput -> Maybe (Word8,AlexInput)
-alexInputPrevChar :: AlexInput -> Char</programlisting>
+alexInputPrevChar :: AlexInput -> Char
+alexIncLen :: Byte -> Int -> Int</programlisting>

<para>The generated lexer is independent of the input type,
which is why you have to provide a definition for the input type
@@ -1120,7 +1121,21 @@ alexInputPrevChar :: AlexInput -> Char</programlisting>
<literal>alexInputPrevChar</literal> return
<literal>undefined</literal>.</para>

-<para>Alex will provide the following function:</para>
+<para>The <literal>alexIncLen</literal> function determines how
+the token length in the <literal>AlexReturn</literal> type
+(below) is calculated. It is called once for each byte consumed
+by the lexer, and is passed the byte and the current token
+length. It should return the new token length after the byte is
+consumed. The two most common choices are to either increment
+the length for every byte, giving a token length of the number
+of bytes, or to increment the token length only for bytes that
+represent the beginning of a new character in the UTF-8
+encoding, resulting in a token length of the number of
+characters. The latter is implemented as follows:</para>
+
+<programlisting>alexIncLen c len = if c &lt; 0x80 || c >= 0xC0 then len + 1 else len</programlisting>
+
+<para>Alex provides the following function:</para>

<programlisting>alexScan :: AlexInput -- The current input
-> Int -- The "start code"
5 changes: 4 additions & 1 deletion src/ParseMonad.hs
@@ -7,7 +7,7 @@
-- ----------------------------------------------------------------------------}

module ParseMonad (
-AlexInput, alexInputPrevChar, alexGetChar, alexGetByte,
+AlexInput, alexInputPrevChar, alexGetChar, alexGetByte, alexIncLen,
AlexPosn(..), alexStartPos,

P, runP, StartCode, failP, lookupSMac, lookupRMac, newSMac, newRMac,
@@ -48,6 +48,9 @@ alexGetByte (p,_,[],(c:s)) = let p' = alexMove p c
(b:bs) = UTF8.encode c
in p' `seq` Just (b, (p', c, bs, s))

+alexIncLen :: Byte -> Int -> Int
+alexIncLen c len = if c < 0x80 || c >= 0xC0 then len + 1 else len
+
-- -----------------------------------------------------------------------------
-- Token positions

2 changes: 1 addition & 1 deletion templates/GenericTemplate.hs
@@ -173,7 +173,7 @@ alex_scan_tkn user orig_input len input s last_acc =
ILIT(-1) -> (new_acc, input)
-- on an error, we want to keep the input *before* the
-- character that failed, not after.
-_ -> alex_scan_tkn user orig_input (case incrLen c IBOX(len) of { IBOX(len_) -> len_ })
+_ -> alex_scan_tkn user orig_input (case alexIncLen c IBOX(len) of { IBOX(len_) -> len_ })
new_input new_s new_acc
}
where
8 changes: 4 additions & 4 deletions templates/wrappers.hs
@@ -119,13 +119,13 @@ alexGetByte (AlexInput _ cs)
, ByteString.unsafeTail cs)
#endif

-{-# INLINE incrLen #-}
-incrLen :: Byte -> Int -> Int
+{-# INLINE alexIncLen #-}
+alexIncLen :: Byte -> Int -> Int
#if defined(ALEX_BASIC_BYTESTRING) || defined(ALEX_STRICT_BYTESTRING) || defined(ALEX_POSN_BYTESTRING) || defined(ALEX_MONAD_BYTESTRING)
-incrLen c len = len + 1
+alexIncLen c len = len + 1
-- token length for the ByteString wrappers is the number of bytes
#else
-incrLen c len = if c < 0x80 || c >= 0xC0 then len + 1 else len
+alexIncLen c len = if c < 0x80 || c >= 0xC0 then len + 1 else len
-- token length for the [Char] wrappers is the number of Chars,
-- hence the length is increased ONLY if this is the 1st byte in a
-- UTF-8 char.
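
The two counting policies described in the documentation hunk can be compared outside of Alex with a small standalone sketch. This is not part of the commit. It assumes `Byte` is a synonym for `Word8`, and the names `incLenBytes`, `incLenChars`, `encodeChar` and `tokenLength` are hypothetical helpers introduced only for illustration; the lexer itself applies `alexIncLen` inside `alex_scan_tkn`, not through a fold like this.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word8)

-- Assumption: Byte is Word8, matching the result type of alexGetByte.
type Byte = Word8

-- Policy 1: count every byte (the choice made for the ByteString wrappers).
incLenBytes :: Byte -> Int -> Int
incLenBytes _ len = len + 1

-- Policy 2: count only bytes that begin a UTF-8 sequence, i.e. skip
-- continuation bytes in the range 0x80..0xBF (the [Char] wrappers' choice).
incLenChars :: Byte -> Int -> Int
incLenChars c len = if c < 0x80 || c >= 0xC0 then len + 1 else len

-- Hypothetical helper: a minimal UTF-8 encoder so the sketch has no
-- dependencies beyond base. It does not validate surrogate code points.
encodeChar :: Char -> [Byte]
encodeChar ch
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n = ord ch
    hi, cont :: Int -> Byte
    hi s   = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Hypothetical helper: feed each encoded byte to the chosen policy,
-- starting from length 0, roughly as the scanner does per consumed byte.
tokenLength :: (Byte -> Int -> Int) -> String -> Int
tokenLength incLen = foldl' (flip incLen) 0 . concatMap encodeChar
```

For the five-character string `"na\239ve"` (the `\239` is U+00EF, which encodes to two bytes), `tokenLength incLenBytes` gives 6 and `tokenLength incLenChars` gives 5. Code that supplies its own `AlexInput` would pick whichever policy matches how it later slices the input by token length.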