another solution for buffering dilemma

1 parent f790e90 commit b3d5417131c3de56f3985de38922df85ccfb8258 @ruz committed Mar 22, 2012
Showing 2 changed files with 31 additions and 2 deletions.
  1. +15 −1 README
  2. +16 −1 lib/MarpaX/Simple/Lexer.pm
16 README
@@ -221,10 +221,24 @@ TUTORIAL
...
);
+ Use built-in protection against such cases. When a regular expression
+ token matches the whole buffer and the buffer can still grow, the lexer
+ grows the buffer and retries. This lets you write a regular expression
+ that matches up to the end of the token or the end of the buffer ("$").
+ Note that a token may match incompletely if the input ends mid-token.
+
+ tokens => {
+ ...
+ 'text-paragraph' => qr{\w[\w\s]+?(?:\n\n|$)},
+ },
+
Adjust the grammar. In most cases you can split a long terminal into
multiple terminals of limited length. For example:
- { lhs => 'text', rhs => 'text-chunk', min => 1 }
+ rules => [
+ ...
+ { lhs => 'text', rhs => 'text-chunk', min => 1 },
+ ],
Filtering input
Input can be filtered with a callback by providing input_filter
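
The grow-and-retry behavior described in the README hunk above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: `match_with_growth`, its parameters, and the chunk size of 10 are hypothetical names and values chosen for the example, not the code in lib/MarpaX/Simple/Lexer.pm. Only the `text-paragraph` regular expression is taken from the commit.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's real code): match $re at the start
# of a buffer filled from $fh, growing the buffer while the match is suspect.
sub match_with_growth {
    my ($fh, $re, $chunk_size) = @_;
    my $buffer = '';
    while (1) {
        # Append up to $chunk_size more characters to the buffer.
        my $read = read($fh, $buffer, $chunk_size, length $buffer);
        die "read failed: $!" unless defined $read;
        my $can_grow = $read > 0;    # zero means end of input

        return undef unless $buffer =~ /\A($re)/;
        my $token = $1;

        # A match that swallows the whole buffer may have been cut short by
        # the buffer boundary, so retry while more input is available.
        next if $can_grow && length($token) == length($buffer);
        return $token;
    }
}

my $paragraph = qr{\w[\w\s]+?(?:\n\n|$)};    # regex from the README example
my $input = "First paragraph\nstill first\n\nSecond paragraph";
open my $fh, '<', \$input or die "cannot open in-memory input: $!";
my $token = match_with_growth($fh, $paragraph, 10);
print "matched ", length($token), " characters\n";
```

In this sketch the first two attempts match the entire buffer and trigger a retry; the third finds the blank line and returns the full paragraph. One Perl detail worth keeping in mind: without `/m`, `$` also matches just before a newline that ends the string, so a buffer boundary that falls right after a single newline can end the token early. `\z` matches only at the true end of the buffer.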
17 lib/MarpaX/Simple/Lexer.pm
@@ -243,10 +243,25 @@ Use larger buffer:
...
);
+Use built-in protection against such cases. When a regular expression
+token matches the whole buffer and the buffer can still grow, the lexer
+grows the buffer and retries. This lets you write a regular
+expression that matches up to the end of the token or the end of the
+buffer (C<$>). Note that a token may match incompletely if the input
+ends mid-token.
+
+ tokens => {
+ ...
+ 'text-paragraph' => qr{\w[\w\s]+?(?:\n\n|$)},
+ },
+
Adjust the grammar. In most cases you can split a long terminal into
multiple terminals of limited length. For example:
- { lhs => 'text', rhs => 'text-chunk', min => 1 }
+ rules => [
+ ...
+ { lhs => 'text', rhs => 'text-chunk', min => 1 },
+ ],
=head2 Filtering input
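
The "adjust the grammar" option in the hunk above relies on a terminal whose length is bounded, so that no single token can outgrow the buffer. A minimal sketch of that idea follows. The `$text_chunk` pattern, its limit of 16 characters, and the `split_into_chunks` helper are assumptions made for illustration; the commit does not show the actual `text-chunk` token definition.

```perl
use strict;
use warnings;

# Hypothetical chunk terminal: at most 16 word or whitespace characters.
my $text_chunk = qr/[\w\s]{1,16}/;

# Split $input into consecutive chunk matches, similar to how a rule with
# min => 1 accepts one or more 'text-chunk' terminals in a row.
sub split_into_chunks {
    my ($input) = @_;
    my @chunks;
    # \G anchors each match where the previous one ended.
    while ($input =~ /\G($text_chunk)/gc) {
        push @chunks, $1;
    }
    return @chunks;
}

my @chunks = split_into_chunks("one long run of plain text without any markup");
print scalar(@chunks), " chunks\n";
print length($_), "\n" for @chunks;
```

Because every chunk is at most 16 characters long, a buffer larger than that limit always holds at least one complete chunk, and the sequence rule reassembles the text from the pieces.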
