Permalink
Browse files

ENHANCEMENT: Update support files (documentation, examples) to reflec…

…t recent changes
  • Loading branch information...
1 parent 534d799 commit afb28caf712815da82cd6bb8ece8ac12ab9b2188 Hamish Friedlander committed Feb 9, 2011
View
@@ -16,11 +16,31 @@ and lexing in a single top down grammar. For a basic overview of the subject, se
### Parser Format
-Parsers are contained within a PHP file, in a special comment block that starts with `/*Parser:NameOfParser` and continues until the
-comment is closed. During compilation this block will be replaced with a set of matching functions.
+Parsers are contained within a PHP file, in one or more special comment blocks that start with `/*!* [name | !pragma]` (like a docblock, but with an
+exclamation mark in the middle of the stars)
-Lexically, the parser is a name token, a matching rule and a set of functions. The name token must not start with whitespace, contain no whitespace
-and end with a `:` character. The rule and function set are on the same line or on the indented lines below.
+Lexically, these blocks are a set of rules, each consisting of a name token, a matching rule and a set of attached functions.
+The name token must contain no whitespace and end with a `:` character. The matching rule and functions are on the same line or on the indented lines below.
+
+You can have multiple comment blocks, all of which are treated as contiguous for the purpose of compiling. During compilation these blocks will be replaced
+with a set of "matching" functions (functions which match a string against their rules) for each rule in the block.
+
+The optional name marks the start of a new set of parser rules. This is currently unused, but might be used in future for opimization & debugging purposes.
+If unspecified, it defaults to the same name as the previous parser comment block, or 'Anonymous Parser' if no name has ever been set.
+
+If the name starts with an '!' symbol, that comment block is a pragma, and is treated not as some part of the parser, but as a special block of meta-data
+
+##### Tricks and traps
+
+We allow indenting a parser block, but only in a consistant manner - whatever the indent of the /*** marker becomes the "base" indent, and needs to be used
+for all lines. You can mix tabs and spaces, but the indent must always be an exact match - if the "base" indent is a tab then two spaces, every line within the
+block also needs indenting with a tab then two spaces, not two tabs (even if in your editor, that gives the same indent).
+
+Any line with more than the "base" indent is considered a continuation of the previous rule
+
+Any line with less than the "base" indent is an error
+
+This might get looser if I get around to re-writing the internal "parser parser" in php-peg, bootstrapping the whole thing
### Rules
@@ -58,8 +78,11 @@ Tokens may be
##### Regular expression tokens
-Automatically anchored to the current string start - do not include a string start anchor (`^`) anywhere.
-Can specify flags on stand-alone regexs. Currently doesn't handle flags on regexs with rules.
+Automatically anchored to the current string start - do not include a string start anchor (`^`) anywhere. Always acts as when the 'x' flag is enabled in PHP -
+whitespace is ignored unless escaped, and '#' stats a comment.
+
+Be careful when ending a regular expression token - the '*/' pattern (as in /foo\s*/) will end a PHP comment. Since the 'x' flag is always active,
+just split with a space (as in / foo \s* /)
### Expressions
@@ -183,6 +206,19 @@ You can also specify a rule-attached function called `*`, which will be called w
By default all matches are added to the 'text' property of a result. By prepending a member with `.` that match will not be added to the ['text'] member. This
doesn't affect the other result properties that named rules' add.
+### Pragmas
+
+When opening a parser comment block, if instead of a name (or no name) you put a word starting with '!', that comment block is treated as a pragma - not
+part of the parser language itself, but some other instruction to the compiler. These pragmas are currently understood:
+
+ !silent
+
+ This is a comment that should only appear in the source code. Don't output it in the generated code
+
+ !insert_autogen_warning
+
+ Insert a warning comment into the generated code at this point, warning that the file is autogenerated and not to edit it
+
## TODO
- Allow configuration of whitespace - specify what matches, and wether it should be injected into results as-is, collapsed, or not at all
@@ -4,7 +4,7 @@ require '../Parser.php' ;
class CalculatedLiterals extends Parser {
-/*Parser:CalculatedLiterals
+/*!* CalculatedLiterals
string: ( /\\./ | /[^${parent.q}]/ )*
@@ -4,7 +4,7 @@ require '../Parser.php' ;
class Calculator extends Parser {
-/*Parser:Calculator
+/*!* Calculator
Number: /[0-9]+/
Value: Number > | '(' > Expr > ')' >
@@ -10,7 +10,7 @@ class EqualRepeat extends Packrat {
* aabbacc - bad
*/
-/*Parser:Grammar1
+/*!* Grammar1
A: "a" A? "b"
B: "b" B? "c"
T: !"b"
View
@@ -8,7 +8,7 @@ require '../Parser.php';
*/
class Rfc822 extends Parser {
-/*Parser:Rfc822
+/*!* Rfc822
crlf: /\r\n/
@@ -8,7 +8,7 @@ require 'Rfc822.php';
*/
class Rfc822UTF8 extends Rfc822 {
-/*Parser:Rfc822UTF8
+/*!* Rfc822UTF8
crlf: /\r\n/u

0 comments on commit afb28ca

Please sign in to comment.