Merge pull request #1 from chlu/patch-1
Updated README.md for better code formatting
skurfuerst committed Jan 24, 2012
2 parents 3d0bb35 + 68f794d commit 6c4da56
Showing 1 changed file with 83 additions and 82 deletions.
165 changes: 83 additions & 82 deletions README.md
@@ -6,13 +6,14 @@ and lexing in a single top down grammar. For a basic overview of the subject, se
## Quick start

- Write a parser. A parser is a PHP class with a grammar contained within it in a special syntax. The filetype is .peg.inc. See the examples directory.
- Compile the parser. php ./cli.php ExampleParser.peg.inc > ExampleParser.php
- Compile the parser: `php ./cli.php ExampleParser.peg.inc > ExampleParser.php`
- Use the parser (you can also include code to do this in the input parser - again see the examples directory):

<pre><code>
$x = new ExampleParser( 'string to parse' ) ;
$res = $x->match_Expr() ;
</code></pre>

```
$x = new ExampleParser('string to parse');
$res = $x->match_Expr();
```

### Parser Format

@@ -33,8 +34,8 @@ Lexically, these blocks are a set of rules & comments. A rule can be a base rule

Base rules consist of a name for the rule, some optional arguments, the matching rule itself, and an optional set of attached functions

NAME ( "(" ARGUMENT, ... ")" )? ":" MATCHING_RULE
ATTACHED_FUNCTIONS?
NAME ( "(" ARGUMENT, ... ")" )? ":" MATCHING_RULE
ATTACHED_FUNCTIONS?

Names must be the characters a-z, A-Z, 0-9 and _ only, and must not start with a number

@@ -45,11 +46,11 @@ Base rules can be split over multiple lines as long as subsequent lines are inde
Extension rules are either the same as a base rule but with the additional name of the rule to extend, or, as a replacing extension, consist of
a name for the rule, the name of the rule to extend, and optionally: some arguments, some replacements, and a set of attached functions

NAME extends BASENAME ( "(" ARGUMENT, ... ")" )? ":" MATCHING_RULE
ATTACHED_FUNCTIONS?
NAME extends BASENAME ( "(" ARGUMENT, ... ")" )? ":" MATCHING_RULE
ATTACHED_FUNCTIONS?

NAME extends BASENAME ( "(" ARGUMENT, ... ")" )? ( ";" REPLACE "=>" REPLACE_WITH, ... )?
ATTACHED_FUNCTIONS?
NAME extends BASENAME ( "(" ARGUMENT, ... ")" )? ( ";" REPLACE "=>" REPLACE_WITH, ... )?
ATTACHED_FUNCTIONS?

##### Tricks and traps

@@ -67,26 +68,22 @@ This might get looser if I get around to re-writing the internal "parser parser"

PEG matching rules try to follow standard PEG format, summarised thusly:

<pre><code>
token* - Token is optionally repeated
token+ - Token is repeated at least once
token? - Token is optionally present
token* - Token is optionally repeated
token+ - Token is repeated at least once
token? - Token is optionally present

tokena tokenb - Token tokenb follows tokena, both of which are present
tokena | tokenb - One of tokena or tokenb is present, preferring tokena
tokena tokenb - Token tokenb follows tokena, both of which are present
tokena | tokenb - One of tokena or tokenb is present, preferring tokena

&token - Token is present next (but not consumed by parse)
!token - Token is not present next (but not consumed by parse)
&token - Token is present next (but not consumed by parse)
!token - Token is not present next (but not consumed by parse)

( expression ) - Grouping for priority
</code></pre>
( expression ) - Grouping for priority

But with these extensions:

<pre><code>
< or > - Optionally match whitespace
[ or ] - Require some whitespace
</code></pre>
< or > - Optionally match whitespace
[ or ] - Require some whitespace

### Tokens

@@ -115,12 +112,12 @@ match against a calculated value, or simply specify the expression as a token to
When getting a value to use for an expression, the parser will travel up the stack looking for a set value. The expression
stack is a list of all the rules passed through to get to this point. For example, given the parser

<pre><code>
A: $a
B: A
C: B
</code></pre>
```
A: $a
B: A
C: B
```

The expression stack for finding $a will be C, B, A - in other words, the A rule will be checked first, followed by B, followed by C
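
The lookup order can be illustrated with a small sketch. This is a hypothetical model, not php-peg's internal data structure: the `lookup_expression` function, the frame layout, and the `values` key are all assumptions made for illustration. It only shows the "innermost rule first, then outwards" search described above.

```php
<?php
// Hypothetical model of the expression stack; not php-peg's internal data structure.
// Frames are listed outermost first, so the innermost rule is the last element.
function lookup_expression(array $stack, string $name)
{
    // Walk from the innermost rule (A) outwards (B, then C).
    for ($i = count($stack) - 1; $i >= 0; $i--) {
        if (array_key_exists($name, $stack[$i]['values'])) {
            return $stack[$i]['values'][$name];
        }
    }
    return null; // no rule on the stack set a value with this name
}

// Rule C matched B, which matched A; only C has set a value for "a".
$stack = [
    ['rule' => 'C', 'values' => ['a' => 'set in C']],
    ['rule' => 'B', 'values' => []],
    ['rule' => 'A', 'values' => []],
];
$found = lookup_expression($stack, 'a');
```

Here A and B are checked first and have no value for `a`, so the value set in C is the one returned. If B also set `a`, B's value would win because B sits closer to A on the stack.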

#### In terminals (literals and regexes)
@@ -151,10 +148,10 @@ the rule will fail to match.

Be careful not to use a token expression when you meant to use a terminal expression

<pre><code>
quoted_good: q:/['"]/ string "$q"
quoted_bad: q:/['"]/ string $q
</code></pre>
```
quoted_good: q:/['"]/ string "$q"
quoted_bad: q:/['"]/ string $q
```

`"$q"` matches against the value of q again. `$q` tries to match against a rule named `"` or `'` (both of which are illegal rule
names, and will therefore fail)
@@ -163,22 +160,22 @@ names, and will therefore fail)

Tokens and groups can be given names by prepending name and `:`, e.g.,

<pre><code>
rulea: "'" name:( tokena tokenb )* "'"
</code></pre>
```
rulea: "'" name:( tokena tokenb )* "'"
```

There must be no space between the name and the `:`

<pre><code>
badrule: "'" name : ( tokena tokenb )* "'"
</code></pre>
```
badrule: "'" name : ( tokena tokenb )* "'"
```

Recursive matchers can be given a name the same as their rule name by prepending with just a `:`. These next two rules are equivalent

<pre><code>
rulea: tokena tokenb:tokenb
rulea: tokena :tokenb
</code></pre>
```
rulea: tokena tokenb:tokenb
rulea: tokena :tokenb
```

### Rule-attached functions

@@ -192,31 +189,33 @@ All functions that are not in-grammar must have PHP compatible names (see PHP n

All these definitions define the same rule-attached function

<pre><code>
class A extends Parser {
/*!* Parser
foo: bar baz
function bar() {}
*/
```php
<?php
class A extends Parser {
/*!* Parser
foo: bar baz
function bar() {}
*/

function foo_bar() {}
}
function foo_bar() {}
}

class B extends A {
function foo_bar() {}
}
</code></pre>
class B extends A {
function foo_bar() {}
}
?>
```

### PHP name mapping

Rules in the grammar map to PHP functions named `match_{$rulename}`. However, rule names can contain characters that PHP function names can't.
These characters are remapped:

<pre><code>
'-' => '_'
'$' => 'DLR'
'*' => 'STR'
</code></pre>
```
'-' => '_'
'$' => 'DLR'
'*' => 'STR'
```

Other disallowed characters are removed.
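
As a rough sketch of this mapping, the function below applies the three remappings and then strips everything else that is not a letter, digit, or underscore. The name `peg_function_name` is hypothetical, and the exact set of characters the compiler treats as allowed is an assumption (ASCII only here); the real implementation may differ.

```php
<?php
// Hypothetical helper for illustration; not part of php-peg's API.
function peg_function_name(string $rule): string
{
    // Apply the three documented remappings in a single pass.
    $mapped = strtr($rule, ['-' => '_', '$' => 'DLR', '*' => 'STR']);

    // Assumption: anything other than ASCII letters, digits and "_" is removed.
    $mapped = preg_replace('/[^A-Za-z0-9_]/', '', $mapped);

    return 'match_' . $mapped;
}

$dashed = peg_function_name('quoted-string'); // "-" becomes "_"
$star   = peg_function_name('*');             // "*" becomes "STR"
```

`strtr` with an array is used so that all three replacements happen in one pass and a replacement is never itself rewritten.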

@@ -234,17 +233,17 @@ and the sub-match - in this case the default storage action will not occur.

If you specify a rule-attached function for a recursive match, you do not need to name that token at all - it will be called automatically. E.g.

<pre><code>
rulea: tokena tokenb
function tokenb ( &$res, $sub ) { print 'Will be called, even though tokenb is not named or marked with a :' ; }
</code></pre>
```
rulea: tokena tokenb
function tokenb ( &$res, $sub ) { print 'Will be called, even though tokenb is not named or marked with a :' ; }
```

You can also specify a rule-attached function called `*`, which will be called with every recursive match made

<pre><code>
rulea: tokena tokenb
function * ( &$res, $sub ) { print 'Will be called for both tokena and tokenb' ; }
</code></pre>
```
rulea: tokena tokenb
function * ( &$res, $sub ) { print 'Will be called for both tokena and tokenb' ; }
```

### Silent matches

@@ -257,13 +256,13 @@ Rules can inherit off other rules using the keyword extends. There are several w
they all share a common feature - when building a result set the rule will also check the inherited-from rule's rule-attached
functions for storage handlers. This lets you do something like

<pre><code>
```
A: Foo Bar Baz
function *(){ /* Generic store handler */ }
B extends A
function Bar(){ /* Custom handling for Bar - Foo and Baz will still fall through to the A#* function defined above */ }
</code></pre>
```
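
The fall-through in this example can be modelled with a short sketch. Everything here is an assumption made for illustration: the `SketchParser` class, the `store` method, and in particular the lookup order (a rule's own handlers before the handlers of the rule it extends). The method names follow the `{rule}_{token}` pattern and the `'*' => 'STR'` remapping described earlier.

```php
<?php
// Hypothetical sketch; the lookup order shown is an assumption, not php-peg's code.
class SketchParser
{
    // Generic store handler attached to rule A (written as "function *" in the grammar).
    function A_STR(&$res, $sub) { $res['log'][] = 'A_STR:' . $sub['name']; }

    // Custom handler attached to rule B for the Bar sub-match.
    function B_Bar(&$res, $sub) { $res['log'][] = 'B_Bar:' . $sub['name']; }

    // $chain lists the matching rule first, followed by the rules it extends.
    function store(array $chain, array &$res, array $sub)
    {
        foreach ($chain as $rule) {
            // Try the handler named after the sub-match, then the catch-all.
            foreach ([$rule . '_' . $sub['name'], $rule . '_STR'] as $method) {
                if (method_exists($this, $method)) {
                    $this->$method($res, $sub);
                    return;
                }
            }
        }
    }
}

$parser = new SketchParser();
$res = ['log' => []];
foreach (['Foo', 'Bar', 'Baz'] as $name) {
    $parser->store(['B', 'A'], $res, ['name' => $name]);
}
// $res['log'] is now ['A_STR:Foo', 'B_Bar:Bar', 'A_STR:Baz']
```

Bar is handled by B's own function, while Foo and Baz find nothing on B and fall through to the generic handler on A.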

The actual matching rule can be specified in three ways:

@@ -277,33 +276,33 @@ override some storage logic but not the rule itself
You can replace some parts of the inherited rule using text replacement by using a ';' instead of a ':' after the name
of the extended rule. You can then put replacements in a comma-separated list. An example might help

<pre><code>
```
A: Foo | Bar | Baz
# Makes B the equivalent of Foo | Bar | (Baz | Qux)
B extends A: Baz => (Baz | Qux)
</code></pre>
```

Note that the replacements are not quoted. The exception is when you want to replace with the empty string, e.g.

<pre><code>
```
A: Foo | Bar | Baz
# Makes B the equivalent of Foo | Bar
B extends A: | Baz => ""
</code></pre>
```

Currently there is no escaping supported - if you want to replace "," or "=>" characters you'll have to use full replacement
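
A minimal sketch of how such a replacement list could be applied is given below. The function `apply_text_replacements` is hypothetical and is not the compiler's actual code. It splits naively on `,` and `=>`, which also shows why those characters cannot appear inside a replacement without escaping support.

```php
<?php
// Hypothetical sketch of text replacement; not php-peg's implementation.
function apply_text_replacements(string $rule, string $spec): string
{
    // Naive split: a "," or "=>" inside a replacement would break this.
    foreach (explode(',', $spec) as $pair) {
        $parts = explode('=>', $pair, 2);
        if (count($parts) !== 2) {
            continue; // not a "REPLACE => REPLACE_WITH" pair
        }
        $from = trim($parts[0]);
        $to = trim($parts[1]);
        if ($from === '') {
            continue; // nothing to search for
        }
        if ($to === '""') {
            $to = ''; // the quoted empty string means "remove"
        }
        $rule = str_replace($from, $to, $rule);
    }
    return trim($rule);
}

$extended = apply_text_replacements('Foo | Bar | Baz', 'Baz => (Baz | Qux)');
$reduced  = apply_text_replacements('Foo | Bar | Baz', '| Baz => ""');
```

With the inherited rule `Foo | Bar | Baz`, the first call yields `Foo | Bar | (Baz | Qux)` and the second yields `Foo | Bar`, matching the two examples above.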

#### Full replacement

You can specify an entirely new rule in the same format as a non-inheriting rule, e.g.

<pre><code>
```
A: Foo | Bar | Baz
B extends A: Foo | Bar | (Baz Qux)
</code></pre>
```

This is useful if the rule changes too much for text replacement to be readable, but you want to keep the storage logic

@@ -312,13 +311,15 @@ This is useful is the rule changes too much for text replacement to be readable,
When opening a parser comment block, if instead of a name (or no name) you put a word starting with '!', that comment block is treated as a pragma - not
part of the parser language itself, but some other instruction to the compiler. These pragmas are currently understood:

!silent
```
!silent
This is a comment that should only appear in the source code. Don't output it in the generated code
This is a comment that should only appear in the source code. Don't output it in the generated code
!insert_autogen_warning
!insert_autogen_warning
Insert a warning comment into the generated code at this point, warning that the file is autogenerated and not to edit it
Insert a warning comment into the generated code at this point, warning that the file is autogenerated and not to edit it
```

## TODO
