Refactor all the pod to kwim!!!
ingydotnet committed May 16, 2014
1 parent da965ac commit 2666a06
Showing 19 changed files with 865 additions and 963 deletions.
7 changes: 3 additions & 4 deletions doc/Pegex/API.kwim
@@ -1,5 +1,4 @@
The Pegex API
=============
= The Pegex API

Pegex can be used in many ways: inside scripts, from the command line or as the
foundation of a modular parsing framework. This document details the various
@@ -65,7 +64,7 @@ smaller tasks. Here is an example:

$grammar = "
expr: num PLUS num
num: /(<DIGIT>+)/
num: /( DIGIT+)/
";

print Dump pegex($grammar)->parse('2+2');
@@ -134,7 +133,7 @@ formally.

$grammar_text = "
expr: num PLUS num
num: /(<DIGIT>+)/
num: /( DIGIT+)/
";

$grammar = Pegex::Grammar->new(text => $grammar_text);
53 changes: 53 additions & 0 deletions doc/Pegex/Input.kwim
@@ -0,0 +1,53 @@
Pegex::Input
============

Pegex Parser Input Abstraction

= Synopsis

    use Pegex;
    use Pegex::Input;
    my $ast = pegex($foo_grammar)->parse(Pegex::Input->new(string => $foo_input));

= Description

Pegex::Parser parses input. The input can be a string, a string reference, a
file path, or an open file handle. Pegex::Input is an abstraction over any
type of input. It provides a uniform interface to the parser.

= Usage

You call new() with two arguments, where the first argument is the input type:

    Pegex::Input->new(file => 'file.txt')

The following input types are available:

- string

Input is a string.

- stringref

Input is a string reference. This may be desirable for really long strings.

- file

Input is a file path name to be opened and read.

- handle

Input is read from an open file handle.
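
The four input types can be pictured as one normalizing step in front of the
parser. The following is a minimal pure-Perl sketch of that idea, not the
actual implementation of Pegex::Input; `read_input` is a hypothetical helper
name that does not exist in Pegex.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Pegex): turn any of the four
# documented input types into a plain string.
sub read_input {
    my ($type, $value) = @_;
    if ($type eq 'string') {
        return $value;
    }
    elsif ($type eq 'stringref') {
        return $$value;
    }
    elsif ($type eq 'file') {
        open my $fh, '<', $value or die "Can't open '$value': $!";
        local $/;                  # slurp mode: read the whole file
        my $text = <$fh>;
        close $fh;
        return $text;
    }
    elsif ($type eq 'handle') {
        local $/;                  # slurp mode: read to end of handle
        return scalar <$value>;
    }
    die "Unknown input type '$type'";
}

my $text = 'hello world';
print read_input(string    => $text), "\n";
print read_input(stringref => \$text), "\n";
```

Whatever the source, the caller ends up with the same kind of value, which is
the point of putting an abstraction between the input and the parser.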

= Author

Ingy döt Net <ingy@cpan.org>

= Copyright and License

Copyright (c) 2011, 2012, 2013, 2014. Ingy döt Net.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html
32 changes: 16 additions & 16 deletions lib/Pegex/Miscellany.pod → doc/Pegex/Miscellany.kwim
@@ -1,10 +1,10 @@
=encoding utf8
= Miscellany

This document contains things about Pegex that were written but seemed out of
place in their original documents. Still, they are possibly useful, so they
live here for now.

=head1 PEGEX OVERVIEW
= Pegex Overview

In the diagram below, there is a simple language called Foo. The diagram shows
how Pegex can take a text grammar defining Foo and generate a parser that can
@@ -21,7 +21,7 @@ parse Foo sources into data (abstract syntax trees).
'--------------------' '-----------------------' v
...................... | .------.
| | | compile() | YAML |
|foo:: <verb> <noun> | v '------'
|foo: verb noun | v '------'
|verb: /Hello/ | .--------------------. .------.
|noun: /world/ | | Foo grammar tree | | JSON |
| | '--------------------' '------'
@@ -45,45 +45,45 @@ parse Foo sources into data (abstract syntax trees).
|- noun: world |
........................

=head1 FYI
= FYI

Pegex is self-hosting. This means that the Pegex grammar language syntax is
defined by a Pegex grammar! This is important because (just like any Pegex
based language) it makes it easier to port to new programming languages. You
can find the Pegex grammar for Pegex grammars here:
L<http://github.com/ingydotnet/pegex-pgx/>.
[http://github.com/ingydotnet/pegex-pgx/].

Pegex was originally inspired by Perl 6 Rules. It also takes ideas from Damian
Conway's Perl 5 module, L<Regexp::Grammars>. Pegex tries to take the best
Conway's Perl 5 module, [Regexp::Grammars]. Pegex tries to take the best
ideas from these great works, and make them work in as many languages as
possible. That's Acmeism.

=head1 SELF COMPILATION TRICKS
= Self Compilation Tricks

You can have some fun using Pegex to compile itself. First get the Pegex
grammar repo:

git clone git://github.com/ingydotnet/pegex-pgx.git
cd pegex-pgx
git clone git://github.com/ingydotnet/pegex-pgx.git
cd pegex-pgx

Then parse and dump the Pegex grammar with Pegex:

perl -MXXX -MPegex -e 'XXX pegex("pegex.pgx")->parse("pegex.pgx")'
perl -MXXX -MPegex -e 'XXX pegex("pegex.pgx")->parse("pegex.pgx")'

For a different view of the data tree, try:

perl -MXXX -MPegex -e 'XXX pegex("pegex.pgx", receiver => "Pegex::Tree")->parse("pegex.pgx")'
perl -MXXX -MPegex -e 'XXX pegex("pegex.pgx", receiver => "Pegex::Tree")->parse("pegex.pgx")'

Finally to emulate the Pegex compiler do this:

perl -MXXX -MPegex -e 'XXX pegex("pegex.pgx", receiver => "Pegex::Pegex::AST")->parse("pegex.pgx")'
perl -MXXX -MPegex -e 'XXX pegex("pegex.pgx", receiver => "Pegex::Pegex::AST")->parse("pegex.pgx")'

This specifies a "receiving" class that can shape the results into something
useful. Indeed, this is the exact guts of L<Pegex::Grammar::Pegex>.
useful. Indeed, this is the exact guts of [Pegex::Grammar::Pegex].

=head1 A REAL WORLD EXAMPLE
= A Real World Example

L<TestML> is a new Acmeist unit test language. It is perfect for software that
[TestML] is a new Acmeist unit test language. It is perfect for software that
needs to run equivalently in more than one language. In fact, Pegex itself is
tested with TestML!!

@@ -108,7 +108,7 @@ becomes this module:
https://github.com/ingydotnet/testml-pm/blob/master/lib/TestML/Grammar.pm

TestML::Parser::Grammar is a subclass of Pegex::Grammar. It can be used to
parse TestML files. TestML::Parser calls the C<parse()> method of the grammar
parse TestML files. TestML::Parser calls the [parse()] method of the grammar
with a TestML::AST object that receives callbacks when various rules match,
and uses the information to build a TestML::Document object.

34 changes: 34 additions & 0 deletions doc/Pegex/Module.kwim
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
Pegex::Module
=============

Base Class for Pegex Grammar Interface Modules

= Synopsis

    package MyLanguage;
    use Pegex::Base;
    extends 'Pegex::Module';

    has grammar => 'MyLanguage::Grammar';
    has receiver => 'MyLanguage::AST';

    1;

= Description

The module in the Synopsis above is a complete language parsing module. It just
inherits from [Pegex::Module], and then overrides the `grammar` and `receiver`
properties. [Pegex::Module] provides the `parse()` method.
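
The division of labor described here (the base class owns `parse()`, the
subclass only names its grammar and receiver) can be illustrated without Pegex
installed. This is a hedged sketch of the pattern only: `Sketch::Module` is a
hypothetical stand-in, and its `parse()` merely reports what it would wire
together rather than doing any parsing.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the base class: it owns parse() and asks
# the subclass which grammar and receiver classes to use.
package Sketch::Module;
sub new      { my ($class) = @_; return bless {}, $class }
sub grammar  { die "subclass must name a grammar class" }
sub receiver { die "subclass must name a receiver class" }
sub parse {
    my ($self, $input) = @_;
    # A real parser would be built here; the sketch only reports the
    # classes it would combine.
    return sprintf '%s + %s <- %s', $self->grammar, $self->receiver, $input;
}

# The subclass overrides the two properties and nothing else.
package MyLanguage;
our @ISA = ('Sketch::Module');
sub grammar  { 'MyLanguage::Grammar' }
sub receiver { 'MyLanguage::AST' }

package main;
print MyLanguage->new->parse('some input'), "\n";
```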

= Author

Ingy döt Net <ingy@cpan.org>

= Copyright and License

Copyright (c) 2011, 2012, 2013, 2014. Ingy döt Net.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html
47 changes: 16 additions & 31 deletions lib/Pegex/Overview.pod → doc/Pegex/Overview.kwim
Original file line number Diff line number Diff line change
@@ -1,6 +1,4 @@
=encoding utf8

=head1 What is Pegex?
= What is Pegex?

Pegex is a Friendly, Acmeist, PEG Parser framework. Friendly means that it is
simple to create, understand, modify and maintain Pegex parsers. Acmeist means
@@ -14,7 +12,7 @@ grammars that eventually break down to regex fragments. ie The low level
parsing matches are always done with regexes against the current position in
the input stream.

=head1 What is Parsing?
= What is Parsing?

It may seem like a silly question, but it's important to have an understanding
of what parsing is and what a parser can do for you. At the most basic
@@ -27,21 +25,14 @@ be structured. In many parsing methodologies, input is preprocessed (possibly
into tokens) before the parser/grammar get to look at it. Although this is
a common method, it is not the only approach.

=head1 How Pegex Works
= How Pegex Works

Pegex parsing consists of 4 distinct parts or objects:

=over

=item * Parser - The Pegex parsing engine

=item * Grammar - The rules of a particular syntax

=item * Receiver - The logic for processing matches

=item * Input - Text conforming to the grammar rules

=back
- Parser :: The Pegex parsing engine
- Grammar :: The rules of a particular syntax
- Receiver :: The logic for processing matches
- Input :: Text conforming to the grammar rules

Quite simply, a parser object is created with a grammar object and a receiver
object. Then the parser object's `parse()` method is called on an input
@@ -55,20 +46,14 @@ The Pegex code to use this might look like this:

In the simplest terms, Pegex works like this (pseudocode):

parser = new Pegex.Parser(
grammar: new Markdown.Grammar
receiver: new Markdown.Receiver.HTML
)
html = parser.parse(markdown)

=head1 See Also

=over

=item L<Pegex::API>

=item L<Pegex::Syntax>
parser = new Pegex.Parser(
grammar: new Markdown.Grammar
receiver: new Markdown.Receiver.HTML
)
html = parser.parse(markdown)

=item L<Pegex::Tutorial>
= See Also

=back
* [Pegex::API]
* [Pegex::Syntax]
* [Pegex::Tutorial]
120 changes: 120 additions & 0 deletions doc/Pegex/Receiver.kwim
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
Pegex::Receiver
===============

Base Class for All Pegex Receivers

= Synopsis

    package MyReceiver;
    use base 'Pegex::Receiver';

    # Handle data for a specific rule
    sub got_somerulename {
        my ($self, $got) = @_;
        # ... process ...
        return $result;
    }

    # Handle data for any other rule
    sub gotrule {
        my ($self, $got) = @_;
        return $result;
    }

    # Pre-process
    sub initial { ... }

    # Post-process
    sub final {
        ...;
        return $final_result;
    }

= Description

In Pegex, a *receiver* is the class object that a *parser* passes captured
data to when a *rule* in a *grammar* matches a part of an *input* stream. A
receiver provides *action methods* that turn parsed data into whatever the
parser is intended to produce.

This is the base class of all Pegex receiver classes.

It doesn't do much of anything, which is the correct thing to do. If you use
this class as your receiver it won't do any extra work. See [Pegex::Tree] for
a receiver base class that will help organize your matches by default.

== How A Receiver Works

A Pegex grammar is made up of *named-rules*, *regexes*, and *groups*. When a
*regex* matches, the parser makes an array of its capture strings. When a
*group* matches, the parser makes an array of all the submatch arrays. In this
way a *parse tree* forms.

When a *named-rule* matches, an action method is called in the receiver class.
The method is passed the current *parse tree* and returns what the parser will
consider the new parse tree.

This makes for a very elegant and understandable API.
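
The dispatch described above can be sketched in plain Perl. This is an
illustration under stated assumptions, not Pegex's parser code:
`Sketch::Receiver` and `dispatch` are hypothetical names, and the capture
arrays are written by hand instead of coming from a real parse.

```perl
use strict;
use warnings;

package Sketch::Receiver;
sub new { return bless {}, shift }

# Specific action: turn the capture array of a "num" rule into a number.
sub got_num {
    my ($self, $got) = @_;
    return $got->[0] + 0;
}

# Fallback action: pass the parse tree through unchanged.
sub gotrule {
    my ($self, $got) = @_;
    return $got;
}

package main;

# Hypothetical dispatcher standing in for the parser's behaviour:
# prefer got_<rulename>, otherwise fall back to gotrule.
sub dispatch {
    my ($receiver, $rule, $got) = @_;
    my $action = $receiver->can("got_$rule") || $receiver->can('gotrule');
    return $action ? $receiver->$action($got) : $got;
}

my $receiver = Sketch::Receiver->new;
my $num  = dispatch($receiver, num  => ['2']);          # handled by got_num
my $expr = dispatch($receiver, expr => [$num, $num]);   # handled by gotrule
```

Each action's return value replaces the subtree it was given, so the value
handed to an outer rule is already built from the results of the inner ones.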

= API

This section documents the methods that you can include in a receiver subclass.

- `got_$rulename($got)`

An action method for a specific, named rule.

    sub got_rule42 {
        my ($self, $got) = @_;
        ...
        return $result;
    }

The `$got` value that is passed in is the current value of the parse tree.
What gets returned is whatever you want the new value to be.

- `gotrule($got)`

The action method for a named rule that does not have a specific action
method.

- `initial()`

Called at the beginning of a parse operation, before the parsing begins.

- `final($got)`

Called at the end of a parse operation. Whatever this action returns will
be the result of the parse.

== Methods

- `parser`

An attribute containing the parser object that is currently running. This
can be very useful to introspect what is happening, and possibly modify the
grammar on the fly. (Experts only!)

- `flatten($array)`

A utility method that can turn an array of arrays into a single array. For
example:

    $self->flatten([1, [2, [3, 4], 5], 6]);
    # produces [1, 2, 3, 4, 5, 6]

Hashes are left unchanged. The array is modified in place, but is also the
return value.
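
For readers who want to see the documented behaviour spelled out, here is a
minimal pure-Perl sketch. It assumes nothing about how Pegex::Receiver
implements `flatten`; `flatten_in_place` is a hypothetical name used only for
this illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the documented flatten() behaviour:
# nested arrays are spliced into their parent, hashes are left alone,
# and the same array reference is returned.
sub flatten_in_place {
    my ($array) = @_;
    my $i = 0;
    while ($i < @$array) {
        if (ref($array->[$i]) eq 'ARRAY') {
            # Replace the nested array with its elements, then look at
            # the same position again in case they are arrays too.
            splice @$array, $i, 1, @{ $array->[$i] };
        }
        else {
            $i++;
        }
    }
    return $array;
}

my $list = [1, [2, [3, 4], 5], 6];
flatten_in_place($list);
print "@$list\n";    # prints: 1 2 3 4 5 6
```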

= Author

Ingy döt Net <ingy@cpan.org>

= Copyright and License

Copyright (c) 2010, 2011, 2012, 2013, 2014. Ingy döt Net.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html
