= kpeg

home :: https://github.com/evanphx/kpeg
bugs :: https://github.com/evanphx/kpeg/issues

== Description

KPeg is a simple PEG library for Ruby. It provides an API as well as a native
grammar format for describing grammars.

KPeg strives to provide a simple, powerful API without being too exotic.

KPeg supports direct left recursion of rules via the
{OMeta memoization}[http://www.vpri.org/pdf/tr2008003_experimenting.pdf] trick.

== Writing your first grammar

=== Setting up your grammar

All grammars start with the class/module name that will be your parser:

  %% name = Example::Parser

After that, a block of Ruby code can be defined that will be added to the
class body of your parser. Attributes defined in this block can be accessed
within your parser as instance variables, and methods defined here can be
called from action blocks.

  %% {
    attr_accessor :something_cool

    def something_awesome
      # do something awesome
    end
  }

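Because the block is pasted into the class body, it behaves as if the attribute and method had been written in an ordinary Ruby class. The plain-Ruby sketch below is a hand-written stand-in, not kpeg output: the class name ExampleParser and the run_action method are hypothetical, and kpeg's generated code is structured differently.

```ruby
# Hand-written stand-in for a generated parser class (NOT kpeg output).
class ExampleParser
  # --- plays the role of the %% { ... } block ---
  attr_accessor :something_cool

  def something_awesome
    "something awesome"
  end
  # --- end of block ---

  # Hypothetical stand-in for an action block. Setting @something_cool
  # writes the same instance variable the accessor reads.
  def run_action
    @something_cool = 42
    [something_cool, something_awesome]
  end
end

ExampleParser.new.run_action  # => [42, "something awesome"]
```
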
=== Defining literals

Literals are static declarations of characters or regular expressions,
designed for reuse in the grammar. Their names can be written as constants
(ALPHA) or as variables (period). A literal can be a string, a regular
expression or a character range:

  ALPHA = /[A-Za-z]/
  DIGIT = /[0-9]/
  period = "."
  string = "a string"
  regex = /(regexs?)+/
  char_range = [b-t]

Literals can also accept multiple alternatives, separated by |:

  vowel = "a" | "e" | "i" | "o" | "u"
  alpha = /[A-Z]/ | /[a-z]/

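To get a feel for what each literal accepts, here are rough plain-Ruby Regexp equivalents. This is only an illustration, not the code kpeg generates; each pattern is anchored with \A so that, like a PEG match, it only matches at the current position. The LITERALS table and the match_literal helper are hypothetical names.

```ruby
# Rough Regexp equivalents of the literals above (illustration only).
LITERALS = {
  "ALPHA"      => /\A[A-Za-z]/,
  "DIGIT"      => /\A[0-9]/,
  "period"     => /\A\./,
  "char_range" => /\A[b-t]/,          # kpeg's [b-t] character range
  "vowel"      => /\A(?:a|e|i|o|u)/,  # "a" | "e" | "i" | "o" | "u"
  "alpha"      => /\A(?:[A-Z]|[a-z])/ # /[A-Z]/ | /[a-z]/
}.freeze

# Returns the text the literal would consume at the start of input, or nil.
def match_literal(name, input)
  m = LITERALS.fetch(name).match(input)
  m && m[0]
end

match_literal("char_range", "cat")    # => "c"
match_literal("char_range", "apple")  # => nil ("a" is outside b-t)
```
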
=== Defining Rules for Values

Before you can start parsing a string you will need to define rules that you
will use to accept or reject that string. There are many different types of
rules available in kpeg.

The most basic of these rules is a string capture:

  alpha = < /[A-Za-z]/ > { text }

While this looks very much like the ALPHA literal defined above, it differs in
one important way: the text matched by the part of the rule between the < and
> symbols is set as the text variable in the block that follows. You can also
explicitly name the variable you would like, but only for existing rules or
literals:

  letter = alpha:a { a }

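As a mental model of capturing and labeling, here is a hand-written plain-Ruby sketch of the two rules above using the standard library's StringScanner. It is not kpeg's generated code; each method returns the rule's value on success and nil on failure.

```ruby
require "strscan"

# alpha = < /[A-Za-z]/ > { text }
def alpha(scanner)
  text = scanner.scan(/[A-Za-z]/) or return nil  # the < ... > capture; nil means failure
  text                                           # the action block { text }
end

# letter = alpha:a { a }
def letter(scanner)
  a = alpha(scanner) or return nil  # the value of alpha is bound to the label a
  a                                 # the action block { a }
end

letter(StringScanner.new("kpeg"))  # => "k"
```
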
Additionally, a block can act as a test based upon an expression within it.
Prefix the block with & to make the rule succeed only when the expression is
true (these examples assume num yields an Integer):

  match_greater_than_10 = < num:n > &{ n > 10 }

Prefix the block with ! to make the rule succeed only when the expression is
false:

  do_not_match_greater_than_10 = < num:n > !{ n > 10 }

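The two predicates can be sketched in plain Ruby as below. This is an illustration under stated assumptions, not kpeg output: num is assumed to yield an Integer (as if written num = < /[0-9]+/ > { text.to_i }), and each method returns true when the corresponding rule would succeed, rewinding the input when it fails.

```ruby
require "strscan"

# Assumption: num yields an Integer, as if written
#   num = < /[0-9]+/ > { text.to_i }
def num(scanner)
  digits = scanner.scan(/[0-9]+/) or return nil
  digits.to_i
end

# match_greater_than_10 = < num:n > &{ n > 10 }
def match_greater_than_10(scanner)
  start = scanner.pos
  n = num(scanner)
  return true if n && n > 10     # &{ ... }: succeed only when the test is true
  scanner.pos = start            # failure: rewind so other rules can try
  false
end

# do_not_match_greater_than_10 = < num:n > !{ n > 10 }
def do_not_match_greater_than_10(scanner)
  start = scanner.pos
  n = num(scanner)
  return true if n && !(n > 10)  # !{ ... }: succeed only when the test is false
  scanner.pos = start
  false
end
```
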
Rules can also act like functions and take parameters. The example below is
adapted from the
{Email Address Validator}[https://github.com/larb/email_address_validator]:
an ASCII value is passed in, and the rule succeeds only if the next character
has that value. On Ruby 1.9 and later, text[0] is a one-character String, so
.ord is needed to compare it with an integer code:

  d(num) = <.> &{ text[0].ord == num }

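A parameterized rule can be pictured as a method that takes the parameter as an extra argument. The plain-Ruby sketch below (hand-written, not kpeg output) mirrors d(num): it consumes any single character, then keeps the match only if the character's code equals num.

```ruby
require "strscan"

# d(num) = <.> &{ text[0].ord == num }
def d(scanner, num)
  start = scanner.pos
  text = scanner.getch or return false  # <.> : any single character
  return true if text[0].ord == num     # &{ text[0].ord == num }
  scanner.pos = start                   # predicate failed: rewind
  false
end

"@"[0]      # => "@"  (a String, not an Integer)
"@"[0].ord  # => 64
d(StringScanner.new("@example"), 64)  # => true
```
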
Rules support some regular expression syntax for matching:

* maybe: ? (zero or one)
* many: + (one or more)
* kleene: * (zero or more)
* groupings: ( )

Examples:

  letters = alpha+
  words = alpha+ space* period?
  sentence = (letters+ | space+)+

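One difference from regular expressions is worth knowing: in a PEG, repetition is greedy and never gives back what it consumed, so alpha* alpha can never match. Ruby's possessive quantifiers (*+, ++, ?+) behave the same way, which makes them a convenient plain-Ruby illustration. The sketch below is not kpeg code, and it assumes space matches a single " ".

```ruby
# Ordinary regex quantifiers backtrack; possessive ones, like PEG operators,
# keep everything they matched.
BACKTRACKING = /\A[a-z]*[a-z]\z/   # regex: * gives one letter back
PEG_LIKE     = /\A[a-z]*+[a-z]\z/  # like the PEG alpha* alpha: * keeps every letter

BACKTRACKING.match?("kpeg")  # => true
PEG_LIKE.match?("kpeg")      # => false, nothing is left for the final letter

# words = alpha+ space* period?, assuming space is a single " "
WORDS = /\A[A-Za-z]++[ ]*+\.?+/
WORDS.match("kpeg  .")[0]    # => "kpeg  ."
```
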
Kpeg also allows a rule to define the acceptable number of matches in the form
of a range. In regular expressions this is often written as {3,5}; kpeg uses
the syntax [min, max] instead, with * for no upper limit:

  matches_3_to_5_times = letter[3,5]
  matches_3_to_any_times = letter[3,*]

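For comparison, here are rough Regexp equivalents of the two ranges, assuming letter matches a single ASCII letter. This is an illustration only, not what kpeg generates.

```ruby
# letter[3,5]  is roughly  [A-Za-z]{3,5}
# letter[3,*]  is roughly  [A-Za-z]{3,}
THREE_TO_FIVE = /\A[A-Za-z]{3,5}/
THREE_OR_MORE = /\A[A-Za-z]{3,}/

THREE_TO_FIVE.match("ab")          # => nil (fewer than 3 letters)
THREE_TO_FIVE.match("abcdefg")[0]  # => "abcde" (stops at the maximum)
THREE_OR_MORE.match("abcdefg")[0]  # => "abcdefg"
```
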
=== Defining Actions

As illustrated in some of the examples above, kpeg allows you to perform
actions based upon a match. An action is written either in a block following
the rule or in the rule definition itself. Here num converts its captured text
to an Integer so that n1 + n2 performs numeric addition:

  num = < /[1-9][0-9]*/ > { text.to_i }
  sum = < num:n1 "+" num:n2 > { n1 + n2 }

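The sum rule can be sketched as a hand-written plain-Ruby parser (not kpeg output), assuming num is written as above so that its value is an Integer. The action only runs once the whole sequence has matched; a partial match rewinds and fails.

```ruby
require "strscan"

# num = < /[1-9][0-9]*/ > { text.to_i }
def num(scanner)
  text = scanner.scan(/[1-9][0-9]*/) or return nil
  text.to_i
end

# sum = < num:n1 "+" num:n2 > { n1 + n2 }
def sum(scanner)
  start = scanner.pos
  n1 = num(scanner)
  if n1 && scanner.skip(/\+/) && (n2 = num(scanner))
    return n1 + n2     # the action block runs only after a full match
  end
  scanner.pos = start  # partial match: rewind and fail
  nil
end

sum(StringScanner.new("12+30"))  # => 42
```
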
As of version 0.8, an alternate syntax has been added for calling defined
methods as actions:

  %% {
    def add(n1, n2)
      n1 + n2
    end
  }
  num = < /[1-9][0-9]*/ > { text.to_i }
  sum = < num:n1 "+" num:n2 > ~add(n1, n2)

=== Referencing an external grammar

Kpeg allows you to run a rule that is defined in an external grammar. This is
useful if there is a defined set of rules that you would like to reuse in
another parser. To do this, create your grammar and generate a parser using
the kpeg command line tool:

  kpeg literals.kpeg

Once you have the generated parser, require that file in your new grammar:

  %% {
    require "literals.kpeg.rb"
  }

Then create a variable to hold the foreign interface and pass it the class
name of your parser. In this case the parser class name is Literal:

  %foreign_grammar = Literal

You can then use rules defined in the foreign grammar in the local grammar
file like so:

  sentence = (%foreign_grammar.alpha %foreign_grammar.space*)+ %foreign_grammar.period

=== Comments

Kpeg allows comments to be added to the grammar file by using the # symbol:

  # This is a comment in my grammar

=== Variables

A variable looks like this:

  %% name = value

Kpeg allows the following variables that control the output parser:

name::
  The class name of the generated parser.
custom_initialize::
  When set and the parser is built as a standalone parser, the default
  initialize method is not included, so you can supply your own.

=== Directives

A directive looks like this:

  %% header {
    ...
  }

Kpeg allows the following directives:

header::
  Placed before any generated code.
pre-class::
  Placed before the class definition to provide a class comment.
footer::
  Placed after the end of the class (for requiring files dependent upon the
  parser's namespace).

== Generating and running your parser

Before you can generate your parser you will need to define a root rule. This
will be the first rule run against the string provided to the parser:

  root = sentence

To generate the parser, run the kpeg command with the kpeg file(s) as an
argument. This will generate a Ruby file named after your grammar file
(example.kpeg.rb here):

  kpeg example.kpeg

Require the generated parser file in the application that will use it, create
a new instance of the parser and pass in the string you want to evaluate. When
parse is called on the parser instance, it returns true if the string is
matched, or false if it isn't.

  require "example.kpeg.rb"

  parser = Example::Parser.new(string_to_evaluate)
  parser.parse

== Shortcuts and other techniques

Per vito, you can get the current line or current column in the following way:

  line = { current_line }
  column = { current_column }
  foo = line:line ... { # use line here }

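To show what such a line/column lookup involves, here is a hypothetical plain-Ruby helper that converts a 0-based character offset into a 1-based line and column. It is not kpeg's implementation; kpeg's own current_line and current_column may differ in details such as where counting starts.

```ruby
# Hypothetical helper (not kpeg's implementation): 1-based line and column
# for a 0-based offset into the input string.
def line_and_column(string, pos)
  before = string[0, pos]
  line = before.count("\n") + 1
  last_newline = before.rindex("\n")
  column = last_newline ? pos - last_newline : pos + 1
  [line, column]
end

line_and_column("ab\ncd", 0)  # => [1, 1]
line_and_column("ab\ncd", 4)  # => [2, 2]
```
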
== AST Generation

As of Kpeg 0.8 a parser can generate an AST. To define an AST node, use the
following syntax:

  %% assign = ast Assignment(name, value)

Once you have defined an AST node, it can be used in your grammar like so:

  assignment = identifier:i space* "=" space* value:v ~assign(i, v)

This will create a new Assignment node that you can add into your AST.

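Conceptually, the declaration gives you a node class with the listed fields plus a method named assign that builds one. The plain-Ruby sketch below uses a Struct as a rough analogue; it is not the code kpeg generates, and the real node classes may live in a different namespace.

```ruby
# Rough analogue (not kpeg output) of
#   %% assign = ast Assignment(name, value)
Assignment = Struct.new(:name, :value)

# The helper that ~assign(i, v) conceptually calls.
def assign(name, value)
  Assignment.new(name, value)
end

node = assign("x", 42)
node.name   # => "x"
node.value  # => 42
```
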
For a good example of usage check out Talon[https://github.com/evanphx/talon].

== Examples

There are several examples available in the /examples directory. The upper
parser has a README with a step-by-step description of the grammar.

== Projects using kpeg

Dang[https://github.com/veganstraightedge/dang]

{Email Address Validator}[https://github.com/larb/email_address_validator]

Callisto[https://github.com/dwaite/Callisto]

Doodle[https://github.com/vito/doodle]

Kanbanpad[https://kanbanpad.com] (uses kpeg for parsing of the 'enter
something' bar)