Commit
Showing 5 changed files with 237 additions and 18 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
== Background | ||
|
||
|
||
To use Citrus effectively, you must first understand the | ||
difference between syntax and semantics. Syntax is a set of rules that govern | ||
the way letters and punctuation may be used in a language. For example, English | ||
syntax dictates that proper nouns should start with a capital letter and that | ||
sentences should end with a period. | ||
|
||
Semantics are the rules by which meaning may be derived in a language. For | ||
example, as you read a book you are able to make some sense of the particular | ||
way in which words on a page are combined to form thoughts and express ideas | ||
because you understand what the words themselves mean and you can understand | ||
what they mean collectively. | ||
|
||
Computers use a similar process when interpreting code. First, the code must be | ||
parsed into recognizable symbols or tokens. These tokens may then be passed to | ||
an interpreter which is responsible for forming actual instructions from them. | ||
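
The two stages described above can be sketched in plain Ruby. This is only an illustration of the idea, not how Citrus works internally; the method names +tokenize+ and +interpret+ and the token format are assumptions made for this example.

```ruby
require 'strscan'

# Stage 1: split the input into tokens (integers and "+" signs).
# `tokenize` is an illustrative name, not part of Citrus.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next # whitespace carries no meaning here
    elsif (text = scanner.scan(/\d+/))
      tokens << [:int, text]
    elsif (text = scanner.scan(/\+/))
      tokens << [:plus, text]
    else
      raise ArgumentError, "unexpected input at offset #{scanner.pos}"
    end
  end
  tokens
end

# Stage 2: derive meaning from the tokens. This toy interpreter only
# knows addition, so it sums every integer token it sees.
def interpret(tokens)
  tokens.select { |type, _| type == :int }.sum { |_, text| text.to_i }
end

puts interpret(tokenize("1 + 22 + 3")) # prints 26
```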
|
||
Citrus is a pure Ruby library that allows you to perform both lexical analysis | ||
and semantic interpretation on an input string quickly and easily. Using Citrus | ||
you can write powerful parsers that are simple to understand and easy to create. | ||
|
||
In Citrus, there are three main types of objects: rules, grammars, and matches. | ||
|
||
=== Rules | ||
|
||
Rules are objects that specify some matching behavior on a string. There are | ||
two types of rules: terminals and non-terminals. Terminals can be either Ruby | ||
strings or regular expressions that specify some input to match. For example, | ||
a terminal object created from the string "end" would match any sequence of the | ||
characters "e", "n", and "d", in that order. A terminal object created from a | ||
regular expression uses Ruby's regular expression engine to attempt to create | ||
a match on the input. | ||
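
As a rough sketch of what matching a terminal means, the helper below tries a string or a regular expression at a given position in the input. The name +match_terminal+ is hypothetical and the code uses Ruby's standard StringScanner rather than Citrus itself.

```ruby
require 'strscan'

# Hypothetical helper, not Citrus's API: try to match a terminal
# (a String or a Regexp) at the given byte offset of the input.
# Returns the matched text, or nil when the terminal does not match there.
def match_terminal(terminal, input, offset = 0)
  pattern = terminal.is_a?(Regexp) ? terminal : Regexp.new(Regexp.escape(terminal))
  scanner = StringScanner.new(input)
  scanner.pos = offset
  scanner.scan(pattern) # scan only matches at the current position
end

p match_terminal("end", "end of input") # "end"
p match_terminal("end", "bend")         # nil, no match at offset 0
p match_terminal(/\d+/, "42 apples")    # "42"
```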
|
||
Non-terminals are rules that may contain other rules but do not themselves match | ||
directly on the input. For example, a Repeat is a non-terminal that may contain | ||
one other rule that will try to match a certain number of times. Several other | ||
types of non-terminals exist that will be discussed later. | ||
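
To make the idea of a repeating non-terminal concrete, here is a minimal sketch. The +SketchTerminal+ and +SketchRepeat+ classes are invented for this example and are not Citrus's rule classes; the sketch only shows a rule that delegates all matching to the rule it contains.

```ruby
require 'strscan'

# Hypothetical classes for illustration only.
class SketchTerminal
  def initialize(pattern)
    @pattern = pattern.is_a?(Regexp) ? pattern : Regexp.new(Regexp.escape(pattern))
  end

  # Returns the text matched at the scanner's position, or nil.
  def match(scanner)
    scanner.scan(@pattern)
  end
end

class SketchRepeat
  def initialize(rule, min, max = Float::INFINITY)
    @rule, @min, @max = rule, min, max
  end

  # Applies the contained rule up to max times. Succeeds with the list of
  # matched texts when at least min repetitions matched; otherwise rewinds
  # the scanner and returns nil.
  def match(scanner)
    start = scanner.pos
    texts = []
    while texts.length < @max && (text = @rule.match(scanner))
      break if text.empty? # guard against looping forever on empty matches
      texts << text
    end
    return texts if texts.length >= @min
    scanner.pos = start
    nil
  end
end

two_or_more_digits = SketchRepeat.new(SketchTerminal.new(/\d/), 2)
p two_or_more_digits.match(StringScanner.new("123abc")) # ["1", "2", "3"]
p two_or_more_digits.match(StringScanner.new("1abc"))   # nil
```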
|
||
Rule objects may also have semantic information associated with them in the form | ||
of Ruby modules. These modules contain methods that will be used to extend any | ||
matches created by the rule with which they are associated. | ||
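
The following sketch shows the general Ruby mechanism behind this: a rule holds a module and calls +extend+ on each match it creates. +SketchMatch+, +SketchRule+, and +IntegerValue+ are names made up for this example, so treat it as an analogy rather than Citrus's implementation.

```ruby
# Hypothetical types for illustration only.
SketchMatch = Struct.new(:text, :offset)

# A semantic module: its methods become available on extended matches.
module IntegerValue
  def value
    text.to_i
  end
end

class SketchRule
  def initialize(pattern, extension = nil)
    @pattern, @extension = pattern, extension
  end

  # Matches the pattern exactly at the given character offset and extends
  # the resulting match with the rule's module, if it has one.
  def match(input, offset = 0)
    found = @pattern.match(input, offset)
    return nil unless found && found.begin(0) == offset
    match = SketchMatch.new(found[0], offset)
    match.extend(@extension) if @extension
    match
  end
end

integer = SketchRule.new(/\d+/, IntegerValue)
puts integer.match("42 apples").value # prints 42
```

Because +extend+ works per object, only matches created by this rule gain the +value+ method; matches from other rules are unaffected.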
|
||
=== Grammars | ||
|
||
A grammar is a container for rules. Usually the rules in a grammar collectively | ||
form a complete specification for some language, or a well-defined subset | ||
thereof. A Citrus grammar is really just a souped-up Ruby module. To create more | ||
complex grammars, these modules may be included in other grammar modules in the | ||
same way that Ruby modules are normally included. Any grammar rule with the same | ||
name as a rule in an included grammar may access that rule with a mechanism | ||
similar to Ruby's +super+ keyword. | ||
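
For readers less familiar with the Ruby feature being referenced, this is how +include+ and +super+ interact in ordinary modules. The module and method names are hypothetical, and this is plain Ruby, not Citrus grammar syntax.

```ruby
# Plain Ruby analogy: a "base" module and one that builds on it.
module BaseNumbers
  def number_pattern
    /\d+/
  end
end

module SignedNumbers
  include BaseNumbers

  # Same name as the included method; super reaches the included version.
  def number_pattern
    base = super
    Regexp.new("-?" + base.source)
  end
end

parser = Object.new.extend(SignedNumbers)
p "-12"[parser.number_pattern] # "-12"
```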
|
||
=== Matches | ||
|
||
Matches are created by Rule objects when they match on the input. Matches | ||
contain the string of text that made up the match as well as its offset in the | ||
original input string. During a parse, matches are arranged in a tree structure | ||
where any match may contain any number of other matches. This structure is | ||
determined by the way in which the rule that generated each match is used in the | ||
grammar. | ||
|
||
For example, a match that is created from a non-terminal rule that contains | ||
several terminals will likewise contain several matches, one for each | ||
terminal. | ||
|
||
Match objects may be extended with semantic information in the form of methods. | ||
These methods can interpret the text of a match using the wealth of information | ||
available to them, including the text of the match, its position in the input, | ||
and any submatches. |
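
A small sketch of such a tree, assuming a hypothetical +MatchNode+ struct with +text+, +offset+, and +matches+ members (not Citrus's own Match class). The +Sum+ module plays the role of the semantic methods and reads the submatches to compute a value.

```ruby
# Hypothetical match type for illustration only.
MatchNode = Struct.new(:text, :offset, :matches)

# Semantic method that interprets a match through its submatches.
module Sum
  def value
    digits = matches.select { |m| m.text.match?(/\A\d+\z/) }
    digits.sum { |m| m.text.to_i }
  end
end

# The tree a parse of "1+22" might produce: one parent match containing
# one submatch per terminal, each with its offset in the input.
root = MatchNode.new("1+22", 0, [
  MatchNode.new("1", 0, []),
  MatchNode.new("+", 1, []),
  MatchNode.new("22", 2, [])
])
root.extend(Sum)

puts root.value                      # prints 23
p root.matches.map(&:offset)         # [0, 1, 2]
```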
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
Citrus is a compact and powerful parsing library for Ruby that combines the | ||
elegance and expressiveness of the language with the simplicity and power of | ||
parsing expressions. | ||
|
||
|
||
== Installation | ||
|
||
|
||
Via RubyGems: | ||
|
||
$ sudo gem install citrus | ||
|
||
From a local copy: | ||
|
||
$ git clone git://github.com/mjijackson/citrus.git | ||
$ cd citrus | ||
$ rake package && sudo rake install |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,4 @@ | ||
License | ||
------- | ||
== License | ||
|
||
Copyright 2010 Michael Jackson | ||
|
||
|