Commit

Flesh out documentation.
index=false now disables indexing.
Switch from (false, "Match failed") to (nil, "Match failed").
silentbicycle committed Nov 1, 2010
1 parent 7edcc90 commit cda23e0
Showing 3 changed files with 229 additions and 65 deletions.
240 changes: 191 additions & 49 deletions README.md
@@ -1,60 +1,202 @@
Tamale - a TAble MAtching Lua Extension
# Tamale - a TAble MAtching Lua Extension

This is a Lua library to add basic structural pattern matching (as in
ML, Prolog, Erlang, Haskell, etc.).
## Overview

In Lua terms: it takes a table of rules and returns a matcher function.
That function tests its input against each rule (via a rule iterator),
and the first rule that has a non-false result is the overall match
result. If it results in a function, that function is called with any
captures from the input.
Tamale is a [Lua][] library for structural pattern matching - kind of like regular expressions for *arbitrary data structures*, not just strings. (Or [Sinatra][] for data structures, rather than URLs.)

**Basic usage:**
[Lua]: http://lua.org
[Sinatra]: http://www.sinatrarb.com

`tamale.matcher` reads a *rule table* and produces a *matcher function*. The table should list `{pattern, result}` rules, which are structurally compared in order against the input. The matcher returns the result for the first successful rule, or `(nil, "Match failed")` if none match.

### Basic Usage

require "tamale"
local V, P = tamale.var, tamale.P --for marking variables & string patterns
local function is_number(t) return type(t.X) == "number" end
local function handle_pair(t) return { t.X, t.Y } end

-- this builds a match-and-dispatch function from the rule table
local m = tamale.matcher {
{ V"X", 1, when=is_number },
{ "y", 2 },
{ P"num (%d+)",
function(cs) return tonumber(cs[1]) end },
{ { "num", V"X" },
function(cs) return cs.X end },
{ { "pair", { V"Tag", V"X" }, {V"Tag", V"Y" } },
handle_pair },
{ { "swap", V"X", V"Y"}, { V"Y", V"X" } },
{ { V"_", V"_", V"X" }, { V"X"} },
-- debug=true -- uncomment to show progress
local V = tamale.var
local M = tamale.matcher {
{ {"foo", 1, {} }, "one" },
{ 10, function() return "two" end},
{ {"bar", 10, 100}, "three" },
{ {"baz", V"X" }, V"X" }, -- V"X" is a variable
{ {"add", V"X", V"Y"}, function(cs) return cs.X + cs.Y end },
}

print(M({"foo", 1, {}})) --> "one"
print(M(10)) --> "two"
print(M({"bar", 10, 100})) --> "three"
print(M({"baz", "four"})) --> "four"
print(M({"add", 2, 3})) --> 5
print(M({"sub", 2, 3})) --> nil, "Match failed"

The result can be a literal value (number, string, etc.), a variable, a table, or a function. A function is called with a table containing the original input and any captures, and its return value becomes the match result. Variables in the result (standalone or inside tables) are replaced with their captures.
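
The substitution rule above can be sketched in plain Lua. This is only an illustration, not Tamale's code: `var`, `fill`, and `resolve` are hypothetical names, and the `is_var`/`name` fields are an assumed representation of a variable.

```lua
-- Toy stand-in for tamale.var; the field names here are assumptions.
local function var(name) return { is_var = true, name = name } end

-- Replace variables in a result template with their captured values.
local function fill(result, caps)
    if type(result) ~= "table" then return result end
    if result.is_var then return caps[result.name] end
    local out = {}
    for k, v in pairs(result) do out[k] = fill(v, caps) end
    return out
end

-- Call function results with the captures; fill in everything else.
local function resolve(result, caps)
    if type(result) == "function" then return result(caps) end
    return fill(result, caps)
end

local caps = { X = 10, Y = 20 }
local swapped = resolve({ var"Y", var"X" }, caps)
print(swapped[1], swapped[2])                               -- prints 20 and 10
print(resolve("one", caps))                                 --> one
print(resolve(function(cs) return cs.X + cs.Y end, caps))   --> 30
```

Literal results pass through unchanged, which is why a rule such as `{ {"foo", 1, {} }, "one" }` can simply return `"one"`.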


### Benefits of Pattern Matching

+ Declarative (AKA "data-driven") programming is easy to locally reason about, maintain, and debug.
+ Structures do not need to be manually unpacked - pattern variables automatically capture the value from their position in the input.
+ "It fits or it doesn't fit" - the contract that code is expected to follow is very clear.
+ Rule tables can be compiled down to search trees, which are potentially more efficient than long, nested if / switch statements. (Tamale currently does not do this, but could in the future without any change to its interface. Also, see Indexing below.)

Imperative code to rebalance red-black trees can get pretty hairy. With pattern matching, the list of transformations *is* the code.

-- create red & black tags and local pattern variables
local R,B,a,x,b,y,c,z,d = "R", "B", V"a", V"x", V"b", V"y", V"c", V"z", V"d"
local balanced = { R, { B, a, x, b }, y, { B, c, z, d } }
balance = tamale.matcher {
{ {B, {R, {R, a, x, b}, y, c}, z, d}, balanced },
{ {B, {R, a, x, {R, b, y, c,}}, z, d}, balanced },
{ {B, a, x, {R, {R, b, y, c,}, z, d}}, balanced },
{ {B, a, x, {R, b, y, {R, c, z, d}}}, balanced },
{ V"body", V"body" }, -- default case, keep the same
}
m(23) --> 1
m("y") --> 2
m("num 55") --> 55
m({"num", 55}) --> 55
m({"swap", 10, 20}) --> {20, 10}

-- using the same variable names means the tags must match
m({"pair", {"x", 1}, {"x", 2}})
--> calls handle_pair({X=1, Y=2}), which returns {1, 2}

-- variables starting with "_" are ignored
m({1, 2, 3}) --> {3}

Code structured as a series of rules (declarative programming) is easy
to reason about and maintain. Instead of writing a tangled series of
nested if / else if / else statements by hand, they can automatically be
generated from a table like this, and various implementation tricks
(such as indexing) can make the matching more efficient than linear
search.

For further usage examples, see the test suite. Also, since this style
of programming comes from more declarative languages, it may help to
study them directly.

(Adapted from Chris Okasaki's _Purely Functional Data Structures_.)

The style of pattern matching used in Tamale is closest to [Erlang](http://erlang.org)'s. Since pattern-matching comes from declarative languages, it may help to study them directly.

Particularly recommended:

* _The Art of Prolog_ by Leon Sterling & Ehud Shapiro
* _Programming Erlang_ by Joe Armstrong


## Rules

Each rule has the form `{ *pattern*, *result*, [when=function] }`.

The pattern can be a literal value, table, or function. For tables, each field is checked against the corresponding field in the input (and those fields may in turn contain literals, variables, tables, or functions).

Functions are called on the input's corresponding field. If the function's first result is non-false, the field is considered a match, and all results are appended to the capture table (see below). If the function returns false or nil, the match fails.

`tamale.P` marks strings as patterns that should be compared with `string.match` (possibly returning captures), rather than as string literals. Use it like `{ P"aaa(.*)bbb", result }`.

Its entire implementation is just

function P(str)
return function(v)
if type(v) == "string" then return string.match(v, str) end
end
end
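
Because `P` is an ordinary closure over `string.match`, it can be tried on its own, without a matcher. The block below repeats the definition above as a local so that it runs standalone; the sample patterns and inputs are made up for illustration.

```lua
-- P as listed above, repeated so this block is self-contained.
local function P(str)
    return function(v)
        if type(v) == "string" then return string.match(v, str) end
    end
end

local num = P"num (%d+)"
print(num("num 55"))     --> 55   (the capture, as a string)
print(num("hello"))      --> nil  (no match)

-- Non-string input falls through and returns no values at all.
print(select("#", num(55)))   --> 0

-- Multiple captures come back as multiple return values.
local key, value = P"(%a+)=(%d+)"("x=10")
print(key, value)        -- prints x and 10
```

Note that captures are strings, which is why the rule in the older example wraps the capture in `tonumber`.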


Rules also have two optional keyword arguments:

### Extra Restrictions - `when=function(captures)`

This is used to add further restrictions to a rule, such as a rule that can only take strings *which are also valid e-mail addresses*. (The function is passed the captures table.)

-- is_valid(cs) checks cs[1]
{ P"(.*)", register_address, when=is_valid }


### Partial patterns - `partial=true`

This flag allows a table pattern to match a table input value which has *more fields than are listed in the pattern*.

{ {tag="leaf"}, some_fun, partial=true }

could match against *any* table for which `t.tag == "leaf"`, regardless of any other fields.
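
The difference between a full and a partial match can be sketched with a small helper. `shallow_match` is a hypothetical function written for this illustration (it compares literal fields only, one level deep) and is not how Tamale implements the flag.

```lua
-- Hypothetical helper, not part of Tamale: shallow match of literal fields.
local function shallow_match(pattern, input, partial)
    if type(input) ~= "table" then return false end
    -- every field named in the pattern must be present and equal
    for k, v in pairs(pattern) do
        if input[k] ~= v then return false end
    end
    -- without partial, the input may not carry any extra fields
    if not partial then
        for k in pairs(input) do
            if pattern[k] == nil then return false end
        end
    end
    return true
end

local leaf = { tag = "leaf", value = 42 }
print(shallow_match({ tag = "leaf" }, leaf, true))               --> true
print(shallow_match({ tag = "leaf" }, leaf, false))              --> false
print(shallow_match({ tag = "leaf" }, { tag = "leaf" }, false))  --> true
```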


## Variables and Captures

The patterns specified in Tamale rules can have variables, which capture the contents of that position in the input. To create a Tamale variable, use `tamale.var('x')` (which can be aliased as `V'x'`, if you're into the whole brevity thing).

Variable names can be any string, though any beginning with `_` are ignored during matching (i.e., `{V"_", V"_", V"X", V"_" }` will capture the third value from any four-value array). Variable names are not required to be uppercase; it's just a convention from Prolog and Erlang.

Also, note that declaring local variables for frequently used Tamale variables can make rule tables cleaner. Compare

local X, Y, Z = V"X", V"Y", V"Z"
M = tamale.matcher {
{ {X, X}, 1}, -- capitalization helps to keep
{ {X, Y}, 2}, -- the Tamale vars distinct from
{ {X, Y, Z}, 3}, -- the Lua vars
}

with

M = tamale.matcher {
{ {V'X', V'X'}, 1},
{ {V'X', V'Y'}, 2},
{ {V'X', V'Y', V'Z'}, 3},
}

The _ example above could be reduced to `{_, _, X, _}`.

Finally, when the same variable appears in multiple fields in a rule pattern, such as `{ X, Y, X }`, each repeated field must structurally match its other occurrences. `{X, Y, X}` would match `{6, 1, 6}`, but not `{5, 1, 7}`.
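
A minimal sketch of how repeated variables constrain a match is given below. It is a toy unifier written for this explanation, not Tamale's `unify`: `var`, `deep_equal`, and the `is_var`/`name` fields are all assumptions, and it handles neither function patterns nor the `partial` flag.

```lua
-- Toy stand-ins, not Tamale's implementation.
local function var(name) return { is_var = true, name = name } end

-- Structural equality for plain values and nested tables.
local function deep_equal(a, b)
    if type(a) ~= "table" or type(b) ~= "table" then return a == b end
    for k, v in pairs(a) do
        if not deep_equal(v, b[k]) then return false end
    end
    for k in pairs(b) do
        if a[k] == nil then return false end
    end
    return true
end

-- Returns the captures table on success, nil on failure.
local function unify(pat, val, caps)
    if val == nil then return nil end                       -- missing field
    if type(pat) == "table" and pat.is_var then
        if pat.name:sub(1, 1) == "_" then return caps end   -- ignored variable
        local bound = caps[pat.name]
        if bound == nil then caps[pat.name] = val; return caps end
        return deep_equal(bound, val) and caps or nil       -- repeated variable
    elseif type(pat) == "table" then
        if type(val) ~= "table" then return nil end
        for k, v in pairs(pat) do
            if not unify(v, val[k], caps) then return nil end
        end
        for k in pairs(val) do
            if pat[k] == nil then return nil end            -- extra input field
        end
        return caps
    end
    return pat == val and caps or nil                       -- literal
end

local X, Y, _ = var"X", var"Y", var"_"
local caps = unify({X, Y, X}, {6, 1, 6}, {})
print(caps.X, caps.Y)                            -- prints 6 and 1
print(unify({X, Y, X}, {5, 1, 7}, {}))           --> nil
print(unify({_, _, X, _}, {1, 2, 3, 4}, {}).X)   --> 3
```

The first occurrence of a variable binds it; every later occurrence is compared against that binding, which is what rejects `{5, 1, 7}`.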


## The Rule Table

The function `tamale.matcher` takes a rule table and returns a matcher function. The matcher function takes one or more arguments; the first is matched against the rule table, and any further arguments are saved in `captures.args`.

The rule table also takes a couple of other options, which are described below.


## Identifiers - `ids={List, Of, IDs}`

Tamale defaults to structural comparison of tables, but sometimes tables are used as identifiers, e.g. `SENTINEL = {}`. The rule table can have an optional argument of `ids={LIST, OF, IDS}`, for values that should still be compared by `==` rather than structure. (Otherwise, *all* such IDs would match each other, and any empty table.)
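
The distinction between identity and structure can be seen in plain Lua. In this sketch the `ids` set and the `same` function are hypothetical, written only to show why sentinel tables need to be registered; Tamale's own comparison is not shown in this commit.

```lua
local SENTINEL, OTHER = {}, {}
-- set of tables that must be compared by identity (==)
local ids = { [SENTINEL] = true, [OTHER] = true }

-- Hypothetical comparison: identity for registered ids,
-- shallow structural comparison for everything else.
local function same(a, b)
    if ids[a] or ids[b] then return a == b end
    if type(a) ~= "table" or type(b) ~= "table" then return a == b end
    for k, v in pairs(a) do
        if b[k] ~= v then return false end
    end
    for k in pairs(b) do
        if a[k] == nil then return false end
    end
    return true
end

print(same(SENTINEL, SENTINEL))  --> true
print(same(SENTINEL, OTHER))     --> false (distinct ids)
print(same(SENTINEL, {}))        --> false (an id never equals a fresh table)
print(same({}, {}))              --> true  (unregistered tables compare by structure)
```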


## Indexing - `index=field`

Indexing in Tamale is like indexing in relational databases: rather than testing every single rule to find a match, only those in the index need to be tested. Often, this alone eliminates most of the rules. By default, the rules are indexed by the first value.

When the rule table

tamale.matcher {
{ {1, "a"}, 1 },
{ {1, "b"}, 2 },
{ {1, "c"}, 3 },
{ {2, "d"}, 4 },
}

is matched against `{2, "d"}`, it only needs one test if the rule table is indexed by the first field: the fourth rule is the only one starting with 2. To specify a different index than `pattern[1]`, give the rule table a keyword argument of `index=I`, where I is either another key (such as 2 or "tag"), or a function. If a function is used, each rule will be indexed by the result of applying the function to its pattern.

For example, with the rule table

tamale.matcher {
{ {"a", "b", 1}, 1 }, -- index "ab"
{ {"a", "c", 1}, 2 }, -- index "ac"
{ {"b", "a", 1}, 3 }, -- index "ba"
{ {"b", "c", 1}, 4 }, -- index "bc"
index=function(rule) return rule[1] .. rule[2] end
}

each rule will be indexed based on the first two fields concatenated, rather than just the first. An input value of `{"a", "c", 1}` would only need to check the second rule, not the first.

Indexing should never change the *results* of pattern matching, just make the matcher function do less searching. Note that an indexing function needs to be deterministic - indexing by (say) `os.time()` will produce weird results. An argument of `index=false` turns indexing off.
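
The idea can be sketched as a lookup table from index key to candidate rule ids. This is a simplified illustration under stated assumptions, not Tamale's `index_spec`: the names `key` and `index` are made up here, and the sketch only covers table patterns whose indexed fields are literals, whereas patterns that are not indexable (for example, ones with a variable in the indexed field) have to be added to every bucket.

```lua
local rules = {
    { {"a", "b", 1}, 1 },
    { {"a", "c", 1}, 2 },
    { {"b", "a", 1}, 3 },
    { {"b", "c", 1}, 4 },
}

-- indexing function: applied to rule patterns and to inputs alike
local function key(t) return t[1] .. t[2] end

-- build the index once: key -> list of rule ids
local index = {}
for id, rule in ipairs(rules) do
    local k = key(rule[1])
    index[k] = index[k] or {}
    table.insert(index[k], id)
end

-- at match time, only the rules in the input's bucket are candidates
local candidates = index[key({"a", "c", 1})]
print(table.concat(candidates, ", "))   --> 2
```

Since the same function computes the key for rules at build time and for inputs at match time, a non-deterministic function would send inputs to the wrong bucket.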


## Debugging - `debug=true`

Tamale has several debugging traces. They can be enabled either by setting `tamale.DEBUG` to true, or adding `debug=true` as a keyword argument to a rule table.

Matching `{ "a", "c", 1 }` against

tamale.matcher {
{ {"a", "b", 1}, 1 },
{ {"a", "c", 1}, 2 },
{ {"b", "a", 1}, 3 },
{ {"b", "c", 1}, 4 },
index=function(rule) return rule[1] .. rule[2] end,
debug = true
}

will print

* rule 1: indexing on index(t)=ab
* rule 2: indexing on index(t)=ac
* rule 3: indexing on index(t)=ba
* rule 4: indexing on index(t)=bc
-- Checking rules: 2
-- Trying rule 2...matched
2

This can be used to check whether indexing is effective, if one rule is pre-empting another, etc.
24 changes: 13 additions & 11 deletions tamale.lua
@@ -74,7 +74,7 @@ end
---Default hook for match failure.
-- @param val The unmatched value.
function match_fail(val)
return false, "Match failed", val
return nil, "Match failed", val
end
@@ -206,6 +206,8 @@ local function index_spec(spec)
-- field/value to index by, defaults to t[1].
local ispec, indexer = spec.index or 1
if type(ispec) == "function" then indexer = ispec
elseif ispec == "false" then
indexer = function() end --put everything in the same index
else
indexer = function(t) return t[ispec] end
end
@@ -215,37 +217,37 @@ local function index_spec(spec)
local pat, res = row[1], row[2]
local pt = type(pat)
if not indexable(pat) then --could match anything
if debug then trace(" * row %d: not indexable, adding to all", id) end
if debug then trace(" * rule %d: not indexable, adding to all", id) end
lni[#lni+1] = id; tni[#tni+1] = id --for those that don't yet exist
for _,l in ipairs{ls, ts} do --and append to those that do
for k in pairs(l) do append(l, k, id) end
end
elseif pt == "table" then
local v = indexer(pat) or NIL
if not indexable(v) then --goes in every index
if debug then trace(" * row %d: index(table) is not indexable", id) end
if debug then trace(" * rule %d: index(table) is not indexable", id) end
for k in pairs(ts) do append(ts, k, id) end
tni[#tni+1] = id
else
if debug then trace(" * row %d: indexing on index(t)=%s",
if debug then trace(" * rule %d: indexing on index(t)=%s",
id, tostring(v)) end
append(ts, v, id)
end
for i,v in ipairs(pat) do --check for special V"..." var
if is_var(v) and v.rest then
if debug then trace(" * row %d: V'...' found in field %d",
if debug then trace(" * rule %d: V'...' found in field %d",
id, i) end
row.partial = true; row.rest = i; break
end
end
else
if debug then trace(" * row %d: indexing on %s",
if debug then trace(" * rule %d: indexing on %s",
id, tostring(pat)) end
append(ls, pat, id)
end
if has_vars(res) then
if debug then trace(" * row %d: found var(s) in result", id) end
if debug then trace(" * rule %d: found var(s) in result", id) end
vrs[id] = true
end
end
@@ -300,7 +302,7 @@ function matcher(spec)
function (t, ...)
local rows = check_index(spec, t, idx)
if debug then
trace(" -- Checking rows: %s", concat(rows, ", "))
trace(" -- Checking rules: %s", concat(rows, ", "))
end
for _,id in ipairs(rows) do
@@ -311,14 +313,14 @@ function matcher(spec)
local u = unify(pat, t, { args=args }, ids, row)
if debug then
trace("-- Trying row %d...%s", id, u and "matched" or "failed")
trace(" -- Trying rule %d...%s", id, u and "matched" or "failed")
end
if u then
u.input = t --whole matched value
if when then
local ok, val = pcall(when, u)
if debug then trace("-- Running when(captures) check...%s",
if debug then trace(" -- Running when(captures) check...%s",
(ok and val) and "matched" or "failed")
end
if ok and val then
@@ -329,7 +331,7 @@ function matcher(spec)
end
end
end
if debug then trace("-- Failed") end
if debug then trace(" -- Failed") end
local fail = spec.fail or match_fail
return fail(t)
end