The RE Package
The re package is a small, portable, lightweight, and quick regular expression library for Common Lisp. It is a non-recursive, backtracking VM. The syntax is similar to Lua-style pattern matching, with added support for additional regex features (see below). It's certainly not the fastest, but it is very easy to understand and extend.
It makes heavy use of the monadic parse combinator library for parsing the regular expressions. If you'd like to understand the parsing and compiling of regular expressions, I recommend reading up on that library as well.
To create a re object, you can either use the compile-re function or the #r dispatch macro.
CL-USER > (compile-re "%d+")
#<RE "%d+">
CL-USER > #r/%d+/
#<RE "%d+">
Both work equally well, but the dispatch macro will compile the pattern at read-time. The re class has a load form and so can be saved to a FASL file.
HINT: when using the read macro, use a backslash to escape the / and other characters that might mess with syntax coloring.
The with-re macro lets you use either strings or re objects in a body of code. If a string is passed as the pattern, then it will be compiled before the body is evaluated.
CL-USER > (with-re (re "%d+") re)
#<RE "%d+">
NOTE: All pattern matching functions use the with-re macro, and so the pattern argument can be either a string or a pre-compiled re object.
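Because every matching function accepts a pre-compiled object, a pattern that is used repeatedly can be compiled once and reused. The following is a minimal sketch, assuming the re package is already loaded and its exported symbols are accessible from the current package (as in the transcripts in this document). The names *number-re* and count-numbers are hypothetical and exist only for illustration.

```lisp
;; Assumes the re package is loaded and its symbols are accessible here.

;; Compile the pattern once, at load time, instead of on every call.
(defparameter *number-re* (compile-re "%d+")
  "Pre-compiled pattern matching a run of decimal digits.")

(defun count-numbers (lines)
  "Return how many strings in LINES contain at least one run of digits."
  (count-if (lambda (line) (find-re *number-re* line)) lines))

;; Two of these three strings contain digits.
(count-numbers '("abc 123" "no digits" "42"))
```

Passing the string "%d+" directly would also work, at the cost of compiling the pattern on each call.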
Basic Pattern Matching
The heart of all pattern matching is the match-re function.
(match-re pattern string &key start end exact)
It will match string against pattern and return a re-match object on success or nil on failure. The start and end arguments limit the scope of the match and default to the entire string. If exact is t, then the pattern has to consume the entire string (from start to end).
CL-USER > (match-re "%d+" "abc 123")
NIL
CL-USER > (match-re "%a+" "abc 123")
#<RE-MATCH "abc">
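The keyword arguments are easier to follow with a few calls side by side. This is a sketch of the behavior described above, not output captured from a session; it assumes the re package is loaded and its symbols are accessible from the current package.

```lisp
;; Assumes the re package is loaded and its symbols are accessible here.

;; match-re tests the pattern at the start position only; it does not scan.
(match-re "%d+" "abc 123")                  ; no match: the string starts with a letter
(match-re "%d+" "abc 123" :start 4)         ; matching begins at index 4, the "1"

;; :exact t additionally requires the match to reach the end of the range.
(match-re "%a+" "abc 123" :exact t)         ; fails: "abc" stops before the space
(match-re "%a+" "abc 123" :end 3 :exact t)  ; the range is now just "abc"
```

Restricting the range with start and end avoids having to copy a substring before matching.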
Once you have successfully matched and have a re-match object, you can use the following reader functions to inspect it:
match-string returns the entire match
match-groups returns a list of groups
match-pos-start returns the index where the match began
match-pos-end returns the index where the match ended
Try peeking into a match...
CL-USER > (inspect (match-re "(a(b(c)))" "abc 123"))
MATCH      "abc"
GROUPS     ("abc" "bc" "c")
START-POS  0
END-POS    3
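The same information is available programmatically through the reader functions. A small sketch, assuming the re package is loaded and its symbols are accessible from the current package:

```lisp
;; Assumes the re package is loaded and its symbols are accessible here.

;; Collect all four reader values from one match into a list.
(let ((m (match-re "(a(b(c)))" "abc 123")))
  (when m                       ; match-re returns nil on failure
    (list (match-string m)
          (match-groups m)
          (match-pos-start m)
          (match-pos-end m))))
;; Per the inspector output above: ("abc" ("abc" "bc" "c") 0 3)
```

Checking the match for nil before calling the readers keeps a failed match from signaling an error.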
To find a pattern match anywhere in a string use the find-re function.
(find-re pattern string &key start end all)
It will scan string looking for matches to pattern. If all is non-nil, then a list of all matches found is returned; otherwise it will simply be the first match.
CL-USER > (find-re "%d+" "abc 123")
#<RE-MATCH "123">
CL-USER > (find-re "[^%s]+" "abc 123" :all t)
(#<RE-MATCH "abc"> #<RE-MATCH "123">)
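Since find-re with all returns a plain list of match objects, the usual list functions apply to the result. A minimal sketch, assuming the re package is loaded and its symbols are accessible from the current package; extract-integers is a hypothetical helper, not part of the library:

```lisp
;; Assumes the re package is loaded and its symbols are accessible here.

(defun extract-integers (string)
  "Return every run of digits in STRING as a list of integers."
  (mapcar (lambda (m) (parse-integer (match-string m)))
          (find-re "%d+" string :all t)))

(extract-integers "abc 123, def 45")
```

When nothing matches, the list of matches is empty, and so is the result.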
Splitting by Pattern
Once patterns have been matched, splitting a string from the matches is trivial.
(split-re pattern string &key start end all coalesce-seps)
If all is true, then a list of all sub-sequences in string (delimited by pattern) is returned; otherwise just the first sub-sequence and the rest of the string are returned. If coalesce-seps is true, the sub-sequences that are empty will be excluded from the results. This argument is ignored if all is not true.
CL-USER > (split-re "," "1,2,3")
"1"
"2,3"
CL-USER > (split-re "," "1,2,,,abc,3,," :all t :coalesce-seps t)
("1" "2" "abc" "3")
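The two results printed by the first call above suggest that, without all, the pieces come back as multiple values rather than as a list. Under that assumption (and assuming the re package is loaded with its symbols accessible), they can be captured like this; parse-pair is a hypothetical helper for illustration:

```lisp
;; Assumes the re package is loaded and its symbols are accessible here.
;; Also assumes split-re returns two values when :all is not given.

(defun parse-pair (line)
  "Split LINE at the first \"=\" and return the two halves as a cons."
  (multiple-value-bind (key value) (split-re "=" line)
    (cons key value)))

(parse-pair "name=re")

;; With :all t the pieces are returned as a single list instead.
(split-re "%s+" "lisp in small pieces" :all t)
```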
Replacing by Pattern
The replace-re function scans the string looking for matching sub-sequences that will be replaced with another string.
(replace-re pattern with string &key start end all)
If with is a function, then the function is called with the re-match object, and the matched text is replaced with its return value. Otherwise the value is used as-is. As with find-re and split-re, if all is true, then the pattern is globally replaced.
CL-USER > (replace-re "%d+" #\* "1 2 3")
"* 2 3"
CL-USER > (replace-re "%a+" #'(lambda (m) (length (match-string m))) "a bc def" :all t)
"1 2 3"
NOTE: The string returned by replace-re is a completely new string. This is true even if pattern isn't found in the string.
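A function replacement can build its result from the matched text. The sketch below assumes the re package is loaded and its symbols are accessible from the current package; shout is a hypothetical helper used only for illustration.

```lisp
;; Assumes the re package is loaded and its symbols are accessible here.

(defun shout (text)
  "Return a copy of TEXT with every alphabetic word upcased."
  (replace-re "%a+"
              (lambda (m) (string-upcase (match-string m)))
              text
              :all t))

(shout "a bc def")

;; Per the note above, the result is a fresh string even when nothing
;; matches, so this identity test is expected to be false.
(let ((s (copy-seq "no digits")))
  (eq s (replace-re "%d+" "#" s)))
```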
Using parentheses in a pattern will cause the matching text to be captured as groups in the returned re-match object. The match-groups function will return a list of all the captured strings in the match.
CL-USER > (match-groups (match-re #r/(%d+)(%a+)/ "123abc"))
("123" "abc")
Captures can be nested, but are always returned in the order they are opened.
CL-USER > (match-groups (match-re #r/(a(b(c)))(d)/ "abcd"))
("abc" "bc" "c" "d")
HINT: you can always use the match-string function to get at the full text that was matched, so there's no need to capture the entire pattern.
The with-re-match macro can be used to assist in extracting the matched patterns and groups.
(with-re-match (var match-expr &key no-match) &body body)
If the result of match-expr is nil, then no-match is returned and body is not executed.
While in the body of the macro, $$ will be bound to the match-string and the groups will be bound to $1 through $9. Any groups beyond the first 9 are bound in a list to $_. The symbol $* is bound to all the match groups.
CL-USER > (with-re-match (m (match-re "(%a+)(%s+)(%d+)" "abc 123")) (string-append $3 $2 $1))
"123 abc"
CL-USER > (flet ((initial (m) (with-re-match (v m) (format nil "~@(~a~)." $1)))) (replace-re #r/(%a)%a+%s*/ #'initial "lisp in small pieces" :all t))
"L.I.S.P."
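The no-match keyword is handy when a failed match should produce a sentinel value instead of nil. A minimal sketch, assuming the re package is loaded and its symbols (including the $ bindings) are accessible from the current package; parse-setting and the :invalid marker are hypothetical names chosen for illustration:

```lisp
;; Assumes the re package is loaded and its symbols are accessible here.

(defun parse-setting (line)
  "Parse \"name=123\" into (name . 123); return :invalid if LINE does not match."
  (with-re-match (m (match-re "(%a+)=(%d+)" line :exact t) :no-match :invalid)
    ;; $1 and $2 are the first and second captured groups.
    (cons $1 (parse-integer $2))))

(parse-setting "width=42")   ; matches, so the body runs
(parse-setting "width")      ; no match, so the :no-match value is returned
```

Because the body never runs on a failed match, it can use the group bindings without checking for nil first.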
In addition to supporting all of what Lua pattern matching has to offer, it also supports branching with | and uncaptured groups: (?..). For example...
CL-USER > (match-re "(?a|b)+" "abbaaabbccc")
#<RE-MATCH "abbaaabb">
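An uncaptured group should not contribute to the group list, which makes the difference from a capturing group visible when the two are mixed. This sketch assumes the re package is loaded and its symbols are accessible from the current package, and it reflects the behavior described above rather than captured output.

```lisp
;; Assumes the re package is loaded and its symbols are accessible here.

;; (?a|b)+ groups the alternation for the + quantifier without capturing,
;; so only the parenthesized (c+) should appear in the group list.
(let ((m (match-re "(?a|b)+(c+)" "abbaaabbccc")))
  (when m
    (list (match-string m) (match-groups m))))
```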
The re package has one special feature: user-defined character set predicates! Using %:, you can provide a predicate function for the regexp VM to test characters against.
CL-USER > (match-re #r"%:digit-char-p:+" "103")
#<RE-MATCH "103">
The predicate must take a single character and return non-nil if the character matches. Note: this is especially handy when parsing Unicode strings!
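The predicate does not have to be a built-in function. The sketch below uses a user-defined one, assuming the re package is loaded, its symbols are accessible from the current package, and the name between the colons is resolved as a function name in the current package (how the name is looked up is an assumption here, not something stated above). vowel-p is a hypothetical predicate written for this example.

```lisp
;; Assumes the re package is loaded and its symbols are accessible here.

(defun vowel-p (c)
  "Return non-nil if the character C is an ASCII vowel."
  (find c "aeiouAEIOU"))

;; The pattern is given as a string, so it is compiled at run time,
;; after vowel-p has been defined.
(let ((m (match-re "%:vowel-p:+" "aeiou xyz")))
  (and m (match-string m)))
```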
If you get some good use out of this package, please let me know; it's nice to know your work is valued by others.
I'm always improving it; it's the foundation for many of the other packages I've created for JSON parsing, XML parsing, HTTP header parsing, etc.
Should you find/fix a bug or add a nice feature, please feel free to send a pull request or let me know at email@example.com.