Seongmyun Cho edited this page May 18, 2017 · 9 revisions

Linux Kernel PCRE REGEX

This project provides a full-fledged PCRE (Perl Compatible Regular Expressions) library as a Linux kernel module.
It is derived from the well-known PCRE2 package (http://www.pcre.org).

Beyond the porting work itself, the original code had to be modified to make PCRE2 work inside the kernel, because it uses more stack than the kernel's small stack allows. The SLJIT code was also ported so that PCRE JIT support remains available inside the kernel. You can use this library module whenever you need PCRE functions while programming inside the Linux kernel. It provides all the PCRE2 functions along with JIT (just-in-time) compilation.

Based on the PCRE kernel module, two new text search engines were built; they can be used in iptables as well as elsewhere in the kernel. The first is the PCRE text search engine and the second is the REGEX text search engine.

For detailed instructions on how to build and install the kernel modules, refer to the installation guide.

After installing the PCRE LKM (libpcre2-x.ko) and the PCRE text search engine (ts_pcre.ko), you can filter packets using PCRE syntax as shown below. (The Netfilter string extension makes use of text search kernel modules.)

iptables -A FORWARD -p tcp --dport 80 -m string \
--string "/\/documents\/.+Host: www.xnsystems.com/si" --algo pcre -j DROP
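
The rule above relies on two of its options: s lets .+ run across the line break between the request line and the Host header, and i makes the host name comparison case-insensitive. The sketch below reproduces that behaviour in user space with Python's re module. This is only an approximation of PCRE2, not the kernel module itself, and the sample HTTP request is invented for illustration.

```python
import re

# Same regex body as in the iptables rule above (without the /.../si wrapper).
PATTERN = r"\/documents\/.+Host: www.xnsystems.com"

# Hypothetical HTTP request, made up for this illustration.
request = (
    "GET /documents/report.pdf HTTP/1.1\r\n"
    "Host: WWW.XNSYSTEMS.COM\r\n"
    "\r\n"
)

# re.DOTALL and re.IGNORECASE stand in for the PCRE options s and i.
with_si = re.search(PATTERN, request, re.DOTALL | re.IGNORECASE)
only_i = re.search(PATTERN, request, re.IGNORECASE)
only_s = re.search(PATTERN, request, re.DOTALL)

print("si:", with_si is not None)
print("i only:", only_i is not None)
print("s only:", only_s is not None)
```

With only i, the dot cannot cross the newline, so the Host header is never reached; with only s, the upper-case host name does not match.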

The regex pattern is given in the format:

/<regex>/options
where the accepted options are [A|E|G|i|m|s|x], as in Snort PCRE rules.

A: PCRE2_ANCHORED
If this option is set, the pattern is forced to be "anchored", that is, it is constrained to match only
at the first matching point in the string that is being searched.

E: PCRE2_DOLLAR_ENDONLY
If this option is set, a dollar metacharacter in the pattern matches only at the end of the subject string.
Without this option, a dollar also matches immediately before a newline at the end of the string
(but not before any other newlines). The PCRE2_DOLLAR_ENDONLY option is ignored if PCRE2_MULTILINE is set.

G: PCRE2_UNGREEDY
This option inverts the "greediness" of the quantifiers so that they are not greedy by default,
but become greedy if followed by "?". 

i: PCRE2_CASELESS
If this option is set, letters in the pattern match both upper and lower case letters in the subject.
It can be changed within a pattern by a (?i) option setting.

m: PCRE2_MULTILINE
By default, for the purposes of matching "start of line" and "end of line", PCRE2 treats the subject string as
consisting of a single line of characters, even if it actually contains newlines. The "start of line" metacharacter
(^) matches only at the start of the string, and the "end of line" metacharacter ($) matches only at the end of the
string, or before a terminating newline (except when PCRE2_DOLLAR_ENDONLY is set).
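If this option is set, ^ and $ also match immediately after and immediately before internal newlines
in the subject string, as well as at the very start and end.
Note, however, that unless PCRE2_DOTALL is set, the "any character" metacharacter (.) does not match at a newline.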

s: PCRE2_DOTALL
If this option is set, a dot metacharacter in the pattern matches any character, including one that indicates
a newline. However, it only ever matches one character, even if newlines are coded as CRLF.
Without this option, a dot does not match when the current position in the subject is at a newline.
This option is equivalent to Perl's /s option, and it can be changed within a pattern by a (?s) option setting.
A negative class such as [^a] always matches newline characters, independent of the setting of this option.

x: PCRE2_EXTENDED
If this option is set, most white space characters in the pattern are totally ignored except when escaped or
inside a character class. However, white space is not allowed within sequences such as (?> that introduce various
parenthesized subpatterns, nor within numerical quantifiers such as {1,3}.
Ignorable white space is permitted between an item and a following quantifier and between a quantifier and
a following + that indicates possessiveness.
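
The /<regex>/options format and the option table above can be sketched as a small user-space parser. This is a minimal illustration, not the kernel module's own parser: split_pattern is a hypothetical helper, and it assumes that the options follow the last slash and that slashes inside the regex are escaped.

```python
# Option letters and the PCRE2 option names listed above.
PCRE_OPTIONS = {
    "A": "PCRE2_ANCHORED",
    "E": "PCRE2_DOLLAR_ENDONLY",
    "G": "PCRE2_UNGREEDY",
    "i": "PCRE2_CASELESS",
    "m": "PCRE2_MULTILINE",
    "s": "PCRE2_DOTALL",
    "x": "PCRE2_EXTENDED",
}


def split_pattern(spec):
    """Split '/<regex>/options' into (regex, [option names]).

    Hypothetical helper for illustration only.
    """
    if not spec.startswith("/"):
        raise ValueError("pattern must start with '/'")
    # Assumption: the options follow the LAST slash in the string.
    end = spec.rfind("/")
    if end == 0:
        raise ValueError("pattern must end with '/options'")
    regex, letters = spec[1:end], spec[end + 1:]
    unknown = [c for c in letters if c not in PCRE_OPTIONS]
    if unknown:
        raise ValueError("unknown option(s): " + "".join(unknown))
    return regex, [PCRE_OPTIONS[c] for c in letters]


regex, options = split_pattern(r"/\/documents\/.+Host: www.xnsystems.com/si")
print(regex)
print(options)
```

An empty option string, as in /abc/, yields an empty option list, and any letter outside the table is rejected.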

There is also a POSIX REGEX text search engine, selected with --algo regex:

iptables -A FORWARD -p tcp --dport 80 -m string \
--string "/\/documents\/.+Host: www.xnsystems.com/si" --algo regex -j DROP

The regex pattern is given in the format:

/<regex>/options
where the accepted options are [N|G|f|p|i|m|s|x|1|2|3].

N: REG_NOSUB
Do not report position of matches.

G: REG_UNGREEDY
Invert the greediness of the quantifiers (see PCRE2_UNGREEDY above).

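f: REG_UTF
Treat the pattern and the subject as UTF-8 strings.

p: REG_UCP
Use Unicode properties when matching \d, \w, and similar character classes.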

i: REG_ICASE
Do not differentiate case.

m: REG_NEWLINE
Match-any-character operators don't match a newline.

s: REG_DOTALL
The dot metacharacter also matches a newline (see PCRE2_DOTALL above).

x: REG_EXTENDED
Use POSIX Extended Regular Expression syntax when interpreting regex.
If not set, POSIX Basic Regular Expression syntax is used.

1: REG_NOTBOL
The match-beginning-of-line operator always fails to match (but see the compilation flag REG_NEWLINE above).
This flag may be used when different portions of a string are passed to regexec() and the beginning of the string
should not be interpreted as the beginning of the line.

2: REG_NOTEOL
The match-end-of-line operator always fails to match (but see the compilation flag REG_NEWLINE above).

3: REG_NOTEMPTY
An empty string is not considered a valid match.
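
The REGEX options fall into two groups in the POSIX API: flags given when the pattern is compiled (regcomp) and flags given when it is matched (regexec). The sketch below classifies an option string accordingly. Placing REG_NOTEMPTY with the match-time flags follows PCRE2's POSIX wrapper and is an assumption about this module, as is the hypothetical helper classify_options; how the kernel module handles the flags internally is not shown here.

```python
# Flags applied when the pattern is compiled (regcomp-style).
COMPILE_FLAGS = {
    "N": "REG_NOSUB",
    "G": "REG_UNGREEDY",
    "f": "REG_UTF",
    "p": "REG_UCP",
    "i": "REG_ICASE",
    "m": "REG_NEWLINE",
    "s": "REG_DOTALL",
    "x": "REG_EXTENDED",
}
# Flags applied when the pattern is matched (regexec-style); assumed grouping.
EXEC_FLAGS = {
    "1": "REG_NOTBOL",
    "2": "REG_NOTEOL",
    "3": "REG_NOTEMPTY",
}


def classify_options(letters):
    """Return (compile flag names, match flag names) for an option string."""
    compile_flags, exec_flags = [], []
    for c in letters:
        if c in COMPILE_FLAGS:
            compile_flags.append(COMPILE_FLAGS[c])
        elif c in EXEC_FLAGS:
            exec_flags.append(EXEC_FLAGS[c])
        else:
            raise ValueError("unknown option: " + c)
    return compile_flags, exec_flags


print(classify_options("si1"))
```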

"My colleague and I have been using your kpcre implementation extensively and have been finding it incredibly well-designed and comprehensive. Many thanks for all your efforts."

- Michael, Verisign, Inc.
