lib_mysqludf_preg is a library of mysql UDFs (user-defined-functions) that provide access to the PCRE (perl compatible-regular-expressions) library for pattern matching. The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. This syntax can often handle more complex expressions and capturing than standard regular expression implementations. For more information about PCRE, please see: http://www.pcre.org/.
lib_mysqludf_preg currently provides the following functions:
PREG_RLIKE( pattern , subject ) - test whether subject matches pattern, which is a perl compatible regular expression.
PREG_CAPTURE(pattern, subject [, capture-group] [, occurence] ) - capture a named or numeric parenthesized subexpression from a pcre pattern. Capture from a specific match of the regex or the first match is occurence not specified.
PREG_CHECK( pattern ) - test whether the given pattern is a valid perl compatible regular expression.
PREG_POSITION(pattern, subject [, capture-group] [, occurence] ) - get the position in subject of a named or numeric parenthesized subexpression from a pcre pattern. Capture from a specific match of the regex or the first match if occurence not specified.
PREG_REPLACE(pattern, replacement, subject [ ,limit ] ) - perform a regular expression search and replace using a PCRE pattern.
LIB_MYSQLUDF_PREG_INFO() - obtain information about the currently installed version of lib_mysqludf_preg.
-
SELECT captured, description FROM (SELECT PREG_CAPTURE( '/(new)\\s+([a-zA-Z])(.)/i' , description, 2 ) as captured FROM state WHERE description LIKE 'new%') as t1 WHERE captured IS NOT NULL;
-
SELECT position, description FROM (SELECT PREG_POSITION( '/(new)\\s+([a-zA-Z])(.)/i' , description, 2 ) as position FROM state WHERE description LIKE 'new%') as t1 WHERE position IS NOT NULL;
-
SELECT * from products WHERE PREG_RLIKE( '/hemp/i' , products.title )
-
SELECT CONVERT( PREG_REPLACE( '/fox/i' , 'dog' , 'The brown fox' ) USING UTF8) as replaced;
Please see test/lib_udfmysql_preg.test and test/lib_udfmysql_preg.result for more examples.
Please see doc/html/index.html for more detailed documentation of the SQL functions.
Please see the file INSTALL or (doc/INSTALL.windows) for the full installation instructions.
The short instructions are:
./configure; make install; make installdb ; make test
Download from: http://www.goodhumans.com/Misc/libmysql_udf_preg-LATEST.tar.gz
Please send information regarding bugs and any other feedback to: raw@goodhumans.com
- pcre_study should be used (but isn't) for constant patterns;
- there is no localization or locale support
- there is no multi-byte character set support
- some program locations that should be set in autoconf are not
- It would be nice if there were a persistend cache of compiled regexes
- It would also be nice if there were a peresistent cache of regex matches. This would allow for a more efficient way of retrieving multiple matches than repeated called with different 'occurence' arguments.
These UDF's are useful in the following circumstances: - you already have pcre regex's that need to be applied in mysql - you need to use a more complex regex than is supported by RLIKE - you need to capture portions of a regex from mysql - you are looking for a slight performance improvement over RLIKE
For optimal performance, these (or any) UDF's should not be used:
- as a replacement for a prefixed LIKE or RLIKE (ie. LIKE 'foo%')
- as a replacement for MATCH .. AGAINST ... IN BOOLEAN MODE.
- on large databases without other query constraints. Often the PCRE (or
any function or UDF) can be used in conjunction with a fulltext index
constraint in order to reduce the number of rows the need to be operated on.
(ie. SELECT PREG_CAPTURE ... WHERE MATCH AGAINST)
-The 'occurence' argument to PREG_CAPTURE and PREG_POSITION was originally thought not to be needed, since the {} notation in the regex itself could be used. For instance, /.{2}(.)/ could be used to get the 3rd character of a string. This was found not to work for a large 'occurence'. (ie. /.{65536}(.)/)
Copyright (C) 2007-2009 Rich Waters raw@goodhumans.com.
This file and most contents of this package are licensed under the GNU LESSER GENERAL PUBLIC LICENSE. Please see the COPYING.lesser file in this directory for details.
The amazing PCRE was written by Philip Hazel, and I use some of his code from the php preg extension in from_php.c
The documentation for this project is generated using doxygen which is available at: http://www.stack.nl/~dimitri/doxygen/
I referenced the following projects while trying to put together this library:
http://udf-regexp.php-baustelle.de/trac/ - a UDF that implements Oracle-like REGEX functions - written by Hartmut Holzgraefe
http://www.xcdsql.org/MySQL/UDF/lib_mysqludf_xql/lib_mysqludf_xql.html - A UDF that enables creation of XML - written by Arthur Daniels
Much of the documentation was generated using Doxygen, at http://www.stack.nl/~dimitri/doxygen/ , which was written by Dimitri van Heesch.