Damerau-Levenshtein Distance UDF for MySQL - Supports upper bounding for fast searching and UTF-8 case insensitive throught iconv.
Switch branches/tags
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


Damerau-Levenshtein Distance UDF for MySQL
Supports upper bounding for fast searching and UTF-8 case
insensitive throught iconv.

Copyright (C) 2013 Diego Torres <diego dot torres at gmail dot com>



Redistribute as you wish, but leave this information intact.


    Run make.    It will try to autodetect architecture and generate 32 or x86_64 bit
    version. Enable testing with "make debug". An executable instead of
    a library will be built. Execute it to run some tests regarding
    memory allocation and levenshtein core calculations.

    Errors you may get when compiling:

In file included from ./src/damlevlim.c:16:0:
./src/damlevlim.h:26:23: fatal error: my_global.h: No such file or directory
#include <my_global.h>
compilation terminated.
Makefile:30: recipe for target 'mysqldamlevlim' failed
make: *** [mysqldamlevlim] Error 1

    Development libraries from mysql are missing.
    Do "sudo apt-get install libmysqlclient-dev"

/usr/include/gnu/stubs.h:7:27: fatal error: gnu/stubs-32.h: No such file or directory

    You're missing the 32 bit libc dev package. This is answered here:


    - On Ubuntu it's called libc6-dev-i386 - do "sudo apt-get install"
    libc6-dev-i386. See below for extra instructions for Ubuntu 12.04.
    - On Red Hat distros, the package name is glibc-devel.i686 (Thanks to
    David Gardner's comment)
    - On CentOS 5.8, the package name is glibc-devel.i386 (Thanks to JimKleck's comment)
    - On CentOS 6.3, the package name is glibc-devel.i686.
    - On SLES it's called glibc-devel-32bit - do zypper in glibc-devel-32bit

Loading UDFs in MySQL STEP 1

    Copy compiled udf to plugin dir. To know where exactly, ask MySQL:

    mysql> SHOW VARIABLES LIKE 'plugin_dir';

    and use result directory.

    To modify plugin_dir to point to a custom location:

    my.cnf: plugin_dir = /usr/lib/mysql/plugin

    Don't forget to restart MySQL!!

Loading UDFs in MySQL STEP 2

    Bind the library to an UDF inside MySQL (only do it once,
    survives MySQL reboot). Depending your MySQL binaries (32
    or 64 bits), insert the right library.

    mysql> DROP FUNCTION IF EXISTS damlevlim;

    mysql> CREATE FUNCTION damlevlim RETURNS INT SONAME 'mysqldamlevlim.so';

    Check if UDF was loaded OK.

    mysql> SELECT * FROM mysql.func;
    | name      | ret | dl                | type     |
    | damlevlim |   2 | mysqldamlevlim.so | function |
    1 row in set (0.00 sec)

Using damlevlim

    mysql> select damlevlim();
    ERROR: DAMLEVLIM() requires three arguments (string, string, int)

    First string is compared against second string. Compare only the first
    int chars (upper bound).

    mysql> select damlevlim('coche', 'çóche', 5);
    | damlevlim('coche', 'çòche', 5) |
    |                              0 |
    1 row in set (0.00 sec)

Further reading about MySql, UTF8 and charset encoding


How to search for multibyte characters in a column

    Use MySql CHAR_LENGTH function to find rows with multi-byte characters:

    mysql> SELECT name FROM users WHERE LENGTH(name) != CHAR_LENGTH(name);

How to search for strange multibyte characters in a column

    (don't remember source of original tip)
    Use MySql HEX and PHP bin2hex functions

    mysql> SELECT name, HEX(name) FROM users;

    Get the result back into PHP, and run a bin2hex on the string, compare
    it to MySQL’s hex of that same string.