PostgreSQL dictionary to exclude words matching regexps
.gitignore
LICENSE
Makefile
README.md
dict_exclude--1.0.sql
dict_exclude--unpackaged--1.0.sql
dict_exclude.c
dict_exclude.control


Overview

dict_exclude is a PostgreSQL extension that provides a text search dictionary for treating words as stop words when they match regular expressions.

At the time of writing it appears to be a decent solution for this goal. Note, however, that it does not use the built-in stop word facility in text search, which effectively works from a sorted list of words.

The code is based largely on the excellent dict_int example in postgresql/contrib.

Installation & Usage Example

git clone git@github.com:no0p/dict_exclude.git
cd dict_exclude
make
sudo make install

Next, create a list of exclusion rules by adding a file named exclude.rules to the PostgreSQL text search data directory. For example, with PostgreSQL 9.4 on Ubuntu you might add the file /usr/share/postgresql/9.4/tsearch_data/exclude.rules.
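
The directory varies by platform and version. As a rough sketch, `pg_config --sharedir` prints the share directory of an installation, and tsearch_data sits beneath it. This assumes pg_config is on your PATH and belongs to the server you are targeting. The helper name rules_path is hypothetical and not part of the extension:

```shell
# Hypothetical helper: build the rules file path from a share directory.
rules_path() {
  # $1: share directory, e.g. the output of `pg_config --sharedir`
  printf '%s/tsearch_data/exclude.rules\n' "$1"
}

# With a real installation (requires pg_config on PATH):
#   rules_path "$(pg_config --sharedir)"

# With the Ubuntu / PostgreSQL 9.4 share directory from the example above:
rules_path /usr/share/postgresql/9.4
```

Writing to that location typically requires root privileges.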

The contents of the file are regular expressions, one per line:

abc
def
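
Before installing a rules file, it can help to preview which words a set of patterns would catch. The sketch below does this with grep against a scratch copy of the rules. It assumes that unanchored POSIX extended regex matching (grep -E) approximates how the extension applies its rules, which this README does not confirm, so treat the result as a rough check only:

```shell
# Scratch copy of the rules; the real file belongs in tsearch_data.
rules_file="$(mktemp)"
printf '%s\n' 'abc' 'def' > "$rules_file"

# Feed sample words one per line and keep those matching any rule.
# Assumption: grep -E semantics stand in for the extension's matching.
excluded="$(printf '%s\n' fat abc cat def abd | grep -E -f "$rules_file")"
printf '%s\n' "$excluded"

rm -f "$rules_file"
```

With these two rules, only abc and def are reported, which agrees with the query output shown later in this README.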

Then, in psql, load the extension and create the appropriate text search configuration.

create extension dict_exclude;
create text search configuration ocr_gibberish ( COPY = pg_catalog.english );
alter text search configuration ocr_gibberish
  alter mapping for asciihword, asciiword
    with dict_exclude, english_stem;

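Note that dict_exclude must be the first entry in the WITH clause. Dictionaries are consulted in order, and english_stem recognizes every token, so a dictionary listed after it is never reached.

Queries using this configuration now drop the words that match the rules: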

SELECT to_tsvector('ocr_gibberish', 'fat abc cat def abd');
       to_tsvector       
-------------------------
 'abd':5 'cat':3 'fat':1
(1 row)

Note of Caution

This is currently a prototype, so it is recommended for development environments only.