
m4rcinkowski/sphinxsearch-wordforms-pl


sphinxsearch-wordforms-pl

A Polish wordforms file (dictionary) for use with Sphinx Search, which lacks an official Polish stemmer. The provided text file has already been stripped of duplicate wordforms.

Usage

Reference the dictionary file from an index definition in your Sphinx config (typically /etc/sphinxsearch/sphinx.conf):

wordforms = /path/to/pl_PL.UTF-8.txt
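Before pointing Sphinx at a wordforms file, it can be useful to confirm that no source form is mapped more than once. The sketch below is a minimal check in Python, assuming the file follows Sphinx's `source > destination` line format. The function name `find_duplicate_sources` is hypothetical, and the sample lines are illustrative only, not taken from pl_PL.UTF-8.txt.

```python
from collections import Counter


def find_duplicate_sources(lines):
    """Return source wordforms that appear on more than one mapping line."""
    sources = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and anything that is not a "source > destination" mapping.
        if not line or ">" not in line:
            continue
        source, _, _destination = line.partition(">")
        sources.append(source.strip())
    counts = Counter(sources)
    return sorted(word for word, n in counts.items() if n > 1)


# Illustrative sample only; these lines are not taken from pl_PL.UTF-8.txt.
sample = [
    "domy > dom",
    "domu > dom",
    "domy > dom",
]
print(find_duplicate_sources(sample))
```

To check a real file, pass an open file handle instead of the sample list, for example `open("/path/to/pl_PL.UTF-8.txt", encoding="utf-8")`.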

Besides adding the wordforms file to your Sphinx config, you must make sure that Polish letters are treated as valid keyword characters.

Polish charset_table from the official Sphinx wiki:

charset_table = 0..9, A..Z->a..z, a..z, U+0143->U+0144, U+0104->U+0105, U+0106->U+0107, U+0118->U+0119, U+0141->U+0142, U+00D3->U+00F3, U+015A->U+015B, U+0179->U+017A, U+017B->U+017C, U+0105, U+0107, U+0119, U+0142, U+00F3, U+015B, U+017A, U+017C, U+0144
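The code points in that line are the nine Polish diacritic letters: each uppercase letter is folded to its lowercase form, and each lowercase letter is listed as a valid character. As a rough sketch of how such a line could be derived or double-checked, the Python snippet below rebuilds it from the letters themselves. The function name `polish_charset_table` is hypothetical, and the entry order differs from the line above while the set of entries is the same.

```python
def polish_charset_table():
    """Build a charset_table value covering digits, ASCII letters, and Polish diacritics."""
    lowercase = "ąćęłńóśźż"
    parts = ["0..9", "A..Z->a..z", "a..z"]
    # Fold each uppercase Polish letter to its lowercase form, e.g. U+0104->U+0105.
    for ch in lowercase:
        parts.append(f"U+{ord(ch.upper()):04X}->U+{ord(ch):04X}")
    # List the lowercase letters themselves as valid keyword characters.
    for ch in lowercase:
        parts.append(f"U+{ord(ch):04X}")
    return ", ".join(parts)


print("charset_table = " + polish_charset_table())
```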

