Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
ICU Analysis plugin for Elasticsearch
Java Python

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
src
.gitignore
README.md
pom.xml

README.md

ICU Analysis for ElasticSearch

The ICU Analysis plugin integrates Lucene ICU module into elasticsearch, adding ICU relates analysis components.

In order to install the plugin, simply run: bin/plugin -install elasticsearch/elasticsearch-analysis-icu/1.5.0.

----------------------------------------
| ICU Analysis Plugin | ElasticSearch  |
----------------------------------------
| master              | 0.19 -> master |
----------------------------------------
| 1.5.0               | 0.19 -> master |
----------------------------------------
| 1.4.0               | 0.19 -> master |
----------------------------------------
| 1.3.0               | 0.19 -> master |
----------------------------------------
| 1.2.0               | 0.19 -> master |
----------------------------------------
| 1.1.0               | 0.18           |
----------------------------------------
| 1.0.0               | 0.18           |
----------------------------------------

ICU Normalization

Normalizes characters as explained "here":http://userguide.icu-project.org/transforms/normalization. It registers itself by default under @icu_normalizer@ or @icuNormalizer@ using the default settings. Allows for the name parameter to be provided which can include the following values: @nfc@, @nfkc@, and @nfkc_cf@. Here is a sample settings:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_normalizer"]
                }
            }
        }
    }
}

ICU Folding

Folding of unicode characters based on @UTR#30@. It registers itself under @icu_folding@ and @icuFolding@ names. Sample setting:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_folding"]
                }
            }
        }
    }
}

ICU Collation

Uses collation token filter. Allows to either specify the rules for collation (defined "here":http://www.icu-project.org/userguide/Collate_Customization.html) using the @rules@ parameter (can point to a location or expressed in the settings, location can be relative to config location), or using the @language@ parameter (further specialized by country and variant). By default registers under @icu_collation@ or @icuCollation@ and uses the default locale.

Here is a sample settings:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_collation"]
                }
            }
        }
    }
}

And here is a sample of custom collation:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["myCollator"]
                }
            },
            "filter" : {
                "myCollator" : {
                    "type" : "icu_collation",
                    "language" : "en"
                }
            }
        }
    }
}

ICU Tokenizer

Breaks text into words according to UAX #29: Unicode Text Segmentation ((http://www.unicode.org/reports/tr29/)).

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "icu_tokenizer",
                }
            }
        }
    }
}
Something went wrong with that request. Please try again.