Arabic Analysis Plugin for Elasticsearch

This plugin adds Arabic analysis to Elasticsearch. It uses lucene-arabic-analyzer to extract the roots of Arabic tokens.

Features

Normalizes input text by removing diacritics and Hamza-like characters.
Extracts word roots.
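
To give a feel for what the normalization step means, here is a minimal Python sketch. It is not the plugin's code: the function name `normalize_arabic` and the `HAMZA_FOLDS` table are illustrative, and treating "Hamza-like characters" as Alef variants folded onto bare Alef is an assumption. The actual rules live in lucene-arabic-analyzer and may differ.

```python
import unicodedata

# Assumption: Alef forms carrying a Hamza or Madda are folded onto bare Alef.
# The plugin's real rule set may be different.
HAMZA_FOLDS = str.maketrans({
    "\u0623": "\u0627",  # Alef with Hamza above -> Alef
    "\u0625": "\u0627",  # Alef with Hamza below -> Alef
    "\u0622": "\u0627",  # Alef with Madda above -> Alef
})


def normalize_arabic(text: str) -> str:
    """Strip combining marks (the Arabic diacritics) and fold Alef variants."""
    # Arabic diacritics such as fatha, kasra, sukun and shadda are
    # nonspacing marks, Unicode general category "Mn".
    stripped = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    return stripped.translate(HAMZA_FOLDS)


if __name__ == "__main__":
    print(normalize_arabic("اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ"))
```

Root extraction then runs on the normalized tokens; that step is considerably more involved and is not sketched here.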

Components

arabic-root analyzer

Example

GET _analyze
{
  "analyzer": "arabic-root",
  "text": "اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ"
}

// Result:
['هدن','هدي','صرط','قوم']

Configuration

This plugin comes preconfigured with built-in normalization, stop words, and a stemmer derived from lucene-arabic-analyzer.
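
Since no extra configuration is needed, using the analyzer on your own data should come down to naming it in a field mapping. The Python sketch below only builds and prints such a request body. The index name `arabic-docs` and the field name `content` are hypothetical, and it assumes `arabic-root` can be referenced from a mapping the same way it is referenced in `_analyze`.

```python
import json

# Hypothetical index name, for illustration only.
INDEX_NAME = "arabic-docs"

# Standard Elasticsearch mapping shape: a text field whose analyzer is set
# by name. "content" is a made-up field name.
index_body = {
    "mappings": {
        "properties": {
            "content": {"type": "text", "analyzer": "arabic-root"}
        }
    }
}

# Print the request in the form you would paste into Dev Tools.
print(f"PUT {INDEX_NAME}")
print(json.dumps(index_body, indent=2))
```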

Plugin in action

Build the plugin:

mvn clean package

Run Elasticsearch and install the plugin inside a Docker container:

docker compose up

Open http://localhost:5601/ and log in with the elastic/elastic credentials.
Go to Dev Tools and try the analyzer:

GET _analyze
{
  "text": "اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ",
  "analyzer": "arabic-root"
}
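
If you would rather call the analyzer from a script than from Dev Tools, the same request can be sent with Python's standard library. This is a sketch under assumptions: it presumes the Docker setup exposes Elasticsearch over plain HTTP at http://localhost:9200 and accepts the same elastic/elastic credentials, neither of which is confirmed here. The helper names `build_analyze_request` and `analyze` are illustrative.

```python
import base64
import json
import urllib.request


def build_analyze_request(text, analyzer="arabic-root",
                          base_url="http://localhost:9200",  # assumed address
                          user="elastic", password="elastic"):
    """Build a POST request for the _analyze API with HTTP basic auth."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/_analyze",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def analyze(text, **kwargs):
    """Send the request and return only the token strings from the response."""
    with urllib.request.urlopen(build_analyze_request(text, **kwargs)) as resp:
        payload = json.load(resp)
    # The _analyze response has the shape {"tokens": [{"token": ...}, ...]}.
    return [t["token"] for t in payload["tokens"]]


if __name__ == "__main__":
    # Requires the containers from `docker compose up` to be running.
    print(analyze("اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ"))
```

Adjust `base_url` if your docker-compose.yaml maps a different port or enables TLS.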
