msarhan/elasticsearch-analysis-arabic-plugin

Arabic Analysis Plugin for Elasticsearch

An Arabic analysis plugin for Elasticsearch. It uses lucene-arabic-analyzer to extract the roots of Arabic tokens.

Features

  • Normalizes input text by removing diacritics and Hamza-like characters.
  • Extracts word roots.
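This README does not list the exact normalization rules. The following is a rough illustration only. It assumes the rules resemble a subset of Lucene's ArabicNormalizer: the harakat (U+064B to U+0652) and the tatweel (U+0640) are dropped, and an alef carrying a hamza or madda (أ, إ, آ) is folded to a bare alef. The plugin's actual rules may differ.

```python
import re

# Assumption: "diacritics" = the Arabic harakat U+064B..U+0652
# (fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun)
# plus the tatweel U+0640 used to stretch words.
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Assumption: alef with a hamza or madda is folded to a bare alef,
# similar to what Lucene's ArabicNormalizer does.
_ALEF_VARIANTS = str.maketrans({
    "\u0622": "\u0627",  # alef with madda above -> alef
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
})


def normalize(text: str) -> str:
    """Strip harakat and tatweel, then fold hamza/madda alefs to a bare alef."""
    return _DIACRITICS.sub("", text).translate(_ALEF_VARIANTS)
```

For example, `normalize("اهْدِنَا")` returns `"اهدنا"`. Characters outside these sets pass through unchanged.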

Components

  • arabic-root analyzer

Example

GET _analyze
{
  "analyzer": "arabic-root",
  "text": "اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ"
}

// Result:
['هدن','هدي','صرط','قوم']

Configuration

The plugin comes preconfigured with built-in normalization, a stop-word list, and a stemmer derived from lucene-arabic-analyzer.
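The preconfigured analyzer chains three stages: normalization, stop-word removal, and stemming. The sketch below shows one way to order them, with several assumptions. The normalizer and stemmer are passed in as plain callables. Whitespace splitting stands in for the real tokenizer. The stop-word sample and the `analyze` function are hypothetical, since the plugin's built-in list is not shown here. The sketch also makes one design point explicit. Stop words pass through the same normalizer as the input, so a stop word written with hamza (إلى) still matches its normalized token (الى).

```python
from typing import Callable, Iterable, List

# Hypothetical sample of stop words; the plugin's built-in list is not shown in this README.
SAMPLE_STOP_WORDS = frozenset({"في", "من", "على", "إلى", "عن"})


def analyze(text: str,
            normalize: Callable[[str], str],
            stem: Callable[[str], str],
            stop_words: Iterable[str] = SAMPLE_STOP_WORDS) -> List[str]:
    """Normalize, split, drop stop words, then stem, in that order."""
    # Normalize the stop words with the same function as the input so that
    # a stop word spelled with hamza still matches the normalized token.
    stops = {normalize(w) for w in stop_words}
    # Whitespace splitting is a stand-in for the plugin's real tokenizer (assumption).
    tokens = normalize(text).split()
    return [stem(t) for t in tokens if t not in stops]
```

Any `str -> str` functions can be plugged in. In the plugin itself, these roles are filled by the normalizer and root stemmer derived from lucene-arabic-analyzer.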

Plugin in action

  1. Build the plugin:
mvn clean package
  2. Run Elasticsearch in a Docker container and install the plugin:
docker compose up
  3. Open http://localhost:5601/ (Kibana) and log in with the elastic/elastic credentials.
  4. Open Dev Tools and try the plugin:
GET _analyze
{
  "text": "اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ",
  "analyzer": "arabic-root"
}
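You can also send the same request from a script. Below is a minimal sketch that uses only the Python standard library. It assumes the docker compose setup exposes Elasticsearch over plain HTTP at localhost:9200 and accepts the same elastic/elastic credentials used for Kibana. The compose file is not shown here, so check the port and whether TLS is enabled. The function names are hypothetical. The `tokens` array read by `extract_tokens` is the response shape of Elasticsearch's `_analyze` API.

```python
import base64
import json
import urllib.request

# Assumption: Elasticsearch listens on plain HTTP at this address.
ES_URL = "http://localhost:9200"


def build_analyze_request(text, analyzer="arabic-root", base_url=ES_URL,
                          user="elastic", password="elastic"):
    """Build (but do not send) a POST /_analyze request with basic auth."""
    body = json.dumps({"analyzer": analyzer, "text": text},
                      ensure_ascii=False).encode("utf-8")
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {credentials}"},
        method="POST",
    )


def extract_tokens(response):
    """_analyze returns {"tokens": [{"token": ..., "start_offset": ..., ...}, ...]}."""
    return [t["token"] for t in response.get("tokens", [])]


def analyze_remote(text, **kwargs):
    """Send the request and return just the token strings."""
    with urllib.request.urlopen(build_analyze_request(text, **kwargs)) as resp:
        return extract_tokens(json.load(resp))
```

With the container running, `analyze_remote("اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ")` should return the same roots as the Dev Tools request above.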
