hablullah/go-lafzi


Go-Lafzi

Go-Lafzi is a Go package for searching Arabic text using its transliteration (phonetic search). It is loosely based on research by Istiadi (2012) and several related papers.

It uses indexed trigrams for approximate string matching, and ranks the search results by longest common subsequence, computed with the Myers diff algorithm. The indexes are stored in an SQLite database, which brings several advantages:

  • Indexing and lookup are pretty fast: around 3 seconds to index the entire Al-Quran and 90 ms per lookup. For more detail, check out the code in sample/quran.
  • It can be safely used concurrently.
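
To make the two ideas above concrete, here is a minimal, self-contained sketch of trigram extraction and subsequence-based scoring. It is not the package's actual code: the function names (trigrams, lcsLength, score) are hypothetical, the scoring formula is an assumption for illustration, and the longest common subsequence is computed with plain dynamic programming rather than the Myers diff algorithm.

```go
package main

import "fmt"

// trigrams returns every run of three consecutive runes in s.
// Hypothetical helper: the package's real tokenizer may differ.
func trigrams(s string) []string {
	r := []rune(s)
	if len(r) < 3 {
		return nil
	}
	out := make([]string, 0, len(r)-2)
	for i := 0; i+3 <= len(r); i++ {
		out = append(out, string(r[i:i+3]))
	}
	return out
}

// lcsLength returns the length of the longest common subsequence of a and b.
// It uses row-by-row dynamic programming, not the Myers algorithm.
func lcsLength(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for i := 1; i <= len(ra); i++ {
		for j := 1; j <= len(rb); j++ {
			switch {
			case ra[i-1] == rb[j-1]:
				curr[j] = prev[j-1] + 1
			case prev[j] >= curr[j-1]:
				curr[j] = prev[j]
			default:
				curr[j] = curr[j-1]
			}
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// score is an assumed ranking formula: the share of the query's runes
// that appear, in order, in the candidate.
func score(query, candidate string) float64 {
	n := len([]rune(query))
	if n == 0 {
		return 0
	}
	return float64(lcsLength(query, candidate)) / float64(n)
}

func main() {
	query := "rahman"
	fmt.Println(trigrams(query))

	// Candidates are made-up transliterations used only for this sketch.
	for _, c := range []string{"arrahmanirrahim", "rahim", "maliki"} {
		fmt.Printf("%-16s %.2f\n", c, score(query, c))
	}
}
```

In a setup like this, the trigrams would be used to narrow down candidate documents cheaply, and the subsequence score would only be computed for those candidates.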

Usage

For example, suppose we want to find the word "rahman" within surah Al-Fatiha:

package main

import (
	"encoding/json"
	"fmt"

	"github.com/hablullah/go-lafzi"
)

var arabicTexts = []string{
	"بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ",
	"الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ",
	"الرَّحْمَـٰنِ الرَّحِيمِ",
	"مَالِكِ يَوْمِ الدِّينِ",
	"إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ",
	"اهْدِنَا الصِّرَاطَ الْمُسْتَقِيمَ",
	"صِرَاطَ الَّذِينَ أَنْعَمْتَ عَلَيْهِمْ غَيْرِ الْمَغْضُوبِ عَلَيْهِمْ وَلَا الضَّالِّينَ",
}

func main() {
	// Open storage
	storage, err := lafzi.OpenStorage("sample.lafzi")
	checkError(err)

	// Prepare documents
	var docs []lafzi.Document
	for i, arabicText := range arabicTexts {
		docs = append(docs, lafzi.Document{
			ID:     i + 1,
			Arabic: arabicText,
		})
	}

	// Save documents to storage
	err = storage.AddDocuments(docs...)
	checkError(err)

	// Search in storage
	results, err := storage.Search("rahman")
	checkError(err)

	// Print search result
	bt, err := json.MarshalIndent(results, "", "  ")
	checkError(err)
	fmt.Println(string(bt))
}

func checkError(err error) {
	if err != nil {
		panic(err)
	}
}

This gives us the following results:

[
  {
    "DocumentID": 1,
    "Confidence": 1
  },
  {
    "DocumentID": 3,
    "Confidence": 1
  }
]
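
The results carry only document IDs, so the caller has to map them back to the original text. The sketch below shows one way to do that, assuming IDs were assigned as index + 1 as in the example. The searchResult struct and lookupTexts function are hypothetical: they mirror the JSON fields shown above and are not the package's own types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchResult mirrors the JSON fields printed above.
// It is a local stand-in, not a type exported by go-lafzi.
type searchResult struct {
	DocumentID int
	Confidence float64
}

// lookupTexts decodes the JSON results and returns the matching texts,
// assuming each document ID equals its slice index plus one.
func lookupTexts(rawJSON []byte, texts []string) ([]string, error) {
	var results []searchResult
	if err := json.Unmarshal(rawJSON, &results); err != nil {
		return nil, err
	}

	var matched []string
	for _, r := range results {
		idx := r.DocumentID - 1
		if idx < 0 || idx >= len(texts) {
			continue // skip IDs that do not map to a known text
		}
		matched = append(matched, texts[idx])
	}
	return matched, nil
}

func main() {
	texts := []string{
		"بِسْمِ اللَّهِ الرَّحْمَـٰنِ الرَّحِيمِ",
		"الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ",
		"الرَّحْمَـٰنِ الرَّحِيمِ",
	}
	raw := []byte(`[{"DocumentID": 1, "Confidence": 1}, {"DocumentID": 3, "Confidence": 1}]`)

	matched, err := lookupTexts(raw, texts)
	if err != nil {
		panic(err)
	}
	for _, m := range matched {
		fmt.Println(m)
	}
}
```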

Resources

All resources mentioned here are also available in the doc folder. This guards against the case where a university closes public access to its research. For example, the paper by Istiadi was publicly available back in 2014, but as of 2022 it can only be downloaded by members of its university.

By the way, the algorithm implemented in this package is not exactly the same as in these papers. I also ignored some papers, e.g. those on cross-verse search in the Qur'an, which I believe are not really useful for general Arabic texts. I've also changed many parts to make the implementation easier and to improve performance in testing.

  • Istiadi, Muhammad Abrar. "Sistem pencarian ayat al-quran berbasis kemiripan fonetis." (2012). (PDF, link)
  • Zafran, Aidil, Moch Arif Bijaksana, and Kemas M. Lhaksmana. "Truncated query of phonetic search for al qur’an." 2019 7th International Conference on Information and Communication Technology (ICoICT). IEEE, 2019. (PDF, link)
  • Rifaldi, Eki, Moch Arif Bijaksana, and Kemas Muslim Lhaksmana. "Sistem Pencarian Lintas Ayat Al-Qur'an Berdasarkan Kesamaan Fonetis." Indonesia Journal on Computing (Indo-JC) 4.2 (2019): 177-188. (PDF, link)
  • Rasyad, Naufal, Moch Arif Bijaksana, and Kemas Muslim Lhaksmana. "Pencarian Potongan Ayat Al-Qur'an dengan Perbedaan Bunyi pada Tanda Berhenti Berdasarkan Kemiripan Fonetis." Jurnal Linguistik Komputasional 2.2 (2019): 56-61. (PDF, link)
  • Satriady, Wildhan, Moch Arif Bijaksana, and Kemas M. Lhaksmana. "Quranic Latin Query Correction as a Search Suggestion." Procedia Computer Science 157 (2019): 183-190. (PDF, link)
  • Octavia, Agni, Moch Arif Bijaksana, and Kemas Muslim Lhaksmana. "Verse Search System for Sound Differences in the Qur’an Based on the Text of Phonetic Similarities." Jurnal Sisfokom (Sistem Informasi dan Komputer) 9.3 (2020): 317-322. (PDF, link)
  • Fitriani, Intan Khairunnisa, Moch Arif Bijaksana, and Kemas Muslim Lhaksmana. "Qur’an Search System for Handling Cross Verse Based on Phonetic Similarity." Jurnal Sisfokom (Sistem Informasi dan Komputer) 10.1 (2021): 46-51. (PDF, link)
  • Purwita, Naila Iffah, et al. "Typo handling in searching of Quran verse based on phonetic similarities." Register: Jurnal Ilmiah Teknologi Sistem Informasi 6.2 (2020): 130-140. (PDF, link)
  • Cendikia, Putri, Moch Arif Bijaksana, and Kemas M. Lhaksmana. "Pencarian Ayat Al-Qur'an Yang Tidak Utuh Berdasarkan Kemiripan Fonetis." eProceedings of Engineering 7.2 (2020). (PDF, link)
  • Elder, Robert. "Myers Diff Algorithm - Code & Interactive Visualization." (2017) (archive, link)

License

Go-Lafzi is distributed under the MIT license.
