QuranAnalysis (QA) Project
The goal of this project is to build a Semantic Search and Intelligence System for the Quran that lets general users and scholars search the Quran semantically, analyse all aspects of the text, and find hidden patterns and associations using state-of-the-art visualization techniques.
QA started as an MSc project at the University of Leeds in 2015, supervised by Eric Atwell. The project aimed to glue together and build on previous research done at the university, providing an open-source framework for Quran analysis work that can facilitate research, boost applications and foster innovation in that area.
More information can be found in my thesis below.
Feel free to use QA in your research or applications. Contributions are welcome.
How can you benefit from QA?
If you are a developer or researcher, QA offers more than 20,000 lines of code that you can benefit from in several ways:
- Access to language corpus and resources (including QA Ontology)
- Reusable code and algorithms
- Libraries for various tasks around Arabic Language handling, Data Model manipulation, NLP, Semantics, and Quran specifics
- QA Ontology extraction module
- Search Engine, Question Answering and Semantic Application Implementation
- Novel D3 visualization techniques
- Speed up prototyping by reusing QA user interface
QA Unique Resources
- QA Ontology: OWL file including rich concepts, relations and metadata from the Quran
- Stopwords Lists: Quranic stop words lists
- Simple to Uthmani Mapping File: A file containing a one-to-one mapping between simple and Uthmani words from the Quran
- Qurana to QAC Segment Mapping File: QAC and Qurana have different segment counts. The file includes a one-to-one mapping between QAC and Qurana segment numbers
- Longest Common Substrings in the Quran: A file containing all common substrings in the Quran, extracted using an LCS algorithm
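To illustrate the kind of computation behind the Longest Common Substrings resource, here is a minimal Python sketch of the classic dynamic-programming longest common substring algorithm applied to two token sequences. It assumes that "LCS" here means longest common substring, as the resource name suggests. The function name `longest_common_run` and the sample inputs are hypothetical and not taken from the QA codebase, whose actual extraction code may work differently.

```python
from typing import List, Sequence


def longest_common_run(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return the longest contiguous run of items shared by a and b.

    Hypothetical helper for illustration only; not part of QA.
    On ties, the run that ends earliest in `a` is returned.
    """
    best_len = 0  # length of the best run found so far
    best_end = 0  # exclusive end index of that run in `a`
    # prev[j] = length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous pair of items.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return list(a[best_end - best_len:best_end])


# Illustrative inputs (made-up English tokens, not real corpus data).
verse_a = "in the name of god".split()
verse_b = "by the name of truth".split()
shared = longest_common_run(verse_a, verse_b)
```

Because the function only compares items for equality, it works on lists of words as well as on plain strings (character by character). Finding all common substrings across a whole corpus would need this applied pairwise or a more scalable structure such as a suffix array, which this sketch does not attempt.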
External Resources used in QA
- Tanzil Project - Quran Text: Authentic Simple/Uthmani text of the Quran
- Tanzil Project - Quran Translation: English translation corpus of the Quran
- Tanzil Project - Quran Transliteration: English transliteration corpus of the Quran
- Quranic Arabic Corpus: PoS tagged corpus of the Quran with morphological annotations
- Quranic Arabic Corpus Word-by-Word: Word by word Arabic-English translation corpus of the Quran (Edited & not up to date)
- Qurana: Corpus of the Quran annotated with Pronominal Anaphora
- Wordnet: English dictionary and thesaurus corpus
- DBPedia: semantic structured data extracted from Wikipedia
- OWLLib PHP Library
- Microsoft Translator API
- PHPir PoS Tagging Library
- Brown corpus lexicon for English PoS Tagging
- English stop-words project
- OpenOffice ar.dic file
- Arabic stopwords list from Ar-PHP project
- TextMiningTheQuran stop-words list
- Limited number of basic English and Arabic stopwords taken from
A list of functions can be found here
I am sorry that some parts of the code are messy, undocumented and not well designed. This is due to a shortage of time, stress, and the novelty of the project, which led me to focus more on research, coding and thesis writing to meet deadlines. I thought releasing the code this way was much better than waiting until I had time to clean it up, which could take forever.
Quran Analysis (www.qurananalysis.com). Full Semantic Search and Intelligence System for the Quran.
Copyright (C) 2015 Karim Ouda
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
You can use Quran Analysis code, framework and corpora in your website or application (commercial/non-commercial) provided that you link back to www.qurananalysis.com and sufficient credits are given.