
mob2dr/NLP-Dialect-Detection


NLP-Dialect-Detection

Arabic Dialect Detection

This repository provides an application for detecting and identifying Arabic dialects using machine learning (ML) and deep learning (DL) models.

Demo:

[Video: bandicam.2023-05-13.19-38-05-633.mp4]

App Pipeline

[Image: nlp-pipeline]

Project Pipeline:

01 Data Fetching

  • Used an SQLite connection and pandas to run a join query and save the result in a DataFrame.
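The fetching step can be sketched with the standard-library `sqlite3` module. The table names `id_text` and `id_dialect`, their columns, and the sample rows are hypothetical, because the README does not give the schema. If pandas is installed, passing the same query to `pandas.read_sql_query(JOIN_QUERY, conn)` returns the rows as a DataFrame.

```python
import sqlite3

# Hypothetical schema: one table of raw texts and one table of dialect labels,
# both keyed by the same integer id.
JOIN_QUERY = """
    SELECT t.id, t.text, d.dialect
    FROM id_text AS t
    INNER JOIN id_dialect AS d ON t.id = d.id
    ORDER BY t.id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE id_text (id INTEGER PRIMARY KEY, text TEXT)")
    conn.execute("CREATE TABLE id_dialect (id INTEGER PRIMARY KEY, dialect TEXT)")
    # Made-up sample rows; id 3 has no label on purpose.
    conn.executemany("INSERT INTO id_text VALUES (?, ?)",
                     [(1, "ازيك عامل ايه"), (2, "شلونك"), (3, "بدون تسمية")])
    conn.executemany("INSERT INTO id_dialect VALUES (?, ?)",
                     [(1, "EG"), (2, "IQ")])
    conn.commit()
    return conn


def fetch_labeled_texts(conn: sqlite3.Connection) -> list[tuple[int, str, str]]:
    """Run the join. The INNER JOIN drops texts that have no dialect label."""
    # With pandas available: pandas.read_sql_query(JOIN_QUERY, conn)
    return conn.execute(JOIN_QUERY).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in fetch_labeled_texts(conn):
        print(row)
    conn.close()
```

An inner join keeps only texts that have a label, which is usually what a supervised training set needs.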

02 Data Preprocessing

  • A preprocessing pipeline is applied to the fetched dataset:
    • Removing Punctuations
    • Removing Symbols
    • Removing Emojis
    • Removing Diacritics
    • Removing Non-Arabic Characters
    • Removing Repeated
    • Applying Lemmatisation
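The cleaning steps above can be chained as a list of small functions. This is a minimal standard-library sketch and not the repository's code. The Unicode ranges used here are common choices for Arabic text. "Removing Repeated" is interpreted as collapsing runs of three or more identical characters, which is an assumption. Lemmatisation is left out because it needs a dedicated Arabic morphological analyser.

```python
import re
import unicodedata

# Arabic diacritics (tashkeel): fathatan U+064B through sukun U+0652,
# plus superscript alef U+0670.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Anything other than a basic Arabic letter or whitespace. The letter ranges
# skip U+0640 (tatweel), so elongation marks are stripped here as well.
NON_ARABIC = re.compile(r"[^\u0621-\u063A\u0641-\u064A\s]")
# One character repeated three or more times (assumed meaning of "Repeated").
REPEATED = re.compile(r"(.)\1{2,}")


def remove_by_category(text: str, prefix: str) -> str:
    """Replace each character whose Unicode category starts with prefix by a space."""
    return "".join(" " if unicodedata.category(ch).startswith(prefix) else ch
                   for ch in text)


def remove_punctuation(text: str) -> str:
    # P* categories cover ASCII and Arabic punctuation such as "،" and "؟".
    return remove_by_category(text, "P")


def remove_symbols(text: str) -> str:
    # S* categories cover math/currency symbols and most emojis (category So).
    return remove_by_category(text, "S")


def remove_diacritics(text: str) -> str:
    # Deleted without a space so the letters of a word stay joined.
    return DIACRITICS.sub("", text)


def remove_non_arabic(text: str) -> str:
    return NON_ARABIC.sub(" ", text)


def remove_repeated(text: str) -> str:
    # Collapse runs such as "هههههه" or "حلوووو" to a single character.
    return REPEATED.sub(r"\1", text)


# Diacritics must go before the non-Arabic filter, which would otherwise
# replace them with spaces and split words.
PIPELINE = [remove_punctuation, remove_symbols, remove_diacritics,
            remove_non_arabic, remove_repeated]


def preprocess(text: str) -> str:
    for step in PIPELINE:
        text = step(text)
    # Collapse the whitespace left behind by the removal steps.
    return " ".join(text.split())


if __name__ == "__main__":
    print(preprocess("مرحبااااا!! 😀 hello ، كَيْفَ حالك؟ 123"))
```

Keeping each step as a separate function makes it easy to reorder or drop a step when comparing model scores.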

03 ML Model Training

  • Text representation using TF-IDF
  • Model selection
    • SVC: F1 score of 82%
    • LightGBM: F1 score of 75%
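The README does not include the training code. The sketch below shows two ideas behind this step using only the standard library:

- A TF-IDF weighting that mirrors scikit-learn's default smoothed variant, idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalisation.
- A macro-averaged F1 score of the kind used to compare models.

Both choices are assumptions: the repository's TF-IDF settings are not stated, and the README does not say which F1 averaging was used. In practice the SVC and LightGBM classifiers would be trained on vectors like these through their own libraries. That part is not shown here.

```python
import math
from collections import Counter


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF per term: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf_vector(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Raw term counts times IDF, then L2-normalised. Unseen terms are ignored."""
    weights = {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else weights


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    docs = [["ازيك", "عامل", "ايه"], ["شلونك", "عامل"]]
    idf = fit_idf(docs)
    print(tfidf_vector(docs[0], idf))
    print(macro_f1(["EG", "EG", "IQ", "IQ"], ["EG", "IQ", "IQ", "IQ"]))
```

Macro averaging weights every dialect equally. This matters when some dialects have far fewer samples than others.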

04 DL Model Training

  • AraBERT (via Hugging Face): accuracy of 84%

05 Deployment

  • Convert the model to ONNX format
  • Deploy it with FastAPI

Contributors 4

  •  
  •  
  •  
  •