
Tools for Arabic language processing using the MADAR dataset. Includes Next Word Prediction with an n-gram model and Dialect Identification with a BERT model. Features an interactive UI with Streamlit and comprehensive text preprocessing for Arabic.


maans2001/UJ-NLP-Project


NLP Arabic Dialect Identification and Next Word Prediction



Project Overview

Welcome to the NLP Arabic Dialect Identification and Next Word Prediction project! This project applies natural language processing to Arabic text and offers two main functionalities:

  1. Next Word Prediction (Knowledge-Based): Uses an n-gram model built from the MADAR dataset to predict the next word of a given sentence.
  2. Arabic Dialect Identification (Machine Learning): Uses a BERT model combined with lexicon features, trained on the MADAR dataset, to identify the Arabic dialect of a given text.

Experience the project live on Streamlit!

Features

  • Next Word Prediction using n-gram model
  • Arabic Dialect Identification using BERT
  • Interactive UI with Streamlit
  • Comprehensive text preprocessing for Arabic
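
As an illustration of what Arabic text preprocessing commonly involves, here is a minimal sketch. It is not the project's actual pipeline: the function name `normalize_arabic` and the choice of steps (stripping diacritics, removing tatweel, unifying alef variants, collapsing whitespace) are assumptions made for this example.

```python
import re

# Arabic diacritics (tashkeel), U+064B to U+0652, plus superscript alef U+0670.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
# Tatweel (kashida) is a purely decorative elongation character.
TATWEEL = "\u0640"
# Alef with madda, hamza above, and hamza below.
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize_arabic(text):
    """Hypothetical helper: return a normalized copy of an Arabic string."""
    text = DIACRITICS.sub("", text)            # drop short-vowel marks
    text = text.replace(TATWEEL, "")           # drop elongation characters
    text = ALEF_VARIANTS.sub("\u0627", text)   # map alef variants to bare alef
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    return text


if __name__ == "__main__":
    print(normalize_arabic("أَهْلاً  وسهـــلاً"))
```

Normalizing in this way reduces spelling variation, so that differently typed forms of the same word are counted as one token by the models that follow.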

Installation

Prerequisites

  • Python 3.x
  • pip
  • Streamlit

Instructions

  1. Clone the repository:
    git clone https://github.com/maans2001/UJ-NLP-Project
    cd UJ-NLP-Project

How to Run

Windows

  1. Open Command Prompt or PowerShell.
  2. Navigate to the project directory and run:
    run.bat

macOS and Linux

  1. Open Terminal.
  2. Navigate to the project directory, make the script executable, and run it:
    chmod +x run.sh
    ./run.sh # on macOS, use run_macosx.sh in both commands instead

Usage

Next Word Prediction (Knowledge-Based)

  1. Select "برنامج خمن الكلمة التالية (Knowledge Based)" ("Guess the Next Word Program") from the sidebar.
  2. Enter a sentence in Arabic.
  3. Click "خمن الكلمات الجاية" ("Guess the upcoming words") to predict the next words.
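
To give a sense of how an n-gram predictor produces these suggestions, the sketch below uses a bigram model (n = 2) over a tiny made-up corpus. The function names `train_bigrams` and `predict_next`, the example sentences, and the choice of n are assumptions for illustration; the project's own model and its MADAR-based training data may differ.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the training sentences."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev][nxt] += 1
    return followers


def predict_next(followers, text, k=3):
    """Return up to k words that most often followed the last word of text."""
    tokens = text.split()
    if not tokens:
        return []
    # .get avoids inserting unseen words into the defaultdict.
    counts = followers.get(tokens[-1], Counter())
    return [word for word, _ in counts.most_common(k)]


# Toy corpus for illustration only (not taken from MADAR).
corpus = ["انا بدي قهوة", "انا بدي شاي", "انا بدي قهوة سادة"]
model = train_bigrams(corpus)

if __name__ == "__main__":
    print(predict_next(model, "انا بدي", k=2))
```

A larger n conditions on more preceding words and gives sharper predictions, but it needs more data and usually a fallback to shorter contexts for word sequences that were never seen.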

Arabic Dialect Identification (Machine Learning)

  1. Select "برنامج تحديد اللهجات (Machine Learning)" ("Dialect Identification Program") from the sidebar.
  2. Enter an Arabic text.
  3. Click "حدد اللهجة" ("Identify the dialect") to identify the dialect.
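
The classifier is described as a BERT model with lexicon features. The sketch below shows one simple way such lexicon features could be computed: the share of tokens in the input that appear in each dialect's word list. The `LEXICON` contents, the dialect labels, and the `lexicon_features` function are hypothetical and are not taken from the project; how the project actually derives and combines these features with BERT is not documented here.

```python
# Hypothetical marker words per dialect, for illustration only.
LEXICON = {
    "levantine": {"بدي", "هلق", "شو"},
    "egyptian": {"عايز", "دلوقتي", "ازاي"},
    "gulf": {"ابي", "الحين", "شلون"},
}


def lexicon_features(text, lexicon=LEXICON):
    """Return one value per dialect: the fraction of tokens found in its lexicon."""
    tokens = text.split()
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [
        sum(token in words for token in tokens) / total
        for words in lexicon.values()
    ]


if __name__ == "__main__":
    for name, value in zip(LEXICON, lexicon_features("انا عايز قهوة دلوقتي")):
        print(name, value)
```

One plausible design, stated here as an assumption, is to concatenate a vector like this with the BERT sentence representation before the final classification layer, so that explicit dialect cues complement the learned features.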

Roadmap

  • Add more dialects
  • Improve the prediction model
  • Enhance the UI/UX

Contributing

Contributions are welcome! Please fork this repository and submit a pull request. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the Project
  2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
  3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
  4. Push to the Branch (`git push origin feature/AmazingFeature`)
  5. Open a Pull Request

License

Distributed under the MIT License.

Contact

For any inquiries or feedback, feel free to reach out!

Acknowledgments
