DCSA (Darija Comments Sentiment Analysis)

Overview

DCSA (Darija Comments Sentiment Analysis) is a project for sentiment analysis of comments written in Darija, using machine learning and deep learning techniques. Darija is the colloquial Arabic spoken in the Maghreb; the comments handled here are written in Arabic script. The project covers data gathering from multiple sources, data cleaning and transformation, and the development of several sentiment analysis models.

Dataset

The dataset consists of 12,000 comments from diverse domains, including sports, religion, economy, and politics. The preprocessing pipeline normalizes the text, applies stemming, removes stop words, and converts emojis into words, while keeping the text in Darija.
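The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the emoji map, the stop-word list, the normalization rules, and the light prefix/suffix stemmer are all hypothetical stand-ins, since the README does not specify them.

```python
import re
from typing import List

# Hypothetical emoji-to-word map (the project's real mapping is not documented).
EMOJI_WORDS = {"😂": "ضحك", "😡": "غضب", "👍": "مزيان"}

# Hypothetical sample of Darija/Arabic stop words, already in normalized form.
STOP_WORDS = {"و", "في", "من", "مع", "هاد", "ديال"}

# Hypothetical light-stemming affixes; only one prefix and one suffix are stripped.
PREFIXES = ("وال", "بال", "ال")
SUFFIXES = ("ين", "ات", "ها")

DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")  # harakat + tatweel
REPEATS = re.compile(r"(.)\1{2,}")                  # e.g. "زوووين" -> "زوين"
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")      # punctuation, digits, Latin


def emojis_to_words(text: str) -> str:
    """Replace each known emoji with a word, padded with spaces."""
    for emoji, word in EMOJI_WORDS.items():
        text = text.replace(emoji, f" {word} ")
    return text


def normalize(text: str) -> str:
    """Strip diacritics, unify letter variants, collapse repeats, drop non-Arabic."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[إأآ]", "ا", text)
    text = text.replace("ة", "ه").replace("ى", "ي")
    text = REPEATS.sub(r"\1", text)
    return NON_ARABIC.sub(" ", text)


def light_stem(word: str) -> str:
    """Naive stemmer: remove one known prefix and one known suffix if 3+ letters remain."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


def preprocess(comment: str) -> List[str]:
    """Emojis -> words, normalize, tokenize, remove stop words, stem."""
    text = normalize(emojis_to_words(comment))
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return [light_stem(tok) for tok in tokens]
```

Emojis are converted first so that the word they carry survives the later removal of non-Arabic characters, and stop words are matched after normalization so that spelling variants of the same word are filtered consistently.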

Important: the dataset contains explicit language (profanity) in Darija.

Models

Several machine learning and deep learning models were tested, and the DistilBERT uncased model was fine-tuned with various hyperparameter settings. For each model, the configuration with the best evaluation metrics was kept, and comparing these determined the final sentiment analysis model.
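The README does not say which metric decided between models. As a hedged sketch, assuming candidates are compared by F1 score on the positive class over a held-out validation set, with accuracy as a tie-breaker, the selection step could look like this. The candidate names and labels are hypothetical toy values.

```python
from typing import Dict, List, Tuple


def f1_score(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def select_best(candidates: Dict[str, List[int]], y_true: List[int]) -> Tuple[str, float]:
    """Return the candidate with the highest F1 (accuracy breaks ties) and its F1."""
    scores = {
        name: (f1_score(y_true, preds), accuracy(y_true, preds))
        for name, preds in candidates.items()
    }
    best = max(scores, key=lambda name: scores[name])
    return best, scores[best][0]


# Hypothetical validation labels (1 = positive, 0 = negative) and model predictions.
y_val = [1, 0, 1, 1, 0]
candidates = {
    "logreg_tfidf": [1, 0, 1, 0, 0],
    "distilbert_lr_2e-5": [1, 1, 1, 1, 0],
}
print(select_best(candidates, y_val))
```

Both toy candidates have the same accuracy here, which is why F1 is used as the primary criterion in this sketch: it is more sensitive to how each model handles the positive class.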

Flask App

The project includes a Flask web application with a live demo of the sentiment analysis model. In one section, users can submit a Hespress article; the app scrapes its comments and classifies each one as positive or negative. An API is also provided so that users can integrate the sentiment analysis into their own applications.
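A minimal sketch of the scrape-then-classify flow, using only the Python standard library. The CSS class `comment-text` is a hypothetical placeholder (Hespress's real page markup is not described in this README), and the `classify` argument stands in for the trained model.

```python
import urllib.request
from html.parser import HTMLParser
from typing import Callable, List, Tuple

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class CommentExtractor(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    def __init__(self, target_class: str = "comment-text"):  # hypothetical class name
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # > 0 while inside a matching element
        self.parts: List[str] = []
        self.comments: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested tag inside a comment
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            text = " ".join(" ".join(self.parts).split())  # collapse whitespace
            if text:
                self.comments.append(text)

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page as text (network access required)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def classify_comments(html: str, classify: Callable[[str], str],
                      target_class: str = "comment-text") -> List[Tuple[str, str]]:
    """Extract comments from HTML and pair each with its predicted label."""
    parser = CommentExtractor(target_class)
    parser.feed(html)
    parser.close()
    return [(comment, classify(comment)) for comment in parser.comments]
```

In the app, this would be called as `classify_comments(fetch_html(article_url), model_predict)`, where `model_predict` is a hypothetical wrapper that returns "positive" or "negative" for one comment.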

The web app is deployed on Render, so users can try the sentiment analysis in real time.

MLOps Integration

In the MLOps (Machine Learning Operations) phase, every comment is stored in MongoDB together with the model's prediction. An external large language model (LLM) from Hugging Face is also applied to each comment, and the two predictions are combined into a single weighted sentiment score (0.7 for the LLM and 0.3 for the custom model), which is used to label the comment. The model is retrained periodically so that it adapts to evolving patterns in sentiment expression.
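Written out, the combined score is s = 0.7 · p_LLM + 0.3 · p_model. The sketch below assumes both models output the probability that a comment is positive and that a score of at least 0.5 means "positive"; the threshold and the stored record's field names are assumptions, not taken from the project.

```python
from datetime import datetime, timezone

# Weights from the README: 0.7 for the Hugging Face LLM, 0.3 for the custom model.
W_LLM = 0.7
W_MODEL = 0.3
THRESHOLD = 0.5  # assumption: scores >= 0.5 are labeled positive


def weighted_score(p_llm: float, p_model: float) -> float:
    """Combine two positive-class probabilities into one score in [0, 1]."""
    return W_LLM * p_llm + W_MODEL * p_model


def build_record(comment: str, p_model: float, p_llm: float) -> dict:
    """Build a document to store in MongoDB (field names are hypothetical)."""
    score = weighted_score(p_llm, p_model)
    return {
        "comment": comment,
        "model_prob": p_model,
        "llm_prob": p_llm,
        "score": score,
        "label": "positive" if score >= THRESHOLD else "negative",
        "created_at": datetime.now(timezone.utc),
    }


# With pymongo, the record could be stored via collection.insert_one(record).
record = build_record("الماتش زوين", p_model=0.2, p_llm=0.9)
print(record["label"], round(record["score"], 2))  # positive 0.69
```

With these weights the LLM dominates whenever the two models disagree strongly, as in the example, where the custom model's low probability is outweighed by the LLM's confident positive prediction.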

Deployment

The DCSA project is live at lfahim.tech, providing users with a practical and interactive way to analyze sentiment in Darija comments.

NB.

The demo is only available for a limited period.
Because the Hugging Face LLM runs on a free deployment, it may occasionally return internal server errors. If this happens, go back and reload the page once or twice. The website may also load slowly because of the free deployment. Thank you for your understanding!

How to Use

To try the sentiment analysis, visit the live demo at lfahim.tech. The API endpoint can also be integrated into your own applications.
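A hedged client sketch for the API, using only the standard library. The endpoint path `/api/predict`, the JSON field `comment`, and the response format are hypothetical, since the README does not document them; adjust them to the app's actual routes.

```python
import json
import urllib.request

# Hypothetical endpoint: the README mentions an API but not its route or payload.
API_URL = "https://lfahim.tech/api/predict"


def build_request(comment: str, url: str = API_URL) -> urllib.request.Request:
    """Prepare a JSON POST request carrying one comment."""
    body = json.dumps({"comment": comment}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(comment: str, url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the comment and return the decoded JSON response (format assumed)."""
    with urllib.request.urlopen(build_request(comment, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `predict("الماتش زوين بزاف")` sends the request; given the free-deployment note above, a generous timeout and a retry on server errors are sensible.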


Acknowledgments

Feel free to explore, contribute, and use DCSA for your sentiment analysis needs!

Repository: hamzaae/DCSA