hamzaae/DCSA


DCSA (Darija Comments Sentiment Analysis)

Overview

DCSA (Darija Comments Sentiment Analysis) is a project for sentiment analysis of comments written in Darija, using machine learning and deep learning techniques. Darija is the variety of Arabic spoken in the Maghreb region; the comments handled in this project are written in Arabic script. The project covers gathering data from multiple sources, cleaning and transforming it, and developing several sentiment analysis models.

Dataset

The dataset consists of 12,000 comments (rows) drawn from diverse domains, including sports, religion, economy, and politics, among others. The preprocessing pipeline applies normalization, stemming, stop-word removal, and conversion of emojis into words, while keeping the text in Darija.
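The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the emoji mapping, the stop-word list, and the prefix/suffix rules are hypothetical placeholders, and the light affix-stripping stemmer is a simple stand-in for whatever stemmer the project actually uses. The normalization rules (removing diacritics and tatweel, unifying alef variants, mapping ى to ي and ة to ه) are common choices for Arabic-script text that the README does not confirm.

```python
import re
from typing import Dict, List

DIACRITICS = re.compile(r"[\u064B-\u0652]")  # Arabic short-vowel marks (tanwin .. sukun)
TATWEEL = "\u0640"                             # Arabic elongation character


def normalize(text: str) -> str:
    """Common Arabic-script normalization (assumed, not confirmed by the README)."""
    text = DIACRITICS.sub("", text).replace(TATWEEL, "")
    text = re.sub("[إأآ]", "ا", text)            # unify alef variants
    return text.replace("ى", "ي").replace("ة", "ه")


# Hypothetical emoji-to-word mapping; the project's real mapping is not documented.
EMOJI_TO_WORD: Dict[str, str] = {"😂": "ضحك", "❤": "حب", "😡": "غضب"}

# Hypothetical stop words, stored in normalized form so lookups match.
STOP_WORDS = {normalize(w) for w in ("في", "من", "على", "و")}

# Hypothetical light-stemming affixes (a stand-in for a real Darija stemmer).
PREFIXES = ("ال",)
SUFFIXES = ("ها", "هم", "ين", "ات")


def light_stem(token: str, min_len: int = 3) -> str:
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= min_len:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= min_len:
            token = token[: -len(s)]
            break
    return token


def preprocess(text: str) -> List[str]:
    """Emojis to words, normalize, drop punctuation and stop words, then stem."""
    text = text.replace("\ufe0f", "")             # drop the emoji variation selector
    for emoji, word in EMOJI_TO_WORD.items():
        text = text.replace(emoji, f" {word} ")   # each known emoji becomes a word
    text = normalize(text)
    text = re.sub(r"[^\w\s]", " ", text)          # remove punctuation and unmapped emojis
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [light_stem(t) for t in tokens]
```

Converting emojis before normalization means the inserted words go through the same normalization as the rest of the comment, and storing stop words in normalized form keeps the lookup consistent.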

Important: the dataset contains explicit language (profanity) in Darija.

Models

Several machine learning and deep learning models were evaluated, and the DistilBERT uncased model was fine-tuned across a range of hyperparameter settings. Comparing the best metrics obtained by each model was the crucial step in selecting the final sentiment analysis model.

[Figure: pipeline]
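The hyperparameter search described above might look roughly like the following exhaustive grid search. `train_and_eval` is a hypothetical callback that would fine-tune DistilBERT with one configuration and return validation metrics; the search-space values and the choice of F1 as the selection metric are assumptions, not the settings the project actually used.

```python
from itertools import product
from typing import Any, Callable, Dict, Iterator, List, Tuple

Config = Dict[str, Any]
Metrics = Dict[str, float]

# Hypothetical search space; the values actually tried for DistilBERT are not documented.
SEARCH_SPACE: Dict[str, List[Any]] = {
    "learning_rate": [2e-5, 3e-5, 5e-5],
    "batch_size": [16, 32],
    "num_epochs": [2, 3],
}


def grid(space: Dict[str, List[Any]]) -> Iterator[Config]:
    """Yield every combination of the hyperparameter values (a full grid)."""
    keys = sorted(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def select_best(
    train_and_eval: Callable[[Config], Metrics],
    space: Dict[str, List[Any]],
    metric: str = "f1",
) -> Tuple[Config, float]:
    """Evaluate each configuration and keep the one with the highest metric."""
    best_cfg: Config = {}
    best_score = float("-inf")
    for cfg in grid(space):
        score = train_and_eval(cfg)[metric]
        if score > best_score:  # strict '>' keeps the first configuration on ties
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Grid search is the simplest option to reason about; its cost grows multiplicatively with each added hyperparameter, so random search is a common alternative when the space gets larger.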

Flask App

The project includes a Flask web application that serves a live demo of the sentiment analysis model. Users can enter a link to a Hespress article; the app scrapes the article's comments and classifies each one as positive or negative. An API is also provided so developers can integrate the sentiment analysis into their own applications.
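A minimal sketch of the scrape-and-classify step, using only Python's standard `html.parser`. Assumptions: comments appear in elements carrying a hypothetical `comment-text` CSS class (Hespress's actual markup is not documented here), and `classify` stands in for whatever function wraps the trained model. Fetching the page (for example with `urllib.request.urlopen`) is left out so the parsing logic can be tested offline.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical CSS class; Hespress's real comment markup is not documented here.
COMMENT_CLASS = "comment-text"
# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class CommentExtractor(HTMLParser):
    """Collect the text of every element whose class list contains `comment_class`."""

    def __init__(self, comment_class: str = COMMENT_CLASS) -> None:
        super().__init__()
        self.comment_class = comment_class
        self.depth = 0                  # > 0 while inside a matching element
        self.current: List[str] = []    # text fragments of the comment being read
        self.comments: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1             # nested element inside a comment
        elif self.comment_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag: str) -> None:
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:             # the matching element has just closed
            text = " ".join(" ".join(self.current).split())
            if text:
                self.comments.append(text)

    def handle_data(self, data: str) -> None:
        if self.depth:
            self.current.append(data)


def classify_article_comments(html: str, classify: Callable[[str], str]) -> List[Dict[str, str]]:
    """Extract comments from an article page's HTML and label each with `classify`."""
    parser = CommentExtractor()
    parser.feed(html)
    parser.close()
    return [{"comment": c, "label": classify(c)} for c in parser.comments]
```

Tracking nesting depth lets a comment contain inline markup (bold text, line breaks) without its text being cut short at the first inner closing tag.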

The web app is deployed on Render, so users can try the sentiment analysis in real time.

MLOps Integration

In the MLOps (Machine Learning Operations) phase, every comment is stored in MongoDB along with the model's prediction. An external large language model (LLM) hosted on Hugging Face also scores each comment, and the two outputs are combined with a weighted average (0.7 for the LLM, 0.3 for the custom model) into a final sentiment score, which is used to label the comment. The model is retrained periodically to adapt to evolving patterns in sentiment expression.
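The weighted combination can be sketched as below, assuming both models output a probability that the comment is positive. The 0.7/0.3 weights come from the text above; the 0.5 decision threshold and the record's field names are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Dict

LLM_WEIGHT = 0.7      # weight stated in the README
CUSTOM_WEIGHT = 0.3   # weight stated in the README
THRESHOLD = 0.5       # assumption: the decision threshold is not documented


def combined_score(p_llm: float, p_custom: float) -> float:
    """Weighted average of two positive-class probabilities in [0, 1]."""
    for p in (p_llm, p_custom):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return LLM_WEIGHT * p_llm + CUSTOM_WEIGHT * p_custom


def build_record(comment: str, p_llm: float, p_custom: float) -> Dict[str, Any]:
    """Build a document to store in MongoDB (field names are hypothetical)."""
    score = combined_score(p_llm, p_custom)
    return {
        "comment": comment,
        "llm_score": p_llm,
        "custom_model_score": p_custom,
        "score": score,
        "label": "positive" if score >= THRESHOLD else "negative",
        "created_at": datetime.now(timezone.utc),
    }
```

For instance, an LLM probability of 0.6 and a custom-model probability of 0.2 give 0.7 * 0.6 + 0.3 * 0.2 = 0.48, just below the assumed threshold, so the comment would be labeled negative. With pymongo, such a record could be saved via `collection.insert_one(build_record(...))`.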


Deployment

The DCSA project is live at lfahim.tech, providing users with a practical and interactive way to analyze sentiment in Darija comments.

NB.

  • The site is only available for a limited period.
  • The Hugging Face LLM runs on a free deployment, which may occasionally return internal server errors. If this happens, go back and reload the page once or twice. The website may also load slowly because it runs on free hosting. Thank you for your understanding!

How to Use

To try the sentiment analysis, visit the live demo at lfahim.tech. The API endpoint is also available for integration into your own applications.
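A minimal client sketch using only the standard library. The endpoint path `/api/predict`, the `comment` request field, and a JSON response are assumptions made for illustration; the README does not document the API contract, so check the Flask routes in the repository before relying on them.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical endpoint; the real route is not documented in the README.
API_URL = "https://lfahim.tech/api/predict"


def build_request(comment: str, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying one comment (payload shape is assumed)."""
    body = json.dumps({"comment": comment}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def predict(comment: str, url: str = API_URL, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the comment and decode the JSON response (assumed to be an object)."""
    req = build_request(comment, url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or test offline; the generous timeout reflects the slow free hosting mentioned above, and since the deployment is temporary the endpoint may no longer respond.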


Acknowledgments

Feel free to explore, contribute, and use DCSA for your sentiment analysis needs!
