figo2001/Hate-Speech-Deetction

Hate Speech Detection

This project implements hate speech detection using Logistic Regression and Naive Bayes classifiers built from scratch in Python. It relies on Natural Language Processing (NLP) techniques: NLTK (including its word_tokenize function) handles text preprocessing, and WordCloud visualizes frequent terms.

Overview

Hate speech detection has become a critical task on social media and other online platforms, where it helps prevent the spread of harmful content and maintain a safe environment for users. This project aims to build a robust hate speech detection system using machine learning techniques.

Features

  • End-to-End Project: The project covers every stage from data preprocessing to model evaluation.

  • Custom Implementations: Logistic Regression and Naive Bayes classifiers are implemented from scratch, providing a deeper understanding of the algorithms.

  • NLP Techniques: NLTK is used for text preprocessing, including tokenization, stemming, and stopword removal.

  • Visualization: WordCloud is employed to visualize frequent words in hate speech, aiding in understanding the nature of the data.
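
As an illustration of the "from scratch" approach, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, using only the standard library. It is not the code in this repository: the class name NaiveBayes, the toy training data, and the placeholder token "badword" are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one class label per document
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, tokens):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.class_counts[c] / n_docs)
            for w in tokens:
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log(
                        (self.word_counts[c][w] + 1) / (self.total_words[c] + vocab_size)
                    )
            scores[c] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical toy data: label 1 = hate speech, 0 = normal speech
train_docs = [
    ["have", "a", "nice", "day"],
    ["thanks", "for", "the", "help"],
    ["you", "are", "badword"],
    ["badword", "go", "away"],
]
train_labels = [0, 0, 1, 1]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["badword", "you"]))  # 1
print(model.predict(["nice", "day"]))     # 0
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.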

Speech Types

a) Normal Speech

[Image: normal speech]

b) Hate Speech

[Image: hate speech]

Key Libraries Used

  • NLTK: For natural language processing tasks such as tokenization and stopword removal.

  • WordCloud: For generating word clouds to visualize text data.
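
The preprocessing steps named above can be sketched roughly as follows. To keep the example dependency-free, it uses a regular expression and a tiny hand-written stopword set as stand-ins; in the project itself these roles are filled by NLTK (for example nltk.tokenize.word_tokenize and nltk.corpus.stopwords.words("english")). The function name preprocess, the stopword set, and the removal of URLs and @mentions are assumptions for illustration, and stemming is left out.

```python
import re

# Tiny illustrative stopword set (assumption); NLTK's English list is much larger.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you", "for", "this"}

TOKEN_RE = re.compile(r"[a-z]+")


def preprocess(text):
    """Lowercase, strip URLs and @mentions, tokenize, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t not in STOPWORDS]


print(preprocess("@user This is the BEST day! https://example.com"))
# ['best', 'day']
```

The resulting token lists can be fed directly to a bag-of-words classifier.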

Installation

To run this project, you need to have Python installed along with the following libraries:

pip install nltk
pip install wordcloud

How to Use

  1. Clone the repository:
git clone https://github.com/yourusername/yourrepository.git
  2. Navigate to the project directory:
cd yourrepository
  3. Run the Jupyter Notebook or Python scripts to execute the code:
jupyter notebook

Project Structure

|- data/                   # Dataset files
|  |- hate_speech.csv
|- images/                 # Images used in README
|  |- hate_speech_detection.png
|- notebooks/              # Jupyter notebooks
|  |- Hate_Speech_Detection.ipynb
|- src/                    # Python scripts
|  |- logistic_regression.py
|  |- naive_bayes.py
|- README.md               # Project README

Results

The models achieve high accuracy in detecting hate speech, making them suitable for real-world applications. Precision, recall, and F1-score are also reported to assess model performance.
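
For reference, the three metrics can be computed from true and predicted labels as in the sketch below. The function name precision_recall_f1 and the label lists are hypothetical and are not results from this project.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = hate speech, 0 = normal speech
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

p, r, f1 = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.667 0.667
```

These metrics matter alongside accuracy because hate speech datasets are often imbalanced, and a model that rarely predicts the positive class can still score a high accuracy.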

Future Improvements

  • Explore more advanced NLP techniques like word embeddings.
  • Enhance model performance by tuning hyperparameters.
  • Deploy the model as a web service for real-time hate speech detection.
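
As a starting point for the hyperparameter tuning mentioned above, the following sketch trains a from-scratch logistic regression with batch gradient descent and runs a small grid search over learning rate and L2 strength. It is not the repository's implementation: the function names, the grid values, and the toy bag-of-words data are assumptions for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr=0.1, epochs=200, l2=0.0):
    """Batch gradient descent on the L2-regularized log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def accuracy(w, b, X, y):
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


# Hypothetical bag-of-words counts; columns = ["badword", "nice", "day"]
X_train = [[2, 0, 0], [1, 0, 1], [0, 1, 1], [0, 2, 0]]
y_train = [1, 1, 0, 0]
X_val = [[1, 0, 0], [0, 1, 0]]
y_val = [1, 0]

# Grid search: keep the first configuration with the best validation accuracy
best = None
for lr in (0.01, 0.1, 1.0):
    for l2 in (0.0, 0.01):
        w, b = train_logreg(X_train, y_train, lr=lr, epochs=200, l2=l2)
        score = accuracy(w, b, X_val, y_val)
        if best is None or score > best[0]:
            best = (score, lr, l2)

print("best (accuracy, lr, l2):", best)
```

On real data the validation set should be held out from training, and the grid would typically also cover the number of epochs.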

Demo

[Video: streamlit-app-2024-04-29-16-04-22.webm]

Conclusion

Hate speech detection is crucial to keeping online platforms safe and fostering positive interactions. This project demonstrates that Logistic Regression and Naive Bayes classifiers can detect hate speech with high accuracy. By combining NLP techniques with custom implementations, it offers insight into how robust hate speech detection systems are built.