A deep-learning model that identifies toxic comments and categorizes them into six categories. The project uses Gradio as its front-end and is deployed on an AWS EC2 instance.


Toxic-Comment-Classifier-AWS


Introduction

Online forums and social media platforms give individuals the means to put forward their thoughts and freely express their opinions on various issues and incidents. In some cases, these online comments contain explicit language which may have an adverse effect on readers. Such comments can be classified into categories such as Toxic, Severe Toxic, Obscene, Threat, Insult, and Identity Hate. The threat of abuse and harassment means that many people stop expressing themselves and give up on seeking different opinions.

To protect users from being exposed to offensive language on online forums or social media sites, companies have started flagging comments and blocking users who are found guilty of using unpleasant language. Several machine-learning models have been developed and deployed to filter out such language and protect internet users from becoming victims of online harassment and cyberbullying.

Requirements

  • matplotlib>=3.3.3
  • keras>=2.4.3
  • gradio>=1.5.3
  • scipy==1.5.4
  • numpy>=1.19.5
  • pandas~=1.2.1
  • scikit-learn~=0.24.1
  • nltk~=3.5
  • spacy~=3.0.3
  • tensorflow~=2.4.1

Installation

  • Clone the repository

git clone https://github.com/shaunak09vb/Toxic-Comment-Classifier-AWS.git

  • Install the required libraries

pip3 install -r requirements.txt

Data Processing Steps

  • Remove special characters within the text
  • Remove repeated characters
  • Convert text to lower-case
  • Remove numbers
  • Remove punctuation
  • Remove leading and trailing whitespace
  • Remove extra spaces between words
  • Remove newline characters ("\n")
  • Remove emojis
  • Remove non-English characters
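The steps above can be sketched as a single cleaning function. This is an illustrative sketch only; the exact regular expressions and the order of operations are assumptions, not the repository's actual implementation in model_training.py:

```python
import re
import string

def clean_comment(text: str) -> str:
    """Apply the preprocessing steps listed above to one comment."""
    text = text.lower()                          # convert to lower-case
    text = text.replace("\n", " ")               # remove newline characters
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # squeeze repeated characters ("soooo" -> "soo")
    text = re.sub(r"\d+", "", text)              # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    text = re.sub(r"[^a-z\s]", "", text)         # drop emojis and non-English characters
    text = re.sub(r"\s+", " ", text)             # collapse extra spaces
    return text.strip()                          # trim leading/trailing whitespace

print(clean_comment("You are SOOOO dumb!!! 123 :)"))  # -> you are soo dumb
```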

Usage

Navigate to the source directory and execute the following Python files.

  • To create the model, run:

python3 model_training.py

  • You can also provide your own data:

python3 model_training.py --data=csv_file_location
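A CSV passed via --data presumably needs a comment-text column plus one binary column per category, mirroring the public Jigsaw toxic-comment data. The column names below are assumptions; check model_training.py for the names it actually expects. A minimal loader sketch:

```python
import csv

# Assumed column layout (mirrors the public Jigsaw dataset; verify
# against model_training.py before relying on these names).
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def load_comments(path):
    """Yield (comment, label_vector) pairs from a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            yield row["comment_text"], [int(row[label]) for label in LABELS]
```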

You can also view NLP_Deep_Learning.ipynb in the notebooks directory for a step-by-step walkthrough of the approach taken in this project.

Website

To run the hosted website, access the website directory and execute:

python3 website.py

The website will start on a local server and can be viewed in your preferred browser. You can type in any comment and see which toxicity categories the model predicts.

Blog Link

If you wish to discover in detail the steps I took to implement this project, you can read my blog on Medium.

License

This project is licensed under the MIT License.
