
Toxicity_detection

Detection of the toxic nature of a comment using the toxic comment dataset (https://github.com/vzhou842/profanity-check/blob/master/profanity_check/data/clean_data.csv).

The data includes both tweets and Wikipedia edits, as obtained from the Toxic Comment Classification Challenge on Kaggle.
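The dataset is a single CSV, so it can be inspected directly. A minimal loading sketch (not part of this repo) assuming pandas is available; the column names of the file are not asserted here, so print them first:

```python
import pandas as pd

# Raw-file form of the dataset URL linked above.
URL = ("https://raw.githubusercontent.com/vzhou842/profanity-check/"
       "master/profanity_check/data/clean_data.csv")

df = pd.read_csv(URL)
print(df.columns.tolist())  # check the actual text / label column names
print(df.head())
```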

The following implementations are done:

  1. LSTM
  2. LSTM with GloVe 100D word embeddings (a minimal sketch of this variant follows the list)
  3. LSTM with GloVe 300D word embeddings
  4. Multi-layer LSTM with GloVe 300D word embeddings
  5. Attention mechanism with GloVe 300D word embeddings
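The repo does not spell out its training code here, so the following is only a sketch of variant 2 (a single LSTM over frozen GloVe 100D embeddings), assuming Keras; `VOCAB_SIZE`, `MAX_LEN`, the layer width, and the `load_glove_matrix` helper are illustrative values and names, not the repo's actual ones:

```python
import numpy as np
from tensorflow.keras import initializers
from tensorflow.keras.layers import Dense, Embedding, Input, LSTM
from tensorflow.keras.models import Sequential

VOCAB_SIZE = 20000  # assumed tokenizer vocabulary size
MAX_LEN = 100       # assumed padded comment length
EMBED_DIM = 100     # matches the GloVe 100D variant

def load_glove_matrix(path, word_index):
    """Build an embedding matrix from a GloVe text file such as
    glove.6B.100d.txt; out-of-vocabulary rows stay zero."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    matrix = np.zeros((VOCAB_SIZE, EMBED_DIM), dtype="float32")
    for word, i in word_index.items():
        if i < VOCAB_SIZE and word in vectors:
            matrix[i] = vectors[word]
    return matrix

# Stand-in so the sketch runs; in practice use
# load_glove_matrix("glove.6B.100d.txt", tokenizer.word_index).
embedding_matrix = np.zeros((VOCAB_SIZE, EMBED_DIM), dtype="float32")

model = Sequential([
    Input(shape=(MAX_LEN,)),
    Embedding(
        VOCAB_SIZE, EMBED_DIM,
        embeddings_initializer=initializers.Constant(embedding_matrix),
        trainable=False,  # keep the pretrained GloVe vectors frozen
    ),
    LSTM(64),                        # single recurrent layer
    Dense(1, activation="sigmoid"),  # binary toxic / non-toxic score
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```

For variant 1 (plain LSTM) the embedding layer would simply be trained from scratch, and for variant 3 the GloVe 300D file and `EMBED_DIM = 300` would be used instead.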

The results obtained were as follows:

| Model                           | Training accuracy (%) | Testing accuracy (%) |
|---------------------------------|-----------------------|----------------------|
| LSTM                            | 82                    | 80                   |
| LSTM + GloVe (100D)             | 95                    | 96.12                |
| LSTM + GloVe (300D)             | 95.34                 | 95.94                |
| Multi-layer LSTM + GloVe (300D) | 97.06                 | 96.14                |
| LSTM + Attention + GloVe (300D) | 98.16                 | 95.87                |
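The attention variant above is not detailed in this README, so here is a hedged sketch of one common formulation: additive attention pooling over the LSTM's per-timestep outputs. The `AttentionPooling` class name and all sizes are illustrative, not the repo's actual code:

```python
import tensorflow as tf
from tensorflow.keras import layers

class AttentionPooling(layers.Layer):
    """Score each LSTM timestep with a small feed-forward network,
    softmax the scores over time, and return the weighted sum of the
    hidden states."""

    def build(self, input_shape):
        dim = input_shape[-1]
        self.w = self.add_weight(name="w", shape=(dim, dim),
                                 initializer="glorot_uniform")
        self.u = self.add_weight(name="u", shape=(dim, 1),
                                 initializer="glorot_uniform")

    def call(self, hidden_states):
        # hidden_states: (batch, timesteps, dim)
        scores = tf.matmul(tf.tanh(tf.matmul(hidden_states, self.w)), self.u)
        weights = tf.nn.softmax(scores, axis=1)  # (batch, timesteps, 1)
        return tf.reduce_sum(weights * hidden_states, axis=1)

# return_sequences=True exposes every timestep to the attention layer;
# the Embedding would be initialized with GloVe 300D as in the sketch above.
inputs = layers.Input(shape=(100,))
x = layers.Embedding(20000, 300)(inputs)
x = layers.LSTM(64, return_sequences=True)(x)
x = AttentionPooling()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
```

Compared with taking only the LSTM's final state, this pooling lets the classifier weight whichever tokens carry the toxic signal, which is consistent with the higher training accuracy reported for the attention model.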

Future scope includes improving the attention layer to increase testing accuracy.

Transformer architectures could also be used to improve performance further.
