GitHub - siyuanligit/Toxic-Comment-Classification: Thesis for Master of Applied Statistics UCLA, a NLP and Deep Learning implementation on toxic comment classification.

Application of Recurrent Neural Networks in Toxic Comment Classification

Read the thesis HERE

Abstract

Moderators of online discussion forums often struggle to control extremist comments on their platforms. To provide an efficient and accurate tool for detecting online toxicity, we apply word2vec Skip-Gram embedding vectors and Recurrent Neural Network models such as Bidirectional Long Short-Term Memory (BiLSTM) to a toxic comment classification problem, using a labeled dataset from Wikipedia talk pages. We also explore different pre-trained embedding vectors trained on larger corpora, and we address the class imbalance in the dataset with sampling techniques and a penalized loss. The models we applied yield high overall accuracy at relatively low cost.
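To make the "penalizing loss" idea concrete, here is a minimal sketch of one common way to penalize errors on a rare class: weighting the positive term of binary cross-entropy by the inverse class frequency. This is an illustration under stated assumptions, not necessarily the weighting scheme used in the thesis. The function names `positive_class_weight` and `weighted_bce` and the toy labels are hypothetical.

```python
import math


def positive_class_weight(labels):
    """Inverse-frequency weight for the positive class: n_negative / n_positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive examples to weight")
    return n_neg / n_pos


def weighted_bce(y_true, y_prob, pos_weight, eps=1e-7):
    """Mean binary cross-entropy where errors on positives cost pos_weight times more."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Toy labels (hypothetical, not from the dataset): 1 toxic comment out of 10.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
w = positive_class_weight(labels)  # 9 negatives / 1 positive = 9.0

# A lazy classifier that gives every comment a 0.1 probability of being toxic.
probs = [0.1] * 10
plain = weighted_bce(labels, probs, pos_weight=1.0)
penalized = weighted_bce(labels, probs, pos_weight=w)

print(f"positive weight: {w}")
print(f"plain loss: {plain:.4f}, penalized loss: {penalized:.4f}")
```

With the weight applied, the lazy "always non-toxic" prediction is punished much more heavily for the one toxic comment it misses, which is the effect a penalized loss is meant to have on an imbalanced dataset.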

Data Source

Toxic Comment Classification Challenge from Kaggle.

Dependencies

Python
- NumPy
- Pandas
- Keras
- tensorflow-gpu
- CUDA
- cuDNN
- gensim
- NLTK
- scikit-learn
R
- readr
- tidyr
- dplyr
- stringr
- stringi
- ggplot2

