Jigsaw Unintended Bias in Toxicity Classification

This is a Kaggle competition hosted by Jigsaw and Google to identify toxic comments in online conversations. It is a follow-up to the previous Toxic Comment Classification Challenge, intended to be less biased and more diverse.

Methodology:

Data Cleaning:
- For training, only the “comment_text” and “target” columns are relevant, so the other columns are dropped.
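
As a minimal sketch of this step, the column selection can be written with pandas. The tiny DataFrame below is made-up placeholder data standing in for the competition's training file, and the variable name `train` is an assumption, not taken from the notebook.

```python
import pandas as pd

# Placeholder rows standing in for the real training data (illustrative only).
train = pd.DataFrame({
    "id": [1, 2],
    "comment_text": ["thanks for sharing", "you are an idiot"],
    "target": [0.0, 0.9],
    "severe_toxicity": [0.0, 0.1],
})

# Keep only the two columns used for training; everything else is dropped.
train = train[["comment_text", "target"]]
print(train.columns.tolist())
```
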
Tokenizing and Embedding:
- The training and testing data are tokenized and padded to the same length.
- FastText and QuoraText embeddings are combined and fitted to the tokenized dataset to create the word embeddings.
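
The two bullets above can be sketched with plain NumPy. This is an illustration under stated assumptions, not the notebook's code. The README does not say how the two embeddings are combined, so concatenation along the feature axis is assumed here. The helper names (`pad_to_length`, `build_embedding_matrix`) and the toy vectors are hypothetical. The padding follows the same convention as the Keras `pad_sequences` defaults (zeros added at the front, long sequences truncated from the front).

```python
import numpy as np

def pad_to_length(sequences, max_len):
    """Left-pad with 0 and keep the last max_len tokens of each sequence."""
    padded = np.zeros((len(sequences), max_len), dtype=np.int64)
    for row, seq in enumerate(sequences):
        trimmed = list(seq)[-max_len:]
        if len(trimmed) > 0:
            padded[row, -len(trimmed):] = trimmed
    return padded

def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i; unknown words stay zero."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype=np.float32)
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix

# Toy vocabulary and two toy embedding sources (hypothetical values).
word_index = {"hello": 1, "world": 2}
first_source = {"hello": np.ones(3), "world": np.full(3, 2.0)}
second_source = {"hello": np.full(2, 0.5)}

# Assumed combination strategy: concatenate the two matrices feature-wise.
combined = np.concatenate(
    [
        build_embedding_matrix(word_index, first_source, 3),
        build_embedding_matrix(word_index, second_source, 2),
    ],
    axis=1,
)

padded = pad_to_length([[1, 2], [2], [1, 2, 1, 2, 1]], max_len=4)
print(combined.shape, padded.shape)
```

Row 0 of the matrix is reserved for the padding index, which is why the matrix has one more row than the vocabulary.
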
Splitting the dataset:
- The tokenized inputs and the target variable are split into training and testing sets.
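
A split like this is commonly done with scikit-learn's `train_test_split`. The sketch below uses placeholder arrays; the 80/20 ratio and the fixed `random_state` are assumptions, since the README does not state the split used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder padded sequences (10 samples, length 4) and binary targets.
X = np.arange(40).reshape(10, 4)
y = np.array([0, 1] * 5)

# Assumed 80/20 split; the seed only makes the sketch reproducible.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_val.shape)
```
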
Model Building:
- The model is built from an Embedding layer, a Bidirectional LSTM layer, and a Dense layer.
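
One way such an architecture might look in Keras is sketched below. The layer sizes, the frozen embedding, the sigmoid output, and the loss and optimizer are all assumptions made for illustration; the README names only the three layer types. `build_model` is a hypothetical helper and the random matrix stands in for the real embedding matrix.

```python
import numpy as np
import tensorflow as tf

def build_model(embedding_matrix, max_len, lstm_units=64):
    """Embedding -> Bidirectional LSTM -> Dense, with assumed hyperparameters."""
    inputs = tf.keras.Input(shape=(max_len,), dtype="int32")
    x = tf.keras.layers.Embedding(
        input_dim=embedding_matrix.shape[0],
        output_dim=embedding_matrix.shape[1],
        # Initialize from the precomputed matrix and keep it fixed (assumption).
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False,
    )(inputs)
    x = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm_units))(x)
    # Single sigmoid unit producing a toxicity score between 0 and 1.
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

# Random placeholder matrix: 50 tokens, 8 dimensions.
model = build_model(np.random.rand(50, 8).astype("float32"), max_len=10)
model.summary()
```
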
Model training:
- The model is trained for 20 epochs with a batch size of 248.
- Early stopping and learning-rate reduction are used to limit overfitting.
- Model Checkpoint is used to save the best model.
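
The training setup described above maps onto three standard Keras callbacks. The sketch below shows one possible configuration. The monitored metric, patience values, reduction factor, and checkpoint file name are assumptions; only the epoch count and batch size come from this README. `make_callbacks` is a hypothetical helper.

```python
import tensorflow as tf

EPOCHS = 20       # from the README
BATCH_SIZE = 248  # from the README

def make_callbacks(checkpoint_path="best_model.keras"):
    """Return the three callbacks with assumed settings."""
    return [
        # Stop when validation loss has not improved for 3 epochs.
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=3, restore_best_weights=True
        ),
        # Halve the learning rate after 1 epoch without improvement.
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=1
        ),
        # Keep only the model with the lowest validation loss on disk.
        tf.keras.callbacks.ModelCheckpoint(
            checkpoint_path, monitor="val_loss", save_best_only=True
        ),
    ]

callbacks = make_callbacks()

# Usage, assuming a compiled model and split arrays already exist:
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=EPOCHS, batch_size=BATCH_SIZE, callbacks=callbacks)
```
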

Result Analysis:

The training accuracy is 0.9638 and the loss is 0.0955.
The validation accuracy is 0.96380 and the loss is 0.095541.
The model seems to have converged well in the first epoch and then started overfitting. Reducing the learning rate has not helped much.

Accuracy Plot:

Loss Plot:

Dependencies

numpy
pandas
sklearn
tensorflow
matplotlib
gensim
tqdm
operator
gc

Acknowledgments

Kaggle Competition
