https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data
FastText: 2 million word vectors trained on Common Crawl (600B tokens), crawl-300d-2M.vec.zip: https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M.vec.zip
GloVe: Twitter (2B tweets, 27B tokens, 1.2M vocab, uncased; 25d, 50d, 100d, and 200d vectors; 1.42 GB download), glove.twitter.27B.zip: http://nlp.stanford.edu/data/glove.twitter.27B.zip
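Both embedding archives unpack to plain-text files with one word per line followed by its vector components. The sketch below shows one way such files could be parsed into a dictionary. It is not taken from the notebook: the function name `load_vectors` and the `expected_dim` parameter are hypothetical, and the header check is a simple heuristic that relies on fastText `.vec` files starting with a `<vocab_size> <dim>` line while GloVe files do not.

```python
import io


def load_vectors(lines, expected_dim=None):
    """Parse space-separated embedding text into {word: [float, ...]}.

    `lines` is any iterable of text lines (an open file or StringIO).
    `expected_dim` is a hypothetical safeguard: rows whose vector length
    differs from it are skipped.
    """
    vectors = {}
    for i, line in enumerate(lines):
        parts = line.rstrip().split(" ")
        # Heuristic (assumption): a two-field first line is the fastText
        # "<vocab_size> <dim>" header, which GloVe files do not have.
        if i == 0 and len(parts) == 2:
            continue
        word, values = parts[0], parts[1:]
        if expected_dim is not None and len(values) != expected_dim:
            continue  # skip malformed or unexpected rows
        vectors[word] = [float(v) for v in values]
    return vectors


# Tiny in-memory stand-ins for the real files.
fasttext_sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
glove_sample = io.StringIO("hello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")

fasttext_vecs = load_vectors(fasttext_sample, expected_dim=3)
glove_vecs = load_vectors(glove_sample, expected_dim=3)
```

With the real downloads, the same function could be given an open file handle, for example `open(path, encoding="utf-8", errors="ignore")`, where `path` points at the extracted vectors file. The full files are large, so in practice it is common to keep only the words that occur in the comment vocabulary.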
This machine learning project uses relatively small datasets, so development was done in Google Colab.
toxicity_classification.ipynb
numpy, pandas, matplotlib.pyplot, IPython.display, seaborn, sklearn, scipy, keras, skmultilearn, scikitplot