Fullstack end-to-end toxic comment classification with result interpretation
You can try out the online demo on this page.
The dataset is available on Kaggle and contains labelled comments from the popular Russian social network ok.ru.
To train the model, run:
python run_training.py
from src.toxic.inference import Toxic

# Load a trained checkpoint by local path or by pretrained model name
model = Toxic.from_checkpoint('path_to_model_or_name')
model.infer('привет, придурок')  # Russian for "hi, jerk"
Result:
{
'predicted': [
{'class': 'insult', 'confidence': 0.99324},
{'class': 'threat', 'confidence': 0.002},
{'class': 'obscenity', 'confidence': 0.00225}
],
'interpretation': {
'spans': [(0, 7), (7, 16)],
'weights': {
'insult': [-0.34299, 0.93934],
'threat': [-0.97362, 0.22819],
'obscenity': [-0.99579, 0.09168]
}
}
}
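The spans field holds character offsets into the input string, and weights holds one attribution score per span for each class. As a rough illustration of how to read this output (the show_attributions helper below is hypothetical, not part of the package), you can zip spans and weights together to see which part of the text drives a given prediction:

def show_attributions(text, result, target='insult'):
    """Print each character span with its attribution weight for one class."""
    spans = result['interpretation']['spans']
    weights = result['interpretation']['weights'][target]
    for (start, end), weight in zip(spans, weights):
        print(f'{text[start:end]!r}: {weight:+.3f}')

For the example above, this prints the greeting span with a negative weight and the insult span (' придурок', "jerk") with a strongly positive one, showing that the second span drives the insult prediction.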
We provide a pretrained model on the releases page.
To download and unpack it, execute:
wget https://github.com/esceptico/toxic/releases/download/v0.1.0/model.pth.zip
unzip model.pth.zip
You can also download and cache a pretrained model by its name:
model = Toxic.from_checkpoint('cnn')
List of supported pretrained models:
cnn: Wide CNN encoder with a feed-forward classification head
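The exact model code lives in src/toxic; as a rough sketch under common assumptions (the embedding size, kernel widths, and hidden size below are illustrative, not the repository's actual hyperparameters), a wide CNN encoder with a feed-forward head for multi-label toxicity classification looks like this:

import torch
import torch.nn as nn

class WideCNNClassifier(nn.Module):
    """Illustrative wide CNN encoder + feed-forward head (not the repo's exact code)."""
    def __init__(self, vocab_size, embed_dim=128, kernel_sizes=(2, 3, 4, 5),
                 num_filters=128, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Parallel 1-D convolutions with several kernel widths ("wide" CNN)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        # Feed-forward head with one logit per label (insult, threat, obscenity)
        self.head = nn.Sequential(
            nn.Linear(num_filters * len(kernel_sizes), 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, token_ids):
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        pooled = [conv(x).relu().max(dim=-1).values for conv in self.convs]
        return self.head(torch.cat(pooled, dim=-1)).sigmoid()  # independent per-class probabilities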
To launch the Streamlit demo locally, run:
streamlit run ui/app.py -- --model=models/model.pth
The code is released under the MIT license.
The models are distributed under CC BY-NC-SA 4.0, in accordance with the dataset license.