Fullstack end-to-end toxic comment classification with result interpretation
You can try out the online demo on this page.
The dataset is available on Kaggle and contains labelled comments from the popular Russian social network ok.ru.
To train the model, run:
python run_training.py
from src.toxic.inference import Toxic

# Load a trained checkpoint by local path or by pretrained model name
model = Toxic.from_checkpoint('path_to_model_or_name')
model.infer('привет, придурок')  # Russian for "hi, jerk"
Result:
{
'predicted': [
{'class': 'insult', 'confidence': 0.99324},
{'class': 'threat', 'confidence': 0.002},
{'class': 'obscenity', 'confidence': 0.00225}
],
'interpretation': {
'spans': [(0, 7), (7, 16)],
'weights': {
'insult': [-0.34299, 0.93934],
'threat': [-0.97362, 0.22819],
'obscenity': [-0.99579, 0.09168]
}
}
}
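The spans field holds character offsets into the input string, and weights holds one attribution score per span for each class. As a rough illustration of how to read this output (the show_attributions helper below is hypothetical, not part of the package), you can zip spans and weights together to see which part of the text drives a given prediction:

def show_attributions(text, result, target='insult'):
    """Print each character span with its attribution weight for one class."""
    spans = result['interpretation']['spans']
    weights = result['interpretation']['weights'][target]
    for (start, end), weight in zip(spans, weights):
        print(f'{text[start:end]!r}: {weight:+.3f}')

For the example above, this prints the greeting span with a negative weight and the insult span (' придурок', "jerk") with a strongly positive one, showing that the second span drives the insult prediction.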
We provide a pretrained model on the releases page.
To download and unpack it, execute:
wget https://github.com/esceptico/toxic/releases/download/v0.1.0/model.pth.zip
unzip model.pth.zip
You can also download and cache a pretrained model by its name:
model = Toxic.from_checkpoint('cnn')
List of supported pretrained models:
cnn: Wide CNN encoder with a feed-forward classification head
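The exact model code lives in src/toxic; as a rough sketch under common assumptions (the embedding size, kernel widths, and hidden size below are illustrative, not the repository's actual hyperparameters), a wide CNN encoder with a feed-forward head for multi-label toxicity classification looks like this:

import torch
import torch.nn as nn

class WideCNNClassifier(nn.Module):
    """Illustrative wide CNN encoder + feed-forward head (not the repo's exact code)."""
    def __init__(self, vocab_size, embed_dim=128, kernel_sizes=(2, 3, 4, 5),
                 num_filters=128, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Parallel 1-D convolutions with several kernel widths ("wide" CNN)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        # Feed-forward head with one logit per label (insult, threat, obscenity)
        self.head = nn.Sequential(
            nn.Linear(num_filters * len(kernel_sizes), 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, token_ids):
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        pooled = [conv(x).relu().max(dim=-1).values for conv in self.convs]
        return self.head(torch.cat(pooled, dim=-1)).sigmoid()  # independent per-class probabilities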
To launch the Streamlit demo locally, run:
streamlit run ui/app.py -- --model=models/model.pth
The code is released under the MIT license.
The models are distributed under CC BY-NC-SA 4.0, in accordance with the dataset license.