
hate-speech-detector

A Slack app that detects hate speech using AI, and a dashboard to show top offenders and visualize their social networks.

[demo image]

The Slack app hits the model API to classify messages as hate speech. If the predicted probability meets the configured threshold, the bot posts a message asking the user to reconsider. The event is also written to a SQLite database, which can be used for analysis.
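
The wiring is roughly as follows (a minimal sketch, not the repo's actual code; the model API URL, port, and table schema are assumptions):

import os
import sqlite3

import requests
from slack_bolt import App

app = App(token=os.environ["SLACK_BOT_TOKEN"])
THRESHOLD = 0.4  # default threshold; see docker-compose.yml

@app.event("message")
def check_message(event, say):
    text = event.get("text", "")
    # Hypothetical endpoint; the service name and path are assumptions.
    resp = requests.post("http://model_api:8000/predict", json={"text": text})
    probability = resp.json()["probability"]
    if probability >= THRESHOLD:
        say(f"<@{event['user']}>, please reconsider your message.")
        # Log the event for later analysis; the schema is an assumption.
        with sqlite3.connect("events.db") as conn:
            conn.execute(
                "INSERT INTO events (user, channel, probability) VALUES (?, ?, ?)",
                (event["user"], event["channel"], probability),
            )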

A rudimentary dashboard (Dash/Plotly) is provided to list top offenders and visualize their social networks.

[dashboard image]

How to Use It

I strongly recommend either running Docker on a GPU-enabled host, or training the model offline on a CUDA-capable machine and letting Docker copy the trained .pt and .pkl files into the image during the build. Otherwise, the model will be trained during the image build, which takes a very long time. To run training offline:

cd services/hate_speech_model/src
python train.py

Training will download the required word vectors, fit a tokenizer, and train a classifier. These artifacts are stored in services/hate_speech_model/src/models and services/hate_speech_model/src/word_vectors. Once the model is trained, create a .env file to hold your app and bot tokens, and start the model API and the Slack Bolt app with Docker Compose.
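
At serving time, the model API loads these artifacts, roughly like so (a sketch; the exact file names are assumptions):

import pickle

import torch

# Hypothetical file names; see services/hate_speech_model/src/models for the real ones.
with open("services/hate_speech_model/src/models/tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)

model = torch.load("services/hate_speech_model/src/models/model.pt", map_location="cpu")
model.eval()  # inference mode; no gradient tracking needed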

Example .env with your tokens:

$ cat .env
SLACK_BOT_TOKEN=xoxb-
SLACK_APP_TOKEN=xapp-

Startup:

docker-compose up --build

Of course, it would be better to save the model artifacts to cloud storage and avoid baking them into the image entirely. It would also be better to use Docker secrets instead of environment variables for the tokens.
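
For example, with the artifacts in S3, the container entrypoint could pull them at startup instead (a sketch, assuming boto3 and hypothetical bucket and key names):

import boto3

s3 = boto3.client("s3")
# Bucket and key names are hypothetical.
s3.download_file("my-model-bucket", "models/model.pt", "models/model.pt")
s3.download_file("my-model-bucket", "models/tokenizer.pkl", "models/tokenizer.pkl")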

You will know startup has finished when you see this:

[startup_complete image]

What Counts as Hate Speech

The hate speech classifier returns a probability between 0 and 1 for every message posted in Slack. The default threshold is 0.4. Raise it to flag fewer messages, or lower it to flag more. Set the threshold in docker-compose.yml.
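
In code, the check amounts to a comparison like this (a sketch; the environment variable name is hypothetical, so check docker-compose.yml for the actual key):

import os

# Hypothetical variable name.
THRESHOLD = float(os.environ.get("HATE_SPEECH_THRESHOLD", "0.4"))

def is_hate_speech(probability: float) -> bool:
    # Raising the threshold flags fewer messages; lowering it flags more.
    return probability >= THRESHOLD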

Integration with Slack

To use this app for hate speech detection in Slack, you will need a Slack account and workspace.
Follow the basic getting-started tutorial for Slack Bolt to:

  1. Create a new Slack app
  2. Add features & functionality -> bots
  3. Add scopes for: channels:join, channels:read, chat:write, chat:write.public, chat:write.customize, groups:read, im:read, mpim:read, metadata.message:read, users.profile:read, users:read
  4. Under OAuth & Permissions, copy the bot user OAuth token
  5. Add an app-level token with the connections:write scope, named whatever you want, then copy this token too
  6. Enable socket mode
  7. Subscribe to bot events with message.channels, message.groups, message.im, message.mpim
  8. Install the app to a workspace
  9. Add the bot to the channels you need (@-mentioning your bot's name will do this)

Once the app has been set up in your Slack account and the Slack Bolt app is running in Docker, you are ready to go. Start sending messages and watch your bot jump into action when hate speech is detected.
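
The tokens from steps 4 and 5 map to SLACK_BOT_TOKEN and SLACK_APP_TOKEN in .env; a Socket Mode Bolt app starts up roughly like this (a minimal sketch, so the repo's actual entrypoint may differ):

import os

from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])  # bot token from step 4 (xoxb-)

if __name__ == "__main__":
    # The app-level token from step 5 (xapp-) opens the Socket Mode connection.
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()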

Using the Dashboard

Hate speech events are logged by the Slack Bolt app in a SQLite DB, which can be used for monitoring and analytics. If you do not have a business intelligence tool, you can use the included Dash/Plotly dashboard. It is very minimal, but it can be accessed at http://0.0.0.0:8050.

The dashboard is hosted in the same container as the Slack Bolt app. Bolt runs in the background. The dashboard will auto-refresh every 3 seconds.
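
The auto-refresh is the standard Dash pattern of polling with dcc.Interval; a stripped-down version might look like this (a sketch; the DB path and table name are assumptions):

import sqlite3

from dash import Dash, dcc, html
from dash.dependencies import Input, Output

app = Dash(__name__)
app.layout = html.Div([
    html.H1("Top Offenders"),
    html.Ul(id="offenders"),
    dcc.Interval(id="tick", interval=3000, n_intervals=0),  # fire every 3 seconds
])

@app.callback(Output("offenders", "children"), Input("tick", "n_intervals"))
def refresh(_):
    # DB path and table name are assumptions.
    with sqlite3.connect("events.db") as conn:
        rows = conn.execute(
            "SELECT user, COUNT(*) FROM events GROUP BY user ORDER BY 2 DESC LIMIT 10"
        ).fetchall()
    return [html.Li(f"{user}: {count}") for user, count in rows]

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8050)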

The Model

The hate speech model is a classifier trained on Kaggle's Jigsaw Unintended Bias dataset. It uses a combination of GloVe and FastText word vectors. On top of the word embeddings, the model stacks an encoder-decoder: the encoder is a bi-directional LSTM, and the decoder is a standard attention module.
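
In PyTorch terms, the architecture is roughly the following (a simplified sketch, not the repo's exact code; dimensions are illustrative):

import torch
import torch.nn as nn

class HateSpeechClassifier(nn.Module):
    def __init__(self, embedding_matrix, hidden=128):
        super().__init__()
        # Frozen GloVe + FastText embeddings, concatenated offline into one matrix.
        self.embedding = nn.Embedding.from_pretrained(embedding_matrix, freeze=True)
        self.encoder = nn.LSTM(
            embedding_matrix.size(1), hidden, bidirectional=True, batch_first=True
        )
        self.attn = nn.Linear(2 * hidden, 1)  # attention score per token
        self.out = nn.Linear(2 * hidden, 1)   # binary hate-speech logit

    def forward(self, token_ids):
        states, _ = self.encoder(self.embedding(token_ids))  # (B, T, 2H)
        weights = torch.softmax(self.attn(states), dim=1)    # (B, T, 1)
        context = (weights * states).sum(dim=1)              # attention-pooled summary
        return self.out(context).squeeze(-1)                 # logit; sigmoid -> probability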

The model was inspired by reviewing several of the top scoring models for the Jigsaw competition, taking a few ideas from each. See this one and this one, both by Benjamin Minixhofer, and this one by Shujian Liu.

Refer to the Jupyter notebooks in services/hate_speech_model/src/notebooks to see draft versions of the models and trace how the final model came together.
