Inspired by the `!englando` command on Elajjaz's Twitch stream.
Elajjaz lives in Sweden and regularly streams for an international audience. Because his viewers speak so many languages, he has created an "English only" rule in chat. If users find people chatting in a language other than English, they can enter the `!englando` command, which causes Nightbot to say:
englando in chat, pls FeelsBlyatMan
This got me thinking: with some decent data sanitization, a list of the most commonly used words, and some help from Google, we could write a bot that detects when a user is speaking a language other than English and notifies them.
There are two main hurdles to overcome when writing this bot:
- Twitch has a number of emotes that are not recognizable English words
- The Google Translate API charges by the character
This application uses several environment variables for configuration:
Variable | Use | Required? |
---|---|---|
TWITCH_CHANNEL | The name of the Channel on Twitch to join | Yes |
TWITCH_NICK | The nickname to present in the chatroom. | Yes |
TWITCH_TOKEN | The OAuth token for authenticating to Twitch | Yes |
GOOGLE_API_KEY | The Google API key for the language detection service | No |
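As a rough illustration of how the variables in the table might be read and validated, here is a minimal Python sketch. The `Config` type and `load_config` function are hypothetical names invented for this example and are not taken from the project's code. The sketch only assumes what the table states: the three Twitch variables are required and `GOOGLE_API_KEY` is optional.

```python
import os
from typing import Mapping, NamedTuple, Optional

# Variables the table marks as required.
REQUIRED = ("TWITCH_CHANNEL", "TWITCH_NICK", "TWITCH_TOKEN")


class Config(NamedTuple):
    """Hypothetical container for the bot's settings."""
    channel: str
    nick: str
    token: str
    google_api_key: Optional[str]  # None means language detection is disabled


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Read settings from a mapping (the process environment by default)."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    return Config(
        channel=env["TWITCH_CHANNEL"],
        nick=env["TWITCH_NICK"],
        token=env["TWITCH_TOKEN"],
        # Treat an unset or empty key as "not configured".
        google_api_key=env.get("GOOGLE_API_KEY") or None,
    )
```

Taking the environment as a parameter keeps the function easy to test with a plain dictionary, and failing fast on missing values surfaces configuration mistakes before the bot tries to connect.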
For developer convenience, this application uses dotenv, which lets you set environment variables in a text file. See the `.env.example` file.
Follow the instructions found here.
If you want to use the Google Translate API, follow their guide to setting up keys here.
With that in mind, the strategy is the following:
- Get a list of all the emotes available in Twitch
- Get a list of all the emotes in BetterTwitchTV
- Get a list of the top 10k used English words
- Upon receiving a message:
  - Strip out the username mentions
  - Strip out the emotes
  - Strip out the words that are in the top 10k English words
  - Compare the number of remaining (non-top-10k) words against the total length of the message
  - If configured, ask the Google language detection API to guess the language
  - If the detected language is not English, warn the user
So the message
lul @day9tv Google Translate Rocks kappa
would go through the following transformations:
- Remove @username mentions
lul Google Translate Rocks kappa
- Remove emotes
Google Translate Rocks
- Remove Top 10k English words
Google
Since only one word in the message is not in the top 10k, we would do nothing. Otherwise, we would have sent `Google` off to the language detection API and continued processing from there.
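The filtering steps and the worked example above can be sketched in Python roughly as follows. This is an illustration, not the project's code: `remaining_words` and `should_detect` are hypothetical names, emote matching is assumed to be case-insensitive (as the `lul` and `kappa` example suggests), and the `min_unknown` and `min_ratio` thresholds are invented defaults, since the exact cutoff is not stated here. The tiny emote and word sets stand in for the full Twitch, BetterTwitchTV, and top-10k lists.

```python
import re
from typing import Iterable, List, Set

# Matches @username mentions such as "@day9tv".
MENTION_RE = re.compile(r"@\w+")


def remaining_words(message: str, emotes: Iterable[str],
                    common_words: Set[str]) -> List[str]:
    """Return the words left after removing mentions, emotes, and common words."""
    words = MENTION_RE.sub(" ", message).split()
    emotes_lower = {e.lower() for e in emotes}  # assumption: ignore case
    words = [w for w in words if w.lower() not in emotes_lower]
    return [w for w in words if w.lower() not in common_words]


def should_detect(message: str, leftover: List[str],
                  min_unknown: int = 2, min_ratio: float = 0.5) -> bool:
    """Decide whether the leftover words justify a paid detection call.

    Both thresholds are assumptions made for this sketch.
    """
    total = len(message.split())
    if total == 0:
        return False
    return len(leftover) >= min_unknown and len(leftover) / total >= min_ratio


# Stand-ins for the real emote and top-10k word lists.
EMOTES = {"LUL", "Kappa"}
COMMON = {"translate", "rocks"}

message = "lul @day9tv Google Translate Rocks kappa"
leftover = remaining_words(message, EMOTES, COMMON)
print(leftover)                          # ['Google']
print(should_detect(message, leftover))  # False
```

Filtering locally before calling the API addresses both hurdles: emotes never reach the detector, and only the leftover words would be billed by the character.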
[ ] Adaptive algorithm that takes the data returned from Google and, if the language is English, adds the words to another "whitelist"
[ ] An extra "internet slang" dictionary to weed out other common words
[ ] Update the README more
[ ] Add a web interface to sign up for and use the bot
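The first item on this list, the adaptive whitelist, could look something like the Python sketch below. Nothing here is implemented in the project: `update_whitelist` is a hypothetical name, and the sketch assumes the detection service reports English with the language code `"en"`.

```python
from typing import Iterable, Set


def update_whitelist(whitelist: Set[str], words: Iterable[str],
                     detected_language: str) -> Set[str]:
    """Add words to the whitelist when the detector judged them English."""
    if detected_language == "en":
        # Store lowercase forms so later lookups can ignore case.
        whitelist.update(w.lower() for w in words)
    return whitelist


whitelist = {"hello"}
update_whitelist(whitelist, ["Poggers", "gg"], "en")   # learned as English
update_whitelist(whitelist, ["hola"], "es")            # ignored
print(sorted(whitelist))  # ['gg', 'hello', 'poggers']
```

Words learned this way would be filtered out on later messages, so the same slang would not be sent to the paid API twice.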