# Call Me Out! A Machine Learning Approach for Reliable Hate Speech Detection at Scale

<img src="https://cdn.kqed.org/wp-content/uploads/sites/43/2019/05/social-media-logos1910-800x450.jpg" alt="Social media logos" />

## Purpose of this project

Hate speech has become a pervasive problem in today's society: it can spread negativity, fuel conflicts, and incite violence. Detecting and preventing hate speech has therefore become a crucial task. Machine learning techniques have proved to be an effective tool for hate speech detection: these algorithms can analyze vast amounts of data, identify patterns, and classify content based on predefined criteria. By automating the detection process, machine learning saves time, effort, and resources compared to manual review. Machine learning models can also learn and adapt over time, improving their accuracy and efficiency in identifying and filtering hate speech. As a result, hate speech detection has become an active area of research with the potential to make a significant contribution to a safer and more inclusive online environment.

The “Call me sexist but” dataset consists of 16,914 annotated tweets collected over a two-month period.

Of these, 3,383 tweets sent by 613 users were flagged for sexist content, 1,972 tweets sent by 9 users for racist content, and 11,559 tweets sent by 614 users as neither sexist nor racist.
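As a quick arithmetic check, the three class counts add up to the 16,914 total and imply a clear class imbalance: roughly one tweet in five is labelled sexist. The snippet only restates the figures given above.

```python
# Class counts as reported above for the "Call me sexist but" dataset
counts = {"sexist": 3_383, "racist": 1_972, "neither": 11_559}

total = sum(counts.values())
print(f"total tweets: {total:,}")  # total tweets: 16,914

# Share of each class in percent, rounded to one decimal place
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
for label, pct in shares.items():
    print(f"{label:>8}: {pct}%")  # -> sexist 20.0%, racist 11.7%, neither 68.3%
```

Because sexist and racist tweets are minority classes, per-class precision and recall say more about a model than overall accuracy does.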

The purpose of this project is to use this dataset to gain a deeper understanding of the hate speech landscape on Twitter. 
To accomplish this, we will apply natural language processing (NLP) techniques to detect instances of hate speech in the dataset. By doing so, we hope to shed light on the prevalence of hate speech on social media platforms and better understand the specific types of language and behavior that constitute hate speech. Ultimately, this research could inform the development of more effective strategies for combating hate speech online and promoting more inclusive and respectful communication in digital spaces.

## What is Hate Speech Detection?

In November 2021, ~500 million tweets were sent every day.

Because of this volume, scalable content moderation has become a critical issue for social media companies.

Effective content moderation is essential to ensure the safety and well-being of users, prevent the spread of harmful or illegal content, and maintain the integrity and reputation of the platform.


Natural Language Processing (NLP) is a branch of artificial intelligence that enables the automated analysis and classification of user-generated content, such as social media posts, comments, and messages.

A study conducted by the Anti-Defamation League found that NLP-based systems were able to detect hate speech with an accuracy rate of 80-90%.

Sexism, defined as discrimination, prejudice, or stereotyping based on a person's gender, can lead to hate crimes such as sexual assault, harassment, and violence.

A study conducted by the Pew Research Center in 2018 found that women are more likely than men to experience sexual harassment on social media platforms like Twitter, with 21% of women reporting having been sexually harassed online compared to 9% of men.

Sources:
- Pew Research Center: "Americans' Attitudes About Social Media and the News" (2018), https://www.journalism.org/2018/09/10/americans-attitudes-about-social-media-and-the-news/
- Anti-Defamation League: "The State of Online Hate" (2020), https://www.adl.org/sites/default/files/documents/2020-07/The-State-of-Online-Hate.pdf

## Why Automate Content Moderation?

## The “Call me sexist but” Dataset

This project used the “Call me sexist but” dataset. 

In this project, we will look into the dataset to understand the hate speech landscape on Twitter, and apply NLP for hate speech detection.

The “Call me sexist but…” dataset includes 3,383 tweets by 613 users flagged as “sexist” according to a rigorous framework of data review and annotation.

Throughout the dataset, sexism manifests in various forms, including derogatory language, harmful stereotypes, and discriminatory attitudes, emphasizing the importance of developing nuanced content moderation strategies that can identify and address these different types of harmful content.

Source:
Samory, Mattia (2021). The 'Call me sexist but' Dataset (CMSB). GESIS - Leibniz-Institute for the Social Sciences. Data File Version 1.0.0, https://doi.org/10.7802/2251.


## Key Takeaways from the Dataset

- Adversarial Examples:
Adversarial examples are created by making small and deliberate changes to an existing example of problematic content, such as a hate speech tweet, in order to create a new example that is similar to the original but can evade detection by automated content moderation systems.

- User Demographics:
Identifying the gender of a Twitter user can be challenging, and proxies such as the user's name, profile picture, or bio are often used as imperfect indicators to make gender predictions for content moderation purposes.
These proxies may lead to errors or biases in content moderation.

- Toxicity Scale:
The toxicity scale of the “Call me sexist but…” dataset provides a quantitative measure of the level of harm in the collected tweets and can be used to evaluate the effectiveness of different content moderation strategies. It was developed specifically for the dataset, drawing on similar existing scales.

Source:
Samory, Mattia (2021). The 'Call me sexist but' Dataset (CMSB). GESIS - Leibniz-Institute for the Social Sciences. Data File Version 1.0.0, https://doi.org/10.7802/2251.

## Data Visualisation from the Dataset

![Tweet content breakdown](attachment:Screenshot%202023-05-16%20at%2012.00.56.png)

- Tweet Content: The data shows that content that opens by disclaiming prejudice, such as “Call me sexist, but…”, is proportionally as likely to contain hate speech as openly hostile content.


![User gender breakdown](attachment:Screenshot%202023-05-16%20at%2012.01.01.png)

- User Gender: While men are overrepresented in the dataset across all categories, the gender of most users cannot be identified with the tagging method, which severely limits the use of gender information as a feature for model development.


## Data Transformation Methodology

- Normalize and tokenize the text data: This step converts the text to lowercase and splits it into individual words (tokens). It is often necessary because words in text data can appear in different cases and can be separated by various types of whitespace, such as spaces, tabs, or line breaks. Applied to the example tweet below, this step produces the following tokens: ['in', 'my', 'little', 'world', 'it', 'has', 'always', 'seemed', 'that', 'diy', 'was', 'best', 'left', 'to', 'women', '.', 'ever', 'done', 'diy', 'with', 'a', 'bloke', '?', '!', 'fuckin', 'nightmare', '.']



- Remove stop words and punctuation: Stop words are common words that typically don't carry much meaning on their own, such as 'the', 'and', and 'of'. Punctuation refers to the marks used to separate sentences and phrases, such as periods, commas, and question marks. Removing stop words and punctuation reduces the size of the vocabulary and focuses the analysis on more meaningful words. For the example tweet, this step leaves the following tokens: ['little', 'world', 'always', 'seemed', 'diy', 'best', 'left', 'women', 'ever', 'done', 'diy', 'bloke', 'fuckin', 'nightmare']



- Perform stemming or lemmatization: Stemming and lemmatization are two techniques for reducing words to their root or base form, which helps group together words with similar meanings. Stemming strips suffixes from a word using rule-based heuristics to produce its stem, whereas lemmatization uses a dictionary of known word forms to map a word to its base form. For the example tweet, stemming with the Porter stemmer produces the following tokens: ['littl', 'world', 'alway', 'seem', 'diy', 'best', 'left', 'women', 'ever', 'done', 'diy', 'bloke', 'fuckin', 'nightmar'].
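The first two steps can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the project's actual preprocessing code: the regular-expression tokenizer and the short `STOP_WORDS` set are hypothetical choices made for this example (libraries such as NLTK ship a much longer English stop-word list).

```python
import re
import string

# Hypothetical, deliberately short stop-word list for illustration only;
# a real pipeline would normally use a full list such as NLTK's.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "in", "on", "at", "to", "with", "for",
    "my", "it", "is", "was", "has", "that", "this", "i", "you", "we",
}

def normalize_and_tokenize(text: str) -> list[str]:
    """Lowercase the text, then split it into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def remove_stop_words_and_punctuation(tokens: list[str]) -> list[str]:
    """Drop stop words and tokens made up only of punctuation characters."""
    return [
        tok for tok in tokens
        if tok not in STOP_WORDS and not all(ch in string.punctuation for ch in tok)
    ]

tweet = ("in my little world it has ALWAYS seemed that DIY was best left to women. "
         "Ever done DIY with a bloke?! Fuckin nightmare.")
tokens = normalize_and_tokenize(tweet)
filtered = remove_stop_words_and_punctuation(tokens)
print(tokens)
print(filtered)
```

This simple tokenizer splits contractions such as "don't" into 'don', "'", 't'; a library tokenizer handles such cases more carefully.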

Original tweet: "in my little world it has ALWAYS seemed that DIY was best left to women. Ever done DIY with a bloke?! Fuckin nightmare."

Normalize and tokenize: ['in', 'my', 'little', 'world', 'it', 'has', 'always', 'seemed', 'that', 'diy', 'was', 'best', 'left', 'to', 'women', '.', 'ever', 'done', 'diy', 'with', 'a', 'bloke', '?', '!', 'fuckin', 'nightmare', '.']

Remove stop words and punctuation: ['little', 'world', 'always', 'seemed', 'diy', 'best', 'left', 'women', 'ever', 'done', 'diy', 'bloke', 'fuckin', 'nightmare']

Perform stemming: ['littl', 'world', 'alway', 'seem', 'diy', 'best', 'left', 'women', 'ever', 'done', 'diy', 'bloke', 'fuckin', 'nightmar']
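The stemming step can be reproduced with NLTK's `PorterStemmer`, assuming the `nltk` package is installed (the stemmer needs no extra data downloads). NLTK's default mode adds a few extensions to the original Porter rules, so other Porter implementations may differ on edge cases such as 'diy'. The input is the filtered token list from the previous step.

```python
from nltk.stem import PorterStemmer

# Tokens left after normalization, stop-word removal and punctuation removal
filtered = ['little', 'world', 'always', 'seemed', 'diy', 'best', 'left',
            'women', 'ever', 'done', 'diy', 'bloke', 'fuckin', 'nightmare']

stemmer = PorterStemmer()  # default mode is PorterStemmer.NLTK_EXTENSIONS
stems = [stemmer.stem(tok) for tok in filtered]
print(stems)
```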

## Machine Learning Methodology

## Call Me Out! Algorithm Classification Report

The Call Me Out! algorithm focuses on minimizing false positives for sexist content (prediction score of 65% for sexist content vs 94% for non-sexist content).

Many jurisdictions have laws and regulations governing content moderation, requiring companies to strike a balance between removing harmful content and preserving free speech. Minimizing false positives helps ensure compliance with these policies while avoiding censorship.
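Raising the decision threshold applied to a classifier's predicted probabilities is one common way to reduce false positives at the cost of more false negatives. It is not stated here whether the Call Me Out! algorithm uses thresholding; the sketch only illustrates the trade-off with invented scores and labels, and `confusion_at` is a hypothetical helper.

```python
# Invented predicted probabilities of "sexist" and true labels (1 = sexist)
scores = [0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

def confusion_at(scores, labels, threshold):
    """Count true positives, false positives and false negatives at a threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
    return tp, fp, fn

for threshold in (0.5, 0.8):
    tp, fp, fn = confusion_at(scores, labels, threshold)
    print(f"threshold={threshold}: TP={tp} FP={fp} FN={fn}")
# threshold=0.5: TP=3 FP=2 FN=1
# threshold=0.8: TP=2 FP=0 FN=2
```

In this toy example the stricter threshold removes both false positives but misses one more sexist tweet.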

![Classification report](attachment:Screenshot%202023-05-16%20at%2012.17.43.png)

- Precision measures the proportion of true positives among the instances predicted as positive. It is useful when the focus is on minimizing false positives.


- Recall is the proportion of actual positive instances that the model correctly predicts as positive. It is helpful when the goal is to minimize false negatives.


- The F1-score is the harmonic mean of precision and recall. It provides a balanced measure of a model's performance by considering both precision and recall.
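The three definitions can be written directly in terms of true positives (TP), false positives (FP) and false negatives (FN). This is a plain-Python sketch; scikit-learn's `classification_report` computes the same per-class figures. The counts in the example call are made up and are not the project's results.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one class from raw counts."""
    # Precision: share of predicted positives that are actually positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of actual positives that the model found
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical counts for the "sexist" class
p, r, f1 = precision_recall_f1(tp=60, fp=20, fn=40)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # precision=0.75 recall=0.60 f1=0.67
```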


## Call Me Out! Algorithm Learning Curve

The algorithm’s learning curve shows that by the end of training the model has reached a stable level of proficiency in detecting sexist content, and further training will not result in significant improvement.

Learning is a continuous process, and there is always room for improvement even after reaching a stable plateau. It is crucial to reflect on objectives and decide whether maintaining this stability is satisfactory or whether we want to push the hate speech detection model further (for example, by adding toxicity levels, …).

![Learning curve](attachment:Screenshot%202023-05-15%20at%2016.28.07.png)
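Whether a learning curve has plateaued can also be checked numerically. The helper below is a hypothetical sketch, not part of the project: it flags a plateau when the validation score improves by less than a chosen tolerance across the last few training-set sizes. The scores are invented; with scikit-learn they would typically come from `sklearn.model_selection.learning_curve`.

```python
def has_plateaued(val_scores: list[float], window: int = 3, tol: float = 0.01) -> bool:
    """Return True if the score rises by less than `tol` across the last `window` points."""
    if len(val_scores) < window:
        return False  # not enough points to judge
    recent = val_scores[-window:]
    return recent[-1] - recent[0] < tol

# Invented mean validation F1 scores at increasing training-set sizes
val_f1 = [0.52, 0.61, 0.66, 0.685, 0.69, 0.692]
print(has_plateaued(val_f1))
```

The `window` and `tol` values are arbitrary here; a stricter tolerance demands a flatter curve before declaring a plateau.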

## (Un)Licence 

This is free and unencumbered software released into the public domain. Please see the UNLICENSE.txt file for details.

## Acknowledgement

This program was created with the assistance of OpenAI's ChatGPT (GPT-4).