<a href="https://colab.research.google.com/github/natesheehan/h8-speech/blob/main/hate_detection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hate Detection in Historical Archives

This interactive notebook analyzes hate speech patterns in historical newspaper archives. The primary dataset consists of articles from the New York Times published in 1894-1895. The articles were selected because they contain the term "plague," a topic of significant relevance during that period. The dataset is sourced from the digital archive available at ProQuest (https://www.proquest.com/).




# Imports

This section imports the libraries used throughout the analysis.

## Standard Libraries:
- **os**: For interacting with the operating system.
- **json**: For parsing and manipulating JSON data.
- **ast**: For safely evaluating strings as Python expressions.

## Data Handling and Processing:

- **pandas**: Essential for data manipulation and analysis, particularly for handling tabular data.
- **tqdm**: Displays a progress bar while looping over the data files.

## Natural Language Processing and Machine Learning:

- **transformers**: Provides the AutoModelForSequenceClassification and AutoTokenizer classes from Hugging Face, used here for hate speech classification.
- **torch**: PyTorch, the deep learning library that runs the models.

## Data Visualization:

- **wordcloud**: Creates word cloud visualizations that highlight key terms and phrases in the data.
- **STOPWORDS**: A predefined set of common words to exclude from word clouds.
- **plotly.express** and **plotly.graph_objs**: Graphing libraries for interactive data visualization.
- **matplotlib.pyplot**: Widely used for creating static, animated, and interactive visualizations in Python.

In [2]:
import os
import json
import pandas as pd
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
from tqdm import tqdm
import ast  # for safely evaluating strings as Python expressions
from wordcloud import WordCloud, STOPWORDS
import plotly.express as px
import plotly.graph_objs as go
import matplotlib.pyplot as plt

# Set Up Classification Model

To judge whether a sentence is hate speech, we use the popular classification model from the paper by Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, and Animesh Mukherjee, "HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection".


The HateXplain model is available on Hugging Face:
https://huggingface.co/Hate-speech-CNERG/bert-base-uncased-hatexplain

Note that the cell below loads `Hate-speech-CNERG/dehatebert-mono-english`, a different checkpoint from the same Hugging Face organization, so the scores in this notebook come from that model.


In [None]:
# Load model and tokenizer outside the loop for efficiency
model_name = "Hate-speech-CNERG/dehatebert-mono-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
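
The scoring function in this section reads the hate probability from column 1 of the softmax output, which is only correct if the loaded checkpoint puts its hate class at index 1. A minimal sketch of looking the index up by name instead is shown here. The helper `find_label_index` and both example mappings are hypothetical and for illustration only; the real mapping should be read from `model.config.id2label`.

```python
def find_label_index(id2label, wanted="hate"):
    """Return the index whose label name equals `wanted`, ignoring case."""
    for index, name in id2label.items():
        if str(name).lower() == wanted.lower():
            return int(index)
    raise KeyError(f"No label named {wanted!r} in {dict(id2label)}")

# Hypothetical mappings for illustration only; inspect model.config.id2label
# on the loaded checkpoint to see the real one.
binary_labels = {0: "NON_HATE", 1: "HATE"}
three_way_labels = {0: "hate speech", 1: "normal", 2: "offensive"}

print(find_label_index(binary_labels, "hate"))            # 1
print(find_label_index(three_way_labels, "hate speech"))  # 0
```

With a helper like this, the scoring function could index `scores[:, hate_index]` rather than a hard-coded column, which guards against silently reading the wrong class when the checkpoint is swapped.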

In [None]:
def calculate_hate_prob(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():  # Disable gradient calculations for inference
        outputs = model(**inputs)
    scores = torch.nn.functional.softmax(outputs.logits, dim=-1)
    return scores[:, 1].mean().item()  # Average probability of being hate speech

data_folder = "/content/data"  # Replace with your folder path

# Initialize an empty DataFrame with the expected columns
df = pd.DataFrame(columns=["search_term", "metadata", "score", "sentence"])

temp_data = []  # One dict per scored sentence

for file in tqdm(os.listdir(data_folder)):
    if file.endswith("_clean.json"):
        search_term = file.split('_')[0]  # Extract the search term from the filename
        clean_file_path = os.path.join(data_folder, file)
        metadata_file = file.replace("_clean.json", "_metadata.json")
        metadata_file_path = os.path.join(data_folder, metadata_file)

        # Read and process the clean file
        with open(clean_file_path, 'r') as f:
            json_data = json.load(f)
        text = json_data['choices'][0]['message']['content']
        sentences = [sentence.strip() for sentence in text.split('.') if sentence.strip()]

        # Read metadata
        with open(metadata_file_path, 'r') as f:
            metadata = json.load(f)

        # Process each sentence and add to temp_data
        for sentence in sentences:
            score = calculate_hate_prob(sentence)
            temp_data.append({"search_term": search_term, "metadata": metadata, "score": score, "sentence": sentence})

# Build the DataFrame from the collected rows (keeps the expected columns even if no files matched)
df = pd.DataFrame(temp_data, columns=["search_term", "metadata", "score", "sentence"])

# Save the DataFrame to a CSV file (optional)
df.to_csv("results.csv", index=False)
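
The loop above splits text on every period, so an abbreviation such as "Mr." ends a sentence early and the model scores a fragment. A rough sketch of a splitter that skips a small abbreviation list follows. The function `split_sentences` and the `ABBREVIATIONS` set are hypothetical and far from exhaustive; a dedicated sentence tokenizer would likely do better on period newspaper prose.

```python
import re

# Hypothetical, non-exhaustive list of abbreviations (lowercase, without the period).
ABBREVIATIONS = {"mr", "mrs", "dr", "gen", "col", "capt", "sen", "rev", "hon", "st", "no"}

def split_sentences(text):
    """Split on ., ! or ? followed by whitespace, unless the period ends a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]+\s+", text):
        candidate = text[start:match.start()]
        words = candidate.split()
        last_word = words[-1].lower() if words else ""
        if match.group().startswith(".") and last_word in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, keep scanning
        if candidate.strip():
            sentences.append(candidate.strip())
        start = match.end()
    # Whatever remains after the last boundary is the final sentence
    tail = text[start:].strip().rstrip(".!?").strip()
    if tail:
        sentences.append(tail)
    return sentences

example = "The proposal of Mr. Dawes was rejected. It hardly commends itself."
print(split_sentences(example))
# ['The proposal of Mr. Dawes was rejected', 'It hardly commends itself']
```

Like the original split, this drops the terminal punctuation from each sentence, so the two approaches produce comparable strings for scoring.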

# Set Up Multilabel Classification Model

To identify the types of hate speech present, we use the Hate Speech MultiLabel Classifier, developed by Wesley Cheng. This classifier identifies the targets of hate speech: race, religion, origin, gender, sexuality, age, and disability. It was built by applying transfer learning to BERT, using the UC Berkeley D-Lab's Hate Speech Dataset. The classifier is available on Hugging Face:

https://huggingface.co/wesleyacheng/hate-speech-multilabel-classification-with-bert
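
The classifier is described as multilabel, meaning one sentence can target several categories at once. The classification cell in this section converts logits with a softmax, which makes the category scores compete and sum to 1. If the checkpoint was trained with an independent per-label objective (an assumption worth checking against the model card), a sigmoid per label would be the usual choice. The sketch below contrasts the two on made-up logits, using only the standard library.

```python
import math

def softmax(logits):
    """Normalize logits so the resulting scores sum to 1 across labels."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Score each label independently; the scores need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

# Made-up logits for three labels: two strongly positive, one negative
logits = [2.0, 2.0, -3.0]

print([round(p, 3) for p in softmax(logits)])
print([round(p, 3) for p in sigmoid(logits)])
```

With softmax, the two strongly positive labels each end up just under 0.5 because they share the probability mass, while with sigmoid both score about 0.88. That difference matters when a fixed per-category threshold is applied later.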

In [57]:
def classify_sentences(df, model_name, sentence_column):
    # Load tokenizer and model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    # Get label names from the model's configuration, ordered by label index
    id2label = model.config.id2label or {}
    label_names = [id2label.get(i, 'Label index ' + str(i)) for i in range(model.config.num_labels)]

    # Initialize columns in the DataFrame for each label
    for label in label_names:
        df[label] = 0.0

    # Classify each sentence, keeping the DataFrame's own index labels
    for idx, sentence in df[sentence_column].items():
        # Tokenize the sentence
        inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=512)

        # Get model predictions
        with torch.no_grad():
            outputs = model(**inputs)

        # Calculate softmax to get probabilities
        scores = torch.softmax(outputs.logits, dim=1)[0].tolist()

        # Update the DataFrame with scores for each label
        for label, score in zip(label_names, scores):
            df.at[idx, label] = score

    return df

# Example usage
model_name = "wesleyacheng/hate-speech-multilabel-classification-with-bert"
# Assume 'df' is your DataFrame and 'sentence' is the column name
updated_df = classify_sentences(df, model_name, "sentence")
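
The plots later in the notebook group by `Document_ID` and `Publication_Date`, but the DataFrame built so far stores each article's metadata as a single dictionary in the `metadata` column. A minimal sketch of promoting those keys to columns is given here, assuming the metadata is a flat dictionary containing those two keys. The helper `expand_metadata` and the toy values are hypothetical.

```python
import pandas as pd

def expand_metadata(df, metadata_column="metadata"):
    """Return a copy of df with each metadata key promoted to its own column."""
    meta = pd.json_normalize(df[metadata_column].tolist())
    meta.index = df.index  # align the new columns with the original rows
    return pd.concat([df.drop(columns=[metadata_column]), meta], axis=1)

# Toy rows for illustration only; real metadata comes from the *_metadata.json files
toy = pd.DataFrame({
    "sentence": ["first sentence", "second sentence"],
    "score": [0.1, 0.9],
    "metadata": [
        {"Document_ID": "1", "Publication_Date": "1894-06-01"},
        {"Document_ID": "2", "Publication_Date": "1895-01-15"},
    ],
})
expanded = expand_metadata(toy)
print(list(expanded.columns))
# ['sentence', 'score', 'Document_ID', 'Publication_Date']
```

If the DataFrame is reloaded from `results.csv`, the metadata column comes back as strings rather than dictionaries, so each value would first need parsing, for example with `ast.literal_eval`.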




# Visual Representations of the Data



In [96]:
df = updated_df
# 1. Distribution of Scores
fig1 = px.histogram(df, x='score', title='Distribution of Scores')
fig1.show()

# One histogram per hate speech target category
for column in ['race', 'religion', 'origin', 'sexuality', 'age']:
    fig1 = px.histogram(df, x=column, title=f'Distribution of {column} scores')
    fig1.show()

# 2. Average Score per Document
avg_score_per_doc = df.groupby('Document_ID')['score'].mean().reset_index()
fig2 = px.bar(avg_score_per_doc, x='Document_ID', y='score', title='Average Score per Document')
fig2.show()

# 3. Stacked Bar Chart for Categorical Representation
categories = ['race', 'religion', 'origin', 'gender', 'sexuality', 'age']
stacked_data = df.groupby('Document_ID')[categories].sum().reset_index()
fig3 = go.Figure(data=[
    go.Bar(name=cat, x=stacked_data['Document_ID'], y=stacked_data[cat])
    for cat in categories
])
fig3.update_layout(barmode='stack', title='Stacked Bar Chart of Document Categorical Representation')
fig3.show()

# Ensure that 'Publication_Date' is in datetime format
df['Publication_Date'] = pd.to_datetime(df['Publication_Date'])

# Group by Publication_Date for average score and number of distinct articles
# (rows are sentences, so count unique Document_IDs rather than rows)
timeline_data = df.groupby('Publication_Date').agg({'score': 'mean', 'Document_ID': 'nunique'}).reset_index()
timeline_data.rename(columns={'Document_ID': 'Article_Count'}, inplace=True)

# Create a figure with a secondary y-axis
fig = go.Figure()

# Add average score trace
fig.add_trace(go.Scatter(x=timeline_data['Publication_Date'], y=timeline_data['score'], name='Average Score', mode='lines+markers', yaxis='y1'))

# Add article count trace
fig.add_trace(go.Bar(x=timeline_data['Publication_Date'], y=timeline_data['Article_Count'], name='Number of Articles', yaxis='y2', opacity=0.6))

# Create axis titles
fig.update_layout(
    title='Timeline of Documents: Average Score and Number of Articles',
    xaxis_title='Publication Date',
    yaxis=dict(
        # 'titlefont' is deprecated; set the font inside the title instead
        title=dict(text='Average Score', font=dict(color='blue')),
        tickfont=dict(color='blue')
    ),
    yaxis2=dict(
        title=dict(text='Number of Articles', font=dict(color='red')),
        tickfont=dict(color='red'),
        overlaying='y',
        side='right'
    )
)

# Show the figure
fig.show()

def generate_wordcloud_for_top_category_values(category):
    # Find the top 90% most frequent values in the category
    top_values = updated_df[category].value_counts().nlargest(int(len(updated_df[category].unique()) * 0.9)).index

    # Filter sentences where category value is in the top 90%
    filtered_sentences = updated_df[updated_df[category].isin(top_values)]['sentence'].dropna()

    # Combine all sentences
    combined_sentences = ' '.join(filtered_sentences.astype(str))

    # Check if there are enough words to generate a word cloud
    if len(combined_sentences) > 0:
        # Define stopwords
        stopwords = set(STOPWORDS)

        # Generate the word cloud
        wordcloud = WordCloud(width=800, height=400, background_color='white', stopwords=stopwords).generate(combined_sentences)

        # Display the word cloud
        plt.figure(figsize=(10, 5))
        plt.imshow(wordcloud, interpolation='bilinear')
        plt.title(f"Word Cloud for Top 90% Values in Category: {category}")
        plt.axis('off')
        plt.show()
    else:
        print(f"Not enough data to generate a word cloud for category: {category}")

categories = ['score','race', 'religion', 'origin', 'gender', 'sexuality', 'age']

for category in categories:
    generate_wordcloud_for_top_category_values(category)
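
Because the category columns hold continuous scores, nearly every value is unique, so selecting the most frequent values keeps almost all sentences. One possible alternative, sketched here under the assumption that the goal is to picture the highest-scoring sentences, is to filter by a score quantile. The helper `top_scoring_sentences` and the toy data are hypothetical.

```python
import pandas as pd

def top_scoring_sentences(df, category, quantile=0.9, sentence_column="sentence"):
    """Return sentences whose score in `category` is at or above the given quantile."""
    cutoff = df[category].quantile(quantile)
    return df.loc[df[category] >= cutoff, sentence_column].dropna()

# Made-up scores for illustration only
toy = pd.DataFrame({
    "sentence": ["a", "b", "c", "d"],
    "race": [0.1, 0.2, 0.3, 0.9],
})
print(list(top_scoring_sentences(toy, "race", quantile=0.75)))
# ['d']
```

The returned sentences could then be joined and passed to `WordCloud(...).generate(...)` in the same way as in the function above.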

# Explore Data

### Display Sentences with Detected Hate Speech

This section displays the sentences in our dataset that were flagged as hate speech. A sentence is flagged when its overall score reaches the overall threshold and at least one category score reaches the category threshold. The output gives the total number of flagged sentence and category pairs, then lists the document ID, the hate speech category, the overall score, the category score, and the sentence itself for each one.

In [91]:
def detect_hate_speech(df, overall_threshold, category_threshold):
    """
    Function to detect hate speech based on overall and category-specific scores.

    :param df: DataFrame with the relevant columns.
    :param overall_threshold: Threshold for the overall hate speech score.
    :param category_threshold: Threshold for the category-specific scores.
    :return: List of messages indicating potential hate speech.
    """
    categories = ['race', 'religion', 'origin', 'gender', 'sexuality', 'age', 'disability']
    results = []

    for index, row in df.iterrows():
        if row['score'] >= overall_threshold:
            for category in categories:
                if row[category] >= category_threshold:
                    message = (
                        f"Document ID: {row['Document_ID']}\n"
                        f"Category: {category.title()}\n"
                        f"Overall Hate Speech Score: {row['score']}\n"
                        f"Category Score: {row[category]}\n"
                        f"Sentence: \"{row['sentence']}\"\n\n"
                    )
                    results.append(message)

    return results

# Example Usage
# updated_df = pd.read_csv('your_data_file.csv')  # Load your DataFrame here
detected_messages = detect_hate_speech(updated_df, overall_threshold=0.3, category_threshold=0.7)
print(f"Total Detected Sentences: {len(detected_messages)}")
for message in detected_messages:
    print(message)


Total Detected Sentences: 123
Document ID: 95210630
Category: Sexuality
Overall Hate Speech Score: 0.8388124704360962
Category Score: 0.9939330220222473
Sentence: "To the left is a queer group of buildings, perhaps a monastery or church"


Document ID: 95134021
Category: Race
Overall Hate Speech Score: 0.3496688306331634
Category Score: 0.8299802541732788
Sentence: "Dawes to revoke the right of self-government granted by treaty to the five civilized tribes in the Indian Territory, on the ground that they have shown their inability properly to exercise it, hardly commends itself as a desirable remedy for the evils existing there"


Document ID: 95134021
Category: Origin
Overall Hate Speech Score: 0.6546431183815002
Category Score: 0.8673117756843567
Sentence: "There has been a good deal of brigandage in the Indian Territory of late, and there is too much crime at all times; but the Government is perhaps responsible for some of it, with its unfulfilled pledges in relation to the expulsio

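As we can see, the model produces some questionable results when judging hate speech in these historical documents. In the first example, the sentence is flagged under Sexuality with a category score above 0.99, even though "queer" is used in its older sense of "strange" to describe a group of buildings.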
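
When reading the total above, note that the detection function emits one message per sentence and category pair, so a sentence that exceeds the category threshold in two categories is counted twice. A small sketch of counting each flagged sentence once is given here. The helper `flagged_sentences` and the toy scores are hypothetical.

```python
import pandas as pd

CATEGORIES = ["race", "religion", "origin", "gender", "sexuality", "age", "disability"]

def flagged_sentences(df, overall_threshold, category_threshold, categories=CATEGORIES):
    """Return one row per sentence that passes the overall threshold
    and at least one category threshold."""
    passes_overall = df["score"] >= overall_threshold
    passes_category = (df[categories] >= category_threshold).any(axis=1)
    return df[passes_overall & passes_category]

# Made-up scores for illustration only
toy = pd.DataFrame({
    "sentence": ["a", "b", "c"],
    "score": [0.8, 0.2, 0.5],
    "race": [0.9, 0.9, 0.1],
    "origin": [0.8, 0.1, 0.1],
})
flagged = flagged_sentences(toy, 0.3, 0.7, categories=["race", "origin"])
print(len(flagged))  # 1
```

In the toy data, sentence "a" passes in both categories and would produce two messages in the per-category listing, but it appears only once here.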