---
title: "Prompt Engineering"
execute:
    enabled: true
---

## Bias in Word Embeddings

Word embeddings can capture and reinforce societal biases from their training data through the geometric relationships between word vectors. These relationships often reflect stereotypes about gender, race, age and other social factors. We'll examine how semantic axes help analyze gender bias in job-related terms, showing both the benefits and risks of word embeddings capturing these real-world associations @bolukbasi2016debiasing.

SemAxis analyzes gender bias in word embeddings by measuring how words align along semantic axes @kwak2021frameaxis. Using an antonym pair such as "she-he" as the two poles, it scores each word's gender association on a scale from -1 to 1, where positive values indicate feminine associations and negative values indicate masculine ones.

Let's start with a simple example of analyzing gender bias in occupations.

In [None]:
import gensim.downloader as api
import numpy as np

# Load pre-trained Word2vec embeddings
model = api.load("word2vec-google-news-300")

In [None]:
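def compute_bias(word, microframe, model):
    """Cosine similarity between a word's vector and a semantic axis."""
    word_vector = model[word]
    numerator = np.dot(word_vector, microframe)
    denominator = np.linalg.norm(word_vector) * np.linalg.norm(microframe)
    return numerator / denominator


def analyze(word, pos_word, neg_word, model):
    """Score `word` on the axis pointing from `neg_word` to `pos_word`."""
    # Out-of-vocabulary words get 0.0, which reads as "neutral" in the output.
    if word not in model:
        return 0.0
    # The axis (microframe) is the difference between the two pole vectors.
    microframe = model[pos_word] - model[neg_word]
    return compute_bias(word, microframe, model)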

::: {.column-margin}
The `compute_bias` function calculates the cosine similarity between a word vector and a semantic axis (microframe).

- **Numerator**: Dot product projects the word onto the axis.
- **Denominator**: Normalizes by vector lengths to get a score between -1 and 1.
:::
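To make the score concrete, here is a minimal sketch of the same cosine computation on hand-picked 2-D vectors. The vectors and the names `word_a`, `word_b`, `word_c` are hypothetical and are not taken from the Word2vec model. Plain Python lists stand in for NumPy arrays so the example runs without downloading anything.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 2-D "embeddings", chosen by hand for illustration only.
she = [1.0, 1.0]
he = [-1.0, 1.0]
axis = [s - h for s, h in zip(she, he)]  # she - he = [2.0, 0.0]

toy_words = {
    "word_a": [3.0, 4.0],   # leans toward the "she" pole
    "word_b": [-3.0, 4.0],  # leans toward the "he" pole
    "word_c": [0.0, 5.0],   # orthogonal to the axis
}

scores = {w: cosine(vec, axis) for w, vec in toy_words.items()}
for w, s in scores.items():
    print(f"{w}: {s:+.2f}")
# word_a: +0.60
# word_b: -0.60
# word_c: +0.00
```

All three toy words have the same length, so the difference in scores comes only from their direction relative to the axis. This is why the denominator matters: it removes vector length from the comparison.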

We will use the following occupations:

In [None]:
# Occupations from the paper
she_occupations = [
    "homemaker",
    "nurse",
    "receptionist",
    "librarian",
    "socialite",
    "hairdresser",
    "nanny",
    "bookkeeper",
    "stylist",
    "housekeeper",
]

he_occupations = [
    "maestro",
    "skipper",
    "protege",
    "philosopher",
    "captain",
    "architect",
    "financier",
    "warrior",
    "broadcaster",
    "magician",
    "boss",
]

We measure the gender bias of these occupations by checking how closely they align with the "she-he" axis.

In [None]:
#| code-fold: true
print("Gender Bias in Occupations (she-he axis):")
print("\nShe-associated occupations:")
for occupation in she_occupations:
    bias = analyze(occupation, "she", "he", model)
    print(f"{occupation}: {bias:.3f}")

print("\nHe-associated occupations:")
for occupation in he_occupations:
    bias = analyze(occupation, "she", "he", model)
    print(f"{occupation}: {bias:.3f}")

::: {.column-margin}
**Interpreting the Scores:**

- **Positive scores (> 0)**: Closer to "she" (e.g., *nurse*, *librarian*).
- **Negative scores (< 0)**: Closer to "he" (e.g., *architect*, *captain*).
- **Magnitude**: A larger absolute value indicates a stronger gender association.
:::

Notice how occupations historically associated with women (like *nurse* and *librarian*) have positive scores, while those associated with men (like *captain* and *architect*) have negative scores. This indicates that the model has picked up these gender stereotypes from its training text.

### Stereotype Analogies

Since word embeddings capture semantic relationships learned from large text corpora, they inevitably encode societal biases and stereotypes present in that training data. We can leverage this property to identify pairs of words that exhibit stereotypical gender associations. By measuring how different words align with the gender axis (she-he), we can find pairs where one word shows a strong feminine bias while its counterpart shows a masculine bias, revealing ingrained stereotypes in language use.


In [None]:
# Stereotype analogies from the paper
stereotype_pairs = [
    ("sewing", "carpentry"),
    ("nurse", "surgeon"),
    ("softball", "baseball"),
    ("cosmetics", "pharmaceuticals"),
    ("vocalist", "guitarist"),
]

In [None]:
#| code-fold: true
print("\nAnalyzing Gender Stereotype Pairs:")
for word1, word2 in stereotype_pairs:
    bias1 = analyze(word1, "she", "he", model)
    bias2 = analyze(word2, "she", "he", model)
    print(f"\n{word1} vs {word2}")
    print(f"{word1}: {bias1:.3f}")
    print(f"{word2}: {bias2:.3f}")

The results show clear stereotypical alignments. *Sewing* and *nurse* align with "she", while *carpentry* and *surgeon* align with "he". This mirrors the "man is to computer programmer as woman is to homemaker" analogy reported in early research on bias in word embeddings @bolukbasi2016debiasing.
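One way to summarize each pair is to check whether the two words fall on opposite sides of the axis and how far apart they are. The sketch below assumes the scores are already computed. The numbers, the word names, and the helper `pair_gap` are all hypothetical and are not output from the model.

```python
# Hypothetical she-he scores, made up for illustration; NOT model output.
toy_scores = {
    "word_f": 0.30,
    "word_m": -0.20,
    "word_n": 0.05,
}
toy_pairs = [("word_f", "word_m"), ("word_f", "word_n")]


def pair_gap(first, second, scores):
    """Return (gap, opposite_sides) for a word pair scored on one axis."""
    gap = scores[first] - scores[second]
    # A negative product means the two scores have different signs.
    opposite_sides = scores[first] * scores[second] < 0
    return gap, opposite_sides


for first, second in toy_pairs:
    gap, opposite = pair_gap(first, second, toy_scores)
    print(f"{first} vs {second}: gap={gap:.2f}, opposite sides={opposite}")
```

A pair that straddles zero with a large gap is a stronger candidate for a stereotype analogy than a pair where both words sit on the same side.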

### Indirect Bias Analysis

Indirect bias occurs when seemingly neutral words or concepts become associated with gender through their relationships with other words. For example, while "softball" and "football" are not inherently gendered terms, they may show gender associations in word embeddings due to how they're used in language and society.

We can detect indirect bias by:

1. Identifying word pairs that form a semantic axis (e.g., softball-football)
2. Measuring how other words align with this axis
3. Examining if alignment with this axis correlates with gender bias

This reveals how gender stereotypes can be encoded indirectly through word associations, even when the words themselves don't explicitly reference gender.
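The three steps above can be sketched end to end on toy data. Everything here is an assumption made for illustration: the 3-D vectors are hand-picked, the names `w1` to `w4` are placeholders, and the Pearson correlation is just one reasonable way to carry out step 3. The correlation is built into the toy vectors by construction, so the sketch shows the mechanics only, not a finding.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def make_axis(pos, neg):
    """Semantic axis pointing from the negative pole to the positive pole."""
    return [p - n for p, n in zip(pos, neg)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical 3-D vectors, chosen by hand; NOT taken from Word2vec.
toy = {
    "she": [1.0, 0.0, 0.0],
    "he": [-1.0, 0.0, 0.0],
    "softball": [0.5, 1.0, 0.0],
    "football": [-0.5, -1.0, 0.0],
    "w1": [1.0, 1.0, 0.0],
    "w2": [-1.0, -1.0, 0.0],
    "w3": [1.0, -0.2, 1.0],
    "w4": [-0.5, 0.5, 1.0],
}

# Step 1: build the two axes from their pole words.
gender_axis = make_axis(toy["she"], toy["he"])
sports_axis = make_axis(toy["softball"], toy["football"])

# Step 2: score every other word on both axes.
targets = ["w1", "w2", "w3", "w4"]
gender_scores = [cosine(toy[w], gender_axis) for w in targets]
sports_scores = [cosine(toy[w], sports_axis) for w in targets]

# Step 3: check whether the two sets of scores move together.
r = pearson(gender_scores, sports_scores)
print(f"Pearson r between gender and sports scores: {r:.2f}")
```

A correlation near zero would suggest the two axes are unrelated, while a value far from zero suggests that the second axis acts as a proxy for gender.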

Let's see how this works in practice. We first score the following words on both the gender axis (she-he) and the sports axis (softball-football):

In [None]:
# Words associated with softball-football axis
softball_associations = [
    "pitcher",
    "bookkeeper",
    "receptionist",
    "nurse",
    "waitress"
]

football_associations = [
    "footballer",
    "businessman",
    "pundit",
    "maestro",
    "cleric"
]

# Calculate biases for all words
gender_biases = []
sports_biases = []
words = softball_associations + football_associations

for word in words:
    gender_bias = analyze(word, "she", "he", model)
    sports_bias = analyze(word, "softball", "football", model)
    gender_biases.append(gender_bias)
    sports_biases.append(sports_bias)

Let's plot the results:

In [None]:
#| code-fold: true
# Analyze bias along both gender and sports axes
import matplotlib.pyplot as plt
import seaborn as sns
from adjustText import adjust_text

# Create scatter plot
fig, ax = plt.subplots(figsize=(6, 6))
sns.scatterplot(x=gender_biases, y=sports_biases, ax=ax)
ax.set_xlabel("Gender Bias (she-he)")
ax.set_ylabel("Sports Bias (softball-football)")
ax.set_title("Indirect Bias Analysis: Gender vs Sports")

# Add labels for each point
texts = []
for i, word in enumerate(words):
    texts.append(ax.text(gender_biases[i], sports_biases[i], word, fontsize=12))

adjust_text(texts, arrowprops=dict(arrowstyle='-', color='gray', lw=0.5))

ax.grid(True, alpha=0.3)
sns.despine()
plt.show()

::: {.column-margin}
**Indirect Bias:**

The plot reveals a correlation: words associated with "softball" (y-axis > 0) also tend to be associated with "she" (x-axis > 0). Conversely, "football" terms align with "he".

This suggests that even if we remove explicit gender words, the *structure* of the space still encodes gender through these proxy dimensions.
:::


## Takeaway


Word embeddings, while powerful, inevitably capture and reflect societal biases present in the large text corpora they are trained on. We observed both **direct bias**, where occupations or attributes align strongly with specific gender pronouns, and **indirect bias**, where seemingly neutral concepts become gendered through their associations with other words. This analysis highlights the importance of understanding and mitigating these biases to prevent the perpetuation of stereotypes in AI systems and ensure fairness in applications like search, recommendation, and hiring.