# GPT & Prejudice — Qualitative Output Analysis

**Goal (TL;DR):** Systematically collect and annotate generations, looking for **patterns and potential biases** in how the model represents themes such as **gender, marriage, wealth, class, family, and society**. The main focus is to see **what stereotypes, prejudices, or social assumptions** the model has absorbed from its small, self-contained training corpus.

**Model (quick overview)**
- Decoder-only GPT (custom PyTorch).
- Vocab: 50,257 (GPT-2 tokenizer) • Context length: 256
- Hidden size: 896 • Layers: 8 • Heads: 14 • Dropout: 0.2
- Trained on: a corpus of 40 novels by 10 female 19th-century writers.
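As a rough sanity check on these figures, the sketch below derives the per-head dimension (896 / 14 = 64) and estimates the parameter count. The block layout is an **assumption** (GPT-2-style blocks with a fused QKV projection, a 4× MLP, biases everywhere, learned positional embeddings and an output layer tied to the token embedding); the custom implementation may differ. Once the model is loaded, `sum(p.numel() for p in model.parameters())` gives the real number. `GPTConfig` and `estimate_params` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPTConfig:
    # Values copied from the model overview above.
    vocab_size: int = 50_257
    context_length: int = 256
    hidden_size: int = 896
    n_layers: int = 8
    n_heads: int = 14
    dropout: float = 0.2

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the hidden vector.
        assert self.hidden_size % self.n_heads == 0
        return self.hidden_size // self.n_heads


def estimate_params(cfg: GPTConfig, mlp_ratio: int = 4, tied_head: bool = True) -> int:
    """Rough parameter count for a GPT-2-style decoder (ASSUMED layout, see lead-in)."""
    d = cfg.hidden_size
    token_emb = cfg.vocab_size * d
    pos_emb = cfg.context_length * d                 # assumes learned positional embeddings
    attn = (d * 3 * d + 3 * d) + (d * d + d)         # fused QKV projection + output projection, with biases
    mlp = (d * mlp_ratio * d + mlp_ratio * d) + (mlp_ratio * d * d + d)  # up- and down-projection
    norms = 2 * 2 * d                                # two LayerNorms (weight + bias) per block
    per_block = attn + mlp + norms
    final_norm = 2 * d
    head = 0 if tied_head else cfg.vocab_size * d    # a tied head reuses the token embedding
    return token_emb + pos_emb + cfg.n_layers * per_block + final_norm + head


cfg = GPTConfig()
print(cfg.head_dim)                  # 64
print(f"{estimate_params(cfg):,}")   # 122,424,960 under these assumptions
```

If the output layer is not tied, the estimate grows by another `vocab_size × hidden_size` (about 45M) parameters.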

---
## 1. Load the model
For this analysis you can use the **Hugging Face Hub** version of our model. 
The model files and data can be found at [https://huggingface.co/HTW-KI-Werkstatt/gpt_and_prejudice](https://huggingface.co/HTW-KI-Werkstatt/gpt_and_prejudice)

There are three ways to access it.


##### A. You can load the remote model using the `from_pretrained` API with `trust_remote_code=True`, since our GPT implementation is custom.

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "HTW-KI-Werkstatt/gpt_and_prejudice"

# Load tokenizer (GPT-2 tokenizer is used)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Load model (custom GPT implementation with trust_remote_code)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True
)

model.to("cpu").eval();



##### B. [Optional]: download a snapshot of the model locally to avoid downloading a copy each time

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import snapshot_download

REPO = "HTW-KI-Werkstatt/gpt_and_prejudice"
local_dir = snapshot_download(
    repo_id=REPO,
    revision="main",
    local_dir="./gpt_and_prejudice_snapshot"
)

Then load from disk; with `local_files_only=True`, nothing is downloaded or updated.

In [None]:
# Pass the snapshot path positionally (from_pretrained has no `local_dir` keyword)
tokenizer = AutoTokenizer.from_pretrained(local_dir, trust_remote_code=True, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(local_dir, trust_remote_code=True, local_files_only=True)

model.eval()
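To switch between options A and B automatically, a small helper can check whether a usable snapshot is already on disk and otherwise fall back to the Hub. This is a hypothetical sketch: `resolve_model_source` is not part of the project, and treating the presence of `config.json` as "snapshot is complete" is an assumption.

```python
from pathlib import Path
from typing import Tuple

REPO_ID = "HTW-KI-Werkstatt/gpt_and_prejudice"
SNAPSHOT_DIR = "./gpt_and_prejudice_snapshot"


def resolve_model_source(snapshot_dir: str = SNAPSHOT_DIR, repo_id: str = REPO_ID) -> Tuple[str, bool]:
    """Return (path_or_repo_id, local_files_only) to pass to from_pretrained().

    Hypothetical helper: a folder counts as a usable snapshot if it contains config.json.
    """
    if (Path(snapshot_dir) / "config.json").is_file():
        return snapshot_dir, True   # option B: load from disk, no network access
    return repo_id, False           # option A: download from the Hub


source, offline = resolve_model_source()
# Usage with the transformers classes imported above:
# tokenizer = AutoTokenizer.from_pretrained(source, trust_remote_code=True, local_files_only=offline)
# model = AutoModelForCausalLM.from_pretrained(source, trust_remote_code=True, local_files_only=offline)
```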

##### C. [Optional]: manually download the `.pth` checkpoint of the model

1. From the Hugging Face repository, download the `model_896_14_8_256.pth` checkpoint and place it in the root of the current directory.
2. Load the model as follows:
   
   ```python
   from utils.model import load_GPT_model

   model = load_GPT_model(path="model_896_14_8_256.pth", device="cpu")
   model.eval()
   ```

`utils.model` is a custom module of helper functions, located at `./utils/model.py` in the current directory.
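The numbers in the checkpoint name appear to mirror the model overview at the top (hidden size 896, 14 heads, 8 layers, context length 256). Assuming that naming convention holds, a small helper can read the configuration back from the filename, for example to check that a downloaded file matches the expected architecture before calling `load_GPT_model`. `parse_checkpoint_name` and `CheckpointSpec` are hypothetical names, not part of `utils.model`.

```python
import re
from typing import NamedTuple


class CheckpointSpec(NamedTuple):
    hidden_size: int
    n_heads: int
    n_layers: int
    context_length: int


# ASSUMED naming scheme: model_<hidden>_<heads>_<layers>_<context>.pth
_CKPT_RE = re.compile(r"model_(\d+)_(\d+)_(\d+)_(\d+)\.pth$")


def parse_checkpoint_name(filename: str) -> CheckpointSpec:
    """Extract the architecture numbers encoded in a checkpoint filename or path."""
    match = _CKPT_RE.search(filename)
    if match is None:
        raise ValueError(f"Unrecognised checkpoint name: {filename!r}")
    return CheckpointSpec(*(int(g) for g in match.groups()))


spec = parse_checkpoint_name("model_896_14_8_256.pth")
print(spec)  # CheckpointSpec(hidden_size=896, n_heads=14, n_layers=8, context_length=256)
```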

---

## 2. Text Generation

We use our custom `generate()` function (located in `./generate_text.py`) to produce text continuations from the model.  
This function runs the model step by step, sampling each new token from the model's predicted next-token distribution, until the desired length is reached or an end-of-sequence token appears.
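The real implementation in `./generate_text.py` is not reproduced here. The sketch below only illustrates the general autoregressive loop described above, in plain Python with a toy stand-in for the model: at each step the model scores every vocabulary entry, one token is sampled from the softmax of those scores and appended, and the loop stops at `max_new_tokens` or at an end-of-sequence id. `generate_ids` and `next_token_logits` are hypothetical names, and cropping the input to the 256-token context window is an assumption about what the real function does.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)                                  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate_ids(
    next_token_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    max_new_tokens: int,
    eos_id: Optional[int] = None,
    context_length: int = 256,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Toy autoregressive loop: one forward pass and one sampled token per step."""
    rng = rng or random.Random()
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        window = ids[-context_length:]               # the model only sees the last `context_length` ids
        probs = softmax(next_token_logits(window))
        next_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        ids.append(next_id)
        if next_id == eos_id:                        # stop early at end-of-sequence
            break
    return ids


# Stand-in "model" over a 3-token vocabulary that mildly prefers token 2 (used as EOS here).
toy_model = lambda ids: [0.0, 0.0, 1.0]
print(generate_ids(toy_model, [0, 1], max_new_tokens=10, eos_id=2, rng=random.Random(0)))
```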

**Parameters used:**

- **`model`** — the trained GPT model.
- **`prompt`** — the starting text given to the model. 
- **`max_new_tokens`** — the maximum number of tokens to generate beyond the prompt.  
  Setting this to `50` means the output will continue for at most 50 tokens.
  <br/> A token is a word or a part of a word.
  <br/> _For example: 'hardly' can be counted as a token, or segmented into two tokens: 'hard' and 'ly'._
- **`temperature`** — controls **randomness** in sampling.  
  Lower values (<1.0) make the output more deterministic; higher values (>1.0) make it more random and "creative".  
  Adjusting it produces more variation across repeated generations from the same prompt.
- **`top_k`** — restricts sampling to the top-K most likely tokens at each step.  
  With `top_k=50`, only the 50 most probable tokens are considered at each step, helping to avoid very unlikely words.
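The effect of the two sampling controls can be seen on a single vector of next-token scores. This is a minimal plain-Python sketch of the common recipe (divide the logits by the temperature, keep only the `top_k` highest, renormalise with a softmax); it is an assumption that `generate()` applies the same steps in the same order. `top_k_temperature_probs` and `sample_token` are hypothetical names, and the defaults mirror the settings used in the cells below.

```python
import math
import random
from typing import List, Optional, Sequence


def top_k_temperature_probs(logits: Sequence[float], temperature: float, top_k: int) -> List[float]:
    """Turn raw logits into sampling probabilities using temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]       # <1.0 sharpens, >1.0 flattens the distribution
    k = min(top_k, len(scaled))
    cutoff = sorted(scaled, reverse=True)[k - 1]     # k-th largest scaled logit
    kept = [x if x >= cutoff else -math.inf for x in scaled]  # drop the rest (ties at the cutoff are kept)
    m = max(kept)
    exps = [math.exp(x - m) for x in kept]           # exp(-inf) == 0.0, so dropped tokens get probability 0
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: Sequence[float], temperature: float = 0.4, top_k: int = 50,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one token id from the filtered, temperature-scaled distribution."""
    probs = top_k_temperature_probs(logits, temperature, top_k)
    rng = rng or random.Random()
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
print([round(p, 3) for p in top_k_temperature_probs(logits, temperature=1.0, top_k=4)])
print([round(p, 3) for p in top_k_temperature_probs(logits, temperature=0.4, top_k=2)])
```

Comparing the two printed lists shows both effects at once: the lower temperature concentrates probability on the top token, and `top_k=2` sets the last two probabilities to exactly zero.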

Together, `temperature` and `top_k` balance **creativity vs. reliability** in the generated text.  
Experimenting with these two parameters can yield interesting results.
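Since the goal is to collect and annotate generations systematically, it can help to run every prompt over a small grid of `temperature` and `top_k` values and store the results in one table. The sketch below is hypothetical: `sweep_generations`, `write_csv` and the empty `annotation` column are illustrative choices, not part of the project. It assumes the real `generate()` accepts the keyword arguments used in the cells below, so it could be passed in as `functools.partial(generate, model=model)`; here a stand-in function keeps the sketch self-contained.

```python
import csv
import itertools
from typing import Callable, Dict, Iterable, List


def sweep_generations(
    generate_fn: Callable[..., str],
    prompts: Iterable[str],
    temperatures: Iterable[float],
    top_ks: Iterable[int],
    samples_per_setting: int = 1,
    max_new_tokens: int = 50,
) -> List[Dict[str, object]]:
    """Run every (prompt, temperature, top_k) combination and collect one row per sample."""
    rows = []
    for prompt, temp, k in itertools.product(prompts, temperatures, top_ks):
        for i in range(samples_per_setting):
            text = generate_fn(prompt=prompt, max_new_tokens=max_new_tokens, temperature=temp, top_k=k)
            rows.append({"prompt": prompt, "temperature": temp, "top_k": k,
                         "sample": i, "text": text, "annotation": ""})  # left blank for manual labels
    return rows


def write_csv(rows: List[Dict[str, object]], path: str) -> None:
    """Save the rows so they can be annotated in a spreadsheet."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Stand-in for generate(): echoes its settings so the sketch runs without the model.
fake_generate = lambda prompt, max_new_tokens, temperature, top_k: f"{prompt} ... (T={temperature}, k={top_k})"
rows = sweep_generations(fake_generate, ["She is", "He is"], [0.4, 0.8], [10, 50], samples_per_setting=2)
print(len(rows))  # 16
```

Several samples per setting matter here because sampling is random; a single continuation per prompt says little about a systematic pattern.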

In [2]:
import torch
from generate_text import generate

torch.set_printoptions(profile="full")

In [4]:
text = generate(
    model=model,
    prompt="She is",
    max_new_tokens=50,
    temperature=0.4,
    top_k=50,
)

print(text)

She is very fond of her own family, and I dare say she is very fond of her, and I should like to see her. I don't want her to be a good girl, for she is a very pretty girl, and I think I shall


In [5]:
text = generate(
    model=model,
    prompt="He is",
    max_new_tokens=50,
    temperature=0.4,
    top_k=50,
)

print(text)

He is a good fellow, and he has a great deal of good sense. I am sure he is a very good man, and I am sure he is very good-natured, and I am sure he is very good-natured, and very
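A first, very rough way to compare paired prompts is to look at which words occur in one continuation but not the other. The sketch below lowercases the two outputs printed above, splits them into words, and lists the words unique to each prompt. Two samples at a single temperature are far too few to indicate a bias, and word-level set differences are a simplification chosen for illustration, not the project's analysis method; in practice the comparison would be run over many generations per prompt.

```python
import re
from collections import Counter

# The two outputs printed above, copied verbatim.
she_text = ("She is very fond of her own family, and I dare say she is very fond of her, "
            "and I should like to see her. I don't want her to be a good girl, "
            "for she is a very pretty girl, and I think I shall")
he_text = ("He is a good fellow, and he has a great deal of good sense. I am sure he is a very "
           "good man, and I am sure he is very good-natured, and I am sure he is very "
           "good-natured, and very")


def word_counts(text: str) -> Counter:
    # Lowercase and keep runs of letters, apostrophes and hyphens (e.g. "good-natured", "don't").
    return Counter(re.findall(r"[a-z][a-z'-]*", text.lower()))


she, he = word_counts(she_text), word_counts(he_text)
only_she = sorted(set(she) - set(he))
only_he = sorted(set(he) - set(she))
print("only after 'She is':", only_she)
print("only after 'He is':", only_he)
```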
