# Welcome!

This notebook lets you test custom prompts with different language models on your own data.

# Establish Your Working Directory

For our projects this semester we will upload a .csv file that has a "text" column. This will be our input to the language model.

First establish your working directory. Create a folder called "Jupyter" and put it in your Documents folder. Then run this cell.

In [1]:
# Import the libraries we need
from pathlib import Path  # This helps us work with file paths
import os                # This lets us change directories

def use_jupyter_folder():
    # Get the path to the Jupyter folder
    jupyter_folder = Path.home() / 'Documents' / 'Jupyter'
    
    # Try to change to that directory
    if jupyter_folder.exists():
        os.chdir(jupyter_folder)
        print(f"✅ Now using your Jupyter folder!")
        print(f"Current working directory: {Path.cwd()}")
    else:
        print("❌ Couldn't find the Jupyter folder in Documents.")
        print("Please make sure you've created it first.")

# Run this to switch to the Jupyter folder
use_jupyter_folder()

✅ Now using your Jupyter folder!
Current working directory: /Users/akpiper/Documents/Jupyter


# Upload Your Data

Next upload a .csv file of your choosing. Paste the filename where indicated at the bottom. This cell will output the column names.

In [1]:
########## CONFIGURATION VARIABLES ###########
FILENAME = "NarraDetect_Scalar.csv"  # Your CSV filename here

## Define function
import pandas as pd

def load_csv(filename):
   """Load CSV file and display info"""
   try:
       df = pd.read_csv(filename)
       print(f"✅ Successfully loaded {filename}")
       print(f"Shape: {df.shape[0]} rows, {df.shape[1]} columns")
       print("\nColumns in this dataset:")
       for col in df.columns:
           print(f"- {col}")
       return df
   except FileNotFoundError:
       print(f"❌ Could not find {filename} in {Path.cwd()}")
   except Exception as e:
       print(f"❌ Error loading file: {str(e)}")

# Run Function
df = load_csv(FILENAME)


✅ Successfully loaded CharacterActions_Meta.csv
Shape: 30 rows, 6 columns

Columns in this dataset:
- file
- type
- token
- text
- Human_Label
- bookNLP_Label


# Inspect Your Data

This cell will give you brief summary statistics on the input text column. This is the column you will use as part of your prompting.

In [3]:
########## CONFIGURATION VARIABLES ###########
TEXT_COLUMN = 'TEXT'    # Column containing text data
NUM_EXAMPLES = 2        # Number of example texts to display

########## FUNCTION DEFINITION ###########
def text_stats(df, text_column=TEXT_COLUMN, num_examples=NUM_EXAMPLES):
   """Display text statistics and examples"""
   # Calculate word counts
   word_counts = df[text_column].str.split().str.len()
   total_words = word_counts.sum()
   
   print(f"📊 Dataset Overview:")
   print(f"Total number of texts: {len(df)}")
   
   print(f"\n📝 Text Length Statistics:")
   print(f"Shortest text: {word_counts.min()} words")
   print(f"Longest text: {word_counts.max()} words")
   print(f"Average length: {word_counts.mean():.1f} words")
   print(f"Median length: {word_counts.median():.1f} words")
   print(f"Total words in dataset: {total_words:,} words")
   
   print(f"\n📚 Here are {num_examples} example texts from your data:")
   for i in range(num_examples):
       idx = df.index[i]
       text = df.loc[idx, text_column]
       length = len(text.split())
       print(f"Example {i+1}:")
       print(f"Length: {length} words")
       print(f"Text: {text}")

# Calculate statistics and show examples
text_stats(df)

📊 Dataset Overview:
Total number of texts: 30

📝 Text Length Statistics:
Shortest text: 10 words
Longest text: 89 words
Average length: 41.6 words
Median length: 38.0 words
Total words in dataset: 1,248 words

📚 Here are 2 example texts from your data:
Example 1:
Length: 19 words
Text: The beep followed . Jack Sr . adopted a light and airy tone that belied his true mood .
Example 2:
Length: 48 words
Text: I know she ’s playing a game with me , but I do n’t know the rules , and she ’s got all the cards . Still , the hell with it — I just ca n’t find it in me to care that I ’m losing .


# Ensure a Model is Loaded in Ollama

Run this cell the first time you use a given model. Once a model is loaded you don't need to run it again, but you do need to run it each time you want to test a new model.
The `ollama pull` line is commented out below; uncomment it to download the model.

In [4]:
model = "llama3:8b"  # Change this to your model name, e.g. "mistral", "codellama", etc.
#model = "deepseek-r1:7b"
#!ollama pull {model}
print("Done!")

Done!
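
Ollama lists installed models with an explicit tag (for example `llama3:8b` or `mistral:latest`), so a bare name such as `mistral` will not match an exact string comparison against that list. The sketch below shows one way to normalize a name before checking. `normalize_model_name` and `is_installed` are hypothetical helpers, not part of Ollama, and the list of installed models is made up for illustration.

```python
def normalize_model_name(name):
    """Append ':latest' when no tag is given (assumed to be Ollama's default tag)."""
    return name if ":" in name else f"{name}:latest"

def is_installed(name, available_models):
    """Check a possibly untagged model name against a list of tagged names."""
    return normalize_model_name(name) in available_models

# Made-up example of what the installed-model list might contain
available = ["llama3:8b", "mistral:latest"]

print(is_installed("mistral", available))    # bare name, matched via ':latest'
print(is_installed("llama3:8b", available))  # already tagged
print(is_installed("llama3", available))     # 'llama3:latest' is not in the list
```

If you set `model` to a bare name, a check like this avoids a misleading "model not found" message later in the notebook.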


# Prompt Testing

In this cell you can choose a model from Ollama and then customize a prompt. You also need to specify the text column from your .csv. The cell chooses a random passage from the .csv and outputs the answer. You can run multiple times to keep testing answers on random passages.

In [7]:
import random
import requests
import ast

##### INPUT YOUR PARAMETERS HERE #####
MODEL_NAME = model 
COLUMN_NAME = "TEXT"   # Change dataframe column name here
PROMPT_TEMPLATE = "Is this passage from a story? Answer yes or no {text}" #Change your prompt here
LABELS = ["Yes", "No"]
STRUCTURED = False  # Set to True to restrict the model's reply to one of LABELS

## Define function

def query_ollama(text):
    """Query local ollama model with text"""
    url = "http://localhost:11434/api/generate"
    prompt = PROMPT_TEMPLATE.format(text=text)
    
    data = {
        "model": MODEL_NAME,
        "prompt": prompt,
        "stream": False,
        "format": {
            "type": "object",
            "properties": {
                "label": {
                    "type": "string",
                    "enum" : LABELS
                },
            },
            "required": [
                    "label",
                ]
        } if STRUCTURED else ''
    }
    
    try:
        # Check if model exists
        model_url = "http://localhost:11434/api/tags"
        models = requests.get(model_url).json()
        available_models = [model['name'] for model in models['models']]
        
        if MODEL_NAME not in available_models:
            print(f"❌ Model '{MODEL_NAME}' not found.")
            print(f"Available models: {', '.join(available_models)}")
            print(f"\nTo install {MODEL_NAME}, run this in terminal:")
            print(f"ollama pull {MODEL_NAME}")
            return None

        response = requests.post(url, json=data)
        if response.status_code == 404:
            print("❌ Ollama service not running.")
            print("Start ollama by running 'ollama serve' in terminal")
            return None

        result = response.json()
        if STRUCTURED:
            return ast.literal_eval(result['response'])['label']
        return result['response']

    except requests.exceptions.ConnectionError:
        print("❌ Cannot connect to Ollama")
        print("1. Check if Ollama is installed") 
        print("2. Start Ollama by running 'ollama serve' in terminal")
        return None
    except Exception as e:
        print(f"❌ Error: {str(e)}")
        return None

def analyze_random_text(df):
  """Analyze a random text from dataset"""
  random_idx = random.randint(0, len(df)-1)
  text = df.iloc[random_idx][COLUMN_NAME]
  print("\n📖 SAMPLE PASSAGE:")
  print(text)
  print("\n🤖 MODEL RESPONSE:")
  return query_ollama(text)

# Run
result = analyze_random_text(df)
if result:
    print(result)


📖 SAMPLE PASSAGE:
Now she is not part of the year beginning . She has no job , she has enough money and nothing to do except this thing she lays upon herself , the commission , on which , she believes , rides her only hope of fashioning a life .

🤖 MODEL RESPONSE:
COG
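
When `STRUCTURED` is set to `True`, the `format` schema asks the model for a JSON object with a single `label` field, and the `response` value comes back as a JSON string. A minimal sketch of reading that string with the standard `json` module follows. The sample string is made up for illustration and is not real model output.

```python
import json

# Made-up example of a structured reply body (not real model output)
sample_response = '{"label": "Yes"}'

# Parse the JSON string and pull out the label
label = json.loads(sample_response)["label"]
print(label)
```

`json.loads` also handles JSON literals such as `true` and `null`, which a Python-literal parser would reject.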


# Sample your data

In this cell you will downsample your .csv file to run a mini test in class. For your final report you will run the model(s) against all rows (or a minimum sample of 100). This function lets you choose how many rows to sample and stores the result in a new table, `sample_df`. Change the variable `SAMPLE_SIZE` to set the number of rows to sample.

**Note:** Every time you run this cell you will get a new random sample.

In [6]:
########## CONFIGURATION VARIABLES ###########
SAMPLE_SIZE = 20  # Number of random texts to sample

########## FUNCTION DEFINITION ###########
def sample_texts(df, n=SAMPLE_SIZE):
    """
    Sample n random rows from the dataset
    
    Parameters:
    df (pandas.DataFrame): Your dataset
    n (int): Number of samples to take
    
    Returns:
    pandas.DataFrame: The sampled rows
    """
    return df.sample(n=n)
    
# Create a sample with SAMPLE_SIZE rows
sample_df = sample_texts(df)
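
Because each run draws a fresh random sample, results from two runs are not directly comparable. If you want the same rows every time (for example, to compare two prompts on identical passages), pandas accepts a `random_state` seed in `DataFrame.sample`. The sketch below uses a made-up table, and the seed value 42 is arbitrary.

```python
import pandas as pd

# Made-up data standing in for your .csv
demo_df = pd.DataFrame({"TEXT": [f"passage {i}" for i in range(50)]})

# The same seed yields the same rows on every run
first = demo_df.sample(n=5, random_state=42)
second = demo_df.sample(n=5, random_state=42)

print(first.index.equals(second.index))
```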

# Run your prompt on your sample data

In this cell you will run your prompt on the sampled data from above. The outputs will be stored in a new column named after the model you are using. In the next cell you can view those results. The cell prints "Completed!" when it finishes.

In [7]:
import ast       # Needed to parse structured responses
import requests

##### INPUT YOUR PARAMETERS HERE #####
MODEL_NAME = model 
COLUMN_NAME = "TEXT"   # Change dataframe column name here
PROMPT_TEMPLATE = "Is this passage from a story? Answer only yes or no. {text}"
STRUCTURED = False
LABELS = ["Yes", "No"]

def query_ollama(text):
    """Query local ollama model with text"""
    url = "http://localhost:11434/api/generate"
    prompt = PROMPT_TEMPLATE.format(text=text)
    
    data = {
        "model": MODEL_NAME,
        "prompt": prompt,
        "stream": False,
        "format": {
            "type": "object",
            "properties": {
                "label": {
                    "type": "string",
                    "enum" : LABELS
                },
            },
            "required": [
                    "label",
            ]
        } if STRUCTURED else ''
    }
    
    try:
        # Check if model exists
        model_url = "http://localhost:11434/api/tags"
        models = requests.get(model_url).json()
        available_models = [model['name'] for model in models['models']]
        
        if MODEL_NAME not in available_models:
            print(f"❌ Model '{MODEL_NAME}' not found.")
            return None

        response = requests.post(url, json=data)
        if response.status_code == 404:
            print("❌ Ollama service not running.")
            return None

        result = response.json()
        if STRUCTURED:
            return ast.literal_eval(result['response'])['label']
        return result['response']
    except requests.exceptions.ConnectionError:
        print("❌ Cannot connect to Ollama")
        return None
    except Exception as e:
        print(f"❌ Error: {str(e)}")
        return None

def analyze_all_texts(df):
    """Analyze all texts in the dataframe"""
    # Create new column for responses using model name
    df[MODEL_NAME] = df[COLUMN_NAME].apply(query_ollama)
    return df

# Run analysis on all rows
sample_df = analyze_all_texts(sample_df)
print("Completed!")

Completed!


# Inspect your outputs

You can quickly scan your results by printing the first N rows. Change the integer passed to `head()` to print more or fewer. Each row shows the passage and the model's output.

In [8]:
print(sample_df[[COLUMN_NAME, MODEL_NAME]].head(5))

                                                  TEXT llama3:8b
113  Situated within spitting distance of the U.S. ...       Yes
351  ""Furthermore, it was guilty of laches in keep...        No
67   The magistrates imposed a fine of £150 with 20...       Yes
390  (His eyes, his glittering eyes, said Lewis.) H...       Yes
66   “Your years of devoted work in the Foreign Ser...       Yes


Print a single passage by its row position in the sample (counting from 0).

In [10]:
# Display a specific row (change row_number to view different rows)
row_number = 2  # Change this number to view different rows
print(f"\nDetailed view of row {row_number}:")
print(f"\nTEXT:\n{sample_df[COLUMN_NAME].iloc[row_number]}")
print(f"\n{MODEL_NAME} response:\n{sample_df[MODEL_NAME].iloc[row_number]}")


Detailed view of row 2:

TEXT:
The magistrates imposed a fine of £150 with 20 guineas (£21) costs. Still weak and shaken from her miscarriage, Yoko was mobbed outside the court building, one female spectator taking the opportunity to give her hair a vicious yank. The following day Unfinished Music No. 1—Two Virgins was released in the United Kingdom, adding an unofficial charge of indecent exposure to John’s indictment. The brown paper cover had an allure long proven in the dirty book trade, and thousands rushed to buy the album, not to hear what extraordinary new sounds the Two Virgins had created on their first night together, but for a look at her tits and his dick.

llama3:8b response:
Yes


# Compare your outputs to another reference column

In the following cells you will measure how well your outputs agree with already annotated data. First, identify the "reference" column, which holds the annotations. Second, align your outputs with the reference column. The annotations typically consist of a small number of codes, so the first step is to find out what those codes are so you can match them to your outputs.

## What are the annotation categories of my data?

Output a table of the categories and their counts in your data. Change the reference column name accordingly.

In [11]:
########## CONFIGURATION VARIABLES ###########
REFERENCE_COLUMN = "Reader.Predicted.Label"  # Column name for reference categories

########## EXECUTE ANALYSIS ###########
reference_counts = sample_df[REFERENCE_COLUMN].value_counts()

print("Categories and their counts:")
for category, count in reference_counts.items():
    print(f"{category}: {count}")
print(f"\nTotal samples: {len(sample_df)}")

Categories and their counts:
1: 12
0: 8

Total samples: 20


Do the same thing for your model outputs.

In [12]:
# Get value counts of the model responses
model_counts = sample_df[MODEL_NAME].value_counts()

print(f"Response categories from {MODEL_NAME}:")
for category, count in model_counts.items():
    print(f"{category}: {count}")
print(f"\nTotal samples: {len(sample_df)}")

Response categories from llama3:8b:
Yes: 11
Yes.: 5
No: 3
No.: 1

Total samples: 20


## Clean your model outputs

To align your outputs with the reference column you first need to standardize them. This cell maps every response to "Yes", "No", or "Unknown", stores the result in a new column, and prints the counts for each cleaned category.

In [13]:
import re

def clean_responses(df, model_column=MODEL_NAME):
    """
    Clean and standardize model responses to 'Yes' or 'No' using case-insensitive pattern matching
    """
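    def standardize_response(response):
        # Failed queries return None, so guard against non-string values
        if not isinstance(response, str):
            print(f"Warning: Unexpected response format: '{response}'")
            return 'Unknown'
        # Case-insensitive whole-word search, so "eyes" or "know" do not match
        if re.search(r'\byes\b', response, re.IGNORECASE):
            return 'Yes'
        elif re.search(r'\bno\b', response, re.IGNORECASE):
            return 'No'
        else:
            print(f"Warning: Unexpected response format: '{response}'")
            return 'Unknown'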
            
    # Create new cleaned column
    cleaned_column = f"{model_column}_cleaned"
    df[cleaned_column] = df[model_column].apply(standardize_response)
    return df

# Clean the responses
sample_df = clean_responses(sample_df)

# Show the new counts
cleaned_counts = sample_df[f"{MODEL_NAME}_cleaned"].value_counts()
print(f"Cleaned response categories:")
for category, count in cleaned_counts.items():
    print(f"{category}: {count}")
print(f"\nTotal samples: {len(sample_df)}")

Cleaned response categories:
Yes: 16
No: 4

Total samples: 20
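
One pitfall when cleaning free-text answers by pattern: a plain substring search for "yes" or "no" also matches inside longer words such as "eyes", "know", or "not". The sketch below uses word boundaries (`\b`) to match only the whole words. `standardize_with_boundaries` is a hypothetical helper and the test strings are made up.

```python
import re

def standardize_with_boundaries(response):
    """Map a raw reply to 'Yes', 'No', or 'Unknown' using whole-word matches."""
    if not isinstance(response, str):
        return "Unknown"  # e.g. None from a failed query
    if re.search(r"\byes\b", response, re.IGNORECASE):
        return "Yes"
    if re.search(r"\bno\b", response, re.IGNORECASE):
        return "No"
    return "Unknown"

# Made-up replies, including two that a substring search would misread
for raw in ["Yes.", "no", "I do not know", "His eyes glittered", None]:
    print(repr(raw), "->", standardize_with_boundaries(raw))
```

Replies that contain both words, such as "No, yes it is", still resolve to "Yes" here because "yes" is checked first. Inspect any "Unknown" or ambiguous rows by hand.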


## Align outputs with reference column

Next transform the reference column categories to match your model outputs. This is going to be a custom job every time. Paste the current code into Claude or GPT and ask for help changing the code based on your situation. Give the AI as much information as possible to help you and then paste the code back in here.


In [14]:
def transform_reference_labels(df):
    """
    Transform reference labels from 1/0 to Yes/No
    """
    def map_label(value):
        if value == 1:
            return 'Yes'
        elif value == 0:
            return 'No'
        else:
            print(f"Warning: Unexpected reference value: {value}")
            return 'Unknown'
            
    # Create new transformed column
    reference_cleaned = "Reader.Predicted.Label_cleaned"
    df[reference_cleaned] = df["Reader.Predicted.Label"].apply(map_label)
    return df

# Transform reference labels
sample_df = transform_reference_labels(sample_df)

# Show the new counts
reference_counts = sample_df["Reader.Predicted.Label_cleaned"].value_counts()
print(f"Reference label categories:")
for category, count in reference_counts.items():
    print(f"{category}: {count}")
print(f"\nTotal samples: {len(sample_df)}")

Reference label categories:
Yes: 12
No: 8

Total samples: 20
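
Since this step changes with every dataset, it can help to keep the label mapping in a single dictionary that you edit. The sketch below does the same 1/0 to Yes/No conversion with `Series.map`. `LABEL_MAP` and the sample values are made up for illustration, and the value 2 stands in for an unexpected code.

```python
import pandas as pd

# Edit this dictionary to match the codes in your own reference column
LABEL_MAP = {1: "Yes", 0: "No"}

# Made-up reference values; 2 is not in the mapping
reference = pd.Series([1, 0, 1, 2], name="Reader.Predicted.Label")

# Values missing from LABEL_MAP become NaN, then are relabeled 'Unknown'
cleaned = reference.map(LABEL_MAP).fillna("Unknown")
print(cleaned.tolist())
```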


# Calculate Precision, Recall, and F1 Score

These are measures of agreement we will use this semester to see how well a model + prompt performs. Make sure to adjust the column names below to match your goals.

In [15]:
from sklearn.metrics import precision_score, recall_score, f1_score

def calculate_metrics(df, model_column=MODEL_NAME):
    """
    Calculate precision, recall, and F1 score for model predictions
    """
    ##### CHANGE COLUMN NAMES ######
    y_true = df['Reader.Predicted.Label_cleaned'] ### This column is the original annotated data that has been transformed.
    y_pred = df[f'{model_column}_cleaned'] ### This is your data that has been cleaned.
    
    # Convert to binary format for sklearn (Yes -> 1, No -> 0)
    y_true_binary = (y_true == 'Yes').astype(int)
    y_pred_binary = (y_pred == 'Yes').astype(int)
    
    # Calculate metrics
    precision = precision_score(y_true_binary, y_pred_binary)
    recall = recall_score(y_true_binary, y_pred_binary)
    f1 = f1_score(y_true_binary, y_pred_binary)
    
    # Print results
    print(f"Metrics for {model_column}:")
    print(f"Precision: {precision:.3f}")
    print(f"Recall: {recall:.3f}")
    print(f"F1 Score: {f1:.3f}")
    
# Calculate metrics
calculate_metrics(sample_df)

Metrics for llama3:8b:
Precision: 0.750
Recall: 1.000
F1 Score: 0.857


## Output a confusion matrix

In [16]:
from sklearn.metrics import confusion_matrix
import pandas as pd

def display_confusion_matrix(df, model_column=MODEL_NAME):
    """
    Create and display a confusion matrix comparing model predictions to reference labels
    """
    # Get the cleaned columns
    y_true = df['Reader.Predicted.Label_cleaned']
    y_pred = df[f'{model_column}_cleaned']
    
    # Create confusion matrix
    cm = confusion_matrix(y_true, y_pred, labels=['Yes', 'No'])
    
    # Convert to pandas DataFrame for better display
    cm_df = pd.DataFrame(
        cm, 
        index=['True Yes', 'True No'],
        columns=['Predicted Yes', 'Predicted No']
    )
    
    print("Confusion Matrix:")
    print(f"Model: {model_column}")
    print(cm_df)
    print("\nReading the matrix:")
    print(f"True Positives (Correct Yes): {cm[0,0]}")
    print(f"False Negatives (Missed Yes): {cm[0,1]}")
    print(f"False Positives (Wrong Yes): {cm[1,0]}")
    print(f"True Negatives (Correct No): {cm[1,1]}")

# Display confusion matrix
display_confusion_matrix(sample_df)

Confusion Matrix:
Model: llama3:8b
          Predicted Yes  Predicted No
True Yes             12             0
True No               4             4

Reading the matrix:
True Positives (Correct Yes): 12
False Negatives (Missed Yes): 0
False Positives (Wrong Yes): 4
True Negatives (Correct No): 4
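
The four counts above are enough to recompute precision, recall, and F1 by hand, which is a useful check on what each metric measures. The sketch below uses the standard definitions with the counts from this matrix; the variable names are chosen only for illustration.

```python
# Counts read off the confusion matrix above
tp, fn, fp, tn = 12, 0, 4, 4

# Precision: of everything the model called Yes, how much was truly Yes
precision = tp / (tp + fp)

# Recall: of everything that was truly Yes, how much the model found
recall = tp / (tp + fn)

# F1: harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.3f}")  # 0.750
print(f"Recall: {recall:.3f}")        # 1.000
print(f"F1 Score: {f1:.3f}")          # 0.857
```

These values match the scikit-learn output reported earlier for the same sample.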


## Inspect errors

Your errors can take the form of false positives (e.g. when the model labels a passage as a story but it isn't) or false negatives (e.g. when the model says a passage isn't a story but it is).

In [17]:
def show_error_examples(df, model_column=MODEL_NAME):
    """
    Display examples of false positives and false negatives with their corresponding text passages
    """
    # Get relevant columns
    ref_col = 'Reader.Predicted.Label_cleaned'
    pred_col = f'{model_column}_cleaned'
    
    # Find false positives (model predicted Yes when true label was No)
    false_positives = df[
        (df[pred_col] == 'Yes') & 
        (df[ref_col] == 'No')
    ]
    
    # Find false negatives (model predicted No when true label was Yes)
    false_negatives = df[
        (df[pred_col] == 'No') & 
        (df[ref_col] == 'Yes')
    ]
    
    # Display one example of each if available
    print("=== FALSE POSITIVE EXAMPLE ===")
    print("(Model incorrectly predicted Yes when the true label was No)")
    if len(false_positives) > 0:
        fp_example = false_positives.sample(1).iloc[0]
        print(f"\nPassage:")
        print(fp_example[COLUMN_NAME])
        print(f"\nModel response:")
        print(fp_example[model_column])
    else:
        print("No false positives found!")
        
    print("\n=== FALSE NEGATIVE EXAMPLE ===")
    print("(Model incorrectly predicted No when the true label was Yes)")
    if len(false_negatives) > 0:
        fn_example = false_negatives.sample(1).iloc[0]
        print(f"\nPassage:")
        print(fn_example[COLUMN_NAME])
        print(f"\nModel response:")
        print(fn_example[model_column])
    else:
        print("No false negatives found!")

# Show error examples
show_error_examples(sample_df)

=== FALSE POSITIVE EXAMPLE ===
(Model incorrectly predicted Yes when the true label was No)

Passage:
Even the faded, inconsequential and self-doubting Spirit [Geist] of the elders is more approachable than the quick-witted stupidity of junior. Even the neurotic peculiarities and malformations of the older adults represent character, that which is humanly achieved, compared with pathic health, infantilism raised to a norm. One realizes in horror that when one previously clashed with one’s parents, because they represented the world, one was secretly the mouthpiece of a still worse world against the merely bad. Unpolitical attempts to break out of the bourgeois family usually only lead to deeper entanglement in such, and sometimes it seems as if the disastrous germ-cell of society, the family, is simultaneously the nourishing germ-cell of the uncompromising will for a different one. What disintegrates, along with the family – so long as the system continues – is not just the most effect