# NLP Final Project Notebook

This notebook contains the main analysis and experiments for the NLP final project.

# Setup

First, follow the setup instructions in `README.md`.

Check if Ollama is properly installed and running.

In [None]:
!./scripts/check_ollama.sh

## Env Example

The cell below shows how to import and call functions that we define in separate files under `src/`.

In [None]:
from src.visualization.utils import example_plot, example_function

example_plot()
example_function()

# Global Imports

In [14]:
# Standard Library Imports
import json

# Third-Party Imports
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Custom Imports
from src.io.ollama_client import OllamaClient
from src.io.ollama_client import ConversationEntry, SavedConversationData

# Example Usage

## OllamaClient Usage

`OllamaClient` wraps the Ollama API. Instantiate one client per model.

In [2]:
qwen = OllamaClient("qwen3:0.6b", log_conversations=True)

Run the cell below to generate text from the model. Each call produces a standalone response (no context).

In [3]:
qwen.generate_text("Hello, world!")

'Hello, world! 😊 How can I assist you today?'

After running the cell above, the conversation history contains the prompt and the response.

In [4]:
print(qwen.get_conversation_count())
# The conversation history can be accessed by calling the get_conversation_history method
print(qwen.get_conversation_history())
# The last conversation can be accessed by calling the get_last_conversation method
print(qwen.get_last_conversation())
# The conversation history can be cleared by calling the clear_conversation_history method
# qwen.clear_conversation_history()

1
[{'input': 'Hello, world!', 'response': 'Hello, world! 😊 How can I assist you today?', 'details': {'model': 'qwen3:0.6b', 'created_at': '2025-10-31T18:32:40.67302Z', 'done': True, 'done_reason': 'stop', 'total_duration': 8326420959, 'load_duration': 7542218917, 'prompt_eval_count': 14, 'prompt_eval_duration': 323743167, 'eval_count': 86, 'eval_duration': 425697880, 'response': 'Hello, world! 😊 How can I assist you today?', 'thinking': 'Okay, the user said "Hello, world!" so I should respond with a friendly greeting. Let me make sure to keep it simple and positive. Maybe say "Hello, world!" and add a friendly message. Let me check if there\'s any specific context needed, but since it\'s a general greeting, keeping it straightforward is best.\n', 'context': [151644, 872, 198, 9707, 11, 1879, 0, 608, 26865, 151645, 198, 151644, 77091, 198, 151667, 198, 32313, 11, 279, 1196, 1053, 330, 9707, 11, 1879, 8958, 773, 358, 1265, 5889, 448, 264, 11657, 42113, 13, 6771, 752, 1281, 2704, 31

## Conversation Persistence

You can save and load conversation histories to disk for experiment tracking.


In [5]:
# Create a client with custom output directory
client = OllamaClient("qwen3:0.6b", output_dir="./experiments", log_conversations=True)

# Generate some conversations
response1 = client.generate_text("What is machine learning? Explain like I'm 5 in 20 words or less.")
print(f"Response 1: {response1}\n")

response2 = client.generate_text("Explain neural networks? Explain like I'm 5 in 20 words or less.")
print(f"Response 2: {response2}\n")

# Flush conversation history to disk
saved_path = client.flush_conversation_history("test_experiment")
print(f"Saved conversation history to: {saved_path}")


Response 1: Machine learning is like helping computers learn from data to make better guesses. For example, if you look at a picture of a cat, your computer learns to recognize it from that image.

Response 2: A neural network is like a brain made of tiny circuits that help learn and make decisions.

Saved conversation history to: experiments/test_experiment/conversation_history_20251031_143242.json


### Loading Conversation History

You can reload conversation histories from disk. By default, the most recent file is loaded.

- Method 1: Use `get_conversation_history()` to iterate through the conversations loaded into memory
- Method 2: Use `iter_experiment_conversations()` to iterate directly from files (memory-efficient for large experiments)

In [6]:
new_client = OllamaClient("qwen3:0.6b", output_dir="./experiments", log_conversations=True)

# Load the most recent conversation history for "test_experiment"
count = new_client.load_conversation_history("test_experiment")
print(f"Loaded {count} conversations from disk\n")

# Verify the loaded history
print(f"Conversation count: {new_client.get_conversation_count()}")
print(f"Last response: {new_client.get_last_response_text()}")
print("\n" + "="*80 + "\n")

# Method 1: Iterate through in-memory conversation history
# print("METHOD 1: Iterating through in-memory conversation history")
# print("="*80)
# for i, conv in enumerate(new_client.get_conversation_history(), 1):
#     print(f"\n--- Conversation {i} ---")
#     print(f"Prompt:   {conv['input'][:80]}...")  # First 80 chars
#     print(f"Response: {conv['response'][:80]}...")  # First 80 chars

# print("\n" + "="*80 + "\n")

# Method 2: Memory-efficient file iterator (doesn't load all into memory)
# print("METHOD 2: Iterating using file iterator (memory-efficient)")
# print("="*80)
# for i, conv in enumerate(new_client.iter_experiment_conversations("test_experiment"), 1):
#     print(f"\n--- Conversation {i} ---")
#     print(f"Prompt:   {conv['input'][:80]}...")  # First 80 chars
#     print(f"Response: {conv['response'][:80]}...")  # First 80 chars

Loaded 8 conversations from disk

Conversation count: 8
Last response: A neural network is like a brain made of tiny circuits that help learn and make decisions.
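
The file-based iteration of Method 2 can be illustrated with a standalone generator. This is only a sketch of the idea, not the code behind `iter_experiment_conversations()`: `iter_saved_conversations` is a hypothetical helper, the filename pattern is inferred from the saved path printed earlier, and the layout of the JSON file (a bare list of entries, or a dict with a `"conversations"` key) is an assumption that should be checked against a real saved file.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_saved_conversations(experiment_dir: str) -> Iterator[Dict[str, Any]]:
    """Yield conversation entries file by file (hypothetical helper)."""
    # Filename pattern inferred from the saved path shown earlier; the
    # timestamp in the name sorts chronologically as a plain string.
    for path in sorted(Path(experiment_dir).glob("conversation_history_*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # ASSUMPTION: a file is either a bare list of entries or a dict
        # that stores them under a "conversations" key.
        entries = data if isinstance(data, list) else data.get("conversations", [])
        for entry in entries:
            yield entry
```

Because this is a generator, only one file's entries are held in memory at a time, which is the property that makes the file-based approach attractive for large experiments.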




### Viewing Saved JSON Structure

The saved conversation history contains all the details needed for analysis.


In [7]:
# Example of viewing a single conversation entry
conversation = new_client.get_conversation_history()[0]

print("Conversation Structure:")
print(f"Input: {conversation['input']}")
print(f"Response: {conversation['response']}")
print(f"Type: {conversation['type']}")
print(f"Timestamp: {conversation['timestamp']}")
print(f"\nDetails keys: {conversation['details'].keys()}")


Conversation Structure:
Input: What is machine learning? Explain like I'm 5 in 20 words or less.
Response: Machine learning is when computers learn from data to make smart decisions. Like a kid learning to play games by watching others.
Type: generate
Timestamp: 2025-10-31T14:12:20.397531

Details keys: dict_keys(['model', 'created_at', 'done', 'done_reason', 'total_duration', 'load_duration', 'prompt_eval_count', 'prompt_eval_duration', 'eval_count', 'eval_duration', 'response', 'thinking', 'context'])


The raw Ollama response stored in the client contains much more information if needed. Note that the `thinking` and `response` sections are already parsed into separate fields, which could be useful later.

In [8]:
resp = qwen.get_last_conversation().get("details")
print(json.dumps(resp, indent=4))

{
    "model": "qwen3:0.6b",
    "created_at": "2025-10-31T18:32:40.67302Z",
    "done": true,
    "done_reason": "stop",
    "total_duration": 8326420959,
    "load_duration": 7542218917,
    "prompt_eval_count": 14,
    "prompt_eval_duration": 323743167,
    "eval_count": 86,
    "eval_duration": 425697880,
    "response": "Hello, world! \ud83d\ude0a How can I assist you today?",
    "thinking": "Okay, the user said \"Hello, world!\" so I should respond with a friendly greeting. Let me make sure to keep it simple and positive. Maybe say \"Hello, world!\" and add a friendly message. Let me check if there's any specific context needed, but since it's a general greeting, keeping it straightforward is best.\n",
    "context": [
        151644,
        872,
        198,
        9707,
        11,
        1879,
        0,
        608,
        26865,
        151645,
        198,
        151644,
        77091,
        198,
        151667,
        198,
        32313,
        11,
        279,
 

Or more simply

In [9]:
qwen.get_last_response_text()

'Hello, world! 😊 How can I assist you today?'
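
The `details` dictionary also carries timing fields. Ollama reports these durations in nanoseconds, so a small helper can convert them to seconds and derive a generation rate. The sketch below assumes only the keys visible in the `details` output above; `summarize_timing` is a hypothetical helper, not part of `OllamaClient`.

```python
from typing import Any, Dict

NS_PER_SECOND = 1_000_000_000  # Ollama reports durations in nanoseconds


def summarize_timing(details: Dict[str, Any]) -> Dict[str, float]:
    """Convert duration fields to seconds and derive tokens per second."""
    eval_seconds = details["eval_duration"] / NS_PER_SECOND
    return {
        "total_s": details["total_duration"] / NS_PER_SECOND,
        "load_s": details["load_duration"] / NS_PER_SECOND,
        "prompt_eval_s": details["prompt_eval_duration"] / NS_PER_SECOND,
        "eval_s": eval_seconds,
        # Generated tokens divided by the time spent generating them
        "tokens_per_s": details["eval_count"] / eval_seconds,
    }


# Values copied from the `details` output above
example = {
    "total_duration": 8326420959,
    "load_duration": 7542218917,
    "prompt_eval_count": 14,
    "prompt_eval_duration": 323743167,
    "eval_count": 86,
    "eval_duration": 425697880,
}
print(summarize_timing(example))
```

In the notebook, the same helper could presumably be called on `qwen.get_last_conversation().get("details")`. In this example most of the total time is model loading, so the load and eval figures are worth reporting separately.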

Depending on how you choose to set up your experiments, you can either use the responses cached in the client class or keep track of the inputs and outputs outside of the client. Either is fine; just use `log_conversations=False` to prevent memory bloat if you are tracking them yourself.

I imagine keeping the history within the class will work better, then appending a new message like `prompt + qwen.get_last_response_text()` for the output test. This way, we can parse the message history into the formats used for calculating embeddings, drift, acceleration, etc.

Use the flush functionality to save to the file system if you are running this for a long time or for many iterations.

The scheme for your tests should be split into data creation, then data parsing: i.e., create a cell that runs the prompts and saves the conversations, then create a cell that iterates through the class or the files to compute embeddings.
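
The data creation step described above could start from a loop like the following. This is a minimal sketch, not the project's implementation: `run_chain` and `fake_generate` are hypothetical names, and the stand-in generator exists only so the cell runs without Ollama.

```python
from typing import Callable, List


def run_chain(generate: Callable[[str], str], prompt: str,
              seed: str, n_steps: int) -> List[str]:
    """Feed each response back in as `prompt + previous response`."""
    outputs: List[str] = []
    current = seed
    for _ in range(n_steps):
        current = generate(prompt + current)
        outputs.append(current)
    return outputs


# Stand-in for a model call (ASSUMPTION: in the notebook this would be
# something like `qwen.generate_text`). It strips the prompt and adds "!".
def fake_generate(text: str) -> str:
    return text.split(": ", 1)[-1] + "!"


responses = run_chain(fake_generate, "Repeat this: ", "hello", n_steps=3)
print(responses)
```

Keeping the generator as a plain callable means the same loop works with any client or with a stub in tests. The data parsing step would then run in a separate cell over the saved responses.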

## Embeddings

Create embeddings using the same `OllamaClient` class, instantiated with an embedding model name.

In [13]:
string_to_embed = qwen.get_last_response_text() # "Hello world!"

embeddinggemma = OllamaClient("embeddinggemma", log_conversations=False)

embeddings = np.array(embeddinggemma.generate_embeddings(string_to_embed))

print(embeddings.shape)
print(embeddings)

(768,)
[-1.62469430e-01  3.26936550e-03  3.11880990e-02 -1.59651330e-03
 -6.53016600e-03 -5.89228760e-04 -4.31132000e-02  4.79441320e-02
  3.38893720e-02 -4.58842550e-02 -1.49056690e-02 -4.48386070e-02
  4.63952160e-02 -7.78193030e-03  1.11231430e-01  5.02321700e-03
  3.52546480e-03 -5.01822270e-02 -8.63471500e-02  3.89924040e-03
  5.12065140e-02 -3.44341060e-02  3.50343300e-03 -3.93365250e-02
  2.87021960e-02  4.58670850e-02 -1.80611050e-02 -1.86819150e-02
 -8.57127100e-03  1.64970900e-03  3.30678000e-02 -7.02537130e-03
  4.23321000e-02 -8.51234050e-03  2.30838040e-02  3.57760820e-02
  1.63466930e-02 -7.73056450e-02  6.65985940e-02 -2.67599580e-02
 -9.51105800e-02  6.23018300e-02 -1.02202930e-02 -3.44547300e-02
 -2.88139100e-03 -4.41396120e-02 -5.94716260e-02 -5.43507800e-02
 -2.22388640e-02  2.93579340e-02 -1.48777290e-02  6.02593970e-03
 -2.71349080e-02  1.94219720e-02 -3.81459780e-02 -6.89787320e-03
  4.73572300e-03 -4.33932150e-03 -4.41020400e-02  4.05428260e-02
 -5.37357480e-02 -

A quick similarity sanity check

In [15]:
# Test cases
cases = [
    ("Not Similar", "The quick brown fox jumps over the lazy dog", 
     "Machine learning is a subset of artificial intelligence"),
    ("Similar", "A dog is playing in the park", 
     "A puppy is running in the garden"),
    ("Identical", "Hello world!", "Hello world!")
]

for label, text1, text2 in cases:
    emb1 = np.array(embeddinggemma.generate_embeddings(text1)).reshape(1, -1)
    emb2 = np.array(embeddinggemma.generate_embeddings(text2)).reshape(1, -1)
    similarity = cosine_similarity(emb1, emb2)[0][0]
    print(f"{label:12s} | Similarity: {similarity:.4f}")

Not Similar  | Similarity: 0.6594
Similar      | Similarity: 0.8465
Identical    | Similarity: 1.0000
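
The same cosine measure can be extended from pairs to a sequence of embeddings, which is one possible way to quantify drift across successive responses. The definitions here are assumptions, not something this notebook fixes: drift is taken as the cosine distance between consecutive embeddings, and acceleration as the change in drift from one step to the next. `cosine_drift` and `drift_acceleration` are hypothetical helper names, and the toy vectors stand in for real embedding rows.

```python
import numpy as np


def cosine_drift(embeddings: np.ndarray) -> np.ndarray:
    """Cosine distance (1 - similarity) between consecutive rows."""
    # Normalize each row to unit length so the dot product is the cosine
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = np.sum(unit[:-1] * unit[1:], axis=1)
    return 1.0 - sims


def drift_acceleration(drift: np.ndarray) -> np.ndarray:
    """First difference of the drift series (ASSUMED definition)."""
    return np.diff(drift)


# Toy 2-D vectors standing in for real embedding rows
steps = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
drift = cosine_drift(steps)
print(drift)
print(drift_acceleration(drift))
```

With real data, `steps` would be a 2-D array with one embedding per response, for example stacked results of `generate_embeddings` calls.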


# Output Test

Create test env setup below

In [None]:
# Call boilerplate code and specific constructs you need from your src files here

# Telephone Test


Create test env setup below

In [None]:
# Call boilerplate code and specific constructs you need from your src files here

# Visualization and Analytics

Create test env setup below

In [None]:
# Call boilerplate code and specific constructs you need from your src files here