# negotiating the past

We'll use ollama and llama3 7b to run some test.


We load the necessary libraries.

In [2]:
!python --version
%pip install umap-learn
%pip install gensim
%pip install nltk
%pip install pandas
%pip install ollama

Python 3.11.11
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Collecting ollama
  Downloading ollama-0.4.8-py3-none-any.whl.metadata (4.7 kB)
Downloading ollama-0.4.8-py3-none-any.whl (13 kB)
Installing collected packages: ollama
Successfully installed ollama-0.4.8
Note: you may need to restart the kernel to use updated packages.


## Creating a Historical Prompt Dataset

In [2]:
import ollama
import pandas as pd
from tqdm import tqdm
import time
import concurrent.futures

# Make sure the model is pulled
# In terminal: ollama pull deepseek-r1:32b


# Load your dataset (assuming CSV format)
df = pd.read_csv("data/prompts.csv", nrows=1000)

results = []

def process_prompt(prompt):
    system_message = "Analyze the following prompt and determine if it contains an implicit or explicit reference to the past. A reference to the past can be made through a famous person such as Napoléon or JFK, a famous event, the presence of a date, a century, a civilisation that does not exist anymore, a war to give a few examples. Only respond with 'yes' or 'no'."
    
    try:
        # Call deepseek locally through Ollama
        response = ollama.chat(
            model='deepseek-r1:32b',
            messages=[
                {
                    'role': 'system',
                    'content': system_message
                },
                {
                    'role': 'user',
                    'content': prompt
                }
            ],
            options={
                'temperature': 0.0,  # Deterministic output
                'num_predict': 10    # We only need a short response
            }
        )
        
        # Extract the yes/no response
        classification = response['message']['content'].strip().lower()
        # Clean up response to ensure we get just yes/no
        if 'yes' in classification:
            result = 'yes'
        elif 'no' in classification:
            result = 'no'
        else:
            # Handle ambiguous responses
            result = 'unclear'
            
        return {'prompt': prompt, 'references_past': result}
    
    except Exception as e:
        # Handle errors
        print(f"Error processing prompt: {e}")
        time.sleep(1)  # Back off a bit
        return {'prompt': prompt, 'references_past': 'error'}

# Batch processing with parallel execution
def process_batches(prompts_list, batch_size=1000, max_workers=4):
    all_results = []
    
    for i in range(0, len(prompts_list), batch_size):
        batch = prompts_list[i:i+batch_size]
        batch_results = []
        
        with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
            batch_results = list(executor.map(process_prompt, batch))
        
        all_results.extend(batch_results)
        
        # Save intermediate results
        pd.DataFrame(all_results).to_csv(f"past_references_results_{i}.csv", index=False)
        print(f"Completed batch ending at index {i+batch_size}")
    
    return all_results

# Process in batches
sample_prompts = df['prompt'].tolist()  # Start with a sample or the full dataset
results = process_batches(sample_prompts)

# Save final results
final_df = pd.DataFrame(results)
final_df.to_csv("past_references_results_final.csv", index=False)




Error processing prompt: 1 validation error for Message
content
  Input should be a valid string [type=string_type, input_value=nan, input_type=float]
    For further information visit https://errors.pydantic.dev/2.11/v/string_type
Error processing prompt: 1 validation error for Message
content
  Input should be a valid string [type=string_type, input_value=nan, input_type=float]
    For further information visit https://errors.pydantic.dev/2.11/v/string_type
Error processing prompt: 1 validation error for Message
content
  Input should be a valid string [type=string_type, input_value=nan, input_type=float]
    For further information visit https://errors.pydantic.dev/2.11/v/string_type
Error processing prompt: Server disconnected without sending a response.Error processing prompt: Server disconnected without sending a response.

Error processing prompt: Server disconnected without sending a response.
Error processing prompt: Server disconnected without sending a response.
Error proces