# Recursive LLM Workshop using Ollama with Streaming and Word-Based History Trimming

This notebook demonstrates a recursive loop with an LLM running locally via Ollama in streaming mode. It sends an initial prompt to Ollama and prints tokens as they arrive. Each response is appended to the conversation history. Before every request, the history is trimmed to its last _n_ words, where _n_ is entered by the user (0 keeps the full history). The trimmed history becomes the prompt for the next iteration.
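To make the trimming concrete, here is a minimal offline sketch of the same loop. It is an illustration, not the notebook's code: `fake_model` and `run_loop` are hypothetical helpers, and `fake_model` stands in for the Ollama call by reporting how many words it received, so the sketch runs without a server.

```python
def trim_history_by_words(text, num_words):
    """Return the last num_words whitespace-separated words (0 or less keeps everything)."""
    if num_words <= 0:
        return text
    return " ".join(text.split()[-num_words:])


def fake_model(prompt):
    # Hypothetical stand-in for the LLM: replies with the word count it was given.
    return f"reply to {len(prompt.split())} words"


def run_loop(initial_prompt, num_loops, history_words, model=fake_model):
    """Run the feedback loop; return (prompts actually sent, full untrimmed history)."""
    history = initial_prompt
    sent = []
    for _ in range(num_loops):
        prompt_to_send = trim_history_by_words(history, history_words)
        sent.append(prompt_to_send)
        reply = model(prompt_to_send)
        if not reply:
            break  # mirrors the notebook: stop when the model returns nothing
        history += " " + reply
    return sent, history


sent, history = run_loop("one two three four five", num_loops=3, history_words=4)
for i, p in enumerate(sent, start=1):
    print(i, p)
# 1 two three four five
# 2 reply to 4 words
# 3 reply to 4 words
```

With `history_words=4`, the first request already loses the word "one", and from the second iteration on only the most recent 4-word reply survives in the context, while the untrimmed `history` string keeps growing.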

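Ensure Ollama is running locally (by default at `http://localhost:11434`) and that the endpoint URL and model name in the code match your setup. The default model, `llama3.2:latest`, must already be pulled, for example with `ollama pull llama3.2`.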
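Before running the main cell, a quick pre-flight check can confirm that the server is up and the model is installed. The following is a minimal sketch, assuming the default address and using Ollama's `/api/tags` endpoint, which lists locally installed models. `check_ollama` and `installed_models` are hypothetical helper names, and the sketch uses the standard-library `urllib` so it works even before `requests` is installed.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address; adjust if yours differs


def installed_models(tags_json):
    """Extract model names from a parsed /api/tags response body."""
    return [m.get("name", "") for m in tags_json.get("models", [])]


def check_ollama(model="llama3.2:latest", base_url=OLLAMA_URL, timeout=5):
    """Return True if the server answers and `model` is installed; print a hint otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            tags = json.load(resp)
    except (OSError, ValueError) as exc:  # URLError/timeouts are OSError; bad JSON is ValueError
        print(f"Ollama is not reachable at {base_url}: {exc}")
        return False
    names = installed_models(tags)
    if model not in names:
        print(f"Model {model!r} not found; installed: {names}. Try: ollama pull {model}")
        return False
    return True
```

Calling `check_ollama()` returns `True` when `llama3.2:latest` appears in the list. The comparison uses the full name including the tag, so passing `llama3.2` alone would not match.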

```python
import json

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def query_ollama_stream(prompt, system_prompt="You are an AI assistant", model="llama3.2:latest"):
    """
    Send a prompt to the locally running Ollama instance via /api/generate in streaming mode.
    Print tokens as they arrive and return the accumulated text ("" on a request error).
    """
    payload = {
        "model": model,
        # The system prompt is prepended to the user prompt as plain text
        "prompt": f"{system_prompt}\n\n{prompt}",
        "stream": True,
    }
    try:
        # json= sets the Content-Type header; timeout=(connect, read) avoids hanging forever
        with requests.post(OLLAMA_URL, json=payload, stream=True, timeout=(5, 300)) as response:
            response.raise_for_status()
            result = ""
            # The stream is newline-delimited JSON; each object carries a "response" fragment
            for line in response.iter_lines():
                if not line:
                    continue  # skip empty keep-alive lines
                try:
                    data = json.loads(line.decode("utf-8"))
                    token = data.get("response", "")
                except ValueError:
                    # Fallback (covers JSON and UTF-8 decode errors): treat the line as plain text
                    token = line.decode("utf-8", errors="replace")
                print(token, end="", flush=True)  # print each token as it arrives
                result += token
            print()  # newline after the stream ends
            return result
    except requests.RequestException as e:
        print(f"Error querying Ollama: {e}")
        return ""


def trim_history_by_words(text, num_words):
    """
    Return only the last `num_words` whitespace-separated words of `text`.
    If num_words is 0 (or negative), return the full text unchanged.
    """
    if num_words <= 0:
        return text
    words = text.split()
    return " ".join(words[-num_words:])


# User inputs: initial prompt, number of loops, system prompt, and history length
initial_prompt = input("Enter the initial prompt: ")
num_loops = int(input("Enter the number of recursion loops: "))
user_system_prompt = input("Enter the system prompt (or press enter to use default 'You are an AI assistant'): ")
if not user_system_prompt.strip():
    user_system_prompt = "You are an AI assistant"

history_words = int(input("Enter the number of previous words to include (enter 0 for full history): "))

# The conversation history starts as the initial prompt
current_prompt = initial_prompt

for i in range(num_loops):
    print(f"\n--- Iteration {i+1} ---")
    print(f"Prompt: {current_prompt}")  # full, untrimmed history

    # Trim the history to the requested number of words before sending it
    prompt_to_send = trim_history_by_words(current_prompt, history_words)

    # Query the model in streaming mode
    result = query_ollama_stream(prompt_to_send, user_system_prompt)

    if not result:
        print("\nNo result received. Exiting recursion.")
        break

    # Append the new response to the full history for the next iteration
    current_prompt += " " + result
```


Sample output from one run:

```text
Enter the initial prompt:  you are standing in a workshop talking about ai and worrying
Enter the number of recursion loops:  3
Enter the system prompt (or press enter to use default 'You are an AI assistant'):  you are not sure if your memory is working or if you are in a dream
Enter the number of previous words to include (enter 0 for full history):  10

--- Iteration 1 ---
Prompt: you are standing in a workshop talking about ai and worrying
*scratches head, looking around the workshop with a mix of confusion and concern* Ah, I... I think I was just sitting at my desk, working on some code for a project. But now... *looks down at hands, wondering if they're even real* Are these really my hands? And am I... standing in this workshop?

*looks around the room again, trying to take in every detail* There's all sorts of gadgets and tools here. I think we were discussing AI just a minute ago. Something about neural networks and deep learning. *tries to recall specific details, but they're fuzzy* But was it really me who was talking? Or was this just... a simulation?

*takes a step back, eyes darting around the room as if searching for an escape route or a way out of the dream* What if I'm not even here? What if this is all just some kind of virtual reality program? *shudders at the thought*

*looks down at self again, studying hands and fee
```
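The loop in `query_ollama_stream` falls back to printing any non-JSON line as text, so an unexpected line (such as an error message sent mid-stream) would end up in the conversation history. Below is a minimal sketch of a stricter alternative, written as a hypothetical helper `iter_tokens`. It accepts any iterable of lines (for example `response.iter_lines()`), stops at the object marked `"done": true`, and raises on an `error` key. Treating `error` as the failure signal is an assumption about how Ollama reports errors, so check it against your server's responses. The fake stream is fabricated for the example, not captured output.

```python
import json


def iter_tokens(lines):
    """Yield text fragments from an Ollama-style NDJSON stream of byte (or str) lines.

    Stops after the object marked "done". Raises RuntimeError if an object carries an
    "error" key (assumed error shape) and ValueError on lines that are not valid JSON.
    """
    for raw in lines:
        if not raw:
            continue  # iter_lines() can yield empty keep-alive lines
        data = json.loads(raw)  # json.loads accepts bytes or str
        if "error" in data:
            raise RuntimeError(data["error"])
        yield data.get("response", "")
        if data.get("done"):
            break  # ignore anything after the final object


# Offline usage with a fabricated stream (illustrative only):
fake_stream = [
    b'{"response": "Hello", "done": false}',
    b"",
    b'{"response": ", world", "done": false}',
    b'{"response": "", "done": true}',
    b'{"response": "ignored", "done": false}',
]
print("".join(iter_tokens(fake_stream)))  # Hello, world
```

Inside `query_ollama_stream`, the body of the `for` loop could then become `for token in iter_tokens(response.iter_lines()):` followed by the existing `print` and `result += token` lines. Letting malformed lines raise, rather than echoing them, keeps the history limited to model text.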

## Notes

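- **Endpoint:** This notebook uses Ollama's `/api/generate` endpoint with `"stream": true`. The response arrives as newline-delimited JSON objects, each carrying a `response` text fragment; the last object has `"done": true`.
- **Word-Based History Trimming:** The `trim_history_by_words` function splits the conversation history on whitespace and returns only the last _n_ words (set _n_ to 0 for full history). Trimming happens before every request, including the first, so an initial prompt longer than _n_ words is also cut: in the sample run above, the 11-word prompt reaches the model without its leading "you". The `Prompt:` line prints the full, untrimmed history.
- **Streaming:** Tokens are printed as they arrive and accumulated into the complete response, which is appended to the conversation history for the next iteration.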
- **Dependencies:** This notebook requires the `requests` library. If not installed, run `pip install requests` in your terminal.
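- **Recursion:** Each iteration feeds the updated, trimmed history back to the model as its prompt. The loop stops after the requested number of iterations or when a request returns no text. With _n_ set to 0, the prompt grows every iteration and can eventually exceed the model's context window, so use full history with care.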
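The notebook folds the system prompt into the `prompt` text. Ollama's `/api/generate` also accepts a separate `system` field, which overrides the system message defined in the model's Modelfile, and an `options` object for model parameters such as `num_ctx` (the context window size, relevant to the growth noted above). A minimal sketch of a payload builder using both follows; `build_payload` is a hypothetical helper name, and it is kept separate from the HTTP call so it can be checked offline.

```python
import json


def build_payload(prompt, system_prompt="You are an AI assistant",
                  model="llama3.2:latest", options=None):
    """Build a streaming /api/generate request body that sends the system prompt separately."""
    payload = {
        "model": model,
        "prompt": prompt,
        "system": system_prompt,  # replaces the Modelfile's system message for this request
        "stream": True,
    }
    if options:
        payload["options"] = dict(options)  # model parameters, e.g. {"num_ctx": 4096}
    return payload


print(json.dumps(build_payload("hello", options={"num_ctx": 4096}), indent=2))
```

The result can be passed as `json=build_payload(...)` to the existing `requests.post` call. Whether the model responds differently than with the concatenated prompt depends on its prompt template, so treat this as a variant to compare rather than a guaranteed improvement.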