# Recursive LLM Workshop using Ollama with Streaming and History Trimming

This notebook demonstrates a recursive loop with an LLM running locally via Ollama in streaming mode. It sends an initial prompt to Ollama and prints tokens as they arrive. Each response then becomes the prompt for the next iteration, optionally trimmed to its last _n_ lines (or passed on in full when _n_ is 0).

Make sure Ollama is running locally and that the endpoint URL, model name, and response format match your configuration.
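
One way to check the setup before running the loop is to ask the server which models it has available. The sketch below assumes Ollama's `/api/tags` endpoint on the default port and uses only the standard library. The helper names `parse_model_names` and `installed_models` are illustrative and are not part of this notebook.

```python
import json
import urllib.error
import urllib.request

def parse_model_names(tags_payload):
    """Extract model names from a decoded /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", []) if "name" in m]

def installed_models(base_url="http://localhost:11434", timeout=5):
    """Return the model names the server reports, or None if it is unreachable.

    Assumes the default local Ollama address; adjust base_url if yours differs.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON
        return None
```

If `installed_models()` returns `None`, the server is probably not running. If the returned list lacks the model name passed to `query_ollama_stream`, pull the model first with `ollama pull`.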

In [None]:
import requests
import json

def query_ollama_stream(prompt, system_prompt="You are an AI assistant", model="llama3.2:latest"):
    """
    Sends a prompt to the locally running Ollama instance using the /api/generate endpoint in streaming mode.
    Prints tokens as they arrive and returns the accumulated text.
    """
    url = 'http://localhost:11434/api/generate'
    headers = {'Content-Type': 'application/json'}
    payload = {
        "model": model,
        "prompt": f"{system_prompt}\n\n{prompt}",
        "stream": True
    }
    try:
        with requests.post(url, headers=headers, json=payload, stream=True) as response:
            response.raise_for_status()
            result = ""
            # Iterate over lines from the streaming response
            for line in response.iter_lines():
                if line:
                    try:
                        # Try to decode the line as JSON
                        data = json.loads(line.decode('utf-8'))
                        token = data.get("response", "")
                    except json.JSONDecodeError:
                        # Fallback: decode line as plain text
                        token = line.decode('utf-8')
                    # Print each token as it arrives
                    print(token, end='', flush=True)
                    result += token
            print()  # newline after stream ends
            return result
    except Exception as e:
        print(f"Error querying Ollama: {e}")
        return ""

def trim_history(text, num_lines):
    """
    Trims the input text to include only the last 'num_lines' lines.
    If num_lines is 0 or negative, returns the full text unchanged.
    """
    if num_lines <= 0:
        return text
    lines = text.strip().splitlines()
    return "\n".join(lines[-num_lines:])

# User inputs for initial prompt, system prompt, number of recursion loops, and history trimming
initial_prompt = input("Enter the initial prompt: ")
num_loops = int(input("Enter the number of recursion loops: "))
user_system_prompt = input("Enter the system prompt (or press enter to use default 'You are an AI assistant'): ")
if not user_system_prompt.strip():
    user_system_prompt = "You are an AI assistant"

history_lines = int(input("Enter the number of previous lines to include (enter 0 for full history): "))

# The initial conversation is simply the initial prompt
current_prompt = initial_prompt
full_story = ""

for i in range(num_loops):
    print(f"\n--- Iteration {i+1} ---")
    print(f"Prompt: {current_prompt}")

    # Prepare the prompt to send by trimming history if necessary
    prompt_to_send = trim_history(current_prompt, history_lines)

    # Query the model in streaming mode
    result = query_ollama_stream(prompt_to_send, user_system_prompt)

    if not result:
        print("\nNo result received. Exiting recursion.")
        break

    # Feed the new result back in as the next prompt and add it to the full story
    current_prompt = result
    full_story += result + "\n\n"

print("The final result:\n")
print(full_story)


Enter the initial prompt:  You are a higher dimensional being stuck inside a fractal prison. What do you see?
Enter the number of recursion loops:  4
Enter the system prompt (or press enter to use default 'You are an AI assistant'):  beyond that I can see..
Enter the number of previous lines to include (enter 0 for full history):  4



--- Iteration 1 ---
Prompt: You are a higher dimensional being stuck inside a fractal prison. What do you see?
The infinite expanse of my confinement stretches out before me like an ever-unfolding tapestry. I gaze upon the labyrinthine corridors of my fractal prison, where each branch and loop mirrors the others, creating an endless maze of reflections.

In one direction, I see a sea of mirrored faces, each one identical to the last, stretching out into infinity. The images blur and merge, like ripples on a pond, as I peer deeper into their depths. Each face holds a fragment of my own consciousness, trapped within its confines.

To my left, I behold the twisted geometry of my prison's walls. The lines and curves blur and shift, like a kaleidoscope in motion. In this realm, time itself becomes fluid, with moments stretching out to infinity, only to collapse back upon themselves in an endless cycle.

Above me, I see the ceiling of my prison – or rather, its absence. There is no solid bo

## Notes

- **Endpoint:** This notebook uses the `/api/generate` endpoint in streaming mode. Ensure your Ollama configuration supports streaming.
- **History Trimming:** The `trim_history` function returns only the last _n_ lines of the text it is given, which here is the previous response. Setting _n_ to 0 passes the previous response on in full.
- **Streaming:** Tokens are printed as they arrive and then accumulated into a complete response, which is then used for the next iteration.
- **Dependencies:** This notebook requires the `requests` library. If not installed, run `pip install requests` in your terminal.
- **Recursion:** Each iteration uses the previous output (trimmed if specified) as the prompt for the next iteration. The loop count you enter bounds the run, so keep it small: every iteration is a full generation.
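
The interaction between the loop and `trim_history` can be tried without a running server. The sketch below copies `trim_history` from the notebook and substitutes a hypothetical `fake_model` for `query_ollama_stream`. `run_loop` is an illustrative wrapper around the same loop logic, not code from the notebook.

```python
def trim_history(text, num_lines):
    """Keep only the last num_lines lines; 0 or less keeps everything."""
    if num_lines <= 0:
        return text
    lines = text.strip().splitlines()
    return "\n".join(lines[-num_lines:])

def fake_model(prompt):
    """Hypothetical stand-in for the LLM: echoes the prompt and adds one line."""
    return f"{prompt}\nseen {len(prompt.splitlines())} line(s)"

def run_loop(generate, initial_prompt, num_loops, history_lines):
    """Feed each response back in as the next prompt, trimming it first."""
    current_prompt = initial_prompt
    sent = []  # prompts actually sent to the model, kept for inspection
    for _ in range(num_loops):
        prompt_to_send = trim_history(current_prompt, history_lines)
        sent.append(prompt_to_send)
        result = generate(prompt_to_send)
        if not result:
            break  # same early exit as the notebook loop
        current_prompt = result
    return sent

prompts = run_loop(fake_model, "start", num_loops=4, history_lines=2)
for p in prompts:
    print(repr(p))
```

With `history_lines=2`, no prompt sent to the model is longer than two lines, however long the previous response was. With `history_lines=0`, each prompt is the whole previous response, so prompts can grow from one iteration to the next.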