# Recursive LLM Workshop using Ollama with Streaming

This notebook demonstrates a recursive loop with an LLM running locally via Ollama in streaming mode. It takes an initial prompt, sends it to Ollama, and prints tokens as they arrive. The accumulated response is then used, unmodified, as the prompt for the next iteration.

Make sure that Ollama is running locally and that the endpoint URL, model name, and response format match your configuration.
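A quick pre-flight check can catch a stopped server or a missing model before the loop starts. The sketch below assumes Ollama's default local address and its `/api/tags` endpoint, which lists locally installed models. `fetch_tags` and `model_available` are hypothetical helper names, not part of the notebook, and the standard library's `urllib` is used so the check needs no extra dependencies.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama address


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """Return the parsed JSON body of GET /api/tags (locally installed models)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def model_available(tags: dict, model: str) -> bool:
    """Check whether `model` appears among the names in a /api/tags response."""
    names = {m.get("name") for m in tags.get("models", [])}
    return model in names
```

Calling `model_available(fetch_tags(), "llama3.2:latest")` before the loop should return `True` when the server is up and the model has been pulled; a connection error from `fetch_tags` suggests that Ollama is not running at that address.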

In [None]:
import requests
import json

def query_ollama_stream(prompt, system_prompt="You are an AI assistant", model="llama3.2:latest"):
    """
    Sends a prompt to the locally running Ollama instance using the /api/generate endpoint in streaming mode.
    Prints tokens as they arrive and returns the accumulated text.
    """
    url = 'http://localhost:11434/api/generate'
    headers = {'Content-Type': 'application/json'}
    payload = {
        "model": model,
        "prompt": f"{system_prompt}\n\n{prompt}",
        "stream": True
    }
    try:
        with requests.post(url, headers=headers, json=payload, stream=True) as response:
            response.raise_for_status()
            result = ""
            # Iterate over lines from the streaming response
            for line in response.iter_lines():
                if line:
                    try:
                        # Each streamed line is a JSON object; the text fragment is under "response"
                        data = json.loads(line.decode('utf-8'))
                        token = data.get("response", "")
                    except json.JSONDecodeError:
                        # Fallback: treat a line that is not valid JSON as plain text
                        token = line.decode('utf-8')
                    # Print each token as it arrives
                    print(token, end='', flush=True)
                    result += token
            print()  # newline after stream ends
            return result
    except Exception as e:
        print(f"Error querying Ollama: {e}")
        return ""

# Get user inputs for initial prompt, system prompt, and number of recursion loops
initial_prompt = input("Enter the initial prompt: ")
num_loops = int(input("Enter the number of recursion loops: "))
user_system_prompt = input("Enter the system prompt (or press enter to use default 'You are an AI assistant'): ")
if not user_system_prompt.strip():
    user_system_prompt = "You are an AI assistant"

current_prompt = initial_prompt

for i in range(num_loops):
    print(f"\n--- Iteration {i+1} ---")
    print(f"Prompt: {current_prompt}")

    # Query the model in streaming mode
    result = query_ollama_stream(current_prompt, user_system_prompt)

    if not result:
        print("\nNo result received. Exiting recursion.")
        break

    # Use the accumulated streaming output as the new prompt for the next iteration
    current_prompt = result


Enter the initial prompt:  look, i saw it, it was crazy
Enter the number of recursion loops:  3
Enter the system prompt (or press enter to use default 'You are an AI assistant'):  Three pugs are having an argument



--- Iteration 1 ---
Prompt: look, i saw it, it was crazy
"Whoa, slow down! What's going on here?" one of the pugs asked, wagging his tail nervously.

"It was Ginger!" exclaimed the other pug. "She stole my squeaky toy again!"

"No way, I saw it too!" chimed in the third pug. "And I'm positive it was Rufus who threw it across the room and then blamed me for it!"

The three pugs stood facing each other, their little faces scrunched up in disagreement. The air was thick with tension as they all tried to outdo each other in recounting the events of what had happened.

"Wait a minute... did you guys see the whole thing?" asked the first pug, his voice trembling slightly.

The second and third pugs nodded in unison, eager to share their own versions of the story.

"I was trying to take it from Ginger," said the second pug, "but she snatched it away from me!"

"No way, I saw Rufus lurking around with the toy earlier," countered the third pug. "He must have planted it somewhere so he could bl

## Notes

- **Endpoint:** This notebook uses the `/api/generate` endpoint with `"stream": True` set in the payload. Ollama streams responses from this endpoint by default, so the flag mainly makes the intent explicit.
- **Streaming:** The function iterates over the response using `iter_lines()`, prints tokens as they arrive, and accumulates them into a complete response.
- **Dependencies:** This notebook requires the `requests` library. If it is not installed, run `pip install requests` in your terminal.
- **Recursion:** Each iteration uses the streamed output of the previous one as its prompt. The loop runs only for the number of iterations you enter, but outputs can drift or grow quickly, so start with a small count.
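
One way to make the recursion note concrete is to give the loop explicit stop conditions. The sketch below is a minimal illustration, not the notebook's own loop: `run_bounded`, `max_prompt_chars`, and the stand-in `fake_generate` are hypothetical names, and the character cap of 2000 is an arbitrary assumption. It stops on an iteration limit, on empty output, or when an output repeats, and it keeps only the tail of each output as the next prompt.

```python
from typing import Callable, List


def run_bounded(
    initial_prompt: str,
    generate: Callable[[str], str],
    max_loops: int = 3,
    max_prompt_chars: int = 2000,  # assumption: arbitrary cap for illustration
) -> List[str]:
    """Feed each output back in as the next prompt, with explicit stop conditions."""
    outputs: List[str] = []
    seen = set()
    prompt = initial_prompt
    for _ in range(max_loops):      # hard cap on iterations
        result = generate(prompt).strip()
        if not result:              # empty output: nothing to feed back
            break
        if result in seen:          # repeated output: the loop has stalled
            break
        seen.add(result)
        outputs.append(result)
        # Keep only the tail so the prompt cannot grow without bound
        prompt = result[-max_prompt_chars:]
    return outputs


# Stand-in for the model so the sketch runs without a server
def fake_generate(prompt: str) -> str:
    return prompt.upper()


history = run_bounded("hello", fake_generate, max_loops=5)
```

With the function defined in this notebook, `generate` could be `lambda p: query_ollama_stream(p, user_system_prompt)`.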