# LLM Execution Notebook

This notebook provides a Python script to interact with large language models (LLMs) locally using the [Ollama](https://ollama.com/) platform.

**Functionality:**
1.  **Verifies Ollama Server:** Checks if the Ollama server is running locally.
2.  **Interactive Prompting:** Allows you to enter prompts in a loop.
3.  **Streaming Responses:** Sends prompts to a specified Ollama model and displays the response in real time as it is generated.
4.  **Model Selection:** Uses a configurable model (default: `gemma3:4b-it-qat`).
5.  **Error Handling:** Basic error handling for server connection and model availability.

**Instructions:**
1. Follow the README instructions:
    * Install Ollama.
    * Run the `ollama serve` command in your terminal.
    * Pull the desired model using `ollama pull <model_name>` (e.g., `ollama pull gemma3:4b-it-qat`).
2. Set the `MODEL_NAME` variable in the second code cell below.
3. Run the cells sequentially. The last cell starts an interactive prompt loop.
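
Step 1 asks you to pull the model before running the notebook. As a rough illustration of what "the model is available" means, the sketch below compares a requested name with a list of locally pulled names. The helper names `normalize_model_name` and `is_model_pulled` are hypothetical and not part of the notebook, and `local_names` is assumed to be collected by the caller (for example, copied from the output of `ollama list` in a terminal). It relies on the Ollama convention that a name without an explicit tag refers to the `latest` tag.

```python
from __future__ import annotations


def normalize_model_name(name: str) -> str:
    """Append the default ':latest' tag when the name carries no explicit tag."""
    name = name.strip()
    # Look for a tag only after the last '/', so that a registry host
    # with a port number is not mistaken for a tag.
    last_segment = name.rsplit("/", 1)[-1]
    return name if ":" in last_segment else f"{name}:latest"


def is_model_pulled(requested: str, local_names: list[str]) -> bool:
    """Return True if `requested` matches one of the locally pulled names."""
    wanted = normalize_model_name(requested)
    return any(normalize_model_name(n) == wanted for n in local_names)


if __name__ == "__main__":
    # Hypothetical list of local models, used only for illustration.
    pulled = ["gemma3:4b-it-qat", "llama3:latest"]
    print(is_model_pulled("gemma3:4b-it-qat", pulled))
    print(is_model_pulled("mistral", pulled))
```

A check like this could run before the prompt loop so that a missing model is reported up front rather than on the first request.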

In [1]:
import ollama
import sys
import time

In [2]:
# Choose the model you previously downloaded (e.g., 'llama3', 'mistral', 'phi3')
MODEL_NAME = 'gemma3:4b-it-qat'

# Optional: Define an initial prompt or leave it empty to prompt the user.
INITIAL_PROMPT = ""

In [3]:
def verify_ollama_server():
    """Attempts to connect to the Ollama server to make sure it's running."""
    try:
        # A simple call to list local models. If it fails, the server is down.
        ollama.list()
        print("✅ Ollama server detected.")
        return True
    except Exception as e:
        # Include the underlying error so connection problems are easier to diagnose.
        print(f"❌ Error connecting to Ollama server: {e}")
        print("Make sure the Ollama application (or service) is running.")
        return False
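
The check above goes through the `ollama` client library. As an alternative sketch that uses only the standard library, the snippet below sends a plain HTTP request to the server's root URL. It assumes the server listens on Ollama's default address, `http://localhost:11434`. The function name `is_ollama_reachable` and its parameters are hypothetical and not part of the notebook.

```python
import urllib.request


def is_ollama_reachable(base_url: str = "http://localhost:11434",
                        timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on `base_url` answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib.error.URLError (refused connection, DNS failure, HTTP error)
        # and socket timeouts are all subclasses of OSError.
        return False


if __name__ == "__main__":
    if is_ollama_reachable():
        print("Ollama server detected.")
    else:
        print("Ollama server not reachable; try running `ollama serve`.")
```

This variant only confirms that something answers on the port, so the library-based check is still the better test that the client can actually talk to the server.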

In [4]:
def execute_prompt_with_streaming(model, prompt):
    """Executes a prompt and displays the response in real-time (streaming)."""
    print(f"\n🤖 Sending prompt to '{model}':")
    print(f">>> {prompt}")
    print("\nResponse:")

    try:
        # Use ollama.chat for a chat-like interaction
        # stream=True returns an iterator that yields the response in small chunks as it is generated
        stream = ollama.chat(
            model=model,
            messages=[{'role': 'user', 'content': prompt}],
            stream=True,
        )

        complete_response = ""
        for chunk in stream:
            response_part = chunk['message']['content']
            print(response_part, end='', flush=True) # Display the part without a newline
            complete_response += response_part

        print("\n\n--- End of response ---")
        return complete_response # Return the complete response if needed

    except Exception as e:
        print("\n❌ Error during generation:")
        # Check if the error is because the model does not exist locally
        if "model" in str(e) and "not found" in str(e):
            print(f"   The model '{model}' does not seem to be downloaded.")
            print(f"   Try running in the terminal: ollama pull {model}")
        else:
            print(f"   An unexpected error occurred: {e}")
        return None
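
The accumulation pattern in the function above can be tried without a running server. The sketch below feeds a stand-in generator through the same loop. `fake_stream` and `collect_stream` are hypothetical names. The fake chunks assume only the `chunk['message']['content']` shape used in this notebook and do not reproduce any other field a real Ollama response carries.

```python
from typing import Iterable, Iterator


def fake_stream(text: str, chunk_size: int = 4) -> Iterator[dict]:
    """Yield dicts shaped like the chunks read above, a few characters at a time."""
    for i in range(0, len(text), chunk_size):
        yield {"message": {"content": text[i:i + chunk_size]}}


def collect_stream(stream: Iterable[dict], echo: bool = True) -> str:
    """Print each chunk as it arrives (optional) and return the joined text."""
    parts = []
    for chunk in stream:
        piece = chunk["message"]["content"]
        if echo:
            print(piece, end="", flush=True)  # no newline between chunks
        parts.append(piece)
    # Joining once at the end avoids repeated string concatenation.
    return "".join(parts)


if __name__ == "__main__":
    full = collect_stream(fake_stream("Hello there! How can I help you today?"))
    print(f"\n({len(full)} characters received)")
```

Collecting the pieces in a list and joining them once is a small design choice; the `+=` accumulation used in the notebook behaves the same for responses of this size.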

In [5]:
if not verify_ollama_server():
    sys.exit(1) # Exit if the server is not available

# Interactive loop to send prompts
while True:
    if INITIAL_PROMPT:
        user_prompt = INITIAL_PROMPT
        INITIAL_PROMPT = "" # Use only the first time
    else:
        try:
            user_prompt = input("\nEnter your prompt (or 'exit' to quit): \n>>> ")
        except EOFError:  # Handle Ctrl+D
            print("\nExiting...")
            break

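    # Ignore surrounding whitespace so that ' exit ' quits and blank input is skipped
    user_prompt = user_prompt.strip()
    if user_prompt.lower() == 'exit':
        print("Goodbye!")
        break
    if not user_prompt:
        continue  # If nothing is entered, ask again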

    # perf_counter() is a monotonic clock, better suited to measuring elapsed time
    start_time = time.perf_counter()
    execute_prompt_with_streaming(MODEL_NAME, user_prompt)
    end_time = time.perf_counter()
    print(f"(Generation time: {end_time - start_time:.2f} seconds)")

✅ Ollama server detected.

🤖 Sending prompt to 'gemma3:4b-it-qat':
>>> Hello

Response:
Hello there! How can I help you today? Do you want to:

*   Ask me a question?
*   Start a conversation?
*   Have me write something (like a story, poem, or code)?
*   Just say hi? 😊

--- End of response ---
(Generation time: 10.86 seconds)

🤖 Sending prompt to 'gemma3:4b-it-qat':
>>> How are you?

Response:
I'm doing well, thank you for asking! As a large language model, I don't really *feel* in the way humans do, but my systems are running smoothly and I'm ready to assist you. 😊 

How are *you* doing today? Is there anything you’d like to chat about or any questions you have for me?

--- End of response ---
(Generation time: 10.88 seconds)
Goodbye!
