# 🛠️ Part 1: Building Your First Local AI Agent

In this first section, we'll build a simple AI agent that runs entirely on your machine.  
Instead of training a model from scratch or using complex frameworks, we’ll use a **local LLM** (Gemma 3 4B or 12B) and interact with it directly through basic prompts.

Along the way, we’ll integrate:
- **Gradio** to create a lightweight front end.
- **SQLite** to log user inputs and model responses.
- **Datasette** to easily explore logged conversations.

By the end of this section, you’ll have a working MVP—a local chatbot with a UI, a database for observability, and a simple, reliable architecture you can build on.

Later in the workshop, we'll build on these foundations:
- We'll create a basic **agent** that can call functions dynamically.
- We'll then set up a **Model Context Protocol (MCP)** client and server to expose tools flexibly to an LLM.
- But first, we’ll focus on getting **Gemma 3 models running locally** and understanding the building blocks of LLM-powered applications.

## Getting Started

## Why Gemma 3 and Ollama?

### 🌟 Gemma 3: Efficient and Powerful LLMs for Local Use

![Alt text](img/gemma3.png)

[Gemma](https://ai.google.dev/gemma) is a family of open-weight models from Google DeepMind, designed for efficiency and strong reasoning capabilities.  

For this workshop, we'll use **Gemma 3 4B** or **Gemma 3 12B**, depending on your hardware.

Key features:
- **Quantization-Aware Training (QAT)** models: Trained with quantization in mind, so they keep strong performance while needing less memory, which makes local deployment practical.
- **Optimized for local inference**: Designed to run well even without specialized cloud infrastructure.
- **Strong structured prompting capabilities**: Ideal for building reliable LLM apps.
- **Open weights and flexible licensing**: Easy to experiment and build without vendor lock-in.

### 🚀 Ollama: A Game Changer for Local LLMs  

![Alt text](img/ollama.svg)

[Ollama](https://ollama.com/) makes running **LLMs locally** seamless, without complex setup.  
- **Pre-configured model execution**: No need to manually set up dependencies.  
- **Efficient GPU/CPU inference**: Optimized for running on local machines.  
- **Fast iteration loop**: Load a model once, then run queries without excessive overhead.  

Together, **Gemma 3 + Ollama** provide a fast, flexible foundation for building your first real LLM-powered application, running 100% on your own machine.
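Before sending a first prompt, it can help to confirm that Ollama is actually installed. The snippet below is a minimal sketch that only checks whether an `ollama` executable is on your `PATH`; the helper name `ollama_cli_available` is made up for this example, and the check says nothing about whether the Ollama server is running or the model has been pulled.

```python
import shutil


def ollama_cli_available() -> bool:
    """Return True if an `ollama` executable is found on the PATH."""
    return shutil.which("ollama") is not None


if ollama_cli_available():
    # The model tag is the one used throughout this section.
    print("Found the ollama CLI. If needed, run: ollama pull gemma3:4b-it-qat")
else:
    print("ollama CLI not found on PATH. Install it from https://ollama.com/ first.")
```

If the check fails even though Ollama is installed (for example, a desktop install that did not add the CLI to your `PATH`), treat it as a hint rather than a verdict.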

Let's quickly test that everything is working!

Below, we'll:
- Load the Gemma 3 4B QAT model with Ollama.
- Send a simple prompt to the model.
- Print the response.

If this succeeds, you're ready to move on to building a full app!

```python
from ollama import chat
from ollama import ChatResponse

# Use the Gemma 3 model
model = 'gemma3:4b-it-qat'  # or 'gemma3:12b-it-qat' if you think your machine can handle it ;)

def single_turn(prompt):
    response: ChatResponse = chat(
        model=model,
        messages=[
            {
                'role': 'user',
                'content': prompt,
            },
        ]
    )
    return response['message']['content']

# Example prompt
prompt = "What are three interesting facts about the ocean?"
single_turn(prompt)
```

"Okay, here are three interesting facts about the ocean:\n\n1. **The Ocean Floor is More Like a Continent:** The deep ocean floor is surprisingly mountainous, with ranges higher than the Himalayas! Scientists have discovered massive mountain ranges, valleys, and even volcanoes beneath the waves.  It's a vastly different landscape than we typically imagine when we think of the ocean.\n\n2. **Microbes Make Up a Huge Part of the Ocean’s Life:**  Estimates suggest that microbes – bacteria, archaea, and viruses – make up *over 80%* of the ocean’s biomass! They are the base of the food web, playing a crucial role in nutrient cycling and supporting nearly all other marine life.  You’re essentially swimming in a giant microbial community.\n\n3. **There’s Enough Water in the Ocean to Cover the Earth Twice:** The sheer volume of water in the ocean is staggering. It contains about 97% of Earth's total water supply. That’s a lot of liquid!\n\n\n\nDo you want to hear a few more ocean facts, or perh
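The `messages` argument above is just a list of `{'role', 'content'}` dictionaries, which is why the function is called `single_turn`: it sends exactly one user message. As a rough sketch of how that list would grow in a multi-turn conversation, the hypothetical helper `add_turn` below builds the history without calling the model at all; the assistant reply is a placeholder, not real model output.

```python
def add_turn(history, role, content):
    """Return a new message list with one more {'role', 'content'} entry."""
    return history + [{"role": role, "content": content}]


history = []
history = add_turn(history, "user", "What are three interesting facts about the ocean?")
history = add_turn(history, "assistant", "(the model's reply would go here)")
history = add_turn(history, "user", "Tell me more about the first one.")

for message in history:
    print(f"{message['role']:>9}: {message['content']}")

# To continue the conversation, the whole list would be passed as
# chat(model=model, messages=history) instead of a single user message.
```

The model itself is stateless between calls, so sending the full history each time is what gives it the context of earlier turns.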

Now that we know how to send a basic prompt to the model, let's build a simple front end!

We'll use **Gradio** to create an interactive chat interface where users can type questions and see model responses instantly.

## Creating our app

## 🏗️ Creating Our Gradio App  

![Alt text](img/gradio.png)

Before we dive into the code, let's talk about **Gradio**—one of the easiest ways to spin up interactive front-ends for AI applications.  

🚀 **Why Gradio?**  
- **Super fast MVP development**: Build an interactive AI demo in just a few lines of code.  
- **No frontend experience required**: Just define a Python function, and Gradio handles the UI.  
- **Part of the 🤗 Hugging Face ecosystem**: Seamlessly integrates with **models, Spaces, and APIs**.  
- **Great for rapid prototyping**: Test AI models with real users before scaling up.  

We'll use **Gradio** to build a lightweight app that lets users send prompts to a **Gemma 3 model** and receive responses—without needing a full web server setup.

For instructional purposes, the code is included below, but we'll run our apps from the command line:

```python
import gradio as gr
import ollama

model = 'gemma3:4b-it-qat'  # or 'gemma3:12b-it-qat' if you have enough memory

def chat_with_model(prompt):
    response = ollama.chat(model=model, messages=[{'role': 'user', 'content': prompt}])
    return response['message']['content']

iface = gr.Interface(
    fn=chat_with_model,
    inputs=gr.Textbox(lines=2, placeholder="Type your message here..."),
    outputs="text",
    title="Chat with Gemma 3",
    description="Enter a message and get a response from the Gemma 3 model.",
)

iface.launch()
```

### 📝 What's Happening in This Code?  

- 🔄 **Imports Gradio & Ollama** – We bring in the tools we need to build the UI and interact with the model.  
- 🧠 **Defines the model** – We're using **Gemma 3** (`gemma3:4b-it-qat`) to power the chatbot.
- 💬 **Creates a function (`chat_with_model`)** – Sends user input to the model via **Ollama** and returns a response.  
- 🎨 **Builds the Gradio UI (`iface`)** –  
  - **📩 Input**: A text box for user messages.  
  - **🖥️ Output**: The model's response.  
  - **🎭 Title & Description**: A simple interface for chatting with Gemma.  
- 🚀 **Launches the app!** – Runs the interactive chatbot in your browser.  
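One thing the app above does not handle is failure: if the model call raises (for example, because the local server is not running), the user sees an error instead of a reply. Here is a small sketch of one way to soften that, assuming you are happy to show the error text in the UI. `make_safe_chat` and `fake_backend` are hypothetical names for this example, and the stub stands in for `chat_with_model` so the snippet runs without Ollama.

```python
def make_safe_chat(backend):
    """Wrap a prompt -> reply function so failures come back as readable text."""
    def safe_chat(prompt):
        if not prompt or not prompt.strip():
            return "Please type a message first."
        try:
            return backend(prompt)
        except Exception as exc:  # broad on purpose: this is a demo UI
            return f"Model call failed: {exc}"
    return safe_chat


def fake_backend(prompt):
    """Stand-in for chat_with_model so this sketch runs without a model."""
    if prompt == "boom":
        raise ConnectionError("could not reach the model server")
    return f"echo: {prompt}"


safe = make_safe_chat(fake_backend)
print(safe("hello"))  # echo: hello
print(safe("   "))    # Please type a message first.
print(safe("boom"))   # Model call failed: could not reach the model server
```

In the Gradio app, the same idea would be wired in with `fn=make_safe_chat(chat_with_model)`.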

Now, let’s fire it up and start chatting! 🔥  

## Adding observability with SQLite and Datasette

## 📊 Why Tracing & Observability Matter

Building an AI system isn’t just about **getting a response**—it’s about **understanding and improving how your model behaves over time**.  
- **👀 Observability** helps us track inputs, outputs, and model decisions, making debugging and iteration easier.  
- **🐛 Tracing conversations** lets us spot patterns, catch failure cases, and fine-tune our system for better performance.  
- **📈 Data-Driven Decisions**: Instead of guessing if the model is working well, we can use **real logged interactions** to refine prompts, improve accuracy, and compare models.  

## 🗄️ Why SQLite? A No-Brainer for MVPs  

![Alt text](img/sqlite.png)

For **early-stage apps**, **SQLite** is an **ideal** choice for logging and observability:  
- **🛠️ No Setup Hassle** – It’s a self-contained, file-based database. No server required.  
- **⚡ Fast & Lightweight** – Handles reads and writes efficiently without extra overhead.  
- **📦 Portable & Easy to Share** – Just a single file (`.db`) that works across different environments.  
- **🔗 Overwhelmingly Popular** – Used in everything from **mobile apps (iOS, Android)** to **browsers (Chrome, Firefox)** and even **aircraft flight software**!  

### 🚀 Future Scaling  
Right now, **SQLite is perfect** for logging and inspecting model interactions. Later, if we move to **multi-user or production-scale apps**, we can switch to **PostgreSQL, MySQL, or cloud-based solutions**—but for now, SQLite keeps things simple and effective.  

---

Next, we’ll **log our prompts and responses** so we can start analyzing how our system is performing! 🔍

As above, the code is included below, although we'll run our apps from the command line:

```python
import gradio as gr
import ollama
import sqlite3
import datetime

# SQLite Database Setup
DB_PATH = "chat_log.db"

def setup_database():
    """Create a simple SQLite table if it doesn't exist."""
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS chat_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            prompt TEXT,
            response TEXT,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.commit()
    conn.close()

setup_database()  # Ensure the DB is set up before running the app

def chat_with_model(prompt):
    """Send user input to Ollama, get response, and log to SQLite."""
    response = ollama.chat(model="gemma3:4b-it-qat", messages=[{"role": "user", "content": prompt}])["message"]["content"]
    
    # Log the interaction to SQLite
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.cursor()
    cursor.execute("INSERT INTO chat_history (prompt, response) VALUES (?, ?)", (prompt, response))
    conn.commit()
    conn.close()

    return response

# Gradio UI
iface = gr.Interface(
    fn=chat_with_model,
    inputs=gr.Textbox(lines=2, placeholder="Type your message here..."),
    outputs="text",
    title="Chat with Gemma",
    description="Enter a message and get a response from the Gemma 3 model. Your chats are logged in SQLite.",
)

iface.launch()
```

### 📝 What's Happening in This Code?  

- 📦 **Imports required libraries** –  
  - `gradio` for the UI  
  - `ollama` for running **Gemma 3**  
  - `sqlite3` for logging interactions  
  - `datetime` (imported but not used here: SQLite fills in `timestamp` itself via `DEFAULT CURRENT_TIMESTAMP`)  

- 🗄️ **Sets up a SQLite database (`chat_log.db`)** –  
  - Creates a **`chat_history`** table (if it doesn’t exist)  
  - Stores **`prompt`**, **`response`**, and **timestamp** for each chat  

- 💬 **Defines `chat_with_model(prompt)`** –  
  - Sends user input to **Ollama (Gemma 3)**  
  - Logs the chat to **SQLite**  
  - Returns the model’s response  

- 🎨 **Creates a Gradio UI (`iface`)** –  
  - **📝 Input:** A text box for user queries  
  - **🖥️ Output:** The model’s response  
  - **📜 Description:** Informs users that chats are logged  

- 🚀 **Launches the app!** – Runs a browser-based chatbot with full logging  

This setup ensures we can **track every interaction**, making debugging, evaluation, and iteration much easier. Next, let's test it out! 🔍  
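To check that logging works, you can read the rows back with plain `sqlite3`. The sketch below reuses the `chat_history` schema from the app but runs against an in-memory database with two made-up rows, so it is safe to try anywhere; `recent_chats` is a hypothetical helper, not part of the app.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE IF NOT EXISTS chat_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        prompt TEXT,
        response TEXT,
        timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
    )
"""


def recent_chats(conn, limit=5):
    """Return the newest rows as (id, prompt, response, timestamp) tuples."""
    cursor = conn.execute(
        "SELECT id, prompt, response, timestamp "
        "FROM chat_history ORDER BY id DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


# In-memory database with illustrative rows (not real model output).
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO chat_history (prompt, response) VALUES (?, ?)",
    [("Hi", "Hello!"), ("What is 2 + 2?", "4")],
)
conn.commit()

rows = recent_chats(conn, limit=1)
for row in rows:
    print(row)
```

To inspect the real log, connect to `"chat_log.db"` instead of `":memory:"` and skip the inserts. Note that the `timestamp` column is populated by SQLite, in UTC.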

## 🔍 Exploring Your Data with Datasette  

![Alt text](img/datasette.png)

Once we’ve logged conversations in **SQLite**, we need an easy way to inspect and analyze them.  
That’s where **[Datasette](https://datasette.io/)** comes in—a powerful tool for **browsing, querying, and exporting SQLite databases** effortlessly.  

### 🚀 Why Datasette?  
- **Instant Database UI** – No SQL knowledge required; just open a browser and explore!  
- **Lightning Fast** – Designed for large-scale data publishing but perfect for small logs too.  
- **Built-in Querying** – Filter, sort, and search directly in a web-based UI.  
- **Easy Exporting** – Convert your database into **CSV**, **JSON**, or other formats with a click.  

### 📤 Exporting Traces to CSV  

We’ll use **Datasette** to **export chat logs to a CSV file**, making it easier to analyze failure cases and refine our AI system.  
This CSV can be used for:  
- **📊 Failure Mode Analysis** – Identify common mistakes by reviewing responses.  
- **👥 Sharing with Subject Matter Experts** – Non-technical teammates can review and give feedback.  
- **✅ Manual Evaluation** – Open in a spreadsheet and score outputs with 👍/👎 + comments.  
- **📈 Starting Systematic Evaluations** – Lay the groundwork for automated performance tracking.  
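Datasette is usually started with `datasette chat_log.db`, and its table pages offer a CSV export link. If you would rather script the export, the standard library can do the same job. The sketch below is that stdlib alternative, not Datasette's own mechanism: `export_chats_to_csv` is a hypothetical helper, the empty `rating` and `comment` columns are an assumption added for manual review, and the demo rows are made up.

```python
import csv
import io
import sqlite3


def export_chats_to_csv(conn, out):
    """Write chat_history to `out` as CSV and return the number of data rows."""
    cursor = conn.execute(
        "SELECT id, prompt, response, timestamp FROM chat_history ORDER BY id"
    )
    writer = csv.writer(out)
    columns = [description[0] for description in cursor.description]
    # Extra empty columns for reviewers to fill in (assumed, not in the DB).
    writer.writerow(columns + ["rating", "comment"])
    count = 0
    for row in cursor:
        writer.writerow(list(row) + ["", ""])
        count += 1
    return count


# Demo against an in-memory database with illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chat_history ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, prompt TEXT, response TEXT, "
    "timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)"
)
conn.executemany(
    "INSERT INTO chat_history (prompt, response) VALUES (?, ?)",
    [("Hi", "Hello!"), ("Name a color, please", "Blue, for example")],
)
conn.commit()

buffer = io.StringIO()
exported = export_chats_to_csv(conn, buffer)
print(f"Exported {exported} rows")
print(buffer.getvalue())
```

To write a real file, pass `open("chat_log.csv", "w", newline="", encoding="utf-8")` instead of the `StringIO` buffer and connect to `chat_log.db`.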

---

Next, let’s load up **Datasette**, explore our logged chats, and **export them for further analysis!** 🧐📊  

## 🎯 Recap: What We Learned  

In this notebook, we focused on building the **first version of an LLM-powered chat app** that runs **entirely locally**. Here’s what we covered:  

### 🏗️ **Building an MVP AI System**  
- Used **Gemma 3** + **Ollama** to create a **basic local chat app**.  
- Built an interactive **Gradio** UI to test our system.  

### 🔍 **Logging & Observability**  
- Stored model interactions in an **SQLite** database for **tracing and debugging**.  
- Used **Datasette** to **browse logged conversations and export data**.  

### 📤 **Exporting for Further Exploration**  
- Learned how to **export chat logs to CSV** for potential later analysis.  
- Discussed how **structured logs** help track model responses over time.  

### 🚀 **Why This Matters**  
- **AI systems are more than just models—they need observability and traceability.**  
- **Logging interactions** makes debugging, iteration, and improvement possible.  
- This lays the **foundation** for deeper **evaluation techniques** in upcoming sections.  

In the next notebook, we'll take things further by **building a basic agent**—using function calling to let the model choose when and how to call external tools.
We'll also start exploring new patterns like **multi-step reasoning** and **tool use** with local models.