# Introduction to smolagents

Author: Onuralp Sezer

*   GitHub: [github.com/onuralpszr](https://github.com/onuralpszr/)
*   X: [@onuralpszr](https://x.com/onuralpszr)

Description: This notebook introduces `smolagents`, a lightweight library from Hugging Face for building AI agents that think in code. We will set up `smolagents` and run it against local models such as Gemma3 (served by Ollama), which handle the agent's language processing.

<table align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/onuralpszr//blob/main/oSC2025-talks-workshops/quick-smollagent.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

## Setup

### Select the Colab runtime
To complete this tutorial, you'll need to have a Colab runtime with sufficient resources to run the Gemma model. In this case, you can use a T4 GPU:

1. In the upper-right of the Colab window, select **▾ (Additional connection options)**.
2. Select **Change runtime type**.
3. Under **Hardware accelerator**, select **T4 GPU**.

## Installation

Install Ollama through the official installation script.

In [None]:
import os

# Colab GPU runtimes need pciutils so the installer can detect the GPU.
if "COLAB_GPU" in os.environ:
    !sudo apt-get install -y pciutils

# The install script itself is the same on every platform.
!curl -fsSL https://ollama.com/install.sh | sh

## Installation Method 2

Install Ollama through the official openSUSE repository.

In [None]:
!zypper install -y ollama

Install the official Ollama Python client library.

In [None]:
!pip install -q ollama

#### Installation via UV

In [None]:
!uv pip install -q ollama

## Start Ollama

Start the Ollama server in the background using `nohup`, and send both its output and its error messages to `ollama.log`.

In [None]:
!nohup ollama serve > ollama.log 2>&1 &
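
Because `ollama serve` starts in the background, the next cell can run before the server accepts connections. Here is a minimal sketch of a readiness check, assuming the server listens on Ollama's default address `http://localhost:11434`. `wait_for_ollama` is a hypothetical helper written for this notebook, not part of the Ollama client.

```python
import time
import urllib.request


def wait_for_ollama(base_url="http://localhost:11434", timeout=30.0, interval=1.0):
    """Poll base_url until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, refused connections and socket timeouts are all
            # OSError subclasses: the server is not reachable yet.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_ollama()` before pulling a model avoids a connection error on slow runtimes. If it returns `False`, `ollama.log` is the first place to look.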

## Prerequisites

*   Ollama should be installed and running. (This was already completed in previous steps.)
*   Pull the gemma3 model to use with the library: `ollama pull gemma3:4b`
    *  See [Ollama.com](https://ollama.com/) for more information on the models available.

In [1]:
import ollama

In [None]:
ollama.pull("gemma3:4b")

In [None]:
res = ollama.chat(
    model="gemma3:4b",
    messages=[
        {
            "role": "user",
            "content": "Hello world! Can you tell me a joke?",
        }
    ],
)

print(res["message"]["content"])

Why don't scientists trust atoms? 

... Because they make up everything! 😄 

---

Would you like to hear another one?
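
The `ollama.chat` call above talks to the local server over HTTP. As a rough sketch of the equivalent request, assuming Ollama's REST endpoint `POST /api/chat` on the default port, the code below builds the request and parses a reply. `build_chat_request` and `extract_reply` are hypothetical helper names used only for illustration.

```python
import json
import urllib.request


def build_chat_request(model, prompt, base_url="http://localhost:11434"):
    """Build (but do not send) a non-streaming chat request for Ollama."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from a chat response body (bytes or str)."""
    return json.loads(response_body)["message"]["content"]
```

With the server running, passing the request to `urllib.request.urlopen` and the response body to `extract_reply` should return the same kind of text as `res["message"]["content"]`.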


### Let's Create our First Agent-01 

Agents that think in code!

In [None]:
# Install the smolagents library via pip
!pip install -q "smolagents[litellm,mcp,toolkit,torch]"

## UV Installation Method - 2

In [None]:
# Install the smolagents library via uv
!uv pip install -q "smolagents[litellm,mcp,toolkit,torch]"

This code block demonstrates the creation and execution of a basic `CodeAgent` from the `smolagents` library.

1.  **Import necessary classes 📦:**
    *   `from smolagents import CodeAgent, WebSearchTool, LiteLLMModel`:
        *   `CodeAgent`: This is the core class for creating an agent that "thinks" by writing and executing code 🧠.
        *   `WebSearchTool`: This class provides the agent with the capability to search the web 🌐.
        *   `LiteLLMModel`: This class acts as a wrapper to use various language models (LLMs) with `smolagents`, in this case, one served by Ollama 🦙.

2.  **Configure the Language Model ⚙️:**
    *   `model = LiteLLMModel(...)`: An instance of `LiteLLMModel` is created to specify the LLM the agent will use.
        *   `model_id="ollama_chat/PetrosStav/gemma3-tools:12b"`: The `ollama_chat/` prefix tells `LiteLLMModel` to send requests to Ollama's chat API; the rest names the model, here `PetrosStav/gemma3-tools:12b`. The model must be available locally, so pull it first with `ollama pull PetrosStav/gemma3-tools:12b`.
        *   `api_base="http://localhost:11434"`: Specifies the address of your local Ollama server 🏠.
        *   `api_key="openSUSE-is-awesome"`: This is a placeholder for an API key 🔑. A local Ollama server typically does not check it, so any string works, but `LiteLLMModel` might expect a value.
        *   `num_ctx=8192`: Sets the context window size for the model 📏. As the comment in the code cell notes, the default Ollama context size (often 2048) can be too small for agent tasks; 8192 tokens works for easy tasks, and more is better. The comment also links to a VRAM calculator for estimating how much GPU memory the selected model needs 💻.

3.  **Create the Agent 🤖:**
    *   `agent = CodeAgent(tools=[WebSearchTool()], model=model, stream_outputs=True)`: An instance of `CodeAgent` is created.
        *   `tools=[WebSearchTool()]`: This is crucial. It equips the agent with a `WebSearchTool`, allowing it to perform web searches when it needs information to answer a prompt. This is your first basic agent, with one tool attached 🛠️.
        *   `model=model`: The previously configured `LiteLLMModel` (using `PetrosStav/gemma3-tools:12b`) is assigned to the agent.
        *   `stream_outputs=True`: This prints the agent's thoughts and actions to the console in real time as it processes the request 🗣️, which is very helpful for debugging and understanding the agent's reasoning process.

4.  **Run the Agent ▶️:**
    *   `agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")`: This line executes the agent with the given prompt. The agent will:
        *   Understand the question 🤔.
        *   Potentially use the `WebSearchTool` to find the length of Pont des Arts and the top speed of a leopard 🐆🌉.
        *   Write Python code to calculate the time 🐍.
        *   Execute that code to get the answer 🧮.
        *   Provide the final answer ✅.

In essence, this code sets up an AI agent that uses a local language model (`PetrosStav/gemma3-tools:12b` via Ollama) and can search the web. It then gives the agent a question that requires both external information (length of the bridge, speed of a leopard) and a calculation, showcasing the "thinking in code" paradigm of `smolagents`.
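
Before running the real agent, the loop described above can be sketched without any library. This toy version is an illustration under stated assumptions, not how `smolagents` is implemented: `scripted_model` is a hypothetical stand-in for the LLM that returns hard-coded code snippets, and `fake_search` is a hypothetical stand-in for `WebSearchTool` that returns canned facts (the figures match the sample run in this notebook).

```python
import contextlib
import io


class FinalAnswer(Exception):
    """Raised by final_answer() to stop the loop and carry the result."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def final_answer(value):
    raise FinalAnswer(value)


def fake_search(query):
    """Hypothetical stand-in for a web search tool: returns canned facts."""
    facts = {
        "pont des arts length": "155 meters",
        "leopard top speed": "58 km/h",
    }
    return facts.get(query.lower(), "no result")


def scripted_model(step, observations):
    """Hypothetical stand-in for the LLM: returns one code snippet per step."""
    script = [
        # Step 0: gather facts with the tool; printed text becomes an observation.
        "print(search('Pont des Arts length'))\nprint(search('leopard top speed'))",
        # Step 1: compute with the observed values and finish.
        "final_answer(round(155 / (58 / 3.6), 2))",
    ]
    return script[step]


def run_toy_agent(max_steps=5):
    """Ask the 'model' for code, execute it, and stop on final_answer()."""
    tools = {"search": fake_search, "final_answer": final_answer}
    observations = []
    for step in range(max_steps):
        code = scripted_model(step, observations)
        captured = io.StringIO()
        try:
            # Run the snippet with only the tools and two builtins in scope.
            namespace = {"__builtins__": {"print": print, "round": round}, **tools}
            with contextlib.redirect_stdout(captured):
                exec(code, namespace)
        except FinalAnswer as done:
            return done.value, observations
        observations.append(captured.getvalue())
    return None, observations
```

`run_toy_agent()` returns the final answer together with the captured observations. Passing a stripped-down `__builtins__` keeps the snippet's namespace small, but it is not a security sandbox.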

In [None]:
from smolagents import CodeAgent, LiteLLMModel, WebSearchTool

model = LiteLLMModel(
    model_id="ollama_chat/PetrosStav/gemma3-tools:12b",
    api_base="http://localhost:11434",
    api_key="openSUSE-is-awesome",
    num_ctx=8192,
)

# ollama default is 2048 which will often fail horribly.
# 8192 works for easy tasks, more is better.
# Check https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator to
# calculate how much VRAM this will need for the selected model.

agent = CodeAgent(tools=[WebSearchTool()], model=model, stream_outputs=True)

agent.run(
    "How many seconds would it take for a leopard at full speed to run through Pont des Arts?"
)

Excerpt from the end of the agent's printed output:

I think the previous error was because the code wasn't properly formatted. Let me make sure to use the correct syntax for 
the final answer. The code should be in a Python block with the final_answer function.                                    


So the final answer is 9.62 seconds. I'll present that properly now.                                                      

Thought: I have the necessary values to calculate the time. The bridge is 155 meters long, and the leopard's speed is 58  
km/h (16.11 m/s). Time = 155 / 16.11 ≈ 9.62 seconds. Code:                                                                

                                                                                                                          
 final_answer(9.62)                                                                                                       
                                                                                                                          
 ─ Executing parsed code: ─────────────────────────────────────────────────────────────────────────────────────────────── 
  final_answer(9.62)                                                                                                      
 ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 
Out - Final answer: 9.62
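
The agent's final step is plain unit conversion and division. A short sketch reproducing it with the values from the output above (155 m and 58 km/h) follows. `crossing_time_seconds` is a hypothetical helper name.

```python
def crossing_time_seconds(length_m, speed_kmh):
    """Time in seconds to cover length_m metres at speed_kmh km/h."""
    speed_ms = speed_kmh * 1000 / 3600  # km/h -> m/s
    return length_m / speed_ms


seconds = crossing_time_seconds(155, 58)
print(f"{seconds:.2f} s")  # 9.62 s
```

The result depends entirely on the figures the web search returns, so a different source for the leopard's top speed would change the answer.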

## 🤖 Model Context Protocol (MCP) Explained

### 😟 The Problem: A Fragmented AI Ecosystem

Before the Model Context Protocol (MCP), connecting large language models (LLMs) to external data and tools was a significant challenge. Developers had to create custom, one-off integrations for each new data source or API. This created a complex "N x M" integration problem, where every AI application (N) needed a separate, bespoke connection to every external service (M). This fragmentation made it difficult to build scalable and robust AI systems.

### ✨ The Solution: A Universal Standard

Introduced by Anthropic in November 2024, the Model Context Protocol (MCP) is an open standard designed to create a universal interface between AI models and external systems. Think of it as the USB-C for AI: a single, standardized connector that replaces the tangled mess of custom integrations with one reliable protocol. Major AI companies like Google and OpenAI have since embraced MCP.

### 🏗️ How It Works: Client-Server Architecture

MCP operates on a client-server model to standardize communication.

*   **MCP Hosts:** These are the AI applications that users interact with, like AI-powered IDEs (e.g., Cursor) or desktop applications (e.g., Claude Desktop). 
*   **MCP Clients:** These are components within the host application that manage the connection to MCP servers. They are responsible for sending requests from the AI model to the server and relaying information back.               
*   **MCP Servers:** These are lightweight programs that expose the capabilities of a specific data source or tool (like a database, a local file system, or an API) to the AI model in a standardized way.

This architecture allows a single AI application (host) to connect to many different tools and data sources through a standardized protocol, just as a single USB-C port connects to various peripherals.  

### 🧩 Key Capabilities of MCP Servers

MCP servers can provide three main types of capabilities to AI models:

*   **📚 Resources:** They can offer contextual data that either the user or the AI model can access.
*   **📝 Prompts:** They can provide pre-defined message templates and structured workflows to guide user interactions.
*   **🛠️ Tools:** They expose functions that the AI model can directly call to perform specific actions, like sending an email or creating a task in a project management tool.
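
MCP messages are JSON-RPC 2.0 requests and responses. The sketch below shows how a server might answer the `tools/list` and `tools/call` methods. It is a simplified, in-process illustration that skips the transport layer and the initialization handshake, and `create_task` is a hypothetical tool invented for the example.

```python
import json

# Hypothetical tool registry: name -> (description, input schema, handler).
TOOLS = {
    "create_task": (
        "Create a task in a project tracker.",
        {
            "type": "object",
            "properties": {"title": {"type": "string"}},
            "required": ["title"],
        },
        lambda arguments: f"Created task: {arguments['title']}",
    ),
}


def _error(request_id, code, message):
    """Build a JSON-RPC 2.0 error response."""
    error = {"code": code, "message": message}
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "error": error})


def handle_request(raw_request):
    """Dispatch one JSON-RPC 2.0 request string and return the response string."""
    request = json.loads(raw_request)
    request_id = request.get("id")
    method = request.get("method")
    params = request.get("params", {})

    if method == "tools/list":
        result = {
            "tools": [
                {"name": name, "description": desc, "inputSchema": schema}
                for name, (desc, schema, _handler) in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        name = params.get("name")
        if name not in TOOLS:
            return _error(request_id, -32602, f"Unknown tool: {name}")  # invalid params
        _desc, _schema, handler = TOOLS[name]
        text = handler(params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return _error(request_id, -32601, f"Method not found: {method}")

    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})
```

A client would first send `tools/list` to discover what the server offers, then send `tools/call` with the tool's `name` and `arguments` when the model decides to use it.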

### 🚀 Benefits of Adopting MCP

The standardization offered by MCP provides several key advantages:

*   **🔌 Simplified Integration:** It dramatically reduces the complexity of connecting AI to external systems, moving from an N x M problem to a more manageable N + M model.
*   **🛠️ Composable Workflows:** Developers can easily combine various tools and data sources to create powerful and complex AI-driven workflows.
*   **🌍 An Open Ecosystem:** As an open-source standard, MCP fosters a growing community of developers creating and sharing MCP servers for a wide range of applications, from GitHub and Slack to databases like PostgreSQL.
*   **🔒 Enhanced Security:** MCP is designed with security in mind, giving users controlled access to their data and tools. 
*   **🔄 Interoperability:** It gives developers the freedom to switch between different LLMs without overhauling their entire data integration infrastructure.
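
The integration-count argument is simple arithmetic. As an illustration with made-up numbers (5 AI applications and 8 external services are assumptions chosen for the example):

```python
def integrations_without_standard(n_apps, m_services):
    """Every application needs its own connector to every service."""
    return n_apps * m_services


def integrations_with_standard(n_apps, m_services):
    """Each application implements one client, each service one server."""
    return n_apps + m_services


n_apps, m_services = 5, 8  # hypothetical counts
print(integrations_without_standard(n_apps, m_services))  # 40
print(integrations_with_standard(n_apps, m_services))     # 13
```

The gap widens as either number grows, which is why a shared protocol matters more as the ecosystem expands.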

In short, the Model Context Protocol is a foundational technology that enables AI to break out of its digital silo and securely interact with the vast world of external data and applications.

## Conclusion 🏆

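Congratulations! You have run inference on a Gemma3 model with the Ollama Python library, built your first `CodeAgent` with `smolagents` on top of a local Ollama model, and seen how the Model Context Protocol standardizes access to external tools. You can now integrate these pieces into your Python projects.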