# Fully offline Agent!

[![Agent with Local LLM](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mozilla-ai/any-agent/blob/main/docs/cookbook/agent_with_local_llm.ipynb) 

This tutorial shows how to run an agent fully locally and offline, that is, with a local LLM and a local MCP server, so no data leaves your machine!

This can be especially useful for privacy-sensitive applications or when you want to avoid any cloud dependencies.

In this example, we will show how to let an agent read and write to your local filesystem! Specifically, we will give the agent read access to our codebase so that it can generate a README file describing the project.

## Install Dependencies

any-agent uses the python asyncio module to support async functionality. When running in Jupyter notebooks, this means we need to enable the use of nested event loops. We'll install any-agent and enable this below using nest_asyncio.

In [None]:
%pip install 'any-agent[smolagents]' --quiet

import nest_asyncio

nest_asyncio.apply()
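
To see why nested event loops matter, the sketch below reproduces the underlying problem using only the standard library: calling `asyncio.run` from code that is already running inside an event loop raises a `RuntimeError`. It is meant to be run as a plain Python script, before `nest_asyncio.apply()` has patched asyncio. The function names are illustrative and not part of any-agent.

```python
import asyncio


async def inner() -> int:
    return 42


async def outer() -> str:
    # We are already inside a running event loop here, much like a notebook cell.
    coro = inner()
    try:
        asyncio.run(coro)  # tries to start a second loop inside the running one
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

Jupyter already runs an event loop for you, so the same situation arises there; `nest_asyncio.apply()` patches asyncio so that the nested call is allowed.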

## Set up your own LLM locally

Regardless of which agent framework you choose in any-agent, all of them support LiteLLM, a proxy that lets us use any LLM inside the framework, hosted by any provider. For example, we could use a local model via llama.cpp or [llamafile](https://github.com/Mozilla-Ocho/llamafile), a Google-hosted Gemini model, or an AWS Bedrock-hosted Llama model. For this example, we will use [Ollama](https://ollama.com/) to run our LLM locally!



### Ollama setup

First, install Ollama by following the instructions at https://ollama.com/download

### Serve your LLM locally

Pick a model that your hardware can run locally and run it in your terminal. For example:

8-16GB RAM -> `ollama run granite3.3`  or  `ollama run deepseek-r1:8b`

16-32GB RAM -> `ollama run mistral-small3.2` or `ollama run devstral:24b`

32+GB RAM -> look into a version of deepseek-r1 or qwen3 or qwq

In this tutorial, we will run `granite3.3` in a terminal, alongside our notebook.
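
The guidance above can be written down as a small lookup. This is only a sketch: `suggest_models` is a hypothetical helper, and the exact boundaries (treating 16 GB and 32 GB as the start of the next tier) are an assumption, since the ranges in the list overlap at their edges.

```python
def suggest_models(ram_gb: float) -> list[str]:
    """Return Ollama model names (or families) suggested above for a given amount of RAM."""
    if ram_gb < 8:
        return []  # the list above gives no suggestion below 8 GB
    if ram_gb < 16:
        return ["granite3.3", "deepseek-r1:8b"]
    if ram_gb < 32:
        return ["mistral-small3.2", "devstral:24b"]
    # For 32 GB and more the tutorial only names model families, not specific tags.
    return ["deepseek-r1", "qwen3", "qwq"]


print(suggest_models(12))
```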

### Restart Ollama with a longer context length

By default, Ollama has a context length of 8192 tokens, which is not enough for our agent to work properly.
So, you will need to first "kill" the current Ollama process, and then restart it with a longer context length of 40000 tokens.

Run the following command in your terminal:

```
killall -9 ollama; OLLAMA_CONTEXT_LENGTH=40000 OLLAMA_DEBUG=1 ollama serve
```

Update the `OLLAMA_CONTEXT_LENGTH` value based on the model you chose.
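
One rough way to sanity-check the value is to estimate how many tokens the files you want the agent to read contain. The sketch below assumes about 4 characters per token, which is a common rule of thumb and not the tokenizer of your model. `estimate_tokens` and the default list of file suffixes are hypothetical choices for illustration.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb, not the model's real tokenizer


def estimate_tokens(directory: str, suffixes: tuple[str, ...] = (".py", ".md", ".toml")) -> int:
    """Roughly estimate how many tokens the text files under `directory` contain."""
    total_chars = 0
    for path in Path(directory).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # Ignore undecodable bytes so one odd file does not abort the estimate.
            total_chars += len(path.read_text(encoding="utf-8", errors="ignore"))
    return total_chars // CHARS_PER_TOKEN


# Example usage (the path is a placeholder):
# print(estimate_tokens("path/to/your/codebase"))
```

The agent's instructions, tool descriptions, and generated output also consume context, so leave generous headroom above the estimate.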

## Configure the Agent

Now that your LLM is running in the background (as a local server), you need to pick an agent framework to build your agent. Note that the agent you'll build with any-agent can be run across multiple agent frameworks (Smolagents, TinyAgent, OpenAI, etc.) and across various LLMs (Llama, DeepSeek, Mistral, etc.). For this example, we will use the smolagents framework.
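
Before configuring the agent, it can help to confirm from Python that the local server is reachable. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`, and queries its `/api/tags` endpoint, which lists the locally available models. `list_ollama_models` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Optional


def list_ollama_models(
    base_url: str = "http://localhost:11434", timeout: float = 2.0
) -> Optional[list]:
    """Return the names of locally available models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # OSError covers connection errors and timeouts; ValueError covers invalid JSON.
        return None
    return [model["name"] for model in payload.get("models", [])]


# Example usage:
# print(list_ollama_models())
```

If this returns `None`, the server is probably not running. If the model you picked is missing from the list, it has not been pulled yet.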


### Pick which tools to use

Since we want our agent to work fully locally/offline, we will not add any tools that require communication with remote servers. We will use a local MCP server for secure file-system operations. We could also simply implement python callable functions that do these operations (e.g. using the os library), but instead we are opting here for an MCP server to showcase how easy it would be to swap or add other MCP servers to this use-case.
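
For comparison, the plain-Python alternative mentioned above could look roughly like the following. This is a sketch, not what the tutorial uses: the two functions are hypothetical stand-ins for the MCP server's `read_file` and `write_file` tools, written with type hints and docstrings so that an agent framework can describe them to the model.

```python
from pathlib import Path


def read_file(path: str) -> str:
    """Return the text content of the file at `path`."""
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    """Write `content` to the file at `path` and return a confirmation message."""
    target = Path(path)
    target.write_text(content, encoding="utf-8")
    return f"Wrote {len(content)} characters to {target}"
```

Such callables would presumably be passed in the `tools` list in place of the MCP server. The MCP route keeps the file operations behind a separate process that is easy to swap out.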

#### *Bonus*: Callbacks!

Since we are giving write access to our local filesystem, we would like to ensure that the agent doesn't go wild and generate files in wrong directories. One way to build a very basic safeguard is to make sure that the agent first gets explicit confirmation from the user to write the file in the right location. To implement that, we will use the Callbacks functionality! Read more here: https://mozilla-ai.github.io/any-agent/agents/callbacks/

In [None]:
from any_agent import AgentConfig, AnyAgent
from any_agent.tools import show_plan
from any_agent.config import MCPStdio
from any_agent.callbacks import Callback, Context
from any_agent.tracing.attributes import GenAI

# Define our filesystem-operations tool
mcp_filesystem = MCPStdio(
    command="docker",
    args=["run", "-i", "--rm", "mcp/filesystem"],
    tools=["read_file", "read_multiple_files", "write_file"],  # we only include the tools we need
)


# Define the safeguard callback
class ConfirmWrite(Callback):
    def before_tool_execution(self, context: Context, *args, **kwargs) -> Context:
        attributes = context.current_span.attributes
        if attributes.get(GenAI.TOOL_NAME) == "write_file":
            # Assumption: the arguments of the pending tool call (including the
            # target path) are recorded on the span under GenAI.TOOL_ARGS.
            tool_args = attributes.get(GenAI.TOOL_ARGS, "<unknown>")
            answer = input(f"Confirm writing a file with the following arguments:\n{tool_args}\ny/n? ")
            if answer.strip().lower() != "y":
                raise RuntimeError("Operation not permitted by user.")
        # Hand the (unchanged) context back, as the return annotation promises
        return context


# Define the agent
agent = AnyAgent.create(
    "smolagents",
    AgentConfig(
        model_id="ollama/granite3.3",
        instructions="""
        You must use the available tools to find an answer.
        """,
        tools=[show_plan, mcp_filesystem],  # show_plan, in addition to the file operations
        callbacks=[ConfirmWrite()],
        model_args={"tool_choice": "required"},
    ),
)
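
A confirmation prompt relies on the user reading the path carefully. A complementary check is to test programmatically whether a target path stays inside the directory the agent is supposed to work in. The sketch below uses only `pathlib`; `is_within` is a hypothetical helper and is not part of any-agent.

```python
from pathlib import Path


def is_within(base_dir: str, candidate: str) -> bool:
    """Return True if `candidate` resolves to `base_dir` itself or a path inside it."""
    base = Path(base_dir).resolve()
    target = Path(candidate).resolve()  # resolving collapses ".." segments and symlinks
    return target == base or base in target.parents


print(is_within("/tmp/project", "/tmp/project/README.md"))
print(is_within("/tmp/project", "/tmp/project/../other/README.md"))
```

A callback could call such a helper and raise an error for any write that falls outside the allowed directory, in addition to asking the user.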


## Run the Agent

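Now that we've configured our agent, it's time to run it! Let's give it the task described at the start: read the files in a local directory, summarize what the project is about, and write that summary to a README.md file in the same directory.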


In [None]:
from pathlib import Path

codebase_directory = input("Input your local directory: ")
abs_path = Path(codebase_directory).resolve()
print(f"Codebase directory set: {abs_path}")

In [None]:
agent_trace = agent.run(f"Read the content of each file in the following directory: {abs_path}, create a summary in markdown format that describes what this project is about, and then write it to a README.md file in the same directory.")

## View the results 

The `agent.run` method returns an AgentTrace object, which has a few convenient attributes for displaying some interesting information about the run.

In [None]:
print(agent_trace.final_output)  # Final answer
print(f"Duration: {agent_trace.duration.total_seconds():.2f} seconds")
print(f"Usage: {agent_trace.tokens.total_tokens:,}")
print(f"Cost (USD): {agent_trace.cost.total_cost:.6f}")