# Fully offline Agent!

[![Agent with Local LLM](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mozilla-ai/any-agent/blob/main/docs/cookbook/agent_with_local_llm.ipynb) 

This tutorial shows how to run an agent fully locally and offline, that is, with a local LLM and a local MCP server, so no data leaves your machine!

This can be especially useful for privacy-sensitive applications or when you want to avoid any cloud dependencies.

In this example, we will show how to let an agent read and write files in your local filesystem! Specifically, we will give the agent access to our codebase so that it can read the project files and generate a README file that describes the project.

## Install Dependencies

any-agent uses Python's `asyncio` module to support async functionality. When running in Jupyter notebooks, this means we need to enable nested event loops. Below, we install any-agent and enable nested event loops with `nest_asyncio`.

In [1]:
%pip install 'any-agent[smolagents]' --quiet

import nest_asyncio

nest_asyncio.apply()

Note: you may need to restart the kernel to use updated packages.


## Set up your own LLM locally

Regardless of which agent framework you choose in any-agent, all of them support LiteLLM, which is a proxy that allows us to use any LLM inside the framework, hosted by any provider. For example, we could use a local model via llama.cpp or [llamafile](https://github.com/Mozilla-Ocho/llamafile), a Google-hosted Gemini model, or an AWS Bedrock-hosted Llama model. For this example, we will use [Ollama](https://ollama.com/) to run our LLM locally!



### Ollama setup

First, install Ollama by following their instructions: https://ollama.com/download

### Serve your LLM locally

Pick a model that your hardware can run locally, and run it in your terminal. For example:

8-24GB RAM -> `ollama run granite3.3`  or  `ollama run deepseek-r1:8b`

24+GB RAM -> `ollama run mistral-small3.2` or `ollama run devstral:24b`

In this tutorial, we will run `granite3.3` in a terminal, in parallel to our notebook.
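Before moving on, it can help to confirm that the model you picked is actually available to the local Ollama server. The sketch below queries Ollama's `/api/tags` endpoint, which lists the locally pulled models. It assumes the server listens on its default address, `http://localhost:11434`. The helper names `fetch_local_models` and `is_model_available` are hypothetical and not part of any-agent or Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def fetch_local_models(url=OLLAMA_TAGS_URL, timeout=5):
    """Return the names of the models the local Ollama server has pulled."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return [model["name"] for model in payload.get("models", [])]


def _with_default_tag(name):
    """Ollama treats an untagged model name as '<name>:latest'."""
    return name if ":" in name else f"{name}:latest"


def is_model_available(model_name, local_models):
    """Check whether `model_name` is among the locally pulled models."""
    wanted = _with_default_tag(model_name)
    return any(_with_default_tag(name) == wanted for name in local_models)
```

With the server running, `is_model_available("granite3.3", fetch_local_models())` should return `True` once the model has been pulled.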

### Restart Ollama with a longer context length

By default, Ollama has a context length of 8192 tokens, which is not enough for our agent to work properly.
So, you will need to stop the current Ollama process first, and then restart it with a longer context length, chosen according to your model and your hardware specifications. In this case, we will set it to 40,000 tokens.
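To get a rough sense of how large a context window the agent may need, one option is to estimate how many tokens the files in your codebase contain. The sketch below is only an illustration and not part of any-agent or Ollama: the `estimate_tokens` helper, the suffix list, and the four-characters-per-token ratio are all assumptions, and a real tokenizer will give different counts.

```python
from pathlib import Path

# Assumption: roughly 4 characters per token. This is a rule of thumb,
# not the tokenizer your model actually uses.
CHARS_PER_TOKEN = 4


def estimate_tokens(directory, suffixes=(".py", ".md", ".txt", ".toml")):
    """Roughly estimate the token count of the text files under `directory`."""
    total_chars = 0
    for path in Path(directory).rglob("*"):
        # Only count files whose extension suggests readable text.
        if path.is_file() and path.suffix in suffixes:
            total_chars += len(path.read_text(encoding="utf-8", errors="ignore"))
    return total_chars // CHARS_PER_TOKEN
```

If the estimate is close to or above the context length you plan to set, consider pointing the agent at a smaller directory or raising `OLLAMA_CONTEXT_LENGTH`, hardware permitting.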

#### On Mac

Run the following command on your terminal: 

```
killall -9 ollama; OLLAMA_CONTEXT_LENGTH=40000 OLLAMA_DEBUG=1 ollama serve
```

#### On Linux 

Stop the Ollama service and edit its configuration file:
```
sudo systemctl stop ollama
sudo systemctl edit ollama.service
```

Add the following lines in that file:
```
[Service]
Environment="OLLAMA_CONTEXT_LENGTH=40000"
Environment="OLLAMA_DEBUG=1"
```

Restart the service:
```
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

For more information on setting environment variables in Ollama, please refer to their [documentation](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-configure-ollama-server).

## Configure the Agent and the Tools

Instead of giving the agent read-write access to our whole filesystem, we will limit its scope to a single path that it's allowed to work in, which we will later pass as an argument to the filesystem tool.

In [None]:
from pathlib import Path

codebase_directory = input("Input your local directory: ")
abs_path = str(Path(codebase_directory).resolve())
print(f"Codebase directory set: {abs_path}")
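Since this path will be mounted into a container and exposed to the agent, it may be worth validating it before going further. The following is a minimal sketch; `resolve_codebase_dir` is a hypothetical helper, not an any-agent function.

```python
from pathlib import Path


def resolve_codebase_dir(raw_path):
    """Resolve `raw_path` to an absolute path and fail early if it is not a directory."""
    path = Path(raw_path).expanduser().resolve()
    if not path.is_dir():
        raise NotADirectoryError(f"Not a directory: {path}")
    return str(path)
```

Calling `resolve_codebase_dir(codebase_directory)` in place of the plain `resolve()` call would surface a typo in the path immediately, rather than later when the filesystem tool starts.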

### Pick which tools to use

Since we want our agent to work fully locally/offline, we will not add any tools that require communication with remote servers. In this case, we will use a local MCP server for secure filesystem operations. We could also simply implement Python callable functions that perform these operations (e.g. using the `os` library), but we opt for an MCP server here to showcase how easy it would be to swap or add other MCP servers for this use case.

In [None]:
from any_agent.config import MCPStdio

docker_destination = "/projects"

mcp_filesystem = MCPStdio(
    command="docker",
    args=[
        "run", "-i", "--rm",
        "--mount", f"type=bind,src={abs_path},dst={docker_destination}",
        "mcp/filesystem", docker_destination,
    ],
    # We only include the tools we need
    tools=["read_file", "read_multiple_files", "write_file", "list_allowed_directories", "list_directory"],
)
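For comparison, the plain Python callable alternative mentioned above could look roughly like the sketch below. It is an illustration under assumptions, not the MCP server's implementation: `make_file_tools` and the functions it returns are hypothetical names, and the path check is a simple guard that keeps every operation inside one allowed directory.

```python
from pathlib import Path


def make_file_tools(allowed_dir):
    """Build file tools that can only touch paths inside `allowed_dir`."""
    root = Path(allowed_dir).resolve()

    def _safe_path(relative_path: str) -> Path:
        # Resolve the path and reject anything that escapes the allowed root.
        candidate = (root / relative_path).resolve()
        if candidate != root and root not in candidate.parents:
            raise PermissionError(f"{relative_path} is outside the allowed directory")
        return candidate

    def read_file(relative_path: str) -> str:
        """Read a UTF-8 text file inside the allowed directory.

        Args:
            relative_path: Path of the file, relative to the allowed directory.
        """
        return _safe_path(relative_path).read_text(encoding="utf-8")

    def write_file(relative_path: str, content: str) -> str:
        """Write text to a file inside the allowed directory.

        Args:
            relative_path: Path of the file, relative to the allowed directory.
            content: Text to write.
        """
        _safe_path(relative_path).write_text(content, encoding="utf-8")
        return f"Wrote {len(content)} characters to {relative_path}"

    def list_directory(relative_path: str = ".") -> list[str]:
        """List the entries of a directory inside the allowed directory.

        Args:
            relative_path: Path of the directory, relative to the allowed directory.
        """
        return sorted(entry.name for entry in _safe_path(relative_path).iterdir())

    return [read_file, write_file, list_directory]
```

The returned functions could then be listed in the agent's `tools` in place of the MCP server, at the cost of losing the container isolation that the Docker-based server provides.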

Now that your LLM is running in the background (local server) and you have defined your tools, you need to pick an agent framework to build your agent. Note that the agent you'll build with any-agent can run across multiple agent frameworks (smolagents, TinyAgent, OpenAI, etc.) and across various LLMs (Llama, DeepSeek, Mistral, etc.). For this example, we will use the smolagents framework.

In [None]:
from any_agent import AgentConfig, AnyAgent
from any_agent.tools import show_plan

# Define the agent
agent = AnyAgent.create(
    "smolagents",
    AgentConfig(
        model_id="ollama/granite3.3",
        instructions="""
        You must use the available tools to find an answer.
        """,
        tools=[mcp_filesystem, show_plan],
        model_args={"tool_choice": "required"}
    ),
)

## Run the Agent


In [26]:
agent_trace = agent.run("Read the content of each file in the allowed directory, that might contain documentation or code, create a summary in markdown format that summarizes what this project is about and then write it in a README.md file in the same directory.")

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

Provider List: https://docs.litellm.ai/docs/providers

AgentRunError: Error while generating output:
litellm.APIConnectionError: OllamaException - {"error":"model 'granite3.3' not found"}

This output shows a failed run: Ollama could not find `granite3.3`. If you see the same error, make sure the model has been downloaded and is being served (for example, with `ollama run granite3.3`) before re-running the cell.

## View the results 

The `agent.run` method returns an AgentTrace object, which has a few convenient attributes for displaying some interesting information about the run.

In [None]:
print(agent_trace.final_output)  # Final answer
print(f"Duration: {agent_trace.duration.total_seconds():.2f} seconds")
print(f"Usage: {agent_trace.tokens.total_tokens:,}")
print(f"Cost (USD): {agent_trace.cost.total_cost:.6f}")
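Beyond the trace, you may also want to check the file the agent was asked to write. The sketch below previews the first lines of `README.md` in the codebase directory; `preview_readme` is a hypothetical helper, and it assumes the agent wrote the file at the top level of that directory.

```python
from pathlib import Path


def preview_readme(directory, max_lines=10):
    """Return the first `max_lines` lines of README.md, or None if it is missing."""
    readme = Path(directory) / "README.md"
    if not readme.is_file():
        return None
    lines = readme.read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[:max_lines])
```

For example, `print(preview_readme(abs_path))` prints the start of the generated README, or `None` if the agent did not create the file.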