# Fully offline Agent!

[![Agent with Local LLM](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mozilla-ai/any-agent/blob/main/docs/cookbook/agent_with_local_llm.ipynb) 

This example shows how to run an agent fully locally and offline, using a local LLM and local MCP servers, so no data leaves your machine!
This is useful for privacy-sensitive applications and for running agents in environments with limited internet connectivity.

## Install Dependencies

any-agent uses Python's `asyncio` module for its async functionality. Jupyter notebooks already run their own event loop, so we need to enable nested event loops. Below, we install any-agent and then enable nesting with `nest_asyncio`.

%pip install 'any-agent'

import nest_asyncio

nest_asyncio.apply()
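
To see why nested event loops matter, here is a minimal standard-library sketch that reproduces the error raised when `asyncio.run` is called from inside an already running loop, which is the situation in a Jupyter kernel. The function names `inner`, `outer`, and `demo` are illustrative only. Run it as a plain Python script, not in a notebook where `nest_asyncio.apply()` has already been called.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # At this point an event loop is already running, as in a Jupyter cell.
    coro = inner()
    try:
        asyncio.run(coro)  # nested call: rejected by plain asyncio
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


def demo() -> str:
    # Start the outer loop, then attempt the nested asyncio.run inside it.
    return asyncio.run(outer())


if __name__ == "__main__":
    print(demo())
```

`nest_asyncio.apply()` patches the event loop so that this kind of nested call is allowed.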

## Set up your own LLM locally

### Pick a local LLM framework: Ollama


In [None]:
!curl -fsSL https://ollama.com/install.sh | sh

#### Restart Ollama with a longer context length

By default, Ollama has a context length of 8192 tokens, which is not enough for our agent to work properly.
So we first kill the running Ollama process, then restart it with a longer context length of 40000 tokens.

In [None]:
!killall -9 ollama; OLLAMA_CONTEXT_LENGTH=40000 OLLAMA_DEBUG=1 ollama serve
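
Once the server is running, it can help to confirm that it is reachable before configuring the agent. The sketch below assumes Ollama's default address (`http://localhost:11434`) and its `/api/tags` endpoint, which lists the locally available models. The helper names `model_names` and `list_local_models` are hypothetical and not part of any-agent or Ollama.

```python
import json
import urllib.request
from typing import List, Optional

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default address


def model_names(payload: dict) -> List[str]:
    """Extract model names from the JSON body of an /api/tags response."""
    return [model["name"] for model in payload.get("models", [])]


def list_local_models(
    base_url: str = OLLAMA_URL, timeout: float = 2.0
) -> Optional[List[str]]:
    """Return the models the server knows about, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return model_names(json.load(resp))
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts;
        # ValueError covers a response body that is not valid JSON.
        return None


if __name__ == "__main__":
    models = list_local_models()
    if models is None:
        print("Ollama server is not reachable")
    else:
        print("Local models:", models)
```

If the model used in the agent configuration (`devstral:24b`) is not in the list, it presumably still needs to be downloaded, for example with `ollama pull devstral:24b`.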

## Configure the Agent

Now it's time to configure the agent! At this stage you have a few choices:

### Pick the framework

We support a variety of underlying agent frameworks (OpenAI, Smolagents, LangChain, TinyAgent, etc.), each with its own agentic AI implementation. For this tutorial's simple use case any of them should work fine, and any-agent makes it easy to switch to a different framework later. For this example, we will use the [TinyAgent](frameworks/tinyagent.md) framework.

### Pick an LLM

Regardless of which agent framework you choose, each one supports LiteLLM, a proxy that lets us use any LLM inside the framework, whichever provider hosts it. For example, we could use a local model via llama.cpp or llamafile, a Google-hosted Gemini model, or an AWS Bedrock-hosted Llama model. For this example, we will use [Ollama](https://ollama.com/)!
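
LiteLLM model identifiers generally take the form `provider/model`, which is why the model used later in this example is written as `ollama/devstral:24b`. The sketch below only illustrates that naming convention. `split_model_id` is a hypothetical helper, not LiteLLM's own parsing code.

```python
from typing import Tuple


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split a 'provider/model' identifier at the first slash."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return provider, model


if __name__ == "__main__":
    provider, model = split_model_id("ollama/devstral:24b")
    print(provider)  # the provider prefix
    print(model)     # the model name, including its tag
```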

### Pick which tools to use

In this tutorial, we give the agent a single tool, `show_plan` from `any_agent.tools`, rather than web search or web page visits, so the agent stays offline. Later examples show more advanced tool usage, such as Model Context Protocol (MCP) tools or inter-agent communication using Agent-2-Agent (A2A).
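
Tools can also be local Python functions. The sketch below assumes that any-agent accepts a plain callable with type hints and a docstring in the `tools` list. `count_words` is a hypothetical example and not part of any-agent.

```python
def count_words(text: str) -> int:
    """Count the whitespace-separated words in a piece of text.

    Args:
        text: The text to analyze.
    """
    return len(text.split())


# Assumption: a callable like this could be passed alongside the built-in
# tools, e.g. tools=[show_plan, count_words] in the AgentConfig.

if __name__ == "__main__":
    print(count_words("fully offline agent"))
```

Because such a function runs entirely in the local Python process, it fits the offline setup of this example.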

In [None]:
from any_agent import AgentConfig, AnyAgent
from any_agent.tools import show_plan

agent = AnyAgent.create(
    "tinyagent",  # See all options in https://mozilla-ai.github.io/any-agent/
    AgentConfig(
        model_id="ollama/devstral:24b",
        instructions="""
           You must use the available tools to find an answer.
        """,
        tools=[show_plan],
        model_args={"tool_choice": "required"}
    ),
)

## Run the Agent

Now that the agent is configured, it's time to run it! Let's give it a simple task: find 5 trending new TV shows that were released recently.


In [None]:
agent_trace = agent.run("<><><>")

## View the results 

The `agent.run` method returns an `AgentTrace` object, which exposes the final output along with run statistics such as duration, token usage, and cost.

In [None]:
print(agent_trace.final_output)  # Final answer
print(f"Duration: {agent_trace.duration.total_seconds():.2f} seconds")
print(f"Usage: {agent_trace.tokens.total_tokens:,}")
print(f"Cost (USD): {agent_trace.cost.total_cost:.6f}")