### smolagents

#### Key Advantages of smolagents

- **Simplicity**: Minimal code complexity and abstractions, to make the framework easy to understand, adopt and extend
- **Flexible LLM Support**: Works with any LLM through integration with Hugging Face tools and external APIs
- **Code-First Approach**: First-class support for Code Agents that write their actions directly in code, removing the need for parsing and simplifying tool calling
- **HF Hub Integration**: Seamless integration with the Hugging Face Hub, allowing the use of Gradio Spaces as tools

#### When to use smolagents?

With these advantages in mind, when should we use smolagents over other frameworks? smolagents is ideal when:

- You need a lightweight and minimal solution.
- You want to experiment quickly without complex configurations.
- Your application logic is straightforward.

#### Agent Types in smolagents

Agents in smolagents operate as **multi-step agents**.

At each step, a [MultiStepAgent](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.MultiStepAgent) performs:

- One thought
- One tool call and execution

In addition to using [CodeAgent](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.CodeAgent) as the primary type of agent, smolagents also supports [ToolCallingAgent](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.ToolCallingAgent), which writes tool calls in JSON.
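To make the step structure concrete, here is a minimal, framework-free sketch of a multi-step loop in which every step consists of one thought followed by one tool call and its execution. It is not smolagents code: `TOOLS`, `scripted_policy`, and `run_agent` are hypothetical names, and the "LLM" is replaced by a hard-coded policy so the example runs without a model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools: plain Python functions keyed by name.
TOOLS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def scripted_policy(memory: List[str]) -> Tuple[str, str, Tuple[int, int]]:
    """Stand-in for the LLM: returns (thought, tool_name, args) for the next step."""
    if not memory:
        return ("First add 2 and 3.", "add", (2, 3))
    # Read the previous result back from the last memory entry.
    last_result = int(memory[-1].split("=")[-1])
    return ("Now multiply the sum by 4.", "multiply", (last_result, 4))


def run_agent(max_steps: int = 2) -> List[str]:
    """Run a fixed number of steps, recording each observation in memory."""
    memory: List[str] = []
    for step in range(max_steps):
        thought, tool_name, args = scripted_policy(memory)  # one thought
        result = TOOLS[tool_name](*args)  # one tool call and execution
        memory.append(f"step {step + 1}: {tool_name}{args} = {result}")
    return memory


log = run_agent()
for entry in log:
    print(entry)
```

In a real agent the policy would be an LLM call, and the action would be written either as Python code (as in `CodeAgent`) or as a JSON tool call (as in `ToolCallingAgent`).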

### Model Integration in smolagents

`smolagents` supports flexible LLM integration. The framework provides several predefined classes that simplify model connections, each covered in a subsection below. You can also use any callable model that meets [certain criteria](https://huggingface.co/docs/smolagents/main/en/reference/models).
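As a rough illustration of what a "callable model" can look like, the sketch below assumes the criteria amount to: accept a list of chat messages plus an optional `stop_sequences` argument, and return an object with a `.content` attribute. The names `SimpleMessage` and `echo_model` are hypothetical and not part of smolagents; check the linked reference for the exact requirements.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SimpleMessage:
    """Hypothetical response object exposing `.role` and `.content`."""
    role: str
    content: str


def echo_model(
    messages: List[Dict[str, Any]],
    stop_sequences: Optional[List[str]] = None,
) -> SimpleMessage:
    """Toy 'model' that echoes the last user message, honoring stop sequences."""
    # Collect the text parts of the last message.
    parts = messages[-1]["content"]
    text = " ".join(p["text"] for p in parts if p.get("type") == "text")
    reply = f"You said: {text}"
    # Cut the reply just before the earliest stop sequence, if any appears.
    for stop in stop_sequences or []:
        idx = reply.find(stop)
        if idx != -1:
            reply = reply[:idx]
    return SimpleMessage(role="assistant", content=reply)


messages = [
    {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]
response = echo_model(messages, stop_sequences=["how"])
print(repr(response.content))
```

The message format mirrors the one used in the examples in this section; a real model would generate text instead of echoing it.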

#### TransformersModel

For convenience, smolagents includes a `TransformersModel` class that satisfies these criteria by building a local `transformers` pipeline for the `model_id` given at initialization.

```python
from smolagents import TransformersModel

model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")

print(model([{"role": "user", "content": [{"type": "text", "text": "Ok!"}]}], stop_sequences=["great"]))
```

```
`max_new_tokens` not provided, using this default value for `max_new_tokens`: 5000

ChatMessage(role='assistant', content='assistant\nWhat a ', tool_calls=None, raw={'out': tensor([[    1,  4093,   198, 24487,    17,     2,   198,     1,   520,  9531,
           198,  1780,   253,  1109]], device='cuda:0'), 'completion_kwargs': {'max_new_tokens': 5000}})
```


#### HfApiModel

The `HfApiModel` wraps the `huggingface_hub` [InferenceClient](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference) to run the LLM. It supports both HF’s own [Inference API](https://huggingface.co/docs/api-inference/index) and all [Inference Providers](https://huggingface.co/blog/inference-providers) available on the Hub.

```python
from smolagents import HfApiModel

messages = [
  {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]

model = HfApiModel()
print(model(messages))
```

```
ChatMessage(role='assistant', content="Hello! I'm just a computer program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?", tool_calls=None, raw=ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content="Hello! I'm just a computer program, so I don't have feelings, but I'm here and ready to help you. How can I assist you today?", tool_calls=None), logprobs=None)], created=1742699152, id='', model='Qwen/Qwen2.5-Coder-32B-Instruct', system_fingerprint='3.0.1-sha-bb9095a', usage=ChatCompletionOutputUsage(completion_tokens=34, prompt_tokens=35, total_tokens=69), object='chat.completion'))
```


#### LiteLLMModel

The `LiteLLMModel` leverages [LiteLLM](https://www.litellm.ai/) to support 100+ LLMs from various providers. Keyword arguments passed at model initialization are reused on every call to the model. The example below passes `temperature` and `max_tokens` this way.

```python
from smolagents import LiteLLMModel

messages = [
  {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]

model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest", temperature=0.2, max_tokens=10)
print(model(messages))
```
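The "set once at initialization, applied on every call" pattern can be sketched without any provider at all. The class below is a hypothetical illustration of the pattern, not the internals of `LiteLLMModel`; `DefaultKwargsModel` and `build_request` are made-up names, and letting per-call arguments override the defaults is an assumption of this sketch.

```python
from typing import Any, Dict, List


class DefaultKwargsModel:
    """Hypothetical wrapper that stores kwargs at init and merges them into each request."""

    def __init__(self, model_id: str, **default_kwargs: Any) -> None:
        self.model_id = model_id
        self.default_kwargs = default_kwargs

    def build_request(
        self, messages: List[Dict[str, Any]], **call_kwargs: Any
    ) -> Dict[str, Any]:
        # Assumption of this sketch: per-call kwargs win over init-time defaults.
        return {
            "model": self.model_id,
            "messages": messages,
            **self.default_kwargs,
            **call_kwargs,
        }


# "provider/some-model" is a placeholder identifier, not a real model name.
sketch_model = DefaultKwargsModel("provider/some-model", temperature=0.2, max_tokens=10)
chat = [{"role": "user", "content": "Hi"}]

request = sketch_model.build_request(chat)
override = sketch_model.build_request(chat, temperature=0.7)
print(request["temperature"], override["temperature"])
```

Keeping the defaults on the model object means sampling settings are configured in one place rather than repeated at every call site.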

#### OpenAIServerModel

This class lets you call any model served behind an OpenAI-compatible API. Here’s how to set it up (you can customise the `api_base` URL to point to another server):

```python
import os
from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)
```

#### AzureOpenAIServerModel

`AzureOpenAIServerModel` allows you to connect to any Azure OpenAI deployment.

Below is an example of how to set it up. You can omit the `azure_endpoint`, `api_key`, and `api_version` arguments, provided you’ve set the corresponding environment variables: `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `OPENAI_API_VERSION`.

Note that `OPENAI_API_VERSION` has no `AZURE_` prefix. This is due to the way the underlying [openai](https://github.com/openai/openai-python) package is designed.


```python
import os

from smolagents import AzureOpenAIServerModel

model = AzureOpenAIServerModel(
    model_id=os.environ.get("AZURE_OPENAI_MODEL"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("OPENAI_API_VERSION"),
)
```
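The "explicit argument, else environment variable" behaviour described above can be illustrated with a small standalone sketch. This is not how smolagents or the `openai` package implements it; `ENV_FALLBACKS` and `resolve_azure_settings` are hypothetical names, and the values set in the demo are placeholders.

```python
import os
from typing import Dict, Optional

# Argument name -> environment variable consulted when the argument is omitted.
ENV_FALLBACKS: Dict[str, str] = {
    "azure_endpoint": "AZURE_OPENAI_ENDPOINT",
    "api_key": "AZURE_OPENAI_API_KEY",
    "api_version": "OPENAI_API_VERSION",  # note: no AZURE_ prefix
}


def resolve_azure_settings(**explicit: Optional[str]) -> Dict[str, str]:
    """Prefer explicitly passed values; otherwise fall back to the environment."""
    resolved: Dict[str, str] = {}
    for name, env_var in ENV_FALLBACKS.items():
        value = explicit.get(name) or os.environ.get(env_var)
        if not value:
            raise ValueError(f"Pass `{name}` or set the {env_var} environment variable")
        resolved[name] = value
    return resolved


# Demo with placeholder values only.
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://example-resource.example.com"
os.environ["OPENAI_API_VERSION"] = "placeholder-version"
os.environ.pop("AZURE_OPENAI_API_KEY", None)

settings = resolve_azure_settings(api_key="placeholder-key")
print(settings)
```

Here `api_key` comes from the explicit argument while the endpoint and API version are read from the environment, mirroring the mix-and-match setup described above.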