# AG2 + Gemini Thinking Config Variants

Author: [Priyanshu Deshmukh](https://github.com/priyansh4320)

This notebook shows how to adjust Gemini thinking features in AG2:
- `thinking_budget` (token budget for thinking)
- `thinking_level` ("High" vs "Low")
- `include_thoughts` (whether to return thought summaries)

Reference: [Gemini Thinking Guide](https://ai.google.dev/gemini-api/docs/thinking)

Install AG2 with Google Gemini support:

```bash
pip install ag2[gemini]
```

In [None]:
import os

from dotenv import load_dotenv

from autogen import ConversableAgent, LLMConfig

load_dotenv()

api_key = os.getenv("GOOGLE_GEMINI_API_KEY")
if not api_key:
    raise RuntimeError("GOOGLE_GEMINI_API_KEY is not set. Please set it in your environment or .env file.")

prompt = """You are playing the 20 question game. You know that what you are looking for
    is an aquatic mammal that doesn't live in the sea, is venomous and that's
    smaller than a cat. What could that be and how could you make sure?
    """

## AG2 now supports Google Gemini's `ThinkingConfig`
`ThinkingConfig` has three settings, each passed as a key in the `LLMConfig` entry:
- `thinking_budget`: The thinking budget in tokens. 0 is DISABLED. -1 is AUTOMATIC. The default values and allowed ranges are model dependent.
- `thinking_level`: The level of thought tokens that the model should generate.
- `include_thoughts`: Whether to include thoughts in the response. If true, thoughts are returned only if the model supports thoughts and thoughts are available.
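
The three keys above can be assembled into a single config entry before it is handed to `LLMConfig`. The following is a minimal sketch, not part of AG2: `build_gemini_entry` is a hypothetical helper, and the rule that only one of `thinking_budget` or `thinking_level` is set at a time is an assumption drawn from the "Thinking Budget or Thinking Level" comment in the example cell.

```python
from typing import Any, Optional


def build_gemini_entry(
    model: str,
    api_key: str,
    thinking_budget: Optional[int] = None,
    thinking_level: Optional[str] = None,
    include_thoughts: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: build one config entry with thinking options."""
    if thinking_budget is not None and thinking_level is not None:
        # Assumption: a request carries either a budget or a level, not both.
        raise ValueError("Set thinking_budget or thinking_level, not both.")

    entry: dict[str, Any] = {"model": model, "api_type": "google", "api_key": api_key}
    if thinking_budget is not None:
        entry["thinking_budget"] = thinking_budget
    if thinking_level is not None:
        entry["thinking_level"] = thinking_level
    if include_thoughts:
        entry["include_thoughts"] = True
    return entry


# "YOUR_API_KEY" is a placeholder, not a real key.
entry = build_gemini_entry(
    "gemini-3-pro-preview", "YOUR_API_KEY", thinking_level="High", include_thoughts=True
)
print(entry)
```

The returned dictionary has the same shape as the entries passed to `LLMConfig` in the cells of this notebook.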

In [None]:
# example configuration for ThinkingConfig Support
llm_config = LLMConfig(
    config_list={
        "model": "gemini-3-pro-preview",
        "api_type": "google",
        "api_key": api_key,
        # "thinking_budget": 1000,  # set either thinking_budget or thinking_level
        # Use thinking_level with Gemini 3 Pro. thinking_budget is accepted for backwards
        # compatibility, but using it with Gemini 3 Pro may result in suboptimal performance.
        "thinking_level": "High",
        "include_thoughts": True,
    }
)

agent = ConversableAgent(name="agent", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2, user_input=True)

response.process()

## thinking_budget
The `thinking_budget` parameter, introduced with the Gemini 2.5 series, guides the model on the specific number of thinking tokens to use for reasoning.

>Note: Use the `thinking_level` parameter with Gemini 3 Pro. While `thinking_budget` is accepted for backwards compatibility, using it with Gemini 3 Pro may result in suboptimal performance.

A value of 0 disables thinking, and -1 turns on automatic (dynamic) thinking. The default values and allowed ranges are model dependent; the ranges are listed here: [thinking budget ranges](https://ai.google.dev/gemini-api/docs/thinking#:~:text=The%20following%20are%20thinkingBudget%20configuration%20details%20for%20each%20model%20type.%20You%20can%20disable%20thinking%20by%20setting%20thinkingBudget%20to%200.%20Setting%20the%20thinkingBudget%20to%20%2D1%20turns%20on%20dynamic%20thinking%2C%20meaning%20the%20model%20will%20adjust%20the%20budget%20based%20on%20the%20complexity%20of%20the%20request.)
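
The special values can be summarized in a small sketch. `describe_thinking_budget` is a hypothetical helper written for this notebook. It only interprets 0, -1, and positive values as described above, and it does not check the model-dependent ranges.

```python
def describe_thinking_budget(budget: int) -> str:
    """Hypothetical helper: explain what a thinking_budget value requests."""
    if budget == 0:
        return "disabled"
    if budget == -1:
        return "automatic"
    if budget > 0:
        # The allowed upper bound is model dependent and is not checked here.
        return f"up to {budget} thinking tokens"
    raise ValueError(f"Unsupported thinking_budget: {budget}")


for value in (0, -1, 4096):
    print(value, "->", describe_thinking_budget(value))
```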

In [None]:
budget = 4096

llm_config = LLMConfig(
    config_list={
        "model": "gemini-2.5-flash",
        "api_type": "google",
        "api_key": api_key,
        "thinking_budget": budget,
    }
)

agent = ConversableAgent(name="agent", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2, user_input=True)

response.process()

## Vary `thinking_level`
You can set `thinking_level` to "low" or "high" (the default for `gemini-3-pro-preview`) on `gemini-3-pro-preview` or `gemini-3-flash-preview`. `gemini-3-flash-preview` also supports "medium" and "minimal" (similar to no thinking). These settings tell the model how much thinking it is allowed to do. Because the thinking process stays dynamic, "high" doesn't mean the model will always use many tokens in its thinking phase, only that it is allowed to.
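
The per-model levels described above can be checked before a request is sent. This is a minimal sketch: `ALLOWED_LEVELS` and `check_thinking_level` are hypothetical names, the table only covers the two models named in this section, and the case-insensitive comparison is an assumption made because this notebook writes the levels both as "high" and as "High".

```python
# Levels taken from the paragraph above; not an exhaustive or official table.
ALLOWED_LEVELS = {
    "gemini-3-pro-preview": {"low", "high"},
    "gemini-3-flash-preview": {"minimal", "low", "medium", "high"},
}


def check_thinking_level(model: str, level: str) -> str:
    """Hypothetical helper: return `level` unchanged if the model lists it."""
    allowed = ALLOWED_LEVELS.get(model)
    if allowed is None:
        raise ValueError(f"No thinking_level information for model {model!r}")
    if level.lower() not in allowed:
        raise ValueError(f"{model!r} does not list thinking_level {level!r}")
    return level


print(check_thinking_level("gemini-3-flash-preview", "medium"))
```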

In [None]:
level = "High"

llm_config = LLMConfig(
    config_list={
        "model": "gemini-3-flash-preview",
        "api_type": "google",
        "api_key": api_key,
        "thinking_level": level,
    }
)

agent = ConversableAgent(name="agent", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2, user_input=True)

response.process()

`thinking_level` is **not** supported by Gemini 2.5 Flash (this code will throw an error). It is, however, supported by Gemini 3 Flash preview.

In [None]:
level = "Low"
llm_config = LLMConfig(
    config_list={
        "model": "gemini-2.5-flash",  # Note: "gemini-3-flash-preview" does support thinking_level
        "api_type": "google",
        "api_key": api_key,
        "thinking_level": level,
    }
)

agent = ConversableAgent(name="agent-thoughts", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2, user_input=True)

# This will cause an exception as gemini-2.5-flash does not support thinking_level
response.process()
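
One way to avoid the exception shown above is to choose the thinking parameter from the model name before building the config. The sketch below is an assumption-laden illustration, not AG2 behavior: `thinking_kwargs` is a hypothetical helper, and the name-prefix checks are based only on the model names used in this notebook.

```python
def thinking_kwargs(model: str, level: str = "Low", budget: int = -1) -> dict:
    """Hypothetical helper: pick the thinking key that matches the model family."""
    if model.startswith("gemini-3"):
        # Gemini 3 models in this notebook are configured with thinking_level.
        return {"thinking_level": level}
    if model.startswith("gemini-2.5"):
        # Gemini 2.5 models in this notebook are configured with thinking_budget.
        return {"thinking_budget": budget}
    # Unknown model family: add no thinking options.
    return {}


# "YOUR_API_KEY" is a placeholder, not a real key.
entry = {"model": "gemini-2.5-flash", "api_type": "google", "api_key": "YOUR_API_KEY"}
entry.update(thinking_kwargs(entry["model"]))
print(entry)
```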

## `include_thoughts`
Set `include_thoughts` to `True` to receive thought summaries, or to `False` to omit them. Summaries of the model's thinking reveal its internal problem-solving pathway. Users can leverage this feature to check the model's strategy and remain informed during complex tasks.

The agent's reply message will contain the thoughts first and then the answer.

In [None]:
llm_config = LLMConfig(
    config_list={
        "model": "gemini-3-flash-preview",
        "api_type": "google",
        "api_key": api_key,
        "thinking_budget": 4096,
        "include_thoughts": True,
    }
)

agent = ConversableAgent(name="agent-thoughts", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2, user_input=True)

response.process()

## Tips
- For long/complex tasks, use a higher `thinking_budget`.
- `thinking_level` can be lowered for lighter reasoning.
- Set `include_thoughts=True` when you want thought summaries; turn it off to reduce output.

Reference: [Gemini Thinking Guide](https://ai.google.dev/gemini-api/docs/thinking)
