-
Notifications
You must be signed in to change notification settings - Fork 448
Description
Problem Statement
Gemini API provides powerful built-in tools such as GoogleSearch
, CodeExecution
, ComputerUse
, UrlContext
, and FileSearch
as documented at https://ai.google.dev/gemini-api/docs/google-search and https://ai.google.dev/api/caching#Tool. These tools are distinct from standard FunctionDeclaration
-based tools and are defined in separate fields within the Gemini API's Tool
type.
Currently, the Strands SDK's tool interface is designed around standard function declarations and cannot accommodate these Gemini-specific built-in tools. While GeminiConfig
has a params
field for passing Gemini-specific parameters, the tools
are handled through Strands' standard tool interface, making it impossible to pass Gemini's built-in tools through the params
mechanism.
This limitation prevents users from leveraging Gemini's powerful built-in capabilities such as:
- Real-time web search with grounding and citations (
GoogleSearch
) - Code execution for computational tasks (
CodeExecution
) - Computer control capabilities (
ComputerUse
) - URL-based context retrieval (
UrlContext
) - File search functionality (
FileSearch
)
Proposed Solution
Add support for Gemini's built-in tools by providing a mechanism to pass genai.types.Tool
objects that contain non-FunctionDeclaration tools (such as GoogleSearch
, CodeExecution
, etc.) to the Gemini model provider.
The solution should:
- Be Gemini-specific (not affect the core tool interface or other model providers)
- Support passing
genai.types.Tool
objects directly to maintain type safety and compatibility with Gemini API - Allow combining standard function calling tools with Gemini built-in tools
- Clearly distinguish between standard function tools and Gemini-specific built-in tools
Use Case
from strands import Agent
from strands.models.gemini import GeminiModel
from google import genai
from google.genai import types
# Create model with Google Search capability
model = GeminiModel(
model_id="gemini-2.5-flash",
gemini_tools=[
types.Tool(google_search=types.GoogleSearch())
]
)
agent = Agent(model=model)
# Agent can now answer questions with real-time web data and citations
response = agent("Who won the euro 2024?")
# Response will be grounded in recent web search results
Alternatives Solutions
Alternative 1: Modify ToolSpec Interface
Approach: Extend the core ToolSpec
type to accept genai.types.Tool
objects.
Pros:
- Unified interface for all tool types
- Consistent across all model providers
Cons:
- Very broad impact: Affects core tool interface used by all model providers
- Breaking change risk: Could break existing implementations
- Provider-specific coupling: Introduces Gemini-specific types into shared interface
- Maintenance burden: Core tool interface would need to accommodate multiple provider-specific formats
Alternative 2: Magic String Transformation
Approach: Accept specially formatted FunctionDeclaration
objects with reserved names (e.g., __gemini_google_search__
) and transform them internally to Gemini built-in tools.
Pros:
- No changes to existing interfaces
- Works with current tool infrastructure
Cons:
- Fragile: Relies on magic strings/conventions that could conflict with real tool names
- Not future-proof: May not adapt well to new Gemini tool parameters or features
- Collision risk: Could conflict with actual user-defined function names
- Hidden behavior: Transformation logic is implicit and hard to discover
- Poor developer experience: Requires documentation of special naming conventions
Additional Context
Unlike standard function calling where the SDK executes tools and can track their usage, Gemini's built-in tools are executed server-side by Google's infrastructure. This presents a challenge for tool usage tracking and observability.
The built-in tools don't generate toolUse
and toolResult
blocks in the same way as standard function declarations. Instead, they:
- Execute transparently on Google's servers
- Return results directly in the response
- May include grounding metadata (for
GoogleSearch
) but not explicit tool call traces
Potential Solutions:
- Document the difference in behavior between standard and built-in tools
- Parse grounding metadata where available (e.g.,
GoogleSearch
includesgroundingMetadata
)