Describe the bug
We are currently facing a bug when using static instructions on custom agents. Our agent pipeline has a parallel agent that spins up several instances of a "processor worker" agent as needed. Inside each of these custom worker agents, which run in parallel, we instantiate 2 LLM agents that the worker has to run. For both of these agents we provide a static instruction and a dynamic instruction. The idea is that all the parallel agents reuse the static parts of the prompt as they process.
For instance,
- Agent 1 starts the pipeline and creates the "ProcessorWorker" agents.
- Within Agent 1 we trigger a parallel agent with all the workers created earlier.
- Inside each worker (for this example, assume 10 workers are created) we instantiate the 2 sub LLM agents that process the data.
- The idea is for each of the 10 workers to use the same cached static instructions, so we don't pay the same instruction input cost 10 times (see the sketch after this list).
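Roughly, the pipeline has this shape (a minimal sketch; ProcessorPipeline, ProcessorWorker, and the chunks field stand in for our real code, each real worker runs 2 LLM agents rather than 1, and models is our own prompt module):

from typing import AsyncGenerator
from google.adk.agents import BaseAgent, LlmAgent, ParallelAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event

class ProcessorWorker(BaseAgent):
    """Runs the LLM sub-agent(s) that process one chunk of data."""
    chunk: str

    async def _run_async_impl(self, ctx: InvocationContext) -> AsyncGenerator[Event, None]:
        worker_llm = LlmAgent(
            name=f"{self.name}_llm",
            model="gemini-2.0-flash",
            static_instruction=models.consolidator_worker_static(),  # shared, cacheable part
            instruction=models.dynamic(self.chunk),                  # per-worker part
            output_key=f"{self.name}_document",
        )
        async for event in worker_llm.run_async(ctx):
            yield event

class ProcessorPipeline(BaseAgent):
    """Agent 1: fans out one worker per chunk and runs them in parallel."""
    chunks: list[str]  # the data split, produced upstream

    async def _run_async_impl(self, ctx: InvocationContext) -> AsyncGenerator[Event, None]:
        workers = [ProcessorWorker(name=f"worker_{i}", chunk=c)
                   for i, c in enumerate(self.chunks)]  # e.g. 10 workers
        parallel = ParallelAgent(name="processor_workers", sub_agents=workers)
        async for event in parallel.run_async(ctx):
            yield event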
Notes:
- When initially adding only the static_instruction, we can see the Gemini context caching manager treats each of the calls as a first-time invocation, so it skips creating a cache, with the log: "No previous token count available, skipping cache creation for initial request"
- We cannot instantiate the LLM agents outside the worker and then pass the child LLM agents into the parallel worker, because this causes race conditions and we would lose each agent's output_key. Even if this were possible, the caching does find the agents being used, but it creates a new cache on every call of the parallel execution.
- THE PROBLEM: even though we pass the cached_content into the GenerateContentConfig, we don't see the usage metadata referencing any cached tokens. How do we use explicit caching in ADK? (See the SDK-level sketch after these notes.)
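For reference, this is the SDK-level explicit-caching flow we are trying to reproduce through ADK (a sketch using the google-genai SDK directly; the model, TTL, prompt, and display name are illustrative):

from google import genai
from google.genai import types

static_prompt_text = "<our long shared worker instruction>"

client = genai.Client()

# Explicit caches are tied to a specific model and the cached content must
# meet the model's minimum token count for caching.
cache = client.caches.create(
    model="gemini-2.0-flash",
    config=types.CreateCachedContentConfig(
        display_name="processor-worker-static",
        system_instruction=static_prompt_text,
        ttl="1800s",
    ),
)

# A direct call that uses the cache; on a hit, the response's
# usage_metadata.cached_content_token_count is populated.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="process this chunk...",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.usage_metadata.cached_content_token_count)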
To Reproduce
Create a parallel agent with sub-agents instantiated in each of the workers. Also assume the root agent correctly sets the attributes needed to create and use caching:
from google.adk.agents.context_cache_config import ContextCacheConfig
from google.adk.apps import App

cache_config = ContextCacheConfig(
    cache_intervals=50,
    ttl_seconds=1800,  # 30 minutes
    min_tokens=1024,
)
app = App(
    name="root_agent",
    root_agent=pipeline_agent,
    context_cache_config=cache_config,
)
Expected behavior
Ideal functionality:
- I believe the ideal scenario would be to use explicit caching here: before "Agent 1" instantiates all of the workers, we run a before-agent callback and create the necessary cached content (sketched after this list); then, as we trigger the sub-worker agents in parallel, we pass in the cached content.
- THE PROBLEM: even though we pass the cached_content into the GenerateContentConfig, we don't see the usage metadata referencing any cached tokens.
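What we tried, roughly (a sketch; create_run_cache and the run_cache_name state key are our own names, and each worker copies the value into its run_cache_name attribute when it is constructed):

from google import genai
from google.genai import types
from google.adk.agents.callback_context import CallbackContext

def create_run_cache(callback_context: CallbackContext):
    """before_agent_callback on Agent 1: create the shared cache once per run."""
    if "run_cache_name" not in callback_context.state:
        client = genai.Client()
        cache = client.caches.create(
            model="gemini-2.0-flash",
            config=types.CreateCachedContentConfig(
                system_instruction=models.consolidator_worker_static(),
                ttl="1800s",
            ),
        )
        callback_context.state["run_cache_name"] = cache.name
    return None  # don't short-circuit the agent run

Each worker then passes the stored cache name into its sub-agent's GenerateContentConfig: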
sub_agent = Agent(
    name=f"{self.name}_subagent",  # unique per worker
    model="gemini-2.0-flash",
    # static_instruction=models.consolidator_worker_static(),  # not used, since
    # it just avoids creating the cache each time (see Notes above)
    instruction=models.dynamic(chunk),
    output_schema=models.WorkerDocument,
    output_key=f"{self.name}_document",
    disallow_transfer_to_parent=True,
    disallow_transfer_to_peers=True,
    generate_content_config=types.GenerateContentConfig(
        temperature=0.0,
        max_output_tokens=8000,
        cached_content=self.run_cache_name,
    ),
    planner=planner,
)
I would expect implicit caching to work when using static_instruction, and when explicitly creating a custom cache I would expect that providing it in the GenerateContentConfig would trigger the LLM to use it.
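For completeness, this is how we check for cache hits when running the app (a sketch; the runner setup is omitted and the IDs are placeholders). We would expect cached_content_token_count to be non-zero on responses that used the cache:

async for event in runner.run_async(
    user_id="user", session_id="session", new_message=message
):
    if event.usage_metadata:
        # Stays empty for us, even with cached_content set on the config.
        print(event.usage_metadata.cached_content_token_count)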
KEY QUESTION:
How do we use explicit caching in adk?
Screenshots
Not available
Desktop (please complete the following information):
- OS: macOS
- Python version (python -V): Python 3.12
- ADK version (pip show google-adk):
 google-adk==1.16.0
 google-genai==1.43.0
Model Information:
- Are you using LiteLLM: No
- Which model is being used (e.g. gemini-2.5-flash, gemini-2.0-flash): gemini-2.0-flash