Description
VertexAiRagRetrieval.run_async() calls rag.retrieval_query() directly, which uses synchronous gRPC under the hood (grpc._channel._blocking → call.next_event()). Despite being an async def method, the entire call blocks the event loop for 1-2 seconds on every invocation.
This is particularly problematic in real-time/streaming applications (e.g. StreamingMode.BIDI with gemini-live-* models) where blocking the event loop causes audio/video desync and dropped frames.
Stack trace captured via event loop watchdog
  File "google/adk/tools/retrieval/vertex_ai_rag_retrieval.py", line 95, in run_async
    response = rag.retrieval_query(
  File "vertexai/preview/rag/rag_retrieval.py", line 270, in retrieval_query
    response = client.retrieve_contexts(request=request)
  File "grpc/_channel.py", line 1150, in _blocking
    event = call.next_event()
The event loop is blocked for 1.2 s or more on every RAG tool call, not just the first.
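For anyone who wants to reproduce the measurement without the ADK, here is a minimal sketch of a lag monitor. It is not the watchdog used to capture the trace above, and `blocking_tool_call` is a hypothetical stand-in for the synchronous gRPC call. A background task sleeps for a short interval and records how late each wakeup is. A blocking call inside an `async def` shows up as one very late wakeup.

```python
import asyncio
import time


async def measure_loop_lag(interval: float, samples: list[float]) -> None:
    """Record how late each wakeup is relative to the requested interval."""
    while True:
        start = time.perf_counter()
        await asyncio.sleep(interval)
        samples.append(time.perf_counter() - start - interval)


async def blocking_tool_call() -> str:
    # Hypothetical stand-in for a synchronous gRPC call inside an async def.
    # time.sleep() never yields to the event loop, so nothing else can run.
    time.sleep(0.3)
    return "contexts"


async def max_lag_during(coro_fn) -> float:
    """Run coro_fn while the monitor is active and return the worst lag seen."""
    samples: list[float] = []
    monitor = asyncio.create_task(measure_loop_lag(0.01, samples))
    await asyncio.sleep(0.05)  # let the monitor take a few baseline samples
    await coro_fn()
    await asyncio.sleep(0.05)  # let the delayed wakeup be recorded
    monitor.cancel()
    try:
        await monitor
    except asyncio.CancelledError:
        pass
    return max(samples)


if __name__ == "__main__":
    lag = asyncio.run(max_lag_during(blocking_tool_call))
    print(f"max event loop lag: {lag:.3f}s")
```

Running asyncio in debug mode (for example `asyncio.run(main(), debug=True)`) is another way to surface the problem: it logs callbacks that take longer than `loop.slow_callback_duration`.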
Affected code
@override
async def run_async(self, *, args, tool_context) -> Any:
    from ...dependencies.vertexai import rag
    # This is a synchronous gRPC call that blocks the event loop
    response = rag.retrieval_query(...)
Suggested fix
Wrap the synchronous call with asyncio.to_thread():
import asyncio
@override
async def run_async(self, *, args, tool_context) -> Any:
    from ...dependencies.vertexai import rag
    response = await asyncio.to_thread(
        rag.retrieval_query,
        text=args['query'],
        rag_resources=self.vertex_rag_store.rag_resources,
        rag_corpora=self.vertex_rag_store.rag_corpora,
        similarity_top_k=self.vertex_rag_store.similarity_top_k,
        vector_distance_threshold=self.vertex_rag_store.vector_distance_threshold,
    )
This is the same pattern used to fix the equivalent issue in GCS artifact storage (#1346 / PR #1347).
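The effect of the `asyncio.to_thread()` wrapper can be checked in isolation with a self-contained sketch. Everything here is illustrative: `slow_retrieval_query` is a hypothetical stand-in for `rag.retrieval_query`, and the simplified `run_async` signature is an assumption, not the ADK one. A ticker task keeps counting while the blocking call runs in a worker thread, which it could not do if the call ran on the event loop.

```python
import asyncio
import time


def slow_retrieval_query(*, text: str, similarity_top_k: int = 3) -> list[str]:
    # Hypothetical stand-in for rag.retrieval_query: blocks the calling thread.
    time.sleep(0.3)
    return [f"context {i} for {text!r}" for i in range(similarity_top_k)]


async def run_async(query: str) -> list[str]:
    # The blocking call runs in a worker thread, so the event loop stays free.
    # Keyword arguments are forwarded to the wrapped function unchanged.
    return await asyncio.to_thread(
        slow_retrieval_query, text=query, similarity_top_k=2
    )


async def count_ticks(stop: asyncio.Event) -> int:
    """Count 10 ms wakeups until asked to stop (a proxy for loop liveness)."""
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(0.01)
        ticks += 1
    return ticks


async def main() -> tuple[list[str], int]:
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    contexts = await run_async("what is ADK?")
    stop.set()
    ticks = await ticker
    return contexts, ticks


if __name__ == "__main__":
    contexts, ticks = asyncio.run(main())
    print(f"{len(contexts)} contexts, {ticks} ticks while waiting")
```

`asyncio.to_thread()` requires Python 3.9 or later and uses the event loop's default thread pool executor, so many concurrent retrievals share that pool's worker limit.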
Environment
google-adk==1.23.0 (also verified on main — still present)
google-cloud-aiplatform==1.135.0
Python 3.13
Using StreamingMode.BIDI with gemini-live-2.5-flash-native-audio