1. Description
Since upgrading to google-adk v1.29.0, the Multimodal Live API (gemini-live-2.5-flash-native-audio on Vertex AI) intermittently crashes with google.genai.errors.APIError: 1007 None.
The session typically establishes correctly, but the error triggers mid-conversation during active audio streaming. The error message explicitly cites an invalid audio format ("16khz s16le pcm, mono channel"), even though the client-side input remains consistent and has been verified against exactly those specifications. This appears to be a regression in how the ADK frames or sequences audio blobs under sustained load or network jitter; versions prior to 1.29.0 did not exhibit it. I migrated to this version specifically to pick up fix 6b1600f.
Steps to Reproduce:
- Initialize an ADK LlmAgent using the gemini-live-2.5-flash-native-audio model on Vertex AI.
- Establish a bidirectional session using runner.run_live().
- Engage in a multi-turn conversation, providing sustained audio input (3+ minutes).
- Observe the connection drop with the 1007 None traceback during an active audio turn.
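For completeness, the "verified at these specifications" claim above was checked client-side with logic along these lines (a sketch, not part of the repro; the 100 ms chunk size is an assumption, any multiple of the 2-byte sample size behaves the same):

```python
import struct

SAMPLE_RATE = 16000   # Hz
BYTES_PER_SAMPLE = 2  # s16le
CHANNELS = 1          # mono

def validate_pcm_chunk(chunk: bytes) -> int:
    """Sanity-check a raw PCM chunk before it is sent upstream.

    Returns the number of samples in the chunk.
    """
    # s16le mono: every sample is exactly 2 bytes, so an odd-length
    # chunk means the byte stream has desynchronised somewhere.
    if len(chunk) % (BYTES_PER_SAMPLE * CHANNELS) != 0:
        raise ValueError(f"chunk of {len(chunk)} bytes is not sample-aligned")
    # Confirm the payload actually parses as little-endian signed 16-bit.
    samples = struct.unpack(f"<{len(chunk) // BYTES_PER_SAMPLE}h", chunk)
    return len(samples)

# 100 ms of silence at 16 kHz s16le mono -> 1600 samples / 3200 bytes
print(validate_pcm_chunk(b"\x00\x00" * (SAMPLE_RATE // 10)))  # 1600
```

Every chunk passed this check for the full duration of the session, so the crash does not correlate with malformed client input.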
Expected Behavior:
The WebSocket should maintain a stable bidirectional stream. The backend should consistently validate the audio packets provided by the ADK as long as the input format (PCM 16kHz) does not change.
Observed Behavior:
The connection terminates mid-stream with status 1007 (Invalid Frame Payload).
Log Snippet:
APIError in live flow: 1007 None. error when processing input audio, please check if the input audio is in valid format: 16khz s16le pcm, mono channel.; Error
Environment Details:
ADK Library Version: 1.29.0
Google-GenAI Version: (Check via pip show google-genai)
Python Version: 3.14.4
Model: gemini-live-2.5-flash-native-audio (Vertex AI)
Deployment: Cloud Run
Minimal Reproduction Code:

```python
import asyncio
import logging
import os

from fastapi import FastAPI, WebSocket
from starlette.websockets import WebSocketState
from google.adk.agents import LlmAgent
from google.adk.agents.live_request_queue import LiveRequestQueue
from google.adk.agents.run_config import RunConfig, StreamingMode
from google.adk.memory import InMemoryMemoryService
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

logger = logging.getLogger(__name__)

# Minimal agent setup -- replace with your specific instructions/tools if necessary.
mock_agent = LlmAgent(
    model="gemini-live-2.5-flash-native-audio",
    instruction="You are a helpful Python tutor.",
)

app = FastAPI()
session_service = InMemorySessionService()
memory_service = InMemoryMemoryService()
runner = Runner(
    app_name="reproduction-app",
    agent=mock_agent,
    session_service=session_service,
    memory_service=memory_service,
)

# Feature flags, hardcoded here to keep the repro minimal.
proactivity = False
affective_dialog = False


@app.websocket("/ws/{user_id}/{session_id}")
async def websocket_endpoint(websocket: WebSocket, user_id: str, session_id: str):
    await websocket.accept()

    # 2. Minimal RunConfig
    run_config = RunConfig(
        streaming_mode=StreamingMode.BIDI,
        response_modalities=["AUDIO"],  # Required for the PCM player
        input_audio_transcription=types.AudioTranscriptionConfig(language_codes=["en-GB"]),
        output_audio_transcription=types.AudioTranscriptionConfig(language_codes=["en-GB"]),
        session_resumption=types.SessionResumptionConfig(transparent=True),
        context_window_compression=types.ContextWindowCompressionConfig(
            trigger_tokens=100000,  # Start compression at ~78% of 128k context
            sliding_window=types.SlidingWindow(
                target_tokens=80000  # Compress to ~62% of context, preserving recent turns
            ),
        ),
        proactivity=types.ProactivityConfig(proactive_audio=True) if proactivity else None,
        enable_affective_dialog=affective_dialog,
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(
                    voice_name=os.getenv("AGENT_VOICE", "Puck")
                )
            ),
            language_code=os.getenv("AGENT_LANGUAGE", "en-US"),
        ),
    )

    live_request_queue = LiveRequestQueue()

    # 3. Upstream: WebSocket -> Gemini
    async def client_to_agent_messaging():
        try:
            while True:
                message = await websocket.receive()
                if "bytes" in message:
                    audio_blob = types.Blob(
                        mime_type="audio/pcm;rate=16000", data=message["bytes"]
                    )
                    live_request_queue.send_realtime(audio_blob)
        except Exception:
            live_request_queue.close()

    # 4. Downstream: Gemini -> WebSocket (where the 1007 None occurs)
    async def agent_to_client_messaging():
        try:
            async for event in runner.run_live(
                user_id=user_id,
                session_id=session_id,
                live_request_queue=live_request_queue,
                run_config=run_config,
            ):
                await websocket.send_text(event.model_dump_json(exclude_none=True))
        except Exception as e:
            print(f"CRASH DETECTED: {e}")

    try:
        done, pending = await asyncio.wait(
            [
                asyncio.create_task(client_to_agent_messaging()),
                asyncio.create_task(agent_to_client_messaging()),
            ],
            return_when=asyncio.FIRST_COMPLETED,
        )
        # 1. Propagate exceptions from the tasks that finished
        for task in done:
            try:
                task.result()
            except Exception as e:
                logger.error(f"Task failed with exception: {e}")
                raise
        # 2. Cancel the remaining tasks
        for task in pending:
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
    finally:
        # Final cleanup: just call close(); don't check for .is_closed()
        try:
            live_request_queue.close()
        except Exception:
            pass
        if websocket.client_state != WebSocketState.DISCONNECTED:
            try:
                await websocket.close()
            except Exception:
                pass
```
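To drive the repro without a real microphone, a small client script along these lines works (a sketch; the localhost URL and the third-party websockets library are assumptions — any client that sends binary frames of raw 16 kHz s16le mono PCM at real-time pace reproduces the crash):

```python
import asyncio

# 100 ms of 16 kHz s16le mono silence per frame (1600 samples * 2 bytes)
CHUNK = b"\x00\x00" * 1600

async def stream_silence(url: str, minutes: float = 3.5) -> None:
    """Send raw PCM frames at real-time pace until past the 3-minute mark."""
    import websockets  # pip install websockets

    async with websockets.connect(url) as ws:
        loop = asyncio.get_running_loop()
        deadline = loop.time() + minutes * 60
        while loop.time() < deadline:
            await ws.send(CHUNK)      # binary frame -> message["bytes"] server-side
            await asyncio.sleep(0.1)  # pace at 100 ms per chunk

if __name__ == "__main__":
    asyncio.run(stream_silence("ws://localhost:8000/ws/test-user/test-session"))
```

With this sender the disconnect consistently lands mid-stream well before the configured deadline, matching the 3+ minute threshold from the reproduction steps.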
How often has this issue occurred?:
Very Frequently (60-70%+)