Draft
11 changes: 10 additions & 1 deletion src/agents/voice/result.py
@@ -88,7 +88,16 @@ async def _add_error(self, error: Exception):
     def _transform_audio_buffer(
         self, buffer: list[bytes], output_dtype: npt.DTypeLike
     ) -> npt.NDArray[np.int16 | np.float32]:
-        np_array = np.frombuffer(b"".join(buffer), dtype=np.int16)
+        # Combine all chunks
+        combined_buffer = b"".join(buffer)
+
+        # Pad with a zero byte if the buffer length is odd
+        # This is needed because np.frombuffer with dtype=np.int16 requires
+        # the buffer size to be a multiple of 2 bytes
+        if len(combined_buffer) % 2 != 0:
+            combined_buffer += b"\x00"
Comment on lines +94 to +98
Copilot AI Oct 18, 2025

[nitpick] Padding the trailing half-sample with 0 introduces synthesized audio data and can cause subtle artifacts in streaming scenarios. A more accurate approach is to carry over the trailing single byte to the next call (e.g., keep self._pending_byte, prepend it on the next invocation, and if the combined length is odd, stash the last byte instead of padding), preserving sample integrity without injecting data.
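The carry-over idea in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `Int16ChunkAligner`, `feed`, and `_pending` are hypothetical names that do not appear in the PR, and the sketch ignores what should happen to a byte still pending when the stream ends.

```python
import numpy as np


class Int16ChunkAligner:
    """Hypothetical helper that keeps 16-bit sample boundaries intact across calls."""

    def __init__(self) -> None:
        # Holds at most one leftover byte from the previous call.
        self._pending = b""

    def feed(self, chunks: list[bytes]) -> np.ndarray:
        """Decode every complete int16 sample and stash a trailing odd byte."""
        data = self._pending + b"".join(chunks)
        usable = len(data) - (len(data) % 2)  # length of the largest even prefix
        self._pending = data[usable:]
        return np.frombuffer(data[:usable], dtype=np.int16)


samples = np.array([1, -2, 300, -400], dtype=np.int16)
raw = samples.tobytes()

aligner = Int16ChunkAligner()
first = aligner.feed([raw[:3]])   # 3 bytes: one full sample plus half of the next
second = aligner.feed([raw[3:]])  # the stashed byte is prepended here
restored = np.concatenate([first, second])
```

With this split, `restored` holds the same four samples as `samples`, and no synthetic byte is ever decoded.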

Comment on lines +94 to +98

[P1] Avoid zero‑padding half samples midstream

Padding an odd-length audio buffer with b"\x00" before calling np.frombuffer causes a permanent byte shift when the odd length occurs before the final chunk. In normal streaming, a TTS provider may emit an odd-sized chunk whose last byte is just the first half of a 16‑bit sample; zero‑padding here turns that half sample into its own frame and the next chunk’s first byte becomes the low byte of a new sample. From that point the stream is misaligned and produces distorted audio rather than the intended samples. Instead, carry the extra byte forward and prepend it to the next chunk so that sample boundaries remain intact.
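A small worked example may make the misalignment concrete. It assumes little-endian 16-bit PCM and that each odd-sized chunk is padded and decoded on its own; `pad_and_decode` is a hypothetical stand-in for the padding logic in the diff, not the PR's actual method.

```python
import numpy as np

# Four known samples, serialized as little-endian int16 (8 bytes in total).
original = np.array([1000, -2000, 3000, -4000], dtype="<i2")
raw = original.tobytes()

# Simulate a provider that splits the stream in the middle of the second sample.
first_chunk, second_chunk = raw[:3], raw[3:]


def pad_and_decode(chunk: bytes) -> np.ndarray:
    """Zero-pad an odd-length chunk, then decode it as int16 (hypothetical helper)."""
    if len(chunk) % 2 != 0:
        chunk += b"\x00"
    return np.frombuffer(chunk, dtype="<i2")


decoded = np.concatenate([pad_and_decode(first_chunk), pad_and_decode(second_chunk)])
# decoded is [1000, 48, -18184, 24587, 240]: only the first sample survives,
# and the stream has grown from four samples to five.
```

The stray low byte of `-2000` becomes its own sample (`48`), and every sample after it is assembled from bytes that belong to two different original samples.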

Comment on lines +91 to +99
Copilot AI Oct 18, 2025

[nitpick] This builds combined_buffer (size N) and then, on odd lengths, allocates and copies again to append one byte, resulting in two large allocations. You can avoid the extra copy by appending the padding byte to the list of chunks and performing a single join, e.g.:
    total_len = sum(len(c) for c in buffer)
    if total_len % 2 != 0:
        buffer = [*buffer, b"\x00"]
    np_array = np.frombuffer(b"".join(buffer), dtype=np.int16)

Suggested change
-        # Combine all chunks
-        combined_buffer = b"".join(buffer)
-        # Pad with a zero byte if the buffer length is odd
-        # This is needed because np.frombuffer with dtype=np.int16 requires
-        # the buffer size to be a multiple of 2 bytes
-        if len(combined_buffer) % 2 != 0:
-            combined_buffer += b"\x00"
+        # Combine all chunks, padding with a zero byte if the total length is odd
+        total_len = sum(len(c) for c in buffer)
+        if total_len % 2 != 0:
+            buffer = [*buffer, b"\x00"]
+        combined_buffer = b"".join(buffer)

+        np_array = np.frombuffer(combined_buffer, dtype=np.int16)

         if output_dtype == np.int16:
             return np_array
Comment on lines 102 to 103
Copilot AI Oct 18, 2025

Comparing output_dtype directly to np.int16 may fail for equivalent values like 'int16' or np.dtype('int16'). Normalize the dtype for robust comparison:
    if np.dtype(output_dtype) == np.dtype(np.int16):
        return np_array
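The normalization suggested here can be checked in isolation with a short sketch. `wants_int16` is a hypothetical helper name used only for illustration. The clearest failure of the direct comparison is the plain string `"int16"`, which Python compares to the `np.int16` class without any dtype coercion.

```python
import numpy as np


def wants_int16(output_dtype) -> bool:
    """Hypothetical helper: True when output_dtype denotes 16-bit signed integers."""
    return np.dtype(output_dtype) == np.dtype(np.int16)


# A plain string compared directly against the np.int16 class never matches.
direct_string_match = ("int16" == np.int16)

# After normalization, all three spellings are treated as the same dtype.
normalized = [wants_int16(d) for d in (np.int16, "int16", np.dtype("int16"))]

# A different dtype is still rejected.
float_match = wants_int16(np.float32)
```

Normalizing both sides through `np.dtype` keeps the check independent of how the caller spelled the dtype.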
