Stereo capture for openai realtime demo #93

@diegoasua

Description

Checklist

  • Checked the issue tracker for similar issues to ensure this is not a duplicate.
  • Described the feature in detail and justified the reason for the request.
  • Provided specific use cases and examples.

Feature description

Korvo boards have two microphones, but we are not currently taking advantage of the second one: capture uses only 1 mic + the AEC reference in TDM. As far as I can tell, the I2S path can take up to 4 channels in TDM, of which only 2 are used and 2 are masked. We could use one of the two free channels for the second mic to do beamforming and improve capture from farther away. From what I have read, the esp-sr AFE should support this to some extent, as it can already do dual-channel BSS:

Supports dual-channel processing, which can well separate the target sound source from the rest of the interference sound, so as to extract the useful audio signal and ensure the quality of the subsequent speech.

How difficult would it be to enable a second channel over TDM? What changes would be needed on the capture side (codec init, TDM setup, AFE changes, etc.)?
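
To make the capture-side question concrete, here is a minimal sketch of the repacking step that would sit between the I2S read and the AFE feed. It assumes 16-bit samples, 4 slots per TDM frame, and an AFE that accepts interleaved [mic1, mic2, ref] frames. The slot order (mic1, mic2, reference, unused) and the names `tdm_to_afe_feed`, `TDM_SLOT_*`, and `AFE_CHANNELS` are hypothetical, not taken from the demo or the board schematic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot layout for one 4-slot TDM frame (an assumption, not
 * taken from the board schematic or the demo source). */
enum {
    TDM_SLOT_MIC1 = 0,
    TDM_SLOT_MIC2 = 1,
    TDM_SLOT_REF  = 2,
    TDM_SLOTS_PER_FRAME = 4   /* slot 3 stays masked/unused */
};

/* Channels handed to the AFE per frame: mic1, mic2, reference. */
#define AFE_CHANNELS 3

/*
 * Repack interleaved TDM frames into interleaved [mic1, mic2, ref] frames.
 *   tdm: input,  frames * TDM_SLOTS_PER_FRAME samples
 *   out: output, frames * AFE_CHANNELS samples
 */
static void tdm_to_afe_feed(const int16_t *tdm, int16_t *out, size_t frames)
{
    assert(tdm != NULL && out != NULL);
    for (size_t i = 0; i < frames; i++) {
        const int16_t *slot = &tdm[i * TDM_SLOTS_PER_FRAME];
        int16_t *dst = &out[i * AFE_CHANNELS];
        dst[0] = slot[TDM_SLOT_MIC1];
        dst[1] = slot[TDM_SLOT_MIC2];
        dst[2] = slot[TDM_SLOT_REF];
    }
}
```

If the codec delivers the slots in a different order, only the `TDM_SLOT_*` values would need to change. The codec init and TDM slot mask would still have to unmask the second mic slot for this buffer layout to exist.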

Related: OpenAI realtime seems to be capturing stereo Opus at 48 kHz. Would it be possible to send both channels as stereo? Otherwise, we could beamform into mono and let esp-peer upsample to stereo 48 kHz as it does now.
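
The two options differ only in how the outgoing stereo buffer is filled. Here is a rough sketch, assuming 16-bit PCM and an interleaved L/R layout; the function names are hypothetical, and resampling to 48 kHz and Opus encoding are left to the existing pipeline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option A: send the two mics as the left and right channels.
 * stereo must hold 2 * n samples. */
static void mics_to_stereo(const int16_t *mic1, const int16_t *mic2,
                           int16_t *stereo, size_t n)
{
    assert(mic1 != NULL && mic2 != NULL && stereo != NULL);
    for (size_t i = 0; i < n; i++) {
        stereo[2 * i]     = mic1[i];  /* left  */
        stereo[2 * i + 1] = mic2[i];  /* right */
    }
}

/* Option B: duplicate one processed (e.g. beamformed) mono stream into
 * both channels. stereo must hold 2 * n samples. */
static void mono_to_stereo(const int16_t *mono, int16_t *stereo, size_t n)
{
    assert(mono != NULL && stereo != NULL);
    for (size_t i = 0; i < n; i++) {
        stereo[2 * i]     = mono[i];
        stereo[2 * i + 1] = mono[i];
    }
}
```

Option A sends the raw mic signals, so any AEC or noise suppression would need to run per channel before this step. Option B keeps the current single-stream processing and only changes what feeds it.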

Use cases

Better capture from far away or with surrounding noise.

Alternatives

Single mic (already implemented)

Additional context

No response
