Streaming MP3 player for ESP32-S3: HTTP chunked GET → 128 KB PSRAM ring buffer → MP3 decode → resample → mono PCM → your audio sink, with cooperative abort and zero internal-SRAM growth.
The audio sink and HTTP transport are injected via abstract interfaces, so this component carries no project-specific dependencies (no Application/Board/codec singletons).
- ✅ Streaming pipeline — start playback as soon as the first MP3 frame arrives; no buffering of the whole file.
- ✅ Zero internal-SRAM growth — task stacks and ring buffer live in PSRAM. (The decode task is pinned to Core 0 with a small internal-RAM stack at priority 7, the same level as
opus_codec, to avoid I2S DMA underrun under contention. See Memory model.) - ✅ Cooperative abort —
Stop()and re-Play()set an abort flag and wait for both download and decode tasks to drain (TLS-read safe — no use-after-free of the ring buffer). - ✅ Stereo → mono fold + lazy resampler that adapts to the codec's output rate (auto from first decoded frame's
sample_rate). - ✅ Pluggable transports — bring your own HTTP client (mbedTLS, esp_http_client, ML307 cellular wrapper, …) and your own audio sink (
AudioCodec, raw I2S, ESP-ADF pipeline, …). - ✅ Error reporting via callback — never blocks UI; never calls
ApplicationorDisplaydirectly.
| Targets | ESP32-S3 (PSRAM required) |
| ESP-IDF | ≥ 5.3 |
| External components | espressif/esp_audio_codec ~2.4.1, espressif/esp_audio_effects ~1.2.1 |
Add to your project's idf_component.yml:
dependencies:
mydazy/esp_mp3_player: "^0.1.0"#include "mp3_player.h"
#include <memory>
using namespace mydazy;
// 1. Implement IAudioOutput (wrap whatever your project uses)
class MyAudioOutput : public IAudioOutput {
public:
int output_sample_rate() const override { return 16000; }
int output_channels() const override { return 1; }
bool output_enabled() const override { return enabled_; }
void EnableOutput(bool enable) override { enabled_ = enable; }
void OutputData(std::vector<int16_t>& data) override {
// hand off to your codec / I2S / pipeline
}
private:
bool enabled_ = false;
};
// 2. Implement IHttpClient + IHttpFactory (wrap esp_http_client, ML307, …)
class MyHttp : public IHttpClient { /* ... */ };
class MyHttpFactory : public IHttpFactory {
public:
std::unique_ptr<IHttpClient> CreateHttp() override {
return std::make_unique<MyHttp>();
}
};
// 3. Wire it up
static MyAudioOutput audio;
static MyHttpFactory http;
void app_main_cpp() {
Mp3Player::Callbacks cb;
cb.on_error = [](const char* status, const char* message) {
ESP_LOGW("APP", "mp3 error: %s — %s", status, message);
};
Mp3Player::GetInstance().Initialize(&audio, &http, cb);
std::string err;
if (!Mp3Player::GetInstance().Play("https://example.com/song.mp3", "Demo", &err)) {
ESP_LOGE("APP", "play failed: %s", err.c_str());
}
// ... later
Mp3Player::GetInstance().Stop();
}A complete project-agnostic example lives under examples/basic/.
| Resource | Size | Caps | Notes |
|---|---|---|---|
| Compressed ring | 128 KB | MALLOC_CAP_SPIRAM |
MP3 byte stream from HTTP → decoder. ≈ 8 s @ 128 kbps; absorbs 4G / network jitter. |
| PCM ring | 32 KB | MALLOC_CAP_SPIRAM |
Decoded PCM → output. ≈ 340 ms @ 24 kHz mono — absorbs decode jitter and HTTP read stalls so they never reach the I2S DMA consumer. |
| Download task stack | 6 KB | MALLOC_CAP_SPIRAM |
Core 0, priority 1. One-shot HTTP I/O — PSRAM stack is safe here. |
| Decode task stack | 10 KB | internal RAM | Core 0, priority 7 — same lane as opus_codec. Continuous decode + I2S timing → must avoid the PSRAM-stack ↔ flash-op deadlock. |
| Output task stack | 4 KB | internal RAM | Core 0, priority 7. Drains PCM ring → IAudioOutput::OutputData. Same flash-op rationale as Decode. |
| HTTP read scratch | 2 KB | MALLOC_CAP_SPIRAM |
TLS handshake leaves little stack room — keep the buffer off the stack. |
| Decoder I/O buffers | 8 KB + ~4.6 KB | MALLOC_CAP_SPIRAM |
One MP3 frame = 1152 samples × 2 ch × int16_t. |
Net internal-SRAM increment vs. baseline: ~14 KB (decode + output task stacks). Everything else is PSRAM-resident.
The codec's I2S DMA buffer is shallow (~60 ms on most ESP32-S3 boards: dma_desc_num=6 × dma_frame_num=240 ÷ 24 kHz). If the decode task wrote PCM into the codec directly, any upstream stall longer than 60 ms — a TLS retransmit, a 4G handover, even just the first-frame esp_audio_dec_get_info setup — would underrun the DMA and cause an audible click.
The Output task and PCM ring decouple the codec write from upstream timing. The PCM ring (~340 ms) is a buffer ten times larger than the typical jitter source, so the codec's I2S consumer always has data to draw from. This mirrors the pattern used in ESP-ADF (audio_pipeline with element-to-element ringbuffers) and in many production codec stacks.
ESP32-S3 cache and PSRAM share the SPI bus. When any flash op happens (NVS write, OTA, partition op), spi_flash_op_lock() disables both cache and PSRAM on both cores for the duration. Any task scheduled with a PSRAM stack during that window will fault with a "Double Exception" (SP corruption like SP=0x60100000).
The decode task is on Core 0 (where flash ops are issued) and runs continuously, so a PSRAM stack would be a stable foot-gun. Internal RAM avoids it. The download task runs only briefly at low priority and tolerates the stall, so PSRAM is fine for it.
class Mp3Player {
public:
static Mp3Player& GetInstance();
// One-time init. Pointers must outlive the player (typically static/singleton).
void Initialize(IAudioOutput* audio, IHttpFactory* http,
const Callbacks& cb = {});
// Start streaming. Auto-aborts any previous playback.
// err_msg: synchronous failure reason (URL invalid / not initialized /
// PSRAM exhaustion / task creation). Async failures (HTTP 4xx/5xx,
// decoder errors) go through Callbacks::on_error.
bool Play(const std::string& url,
const std::string& title = "",
std::string* err_msg = nullptr);
void Stop();
bool IsPlaying() const;
std::string GetCurrentTitle() const;
};struct IAudioOutput {
virtual int output_sample_rate() const = 0;
virtual int output_channels() const = 0;
virtual bool output_enabled() const = 0;
virtual void EnableOutput(bool enable) = 0;
virtual void OutputData(std::vector<int16_t>& pcm) = 0;
};
struct IHttpClient {
virtual bool Open(const std::string& method, const std::string& url) = 0;
virtual int GetStatusCode() = 0;
virtual size_t GetBodyLength() = 0;
virtual int Read(char* buf, size_t size) = 0; // bytes, or <0 on error
virtual void Close() = 0;
virtual void SetTimeout(int ms) = 0;
};
struct IHttpFactory {
virtual std::unique_ptr<IHttpClient> CreateHttp() = 0;
};
struct Callbacks {
std::function<void(const char* status, const char* message)> on_error;
std::function<void(const std::string& title)> on_started;
std::function<void()> on_finished;
};- TLS-read tolerance:
Stop()waits up to 2 s for tasks to drain. If a task is stuck in TLS read it will exit on the next read return; the ring buffer is not released until both tasks have exited (prevents use-after-free). - Re-entrant
Play(): a freshPlay()aborts and joins the previous one before allocating new buffers. Caller does not need to callStop()first. - Decoder format strictness: if no frame ever decodes (
sample_rate == 0) and ≥ 16 consecutive errors accumulate, the player aborts with"Audio format is not MP3 or corrupted".
Apache-2.0. See LICENSE.