esp_mp3_player

Streaming MP3 player for ESP32-S3: HTTP chunked GET → 128 KB PSRAM ring buffer → MP3 decode → resample → mono PCM → your audio sink, with cooperative abort and zero internal-SRAM growth.

The audio sink and HTTP transport are injected via abstract interfaces, so this component carries no project-specific dependencies (no Application/Board/codec singletons).

Features

✅ Streaming pipeline — start playback as soon as the first MP3 frame arrives; no buffering of the whole file.
✅ Zero internal-SRAM growth — task stacks and ring buffer live in PSRAM. (The decode task is pinned to Core 0 with a small internal-RAM stack at priority 7, the same level as opus_codec, to avoid I2S DMA underrun under contention. See Memory model.)
✅ Cooperative abort — Stop() and re-Play() set an abort flag and wait for both download and decode tasks to drain (TLS-read safe — no use-after-free of the ring buffer).
✅ Stereo → mono fold + lazy resampler that adapts to the codec's output rate (auto from first decoded frame's sample_rate).
✅ Pluggable transports — bring your own HTTP client (mbedTLS, esp_http_client, ML307 cellular wrapper, …) and your own audio sink (AudioCodec, raw I2S, ESP-ADF pipeline, …).
✅ Error reporting via callback — never blocks UI; never calls Application or Display directly.

Hardware / IDF


Targets	ESP32-S3 (PSRAM required)
ESP-IDF	≥ 5.3
External components	`espressif/esp_audio_codec ~2.4.1`, `espressif/esp_audio_effects ~1.2.1`

Installation

Add to your project's idf_component.yml:

dependencies:
  mydazy/esp_mp3_player: "^0.1.0"

Quick start

#include "mp3_player.h"
#include <memory>

using namespace mydazy;

// 1. Implement IAudioOutput (wrap whatever your project uses)
class MyAudioOutput : public IAudioOutput {
public:
    int output_sample_rate() const override { return 16000; }
    int output_channels() const override { return 1; }
    bool output_enabled() const override { return enabled_; }
    void EnableOutput(bool enable) override { enabled_ = enable; }
    void OutputData(std::vector<int16_t>& data) override {
        // hand off to your codec / I2S / pipeline
    }
private:
    bool enabled_ = false;
};

// 2. Implement IHttpClient + IHttpFactory (wrap esp_http_client, ML307, …)
class MyHttp : public IHttpClient { /* ... */ };
class MyHttpFactory : public IHttpFactory {
public:
    std::unique_ptr<IHttpClient> CreateHttp() override {
        return std::make_unique<MyHttp>();
    }
};

// 3. Wire it up
static MyAudioOutput audio;
static MyHttpFactory http;

void app_main_cpp() {
    Mp3Player::Callbacks cb;
    cb.on_error = [](const char* status, const char* message) {
        ESP_LOGW("APP", "mp3 error: %s — %s", status, message);
    };

    Mp3Player::GetInstance().Initialize(&audio, &http, cb);

    std::string err;
    if (!Mp3Player::GetInstance().Play("https://example.com/song.mp3", "Demo", &err)) {
        ESP_LOGE("APP", "play failed: %s", err.c_str());
    }
    // ... later
    Mp3Player::GetInstance().Stop();
}

A complete project-agnostic example lives under examples/basic/.

Memory model

Resource	Size	Caps	Notes
Compressed ring	128 KB	`MALLOC_CAP_SPIRAM`	MP3 byte stream from HTTP → decoder. ≈ 8 s @ 128 kbps; absorbs 4G / network jitter.
PCM ring	32 KB	`MALLOC_CAP_SPIRAM`	Decoded PCM → output. ≈ 340 ms @ 24 kHz mono — absorbs decode jitter and HTTP read stalls so they never reach the I2S DMA consumer.
Download task stack	6 KB	`MALLOC_CAP_SPIRAM`	Core 0, priority 1. One-shot HTTP I/O — PSRAM stack is safe here.
Decode task stack	10 KB	internal RAM	Core 0, priority 7 — same lane as `opus_codec`. Continuous decode + I2S timing → must avoid the PSRAM-stack ↔ flash-op deadlock.
Output task stack	4 KB	internal RAM	Core 0, priority 7. Drains PCM ring → `IAudioOutput::OutputData`. Same flash-op rationale as Decode.
HTTP read scratch	2 KB	`MALLOC_CAP_SPIRAM`	TLS handshake leaves little stack room — keep the buffer off the stack.
Decoder I/O buffers	8 KB + ~4.6 KB	`MALLOC_CAP_SPIRAM`	One MP3 frame = 1152 samples × 2 ch × `int16_t`.

Net internal-SRAM increment vs. baseline: ~14 KB (decode + output task stacks). Everything else is PSRAM-resident.

Why the three-stage pipeline?

The codec's I2S DMA buffer is shallow (~60 ms on most ESP32-S3 boards: dma_desc_num=6 × dma_frame_num=240 ÷ 24 kHz). If the decode task wrote PCM into the codec directly, any upstream stall longer than 60 ms — a TLS retransmit, a 4G handover, even just the first-frame esp_audio_dec_get_info setup — would underrun the DMA and cause an audible click.

The Output task and PCM ring decouple the codec write from upstream timing. The PCM ring (~340 ms) is a buffer ten times larger than the typical jitter source, so the codec's I2S consumer always has data to draw from. This mirrors the pattern used in ESP-ADF (audio_pipeline with element-to-element ringbuffers) and in many production codec stacks.

Why internal-RAM stack for decode?

ESP32-S3 cache and PSRAM share the SPI bus. When any flash op happens (NVS write, OTA, partition op), spi_flash_op_lock() disables both cache and PSRAM on both cores for the duration. Any task scheduled with a PSRAM stack during that window will fault with a "Double Exception" (SP corruption like SP=0x60100000).

The decode task is on Core 0 (where flash ops are issued) and runs continuously, so a PSRAM stack would be a stable foot-gun. Internal RAM avoids it. The download task runs only briefly at low priority and tolerates the stall, so PSRAM is fine for it.

API

`mydazy::Mp3Player`

class Mp3Player {
public:
    static Mp3Player& GetInstance();

    // One-time init. Pointers must outlive the player (typically static/singleton).
    void Initialize(IAudioOutput* audio, IHttpFactory* http,
                    const Callbacks& cb = {});

    // Start streaming. Auto-aborts any previous playback.
    // err_msg: synchronous failure reason (URL invalid / not initialized /
    //          PSRAM exhaustion / task creation). Async failures (HTTP 4xx/5xx,
    //          decoder errors) go through Callbacks::on_error.
    bool Play(const std::string& url,
              const std::string& title = "",
              std::string* err_msg = nullptr);

    void Stop();
    bool IsPlaying() const;
    std::string GetCurrentTitle() const;
};

Injection interfaces

struct IAudioOutput {
    virtual int  output_sample_rate() const = 0;
    virtual int  output_channels()    const = 0;
    virtual bool output_enabled()     const = 0;
    virtual void EnableOutput(bool enable) = 0;
    virtual void OutputData(std::vector<int16_t>& pcm) = 0;
};

struct IHttpClient {
    virtual bool   Open(const std::string& method, const std::string& url) = 0;
    virtual int    GetStatusCode() = 0;
    virtual size_t GetBodyLength() = 0;
    virtual int    Read(char* buf, size_t size) = 0;          // bytes, or <0 on error
    virtual void   Close() = 0;
    virtual void   SetTimeout(int ms) = 0;
};

struct IHttpFactory {
    virtual std::unique_ptr<IHttpClient> CreateHttp() = 0;
};

struct Callbacks {
    std::function<void(const char* status, const char* message)> on_error;
    std::function<void(const std::string& title)> on_started;
    std::function<void()> on_finished;
};

Stability notes

TLS-read tolerance: Stop() waits up to 2 s for tasks to drain. If a task is stuck in TLS read it will exit on the next read return; the ring buffer is not released until both tasks have exited (prevents use-after-free).
Re-entrant Play(): a fresh Play() aborts and joins the previous one before allocating new buffers. Caller does not need to call Stop() first.
Decoder format strictness: if no frame ever decodes (sample_rate == 0) and ≥ 16 consecutive errors accumulate, the player aborts with "Audio format is not MP3 or corrupted".

License

Apache-2.0. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
examples/basic		examples/basic
include		include
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
idf_component.yml		idf_component.yml
mp3_player.cc		mp3_player.cc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

esp_mp3_player

Features

Hardware / IDF

Installation

Quick start

Memory model

Why the three-stage pipeline?

Why internal-RAM stack for decode?

API

`mydazy::Mp3Player`

Injection interfaces

Stability notes

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

esp_mp3_player

Features

Hardware / IDF

Installation

Quick start

Memory model

Why the three-stage pipeline?

Why internal-RAM stack for decode?

API

mydazy::Mp3Player

Injection interfaces

Stability notes

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`mydazy::Mp3Player`

Packages