Replace Qt Multimedia with a native audio abstraction #9886

@i2h3

This is a research issue for now to come up with a plan and track the idea.

A past analysis showed that we use the Qt Multimedia framework in a single use case: playing an uncompressed WAV file when someone is calling via Nextcloud Talk. It adds a huge overhead to our app for the most basic requirement of just playing a sound in a loop. Please draft a plan to get rid of the whole Qt Multimedia framework and replace it with a custom C++ API which abstracts the OS specifics of Windows, Linux and macOS and plays such a sound file with their native APIs.

The plan Claude Code (Opus 4.7 with high effort) came up with:

Replace Qt Multimedia with a native audio abstraction

Context

Qt Multimedia is pulled into the desktop client for exactly one purpose: playing the call-notification.wav ringtone in a loop when an incoming Nextcloud Talk call arrives. The framework (and its platform backends — WMF on Windows, AVFoundation on macOS, GStreamer/ALSA/Pulse on Linux) plus its QML plugin add significant installer/bundle size for a feature that amounts to "play this 898 KB PCM WAV a few times, then stop." The goal is to delete this dependency and replace it with a thin in-tree C++ class that wraps native OS audio APIs.

Expected outcomes:

  • Smaller Windows installer (no Qt6Multimedia.dll + Qt Multimedia QML plugin), smaller macOS bundle (no QtMultimedia.framework or qml/QtMultimedia/*), smaller Linux distribution (no GStreamer plugin bring-in).
  • No behavioral change to the call notification UX.
  • One project-owned audio abstraction in case a future feature needs another sound.

Scope

Qt Multimedia usage is confined to:

  • src/gui/tray/CallNotificationDialog.qml:11: import QtMultimedia
  • src/gui/tray/CallNotificationDialog.qml:71-75: SoundEffect { source; loops: 9 }
  • Start: Component.onCompleted at line 59 (ringSound.play()).
  • Stop: closeNotification() at line 39 (ringSound.stop()).
  • Asset: theme/call-notification.wav (PCM 16-bit stereo 44.1 kHz, 898 KB), embedded via theme.qrc.in:293 as qrc:///client/theme/call-notification.wav.
  • Qt Multimedia is NOT declared in any find_package/target_link_libraries. It is resolved at QML-import time and deployed by macdeployqt via QML scanning. Removing the import is the whole story on macOS. Windows deployment is NSIS-template driven at cmake/modules/NSIS.template.in; lines 436–437 reference Qt5Multimedia.dll / Qt5MultimediaWidgets.dll — dead code in a Qt6 build, but delete them as part of this change.

Public C++ API

OCC::NotificationSoundPlayer — QObject with properties/methods API-shape-compatible with the used subset of SoundEffect, so the QML diff is one import and one element name.

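// src/gui/notificationsoundplayer.h
#pragma once

#include <QObject>
#include <QString>

#include <memory> // std::unique_ptr for the PIMPL backend

namespace OCC {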
class NotificationSoundPlayer : public QObject
{
    Q_OBJECT
    Q_PROPERTY(QString source READ source WRITE setSource NOTIFY sourceChanged)
    Q_PROPERTY(int loops READ loops WRITE setLoops NOTIFY loopsChanged)
    Q_PROPERTY(bool playing READ isPlaying NOTIFY playingChanged)
public:
    explicit NotificationSoundPlayer(QObject *parent = nullptr);
    ~NotificationSoundPlayer() override;  // must stop playback

    QString source() const;
    int loops() const;                     // number of plays; default 1
    bool isPlaying() const;

public slots:
    void setSource(const QString &source); // accepts qrc:///..., file:///..., plain path
    void setLoops(int loops);
    void play();
    void stop();

signals:
    void sourceChanged();
    void loopsChanged();
    void playingChanged();

private:
    class Backend;
    std::unique_ptr<Backend> _backend;
    QString _source;
    int _loops = 1;
    bool _playing = false;
    QString _resolvedFilePath;

    QString resolveToFilesystemPath(const QString &source);
};
}

QRC → filesystem path

The native backends (XAudio2, AVAudioPlayer, libcanberra) cannot read Qt resource (qrc:) URLs and are given a real filesystem path instead. Resolve QRC/qrc: sources once per process by copying to QStandardPaths::writableLocation(QStandardPaths::CacheLocation) + "/sounds/<hash>.wav", where <hash> derives from the resource path + size (idempotent across app updates). A static map guarded by a QMutex tracks extracted files; the extractor writes to <path>.tmp, then renames atomically. Fall back to QTemporaryFile if the cache dir is not writable (sandbox). Keep the WAV embedded in QRC so there is no install-path divergence per OS.
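The extract-once step can be sketched without Qt to make the naming and the tmp-then-rename order concrete. This is a minimal C++17 sketch under stated assumptions: `fnv1a`, `cachedSoundPath` and `extractOnce` are hypothetical names, FNV-1a stands in for whatever stable hash the implementation picks, and std::filesystem and std::mutex stand in for QStandardPaths, QFile and QMutex.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <mutex>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// FNV-1a: stable across runs and builds (std::hash gives no such guarantee).
inline std::uint64_t fnv1a(const std::string &text)
{
    std::uint64_t hash = 14695981039346656037ULL;
    for (unsigned char c : text) {
        hash ^= c;
        hash *= 1099511628211ULL;
    }
    return hash;
}

// Hypothetical helper: cache file name derived from resource path + size.
inline fs::path cachedSoundPath(const fs::path &cacheDir, const std::string &resourcePath, std::size_t size)
{
    const std::string key = resourcePath + '#' + std::to_string(size);
    return cacheDir / "sounds" / (std::to_string(fnv1a(key)) + ".wav");
}

// Writes `bytes` into the cache unless the file already exists.
// Returns the final path, or an empty path on failure so the caller can
// fall back to a temporary file.
inline fs::path extractOnce(const fs::path &cacheDir, const std::string &resourcePath,
                            const std::vector<char> &bytes)
{
    static std::mutex mutex; // serializes extraction within this process
    const std::lock_guard<std::mutex> lock(mutex);

    const fs::path target = cachedSoundPath(cacheDir, resourcePath, bytes.size());
    std::error_code ec;
    if (fs::exists(target, ec)) {
        return target; // already extracted earlier
    }
    fs::create_directories(target.parent_path(), ec);
    if (ec) {
        return {};
    }

    fs::path tmp = target;
    tmp += ".tmp";
    std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    out.close();
    if (!out) {
        fs::remove(tmp, ec);
        return {};
    }
    fs::rename(tmp, target, ec); // readers never observe a partially written target
    return ec ? fs::path{} : target;
}
```

Calling `extractOnce` twice with the same resource path and contents returns the same path and writes the file only once.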

Platform backends

Windows — XAudio2 (not PlaySound)

Use XAudio2. PlaySound is rejected: SND_LOOP loops forever (needs a duration-based QTimer; duration requires parsing the WAV header anyway), only one PlaySound can play per process, and PlaySound(NULL, NULL, 0) stops process-wide.

XAudio2 provides exact loop counts via XAUDIO2_BUFFER::LoopCount (submit one buffer, LoopCount = loops - 1, LoopBegin = 0, LoopLength = 0), per-instance lifecycle (source voice), and no global state. Link xaudio2.lib. An inline WAV header parser (~50 LoC) reads the RIFF header and the "fmt " and "data" chunks; it accepts PCM 8/16/24-bit, mono/stereo, any sample rate, and fails loudly on anything else (the theme WAV could be rebranded later, and silent failure is worse than a logged error).
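A rough idea of what such an inline parser could look like, in plain C++17 with no Windows headers. `WavInfo` and `parseWavHeader` are hypothetical names; the sketch only handles the classic 16-byte PCM "fmt " layout (format tag 1) and rejects everything else, including WAVE_FORMAT_EXTENSIBLE, which a real implementation may want to accept.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

struct WavInfo {
    std::uint16_t channels = 0;
    std::uint32_t sampleRate = 0;
    std::uint16_t bitsPerSample = 0;
    std::size_t dataOffset = 0;  // byte offset of the PCM payload in the file
    std::uint32_t dataSize = 0;  // payload size in bytes
};

// WAV fields are little-endian regardless of host byte order.
inline std::uint16_t readU16(const std::uint8_t *p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t readU32(const std::uint8_t *p)
{
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8)
        | (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns std::nullopt for anything that is not plain PCM 8/16/24-bit mono/stereo.
inline std::optional<WavInfo> parseWavHeader(const std::vector<std::uint8_t> &file)
{
    if (file.size() < 12 || std::memcmp(file.data(), "RIFF", 4) != 0
        || std::memcmp(file.data() + 8, "WAVE", 4) != 0) {
        return std::nullopt;
    }
    WavInfo info;
    bool haveFmt = false;
    std::size_t pos = 12; // first chunk follows the 12-byte RIFF/WAVE header
    while (pos + 8 <= file.size()) {
        const std::uint8_t *chunk = file.data() + pos;
        const std::uint32_t size = readU32(chunk + 4);
        const std::size_t body = pos + 8;
        if (size > file.size() - body) {
            return std::nullopt; // truncated chunk
        }
        if (std::memcmp(chunk, "fmt ", 4) == 0) {
            if (size < 16) {
                return std::nullopt;
            }
            const std::uint8_t *fmt = file.data() + body;
            const std::uint16_t formatTag = readU16(fmt);
            info.channels = readU16(fmt + 2);
            info.sampleRate = readU32(fmt + 4);
            info.bitsPerSample = readU16(fmt + 14);
            const bool bitsOk = info.bitsPerSample == 8 || info.bitsPerSample == 16
                || info.bitsPerSample == 24;
            if (formatTag != 1 || info.channels < 1 || info.channels > 2 || !bitsOk
                || info.sampleRate == 0) {
                return std::nullopt; // not uncompressed PCM in a supported layout
            }
            haveFmt = true;
        } else if (std::memcmp(chunk, "data", 4) == 0) {
            if (!haveFmt) {
                return std::nullopt;
            }
            info.dataOffset = body;
            info.dataSize = size;
            return info;
        }
        pos = body + size + (size & 1u); // chunks are padded to even sizes
    }
    return std::nullopt; // no "data" chunk found
}
```

The caller would log the rejection reason and leave the dialog silent when `parseWavHeader` returns no value.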

macOS — AVAudioPlayer via Objective-C++

.mm file wrapping AVAudioPlayer initWithContentsOfURL:error: with a file:// URL, numberOfLoops = loops - 1 (AVAudioPlayer semantic: additional plays after the first; -1 for infinite). -prepareToPlay, -play, -stop on destruction inside @autoreleasepool. Frameworks: AVFoundation, Foundation. Exact loop count, no timer, clean lifecycle.
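Both native backends translate the public loops value (total number of plays) into "additional repeats after the first play". A small sketch of that mapping, with hypothetical helper names; the clamp reflects XAUDIO2_MAX_LOOP_COUNT, which xaudio2.h defines as 254, and is an assumption about how the Windows backend would treat oversized values.

```cpp
#include <algorithm>

// Mirrors XAUDIO2_MAX_LOOP_COUNT from xaudio2.h; repeated here so the sketch
// compiles without the Windows SDK.
constexpr int kXAudio2MaxLoopCount = 254;

// Total plays -> repeats after the first play (AVAudioPlayer numberOfLoops semantics).
constexpr int additionalRepeats(int plays)
{
    return plays > 1 ? plays - 1 : 0;
}

// Same mapping, clamped to what XAUDIO2_BUFFER::LoopCount accepts for finite loops.
constexpr int xaudio2LoopCount(int plays)
{
    return std::min(additionalRepeats(plays), kXAudio2MaxLoopCount);
}
```

With the dialog's loops: 9, both helpers yield 8.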

Linux — libcanberra (committed)

libcanberra is the freedesktop event-sound API. An incoming-call ringtone is literally its intended use (CA_PROP_EVENT_ID = "phone-incoming-call" is a defined XDG sound theme event). It multiplexes PulseAudio, PipeWire (via pipewire-pulse or pipewire-alsa), and pure ALSA behind a single API. Packaged on every major distro, usually preinstalled on GNOME/KDE.

Rejected alternatives:

  • libpulse-simple: breaks on pure ALSA / pipewire-without-pulse; more code (WAV parser + worker thread + re-feed loop).
  • Direct ALSA: does not coexist with desktop sound servers.
  • fork paplay/aplay: hacky.

Loop counting: libcanberra has no native loop count. Subscribe to ca_finish_callback_t; on CA_SUCCESS, if remainingLoops > 0, decrement and ca_context_play_full again. The callback fires on the libcanberra worker thread — marshal to the main thread via QMetaObject::invokeMethod(this, ..., Qt::QueuedConnection) before touching members. stop() sets remainingLoops = 0 first (in-flight callback becomes a no-op), then ca_context_cancel.
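The bookkeeping described above can be isolated from libcanberra so it is testable on its own. The sketch below is an assumption about how that could be factored: `LoopSequencer` is a hypothetical class, `startPlay` stands in for a wrapper around ca_context_play_full, and `onPlayFinished` is expected to be called on the main thread after the finish callback has been queued across threads.

```cpp
#include <functional>
#include <utility>

class LoopSequencer
{
public:
    explicit LoopSequencer(std::function<void()> startPlay)
        : _startPlay(std::move(startPlay))
    {
    }

    // Starts a sequence of `loops` plays in total.
    void play(int loops)
    {
        if (loops < 1) {
            return;
        }
        _remaining = loops - 1;
        _playing = true;
        _startPlay();
    }

    // Called once per finished playback; `success` mirrors "error code == CA_SUCCESS".
    void onPlayFinished(bool success)
    {
        if (!_playing) {
            return; // late callback after stop(): nothing to do
        }
        if (success && _remaining > 0) {
            --_remaining;
            _startPlay(); // chain the next play
            return;
        }
        _playing = false; // sequence completed or playback failed
    }

    // Clears the counter first so an in-flight callback cannot restart playback.
    // The real backend would call ca_context_cancel() after this.
    void stop()
    {
        _remaining = 0;
        _playing = false;
    }

    bool isPlaying() const { return _playing; }

private:
    std::function<void()> _startPlay;
    int _remaining = 0;
    bool _playing = false;
};
```

Because the class has no libcanberra dependency, a unit test can drive it with a counting lambda and synthetic finish events.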

Pass CA_PROP_MEDIA_FILENAME (extracted path), CA_PROP_MEDIA_ROLE = "event", CA_PROP_CANBERRA_CACHE_CONTROL = "permanent", CA_PROP_EVENT_ID = "phone-incoming-call".

Build-time optionality: pkg_check_modules(CANBERRA IMPORTED_TARGET libcanberra). If absent, compile a no-op backend that logs a warning on first play(). Do not hard-fail the build. Soft-dep libcanberra0 in distro packaging metadata.

File layout

Compile-switched backend files (mirrors the existing folderwatcher_{linux,win,mac}.cpp / systray_mac_common.mm convention) with PIMPL so platform headers never leak into the public header.

src/gui/notificationsoundplayer.h          # public QObject, no platform headers
src/gui/notificationsoundplayer.cpp        # dispatcher: source resolution, QRC extraction,
                                           # loop-count bookkeeping for backends that don't
                                           # do it natively (Linux), signals
src/gui/notificationsoundplayer_win.cpp    # XAudio2 Backend impl + WAV header parser
src/gui/notificationsoundplayer_mac.mm     # AVAudioPlayer Backend impl
src/gui/notificationsoundplayer_linux.cpp  # libcanberra Backend impl (+ no-op fallback)

The Backend inner class has a uniform interface (setSource, play(remainingLoops), stop(), a destructor that stops playback, and a finished signal). Windows and macOS backends handle loops natively and only signal finished at sequence end. The Linux backend signals after each play; the common file issues the next play().
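One possible shape for that interface, sketched in plain C++ so it compiles without Qt. The names `setFinishedCallback`, `notifyFinished` and `NoopBackend` are hypothetical, a std::function replaces the Qt signal, and the choice to report completion immediately from the no-op backend is an assumption rather than something the plan specifies.

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <utility>

class Backend
{
public:
    using FinishedCallback = std::function<void()>;

    virtual ~Backend() = default; // concrete backends stop playback in their destructor

    virtual void setSource(const std::string &filePath) = 0;
    virtual void play(int remainingLoops) = 0;
    virtual void stop() = 0;

    void setFinishedCallback(FinishedCallback callback) { _finished = std::move(callback); }

protected:
    // Backends call this at sequence end (Windows, macOS) or after each play (Linux).
    void notifyFinished() const
    {
        if (_finished) {
            _finished();
        }
    }

private:
    FinishedCallback _finished;
};

// Fallback used when no native audio library is available at build time.
class NoopBackend final : public Backend
{
public:
    void setSource(const std::string &filePath) override { _path = filePath; }

    void play(int /*remainingLoops*/) override
    {
        if (!_warned) {
            std::cerr << "NotificationSoundPlayer: no audio backend, not playing " << _path << '\n';
            _warned = true; // warn on the first play() only
        }
        notifyFinished(); // assumption: report completion so callers never wait
    }

    void stop() override {}

    bool hasWarned() const { return _warned; }

private:
    std::string _path;
    bool _warned = false;
};
```

Keeping the callback plumbing in the base class means each platform file only implements the three virtual functions.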

CMake / build changes

src/gui/CMakeLists.txt

  • Add notificationsoundplayer.h and notificationsoundplayer.cpp to client_SRCS near callstatechecker (unconditional).
  • In the IF(APPLE) block (~line 289): append notificationsoundplayer_mac.mm.
  • In the IF(WIN32) block (~line 356): append notificationsoundplayer_win.cpp.
  • In the IF(NOT WIN32 AND NOT APPLE) block (~line 353): append notificationsoundplayer_linux.cpp.
  • Linking:
    • Windows: add target_link_libraries(nextcloudCore PRIVATE xaudio2) inside the Windows block after ~line 364.
    • macOS: extend the existing -framework … list near line 684 with -framework AVFoundation -framework Foundation.
    • Linux: in the Linux block near line 677, add pkg_check_modules(CANBERRA IMPORTED_TARGET libcanberra); conditionally target_link_libraries(nextcloudCore PRIVATE PkgConfig::CANBERRA) and target_compile_definitions(nextcloudCore PRIVATE HAVE_LIBCANBERRA).

src/gui/owncloudgui.cpp

  • Add #include "notificationsoundplayer.h" with the sibling GUI includes.
  • Append next to line 138:
    qmlRegisterType<NotificationSoundPlayer>("com.nextcloud.desktopclient", 1, 0, "NotificationSoundPlayer");

cmake/modules/NSIS.template.in:436-437

Delete both lines (Qt5Multimedia.dll, Qt5MultimediaWidgets.dll). They cannot resolve in a Qt6 build — dead code. Do not scrub the rest of the Qt5 block; out of scope.

Deployment audit (no changes expected; verify)

  • macdeployqt invocation at src/gui/CMakeLists.txt:733-744 scans -qmldir=${CMAKE_SOURCE_DIR}/src/gui for QML imports. Once import QtMultimedia is removed, it stops copying QtMultimedia.framework and qml/QtMultimedia/libquickmultimedia.dylib. Verify by diffing Contents/Frameworks and Contents/Resources/qml/ before/after.
  • No windeployqt is used anywhere; Windows bundling is entirely NSIS-template driven.
  • admin/osx/make_universal.py: no Multimedia references — confirmed clean.

QML migration

src/gui/tray/CallNotificationDialog.qml:

  • Delete line 11: import QtMultimedia.
  • Replace lines 71–75:
    NotificationSoundPlayer {
        id: ringSound
        source: root.ringtonePath
        loops: 9
    }

The com.nextcloud.desktopclient import at line 9 already exposes the new type. Call sites at lines 39 (ringSound.stop()) and 59 (ringSound.play()) are unchanged.

Implementation sequence

  1. Scaffold notificationsoundplayer.{h,cpp} with the full public API, QRC extractor, PIMPL Backend interface, and a no-op Backend for all platforms.
  2. Register the type in owncloudgui.cpp and wire sources into src/gui/CMakeLists.txt. Build Win/macOS/Linux green with the no-op backend — this confirms wiring before any native code.
  3. Change CallNotificationDialog.qml. Dialog now works silently on all platforms; Qt Multimedia is fully unreferenced.
  4. Implement the macOS backend (smallest, cleanest). Verify end-to-end with a real Talk call.
  5. Implement the Windows backend (XAudio2 + WAV header parser). Verify end-to-end.
  6. Implement the Linux backend (libcanberra + main-thread-marshaled loop continuation). Verify on a PulseAudio host and a PipeWire host.
  7. Delete the two dead Qt5Multimedia*.dll lines from the NSIS template.
  8. Rebuild installers on each platform; diff against a pre-change build to confirm Qt Multimedia framework / QML plugin are no longer shipped.
  9. Add unit tests for path resolution and Linux loop bookkeeping (see Verification).

Verification

Manual end-to-end (on each platform)

Triggered by a real Talk call: two accounts on a Talk-enabled Nextcloud server, callee logged into the desktop client on the platform under test, caller initiates a call from the Talk web UI. Expected: CallNotificationDialog appears, ringtone plays.

Cases to confirm on Windows, macOS, and Linux:

  • Audible playback at normal volume through multiple cycles.
  • Decline button stops playback immediately (tests stop() from QML).
  • 60-second timeout stops playback (tests CallStateChecker-driven stop).
  • Natural completion after 9 plays stops cleanly (tests native loop count on Win/macOS; callback-chained loop on Linux).
  • Closing the app while ringing stops playback and releases resources (tests backend destructor — critical for XAudio2 voice release, AVAudioPlayer release, ca_context_destroy).
  • Audio device unplug during playback does not crash or hang.

Unit tests (worth writing)

  • resolveToFilesystemPath(): feed qrc:///client/theme/call-notification.wav, a file:// URL, a plain path. Assert readable filesystem result and idempotency.
  • Linux loop bookkeeping: mock the finished hook; assert play() is issued loops - 1 additional times and that stop() mid-sequence cancels remaining plays.
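The three input shapes named in the first test can be exercised against a pure classification step before any file I/O happens. The sketch below is Qt-free and hypothetical (`classifySource`, `SourceKind`, `ResolvedSource` are illustrative names); it assumes the usual Qt convention that qrc:///x corresponds to the resource path :/x, and it leaves percent-decoding and Windows drive-letter URLs to QUrl in a real implementation.

```cpp
#include <string>
#include <string_view>

enum class SourceKind { QtResource, LocalFile };

struct ResolvedSource {
    SourceKind kind;
    std::string path; // ":/..." for resources, a filesystem path otherwise
};

inline bool startsWith(std::string_view text, std::string_view prefix)
{
    return text.substr(0, prefix.size()) == prefix;
}

inline ResolvedSource classifySource(std::string_view source)
{
    constexpr std::string_view qrcPrefix = "qrc:///";
    constexpr std::string_view filePrefix = "file://";

    if (startsWith(source, qrcPrefix)) {
        // qrc:///client/x.wav -> :/client/x.wav (needs extraction to the cache)
        return {SourceKind::QtResource, ":/" + std::string(source.substr(qrcPrefix.size()))};
    }
    if (startsWith(source, ":/")) {
        return {SourceKind::QtResource, std::string(source)};
    }
    if (startsWith(source, filePrefix)) {
        // file:///home/u/x.wav -> /home/u/x.wav
        return {SourceKind::LocalFile, std::string(source.substr(filePrefix.size()))};
    }
    return {SourceKind::LocalFile, std::string(source)}; // plain path, used as-is
}
```

Only sources classified as QtResource need the extraction step; local files can be handed to the backend directly.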

Full native-playback integration tests are not worth CI effort (real audio device, flaky). Keep those manual.

Risks / edge cases

  • Overlapping notifications: each dialog owns its own player; all three backends handle concurrent instances. No special handling. If the product later wants "one ringtone at a time," route through a singleton — do not put that complexity in the player.
  • No audio device / hotplug: all backends degrade gracefully (error return, error callback). Log once at warn; leave the dialog visible with no sound.
  • Headless Linux / no sound server: libcanberra returns an error per-play. Soft failure, logged once.
  • Build without libcanberra: HAVE_LIBCANBERRA compile switch → no-op backend. Build still succeeds.
  • App termination while playing: destructor must stop playback. Windows: stop + destroy source voice, release mastering voice, release IXAudio2 (leaking the voice keeps a background callback thread alive past exec()). macOS: [player stop]; player = nil; in @autoreleasepool. Linux: remainingLoops = 0; ca_context_cancel; ca_context_destroy (destroy joins the worker thread).
  • QRC extraction race: two dialogs constructed in quick succession both miss the cache. Mutex-guarded extraction keyed by source path; write to <path>.tmp + atomic rename.
  • Cache dir not writable (sandbox): fall back to QTemporaryFile for that session; log once.
  • WAV format assumptions (Windows): the inline parser accepts PCM 8/16/24-bit mono/stereo at any sample rate and logs + fails loudly on anything else. The current file qualifies; a future rebrand that swaps in a non-PCM WAV (ADPCM, IEEE float) will surface an actionable error instead of silence.

Critical files

Reused existing patterns

  • QML type registration pattern: owncloudgui.cpp:138 (CallStateChecker as sibling shape).
  • Compile-switched platform sources: existing folderwatcher_linux.cpp / folderwatcher_win.cpp / folderwatcher_mac.cpp and systray_mac_common.mm convention in src/gui/CMakeLists.txt.
  • #ifdef Q_OS_* point-of-use pattern: systray.cpp, openfilemanager.cpp, socketapi.cpp.
  • Objective-C++ .mm for macOS backends: systray_mac_common.mm, fileprovider_mac.mm.

Status
🏗️ In progress
