Skip to content

Conversation

@drondeseries
Copy link
Contributor

Implement FFmpeg error detection for stream failover

  • Added active monitoring of FFmpeg stderr in startStreamingProcess.
  • If critical errors (SPS/PPS, decode issues) are detected, FFmpeg is terminated.
  • A new FatalStreamContentException is thrown in such cases.
  • startStream now catches this exception, logs it, and proceeds with failover logic.
  • Ensures stream counts and active stream IDs in Redis are correctly decremented in all relevant error paths.
  • Adds a short bad-source cache for content-specific errors to prevent immediate retries of the same problematic stream URL during failover.

Update start-container
Fix: Skip local Postgres setup when external DB_HOST is configured
The start-container script was attempting to initialize and manage a
local PostgreSQL instance even when the environment was configured
to use an external database via DB_HOST. This caused errors like
'chown: cannot access /var/lib/postgresql/data' because the
data directory for the external database is not managed by this container.

This commit modifies the start-container script to check the DB_HOST
environment variable. If DB_HOST is set and is not 'localhost' or
'127.0.0.1', the script will now bypass all local PostgreSQL
initialization steps (chown, chmod, initdb, local server start).
It also adds logging to indicate whether a local or external
PostgreSQL configuration is detecte

drondeseries and others added 11 commits July 6, 2025 11:37
Fix: Skip local Postgres setup when external DB_HOST is configured
The start-container script was attempting to initialize and manage a
local PostgreSQL instance even when the environment was configured
to use an external database via DB_HOST. This caused errors like
'chown: cannot access /var/lib/postgresql/data' because the
data directory for the external database is not managed by this container.

This commit modifies the start-container script to check the DB_HOST
environment variable. If DB_HOST is set and is not 'localhost' or
'127.0.0.1', the script will now bypass all local PostgreSQL
initialization steps (chown, chmod, initdb, local server start).
It also adds logging to indicate whether a local or external
PostgreSQL configuration is detecte
- Added active monitoring of FFmpeg stderr in `startStreamingProcess`.
- If critical errors (SPS/PPS, decode issues) are detected, FFmpeg is terminated.
- A new `FatalStreamContentException` is thrown in such cases.
- `startStream` now catches this exception, logs it, and proceeds with failover logic.
- Ensures stream counts and active stream IDs in Redis are correctly decremented in all relevant error paths.
- Adds a short bad-source cache for content-specific errors to prevent immediate retries of the same problematic stream URL during failover.
Added the missing constant `BAD_SOURCE_CACHE_SECONDS_CONTENT_ERROR`
to `ProxyService.php` with a default value of 10 seconds.

This resolves an 'Undefined constant' error that occurred during
stream failover when a `FatalStreamContentException` was caught.
This commit introduces several logging improvements to help diagnose HLS stream behaviors:

1.  **Enhanced `stopStream` Logging**:
    *   `HlsStreamService::stopStream` now accepts an optional `reason` parameter.
    *   This reason is logged when a stream stop is attempted, providing clarity on why a stream was terminated (e.g., "stale_process_pruned", "admin_action").
    *   `PruneStaleHlsProcesses.php` has been updated to provide a specific reason when it stops a stream.

2.  **Clarified Active Stream Count Logging**:
    *   Log messages in the `TracksActiveStreams` trait for `incrementActiveStreams` and `decrementActiveStreams` have been made more descriptive.
    *   The new messages clarify that count changes can be due to speculative attempts for new streams (which might be subsequently denied or fail) or confirmed stream starts/stops. This helps distinguish temporary count fluctuations during limit checks from actual FFmpeg process lifecycles.

3.  **Verified Stream Limit Logic**:
    *   The existing logic in `HlsStreamService.php` for handling playlist stream limits was reviewed and verified.
    *   The system correctly prevents new streams from starting if the limit is reached and does not actively stop other existing, healthy streams on the same playlist as a direct result of this check.

These changes aim to improve the observability of the HLS streaming system and assist in accurately diagnosing the causes of stream terminations, particularly in scenarios where playlist stream limits are involved.
This commit introduces a user-facing error when a stream request cannot be fulfilled because all available sources and their failovers have reached their maximum concurrent playlist stream limits.

Key changes:

1.  **New Exception `AllPlaylistsExhaustedException`**:
    *   Created `app/Exceptions/AllPlaylistsExhaustedException.php`.
    *   This exception is specifically thrown when all stream attempts fail due to playlist capacity.

2.  **Modified `HlsStreamService::startStream`**:
    *   The service now tracks if all failures during stream source iteration (primary + failovers) are exclusively due to playlist limits.
    *   If this condition is met, it throws `AllPlaylistsExhaustedException`.
    *   Other failure types (e.g., source not responding, FFmpeg errors) will still result in returning `null` or other relevant exceptions.

3.  **Updated `HlsStreamController::servePlaylist`**:
    *   The controller now includes a `try-catch` block that specifically catches `AllPlaylistsExhaustedException`.
    *   Upon catching this exception, it aborts with an HTTP 429 (Too Many Requests) status code and the message: "Maximum stream limits have been reached for this channel. Please try again later."

4.  **Custom 429 Error View**:
    *   Added `resources/views/errors/429.blade.php`.
    *   This view extends Laravel's minimal error page layout and will be automatically used for `abort(429)` calls, displaying the custom message.

This feature provides clearer feedback to the user when a stream cannot be started due to system-wide concurrent stream limitations for the requested channel, rather than a generic server error.
This commit fixes a fatal error caused by an incorrect namespace in the `use` statement for `AllPlaylistsExhaustedException` within the `HlsStreamService.php` file.

The error reported was:
`Trait "App\Services\App\Exceptions\AllPlaylistsExhaustedException" not found`

The `use` statement was mistakenly pointing to a nested namespace instead of the correct `App\Exceptions` namespace. This commit:
- Removes any misplaced or duplicated `use` statements for this exception within the class body.
- Ensures a single, correct `use App\Exceptions\AllPlaylistsExhaustedException;` statement is present in the main import block at the top of `HlsStreamService.php`.

This correction allows the system to properly throw and catch the `AllPlaylistsExhaustedException`, enabling the user-facing 429 error page when all stream sources are unavailable due to playlist capacity limits, as intended by the `feat/max-streams-limit-page` feature.
This commit addresses a bug where the active stream count for a playlist could be erroneously decremented when `HlsStreamService::stopStream` was called for a channel that was not actually running or contributing to the count. This could lead to the system incorrectly believing a playlist had spare capacity, allowing new streams to exceed the intended limit.

The specific scenario observed was:
1. A stream (S1) is active on a playlist with a limit of 1.
2. An attempt to start another stream (S2) on the same playlist is correctly denied due to the limit.
3. The pruning mechanism later attempts to "stop" S2 (which never started).
4. The `stopStream` method would still find S2's model and its playlist, then call `decrementActiveStreams` for that playlist, incorrectly reducing the count even though S1 was still the only active stream.
5. This allowed a subsequent request for S2 (or another stream) to bypass the limit check.

The fix modifies `HlsStreamService::stopStream` to ensure that `decrementActiveStreams` is only called if the stream was confirmed to be running (i.e., `$wasRunning` is true) at the time `stopStream` was invoked. Logging has also been added to clarify when the decrement occurs or is skipped.

This ensures more accurate tracking of active streams and robust enforcement of playlist concurrency limits.
This commit updates the `resources/views/errors/429.blade.php` view to provide more specific and user-friendly feedback when an HTTP 429 error is triggered due to stream concurrency limits being reached.

Changes to `429.blade.php`:
- The main title of the error page is now set to "Max Streams Reached" instead of the generic "Too Many Requests".
- The primary message area will display the specific message passed via the `abort(429, $message)` call from the controller (e.g., "Maximum stream limits have been reached for this channel. Please try again later.").
- An updated sub-message provides additional context relevant to stream capacity.

This improves the user experience by giving clearer information when they encounter this specific type of rate limiting related to stream capacity.
When an HLS stream encountered a critical FFmpeg error (e.g., 'non-existing SPS'),
the failover mechanism would engage. However, the PruneStaleHlsProcesses
command could prematurely stop the original stream's resources before the
client fully transitioned to the failover stream.

This commit addresses the issue by:
1.  Modifying `HlsStreamService::startStream()` to update the `last_seen`
    timestamp of the original stream ID if it encounters a fatal content
    error, providing a buffer before pruning is considered.
2.  Modifying `HlsStreamController::servePlaylist()` to update the `last_seen`
    timestamp of the original stream ID when a failover to a new stream ID
    is successfully established.
3.  Modifying `PruneStaleHlsProcesses::handle()` to be aware of stream
    mappings (`hls:stream_mapping:{type}:{id}`). If a stream ID is mapped
    to a different, active failover stream, the pruning logic now checks
    the staleness of the *actual mapped stream* rather than prematurely
    cleaning up the original stream's resources.
Amends the previous commit by adding the required import
`use Illuminate\Support\Facades\Cache;` to the
`PruneStaleHlsProcesses.php` command.

This resolves the `Class "App\Console\Commands\Cache" not found` error
that occurred when the command attempted to use Cache facades without
the proper import.
@sparkison sparkison merged commit c028759 into sparkison:xteve Jul 7, 2025
@drondeseries drondeseries deleted the dev branch July 8, 2025 00:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants