
HLS video handling/storage/state refactor #151

Merged: 38 commits merged into master on Oct 14, 2020

Conversation

@gabek (Member) commented Sep 16, 2020

I apologize this is so big; it's something like half of the app. If you want to look through it, I'd suggest picking a piece at a time:

  • Storage provider changes (local, s3)
  • HLS writing/handling/cleanup
  • Creating offline state (streamstate)

Also, if you check out this PR and run it with both local and remote (S3) storage, I'd love to know how it works for you. In general it should function pretty much the same as it did before; I'm hoping none of you even notice the changes!

Overview

This change deals with how we get HLS segments and playlists, and what we do with them after they're received. It touches the transcoder -> internal HLS segment writing -> storage -> cleanup flow.

This change will require a lot of testing before anybody will want to run it in their production environments.

How we detect that new HLS files have been written

Previously we had a file monitor that polled the filesystem looking for new HLS chunks and playlist updates. The new approach takes advantage of ffmpeg's ability to push files to an HTTP endpoint.

A new localhost-only HTTP server starts listening, and ffmpeg is told to push all transcoding results there. This way ffmpeg hands its transcoding results directly to us instead of writing them to disk and making us keep tabs on them indirectly. While it sounds and feels a little weird to use HTTP internally to pass results around, it seemed like a good option that gives us more control.
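
As a rough illustration (not the actual Owncast code), here's what telling ffmpeg's HLS muxer to PUT its output to a local HTTP endpoint can look like from Go; the input URL, port, and encoder settings are placeholders:

```go
// Sketch only: launch ffmpeg so it HTTP PUTs HLS playlists and segments
// to a local receiver instead of writing them to disk itself.
package main

import (
	"log"
	"os/exec"
)

func main() {
	cmd := exec.Command("ffmpeg",
		"-i", "rtmp://127.0.0.1/live/stream", // incoming stream (hypothetical)
		"-c:v", "libx264", "-c:a", "aac",
		"-f", "hls",
		"-hls_time", "4",
		"-method", "PUT", // tell the HLS muxer to PUT its output over HTTP
		"http://127.0.0.1:8089/hls/stream.m3u8", // localhost-only receiver (made-up port)
	)
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```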

File writer receiver service

Because, as mentioned above, we get transcoder results via HTTP, we need something to listen for them. This is a new FileWriterReceiverService that listens on localhost only and accepts the transcoder's HTTP PUT requests. It writes these files to disk and then passes the paths to the hlsHandler (below).
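
A minimal sketch of what such a receiver could look like, assuming a callback that hands finished paths to the hlsHandler; beyond the FileWriterReceiverService name itself, the details here are illustrative:

```go
// Sketch of a localhost-only receiver for the transcoder's PUTs.
package core

import (
	"io"
	"net/http"
	"os"
	"path/filepath"
)

type FileWriterReceiverService struct {
	privateHLSPath string
	onFileWritten  func(localPath string) // hands the path to the hlsHandler
}

func (s *FileWriterReceiverService) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPut {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	localPath := filepath.Join(s.privateHLSPath, filepath.Clean(r.URL.Path))
	_ = os.MkdirAll(filepath.Dir(localPath), 0700)

	f, err := os.Create(localPath)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer f.Close()
	if _, err := io.Copy(f, r.Body); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	s.onFileWritten(localPath)
	w.WriteHeader(http.StatusOK)
}

func (s *FileWriterReceiverService) Start() error {
	// Bind to loopback only so nothing outside this machine can push files.
	return http.ListenAndServe("127.0.0.1:8089", s)
}
```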

Handling HLS updates

There's a new middleman, the hlsHandler, that is given a storage provider, is told about HLS updates, and passes them on to that storage provider. While this seems like a useless middle layer at the moment, it's going to be key to the future recordings (#102) functionality, as the hlsHandler will not only pass live segments to the storage provider but will also be responsible for building the ongoing recording.
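
Here's an illustrative sketch of that middle layer, assuming a simple Save-based storage interface; the method names are hypothetical:

```go
// Sketch of the hlsHandler middleman described above.
package core

type StorageProvider interface {
	Save(localFilePath string, retryCount int) (remotePath string, err error)
}

type HLSHandler struct {
	Storage StorageProvider
}

func (h *HLSHandler) SegmentWritten(localFilePath string) {
	// Today: just hand the segment to the storage provider.
	h.Storage.Save(localFilePath, 0)
	// Later (#102): also append this segment to an ongoing recording.
}

func (h *HLSHandler) VariantPlaylistWritten(localFilePath string) {
	h.Storage.Save(localFilePath, 0)
}
```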

Storage providers

Storage providers are now more standardized. All writing to disk is done to the "private" HLS path, and as a result "local" is now a first-class storage provider that is treated the same as S3. In this case the "Save" task of the local provider simply moves the file from the private HLS path to the public one (under the webroot).
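
A sketch of the local-provider idea under those assumptions, where Save is just a move from the private path to the public one (paths and names are illustrative):

```go
// Sketch of a "local" provider whose Save moves a file from the private
// HLS path into the public webroot.
package storage

import (
	"os"
	"path/filepath"
	"strings"
)

type LocalStorage struct {
	privateHLSPath string // where the transcoder output lands first
	publicHLSPath  string // under the webroot, served to viewers
}

func (l *LocalStorage) Save(localFilePath string, retryCount int) (string, error) {
	rel := strings.TrimPrefix(localFilePath, l.privateHLSPath)
	dest := filepath.Join(l.publicHLSPath, rel)
	if err := os.MkdirAll(filepath.Dir(dest), 0755); err != nil {
		return "", err
	}
	return dest, os.Rename(localFilePath, dest)
}
```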

S3, as our remote storage provider, now monitors for long-running save operations and alerts on them in the console.

Changes to how we reference remotely stored HLS content

Previously, for remote storage providers, we would rewrite each variant's HLS playlist to point to the absolute URLs of the remote segments. Now this is simplified: we only rewrite the master playlist, and we upload the variant playlists to remote storage along with the segments. That means playlist requests never hit the Owncast server, AND all the work of rewriting the playlists every few seconds is no longer needed, because the variant playlists can reference segments by relative URL.
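
As a sketch of the simplified rewrite (not the actual implementation): only the URI lines in the master playlist get pointed at remote storage, and everything else is uploaded untouched:

```go
// Sketch: rewrite only the master playlist so each variant playlist
// reference becomes an absolute remote URL. Variant playlists keep their
// relative segment references and are uploaded as-is.
package storage

import "strings"

func rewriteMasterPlaylist(master string, remoteBaseURL string) string {
	var out []string
	for _, line := range strings.Split(master, "\n") {
		trimmed := strings.TrimSpace(line)
		// In a master playlist, non-tag, non-blank lines are playlist URIs.
		if trimmed != "" && !strings.HasPrefix(trimmed, "#") {
			line = remoteBaseURL + "/" + trimmed
		}
		out = append(out, line)
	}
	return strings.Join(out, "\n")
}
```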

Offline state

There are two scenarios with generating offline content:

  1. A "reset" state where the only content is the offline video clip. This is what happens when you first start the server and after the 5min reset timer fires. It simply passes the offline clip to the transcoder and treats it like a new, short stream.

  2. Appending the offline clip to an existing stream. This is what happens when a live stream ends and the transcoder completes its work. In this case the existing HLS playlist is manually edited and a single segment is appended to the end. This happens for every configured variant's playlist (see the sketch below).
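
A sketch of what the scenario 2 playlist edit amounts to, assuming the playlist doesn't yet contain an #EXT-X-ENDLIST tag; the discontinuity tag and helper name here are my additions, not necessarily what the PR does:

```go
// Sketch: append a single pre-transcoded offline segment to the end of an
// existing variant playlist. The caller must know the clip's exact duration.
package video

import (
	"fmt"
	"os"
)

func appendOfflineSegment(playlistPath string, segmentFile string, duration float64) error {
	f, err := os.OpenFile(playlistPath, os.O_APPEND|os.O_WRONLY, 0644)
	if err != nil {
		return err
	}
	defer f.Close()

	// #EXT-X-DISCONTINUITY tells players the next segment doesn't continue
	// the previous encode, which is true for the offline clip.
	_, err = fmt.Fprintf(f, "#EXT-X-DISCONTINUITY\n#EXTINF:%.3f,\n%s\n", duration, segmentFile)
	return err
}
```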

Change
As a result of scenario 2, I removed the option to specify custom offline content in the config file, because we must know exactly how long the clip is in order to append it to the HLS playlist correctly. The clip is also pre-transcoded to a .ts file, so it can simply be appended without requiring a transcoding step. I could see a future where we allow a little more flexibility here, but right now it's best to be specific.

Cleanup

Previously we relied on a feature of ffmpeg to delete old files on our behalf. Because ffmpeg now runs decoupled from the Owncast instance and talks to it over HTTP, there's no way for it to delete these files anymore. Instead we have a new hlsFilesystemCleanup that reproduces this functionality: it deletes old live segments from disk.
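
A sketch of what such a cleanup job could look like, keeping the newest N segments per variant; the keep-count and glob pattern are assumptions:

```go
// Sketch: keep only the newest maxSegments .ts files in a variant's
// directory and delete the rest, oldest first.
package video

import (
	"os"
	"path/filepath"
	"sort"
	"time"
)

func cleanupOldSegments(variantDir string, maxSegments int) error {
	paths, err := filepath.Glob(filepath.Join(variantDir, "*.ts"))
	if err != nil {
		return err
	}

	type segment struct {
		path string
		mod  time.Time
	}
	var segments []segment
	for _, p := range paths {
		if fi, err := os.Stat(p); err == nil {
			segments = append(segments, segment{p, fi.ModTime()})
		}
	}
	if len(segments) <= maxSegments {
		return nil
	}

	// Sort oldest first, then delete everything beyond the keep window.
	sort.Slice(segments, func(i, j int) bool { return segments[i].mod.Before(segments[j].mod) })
	for _, s := range segments[:len(segments)-maxSegments] {
		if err := os.Remove(s.path); err != nil {
			return err
		}
	}
	return nil
}
```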

Performance Monitoring

The new performanceTimer.go utility measures the average time it takes to create segments and upload them (if using external storage). It's not totally scientific, and it tries to throw out outliers. If the timings run too long, warnings are displayed in the console:

```
WARN[2020-10-06T22:45:14-07:00] slow encoding for variant 0 if this continues you may see buffering or errors. troubleshoot this issue by visiting https://owncast.online/docs/troubleshooting/
WARN[2020-10-06T22:45:14-07:00] Possible slow uploads: average upload S3 save duration 5.0839772650000001 ms troubleshoot this issue by visiting https://owncast.online/docs/troubleshooting/
```
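
The general idea, sketched (the trimming rule and threshold below are made up for illustration, not what performanceTimer.go actually does):

```go
// Sketch: collect durations, drop obvious outliers, and warn when the
// average crosses a threshold.
package utils

import (
	"log"
	"sort"
	"time"
)

type performanceTimer struct {
	durations []time.Duration
}

func (t *performanceTimer) Track(d time.Duration) {
	t.durations = append(t.durations, d)
}

// Average of the middle 80%, discarding the fastest and slowest 10% as outliers.
func (t *performanceTimer) Average() time.Duration {
	if len(t.durations) < 10 {
		return 0
	}
	sorted := append([]time.Duration(nil), t.durations...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	trim := len(sorted) / 10
	kept := sorted[trim : len(sorted)-trim]
	var total time.Duration
	for _, d := range kept {
		total += d
	}
	return total / time.Duration(len(kept))
}

func warnIfSlow(t *performanceTimer, threshold time.Duration) {
	if avg := t.Average(); avg > threshold {
		log.Printf("Possible slow uploads: average S3 save duration %v", avg)
	}
}
```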

@gabek added this to the v0.0.3 milestone Sep 16, 2020
@gabek marked this pull request as draft September 16, 2020
@gabek marked this pull request as ready for review October 7, 2020
@gabek changed the title from "WIP: HLS refactor" to "HLS video handling/storage/state refactor" Oct 7, 2020
@mattdsteele (Contributor) commented

Conceptually this all sounds good. If I've gleaned anything from overhearing colleagues discuss k8s integration patterns, using localhost-only HTTP is pretty common these days.

> Because ffmpeg now runs decoupled from the Owncast instance

Could you describe this a bit more? Is this a change from how it previously worked? I'm just worried about cleaning up orphaned child processes when the main process dies.

I'll review what I can, but maybe it makes more sense for me to just try testing it out on a server, and see how it behaves?

@gabek (Member, Author) commented Oct 9, 2020

> Because ffmpeg now runs decoupled from the Owncast instance
>
> Could you describe this a bit more? Is this a change from how it previously worked? I'm just worried about cleaning up orphaned child processes when the main process dies.

It's only slightly different than it is now, and not at all as far as child processes go; that part is the same. The difference is that while ffmpeg is still the same child process running on the same machine, ffmpeg doesn't know that. Previously ffmpeg knew it was writing files to disk and could clean those files up later. Now ffmpeg doesn't use the filesystem for storing output, since it's pushing the results elsewhere. Conceptually you could extrapolate this to ffmpeg running on a completely different server and pushing the results over the network.

I'd love if you could give it a spin!

@mattdsteele (Contributor) commented

Trying it out on my instance, https://stream.steele.blue/

So far it's working great!

One thing to note: I did have to update my S3 config. Previously I was referencing the endpoint with just the hostname:

```yaml
s3:
  endpoint: us-east-1.linodeobjects.com
```

I had to update it to https://us-east-1.linodeobjects.com so the paths would work out.

From what I can tell this is all properly referenced in the docs, so I think I just had a weirdly-configured setup that happened to let me upload to S3.

@gabek (Member, Author) commented Oct 10, 2020

> From what I can tell this is all properly referenced in the docs, so I think I just had a weirdly-configured setup that happened to let me upload to S3.

Interesting! I can't think of what might have changed around that, but it does point to us wanting to normalize URLs internally when accepting free-form user-supplied strings. We should keep this in mind for 0.0.4, when we start supporting config updates through the admin site; we could probably do some validation around these values.
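
For illustration, normalization along those lines could be as simple as the following sketch; defaulting a bare hostname to https is an assumption:

```go
// Sketch: normalize a user-supplied S3 endpoint so a bare hostname like
// "us-east-1.linodeobjects.com" still resolves to a usable URL.
package main

import (
	"fmt"
	"net/url"
	"strings"
)

func normalizeEndpoint(endpoint string) (string, error) {
	if !strings.Contains(endpoint, "://") {
		endpoint = "https://" + endpoint // assume https when no scheme is given
	}
	u, err := url.Parse(endpoint)
	if err != nil || u.Host == "" {
		return "", fmt.Errorf("invalid S3 endpoint: %q", endpoint)
	}
	return u.String(), nil
}

func main() {
	normalized, _ := normalizeEndpoint("us-east-1.linodeobjects.com")
	fmt.Println(normalized) // https://us-east-1.linodeobjects.com
}
```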

@gabek (Member, Author) commented Oct 13, 2020

I plan to merge this in this week. It's a big change, and I think the next step is to get it in so those working off master can get some testing hours against it. I'll also deploy it to a testing server and run some long-duration streams on it. Between now and then, let me know if you have any major concerns.

@gabek (Member, Author) commented Oct 14, 2020

@mattdsteele @graywolf336 @geekgonecrazy @gingervitis @jeyemwey Merging this in! Let me know if you see anything functioning differently.

@gabek merged commit 6ea9aff into master Oct 14, 2020
@gabek deleted the gek/parse-output-replace-filemonitor branch October 14, 2020
@gabek mentioned this pull request Nov 18, 2020