Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to ffmpeg v4.4 and use new detector pre-loading API #1979

Merged
merged 5 commits into from
Aug 12, 2021

Conversation

jailuthra
Copy link
Contributor

@jailuthra jailuthra commented Aug 4, 2021

What does this pull request do? Explain your changes. (required)

Upgrade ffmpeg and LPMS to use v4.4, a better detector API and misc. transcoding fixes.

Specific updates (required)

81a4e06
In this commit we switch to the new session pre-loading API introduced in livepeer/lpms#246 (comment)

Earlier we used to initialize FFmpeg at the time of node startup using InitFFmpegWithDetectorProfile - which would internally load a tensorflow model and share it across instances of our scene classification filter. We did that as it was an expensive (~3s) operation that couldn't be done when a new stream came in.

The earlier approach was problematic because we shared (a non-thread-safe) Tensorflow graph across concurrent go-routines, and it was inflexible as we could only initialize one detection model per node.

In this commit, I've mostly sticked close to the legacy idea of "pre-loading a single model" at node startup, but using the newer transcode-session API. Whenever a new stream comes in, we now initialize a new detector session, which has an impact of (~500ms) on the transcode time of the first segment.

This meets the real-time threshold on my local testing, but the plan is to improve the load balancer to have a constant MinPreinitializedDetectorSessions always ready at hand before a stream comes in. Doing that will reduce the additional overhead introduced by the new API, while also preventing any thread-safety issues in the Tensorflow models.

See the commit history for rest

How did you test each of these updates (required)

  • Local benchmarks and B-O-T Testing
  • Test in staging
  • Test in a single production region

Does this pull request close any open issues?

Fixes #1962

Checklist:

@jailuthra jailuthra force-pushed the jai/ffmpeg-v4.4 branch 2 times, most recently from abae27e to 3a1279f Compare August 4, 2021 14:43
@jailuthra jailuthra changed the title transcoder: Upgrade to ffmpeg v4.4 + new detector pre-loading API Upgrade to ffmpeg v4.4 and use new detector pre-loading API Aug 4, 2021
@jailuthra jailuthra marked this pull request as ready for review August 4, 2021 15:10
core/lb.go Show resolved Hide resolved
cmd/livepeer_bench/livepeer_bench.go Outdated Show resolved Hide resolved
cmd/livepeer_bench/livepeer_bench.go Outdated Show resolved Hide resolved
cmd/livepeer_bench/livepeer_bench.go Outdated Show resolved Hide resolved
core/lb.go Show resolved Hide resolved
Copy link
Member

@yondonfu yondonfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few non-blocking questions, but LGTM

func NewNvidiaTranscoderWithDetector(detector ffmpeg.DetectorProfile, gpu string) (TranscoderSession, error) {
// Hardcode detection to device 0 for now
// Transcoding can still run on a separate GPU as we copy frames to CPU before detection
session, err := ffmpeg.NewTranscoderWithDetector(detector, "0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note that can be addressed outside of this PR:

I think this API is a bit confusing. While I know that the device ID arg is only relevant for initializing the DNN filtergraph, when I first read this function my first thought is that the device ID would be used for both the transcoder and the DNN filtergraph.

install_ffmpeg.sh Show resolved Hide resolved
Copy link
Member

@yondonfu yondonfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind also adding an entry to PENDING_CHANGELOG?

@jailuthra
Copy link
Contributor Author

Tested the last fix in a multi-GPU setup and it worked as expected. Rebased and added a PENDING_CHANGELOG entry.

Holding on a merge until the new go-livepeer release is officially out.

@yondonfu
Copy link
Member

yondonfu commented Aug 12, 2021

Tested the last fix in a multi-GPU setup and it worked as expected.

🥳 🎉

Holding on a merge until the new go-livepeer release is officially out.

The release was just published so good to merge after resolving the conflict 👍 .

@yondonfu
Copy link
Member

Btw I forgot to clear CHANGELOG_PENDING in #1986, but looks like the commit with the CHANGELOG_PENDING entry will clear all the old entries anyway 👍 .

@jailuthra jailuthra merged commit 4e2ef38 into master Aug 12, 2021
@jailuthra jailuthra deleted the jai/ffmpeg-v4.4 branch September 15, 2021 10:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade to ffmpeg 4.4
2 participants