Transcoding Verification #17

j0sh opened this issue Feb 21, 2019 · 0 comments


@j0sh j0sh commented Feb 21, 2019


Livepeer is a protocol for video transcoding. Transcoding is a compute-heavy process that has traditionally carried a high technical and monetary cost. Livepeer aims to slash the cost of transcoding with an open network that incentivizes transparent competition among suppliers of transcoding capacity. The rules of the Livepeer protocol are backed by smart contracts on the Ethereum blockchain, an immutable ledger.

The combination of an open, permissionless network and immutable non-reversibility attracts byzantine (adversarial) behavior: any participant can attempt to "break the protocol" without immediate consequences from within the Livepeer network. A verification mechanism is necessary to decide whether the transcoding work was done correctly.

Video transcoding has some specific challenges in a byzantine environment:

  1. How to incentivize correct and timely transcoding.
  2. How to verify that transcoding work was done correctly.
  3. How to run that verification as necessary in an autonomous and trustless manner.
  4. Deterrents and penalties for transcoding incorrectly.

Here we are mostly concerned with the second problem: how to verify that transcoding was done correctly. The third problem is an engineering concern that should be informed by the results of this verification research, with the on-chain components likely to use a system such as Truebit.

Livepeer's approach to the first and fourth problems depends on having "something at stake": holding collateral in the form of token, and gaining a return on investment by contributing value to the network via reliable, quick transcoding. For more information on incentives and deterrents, see the Livepeer whitepaper [1] and the Streamflow introduction [2].

The important note about having token at stake is that this represents a substantial up-front investment, and the protocol has mechanisms to slash (take away) this token in case of misbehavior -- a failed verification.

As a result, the verification system needs to balance the interests of two types of participants:

Broadcasters -- the participants that purchase transcoding work -- need to be certain they are getting what they paid for. Blockchain immutability means payments cannot be reversed.

Orchestrators -- the participants supplying transcoding work -- need to be comfortable with the risk they are taking with their stake and their long-term reputation on the network.

Both participants will also be making substantial up-front technology investments in getting Livepeer integrated into their infrastructure.

A verification system that is too loose or too strict towards either participant endangers the confidence and trust in the Livepeer network.

Verification Requirements

Emphasis should be on developing, quantifying and understanding the verification algorithm, rather than a full production-ready engineering implementation of the algorithm.

Given the null hypothesis that video is transcoded correctly, we should aim to minimize Type I errors (false positives) where correct video is marked as being transcoded incorrectly. There is more tolerance for Type II errors (false negatives) where incorrect video is marked as transcoded correctly.
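As a rough illustration of how these two error rates could be measured on a labelled test set, here is a minimal Python sketch. The `Outcome` type and `error_rates` helper are hypothetical names introduced only for this example; they are not part of any Livepeer code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Outcome:
    # Hypothetical record: ground truth plus the verifier's decision.
    correctly_transcoded: bool  # True if the segment really was transcoded correctly
    flagged: bool               # True if the verifier marked it as incorrect


def error_rates(outcomes: List[Outcome]) -> Tuple[float, float]:
    """Return (type_i_rate, type_ii_rate) for a labelled sample.

    Type I: correct video flagged as incorrect (false positive).
    Type II: incorrect video accepted as correct (false negative).
    """
    good = [o for o in outcomes if o.correctly_transcoded]
    bad = [o for o in outcomes if not o.correctly_transcoded]
    type_i = sum(o.flagged for o in good) / len(good) if good else 0.0
    type_ii = sum(not o.flagged for o in bad) / len(bad) if bad else 0.0
    return type_i, type_ii


sample = [
    Outcome(True, False), Outcome(True, False), Outcome(True, False),
    Outcome(True, True),    # honest work wrongly flagged: a Type I error
    Outcome(False, True),
    Outcome(False, False),  # bad work that slipped through: a Type II error
]
type_i_rate, type_ii_rate = error_rates(sample)
```

Because a Type I error can lead to an honest orchestrator being slashed, a candidate algorithm would be tuned to push the first number toward zero even if the second one rises somewhat.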

Video Quality Checking

There are various easily checkable properties that can be extracted from the video itself such as the codec, the resolution, timestamps, and perhaps certain other bitstream features. However those properties alone do not ensure an important aspect of verification: that the transcoded content itself is a reasonable match for the original source given a good-faith effort at transcoding.

What is a "reasonable match" and what is a "good-faith effort at transcoding"? Some problems with the video may include:

  • Watermarking or other manipulation of the source content
  • Uncalled for resolution changes mid-stream
  • Excessive frame dropping
  • Low quality encoder or inappropriate encoding settings

What criteria should we be checking in addition to video quality?

  • Codec and container itself
  • Timestamps
  • Any metadata?
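The easily checkable properties could be tested along the following lines. This is only a sketch: `SegmentInfo`, `check_properties` and the 50 ms duration tolerance are assumptions made for illustration, and the sketch assumes the codec, resolution and timestamps have already been extracted from the container by some other tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentInfo:
    # Hypothetical summary of a segment, already extracted by a demuxer.
    codec: str
    width: int
    height: int
    timestamps: List[float]  # presentation timestamps, in seconds


def check_properties(source: SegmentInfo, rendition: SegmentInfo,
                     codec: str, width: int, height: int,
                     max_drift: float = 0.05) -> List[str]:
    """Return the names of the coarse checks that the rendition fails."""
    problems = []
    if rendition.codec != codec:
        problems.append("codec")
    if (rendition.width, rendition.height) != (width, height):
        problems.append("resolution")
    ts = rendition.timestamps
    # Timestamps must exist and be strictly increasing.
    if not ts or any(b <= a for a, b in zip(ts, ts[1:])):
        problems.append("timestamps")
    elif source.timestamps:
        # The rendition should span roughly the same time as the source.
        src_span = source.timestamps[-1] - source.timestamps[0]
        out_span = ts[-1] - ts[0]
        if abs(src_span - out_span) > max_drift:
            problems.append("duration")
    return problems


source = SegmentInfo("h264", 1920, 1080, [0.0, 0.5, 1.0, 1.5, 2.0])
good = SegmentInfo("h264", 1280, 720, [0.0, 0.5, 1.0, 1.5, 2.0])
bad = SegmentInfo("h264", 640, 360, [0.0, 0.5, 0.4])
```

Passing these checks says nothing about whether the picture content matches the source, which is why a separate quality comparison is still needed.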

General Questions

Some general questions we should answer, regardless of the verification algorithm used.

  • What is "acceptable video quality" and what's not?
  • How much deviation from the original source should reasonably be expected for a given transcoding configuration? What should our tolerances be here?
  • Do our tolerances change as the configuration changes: the encoder, the resolution, the codec, the type of input? How much of this is affected by the verification metric / algorithm?

Further Design Considerations

  • It would be good to be robust against small bugs in the verification system, or variations up to a quantifiable degree. Any requirement for bitexactness is undesirable.

  • Inputs may be irregular; we cannot predict or test all variations in input. While we can constrain the transcoding outputs to specific presets, we should understand the algorithm enough to safely extrapolate its behavior across various types of inputs.

Use Cases

  • Orchestrators verify their own work, especially if transcoding is delegated to an untrusted source

  • Broadcasters or other users check transcoded streams to ensure compliance

  • Community participants study and run the verification system to improve their own understanding of Livepeer, eg during an evaluation phase prior to becoming a broadcaster or orchestrator

  • Autonomous verifiers slash orchestrators for malpractice via a challenge protocol

  • Livepeer operationally assesses samples from the wild to evaluate both verification performance and the overall health of the network

  • Livepeer uses the system during development to further refine the verification algorithm.

Candidate Constructions

A good starting point for prior art in video quality comparisons can be found at the page for the annual Moscow State University Codec Comparison [3].


PSNR and MS-SSIM[4][5] are per-frame metrics. An implementation of these metrics is here, although many questions remain to be answered before they can be incorporated into a verification algorithm.

  • How to use these per-frame scores for video
  • How to incorporate these scores into a pass/fail classifier?
  • What does each contribute towards the classifier?
  • How are these affected by variations in input and output?
  • How is verification affected if either metric is removed from the equation?
  • Can we extrapolate the behavior of these metrics across unknown inputs? What are the boundaries?
  • What is the performance/computational impact of incorporating these metrics into the classifier? Can this be run online within our sub-2s latency budget?
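To make the first two questions concrete, here is a minimal sketch of per-frame PSNR and one naive way to fold the per-frame scores into a pass/fail decision. The 30 dB floor, the 10% bad-frame allowance and the `segment_passes` rule are placeholders invented for this example, not proposed values. Frames are assumed to be already decoded to the same resolution and pixel format, and are represented as flat lists of 8-bit sample values.

```python
import math
from typing import List, Sequence


def psnr(reference: Sequence[int], candidate: Sequence[int], peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if not reference or len(reference) != len(candidate):
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


def segment_passes(frame_scores: List[float], min_db: float = 30.0,
                   max_bad_fraction: float = 0.1) -> bool:
    """Naive classifier: fail if too many frames fall below a PSNR floor.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    if not frame_scores:
        raise ValueError("no frames to score")
    bad = sum(1 for score in frame_scores if score < min_db)
    return bad / len(frame_scores) <= max_bad_fraction
```

A rule of this shape is deliberately tolerant of a few poor frames, which fits the preference for avoiding false positives, but how the thresholds should vary with resolution, codec and content type is exactly what the questions above leave open.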

There are some concerns with the general approach of using a per-frame metric. Most of them stem from the requirement that any manipulation of the video prior to encoding (frame rate adjustment, rescaling) has to match a reference or baseline implementation. This may prove brittle in practice.

The verification-challenge protocol may have to explicitly version different implementations of certain filters. There is the risk of ossifying "buggy behavior" as canonical. It also does not leave much room for new or improved implementations to seamlessly add value to the network, requiring a slow (and possibly risky) upgrade cycle.

Some examples of frame manipulation that we currently perform:

  • Frame rate adjustment. The calculations of which frames to drop (or add) must match exactly, including for content with odd input framerates. Motion interpolation cannot be offered in the future without matching a reference implementation.

  • Rescaling: the pixel layout must match, and any conversion or scaling step should be close to a reference baseline. Can we depend on consistent performance across various hardware-accelerated scalers?
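The frame rate concern can be seen in a small sketch. Below, two equally reasonable rules for choosing which source frame feeds each output frame (round down versus round to nearest) pick different frames for the same input, so a per-frame comparison against a reference that used the other rule would compare mismatched frames. `select_frames` is a hypothetical helper that assumes integer frame rates; it is not how any particular encoder or filter behaves.

```python
import math
from fractions import Fraction
from typing import List


def select_frames(n_source: int, src_fps: int, dst_fps: int,
                  nearest: bool = False) -> List[int]:
    """Pick the source frame index used for each output frame.

    nearest=False maps each output tick to the source frame at or before it;
    nearest=True maps it to the closest source frame (ties round up).
    Exact fractions are used so the result does not depend on float rounding.
    """
    n_out = n_source * dst_fps // src_fps
    step = Fraction(src_fps, dst_fps)       # source frames per output frame
    offset = Fraction(1, 2) if nearest else Fraction(0)
    return [min(math.floor(k * step + offset), n_source - 1) for k in range(n_out)]


floor_rule = select_frames(10, 30, 20)                   # [0, 1, 3, 4, 6, 7]
nearest_rule = select_frames(10, 30, 20, nearest=True)   # [0, 2, 3, 5, 6, 8]
```

Half of the output frames differ between the two rules even in this tiny case, which is the kind of divergence a verifier would either have to tolerate or rule out by pinning one canonical behavior.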

Netflix VMAF

VMAF [6] is a perceptual video quality assessment tool developed by Netflix. While VMAF seems promising on paper, it is unclear whether a given model would be robust enough to accommodate the diversity of Livepeer's video inputs, how well it would perform, and how to reconcile VMAF scores into a classifier. VMAF may still warrant a brief investigation to better understand the issues.

Trusted Execution Environment

The use of "secure enclaves" such as Intel SGX may offer a way to cryptographically ensure the execution of the expected computations via remote attestation. This approach appears attractive for a few reasons:

  • May remove the need for slashing if the work can be cryptographically verified. Payment could be conditional on a multiparty signature scheme, part of which is an attestation from the enclave regarding the work done.

  • Allows for more fine grained transcoding options beyond a coarse knob for resolution or framerate.

However, there are some drawbacks to the use of TEEs:

  • Only authorized software implementations may be allowed on the network. This may harm the larger goal of decentralization and incentivizing a marketplace for improvements outside the core Livepeer protocol.

  • Unclear if this can safely be extended to non-CPU operations, eg GPU transcoding.

  • Research probing for weaknesses in trusted-computing schemes is ongoing; the security is not yet a given. This may impact the practical security of such a system for Livepeer if any fundamental (easy to exploit, hard to overcome) vulnerabilities are discovered.

Another possible application for TEEs is to "verify the verification" in place of an on-chain execution environment such as Truebit. This still has the challenge of developing a reliable verification algorithm, but may be a component of a production integration.


The following should be considered out of scope for the initial stages of the research, although they may be interesting areas to explore later.

Coarse Checking

We can limit the verification to easily checkable properties such as the codec and the resolution, along with video quality.

Transcoders have a natural incentive to do only the minimal amount of work needed to pass verification. Absent additional enforcement, there is no reason to expend extra computation to optimize compression and visual quality. In many ways, this limits how Livepeer can be used: we cannot get more granular than a given codec or resolution. Encoder-specific options cannot be sincerely offered, such as multi-pass encodes, or certain speed or tuning presets designed to improve compression and visual quality.

Livepeer is very interested in seeing if there is a viable path towards the level of sensitivity that would allow for the efficient verification of "additional" computational effort or the incorporation of particular encoding options affecting image quality.

However, the focus should first be on developing and deploying a coarse binary classifier. Determining what needs to be done to support more fine-grained encoding options can be left as a follow-up.

Audio Verification

Eventually we should be able to verify audio as well. For now we should rely on stream copy for the audio stream, which can be done in a bitexact manner.


Livepeer operates on two-second segments, so it is extremely sensitive to latency and overhead during live streaming. However, Livepeer does not penalize orchestrators for accepting work but not performing it, or for performing the work slowly (as long as it is done correctly).

Broadcasters have their own mechanisms for handling this: they can stop working with a given orchestrator and move on to someone more reliable. Orchestrators sacrifice the opportunity to develop a long-term working relationship with broadcasters, and providing unreliable service harms their own investment in the network.

[1] Livepeer Whitepaper: Slashing
[2] Streamflow Introduction
[3] Moscow State University Codec Comparison
[4] MS-SSIM Paper
[5] PSNR and SSIM
[6] Netflix VMAF
