This repository has been archived by the owner on Mar 17, 2022. It is now read-only.
asquared edited this page Mar 14, 2011 · 6 revisions

HD Instant Replay Performance Benchmarking

Performance Testing

Andrew's Desktop

Experiment Setup

  • 2x Opteron 4180 2.6GHz
  • 16GB RAM (8GB per CPU).
  • Asus KCMA-D8 motherboard.
  • Testing with complete sports intro (1,978 frames) at 1920x1080.
  • Sending RGB data as YUV data (because FFmpeg is stupid). The UYVY (4:2:2) -> YUV (4:4:4) conversion was bypassed. Used libjpeg-turbo for testing.

Test Results

  • Single thread: avg 45.0 fps.
  • Two threads (both CPUs saturated): 47.0 fps, 45.0 fps.
  • Eight threads: 38.9, 39.3, 39.3, 39.4, 39.4, 38.8, 39.7, 39.0 fps.
  • Twelve threads: average 39.1 fps.
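To put the per-thread numbers in perspective, here is a rough sketch of the arithmetic. It assumes the listed figures are per-thread rates and that the threads do independent work, so their rates can be summed; the function name `summarize` is just for illustration.

```python
from statistics import mean

# Figures copied from the test results above.
SINGLE_THREAD_FPS = 45.0
EIGHT_THREAD_FPS = [38.9, 39.3, 39.3, 39.4, 39.4, 38.8, 39.7, 39.0]


def summarize(per_thread_fps, baseline_fps):
    """Summarize a multi-thread run against a single-thread baseline.

    Assumption: each entry is one thread's own rate, so the sum is the
    aggregate throughput of the whole machine.
    """
    avg = mean(per_thread_fps)
    return {
        "threads": len(per_thread_fps),
        "mean_fps": avg,
        "aggregate_fps": sum(per_thread_fps),
        # Fraction of single-thread speed each thread still achieves.
        "per_thread_efficiency": avg / baseline_fps,
    }


eight = summarize(EIGHT_THREAD_FPS, SINGLE_THREAD_FPS)
print(eight)
```

Under those assumptions, each of the eight threads keeps most of its single-thread speed, and the aggregate rate is what matters for the number of cameras that can be handled.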

Each processor chewing on its own set of 60 frames...

  • Twelve threads: average 37.1 fps, probably because cores can't "cheat" by sharing L3 cache data in this test.
  • One core was dangerously close to dropping frames, at an average speed of 32 fps.

Conclusions

  • The processor is fast. My code is not optimized.
  • Average bitrate on a "random sample" of sports footage (i.e. the sports intro): 31,085 kbps, at a rather low JPEG quality setting. Of course, M-JPEG is VBR, so this figure will vary with content.
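For a feel of what that bitrate means for storage, a minimal sketch of the conversion follows. It assumes 1 kbps = 1000 bits per second and a camera rate of 30 fps; neither is stated in these notes, and the helper names are hypothetical.

```python
AVERAGE_KBPS = 31085  # measured average bitrate from the notes above


def bytes_per_second(kbps):
    """Convert kilobits per second to bytes per second (1 kbit = 1000 bits, assumed)."""
    return kbps * 1000 / 8


def bytes_per_frame(kbps, fps=30.0):
    """Average compressed frame size; the 30 fps default is an assumption."""
    return bytes_per_second(kbps) / fps


def gigabytes_per_hour(kbps):
    """Storage needed per camera per hour, in decimal gigabytes."""
    return bytes_per_second(kbps) * 3600 / 1e9


print(bytes_per_second(AVERAGE_KBPS))
print(bytes_per_frame(AVERAGE_KBPS))
print(gigabytes_per_hour(AVERAGE_KBPS))
```

Since M-JPEG is VBR, these are averages for this particular clip only.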

Direct YCbCr422 feed

Using some preliminary openreplay2 code: conversion from CbYCrY422 packed to YCbCr422 planar, followed by "raw" libjpeg compression. This obtained 57.9 fps from one core on the Opteron box.
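The packed-to-planar step can be illustrated with a short sketch. This is not the openreplay2 routine, just a plain Python illustration of the same byte shuffle, assuming the usual UYVY byte order (Cb, Y0, Cr, Y1) for CbYCrY422; the name `unpack_uyvy` is hypothetical.

```python
def unpack_uyvy(packed, width, height):
    """Split a packed CbYCrY (UYVY) 4:2:2 frame into planar Y, Cb, Cr.

    Assumed byte order per pixel pair: Cb, Y0, Cr, Y1.
    Returns three bytes objects: Y (width*height bytes) and
    Cb, Cr (width*height/2 bytes each).
    """
    if width % 2:
        raise ValueError("4:2:2 requires an even width")
    if len(packed) != width * height * 2:
        raise ValueError("buffer size does not match width x height")
    y = packed[1::2]   # every odd byte is a luma sample
    cb = packed[0::4]  # first byte of each 4-byte group
    cr = packed[2::4]  # third byte of each 4-byte group
    return y, cb, cr


# Tiny 4x1 example frame: two pixel pairs.
frame = bytes([100, 1, 200, 2, 101, 3, 201, 4])
y, cb, cr = unpack_uyvy(frame, width=4, height=1)
print(list(y), list(cb), list(cr))
```

The chroma planes stay at half horizontal resolution, which is what lets libjpeg skip its own resampling when fed "raw" data.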

Multithreaded experiment

Split the sports intro into 60-frame pieces. Start all encoding jobs simultaneously and measure the runtime of each ((time tests/mjpeg_422_encode < $i > $i.mjpg) > $i.time 2>&1). Take the maximum time as the time to encode the entire sports intro.

Result: the entire sports intro was encoded to M-JPEG in 3.763 sec. The M-JPEG segments were reassembled and the video was viewed to confirm correct encoding. This corresponds to an average M-JPEG encode rate of just over 525 frames per second, which is sufficient to encode over sixteen HD cameras simultaneously. Judging by the amount of CPU time used, the average encode rate seems to be about 60 frames per CPU second. (In theory, this means nearly twenty-four cameras could be supported.)
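The bookkeeping behind this experiment can be sketched as follows. The 1,978-frame count and 3.763 sec figure come from these notes; the 30 fps per-camera rate is an assumption, and the function names are made up for illustration.

```python
TOTAL_FRAMES = 1978   # complete sports intro
CHUNK_SIZE = 60       # frames per encoding job
WALL_TIME_SEC = 3.763 # slowest job, taken as the total encode time


def split_into_chunks(total_frames, chunk_size):
    """Return (start, end) frame ranges, end exclusive, one per encoding job."""
    return [
        (start, min(start + chunk_size, total_frames))
        for start in range(0, total_frames, chunk_size)
    ]


def effective_fps(total_frames, job_times):
    """All jobs start together, so the slowest job sets the wall-clock time."""
    return total_frames / max(job_times)


def cameras_supported(fps, camera_fps=30.0):
    """Number of real-time camera feeds a given encode rate could sustain.

    The 30 fps default is an assumption, not a figure from the notes.
    """
    return fps / camera_fps


chunks = split_into_chunks(TOTAL_FRAMES, CHUNK_SIZE)
# Only the maximum job time was recorded, so it is the only entry here.
fps = effective_fps(TOTAL_FRAMES, [WALL_TIME_SEC])
print(len(chunks), chunks[-1], fps, cameras_supported(fps))
```

Note that the final piece is shorter than 60 frames, since 1,978 is not a multiple of 60.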

Further, all of this is without any SSE optimization of the unpack routine. That could improve performance, though by how much is as yet unknown.

Codec Notes

  • libjpeg wants planar image data. An SSE routine should be constructed to convert packed UYVY data to planar YUV 4:2:2 data. This should improve speed (no resampling in libjpeg) and also image quality. See the section in the manual on "raw (downsampled) data".
  • Some thought should be given to preview; a good way to do that remains unknown.