scaling incredibly slow if smpte2084 used as input transfer function #59

Closed
lvml opened this issue Nov 26, 2016 · 6 comments

@lvml

lvml commented Nov 26, 2016

Scaling with zimg becomes incredibly slow (0.2 fps on a Core i7) if smpte2084 is used as the input transfer function.

perf top reveals that the vast majority of CPU time is then spent in zimg::colorspace::st_2084_inverse_gamma for calls to zimg_x_powf calling powf:

  89.35%  libm-2.17.so      [.] __powf_finite
   3.13%  libzimg.so.2.0.0  [.] zimg::colorspace::st_2084_inverse_gamma
   2.95%  libm-2.17.so      [.] __powf
   1.11%  libzimg.so.2.0.0  [.] zimg::colorspace::(anonymous namespace)::Rec709GammaOperationC::process
   0.77%  libzimg.so.2.0.0  [.] zimg::colorspace::(anonymous namespace)::MatrixOperationAVX::process
   0.75%  libzimg.so.2.0.0  [.] zimg::colorspace::(anonymous namespace)::St2084InverseGammaOperationC::process
   0.35%  libzimg.so.2.0.0  [.] zimg::colorspace::rec_709_gamma
   0.21%  libzimg.so.2.0.0  [.] zimg::resize::(anonymous namespace)::resize_line8_h_f32_avx<2u, 2u>
   0.19%  libzimg.so.2.0.0  [.] zimg::depth::depth_convert_w2f_sse2
   0.15%  libzimg.so.2.0.0  [.] zimg::resize::(anonymous namespace)::resize_line_v_f32_avx<3u, false>
   0.14%  libzimg.so.2.0.0  [.] zimg::resize::(anonymous namespace)::ResizeImplH_F32_AVX::process
   0.14%  libzimg.so.2.0.0  [.] zimg::depth::ordered_dither_f2b_sse2
   0.09%  libzimg.so.2.0.0  [.] zimg::resize::(anonymous namespace)::resize_line8_h_f32_avx<3u, 3u>
   0.07%  libzimg.so.2.0.0  [.] zimg::graph::(anonymous namespace)::FilterNodeUV::generate_line
   0.07%  libzimg.so.2.0.0  [.] zimg::graph::(anonymous namespace)::FilterNode::generate_line

Wouldn't it be possible to just compute LUTs on initialization, so that such an expensive math function is not required for every pixel?

@sekrit-twc
Owner

What version are you using? Inexact transfer functions were added in v2.3. If you are using the API, see zimg_filter_graph_params::allow_approximate_gamma (https://github.com/sekrit-twc/zimg/blob/master/src/zimg/api/zimg.h#L571).

@lvml
Author

lvml commented Nov 26, 2016

I'm using the git master head of this zimg repository from within ffmpeg (also git master head), via ffmpeg's "-vf zscale=" filter.

It seems ffmpeg does not expose a flag to use inexact gamma.

@sekrit-twc
Owner

Surely you can report this to ffmpeg. Something like this will work:

#if ZIMG_API_VERSION >= ZIMG_MAKE_API_VERSION(2, 2)
params.allow_approximate_gamma = 1;
#endif

@lvml
Author

lvml commented Nov 26, 2016

I changed libavfilter/vf_zscale.c to set allow_approximate_gamma=1, and yes, that makes a very big difference. Thanks a lot!

Now I only need to solve the remaining issue of the bt709 result looking somewhat too dark or the highlights blown out when using a lower npl value, but that's a different topic.

@sekrit-twc
Owner

If you are converting from ST2084 to Rec.709, you are going to get clipping artifacts. HDR support in z.lib is only intended for HDR-to-HDR or SDR-to-HDR.

@lvml
Author

lvml commented Nov 26, 2016

(Created https://trac.ffmpeg.org/ticket/5980#ticket regarding parameter support in ffmpeg.)
