Processing chroma planes separately as luma is faster #4
Comments
The pros and cons I can think about using
I'm personally interested in the ratio of initialization time to actual processing time, but it might not be easy to estimate quantitatively in this example. The split-join approach does have some freedom in the order in which the debanded YUV planes are created, provided the buffer size allows, but the planes are also regularly joined into frames (which should be instant), so we won't know how much it benefits from that freedom. In contrast, when I tried to bench the YUV444 clips using the split-join approach, processing each plane took nearly equal time and the GPU memory bandwidth of my PC seemed to be fully used (so I failed to estimate it quantitatively again). |
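One rough way to get at the initialization-versus-processing ratio is to time the first frame separately from the rest. The sketch below is only an illustration: `init_vs_steady` and its `clock` parameter are hypothetical names, not part of VapourSynth or vs-placebo, and it assumes the first frame request absorbs most of the setup cost.

```python
import time
from typing import Callable, Tuple


def init_vs_steady(
    render_frame: Callable[[int], object],
    n_frames: int,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (seconds for frame 0, mean seconds per later frame).

    render_frame is any callable that produces frame n; the helper is
    hypothetical and assumes setup cost lands on the first request.
    """
    if n_frames < 2:
        raise ValueError("need at least 2 frames to separate init from steady state")
    t0 = clock()
    render_frame(0)  # first request: includes any lazy initialization
    t1 = clock()
    for n in range(1, n_frames):
        render_frame(n)  # later requests: steady-state cost only
    t2 = clock()
    return t1 - t0, (t2 - t1) / (n_frames - 1)
```

With a VapourSynth clip, `clip.get_frame` could be passed as `render_frame`. Frames are then requested one at a time, so the steady-state figure will likely understate what a threaded render achieves; it is meant for comparing the two approaches, not for absolute fps.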
@kgrabs can you try this one and report the fps for both clips? http://maven.whatbox.ca:11665/f/libvs_placebo.dll |
Same problem, but it seems to have improved a bit. After testing it 3 times for consistency I tested the current release immediately afterward and only barely got 71 fps each time |
wait, I may just not be. e348108 diminishes the speed differences for me, but I can’t really tell because the differences were always pretty small to begin with on my shitty 2010 rig, so testing would be appreciated: http://maven.whatbox.ca:11665/f/libvs_placebo.dll |
Is this still an issue with 1.4.1? Also, 100 FPS seems pretty slow for debanding only. |
Doesn't seem to be the case, still slower:
from vsutil import split, join
import vapoursynth as vs
core = vs.core
clip = core.std.BlankClip(None, 1920, 1080, vs.YUV420P16, 30000)
a = core.placebo.Deband(clip, planes=1 | 2 | 4, threshold=4, radius=16, grain=0)
b = join([core.placebo.Deband(x, planes=1, threshold=4, radius=16, grain=0) for x in split(clip)])
a.set_output(0)
b.set_output(1)
|
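For readers unfamiliar with the `planes=1 | 2 | 4` argument in the script above: it appears to be a bitmask with one bit per plane, so `1` selects the first plane, `2` the second, and `4` the third. The helper below is a small sketch under that assumption; `plane_mask` is a hypothetical name and not part of vs-placebo.

```python
from typing import Iterable


def plane_mask(indices: Iterable[int]) -> int:
    """Build a planes bitmask from zero-based plane indices.

    Assumes bit i selects plane i (0 = Y, 1 = U, 2 = V), as the
    script's planes=1 | 2 | 4 suggests. Hypothetical helper.
    """
    mask = 0
    for i in indices:
        if i not in (0, 1, 2):
            raise ValueError(f"plane index out of range: {i}")
        mask |= 1 << i  # set the bit for this plane
    return mask


print(plane_mask([0, 1, 2]))  # 7, the same value as 1 | 2 | 4
print(plane_mask([0]))        # 1, the value used per plane in the split-join variant
```

In the split-join variant every plane is handed to the filter as its own single-plane clip, which is why `planes=1` is used for all three.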
Thanks. I'll attempt to figure out why. |
I've reimplemented the idea from the refactor branch in master, and it's around 10% slower on average compared to processing planes separately. At some point it's possible the speed is limited by the sequential access to the GPU, because of thread safety. |
I'm the author of AvisynthShader, and I had spent time investigating YUVA processing vs planar processing. Here's what I found out.
I was only experimenting with planar input/output. I haven't tried internal planar processing. |
Splitting up the planes and processing them all separately gives me a pretty big speedup.