MC AVX2/AVX512ICL #69
libavcodec/x86/vvc_mc.asm
Outdated
@@ -0,0 +1,460 @@
; /*
; * Provide SIMD mc functions for VVC decoding
; * Copyright (c) 2013 Pierre-Edouard LEPERE
Do you want to add your name?
Sure.
@QSXW Thank you for the patch.
Thank you. Seems pretty good. Which CPU did you use?
i7-11700k. I updated the details above.
bd2ff7b to c1c2ff0
Hi @QSXW,
Sure thing. AVX2 will be the top priority now.
3a81316 to 0ead44d
@nuomi2021 The AVX2 version of the 16-bit inter prediction is done.
Great! Let me try it and give you some feedback.
Could you help fix the Windows yasm build and the nasm issue?
BTW, are the 16-bit macros easy to extend to 8 bits?
No problem.
To be honest, it's not that easy, because the instructions for processing 8 bits and 16 bits are completely different. However, adding an 8-bit AVX2 version is easy; it just needs some time. No worries, I will add it ASAP!
Thank you. Did you refer to HEVC? How does it handle 8 and 16 bits?
I didn't refer to HEVC, but I did refer to dav1d.
👍, dav1d is good too.
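The thread does not say which instructions differ between the two bit depths. One place where 8-bit and 16-bit x86 filter code commonly diverges is the multiply-add step: `pmaddubsw` multiplies unsigned bytes by signed bytes and sums adjacent pairs into saturated 16-bit lanes, while `pmaddwd` multiplies signed words and sums adjacent pairs into 32-bit lanes. The scalar C sketch below models a single output lane of each instruction. The function names `maddubs_lane` and `maddwd_lane` are hypothetical, and whether this patch uses these instructions is an assumption, not something the conversation confirms.

```c
#include <stdint.h>

/* Scalar model of one output lane of PMADDUBSW (hypothetical helper name):
 * unsigned 8-bit samples times signed 8-bit coefficients, adjacent pair
 * summed with signed saturation to 16 bits. */
int16_t maddubs_lane(uint8_t p0, uint8_t p1, int8_t c0, int8_t c1)
{
    int32_t sum = (int32_t)p0 * c0 + (int32_t)p1 * c1;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Scalar model of one output lane of PMADDWD (hypothetical helper name):
 * signed 16-bit samples times signed 16-bit coefficients, adjacent pair
 * summed into a 32-bit lane. The 64-bit intermediate avoids signed overflow
 * in C; the single out-of-range case (all four inputs equal to -32768)
 * depends on the compiler's narrowing conversion. */
int32_t maddwd_lane(int16_t p0, int16_t p1, int16_t c0, int16_t c1)
{
    int64_t sum = (int64_t)p0 * c0 + (int64_t)p1 * c1;
    return (int32_t)sum;
}
```

With 8-bit input a pair of taps lands in a 16-bit lane, whereas 16-bit input needs a 32-bit lane per pair, so the shuffles and packs around the multiply-add typically differ as well.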
@QSXW, https://github.com/ffvvc/FFmpeg/actions/runs/5105666368/jobs/9227821132?pr=69#step:7:413 The yasm build is still failing. Could you help fix it? Thank you.
movu m9, [srcq + r3srcq * 2 - 16]

H_COMPUTE_H8_16_AVX2 7, 8, 9, 10, 11
movu [dstq + r3srcq * 2 - 32], m7
Will this overwrite if the width is 24?
It doesn't. Do we have a case where the width is 24?
No, just curious about this.
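To make the question concrete: in AVX2 code a `movu` store of a full `ymm` register writes 32 bytes, which is 16 samples at 16 bits each. The C sketch below assumes a row loop that only ever stores whole vectors starting at the row origin; this is an assumption for illustration and not a description of what the patch does. The helper `overrun_samples` is hypothetical and reports how many samples past `width` such a loop would write.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: samples written past the end of a row of `width`
 * samples by a loop that stores only whole vectors of `lanes` samples,
 * starting at the beginning of the row. */
size_t overrun_samples(size_t width, size_t lanes)
{
    assert(lanes > 0);
    /* Round the width up to the next multiple of the vector size. */
    size_t stored = (width + lanes - 1) / lanes * lanes;
    return stored - width;
}
```

Under that assumption, a width of 24 with 16 lanes gives an overrun of 8 samples, while any width that is a multiple of 16 gives 0. A loop that anchors its last store at the end of the row would instead overlap earlier samples rather than run past the end.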
91966b8 to d124eb8
3cd2faf to b990c46
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
put_vvc_luma_hv_C: 15480181
put_vvc_luma_hv_16_avx512icl: 1649488
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_h_16_C_w4_h4: 846
ff_vvc_put_vvc_luma_h_16_avx512icl_w4_h4: 137
ff_vvc_put_vvc_luma_h_16_C_w8_h8: 1115
ff_vvc_put_vvc_luma_h_16_avx512icl_w8_h8: 80
ff_vvc_put_vvc_luma_h_16_C_w16_h16: 4531
ff_vvc_put_vvc_luma_h_16_avx512icl_w16_h16: 355
ff_vvc_put_vvc_luma_h_16_C_w32_h32: 18828
ff_vvc_put_vvc_luma_h_16_avx512icl_w32_h32: 1271
ff_vvc_put_vvc_luma_h_16_C_w64_h64: 72362
ff_vvc_put_vvc_luma_h_16_avx512icl_w64_h64: 4850
ff_vvc_put_vvc_luma_h_16_C_w128_h128: 288658
ff_vvc_put_vvc_luma_h_16_avx512icl_w128_h128: 19746
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_v_16_C_w4_h4: 399
ff_vvc_put_vvc_luma_v_16_avx512icl_w4_h4: 113
ff_vvc_put_vvc_luma_v_16_C_w8_h8: 1309
ff_vvc_put_vvc_luma_v_16_avx512icl_w8_h8: 171
ff_vvc_put_vvc_luma_v_16_C_w16_h16: 4867
ff_vvc_put_vvc_luma_v_16_avx512icl_w16_h16: 590
ff_vvc_put_vvc_luma_v_16_C_w32_h32: 18842
ff_vvc_put_vvc_luma_v_16_avx512icl_w32_h32: 2235
ff_vvc_put_vvc_luma_v_16_C_w64_h64: 73020
ff_vvc_put_vvc_luma_v_16_avx512icl_w64_h64: 8559
ff_vvc_put_vvc_luma_v_16_C_w128_h128: 286941
ff_vvc_put_vvc_luma_v_16_avx512icl_w128_h128: 34015
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_h_16_C_w4_h4: 340
ff_vvc_put_vvc_luma_h_16_avx2_w4_h4: 64
ff_vvc_put_vvc_luma_h_16_C_w8_h8: 1212
ff_vvc_put_vvc_luma_h_16_avx2_w8_h8: 120
ff_vvc_put_vvc_luma_h_16_C_w16_h16: 4684
ff_vvc_put_vvc_luma_h_16_avx2_w16_h16: 386
ff_vvc_put_vvc_luma_h_16_C_w32_h32: 21161
ff_vvc_put_vvc_luma_h_16_avx2_w32_h32: 1381
ff_vvc_put_vvc_luma_h_16_C_w64_h64: 85119
ff_vvc_put_vvc_luma_h_16_avx2_w64_h64: 5236
ff_vvc_put_vvc_luma_h_16_C_w128_h128: 320314
ff_vvc_put_vvc_luma_h_16_avx2_w128_h128: 21994
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_hv_16_C_w4_h4: 130368
ff_vvc_put_vvc_luma_hv_16_avx2_w4_h4: 24447
ff_vvc_put_vvc_luma_hv_16_C_w8_h8: 342297
ff_vvc_put_vvc_luma_hv_16_avx2_w8_h8: 36312
ff_vvc_put_vvc_luma_hv_16_C_w16_h16: 1111204
ff_vvc_put_vvc_luma_hv_16_avx2_w16_h16: 140174
ff_vvc_put_vvc_luma_hv_16_C_w32_h32: 4111344
ff_vvc_put_vvc_luma_hv_16_avx2_w32_h32: 550833
ff_vvc_put_vvc_luma_hv_16_C_w64_h64: 15383468
ff_vvc_put_vvc_luma_hv_16_avx2_w64_h64: 2204067
ff_vvc_put_vvc_luma_hv_16_C_w128_h128: 59013947
ff_vvc_put_vvc_luma_hv_16_avx2_w128_h128: 8876216
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_v_16_C_w4_h4: 38329
ff_vvc_put_vvc_luma_v_16_avx2_w4_h4: 10133
ff_vvc_put_vvc_luma_v_16_C_w8_h8: 129643
ff_vvc_put_vvc_luma_v_16_avx2_w8_h8: 18627
ff_vvc_put_vvc_luma_v_16_C_w16_h16: 473556
ff_vvc_put_vvc_luma_v_16_avx2_w16_h16: 64610
ff_vvc_put_vvc_luma_v_16_C_w32_h32: 1874001
ff_vvc_put_vvc_luma_v_16_avx2_w32_h32: 251537
ff_vvc_put_vvc_luma_v_16_C_w64_h64: 7247058
ff_vvc_put_vvc_luma_v_16_avx2_w64_h64: 998391
ff_vvc_put_vvc_luma_v_16_C_w128_h128: 28602886
ff_vvc_put_vvc_luma_v_16_avx2_w128_h128: 4017060
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_h_8_C_w4_h4: 40440
ff_vvc_put_vvc_luma_h_8_avx2_w4_h4: 3009
ff_vvc_put_vvc_luma_h_8_C_w8_h8: 155320
ff_vvc_put_vvc_luma_h_8_avx2_w8_h8: 5717
ff_vvc_put_vvc_luma_h_8_C_w16_h16: 616144
ff_vvc_put_vvc_luma_h_8_avx2_w16_h16: 18478
ff_vvc_put_vvc_luma_h_8_C_w32_h32: 2462780
ff_vvc_put_vvc_luma_h_8_avx2_w32_h32: 69706
ff_vvc_put_vvc_luma_h_8_C_w64_h64: 9727819
ff_vvc_put_vvc_luma_h_8_avx2_w64_h64: 284385
ff_vvc_put_vvc_luma_h_8_C_w128_h128: 38717268
ff_vvc_put_vvc_luma_h_8_avx2_w128_h128: 1229316
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_hv_8_C_w4_h4: 130862
ff_vvc_put_vvc_luma_hv_8_avx2_w4_h4: 16154
ff_vvc_put_vvc_luma_hv_8_C_w8_h8: 344271
ff_vvc_put_vvc_luma_hv_8_avx2_w8_h8: 25723
ff_vvc_put_vvc_luma_hv_8_C_w16_h16: 1134516
ff_vvc_put_vvc_luma_hv_8_avx2_w16_h16: 95782
ff_vvc_put_vvc_luma_hv_8_C_w32_h32: 4038029
ff_vvc_put_vvc_luma_hv_8_avx2_w32_h32: 380669
ff_vvc_put_vvc_luma_hv_8_C_w64_h64: 15027688
ff_vvc_put_vvc_luma_hv_8_avx2_w64_h64: 1500418
ff_vvc_put_vvc_luma_hv_8_C_w128_h128: 58250619
ff_vvc_put_vvc_luma_hv_8_avx2_w128_h128: 6018737
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_hv_8_C_w4_h4: 43011
ff_vvc_put_vvc_luma_hv_8_avx2_w4_h4: 6294
ff_vvc_put_vvc_luma_hv_8_C_w8_h8: 158762
ff_vvc_put_vvc_luma_hv_8_avx2_w8_h8: 14103
ff_vvc_put_vvc_luma_hv_8_C_w16_h16: 647619
ff_vvc_put_vvc_luma_hv_8_avx2_w16_h16: 48283
ff_vvc_put_vvc_luma_hv_8_C_w32_h32: 2572971
ff_vvc_put_vvc_luma_hv_8_avx2_w32_h32: 185575
ff_vvc_put_vvc_luma_hv_8_C_w64_h64: 10401320
ff_vvc_put_vvc_luma_hv_8_avx2_w64_h64: 714610
ff_vvc_put_vvc_luma_hv_8_C_w128_h128: 41994567
ff_vvc_put_vvc_luma_hv_8_avx2_w128_h128: 2878383
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
928cdf0 to f15b9cf
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
Merged.
put_vvc_luma_hv_10_C: 15480181
put_vvc_luma_hv_10_avx512icl: 1649488
ff_vvc_put_vvc_luma_h_16_C_w4_h4: 846
ff_vvc_put_vvc_luma_h_16_avx512icl_w4_h4: 137
ff_vvc_put_vvc_luma_h_16_C_w4_h8: 1591
ff_vvc_put_vvc_luma_h_16_avx512icl_w4_h8: 221
ff_vvc_put_vvc_luma_h_16_C_w4_h16: 2456
ff_vvc_put_vvc_luma_h_16_avx512icl_w4_h16: 329
ff_vvc_put_vvc_luma_h_16_C_w4_h32: 3652
ff_vvc_put_vvc_luma_h_16_avx512icl_w4_h32: 565
ff_vvc_put_vvc_luma_h_16_C_w4_h64: 4842
ff_vvc_put_vvc_luma_h_16_avx512icl_w4_h64: 800
ff_vvc_put_vvc_luma_h_16_C_w4_h128: 11294
ff_vvc_put_vvc_luma_h_16_avx512icl_w4_h128: 1548
ff_vvc_put_vvc_luma_h_16_C_w8_h4: 577
ff_vvc_put_vvc_luma_h_16_avx512icl_w8_h4: 89
ff_vvc_put_vvc_luma_h_16_C_w8_h8: 1115
ff_vvc_put_vvc_luma_h_16_avx512icl_w8_h8: 80
ff_vvc_put_vvc_luma_h_16_C_w8_h16: 2252
ff_vvc_put_vvc_luma_h_16_avx512icl_w8_h16: 192
ff_vvc_put_vvc_luma_h_16_C_w8_h32: 4620
ff_vvc_put_vvc_luma_h_16_avx512icl_w8_h32: 373
ff_vvc_put_vvc_luma_h_16_C_w8_h64: 9126
ff_vvc_put_vvc_luma_h_16_avx512icl_w8_h64: 677
ff_vvc_put_vvc_luma_h_16_C_w8_h128: 20130
ff_vvc_put_vvc_luma_h_16_avx512icl_w8_h128: 1495
ff_vvc_put_vvc_luma_h_16_C_w16_h4: 1125
ff_vvc_put_vvc_luma_h_16_avx512icl_w16_h4: 128
ff_vvc_put_vvc_luma_h_16_C_w16_h8: 2217
ff_vvc_put_vvc_luma_h_16_avx512icl_w16_h8: 204
ff_vvc_put_vvc_luma_h_16_C_w16_h16: 4531
ff_vvc_put_vvc_luma_h_16_avx512icl_w16_h16: 355
ff_vvc_put_vvc_luma_h_16_C_w16_h32: 8979
ff_vvc_put_vvc_luma_h_16_avx512icl_w16_h32: 662
ff_vvc_put_vvc_luma_h_16_C_w16_h64: 17850
ff_vvc_put_vvc_luma_h_16_avx512icl_w16_h64: 1294
ff_vvc_put_vvc_luma_h_16_C_w16_h128: 37791
ff_vvc_put_vvc_luma_h_16_avx512icl_w16_h128: 2485
ff_vvc_put_vvc_luma_h_16_C_w32_h4: 2410
ff_vvc_put_vvc_luma_h_16_avx512icl_w32_h4: 200
ff_vvc_put_vvc_luma_h_16_C_w32_h8: 4748
ff_vvc_put_vvc_luma_h_16_avx512icl_w32_h8: 353
ff_vvc_put_vvc_luma_h_16_C_w32_h16: 9376
ff_vvc_put_vvc_luma_h_16_avx512icl_w32_h16: 651
ff_vvc_put_vvc_luma_h_16_C_w32_h32: 18828
ff_vvc_put_vvc_luma_h_16_avx512icl_w32_h32: 1271
ff_vvc_put_vvc_luma_h_16_C_w32_h64: 37546
ff_vvc_put_vvc_luma_h_16_avx512icl_w32_h64: 2489
ff_vvc_put_vvc_luma_h_16_C_w32_h128: 76550
ff_vvc_put_vvc_luma_h_16_avx512icl_w32_h128: 5011
ff_vvc_put_vvc_luma_h_16_C_w64_h4: 4613
ff_vvc_put_vvc_luma_h_16_avx512icl_w64_h4: 355
ff_vvc_put_vvc_luma_h_16_C_w64_h8: 9077
ff_vvc_put_vvc_luma_h_16_avx512icl_w64_h8: 661
ff_vvc_put_vvc_luma_h_16_C_w64_h16: 18130
ff_vvc_put_vvc_luma_h_16_avx512icl_w64_h16: 1272
ff_vvc_put_vvc_luma_h_16_C_w64_h32: 36182
ff_vvc_put_vvc_luma_h_16_avx512icl_w64_h32: 2474
ff_vvc_put_vvc_luma_h_16_C_w64_h64: 72362
ff_vvc_put_vvc_luma_h_16_avx512icl_w64_h64: 4850
ff_vvc_put_vvc_luma_h_16_C_w64_h128: 144763
ff_vvc_put_vvc_luma_h_16_avx512icl_w64_h128: 9637
ff_vvc_put_vvc_luma_h_16_C_w128_h4: 8941
ff_vvc_put_vvc_luma_h_16_avx512icl_w128_h4: 661
ff_vvc_put_vvc_luma_h_16_C_w128_h8: 17812
ff_vvc_put_vvc_luma_h_16_avx512icl_w128_h8: 1266
ff_vvc_put_vvc_luma_h_16_C_w128_h16: 35624
ff_vvc_put_vvc_luma_h_16_avx512icl_w128_h16: 2464
ff_vvc_put_vvc_luma_h_16_C_w128_h32: 71536
ff_vvc_put_vvc_luma_h_16_avx512icl_w128_h32: 4867
ff_vvc_put_vvc_luma_h_16_C_w128_h64: 143315
ff_vvc_put_vvc_luma_h_16_avx512icl_w128_h64: 9930
ff_vvc_put_vvc_luma_h_16_C_w128_h128: 288658
ff_vvc_put_vvc_luma_h_16_avx512icl_w128_h128: 19746
OS: Ubuntu 22.04.2 LTS
CPU: i7-11700k
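The listing pairs each C reference count with the count for the SIMD version of the same block size. A small C sketch for turning one such pair into a speedup factor, assuming a lower count means a faster function (as the C versus SIMD pairs suggest); the helper name `speedup` is hypothetical and is not part of the patch or of any FFmpeg tool.

```c
#include <assert.h>

/* Hypothetical helper: ratio of the C reference count to the SIMD count
 * for the same block size. Values above 1.0 mean the SIMD version is faster. */
double speedup(unsigned long c_count, unsigned long simd_count)
{
    assert(simd_count > 0);
    return (double)c_count / (double)simd_count;
}
```

For example, `speedup(288658, 19746)` for the 128x128 horizontal case comes out at a little over 14.6, and `speedup(846, 137)` for the 4x4 case at roughly 6.2, so the gain grows with block size in this listing.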