MC AVX2/AVX512ICL #69

Merged
merged 11 commits into from
Jul 3, 2023

QSXW
Collaborator

@QSXW QSXW commented Apr 24, 2023

put_vvc_luma_hv_10_C: 15480181
put_vvc_luma_hv_10_avx512icl: 1649488

ff_vvc_put_vvc_luma_h_16_C_w4_h4: 846
ff_vvc_put_vvc_luma_h_16_avx512icl_w4_h4: 137
ff_vvc_put_vvc_luma_h_16_C_w4_h8: 1591
ff_vvc_put_vvc_luma_h_16_avx512icl_w4_h8: 221
ff_vvc_put_vvc_luma_h_16_C_w4_h16: 2456
ff_vvc_put_vvc_luma_h_16_avx512icl_w4_h16: 329
ff_vvc_put_vvc_luma_h_16_C_w4_h32: 3652
ff_vvc_put_vvc_luma_h_16_avx512icl_w4_h32: 565
ff_vvc_put_vvc_luma_h_16_C_w4_h64: 4842
ff_vvc_put_vvc_luma_h_16_avx512icl_w4_h64: 800
ff_vvc_put_vvc_luma_h_16_C_w4_h128: 11294
ff_vvc_put_vvc_luma_h_16_avx512icl_w4_h128: 1548
ff_vvc_put_vvc_luma_h_16_C_w8_h4: 577
ff_vvc_put_vvc_luma_h_16_avx512icl_w8_h4: 89
ff_vvc_put_vvc_luma_h_16_C_w8_h8: 1115
ff_vvc_put_vvc_luma_h_16_avx512icl_w8_h8: 80
ff_vvc_put_vvc_luma_h_16_C_w8_h16: 2252
ff_vvc_put_vvc_luma_h_16_avx512icl_w8_h16: 192
ff_vvc_put_vvc_luma_h_16_C_w8_h32: 4620
ff_vvc_put_vvc_luma_h_16_avx512icl_w8_h32: 373
ff_vvc_put_vvc_luma_h_16_C_w8_h64: 9126
ff_vvc_put_vvc_luma_h_16_avx512icl_w8_h64: 677
ff_vvc_put_vvc_luma_h_16_C_w8_h128: 20130
ff_vvc_put_vvc_luma_h_16_avx512icl_w8_h128: 1495
ff_vvc_put_vvc_luma_h_16_C_w16_h4: 1125
ff_vvc_put_vvc_luma_h_16_avx512icl_w16_h4: 128
ff_vvc_put_vvc_luma_h_16_C_w16_h8: 2217
ff_vvc_put_vvc_luma_h_16_avx512icl_w16_h8: 204
ff_vvc_put_vvc_luma_h_16_C_w16_h16: 4531
ff_vvc_put_vvc_luma_h_16_avx512icl_w16_h16: 355
ff_vvc_put_vvc_luma_h_16_C_w16_h32: 8979
ff_vvc_put_vvc_luma_h_16_avx512icl_w16_h32: 662
ff_vvc_put_vvc_luma_h_16_C_w16_h64: 17850
ff_vvc_put_vvc_luma_h_16_avx512icl_w16_h64: 1294
ff_vvc_put_vvc_luma_h_16_C_w16_h128: 37791
ff_vvc_put_vvc_luma_h_16_avx512icl_w16_h128: 2485
ff_vvc_put_vvc_luma_h_16_C_w32_h4: 2410
ff_vvc_put_vvc_luma_h_16_avx512icl_w32_h4: 200
ff_vvc_put_vvc_luma_h_16_C_w32_h8: 4748
ff_vvc_put_vvc_luma_h_16_avx512icl_w32_h8: 353
ff_vvc_put_vvc_luma_h_16_C_w32_h16: 9376
ff_vvc_put_vvc_luma_h_16_avx512icl_w32_h16: 651
ff_vvc_put_vvc_luma_h_16_C_w32_h32: 18828
ff_vvc_put_vvc_luma_h_16_avx512icl_w32_h32: 1271
ff_vvc_put_vvc_luma_h_16_C_w32_h64: 37546
ff_vvc_put_vvc_luma_h_16_avx512icl_w32_h64: 2489
ff_vvc_put_vvc_luma_h_16_C_w32_h128: 76550
ff_vvc_put_vvc_luma_h_16_avx512icl_w32_h128: 5011
ff_vvc_put_vvc_luma_h_16_C_w64_h4: 4613
ff_vvc_put_vvc_luma_h_16_avx512icl_w64_h4: 355
ff_vvc_put_vvc_luma_h_16_C_w64_h8: 9077
ff_vvc_put_vvc_luma_h_16_avx512icl_w64_h8: 661
ff_vvc_put_vvc_luma_h_16_C_w64_h16: 18130
ff_vvc_put_vvc_luma_h_16_avx512icl_w64_h16: 1272
ff_vvc_put_vvc_luma_h_16_C_w64_h32: 36182
ff_vvc_put_vvc_luma_h_16_avx512icl_w64_h32: 2474
ff_vvc_put_vvc_luma_h_16_C_w64_h64: 72362
ff_vvc_put_vvc_luma_h_16_avx512icl_w64_h64: 4850
ff_vvc_put_vvc_luma_h_16_C_w64_h128: 144763
ff_vvc_put_vvc_luma_h_16_avx512icl_w64_h128: 9637
ff_vvc_put_vvc_luma_h_16_C_w128_h4: 8941
ff_vvc_put_vvc_luma_h_16_avx512icl_w128_h4: 661
ff_vvc_put_vvc_luma_h_16_C_w128_h8: 17812
ff_vvc_put_vvc_luma_h_16_avx512icl_w128_h8: 1266
ff_vvc_put_vvc_luma_h_16_C_w128_h16: 35624
ff_vvc_put_vvc_luma_h_16_avx512icl_w128_h16: 2464
ff_vvc_put_vvc_luma_h_16_C_w128_h32: 71536
ff_vvc_put_vvc_luma_h_16_avx512icl_w128_h32: 4867
ff_vvc_put_vvc_luma_h_16_C_w128_h64: 143315
ff_vvc_put_vvc_luma_h_16_avx512icl_w128_h64: 9930
ff_vvc_put_vvc_luma_h_16_C_w128_h128: 288658
ff_vvc_put_vvc_luma_h_16_avx512icl_w128_h128: 19746

OS: Ubuntu 22.04.2 LTS
CPU: i7-11700K

clip                                       before  after  delta
Tango2_3840x2160_60_10_420_27_LD.266           33     36   9.1%
RitualDance_1920x1080_60_10_420_32_LD.266     138    146   5.8%
RitualDance_1920x1080_60_10_420_37_RA.266     150    163   7.8%
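
As a quick check on the figures above, the speedup and the percentage gain can be recomputed from the raw numbers. The sketch below is only illustrative arithmetic; the helper names `speedup` and `percent_gain` are hypothetical and not part of the patch.

```c
/* Hypothetical helper: how many times faster the SIMD version is,
 * given the two timing figures reported for the C and SIMD functions. */
static double speedup(double c_time, double simd_time)
{
    return c_time / simd_time;
}

/* Hypothetical helper: relative gain in percent between two measurements,
 * as in the before/after columns of the clip table. */
static double percent_gain(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the hv figures, `speedup(15480181, 1649488)` comes to roughly 9.4, and `percent_gain(33, 36)` comes to roughly 9.1, in line with the first row of the table.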

@@ -0,0 +1,460 @@
; /*
; * Provide SIMD mc functions for VVC decoding
; * Copyright (c) 2013 Pierre-Edouard LEPERE
Member

Do you want to add your name?

Collaborator Author

Sure.

@nuomi2021
Member

nuomi2021 commented Apr 25, 2023

@QSXW thank you for the patch.

put_vvc_luma_hv_10_C: 15480181
put_vvc_luma_hv_10_avx512icl: 1649488

@nuomi2021
Member

put_vvc_luma_hv_10_C: 15480181
put_vvc_luma_hv_10_avx512icl: 1649488

clip                                       before  after  delta
Tango2_3840x2160_60_10_420_27_LD.266           33     36   9.1%
RitualDance_1920x1080_60_10_420_32_LD.266     138    146   5.8%
RitualDance_1920x1080_60_10_420_37_RA.266     150    163   7.8%

Thank you. Seems pretty good. Which CPU did you use?

@QSXW
Collaborator Author

QSXW commented May 10, 2023

put_vvc_luma_hv_10_C: 15480181
put_vvc_luma_hv_10_avx512icl: 1649488

clip                                       before  after  delta
Tango2_3840x2160_60_10_420_27_LD.266           33     36   9.1%
RitualDance_1920x1080_60_10_420_32_LD.266     138    146   5.8%
RitualDance_1920x1080_60_10_420_37_RA.266     150    163   7.8%

Thank you. Seems pretty good. Which CPU did you use?

i7-11700K. I have updated the details above.

@QSXW QSXW force-pushed the mc_avx branch 2 times, most recently from bd2ff7b to c1c2ff0 Compare May 10, 2023 18:59
@nuomi2021
Member

nuomi2021 commented May 12, 2023

Hi @QSXW ,
Thank you for the patch.
AVX512ICL is only supported on 10xxx and 11xxx CPUs; 12xxx and 13xxx do not support it.
AVX2 is supported on 4xxx and later platforms. https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
To cover more CPUs, is it possible to give AVX2 higher priority?
Thank you.

@QSXW
Collaborator Author

QSXW commented May 12, 2023

Sure thing. AVX2 will be the top priority now.
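
For context, priority here is about which implementation gets written first. At run time a decoder can still pick the widest instruction set the CPU reports. The following is a minimal sketch of that kind of selection, assuming hypothetical flag bits `CPU_AVX2` and `CPU_AVX512ICL` and a hypothetical `select_luma_impl` helper; it is not the dispatch code from this patch.

```c
/* Hypothetical CPU feature bits for this sketch only
 * (these are not FFmpeg's AV_CPU_FLAG_* values). */
enum { CPU_AVX2 = 1 << 0, CPU_AVX512ICL = 1 << 1 };

/* Hypothetical identifiers for the available implementations. */
typedef enum { IMPL_C, IMPL_AVX2, IMPL_AVX512ICL } impl_t;

/* Start from the portable C fallback and let each supported
 * instruction set override the previous, narrower choice. */
static impl_t select_luma_impl(unsigned cpu_flags)
{
    impl_t impl = IMPL_C;

    if (cpu_flags & CPU_AVX2)
        impl = IMPL_AVX2;          /* covers the widest range of CPUs */
    if (cpu_flags & CPU_AVX512ICL)
        impl = IMPL_AVX512ICL;     /* only on CPUs that report it */

    return impl;
}
```

With this ordering, a CPU that reports only AVX2 still gets a SIMD path, which is the coverage concern raised above.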

@QSXW QSXW changed the title MC AVX512ICL MC AVX2/AVX512ICL May 24, 2023
@QSXW QSXW force-pushed the mc_avx branch 2 times, most recently from 3a81316 to 0ead44d Compare May 26, 2023 21:21
@QSXW
Collaborator Author

QSXW commented May 26, 2023

@nuomi2021 The AVX2 version of the 16-bit inter prediction is done.

@nuomi2021
Member

Great! Let me try it and give you some feedback.
Thank you.

@nuomi2021
Member

Could you help fix the Windows yasm build and the nasm issue?
Thank you.

@nuomi2021
Member

BTW, are the 16-bit macros easy to extend to 8 bits?

@QSXW
Collaborator Author

QSXW commented May 28, 2023

Could you help fix the Windows yasm build and the nasm issue?
Thank you.

No problem.

@QSXW
Collaborator Author

QSXW commented May 28, 2023

BTW, are the 16-bit macros easy to extend to 8 bits?

To be honest, it's not that easy, because the instructions for processing 8-bit and 16-bit samples are completely different. However, it's easy to add an 8-bit AVX2 version; it just needs some time. No worries, I will add it ASAP!
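
To illustrate why the two bit depths need separate code paths, here is a scalar sketch of a horizontal 8-tap filter for 8-bit and for 16-bit storage. It is an assumption-laden illustration, not the patch's code: the function names are hypothetical, the taps are the HEVC half-sample luma filter used purely as an example, and the `bit_depth - 8` shift is a simplifying assumption. The sample type differs between the two versions, so the loads and multiplies differ too; x86 SIMD code commonly uses `pmaddubsw` for bytes and `pmaddwd` for words.

```c
#include <stdint.h>

/* Example taps only: the HEVC half-sample luma filter (sums to 64).
 * A real kernel would select taps per fractional position. */
static const int8_t taps[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };

/* 8-bit input: one byte per sample, no normalisation shift needed.
 * src must be readable from src[-3] to src[width + 3]. */
static void put_luma_h_8(int16_t *dst, const uint8_t *src, int width)
{
    for (int x = 0; x < width; x++) {
        int sum = 0;
        for (int k = 0; k < 8; k++)
            sum += taps[k] * src[x + k - 3];
        dst[x] = (int16_t)sum;
    }
}

/* 16-bit storage (e.g. 10-bit video): two bytes per sample and a
 * shift (assumed here to be bit_depth - 8) to keep the result in 16 bits.
 * Accumulated in 32 bits and divided so negative sums stay well defined. */
static void put_luma_h_16(int16_t *dst, const uint16_t *src, int width,
                          int bit_depth)
{
    const int div = 1 << (bit_depth - 8);

    for (int x = 0; x < width; x++) {
        int32_t sum = 0;
        for (int k = 0; k < 8; k++)
            sum += taps[k] * (int32_t)src[x + k - 3];
        /* Floor division, equivalent to an arithmetic right shift. */
        dst[x] = (int16_t)((sum >= 0) ? sum / div : -((-sum + div - 1) / div));
    }
}
```

On a flat input both versions return the same intermediate value, for example 6400 for an 8-bit row of 100s and for a 10-bit row of 400s.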

@nuomi2021
Member

BTW, are the 16-bit macros easy to extend to 8 bits?

To be honest, it's not that easy, because the instructions for processing 8-bit and 16-bit samples are completely different. However, it's easy to add an 8-bit AVX2 version; it just needs some time. No worries, I will add it ASAP!

Thank you. Did you refer to HEVC? How does it handle 8 and 16 bits? Thank you.

@QSXW
Collaborator Author

QSXW commented May 28, 2023

BTW, are the 16-bit macros easy to extend to 8 bits?

To be honest, it's not that easy, because the instructions for processing 8-bit and 16-bit samples are completely different. However, it's easy to add an 8-bit AVX2 version; it just needs some time. No worries, I will add it ASAP!

Thank you. Did you refer to HEVC? How does it handle 8 and 16 bits? Thank you.

I didn't refer to HEVC; I referred to dav1d.

@nuomi2021
Member

nuomi2021 commented May 29, 2023

I didn't refer to HEVC; I referred to dav1d.

👍, dav1d is good too

@nuomi2021
Member

@QSXW, the yasm build is still failing: https://github.com/ffvvc/FFmpeg/actions/runs/5105666368/jobs/9227821132?pr=69#step:7:413. Could you help fix it?

thank you

libavcodec/x86/vvc_mc.asm
movu m9, [srcq + r3srcq * 2 - 16]

H_COMPUTE_H8_16_AVX2 7, 8, 9, 10, 11
movu [dstq + r3srcq * 2 - 32], m7
Member

Will this overwrite past the end of the row if the width is 24?

Collaborator Author

It doesn't. Do we ever have a width of 24?

Member

No, just curious about this.
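
The concern behind the question can be made concrete with a little arithmetic. Assuming `m7` is a 256-bit register, the store above writes 32 bytes, i.e. 16 samples of 16 bits each. The sketch below counts how far a generic loop with a fixed chunk size would run past the row for a given width. It is a general illustration with hypothetical helper names, not a description of the actual kernel, which the author states does not overwrite.

```c
/* Hypothetical helper: number of samples touched when a row of
 * `width` samples is processed in fixed chunks of `chunk` samples. */
static int samples_touched(int width, int chunk)
{
    return (width + chunk - 1) / chunk * chunk;   /* round up to a whole chunk */
}

/* Hypothetical helper: samples written beyond the end of the row. */
static int overrun(int width, int chunk)
{
    return samples_touched(width, chunk) - width;
}
```

For a chunk of 16 samples, `overrun(24, 16)` is 8, while widths such as 16, 32, 64 and 128 from the benchmarks above give 0.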

@QSXW QSXW force-pushed the mc_avx branch 6 times, most recently from 3cd2faf to b990c46 Compare June 11, 2023 17:26
QSXW added 10 commits July 2, 2023 23:10
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
put_vvc_luma_hv_C:            15480181
put_vvc_luma_hv_16_avx512icl:  1649488

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_h_16_C_w4_h4:                846
ff_vvc_put_vvc_luma_h_16_avx512icl_w4_h4:        137
ff_vvc_put_vvc_luma_h_16_C_w8_h8:               1115
ff_vvc_put_vvc_luma_h_16_avx512icl_w8_h8:         80
ff_vvc_put_vvc_luma_h_16_C_w16_h16:             4531
ff_vvc_put_vvc_luma_h_16_avx512icl_w16_h16:      355
ff_vvc_put_vvc_luma_h_16_C_w32_h32:            18828
ff_vvc_put_vvc_luma_h_16_avx512icl_w32_h32:     1271
ff_vvc_put_vvc_luma_h_16_C_w64_h64:            72362
ff_vvc_put_vvc_luma_h_16_avx512icl_w64_h64:     4850
ff_vvc_put_vvc_luma_h_16_C_w128_h128:         288658
ff_vvc_put_vvc_luma_h_16_avx512icl_w128_h128:  19746

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_v_16_C_w4_h4:                 399
ff_vvc_put_vvc_luma_v_16_avx512icl_w4_h4:         113
ff_vvc_put_vvc_luma_v_16_C_w8_h8:                1309
ff_vvc_put_vvc_luma_v_16_avx512icl_w8_h8:         171
ff_vvc_put_vvc_luma_v_16_C_w16_h16:              4867
ff_vvc_put_vvc_luma_v_16_avx512icl_w16_h16:       590
ff_vvc_put_vvc_luma_v_16_C_w32_h32:             18842
ff_vvc_put_vvc_luma_v_16_avx512icl_w32_h32:      2235
ff_vvc_put_vvc_luma_v_16_C_w64_h64:             73020
ff_vvc_put_vvc_luma_v_16_avx512icl_w64_h64:      8559
ff_vvc_put_vvc_luma_v_16_C_w128_h128:          286941
ff_vvc_put_vvc_luma_v_16_avx512icl_w128_h128:   34015

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_h_16_C_w4_h4:                340
ff_vvc_put_vvc_luma_h_16_avx2_w4_h4:              64
ff_vvc_put_vvc_luma_h_16_C_w8_h8:               1212
ff_vvc_put_vvc_luma_h_16_avx2_w8_h8:             120
ff_vvc_put_vvc_luma_h_16_C_w16_h16:             4684
ff_vvc_put_vvc_luma_h_16_avx2_w16_h16:           386
ff_vvc_put_vvc_luma_h_16_C_w32_h32:            21161
ff_vvc_put_vvc_luma_h_16_avx2_w32_h32:          1381
ff_vvc_put_vvc_luma_h_16_C_w64_h64:            85119
ff_vvc_put_vvc_luma_h_16_avx2_w64_h64:          5236
ff_vvc_put_vvc_luma_h_16_C_w128_h128:         320314
ff_vvc_put_vvc_luma_h_16_avx2_w128_h128:       21994

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_hv_16_C_w4_h4:             130368
ff_vvc_put_vvc_luma_hv_16_avx2_w4_h4:           24447
ff_vvc_put_vvc_luma_hv_16_C_w8_h8:             342297
ff_vvc_put_vvc_luma_hv_16_avx2_w8_h8:           36312
ff_vvc_put_vvc_luma_hv_16_C_w16_h16:          1111204
ff_vvc_put_vvc_luma_hv_16_avx2_w16_h16:        140174
ff_vvc_put_vvc_luma_hv_16_C_w32_h32:          4111344
ff_vvc_put_vvc_luma_hv_16_avx2_w32_h32:        550833
ff_vvc_put_vvc_luma_hv_16_C_w64_h64:         15383468
ff_vvc_put_vvc_luma_hv_16_avx2_w64_h64:       2204067
ff_vvc_put_vvc_luma_hv_16_C_w128_h128:       59013947
ff_vvc_put_vvc_luma_hv_16_avx2_w128_h128:     8876216

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_v_16_C_w4_h4:              38329
ff_vvc_put_vvc_luma_v_16_avx2_w4_h4:           10133
ff_vvc_put_vvc_luma_v_16_C_w8_h8:             129643
ff_vvc_put_vvc_luma_v_16_avx2_w8_h8:           18627
ff_vvc_put_vvc_luma_v_16_C_w16_h16:           473556
ff_vvc_put_vvc_luma_v_16_avx2_w16_h16:         64610
ff_vvc_put_vvc_luma_v_16_C_w32_h32:          1874001
ff_vvc_put_vvc_luma_v_16_avx2_w32_h32:        251537
ff_vvc_put_vvc_luma_v_16_C_w64_h64:          7247058
ff_vvc_put_vvc_luma_v_16_avx2_w64_h64:        998391
ff_vvc_put_vvc_luma_v_16_C_w128_h128:       28602886
ff_vvc_put_vvc_luma_v_16_avx2_w128_h128:     4017060

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_h_8_C_w4_h4:              40440
ff_vvc_put_vvc_luma_h_8_avx2_w4_h4:            3009
ff_vvc_put_vvc_luma_h_8_C_w8_h8:             155320
ff_vvc_put_vvc_luma_h_8_avx2_w8_h8:            5717
ff_vvc_put_vvc_luma_h_8_C_w16_h16:           616144
ff_vvc_put_vvc_luma_h_8_avx2_w16_h16:         18478
ff_vvc_put_vvc_luma_h_8_C_w32_h32:          2462780
ff_vvc_put_vvc_luma_h_8_avx2_w32_h32:         69706
ff_vvc_put_vvc_luma_h_8_C_w64_h64:          9727819
ff_vvc_put_vvc_luma_h_8_avx2_w64_h64:        284385
ff_vvc_put_vvc_luma_h_8_C_w128_h128:       38717268
ff_vvc_put_vvc_luma_h_8_avx2_w128_h128:     1229316

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_hv_8_C_w4_h4:             130862
ff_vvc_put_vvc_luma_hv_8_avx2_w4_h4:           16154
ff_vvc_put_vvc_luma_hv_8_C_w8_h8:             344271
ff_vvc_put_vvc_luma_hv_8_avx2_w8_h8:           25723
ff_vvc_put_vvc_luma_hv_8_C_w16_h16:          1134516
ff_vvc_put_vvc_luma_hv_8_avx2_w16_h16:         95782
ff_vvc_put_vvc_luma_hv_8_C_w32_h32:          4038029
ff_vvc_put_vvc_luma_hv_8_avx2_w32_h32:        380669
ff_vvc_put_vvc_luma_hv_8_C_w64_h64:         15027688
ff_vvc_put_vvc_luma_hv_8_avx2_w64_h64:       1500418
ff_vvc_put_vvc_luma_hv_8_C_w128_h128:       58250619
ff_vvc_put_vvc_luma_hv_8_avx2_w128_h128:     6018737

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
ff_vvc_put_vvc_luma_hv_8_C_w4_h4:              43011
ff_vvc_put_vvc_luma_hv_8_avx2_w4_h4:            6294
ff_vvc_put_vvc_luma_hv_8_C_w8_h8:             158762
ff_vvc_put_vvc_luma_hv_8_avx2_w8_h8:           14103
ff_vvc_put_vvc_luma_hv_8_C_w16_h16:           647619
ff_vvc_put_vvc_luma_hv_8_avx2_w16_h16:         48283
ff_vvc_put_vvc_luma_hv_8_C_w32_h32:          2572971
ff_vvc_put_vvc_luma_hv_8_avx2_w32_h32:        185575
ff_vvc_put_vvc_luma_hv_8_C_w64_h64:         10401320
ff_vvc_put_vvc_luma_hv_8_avx2_w64_h64:        714610
ff_vvc_put_vvc_luma_hv_8_C_w128_h128:       41994567
ff_vvc_put_vvc_luma_hv_8_avx2_w128_h128:     2878383

Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
@QSXW QSXW force-pushed the mc_avx branch 2 times, most recently from 928cdf0 to f15b9cf Compare July 2, 2023 18:53
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
@nuomi2021 nuomi2021 merged commit 8eec730 into ffvvc:main Jul 3, 2023
@nuomi2021
Member

Merged.
Thank you, I really appreciate it.
