Alf filter avx2 #42

Merged
merged 3 commits into from
Mar 4, 2023

nuomi2021
Member

Got an 11%~26% overall speedup on 1080p and 4K videos:

| clip | before | after | delta |
|------|-------:|------:|------:|
| RitualDance_1920x1080_60_10_420_32_LD.26 | 35 | 43 | 22.8% |
| RitualDance_1920x1080_60_10_420_37_RA.266 | 43 | 48 | 11.6% |
| Tango2_3840x2160_60_10_420_27_LD.266 | 7.9 | 10 | 26.5% |
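The delta column is the relative gain of "after" over "before", i.e. (after − before) / before. Below is a small Python check, which is not part of the PR. The one-decimal truncation is inferred from the published numbers rather than stated: 10 / 7.9 − 1 ≈ 26.58% appears as 26.5%, not 26.6%. `delta_percent` is a hypothetical helper name.

```python
import math

# (clip, before, after) copied from the table above
RESULTS = [
    ("RitualDance_1920x1080_60_10_420_32_LD.26", 35, 43),
    ("RitualDance_1920x1080_60_10_420_37_RA.266", 43, 48),
    ("Tango2_3840x2160_60_10_420_27_LD.266", 7.9, 10),
]


def delta_percent(before, after):
    """Relative gain (after - before) / before in percent.

    Assumption: truncated (not rounded) to one decimal place, which is
    what reproduces all three published deltas.
    """
    return math.trunc((after - before) / before * 1000) / 10


if __name__ == "__main__":
    for clip, before, after in RESULTS:
        print(f"{clip}: {delta_percent(before, after)}%")
```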
checkasm: all 128 tests passed
vvc_alf_filter_chroma_4x4_10_c: 657.0
vvc_alf_filter_chroma_4x4_10_avx2: 138.0
vvc_alf_filter_chroma_4x8_10_c: 1264.7
vvc_alf_filter_chroma_4x8_10_avx2: 253.5
vvc_alf_filter_chroma_4x12_10_c: 1841.7
vvc_alf_filter_chroma_4x12_10_avx2: 375.5
vvc_alf_filter_chroma_4x16_10_c: 2442.7
vvc_alf_filter_chroma_4x16_10_avx2: 491.7
vvc_alf_filter_chroma_4x20_10_c: 3057.0
vvc_alf_filter_chroma_4x20_10_avx2: 607.2
vvc_alf_filter_chroma_4x24_10_c: 3667.0
vvc_alf_filter_chroma_4x24_10_avx2: 747.5
vvc_alf_filter_chroma_4x28_10_c: 4286.7
vvc_alf_filter_chroma_4x28_10_avx2: 849.0
vvc_alf_filter_chroma_4x32_10_c: 4886.0
vvc_alf_filter_chroma_4x32_10_avx2: 967.5
vvc_alf_filter_chroma_8x4_10_c: 1250.5
vvc_alf_filter_chroma_8x4_10_avx2: 261.0
vvc_alf_filter_chroma_8x8_10_c: 2430.7
vvc_alf_filter_chroma_8x8_10_avx2: 494.7
vvc_alf_filter_chroma_8x12_10_c: 3631.2
vvc_alf_filter_chroma_8x12_10_avx2: 734.5
vvc_alf_filter_chroma_8x16_10_c: 13675.7
vvc_alf_filter_chroma_8x16_10_avx2: 972.0
vvc_alf_filter_chroma_8x20_10_c: 6212.0
vvc_alf_filter_chroma_8x20_10_avx2: 1211.0
vvc_alf_filter_chroma_8x24_10_c: 7440.7
vvc_alf_filter_chroma_8x24_10_avx2: 1447.0
vvc_alf_filter_chroma_8x28_10_c: 8460.5
vvc_alf_filter_chroma_8x28_10_avx2: 1682.5
vvc_alf_filter_chroma_8x32_10_c: 9665.2
vvc_alf_filter_chroma_8x32_10_avx2: 1917.7
vvc_alf_filter_chroma_12x4_10_c: 1865.2
vvc_alf_filter_chroma_12x4_10_avx2: 391.7
vvc_alf_filter_chroma_12x8_10_c: 3625.2
vvc_alf_filter_chroma_12x8_10_avx2: 739.0
vvc_alf_filter_chroma_12x12_10_c: 5427.5
vvc_alf_filter_chroma_12x12_10_avx2: 1094.2
vvc_alf_filter_chroma_12x16_10_c: 7237.7
vvc_alf_filter_chroma_12x16_10_avx2: 1447.2
vvc_alf_filter_chroma_12x20_10_c: 9035.2
vvc_alf_filter_chroma_12x20_10_avx2: 1805.2
vvc_alf_filter_chroma_12x24_10_c: 11135.7
vvc_alf_filter_chroma_12x24_10_avx2: 2158.2
vvc_alf_filter_chroma_12x28_10_c: 12644.0
vvc_alf_filter_chroma_12x28_10_avx2: 2511.2
vvc_alf_filter_chroma_12x32_10_c: 14441.7
vvc_alf_filter_chroma_12x32_10_avx2: 2888.0
vvc_alf_filter_chroma_16x4_10_c: 2410.0
vvc_alf_filter_chroma_16x4_10_avx2: 251.7
vvc_alf_filter_chroma_16x8_10_c: 4943.0
vvc_alf_filter_chroma_16x8_10_avx2: 479.0
vvc_alf_filter_chroma_16x12_10_c: 7235.5
vvc_alf_filter_chroma_16x12_10_avx2: 9751.0
vvc_alf_filter_chroma_16x16_10_c: 10142.7
vvc_alf_filter_chroma_16x16_10_avx2: 935.5
vvc_alf_filter_chroma_16x20_10_c: 12029.0
vvc_alf_filter_chroma_16x20_10_avx2: 1174.5
vvc_alf_filter_chroma_16x24_10_c: 14414.2
vvc_alf_filter_chroma_16x24_10_avx2: 1410.5
vvc_alf_filter_chroma_16x28_10_c: 16813.0
vvc_alf_filter_chroma_16x28_10_avx2: 1713.0
vvc_alf_filter_chroma_16x32_10_c: 19228.5
vvc_alf_filter_chroma_16x32_10_avx2: 2256.0
vvc_alf_filter_chroma_20x4_10_c: 3015.2
vvc_alf_filter_chroma_20x4_10_avx2: 371.7
vvc_alf_filter_chroma_20x8_10_c: 6170.2
vvc_alf_filter_chroma_20x8_10_avx2: 721.0
vvc_alf_filter_chroma_20x12_10_c: 9019.7
vvc_alf_filter_chroma_20x12_10_avx2: 1102.7
vvc_alf_filter_chroma_20x16_10_c: 12040.2
vvc_alf_filter_chroma_20x16_10_avx2: 1422.5
vvc_alf_filter_chroma_20x20_10_c: 15010.7
vvc_alf_filter_chroma_20x20_10_avx2: 1765.7
vvc_alf_filter_chroma_20x24_10_c: 18017.7
vvc_alf_filter_chroma_20x24_10_avx2: 2124.7
vvc_alf_filter_chroma_20x28_10_c: 21025.5
vvc_alf_filter_chroma_20x28_10_avx2: 2488.2
vvc_alf_filter_chroma_20x32_10_c: 31128.5
vvc_alf_filter_chroma_20x32_10_avx2: 3205.2
vvc_alf_filter_chroma_24x4_10_c: 3701.2
vvc_alf_filter_chroma_24x4_10_avx2: 494.7
vvc_alf_filter_chroma_24x8_10_c: 7613.0
vvc_alf_filter_chroma_24x8_10_avx2: 957.2
vvc_alf_filter_chroma_24x12_10_c: 10816.7
vvc_alf_filter_chroma_24x12_10_avx2: 1427.7
vvc_alf_filter_chroma_24x16_10_c: 14390.5
vvc_alf_filter_chroma_24x16_10_avx2: 1948.2
vvc_alf_filter_chroma_24x20_10_c: 17989.5
vvc_alf_filter_chroma_24x20_10_avx2: 2363.7
vvc_alf_filter_chroma_24x24_10_c: 21581.7
vvc_alf_filter_chroma_24x24_10_avx2: 2839.7
vvc_alf_filter_chroma_24x28_10_c: 25179.2
vvc_alf_filter_chroma_24x28_10_avx2: 3313.2
vvc_alf_filter_chroma_24x32_10_c: 28776.2
vvc_alf_filter_chroma_24x32_10_avx2: 4154.7
vvc_alf_filter_chroma_28x4_10_c: 4331.2
vvc_alf_filter_chroma_28x4_10_avx2: 624.2
vvc_alf_filter_chroma_28x8_10_c: 8445.0
vvc_alf_filter_chroma_28x8_10_avx2: 1197.7
vvc_alf_filter_chroma_28x12_10_c: 12684.5
vvc_alf_filter_chroma_28x12_10_avx2: 1786.7
vvc_alf_filter_chroma_28x16_10_c: 16924.5
vvc_alf_filter_chroma_28x16_10_avx2: 2378.7
vvc_alf_filter_chroma_28x20_10_c: 38361.0
vvc_alf_filter_chroma_28x20_10_avx2: 2967.0
vvc_alf_filter_chroma_28x24_10_c: 25329.0
vvc_alf_filter_chroma_28x24_10_avx2: 3564.2
vvc_alf_filter_chroma_28x28_10_c: 29514.0
vvc_alf_filter_chroma_28x28_10_avx2: 4151.7
vvc_alf_filter_chroma_28x32_10_c: 33673.2
vvc_alf_filter_chroma_28x32_10_avx2: 5125.0
vvc_alf_filter_chroma_32x4_10_c: 4945.2
vvc_alf_filter_chroma_32x4_10_avx2: 485.7
vvc_alf_filter_chroma_32x8_10_c: 9658.7
vvc_alf_filter_chroma_32x8_10_avx2: 943.7
vvc_alf_filter_chroma_32x12_10_c: 16177.7
vvc_alf_filter_chroma_32x12_10_avx2: 1443.7
vvc_alf_filter_chroma_32x16_10_c: 19336.0
vvc_alf_filter_chroma_32x16_10_avx2: 1876.0
vvc_alf_filter_chroma_32x20_10_c: 24153.0
vvc_alf_filter_chroma_32x20_10_avx2: 2323.0
vvc_alf_filter_chroma_32x24_10_c: 28917.7
vvc_alf_filter_chroma_32x24_10_avx2: 2806.2
vvc_alf_filter_chroma_32x28_10_c: 33738.7
vvc_alf_filter_chroma_32x28_10_avx2: 3454.0
vvc_alf_filter_chroma_32x32_10_c: 38531.5
vvc_alf_filter_chroma_32x32_10_avx2: 4103.2
vvc_alf_filter_luma_4x4_10_c: 1076.2
vvc_alf_filter_luma_4x4_10_avx2: 240.0
vvc_alf_filter_luma_4x8_10_c: 2113.2
vvc_alf_filter_luma_4x8_10_avx2: 454.5
vvc_alf_filter_luma_4x12_10_c: 3179.2
vvc_alf_filter_luma_4x12_10_avx2: 669.0
vvc_alf_filter_luma_4x16_10_c: 4146.5
vvc_alf_filter_luma_4x16_10_avx2: 885.0
vvc_alf_filter_luma_4x20_10_c: 5168.2
vvc_alf_filter_luma_4x20_10_avx2: 1106.0
vvc_alf_filter_luma_4x24_10_c: 6168.2
vvc_alf_filter_luma_4x24_10_avx2: 1357.0
vvc_alf_filter_luma_4x28_10_c: 7330.0
vvc_alf_filter_luma_4x28_10_avx2: 1539.5
vvc_alf_filter_luma_4x32_10_c: 8202.0
vvc_alf_filter_luma_4x32_10_avx2: 1803.7
vvc_alf_filter_luma_8x4_10_c: 2100.5
vvc_alf_filter_luma_8x4_10_avx2: 479.7
vvc_alf_filter_luma_8x8_10_c: 4079.5
vvc_alf_filter_luma_8x8_10_avx2: 898.2
vvc_alf_filter_luma_8x12_10_c: 6209.2
vvc_alf_filter_luma_8x12_10_avx2: 1328.7
vvc_alf_filter_luma_8x16_10_c: 8177.5
vvc_alf_filter_luma_8x16_10_avx2: 1765.0
vvc_alf_filter_luma_8x20_10_c: 10400.5
vvc_alf_filter_luma_8x20_10_avx2: 2196.2
vvc_alf_filter_luma_8x24_10_c: 12222.7
vvc_alf_filter_luma_8x24_10_avx2: 2626.0
vvc_alf_filter_luma_8x28_10_c: 14235.5
vvc_alf_filter_luma_8x28_10_avx2: 3065.2
vvc_alf_filter_luma_8x32_10_c: 16702.2
vvc_alf_filter_luma_8x32_10_avx2: 3494.2
vvc_alf_filter_luma_12x4_10_c: 3142.0
vvc_alf_filter_luma_12x4_10_avx2: 699.5
vvc_alf_filter_luma_12x8_10_c: 6093.2
vvc_alf_filter_luma_12x8_10_avx2: 1335.5
vvc_alf_filter_luma_12x12_10_c: 9098.7
vvc_alf_filter_luma_12x12_10_avx2: 1988.5
vvc_alf_filter_luma_12x16_10_c: 12237.5
vvc_alf_filter_luma_12x16_10_avx2: 2635.0
vvc_alf_filter_luma_12x20_10_c: 15240.7
vvc_alf_filter_luma_12x20_10_avx2: 3289.5
vvc_alf_filter_luma_12x24_10_c: 18262.0
vvc_alf_filter_luma_12x24_10_avx2: 3937.2
vvc_alf_filter_luma_12x28_10_c: 21283.0
vvc_alf_filter_luma_12x28_10_avx2: 4585.2
vvc_alf_filter_luma_12x32_10_c: 24299.7
vvc_alf_filter_luma_12x32_10_avx2: 5333.5
vvc_alf_filter_luma_16x4_10_c: 5729.7
vvc_alf_filter_luma_16x4_10_avx2: 446.2
vvc_alf_filter_luma_16x8_10_c: 8256.5
vvc_alf_filter_luma_16x8_10_avx2: 876.7
vvc_alf_filter_luma_16x12_10_c: 12178.7
vvc_alf_filter_luma_16x12_10_avx2: 1332.7
vvc_alf_filter_luma_16x16_10_c: 16262.5
vvc_alf_filter_luma_16x16_10_avx2: 1734.5
vvc_alf_filter_luma_16x20_10_c: 20263.7
vvc_alf_filter_luma_16x20_10_avx2: 2147.2
vvc_alf_filter_luma_16x24_10_c: 24789.7
vvc_alf_filter_luma_16x24_10_avx2: 2591.7
vvc_alf_filter_luma_16x28_10_c: 28894.5
vvc_alf_filter_luma_16x28_10_avx2: 3228.7
vvc_alf_filter_luma_16x32_10_c: 33360.0
vvc_alf_filter_luma_16x32_10_avx2: 4117.5
vvc_alf_filter_luma_20x4_10_c: 5076.0
vvc_alf_filter_luma_20x4_10_avx2: 674.2
vvc_alf_filter_luma_20x8_10_c: 10138.2
vvc_alf_filter_luma_20x8_10_avx2: 1323.5
vvc_alf_filter_luma_20x12_10_c: 15171.5
vvc_alf_filter_luma_20x12_10_avx2: 2026.5
vvc_alf_filter_luma_20x16_10_c: 20315.0
vvc_alf_filter_luma_20x16_10_avx2: 2611.0
vvc_alf_filter_luma_20x20_10_c: 25367.0
vvc_alf_filter_luma_20x20_10_avx2: 3259.5
vvc_alf_filter_luma_20x24_10_c: 30443.5
vvc_alf_filter_luma_20x24_10_avx2: 3898.5
vvc_alf_filter_luma_20x28_10_c: 35439.7
vvc_alf_filter_luma_20x28_10_avx2: 4645.5
vvc_alf_filter_luma_20x32_10_c: 40609.0
vvc_alf_filter_luma_20x32_10_avx2: 5849.0
vvc_alf_filter_luma_24x4_10_c: 6245.5
vvc_alf_filter_luma_24x4_10_avx2: 901.2
vvc_alf_filter_luma_24x8_10_c: 12166.7
vvc_alf_filter_luma_24x8_10_avx2: 1754.7
vvc_alf_filter_luma_24x12_10_c: 18223.2
vvc_alf_filter_luma_24x12_10_avx2: 2621.5
vvc_alf_filter_luma_24x16_10_c: 24287.2
vvc_alf_filter_luma_24x16_10_avx2: 3474.2
vvc_alf_filter_luma_24x20_10_c: 38042.2
vvc_alf_filter_luma_24x20_10_avx2: 4335.7
vvc_alf_filter_luma_24x24_10_c: 36462.0
vvc_alf_filter_luma_24x24_10_avx2: 5199.5
vvc_alf_filter_luma_24x28_10_c: 42502.7
vvc_alf_filter_luma_24x28_10_avx2: 6133.5
vvc_alf_filter_luma_24x32_10_c: 48675.5
vvc_alf_filter_luma_24x32_10_avx2: 7575.0
vvc_alf_filter_luma_28x4_10_c: 7101.5
vvc_alf_filter_luma_28x4_10_avx2: 1128.2
vvc_alf_filter_luma_28x8_10_c: 14185.7
vvc_alf_filter_luma_28x8_10_avx2: 2189.0
vvc_alf_filter_luma_28x12_10_c: 21278.7
vvc_alf_filter_luma_28x12_10_avx2: 3347.2
vvc_alf_filter_luma_28x16_10_c: 28338.2
vvc_alf_filter_luma_28x16_10_avx2: 4462.7
vvc_alf_filter_luma_28x20_10_c: 37076.7
vvc_alf_filter_luma_28x20_10_avx2: 5729.0
vvc_alf_filter_luma_28x24_10_c: 42612.2
vvc_alf_filter_luma_28x24_10_avx2: 6508.7
vvc_alf_filter_luma_28x28_10_c: 49686.0
vvc_alf_filter_luma_28x28_10_avx2: 7666.0
vvc_alf_filter_luma_28x32_10_c: 65345.2
vvc_alf_filter_luma_28x32_10_avx2: 9330.2
vvc_alf_filter_luma_32x4_10_c: 8329.5
vvc_alf_filter_luma_32x4_10_avx2: 887.7
vvc_alf_filter_luma_32x8_10_c: 16941.7
vvc_alf_filter_luma_32x8_10_avx2: 1736.0
vvc_alf_filter_luma_32x12_10_c: 73347.7
vvc_alf_filter_luma_32x12_10_avx2: 2584.2
vvc_alf_filter_luma_32x16_10_c: 32359.5
vvc_alf_filter_luma_32x16_10_avx2: 3442.7
vvc_alf_filter_luma_32x20_10_c: 40482.5
vvc_alf_filter_luma_32x20_10_avx2: 4318.5
vvc_alf_filter_luma_32x24_10_c: 48674.7
vvc_alf_filter_luma_32x24_10_avx2: 5174.2
vvc_alf_filter_luma_32x28_10_c: 56715.7
vvc_alf_filter_luma_32x28_10_avx2: 6124.5
vvc_alf_filter_luma_32x32_10_c: 66720.0
vvc_alf_filter_luma_32x32_10_avx2: 7577.2
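The per-block speedup is the ratio of each `_c` timing to the matching `_avx2` timing. The following Python sketch, which is not part of the PR, pairs the two lines for each function and computes that ratio. It assumes every benchmark line has the form `name_<impl>: <value>`, with `<impl>` being `c` or `avx2`, as in the output above. Only a few sample lines are embedded here. `speedups` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# A few lines copied verbatim from the checkasm output above.
BENCH = """\
checkasm: all 128 tests passed
vvc_alf_filter_chroma_4x4_10_c: 657.0
vvc_alf_filter_chroma_4x4_10_avx2: 138.0
vvc_alf_filter_chroma_32x32_10_c: 38531.5
vvc_alf_filter_chroma_32x32_10_avx2: 4103.2
vvc_alf_filter_luma_32x32_10_c: 66720.0
vvc_alf_filter_luma_32x32_10_avx2: 7577.2
"""

# Assumed line shape: <function>_<impl>: <timing>; other lines are skipped.
LINE_RE = re.compile(r"^(?P<name>\w+)_(?P<impl>c|avx2):\s*(?P<t>[0-9.]+)$")


def speedups(text):
    """Map each function name to its C timing divided by its AVX2 timing."""
    timings = defaultdict(dict)
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            timings[m["name"]][m["impl"]] = float(m["t"])
    return {
        name: t["c"] / t["avx2"]
        for name, t in timings.items()
        if "c" in t and "avx2" in t
    }


if __name__ == "__main__":
    for name, ratio in sorted(speedups(BENCH).items()):
        print(f"{name}: {ratio:.2f}x")
```

Running this over the full listing also exposes isolated noisy samples. For example, `vvc_alf_filter_chroma_16x12_10_avx2` (9751.0) comes out slower than its C counterpart (7235.5), even though the neighbouring sizes are several times faster.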