Added missing AVX2 fillers #2565

Merged

@itzpr3d4t0r commented Nov 12, 2023

This PR is a continuation of #2382 and adds AVX2 fillers for the SUB, MIN, MAX, and MULT blend flags. This should be a 70x to 110x performance improvement over the current fillers with blend flags, and it is about 35% faster than caching a surface filled with a color and blitting it (with AVX2).
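
For readers unfamiliar with the four flags, here is a minimal pure-Python sketch of what each one does to a single color channel. It is an illustration only, not the AVX2 code from this PR. The helper names (`blend_sub`, `fill_pixel`, and so on) are hypothetical, and the rounding used for MULT is an assumption about pygame's scalar blend macro that may differ in detail.

```python
# Illustrative per-channel semantics of the blend flags (not pygame's code).
# All helper names here are hypothetical.

def blend_sub(dst, src):
    # Saturating subtraction: the result never drops below 0.
    return max(dst - src, 0)

def blend_min(dst, src):
    return min(dst, src)

def blend_max(dst, src):
    return max(dst, src)

def blend_mult(dst, src):
    # Assumed rounding: scale the product back into 0..255 with a shift.
    return (dst * src + 255) >> 8 if dst and src else 0

BLEND_OPS = {
    "BLEND_SUB": blend_sub,
    "BLEND_MIN": blend_min,
    "BLEND_MAX": blend_max,
    "BLEND_MULT": blend_mult,
}

def fill_pixel(pixel, color, flag):
    """Apply one blend flag to an (R, G, B) pixel, channel by channel."""
    op = BLEND_OPS[flag]
    return tuple(op(d, s) for d, s in zip(pixel, color))

if __name__ == "__main__":
    # Same surface color and fill color as the test program in this PR.
    for flag in BLEND_OPS:
        print(flag, fill_pixel((132, 33, 200), (24, 24, 24), flag))
```

A SIMD filler applies the same operation to many channels per instruction, which is where the speedup over a per-pixel loop comes from.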

Results (seconds per 1000 calls, best of 10 repeats, averaged over 5 runs) and test program:
ON MAIN

Flag: BLEND_SUB
fill: 1.8767017999998643
blit: 0.021530719999827853
--------------------
Flag: BLEND_MULT
fill: 1.7614726199999495
blit: 0.03417129999997996
--------------------
Flag: BLEND_MIN
fill: 1.82431628000013
blit: 0.02125239999986661
--------------------
Flag: BLEND_MAX
fill: 1.872223340000346
blit: 0.021590140000080284
--------------------

WITH THIS PR

Flag: BLEND_SUB
fill: 0.01598090000006778
blit: 0.021626579999974638
--------------------
Flag: BLEND_MULT
fill: 0.024974719999954688
blit: 0.03429605999972409
--------------------
Flag: BLEND_MIN
fill: 0.01568092000015895
blit: 0.021365460000197345
--------------------
Flag: BLEND_MAX
fill: 0.015678200000002106
blit: 0.021505839999917953
--------------------
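
The speedup figures quoted in the description can be checked directly from the two result listings. The short sketch below copies the timings from above and divides them; the dictionary and function names are hypothetical and exist only for this illustration.

```python
# Timings copied from the results above (seconds per 1000 calls).
MAIN_FILL = {
    "BLEND_SUB": 1.8767017999998643,
    "BLEND_MULT": 1.7614726199999495,
    "BLEND_MIN": 1.82431628000013,
    "BLEND_MAX": 1.872223340000346,
}
PR_FILL = {
    "BLEND_SUB": 0.01598090000006778,
    "BLEND_MULT": 0.024974719999954688,
    "BLEND_MIN": 0.01568092000015895,
    "BLEND_MAX": 0.015678200000002106,
}
PR_BLIT = {
    "BLEND_SUB": 0.021626579999974638,
    "BLEND_MULT": 0.03429605999972409,
    "BLEND_MIN": 0.021365460000197345,
    "BLEND_MAX": 0.021505839999917953,
}

def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return before / after

if __name__ == "__main__":
    for flag in MAIN_FILL:
        vs_main = speedup(MAIN_FILL[flag], PR_FILL[flag])
        vs_blit = speedup(PR_BLIT[flag], PR_FILL[flag])
        print(f"{flag}: {vs_main:.1f}x vs main fill, {vs_blit:.2f}x vs blit")
```

With these numbers, BLEND_MULT improves by roughly 70x and the other three flags by a little under 120x, while fill with this PR is about 1.35x to 1.37x as fast as the equivalent blit.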

Test Program

from timeit import repeat

import pygame

pygame.init()

surf = pygame.Surface((500, 500))
surf.fill((132, 33, 200))

color = pygame.Surface((500, 500))
color.fill((24, 24, 24))

flags = [
    "BLEND_SUB",
    "BLEND_MULT",
    "BLEND_MIN",
    "BLEND_MAX",
]

G = globals()

for flag in flags:
    print(f"Flag: {flag}")

    # Time 1000 blended fills; keep the best of 10 repeats, averaged over 5 runs.
    teststr = "surf.fill((24, 24, 24), None, pygame." + flag + ")"
    fill_times = [min(repeat(teststr, globals=G, number=1000, repeat=10)) for _ in range(5)]
    print(f"fill: {sum(fill_times) / len(fill_times)}")

    # Same measurement for blitting a pre-filled surface with the same flag.
    teststr = "surf.blit(color, (0, 0), None, pygame." + flag + ")"
    blit_times = [min(repeat(teststr, globals=G, number=1000, repeat=10)) for _ in range(5)]
    print(f"blit: {sum(blit_times) / len(blit_times)}")
    print("-" * 20)

@itzpr3d4t0r added the Performance, SIMD, and Surface labels Nov 12, 2023
@itzpr3d4t0r requested a review from a team as a code owner November 12, 2023 09:51

@MyreMylar left a comment

LGTM 👍

I see the same speedup locally, the tests all pass locally, and the code makes sense to me: the macros are very similar to the SSE2 versions from the other PR. Nice work!

@Starbuck5 left a comment

Looks good!

Thanks for working this up.

@Starbuck5 merged commit 68685f4 into pygame-community:main Dec 5, 2023
30 checks passed
@itzpr3d4t0r deleted the add-missing-avx-fillers branch December 23, 2023 09:35
@itzpr3d4t0r itzpr3d4t0r mentioned this pull request Jan 21, 2024