First pass at backwards compatible pygame alpha blit #2213

Merged
merged 31 commits into from
Oct 23, 2020

Conversation

MyreMylar
Contributor

@MyreMylar MyreMylar commented Oct 19, 2020

This adds an SDL1 style blitter for alpha blending where both source and destination surfaces have an alpha channel. It also adds a flag to disable this blitter in favour of reverting to the SDL2 one.

To Do

  • Currently we are only handling ARGB surfaces in the SSE2 version. Can we tweak this for other 32-bit surface formats, e.g. RGBA? Do we need to? Looks like SDL only handles 'ARGB/ABGR' formats in my tests, basically where the alpha is at shift 24, though whether to call this ARGB or RGBA with big endian and little endian seems to be a minefield in itself. The current code should handle any 32-bit surface where the alpha is shifted by 24 bits anyway.
  • 16 bit surfaces - is it worth doing anything for these? They should be handled by the slow, non-sse2, blitter but we aren't testing that right now. Update: these are now handled by the non-sse2 blitter & tested. It is about half the speed of the SDL2 16 bit surface blitter, so if you care about speed over accuracy in 16bit blits you'll want to use the SDL2 blitter.
  • blitting surface with alpha to surface without and vice versa - does this need handling or do the SDL1 & SDL2 blitters behave the same? Investigated this: it appears that SDL1 goes down a different path when the destination surface is opaque, so the new blitter path doesn't match the SDL1 output. I put these blits down the new path anyway because it looks more accurate to SDL1 than the SDL2 blitter.
  • Check 'surface alpha' (set with set_alpha()). Also, we don't want to get involved in RLE stuff accidentally.
  • Split SSE2 blitter into multiple blitters to see if we can improve speed in most common surface setup versus rarer cases.
  • Add test for blit surface with per pixel alpha and surface alpha - this one will be different from normal SDL2 and is not a possible combo in SDL1, so will have to use carefully selected values. Should be SDL1 blending formula with SDL2 alpha capabilities.
  • Check blitter against original test cases in SDL 2: Per Pixel Alpha error. #1289
  • Add environment variable to make the SDL2 blitter the default instead of the SDL1-mimic blitter.
  • Investigate splitting SSE2 blitter a third time to handle common case where there is no alpha on destination surface.
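
The "SDL1 blending formula" referred to in the list is not spelled out in this thread. For illustration only, here is a minimal pure-Python sketch of a per-pixel blend of that general shape. The exact arithmetic (the `>> 8` rounding and the alpha combination) is an assumption about how pygame's long-standing software alpha blitter works, and the function names are hypothetical, so treat it as a reading aid and not as the code in this PR.

```python
def blend_component(src_c, dst_c, src_a):
    # Integer blend of one colour channel. Adding src_c before the >> 8
    # makes a fully opaque source (src_a == 255) replace the destination exactly.
    return (((src_c - dst_c) * src_a + src_c) >> 8) + dst_c


def sdl1_style_blend(src, dst):
    """Blend one RGBA source pixel onto one RGBA destination pixel (assumed formula)."""
    s_r, s_g, s_b, s_a = src
    d_r, d_g, d_b, d_a = dst
    if d_a == 0:
        # Fully transparent destination: the source pixel is copied unchanged.
        return (s_r, s_g, s_b, s_a)
    return (
        blend_component(s_r, d_r, s_a),
        blend_component(s_g, d_g, s_a),
        blend_component(s_b, d_b, s_a),
        s_a + d_a - (s_a * d_a) // 255,
    )


# Half-transparent red over opaque blue.
print(sdl1_style_blend((255, 0, 0, 128), (0, 0, 255, 255)))
```

The destination alpha participates in the result here, which is the kind of behaviour the test values in the list above are meant to pin down.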

@MyreMylar
Contributor Author

Some speed tests:

new-default SDL1-style alpha-blit (using SSE2):

         256002 function calls in 1.854 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.166    0.166    1.854    1.854 pygame_blit_sdl1_speed_test.py:9(speed_test_blits)
   128000    1.055    0.000    1.055    0.000 {method 'fill' of 'pygame.Surface' objects}
    64000    0.610    0.000    0.610    0.000 {method 'blit' of 'pygame.Surface' objects}
    64000    0.023    0.000    0.023    0.000 {method 'get_at' of 'pygame.Surface' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

old-default SDL2 style blit (uses some form of intrinsics in SDL2):

         256002 function calls in 1.832 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.185    0.185    1.832    1.832 pygame_blit_sdl2_speed_test.py:10(speed_test_blits)
   128000    1.054    0.000    1.054    0.000 {method 'fill' of 'pygame.Surface' objects}
    64000    0.568    0.000    0.568    0.000 {method 'blit' of 'pygame.Surface' objects}
    64000    0.025    0.000    0.025    0.000 {method 'get_at' of 'pygame.Surface' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

new-default SDL1-style alpha-blit (without any SSE2/NEON intrinsics):

         256002 function calls in 6.070 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.172    0.172    6.070    6.070 pygame_blit_sdl1_speed_test.py:9(speed_test_blits)
    64000    4.836    0.000    4.836    0.000 {method 'blit' of 'pygame.Surface' objects}
   128000    1.038    0.000    1.038    0.000 {method 'fill' of 'pygame.Surface' objects}
    64000    0.024    0.000    0.024    0.000 {method 'get_at' of 'pygame.Surface' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

So the new SDL1 blitter is fairly comparable to, though slightly slower than, the default SDL2 one, and both of them are a lot faster than not using intrinsics.

It's possible we could improve the speed a bit without affecting the accuracy, but I'm not an SSE2 intrinsics expert.

@MyreMylar
Contributor Author

Was using this to do speed testing (changing the flag to switch to the SDL2 blitter):

from sys import stdout
from pstats import Stats
from cProfile import Profile

import pygame
pygame.init()


def speed_test_blits():

    nums = [0, 1, 65, 126, 127, 199, 254, 255]
    results = {}
    for iterations in range(0, 1000):
        for dst_r, dst_b, dst_a in zip(nums, reversed(nums), reversed(nums)):
            for src_r, src_b, src_a in zip(nums, reversed(nums), nums):
                src_surf = pygame.Surface((66, 66), pygame.SRCALPHA, 32)
                src_surf.fill((src_r, 255, src_b, src_a))
                dest_surf = pygame.Surface((66, 66), pygame.SRCALPHA, 32)
                dest_surf.fill((dst_r, 255, dst_b, dst_a))

                dest_surf.blit(src_surf, (0, 0))
                key = ((dst_r, dst_b, dst_a), (src_r, src_b, src_a))
                results[key] = dest_surf.get_at((65, 33))


if __name__ == '__main__':
    profiler = Profile()
    profiler.runcall(speed_test_blits)
    stats = Stats(profiler, stream=stdout)
    stats.strip_dirs()
    stats.sort_stats('cumulative')
    stats.print_stats()

MyreMylar added 8 commits October 20, 2020 13:41
There is a separate path for the arm blitters in SDL2 and it,
at least in the SDL version used in the CI, outputs
different results than on other platforms. This behaviour is
likely to change in the future as changes have gone into the
arm blitters. This is not something we can fix as this test
is specifically for the SDL 2 blitter.

Various SSE2 shift functions require compile time constants
so if we wanted RGBA surfaces to have SSE2 optimised blends
they would need their own version of the function.

This will bring them into line with SDL1 but they are
about half as fast as 16bit surfaces using the SDL2
blitter.

This doesn't bring us into line with SDL1 *exactly* in this
case but it does put the alpha blits down the same path.
The new SDL1 mimicking blitter seems closer than the SDL2
blitter to the original SDL1 blitter here. A test was also
added but then disabled because the blit is still slightly
off the SDL1 original. Surfaces with colorkeys have also been
excluded from the new path.

Also makes sure that surfaces with just surface alpha get passed
through, though they can't use the SSE2 enhanced blit so we need
to see how that works with surfaces with per pixel alpha AND
surface alpha.

Clarified some test descriptions as well and re-enabled original
test for this issue as it now passes.

Adding this means we can have pixel alpha and surface alpha in the
same source surface for a blit, but it does slow the whole blit
function down doing the extra maths. Think the next step may be
to split the SSE2 blitter into several with different feature
support, this seems to be the approach taken by SDL2. Also this
needs a test.

Supporting surface alpha makes the blitter slower, but because it
is a surface level thing we now test for it at the start and
send the blitter down two different SSE2 paths - one with surface
alpha support, and one without. The one with support seems faster
than the SDL2 version. Also added a test for source surfaces with
source alpha.

@MyreMylar
Contributor Author

Checked over the test cases in the original report (#1289) and they all look/print like they do in SDL1 now. So that's nice.

Thought the name was a bit clearer, env var is actually
'PYGAME_BLEND_ALPHA_SDL2'. I added getting the environment variable
into pygame_init() because I didn't want to get it on every blit.
Adding: os.environ['PYGAME_BLEND_ALPHA_SDL2'] = '1' before you
call pygame.init() should now set the blit mode for every blit.

@MyreMylar
Contributor Author

MyreMylar commented Oct 21, 2020

With this PR you should now be able to do:

import os
import pygame
os.environ['PYGAME_BLEND_ALPHA_SDL2'] = '1'

pygame.init()

To set the blitter back to how it worked before this PR. That's in addition to the BLEND_ALPHA_SDL2 flag you can pass in when blitting.
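
For the per-blit flag mentioned above, a minimal sketch might look like the following. It assumes a pygame build that includes this PR, where `pygame.BLEND_ALPHA_SDL2` is available and is passed through `special_flags`. The surface sizes and colours are arbitrary illustration values.

```python
import pygame

# Surfaces with per-pixel alpha can be created without opening a window.
src = pygame.Surface((4, 4), pygame.SRCALPHA, 32)
src.fill((255, 0, 0, 128))

dst_default = pygame.Surface((4, 4), pygame.SRCALPHA, 32)
dst_default.fill((0, 0, 255, 255))
dst_sdl2 = dst_default.copy()

# Default path: the SDL1-style blitter added by this PR.
dst_default.blit(src, (0, 0))
# Opt back in to the SDL2 blitter for this one blit only.
dst_sdl2.blit(src, (0, 0), special_flags=pygame.BLEND_ALPHA_SDL2)

default_pixel = dst_default.get_at((0, 0))
sdl2_pixel = dst_sdl2.get_at((0, 0))
print("default blitter:", tuple(default_pixel))
print("SDL2 blitter:   ", tuple(sdl2_pixel))
```

Printing both pixels side by side is a quick way to see whether the two blitters differ for a given pair of colours.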

…aque

This version seems equivalent to SDL2 speed-wise, which is good as I expect blitting alpha surfaces to an opaque background is a fairly common operation.
@MyreMylar
Contributor Author

Ok, that is probably enough fiddling. To summarise:

  • This adds a new default blitter for surfaces with alpha. The default blitter prioritises accuracy to what SDL1 did over speed.
  • There are SSE2 optimised versions of the blitter that cover three common cases:
    • Two surfaces with per-pixel alpha and no surface alpha.
    • A source surface with per pixel alpha and surface alpha, blitted to a destination surface with regular pixel alpha.
    • A source surface with per-pixel alpha blitted to an opaque surface.
  • There is a regular non-intrinsic alpha blitter (that has been in pygame for ages) that is used for any cases directed to the alpha blitter outside of the three above.
  • The blitter doesn't handle colorkey alpha, that is passed over to SDL2 as before.
  • The blitter doesn't handle blits with RLE-encoded surfaces; those are also passed over to SDL2.
  • You can switch from this new default blitter to SDL2 by passing the flag BLEND_ALPHA_SDL2 when blitting, or by setting the environment variable PYGAME_BLEND_ALPHA_SDL2 before calling pygame.init().
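
The routing described in this summary can be sketched as a small decision function. This is only an illustration: `SurfaceInfo`, `choose_blitter`, and the returned labels are hypothetical names, format checks (32-bit pixels, SSE2 availability) are left out, and sending sources with no alpha at all to SDL2 is an assumption not stated in the thread.

```python
from dataclasses import dataclass


@dataclass
class SurfaceInfo:
    """Hypothetical summary of the surface properties the summary mentions."""
    pixel_alpha: bool = False    # per-pixel alpha channel (SRCALPHA)
    surface_alpha: bool = False  # whole-surface alpha from set_alpha()
    colorkey: bool = False
    rle: bool = False


def choose_blitter(src, dst, blend_alpha_sdl2=False):
    # Flag or environment variable set, colorkey, or RLE: hand over to SDL2.
    if blend_alpha_sdl2 or src.colorkey or src.rle or dst.rle:
        return "sdl2"
    # Assumption: a source with no alpha of any kind is not an alpha blit.
    if not (src.pixel_alpha or src.surface_alpha):
        return "sdl2"
    if src.pixel_alpha and dst.pixel_alpha:
        if src.surface_alpha:
            return "sse2_pixel_and_surface_alpha"
        return "sse2_pixel_alpha"
    if src.pixel_alpha and not src.surface_alpha and not dst.pixel_alpha:
        return "sse2_to_opaque"
    # Everything else goes to the regular non-intrinsic alpha blitter.
    return "generic_alpha"


print(choose_blitter(SurfaceInfo(pixel_alpha=True), SurfaceInfo()))
```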

This should resolve all the cases raised in #1289 and maybe will help with some others in the tracker, I haven't checked through carefully yet.

@MyreMylar
Contributor Author

Speed wise, in my tests (rough code posted above):

  • slightly slower than the SDL2 blitter when doing pixel alpha surface to pixel alpha surface.
  • equivalent or slightly faster when blitting with a surface with pixel alpha & surface alpha (from set_alpha()) or to an opaque surface.

I don't have any particular ideas for improving that right now.

@MyreMylar MyreMylar marked this pull request as ready for review October 21, 2020 19:09
@illume
Member

illume commented Oct 22, 2020

Nice work :) This also fixes the issue in #742

This example used to error with:

pygame.error: Blit combination not supported

import pygame as pg
pg.init()
surf = pg.display.set_mode((320,240))
font = pg.font.SysFont("Arial", 24)
image = font.render("Test", 0, (255,255,255), (0,0,0))
print(image.get_colorkey()) #returns None in both SDL1/SDL2.
image.set_alpha(255)
surf.blit(image,(0,0))

But now it doesn't crash :)

I committed this test case to the branch in font_test.py.

illume and others added 7 commits October 22, 2020 09:06
Hard to debug this remotely but I have a suspicion that the issues might be
caused by big endian bit shifting. This commit attempts to alter that by not
bit-shifting the alpha around and flipping some masks I was using. I understand
that SIMD registers are always little endian so any shifting done there should
not be affected.

Previously we were checking the Ashift was 24, which it won't be if this
is an endianness caused issue.

Just setting the src alpha to 128 to see what happens in the output, if anything.

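
The endianness concern in the commit notes above can be illustrated without pygame. This sketch packs one ARGB8888 pixel value into memory in both byte orders; the helper name is hypothetical, and it only shows why byte-level assumptions about where alpha lives can break across platforms, not what the PR's C code does.

```python
import struct


def pixel_bytes(r, g, b, a, byte_order):
    """Return the in-memory bytes of one ARGB8888 pixel for a given byte order."""
    value = (a << 24) | (r << 16) | (g << 8) | b  # alpha at shift 24
    fmt = "<I" if byte_order == "little" else ">I"
    return struct.pack(fmt, value)


little = pixel_bytes(1, 2, 3, 4, "little")
big = pixel_bytes(1, 2, 3, 4, "big")
print(list(little))  # alpha is the last byte in memory
print(list(big))     # alpha is the first byte in memory
```

The integer value and its shifts are the same in both cases; only the byte layout in memory differs, which matters for code that reads pixels byte by byte.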
@MyreMylar
Contributor Author

MyreMylar commented Oct 22, 2020

Looks like the arm CI build doesn't use the SSE2 optimised path at all, so the test issue was likely coming from a lack of support for surface_alpha() in the non-sse2 version of the alpha blend which I've now changed.

Fingers crossed.

Member

@illume illume left a comment


_/=====\_
| great! |
---------\
          `_  
             🦜 

@illume illume requested a review from nthykier October 22, 2020 15:49
@MyreMylar
Contributor Author

MyreMylar commented Oct 22, 2020

Hopefully this doesn't cause chaos in everyone's apps, but I'm bracing for it just in case :)

@illume
Member

illume commented Oct 22, 2020

The unit tests pass locally on Mac.

I see there's a difference in solarwolf:

Screenshot 2020-10-23 at 09 35 07

As seen on the SETUP screen.

I'll try and make a test case later.

Member

@illume illume left a comment


There's something weird going on with solarwolf.

@MyreMylar
Contributor Author

MyreMylar commented Oct 23, 2020

This is my current solarwolf related issue with this PR on windows:

import pygame

pygame.init()


def textshadowed(color, text, center=None, pos='center'):
    font = pygame.font.Font(None, 32)
    darkcolor = [int(c // 2) for c in color]
    if text is None: text = ' '
    try:
        img1 = img2 = font.render(text, 1, color)
        img2 = font.render(text, 1, darkcolor)
    except (pygame.error, TypeError):
        img1 = img2 = pygame.Surface((10, 10))

    newsize = img1.get_width() + 2, img1.get_height() + 2
    img = pygame.Surface(newsize)

    print("start:")
    print("img.get_flags():", img.get_flags())
    print("img.get_at((0, 0)):", img.get_at((0, 0)))
    print("img.get_colorkey():", img.get_colorkey())
    print("img.get_shifts():", img.get_shifts())
    print("img.get_masks():", img.get_masks())
    print("img.get_bitsize():", img.get_bitsize())
    print("img.get_bitsize():", img.get_bytesize())

    img.blit(img1, (0, 0))
    img = img.convert()

    print("\nafter blit:")
    print("img.get_flags():", img.get_flags())
    print("img.get_at((0, 0)):", img.get_at((0, 0)))
    print("img.get_colorkey():", img.get_colorkey())
    print("img.get_shifts():", img.get_shifts())
    print("img.get_masks():", img.get_masks())
    print("img.get_bitsize():", img.get_bitsize())
    print("img.get_bitsize():", img.get_bytesize())

    img.set_colorkey((0, 0, 0), pygame.RLEACCEL)

    print("\npost set_colorkey:")
    print("img.get_flags():", img.get_flags())
    print("img.get_at((0, 0)):", img.get_at((0, 0)))
    print("img.get_colorkey():", img.get_colorkey())
    print("img.get_shifts():", img.get_shifts())
    print("img.get_masks():", img.get_masks())
    print("img.get_bitsize():", img.get_bitsize())
    print("img.get_bitsize():", img.get_bytesize())

    return img


screen = pygame.display.set_mode((320, 240))
screen.fill((255,255,255))

text_render = textshadowed((160, 200, 250), 'A', (190, 170), "midright")
screen.blit(text_render, (50, 50))

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    pygame.display.update()

Output pre-PR:
image

Output post-PR:
image

So far I can't figure out why the colorkey stops working when the new blitter has been used on the surface before the colorkey is set.

@illume
Member

illume commented Oct 23, 2020

Windows with current commit

I can see that problem too.

start:
img.get_flags(): 0
img.get_at((0, 0)): (0, 0, 0, 255)
img.get_colorkey(): None
img.get_shifts(): (16, 8, 0, 0)
img.get_masks(): (16711680, 65280, 255, 0)
img.get_bitsize(): 32
img.get_bitsize(): 4

after blit:
img.get_flags(): 0
img.get_at((0, 0)): (0, 0, 0, 255)
img.get_colorkey(): None
img.get_shifts(): (16, 8, 0, 0)
img.get_masks(): (16711680, 65280, 255, 0)
img.get_bitsize(): 32
img.get_bitsize(): 4

post set_colorkey:
img.get_flags(): 12288
img.get_at((0, 0)): (0, 0, 0, 255)
img.get_colorkey(): (0, 0, 0, 255)
img.get_shifts(): (16, 8, 0, 0)
img.get_masks(): (16711680, 65280, 255, 0)
img.get_bitsize(): 32
img.get_bitsize(): 4

pygame 1.9.6

start:
img.get_flags(): 0
img.get_at((0, 0)): (0, 0, 0, 255)
img.get_colorkey(): None
img.get_shifts(): (16, 8, 0, 0)
img.get_masks(): (16711680, 65280, 255, 0)
img.get_bitsize(): 32
img.get_bitsize(): 4

after blit:
img.get_flags(): 0
img.get_at((0, 0)): (0, 0, 0, 255)
img.get_colorkey(): None
img.get_shifts(): (16, 8, 0, 0)
img.get_masks(): (16711680, 65280, 255, 0)
img.get_bitsize(): 32
img.get_bitsize(): 4

post set_colorkey:
img.get_flags(): 12288
img.get_at((0, 0)): (0, 0, 0, 255)
img.get_colorkey(): (0, 0, 0, 255)
img.get_shifts(): (16, 8, 0, 0)
img.get_masks(): (16711680, 65280, 255, 0)
img.get_bitsize(): 32
img.get_bitsize(): 4

Mac with the current commit

it looks like your "Output pre-PR" one to me (on Mac):
Screenshot 2020-10-23 at 14 41 37

start:
img.get_flags(): 0
img.get_at((0, 0)): (0, 0, 0, 0)
img.get_colorkey(): None
img.get_shifts(): (16, 8, 0, 24)
img.get_masks(): (16711680, 65280, 255, 4278190080)
img.get_bitsize(): 32
img.get_bitsize(): 4

after blit:
img.get_flags(): 0
img.get_at((0, 0)): (0, 0, 0, 255)
img.get_colorkey(): None
img.get_shifts(): (16, 8, 0, 24)
img.get_masks(): (16711680, 65280, 255, 4278190080)
img.get_bitsize(): 32
img.get_bitsize(): 4

post set_colorkey:
img.get_flags(): 12288
img.get_at((0, 0)): (0, 0, 0, 255)
img.get_colorkey(): (0, 0, 0, 255)
img.get_shifts(): (16, 8, 0, 24)
img.get_masks(): (16711680, 65280, 255, 4278190080)
img.get_bitsize(): 32
img.get_bitsize(): 4

Mac, python3, 1.9.6 from wheels.

start:
img.get_flags(): 0
img.get_at((0, 0)): (0, 0, 0, 255)
img.get_colorkey(): None
img.get_shifts(): (8, 16, 24, 0)
img.get_masks(): (65280, 16711680, 4278190080, 0)
img.get_bitsize(): 32
img.get_bitsize(): 4

after blit:
img.get_flags(): 0
img.get_at((0, 0)): (0, 0, 0, 255)
img.get_colorkey(): None
img.get_shifts(): (8, 16, 24, 0)
img.get_masks(): (65280, 16711680, 4278190080, 0)
img.get_bitsize(): 32
img.get_bitsize(): 4

post set_colorkey:
img.get_flags(): 12288
img.get_at((0, 0)): (0, 0, 0, 255)
img.get_colorkey(): (0, 0, 0, 255)
img.get_shifts(): (8, 16, 24, 0)
img.get_masks(): (65280, 16711680, 4278190080, 0)
img.get_bitsize(): 32
img.get_bitsize(): 4
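
The decimal masks in the logs above are easier to compare in hex, and each shift can be derived from its mask. A small helper for reading such output (the function name is hypothetical, and returning 0 for an empty mask simply mirrors what the logs show):

```python
def shift_from_mask(mask):
    """Return the bit position of the lowest set bit, or 0 for an empty mask."""
    if mask == 0:
        return 0
    return (mask & -mask).bit_length() - 1


for masks in [(16711680, 65280, 255, 0),
              (16711680, 65280, 255, 4278190080),
              (65280, 16711680, 4278190080, 0)]:
    print([hex(m) for m in masks], [shift_from_mask(m) for m in masks])
```

Read this way, the logs show three layouts: RGB with no alpha mask, the same layout with alpha at shift 24, and a byte-swapped RGB layout in the Mac 1.9.6 wheel.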

MyreMylar added 2 commits October 23, 2020 17:09
Tracked this back to SDL and it seems colorkeys for RGB surfaces always have
00 in the alpha spot in SDL even if it reports 255 on the pygame side.
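
One way to read the commit note above: if a blit leaves 0xFF in the unused top byte of an RGB pixel while the colorkey is stored with 0x00 there, a raw 32-bit comparison never matches. The sketch below only illustrates that reading; the mask constant and helper names are hypothetical and are not taken from SDL or pygame.

```python
RGB_MASK = 0x00FFFFFF  # assumed layout: colour in the low 24 bits, top byte unused


def matches_colorkey_raw(pixel, colorkey):
    # Compares all 32 bits, including the unused top byte.
    return pixel == colorkey


def matches_colorkey_rgb(pixel, colorkey):
    # Ignores the unused top byte before comparing.
    return (pixel & RGB_MASK) == (colorkey & RGB_MASK)


black_with_ff_top_byte = 0xFF000000
colorkey_black = 0x00000000
print(matches_colorkey_raw(black_with_ff_top_byte, colorkey_black))
print(matches_colorkey_rgb(black_with_ff_top_byte, colorkey_black))
```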
@illume illume merged commit be9ee9a into master Oct 23, 2020
@illume illume deleted the alpha-blend-back-compat branch November 5, 2020 15:59