Surface.blits #439

Merged
merged 4 commits into from Apr 10, 2018

illume commented Apr 3, 2018

Adds Surface.blits() for drawing multiple surfaces at once. See the thread with subject "blits proposal" on the mailing list.

I made an initial implementation of Surface.blits(), focusing just on correctness, with no optimizations.

In the micro benchmark below, blitting 255 surfaces with it takes 88% to 92% of the time
needed to call Surface.blit from a Python loop over the same list.

Considering blit is usually the slow part of most pygame apps, this is sort of nice.

Some benchmarking and other notes below.


  1. Why not work on a faster implementation that saves the unwrapped objects?
    This would allow you to save a list into a C object like:
struct blitinfo {
    SDL_Surface *dest;
    GAME_Rect *src_rect;
    GAME_Rect *area;
    int flags;
};

Then, if you promise not to change the list (i.e., you update rects in place and all your Surfaces are still there),
it could avoid a lot of the unwrapping work.

However, I ran a test with the blit call commented out, so that only the unwrapping and the loop over the list are done.
It seems that the Python bookkeeping for these 255 10x10 surfaces is only about 2.2%-3.4% of the total time taken.

  2. Another optimization would be to skip the subsurface checks and a few other preparations for surfaces.
    I tried this and didn't see any noticeable improvement.

  3. Currently neither SDL1 nor SDL2 has a special batched blit, but there are proposals and implementations around,
    such as SDL_GPU.
    A batched blit could bring a bigger improvement on backends where changing state is slow (OpenGL etc.).

import pygame
from pygame.locals import *
NUM_SURFS = 255
dst = pygame.Surface((NUM_SURFS * 10, 10), SRCALPHA, 32)
dst.fill((230, 230, 230))

blit_list = []
for i in range(NUM_SURFS):
    dest = (i * 10, 0)
    surf = pygame.Surface((10, 10), SRCALPHA, 32)
    color = (i * 1, i * 1, i * 1)
    surf.fill(color)
    blit_list.append((surf, dest))

# Baseline: one Surface.blit call per surface, driven from a Python loop.
def blits(blit_list):
    for surface, dest in blit_list:
        dst.blit(surface, dest)

In [17]: %timeit results = blits(blit_list)
774 µs ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [18]: %timeit results = dst.blits(blit_list)
717 µs ± 12.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [19]: %timeit results = dst.blits(blit_list, doreturn=0)
688 µs ± 14.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [20]: (100. / 774) * 717
Out[20]: 92.63565891472868

In [21]: (100. / 774) * 688
Out[21]: 88.88888888888889
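The two percentages above are just the ratio of the mean timings. As a minimal sketch of that arithmetic (the helper name `relative_time_percent` is made up for illustration and is not part of pygame), using the means reported by %timeit above:

```python
def relative_time_percent(baseline_us, candidate_us):
    """Return the candidate time as a percentage of the baseline time."""
    if baseline_us <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * candidate_us / baseline_us


# Mean timings in microseconds, copied from the %timeit runs above.
python_loop = 774        # Surface.blit called from a Python loop
blits_with_rects = 717   # dst.blits(blit_list)
blits_no_rects = 688     # dst.blits(blit_list, doreturn=0)

print(round(relative_time_percent(python_loop, blits_with_rects), 1))  # 92.6
print(round(relative_time_percent(python_loop, blits_no_rects), 1))    # 88.9
```

A value below 100 means the candidate is faster than the baseline. The ± spread reported by %timeit is ignored here, so small differences should be read with care.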

If I comment out the actual blit call...

In [3]:  %timeit results = dst.blits(blit_list)
26.2 µs ± 695 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [4]: %timeit results = dst.blits(blit_list, doreturn=0)
17.6 µs ± 314 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [5]: (100. / 774) * 26
Out[5]: 3.3591731266149867

In [6]: (100. / 774) * 17
Out[6]: 2.1963824289405682
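The same "remove the expensive call and time what is left" idea can be tried in plain Python. The sketch below is only an analogue of the experiment above, not the C change itself: `bookkeeping_share` is a hypothetical helper, and the workload is a stand-in that does not use pygame.

```python
import timeit


def bookkeeping_share(full_call, stub_call, number=200, repeat=5):
    """Estimate stub_call's time as a percentage of full_call's time.

    full_call: callable that runs the loop including the real work.
    stub_call: the same loop with the expensive call removed.
    """
    # Take the fastest of several repeats to reduce timing noise.
    full = min(timeit.repeat(full_call, number=number, repeat=repeat))
    stub = min(timeit.repeat(stub_call, number=number, repeat=repeat))
    return 100.0 * stub / full


# Stand-in workload (an assumption, not pygame): the "work" is summing a
# small range, the "bookkeeping" is unpacking each tuple in the list.
items = [(i, (i * 10, 0)) for i in range(255)]


def loop_with_work():
    for value, dest in items:
        sum(range(50))


def loop_without_work():
    for value, dest in items:
        pass


share = bookkeeping_share(loop_with_work, loop_without_work)
print(f"bookkeeping is roughly {share:.1f}% of the total")
```

The printed figure depends on the machine and on the stand-in workload, so it says nothing about pygame itself. It only shows how the share quoted above is derived from two timings.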

@illume illume force-pushed the blits branch from 7a7c5f9 to cc8447b Apr 3, 2018

illume commented Apr 3, 2018

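It was pointed out to me that the benchmark wasn't all that fair:
the Python implementation didn't return a list of rects.

With that fixed, Surface.blits takes 82%-85% of the time, not the 88%-92% I stated earlier.
(The 82% figure reuses the earlier 688 µs doreturn=0 timing, which does not build the list of rects.)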

In [4]: def blits(blit_list):
   ...:     ret = []
   ...:     for surface, dest in blit_list:
   ...:         ret.append(dst.blit(surface, dest))
   ...:     return ret
   ...:

In [5]: %timeit results = blits(blit_list)
841 µs ± 15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [6]: %timeit results = dst.blits(blit_list)
715 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: (100. / 841) * 717
Out[7]: 85.25564803804994

In [8]: (100. / 841) * 688
Out[8]: 81.80737217598097
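For reference, the fairer Python baseline above can be written as a reusable function. This is a sketch of the semantics only, not the C implementation in this PR. The name `blits_fallback` is made up, and it assumes each list entry is a tuple starting with (source, dest), optionally followed by the extra positional arguments that Surface.blit accepts.

```python
def blits_fallback(dst, blit_sequence, doreturn=True):
    """Blit every entry of blit_sequence onto dst, one blit call each.

    dst: any object with a blit(source, dest, ...) method, e.g. a Surface.
    blit_sequence: iterable of (source, dest) tuples; extra elements
        (assumed to be area and special_flags) are passed through to blit.
    doreturn: when true, collect and return what each blit call returns;
        otherwise return None.
    """
    rects = [] if doreturn else None
    for item in blit_sequence:
        source, dest, *extra = item
        rect = dst.blit(source, dest, *extra)
        if doreturn:
            rects.append(rect)
    return rects
```

With the benchmark setup above, `blits_fallback(dst, blit_list)` would do the same work as the `blits` function in this comment, and `doreturn=False` corresponds to the first version that discarded the rects.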

@illume illume merged commit 0e9131b into master Apr 10, 2018

0 of 4 checks passed

continuous-integration/travis-ci/pr: The Travis CI build could not complete due to an error
continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error
continuous-integration/appveyor/branch: AppVeyor build failed
continuous-integration/appveyor/pr: AppVeyor build failed

@illume illume deleted the blits branch Apr 10, 2018
