Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GRAPHICS: ATARI: Align surface on a 16-byte boundary #4771

Merged
merged 1 commit into from Mar 20, 2023

Conversation

mikrosk
Copy link
Contributor

@mikrosk mikrosk commented Mar 5, 2023

Atari buffers (maybe some other platforms, too?) benefit from being aligned on a 16-byte boundary. Not only the cache line is 16 bytes big but on the 68040 and 68060 CPUs there is a special instruction, move16, which is specifically optimised for moving data aligned on 16 bytes.

As this slightly pollutes surface.h, I'll appreciate your feedback (👍 is also sufficient ;-)).

Copy link
Member

@lephilousophe lephilousophe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some suggestions.
The main idea is to get rid of this new variable by all means. :)

graphics/blit-atari.cpp Outdated Show resolved Hide resolved
graphics/blit-atari.cpp Outdated Show resolved Hide resolved
graphics/blit-atari.cpp Outdated Show resolved Hide resolved
graphics/surface.h Outdated Show resolved Hide resolved
@lephilousophe
Copy link
Member

lephilousophe commented Mar 5, 2023

After thinking at the PR again, I believe you must handle unaligned cases.
I also think that vmalloc alignment doesn't matter.

The Surface class can be used in two ways: using create() and free() or providing an externally (read: by engine code) allocated buffer.
In such cases you can't expect anything about memory alignment and must handle the memcpy.

About the vmalloc call, I understand that it's called whenever the user has a SuperVidel accelerator.
We then have two cases when blitting: either the src is from a surface created by create(), then data will also be in video memory and alignment isn't important (you use the hardware blitter), either the pointer doesn't come from create() and it won't likely be aligned.
This means that you won't be able to use move16 in this case and you will have to fallback to memcpy where alignment is again not important.

@mikrosk
Copy link
Contributor Author

mikrosk commented Mar 5, 2023

Fortunately, the situation isn't as bleak as it would seem...

The Surface class can be used in two ways: using create() and free() or providing an externally (read: by engine code) allocated buffer.
In such cases you can't expect anything about memory alignment and must handle the memcpy.

Indeed. But my memcpy (or its inlined version) can cheat, i.e. copy the first 0 - 15 bytes first and then use the move16. So my idea is to specialise this even more, i.e. src/dst = 0xA0xxxxxx -> use the blitter; src/dst (and/or src/dst pitch) aligned on 16 bytes -> just move16 everything; otherwise -> split it.

As you can see, with this application logic I don't have to differentiate between vmalloc'ed and malloc'ed cases, just check for the best (blitter) scenario and do the rest as quickly as possible.

either the pointer doesn't come from create() and it won't likely be aligned.
This means that you won't be able to use move16 in this case and you will have to fallback to memcpy where alignment is again not important.

See above. But good thinking, when I started with this rework I imagined it exactly this easy, only later I have realised that I have to make it a little bit more robust.

@lephilousophe
Copy link
Member

Indeed. But my memcpy (or its inlined version) can cheat, i.e. copy the first 0 - 15 bytes first and then use the move16. So my idea is to specialise this even more, i.e. src/dst = 0xA0xxxxxx -> use the blitter; src/dst (and/or src/dst pitch) aligned on 16 bytes -> just move16 everything; otherwise -> split it.

I don't see how you can do that.
If your source and destination data don't have the same alignment, you won't be able to use move16 at all.
Let's imagine you have dst properly aligned (unaligned by 2 for example) and src unaligned.
You will want to align src and copy 2 bytes to dst. You will then have dst unaligned and so on.
You will never be able to use move16 at all in this case.

The main problem with move16 is that it requires both source and destination to be aligned on 16 bytes boundaries.

@mikrosk
Copy link
Contributor Author

mikrosk commented Mar 5, 2023

Damn, you are absolutely right! So yeah, your first statement is true - ct60_vmalloc indeed doesn't need an alignment. So move16 could be used only for the same scenarios as the blitter - if both surfaces are created by my code (i.e. in case the SuperVidel is not present).

I was fooled by a memcpy implementation, it skipped a few bytes in favour of a fixed-copy-size loop but yeah, there were no alignment requirements.

Well, on one hand this is bad news but on the other hand, you've helped me to simplify the code quite a lot. :)

@mikrosk
Copy link
Contributor Author

mikrosk commented Mar 5, 2023

I've rebased this on top of #4773 and verified that it is indeed working nicely.

I'll wait for feedback on #4773 (maybe someone will notice other cases and I could fix them within that PR) and merge these two into master.

Thanks again.

@lephilousophe
Copy link
Member

I think you should merge this PR once when you have done the move16 part to make it really interesting.

@mikrosk
Copy link
Contributor Author

mikrosk commented Mar 5, 2023

Yeah, why not. Now that I have actually singled this one change out (it was part of a bigger refactoring effort which is still wip), there's no reason to hurry with merging. And as you say, it will make the commit worthy of its name. :)

@sev-
Copy link
Member

sev- commented Mar 16, 2023

This PR has unrelated SVG changes, so it isn't easy to review until it is addressed.

@mikrosk
Copy link
Contributor Author

mikrosk commented Mar 16, 2023

Indeed. It is based on top of #4773.

Only commit 4b95038 is up for review, other commits will disappear once we agree on how/what to merge of #4773 into master.

@mikrosk mikrosk force-pushed the surface_align branch 2 times, most recently from e1a354b to 94bc63b Compare March 16, 2023 18:26
@sev-
Copy link
Member

sev- commented Mar 16, 2023

So far, looks all good and meaningful to me

@mikrosk
Copy link
Contributor Author

mikrosk commented Mar 16, 2023

Up for review. I've just added the move16 code and tested it, works nicely. The cookie checks will be later refactored, I have already code prepared for that but I don't want to pollute it even more. EDIT: it was just a few lines of code, so I have done it after all.

For instance, the dstPitch == srcPitch part is used for GUI updates while the other one for cursor restore so it is definitely helping (on 68040 / 68060 CPUs only, unfortunately).

Also implement a CPU-based optimization for the 68040 / 68060.
@mikrosk mikrosk merged commit 06af761 into scummvm:master Mar 20, 2023
8 checks passed
@mikrosk mikrosk deleted the surface_align branch March 20, 2023 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants