GRAPHICS: ATARI: Align surface on a 16-byte boundary #4771
I made some suggestions.
The main idea is to get rid of this new variable at all costs. :)
After thinking about the PR again, I believe you must handle unaligned cases. The Surface class can be used in two ways: using
About the vmalloc call, I understand that it's called whenever the user has a SuperVidel accelerator.
Fortunately, the situation isn't as bleak as it would seem...
Indeed. But my memcpy (or its inlined version) can cheat, i.e. copy the first 0 - 15 bytes first and then use the move16. So my idea is to specialise this even more, i.e. src/dst = 0xA0xxxxxx -> use the blitter; src/dst (and/or src/dst pitch) aligned on 16 bytes -> just move16 everything; otherwise -> split it. As you can see, with this application logic I don't have to differentiate between vmalloc'ed and malloc'ed cases, just check for the best (blitter) scenario and do the rest as quickly as possible.
See above. But good thinking: when I started this rework I imagined it would be exactly this easy; only later did I realise that I had to make it a little more robust.
I don't see how you can do that. The main problem with move16 is that it requires both the source and the destination to be aligned on 16-byte boundaries.
Damn, you are absolutely right! So yeah, your first statement is true. I was fooled by a memcpy implementation: it skipped a few bytes in favour of a fixed-copy-size loop, but there were no alignment requirements. Well, on one hand this is bad news, but on the other hand you've helped me simplify the code quite a lot. :)
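To make the constraint discussed above concrete, here is a minimal sketch of how the copy path could be chosen once move16 is only allowed when everything is aligned. The names `CopyPath` and `chooseCopyPath` are hypothetical and not taken from the PR, and treating the blitter case as "both addresses in the 0xA0xxxxxx range" is an assumption about what was meant. The sketch only classifies the copy; it does not issue move16 or blitter operations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical classification of the copy strategies mentioned in the thread.
enum class CopyPath { Blitter, Move16, Generic };

// True if the address lies on a 16-byte boundary.
inline bool isAligned16(std::uintptr_t addr) {
    return (addr & 15u) == 0;
}

// True if the address is in the 0xA0xxxxxx range mentioned in the thread
// (assumed here to be the SuperVidel memory window).
inline bool isInA0Range(std::uintptr_t addr) {
    return (addr >> 24) == 0xA0u;
}

// Decide how a row-by-row copy could be performed.
// Assumption: the blitter path requires BOTH addresses in the 0xA0 range.
inline CopyPath chooseCopyPath(std::uintptr_t src, std::uintptr_t dst,
                               std::size_t srcPitch, std::size_t dstPitch) {
    if (isInA0Range(src) && isInA0Range(dst))
        return CopyPath::Blitter;

    // move16 needs source AND destination on 16-byte boundaries; the
    // pitches must be multiples of 16 so that every row stays aligned.
    if (isAligned16(src) && isAligned16(dst) &&
        srcPitch % 16 == 0 && dstPitch % 16 == 0)
        return CopyPath::Move16;

    // Anything else falls back to an ordinary memcpy-style copy.
    return CopyPath::Generic;
}
```

Note that a source offset by 4 bytes with an aligned destination ends up on the generic path: copying a few leading bytes first cannot bring both pointers onto a 16-byte boundary unless they share the same misalignment.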
I think you should merge this PR only once you have done the move16 part, to make it really interesting.
Yeah, why not. Now that I have actually singled this one change out (it was part of a bigger refactoring effort which is still WIP), there's no reason to hurry with merging. And as you say, it will make the commit worthy of its name. :)
This PR has unrelated SVG changes, so it isn't easy to review until that is addressed.
So far, it all looks good and meaningful to me.
Up for review. I've just added the
For instance, the
Also implement a CPU-based optimization for the 68040 / 68060.
Atari buffers (maybe those on some other platforms, too?) benefit from being aligned on a 16-byte boundary. Not only is the cache line 16 bytes long, but the 68040 and 68060 CPUs also have a special instruction, move16, which is specifically optimised for moving data aligned on 16 bytes.
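As an illustration of what aligning a buffer on a 16-byte boundary involves, here is a minimal sketch that over-allocates with `malloc` and rounds the pointer up. The helpers `alignUp16`, `allocAligned16` and `freeAligned16` are hypothetical names used for this example only; they are not the code from this PR and say nothing about how `surface.h` actually does it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Round a byte count (for example a row pitch) up to a multiple of 16.
inline std::size_t alignUp16(std::size_t n) {
    return (n + 15u) & ~static_cast<std::size_t>(15u);
}

// Allocate 'size' bytes starting on a 16-byte boundary.
// The original malloc pointer is stored just before the aligned block
// so that freeAligned16() can release it later.
inline void *allocAligned16(std::size_t size) {
    void *raw = std::malloc(size + 15 + sizeof(void *));
    if (!raw)
        return nullptr;

    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void *);
    std::uintptr_t aligned = (start + 15u) & ~static_cast<std::uintptr_t>(15u);

    reinterpret_cast<void **>(aligned)[-1] = raw;
    return reinterpret_cast<void *>(aligned);
}

// Free a block obtained from allocAligned16(); a null pointer is ignored.
inline void freeAligned16(void *p) {
    if (p)
        std::free(reinterpret_cast<void **>(p)[-1]);
}
```

Rounding the pitch with `alignUp16` as well keeps every row of the surface on a 16-byte boundary, not just the first one.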
As this slightly pollutes surface.h, I'll appreciate your feedback (👍 is also sufficient ;-)).