Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Unsafe and Vertical Orientation Functions #50

Closed
xfixium opened this issue Mar 11, 2023 · 17 comments
Closed

New Unsafe and Vertical Orientation Functions #50

xfixium opened this issue Mar 11, 2023 · 17 comments

Comments

@xfixium
Copy link

xfixium commented Mar 11, 2023

First request:
While Devkit SMS has great horizontal consecutive tile setting functions, these are not as speedy if called for consecutive vertical tiles.

SMS_setNextTileatXY and SMS_setTile are pretty quick, especially if SMS_setNextTileatXY is set once before a loop calls SMS_setTile.
However, if the same is done in a vertical orientation of consecutive tiles. It takes quite a performance hit.
May I suggest a SMS_setNextTileatYX, or something to that affect?

The general thought is that whatever auto incrementor that is used for consecutive horizontal tiles function, can be utilized for consecutive vertical tiles, as well.

Second Request:
UNSAFE versions of SMS_VRAMmemset and SMS_VRAMmemsetW

Thank you for your time.

@sverx
Copy link
Owner

sverx commented Mar 12, 2023

The reason why setting horizontally consecutive tiles is fast and any other way is slow is because the SMS VDP automatically auto-increments the destination address, so you can write a stream of bytes to VRAM without setting a new address only as long as the new write address is the next one as the previous write.

This means that the only way to set a single column of tiles is to issue one SMS_setNextTileatXY and one SMS_setTile for each tile, there's no practical workaround.

Regarding your second request, I will look into it. It shouldn't be that hard to add that.

@sverx
Copy link
Owner

sverx commented Mar 12, 2023

To better clarify: the SMS VDP can only auto-increment the VRAM address using a fixed +1 offset and there's no way to set any other delta (as opposed to the MegaDrive/Genesis VDP, IIRC).

@sverx sverx changed the title [Feature Request] New Unsafe and Vertical Orientation Functions New Unsafe and Vertical Orientation Functions Mar 12, 2023
@xfixium
Copy link
Author

xfixium commented Mar 12, 2023

I see. I did read something like that when looking over various SMS documentation. I thought maybe there might be some more customization that could be set, or maybe there was more going on under the hood in Devkit SMS. I do appreciate you looking into the UNSAFE variants. That may help in various operations. Thank you.

@sverx
Copy link
Owner

sverx commented Mar 13, 2023

An alternative way to update a single column (or a few of them) could be to use the SMS_loadTileMapArea function.
But you would need to prepare the data in a source buffer.
Unfortunately, the current implementation is pretty slow. But I might rewrite that in asm if that's something that you can find useful.

@sverx
Copy link
Owner

sverx commented Mar 13, 2023

Regarding your second request, what's your case scenario using "UNSAFE versions of SMS_VRAMmemset and SMS_VRAMmemsetW"? I have a few different options regading how to code them but the more I know about how they will be actually used the best option I will likely be able to pick...

@xfixium
Copy link
Author

xfixium commented Mar 14, 2023

I have tried SMS_loadTileMapArea as well as some other methods. The fastest is still SMS_setNextTileatXY and SMS_setTile, in an unrolled loop. I thought if this couldn't be made faster, that maybe just writing words through an UNSAFE SMS_VRAMmemset could do the job instead. As I'm creating the update during the vblank. I thought the UNSAFE versions could be useful in other areas as well.

@sverx
Copy link
Owner

sverx commented Mar 14, 2023

I see, so you would say use one UNSAFE_SMS_VRAMmemsetW call (if there was such a thing) instead of calling SMS_setNextTileatXY and SMS_setTile, but still looping over them - or in an unrolled loop.

I suspect this would save very few cycles anyway, so maybe what could be added instead is some function that updates a single column of the tilemap in a single call, getting the values from a buffer. This way there would be less function call overhead and the function logic would also handle the destination pointer in VRAM, adding 64 bytes to it at each loop. It wouldn't even need to be UNSAFE, which means you could even call that when the vblank is going to finish and it will eventually even partially run outside vblank without issues - and you won't notice screen tearing because it's updating the screen top to bottom so if you start when still in vblank, then the screen raster won't catch you.

Does this sound like a possible solution?

@xfixium
Copy link
Author

xfixium commented Mar 14, 2023

Honestly, a few cycles go a long way.

I like the sound of what you propose, I'm just not clear how that would be implemented. Are you suggesting using something like SMS_VRAMmemsetW? Replacing SMS_setNextTileatXY and SMS_setTile combination? As I believe I have tried this before. I thought it would be quicker, but it was not. I'll revisit it to be sure, however. Maybe this is where an UNSAFE version of SMS_VRAMmemsetW would come in handy?

Also, some more detail. The tiles are in vertical alignment within the ROM. I currently just point to a section of it, and apply it via the unrolled SMS_setNextTileatXY and SMS_setTile sequentially.

Also also, I'd like to note, an UNSAFE version of SMS_VRAMmemset(W) might come in handy for various other things I'm trying to accomplish manipulating the VRAM directly. Not just this update column scenario.

I know I'm probably viewing this in a very uneducated perspective, so please bare with me XD. I obviously don't want to introduce any unnecessary bloat to the library. I have some unorthodox methods I would like to vet, and there seems to be some possible performance gains to be had.

@sverx
Copy link
Owner

sverx commented Mar 14, 2023

What I suggested is kind of a special form of SMS_loadTileMapArea that updates just a single tile per each row (thus updating a column) starting from the pointed tile and looping down as fast as possible (yet without resorting to an unrolled loop). I think it could save several cycles by avoiding dozens call overheads.

Library bloat isn't much an issue as long as I keep the functions in separate modules. The SDCC linker won't link the modules containing unused functions anyway.

@sverx
Copy link
Owner

sverx commented Mar 14, 2023

OK you can try now. I have added
SMS_loadTileMapColumnatAddr (unsigned int dst, const void *src, unsigned int height)
and the macro
SMS_loadTileMapColumn(x,y,src,height)
so for instance to update 24 tiles in vertical starting from the one at 0,0 you can issue
SMS_loadTileMapColumn (0,0,source_data,24);
where source_data is a pointer to unsigned int - or an array of unsigned int with at least 24 elements.

The function is native asm and pretty fast, and just reordering the asm instructions I made it safe anyway so it doesn't need to be marked as UNSAFE as it anyway can't be faster than this.

@xfixium
Copy link
Author

xfixium commented Mar 14, 2023

Wow, that is blazing fast now! It seems to work as expected, thank you so much. I do have one concern, however. While this method works perfect for my needs, I'm wondering how popular it would be for people using a standard horizontal formatted tilemap? Just throwing that thought out there.

@sverx
Copy link
Owner

sverx commented Mar 14, 2023

If you code an horizontally scrolling engine, you're likely to arrange your map in column-major order. Beside that, in any other case where one needs to update a single column, one could just allocate a small buffer in RAM and populate that with the data needed for the update. Think compressed maps, for example...

@xfixium
Copy link
Author

xfixium commented Mar 14, 2023

Valid points, thanks again!

@sverx
Copy link
Owner

sverx commented Mar 15, 2023

I realized it could be done even faster, so I just pushed an update.

Regarding this topic, instead:

Also also, I'd like to note, an UNSAFE version of SMS_VRAMmemset(W) might come in handy for various other things I'm trying to accomplish manipulating the VRAM directly. Not just this update column scenario.

would you mind elaborating on it? I'd like to understand what are your specific needs for something like that.

@xfixium
Copy link
Author

xfixium commented Mar 15, 2023

Yes, I need to re-crunch the numbers just so I'm not talking out my butt. It mainly has to do with sprite processing. I'm currently on vacation, so my response has been delayed until I have free time.

@xfixium
Copy link
Author

xfixium commented Mar 15, 2023

Actually, I think you can disregard my other request. I think through some testing, I found a solution that may work. Thank you for your time, knowledge, and skill.

@sverx
Copy link
Owner

sverx commented Mar 16, 2023

OK, so I'm closing this.

Thank you for suggesting an useful new feature! 😃

@sverx sverx closed this as completed Mar 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants