In the StellaDS thread, AtariAge member llabnip made some suggestions on how to speed up the Thumbulator class quite significantly (by almost 40% on the Nintendo DS):
Not calling into the execute() for each Thumb instruction - the overhead of the call was not optimized away with GCC at the max settings so I moved the handling of the Thumb loop to inside the execute(). (implemented with 025de6e)
Keeping a 16-bit pointer always pointing to the next instruction rather than re-index into the Thumb instruction ROM array. I don't even bother to update the PC register until it's needed (easy to back-calculate from the 16-bit Thumb instruction pointer from the start of ROM).
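The instruction-pointer idea could be sketched roughly as follows. This is a minimal illustration, not the actual Thumbulator or StellaDS code: the `FetchUnit` struct, its members, and the `romStartAddr` mapping address are all hypothetical names chosen for this example. It also returns the plain address of the next instruction and leaves out the architectural pipeline offset that a real Thumb PC read includes.

```cpp
#include <cstdint>

// Hypothetical miniature fetch unit; names are illustrative only.
struct FetchUnit {
  const std::uint16_t* romBase;  // first halfword of the Thumb ROM
  const std::uint16_t* ip;       // always points at the next instruction
  std::uint32_t romStartAddr;    // address the ROM is mapped at (assumed)

  // Fetch the next 16-bit instruction without updating any PC register.
  std::uint16_t fetch() { return *ip++; }

  // Back-calculate the address of the next instruction only when needed.
  std::uint32_t pc() const {
    return romStartAddr + static_cast<std::uint32_t>(ip - romBase) * 2u;
  }

  // Redirect fetching, e.g. after a taken branch to a ROM address.
  void jump(std::uint32_t addr) {
    ip = romBase + (addr - romStartAddr) / 2u;
  }
};
```

The hot path (`fetch`) is a single load and pointer increment; the multiplication and addition in `pc()` are only paid by the comparatively rare instructions that actually read the PC.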
The biggest improvement in speed came from simply using the top 2 bits of the Thumb instruction to binary parse the instruction. So I check if the high bit (bit 15) is set - that puts the instruction into one of two buckets. Then I check the next bit (bit 14) to split each of those buckets in two, giving four buckets in total. This way I only have to check the instructions in one bucket, which really shortens the long search for the opcode. Then I did some profiling and found the popular instructions, which were often several orders of magnitude more likely to be called, and check them first in each bucket (e.g. ADD big immediate one register, CMP immediate, conditional branch, etc.) (Stella decodes opcodes only once, which is even better)
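A rough sketch of this bucket approach, assuming the standard 16-bit Thumb encodings, might look like the code below. The `Op` enum and `classify` function are hypothetical names for this example, only a handful of opcodes are recognized, and the order of the checks inside each bucket is illustrative rather than taken from any real profile.

```cpp
#include <cstdint>

// Hypothetical, heavily reduced opcode set for illustration.
enum class Op { AddImm8, CmpImm8, MovImm8, SubImm8,
                CondBranch, Swi, Branch, Other };

Op classify(std::uint16_t inst) {
  if (inst & 0x8000) {                 // bit 15 set: buckets 10 and 11
    if (inst & 0x4000) {               // bucket 11
      if ((inst & 0xF000) == 0xD000) { // 1101 cccc: branch family
        const unsigned cond = (inst >> 8) & 0xF;
        if (cond == 0xF) return Op::Swi;        // 1101 1111: software interrupt
        if (cond != 0xE) return Op::CondBranch; // 0xE is undefined
        return Op::Other;
      }
      if ((inst & 0xF800) == 0xE000) return Op::Branch; // 11100: B
      return Op::Other;                // LDM/STM, BL, ... omitted here
    }
    return Op::Other;                  // bucket 10 omitted in this sketch
  }
  if (inst & 0x4000) return Op::Other; // bucket 01 omitted in this sketch

  // Bucket 00: test the (assumed) most frequent encodings first.
  const unsigned top5 = inst >> 11;
  if (top5 == 0x06) return Op::AddImm8; // 00110: ADD Rd, #imm8
  if (top5 == 0x05) return Op::CmpImm8; // 00101: CMP Rd, #imm8
  if (top5 == 0x04) return Op::MovImm8; // 00100: MOV Rd, #imm8
  if (top5 == 0x07) return Op::SubImm8; // 00111: SUB Rd, #imm8
  return Op::Other;                     // shifts, 3-operand ADD/SUB, ...
}
```

After two bit tests, each instruction only has to be compared against the encodings of its own bucket, and the most frequent ones within that bucket are reached first.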
The conditional branch is heavily used in most programs - Galagon calls it about 200k times per second. Since the 8-bit decoded opcode table (256 possible entries) only holds roughly 72 opcodes, some of the most heavily used opcodes could be split further during decoding. The conditional branch, for example, could be split into the 13 different types (branch if zero, branch if not zero, etc). This would add to the opcode count but would save the shift, AND and switch for that instruction. (implemented with 96d5a3f)
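Splitting the conditional branch at decode time could look roughly like this. The sketch is not the code from commit 96d5a3f; the `Op` enum, the `Flags` struct, and the `decode` and `taken` functions are hypothetical. It enumerates the condition codes 0x0 to 0xD of the Thumb conditional branch encoding and treats everything else as `Other`.

```cpp
#include <cstdint>

// Hypothetical decoded opcodes: one entry per branch condition.
// The first 14 enumerators deliberately match condition codes 0x0..0xD.
enum class Op : std::uint8_t {
  Beq, Bne, Bcs, Bcc, Bmi, Bpl, Bvs, Bvc,
  Bhi, Bls, Bge, Blt, Bgt, Ble, Other
};

struct Flags { bool n, z, c, v; };  // negative, zero, carry, overflow

// Done once per instruction: the shift and mask happen here, not at run time.
Op decode(std::uint16_t inst) {
  if ((inst & 0xF000) == 0xD000) {
    const unsigned cond = (inst >> 8) & 0xF;
    if (cond < 14) return static_cast<Op>(cond);
  }
  return Op::Other;  // undefined, SWI and all other instructions
}

// Done on every execution: the decoded opcode selects the flag test directly.
bool taken(Op op, const Flags& f) {
  switch (op) {
    case Op::Beq: return f.z;
    case Op::Bne: return !f.z;
    case Op::Bcs: return f.c;
    case Op::Bcc: return !f.c;
    case Op::Bmi: return f.n;
    case Op::Bpl: return !f.n;
    case Op::Bvs: return f.v;
    case Op::Bvc: return !f.v;
    case Op::Bhi: return f.c && !f.z;
    case Op::Bls: return !f.c || f.z;
    case Op::Bge: return f.n == f.v;
    case Op::Blt: return f.n != f.v;
    case Op::Bgt: return !f.z && f.n == f.v;
    case Op::Ble: return f.z || f.n != f.v;
    default:      return false;
  }
}
```

In an interpreter that already dispatches on the decoded opcode, each of these cases would simply become one more label in the main switch, so a branch costs one dispatch instead of a dispatch plus a second condition decode.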
This might ease the CPU stress for other platforms too.
BTW: It is quite impressive to get ARM games running at (mostly) full speed on a platform (Nintendo DSi) where the main CPU clocks at just 133 MHz.