-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Very minor Performance issue in comparison to CMSIS #5
Comments
This is interesting. I did a few modifications (see here). The cppreg version is now only larger by one instruction compared to the CMSIS code. Here is a thought: when we use the template form for the Two things:
|
So after careful checking it seems that the only difference in the implementation given in the previous comment is related to what @hak8or mentioned about registers between related through an offset. I think this is great news because this means we obtain quasi-identical performance to a CMSIS implementation. We could create an issue for a register cluster implementation as mentioned above. |
Awesome, in that case:
In that case, I say we close this after implementing your point 1 with me saying that the grouping will be done in #7. |
Agreed. I will however create an issue for the new access policies implementation (already available but needs to be merged). |
When working on #1 using the following code, I spotted the following;
Differences
First up is the read modify write for the
MODER
register which takes 2 more instructions in cppreg vs cmsis for cortex m0plus. This seems to be because for cppreg the address of the register is built via multiple immediate while in CMSIS it re-uses theMODER
address when accessingBSRR
.Secondly, it seems cppreg for writing a value to the
BSRR
does a simplerstr r1, [r3]
in cppreg instead of the potentially more expensivestr r2, [r3, #24]
instruction under CMSIS. The Cortex-M0+ Technical Reference Manual says there is no difference, but this may be different for M7 and more complicated architectures.Potential Cause
I feel the first and second difference are somewhat related because in CMSIS the compiler is informed that
MODER
andBSRR
are related via offsets from a base pointer, while in CPPReg they look totally unrelated. This results in the compiler having to "rebuild" the address twice for cppreg, once from immediates forMODER
and another from a hardcoded address stored in the .text section. Furthermore, it seems the two instruction difference is also due to the masking and applying the valueSolution
I view this as being caused by the architecture limitations of Cortex M and the design of cppreg.
Regarding the architecture, if you compile this for X86-64 or non thumb ARM then you see the issue goes away (the assembly is identical). I think this is because immediates in those ISA's/ARCH's can be huge due to, well, instructions being allowed to be very large too. This results in all stores/loads being done via full immediates instead of an immediate and offset or shifting immediate to build the address.
Regarding cppreg, you cannot specify that multiple registers are just an offset from each other instead of totally unrelated areas (CMSIS does this by placing a huge struct on the address). I do not see an easy way to give the compiler that sort of information either.
Real World Implications
To be frank, this difference in assembly from a performance standpoint is small, very small. Ideally the reason for the discrepancy can be verified/found with cppreg adopting the smaller of the two. But, writing to a register is very rarely a bottle neck unless you are bit banging, in which case you should probably be starting to seriously consider assembly instead.
The text was updated successfully, but these errors were encountered: