Very minor Performance issue in comparison to CMSIS #5

hak8or · 2018-03-06T04:43:50Z

While working on #1 with the following code, I spotted the following:

Differences

First, the read-modify-write of the MODER register takes 2 more instructions in cppreg than in CMSIS on Cortex-M0+. This seems to be because cppreg builds the register address from multiple immediates, while CMSIS reuses the MODER address when accessing BSRR.
Second, when writing a value to BSRR, cppreg emits a simpler str r1, [r3] instead of the potentially more expensive str r2, [r3, #24] emitted for CMSIS. The Cortex-M0+ Technical Reference Manual says there is no difference in cost, but this may differ on the M7 and other more complex cores.

Potential Cause

I feel the first and second differences are somewhat related: in CMSIS the compiler is told that MODER and BSRR are offsets from a common base pointer, while in cppreg they look totally unrelated. As a result, the compiler has to "rebuild" the address twice for cppreg, once from immediates for MODER and once from a hardcoded address stored in the .text section. Furthermore, it seems the two-instruction difference is also due to the masking and applying of the value.
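To make the two addressing styles concrete, here is a minimal host-side sketch. The GpioBlock layout, the function names, and the 2-bits-per-pin MODER encoding are assumptions for illustration (only the 24-byte distance between MODER and BSRR comes from the assembly above); this is neither the actual CMSIS header nor the cppreg implementation. On a real target the addresses would be compile-time constants rather than pointers into fake storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical GPIO register block; BSRR sits 24 bytes after MODER,
// matching the `str r2, [r3, #24]` seen in the CMSIS build.
struct GpioBlock {
    volatile std::uint32_t MODER;    // 0x00
    volatile std::uint32_t OTYPER;   // 0x04
    volatile std::uint32_t OSPEEDR;  // 0x08
    volatile std::uint32_t PUPDR;    // 0x0C
    volatile std::uint32_t IDR;      // 0x10
    volatile std::uint32_t ODR;      // 0x14
    volatile std::uint32_t BSRR;     // 0x18
};
static_assert(offsetof(GpioBlock, BSRR) == 0x18, "BSRR expected at byte offset 24");

// Grouped style: both accesses go through one base, so the compiler
// can see that BSRR is just base + 24.
inline void set_output_grouped(GpioBlock& gpio, unsigned pin) {
    gpio.MODER = (gpio.MODER & ~(0x3u << (2u * pin))) | (0x1u << (2u * pin));
    gpio.BSRR = 1u << pin;
}

// Separate style: each register is known only by its own absolute address,
// so nothing in the code states that the two addresses are related.
inline void set_output_separate(std::uintptr_t moder_addr,
                                std::uintptr_t bsrr_addr, unsigned pin) {
    volatile std::uint32_t& moder =
        *reinterpret_cast<volatile std::uint32_t*>(moder_addr);
    volatile std::uint32_t& bsrr =
        *reinterpret_cast<volatile std::uint32_t*>(bsrr_addr);
    moder = (moder & ~(0x3u << (2u * pin))) | (0x1u << (2u * pin));
    bsrr = 1u << pin;
}

// Host-side check that both styles produce the same register contents.
inline void demo() {
    GpioBlock a = {};
    GpioBlock b = {};
    set_output_grouped(a, 5);
    set_output_separate(reinterpret_cast<std::uintptr_t>(&b.MODER),
                        reinterpret_cast<std::uintptr_t>(&b.BSRR), 5);
    assert(a.MODER == b.MODER);
    assert(a.BSRR == b.BSRR);
}
```

Both functions do the same work; the only difference is how much the compiler knows about the relationship between the two addresses.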

Solution

I view this as being caused by the architecture limitations of Cortex M and the design of cppreg.

Regarding the architecture, if you compile this for x86-64 or non-Thumb ARM, the issue goes away (the assembly is identical). I think this is because immediates in those ISAs can be much larger, since the instructions themselves are allowed to be larger. As a result, all stores and loads are done via full immediates instead of an immediate plus an offset, or shifted immediates, to build the address.
Regarding cppreg, you cannot specify that multiple registers are just offsets from a common base rather than totally unrelated locations (CMSIS does this by placing a large struct at the base address). I do not see an easy way to give the compiler that sort of information either.

Real World Implications

To be frank, this difference in assembly is small from a performance standpoint, very small. Ideally the reason for the discrepancy can be found and verified, with cppreg adopting the shorter of the two sequences. But writing to a register is very rarely a bottleneck unless you are bit-banging, in which case you should probably start seriously considering assembly instead.


sendyne-nicocvn · 2018-03-06T23:13:07Z

This is interesting. I made a few modifications (see here).

The cppreg version is now only larger by one instruction compared to the CMSIS code. Here is a thought:

When we use the template form for the MODER write, it calls the regular write function. Technically, because the value, the offset, and the mask are compile-time constants in that case, part of the write implementation could be simplified (this is done in the super_write of the modified code). That seems to be where the simplification occurs.
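As a rough illustration of that simplification, here is a minimal sketch. The names write_field and write_field_const are hypothetical and this is not the actual super_write from the modified code; it only shows how the shift and mask of the value can be folded at compile time when everything is a template argument.

```cpp
#include <cassert>
#include <cstdint>

using reg_t = std::uint32_t;

// Runtime form: mask, offset, and value are ordinary arguments, so the
// shift and masking of the value are part of the function body.
inline void write_field(volatile reg_t& reg, reg_t mask,
                        unsigned offset, reg_t value) {
    reg = (reg & ~mask) | ((value << offset) & mask);
}

// Template form (hypothetical): everything but the register is a
// compile-time constant, so the shifted value is computed once by the
// compiler and only the read-modify-write remains at run time.
template <reg_t Mask, unsigned Offset, reg_t Value>
inline void write_field_const(volatile reg_t& reg) {
    static_assert(((Value << Offset) & ~Mask) == 0,
                  "value does not fit in the field");
    constexpr reg_t shifted = (Value << Offset) & Mask;
    reg = (reg & ~Mask) | shifted;
}

// Host-side check: both forms write the same bits into a 2-bit field
// at offset 10 (mask 0xC00).
inline void demo() {
    volatile reg_t a = 0xFFFFFFFFu;
    volatile reg_t b = 0xFFFFFFFFu;
    write_field(a, 0xC00u, 10u, 0x1u);
    write_field_const<0xC00u, 10u, 0x1u>(b);
    assert(a == 0xFFFFF7FFu);
    assert(b == 0xFFFFF7FFu);
}
```

The static_assert is an extra that the template form makes possible: an out-of-range constant is rejected at compile time instead of being silently masked.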

Two things:

1. It will not be much work to implement such a faster write when the template form is used, and it seems it could bring some additional performance.
2. Regarding the offset between registers, we could probably implement an abstract peripheral type, or rather a cluster of register types, but that requires a bit more work and revisions.
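One possible shape for such a cluster, purely as a sketch: every name here (ClusterRegister, FakeGpioBase, Moder, Bsrr) is hypothetical and not part of the cppreg API. Each register is expressed as a byte offset from a base named once. On a target the base would be a fixed constant; here it points at host storage so the sketch can run anywhere. Whether a compiler then emits base-plus-offset addressing has not been verified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical cluster member: a register located at Base::address() + Offset.
template <typename Base, std::size_t Offset>
struct ClusterRegister {
    static volatile std::uint32_t& ref() {
        return *reinterpret_cast<volatile std::uint32_t*>(Base::address() + Offset);
    }
    static void write(std::uint32_t value) { ref() = value; }
    static std::uint32_t read() { return ref(); }
};

// Host stand-in for a peripheral base address (assumption for testing only);
// a real target would return the fixed peripheral address instead.
struct FakeGpioBase {
    static std::uintptr_t address() {
        static std::uint32_t storage[8] = {};
        return reinterpret_cast<std::uintptr_t>(storage);
    }
};

// Both registers share one base and differ only by their offsets.
using Moder = ClusterRegister<FakeGpioBase, 0x00>;
using Bsrr  = ClusterRegister<FakeGpioBase, 0x18>;

// Host-side check that the two registers are distinct and 24 bytes apart.
inline void demo() {
    Moder::write(0x400u);
    Bsrr::write(1u << 5);
    assert(Moder::read() == 0x400u);
    assert(Bsrr::read() == (1u << 5));
    assert(reinterpret_cast<std::uintptr_t>(&Bsrr::ref()) -
           reinterpret_cast<std::uintptr_t>(&Moder::ref()) == 0x18);
}
```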

sendyne-nicocvn · 2018-03-07T16:57:56Z

So after careful checking, it seems that the only difference in the implementation given in the previous comment is related to what @hak8or mentioned about registers being related through an offset.

I think this is great news because this means we obtain quasi-identical performance to a CMSIS implementation. We could create an issue for a register cluster implementation as mentioned above.

hak8or · 2018-03-08T20:05:36Z

Awesome, in that case:

1. May be worthwhile to implement.
2. Would be fantastic, I will create the issue. It sounds like a decent number of changes to the API though, so it may be worthwhile to batch the various potential API changes together before starting.

In that case, I say we close this after implementing your point 1, noting that the grouping will be done in #7.

sendyne-nicocvn · 2018-03-08T20:09:16Z

Agreed. I will however create an issue for the new access policies implementation (already available but needs to be merged).

hak8or added the discussion label Mar 6, 2018

hak8or mentioned this issue Mar 6, 2018

Document cppreg (zero) impact on performance and code size. #1

Closed

3 tasks

hak8or mentioned this issue Mar 8, 2018

Add grouping registers together, like how CMSIS maps a struct of types to a base address #7

Closed

sendyne-nicocvn mentioned this issue Mar 8, 2018

Revise access policies implementation for better performance. #8

Closed

sendyne-nicocvn closed this as completed Mar 8, 2018

sendyne-nicocvn mentioned this issue Mar 8, 2018

Writes to read_write fields which fill the size of the register still create a read #6

Closed
