Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very minor Performance issue in comparison to CMSIS #5

Closed
hak8or opened this issue Mar 6, 2018 · 4 comments
Closed

Very minor Performance issue in comparison to CMSIS #5

hak8or opened this issue Mar 6, 2018 · 4 comments
Labels
discussion Discussion about possible features or revisions.

Comments

@hak8or
Copy link
Collaborator

hak8or commented Mar 6, 2018

When working on #1 using the following code, I spotted the following;

Differences

  • First up is the read modify write for the MODER register which takes 2 more instructions in cppreg vs cmsis for cortex m0plus. This seems to be because for cppreg the address of the register is built via multiple immediate while in CMSIS it re-uses the MODER address when accessing BSRR.

  • Secondly, it seems cppreg for writing a value to the BSRR does a simpler str r1, [r3] in cppreg instead of the potentially more expensive str r2, [r3, #24] instruction under CMSIS. The Cortex-M0+ Technical Reference Manual says there is no difference, but this may be different for M7 and more complicated architectures.

Potential Cause

I feel the first and second difference are somewhat related because in CMSIS the compiler is informed that MODER and BSRR are related via offsets from a base pointer, while in CPPReg they look totally unrelated. This results in the compiler having to "rebuild" the address twice for cppreg, once from immediates for MODERand another from a hardcoded address stored in the .text section. Furthermore, it seems the two instruction difference is also due to the masking and applying the value

Solution

I view this as being caused by the architecture limitations of Cortex M and the design of cppreg.

  • Regarding the architecture, if you compile this for X86-64 or non thumb ARM then you see the issue goes away (the assembly is identical). I think this is because immediates in those ISA's/ARCH's can be huge due to, well, instructions being allowed to be very large too. This results in all stores/loads being done via full immediates instead of an immediate and offset or shifting immediate to build the address.

  • Regarding cppreg, you cannot specify that multiple registers are just an offset from each other instead of totally unrelated areas (CMSIS does this by placing a huge struct on the address). I do not see an easy way to give the compiler that sort of information either.

Real World Implications

To be frank, this difference in assembly from a performance standpoint is small, very small. Ideally the reason for the discrepancy can be verified/found with cppreg adopting the smaller of the two. But, writing to a register is very rarely a bottle neck unless you are bit banging, in which case you should probably be starting to seriously consider assembly instead.

@hak8or hak8or added the discussion Discussion about possible features or revisions. label Mar 6, 2018
@sendyne-nicocvn
Copy link
Contributor

This is interesting. I did a few modifications (see here).

The cppreg version is now only larger by one instruction compared to the CMSIS code. Here is a thought:

when we use the template form for the MODER write call this calls the regular write function. Technically because in such a case the value, the offset, and the mask are compile-time constants part of the write implementation could be simplified (this is done in the super_write of the modified code). That seems to be where the simplification occurs.

Two things:

  1. It will not be much work to implement such faster write when the template form is used and it seems it could bring some additional performance.
  2. Regarding the offset between registers we could probably implement an abstract peripheral type or rather a cluster of register types but that is a bit more work and revisions.

@sendyne-nicocvn
Copy link
Contributor

So after careful checking it seems that the only difference in the implementation given in the previous comment is related to what @hak8or mentioned about registers between related through an offset.

I think this is great news because this means we obtain quasi-identical performance to a CMSIS implementation. We could create an issue for a register cluster implementation as mentioned above.

@hak8or
Copy link
Collaborator Author

hak8or commented Mar 8, 2018

Awesome, in that case:

  1. May be worthwhile to implement
  2. Would be fantastic, I will create the issue. It sounds like a decent bit of changes to the API though so maybe it will be worthwhile to clump together the various potential API changes before starting.

In that case, I say we close this after implementing your point 1 with me saying that the grouping will be done in #7.

@sendyne-nicocvn
Copy link
Contributor

Agreed. I will however create an issue for the new access policies implementation (already available but needs to be merged).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discussion about possible features or revisions.
Projects
None yet
Development

No branches or pull requests

2 participants