Document cppreg (zero) impact on performance and code size. #1

sendyne-nicocvn · 2018-02-22T15:50:13Z

The main goal is to provide a document (e.g., Performance.md) where we show assembly outputs for various level of optimizations (and possibly compilers) to illustrate that cppreg does not affect runtime performance or code size.

This requires:

design a code example with a cppreg implementation and a traditional implementation (à la CMSIS)
produce compiler outputs (the target is GCC ARM but others could be added) for Og, O1, O2, O3, Os (although O[1-3] will most likely be identical)
write Performance.md

We could put the various materials in a benchmark directory to not pollute the main one.

The text was updated successfully, but these errors were encountered:

hak8or · 2018-02-23T09:07:02Z

As I was cleaning up an old project of mine, I started thinking it might be a decent example showing a full use case comparing CMSIS and cppreg. On an unrelated note, the minimal example to do a clock tree init and get a LED toggling at ~1hz is only ~280 bytes!

For the performance.md, my idea is to include a godbolt based example for ::write, ::chained_write.with.with, and ::is_set (so four operations total) in terms of the C++ code and assembly output using -Os. This would be used to show that the resulting instructions cannot be optimized further.

Then, a full example, possibly implementing clock tree initialization and then some UART transmissions and LED blinking, with various optimization flags (-O0,-Os,-O1,-O2, -O3, -Og) for cppreg vs CMSIS. The data would be presented in the form of two tables, each with two columns (cppreg vs CMSIS) and 6 rows (optimization levels) table. The first table would be total size of the resulting binary in bytes, and the second table somehow showing how long it takes to do the clock init and a few million loops of the LED toggling and UART sending, in milliseconds or cycles.

sendyne-nicocvn · 2018-02-23T10:56:17Z

Sounds good to me.

The clock tree + UART + LED would be a great example. I assume 280 bytes is without startup code (i.e., no interrupts table and SystemInit) but that would actually be clearer that way. For the data this seems like a good start. Once we collect them we can decide on how to organize them.

I created the performance branch so that we can start putting code in the repository.

hak8or · 2018-02-23T20:32:44Z

That is actually *with* the startup code and everything. Check main.cpp, I wrote my own attempt at startup code. It doesn't work with constructors or destructor yet (only meant for a minimal c example), but it does get a led blinking.

…

On Feb 23, 2018 5:56 AM, "Sendyne Principal Scientist" < ***@***.***> wrote: Sounds good to me. The clock tree + UART + LED would be a great example. I assume 280 bytes is without startup code (*i.e.*, no interrupts table and SystemInit) but that would actually be clearer that way. For the data this seems like a good start. Once we collect them we can decide on how to organize them. I created the performance <http:///sendyne/cppreg/tree/performance> branch so that we can start putting code in the repository. — You are receiving this because you were assigned. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA5gCm9pWRk1c2Mx3ZGCafu-1frc_LRWks5tXplRgaJpZM4SPiMk> .

hak8or · 2018-03-06T04:39:02Z

Scenario

I decided to heavily leverage godbolt in the comparison because I feel it will be much more accessible to anyone who will want to go through the edit-compile-lookatassembly loop themselves. They won't have to download a compiler, will be easier to show comparisons, etc.

Anyways, for a decent yet simple comparison it would be good to turn a led on, and then in a loop turn the led off, wait a bit turn the led on, and loop back. This is small enough that in terms of code it fits in a screen in terms of lines of code, is easy to understand by everyone, and doesn't bog the example down in terms of setting up a lot of registers (which would be needed to do a full demo including clocking and power gating).

Progress

Here is what I am working with right now (the URL is so long that even neither google nor bit.ly can shorten it!). It looks very similar but there are two interesting differences.

I want to focus a bit on #5 before continuing on writing up a performance comparison. @sendyne-nclauvelin Do let me know if you think the comparison scenario is appropriate though so the first task can be marked as complete.

hak8or · 2018-03-06T04:55:16Z

On a somewhat related note, I wonder if it would be worthwhile to include this in a larger (many registers) comparison somehow as another metric when looking at the assembly alone is not feasible.

sendyne-nicocvn · 2018-03-09T21:48:12Z

Subsequent to #6, #8 and #9 I think we can resume the performance assessment. I forwarded the performance branch to the current state.

hak8or · 2018-03-09T21:59:45Z

Agreed, will aim to have it in a good state by tonight.

hak8or · 2018-03-10T04:36:06Z

I was working and spotted two "issues" that cause differences.

For registers which have a read only field and you are writing to another field, the bits relevant to the read only field are being cleared. I do not see why there should be more work done (clearing it) compared to just writing the old relevant bits back. I do not recall ever seeing a register that contains a read only field which required writing back 0's to the associated field (which would make it not read only), instead just that all writes were ignored. This does introduce overhead I feel because with cppreg as-is you cannot disable it (write back old values instead of clearing). What was your line of reasoning for clearing the bits?
The lack of expressing related registers (CMSIS style structures mapped to memory) sadly does introduce "overhead" in the form of the compiler sticking two close addresses in .text and then overwriting an old CPU register with the new address (introducing more register pressure), instead to just doing memory interfacing using offsets.

This is the example I am planning to use for the comparison.

Ignoring those two issues, the assembly is identical.

sendyne-nicocvn · 2018-03-10T13:40:18Z

For the second point this relates to #7 and I have first to create an example which isolate the issue to understand a bit better what is exactly happening. For the first point can you provide a minimal example because I am not clear on what you are describing?

sendyne-nicocvn · 2018-03-10T13:51:08Z

Also in your example you have:

UART::STATUS::merge_write<UART::STATUS::Enable>(1).with<UART::STATUS::Sending>(1).done();

Using:

UART::STATUS::merge_write<UART::STATUS::Enable,1>().with<UART::STATUS::Sending,1>().done();

simplifies the generated assembly (look at the L3 branch). This an important detail ... cppreg uses a faster implementation if the data are known at compile time and to indicate that you need to use the template version. This is also true for regular write calls.

At this point the only difference is how offsets between registers is managed in cppreg. Will work on that for now.

hak8or · 2018-03-10T19:48:36Z

Ah, I didn't realize that could be done via template arguments for merge writes (is that part of the new API changes you did recently?).

The minimal example is here. Looking at it again, I realized I misread the assembly originally. It looked like the BICS instruction was clearing bits that are in the readonly field, but turns out it was actually clearing the bits that were being actually written to via the merge write. In that case, I agree that your fix (via template arg) does fix the issue.

When/if the register offsets concept gets put in, then the assembly should finally match for pretty much all use cases, hopefully.

sendyne-nicocvn · 2018-03-11T13:29:04Z

Ah, I didn't realize that could be done via template arguments for merge writes (is that part of the new API changes you did recently?).

No this was already there. As part of #10 I added more details to the API documentation regarding this particular point.

hak8or · 2018-03-28T03:47:27Z

What do you think as of 08bc98c?

sendyne-nicocvn · 2018-03-28T10:44:52Z

Looks good. I actually prefer that we only present a small example rather than a lengthy and complex one. I will fix some typos and tie it with the README.

hak8or · 2018-03-28T19:00:46Z

Awesome!

sendyne-nicocvn · 2018-03-29T10:35:34Z

The performance comparison is now available in the master branch so I consider this issue closed.

sendyne-nicocvn added the discussion Discussion about possible features or revisions. label Feb 22, 2018

sendyne-nicocvn assigned hak8or and sendyne-nicocvn Feb 22, 2018

hak8or mentioned this issue Mar 6, 2018

Very minor Performance issue in comparison to CMSIS #5

Closed

sendyne-nicocvn mentioned this issue Mar 29, 2018

Performance #13

Merged

sendyne-nicocvn closed this as completed Mar 29, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document cppreg (zero) impact on performance and code size. #1

Document cppreg (zero) impact on performance and code size. #1

sendyne-nicocvn commented Feb 22, 2018 •

edited by hak8or

hak8or commented Feb 23, 2018 •

edited

sendyne-nicocvn commented Feb 23, 2018

hak8or commented Feb 23, 2018 via email

hak8or commented Mar 6, 2018 •

edited

hak8or commented Mar 6, 2018

sendyne-nicocvn commented Mar 9, 2018 •

edited

hak8or commented Mar 9, 2018

hak8or commented Mar 10, 2018

sendyne-nicocvn commented Mar 10, 2018

sendyne-nicocvn commented Mar 10, 2018

hak8or commented Mar 10, 2018

sendyne-nicocvn commented Mar 11, 2018 •

edited

hak8or commented Mar 28, 2018

sendyne-nicocvn commented Mar 28, 2018

hak8or commented Mar 28, 2018

sendyne-nicocvn commented Mar 29, 2018

Document cppreg (zero) impact on performance and code size. #1

Document cppreg (zero) impact on performance and code size. #1

Comments

sendyne-nicocvn commented Feb 22, 2018 • edited by hak8or

hak8or commented Feb 23, 2018 • edited

sendyne-nicocvn commented Feb 23, 2018

hak8or commented Feb 23, 2018 via email

hak8or commented Mar 6, 2018 • edited

Scenario

Progress

hak8or commented Mar 6, 2018

sendyne-nicocvn commented Mar 9, 2018 • edited

hak8or commented Mar 9, 2018

hak8or commented Mar 10, 2018

sendyne-nicocvn commented Mar 10, 2018

sendyne-nicocvn commented Mar 10, 2018

hak8or commented Mar 10, 2018

sendyne-nicocvn commented Mar 11, 2018 • edited

hak8or commented Mar 28, 2018

sendyne-nicocvn commented Mar 28, 2018

hak8or commented Mar 28, 2018

sendyne-nicocvn commented Mar 29, 2018

sendyne-nicocvn commented Feb 22, 2018 •

edited by hak8or

hak8or commented Feb 23, 2018 •

edited

hak8or commented Mar 6, 2018 •

edited

sendyne-nicocvn commented Mar 9, 2018 •

edited

sendyne-nicocvn commented Mar 11, 2018 •

edited