Document cppreg (zero) impact on performance and code size. #1
Comments
As I was cleaning up an old project of mine, I started thinking it might be a decent example showing a full use case comparing CMSIS and cppreg. On an unrelated note, the minimal example to do a clock tree init and get an LED toggling at ~1 Hz is only ~280 bytes! For the performance.md, my idea is to include a godbolt-based example for ::write, ::chained_write.with.with, and ::is_set (so four operations total) in terms of the C++ code and assembly output using Then, a full example, possibly implementing clock tree initialization and then some UART transmissions and LED blinking, with various optimization flags ( |
Sounds good to me. The clock tree + UART + LED would be a great example. I assume 280 bytes is without startup code (i.e., no interrupts table and SystemInit) but that would actually be clearer that way. For the data this seems like a good start. Once we collect them we can decide on how to organize them. I created the performance branch so that we can start putting code in the repository. |
That is actually *with* the startup code and everything. Check main.cpp; I
wrote my own attempt at startup code. It doesn't work with constructors or
destructors yet (it is only meant for a minimal C example), but it does get an LED
blinking.
…On Feb 23, 2018 5:56 AM, "Sendyne Principal Scientist" < ***@***.***> wrote:
Sounds good to me.
The clock tree + UART + LED would be a great example. I assume 280 bytes
is without startup code (*i.e.*, no interrupts table and SystemInit) but
that would actually be clearer that way. For the data this seems like a
good start. Once we collect them we can decide on how to organize them.
I created the performance <http:///sendyne/cppreg/tree/performance>
branch so that we can start putting code in the repository.
Scenario
I decided to lean heavily on godbolt for the comparison because it makes the edit-compile-inspect-assembly loop much more accessible: readers won't have to download a compiler, and comparisons are easier to show. For a decent yet simple comparison, it would be good to turn an LED on, and then in a loop turn the LED off, wait a bit, turn the LED on, and loop back. This is small enough to fit on one screen, is easy for everyone to understand, and doesn't bog the example down with setting up a lot of registers (which a full demo including clocking and power gating would need).
Progress
Here is what I am working with right now (the URL is so long that neither Google nor bit.ly can shorten it!). It looks very similar, but there are two interesting differences. I want to focus a bit on #5 before continuing to write up a performance comparison. @sendyne-nclauvelin Do let me know whether you think the comparison scenario is appropriate, so the first task can be marked as complete. |
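For readers who want a starting point for the scenario described above, here is a minimal sketch of the two styles side by side. It is not the example linked in the thread and it does not use the cppreg API: `fake_gpio_out`, `led_pin`, `Field`, `Led`, `blink_raw`, and `blink_typed` are all hypothetical names, and the register is simulated with a plain variable so the code runs on a host. The loop is bounded by a `cycles` parameter for the same reason; on a target it would loop forever.

```cpp
#include <cstdint>

// Hypothetical GPIO output register, simulated with a plain variable so the
// sketch runs on a host; on a target this would be a fixed peripheral address.
volatile std::uint32_t fake_gpio_out = 0;
constexpr std::uint32_t led_pin = 5;  // assumed pin number

// Traditional (CMSIS-like) style: explicit masks and shifts at the call site.
void led_on_raw()  { fake_gpio_out = fake_gpio_out | (1u << led_pin); }
void led_off_raw() { fake_gpio_out = fake_gpio_out & ~(1u << led_pin); }

// Typed style: mask and shift are compile-time constants carried by a type.
// This is NOT the cppreg API, only an illustration of the general idea.
template <std::uint32_t Offset, std::uint32_t Width>
struct Field {
    static constexpr std::uint32_t mask = ((1u << Width) - 1u) << Offset;
    static void set()    { fake_gpio_out = fake_gpio_out | mask; }
    static void clear()  { fake_gpio_out = fake_gpio_out & ~mask; }
    static bool is_set() { return (fake_gpio_out & mask) == mask; }
};
using Led = Field<led_pin, 1>;

// Crude busy-wait; the volatile counter keeps the loop from being removed.
void wait(std::uint32_t n) {
    for (volatile std::uint32_t i = 0; i < n; i = i + 1) {}
}

// Scenario: LED on, then loop {off, wait, on}. Returns the final register value.
std::uint32_t blink_raw(int cycles) {
    fake_gpio_out = 0;
    led_on_raw();
    for (int c = 0; c < cycles; ++c) {
        led_off_raw();
        wait(100);
        led_on_raw();
    }
    return fake_gpio_out;
}

std::uint32_t blink_typed(int cycles) {
    fake_gpio_out = 0;
    Led::set();
    for (int c = 0; c < cycles; ++c) {
        Led::clear();
        wait(100);
        Led::set();
    }
    return fake_gpio_out;
}
```

Pasting both functions into godbolt with the same optimization flags is one way to check whether the typed version compiles to the same instructions as the raw one; the sketch makes no claim about the result for any particular compiler.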
On a somewhat related note, I wonder if it would be worthwhile to include this in a larger (many registers) comparison somehow as another metric when looking at the assembly alone is not feasible. |
Agreed, will aim to have it in a good state by tonight. |
I was working and spotted two "issues" that cause differences.
This is the example I am planning to use for the comparison. Ignoring those two issues, the assembly is identical. |
The second point relates to #7, and I first have to create an example that isolates the issue to understand better what exactly is happening. For the first point, can you provide a minimal example? I am not clear on what you are describing. |
Also, in your example you have:
UART::STATUS::merge_write<UART::STATUS::Enable>(1).with<UART::STATUS::Sending>(1).done();
Using:
UART::STATUS::merge_write<UART::STATUS::Enable,1>().with<UART::STATUS::Sending,1>().done();
simplifies the generated assembly (look at the L3 branch). This is an important detail ... At this point the only difference is how offsets between registers are managed in |
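To illustrate why passing the value as a template argument can matter, here is a small host-runnable sketch. It does not reproduce cppreg's implementation: `fake_status`, `Bits`, `merge_runtime`, `merge_constant`, and the field layout are assumptions made for illustration. One plausible reading of the difference is that a runtime value needs a shift, a mask, and possibly a check at run time, whereas a template value can be checked with `static_assert` and folded into a constant.

```cpp
#include <cassert>
#include <cstdint>

// Simulated status register (hypothetical; stands in for UART::STATUS).
volatile std::uint32_t fake_status = 0;

template <std::uint32_t Offset, std::uint32_t Width>
struct Bits {
    static constexpr std::uint32_t offset = Offset;
    static constexpr std::uint32_t mask = ((1u << Width) - 1u) << Offset;
};
using Enable  = Bits<0, 1>;  // assumed layout
using Sending = Bits<1, 1>;  // assumed layout

// Runtime-value form: the fit check can only happen at run time, and the
// value is shifted and masked when the function executes.
template <typename F>
std::uint32_t merge_runtime(std::uint32_t acc, std::uint32_t value) {
    assert(((value << F::offset) & ~F::mask) == 0 && "value does not fit in field");
    return (acc & ~F::mask) | ((value << F::offset) & F::mask);
}

// Template-value form: the fit check happens at compile time and the shifted
// value is a constant expression.
template <typename F, std::uint32_t Value>
constexpr std::uint32_t merge_constant(std::uint32_t acc) {
    static_assert(((Value << F::offset) & ~F::mask) == 0,
                  "value does not fit in field");
    return (acc & ~F::mask) | (Value << F::offset);
}

// Both writers read the register once, merge two fields, and store once.
void write_status_runtime(std::uint32_t enable, std::uint32_t sending) {
    std::uint32_t v = fake_status;
    v = merge_runtime<Enable>(v, enable);
    v = merge_runtime<Sending>(v, sending);
    fake_status = v;
}

void write_status_constant() {
    std::uint32_t v = fake_status;
    v = merge_constant<Enable, 1>(v);
    v = merge_constant<Sending, 1>(v);
    fake_status = v;
}
```

With the template form, a value that does not fit (for example `merge_constant<Enable, 2>`) fails to compile instead of being caught or truncated at run time.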
Ah, I didn't realize that could be done via template arguments for merge writes (is that part of the new API changes you did recently?). The minimal example is here. Looking at it again, I realized I misread the assembly originally. It looked like the When/if the register offsets concept gets put in, then the assembly should finally match for pretty much all use cases, hopefully. |
No, this was already there. As part of #10 I added more details to the API documentation regarding this particular point. |
What do you think as of 08bc98c? |
Looks good. I actually prefer that we only present a small example rather than a lengthy and complex one. I will fix some typos and tie it with the README. |
Awesome! |
The performance comparison is now available in the master branch so I consider this issue closed. |
The main goal is to provide a document (e.g., Performance.md) where we show assembly outputs for various levels of optimization (and possibly compilers) to illustrate that cppreg does not affect runtime performance or code size.
This requires:
cppreg implementation and a traditional implementation (à la CMSIS)
We could put the various materials in a benchmark directory to avoid polluting the main one.