Add grouping registers together, like how CMSIS maps a struct of types to a base address #7
@hak8or here are some thoughts about this issue ...
Prologue
This issue was created after it was noticed in #5 that A typical example is given here.
What exactly is happening
CMSIS maps C structures directly to memory, and registers are accessed as members of these structures, for example:
typedef struct {
__IO uint8_t TXFIFO; // Base + 0x00
__IO uint8_t STATUS; // Base + 0x01
} UART_TypeDef;
#define PERIPH_BASE ((uint32_t)0xF0A03110)
#define UART_Base (PERIPH_BASE + 0x0010)
#define UART ((UART_TypeDef *) UART_Base)
UART->TXFIFO = 'a';
With such an approach, the compiler is clearly aware the registers (here On the other hand in This is clearly the case and the example here is identical to the first one except that for the CMSIS code we now access the registers directly as pointers. And this leads to the exact same code compared to
What's next
This issue highlights a fundamental difference between CMSIS and For Also as a side note, it is not very clear how heavy the performance hit ... but let's give it a try.
A first attempt
So the first step was to not deal explicitly with pointers anymore but rather rely on reference in
The second step is to provide some syntax construct so that it is possible to express that some registers are grouped. This requires some register cluster definition, for example:
// Here MemoryMap is the type corresponding to the structure to be mapped.
template <Address_t ClusterBase, typename MemoryMap>
struct Cluster {
constexpr static const Address_t cluster_base = ClusterBase;
using map_t = MemoryMap;
static MemoryMap* const _mem_map;
};
template <Address_t ClusterBase, typename MemoryMap>
MemoryMap* const
Cluster<ClusterBase, MemoryMap>::_mem_map = reinterpret_cast<
MemoryMap* const
>(Cluster<ClusterBase, MemoryMap>::cluster_base);
Then we can define a
template <
typename Cluster,
Width_t RegWidth,
std::uint32_t offset,
volatile typename RegisterType<RegWidth>::type Cluster::map_t::*member
>
struct GroupedRegister :
Register<
Cluster::cluster_base + offset,
RegWidth
>
{
using base_reg = Register<
Cluster::cluster_base + offset,
RegWidth
>;
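// Read-write access to the register through the cluster memory map.
static typename base_reg::MMIO_t& rw_mem_pointer() {
return Cluster::_mem_map->*member;
}
// Read-only access to the register through the cluster memory map.
static const typename base_reg::MMIO_t& ro_mem_pointer() {
return Cluster::_mem_map->*member;
}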
template <typename F, typename base_reg::type value>
inline static void write() {
typename base_reg::type fmt_value = 0;
F::policy::template write<
typename base_reg::type,
typename base_reg::type,
F::mask,
F::offset,
value
>(fmt_value);
Cluster::_mem_map->*member = fmt_value;
};
};
This is still very experimental ... for a And ... we are getting closer to CMSIS (same number of instructions). But ... it is a different assembly code, so @hak8or it would be great if you could take a look ... I am curious as to what is happening according to you. It is here. |
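The grouping idea discussed above can be tried on a desktop host with a minimal sketch. This is not the cppreg API: `UartBlock`, `mock_uart`, `write_reg` and `demo` are hypothetical names, and a plain object stands in for the memory-mapped peripheral (on a real target the block would sit at a fixed address). The point it illustrates is that accessing registers through pointers to members of one structure lets the compiler see that they share a base.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical register block mirroring the UART_TypeDef layout above.
struct UartBlock {
    volatile std::uint8_t TXFIFO;  // base + 0x00
    volatile std::uint8_t STATUS;  // base + 0x01
};

// The layout is what tells the compiler that STATUS lives one byte
// after TXFIFO, relative to the same base.
static_assert(offsetof(UartBlock, TXFIFO) == 0, "TXFIFO must be at offset 0");
static_assert(offsetof(UartBlock, STATUS) == 1, "STATUS must be at offset 1");

// Assumption: a plain object stands in for the peripheral so the sketch
// can run on a host; real code would map the block at a device address.
static UartBlock mock_uart{};

// Write to the register selected by a pointer to member. Every register
// is reached from the same base object.
template <volatile std::uint8_t UartBlock::*Member>
inline void write_reg(UartBlock& block, std::uint8_t value) {
    block.*Member = value;
}

// Same sequence of writes as the CMSIS snippet discussed in this thread.
inline void demo() {
    write_reg<&UartBlock::STATUS>(mock_uart, 0x01);
    write_reg<&UartBlock::TXFIFO>(mock_uart, 0x12);
}
```

Whether a given compiler then emits base-plus-offset addressing still has to be checked in the generated assembly for each target and optimization level.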
The assembly difference was surprising, but looking closer, here is what I see.
Differences
Even smaller example + fiddling
I made a super bare-bones example here based off your snippet but with some renaming and removal of unrelated components to increase clarity. Using the above example, if we swap Making the wiping of old bits (removal of the OR) consistent with the CMSIS example via just writing 1 to STATUS, let's try the clustering implementation. This looks as follows:
// ============= CPPReg =============
typedef struct {
volatile uint8_t TXFIFO; // Base + 0x00
volatile uint8_t STATUS; // Base + 0x01
} UART_TypeDef;
struct UART {
struct UART_Cluster : cppreg::Cluster<0xF0A03120, UART_TypeDef> {};
struct TXFIFOg : cppreg::GroupedRegister<UART_Cluster, 8u, 0, &UART_TypeDef::TXFIFO> {
using DATA = cppreg::Field<TXFIFOg, 8u, 0 * 2, cppreg::write_only>;
};
struct STATUSg : cppreg::GroupedRegister<UART_Cluster, 8u, 8, &UART_TypeDef::STATUS> {
using BIT0 = cppreg::Field<STATUSg, 1u, 0, cppreg::read_write>;
using RES = cppreg::Field<STATUSg, 7u, 1, cppreg::read_only>;
};
};
void Demo_CPPReg(void){
UART::STATUSg::write<UART::STATUSg::BIT0, 1>();
// Introducing this does cause a new address to show up in .L2
// instead of using an offset from the base address.
UART::TXFIFOg::DATA::write<0x12>();
}
// ============= CMSIS =============
#include <stdint.h>
typedef struct {
volatile uint8_t TXFIFO; // Base + 0x00
volatile uint8_t STATUS; // Base + 0x01
} UART_TypeDef;
#define UART ((UART_TypeDef *)0xF0A03120)
void Demo_CMSIS(){
UART->STATUS = 0b0000'0001u;
UART->TXFIFO = 0x12;
}
Sadly, the compiler is still unable to notice how close the addresses are and use relative addressing.
Grand Overview
I do not know how much work should be put into getting the assembly to fully match and be just as performant as the CMSIS example. The reason why I am so worried about this is that it introduces overhead in the form of increased binary size and register pressure. It is not rare to encounter situations where you have an array of registers, packed closely together, that you want to write to quickly. For example, when register pressure is high enough (due to having to load so many register addresses), we get register spillover, which can be catastrophic in a hot path. For example, like this. Interestingly, if you remove the |
@hak8or for review, this is a more "stable" API and this adds the ability to simply index the registers (or the fields) of interest. |
The proposed API is in the This works as follows:
|
And a somewhat very concise and interesting version and |
That iterator example is pretty sweet. Seems like it could be worthwhile to use for scenarios like resetting a bunch of registers to their default/reset values. I am seeing that the interface is still fairly compact and easy to understand at a glance. It still generates the same assembly across various architectures and optimization levels. Humorously enough, this seems to cause an "internal compiler error" in all of the MSVC toolchains. Also, ICC seems to not be using relative addressing for some reason with CPPReg, but I am not clear on why it's doing that.
Deprecating old API
Going further with the new API, it seems to be extremely similar to the previous one. I think it may be worthwhile to actually remove the old API and replace it with this.
What do you think? |
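For the "reset a bunch of registers" scenario mentioned above, here is a rough sketch of a compile-time loop over register indices. It does not use the API proposed in this thread: `mock_pack`, `reset_values`, `reset_one` and `reset_all` are hypothetical names, the reset values are made up, and a plain array stands in for the memory-mapped pack so the code can run on a host. It assumes C++14 for `std::index_sequence`.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Assumption: a plain volatile array stands in for four contiguous
// 32-bit registers; on a target this would be a memory-mapped block.
static volatile std::uint32_t mock_pack[4] = {};

// Hypothetical reset values, one per register index.
constexpr std::uint32_t reset_values[4] = {0x0u, 0xFFu, 0x1u, 0x80u};

// Reset a single register; the index is a compile-time constant.
template <std::size_t I>
inline void reset_one() {
    mock_pack[I] = reset_values[I];
}

// Expand one call to reset_one per index (C++14-friendly pack expansion).
template <std::size_t... Is>
inline void reset_all_impl(std::index_sequence<Is...>) {
    using expand = int[];
    (void)expand{0, (reset_one<Is>(), 0)...};
}

// Reset every register of the pack, in index order.
inline void reset_all() {
    reset_all_impl(std::make_index_sequence<4>{});
}
```

Because all indices are known at compile time, the expansion is a straight sequence of stores to offsets of one base; whether that actually turns into relative addressing remains compiler-dependent, as noted above for ICC.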
Ok so first let's discuss the new API and focus on the limitations (also the API is still in progress).
This also starts to mean we have to think about interfacing |
Here is a rephrasing of the question underlying the current limitation:
This obviously boils down to: for an N-bit MCU, are the base addresses of the various peripherals (N/8)-byte aligned? |
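The alignment question above can be written down as a compile-time check. This is only a sketch: `Address` and `is_word_aligned` are hypothetical names, not part of cppreg, and the address used in the `static_assert` lines is the one from the UART example earlier in this thread.

```cpp
#include <cstdint>

// Hypothetical address type, wide enough to hold a peripheral address.
using Address = std::uintptr_t;

// True if `base` is aligned on (Bits / 8) bytes, i.e. on the native
// word size of an N-bit MCU with N == Bits.
template <unsigned Bits>
constexpr bool is_word_aligned(Address base) {
    static_assert(Bits % 8 == 0, "width must be a whole number of bytes");
    return base % (Bits / 8) == 0;
}

// The UART base address used earlier is 4-byte aligned ...
static_assert(is_word_aligned<32>(0xF0A03120u), "UART base is word aligned");
// ... while the address of its second 8-bit register is not.
static_assert(!is_word_aligned<32>(0xF0A03121u), "STATUS is only byte aligned");
```

A check of this kind could be applied to a pack base address so that a misaligned pack fails to compile rather than misbehaving at run time.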
So after a bit of research it seems that in most cases, peripheral registers will be aligned in ways that make it possible to always use RegisterPack (provided we add mixed-size support):
It seems we can then assume this will be the most generic case. @hak8or, to keep it moving, here is my proposal:
|
This relies on a memory map implementation based on std::array and specialized over the register size. PackedRegister can now be used with all supported register sizes (i.e., widths). This relates to #7 and the finalization of the API before performance testing.
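To give an idea of what a memory map "based on std::array" and parameterized by register size can look like, here is a minimal sketch. It is not the implementation referred to above: `RegType` and `MemoryMap` are hypothetical names, and the sketch uses one primary template driven by a width-to-type trait rather than explicit specializations.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical trait mapping a register width in bits to an integer type.
template <unsigned Bits> struct RegType;
template <> struct RegType<8>  { using type = std::uint8_t; };
template <> struct RegType<16> { using type = std::uint16_t; };
template <> struct RegType<32> { using type = std::uint32_t; };

// Hypothetical memory map: PackBytes bytes viewed as an array of
// registers that are all Bits wide.
template <std::size_t PackBytes, unsigned Bits>
struct MemoryMap {
    static_assert(PackBytes % (Bits / 8) == 0,
                  "pack size must be a multiple of the register size");

    using reg_t = volatile typename RegType<Bits>::type;

    // One array element per register in the pack.
    std::array<reg_t, PackBytes / (Bits / 8)> regs;

    // Indexed access to the registers of the pack.
    reg_t& operator[](std::size_t index) { return regs[index]; }
};
```

On a host, such a map can simply be instantiated as an object, for example `MemoryMap<8, 32> pack{};` for two 32-bit registers. On a target, it would instead be overlaid on the pack base address, in the same way the `Cluster` template earlier in this thread uses `reinterpret_cast`.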
Ok so the final implementation is in the Next steps:
|
Oh dang, lots of activity, awesome! Regarding register sizes, I can confirm that across a few different MCUs from different vendors (only looked at ARM Cortex-M) the registers are all 32 bits wide (and 4-byte aligned), with the upper bits usually marked as reserved or read-only (though it is sometimes mentioned that they have to be kept set to 1). Given the edge cases you described, I agree that the non-packed register API should be kept. Furthermore, I think the new packed registers API is solid for now. When updating the performance comparison (hopefully finishing it up by the end of today, barring any other potential issues), I will have a better idea of whether there are any pain spots in the new API. Unrelated notes:
|
This is not really a licensing issue but this was more to avoid bloating the repository. The test suite (based on googletest) requires some mock up—which I hope to have the time to describe at some point—to be able to run on desktop. In addition, this relies heavily on some CMake boilerplate (see that and this). So this is more a time issue as this will require some documentation. But ideally I would prefer to put the test suite in a dedicated repository.
Once we stabilize the API and finish the performance review I agree that this should be the next step. The scripts in cmsis-svd provide all the necessary tools (parsing and assembling data). So what will be left is mostly writing a |
RegisterPack todo list
Ok let's gather here what is needed before I merge the register pack implementation. This requires making some decisions.
|
Ok considering this list I decided to start a new issue (#12). This will be easier. |
Issue
As seen in #5, there is a discrepancy between CMSIS and cppreg, which was found to be due to the CMSIS style informing the compiler about groupings of registers. This allows the compiler, when reading from and writing to memory, to use relative memory operations like this
Since cppreg doesn't state that information explicitly, the compiler is stuck assuming that the registers are totally unrelated and is unable to compute the addresses as offsets from a base address. I guess the optimization process is unable to do such extensive inspection for memory addresses (though it seems it can do that for immediates, as shown in the sidenote of #6).
Solution
Add a way to group registers together. I do not have a syntax example off the top of my head, but as @sendyne-nclauvelin mentioned in #5, this would require a non-trivial amount of API rework, so before implementing, let's see what other potential API pain points there are.