Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong alignement for function pointers (callbacks) causes CPU fault on Coretex-M4 #17

Closed
r2r0 opened this issue Apr 28, 2021 · 15 comments
Closed

Comments

@r2r0
Copy link
Contributor

r2r0 commented Apr 28, 2021

I got usage fault beacuse of unaligned memory on STM32WB55 (Cortex-M4) when accessing structure fields holding function pointers (callbacks) i.e. function hdlc_ll_reset is failing at line: handle->rx.state = hdlc_ll_read_start;.

For confirmation I added (dirty hack) alignment to memory pointers used to initialize buf field in hdlc_ll_init_t and such change allows code to execute properly:
TinyProtocolFd.h:301

uint8_t m_data[S] __attribute__ ((aligned (8))) {};

TinyProtocolFd.h:317

    : IFd(new uint8_t[size+7], size+7)

tiny_fd.c:584:

uint32_t ptrValue = (uint32_t)ptr;
ptrValue = ((ptrValue + 7U) / 8U) * 8U;
_init.buf = (void*)ptrValue;

Compiler GCC 10.2.0

@lexus2k
Copy link
Owner

lexus2k commented Apr 28, 2021

Hi,

Thank you for reporting the issues. I will investigate that.
By hand I have the board with Cortex-M0 (also 32-bit) and it doesn't have such issue. Need some time to figure out the root cause.

PS. But why 8 and not 4 ?

@lexus2k
Copy link
Owner

lexus2k commented Apr 28, 2021

What if to play with CCR.STKALIGN=1 in your code?

Is it the issue with C++ classes only? What about C-style API?

Just checked the code on Cortex-A53, no issues with alignment.

Another idea is to add __attribute__((__packed__)) to the structures declarations

lexus2k added a commit that referenced this issue Apr 29, 2021
…much code in FreeRTOS sources has no alignment for callbacks in Cortex-M implementations #17
@r2r0
Copy link
Contributor Author

r2r0 commented Apr 29, 2021

Thanks for quick reply.

I use Zephyr RTOS and it has already set 8 bytes stack alignment (because MPU is used for stack overflow protection).
I think that stack alignment should have no influence on problem because I also used FdD for testing (with the same effect) where all data is allocated on heap.

I did not try C API yet.

I also tried packed structures and it works but it makes access to their fields quite expensive (comparing to unpacked/aligned structures).

IMHO if code is doing pointer arithemetic and then code uses such pointer as an addres of structure (i.e. _init.buf assignment) then it is required that such (calulated) address is properly aligned.

Unfortunately in 99.99% cases you will get proper results but i.e. compiler version change can trigger unaligned access.
I also tried to run my test on QEMU for Cortex-M3 and there was no problem, so for me it means that problem is triggered by some very minor change in compiler options.

@lexus2k
Copy link
Owner

lexus2k commented Apr 29, 2021

I also tried to run my test on QEMU for Cortex-M3 and there was no problem, so for me it means that problem is triggered by some very minor change in compiler options.

Can you capture the compiler options in both cases?

IMHO if code is doing pointer arithemetic and then code uses such pointer as an addres of structure

Almost all code in C / C++ does pointer arithmetic, and uses such pointers. The raw pointers are the basic of C. And doing manual alignment every time you use the pointer would be real headache for all developers. But that's not true.
I agree, that sometimes to speed up a program execution used structures must be aligned in memory, but UsageFault is something specific to the platform, and we need to find the root cause for that to properly fix the issue.
The alignment to 8 bytes will not be good solution for other platforms.

Of course, it is possible to make quick fix with #ifdef to support your case.

PS. On ARM website, I have found 2 compiler options --unaligned_access, --no_unaligned_access
The documentation for STM32W55 points out that this CPU has Cortex-M4 core and according to ARM website Cortex-M4 has Armv7E-M architecture. The ARM documentation on the compiler options says that Armv7E-M supports unaligned access.

PPS. And I'm thinking if aligned_alloc instead of new can help (: IFd(new uint8_t[size+7], size+7))

PPPS.

uintptr_t m_data[(S + sizeof(uintptr_t ) - 1) / sizeof(uintptr_t )]

@r2r0 r2r0 changed the title Wrong alignement for function pointers (callbacks) causes CPU fault on Coretex-M3 Wrong alignement for function pointers (callbacks) causes CPU fault on Coretex-M4 Apr 30, 2021
@r2r0
Copy link
Contributor Author

r2r0 commented Apr 30, 2021

You are right - STM32WB55 is obviously Cortex-M4.
I am sorry for messing this - I was just after build for QEMU Cortex-M3 and I had compilation output on screen when writing issue description.

I attached compiler command line for both platforms.
Except difference in mcu option the interesting part is that _FORTIFY_SOURCE=2 is used only for Cortex-M4.

I also found something similiar problem described on stackoverflow:
https://stackoverflow.com/questions/59592724/why-am-i-getting-an-unaligned-memory-access-fault-cortex-m4

I did 8 bytes alignment in example because I also tried it on x86_64 and there is 8 bytes alignement for some pointers.
So your idea to use sizeof(uintptr_t) as alignment value seems to be very good.

At least on my platform aligned_alloc is problematic (linker error: undefined reference to `posix_memalign').
If you wrap aligned allocation in some HAL/platform wrapper then it can be very good solution (i.e. in Zephyr I have k_aligned_alloc).

compilation_options.txt

@lexus2k
Copy link
Owner

lexus2k commented May 2, 2021

I added some alignment implementation on the master branch.
If you have a time to test it, it would be very helpful.

lexus2k added a commit that referenced this issue May 4, 2021
lexus2k added a commit that referenced this issue May 4, 2021
@Zzzwierzak
Copy link

Hi, I've tested revision 46993c8, issue is still visible. What I noticed is that calling setWindowSize function with argument different than 4 will cause HF. Disassembly shows that HF is caused by STRD instruction. Pointer ptr value that is problematic came from calculation in tiny_fd_init() before call of hdlc_ll_init().

@lexus2k
Copy link
Owner

lexus2k commented May 5, 2021

Hi,

Ok, what we have:

  1. According to previos posts packed structures with alignment 1 byte work for you, but they shouldn't according to ARM documentation:

I also tried packed structures and it works but it makes access to their fields quite expensive (comparing to unpacked/aligned structures).

And here is the link to documntation: https://developer.arm.com/documentation/100748/0609/writing-optimized-code/packing-data-structures

If a member of a structure or union is packed and therefore does not have its natural alignment, then to access this member, you must use the structure or union that contains this member. You must not take the address of such a packed member to use as a pointer, because the pointer might be unaligned. Dereferencing such a pointer can be unsafe even when unaligned accesses are supported by the target, because certain instructions always require word-aligned addresses.

  1. @r2r0 mentioned that 8 bytes alignment works for you:

uint8_t m_data[S] attribute ((aligned (8))) {};
...
: IFd(new uint8_t[size+7], size+7)
...
uint32_t ptrValue = (uint32_t)ptr;
ptrValue = ((ptrValue + 7U) / 8U) * 8U;
_init.buf = (void*)ptrValue;

  1. You mentioned the problem with setWindowSize() function.

What I noticed is that calling setWindowSize function with argument different than 4 will cause HF. Disassembly shows that HF is caused by STRD instruction

But it does almost nothing, just sets member field of the class. How it can fail?

    void setWindowSize(uint8_t window)
    {
        m_window = window;
    }
  1. At least with the latest commits I added alignment for raw buffer allocation. so the structure tiny_fd_data_t is always aligned.

Currently alignment is set to 4 bytes for Cortex-M4:

#if defined(__TARGET_CPU_CORTEX_M0) || defined(__TARGET_CPU_CORTEX_M0_) || defined(__ARM_ARCH_6M__) || \
    defined(__TARGET_CPU_CORTEX_M3) || defined(__TARGET_CPU_CORTEX_M4) || defined(__ARM_ARCH_7EM__) || \
    defined(__ARM_ARCH_7M__)

#define TINY_ALIGNED_PTR   TINY_ALIGNED(sizeof(uintptr_t))

#else

#define TINY_ALIGNED_PTR   TINY_ALIGNED(sizeof(uintptr_t))

#endif

But you can change that to 8 bytes by replacing sizeof(uintptr_t) with 8, or even you can try to use alignment 16.

  1. To understand how the code works in your case, I need some simple example from you with the protocol initialization and failing functions.

lexus2k added a commit that referenced this issue May 5, 2021
lexus2k added a commit that referenced this issue May 5, 2021
@Zzzwierzak
Copy link

It seems that last commit fixed issue, at least I'm not able to reproduce it. Let's wait for final confirmation from @r2r0

@lexus2k
Copy link
Owner

lexus2k commented May 6, 2021

HI @Zzzwierzak

That's nice to hear that. I added alignment checks for the provided buffer to hdlc and fd initialization code. However, from the API standpoint maybe a better way is to remove so strict requirements for the user buffer, and automatically allocate a aligned space for the structures inside the buffer, as @r2r0 mentioned in initial post.
Let me know what you think.

@r2r0
Copy link
Contributor Author

r2r0 commented May 11, 2021

Hi @lexus2k

Thanks for your support - present version works well in our application.

In fact alignment requirements for user supplied bufer can be sometimes problematic.

Probably the definition of TINY_ALIGN_STRUCT_VALUE can be unified to sizeof(uintptr_t) - I used in my sample hardcoded value of 8 only for quick test on x86 platform but sizeof(uintptr_t) seems to be much more elegant solution.

I wonder if allocation in FdD::FdD should also use TINY_ALIGN_STRUCT_VALUE (istead of sizeof(uintptr_t)).

@lexus2k
Copy link
Owner

lexus2k commented May 11, 2021

HI

I wonder if allocation in FdD::FdD should also use TINY_ALIGN_STRUCT_VALUE (istead of sizeof(uintptr_t)).

You're absolutely correct. I've just fixed that in last commit.

In fact alignment requirements for user supplied bufer can be sometimes problematic.

Ok, I will remove strict verififcations from HDLC and FD.

Probably the definition of TINY_ALIGN_STRUCT_VALUE can be unified to sizeof(uintptr_t) - I used in my sample hardcoded value of 8 only for quick test on x86 platform but sizeof(uintptr_t) seems to be much more elegant solution.

Can you confirm that sizeof(uintptr_t) works for you without any issues? If yes, I will update the header file.

@r2r0
Copy link
Contributor Author

r2r0 commented May 13, 2021

Yes, I confirm that sizeof(uintptr_t) works for me.

@lexus2k
Copy link
Owner

lexus2k commented May 13, 2021

@r2r0 Thank you very much.

@lexus2k
Copy link
Owner

lexus2k commented Jul 25, 2021

I'm closing the issue for now, as the problem is considered to be fixed.
If you have any questions and notes, please let me know.

@lexus2k lexus2k closed this as completed Jul 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants