Skip to content

Conversation

sgstreet
Copy link
Contributor

@sgstreet sgstreet commented Jul 9, 2024

This new version of the PR #1645 which was merged, closed and reverted. This version removes all usage of the GCC specific optimize attribute.

@sgstreet sgstreet changed the title Add atomic support Add C11 standard atomic support Jul 9, 2024
@sgstreet
Copy link
Contributor Author

sgstreet commented Jul 9, 2024

@kilograham @petrhosek Here is some fixes for the MacOS and Windows compilation failures. I also fixed a problem with the preinit section so it works as intended. Is is possible to launch the build checks so I can verify the fixes?

@sgstreet
Copy link
Contributor Author

sgstreet commented Jul 9, 2024

FYI the error:

D:/a/pico-sdk/pico-sdk/src/rp2_common/pico_atomic/pico_atomic.c:208:21: error: mismatch in return type of built-in function '__atomic_fetch_sub_4'; expected 'unsigned int' [-Werror=builtin-declaration-mismatch]
  208 | __optimize uint32_t __atomic_fetch_sub_4(volatile void *mem, uint32_t val, __unused int model) {
      |                     ^~~~~~~~~~~~~~~~~~~~

This is a GCC bug as uint32_t and unsigned int are the same type on the ARMv6 arch and interesting the other atomic types do not generate compiler warnings.

volatile uint32_t *ptr = mem;
uint32_t state = __atomic_lock(mem);
uint32_t result = *ptr;
unsigned int __atomic_fetch_add_4(volatile void *mem, unsigned int val, __unused int model) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this (and the similiar foo_4 functions) should probably have remained as

uint32_t __atomic_fetch_add_4(volatile void *mem, uint32_t val, __unused int model) {

?

Esepcially as the other functions have signatures like:

uint8_t __atomic_fetch_add_1(volatile void *mem, uint8_t val, __unused int model) {
uint16_t __atomic_fetch_add_2(volatile void *mem, uint16_t val, __unused int model) {
uint64_t __atomic_fetch_add_8(volatile void *mem, uint64_t val, __unused int model) {

Copy link
Contributor Author

@sgstreet sgstreet Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree but there is a GCC bug here when -Wextra is enabled. Since uint32_t and unsigned int are the same type on ARMv6 this should not be a warning nor error. See https://github.com/raspberrypi/pico-sdk/actions/runs/9848449043/job/27190379443. The current linux cmake configuration does not enable -Wextra but I'm not clear how to enable it as my CMake skill are not great.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry! I made this comment from just looking at the diff view on GitHub, before realising that you'd actually specifically commented on this already!! 🤦

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, no I was hacking up a comment at the same time as you were reviewing and commenting.

@sgstreet
Copy link
Contributor Author

sgstreet commented Jul 9, 2024

Is the way the turn on -Wextra for the linux builds so that any further problems are exposed before I push more to the PR.

@sgstreet sgstreet requested a review from lurch July 10, 2024 00:06
@kilograham
Copy link
Contributor

Ah, sorry, a lot going on, so maybe didn't mention this yesterday: since when we have a rather different implementation from a separate source.

@sgstreet
Copy link
Contributor Author

So no interest in this?

@kilograham
Copy link
Contributor

yup; no need for further fixes in this PR; we took it internally to fix the issues (compile, and things like "save" size that you have now fixed)...it will drop in as part of another PR

@lurch lurch removed their request for review July 10, 2024 16:51
@kilograham
Copy link
Contributor

@sgstreet did you actually verify that your functions are called by GCC; I saw that you #undefed a few functions, but I see on GCC that it just uses its own intrinsics by default

@sgstreet
Copy link
Contributor Author

Yes I tested it, the functions

atomic_flag_test_and_set
atomic_flag_test_and_set_explicit
atomic_flag_clear
atomic_flag_clear_explicit

are handled by GCC slightly differently than others, thus the overide. This following is an excerpt from my test map file:

 .text.__atomic_init
                0x10003e3c        0xc CMakeFiles/atomic-test.dir/pico-sdk/src/rp2_common/pico_atomic/pico_atomic.c.obj
 .text.__atomic_fetch_add_4
                0x10003e48       0x98 CMakeFiles/atomic-test.dir/pico-sdk/src/rp2_common/pico_atomic/pico_atomic.c.obj
                0x10003e48                __atomic_fetch_add_4
 .text.__atomic_compare_exchange_4
                0x10003ee0       0xb4 CMakeFiles/atomic-test.dir/pico-sdk/src/rp2_common/pico_atomic/pico_atomic.c.obj
                0x10003ee0                __atomic_compare_exchange_4
 .text.__atomic_test_and_set_m0
                0x10003f94       0xa0 CMakeFiles/atomic-test.dir/pico-sdk/src/rp2_common/pico_atomic/pico_atomic.c.obj
                0x10003f94                __atomic_test_and_set_m0
 .text.__atomic_clear_m0
                0x10004034        0xc CMakeFiles/atomic-test.dir/pico-sdk/src/rp2_common/pico_atomic/pico_atomic.c.obj
                0x10004034                __atomic_clear_m0

Did you decide to take PR?

@sgstreet
Copy link
Contributor Author

sgstreet commented Jul 11, 2024

@kilograham
Copy link
Contributor

kilograham commented Jul 11, 2024

What functions was your test calling?

Did you decide to take PR?

I am not taking the PR as Clang works somewhat differently; i'm trying to take the best parts of both (yours and another), but when i tried calling this on GCC

volatile atomic_uint_fast32_t foo;
uint32_t bar = atomic_load(&foo);

it uses its own non multi-core aware code; are you using explcit methods of the right size? also what GCC version?

@sgstreet
Copy link
Contributor Author

sgstreet commented Jul 11, 2024

Disappointed you did not take the PR nor ask for a clang compatible solution as I would have researched it and built one.

But to answer you question, please consider how the AHB Lite Bus operates. 1, 2 and 4 byte bus loads and stores are guaranteed atomic with respect to multi-masters. In some circumstances (not so important on ARMv6-m), a memory barrier will be required to ensure memory ordering. I'm sure you examined the generated code for the atomic_load above and saw the dmb ish or dmb sys instructions. The generated load/store code is sufficient and there is no need to implement replacements. This is not the case for 8 byte quantities and atomic read/modify/write functions.

If you replace atomic_uint_fast32_t with atomic_uint_fast64_t you should see in the disassembly a call to __atomic_load_8. Further, any call to a read/modify/write atomic function such as atomic_exchange will generate a function call to one of the size matched functions in the library.

How are you implementing the multi-core awareness? Using the hardware semaphores? If so how many semaphore bits are you allocating? I have code which does this, but uses all 32 semaphore bits, not a big deal as once you have atomic variables the need for the raw hardware semaphores is reduced. This of course is not compatible with the SDK as current implemented. This could be reduced to 8 or 16 bits with minor impact to performance. Just curious really.

Hope this helps.

@sgstreet
Copy link
Contributor Author

There is not a write buffer in front of the SRAM correct?

@kilograham
Copy link
Contributor

But to answer you question, please consider how the AHB Lite Bus operates. 1, 2 and 4 byte bus loads and stores are guaranteed atomic with respect to multi-masters

Yes, sorry i wasn't thinking thru; CLang still emits calls on m0plus, and I was just comparing that with GCC.

Disappointed you did not take the PR nor ask for a clang compatible solution as I would have researched it and built one.

I needed it yesterday, and when i first took the PR, there were a bunch of glaring bugs (which you have since fixed)... this interacts with some other stuff we're doing, so I didnt have time to go back and forth and fix the issues on github. thanks for your help, the spirit of your code is intact!!

@kilograham
Copy link
Contributor

There is not a write buffer in front of the SRAM correct?

correct

@kilograham kilograham added this to the 1.6.2 milestone Jul 19, 2024
@kilograham kilograham modified the milestones: 1.6.2, 2.0.0 Aug 8, 2024
@kilograham
Copy link
Contributor

fixed in SDK 2.0.0 thanks for your PR

@kilograham kilograham closed this Aug 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants