-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add C11 standard atomic support #1645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
As motivation to add this PR, below is a sample ticket based spin lock implementation similar to the Linux kernel. This type of implementation removes the limitation of 32 hardware spin locks at the minimal cost of potential hardware spin lock contention.
|
@sgstreet ; I'm guessing you may have closed this because we hadn't gotten to it yet. It is a useful PR, if you are still OK with us merging it; we generally add our own copyright to yours |
Sure, let me update the PR there was a bug that needs to be fixed.
|
20db065
to
ca51664
Compare
#ifndef __STDATOMIC_H | ||
#define __STDATOMIC_H | ||
|
||
#include_next <stdatomic.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
phew; i was wondering what versions of compilers supported this, but GCC 6 is happy, so OK i guess!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could drop a complete replacement for stdatomic.h as different approach? Seem like that would be worse in the long term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah; i am happy with include_next
i just thought it was a recent add
#include "pico/config.h" | ||
|
||
#ifndef __optimize | ||
#define __optimize __attribute__((optimize("-Os"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason for Os by default? note i am merging this now as I want to test it with something else, but may fixup later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my experience, you get better code from -Os on the cortex parts. Less and generally faster code, but our mileage may vary. I can remove if you wish.
Sort of a adjacent question is if the atomic code should be forced in to SRAM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question; in similar bits of code we have a #define
to pick
No, i think -Os
is fine, and probably as you say better in many circumstances (our default being -O3 is a historic cmake thing, and i am worried about knock on effects on existing stuff if i change it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this makes the code incompatible with Clang which doesn't support the optimize
attribute which is GCC specific. A correct way to achieve this without relying on non-portable behavior is to set the optimization level for this specific target through the build system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend not to use clang on my projects. I will remove this momentarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the branching underlying this PR, but since the PR has been merged, closed and reverted I'm not sure how to get github to see the PR updates. @kilograham @petrhosek I think we will need a new PR. See PR #1763
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much!
Thanks, I have merged this for now somewhat rapidly as we have something internal dependent on it... |
Glad you and the team found a use for this! Let me know if there is a problem or something else the team would like changed. |
This reverts commit 01dec6f.
cool; i have actually backed it out again temporarily, as it seems certain versions of GCC complain about a few things |
If I could know which compiler versions and what problems cropped up I can see if there is an easy fix? |
I create PR #1763 to reopen this PR and remove the GCC specific optimize attribute. |
This PR Fixes #1642 Runtime for C11 atomics missing.
The file pico_atomic.c implements the required functions for the read/modify/write function of atomic variables with sizes of 1, 2, 4 and 8 bytes. Atomic load and store for 8 bytes quantities is also supported. Support for atomic_flags is also present.
To eliminate interference with existing hardware spinlock usage while reducing multicore contention on unique atomic variables, we use one of the watchdog scratch registers (WATCHDOG_SCRATCH3) to implement 16, 2 bit, multicore locks, via a variation of Dekker's Algorithm. The lock is selected as a function of the variable address and the stripe width which hashes variables addresses to locks numbers and leverages the RP2040 Atomic Register Access functionality of the WATCHDOG scratch registers.
Clearly the current approach (See __atomic_lock) is slower than using the RP2040 Hardware Spinlocks directly, but I was not sure how to assign spin locks from the reserved range (IDs 0-13) and was concerned about multicore lock contention when the number of hash buckets (hardware spinlocks) is minimized. Any recommendations concerning the number of spins lock to allocated to this library along with the stripping width (number of addresses hashed to the same lock) would greatly appreciated? My thoughts are four to eight locks with a stripe width of four bytes. Using the hardware spinlocks is an easy update.