Skip to content

Conversation

sgstreet
Copy link
Contributor

@sgstreet sgstreet commented Feb 20, 2024

This PR Fixes #1642 Runtime for C11 atomics missing.

The file pico_atomic.c implements the required functions for the read/modify/write function of atomic variables with sizes of 1, 2, 4 and 8 bytes. Atomic load and store for 8 bytes quantities is also supported. Support for atomic_flags is also present.

To eliminate interference with existing hardware spinlock usage while reducing multicore contention on unique atomic variables, we use one of the watchdog scratch registers (WATCHDOG_SCRATCH3) to implement 16, 2 bit, multicore locks, via a variation of Dekker's Algorithm. The lock is selected as a function of the variable address and the stripe width which hashes variables addresses to locks numbers and leverages the RP2040 Atomic Register Access functionality of the WATCHDOG scratch registers.

Clearly the current approach (See __atomic_lock) is slower than using the RP2040 Hardware Spinlocks directly, but I was not sure how to assign spin locks from the reserved range (IDs 0-13) and was concerned about multicore lock contention when the number of hash buckets (hardware spinlocks) is minimized. Any recommendations concerning the number of spins lock to allocated to this library along with the stripping width (number of addresses hashed to the same lock) would greatly appreciated? My thoughts are four to eight locks with a stripe width of four bytes. Using the hardware spinlocks is an easy update.

@sgstreet
Copy link
Contributor Author

As motivation to add this PR, below is a sample ticket based spin lock implementation similar to the Linux kernel. This type of implementation removes the limitation of 32 hardware spin locks at the minimal cost of potential hardware spin lock contention.

typedef atomic_ulong spinlock_t;

void spin_lock(spinlock_t *spinlock)
{
	uint16_t ticket = atomic_fetch_add(spinlock, 1UL << 16) >> 16;
	while ((*spinlock & 0xffff) != ticket);
}

unsigned int spin_lock_irqsave(spinlock_t *spinlock)
{
	uint32_t state = save_and_disable_interrupts();
	spin_lock(spinlock);
	return state;
}

bool spin_try_lock(spinlock_t *spinlock)
{
	uint32_t value = *spinlock;
	if ((value >> 16) != (value & 0xffff))
		return false;

	return atomic_compare_exchange_strong(spinlock, &value, value + (1UL << 16));
}

bool spin_try_lock_irqsave(spinlock_t *spinlock, unsigned int *state)
{
	uint32_t irq_state = save_and_disable_interrupts();
	bool locked = spin_try_lock(spinlock);
	if (!locked) {
		restore_interrupts(irq_state);
		return false;
	}

	*state = irq_state;
	return true;
}

void spin_unlock(spinlock_t *spinlock)
{
	atomic_fetch_add((uint16_t *)spinlock, 1);
}

void spin_unlock_irqrestore(spinlock_t *spinlock, unsigned int state)
{
	spin_unlock(spinlock);
	restore_interrupts(state);
}

@sgstreet sgstreet closed this May 8, 2024
@kilograham kilograham self-assigned this Jul 3, 2024
@kilograham
Copy link
Contributor

@sgstreet ; I'm guessing you may have closed this because we hadn't gotten to it yet. It is a useful PR, if you are still OK with us merging it; we generally add our own copyright to yours

@sgstreet
Copy link
Contributor Author

sgstreet commented Jul 3, 2024 via email

@kilograham kilograham reopened this Jul 4, 2024
@kilograham kilograham added this to the 1.6.0 milestone Jul 4, 2024
@sgstreet sgstreet force-pushed the add-atomic-support branch from 20db065 to ca51664 Compare July 8, 2024 22:45
#ifndef __STDATOMIC_H
#define __STDATOMIC_H

#include_next <stdatomic.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

phew; i was wondering what versions of compilers supported this, but GCC 6 is happy, so OK i guess!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could drop a complete replacement for stdatomic.h as different approach? Seem like that would be worse in the long term.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah; i am happy with include_next i just thought it was a recent add

#include "pico/config.h"

#ifndef __optimize
#define __optimize __attribute__((optimize("-Os")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason for Os by default? note i am merging this now as I want to test it with something else, but may fixup later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my experience, you get better code from -Os on the cortex parts. Less and generally faster code, but our mileage may vary. I can remove if you wish.
Sort of a adjacent question is if the atomic code should be forced in to SRAM?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question; in similar bits of code we have a #define to pick

No, i think -Os is fine, and probably as you say better in many circumstances (our default being -O3 is a historic cmake thing, and i am worried about knock on effects on existing stuff if i change it)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this makes the code incompatible with Clang which doesn't support the optimize attribute which is GCC specific. A correct way to achieve this without relying on non-portable behavior is to set the optimization level for this specific target through the build system.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend not to use clang on my projects. I will remove this momentarily.

Copy link
Contributor Author

@sgstreet sgstreet Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the branching underlying this PR, but since the PR has been merged, closed and reverted I'm not sure how to get github to see the PR updates. @kilograham @petrhosek I think we will need a new PR. See PR #1763

kilograham
kilograham previously approved these changes Jul 8, 2024
Copy link
Contributor

@kilograham kilograham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much!

@kilograham kilograham merged commit 01dec6f into raspberrypi:develop Jul 8, 2024
@kilograham
Copy link
Contributor

Thanks, I have merged this for now somewhat rapidly as we have something internal dependent on it...

@sgstreet
Copy link
Contributor Author

sgstreet commented Jul 8, 2024

Glad you and the team found a use for this! Let me know if there is a problem or something else the team would like changed.

kilograham added a commit that referenced this pull request Jul 8, 2024
@kilograham
Copy link
Contributor

cool; i have actually backed it out again temporarily, as it seems certain versions of GCC complain about a few things

@sgstreet
Copy link
Contributor Author

sgstreet commented Jul 9, 2024

If I could know which compiler versions and what problems cropped up I can see if there is an easy fix?

@sgstreet
Copy link
Contributor Author

sgstreet commented Jul 9, 2024

I create PR #1763 to reopen this PR and remove the GCC specific optimize attribute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants