Improve race free register access ergonomics #37

Open
IamfromSpace opened this issue Dec 6, 2019 · 37 comments
Labels
enhancement New feature or request

Comments

@IamfromSpace
Contributor

IamfromSpace commented Dec 6, 2019

Summary

The current way we go about this is both un-ergonomic and unsound.

Current Usage

Today, the strategy for handling registers like these is to allow only the HAL to mutate them (via pub(crate)), and then ensure that the user still holds them and can hand them to the HAL for mutation.

For example, if we want to put PA4 into alternate function mode 2:

...
let mut gpioa = dp.GPIOA.split(&mut rcc.ahb);
gpioa.pa4.into_af2(&mut gpioa.moder, &mut gpioa.afrl);

The AHB, MODER, and AFRL have no public API, so they cannot be mutated directly, but they must be (mutably) passed into the split or into_af2 so the HAL can modify them and correctly configure the pin.
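As a rough illustration, the token-passing pattern can be modeled on a host machine without any hardware. This is only a sketch under stated assumptions, not the HAL's actual code: Moder, Afrl, Parts, split and the bits() accessors are hypothetical stand-ins, and a plain u32 plays the role of each register.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins for the shared GPIOA registers. The fields are
// private, so outside code can hold a token but cannot change its bits,
// much like the pub(crate) register API described above.
pub struct Moder { bits: u32 }
pub struct Afrl { bits: u32 }

impl Moder { pub fn bits(&self) -> u32 { self.bits } }
impl Afrl { pub fn bits(&self) -> u32 { self.bits } }

// Type-state markers for the pin mode.
pub struct Input;
pub struct Af2;

pub struct Pa4<MODE> { _mode: PhantomData<MODE> }

pub struct Parts {
    pub moder: Moder,
    pub afrl: Afrl,
    pub pa4: Pa4<Input>,
}

// Models dp.GPIOA.split(..): hands out one token per shared register.
pub fn split() -> Parts {
    Parts {
        moder: Moder { bits: 0 },
        afrl: Afrl { bits: 0 },
        pa4: Pa4 { _mode: PhantomData },
    }
}

impl<MODE> Pa4<MODE> {
    // Holding &mut to both tokens means nobody else can be mid-update.
    pub fn into_af2(self, moder: &mut Moder, afrl: &mut Afrl) -> Pa4<Af2> {
        // MODER4 occupies bits 9:8; 0b10 selects alternate function mode.
        moder.bits = (moder.bits & !(0b11 << 8)) | (0b10 << 8);
        // AFRL4 occupies bits 19:16; the value 2 selects AF2.
        afrl.bits = (afrl.bits & !(0b1111 << 16)) | (2 << 16);
        Pa4 { _mode: PhantomData }
    }
}
```

The user can see and pass the tokens, but every change to their contents goes through the HAL's own methods.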

The Problem

Originally, my thought was a recommendation to improve ergonomics and hide more internal details from users. However, I realized that this strategy is not actually sound. The former leads to the latter, so we'll explore them in that order.

Ergonomics

If we do not return or require these registers it greatly simplifies the API for users. New users who are less familiar with MCUs will be able to work with more concrete concepts like peripherals, rather than registers.

Also, composition becomes much simpler. Today, peripherals that accept pins must either a) require the pin in an already correctly configured state or b) require registers that are not related to their own function so they can configure the pin.

The former requires the user to know extra details, and the latter is a violation of the Law of Demeter (reaching into a dependency).

If instead moving a pin into alternate function 2 was as simple as:

let mut gpioa = dp.GPIOA.split();
gpioa.pa4.into_af2();

Then a peripheral could simply accept an unconfigured pin, and the user would never even know that an alternate function was required, further hiding details and simplifying users' code. HALs across multiple chips and manufacturers would look more similar and wouldn't require re-learning the nuances of each chip.

The two likely reasons that this hasn't been done before are not wanting to mask these details and the number of resulting unsafe blocks.

It is not the job of a HAL to teach a user how their chip works. There should absolutely be resources to enable the curious! But an abstraction layer is about hiding. And the resulting unsafe blocks are not any less safe than today, due to soundness issues.

Unsoundness

This strategy has been built on the premise that: If HAL users do not write unsafe code, and HAL contributors do not write unsafe code, then there will not be "unsafe" results. This is not true.

As a contributor I can write the following "safe" function:

pub fn evil(afrl: &mut AFRL) {
  afrl.reset();
}

If a user sets up a peripheral and then calls evil in their code (remember, they'll still be able to pass a mutable reference at this point), all pins that require an alternate function are now set incorrectly and are no longer connected to their previous peripheral.

The fact that maintainers would never allow this into the HAL is irrelevant. The important piece is that the borrow checker did not prevent this function from mutating pins it didn't own or have mutable access to.
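A minimal host-side sketch of the same point, assuming a hypothetical Afrl stand-in whose reset is crate-private (configured() and bits() exist only for this illustration). Nothing here needs an unsafe block, yet evil wipes the configuration of every pin that shares the register:

```rust
// Hypothetical stand-in for the AFRL token. reset is crate-private,
// mirroring the pub(crate) register API described above.
pub struct Afrl { bits: u32 }

impl Afrl {
    pub(crate) fn reset(&mut self) { self.bits = 0; }
    pub fn bits(&self) -> u32 { self.bits }
}

// Models the state after some pin was put into AF2 (AFRL4 = 2).
pub fn configured() -> Afrl {
    Afrl { bits: 2 << 16 }
}

// Entirely safe code inside the crate: compiles without any unsafe block,
// although it touches bits belonging to pins it was never given.
pub fn evil(afrl: &mut Afrl) {
    afrl.reset();
}
```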

Addressing this Issue

EDIT: The following has soundness issues, and is not a viable solution! Please see my next comment further down for a summary of the issue and safe ways to address it.

Going forward these registers should not be exposed or accepted. Instead unsafe blocks should be used, where ownership/mutable reference of the pin/peripheral is proof of safe access to particular bits within a register with "global" bits. Other mechanisms ensure that there is never more than one pin/peripheral/etc:

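impl<MODE> PA4<MODE> {
  pub fn into_af2(self) -> PA4<AF2> {
    // Owning the pin is treated as proof that we may touch its MODER bits.
    unsafe {
      (*$GPIOX::ptr()).moder.modify(|_, w| {
        w.moder4().alternate()
      });
    }

    // Likewise for the pin's alternate function bits in AFRL.
    unsafe {
      (*$GPIOX::ptr()).afrl.modify(|_, w| {
        w.afrl4().af2()
      });
    }

    PA4 { _mode: PhantomData }
  }
}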

The presence of the unsafe keyword is not a "bad thing." Instead it is a signal that this is a delicate area of the code that should be used with caution.

The way that the stm32f3xx chips are designed simply doesn't allow for svd2rust to ensure safety at the peripheral layer. This is not anyone's "failing"; it's simply that we need to manually validate safety in certain places--it cannot be deferred to the borrow checker.

@mvirkkunen

mvirkkunen commented Dec 8, 2019

The reason the whole "require &mut to global registers" thing exists is to prevent race conditions. Requiring &mut is very powerful in that it requires the user to take care that only one thread (interrupt handlers are essentially threads when it comes to race conditions) can have access to the global register at once. The into_af2() code you propose has a race condition, because modify is not atomic, and attempting to modify the global register from multiple threads at the same time can cause an unpredictable value to be written. As unfortunate as it is, it's pretty common to have a single register contain bit fields related to multiple otherwise unrelated peripherals/pins which makes them difficult to access without race conditions.

If it's so easy to write that pub fn evil() currently, then perhaps the methods that modify the global register bits in the HAL should be marked as unsafe. Then whoever modifies them, even in HAL code, will have to take responsibility for what they're doing by using an unsafe block.

I completely agree with the ergonomics bit though, I'd rather not have all this noise in my code, but currently it seems like a necessary evil.

@david-sawatzke
Member

david-sawatzke commented Dec 8, 2019

In general, unsafe is only required for memory safety bugs, which the whole &mut spiel prevents.

Whilst changing pin state (or similar things) unexpectedly isn't great, it's not really unsound (in all the cases that I can think of), since it doesn't lead to memory corruption; the code just doesn't work anymore.

This may be an issue when peripherals are doing memory access, as with dma, but for most peripherals this isn't an issue.

@Disasm
Member

Disasm commented Dec 9, 2019

For gd32vf103 I used an interrupt::free block to access the shared registers and prevent race conditions. This works only for single-core MCUs, though.

@IamfromSpace
Contributor Author

IamfromSpace commented Dec 11, 2019

Along with the comments from the other day, there was some good discussion on this in the Matrix WG channel starting here (discussion ends at the change of subject).

Why unsafe Blocks (alone) Don't Work

To summarize (mostly re-iterating @mvirkkunen), the issue with my previously presented resolution is that it does not work for multi-core or in the presence of interrupts (the former may not be an issue for the f3 series, but the latter most certainly is). The modify operation is not inherently atomic; it only works correctly if there are not multiple writers for the modified register--which safe code guarantees. modify works in three steps (read, modify, store). If an interrupt occurs in the middle and then alters the register, the main program will then overwrite the interrupt's modification.

Is this a Soundness Concern?

To @david-sawatzke's points, I'd still argue that this issue is important and should be considered unsound. The memory is in essence corrupted, because users of the library expect a pin to be a pin. They don't know that its state is splayed among many registers; they expect it to be an isolated concept in memory. As such, in the same way that we would expect an item in a list not to be modified without mutable access, we would not expect a pin to be modified either (regardless of which register that state is stored in).

Ideally, we don't choose between two safety issues, we solve them both.

Some approaches discussed

Bitbanding

Certain MCUs have a memory range that allows for single bit writes. This is how the f1xx-hal solves a number of these issues. Notably the f3 does not have this feature for GPIO. Bitbanding only solves the issue for single bits (like enable/disable), and it is a feature that is becoming obsolete. It may be appropriate for solving certain issues, but is not viable as a catch all solution.
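For reference, a small sketch of how a bit-band alias address is derived on Cortex-M3/M4 parts that implement the feature. The two base addresses are the architecture's peripheral bit-band region and its alias region; the function name and the sample addresses in the usage note are only illustrative.

```rust
// Cortex-M3/M4 peripheral bit-band region (1 MiB) and its alias region.
const PERIPH_BASE: u32 = 0x4000_0000;
const PERIPH_BB_BASE: u32 = 0x4200_0000;
const BB_REGION_SIZE: u32 = 0x10_0000;

/// Returns the alias word address for `bit` (0..=7) of the byte at `addr`.
/// Writing 0 or 1 to that word changes only the one bit, with no
/// read-modify-write in software.
/// Returns None when `addr` lies outside the bit-band region or `bit` > 7.
pub fn bitband_alias(addr: u32, bit: u32) -> Option<u32> {
    let offset = addr.checked_sub(PERIPH_BASE)?;
    if offset >= BB_REGION_SIZE || bit > 7 {
        return None;
    }
    // Each bit of the region maps to one 32-bit word in the alias region.
    Some(PERIPH_BB_BASE + offset * 32 + bit * 4)
}
```

For example, bitband_alias(0x4002_1018, 2) yields Some(0x4242_0308), while an address such as 0x4800_0000 falls outside the region and yields None, which matches the observation that registers mapped that high cannot use this mechanism.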

Critical Sections, interrupt::free blocks

As pointed out by @Disasm, critical sections around the modification prevent an interrupt from occurring between the load/modify/store instructions, preserving atomicity. As also pointed out, this is not viable for multi-core. Does anyone know if this is an issue for f3? Are there multicore devices this hal must support?

The main concern with critical sections is WCET (Worst Case Execution Time). By preventing interrupts, there is both time spent locking/unlocking/contention and performing the work in the critical section before interrupt code is allowed to execute. With Rust, using critical sections is much safer since start/finish operations are not done manually, but they are not guaranteed to terminate/yield.

From my research, it does seem that critical sections still allow for WCET analysis, as long as the code inside the critical section can be analyzed.

Mutexes

As far as I'm aware, these are impractical for interrupts, because if the main thread acquires the lock it cannot relinquish it while interrupted. If there is an interrupt-safe mutex I don't expect it would be preferable to critical sections (at least for single core).

Theoretical Discussion

Guarantees can be broken down into two classes: Safety and Liveness. Safety is "bad things don't happen," and Liveness is "good things eventually happen."

We want both--especially in the context of embedded where Real Time is important. However we must consider the following:

  • Liveness is significantly harder
  • Lamport, who coined the term, recommends a focus on Safety
  • The safety of Rust is literally just that--Safety, Rust itself cannot provide any guarantees of Liveness

I believe our primary focus should be to use Rust's safety to guarantee us the utmost Safety first--and then deliver on Liveness guarantees in a way that preserves the former.

Going forward

My recommendation is that we use bitbanding when possible, critical sections when we cannot, and we bring the utmost care to preserve Worst Case Execution Time analysis by keeping critical section code short and simple.

As an example:

impl<MODE> PA4<MODE> {
  pub fn into_af2(self) -> PA4<AF2> {
    // Interrupts are disabled for the duration of the closure, so neither
    // read-modify-write below can be preempted.
    interrupt::free(|_| {
      unsafe {
        (*$GPIOX::ptr()).moder.modify(|_, w| {
          w.moder4().alternate()
        });
      }

      unsafe {
        (*$GPIOX::ptr()).afrl.modify(|_, w| {
          w.afrl4().af2()
        });
      }
    });

    PA4 { _mode: PhantomData }
  }
}

@IamfromSpace
Contributor Author

It's worth noting that @thalesfragoso pointed out an alternative that addresses the soundness gap. My recommendation is to still use the previous proposal, but it seemed unreasonable to omit this from discussion.

The Alternative

Since the issue is that "proxy" or "global" registers are unsafe to access, if all their methods were marked unsafe, then errors by HAL maintainers would at least be restrained to unsafe blocks. This does indeed plug the soundness concern.

If we apply a pattern like this (marked fn as unsafe) for each "global" register:

impl MODER {
    pub(crate) unsafe fn moder(&mut self) -> &gpioa::MODER {
        unsafe { &(*GPIOA::ptr()).moder }
    }
}

Then the evil function is no longer safe.
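A host-side sketch of what this buys, assuming a hypothetical Afrl stand-in with unsafe crate-private mutators (new, bits, set_af and the unit-struct Pa4 are invented for the illustration). The pin method has to spell out its justification in an unsafe block, and the earlier evil function stops compiling as safe code:

```rust
// Hypothetical stand-in for the AFRL token; a u32 plays the register.
pub struct Afrl { bits: u32 }

impl Afrl {
    pub fn new() -> Self { Afrl { bits: 0 } }
    pub fn bits(&self) -> u32 { self.bits }

    /// # Safety
    /// Clears the alternate function of every pin in the register; the
    /// caller must be entitled to change all of them.
    #[allow(dead_code)]
    pub(crate) unsafe fn reset(&mut self) { self.bits = 0; }

    /// # Safety
    /// `pin` must be in 0..=7 and must be a pin the caller owns.
    pub(crate) unsafe fn set_af(&mut self, pin: u32, af: u32) {
        let shift = pin * 4;
        self.bits = (self.bits & !(0b1111 << shift)) | ((af & 0b1111) << shift);
    }
}

pub struct Pa4;

impl Pa4 {
    pub fn into_af2(self, afrl: &mut Afrl) -> Pa4 {
        // SAFETY: we own PA4 and only its four AFRL4 bits are changed.
        unsafe { afrl.set_af(4, 2) };
        self
    }
}

// The earlier function is now rejected by the compiler (error E0133)
// unless its author writes an unsafe block and takes responsibility:
//
// pub fn evil(afrl: &mut Afrl) {
//     afrl.reset();
// }
```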

Advantages

The principal advantage is the avoidance of Critical Sections. Be it perception or reality, there is resistance to their usage based on cost to WCET.

The other advantage is that this can be added without user impact. It'd be a bit tedious to mark every access to a Part unsafe, but it's easier than the other approach.

Drawbacks

The drawback of ergonomics remains. Passing around &mut registers is still costly for users.

Certain traits cannot be implemented. The loss of generality and sneaking in of device specific information means that generic interfaces may become impossible. For example, the enable/disable methods of the PwmPin trait simply cannot work with this strategy: there is no second argument to accept as a proxy for the "global" effects. There are up to four channels that are competing for the register which toggles the individual channels on/off. Breaking down the traits will only get us so far as a general strategy as we support more and more devices with their own unique nuances.
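To make the constraint concrete, here is a host-side sketch. The PwmPin trait below is a trimmed local copy of the shape of the embedded-hal trait (only the two methods under discussion), and CCER, Channel and ccer_bits are hypothetical. Because enable and disable take only &mut self, the shared register has to be reached some other way; here a std Mutex stands in for a critical section around the read-modify-write.

```rust
use std::sync::Mutex;

// Trimmed local model of the trait shape: no slot for a register token.
pub trait PwmPin {
    fn enable(&mut self);
    fn disable(&mut self);
}

// Hypothetical stand-in for the timer register holding one enable bit
// per channel. Locking the Mutex models entering a critical section.
static CCER: Mutex<u32> = Mutex::new(0);

pub struct Channel { index: u32 }

impl Channel {
    // `index` is 0..=3 for channels 1..=4.
    pub fn new(index: u32) -> Self { Channel { index } }

    // The enable bits of neighbouring channels sit four bits apart.
    fn mask(&self) -> u32 { 1 << (self.index * 4) }
}

impl PwmPin for Channel {
    fn enable(&mut self) {
        let mut ccer = CCER.lock().unwrap();
        *ccer |= self.mask();
    }

    fn disable(&mut self) {
        let mut ccer = CCER.lock().unwrap();
        *ccer &= !self.mask();
    }
}

// Read-only view of the modeled register, for inspection.
pub fn ccer_bits() -> u32 {
    *CCER.lock().unwrap()
}
```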

Recommendation

I still believe that we can get a better interface, that allows for easier HAL Trait implementation, while still keeping WCET both low (<25 cycles) and document-able.

It seems reasonable to calculate and put right in the README. I think the vast majority of users will see that those numbers are well within their required tolerance.

@Sh3Rm4n Sh3Rm4n added the enhancement New feature or request label Mar 8, 2020
@Sh3Rm4n Sh3Rm4n mentioned this issue May 6, 2020
@Sh3Rm4n Sh3Rm4n changed the title Registers with "Global" Bits (AFRL/MODER/APB1/etc) _must_ not be Exposed Improve race free register access ergonomics Jul 28, 2020
@Sh3Rm4n Sh3Rm4n pinned this issue Jul 28, 2020
@Sh3Rm4n
Member

Sh3Rm4n commented Jul 28, 2020

#127 restarted the discussion on the same topic, so I'd like to give my thoughts on this issue:

Bitbanding

Certain MCUs have a memory range that allows for single bit writes. This is how the f1xx-hal solves a number of these issues. Notably the f3 does not have this feature for GPIO. Bitbanding only solves the issue for single bits (like enable/disable), and it is a feature that is becoming obsolete. It may be appropriate for solving certain issues, but is not viable as a catch all solution.

I recently stumbled upon bitbanding myself and I see this neatly fits our purpose to improve the ergonomics at least on some levels, at least for methods, where only one register needs to be accessed.
But I had trouble finding any useful resources. Are there examples where the f1xx-hal is using bitbanding? Are there bitbanding examples for rust in general? As it is an ARM Cortex-M(3|4) feature, it is applicable in general.

Critical Sections, interrupt::free blocks

As pointed out by Disasm, critical sections around the modification prevent an interrupt from occurring between the load/modify/store instructions, preserving atomicity. As also pointed out, this is not viable for multi-core. Does anyone know if this is an issue for f3? Are there multicore devices this hal must support?

For f3's we do not have to worry about multi-core issues, at least. This is a viable option to be used in this case. But as @IamfromSpace said, WCET is an issue we have to be aware of.

To summarize, we have these three issues, which we have to prioritize, while further improving the API:

  • safety
  • ergonomics
  • WCET

This is my initial prioritization order, but the order is open to discussion.

One goal we can all agree on, I guess, is preventing soundness issues. The current ergonomic situation at least shows the problem: thread-safe and sound control of the peripherals is difficult to realize with the current register map on the stm32 family. I don't know if any other microcontroller families are better off.

@Sh3Rm4n
Member

Sh3Rm4n commented Jul 28, 2020

From @ra-kete in #127:

In my mind, the best solution is to just bite the bullet and use critical sections. This is certainly the best case for ergonomics: No need for the into_x methods to take any arguments. No passing around and having to somehow share the shared GPIO registers. Once you have a pin, you can just use it without having to think about how to get access to the other parts of the GPIO that you also need.

This approach has the drawback of disabling interrupts, but I'm not convinced that is more than a theoretical issue. The critical sections are very short (two register modifies usually) and are only invoked during the initial setup anyway. If someone can present a real-world use-case where such critical sections are actually an issue, we can still provide the "coarse-grained" way as an alternative, but I'd keep the CS approach even then because of the superior ergonomics.

So action points could be the following:

  • Use critical sections instead of &mut moder, &mut afrl ...
  • Use bitbanding where possible, if possible.
  • Bonus: Document WCET

@Sh3Rm4n
Member

Sh3Rm4n commented Jul 28, 2020

On critical sections: If bitbanding is an option, we would only need critical sections for methods that need access to multiple global registers.

AFAIK, these are mostly methods used for general configuration of the pins or peripherals, e.g. into_..., which are methods that are not necessarily run in hot loops but at initialization / program startup.
The worse execution time is a price worth paying for the improved ergonomics, IMO.

@thalesfragoso
Member

First I would like to clarify that I don't think the possibility of a fn evil() scenario is unsound, so I don't think we have a problem in today's api (keep in mind that I've only glanced at the code, because I don't follow this crate up close since I don't own a F3).

But I had trouble finding any useful resources. Are there examples where the f1xx-hal is using bitbanding? Are there bitbanding examples for rust in general? As it is an ARM Cortex-M(3|4) feature, it is applicable in general.

It's usually used to enable peripherals' clocks on the RCC register, you can find it in use in the F1 hal and here is also an example on the F4: https://github.com/stm32-rs/stm32f4xx-hal/blob/master/src/pwm.rs#L165-L172. However, AFAIK the F3 doesn't have support for bit banding.

AFAIK, these are mostly methods used for general configuration of the pins or peripherals, e.g. into_..., which are methods that are not necessarily run in hot loops but at initialization / program startup.

You can also use that to say that the un-ergonomics isn't that bad since they are only used in the initialization where the user has all the &muts in hand. Either way, I wouldn't consider that a strong argument since we don't really know how the user will use this, there is even some movement to provide some as_config methods to allow using a pin in another configuration without changing its type, so it does have some use cases.

@teskje
Collaborator

teskje commented Jul 28, 2020

However, AFAIK the F3 doesn't have support for bit banding.

I think that is correct. At least for the DISCOVERY, there are no bit-banding regions marked in the memory map. So bit-banding is not an option we need to discuss here further (though definitely relevant for more general discussions).

You can also use that to say that the un-ergonomics isn't that bad since they are only used in the initialization where the user has all the &muts in hand. Either way, I wouldn't consider that a strong argument since we don't really know how the user will use this, there is even some movement to provide some as_config methods to allow using a pin in another configuration without changing its type, so it does have some use cases.

Personally, I'd also like my initialization code to look clean, but I can understand if not everybody cares about that. It's certainly not a big issue to use the &mut-passing pattern in initialization code, so I agree this is not a strong argument.

However, as @adamgreig pointed out in Matrix, there are also scenarios in which one wants to re-configure a pin at runtime. I've not needed that myself, but the fact that some HALs have introduced methods for temporary re-configuration (e.g. stm32-rs/stm32l0xx-hal#74) makes it clear that this is still a common use-case. In these scenarios, passing around shared registers becomes unwieldy, especially when you have multiple pins owned by several components that you want to re-configure. Then you'll have to store the shared registers globally somehow. If you use something like RTIC that allows you to share resources easily that's actually not as bad, but still more of a hassle than simply being able to call .into() on your pins.

@teskje
Collaborator

teskje commented Jul 28, 2020

And let's not forget the drawback of passing &mut arguments @IamfromSpace mentioned further above, namely that it makes it impossible to implement some common traits for pins, because the trait methods provide no way to pass in the additional context required. I might be wrong here, but it looks like this is the reason we currently have into_x methods on the pins, instead of having them implement the Into trait.

@Sh3Rm4n
Member

Sh3Rm4n commented Jul 28, 2020

So the only way to go is to use critical sections instead of &mut passing. I see the elegancy of &mut but it is far too strict for more sophisticated programs than some 100 lines example code.

I might be wrong here, but it looks like this is the reason we currently have into_x methods on the pins, instead of having them implement the Into trait.

I would rather be careful using the Into trait for GPIOs, because it could make the GPIO state transition too implicit IMO, but yes, because of the global register passed in via &mut, we can't use Into.

@thalesfragoso
Member

Then you'll have to store the shared registers globally somehow. If you use something like RTIC that allows you to share resources easily that's actually not as bad, but still more of a hassle than simply being able to call .into() on your pins.

The runtime case cuts both ways, now you can be using the methods in a hot and/or sensitive part of your code, which can cause the critical sections to be a problem. My biggest problem with changing the methods is that it takes away the option from the users that don't even need or want a critical section because they have the &mut at hand or went through the "trouble" to have it specially to avoid costs. It would also block the user from using a more fine grained mutex or even a sound unsafe block.

However, I wouldn't be against having both methods, one zero cost with the &mut and the other with the critical section properly documented.

I see the elegancy of &mut but it is far too strict for more sophisticated programs than some 100 lines example code.

I wouldn't consider that to be completely true.

@Sh3Rm4n
Member

Sh3Rm4n commented Jul 29, 2020

Hm, I overstated it a bit.

I see the elegancy of &mut but it is a bit strict for more sophisticated programs than some 100 lines example code.

I like the idea of implementing both. Keep the &mut implementations as is, as they still are the only option to provide safety without any runtime cost and add the alternative with critical sections.

These could be added as extra into_... methods. I wonder about the naming though. Any ideas?

@teskje
Collaborator

teskje commented Jul 29, 2020

My biggest problem with changing the methods is that it takes away the option from the users that don't even need or want a critical section because they have the &mut at hand or went through the "trouble" to have it specially to avoid costs.

I understand, I'm just not convinced anyone really needs that option. What would help to convince me is if you (or anyone else) could point me to a real world use-case that wouldn't be possible with critical sections.

I also wouldn't object to implementing both approaches in principle. However, I'd make the &mut approach contingent on anyone actually needing it. If we implement it and nobody uses it then, it would just be dead weight. And there is considerable implementation and maintenance overhead involved with implementing both ways, since then we have to think about how to make all the peripherals play nicely with them both.

These could be added as extra into_... methods. I wonder about the naming though. Any ideas?

For the critical section approach, is there any reason not to just implement the From/Into traits? That should be possible if there are no extra arguments that need to be passed in.
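A host-side sketch of what such an impl could look like, with assumptions stated up front: Pa4, the mode markers, the MODER/AFRL statics and the helper functions are all hypothetical, and a std Mutex stands in for the critical section.

```rust
use std::marker::PhantomData;
use std::sync::Mutex;

pub struct Input;
pub struct Output;
pub struct Af2;

pub struct Pa4<MODE> { _mode: PhantomData<MODE> }

impl Pa4<Input> {
    pub fn new() -> Self { Pa4 { _mode: PhantomData } }
}

// Hypothetical register stand-ins; locking models a critical section.
static MODER: Mutex<u32> = Mutex::new(0);
static AFRL: Mutex<u32> = Mutex::new(0);

fn configure_af2() {
    let mut moder = MODER.lock().unwrap();
    // MODER4 (bits 9:8) = 0b10: alternate function mode.
    *moder = (*moder & !(0b11 << 8)) | (0b10 << 8);
    let mut afrl = AFRL.lock().unwrap();
    // AFRL4 (bits 19:16) = 2: AF2.
    *afrl = (*afrl & !(0b1111 << 16)) | (2 << 16);
}

// One impl per source mode; no extra arguments are needed.
impl From<Pa4<Input>> for Pa4<Af2> {
    fn from(_pin: Pa4<Input>) -> Self {
        configure_af2();
        Pa4 { _mode: PhantomData }
    }
}

impl From<Pa4<Output>> for Pa4<Af2> {
    fn from(_pin: Pa4<Output>) -> Self {
        configure_af2();
        Pa4 { _mode: PhantomData }
    }
}

// Read-only views of the modeled registers, for inspection.
pub fn moder_bits() -> u32 { *MODER.lock().unwrap() }
pub fn afrl_bits() -> u32 { *AFRL.lock().unwrap() }
```

One design wrinkle: a fully generic impl over every source mode would overlap with the standard library's reflexive From impl for the case where the source is already Af2, so the sketch lists the source modes individually.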

@adamgreig
Member

And there is considerable implementation and maintenance overhead involved with implementing both ways, since then we have to think about how to make all the peripherals play nicely with them both.

Use cases aside, could the critical section based method simply use a CS to create the relevant &mut in an unsafe block, then pass it to the &mut version of the function?

@teskje
Collaborator

teskje commented Jul 29, 2020

Use cases aside, could the critical section based method simply use a CS to create the relevant &mut in an unsafe block, then pass it to the &mut version of the function?

Yes, that sounds quite feasible. With this, peripheral methods that reconfigure pins would be duplicated but otherwise the code wouldn't change much. Hm, sounds like it wouldn't be as bad as I initially thought. Might even be worthwhile to implement them both just for the sake of experimentation, to get data on what people like to use.

Unrelatedly, I think I found another advantage of the CS approach. It makes ergonomic use of fully erased pins possible. On fully erased pins, you cannot change the configuration using the &mut passing method, since which registers to pass in depends on the GPIO port, which isn't known at compile time. With a CS you can just conjure the correct registers dynamically at runtime.
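A rough host-side model of that last point, assuming a hypothetical ErasedPin that stores its port and pin number as plain values. The array of Mutex-wrapped u32 values stands in for the MODER register of three ports, and locking models the critical section.

```rust
use std::sync::Mutex;

// One hypothetical MODER stand-in per port (index 0 = A, 1 = B, 2 = C).
static MODER: [Mutex<u32>; 3] = [Mutex::new(0), Mutex::new(0), Mutex::new(0)];

// A fully erased pin: port and pin number are runtime values, so the
// type says nothing about which register token would be required.
pub struct ErasedPin { port: usize, pin: u32 }

impl ErasedPin {
    // `port` must be below 3 and `pin` in 0..=15 in this model.
    pub fn new(port: usize, pin: u32) -> Self { ErasedPin { port, pin } }

    pub fn set_output(&mut self) {
        // The register is picked at runtime from the stored port index.
        let mut moder = MODER[self.port].lock().unwrap();
        let shift = self.pin * 2;
        // Two mode bits per pin; 0b01 selects general purpose output.
        *moder = (*moder & !(0b11 << shift)) | (0b01 << shift);
    }
}

// Read-only view of a modeled register, for inspection.
pub fn moder_bits(port: usize) -> u32 {
    *MODER[port].lock().unwrap()
}
```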

@IamfromSpace
Contributor Author

Use cases aside, could the critical section based method simply use a CS to create the relevant &mut in an unsafe block, then pass it to the &mut version of the function?

This would configure the device correctly, but would be unsound, leaving race conditions. The trick with &mut depends on only one reference existing ever, at which point the borrow checker does the rest of the work for us. If we create one, the borrow checker can’t enforce soundness for us (due to the lack of R-W atomicity). I made this mistake before the community pointed it out too.

The performance with CSes is certainly interesting. The &mut is zero cost in performance, whereas the CS strategy must lock and unlock. I’m not so certain it’s costly enough to merit concern, as the entirety of RTIC (formerly RTFM) is based on the concept.

On the topic of RTIC, there was one more approach that came up late in these discussions, but needs a lot more exploration. The idea was to expose a type parameter with a Mutex Trait constraint on each type that depends on global registers. With RTIC, it would use their Exclusive, and then we could also provide a simple CS impl, or even a F***ItGoFast impl that unsafely executes.

This still might be a bit confusing to a newcomer, but a tutorial would be sufficient and “learn once.“ We then avoid the “can’t impl common traits” problem. We’d really need a proof of concept for this. If it works though it seems like the most promising way to meet the majority of the numerous constraints we face here.
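Purely as a thought experiment, the type-parameter idea might look like the host-side sketch below. Everything in it is an assumption: RegMutex is a hypothetical trait only loosely shaped like a "lock and hand me &mut" API, Exclusive and Shared are invented names, and a std Mutex stands in for a critical section.

```rust
use std::sync::{Arc, Mutex as StdMutex};

// Hypothetical locking abstraction over one shared register (a u32 here).
pub trait RegMutex {
    fn lock<R>(&mut self, f: impl FnOnce(&mut u32) -> R) -> R;
}

// Zero-cost case: the caller already holds &mut, so "locking" is a no-op.
pub struct Exclusive<'a>(pub &'a mut u32);

impl RegMutex for Exclusive<'_> {
    fn lock<R>(&mut self, f: impl FnOnce(&mut u32) -> R) -> R {
        f(&mut *self.0)
    }
}

// Shared case: cloneable handle; the std Mutex models a critical section.
#[derive(Clone)]
pub struct Shared(Arc<StdMutex<u32>>);

impl Shared {
    pub fn new(value: u32) -> Self { Shared(Arc::new(StdMutex::new(value))) }
    pub fn get(&self) -> u32 { *self.0.lock().unwrap() }
}

impl RegMutex for Shared {
    fn lock<R>(&mut self, f: impl FnOnce(&mut u32) -> R) -> R {
        let mut guard = self.0.lock().unwrap();
        f(&mut *guard)
    }
}

// A type that depends on a global register takes the strategy as a parameter.
pub struct PwmChannel<M: RegMutex> { index: u32, reg: M }

impl<M: RegMutex> PwmChannel<M> {
    pub fn new(index: u32, reg: M) -> Self { PwmChannel { index, reg } }

    // No extra arguments: the access strategy travels with the value.
    pub fn enable(&mut self) {
        let bit = 1u32 << (self.index * 4);
        self.reg.lock(|ccer| *ccer |= bit);
    }
}
```

The same enable body serves both strategies, which is the property that would let common traits be implemented without extra arguments.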

If that’s not possible, I’m also still in favor of the CS approach. We want our abstractions to be as free as possible, but if we don’t fully abstract, then I think we’ve missed our primary goal. The CS strategy isn’t free, but it’s not so costly, and the things we need them for are generally very simple; careful reviews and I think we’d be fine.

@adamgreig
Member

This would configure the device correctly, but would be unsound, leaving race conditions.

Sorry, I don't think I was clear: we provide two configuration methods, one of which internally creates a critical section, and inside the critical section it uses an unsafe block to conjure a new &mut, which it then passes (still inside the critical section) to the existing method which requires the &mut. The existing method is unchanged from the current form where the &mut is passed in by the user. Once it's done, the new, critical-section-using method leaves the critical section and returns. There's no race condition because the newly created &mut only exists inside the critical section. My suggestion is just to make it easier to implement having both methods available; the safety is the same as the proposal to use critical sections at all.
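The layering described here can be sketched on a host as follows. This is a model under assumptions, not the HAL's code: Moder, Pa4 and moder_bits are hypothetical, the std Mutex stands in for the critical section, and the model assumes the Moder stored in the static is the only one in existence (its field is private and no constructor is exported).

```rust
use std::marker::PhantomData;
use std::sync::Mutex;

// Hypothetical stand-in for the MODER token; a u32 plays the register.
pub struct Moder { bits: u32 }

pub struct Input;
pub struct Af2;

pub struct Pa4<MODE> { _mode: PhantomData<MODE> }

impl Pa4<Input> {
    pub fn new() -> Self { Pa4 { _mode: PhantomData } }
}

// In this model the HAL keeps the only Moder; locking the Mutex models
// entering a critical section and conjuring the &mut inside it.
static MODER: Mutex<Moder> = Mutex::new(Moder { bits: 0 });

impl<MODE> Pa4<MODE> {
    // The existing method: zero cost, the caller supplies the &mut.
    pub fn into_af2(self, moder: &mut Moder) -> Pa4<Af2> {
        // MODER4 (bits 9:8) = 0b10: alternate function mode.
        moder.bits = (moder.bits & !(0b11 << 8)) | (0b10 << 8);
        Pa4 { _mode: PhantomData }
    }

    // The new method: obtains the &mut inside the "critical section" and
    // delegates, so the configuration logic exists only once.
    pub fn into_af2_cs(self) -> Pa4<Af2> {
        let mut moder = MODER.lock().unwrap();
        self.into_af2(&mut *moder)
    }
}

// Read-only view of the modeled register, for inspection.
pub fn moder_bits() -> u32 {
    MODER.lock().unwrap().bits
}
```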

I’m not so certain it’s costly enough to merit concern, as the entirety of RTIC (formerly RTFM) is based on the concept.

RTIC does not lock; it changes the BASEPRI (and uses interrupt priorities), so higher priority interrupts can always execute. That's a critical (sorry) difference from using critical sections, because activity in lower-priority interrupts does not affect WCET of higher-priority interrupts.
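The difference can be captured in a tiny model. It uses the hardware numbering convention, where a lower number means a more urgent interrupt and a BASEPRI of 0 disables masking; the function names are made up for the illustration.

```rust
/// With BASEPRI set, only interrupts of the same or lower urgency
/// (numerically greater or equal) are held off.
pub fn masked_by_basepri(basepri: u8, irq_priority: u8) -> bool {
    basepri != 0 && irq_priority >= basepri
}

/// A global critical section holds off every interrupt with a
/// configurable priority, however urgent it is.
pub fn masked_by_global_cs(_irq_priority: u8) -> bool {
    true
}
```

With basepri = 3, an interrupt at priority 5 waits while one at priority 1 still runs; under a global critical section both wait, which is why the worst case of every task can be affected.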

@IamfromSpace
Contributor Author

we provide two configuration methods...

Ah, sorry, I saw that discussion and didn’t put two and two together—that is sound, my mistake.

In this regard, I think that the CS approach covers the vast majority of cases. The instruction cost is just not that high; it doesn’t seem like there are that many applications that are truly 5-6 cycles away from meeting requirements (but I could be wrong). I do like the pragmatism though, it’s certainly appealing.

RTIC does not lock

Right, sorry, I was being too loose with terms. You’ve said it much better! And you’ve well highlighted the downside of the CS, we’d have to use the highest priority, so every RTIC task could in theory be affected in WCET.

I wonder if we could use the &mut strategy and the Mutex trait strategy too. In theory, any &mut REG can satisfy Mutex trivially, right? Ideally the compiler would optimize away the rest.

@teskje teskje mentioned this issue Jul 31, 2020
4 tasks
@IamfromSpace
Contributor Author

IamfromSpace commented Aug 15, 2020

@adamgreig Shoot, I was making some progress with a PoC and realized that &mut conjuring isn't safe either. So this:

...which internally creates a critical section, and inside the critical section it uses an unsafe block to conjure a new &mut, which it then passes (still inside the critical section) to the existing method which requires the &mut.

Is unsound, because the other reference may get interrupted by this CS, eliminating the atomicity that should exist due to the single &mut and the borrow checking.

As an example, consider the following. Let's say we have some reg with toggle_top_bit and toggle_bottom_bit. We also have configure_top which automatically conjures a &mut reg.

Our main execution path looks like this:

  let dp = Peripherals::take().unwrap();
  dp.reg.modify(|_, w| w.toggle_bottom_bit());

And we have interrupt code that simply calls configure_top(), which looks like:

fn configure_top() {
  interrupt::free(|_| {
    // Conjure access to REG inside the critical section.
    unsafe {
      (*REG::ptr()).modify(|_, w| w.toggle_top_bit());
    }
  });
}

In this case we have an ordering that fails us:

  1. The main execution starts the modify and reads the value of reg (ex. 0b00000000)
  2. The interrupt begins execution
  3. The configure_top conjures the &mut and disables all other interrupts via CS
  4. configure_top reads reg (still 0b00000000)
  5. configure_top writes the value after flipping the top bit (now 0b10000000)
  6. The CS ends
  7. The main execution finishes the modify by writing its stale value, ignoring the previous write (value is incorrectly 0b00000001 instead of 0b10000001)

We can only conjure &mut if we know that there are no other references to it. And the way to prove that is... well, we're right back to the borrow checker.

I think there are still some opportunities here (notably Cloneable Mutexes that hold that one register via &mut conjuring), but this puts a pretty big hole in a pure CS strategy (and my original premise :/).

@teskje
Collaborator

teskje commented Aug 15, 2020

@IamfromSpace If I understand correctly what you describe is that we run into problems if any other parts of the code also hold (non-conjured) references to a register at the same time we conjure one in a CS. That makes sense to me.

However, I don't see how it should be an issue in the GPIO case. If you construct an instance of the HAL's Gpio abstraction, you pass it the PAC registers of the respective GPIO port (say GPIOA). After this, Gpio owns all of the GPIOA registers and controls who gets references to them. So to make it all safe, we just need to make sure that Gpio never gives out references to its registers and guards all the internal (non-atomic) register accesses with critical sections. I don't see how this could lead to a data race like you describe. Am I missing something?

@IamfromSpace
Contributor Author

@ra-kete yep, that’s all correct! It’s not an issue as long as we can prove (one way or another) that there is no other reference floating around. If we can take it, then we can do the CS-conjure-combo anywhere we’d like safely.

That’s still a challenge for registers like RCC which are widely used and therefore sort of hard to take without preventing other usage.

One thing that makes sense here is to make something like a ShareableRCC, which takes the RCC and uses a CS on modification (the Mutex trait is a natural fit for this) and can then safely be Copy and Clone. This helps for things that need global “fan-out” like PWM Channels.

I’m hoping to put a branch together that does something along these lines to demonstrate its usage.

@adamgreig
Member

@IamfromSpace Good catch, you're quite right, we can't allow simultaneous use of "pass in the &mut" and "we'll create one in an unsafe block". Your ShareableRCC idea sounds a lot like the REC struct in the H7 HAL which might provide some inspiration: https://github.com/stm32-rs/stm32h7xx-hal/blob/master/src/rcc/rec.rs https://docs.rs/stm32h7xx-hal/0.6.0/stm32h7xx_hal/rcc/rec/index.html

@IamfromSpace
Contributor Author

IamfromSpace commented Aug 19, 2020

Alrighty, I feel like I've got something fairly interesting that does a lot of things that have been talked about. I've done this on the PWM module, because I'm very familiar with it and it's actually a very interesting example of some of the challenges here.

PWM is neat because it has a bit of everything: it needs the RCC block to configure the timers, then it has multiple channels that all share common configuration registers. There's no way to pass in these config registers and still implement the PwmPin trait. However, in theory, for single channel timers or multi channel where you only wanted one channel, then there would be no sharing to worry about!

This utilizes the Mutex trait for anything that might be shared. So the RCC block can't just be passed in directly, but if you have a &mut then you can easily wrap it in a mutex_trait::Exclusive. When using multiple channels, you need something like GlobalInterrupt as your Mutex (since it's safe to Clone), but the Clone restriction is dropped if you only have or construct a single channel. In that case, you can use OwnedExclusive as your Mutex, and (likely) incur no runtime penalty. Essentially, you only need to fall back to CSes as a last resort.

The Mutex impl is selected through type parameter declarations. So if you want to use GlobalInterrupts internally you'd do something like:

let my_peripheral: Peripheral<GlobalInterrupt<_>> = make_my_peripheral(...);
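
To make the type-parameter selection more concrete, here is a host-side sketch of the general shape. Everything in it is an assumption for illustration, not the code on the branch: the `Mutex` trait is a local stand-in modeled on the one in the mutex-trait crate, `Exclusive` mirrors the idea of `mutex_trait::Exclusive`, `HostShared` is a hypothetical `Rc<RefCell<_>>` substitute for a Clone-able, critical-section-backed mutex such as GlobalInterrupt, and `Channel` with its `u32` "register" is made up.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Local stand-in for a closure-based mutex trait.
pub trait Mutex {
    type Data;
    fn lock<R>(&mut self, f: impl FnOnce(&mut Self::Data) -> R) -> R;
}

/// Holding a `&mut T` already proves exclusivity, so locking is a no-op.
pub struct Exclusive<'a, T>(pub &'a mut T);

impl<'a, T> Mutex for Exclusive<'a, T> {
    type Data = T;
    fn lock<R>(&mut self, f: impl FnOnce(&mut T) -> R) -> R {
        f(&mut *self.0)
    }
}

/// Hypothetical host-only substitute for a Clone-able mutex that would
/// use a critical section on the target.
pub struct HostShared<T>(Rc<RefCell<T>>);

impl<T> HostShared<T> {
    pub fn new(value: T) -> Self {
        HostShared(Rc::new(RefCell::new(value)))
    }
}

impl<T> Clone for HostShared<T> {
    fn clone(&self) -> Self {
        HostShared(Rc::clone(&self.0))
    }
}

impl<T> Mutex for HostShared<T> {
    type Data = T;
    fn lock<R>(&mut self, f: impl FnOnce(&mut T) -> R) -> R {
        f(&mut *self.0.borrow_mut())
    }
}

/// A made-up channel that sets one enable bit in a shared config word.
pub struct Channel<M> {
    config: M,
    bit: u8,
}

impl<M: Mutex<Data = u32>> Channel<M> {
    pub fn new(config: M, bit: u8) -> Self {
        Channel { config, bit }
    }

    pub fn enable(&mut self) {
        let bit = self.bit;
        self.config.lock(|reg| *reg |= 1 << bit);
    }
}
```

A single channel can be built as `Channel::new(Exclusive(&mut reg), 0)` and pays nothing for locking, while several channels would each hold a clone of the shared mutex. The `Channel` code itself is identical in both cases.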

The working branch is here; the most useful places to look are the module docs and the examples.

I'm pretty excited about this, because it plugs a few race conditions in a way that gives the user a lot of control. The downside is pretty clear--it makes the interface a bit more advanced. I think there's been a major commitment to things like type states in the embedded ecosystem though, so it doesn't seem like a departure in that regard.

My hope was that you'd never need to pass in global registers, but I think we can only "push it up." This example avoids passing in anything when working with a PwmPin, but does not avoid passing in APB1/APB2 on initial setup. This seems reasonable given our constraints, and covers the most important part where the constructed peripherals are generic.

@Piroro-hs
Contributor

I've been toying around with the gpio module to remove those &mut gpioa.moder, &mut gpioa.afrl, .. without Mutexes. Working branch is here.

However, I'm not sure whether this is legal or not 🥴

unsafe { &*(&self.afrl as *const _ as *const AtomicU32) }

@mvirkkunen

@Piroro-hs Does STM32F3 actually support atomic operations in peripheral memory space? I couldn't immediately find any mention one way or the other in documentation.

@Piroro-hs
Contributor

Piroro-hs commented Mar 20, 2021

@mvirkkunen ST's "STM32 Cortex®-M4 MCUs and MPUs programming manual" (PM0214) said nothing about such restrictions, so I think peripheral memory space (device memory) is treated like normal memory in LDREX/STREX operations.
(At least it correctly loads, stores, and fails when an interrupt occurs on stm32f303x8.)

@adamgreig
Member

adamgreig commented Mar 20, 2021

In the ARMv7-M architecture reference manual, A3.4.5 "Load-Exclusive and Store-Exclusive usage restrictions", it says:

LDREX and STREX operations must be performed only on memory with the Normal memory attribute.

So I don't think it could work in device memory. Though it might be worth using a normal AtomicBool that's shared between all the pins as a semaphore lock on the AFRH and AFRL, which would still prevent needing to pass them in, at the cost of some memory (maybe a pointer per pin, which sort of sucks, or maybe one static atomic per port, which might not be so bad...).

Edit: having said that, I sort of bet it would work in practice, since I think for Cortex-M4 single core devices there's only a single exclusive access monitor which tags the entire memory space for exclusive access, which would probably cover the device memory... but it's strictly against the architecture reference manual.
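
The semaphore idea above (a normal atomic in RAM guarding AFRL/AFRH, rather than atomics on the registers themselves) could look roughly like this. It is a sketch under assumptions: `PortLock` and `try_with` are hypothetical names, and the closure stands in for the actual register read-modify-write. Because the flag lives in Normal memory, the exclusive-access restriction quoted above does not apply to it.

```rust
use core::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical per-port lock: one flag in ordinary RAM per GPIO port.
pub struct PortLock(AtomicBool);

impl PortLock {
    pub const fn new() -> Self {
        PortLock(AtomicBool::new(false))
    }

    /// Runs `f` only if the lock is free and returns `None` otherwise.
    ///
    /// This deliberately does not spin: an interrupt handler that
    /// preempted the current holder would otherwise wait forever.
    pub fn try_with<R>(&self, f: impl FnOnce() -> R) -> Option<R> {
        if self
            .0
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            return None;
        }
        // The register read-modify-write would go here.
        let result = f();
        self.0.store(false, Ordering::Release);
        Some(result)
    }
}
```

One `static` lock per port keeps the memory cost to a single byte per port, at the price of callers having to handle the `None` case.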

@Piroro-hs
Contributor

In the ARMv7-M architecture reference manual, A3.4.5 "Load-Exclusive and Store-Exclusive usage restrictions", it says:

LDREX and STREX operations must be performed only on memory with the Normal memory attribute.

Oh, so sad...
Thanks for clarification.

@mvirkkunen

I'm sad too. It would be so nice to have a good solution for this, but nope. Everything has some issue...

@adamgreig
Member

I'm sad as well, it's such an annoying problem to solve!

I found this interesting earlier discussion, https://internals.rust-lang.org/t/atomic-cmpxchg-with-volatile-semantics/4101, which seems to confirm my suspicion that it does work on cortex-m4, and suggests it might even be an oversight in the architecture reference manual. But, as far as I'm aware the situation around volatile operations hasn't changed since then: using AtomicU32 would not emit volatile operations, so wouldn't be suitable here either (the compiler could, now or in the future, optimise around your atomic operations so they no longer work).

I wonder if cortex-m could gain some new asm routines for volatile atomic operations...

@Piroro-hs
Contributor

Piroro-hs commented Mar 27, 2021

Updated to use precompiled assembly lib for volatile atomic register modification.

@adamgreig
Member

By the way, I heard secondhand from an anonymous ARM engineer that ldrex/strex will work on device-type memory on Cortex-M4, even though the ARMv7-M reference says it wouldn't be allowed for the architecture in general.

@Sh3Rm4n
Member

Sh3Rm4n commented Mar 28, 2021

Is this a reasonable approach to "abuse" this hack? When it requires asm I guess it is a no-go for now, because we would need the nightly channel?

@mvirkkunen

@Sh3Rm4n fwiw cortex-m already uses plenty of asm - it just provides everything as precompiled blobs that are linked in as opposed to using inline assembly which is unstable.

@Rahix
Contributor

Rahix commented Apr 20, 2021

I'm not entirely sure if this approach has been suggested here before, if yes, please disregard...

Following up on the 'two methods' approach, instead of creating the &mut Reg references (which was shown to be unsound), a design like this could be chosen:

impl Pin {
    pub fn into_push_pull_output(self) -> Pin<...> {
        cortex_m::interrupt::free(|cs| {
            self.into_push_pull_output_cs(cs)
        })
    }

    pub fn into_push_pull_output_cs(self, cs: CriticalSection) -> Pin<...> {
        // SAFETY: We are certainly in a critical section here
        unsafe {
            let regs = &*self.gpio.ptr();
            // ...
        }
    }
}

This allows:

  • Super simple access for basic users:
    let pa0 = gpioa.pa0.into_push_pull_output();
  • Safe and low overhead initialization of multiple pins in a single critical section.
    let (pa0, pa1, pa2) = cortex_m::interrupt::free(|cs| (
        gpioa.pa0.into_push_pull_output_cs(cs),
        gpioa.pa1.into_push_pull_output_cs(cs),
        gpioa.pa2.into_push_pull_output_cs(cs),
    ));
  • Unsafe and zero overhead initialization for very special situations:
    let pa0;
    {
        let cs = unsafe { CriticalSection::new() };
        pa0 = gpioa.pa0.into_push_pull_output_cs(cs);
    }

Of course this only works on single-core devices, but are there any stm32f3's which are multi-core? Also, it does not depend on an undocumented "feature" of the CPU...
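
For anyone who wants to experiment with the two-method shape off-target, here is a host-side sketch. All of it is assumed for illustration: `CriticalSection` and `free` are local stand-ins for the cortex-m items of the same name (this `free` does not disable anything), `Port` simulates MODER with a `Cell<u32>`, and the token is passed by reference here.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

/// Stand-in token: holding one is the proof of being in a critical section.
pub struct CriticalSection {
    _private: (),
}

impl CriticalSection {
    /// # Safety
    /// The caller must guarantee that nothing can preempt the current code.
    pub unsafe fn new() -> Self {
        CriticalSection { _private: () }
    }
}

/// Stand-in for an interrupt-free section. On the target this would
/// disable interrupts around `f`; on a host there is nothing to disable.
pub fn free<R>(f: impl FnOnce(&CriticalSection) -> R) -> R {
    let cs = unsafe { CriticalSection::new() };
    f(&cs)
}

/// Simulated GPIO port with a MODER-like register (2 bits per pin).
pub struct Port {
    moder: Cell<u32>,
}

pub struct Input;
pub struct Output;

pub struct Pin<'p, MODE> {
    port: &'p Port,
    index: u8,
    _mode: PhantomData<MODE>,
}

impl Port {
    pub fn new() -> Self {
        Port { moder: Cell::new(0) }
    }

    /// Hands out pin `index` (0..=15). Uniqueness is not enforced here.
    pub fn pin(&self, index: u8) -> Pin<'_, Input> {
        Pin { port: self, index, _mode: PhantomData }
    }

    pub fn moder(&self) -> u32 {
        self.moder.get()
    }
}

impl<'p, MODE> Pin<'p, MODE> {
    /// Convenience method: opens its own critical section.
    pub fn into_push_pull_output(self) -> Pin<'p, Output> {
        free(|cs| self.into_push_pull_output_cs(cs))
    }

    /// Token method: the caller proves a critical section is active.
    pub fn into_push_pull_output_cs(self, _cs: &CriticalSection) -> Pin<'p, Output> {
        let shift = 2 * u32::from(self.index);
        let old = self.port.moder.get();
        // 0b01 selects general-purpose output mode for this pin.
        self.port.moder.set((old & !(0b11 << shift)) | (0b01 << shift));
        Pin { port: self.port, index: self.index, _mode: PhantomData }
    }
}
```

The convenience method costs one critical section per pin, while the token method lets several pins share a single critical section, matching the three usage levels listed above.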
