Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connect Bootrom Instrinsics #288

Merged
merged 5 commits into from
Feb 13, 2022
Merged

Connect Bootrom Instrinsics #288

merged 5 commits into from
Feb 13, 2022

Conversation

Sizurka
Copy link
Contributor

@Sizurka Sizurka commented Feb 5, 2022

This connects the ROM functions to intrinsics, including both basic ones (e.x. memset) and the floating point library.

Floating Point Correctness

The floating point ROM functions do not handle or generate NaNs. In the Pico SDK, NaN correctness is enabled with PICO_FLOAT_PROPAGATE_NANS (on by default), which enables NaN correctness by checking arguments before handing off to the ROM functions. So this implements similar logic around the affected functions.

Performance

Most of the ROM functions are faster than the default implementations in compiler-builtins. However, without ROM function caching (the rom-func-cache feature), the total performance is usually a bit of a wash (±2% for floating point operators). With ROM function caching, they're usually 10-20% faster. The exception is division, which is nearly twice as fast (and always a gain); probably due to efficient use of the hardware divider. For those intrinsics that are not at least as good, they are decorated with the non-standard attribute #[slower_than_default]. Currently, this just makes the intrinsics macro not actually declare the intrinsic; but in the future this could be made optional (e.x. to save flash or because the ROM functions are always a gain on a XIP cache miss).

Additionally, without NaN handling they gain 5-10% in general. So that would make many of them always a gain.

Floating Point Functions

There are intrinsics defined for floating point functions (sin, cos, exp, etc), not just the operators. However, these are not currently accessible. This is because #![no_std] currently disables things like f32::sin() (see rust-lang/rfcs#2505). The current workaround to that is just to use libm functions directly, so that can just be substituted for using the ROM functions directly. Many of these are significantly slower than the libm implementations anyway (probably due to ROM space constraints), so it's not a huge loss.

The intrinsic names mirror compiler-builtins own definitions for them, so hopefully when core picks up math support, they will be used automatically.

Bootrom Versions

The double precision functions and some of the single precision conversion/atan2 are only available on V2 or greater of the bootrom. The intrinsics that use thse are decorated with the non-standard attribute #[bootrom_v2]. The macro currently just always enables these. My logic here is that this is basically the same as how we were already exposing them in the ROM function definitions anyway. However, it might make sense to make this optional. These are only enabled when the feature rom-v2-intrinsics is set. This also adds a check for the bootrom version when using any of the V2 only functions (explicitly or implicitly (setting the feature and using a V1 bootrom)), and it will now panic in a sensible way instead of producing undefined behavior when one is used.

The Pico SDK handles this case with fairly complicated trampoline/SVC exception scheme to automatically insert software code. I figured this was well out of scope for this PR.

Possible Future Work

This PR will work on its own but there's some things that could be improved. I could certainly make some progress on these, but I wanted to get the basics out first.

  1. Make #[slower_than_default] intrinsics controllable. The argument here is that using them saves flash space, and they're still faster in the event of an XIP cache miss. So for large programs with only occasional floating point use, it could make sense.
  2. Make #[bootrom_v2] intrinsics controllable. Controlled by the rom-v2-intrinsics feature.
  3. An equivalent of disabling PICO_FLOAT_PROPAGATE_NANS. Most of the time "non-finite" is sufficient, and the ROM functions do propagate that. This would represent a speedup of all floating point operations.
  4. Expose the wrapped non-operator functions somehow. One possibility would be a trait implemented for f32 and f64 that can be brought into scope to use them.
  5. Better handling of unsupported ROM versions. Right now, it'll panic on the first use of an unsupported operations (either calling a V2 ROM function directly on a V1 chip, or compiling with rom-v2-intrinsics set and using an unsupported operation on a V1 chip). It would be better to panic sooner (i.e. on #[entry]), but I'm not sure how you'd detect if an unsupported operation was used at all. Maybe you could do something with having every V2 function reference some linker symbol and look for if it's been GCed during linking (and panic if it hasn't), but I'm not really sure how you'd actually test for that. The even better option would be to do what the Pico SDK does and fall back to flash implementations. This would probably be easier to implement in the detection sense: just provide alternatives to the V2 functions and have their ::ptr() implementations return the alternative on a V1 bootrom. The hard part here is providing the alternative implementations: you can't (?) reference compiler-builtins as an actual crate dependency, because doing so makes it be linked before rp-hal and that messes up the magic that's used to allow for intrinsic overrides.

Closes #237

Fix a few function signatures that don't match the ROM according to the
datasheet.
_memset4 also has a wrong code in the datasheet, so match it to the
actual ROM.
intrinsics! {
#[slower_than_default]
#[bootrom_v2]
#[alias = __eqsf2, __ltsf2, __nesf2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand those aliases. Aren't those different operations (equal, less than, not equal and less-or-equal?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the definition of the functions is things like "zero if strictly equal" or "negative if less" so the allow any other value if the condition isn't met. compiler-builtins does the same thing, which is actually where I got the logic from.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fine, I think now I understand: The underlying function (bootrom function _fcmp, softfloat library call __cmpsf2) distinguishes three cases, <, ==, >. The library also provides several alternatives which don't guarantee to distinguish all three cases, but just separate subsets. I guess they are separate in case the hardware allows for faster implementations if you are only interested in such a subset.
Thanks for those pointers!

/// entirely.
/// * `bootrom_v2` - indicates that the override is only available
/// on a V2 bootrom or higher. Currently this is ignored and the
/// override is always used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens on chips with V1 bootrom?
If it causes a crash or undefined behavior, I'd say that this is the wrong default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, I don't have a chip with a V1 bootrom to test. I suspect it would crash: the table resolve would probably return 0, so you'd probably get a hard fault exception (which would panic). However, I'm not actually sure how common V1 bootroms are (i.e. where they ever actually public?). The Pico SDK has logic that I'm not clear about: it only inserts the fallbacks if both V1 ROM and B0 chip support is enabled.

If they actually exist in the wild, then yeah I'd agree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, V1 ROM is equivalent to B0 chip. (B0 has ROM v1, B1 has ROM v2, B2 has ROM v3)
I didn't find any hard numbers, but after some searching I'd say that the original RP PICO boards sold in February 2021 were based on RP2040 revision B0, so there are probably quite a few in the wild.
When I'm back home I'll check which revisions I have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, probably best to make non-default then, so updated with them gated by the feature rom-v2-intrinsics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had a look, the first rp pico boards I bought really contained b0 stepping chips. From a local shop, no special early samples.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I confirm the expected behavior, more or less:
Building a firmware with rom-v2-intrinsics enabled, and flashing it to a RP2040 B0, causes a crash when executing a division.
What actually happens is not even a jump to address 0. Instead, the lookup for the soft_double_table returns 0, and then the ddiv method is looked up at an offset from that position. Incidently, the offset of the ddiv method is 0x0c, and the memory location 0x0c happens to contain a pointer to the HardFault handler in ROM, which is just a loop { wfi() }. Therefore, the debugger doesn't even show a crash.

I wonder if it would be better to somehow detect that situation and actively call panic!(), instead of just running into some random undefined behavior?

BTW, also tried on a RP2040 B1, works as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well one idea might be to extend the ::ptr() method of all ROM functions that require a V2 bootrom to check the bootrom version and panic if they're called at all on a V1. The check is pretty simple: just fetching the ROM at 0x13 and comparing the integer number. I'd probably just add an attribute for the ROM functions definitions and check it in a modified lookup body, and you could also add it to the general case for soft_double_table().

The version might be a good idea to expose in general as part of the ROM functions. Probably orthogonal to this PR, though, since it's a general problem with the ROM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I added a check for a V1 bootrom in the lookup for the floating point functions. So you should get a more sensible panic if you try and use them on an unsupported version, either explicitly or implicitly through an intrinsic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, works (panics) as expected:

INFO  Program start
└─ rp2040_project_template::__cortex_m_rt_main @ src/main.rs:34
ERROR panicked at 'Double precision operations require V2 bootrom (found: V1)', /home/jan/rp2040/rp-rs/rp-hal/rp2040-hal/src/rom_data.rs:384:9
└─ panic_probe::print_defmt::print @ /home/jan/.cargo/registry/src/github.com-1ecc6299db9ec823/panic-probe-0.3.0/src/lib.rs:91

@Sizurka
Copy link
Contributor Author

Sizurka commented Feb 6, 2022

What's the correct way to make CI happy here? It looks like it's failing because it's compiling for a x86_64-unknown-linux-gnu target, which fails on the AAPCS ABI functions. Should I just add #[cfg(target_arch = "arm")] to all the macro generated functions?

Add a check for a V2 bootrom when using floating point functions that
require it.  Panic when it's not present.
@jannic
Copy link
Member

jannic commented Feb 7, 2022

I don't know much about those soft float routines (as obvious from my comment. So I can't really tell if the code is correct.
But that aside, the pull request looks good to me: It's very readable and well documented, and just looks like it was carefully written with a lot of thought. Good work! 🎉

One thing I'd add to the list of 'possible future work' is the handling of older bootrom versions. With the checks added one gets a proper panic message at runtime, which is fine. But it would be even better to get this panic message at the start of main and not at the first call to some unavailable ROM function. But I guess that would require something like a custom #[entry] macro. I'd say that's out-of-scope for now.

As an alternative, one could detect that case and fall back to the slower compiler-builtins at runtime. Doing that without a performance impact at every call would be tricky, and the resulting firmware would always be larger. So I'm not sure if it's worth the effort. But it would be a nice default setting: Works on every revision, and uses the faster ROM routines if available.

Regarding the CI failure, I guess adding #[cfg(target_arch = "arm")] would be fine. It just doesn't make sense to compile this inherently architecture specific code on x86. But I'm sure other people here know much more about the CI pipeline than I do.

@Sizurka
Copy link
Contributor Author

Sizurka commented Feb 7, 2022

One thing I'd add to the list of 'possible future work' is the handling of older bootrom versions.

Good idea, I've added some thoughts on that.

@jannic jannic mentioned this pull request Feb 10, 2022
@9names
Copy link
Member

9names commented Feb 12, 2022

I guess adding #[cfg(target_arch = "arm")] would be fine.

Yep, sounds fine to me.
It only needs to be added to 4 places, there's nothing the doctests can do with them, none of them are public functions/data anyway (so docs aren't required to build) so that is perfectly reasonable.

diff --git a/rp2040-hal/src/lib.rs b/rp2040-hal/src/lib.rs
index 8d2db4e..78967bb 100644
--- a/rp2040-hal/src/lib.rs
+++ b/rp2040-hal/src/lib.rs
@@ -15,6 +15,7 @@ pub use paste;
 
 pub extern crate rp2040_pac as pac;
 
+#[cfg(target_arch="arm")]
 #[macro_use]
 mod intrinsics;
 
@@ -23,6 +24,7 @@ pub(crate) mod atomic_register_access;
 pub mod clocks;
 mod critical_section_impl;
 pub mod dma;
+#[cfg(target_arch="arm")]
 mod float;
 pub mod gpio;
 pub mod i2c;
diff --git a/rp2040-hal/src/rom_data.rs b/rp2040-hal/src/rom_data.rs
index 4e28aec..a923626 100644
--- a/rp2040-hal/src/rom_data.rs
+++ b/rp2040-hal/src/rom_data.rs
@@ -280,6 +280,7 @@ rom_functions! {
 }
 
 // Various C intrinsics in the ROM
+#[cfg(target_arch="arm")]
 intrinsics! {
     #[alias = __popcountdi2]
     extern "C" fn __popcountsi2(x: u32) -> u32 {
diff --git a/rp2040-hal/src/sio.rs b/rp2040-hal/src/sio.rs
index a2f769f..443f45c 100644
--- a/rp2040-hal/src/sio.rs
+++ b/rp2040-hal/src/sio.rs
@@ -286,6 +286,7 @@ impl HwDivider {
     }
 }
 
+#[cfg(target_arch="arm")]
 intrinsics! {
     #[aeabi = __aeabi_uidiv]
     extern "C" fn __udivsi3(n: u32, d: u32) -> u32 {

@9names
Copy link
Member

9names commented Feb 12, 2022

Looks good to me, though I would feel more comfortable with it if you could turn all ROM intrinsics off with a feature so we could toggle it quickly to test for perf / correctness issues.

Add a standardized macro for intrinsics export and connect the simple
ROM functions to intrinsics.
Hook up the ROM functions to floating point intrinsics.
@Sizurka
Copy link
Contributor Author

Sizurka commented Feb 12, 2022

Updated with the intrinsic generation controlled by #[cfg(target_arch = "arm")] and added a crate feature disable-intrinsics that disables all intrinsic exports for testing purposes.

Copy link
Member

@9names 9names left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Will raise an issue to discuss whether we should switch the subtractive feature (disable_intrinsics) with an additive one (enable_intrinsics) that is in default_features.
It is generally preferred to have 'features' that add features rather than remove features, but I feel that isn't sufficient cause to not merge this.

@9names 9names merged commit f46c23d into rp-rs:main Feb 13, 2022
@Sizurka Sizurka deleted the rom-intrinsics branch February 13, 2022 15:13
@9names 9names mentioned this pull request Jul 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide optimised math library
3 participants