Better support for ARM and other architectures #11

jk-jeon · 2021-01-08T19:51:08Z

Thanks to Tim Kientzle (@tbkka), it turned out that the current code is not working correctly with ARM processors. I believe this is due to errors in fallback implementations of intrinsics. Also the current #define switches are not able to correctly detect availabilities of intrinsics for ARM processors, resulting in suboptimal code generation.

See: swiftlang/swift#35299.

The text was updated successfully, but these errors were encountered:

jk-jeon · 2021-01-08T20:04:37Z

Just fixed an error that I found.
@tbkka, Could you test it with your test suite?

tbkka · 2021-01-08T20:21:14Z

That addresses the cases where too few digits were being generated. I'm still seeing unexpected trailing zeros, though.

jk-jeon · 2021-01-08T21:12:44Z

Could you give me some specific examples so that I can test on my machine?

tbkka · 2021-01-08T21:33:43Z

Here are a few binary32 values that give me unexpected trailing zeros on ARM32 but give the expected result on x86_64:

  Actual: 2.0E-43  Expected: 2E-43  Input: 0x1.1ep-142
  Actual: 1.0097420E-28  Expected: 1.009742E-28  Input: 0x1p-93
  Actual: 6.710890E7  Expected: 6.71089E7  Input: 0x1.000008p+26

jk-jeon · 2021-01-08T22:44:49Z

Okay, I think I fixed it now.

By the way, do you know of a way to more correctly deciding availability of intrinsics like __builtin_ctz that also works for ARM?
Currently I'm just doing something like

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
...
#endif

or

#if (defined(__GNUC__) || defined(__clang__)) && defined(__SIZEOF_INT128__) && defined(__x86_64__)
...
#endif

jk-jeon · 2021-04-26T22:13:35Z

@expnkx Did make that change in 4032d67, thanks.

I still think we can further improve the coverage by taking care of ARM32, so I'll leave this open.

malachib · 2024-02-15T03:57:52Z

Would 8-bit / AVR compatibility make any sense for this algorithm?

jk-jeon · 2024-02-15T18:34:20Z

@malachib Admittedly that's something that has never been in my consideration so far. But, I'd say, why not?

First of all, the code as-is is supposed to be portable as long as things like std::uint64_t are supported by the toolchain, either with direct hardware support or software emulation. Do toolchains you have in mind support fixed-sized int's up to 64-bit? How about std::uint_least64_t and friends?

However, the code currently just assumes that float is IEEE-754 binary32 and double is IEEE-754 binary64. If that's not the case and if you try to use the mismatched one, the compilation will fail. (Actually, there is a possibility that it compiles and silently does nonsensical things in runtime, but I think that's unlikely.)

I did a bit of googling and it seems that floating-point support in 8-bit AVR toolchains is sort of irregular, is that right? Like, some toolchains use binary32 for both float and double, and some others use some variations of IEEE-754 instead, etc..

I've been keeping in mind that the library eventually needs to support some exotic float/double and even other user-provided float-like types, but I first need to know what exactly are the target formats that need to be supported. Can you give me some information about this?

Also, will the resulting code really be appropriate for specific platforms in your mind regarding performance/binary size is another question that I have no idea how to answer. But I'd want to see how it goes.

Thanks for the interest!

malachib · 2024-02-15T20:28:48Z

@jk-jeon a very thoughtful response, thank you!

I typically in fact use ESP32 Espressif chips, but for auxiliary testing I use Arduino AVR which last I checked used a very old GCC (last I checked it was 4.x or 5.x, but apparently they are up to 7.x https://github.com/arduino/toolchain-avr ). I doubt the format has changed between those releases - I always presumed they used IEEE-754, Presumed being the key word.

I did notice you had some specialization possibilities for other formats, kudos to that architecture. AVRs definitely come with serious size constraints, so even a 512-byte ROM lookup buffer isn't a slam dunk for those little guys. But might be worth it.

I wanted to ask the question because the algorithm perhaps isn't even viable without native 32-bit/64-bit integer support at the MCU level. Basically asking more questions than giving answers... lol.

jk-jeon · 2024-02-15T21:50:45Z

@malachib So I checked the AVR LibC user manual, and it seems:

All the 64-bit integer types are supported, unless the user specifies the compiler option -mint8 which turns int into 8-bit, which violates the C/C++ standards and seems to be more or less not recommended.
Both float and double are IEEE-754 binary32. (The manual only says they are both 32-bits and doesn't explicitly says that they conform to IEEE-754 binary32, but it seems this is kind of implied here and there.)

Given these, I think probably the code as-is may just work there, both for float and double, given the caveats:

I had no idea about this Harvard vs von Neumann shenanigans until now. Ideally static data probably needs to be on the flash memory along with the program source code so we need to decorate their definitions with some macros. Is that a correct understanding?
As you said the size of the lookup table currently is probably too big. It's 624 bytes for binary32, but I think we can compress it down further using a similar mechanism I do for compressing the table for binary64. I can't know what will be the resulting size, but I guess it would be around 100 bytes. For binary64, I think we could probably compress it down further (it's currently 586 bytes or something like that when compressed) , but I'm not sure if that's really worth doing in general, given that this specific AVR toolchain doesn't seem to care about binary64 at all, and it kinda sounds ridiculous to even try to do something with binary64 on a less powerful hardware where 600~ish bytes data are already too much. But I have no hard opinion.
I have no idea how large the stack usage is. It also will be a bit larger if table compression is used. I probably need to think about ways to reduce the stack usage, and a mechanism to request that through policy parameters.
Most importantly, I have no hardware to test some of the things mentioned above. I definitely need some help in this regard.

EDIT: Ah, wait. No, there is a big "if" here: I need several standard C++ headers... I obviously don't use some "advanced" things like <thread> or <iostream> but still...

So there seems to be several projects that tries to port freestanding C++ headers into 8-bit AVR chips, but I'm not sure if you're okay with relying on such a thing.

malachib · 2024-02-16T03:23:46Z

@jk-jeon right on. I have lots of thoughts on this

All the 64-bit integer types are supported, unless the user specifies the compiler option -mint8 which turns int into 8-bit, which violates the C/C++ standards and seems to be more or less not recommended.

Anyone who turns on that option and expects standards-conforming libraries to function OK are summarily dismissed

Both float and double are IEEE-754 binary32. (The manual only says they are both 32-bits and doesn't explicitly says that they conform to IEEE-754 binary32, but it seems this is kind of implied here and there.)

Not surprising. Long double, too https://www.avrfreaks.net/s/topic/a5C3l000000UbfZEAS/t158593

I had no idea about this Harvard vs von Neumann shenanigans until now. Ideally static data probably needs to be on the flash memory along with the program source code so we need to decorate their definitions with some macros. Is that a correct understanding?

Yes. PROGMEM is likely our tool of choice here - really only needed for the const variables. Functions are automatically put into flash/rom usually

As you said the size of the lookup table currently is probably too big. I do for compressing the table [...] would be around 100 bytes. [...] ridiculous to even try to do something with binary64 on a less powerful hardware where 600~ish bytes data are already too much.

100 bytes would be great. Definitely let's not shoot for binary64. I mean, we (you?) would have to spin up a whole custom float64 type, right? If anyone truly needs it, I motion they have to ask.

Your stack stuff I can only rely on your good judgement. AVRs definitely don't have much RAM. But, so many things are byte-aligned and IIRC pointers are 16 bit. So that's gotta help.

Most importantly, I have no hardware to test some of the things mentioned above. I definitely need some help in this regard.
So there seems to be several projects that tries to port freestanding C++ headers into 8-bit AVR chips, but I'm not sure if you're okay with relying on such a thing.

I got you covered in this area.

I am the author of one of those projects https://github.com/malachi-iot/estdlib and even started some preliminary work in anticipation of porting your code in.

I use AVRs and ESP32s for my J1939/CAN projects (my own float->char converter is truly gruesome). I'm eager to chuck in your code to test it. Though I can tell you I will definitely need that 100 byte optimization ... fitting J1939 on an AVR with 32k of ROM is no easy feat!

Perhaps we should start a github discussion?

jk-jeon · 2024-02-16T06:08:00Z

Anyone who turns on that option and expects standards-conforming libraries to function OK are summarily dismissed

I agree.

Not surprising. Long double, too https://www.avrfreaks.net/s/topic/a5C3l000000UbfZEAS/t158593

From the link:

HOWEVER if you like to run at the bleeding edge of technology then note that in the last month GCC v10 was released and one of the changes in that is that avr-gcc has now got options for switching double to be sizeof()=8.

Hmm.... okay but if I really start working on this I will consider binary64 much later.

Yes. PROGMEM is likely our tool of choice here - really only needed for the const variables. Functions are automatically put into flash/rom usually

Yeah, I think I need some mechanism to detect the toolchain and define an appropriate macro accordingly.

100 bytes would be great.

It requires some work. I think I will probably work on it in the near future but please don't expect it to be done too soon.

I mean, we (you?) would have to spin up a whole custom float64 type, right? If anyone truly needs it, I motion they have to ask.

So according to the link you provided, double can be binary64, so that's not strictly the case. Also, as long as it is IEEE-754 binary64-encoded, the exact spell of the type being double or not is not a very important detail. But anyway it seems reasonable to first not care too much about binary64.

I got you covered in this area.

Neat. But if it's not supposed to be a conforming (freestanding or not) standard library implementation, I need to provide some way to configure the library to replace those std functions I'm currently relying on. But I don't want to uglify the code by replacing std::uint32_t by JKJ_DRAGONBOX_UINT32_T and alike.

An idea that immediately pops up in my mind is to detect a user-provided macro JKJ_DRAGONBOX_STD_REPLACEMENT_NAMESPACE, and if that is defined then don't include any std headers. (The user is supposed to include all necessary headers directly then.) And then have something like

namespace stdr = JKJ_DRAGONBOX_STD_REPLACEMENT_NAMESPACE;

inside the jkj::dragonbox::detail namespace and replace all std:: with stdr::.

Downsides are (1) all needed stuffs needs to be in the same namespace, and they should have the exact same (unqualified) names as their std counterparts, (2) including necessary headers might be annoying, and it now turns the list of std things this library internally uses into a part of the library's interface, (3) not sure what will happen to the "blessed functions" like std::memcpy or std::bit_cast, maybe like losing constexpr-ness or introduce UB?

Any ideas?

Perhaps we should start a github discussion?

Please feel free to make one!

By the way, just to make sure, you're aware that Dragonbox cannot be used as a compatible replacement of printf("%f") (as the former does shortest-roundtripping formatting and the latter does fixed-precision one), right? If you ever want to have a proper replacement of printf then that's a whole another can of worms...

malachib · 2024-02-16T08:00:24Z

#57 Discussion started!

jk-jeon added a commit that referenced this issue Jan 8, 2021

Fix errors in countr_zero (see #11)

aa9904b

jk-jeon added a commit that referenced this issue Jan 8, 2021

Fix an error in countr_zero (#11)

dfa2428

alexey-milovidov mentioned this issue Jan 17, 2021

AArch64: floating point numbers are formatted with excessive zeros. ClickHouse/ClickHouse#19217

Closed

jk-jeon mentioned this issue Jan 30, 2021

[Question] Question Thread jk-jeon/fp#5

Closed

jk-jeon mentioned this issue Mar 28, 2021

C++11? #8

Open

Alexhuszagh mentioned this issue Sep 5, 2021

Add Workflows to Ensure Portability? #20

Closed

striezel mentioned this issue Jun 15, 2022

update actions/checkout in GitHub Actions to v3 #30

Merged

jk-jeon changed the title ~~Dealing with ARM~~ Better support for ARM and other architectures Feb 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better support for ARM and other architectures #11

Better support for ARM and other architectures #11

jk-jeon commented Jan 8, 2021

jk-jeon commented Jan 8, 2021

tbkka commented Jan 8, 2021

jk-jeon commented Jan 8, 2021

tbkka commented Jan 8, 2021

jk-jeon commented Jan 8, 2021

jk-jeon commented Apr 26, 2021

malachib commented Feb 15, 2024

jk-jeon commented Feb 15, 2024 •

edited

Loading

malachib commented Feb 15, 2024

jk-jeon commented Feb 15, 2024 •

edited

Loading

malachib commented Feb 16, 2024 •

edited

Loading

jk-jeon commented Feb 16, 2024

malachib commented Feb 16, 2024

Better support for ARM and other architectures #11

Better support for ARM and other architectures #11

Comments

jk-jeon commented Jan 8, 2021

jk-jeon commented Jan 8, 2021

tbkka commented Jan 8, 2021

jk-jeon commented Jan 8, 2021

tbkka commented Jan 8, 2021

jk-jeon commented Jan 8, 2021

jk-jeon commented Apr 26, 2021

malachib commented Feb 15, 2024

jk-jeon commented Feb 15, 2024 • edited Loading

malachib commented Feb 15, 2024

jk-jeon commented Feb 15, 2024 • edited Loading

malachib commented Feb 16, 2024 • edited Loading

jk-jeon commented Feb 16, 2024

malachib commented Feb 16, 2024

jk-jeon commented Feb 15, 2024 •

edited

Loading

jk-jeon commented Feb 15, 2024 •

edited

Loading

malachib commented Feb 16, 2024 •

edited

Loading