Compile option to process floating point values in single-precision, internally #84

jrahlf · 2022-01-20T22:30:30Z

I have a project where I use single precision floats, but no doubles.
The problem is that printf processes all floating-point values as doubles, even if the caller is content with single-precision formatting. On devices without a double-precision FPU, this pulls in __aeabi_dsub, __aeabi_dmul and the like, which are several kilobytes large.

My suggestion is to add a compile option so that floating point values are formatted with single precision.

The value would still be passed as a double to printf, which would then be converted to float.
Some of the required functionality seems to be already there, via the #if DBL_MANT_DIG == 24 define.

eyalroz · 2022-01-20T23:10:53Z

I don't quite understand what you mean by:

> The value would still be passed as a double to printf, which would then be converted to float.

If doubles are still to be used, what does this suggestion save you?

eyalroz · 2022-01-20T23:14:11Z

At any rate, if someone were to create a PR generalizing the internal floating-point-value handling to not necessarily use doubles, but some printf_floating_point_t, chosen via preprocessor defines, I would consider it. A PR would need to include appropriate tests of course.
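A minimal sketch of what such a preprocessor-selected type could look like. The macro name `EXAMPLE_USE_DOUBLE_INTERNALLY` and the helper `to_internal` are hypothetical and only illustrate the idea; they are not taken from the library.

```c
/* Hypothetical build switch: 1 selects double, 0 selects float. */
#ifndef EXAMPLE_USE_DOUBLE_INTERNALLY
#define EXAMPLE_USE_DOUBLE_INTERNALLY 0
#endif

#if EXAMPLE_USE_DOUBLE_INTERNALLY
typedef double printf_floating_point_t;
#else
typedef float printf_floating_point_t;
#endif

/* Hypothetical helper: narrow the promoted vararg once, at the boundary,
   so that all later arithmetic happens in the selected type. */
printf_floating_point_t to_internal(double promoted)
{
    return (printf_floating_point_t) promoted;
}
```

With the switch left at 0, every internal computation on `printf_floating_point_t` would be single-precision, and only the one conversion at the boundary touches a double.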

jrahlf · 2022-01-22T01:59:35Z

> If doubles are still to be used, what does this suggestion save you?

doubles would only be used for passing floating point values to printf(), because the standard mandates this for varargs. Passing floats directly would be even better for microcontrollers (if float precision is enough)...
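The promotion rule mentioned above can be illustrated with a small variadic function. This is only a sketch; `sum_as_float` is a hypothetical name, not part of the library. A `float` passed through `...` arrives as a `double`, so `va_arg` has to request `double`, and the value can be narrowed right away.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: sums `count` floating-point varargs in single precision. */
float sum_as_float(size_t count, ...)
{
    va_list args;
    float total = 0.0f;

    va_start(args, count);
    for (size_t i = 0; i < count; i++) {
        /* Default argument promotion turns float into double, so double is
           the type to fetch; the cast narrows it back immediately. */
        float value = (float) va_arg(args, double);
        total += value;   /* single-precision addition */
    }
    va_end(args);
    return total;
}
```

For example, `sum_as_float(2, 1.5f, 2.25f)` yields `3.75f`. On a soft-float ARM target the narrowing cast would still need a double-to-float conversion routine (`__aeabi_d2f` in the ARM run-time ABI), but none of the double arithmetic helpers.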

The screenshot shows the added functions for a CM0+ microcontroller, this will be similar for any ARM microcontroller without dp-FPU.

If the processing inside printf() was done on float instead of double, the compiler would not have to add these functions (assuming the code does not use doubles somewhere else).
- On devices without FPU, the compiler would then use the __aeabi_f counterparts, but these are
  a) smaller
  b) more likely to be already in use

I will try to come up with a PR.

jrahlf · 2022-01-22T15:46:23Z

Changing double to floating_point_t and adapting the existing DBL_MANT_DIG functionality was easy, but there is a problem: the code uses 64-bit integer math and converts between floating_point_t and int_fast64_t. The latter also pulls in the double routines, UGH (see attached picture).

For testing I changed all int_fast64_t to int_fast32_t and it worked - __aeabi_d* no more. This of course breaks the code if the input is larger than 2^32-1.
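The range problem can be made explicit with a guard before the conversion. The following is a sketch under the assumption of a signed 32-bit integer type (whose largest value is 2^31 - 1); `fits_in_int32` is a hypothetical helper, not a function from the library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guard: true when converting `value` to int32_t is well defined. */
bool fits_in_int32(float value)
{
    /* 2147483648.0f is exactly 2^31 and is representable as a float, whereas
       INT32_MAX (2^31 - 1) is not. The comparisons are false for NaN. */
    return value >= -2147483648.0f && value < 2147483648.0f;
}
```

A caller would check this before casting and fall back to another path (for instance exponential notation) when it returns false, since converting an out-of-range floating-point value to an integer type is undefined behavior in C.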

Exponential formatting without int64<>float conversions should be possible, ryu-f2s seems to provide just that.

eyalroz · 2022-01-22T16:01:05Z

> The code uses 64-bit integer math and converts between floating_point_t and int_fast64_t.

Naturally, you would need to adapt this code. I tried to make the existing code kind-of-generic, so you should not have to change too much - but some changes will be needed.

> This of course breaks the code if the input is larger than 2^32-1.

Perhaps you could split the uses of the int_fast64_t type: For actual integers, use it as-is; and for floating-point work, use int_fast32_t instead.

eyalroz · 2022-01-24T11:31:00Z

@jrahlf : Note I am about to land the rather significant PR #92 onto the develop branch. Would you like me to wait for a couple of days with that, for you to submit your own PR? Or should I just go ahead and let you do some rebasing?

jrahlf · 2022-01-24T23:58:32Z

Go ahead, changing the handling of the double_components without duplicating several code pieces (and only pulling in double processing when necessary) requires a bit more work, so it will take some time.
With the current structure 32-bit+float processing seems only possible with exponential formatting.

I measured the clock cycles on a CM0+ without FPU (measured with timer running at CPU frequency), all compiled with -O3:

| variant | snprintf("%g", 123.456f); (cycles) | snprintf("%g", 123.456e6f); (cycles) |
|---|---|---|
| printf from gcc nano specs | 11900 | 9900 |
| printf w/ double and int64 | 17800 | 20600 |
| printf w/ float and int64 | 14500 | 16900 |
| printf w/ float and int32 | 7100 | 9100 |

Interesting results, I will repeat them for a CM4 with FPU. I expect that the last variant should get significantly faster then.

Also interesting:

float to double takes 48 cpu cycles
double to float takes 74 cpu cycles

eyalroz · 2022-01-25T09:47:01Z

Did you enable LTO, when compiling and when linking? I've not done so in the CMakeLists.txt for this reason, but that might change the various numbers.

jrahlf · 2022-01-25T22:35:29Z

Here are the cycle count values for a CM4 microcontroller with FPU, same methodology as above. Compiled without LTO because gcc then strips debug symbols. I would expect that LTO improves printf due to putchar_, but not snprintf.

| variant | snprintf("%g", 123.456f); (cycles) | snprintf("%g", 123.456e6f); (cycles) |
|---|---|---|
| printf from gcc nano specs | 7200 | 5600 |
| printf w/ double and int64 | 9400 | 9400 |
| printf w/ float and int64 | 3500 | 3500 |
| printf w/ float and int32 | 1500 | 1500 |

float to double: 25 cycles
double to float: 29 cycles

One can see that you can get a lot faster than the highly optimized stdlib, because the latter must work on doubles. It is also interesting that the cycle count is the same for both numbers, unlike on the CM0.

=> Overall one could save 6kB of code space and get a very decent speed improvement as well, if we can add a float option (with 32-bit int only) (for microcontrollers without double FPU).

Edit:
I just noticed that the __aeabi_d functions on the CM4 with FPU are quite a bit smaller:

However, this library is generally more targeted towards the smallest controllers, i.e. without FPU, so the point is still valid.

eyalroz · 2022-01-25T22:41:30Z

These are interesting stats! But actually, what are the "gcc nano specs"? And - what is a CM4?

At any rate, I've merged my significant change (and the next version number will be 6.0.0), so you're welcome to resume the PR work. Your argument is compelling :-)

jrahlf · 2022-01-25T22:54:03Z

"gcc nano specs" refers to linking the newlib-nano standard library, which is the de facto used standard library for small microcontrollers (you have to explicitly enable floating point support for printf).

It might be that newlib (not nano) has a faster printf than newlib-nano.

CM0 and CM4 refer to Cortex-M0 and Cortex-M4 microcontrollers, respectively. In this case I used an STM32G071 Nucleo board and an STM32F407 Discovery board.

eyalroz · 2022-01-29T10:14:37Z

Any news? We have quite a bunch of changes on the develop branch, so that a new release should occur soon. Have you made progress on the single-precision floating point processing?

eyalroz · 2022-01-31T11:01:48Z

@jrahlf : ping.

jrahlf · 2022-01-31T23:26:31Z

I have not forgotten about this, but have made only a little further progress. If you don't want to keep this open, we can close it and re-open when I come up with a PR.

The current macro used in the tests is not suitable to work with a float variant. I would not want to duplicate all floating point tests, the test macro should rather cut off when the precision of float ends.
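One way such a cut-off could work is to compare the expected and actual strings only up to a given number of significant digits. The sketch below is an illustration, not the test suite's macro; `equal_up_to_significant_digits` is a hypothetical helper, and it is meant for fixed notation only, since it treats exponent digits like any other digit.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test helper: both strings must have the same length and the
   same non-digit characters; digits are compared only for the first
   `max_sig` significant positions and ignored afterwards. */
bool equal_up_to_significant_digits(const char *expected, const char *actual,
                                    size_t max_sig)
{
    size_t seen = 0;       /* significant digit positions visited so far */
    bool leading = true;   /* still inside the run of leading zeros */

    for (size_t i = 0; ; i++) {
        char e = expected[i];
        char a = actual[i];
        if (e == '\0' || a == '\0') {
            return e == a;               /* equal only if both end here */
        }
        if (isdigit((unsigned char) e) && isdigit((unsigned char) a)) {
            if (leading && e == '0' && a == '0') {
                continue;                /* leading zeros are not significant */
            }
            leading = false;
            if (seen < max_sig && e != a) {
                return false;            /* mismatch within the cut-off */
            }
            seen++;
        } else if (e != a) {
            return false;                /* sign, '.', etc. must match */
        }
    }
}
```

With `max_sig` set to `FLT_DIG` (6 on common platforms), a `float` build that prints `123.456001` where the `double` build prints `123.456000` would still pass the same test case.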

jrahlf · 2022-02-06T15:49:26Z

If I understand it correctly, the current version does not even need the 64-bit processing, as long as PRINTF_MAX_INTEGRAL_DIGITS_FOR_DECIMAL <=9 holds true. Adding a conditional typedef there is easy. The testsuite passed for me with int_fast32_t (testsuite cmake sets PRINTF_MAX_INTEGRAL_DIGITS_FOR_DECIMAL to 9) and fp set to double.

#if PRINTF_MAX_INTEGRAL_DIGITS_FOR_DECIMAL <= 9
typedef int_fast32_t fp_component;
#else
typedef int_fast64_t fp_component;
#endif

I pushed the first version to jrahlf@b7e1f52 . Tests for float processing are not added yet. Basically all double_x get replaced by floating_point_x or fp_t.

I just wasted 30min trying to rebase this sh#t onto develop. Now it won't compile, I am done for today.

eyalroz · 2022-02-06T21:05:48Z

@jrahlf : Sorry about the difficulty in rebasing. Like I'd said earlier, there have been some significant changes - not functionality-wise, but code-arrangement wise - over the past few weeks which make it difficult. I'll see what you've done and should probably be able to adapt it.

However - I believe you're misinterpreting what PRINTF_MAX_INTEGRAL_DIGITS_FOR_DECIMAL means. For example, if you print with %e, that definition is not used at all - and you would not expect the accuracy to drop because someone set PRINTF_MAX_INTEGRAL_DIGITS_FOR_DECIMAL. So the choice of fp_component cannot be determined by it.

eyalroz · 2022-02-06T21:36:12Z

... Yes, so, it's what I expected you would write. Indeed, there's not that much to change beyond the typedef replacing double. Hope to get around to this within the next couple of days.

jrahlf · 2022-02-07T00:17:54Z

> However - I believe you're misinterpreting what PRINTF_MAX_INTEGRAL_DIGITS_FOR_DECIMAL means. For example, if you print with %e, that definition is not used at all - and you would not expect the accuracy to drop because someone set PRINTF_MAX_INTEGRAL_DIGITS_FOR_DECIMAL. So the choice of fp_component cannot be determined by it.

The integral value of the former double_components can only exceed the int32 range if the floating-point value to be formatted exceeds that range and fixed-point formatting is selected. However, fixed-point formatting is aborted if the value is greater than 1e9 (the default threshold). I assume that is why the default threshold is 1e9? I mean, 10 characters should not be a problem for any system :D.

But I see now, the fractional part is a bit more complicated.
For fixed-point formatting, 32 bits suffice if the number of decimal digits is less than 9, i.e. up to %.8f.
For exponential formatting, 32 bits should suffice if the precision is less than 9 decimal places, which holds for float but not for double.
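The digit limits can be checked with a little arithmetic: an integer type can hold every n-digit decimal number only if 10^n - 1 does not exceed its maximum. The sketch below computes the largest such n; `max_full_decimal_digits` is a hypothetical helper written for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: largest n such that every n-digit decimal number
   (that is, every value up to 10^n - 1) is <= max_value. */
unsigned max_full_decimal_digits(int64_t max_value)
{
    unsigned digits = 0;
    int64_t limit = 9;   /* largest value with (digits + 1) decimal digits */

    while (limit <= max_value) {
        digits++;
        if (limit > (INT64_MAX - 9) / 10) {
            break;       /* computing the next limit would overflow int64_t */
        }
        limit = limit * 10 + 9;
    }
    return digits;
}
```

For `INT32_MAX` (2147483647) this gives 9, because 999999999 fits while 9999999999 does not; for `INT64_MAX` it gives 18. The bound of fewer than 9 digits stated above sits just inside the 32-bit limit, whereas printing a double at full precision can require up to 17 significant digits, which is beyond what a 32-bit component can hold.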

* Can now choose between using `double` and `float` internally, for floating-point work, using the `PRINTF_USE_DOUBLE_INTERNALLY` definition and corresponding CMake option.
* Adjusted test suite to support the different choice of floating-point type:
  * Tests relevant both to `float` and `double`, but with different precisions, are adjusted with an `#if #else #endif`
  * Tests and test cases relevant only for `double` precision are not compiled at all when `float` is used.
* Lots of new explicit conversions :-(

eyalroz · 2022-02-07T20:46:20Z

Please have a look at #116 and possibly try out the branch.

eyalroz · 2022-06-03T08:47:35Z

So, I've neglected to resolve this... were you ok with the branch with the fix integrated? If you were, I'll merge it into the develop branch.

eyalroz · 2022-06-30T21:16:13Z

@jrahlf : ping. Will try to merge this soon if you don't respond.

eyalroz · 2023-01-24T22:30:53Z

Well... the merging might have fixed the internal float handling, but - some test cases are broken now, so we need to fix that.

jrahlf mentioned this issue Jan 23, 2022

Missing tests for floating point with integral > 2^31 -1 #93

Closed

jrahlf mentioned this issue Jan 31, 2022

Weak precision performance for floating point values with non-negligible negative exponents #109

Open

eyalroz mentioned this issue Feb 7, 2022

Fixes #84: Choice of float/double #116

Merged

eyalroz closed this as completed Feb 21, 2022

eyalroz reopened this Feb 21, 2022

eyalroz changed the title ~~Compile option to process floating point values as float~~ Compile option to process floating point values in single-precision, internally Jun 30, 2022

eyalroz closed this as completed Jul 26, 2022
