Compile option to process floating point values in single-precision, internally #84

Closed
jrahlf opened this issue Jan 20, 2022 · 22 comments


jrahlf commented Jan 20, 2022

I have a project where I use single-precision floats, but no doubles.
The problem is that printf processes all floating-point values as doubles, even if the caller is content with single-precision formatting. On devices without a double-precision FPU, this pulls in __aeabi_dsub, __aeabi_dmul and the like, which are several kilobytes large.

My suggestion is to add a compile option so that floating point values are formatted with single precision.

The value would still be passed to printf as a double, and printf would then convert it to float.
Some of the required functionality seems to be there already, via the #if DBL_MANT_DIG == 24 check.

Owner

eyalroz commented Jan 20, 2022

I don't quite understand what you mean by:

> The value would still be passed as a double to printf, which would then be converted to float.

If doubles are still to be used, what does this suggestion save you?

Owner

eyalroz commented Jan 20, 2022

At any rate, if someone were to create a PR generalizing the internal floating-point-value handling to not necessarily use doubles, but some printf_floating_point_t, chosen via preprocessor defines, I would consider it. A PR would need to include appropriate tests of course.
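A minimal sketch of what such a preprocessor-selected type could look like. The name `PRINTF_USE_DOUBLE_INTERNALLY` is taken from the commit message later in this thread; its default value here, the `PRINTF_FP_MANT_DIG` macro and the `halve` function are assumptions made for illustration, and the library's actual layout may differ.

```c
#include <assert.h>
#include <float.h>

/* Assumption for this sketch: 1 selects double, 0 selects float. */
#ifndef PRINTF_USE_DOUBLE_INTERNALLY
#define PRINTF_USE_DOUBLE_INTERNALLY 0
#endif

#if PRINTF_USE_DOUBLE_INTERNALLY
typedef double printf_floating_point_t;
#define PRINTF_FP_MANT_DIG DBL_MANT_DIG   /* hypothetical helper macro */
#else
typedef float printf_floating_point_t;
#define PRINTF_FP_MANT_DIG FLT_MANT_DIG   /* hypothetical helper macro */
#endif

/* The internal type must never be wider than what varargs deliver. */
static_assert(sizeof(printf_floating_point_t) <= sizeof(double),
              "internal floating-point type wider than double");

/* Hypothetical internal routine: it is written against the configurable
 * type, so no double arithmetic is emitted when float is selected. */
static printf_floating_point_t halve(printf_floating_point_t value)
{
    return value / (printf_floating_point_t) 2;
}
```

With the float selection, literals and casts inside such routines also have to use the configurable type, otherwise the compiler silently promotes the expression to double again.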

Author

jrahlf commented Jan 22, 2022

> If doubles are still to be used, what does this suggestion save you?

Doubles would only be used for passing floating-point values to printf(), because the standard mandates this for varargs. Passing floats directly would be even better for microcontrollers (if float precision is enough)...
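The promotion rule mentioned here can be illustrated with a minimal sketch. `take_float_arg` is a hypothetical helper, not a function of this library: it shows that a `float` passed through `...` has to be read back as a `double`, after which a single narrowing conversion yields the `float` that the rest of the formatting code could work on.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: reads one floating-point variadic argument and
 * narrows it to float for all further processing. */
static float take_float_arg(int count, ...)
{
    assert(count == 1);

    va_list args;
    va_start(args, count);
    /* Default argument promotion turns a float argument into a double,
     * so va_arg(args, float) would be undefined behaviour. */
    double promoted = va_arg(args, double);
    va_end(args);

    /* The only double operation left: one double-to-float conversion. */
    return (float) promoted;
}
```

Calling `take_float_arg(1, 123.456f)` returns the original value unchanged, since widening a float to double and narrowing it again is exact. On FPU-less ARM targets the narrowing step still needs a conversion helper (`__aeabi_d2f` in the ARM EABI), but no double arithmetic.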

The screenshot shows the added functions for a CM0+ microcontroller; this will be similar for any ARM microcontroller without a double-precision FPU.

[Image: Screenshot from 2022-01-22 02-49-34]

  • If the processing inside printf() was done on float instead of double, the compiler would not have to add these functions (assuming the code does not use doubles somewhere else).
    • On devices without FPU, the compiler would then use the __aeabi_f counterparts, but these are
      a) smaller
      b) more likely to be already in use

I will try to come up with a PR.

Author

jrahlf commented Jan 22, 2022

Changing double to floating_point_t and adapting the existing DBL_MANT_DIG functionality was easy, but there is a problem: the code uses 64-bit integer math and converts between floating_point_t and int_fast64_t. That conversion also pulls in the double routines, UGH (see attached picture).

For testing I changed all int_fast64_t to int_fast32_t and it worked - __aeabi_d* no more. This of course breaks the code if the input is larger than 2^32-1.

Exponential formatting without int64<>float conversions should be possible; ryu-f2s seems to provide just that.
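To make the idea concrete, here is a deliberately naive sketch of decomposing a positive float into seven decimal digits and a decimal exponent using only float and 32-bit integer operations. It is not the Ryu algorithm and not this library's code: the names are hypothetical, and the repeated multiplications and divisions accumulate rounding error, so the last digit can be off.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Hypothetical result type: value is roughly digits / 10^6 * 10^exponent,
 * with digits in the range [1000000, 9999999]. */
struct decimal_float {
    int32_t digits;
    int     exponent;
};

static struct decimal_float to_decimal(float value)
{
    /* Only finite, positive inputs are handled in this sketch. */
    assert(value > 0.0f && value <= FLT_MAX);

    int exponent = 0;
    /* Scale into [1, 10) using float arithmetic only. */
    while (value >= 10.0f) { value /= 10.0f; exponent++; }
    while (value < 1.0f)   { value *= 10.0f; exponent--; }

    /* Seven significant digits always fit into 32 bits. */
    int32_t digits = (int32_t) (value * 1e6f + 0.5f);
    if (digits >= 10000000) {   /* rounding carried into an eighth digit */
        digits /= 10;
        exponent++;
    }
    return (struct decimal_float) { digits, exponent };
}
```

For `123.456e6f` this yields an exponent of 8 and digits close to 1234560. A production implementation would need correct rounding and handling of zero, negative values, infinities and NaN.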

[Image: Screenshot from 2022-01-22 16-12-12]

Owner

eyalroz commented Jan 22, 2022

> The code uses 64-bit integer math and converts between floating_point_t and int_fast64_t.

Naturally, you would need to adapt this code. I tried to make the existing code kind-of-generic, so you should not have to change too much - but some changes will be needed.

> This of course breaks the code if the input is larger than 2^32-1.

Perhaps you could split the uses of the int_fast64_t type: For actual integers, use it as-is; and for floating-point work, use int_fast32_t instead.
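A sketch of what such a split could look like, under stated assumptions: the type names, the `float_components` struct and `split_float` are hypothetical and only loosely modelled on the `double_components` mentioned in this thread. Integer conversions keep the 64-bit type, while the float decomposition uses a 32-bit type and refuses inputs that would not fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int_fast64_t printf_int_t;    /* for integer conversions, unchanged */
typedef int_fast32_t fp_component_t;  /* for floating-point work only */

struct float_components {
    fp_component_t integral;
    fp_component_t fractional;  /* scaled by 10^precision */
    bool           is_negative;
};

/* Splits value into integral and fractional parts without 64-bit math.
 * Returns false when the caller should fall back to exponential notation. */
static bool split_float(float value, unsigned precision,
                        struct float_components *out)
{
    static const float powers_of_10[] =
        { 1e0f, 1e1f, 1e2f, 1e3f, 1e4f, 1e5f, 1e6f };

    assert(out != 0);
    if (precision > 6) { return false; }

    out->is_negative = value < 0.0f;
    float abs_value = out->is_negative ? -value : value;
    /* Written as a negated comparison so NaN and infinity are rejected too. */
    if (!(abs_value < 1e9f)) { return false; }

    out->integral = (fp_component_t) abs_value;
    float remainder = abs_value - (float) out->integral;
    out->fractional =
        (fp_component_t) (remainder * powers_of_10[precision] + 0.5f);
    if ((float) out->fractional >= powers_of_10[precision]) {
        out->fractional = 0;    /* rounding carried into the integral part */
        out->integral++;
    }
    return true;
}
```

For example, `split_float(123.456f, 3, &c)` produces an integral part of 123 and a fractional part of 456, while `split_float(5e9f, 2, &c)` returns false instead of overflowing.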

Owner

eyalroz commented Jan 24, 2022

@jrahlf : Note I am about to land the rather significant PR #92 onto the develop branch. Would you like me to wait for a couple of days with that, for you to submit your own PR? Or should I just go ahead and let you do some rebasing?

Author

jrahlf commented Jan 24, 2022

Go ahead. Changing the handling of double_components without duplicating several pieces of code (and pulling in double processing only when necessary) requires a bit more work, so it will take some time.
With the current structure, 32-bit + float processing seems possible only with exponential formatting.

I measured the clock cycles on a CM0+ without FPU (measured with timer running at CPU frequency), all compiled with -O3:

| variant | `snprintf("%g", 123.456f);` (cycles) | `snprintf("%g", 123.456e6f);` (cycles) |
|---|---|---|
| printf from gcc nano specs | 11900 | 9900 |
| printf w/ double and int64 | 17800 | 20600 |
| printf w/ float and int64 | 14500 | 16900 |
| printf w/ float and int32 | 7100 | 9100 |

Interesting results, I will repeat them for a CM4 with FPU. I expect that the last variant should get significantly faster then.

Also interesting:

  • float to double takes 48 cpu cycles
  • double to float takes 74 cpu cycles

Owner

eyalroz commented Jan 25, 2022

Did you enable LTO, when compiling and when linking? I've not done so in the CMakeLists.txt for this reason, but that might change the various numbers.

Author

jrahlf commented Jan 25, 2022

Here are the cycle count values for a CM4 microcontroller with FPU, same methodology as above. Compiled without LTO because gcc then strips debug symbols. I would expect that LTO improves printf due to putchar_, but not snprintf.

| variant | `snprintf("%g", 123.456f);` (cycles) | `snprintf("%g", 123.456e6f);` (cycles) |
|---|---|---|
| printf from gcc nano specs | 7200 | 5600 |
| printf w/ double and int64 | 9400 | 9400 |
| printf w/ float and int64 | 3500 | 3500 |
| printf w/ float and int32 | 1500 | 1500 |

  • float to double: 25 cycles
  • double to float: 29 cycles

One can see that you can get a lot faster than the highly optimized stdlib because the latter must work on doubles. It is also interesting that the cycle count is the same for both numbers, unlike on the CM0.
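The size of the improvement follows directly from the measured cycle counts. As a small worked example (the `speedup` helper is introduced here purely for illustration):

```c
#include <assert.h>

/* Ratio of two cycle counts: how many times faster the variant is
 * than the baseline. */
static float speedup(unsigned baseline_cycles, unsigned variant_cycles)
{
    assert(variant_cycles != 0);
    return (float) baseline_cycles / (float) variant_cycles;
}
```

With the CM4 numbers for `123.456f`, `speedup(9400, 1500)` is about 6.3 against the double/int64 variant and `speedup(7200, 1500)` is 4.8 against newlib-nano. With the CM0+ numbers, `speedup(17800, 7100)` is about 2.5 and `speedup(11900, 7100)` about 1.7.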

=> Overall one could save 6kB of code space and get a very decent speed improvement as well, if we can add a float option (with 32-bit int only) (for microcontrollers without double FPU).

Edit:
I just noticed that the __aeabi_d functions on the CM4 with FPU are quite a bit smaller:
[Image: Screenshot from 2022-01-25 23-44-37]

However, this library is generally more targeted towards the smallest controllers, i.e. without FPU, so the point is still valid.

Owner

eyalroz commented Jan 25, 2022

These are interesting stats! But actually, what are the "gcc nano specs"? And - what is a CM4?

At any rate, I've merged my significant change (and the next version number will be 6.0.0), so you're welcome to resume the PR work. Your argument is compelling :-)

Author

jrahlf commented Jan 25, 2022

"gcc nano specs" refers to linking the newlib-nano standard library, which is the de facto used standard library for small microcontrollers (you have to explicitly enable floating point support for printf).

It might be that newlib (non-nano) has a faster printf than newlib-nano.

CM0 and CM4 refer to Cortex-M0 and Cortex-M4 microcontrollers, respectively. In this case I used an STM32G071 Nucleo board and an STM32F407 Discovery board.

Owner

eyalroz commented Jan 29, 2022

Any news? We have quite a bunch of changes on the develop branch, so that a new release should occur soon. Have you made progress on the single-precision floating point processing?

Owner

eyalroz commented Jan 31, 2022

@jrahlf : ping.

Author

jrahlf commented Jan 31, 2022

I have not forgotten about this, but have made only a little further progress. If you don't want to keep this open, we can close it and re-open when I come up with a PR.

The macro currently used in the tests is not suitable for a float variant. I would not want to duplicate all floating-point tests; rather, the test macro should cut off the comparison where the precision of float ends.

Author

jrahlf commented Feb 6, 2022

If I understand it correctly, the current version does not even need the 64-bit processing, as long as PRINTF_MAX_INTEGRAL_DIGITS_FOR_DECIMAL <=9 holds true. Adding a conditional typedef there is easy. The testsuite passed for me with int_fast32_t (testsuite cmake sets PRINTF_MAX_INTEGRAL_DIGITS_FOR_DECIMAL to 9) and fp set to double.

#if PRINTF_MAX_INTEGRAL_DIGITS_FOR_DECIMAL <= 9
typedef int_fast32_t fp_component;
#else
typedef int_fast64_t fp_component;
#endif

I pushed the first version to jrahlf@b7e1f52 . Tests for float processing are not added yet. Basically all double_x get replaced by floating_point_x or fp_t.

I just wasted 30min trying to rebase this sh#t onto develop. Now it won't compile, I am done for today.

Owner

eyalroz commented Feb 6, 2022

@jrahlf : Sorry about the difficulty in rebasing. Like I'd said earlier, there have been some significant changes - not functionality-wise, but code-arrangement wise - over the past few weeks which make it difficult. I'll see what you've done and should probably be able to adapt it.

However - I believe you're misinterpreting what PRINTF_MAX_INTEGRAL_DIGITS_FOR_DECIMAL means. For example, if you print with %e, that definition is not used at all - and you would not expect the accuracy to drop because someone set PRINTF_MAX_INTEGRAL_DIGITS_FOR_DECIMAL. So the choice of fp_component cannot be determined by it.

Owner

eyalroz commented Feb 6, 2022

... Yes, so, it's what I expected you would write. Indeed, there's not that much to change beyond the typedef replacing double. Hope to get around to this within the next couple of days.

Author

jrahlf commented Feb 7, 2022

> However - I believe you're misinterpreting what PRINTF_MAX_INTEGRAL_DIGITS_FOR_DECIMAL means. For example, if you print with %e, that definition is not used at all - and you would not expect the accuracy to drop because someone set PRINTF_MAX_INTEGRAL_DIGITS_FOR_DECIMAL. So the choice of fp_component cannot be determined by it.

The integral value of the former double_components can exceed the int32 range only if the floating-point value being formatted exceeds that range and fixed-point formatting is selected. However, fixed-point formatting is abandoned if the value is greater than 1e9 (the default threshold). I assume that is why the default threshold is 1e9? I mean, 10 characters should not be a problem for any system :D.

But I see now, the fractional part is a bit more complicated.
For fixed-point formatting, 32 bits suffice if the number of decimal digits is less than 9, i.e. up to %.8f.
For exponential formatting, 32 bits should suffice if the precision is less than 9 decimal places, which holds for float but not for double.
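The digit limits in this argument can be checked with a short sketch. `max_full_decimal_digits` is a hypothetical helper written for this illustration: it returns the largest digit count d for which every d-digit decimal number fits below a given integer maximum.

```c
#include <assert.h>
#include <stdint.h>

/* Largest d such that 10^d - 1 (d nines) does not exceed max_value. */
static int max_full_decimal_digits(int64_t max_value)
{
    assert(max_value >= 0);

    int     digits = 0;
    int64_t limit  = 9;   /* largest number with digits + 1 decimal digits */
    while (limit <= max_value) {
        digits++;
        if (limit > (INT64_MAX - 9) / 10) {
            break;        /* the next limit would overflow int64_t */
        }
        limit = limit * 10 + 9;
    }
    return digits;
}
```

`max_full_decimal_digits(INT32_MAX)` is 9 and `max_full_decimal_digits(INT64_MAX)` is 18. An IEEE 754 single-precision float carries 6 to 9 significant decimal digits (`FLT_DIG` and `FLT_DECIMAL_DIG`), which is within reach of a 32-bit component, whereas a double carries 15 to 17 and needs 64 bits.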

eyalroz added a commit that referenced this issue Feb 7, 2022
* Can now choose between using `double` and `float` internally, for floating-point work, using the `PRINTF_USE_DOUBLE_INTERNALLY` definition and corresponding CMake option.
* Adjusted test suite to support the different choice of floating-point type:
	* Tests relevant both to `float` and `double`, but with different precisions, are adjusted with an `#if #else #endif`
	* Tests and test cases relevant only for `double` precision are not compiled at all when `float` is used.
* Lots of new explicit conversions :-(
Owner

eyalroz commented Feb 7, 2022

Please have a look at #116 and possibly try out the branch.

eyalroz added a commit that referenced this issue Feb 7, 2022
eyalroz closed this as completed Feb 21, 2022
eyalroz reopened this Feb 21, 2022
Owner

eyalroz commented Jun 3, 2022

So, I've neglected to resolve this... Were you OK with the branch with the fix integrated? If you were, I'll merge it into the develop branch.

Owner

eyalroz commented Jun 30, 2022

@jrahlf : ping. Will try to merge this soon if you don't respond.

eyalroz changed the title from "Compile option to process floating point values as float" to "Compile option to process floating point values in single-precision, internally" Jun 30, 2022
eyalroz closed this as completed Jul 26, 2022
eyalroz added 3 commits that referenced this issue Jan 21, 2023
eyalroz added 4 commits that referenced this issue Jan 24, 2023
Owner

eyalroz commented Jan 24, 2023

Well... the merging might have fixed the internal float handling, but - some test cases are broken now, so we need to fix that.
