
Weak precision performance for floating point values with non-negligible negative exponents #109

Open
steven-buytaert opened this issue Jan 31, 2022 · 8 comments
Assignees
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@steven-buytaert

steven-buytaert commented Jan 31, 2022

Hi, I noticed that the autotest floating-point testing is biased towards large values. I tried the following example against the libc printf.

#include <stdio.h>

// #define PRINTF_ALIAS_STANDARD_FUNCTION_NAMES 0 in the following file.
#include "printf_config.h"
#include "../src/printf/printf.h"

void putchar_(char character) { (void) character; }

int main(void) {

  double fp = 1.380651569e-23; // Boltzmann constant

  char b[128];

  sprintf (b, "%.16e", fp); printf("libc %s\n", b);
  sprintf_(b, "%.16e", fp); printf("this %s\n", b);

  return 0;
}

Output:
libc 1.3806515690000000e-23
this 1.3806515648376314e-23

As you can see, there is a serious precision issue due to arithmetic underflow. Depending on the application, this may or may not be problematic. In any case, it is something you should also check in the test framework.

If floating-point precision is or becomes an issue, I have a standalone implementation of ecvt, based on Grisu, that uses (very wide) integer multiplication/division to format floating-point values with the proper precision; as such, the formatting code does no floating-point arithmetic itself. The total text size for ARM Thumb code compiled with -Os is 1979 bytes, depending only on strcpy from libc. A good reference on this style of floating-point formatting can be found here. This code produces the exact same bit representation of the Boltzmann constant as GNU libc. It can also format 32-bit IEEE 754 values directly, without going through the 64-bit IEEE 754 format. You can find this ecvt and the required wide-integer arithmetic code here.

Kind regards,

Steven

@jrahlf

jrahlf commented Jan 31, 2022

Interesting find. There is also ryu. I think neither yours nor ryu uses any double arithmetic, so it is also interesting from the perspective of PR #84. However, it undermines the single-header approach.

@eyalroz
Owner

eyalroz commented Feb 1, 2022

@steven-buytaert : Thank you for noticing this.

I will look into it later this week, and try to figure out whether this is a minor bug/oversight somewhere or something more fundamental. Can you list the exact CMake option settings you were using?

In any case, it's something you should check also in the test framework.

Indeed.

@steven-buytaert
Author

@eyalroz

I did a normal build, as follows:

$ mkdir build;  cd build/
$ cmake -DBUILD_TESTS=ON -DBUILD_STATIC_LIBRARY=ON ..
$ # Changed PRINTF_ALIAS_STANDARD_FUNCTION_NAMES to 0 in the printf_config.h header file.
$ make

And built the small sample program mentioned above.

I think the issue is fundamental, i.e. underflow occurring during the double-precision operations. You can increase the precision requirement in the printf_config.h file, but the fundamental issue remains.

What tipped me off was that the autotest program only ever produced positive exponents, never negative ones.

I was interested in this embedded printf, just because I wrote one myself.

Kind regards,

Steven

@eyalroz
Owner

eyalroz commented Feb 1, 2022

I think the issue is fundamental, i.e. underflow happening during the double precision operations.

I meant fundamental in the sense of "can't easily be worked around" (we already have some workarounds for precision issues in there: playing with the exponent, keeping track of remainders, etc.)

@eyalroz
Owner

eyalroz commented Feb 2, 2022

Ok, here are the results of the investigation...

The problem (not a bug) occurs in this piece of code:

  double_with_bit_access conv = get_bit_access(abs_number);
  // based on the algorithm by David Gay (https://www.ampl.com/netlib/fp/dtoa.c)
  int exp2 = get_exp2(conv);
  // drop the exponent, so conv.F comes into the range [1,2)
  conv.U = (conv.U & (( (double_uint_t)(1) << DOUBLE_STORED_MANTISSA_BITS) - 1U)) | ((double_uint_t) DOUBLE_BASE_EXPONENT << DOUBLE_STORED_MANTISSA_BITS);
  // now approximate log10 from the log2 integer part and an expansion of ln around 1.5
  exp10 = (int)(0.1760912590558 + exp2 * 0.301029995663981 + (conv.F - 1.5) * 0.289529654602168);
  // now we want to compute 10^exp10 but we want to be sure it won't overflow
  exp2 = (int)(exp10 * 3.321928094887362 + 0.5);
  const double z  = exp10 * 2.302585092994046 - exp2 * 0.6931471805599453;
  const double z2 = z * z;
  conv.U = ((double_uint_t)(exp2) + DOUBLE_BASE_EXPONENT) << DOUBLE_STORED_MANTISSA_BITS;
  // compute exp(z) using continued fractions, see https://en.wikipedia.org/wiki/Exponential_function#Continued_fractions_for_ex
  conv.F *= 1 + 2 * z / (2 - z + (z2 / (6 + (z2 / (10 + z2 / 14)))));
  // correct for rounding errors
  if (abs_number < conv.F) {
    exp10--;
    conv.F /= 10;
  }
}
abs_exp10_covered_by_powers_table = PRINTF_ABS(exp10) < PRINTF_MAX_PRECOMPUTED_POWER_OF_10;
normalization.raw_factor = abs_exp10_covered_by_powers_table ? powers_of_10[PRINTF_ABS(exp10)] : conv.F;

And the two numbers 1.380651569e-23 and 1.380651569e+23 take different paths here: both have an exponent beyond what's covered by the precomputed table (PRINTF_MAX_PRECOMPUTED_POWER_OF_10), but for one of them we calculate exp10 to be -22 and apply conv.F /= 10, while for the other we get exp10 as 23, so conv.F needs no change. The division applied to conv.F hurts accuracy: it changes the result enough that we get different low digits.

Now we need to think about what to do about this.

Some ideas:

  • Increase the powers_of_10 table <- doesn't scale: one can always just use 1e-50 or something even smaller.
  • Get exp10 more "correct" for negative values (rounding, perhaps? But would that even make sense? I wonder.)
  • Not correct conv.F, but rather "remember" the extra factor for later.

Thoughts?

@eyalroz
Owner

eyalroz commented Feb 2, 2022

More info: The linear approximation of exp10 for the e-23 gives ~ -22.83 , and for the e+23 gives ~ 23.15. So... maybe it's the (int) conversion that's the problem. Hmmm...

@eyalroz eyalroz self-assigned this Feb 3, 2022
@eyalroz eyalroz added bug Something isn't working help wanted Extra attention is needed question Further information is requested labels Feb 3, 2022
eyalroz added a commit that referenced this issue Feb 4, 2022
…nt. CAVEAT: This exposes an issue with printing values having exponent -308 (the minimum exponent for doubles); and we see some divergence from perfect correctness with higher-precision values.
@eyalroz
Owner

eyalroz commented Feb 4, 2022

@steven-buytaert : With the recent commit, your program now yields:

libc 1.3806515690000000e-23
this 1.3806515689994298e-23

i.e. about 12 digits of precision. (Although it exposes an unrelated issue.)

What do you think?

@steven-buytaert
Author

steven-buytaert commented Feb 4, 2022

@eyalroz That is already much closer to the real Boltzmann constant, so yes, it is an improvement. With some more rounding you could turn that 8999 into 9000, but the point is that you don't know where to start and stop rounding. The fundamental issue is that the floating-point operations you use are only an approximation. Operations that yield bits beyond the mantissa bits lose precision.

I therefore decided not to use floating-point operations in my ecvt routine:

  • It would drag in an unknown amount of floating-point support code.
  • Floating-point operations are always an approximation. That is why I went for integer multiplication and division for the formatting: you can make these as wide as the precision you want to achieve.

As I said, it depends on what you want to achieve. If you really need 17 digits of precision, floating-point operations are not good enough, unless you could use a higher-precision floating-point type such as the IEEE 754 128- or 256-bit variants. Of course, to format those in turn, you would still need to fall back on even wider integer arithmetic, as floating point would again be too limiting. In short, to format a floating-point type with full precision, you always need a type wider than the one you are formatting.

Don't worry too much. Floating-point formatting is still a major minefield and a black art, as you have undoubtedly found out. :-)

BTW, when I look at that code of David Gay, he also seems to use a much wider floating-point type when USE_BF96 is defined, exactly to achieve the precision needed to format 64-bit doubles. When this is enabled, a large table of powers of ten is included, inflating the code size. In my code I have a very small table, but I perform several exact scale-up or scale-down operations to bring the mantissa and the exponent into the proper range.

Regards,

Steven

@eyalroz eyalroz changed the title Precision for very small floating point Weak precision performance for floating point values with non-negligible negative exponents Feb 5, 2022
@eyalroz eyalroz removed the bug Something isn't working label Feb 20, 2022
kuba2k2 pushed a commit to libretiny-eu/library-printf that referenced this issue Feb 25, 2023
* Added a couple of test cases for exposing the behavior of the `#` modifier (alternative mode) together with `ll` (long long modifier), and specifically exposing the example format string mentioned in bug eyalroz#114.
* Our fix for eyalroz#109 was too eager to keep `FLAG_HASH` - it dropped it only when a precision wasn't specified. We're now going part of the way back - dropping `FLAG_HASH` even when a precision was specified - except for octal.
* The `long long` version of ntoa now behaves just like the `long` version.