The constant std::numeric_limits<half>::digits10 in halfLimits.h currently evaluates to 2, but should actually evaluate to 3.
The macro HALF_DIG in half.h is also presumably wrong.
Rationale:
std::numeric_limits<half> should follow the std::numeric_limits<T> template semantics as defined in the C++ standard. For the digits10 member, the standard mandates that its values for float, double and long double shall be "equivalent" to the FLT_DIG, DBL_DIG and LDBL_DIG macros as defined by the C standard.
The C standard, in turn, specifies that the FLT_DIG, DBL_DIG and LDBL_DIG macros shall be equal to p-1 multiplied by the base-10 logarithm of b, rounded down, where p is the number of base-b digits in the significand, and b is the floating-point numerical base. (A different formula applies if b is a power of 10, but is irrelevant for half since it is base-2.)
Note that p includes the implicit MSbit of the significand (*). Also, the C standard clearly mandates that the macros FLT_MANT_DIG, DBL_MANT_DIG and LDBL_MANT_DIG shall be defined as p for the three standard floating-point types. The C++ standard in turn mandates that the std::numeric_limits<T> template's digits member shall be "equivalent" to these macros.
(* While the definition of p in the C standard may not be obvious, the examples should be clear enough, listing the corresponding values for IEEE single and double precision floating-point types as 24 and 53, respectively.)
So std::numeric_limits<half>::digits10 should be equal to (std::numeric_limits<half>::digits - 1) multiplied by the base-10 logarithm of std::numeric_limits<half>::radix, rounded down. With std::numeric_limits<half>::digits evaluating to 11 and std::numeric_limits<half>::radix evaluating to 2, the value is 10 * log10(2), approx. 3.01, which rounds down to 3.
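To make the arithmetic above easy to check, here is a minimal sketch of the formula in C++. The helper names `decimal_digits` and `decimal_digits_for` are hypothetical and are not part of half.h or halfLimits.h. The sketch assumes the radix is not a power of 10, as is the case for half.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (not part of half.h): evaluates the formula
// floor((p - 1) * log10(b)) quoted above, for a radix b that is not
// a power of 10. p counts the implicit leading bit of the significand.
inline int decimal_digits(int p, int b)
{
    return static_cast<int>(
        std::floor((p - 1) * std::log10(static_cast<double>(b))));
}

// Hypothetical convenience wrapper: reads p and b from
// std::numeric_limits<T> and applies the same formula.
template <typename T>
int decimal_digits_for()
{
    return decimal_digits(std::numeric_limits<T>::digits,
                          std::numeric_limits<T>::radix);
}
```

With p = 11 and b = 2, `decimal_digits(11, 2)` yields 3. On an IEEE platform, `decimal_digits_for<float>()` and `decimal_digits_for<double>()` should agree with the standard library's digits10 values for those types, which is the same consistency the issue asks for in the half specialization.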
Suggested solution:
Since the HALF_DIG macro in half.h is also presumably wrong, and std::numeric_limits<half>::digits10 is defined in terms of that macro, the issue should be fixed by changing the macro's definition from 2 to 3.
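The suggested change could be guarded by a compile-time check along these lines. This is only a sketch: HALF_DIG is the macro named in this issue, while HALF_MANT_DIG and HALF_RADIX are assumed here to hold 11 and 2 and are redefined locally for illustration. The integer ratio 30103/100000 is an approximation of log10(2), used because std::log10 is not usable in a constant expression in older C++ standards.

```cpp
// Local stand-ins for illustration only; the real definitions live in half.h.
#define HALF_MANT_DIG 11 // assumed: significand bits, including the implicit bit
#define HALF_RADIX    2  // assumed: floating-point base
#define HALF_DIG      3  // proposed value (currently 2)

// The check below only applies to base 2.
static_assert(HALF_RADIX == 2, "formula below assumes a binary radix");

// floor((p - 1) * log10(2)), with log10(2) approximated as 30103 / 100000.
// Integer division truncates, which matches floor for non-negative values.
static_assert(HALF_DIG == (HALF_MANT_DIG - 1) * 30103 / 100000,
              "HALF_DIG must equal floor((HALF_MANT_DIG - 1) * log10(2))");
```

With HALF_DIG left at 2, the second assertion would fail to compile, which is the mismatch described in this issue.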
…gits
Based on float / double math for base 10 digits, with 1 bit of rounding
error, the equation should be floor( ( mantissa_digits - 1 ) * log10(2) ),
which in the case of half becomes floor( 10 * log10(2) ) or 3
Signed-off-by: Kimball Thurston <kdt3rd@gmail.com>