BUG: Rounding floats which are already equal to an integer changes the value #20514
Comments
Presumably due to:
#include <stdio.h>
#include <math.h>
int main(int argc, char **argv) {
    double v = 3061040371728385.0;
    double rv = round(v * 100) / 100;
    printf("round 2dp of %f is %f\n", v, rv);
}
Output:
Yes, it can happen. Should be the same issue as gh-13699. Not sure if there is a good way to get around some of these paths at least. Python seems to do slightly better, so maybe there is something to steal, if it doesn't make things much slower.
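To make the comparison with Python concrete, here is a minimal sketch using only built-ins. It repeats the scale-round-divide steps of the C snippet above for the value from the report and compares them with Python's two-argument `round`. The variable names are illustrative and not part of NumPy.

```python
v = 3061040371728385.0

# Same steps as the C snippet: scale by 100, round to an integer, divide back.
naive = round(v * 100) / 100

# Python's built-in round with an explicit number of decimals.
builtin = round(v, 2)

print("scale-round-divide changes v by:", naive - v)
print("built-in round changes v by:", builtin - v)
```

On IEEE 754 doubles the first difference is 0.5 and the second is 0.0, which matches the report and the remark that Python does better here.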
I'd love to have a try at this issue, would that be fine with you guys? @seberg @matthew-brett @FudgeMunkey I think it might be possible to mitigate this issue for some of the cases, but not entirely remove it. IMO the issue originates from inherent limitations of the floating point representation.
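The floating point limitation mentioned here can be made visible with `math.ulp` (Python 3.9+), which returns the gap between a double and the next larger one. This is only a sketch of why the two values from the report are sensitive. It does not inspect NumPy's actual code path.

```python
import math

v = 3061040371728385.0
w = 6.2768919806476296e16

ulp_v = math.ulp(v)             # spacing of doubles around v
ulp_scaled = math.ulp(v * 100)  # spacing around the scaled intermediate
ulp_w = math.ulp(w)             # spacing around the second reported value

print(ulp_v)       # 0.5
print(ulp_scaled)  # 64.0
print(ulp_w)       # 8.0
```

The scaled intermediate `v * 100` can only be stored as a multiple of 64, so it is already inexact before any rounding happens, and dividing by 100 can land on a neighbouring double. The reported errors of 0.5 and 8.0 are exactly one such gap for the respective inputs.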
Nobody will be working on this specifically. An improvement proposal would be great!
Improvement Proposal
@matthew-brett, @seberg as you correctly identified, the error @FudgeMunkey stumbled upon can be attributed to limitations of the floating point representation (especially at larger scales). To get around it, we can treat the integral and the decimal portion of the input array separately and combine them after rounding the decimal portion. This leverages the fact that rounding works just fine at smaller scales. This will ensure that:
However, it is important to keep in mind that:
Here are my suggestions:
Demonstration of Proposal 1
import numpy as np

def updated_round(input_array, input_decimals, carefully_handle_large_numbers=False):
    if carefully_handle_large_numbers:
        # separate the integral, decimal portion of the input
        integral_portion = np.floor(input_array)
        decimal_portion = input_array - integral_portion
        # combine integral, decimal portion after rounding the decimal portion
        # (relies on the EXISTING version of np.round - which should be fine for small numbers < 1)
        output_array = integral_portion + np.round(decimal_portion, input_decimals)
    else:
        # resort to normal rounding if user hasn't requested careful
        # handling of large numbers
        output_array = np.round(input_array, input_decimals)
    return output_array

# EXAMPLE 1
A = np.array([3061040371728385.0])
old_out = updated_round(A, 2, carefully_handle_large_numbers=False)
new_out = updated_round(A, 2, carefully_handle_large_numbers=True)
old_diff = old_out - A  # 0.5
new_diff = new_out - A  # 0.0
print("EXAMPLE 1: ")
print(old_diff, new_diff)
print()

# EXAMPLE 2
A = np.array([6.2768919806476296e16])
old_out = updated_round(A, 1, carefully_handle_large_numbers=False)
new_out = updated_round(A, 1, carefully_handle_large_numbers=True)
old_diff = old_out - A  # 8.0
new_diff = new_out - A  # 0.0
print("EXAMPLE 2: ")
print(old_diff, new_diff)
print()

# EXAMPLE 3 (just to demonstrate that rounding still works fine on small numbers)
A = np.array([6.6789])
old_out = updated_round(A, 3, carefully_handle_large_numbers=False)
new_out = updated_round(A, 3, carefully_handle_large_numbers=True)
old_diff = old_out - A  # 0.0001
new_diff = new_out - A  # 0.0001
print("EXAMPLE 3: ")
print(old_diff, new_diff)
print()

The above code has the following output:
Performance comparison of existing
@seberg @matthew-brett @Zac-HD Hey guys, what do you think about my proposal?
In what proportion of the cases is there a difference in the output between the methods? I think this would be relevant if we add a
@mattip I don't think such a scenario is too common 'in the wild' (but so is the case for many other bugs). However, here are some examples where it might come into play:
But in any case, I think rounding shouldn't lead to massive errors. Even if someone uses this function with a large value, the code should at least try to minimize errors. If a change in code isn't warranted, there should at least be a warning in documentation for
A factor of two slower seems quite a lot, OTOH, I would not be surprised if we could have much faster rounding for certain cases (e.g. all cases without digits after the comma)?
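One way to read the suggestion about values without fractional digits is a fast path that returns integer-valued inputs unchanged. The scalar sketch below is hypothetical: `round_with_integral_fast_path` is an illustrative name, not a NumPy function, and the fallback branch is a plain scale-round-divide that is only assumed to resemble the existing behaviour.

```python
import math

def round_with_integral_fast_path(x: float, decimals: int) -> float:
    """Hypothetical scalar sketch: skip the arithmetic when x has no fraction."""
    # Infinities and NaN have nothing to round.
    if not math.isfinite(x):
        return x
    # Fast path: an integer-valued float is already rounded for decimals >= 0.
    if decimals >= 0 and x.is_integer():
        return x
    if decimals >= 0:
        scale = 10.0 ** decimals
        return round(x * scale) / scale
    # Negative decimals round to tens, hundreds, and so on.
    scale = 10.0 ** (-decimals)
    return round(x / scale) * scale

print(round_with_integral_fast_path(3061040371728385.0, 2))
print(round_with_integral_fast_path(6.2768919806476296e16, 1))
print(round_with_integral_fast_path(1.25, 1))
```

With the fast path both values from the report come back unchanged, while values with a fractional part still go through the slower branch.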
The problem here is how round to If I got it right, (but I still have to check the source), numpy uses a variation of the round to specified multiple formula namely or The problem is that both the division A related, much more serious, consequence of the way in which
In fact (remember that
Who is right? One could argue in favor of numpy (round to even), or python (2.77 is nearer to 2.765 than 2.76):
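A quick way to inspect the tie case is to print the exact binary value stored for the literal `2.765` with the standard `decimal` module. This is only a sketch for checking which side of the decimal tie the stored double falls on. It says nothing about what NumPy does internally.

```python
from decimal import Decimal

x = 2.765

exact = Decimal(x)      # exact value of the double nearest to 2.765
tie = Decimal("2.765")  # the decimal midpoint between 2.76 and 2.77

stored_above_tie = exact > tie

print("stored value:", exact)
print("stored above the decimal tie:", stored_above_tie)
print("built-in round(x, 2):", round(x, 2))
```

Python's built-in `round` works from the stored value, so its result follows whichever side of the tie that value lies on. A round-half-to-even rule applied to the decimal literal treats the input as an exact tie instead, which is where the two answers can diverge.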
There are other problems with
Describe the issue:
Running np.round adds 0.5 to some integer-valued floats. For example, 3061040371728385.0 becomes 3061040371728385.5. This error occurs if decimals values of 2, 5, 8 are used.
This becomes more significant with larger numbers such as 6.2768919806476296e16, which is rounded up by 8.0. This error occurs if decimals values of 1, 4, 7, 10 are used. Note, this isn't a problem if the integer 62768919806476296 is used instead, regardless of the decimals value.
*I checked decimals values in the range [0, 15].
*This bug was found using Hypothesis.
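The scan over decimals described above can be approximated without NumPy. The sketch below emulates a scale-round-divide rounding in pure Python and lists the decimals in [0, 15] for which the first reported value changes. `scale_round_divide` is a hypothetical stand-in, and whether it matches np.round for every decimals value is an assumption.

```python
def scale_round_divide(x: float, decimals: int) -> float:
    """Hypothetical emulation: multiply by 10**decimals, round, divide back."""
    scale = 10.0 ** decimals
    return round(x * scale) / scale

v = 3061040371728385.0

# Collect every decimals value in [0, 15] for which the emulation alters v.
changed = [d for d in range(0, 16) if scale_round_divide(v, d) != v]
print("decimals that change v:", changed)
```

Because `v` is already an integer, any entry in `changed` marks a case where rounding moved a value that needed no rounding at all.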
Reproduce the code example:
Error message:
NumPy/Python version information: