New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DO NOT MERGE MCOL-4516 Check for floating point rounding error in regr_** #1776
Conversation
Some floating point rounding error is inevitable during regr_** function execution. In some cases, when a intermediate result should be 0, it may become some microscopic negative number. This patch checks for this.
In the first fix, the algorithm was wrong for sxx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
David, please add "HOW TO REPEAT" instructions into MCOL and corresponding mtr tests into this patch demonstrating that the problem does not exists any more.
| @@ -154,7 +154,7 @@ mcsv1_UDAF::ReturnCode regr_intercept::evaluate(mcsv1Context* context, static_an | |||
| long double sumxy = data->sumxy; | |||
| long double numerator = sumy * sumx2 - sumx * sumxy; | |||
| long double var_pop = (N * sumx2) - (sumx * sumx); | |||
| if (var_pop != 0) | |||
| if (var_pop != 0 && var_pop != -0) | |||
| { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is meaningless I am afraid:
0 and -0 are integer (not DOUBLE!) literals and both are equal to INT(0), and then both are converted to DOUBLE for comparison.
So possibly you meant:
if (var_pop != 0e0 && var_pop != -0e0)
But this is not really needed because 0e0 and -0e0 are equal to each other although they are not binary equal.
This program proves:
#include <stdio.h>
#include <cstring>
int main()
{
double a= -0e0;
double b= +0e0;
printf("memcmp=%d eq=%d\n", memcmp(&a, &b, sizeof(a)), a == b);
return 0;
}
And the output is:
memcmp=128 eq=1
I guess the old version of this line was good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes it appears to not compare properly. I don't know if this is an artifact of other bits being different or perhaps a float number so small in abs value that it prints as -0. Whatever the case, we can use the fact that var_pop should never be < 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some research, it turns out var_pop cannot be negative, but covar_pop may be negative. The changes assume <0 should be 0 for var_pop and any negative values for covar_pop remain as calculated. This does allow for -0 to be printed in the answer for some functions.
utils/regr/regr_slope.cpp
Outdated
| @@ -154,7 +154,7 @@ mcsv1_UDAF::ReturnCode regr_slope::evaluate(mcsv1Context* context, static_any::a | |||
| long double sumxy = data->sumxy; | |||
| long double covar_pop = N * sumxy - sumx * sumy; | |||
| long double var_pop = N * sumx2 - sumx * sumx; | |||
| if (var_pop != 0) | |||
| if (var_pop != 0 && var_pop != -0) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
utils/regr/regrmysql.cpp
Outdated
| @@ -537,7 +537,7 @@ extern "C" | |||
| long double sumxy = data->sumxy; | |||
| long double covar_pop = N * sumxy - sumx * sumy; | |||
| long double var_pop = N * sumx2 - sumx * sumx; | |||
| if (var_pop != 0) | |||
| if (var_pop != 0 && var_pop != -0) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
utils/regr/regrmysql.cpp
Outdated
| @@ -657,7 +657,7 @@ extern "C" | |||
| long double sumxy = data->sumxy; | |||
| long double numerator = sumy * sumx2 - sumx * sumxy; | |||
| long double var_pop = (N * sumx2) - (sumx * sumx); | |||
| if (var_pop != 0) | |||
| if (var_pop != 0 && var_pop != -0) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
utils/regr/regrmysql.cpp
Outdated
| @@ -1494,6 +1502,8 @@ extern "C" | |||
| long double sumy = data->sumy; | |||
| long double sumxy = data->sumxy; | |||
| long double covar_samp = (sumxy - ((sumx * sumy) / N)) / (N - 1); | |||
| if (covar_samp < 0) // might be -0 | |||
| covar_samp = 0; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above block is repeated in so many places. It would be nice to make a function or a class and reuse it everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I thought about a sub-class and function, but decided I didn't want to spend the cpu cycles for the call. Could be done easily enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case covar_samp could have a legitimate negative value, so I removed the check
var_pop is never negative. covar_pop may be negative Adjusted the checks for -0 to this reality
Some floating point rounding error is inevitable during regr_** function execution. In some cases, when a intermediate result should be 0, it may become some microscopic negative number. This patch checks for this.