Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DO NOT MERGE MCOL-4516 Check for floating point rounding error in regr_** #1776

Closed

Conversation

davidhall586
Copy link
Contributor

Some floating point rounding error is inevitable during regr_** function execution. In some cases, when a intermediate result should be 0, it may become some microscopic negative number. This patch checks for this.

Some floating point rounding error is inevitable during regr_** function execution. In some cases, when a intermediate result should be 0, it may become some microscopic negative number. This patch checks for this.
In the first fix, the algorithm was wrong for sxx
Copy link
Contributor

@abarkov abarkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

David, please add "HOW TO REPEAT" instructions into MCOL and corresponding mtr tests into this patch demonstrating that the problem does not exists any more.

@@ -154,7 +154,7 @@ mcsv1_UDAF::ReturnCode regr_intercept::evaluate(mcsv1Context* context, static_an
long double sumxy = data->sumxy;
long double numerator = sumy * sumx2 - sumx * sumxy;
long double var_pop = (N * sumx2) - (sumx * sumx);
if (var_pop != 0)
if (var_pop != 0 && var_pop != -0)
{
Copy link
Contributor

@abarkov abarkov Mar 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is meaningless I am afraid:
0 and -0 are integer (not DOUBLE!) literals and both are equal to INT(0), and then both are converted to DOUBLE for comparison.

So possibly you meant:

if (var_pop != 0e0 && var_pop != -0e0)

But this is not really needed because 0e0 and -0e0 are equal to each other although they are not binary equal.
This program proves:

#include <stdio.h>
#include <cstring>

int main()
{
  double a= -0e0;
  double b= +0e0;
  printf("memcmp=%d eq=%d\n", memcmp(&a, &b, sizeof(a)), a == b);
  return 0;
}

And the output is:

memcmp=128 eq=1

I guess the old version of this line was good enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes it appears to not compare properly. I don't know if this is an artifact of other bits being different or perhaps a float number so small in abs value that it prints as -0. Whatever the case, we can use the fact that var_pop should never be < 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some research, it turns out var_pop cannot be negative, but covar_pop may be negative. The changes assume <0 should be 0 for var_pop and any negative values for covar_pop remain as calculated. This does allow for -0 to be printed in the answer for some functions.

@@ -154,7 +154,7 @@ mcsv1_UDAF::ReturnCode regr_slope::evaluate(mcsv1Context* context, static_any::a
long double sumxy = data->sumxy;
long double covar_pop = N * sumxy - sumx * sumy;
long double var_pop = N * sumx2 - sumx * sumx;
if (var_pop != 0)
if (var_pop != 0 && var_pop != -0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -537,7 +537,7 @@ extern "C"
long double sumxy = data->sumxy;
long double covar_pop = N * sumxy - sumx * sumy;
long double var_pop = N * sumx2 - sumx * sumx;
if (var_pop != 0)
if (var_pop != 0 && var_pop != -0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -657,7 +657,7 @@ extern "C"
long double sumxy = data->sumxy;
long double numerator = sumy * sumx2 - sumx * sumxy;
long double var_pop = (N * sumx2) - (sumx * sumx);
if (var_pop != 0)
if (var_pop != 0 && var_pop != -0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -1494,6 +1502,8 @@ extern "C"
long double sumy = data->sumy;
long double sumxy = data->sumxy;
long double covar_samp = (sumxy - ((sumx * sumy) / N)) / (N - 1);
if (covar_samp < 0) // might be -0
covar_samp = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above block is repeated in so many places. It would be nice to make a function or a class and reuse it everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I thought about a sub-class and function, but decided I didn't want to spend the cpu cycles for the call. Could be done easily enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case covar_samp could have a legitimate negative value, so I removed the check

@davidhall586 davidhall586 changed the title MCOL-4516 Check for floating point rounding error in regr_** DO NOT MERGE MCOL-4516 Check for floating point rounding error in regr_** Mar 9, 2021
@drrtuy drrtuy closed this Mar 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants