Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Adjusted R2 #2624

Merged
merged 18 commits into from
Dec 31, 2020
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
29 changes: 28 additions & 1 deletion src/mlpack/core/cv/metrics/r2_score.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,10 @@ namespace cv {
* For example, a model having R2Score = 0.85, explains 85 \% variability of
* the response data around its mean.
*/
class R2Score

template<bool adjustedR2> class R2Score;

template<> class R2Score<false>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @shawnbrar, thanks for taking the time to add this template parameter! Actually, I think you don't need template specialization here. All you should need to do is declare the class as:

template<bool AdjustedR2>
class R2Score

and then in the implementation of Evaluate(), you can change the bottom to this:

    if (AdjustedR2)
    {
      // Handling undefined R2 Score when both denominator and numerator is 0.0.
      if (residualSumSquared == 0.0)
        return totalSumSquared ? 1.0 : DBL_MIN;
      // Returning adjusted R-squared.
      double rsq = 1 - (residualSumSquared / totalSumSquared);
      return (1 - ((1 - rsq) * ((data.n_cols - 1) / (data.n_cols - data.n_rows - 1))));
    }
    else
    {
      // Returning R-squared
      return 1 - residualSumSquared / totalSumSquared;
    }

and that should be all that's necessary. The nice thing about templates is that that code above will actually be compiled into two different functions at compile time, so the if (AdjustedR2) won't actually be run when the program is executed---only the correct branch will be run!

Would you mind refactoring it to try this? It should result in a significantly shorter diff. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @rcurtin , sure, even I was thinking of a way which would have been shorter but like I said I am not an experienced programmer in C++.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @shawnbrar, thanks for taking the time to add this template parameter! Actually, I think you don't need template specialization here. All you should need to do is declare the class as:

template<bool AdjustedR2>
class R2Score

and then in the implementation of Evaluate(), you can change the bottom to this:

    if (AdjustedR2)
    {
      // Handling undefined R2 Score when both denominator and numerator is 0.0.
      if (residualSumSquared == 0.0)
        return totalSumSquared ? 1.0 : DBL_MIN;
      // Returning adjusted R-squared.
      double rsq = 1 - (residualSumSquared / totalSumSquared);
      return (1 - ((1 - rsq) * ((data.n_cols - 1) / (data.n_cols - data.n_rows - 1))));
    }
    else
    {
      // Returning R-squared
      return 1 - residualSumSquared / totalSumSquared;
    }

and that should be all that's necessary. The nice thing about templates is that that code above will actually be compiled into two different functions at compile time, so the if (AdjustedR2) won't actually be run when the program is executed---only the correct branch will be run!

Would you mind refactoring it to try this? It should result in a significantly shorter diff.

Dear @rcurtin , I have removed the template specialization and made it the way you had asked for.

{
public:
/**
Expand All @@ -67,6 +70,30 @@ class R2Score
static const bool NeedsMinimization = false;
};

template<> class R2Score<true>
{
public:
/**
* Run prediction and calculate the Adjusted R squared error.
*
* @param model A regression model.
* @param data Column-major data containing test items.
* @param responses Ground truth (correct) target values for the test items,
* should be either a row vector or a column-major matrix.
* @return calculated Ajusted R2 Score.
*/
template<typename MLAlgorithm, typename DataType, typename ResponsesType>
static double Evaluate(MLAlgorithm& model,
const DataType& data,
const ResponsesType& responses);

/**
* Information for hyper-parameter tuning code. It indicates that we want
* to maximize the measurement.
*/
static const bool NeedsMinimization = false;
};

} // namespace cv
} // namespace mlpack

Expand Down
40 changes: 38 additions & 2 deletions src/mlpack/core/cv/metrics/r2_score_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ namespace mlpack {
namespace cv {

template<typename MLAlgorithm, typename DataType, typename ResponsesType>
double R2Score::Evaluate(MLAlgorithm& model,
double R2Score<false>::Evaluate(MLAlgorithm& model,
const DataType& data,
const ResponsesType& responses)
{
Expand Down Expand Up @@ -45,10 +45,46 @@ double R2Score::Evaluate(MLAlgorithm& model,
// Handling undefined R2 Score when both denominator and numerator is 0.0.
if (residualSumSquared == 0.0)
return totalSumSquared ? 1.0 : DBL_MIN;


// Returning R-squared
return 1 - residualSumSquared / totalSumSquared;
}

template<typename MLAlgorithm, typename DataType, typename ResponsesType>
double R2Score<true>::Evaluate(MLAlgorithm& model,
const DataType& data,
const ResponsesType& responses)
{
if (data.n_cols != responses.n_cols)
{
std::ostringstream oss;
oss << "R2Score::Evaluate(): number of points (" << data.n_cols << ") "
<< "does not match number of responses (" << responses.n_cols << ")!"
<< std::endl;
throw std::invalid_argument(oss.str());
}

ResponsesType predictedResponses;
// Taking Predicted Output from the model.
model.Predict(data, predictedResponses);
// Mean value of response.
double meanResponses = arma::mean(responses);

// Calculate the numerator i.e. residual sum of squares.
double residualSumSquared = arma::accu(arma::square(responses -
predictedResponses));

// Calculate the denominator i.e.total sum of squares.
double totalSumSquared = arma::accu(arma::square(responses - meanResponses));

// Handling undefined R2 Score when both denominator and numerator is 0.0.
if (residualSumSquared == 0.0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be residualSumSquared == 0.0 || totalSumSquared == 0.0? Because if totalSumSquared is 0 the output is undefined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dear @zoq Thanks for pointing it out. First of all I think, I have placed the // Handling undefined R2 Score... part inside the if (AdjustedR2) by mistake. It should be above and outside it.

Second, since I do not know what DBL_MIN means, I really don't know if residualSumSquared == 0.0 || totalSumSquared == 0.0 should be put in the if condition. If you could tell me the meaning of DBL_MIN, I might be able to help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DBL_MIN is the smallest positive normal double, it's an alias for std::numeric_limits<double>::min - https://en.cppreference.com/w/cpp/types/numeric_limits/min, hope that helps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so AdjustedR2 or R2 becomes undefined only if totalSumSquared is equal to zero. If residualSumSquared equals to zero then they are equal to 1. Hence what I would suggest is

if (totalSumSquared = 0)
return DBL_MIN;
else if (residualSumSquared = 0)
return 1;

I am checking totalSumSquared first just in case if totalSumSquared and residualSumSquared are both equal to zero, then, the answer should be still undefined.

I hope it makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is correctly handled on lines 46 and 47 now. 👍

return totalSumSquared ? 1.0 : DBL_MIN;
// Returning adjusted R-squared.
double rsq = 1 - (residualSumSquared / totalSumSquared);
return (1 - ((1 - rsq) * ((data.n_cols - 1) / (data.n_cols - data.n_rows - 1))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return (1 - ((1 - rsq) * ((data.n_cols - 1) / (data.n_cols - data.n_rows - 1))));
return (1 - ((1 - rsq) * ((data.n_cols - 1) /
(data.n_cols - data.n_rows - 1))));

This line was longer than 80 characters, so I wrapped it. 👍

}

} // namespace cv
} // namespace mlpack

Expand Down
24 changes: 23 additions & 1 deletion src/mlpack/tests/cv_test.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -187,10 +187,32 @@ TEST_CASE("R2ScoreTest", "[CVTest]")

double expectedR2 = 0.99999779;

REQUIRE(R2Score::Evaluate(lr, data, responses)
REQUIRE(R2Score<false>::Evaluate(lr, data, responses)
== Approx(expectedR2).epsilon(1e-7));
}

/**
* Test the Adjusted R squared metric
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Test the Adjusted R squared metric
* Test the Adjusted R squared metric.

Add stop to be consistent with the rest of the codebase.

*/
TEST_CASE("AdjR2ScoreTest", "[CVTest]")
{
// Making two variables that define the linear function is
// f(x1, x2) = x1 + x2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// f(x1, x2) = x1 + x2
// f(x1, x2) = x1 + x2.

Add stop at the end to be consistent with the rest of the codebase.

arma::mat X;
X << 1 << 2 << 3 << 4 << 5 << 6 << arma::endr
<< 2 << 3 << 4 << 5 << 6 << 7 << arma::endr;
arma::rowvec Y;
Y << 3 << 5 << 7 << 9 << 11 << 13;

LinearRegression lr(X, Y);

//Theoretically Adjusted R squared should be equal 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//Theoretically Adjusted R squared should be equal 1
// Theoretically Adjusted R squared should be equal 1.

Insert an extra space right after // and a stop at the end.

double expAdjR2 = 1;
REQUIRE(std::abs(R2Score<true>::Evaluate(lr, X, Y) - expAdjR2)
<= 1e-7);
}


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

No need for two blank lines---one will be fine. 👍

/**
* Test the mean squared error with matrix responses.
*/
Expand Down