New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Adjusted R2 #2624
Added Adjusted R2 #2624
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @shawnbrar,
Some style fix suggestions as well as some personal preferences to make code look cleaner.
Thanks for working on this.
Suggested change by kartikdutt18 Co-authored-by: kartikdutt18 <39593019+kartikdutt18@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @shawnbrar,
The build is currently failing. To fix that could you please add the adjR2
argument to function declaration in r2_score_impl.hpp
in line 21. That should fix the build.
Thanks.
Added adjR2 argumnet
Thank you. Pretty dumb mistake.😁 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIce, would be great to have a test for the added functionality.
* @return calculated R2 Score. | ||
*/ | ||
template<typename MLAlgorithm, typename DataType, typename ResponsesType> | ||
static double Evaluate(MLAlgorithm& model, | ||
const DataType& data, | ||
const ResponsesType& responses); | ||
const ResponsesType& responses, | ||
const bool adjR2 = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can be more explicit in the naming, what about adjustedRSquared
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had actually thought of this and I came to the conclusion that it was just a long argument name. So I shortened it. But if you want rename it to adjustedRSquared
, please do tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything wrong with long(ish) argument names, or, at least, I don't think adjustedRSquared
is too long. adj
could also mean adjoint
for someone unfamiliar. :)
Co-authored-by: Marcus Edel <marcus.edel@fu-berlin.de>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, This should fix the build. I'll take another look when this is done. Thanks again.
src/mlpack/tests/cv_test.cpp
Outdated
X << 1 << 2 << 3 << 4 << 5 << 6 << arma::endr | ||
<< 2 << 3 << 4 << 5 << 6 << 7 << arma::endr; | ||
arma::rowvec Y; | ||
y << 3 << 5 << 7 << 9 << 11 << 13; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
y << 3 << 5 << 7 << 9 << 11 << 13; | |
Y << 3 << 5 << 7 << 9 << 11 << 13; |
src/mlpack/tests/cv_test.cpp
Outdated
|
||
//Theoretically Adjusted R squared should be equal 1 | ||
double expAdjR2 = 1; | ||
REQUIRE(std::abs(R2Score::Evaluate(lr, X, y) - expAdjR2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
REQUIRE(std::abs(R2Score::Evaluate(lr, X, y) - expAdjR2) | |
REQUIRE(std::abs(R2Score::Evaluate(lr, X, Y) - expAdjR2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @shawnbrar, thanks for taking the time to work on this! I mostly only have one comment about the use of the adjusted R2 score with the CV/HPT infrastructure. 👍
* @return calculated R2 Score. | ||
*/ | ||
template<typename MLAlgorithm, typename DataType, typename ResponsesType> | ||
static double Evaluate(MLAlgorithm& model, | ||
const DataType& data, | ||
const ResponsesType& responses); | ||
const ResponsesType& responses, | ||
const bool adjR2 = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything wrong with long(ish) argument names, or, at least, I don't think adjustedRSquared
is too long. adj
could also mean adjoint
for someone unfamiliar. :)
Hey @shawnbrar, I'd be happy to make this a part of the next release if you want to handle the couple comments. Let me know what you think. 👍 (There's also no rush, it can happen later; I'm just trying to figure out if we should add this to the mlpack 3.4.2 milestone. :)) |
Hello @rcurtin, I am really sorry for not replying a little early. This is because I have just moved to France for my higher studies. Also, I don't have access to a decent computer in the university library. |
@shawnbrar no worries, hopefully the move went well! I'll see if I have a chance at some point in the future (but it may not be that soon). In this case I think we need to use the templated approach if we want to be able to use adjusted R2 from the CV/HPT system. 👍 |
Dear @rcurtin , Now I have access to a good system and I will be able to complete the changes required. Just wanted to be sure that I only have to add the template argument and its functionality? |
Hey @shawnbrar, great! And yeah, I think that all we should need here is to transform the |
Dear @rcurtin , I have added the boolean template parameter. However, I don't how to differentiate the |
|
||
template<bool adjustedR2> class R2Score; | ||
|
||
template<> class R2Score<false> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @shawnbrar, thanks for taking the time to add this template parameter! Actually, I think you don't need template specialization here. All you should need to do is declare the class as:
template<bool AdjustedR2>
class R2Score
and then in the implementation of Evaluate()
, you can change the bottom to this:
if (AdjustedR2)
{
// Handling undefined R2 Score when both denominator and numerator is 0.0.
if (residualSumSquared == 0.0)
return totalSumSquared ? 1.0 : DBL_MIN;
// Returning adjusted R-squared.
double rsq = 1 - (residualSumSquared / totalSumSquared);
return (1 - ((1 - rsq) * ((data.n_cols - 1) / (data.n_cols - data.n_rows - 1))));
}
else
{
// Returning R-squared
return 1 - residualSumSquared / totalSumSquared;
}
and that should be all that's necessary. The nice thing about templates is that that code above will actually be compiled into two different functions at compile time, so the if (AdjustedR2)
won't actually be run when the program is executed---only the correct branch will be run!
Would you mind refactoring it to try this? It should result in a significantly shorter diff. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear @rcurtin , sure, even I was thinking of a way which would have been shorter but like I said I am not an experienced programmer in C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @shawnbrar, thanks for taking the time to add this template parameter! Actually, I think you don't need template specialization here. All you should need to do is declare the class as:
template<bool AdjustedR2> class R2Score
and then in the implementation of
Evaluate()
, you can change the bottom to this:if (AdjustedR2) { // Handling undefined R2 Score when both denominator and numerator is 0.0. if (residualSumSquared == 0.0) return totalSumSquared ? 1.0 : DBL_MIN; // Returning adjusted R-squared. double rsq = 1 - (residualSumSquared / totalSumSquared); return (1 - ((1 - rsq) * ((data.n_cols - 1) / (data.n_cols - data.n_rows - 1)))); } else { // Returning R-squared return 1 - residualSumSquared / totalSumSquared; }
and that should be all that's necessary. The nice thing about templates is that that code above will actually be compiled into two different functions at compile time, so the
if (AdjustedR2)
won't actually be run when the program is executed---only the correct branch will be run!Would you mind refactoring it to try this? It should result in a significantly shorter diff.
Dear @rcurtin , I have removed the template specialization and made it the way you had asked for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @shawnbrar, this looks great! Thanks for taking the time to update it. And don't worry, C++ is a complex language (especially once templates are involved) and takes a long time to learn.
Do you want to add a note to HISTORY.md
documenting this new functionality? I think it looks great otherwise, if you want to accept my suggestions (or make similar changes, up to you). 👍
class R2Score | ||
{ | ||
public: | ||
/** | ||
* Run prediction and calculate the R squared error. | ||
* Run prediction and calculate the R squared or Adjusted R sauared error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Run prediction and calculate the R squared or Adjusted R sauared error. | |
* Run prediction and calculate the R squared or Adjusted R squared error. |
Quick typo fix. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this was probably because I am still not very used to using an AZERTY keyboard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, I've never used an AZERTY keyboard. I think that would be really hard on my hands. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, nothing says welcome to France better than this. :)
double R2Score<AdjustedR2>::Evaluate(MLAlgorithm& model, | ||
const DataType& data, | ||
const ResponsesType& responses) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double R2Score<AdjustedR2>::Evaluate(MLAlgorithm& model, | |
const DataType& data, | |
const ResponsesType& responses) | |
double R2Score<AdjustedR2>::Evaluate(MLAlgorithm& model, | |
const DataType& data, | |
const ResponsesType& responses) |
This should make things line up correctly. 👍
return totalSumSquared ? 1.0 : DBL_MIN; | ||
// Returning adjusted R-squared. | ||
double rsq = 1 - (residualSumSquared / totalSumSquared); | ||
return (1 - ((1 - rsq) * ((data.n_cols - 1) / (data.n_cols - data.n_rows - 1)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return (1 - ((1 - rsq) * ((data.n_cols - 1) / (data.n_cols - data.n_rows - 1)))); | |
return (1 - ((1 - rsq) * ((data.n_cols - 1) / | |
(data.n_cols - data.n_rows - 1)))); |
This line was longer than 80 characters, so I wrapped it. 👍
src/mlpack/tests/cv_test.cpp
Outdated
<= 1e-7); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for two blank lines---one will be fine. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I resolved the merge in HISTORY.md
. Now, everything should hopefully build correctly. Thanks for adding this support! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me as well. Sorry I haven't been able to review this PR in a while. Thanks a lot for adding this feature.
Regards.
Hello there! Thanks for your contribution. I see that this is your first contribution to mlpack. If you'd like to add your name to the list of contributors in In addition, if you'd like some stickers to put on your laptop, I'd be happy to help get them in the mail for you. Just send an email with your physical mailing address to stickers@mlpack.org, and then one of the mlpack maintainers will put some stickers in an envelope for you. It may take a few weeks to get them, depending on your location. 👍 |
src/mlpack/tests/cv_test.cpp
Outdated
|
||
LinearRegression lr(X, Y); | ||
|
||
//Theoretically Adjusted R squared should be equal 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//Theoretically Adjusted R squared should be equal 1 | |
// Theoretically Adjusted R squared should be equal 1. |
Insert an extra space right after //
and a stop at the end.
src/mlpack/tests/cv_test.cpp
Outdated
TEST_CASE("AdjR2ScoreTest", "[CVTest]") | ||
{ | ||
// Making two variables that define the linear function is | ||
// f(x1, x2) = x1 + x2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// f(x1, x2) = x1 + x2 | |
// f(x1, x2) = x1 + x2. |
Add stop at the end to be consistent with the rest of the codebase.
src/mlpack/tests/cv_test.cpp
Outdated
== Approx(expectedR2).epsilon(1e-7)); | ||
} | ||
|
||
/** | ||
* Test the Adjusted R squared metric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Test the Adjusted R squared metric | |
* Test the Adjusted R squared metric. |
Add stop to be consistent with the rest of the codebase.
if (AdjustedR2) | ||
{ | ||
// Handling undefined R2 Score when both denominator and numerator is 0.0. | ||
if (residualSumSquared == 0.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be residualSumSquared == 0.0 || totalSumSquared == 0.0
? Because if totalSumSquared
is 0 the output is undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear @zoq Thanks for pointing it out. First of all I think, I have placed the // Handling undefined R2 Score...
part inside the if (AdjustedR2)
by mistake. It should be above and outside it.
Second, since I do not know what DBL_MIN
means, I really don't know if residualSumSquared == 0.0 || totalSumSquared == 0.0
should be put in the if condition. If you could tell me the meaning of DBL_MIN
, I might be able to help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DBL_MIN
is the smallest positive normal double, it's an alias for std::numeric_limits<double>::min
- https://en.cppreference.com/w/cpp/types/numeric_limits/min, hope that helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so AdjustedR2
or R2
becomes undefined only if totalSumSquared
is equal to zero. If residualSumSquared
equals to zero then they are equal to 1. Hence what I would suggest is
if (totalSumSquared = 0)
return DBL_MIN;
else if (residualSumSquared = 0)
return 1;
I am checking totalSumSquared
first just in case if totalSumSquared
and residualSumSquared
are both equal to zero, then, the answer should be still undefined.
I hope it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is correctly handled on lines 46 and 47 now. 👍
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍 |
Thanks @shawnbrar! Sorry this sat for so long before merge. 👍 |
#2572
Added functionality for calculating Adjusted R2 Score.