Added Adjusted R2 #2624

Merged
merged 18 commits into from Dec 31, 2020

Conversation

shawnbrar
Contributor

#2572
Added functionality for calculating Adjusted R2 Score.

Member

@kartikdutt18 kartikdutt18 left a comment

Hey @shawnbrar,
Some style fix suggestions as well as some personal preferences to make code look cleaner.
Thanks for working on this.

src/mlpack/core/cv/metrics/r2_score_impl.hpp (outdated)
Suggested change  by kartikdutt18

Co-authored-by: kartikdutt18 <39593019+kartikdutt18@users.noreply.github.com>
Member

@kartikdutt18 kartikdutt18 left a comment

Hey @shawnbrar,
The build is currently failing. To fix that, could you please add the adjR2 argument to the function declaration in r2_score_impl.hpp at line 21? That should fix the build.
Thanks.

Added adjR2 argument
@shawnbrar
Contributor Author

Thank you. Pretty dumb mistake.😁

Member

@zoq zoq left a comment

Nice, it would be great to have a test for the added functionality.

   * @return calculated R2 Score.
   */
  template<typename MLAlgorithm, typename DataType, typename ResponsesType>
  static double Evaluate(MLAlgorithm& model,
                         const DataType& data,
-                        const ResponsesType& responses);
+                        const ResponsesType& responses,
+                        const bool adjR2 = false);
Member

I think we can be more explicit in the naming, what about adjustedRSquared?

Contributor Author

I had actually thought of this and concluded that it was just a long argument name, so I shortened it. But if you want me to rename it to adjustedRSquared, please do tell.

Member

I don't see anything wrong with long(ish) argument names, or, at least, I don't think adjustedRSquared is too long. adj could also mean adjoint for someone unfamiliar. :)

src/mlpack/core/cv/metrics/r2_score_impl.hpp (outdated)
Member

@kartikdutt18 kartikdutt18 left a comment

Hey, This should fix the build. I'll take another look when this is done. Thanks again.

X << 1 << 2 << 3 << 4 << 5 << 6 << arma::endr
<< 2 << 3 << 4 << 5 << 6 << 7 << arma::endr;
arma::rowvec Y;
y << 3 << 5 << 7 << 9 << 11 << 13;
Member

Suggested change
- y << 3 << 5 << 7 << 9 << 11 << 13;
+ Y << 3 << 5 << 7 << 9 << 11 << 13;


//Theoretically Adjusted R squared should be equal 1
double expAdjR2 = 1;
REQUIRE(std::abs(R2Score::Evaluate(lr, X, y) - expAdjR2)
Member

Suggested change
- REQUIRE(std::abs(R2Score::Evaluate(lr, X, y) - expAdjR2)
+ REQUIRE(std::abs(R2Score::Evaluate(lr, X, Y) - expAdjR2)
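
The expectation of 1 in this test follows from the data: the responses 3, 5, ..., 13 are exactly x1 + x2 for the six points, so a model that fits perfectly has zero residual. A minimal sketch of that arithmetic in plain C++ (no mlpack or Armadillo; `RSquared` and `PerfectPredictions` are hypothetical helpers written for this illustration, and the predictions are assumed to come from a model that has learned the function exactly):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain R^2 = 1 - SS_res / SS_tot for already-computed predictions.
// Assumes the responses are not all identical (so SS_tot != 0).
double RSquared(const std::vector<double>& responses,
                const std::vector<double>& predictions)
{
  assert(responses.size() == predictions.size() && !responses.empty());

  double mean = 0.0;
  for (double r : responses)
    mean += r;
  mean /= responses.size();

  double residualSumSquared = 0.0, totalSumSquared = 0.0;
  for (std::size_t i = 0; i < responses.size(); ++i)
  {
    const double residual = responses[i] - predictions[i];
    const double deviation = responses[i] - mean;
    residualSumSquared += residual * residual;
    totalSumSquared += deviation * deviation;
  }
  return 1.0 - residualSumSquared / totalSumSquared;
}

// Predictions of a hypothetical model that reproduces f(x1, x2) = x1 + x2
// exactly on the six points used in the test.
std::vector<double> PerfectPredictions()
{
  const std::vector<double> x1 = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 };
  const std::vector<double> x2 = { 2.0, 3.0, 4.0, 5.0, 6.0, 7.0 };
  std::vector<double> out;
  for (std::size_t i = 0; i < x1.size(); ++i)
    out.push_back(x1[i] + x2[i]);
  return out;
}
```

With R squared equal to 1, the factor (1 - R squared) in the adjusted formula is zero, so the adjusted score is 1 as well regardless of the number of points or features.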

Member

@rcurtin rcurtin left a comment

Hey @shawnbrar, thanks for taking the time to work on this! I mostly only have one comment about the use of the adjusted R2 score with the CV/HPT infrastructure. 👍

src/mlpack/core/cv/metrics/r2_score.hpp (outdated)
@rcurtin
Member

rcurtin commented Oct 16, 2020

Hey @shawnbrar, I'd be happy to make this a part of the next release if you want to handle the couple comments. Let me know what you think. 👍 (There's also no rush, it can happen later; I'm just trying to figure out if we should add this to the mlpack 3.4.2 milestone. :))

@shawnbrar
Contributor Author

Hello @rcurtin, I am really sorry for not replying earlier. I have just moved to France for my higher studies, and I don't have access to a decent computer in the university library.
If you want to add the template parameter, you can add it, as I don't have any experience writing templates and I don't want to mess up the build.

@rcurtin
Member

rcurtin commented Oct 27, 2020

@shawnbrar no worries, hopefully the move went well! I'll see if I have a chance at some point in the future (but it may not be that soon). In this case I think we need to use the templated approach if we want to be able to use adjusted R2 from the CV/HPT system. 👍
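
To illustrate why a runtime `adjR2` argument would not be reachable from generic code, here is a rough sketch. It assumes, as the comment above suggests, that the CV/HPT code receives the metric as a type and calls its static `Evaluate()` with exactly three arguments. `ConstantModel`, `MeanError`, and `ScoreWith` are hypothetical names made up for this example, not mlpack classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a trained model: it predicts one constant.
struct ConstantModel
{
  double value;
};

// Toy metric. The option is a template parameter, so MeanError<false> and
// MeanError<true> are two distinct types that share the same three-argument
// Evaluate() signature.
template<bool Absolute>
class MeanError
{
 public:
  static double Evaluate(const ConstantModel& model,
                         const std::vector<double>& /* data */,
                         const std::vector<double>& responses)
  {
    assert(!responses.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < responses.size(); ++i)
    {
      const double error = responses[i] - model.value;
      sum += Absolute ? std::abs(error) : error;
    }
    return sum / responses.size();
  }
};

// Hypothetical caller in the style of a cross-validation loop: it knows the
// metric only as a type and always passes exactly three arguments, so a
// fourth runtime flag would have no way to reach Evaluate().
template<typename Metric>
double ScoreWith(const ConstantModel& model,
                 const std::vector<double>& data,
                 const std::vector<double>& responses)
{
  return Metric::Evaluate(model, data, responses);
}
```

Under this assumption, selecting the variant is done by naming the type, for example `ScoreWith<MeanError<true>>(model, data, responses)`, which is the same shape as choosing between `R2Score<false>` and `R2Score<true>`.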

@shawnbrar
Contributor Author

Dear @rcurtin , Now I have access to a good system and I will be able to complete the changes required. Just wanted to be sure that I only have to add the template argument and its functionality?

@rcurtin
Member

rcurtin commented Nov 21, 2020

Hey @shawnbrar, great! And yeah, I think that all we should need here is to transform the adjR2 parameter into a template parameter. 👍

@shawnbrar
Contributor Author

Dear @rcurtin, I have added the boolean template parameter. However, I don't know how to differentiate R2Score<true> from R2Score<false> using doxygen, so the documentation might not be the best possible.


template<bool adjustedR2> class R2Score;

template<> class R2Score<false>
Member

Hey @shawnbrar, thanks for taking the time to add this template parameter! Actually, I think you don't need template specialization here. All you should need to do is declare the class as:

template<bool AdjustedR2>
class R2Score

and then in the implementation of Evaluate(), you can change the bottom to this:

    if (AdjustedR2)
    {
      // Handling undefined R2 Score when both denominator and numerator is 0.0.
      if (residualSumSquared == 0.0)
        return totalSumSquared ? 1.0 : DBL_MIN;
      // Returning adjusted R-squared.
      double rsq = 1 - (residualSumSquared / totalSumSquared);
      return (1 - ((1 - rsq) * ((data.n_cols - 1) / (data.n_cols - data.n_rows - 1))));
    }
    else
    {
      // Returning R-squared
      return 1 - residualSumSquared / totalSumSquared;
    }

and that should be all that's necessary. The nice thing about templates is that that code above will actually be compiled into two different functions at compile time, so the if (AdjustedR2) won't actually be run when the program is executed---only the correct branch will be run!

Would you mind refactoring it to try this? It should result in a significantly shorter diff. 👍
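
A self-contained sketch of the pattern described in this comment, using plain doubles instead of mlpack types. `ScoreFromSums` and its parameters are hypothetical names for illustration; the zero-denominator handling is left out to keep the focus on the template parameter, and the counts are converted to `double` before dividing.

```cpp
#include <cassert>
#include <cstddef>

// One function template, two instantiations: ScoreFromSums<false> returns
// R^2 and ScoreFromSums<true> returns adjusted R^2. AdjustedR2 is a
// compile-time constant, so within each instantiation the condition below is
// fixed and the compiler is free to drop the branch that cannot be taken.
template<bool AdjustedR2>
double ScoreFromSums(const double residualSumSquared,
                     const double totalSumSquared,
                     const std::size_t nPoints,
                     const std::size_t nFeatures)
{
  const double rsq = 1.0 - residualSumSquared / totalSumSquared;
  if (AdjustedR2)
  {
    // The adjustment is only defined when n - p - 1 > 0.
    assert(nPoints > nFeatures + 1);
    const double n = static_cast<double>(nPoints);
    const double p = static_cast<double>(nFeatures);
    return 1.0 - (1.0 - rsq) * (n - 1.0) / (n - p - 1.0);
  }
  else
  {
    return rsq;
  }
}
```

Note that with a plain `if`, both branches still have to compile for every instantiation; that is fine here because both are valid for either value of `AdjustedR2`.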

Contributor Author

Dear @rcurtin, sure. I was also thinking there should be a shorter way, but like I said, I am not an experienced C++ programmer.

Contributor Author

Dear @rcurtin , I have removed the template specialization and made it the way you had asked for.

Member

@rcurtin rcurtin left a comment

Hey @shawnbrar, this looks great! Thanks for taking the time to update it. And don't worry, C++ is a complex language (especially once templates are involved) and takes a long time to learn.

Do you want to add a note to HISTORY.md documenting this new functionality? I think it looks great otherwise, if you want to accept my suggestions (or make similar changes, up to you). 👍

src/mlpack/core/cv/metrics/r2_score.hpp
class R2Score
{
public:
/**
- * Run prediction and calculate the R squared error.
+ * Run prediction and calculate the R squared or Adjusted R sauared error.
Member

Suggested change
- * Run prediction and calculate the R squared or Adjusted R sauared error.
+ * Run prediction and calculate the R squared or Adjusted R squared error.

Quick typo fix. :)

Contributor Author

Sorry, this was probably because I am still not very used to using an AZERTY keyboard.

Member

Oh wow, I've never used an AZERTY keyboard. I think that would be really hard on my hands. 😄

Contributor Author

Yes, nothing says welcome to France better than this. :)

Comment on lines 20 to 22
double R2Score<AdjustedR2>::Evaluate(MLAlgorithm& model,
const DataType& data,
const ResponsesType& responses)
Member

Suggested change
double R2Score<AdjustedR2>::Evaluate(MLAlgorithm& model,
const DataType& data,
const ResponsesType& responses)
double R2Score<AdjustedR2>::Evaluate(MLAlgorithm& model,
const DataType& data,
const ResponsesType& responses)

This should make things line up correctly. 👍

return totalSumSquared ? 1.0 : DBL_MIN;
// Returning adjusted R-squared.
double rsq = 1 - (residualSumSquared / totalSumSquared);
return (1 - ((1 - rsq) * ((data.n_cols - 1) / (data.n_cols - data.n_rows - 1))));
Member

Suggested change
- return (1 - ((1 - rsq) * ((data.n_cols - 1) / (data.n_cols - data.n_rows - 1))));
+ return (1 - ((1 - rsq) * ((data.n_cols - 1) /
+ (data.n_cols - data.n_rows - 1))));

This line was longer than 80 characters, so I wrapped it. 👍
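
For reference, the expression corresponds to the usual adjusted R squared formula, 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of points (`data.n_cols`) and p the number of features (`data.n_rows`). Armadillo stores `n_cols` and `n_rows` as unsigned integers, so a ratio written with those operands is evaluated in integer arithmetic before it is multiplied by the `double`. The sketch below shows the difference using `std::size_t` as a stand-in for Armadillo's integer type; `IntegerRatio` and `FloatingRatio` are hypothetical helpers for this illustration only.

```cpp
#include <cassert>
#include <cstddef>

// (n - 1) / (n - p - 1) with unsigned integer operands: the fractional part
// is discarded by the division, and only then is the result converted.
double IntegerRatio(const std::size_t nPoints, const std::size_t nFeatures)
{
  assert(nPoints > nFeatures + 1);  // Keeps the unsigned subtraction valid.
  return static_cast<double>((nPoints - 1) / (nPoints - nFeatures - 1));
}

// The same ratio with both operands converted to double before dividing.
double FloatingRatio(const std::size_t nPoints, const std::size_t nFeatures)
{
  assert(nPoints > nFeatures + 1);
  return static_cast<double>(nPoints - 1) /
      static_cast<double>(nPoints - nFeatures - 1);
}
```

For the six points and two features used in the test, the integer version gives 5 / 3 = 1 while the floating-point version gives about 1.667. The test in this PR would not notice the difference, because a perfect fit makes (1 - R²) zero and the ratio drops out.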

<= 1e-7);
}


Member

Suggested change

No need for two blank lines---one will be fine. 👍

Member

@rcurtin rcurtin left a comment

I resolved the merge in HISTORY.md. Now, everything should hopefully build correctly. Thanks for adding this support! 👍

Member

@kartikdutt18 kartikdutt18 left a comment

Looks good to me as well. Sorry I haven't been able to review this PR in a while. Thanks a lot for adding this feature.
Regards.

@mlpack-bot

mlpack-bot bot commented Nov 26, 2020

Hello there! Thanks for your contribution. I see that this is your first contribution to mlpack. If you'd like to add your name to the list of contributors in COPYRIGHT.txt and you haven't already, please feel free to push a change to this PR---or, if it gets merged before you can, feel free to open another PR.

In addition, if you'd like some stickers to put on your laptop, I'd be happy to help get them in the mail for you. Just send an email with your physical mailing address to stickers@mlpack.org, and then one of the mlpack maintainers will put some stickers in an envelope for you. It may take a few weeks to get them, depending on your location. 👍


LinearRegression lr(X, Y);

//Theoretically Adjusted R squared should be equal 1
Member

Suggested change
- //Theoretically Adjusted R squared should be equal 1
+ // Theoretically Adjusted R squared should be equal 1.

Insert an extra space right after // and a stop at the end.

TEST_CASE("AdjR2ScoreTest", "[CVTest]")
{
// Making two variables that define the linear function is
// f(x1, x2) = x1 + x2
Member

Suggested change
- // f(x1, x2) = x1 + x2
+ // f(x1, x2) = x1 + x2.

Add stop at the end to be consistent with the rest of the codebase.

== Approx(expectedR2).epsilon(1e-7));
}

/**
* Test the Adjusted R squared metric
Member

Suggested change
- * Test the Adjusted R squared metric
+ * Test the Adjusted R squared metric.

Add stop to be consistent with the rest of the codebase.

if (AdjustedR2)
{
// Handling undefined R2 Score when both denominator and numerator is 0.0.
if (residualSumSquared == 0.0)
Member

Should this be residualSumSquared == 0.0 || totalSumSquared == 0.0? Because if totalSumSquared is 0 the output is undefined.

Contributor Author

Dear @zoq, thanks for pointing it out. First of all, I think I have placed the // Handling undefined R2 Score... part inside the if (AdjustedR2) by mistake. It should be above and outside it.

Second, since I do not know what DBL_MIN means, I really don't know if residualSumSquared == 0.0 || totalSumSquared == 0.0 should be put in the if condition. If you could tell me the meaning of DBL_MIN, I might be able to help.

Member

DBL_MIN is the smallest positive normal double; it has the same value as std::numeric_limits<double>::min() (https://en.cppreference.com/w/cpp/types/numeric_limits/min). Hope that helps.
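
A small sketch that checks these properties at compile time. It is only an illustration: `UndefinedScoreSentinel` is a hypothetical helper name standing in for the value the code under review returns when the score is undefined.

```cpp
#include <cfloat>
#include <limits>

// DBL_MIN is a macro from <cfloat>; std::numeric_limits<double>::min() from
// <limits> returns the same value.
static_assert(DBL_MIN == std::numeric_limits<double>::min(),
    "DBL_MIN and numeric_limits<double>::min() agree");

// It is a tiny positive number, not the most negative double.
static_assert(DBL_MIN > 0.0, "DBL_MIN is positive");

// The most negative finite double is lowest(), which equals -DBL_MAX.
static_assert(std::numeric_limits<double>::lowest() == -DBL_MAX,
    "lowest() is the most negative finite double");

// Hypothetical helper: the value returned when the score is undefined.
constexpr double UndefinedScoreSentinel()
{
  return DBL_MIN;
}
```

So returning DBL_MIN for an undefined score yields a value just above zero rather than a large negative one.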

Contributor Author

Okay, so AdjustedR2 or R2 becomes undefined only if totalSumSquared is equal to zero. If residualSumSquared equals zero, then they are equal to 1. Hence, what I would suggest is:

if (totalSumSquared == 0)
  return DBL_MIN;
else if (residualSumSquared == 0)
  return 1;

I am checking totalSumSquared first so that if totalSumSquared and residualSumSquared are both equal to zero, the answer is still undefined.

I hope it makes sense.
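
The ordering proposed here can be sketched as a standalone function. `GuardedRSquared` is a hypothetical name for illustration and takes the two sums directly; it is not the code that was merged.

```cpp
#include <cfloat>

// Edge cases are checked in the order proposed above: a zero total sum of
// squares makes the score undefined (reported as DBL_MIN) even when the
// residual sum is also zero; otherwise a zero residual means a perfect fit.
double GuardedRSquared(const double residualSumSquared,
                       const double totalSumSquared)
{
  if (totalSumSquared == 0.0)
    return DBL_MIN;
  if (residualSumSquared == 0.0)
    return 1.0;
  return 1.0 - residualSumSquared / totalSumSquared;
}
```

Checking the denominator first means the 0 / 0 case is reported as undefined rather than as a perfect score.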

Member

It looks like this is correctly handled on lines 46 and 47 now. 👍

@mlpack-bot

mlpack-bot bot commented Dec 30, 2020

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@mlpack-bot mlpack-bot bot added s: stale and removed s: stale labels Dec 30, 2020
@rcurtin rcurtin merged commit 56ac3d2 into mlpack:master Dec 31, 2020
@rcurtin
Member

rcurtin commented Dec 31, 2020

Thanks @shawnbrar! Sorry this sat for so long before merge. 👍

This was referenced Oct 14, 2022
@rcurtin rcurtin mentioned this pull request Oct 23, 2022
4 participants