Added Adjusted R2 #2624

shawnbrar · 2020-09-17T03:37:07Z

#2572
Added functionality for calculating Adjusted R2 Score.

kartikdutt18

Hey @shawnbrar,
Some style fix suggestions as well as some personal preferences to make code look cleaner.
Thanks for working on this.

src/mlpack/core/cv/metrics/r2_score_impl.hpp

Suggested change by kartikdutt18 Co-authored-by: kartikdutt18 <39593019+kartikdutt18@users.noreply.github.com>

kartikdutt18

Hey @shawnbrar,
The build is currently failing. To fix that, could you please add the adjR2 argument to the function declaration in r2_score_impl.hpp on line 21? That should fix the build.
Thanks.

Added adjR2 argumnet

shawnbrar · 2020-09-20T06:52:34Z

Thank you. Pretty dumb mistake.😁

zoq

Nice, it would be great to have a test for the added functionality.

zoq · 2020-09-24T15:39:14Z

src/mlpack/core/cv/metrics/r2_score.hpp

   * @return calculated R2 Score.
   */
  template<typename MLAlgorithm, typename DataType, typename ResponsesType>
  static double Evaluate(MLAlgorithm& model,
                         const DataType& data,
-                         const ResponsesType& responses);
+                         const ResponsesType& responses,
+                         const bool adjR2 = false);


I think we can be more explicit in the naming, what about adjustedRSquared?

I had actually thought of this and concluded that it was just too long for an argument name, so I shortened it. But if you want me to rename it to adjustedRSquared, please let me know.

I don't see anything wrong with long(ish) argument names, or, at least, I don't think adjustedRSquared is too long. adj could also mean adjoint for someone unfamiliar. :)

src/mlpack/core/cv/metrics/r2_score_impl.hpp

Co-authored-by: Marcus Edel <marcus.edel@fu-berlin.de>

kartikdutt18

Hey, This should fix the build. I'll take another look when this is done. Thanks again.

kartikdutt18 · 2020-10-02T15:18:16Z

src/mlpack/tests/cv_test.cpp

+  X << 1 << 2 << 3 << 4 << 5 << 6 << arma::endr
+    << 2 << 3 << 4 << 5 << 6 << 7 << arma::endr;
+  arma::rowvec Y;
+  y << 3 << 5 << 7 << 9 << 11 << 13;


Suggested change

y << 3 << 5 << 7 << 9 << 11 << 13;

Y << 3 << 5 << 7 << 9 << 11 << 13;

kartikdutt18 · 2020-10-02T15:18:42Z

src/mlpack/tests/cv_test.cpp

+
+  //Theoretically Adjusted R squared should be equal 1
+  double expAdjR2 = 1;
+  REQUIRE(std::abs(R2Score::Evaluate(lr, X, y) - expAdjR2)


Suggested change

REQUIRE(std::abs(R2Score::Evaluate(lr, X, y) - expAdjR2)

REQUIRE(std::abs(R2Score::Evaluate(lr, X, Y) - expAdjR2)

rcurtin

Hey @shawnbrar, thanks for taking the time to work on this! I mostly only have one comment about the use of the adjusted R2 score with the CV/HPT infrastructure. 👍

rcurtin · 2020-10-06T01:58:26Z

src/mlpack/core/cv/metrics/r2_score.hpp

rcurtin · 2020-10-16T22:16:58Z

Hey @shawnbrar, I'd be happy to make this a part of the next release if you want to handle the couple comments. Let me know what you think. 👍 (There's also no rush, it can happen later; I'm just trying to figure out if we should add this to the mlpack 3.4.2 milestone. :))

shawnbrar · 2020-10-21T09:50:57Z

Hello @rcurtin, I am really sorry for not replying earlier. I have just moved to France for my higher studies, and I don't have access to a decent computer in the university library.
If you want to add the template parameter, please go ahead, as I don't have any experience writing templates and I don't want to mess up the build.

rcurtin · 2020-10-27T00:09:51Z

@shawnbrar no worries, hopefully the move went well! I'll see if I have a chance at some point in the future (but it may not be that soon). In this case I think we need to use the templated approach if we want to be able to use adjusted R2 from the CV/HPT system. 👍

shawnbrar · 2020-11-19T13:07:01Z

Dear @rcurtin, I now have access to a good system and will be able to complete the required changes. I just wanted to confirm that I only have to add the template argument and its functionality?

rcurtin · 2020-11-21T22:12:05Z

Hey @shawnbrar, great! And yeah, I think that all we should need here is to transform the adjR2 parameter into a template parameter. 👍
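A minimal sketch of why the flag has to live in the type, assuming a cross-validation harness that only knows a metric's type and always calls a three-argument static Evaluate(). The names ScoreWith, R2MetricSketch, and LinearModel are hypothetical and are not mlpack's API; they only illustrate the shape of the problem.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical one-feature model: predicts slope * x + intercept.
struct LinearModel
{
  double slope;
  double intercept;
  double Predict(const double x) const { return slope * x + intercept; }
};

// Hypothetical metric. The adjusted/plain choice is part of the type, so the
// static Evaluate() keeps the fixed three-argument signature.
template<bool AdjustedR2>
struct R2MetricSketch
{
  template<typename Model>
  static double Evaluate(Model& model,
                         const std::vector<double>& data,
                         const std::vector<double>& responses)
  {
    const double n = static_cast<double>(responses.size());
    double mean = 0.0;
    for (const double r : responses)
      mean += r;
    mean /= n;

    double residualSumSquared = 0.0, totalSumSquared = 0.0;
    for (std::size_t i = 0; i < responses.size(); ++i)
    {
      const double e = responses[i] - model.Predict(data[i]);
      const double d = responses[i] - mean;
      residualSumSquared += e * e;
      totalSumSquared += d * d;
    }

    const double rsq = 1.0 - residualSumSquared / totalSumSquared;
    if (!AdjustedR2)
      return rsq;

    const double p = 1.0; // One feature in this sketch.
    return 1.0 - (1.0 - rsq) * (n - 1.0) / (n - p - 1.0);
  }
};

// Hypothetical harness: it receives the metric as a type and has no place to
// forward an extra runtime argument such as `adjR2`.
template<typename Metric, typename Model>
double ScoreWith(Model& model,
                 const std::vector<double>& data,
                 const std::vector<double>& responses)
{
  return Metric::Evaluate(model, data, responses);
}
```

With this shape, a caller selects the variant by writing ScoreWith<R2MetricSketch<true>>(model, data, responses), which is the same kind of selection that R2Score<true> allows.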

shawnbrar · 2020-11-23T16:50:36Z

Dear @rcurtin, I have added the boolean template parameter. However, I don't know how to differentiate R2Score<true> from R2Score<false> using doxygen, so the documentation might not be the best possible.

rcurtin · 2020-11-25T00:25:49Z

src/mlpack/core/cv/metrics/r2_score.hpp

+
+template<bool adjustedR2> class R2Score;
+
+template<> class R2Score<false>


Hey @shawnbrar, thanks for taking the time to add this template parameter! Actually, I think you don't need template specialization here. All you should need to do is declare the class as:

template<bool AdjustedR2> class R2Score

and then in the implementation of Evaluate(), you can change the bottom to this:

if (AdjustedR2)
{
  // Handling undefined R2 Score when both denominator and numerator is 0.0.
  if (residualSumSquared == 0.0)
    return totalSumSquared ? 1.0 : DBL_MIN;
  // Returning adjusted R-squared.
  double rsq = 1 - (residualSumSquared / totalSumSquared);
  return (1 - ((1 - rsq) * ((data.n_cols - 1) / (data.n_cols - data.n_rows - 1))));
}
else
{
  // Returning R-squared
  return 1 - residualSumSquared / totalSumSquared;
}

and that should be all that's necessary. The nice thing about templates is that that code above will actually be compiled into two different functions at compile time, so the if (AdjustedR2) won't actually be run when the program is executed---only the correct branch will be run!

Would you mind refactoring it to try this? It should result in a significantly shorter diff. 👍
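The compile-time branch described above can be sketched in isolation. This is an illustrative stand-in, not the merged mlpack code: R2Sketch is a hypothetical name, it takes precomputed sums instead of a model, and it casts the counts to double before dividing (an assumption made here so the ratio is not computed in integer arithmetic).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the metric. AdjustedR2 is a compile-time
// constant, so each instantiation only needs one of the two branches.
template<bool AdjustedR2>
struct R2Sketch
{
  // n: number of data points, p: number of features.
  static double Evaluate(const double residualSumSquared,
                         const double totalSumSquared,
                         const std::size_t n,
                         const std::size_t p)
  {
    const double rsq = 1.0 - residualSumSquared / totalSumSquared;
    if (AdjustedR2)
    {
      // Convert to double before dividing to avoid integer division.
      const double num = static_cast<double>(n) - 1.0;
      const double den =
          static_cast<double>(n) - static_cast<double>(p) - 1.0;
      return 1.0 - (1.0 - rsq) * (num / den);
    }
    else
    {
      return rsq;
    }
  }
};
```

R2Sketch<false>::Evaluate(...) and R2Sketch<true>::Evaluate(...) are two distinct functions after instantiation, which is why no runtime flag is needed.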

Dear @rcurtin, sure. I was also trying to think of a shorter way, but as I said, I am not an experienced C++ programmer.

Dear @rcurtin , I have removed the template specialization and made it the way you had asked for.

rcurtin

Hey @shawnbrar, this looks great! Thanks for taking the time to update it. And don't worry, C++ is a complex language (especially once templates are involved) and takes a long time to learn.

Do you want to add a note to HISTORY.md documenting this new functionality? I think it looks great otherwise, if you want to accept my suggestions (or make similar changes, up to you). 👍

rcurtin · 2020-11-25T23:15:29Z

src/mlpack/core/cv/metrics/r2_score.hpp

 class R2Score
 {
 public:
  /**
-   * Run prediction and calculate the R squared error.
+   * Run prediction and calculate the R squared or Adjusted R sauared error.


Suggested change

* Run prediction and calculate the R squared or Adjusted R sauared error.

* Run prediction and calculate the R squared or Adjusted R squared error.

Quick typo fix. :)

Sorry, this was probably because I am still not very used to using an AZERTY keyboard.

Oh wow, I've never used an AZERTY keyboard. I think that would be really hard on my hands. 😄

Yes, nothing says welcome to France better than this. :)

rcurtin · 2020-11-25T23:16:35Z

src/mlpack/core/cv/metrics/r2_score_impl.hpp

+double R2Score<AdjustedR2>::Evaluate(MLAlgorithm& model,
                         const DataType& data,
                         const ResponsesType& responses)


Suggested change

double R2Score<AdjustedR2>::Evaluate(MLAlgorithm& model,
                        const DataType& data,
                        const ResponsesType& responses)

double R2Score<AdjustedR2>::Evaluate(MLAlgorithm& model,
                                     const DataType& data,
                                     const ResponsesType& responses)

This should make things line up correctly. 👍

rcurtin · 2020-11-25T23:17:24Z

src/mlpack/core/cv/metrics/r2_score_impl.hpp

+      return totalSumSquared ? 1.0 : DBL_MIN;
+    // Returning adjusted R-squared.
+    double rsq = 1 - (residualSumSquared / totalSumSquared);
+    return (1 - ((1 - rsq) * ((data.n_cols - 1) / (data.n_cols - data.n_rows - 1))));


Suggested change

return (1 - ((1 - rsq) * ((data.n_cols - 1) / (data.n_cols - data.n_rows - 1))));

return (1 - ((1 - rsq) * ((data.n_cols - 1) /
    (data.n_cols - data.n_rows - 1))));

This line was longer than 80 characters, so I wrapped it. 👍
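For reference, the wrapped expression corresponds to the usual adjusted R-squared formula. The mapping of symbols is an assumption read off the diff: n is taken to be data.n_cols (number of points) and p to be data.n_rows (number of features).

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}},
\qquad
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```

One thing worth checking against this formula: if data.n_cols and data.n_rows are Armadillo's unsigned integer sizes, the quotient (n - 1) / (n - p - 1) as written in the diff would be evaluated in integer arithmetic unless one operand is converted to double first.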

rcurtin · 2020-11-25T23:18:37Z

src/mlpack/tests/cv_test.cpp

+          <= 1e-7);
+}
+
+


Suggested change

No need for two blank lines---one will be fine. 👍

rcurtin

I resolved the merge in HISTORY.md. Now, everything should hopefully build correctly. Thanks for adding this support! 👍

kartikdutt18

Looks good to me as well. Sorry I haven't been able to review this PR in a while. Thanks a lot for adding this feature.
Regards.

mlpack-bot · 2020-11-26T16:56:02Z

Hello there! Thanks for your contribution. I see that this is your first contribution to mlpack. If you'd like to add your name to the list of contributors in COPYRIGHT.txt and you haven't already, please feel free to push a change to this PR---or, if it gets merged before you can, feel free to open another PR.

In addition, if you'd like some stickers to put on your laptop, I'd be happy to help get them in the mail for you. Just send an email with your physical mailing address to stickers@mlpack.org, and then one of the mlpack maintainers will put some stickers in an envelope for you. It may take a few weeks to get them, depending on your location. 👍

zoq · 2020-11-27T12:01:16Z

src/mlpack/tests/cv_test.cpp

+
+  LinearRegression lr(X, Y);
+
+  //Theoretically Adjusted R squared should be equal 1


Suggested change

//Theoretically Adjusted R squared should be equal 1

// Theoretically Adjusted R squared should be equal 1.

Insert an extra space right after // and a stop at the end.

zoq · 2020-11-27T12:01:35Z

src/mlpack/tests/cv_test.cpp

+TEST_CASE("AdjR2ScoreTest", "[CVTest]")
+{
+  // Making two variables that define the linear function is
+  // f(x1, x2) = x1 + x2


Suggested change

// f(x1, x2) = x1 + x2

// f(x1, x2) = x1 + x2.

Add stop at the end to be consistent with the rest of the codebase.

zoq · 2020-11-27T12:01:56Z

src/mlpack/tests/cv_test.cpp

          == Approx(expectedR2).epsilon(1e-7));
 }

+/**
+ * Test the Adjusted R squared metric


Suggested change

* Test the Adjusted R squared metric

* Test the Adjusted R squared metric.

Add stop to be consistent with the rest of the codebase.

zoq · 2020-11-27T12:07:24Z

src/mlpack/core/cv/metrics/r2_score_impl.hpp

+  if (AdjustedR2)
+  {
+    // Handling undefined R2 Score when both denominator and numerator is 0.0.
+    if (residualSumSquared == 0.0)


Should this be residualSumSquared == 0.0 || totalSumSquared == 0.0? Because if totalSumSquared is 0 the output is undefined.

Dear @zoq, thanks for pointing it out. First of all, I think I placed the // Handling undefined R2 Score... part inside the if (AdjustedR2) by mistake. It should be above and outside it.

Second, since I do not know what DBL_MIN means, I really don't know if residualSumSquared == 0.0 || totalSumSquared == 0.0 should be put in the if condition. If you could tell me the meaning of DBL_MIN, I might be able to help.

DBL_MIN is the smallest positive normal double; it has the same value as std::numeric_limits<double>::min() - https://en.cppreference.com/w/cpp/types/numeric_limits/min. Hope that helps.

Okay, so AdjustedR2 or R2 becomes undefined only if totalSumSquared is equal to zero. If only residualSumSquared equals zero, then the score is 1. Hence what I would suggest is

if (totalSumSquared == 0)
  return DBL_MIN;
else if (residualSumSquared == 0)
  return 1;

I am checking totalSumSquared first so that if totalSumSquared and residualSumSquared are both equal to zero, the answer is still treated as undefined.

I hope it makes sense.

It looks like this is correctly handled on lines 46 and 47 now. 👍
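The guard ordering proposed in this thread can be sketched as a small standalone helper. GuardedR2 is a hypothetical name, and this follows the proposal in the comments above rather than quoting the merged lines 46 and 47.

```cpp
#include <cassert>
#include <cfloat>
#include <limits>

// Hypothetical helper illustrating the proposed order of checks.
inline double GuardedR2(const double residualSumSquared,
                        const double totalSumSquared)
{
  // Undefined case first: a zero denominator maps to DBL_MIN, even when
  // the numerator is also zero.
  if (totalSumSquared == 0.0)
    return DBL_MIN;
  // Perfect fit with a well-defined denominator.
  if (residualSumSquared == 0.0)
    return 1.0;
  return 1.0 - residualSumSquared / totalSumSquared;
}
```

Checking the denominator first means the 0/0 case never reaches the "perfect fit" branch.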

mlpack-bot · 2020-12-30T05:12:47Z

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

rcurtin · 2020-12-31T00:07:20Z

Thanks @shawnbrar! Sorry this sat for so long before merge. 👍

Added Adjusted R2

87dff6d

mlpack-bot bot added s: needs review s: unanswered s: unlabeled labels Sep 17, 2020

kartikdutt18 reviewed Sep 17, 2020

Update src/mlpack/core/cv/metrics/r2_score_impl.hpp

726a15e

Suggested change by kartikdutt18 Co-authored-by: kartikdutt18 <39593019+kartikdutt18@users.noreply.github.com>

kartikdutt18 reviewed Sep 19, 2020

zoq added c: methods t: added feature and removed s: unanswered s: unlabeled labels Sep 19, 2020

Update r2_score_impl.hpp

1668879

Added adjR2 argumnet

zoq reviewed Sep 24, 2020

shawnbrar and others added 2 commits September 25, 2020 06:58

Added Test for Adjusted R squared

786e5e9

Update src/mlpack/core/cv/metrics/r2_score_impl.hpp

cd9fbcd

Co-authored-by: Marcus Edel <marcus.edel@fu-berlin.de>

kartikdutt18 requested changes Oct 2, 2020

rcurtin reviewed Oct 6, 2020

Update cv_test.cpp

3ef6020

shawnbrar added 2 commits November 22, 2020 17:39

Added template parameter

85556de

Added Documentation

9a36925

shawnbrar requested a review from kartikdutt18 November 23, 2020 08:08

rcurtin reviewed Nov 25, 2020

Removed template specialization

e4df97a

Corrected parameter name

8346677

rcurtin reviewed Nov 25, 2020

shawnbrar and others added 5 commits November 26, 2020 08:10

Some styling changes

30a1dbc

Edited HISTORY.md

19eb5c9

Corrected Evaluate in HISTORY.md

60d4c2a

Corrected HISTORY.md

d810ca6

Merge branch 'master' into master

e1993dd

rcurtin approved these changes Nov 26, 2020

kartikdutt18 approved these changes Nov 26, 2020

mlpack-bot bot removed the s: needs review label Nov 26, 2020

shawnbrar mentioned this pull request Nov 27, 2020

Added Copyright statement #2742

Merged

zoq reviewed Nov 27, 2020

shawnbrar added 2 commits November 28, 2020 20:39

Changes by Zoq

dd0d16f

Changes by Zoq 2

c95a1f5

mlpack-bot bot added s: stale and removed s: stale labels Dec 30, 2020

Merge branch 'master' into master

a48e158

rcurtin merged commit 56ac3d2 into mlpack:master Dec 31, 2020

This was referenced Oct 14, 2022

Release version 4.0.0 #3285

Closed

Release version 4.0.0 #3286

Closed

rcurtin mentioned this pull request Oct 23, 2022

Release version 4.0.0 #3293

Merged
