# R-squared Intuition
R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination.

Before you look at the statistical measures for goodness-of-fit, you should [check the residual plots](https://blog.minitab.com/blog/adventures-in-statistics-2/why-you-need-to-check-your-residual-plots-for-regression-analysis). Residual plots can reveal unwanted residual patterns that indicate biased results more effectively than numbers. When your residual plots pass muster, you can trust your numerical results and check the goodness-of-fit statistics.

![](rsq.png)

> R-squared = Explained variation / Total variation

For a least-squares linear model with an intercept, R-squared computed on the data used to fit the model is always between 0% and 100%:

- 0% indicates that the model explains none of the variability of the response data around its mean.
- 100% indicates that the model explains all the variability of the response data around its mean.
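
As a minimal sketch of the ratio above, the snippet below fits a simple least-squares line to a small made-up dataset and computes R-squared as explained variation over total variation. The data values and variable names are assumptions chosen for illustration only.

```python
# Toy data (made up for illustration): y is roughly linear in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Ordinary least squares fit of y = intercept + slope * x.
slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
intercept = y_mean - slope * x_mean
y_hat = [intercept + slope * xi for xi in x]

# Explained variation: spread of the fitted values around the mean of y.
explained = sum((yh - y_mean) ** 2 for yh in y_hat)
# Total variation: spread of the observed values around the mean of y.
total = sum((yi - y_mean) ** 2 for yi in y)

r_squared = explained / total
print(f"R-squared = {r_squared:.4f}")
```

Because the toy points lie close to a straight line, the ratio comes out close to 1 (that is, close to 100%).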

## Key Limitations of R-squared
- R-squared cannot determine whether the coefficient estimates and predictions are biased, which is why you must assess the residual plots.
- R-squared does not indicate whether a regression model is adequate. You can have a low R-squared value for a good model, or a high R-squared value for a model that does not fit the data!
- [The R-squared in your output is a biased estimate of the population R-squared.](https://blog.minitab.com/blog/adventures-in-statistics-2/r-squared-shrinkage-and-power-and-sample-size-guidelines-for-regression-analysis)

#### Reference: [Regression Analysis: How Do I Interpret R-squared and Assess the Goodness-of-Fit?](https://blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit)


***


# Adjusted $R^2$


### Some Problems with R-squared

R-squared has additional problems that the adjusted R-squared and predicted R-squared are designed to address.

__Problem 1:__ Every time you add a predictor to a model, the R-squared increases, even if due to chance alone. It never decreases. Consequently, a model with more terms may appear to have a better fit simply because it has more terms.

__Problem 2:__ If a model has too many predictors and higher order polynomials, it begins to model the random noise in the data. This condition is known as overfitting the model and it produces misleadingly high R-squared values and a lessened ability to make predictions.


For a linear regression model, every additional predictor variable can only reduce, or leave unchanged, the training error of the least-squares fit. As a result, the $R^2$ value never decreases as more predictor variables are included in the model.
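
The sketch below illustrates this behaviour on synthetic data, assuming NumPy is available. It starts with one useful predictor and then appends columns of pure random noise; the helper `r_squared` and all data are hypothetical, introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# One genuinely useful predictor plus noise in the response.
x = rng.normal(size=n)
y = 3.0 * x + rng.normal(size=n)

# Five extra columns of pure noise, unrelated to y by construction.
noise = rng.normal(size=(n, 5))


def r_squared(X, y):
    """R-squared of an OLS fit with intercept, computed as 1 - SSE/SST."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    sse = np.sum((y - design @ coef) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst


r2_values = []
for extra in range(noise.shape[1] + 1):
    # Model with the real predictor and the first `extra` noise columns.
    X = np.column_stack([x, noise[:, :extra]])
    r2_values.append(r_squared(X, y))
    print(f"{1 + extra} predictor(s): R-squared = {r2_values[-1]:.4f}")
```

Each printed value should be at least as large as the previous one, even though the added columns carry no information about `y`.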


__Adjusted $R^2$__ takes into account the number of predictor variables included in the regression model. Unlike $R^2$, adjusted $R^2$ can decrease as the number of predictors increases.

$$ R_a^2 = 1 - \frac{\frac{SSE}{n-k-1}}{\frac{SST}{n-1}} $$
where,  
n - number of observations  
k - number of predictor variables in the model
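
A direct translation of this formula into Python might look like the following. The function name `adjusted_r_squared` and the example numbers are assumptions made for illustration.

```python
def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R-squared from the error and total sums of squares.

    n is the number of observations and k the number of predictors
    (not counting the intercept), matching the formula above.
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


# Made-up numbers: SSE = 20, SST = 100, 11 observations, 2 predictors.
print(adjusted_r_squared(sse=20.0, sst=100.0, n=11, k=2))  # 0.75
```

With these numbers the plain $R^2$ would be $1 - 20/100 = 0.80$, so the adjustment lowers the value to 0.75.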

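> __SST = SSR + SSE__, where SST is the total sum of squares, SSR is the regression (explained) sum of squares, and SSE is the error (residual) sum of squares.

> $$ \sum(y_{i} - \bar{y})^{2} = \sum(\hat{y}_{i} - \bar{y})^{2} + \sum(y_{i} - \hat{y}_{i})^{2} $$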
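
As a short sketch of why this decomposition holds, assuming an ordinary least-squares fit that includes an intercept, write each deviation from the mean as a fitted part plus a residual part and expand the square:

$$
\begin{aligned}
\sum (y_i - \bar{y})^2 &= \sum \big[(\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)\big]^2 \\
&= \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2 + 2 \sum (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)
\end{aligned}
$$

For such a fit the residuals $y_i - \hat{y}_i$ sum to zero and are orthogonal to the fitted values $\hat{y}_i$, so the cross term vanishes and only the two sums of squares remain. Without an intercept the cross term need not be zero, and the identity can fail.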

In addition to the $R^2$ and adjusted $R^2$ values, an __F-test__ can be used to check whether the model as a whole is statistically significant.
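
For reference, the overall F-statistic of a linear regression with intercept can be written with the same quantities used here (n observations, k predictors), and equivalently in terms of $R^2$:

$$ F = \frac{SSR / k}{SSE / (n-k-1)} = \frac{R^2 / k}{(1 - R^2) / (n-k-1)} $$

Under the usual normal-error assumptions and the null hypothesis that all k slope coefficients are zero, this statistic follows an F distribution with k and n-k-1 degrees of freedom, so a large value is evidence against the null.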

Put another way, the denominator __(n-k-1)__ acts as a penalty term: each added regressor makes it smaller, so adjusted $R^2$ decreases unless the new regressor reduces SSE enough to compensate.

The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. The adjusted R-squared increases only if the new term improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected by chance. The adjusted R-squared can be negative, but it’s usually not. It is never higher than the R-squared.
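
To make the last two claims concrete, the formula for $R_a^2$ can be rewritten in terms of $R^2$ by substituting $SSE/SST = 1 - R^2$:

$$ R_a^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1} $$

Since $\frac{n-1}{n-k-1} \ge 1$ whenever $k \ge 0$, the adjusted value can never exceed $R^2$. As a worked example with made-up numbers, take $R^2 = 0.10$, $n = 10$ and $k = 3$:

$$ R_a^2 = 1 - 0.90 \times \frac{9}{6} = 1 - 1.35 = -0.35 $$

A weak fit with several predictors and few observations is the typical situation in which the adjusted value turns negative.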

In the simplified Best Subsets Regression output below, you can see where the adjusted R-squared peaks, and then declines. Meanwhile, the R-squared continues to increase.

![](jku.PNG)

__Reference__: [Multiple Regression Analysis: Use Adjusted R-Squared and Predicted R-Squared to Include the Correct Number of Variables](https://blog.minitab.com/blog/adventures-in-statistics-2/multiple-regession-analysis-use-adjusted-r-squared-and-predicted-r-squared-to-include-the-correct-number-of-variables)





***

1. What are the pros and cons of each model?

Please find here a cheat-sheet that gives you all the pros and the cons of each regression model.

2. How do I know which model to choose for my problem?

First, you need to figure out whether your problem is linear or nonlinear. You will learn how to do that in Part 10 - Model Selection. Then, if your problem is linear, you should go for Simple Linear Regression if you only have one feature, and Multiple Linear Regression if you have several features.

If your problem is nonlinear, you should go for Polynomial Regression, SVR, Decision Tree or Random Forest. Which of these four should you choose? That you will learn in Part 10 - Model Selection. The method consists of evaluating each model's performance with a technique called k-Fold Cross Validation, and then picking the model that shows the best results. Feel free to jump directly to Part 10 if you already want to learn how to do that.
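
As a rough sketch of the k-fold idea, assuming NumPy is available, the snippet below compares only two candidates (a straight line and a quadratic polynomial) on synthetic data. The helper `k_fold_mse`, the data, and the choice of mean squared error as the score are all assumptions for illustration; the same loop would apply to SVR, Decision Tree or Random Forest models.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic nonlinear data (an assumption for this sketch): y = x^2 + noise.
x = rng.uniform(-3.0, 3.0, size=120)
y = x**2 + rng.normal(scale=0.5, size=x.size)


def k_fold_mse(x, y, degree, k=5):
    """Mean squared error of a polynomial fit, averaged over k held-out folds."""
    indices = np.arange(x.size)
    # The data were drawn in random order, so contiguous folds are fine here.
    folds = np.array_split(indices, k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(indices, fold)
        # Fit on the k-1 training folds only.
        coeffs = np.polyfit(x[train], y[train], deg=degree)
        # Score on the held-out fold.
        predictions = np.polyval(coeffs, x[fold])
        errors.append(np.mean((y[fold] - predictions) ** 2))
    return float(np.mean(errors))


# Candidate models: straight line (degree 1) versus quadratic (degree 2).
scores = {degree: k_fold_mse(x, y, degree) for degree in (1, 2)}
best_degree = min(scores, key=scores.get)
print(scores, "-> best degree:", best_degree)
```

Every observation is used for validation exactly once, and the model with the lowest average held-out error is the one to pick.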

3. How can I improve each of these models?

In Part 10 - Model Selection, you will find the second section dedicated to Parameter Tuning, that will allow you to improve the performance of your models, by tuning them. You probably already noticed that each model is composed of two types of parameters:

- the parameters that are learnt, for example the coefficients in Linear Regression
- the hyperparameters

The hyperparameters are the parameters that are not learnt and that are fixed values inside the model equations. For example, the regularization parameter lambda or the penalty parameter C are hyperparameters. So far we used the default value of these hyperparameters, and we haven't searched for their optimal value so that your model reaches even higher performance. Finding their optimal value is exactly what Parameter Tuning is about. So for those of you already interested in improving your model performance and doing some parameter tuning, feel free to jump directly to Part 10 - Model Selection.
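
To illustrate the difference between the two kinds of parameters, here is a minimal sketch, assuming NumPy, that tunes the regularization parameter lambda of a ridge regression by trying a small grid of values against a hold-out set. The data, the grid, the helper `ridge_fit` and the hold-out split are assumptions for this example; it is one simple way to tune, not necessarily the procedure used in Part 10.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumed for illustration): 10 features, only 3 matter.
# Features and noise have zero mean, so no intercept term is fitted.
n, p = 60, 10
X = rng.normal(size=(n, p))
true_coef = np.array([2.0, -1.0, 0.5] + [0.0] * (p - 3))
y = X @ true_coef + rng.normal(size=n)

# Simple hold-out split: first 40 rows to train, last 20 to validate.
X_train, y_train = X[:40], y[:40]
X_val, y_val = X[40:], y[40:]


def ridge_fit(X, y, lam):
    """Learnt parameters: closed-form ridge coefficients for a given lambda."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


# Hyperparameter search: lambda is fixed before fitting, so try a grid of values.
results = {}
for lam in (0.0, 0.1, 1.0, 10.0, 100.0):
    coef = ridge_fit(X_train, y_train, lam)
    results[lam] = float(np.mean((y_val - X_val @ coef) ** 2))

best_lam = min(results, key=results.get)
print(results, "-> best lambda:", best_lam)
```

The coefficients returned by `ridge_fit` are learnt from the data, while `lam` is chosen from outside the fit by comparing validation errors.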