# Module 2: Advanced Techniques in Scikit-Learn

## Section 6: Model Evaluation and Selection

### Part 2: R-squared (Coefficient of Determination)

In this part, we explore R-squared (the coefficient of determination), a common metric for assessing the goodness of fit of regression models. R-squared measures the proportion of the variance in the dependent variable that is predictable from the independent variables, which makes it a natural first check on a regression model's overall performance. Let's dive in!

### 2.1 Understanding R-squared

R-squared, also known as the coefficient of determination, is a statistical measure that represents the proportion of the variance in the dependent variable (target) that is predictable from the independent variables (features). It indicates the goodness of fit of the regression model to the observed data.

The formula for R-squared is as follows:

$R^2 = 1- \frac{\sum\limits_{i=1}^{n}(y_{true,i}-y_{pred,i})^2}{\sum\limits_{i=1}^{n}(y_{true,i}-\bar{y}_{true})^2}$

Where:

- $n$ is the number of samples in the dataset.
- $y_{true,i}$ is the true target value of the i-th sample.
- $y_{pred,i}$ is the predicted target value of the i-th sample.
- $\bar{y}_{true}$ is the mean of the true target values.
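The formula can be checked by hand. The following is a minimal sketch, assuming NumPy is installed and using a small set of made-up values (the names `ss_res` and `ss_tot` are illustrative, not part of any library). It computes the numerator and denominator separately and compares the result with Scikit-Learn's `r2_score`, which is covered later in this part.

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative values only (not taken from a real dataset)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Numerator: sum of squared differences between true and predicted values
ss_res = np.sum((y_true - y_pred) ** 2)

# Denominator: sum of squared differences between true values and their mean
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1.0 - ss_res / ss_tot
r2_sklearn = r2_score(y_true, y_pred)

print(f"manual:  {r2_manual:.4f}")
print(f"sklearn: {r2_sklearn:.4f}")
```

Both lines should print the same value, since the manual computation follows the formula term by term.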

R-squared is at most 1 and has no lower bound. Values usually fall between 0 and 1, but negative values are possible:

- $R^2 = 1$ indicates that the regression model perfectly fits the data.
- $R^2 = 0$ indicates that the model does not explain any variance in the target variable beyond the mean.
- $R^2 < 0$ suggests that the model performs worse than a horizontal line (the mean of the target variable).
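The three cases above can be reproduced with hand-built predictions. This is a small sketch with made-up data: one set of predictions matches the targets exactly, one always predicts the mean, and one reverses the trend of the targets. The variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import r2_score

# Illustrative target values
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Case 1: predictions equal the targets exactly
perfect = y_true.copy()

# Case 2: always predict the mean of the targets
mean_only = np.full_like(y_true, y_true.mean())

# Case 3: predictions that run opposite to the targets
reversed_trend = y_true[::-1]

r2_perfect = r2_score(y_true, perfect)
r2_mean = r2_score(y_true, mean_only)
r2_bad = r2_score(y_true, reversed_trend)

print(r2_perfect)  # 1.0
print(r2_mean)     # 0.0
print(r2_bad)      # -3.0
```

In the third case the squared errors sum to 40 while the total variation around the mean sums to 10, so the score is 1 - 40/10 = -3.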

### 2.2 Interpreting R-squared

R-squared indicates how well the regression model captures the variability of the target variable. A higher R-squared value means that a larger proportion of the variance in the target variable is explained by the model's predictors, suggesting a better fit. However, R-squared is a relative measure: it does not report the size of the errors in the target's own units, so it should be interpreted in conjunction with other evaluation metrics when assessing the overall performance of the model.
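One way to see why R-squared benefits from a companion metric is to change the units of the target. The sketch below uses made-up values and pairs R-squared with mean absolute error, which is only one possible choice of companion metric. Rescaling the targets and predictions by 100 leaves R-squared unchanged while the absolute error grows by the same factor.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Illustrative values only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# The same data expressed in a unit 100 times smaller
y_true_scaled = y_true * 100
y_pred_scaled = y_pred * 100

r2_a = r2_score(y_true, y_pred)
r2_b = r2_score(y_true_scaled, y_pred_scaled)
mae_a = mean_absolute_error(y_true, y_pred)
mae_b = mean_absolute_error(y_true_scaled, y_pred_scaled)

print(f"R-squared: {r2_a:.4f} vs {r2_b:.4f}")  # identical
print(f"MAE:       {mae_a:.2f} vs {mae_b:.2f}")  # 0.50 vs 50.00
```

R-squared says how much of the variation is explained, while an error metric in the target's units says how far off a typical prediction is.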

#### Using R-squared in Scikit-Learn

Scikit-Learn provides the `r2_score` function to calculate R-squared. Here's an example of how to use it:

```python
from sklearn.metrics import r2_score

# Illustrative true and predicted target values (replace with your own data)
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

r2 = r2_score(y_true, y_pred)
print(f"R-squared: {r2:.3f}")
```
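In practice the predictions come from a fitted model. The following sketch assumes a synthetic dataset from `make_regression` and a plain `LinearRegression`; the dataset sizes, noise level, and random seeds are arbitrary choices made for illustration. It also shows that a regressor's `score` method reports the same R-squared value as `r2_score`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (parameters chosen only for illustration)
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)

# Hold out a test set so R-squared is measured on data the model has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Two equivalent ways to obtain R-squared on the test set
r2_test = r2_score(y_test, y_pred)
r2_from_score = model.score(X_test, y_test)

print(f"r2_score:    {r2_test:.4f}")
print(f"model.score: {r2_from_score:.4f}")
```

Note the argument order: `r2_score` takes the true values first and the predictions second, and swapping them generally changes the result.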

### 2.3 Summary

R-squared (coefficient of determination) is a key evaluation metric for regression models. It measures the proportion of the variance in the target variable that is predictable from the features. A higher value indicates a better fit, with 1 as the maximum, and a negative value means the model does worse than always predicting the mean. Scikit-Learn's `r2_score` function computes R-squared for regression tasks.

In the next part, we will explore other evaluation metrics commonly used in regression and classification tasks.

Feel free to practice calculating R-squared using Scikit-Learn's `r2_score` function with different regression models. Compare the R-squared values to assess the goodness of fit of the models on your dataset.
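As a starting point for that practice, here is one possible sketch. It assumes a synthetic dataset and three arbitrarily chosen regressors, and it uses `cross_val_score` with `scoring="r2"` so that each model is scored on held-out folds rather than on its own training data. Swap in your own data and models as needed.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data (parameters chosen only for illustration)
X, y = make_regression(n_samples=200, n_features=5, noise=15.0, random_state=42)

# Candidate models to compare (an arbitrary selection)
models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=42),
}

mean_r2 = {}
for name, model in models.items():
    # 5-fold cross-validation, scoring each held-out fold with R-squared
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    mean_r2[name] = scores.mean()
    print(f"{name}: mean R-squared = {scores.mean():.3f} (std {scores.std():.3f})")
```

Comparing the mean scores along with their spread across folds gives a fairer picture than a single train/test split.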