# Lesson 8: Algorithm Evaluation Metrics

## Regression

### Use RMSE and R-squared metrics on a regression problem

In [1]:
from sklearn.datasets import load_diabetes # Regression dataset
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

In [2]:
# Load the diabetes dataset
diabetes_data = load_diabetes()

# Convert the dataset to a pandas DataFrame
df = pd.DataFrame(data=diabetes_data.data, columns=diabetes_data.feature_names)

In [10]:
# Display the first 10 rows of the DataFrame
df.head(10)

| | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.05068 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019907 | -0.017646 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068332 | -0.092204 |
| 2 | 0.085299 | 0.05068 | 0.044451 | -0.00567 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002861 | -0.02593 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022688 | -0.009362 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031988 | -0.046641 |
| 5 | -0.092695 | -0.044642 | -0.040696 | -0.019442 | -0.068991 | -0.079288 | 0.041277 | -0.076395 | -0.041176 | -0.096346 |
| 6 | -0.045472 | 0.05068 | -0.047163 | -0.015999 | -0.040096 | -0.0248 | 0.000779 | -0.039493 | -0.062917 | -0.038357 |
| 7 | 0.063504 | 0.05068 | -0.001895 | 0.066629 | 0.09062 | 0.108914 | 0.022869 | 0.017703 | -0.035816 | 0.003064 |
| 8 | 0.041708 | 0.05068 | 0.061696 | -0.040099 | -0.013953 | 0.006202 | -0.028674 | -0.002592 | -0.01496 | 0.011349 |
| 9 | -0.0709 | -0.044642 | 0.039062 | -0.033213 | -0.012577 | -0.034508 | -0.024993 | -0.002592 | 0.067737 | -0.013504 |


In [11]:
# Select the features and target labels
X = diabetes_data.data
y = diabetes_data.target

# Print the shapes of the feature matrix X and the target vector y
print("Shape of X:", X.shape)
print("Shape of Y:", y.shape)

Shape of X: (442, 10)
Shape of Y: (442,)


In [12]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

In [13]:
# Calculate RMSE (Root Mean Squared Error)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

# Calculate R-squared
r_squared = r2_score(y_test, y_pred)

print("Root Mean Squared Error (RMSE):", rmse)
print("R-squared:", r_squared)

Root Mean Squared Error (RMSE): 53.85344583676593
R-squared: 0.4526027629719195
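For reference, writing $y_i$ for the actual target values, $\hat{y}_i$ for the predictions, and $\bar{y}$ for the mean of the actual values over the $n$ test samples, the two metrics printed above follow the standard definitions (the first is what `np.sqrt(mean_squared_error(...))` computes, the second is what `r2_score` computes with its default unweighted settings):

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$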


* **RMSE (Root Mean Squared Error):** The square root of the average squared difference between the model's predictions and the actual values. It is expressed in the same units as the target, so it can be read as a typical size of the prediction error, although squaring the residuals makes it weigh large errors more heavily than small ones. Lower RMSE values indicate better model performance.

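* **R-squared (Coefficient of Determination):** Measures the proportion of the variance in the dependent variable (target) that is explained by the independent variables (features) in the model, and so indicates how well the model fits the data. R-squared is at most 1: a value of 1 means perfect predictions, 0 means the model does no better than always predicting the mean of the target, and on held-out data it can even be negative when the model does worse than that constant prediction. Higher values indicate a better fit; the test-set value of about 0.45 above means the model explains roughly 45% of the variance in the test targets.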
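As a sanity check on these definitions, the sketch below recomputes both metrics directly with NumPy on the same split and model used above, and compares them with a constant baseline that always predicts the training-set mean. The helper names `rmse_manual` and `r2_manual` are illustrative only, not part of scikit-learn; the sketch also relies on the fact that a scikit-learn regressor's `score` method returns R-squared.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same data, split and model as in the cells above
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

def rmse_manual(y_true, y_hat):
    """Hypothetical helper: square root of the mean of the squared residuals."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_hat, dtype=float)
    return np.sqrt(np.mean(residuals ** 2))

def r2_manual(y_true, y_hat):
    """Hypothetical helper: 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    ss_res = np.sum((y_true - np.asarray(y_hat, dtype=float)) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rmse_hand = rmse_manual(y_test, y_pred)
r2_hand = r2_manual(y_test, y_pred)
print("RMSE (by hand):", rmse_hand)
print("R-squared (by hand):", r2_hand)

# For scikit-learn regressors, .score() reports R-squared on the given data
print("model.score:", model.score(X_test, y_test))

# Constant baseline: always predict the mean of the training targets
baseline_pred = np.full_like(y_test, y_train.mean(), dtype=float)
print("Baseline RMSE:", rmse_manual(y_test, baseline_pred))
print("Baseline R-squared:", r2_manual(y_test, baseline_pred))
```

The baseline's R-squared on the test set comes out at or below zero (any constant other than the test-set mean scores below zero), which illustrates why R-squared on held-out data is not confined to the range 0 to 1. Comparing the model's RMSE with the baseline's also gives the raw RMSE figure a point of reference.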