$R^2$ helps data scientists understand model performance, but it is not very intuitive for stakeholders. So we need error metrics that measure how well our model fits in the units of our target.

The most common error metrics are **MAE - Mean Absolute Error** and **RMSE - Root Mean Squared Error**. 

- **MAE**:
$
\frac{\sum_i |y_i - \hat y_i|}{n}
$

- **MSE**:
$
\frac{\sum_i (y_i - \hat y_i)^2}{n}
$

- **RMSE**:
$
\sqrt{\frac{\sum_i (y_i - \hat y_i)^2}{n}}
$

**RMSE** is much more sensitive to outliers than **MAE** because each error is squared before averaging, so a few large errors dominate the result.
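To make the formulas and the outlier effect concrete, here is a minimal sketch that implements the three metrics with NumPy and compares them on a small made-up example. The arrays `y_true`, `y_pred`, and `y_pred_outlier` are hypothetical toy data, not taken from the insurance dataset.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: the average of |y_i - y_hat_i|."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def mse(y_true, y_pred):
    """Mean squared error: the average of (y_i - y_hat_i)^2."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return np.sqrt(mse(y_true, y_pred))

# Hypothetical toy data: every prediction is off by exactly 1
y_true = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
y_pred = np.array([11.0, 11.0, 15.0, 15.0, 19.0])

# With equal-sized errors, MAE and RMSE agree (both are 1)
print(mae(y_true, y_pred), rmse(y_true, y_pred))

# Turn the last prediction into a large miss (error of 21 instead of 1)
y_pred_outlier = y_pred.copy()
y_pred_outlier[-1] = 39.0

# MAE rises to 5, while RMSE rises to sqrt(89), roughly 9.43
print(mae(y_true, y_pred_outlier), rmse(y_true, y_pred_outlier))
```

A single bad prediction moves RMSE almost twice as far as MAE here, which is why the two metrics can tell different stories on data with extreme values.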


In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import statsmodels.api as sm

In [3]:
insurance_df = pd.read_csv('../Course Materials/Data/insurance.csv')
insurance_df.head()

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,female,27.9,0,yes,southwest,16884.924
1,18,male,33.77,1,no,southeast,1725.5523
2,28,male,33.0,3,no,southeast,4449.462
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.88,0,no,northwest,3866.8552


In [4]:
insurance_df.corr(numeric_only=True)

Unnamed: 0,age,bmi,children,charges
age,1.0,0.109272,0.042469,0.299008
bmi,0.109272,1.0,0.012759,0.198341
children,0.042469,0.012759,1.0,0.067998
charges,0.299008,0.198341,0.067998,1.0


In [5]:
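# Numeric predictors only; add_constant adds the intercept column
features = ['age', 'bmi', 'children']
X = sm.add_constant(insurance_df[features])
y = insurance_df['charges']

# Fit ordinary least squares and show the regression summary
model = sm.OLS(y, X).fit()
model.summary()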

0,1,2,3
Dep. Variable:,charges,R-squared:,0.12
Model:,OLS,Adj. R-squared:,0.118
Method:,Least Squares,F-statistic:,60.69
Date:,"Sun, 14 Dec 2025",Prob (F-statistic):,8.8e-37
Time:,16:38:29,Log-Likelihood:,-14392.0
No. Observations:,1338,AIC:,28790.0
Df Residuals:,1334,BIC:,28810.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-6916.2433,1757.480,-3.935,0.000,-1.04e+04,-3468.518
age,239.9945,22.289,10.767,0.000,196.269,283.720
bmi,332.0834,51.310,6.472,0.000,231.425,432.741
children,542.8647,258.241,2.102,0.036,36.261,1049.468

0,1,2,3
Omnibus:,325.395,Durbin-Watson:,2.012
Prob(Omnibus):,0.0,Jarque-Bera (JB):,603.372
Skew:,1.52,Prob(JB):,9.54e-132
Kurtosis:,4.255,Cond. No.,290.0


In [6]:
from sklearn.metrics import mean_absolute_error as mae

mae(y, model.predict())

9015.442199156727

In [11]:
from sklearn.metrics import mean_squared_error as mse

# mean_squared_error returns the MSE; take the square root to get the RMSE
np.sqrt(mse(y, model.predict()))

np.float64(11355.317901125973)

# Adjusted R-squared
It is a metric that helps us decide whether or not a feature should be included. Ordinary R-squared never decreases when we add features, whether they are helpful or not. Adjusted R-squared penalizes the number of features, so it increases only when a new feature improves the fit enough to justify the extra term, and it can decrease when a feature is useless.
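As a rough illustration, the standard formula $1 - (1 - R^2)\frac{n - 1}{n - p - 1}$ (with $n$ observations and $p$ features, not counting the intercept) can be sketched as below. The helper name `adjusted_r2` is hypothetical; the inputs are the rounded values from the summary above ($R^2 = 0.12$, $n = 1338$, $p = 3$), so the result only approximates the reported figure.

```python
def adjusted_r2(r2, n_obs, n_features):
    """Adjusted R-squared for a model with an intercept.

    n_features counts the predictors only (the constant is excluded).
    """
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_features - 1)

# Rounded values taken from the OLS summary: R-squared 0.12, 1338 rows, 3 features
adj = adjusted_r2(0.12, 1338, 3)
print(round(adj, 3))  # 0.118, in line with the summary's Adj. R-squared

# Hypothetical scenario: a 4th feature that leaves R-squared unchanged
adj_useless = adjusted_r2(0.12, 1338, 4)
print(adj_useless < adj)  # True: the penalty lowers adjusted R-squared
```

In practice statsmodels already exposes this value as `model.rsquared_adj`, so the helper is only meant to show where the number comes from.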