## Computing the MAE of a Model
The goal of this exercise is to compute the R2 score and the loss (the MAE) of a model using the same dataset as Exercise 6.02, Computing the R2 Score of a Linear Regression Model.

In [8]:
# import required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

In [2]:
_headers = ['CIC0', 'SM1', 'GATS1i', 'NdsCH', 'Ndssc','MLOGP', 'response']
df = pd.read_csv('https://raw.githubusercontent.com/'\
                 'PacktWorkshops/The-Data-Science-Workshop'\
                 '/master/Chapter06/Dataset/qsar_fish_toxicity.csv', names=_headers, sep=';')

In [3]:
df.head()

    CIC0    SM1  GATS1i  NdsCH  Ndssc  MLOGP  response
0  3.26   0.829   1.676      0      1  1.453     3.77
1  2.189  0.58    0.863      0      0  1.348     3.115
2  2.125  0.638   0.831      0      0  1.348     3.531
3  3.027  0.331   1.472      1      0  1.807     3.51
4  2.094  0.827   0.86       0      0  1.886     5.39


Split the data into features and labels, then into training and evaluation sets. The evaluation set is split once more into validation and test sets:

In [5]:
features = df.drop('response', axis=1).values 
labels = df[['response']].values
X_train, X_eval, y_train, y_eval = train_test_split(features, labels, test_size=0.2, random_state=0) #train=80%, eval=20%
X_val, X_test, y_val, y_test = train_test_split(X_eval, y_eval, random_state=0)
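
The second call to train_test_split does not pass test_size, so it falls back to the default of 0.25. The sketch below uses a hypothetical 1,000-row array (not the fish toxicity data) to show the proportions that the two calls produce. The variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 1,000 rows, chosen so the proportions are easy to read
X = np.arange(1000).reshape(-1, 1)
y = np.arange(1000)

# First split: 80% training, 20% evaluation
X_tr, X_ev, y_tr, y_ev = train_test_split(X, y, test_size=0.2, random_state=0)

# Second split: no test_size given, so 25% of the evaluation set becomes the test set
X_va, X_te, y_va, y_te = train_test_split(X_ev, y_ev, random_state=0)

print(len(X_tr), len(X_va), len(X_te))  # 800 150 50
```

Under these assumptions, the validation set ends up holding roughly 15% of the original rows and the test set roughly 5%.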

In [6]:
# create a simple Linear Regression model
model = LinearRegression()
# train the model
model.fit(X_train, y_train)

LinearRegression()

Now predict the values of our validation dataset:

In [7]:
# use the trained model to predict on the validation set
y_pred = model.predict(X_val)

At this point, your model is trained and ready to use. You call its predict method and pass X_val, your validation features, as the argument. The result is assigned to a variable called y_pred and will be used in the next step to compute the MAE of the model.

In [9]:
# compute MAE (Mean Absolute Error)
mae = mean_absolute_error(y_val, y_pred)
print('MAE:{}'.format(mae))

MAE:0.7243440846447939


In this step, you compute the MAE of the model by using the mean_absolute_error function and passing in y_val and y_pred. y_val contains the true labels of your validation set, and y_pred contains the model's predictions for the same rows.

Both y_val and y_pred are numpy arrays that contain the same number of elements. The mean_absolute_error function subtracts y_pred from y_val, which results in a new array of errors. The absolute value function is applied to each element of that array so that all negative signs are dropped, and the mean of the resulting values is the MAE. A lower MAE means the predictions are, on average, closer to the true values.
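
The same steps can be reproduced by hand with numpy. This is a minimal sketch using made-up values (y_true and y_hat are hypothetical, not taken from the exercise) so that the arithmetic is easy to follow:

```python
import numpy as np

# Hypothetical true labels and predictions, chosen for easy arithmetic
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_hat         # signed errors: [0.5, 0.0, -1.5, -1.0]
abs_errors = np.abs(errors)     # negative signs dropped: [0.5, 0.0, 1.5, 1.0]
mae_manual = abs_errors.mean()  # (0.5 + 0.0 + 1.5 + 1.0) / 4

print('MAE:{}'.format(mae_manual))  # MAE:0.75
```

Calling mean_absolute_error(y_true, y_hat) on the same arrays should give the same value.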

In [10]:
# compute R2 score
r2 = model.score(X_val, y_val)
print('R^2 score: {}'.format(r2))

R^2 score: 0.5623861754188691
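
For a regression model, the score method returns the coefficient of determination, R2, which is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand with numpy. The helper name r2_by_hand and the sample values are hypothetical and used only for illustration.

```python
import numpy as np

def r2_by_hand(y_true, y_hat):
    """Return R2 = 1 - SS_res / SS_tot for two 1-D arrays (illustrative helper)."""
    ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot

# Hypothetical true labels and predictions
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

r2_manual = r2_by_hand(y_true, y_hat)
print('R^2 score: {}'.format(r2_manual))
```

An R2 of 1.0 means the predictions match the labels exactly, while a model that always predicts the mean of the labels scores 0.0.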


In this exercise, you calculated the MAE, which is an important metric for evaluating regression models, along with the R2 score.

You will now train a second model and compare its R2 score and MAE to those of the first model to determine which one performs better.