# Model Evaluation

A key part of machine learning is evaluating the effectiveness of your model. 

For classification models, we often use a confusion matrix and an ROC curve to evaluate the model.
These give us a better understanding of quantities such as the true positive rate, false positive rate, and accuracy of the model.

In regression models, we often use the mean squared error (MSE) to evaluate the model. This gives us a better understanding of how well the model is predicting the target variable.

In [10]:
# Let's start with a simple example of a regression model and how to evaluate it using the mean squared error (MSE).

import numpy as np

np.random.seed(0)
X = np.random.rand(100, 1)
y = 2.0 + 3.0 * X + np.random.randn(100, 1)

In [11]:
# Split the data into training and test sets

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [15]:
# Fit a linear regression model

from sklearn.linear_model import LinearRegression

model = LinearRegression()

model.fit(X_train, y_train)

In [16]:
# Make predictions

y_pred = model.predict(X_test)

In [17]:
# Evaluate the model

from sklearn.metrics import mean_squared_error  

mse = mean_squared_error(y_test, y_pred)

print(f'MSE: {mse}')

MSE: 1.0434333815695176


### So what does this MSE value mean?

- The MSE value is the average squared difference between the actual and predicted values.
- The lower the MSE value, the better the model is at predicting the target variable.
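- In this case, the MSE value is about 1.043. The data were generated with standard normal noise (variance 1), so no model can be expected to get much below an MSE of 1 here. A value of 1.043 therefore indicates that the model is close to the best achievable on this data.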
- We need to keep in mind that the MSE value is relative to the scale of the target variable, so it's important to compare it to other models or to a baseline model to get a sense of how well the model is performing.
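
One way to make the comparison described above concrete is to score a trivial baseline on the same split. The following is a minimal sketch, assuming scikit-learn's `DummyRegressor` as the baseline (a choice made for this illustration, not part of the original walkthrough). It also recomputes the MSE by hand with NumPy to show the definition. The names `baseline`, `baseline_mse`, `model_mse`, and `manual_mse` are illustrative.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Recreate the synthetic regression data and split used in the cells above
np.random.seed(0)
X = np.random.rand(100, 1)
y = 2.0 + 3.0 * X + np.random.randn(100, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Baseline (assumption for this sketch): always predict the training mean
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
baseline_mse = mean_squared_error(y_test, baseline.predict(X_test))

# The linear model from the walkthrough, scored on the same test set
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
model_mse = mean_squared_error(y_test, y_pred)

# MSE by hand: the mean of the squared residuals
manual_mse = np.mean((y_test - y_pred) ** 2)

print(f"Baseline MSE: {baseline_mse}")
print(f"Model MSE:    {model_mse}")
print(f"Manual MSE:   {manual_mse}")
```

If the model's MSE is clearly below the baseline's, the model has learned something from `X`. If the two are close, the model is doing little more than predicting the average.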

In [66]:
# Now let's look at an example of a classification model and how to evaluate it using a confusion matrix and an ROC curve.

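# Generate some random data which we will use to predict if a point is a 0 or 1.
# Note: the labels are drawn independently of the features, so no model
# should be expected to do better than chance (about 50% accuracy) here.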

np.random.seed(0)
X = np.random.rand(1000, 3)
y = np.random.randint(0, 2, 1000)

In [67]:
y.shape

(1000,)

In [68]:
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

In [69]:
# Fit a decision tree classifier

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()

In [70]:
model.fit(X_train, y_train)

In [71]:
# Make predictions

y_pred = model.predict(X_test)

In [72]:
# Evaluate the model using a confusion matrix

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)

cm

array([[84, 71],
       [75, 70]])
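
To show how the rates mentioned earlier fall out of this matrix, here is a small sketch that unpacks the four counts by hand. It relies on scikit-learn's layout for binary labels (rows are true classes, columns are predicted classes, class 0 first). The variable names are illustrative, and the matrix is copied from the output above rather than recomputed.

```python
import numpy as np

# Confusion matrix copied from the output above.
# Rows: true class (0, 1); columns: predicted class (0, 1).
cm = np.array([[84, 71],
               [75, 70]])

# Flattening row by row gives: true negatives, false positives,
# false negatives, true positives
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all predictions that are correct
tpr = tp / (tp + fn)              # true positive rate (recall)
fpr = fp / (fp + tn)              # false positive rate
precision = tp / (tp + fp)        # share of predicted positives that are correct

print(f"Accuracy:  {accuracy:.4f}")
print(f"TPR:       {tpr:.4f}")
print(f"FPR:       {fpr:.4f}")
print(f"Precision: {precision:.4f}")
```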

In [73]:
# Accuracy:

from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')

# Recall (true positive rate):

from sklearn.metrics import recall_score

recall = recall_score(y_test, y_pred)
print(f'Recall: {recall}')

# Precision (share of predicted positives that are actually positive):

from sklearn.metrics import precision_score

precision = precision_score(y_test, y_pred)
print(f'Precision: {precision}')

Accuracy: 0.5133333333333333
Recall: 0.4827586206896552
Precision: 0.49645390070921985
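
The ROC curve mentioned at the start can be computed in the same way. The following is a minimal sketch, assuming `roc_curve` and `roc_auc_score` from `sklearn.metrics` and the tree's `predict_proba` output as the score. The `random_state=0` on the classifier is an assumption added here for repeatability and is not in the cells above, so the fitted tree may differ slightly from the one that produced the earlier outputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Recreate the synthetic classification data and split used above
np.random.seed(0)
X = np.random.rand(1000, 3)
y = np.random.randint(0, 2, 1000)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# random_state is an assumption added for repeatability in this sketch
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# ROC analysis needs a score per sample, not a hard label:
# use the predicted probability of the positive class (column 1)
y_score = model.predict_proba(X_test)[:, 1]

# Each threshold yields one (false positive rate, true positive rate) point
fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)

print(f"ROC points: {len(fpr)}")
print(f"AUC: {auc:.3f}")
```

Plotting `fpr` against `tpr` (for example with matplotlib) draws the curve. An AUC near 0.5 corresponds to guessing, which is what one would expect here because the labels were generated independently of the features. An unpruned decision tree typically outputs probabilities of 0 or 1, so the curve is likely to have only a few points.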
