## 4. Evaluating a machine learning model

Three ways to evaluate Scikit-Learn models/estimators: 

1. Estimator's built-in `score()` method
2. The `scoring` parameter
3. Problem-specific metric functions
    
You can read more about these here: https://scikit-learn.org/stable/modules/model_evaluation.html 

### 4.1 Evaluating a model with the `score` method

In [None]:
# Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# Get the data (be sure to click "raw") - https://github.com/mrdbourke/zero-to-mastery-ml/blob/master/data/heart-disease.csv 
heart_disease = pd.read_csv("https://raw.githubusercontent.com/mrdbourke/zero-to-mastery-ml/master/data/heart-disease.csv")
heart_disease.head()

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Setup random seed
np.random.seed(42)

# Make the data
X = heart_disease.drop("target", axis=1)
y = heart_disease["target"]

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Instantiate Random Forest Classifier
clf = RandomForestClassifier(n_estimators=100)

# Fit the model to the data (training the machine learning model)
clf.fit(X_train, y_train)

In [None]:
# score() returns a value between 0.0 (lowest) and 1.0 (highest)
clf.score(X_train, y_train)
# The default score() metric for a classification estimator is mean accuracy.
# Multiply by 100 for a percentage: 1.0 is 100% accuracy, 0.8 is 80% accuracy.


In [None]:
# Why does our model score 1.0 on the training data?
# It has already seen all of the training features and all of the training labels.
# If the model is powerful enough, it can achieve a perfect score on the training data.
# In our case the labels are binary (0 and 1), so it has learned to map every row of X_train to the matching value in y_train.

In [None]:
clf.score(X_test , y_test)

In [None]:
# The test data is a different scenario: the model hasn't seen these samples.
# The goal is to learn patterns from the data we have so the model makes quality predictions on data it hasn't seen before.
# The score is usually lower on unseen data.

In [None]:
# The training score is usually higher than the test score, sometimes by quite a bit, but the two should be relatively close.
# If you ever get a perfect test score (100% accuracy), be skeptical and go back and check your data.
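The gap between training and test scores can be reproduced on a small scale. The sketch below is only an illustration: it uses synthetic data from `make_classification` as a stand-in for the heart disease data, and the names `X_demo`, `y_demo` and `demo_clf` are made up for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the heart disease dataset)
X_demo, y_demo = make_classification(n_samples=300,
                                     n_features=13,
                                     n_informative=5,
                                     flip_y=0.1,
                                     random_state=42)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.2,
                                          random_state=42)

demo_clf = RandomForestClassifier(n_estimators=100, random_state=42)
demo_clf.fit(X_tr, y_tr)

# Score on data the model has seen versus data it has not seen
train_score = demo_clf.score(X_tr, y_tr)
test_score = demo_clf.score(X_te, y_te)

print(f"Training accuracy: {train_score * 100:.2f}%")
print(f"Test accuracy: {test_score * 100:.2f}%")
```

Comparing the two printed values side by side is a quick check for overfitting: a large gap suggests the model has memorised the training data.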

Let's use the `score()` method on our regression problem.

In [None]:
# Get California Housing dataset
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
housing

In [None]:

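# Put the features in a DataFrame and add the target column
housing_df = pd.DataFrame(housing["data"], columns=housing["feature_names"])
housing_df["target"] = housing["target"]
housing_df.head()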

In [None]:
from sklearn.ensemble import RandomForestRegressor

np.random.seed(42)

# Create the data
X = housing_df.drop("target", axis=1)
y = housing_df["target"]

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create model instance
model = RandomForestRegressor(n_estimators=1000)

# Fit the model to the data
model.fit(X_train, y_train)

In [None]:
# To see which metric score() uses, check its docstring (Shift + Tab in Jupyter)
model.score(X_test, y_test)

In [None]:
# If a model predicted this mean value for every sample, its R^2 score would be 0
y_test.mean()

# score() is a quick way to check how a model is doing. It returns a default evaluation metric that depends on the problem:
# for regression it returns the coefficient of determination (R^2), for classification it returns the mean accuracy.
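The claim about predicting the mean can be checked directly with `r2_score`. This is a minimal sketch on a handful of made-up numbers (the arrays are assumptions for illustration, not the housing data).

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy target values (made up for illustration)
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# A "model" that predicts the mean of the targets for every sample
y_mean_preds = np.full(len(y_true_demo), y_true_demo.mean())

r2_mean = r2_score(y_true_demo, y_mean_preds)    # predicting the mean
r2_perfect = r2_score(y_true_demo, y_true_demo)  # predicting every value exactly

print(f"R^2 when predicting the mean: {r2_mean}")
print(f"R^2 for perfect predictions: {r2_perfect}")
```

Predicting the mean gives an R^2 of 0 and perfect predictions give 1, so any useful regression model should land somewhere between the two.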

However, when you get further into a problem, it's likely you'll want to start using some more powerful metrics to evaluate your model's performance.

### 4.2 Evaluating a model using the `scoring` parameter

In [None]:
# Import cross_val_score from the model_selection module
from sklearn.model_selection import cross_val_score

# Import the RandomForestClassifier model class from the ensemble module
from sklearn.ensemble import RandomForestClassifier

# Setup random seed
np.random.seed(42)

# Split the data into X (features/data) and y (target/labels)
X = heart_disease.drop("target", axis=1)
y = heart_disease["target"]

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Instantiate the model (on the training set)
clf = RandomForestClassifier()

# Call the fit method on the model and pass it training data
clf.fit(X_train, y_train);

In [None]:
# Using score()
clf.score(X_test, y_test)

In [None]:
# Using cross_val_score()
cross_val_score(clf, X, y)

In [None]:
# cross_val_score() returns an array of scores, while score() returns a single number

Why does `cross_val_score()` return an array? By default it performs 5-fold cross-validation: it makes 5 different splits of the data, trains the model on 5 different versions of the training data and evaluates it on 5 different versions of the test data.

What's the purpose of this?

If we only train one model on one split, the split could be a lucky one. Say the 80% of rows (patient records) in the training set happen to be very informative and the test set happens to contain easy records. The model gets a really good score, but that score isn't a true reflection of how well it has figured out the patterns in the data. We'd think our model is better than it is.

This is where cross-validation comes into play. It addresses the problem of not training on all of the data and avoids lucky scores from a single split. With 5 different splits, the model ends up being trained on all of the data and evaluated on all of the data.

That's why we get back an array of 5 scores, one per fold. You can use a different number of folds (even 100), but the recommended value is 5.
<img src = "./Screenshot (87).png" />
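To see how the folds cover the data, here is a small sketch using `KFold` on ten toy samples. It is only an illustration: the array `X_folds_demo` is made up, and plain `KFold` is used for simplicity, whereas `cross_val_score()` with an integer `cv` uses stratified folds for classifiers.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy samples with two features each (made up for illustration)
X_folds_demo = np.arange(20).reshape(10, 2)

kfold = KFold(n_splits=5)

# Count how many times each sample lands in a test fold
test_counts = np.zeros(len(X_folds_demo), dtype=int)

for fold, (train_idx, test_idx) in enumerate(kfold.split(X_folds_demo), start=1):
    test_counts[test_idx] += 1
    print(f"Fold {fold}: train on {len(train_idx)} samples, test on samples {test_idx}")

print(f"Times each sample was used for testing: {test_counts}")
```

Every sample is used for testing exactly once and for training in the other four folds, which is what removes the dependence on one lucky split.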

In [None]:
# Using cross_val_score()
cross_val_score(clf, X, y , cv = 10)

The `cv` parameter sets the number of folds. With `cv=5` (5-fold cross-validation, the default) we get back 5 different scores instead of 1; with `cv=10`, as above, we get back 10.

Taking the mean of this array condenses the scores into one number and gives a more reliable idea of how our model is performing than a single split does.

In the comparison that follows, the average `cross_val_score()` is slightly lower than the single value returned by `score()`.



In [None]:

np.random.seed(42)

# Single training and test split score
clf_single_score = clf.score(X_test, y_test)

# Take mean of 5-fold cross-validation
clf_cross_val_score = np.mean(cross_val_score(clf, X, y, cv=5))

#Compare the two
clf_single_score, clf_cross_val_score

In this case, if you were asked to report the accuracy of your model, you'd prefer the cross-validated metric over the single-split metric, even though it is lower.

In [None]:
# The scoring parameter is set to None by default
cross_val_score(clf, X, y, cv=5, scoring=None) # default scoring

# When scoring=None, cross_val_score() uses the estimator's default scorer (its score() method), if available.
# The default score() metric of a classifier is mean accuracy, which is how we know these values are accuracies.

# The values may differ slightly from the earlier cross_val_score() results because no random seed is set in this cell.
# With the same seed we would see similar values.

In [None]:
# Default scoring metric of a classifier = mean accuracy
clf.score(X_test, y_test)
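One way to confirm that `scoring=None` means accuracy for a classifier is to compare it against `scoring="accuracy"` with the randomness pinned down. The sketch below assumes synthetic data and a fixed `random_state` on the estimator so both runs build identical forests; `X_scoring_demo` and `scoring_demo_clf` are illustrative names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption, not the heart disease dataset)
X_scoring_demo, y_scoring_demo = make_classification(n_samples=200,
                                                     n_features=10,
                                                     random_state=42)

# Fixing random_state on the estimator makes both runs build the same models
scoring_demo_clf = RandomForestClassifier(n_estimators=50, random_state=42)

default_scores = cross_val_score(scoring_demo_clf, X_scoring_demo, y_scoring_demo,
                                 cv=5, scoring=None)
accuracy_scores = cross_val_score(scoring_demo_clf, X_scoring_demo, y_scoring_demo,
                                  cv=5, scoring="accuracy")

print(default_scores)
print(accuracy_scores)
print(np.allclose(default_scores, accuracy_scores))
```

The same pattern works for other metrics: pass a different string such as `"precision"` or `"recall"` to `scoring` to change what each fold reports.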

In [None]:
# Next we'll look at some other classification evaluation metrics we can use with cross_val_score()

So why do we use cross-validation?

As we saw in the picture above, cross-validation addresses the problem of not training on all of the data. It creates 5 models, so that between them the model is trained on all of the data, and it avoids the lucky scores you can get from a single split. We saw this in action: the single `clf.score()` value is slightly higher than the cross-validated average.

### 4.2.1 Classification model evaluation metrics

Four of the main evaluation metrics/methods you'll come across for classification models are:

1. Accuracy
2. Area under ROC curve
3. Confusion matrix
4. Classification report

Let's have a look at each of these. We'll bring down the classification code from above to go through some examples.

#### Accuracy

In [None]:
heart_disease.head()

In [None]:
# Import cross_val_score from the model_selection module
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

import numpy as np
np.random.seed(42)

X = heart_disease.drop("target", axis=1)
y = heart_disease["target"]



clf = RandomForestClassifier(n_estimators=100)
cv_scores = cross_val_score(clf, X, y, cv=5)


In [None]:
np.mean(cv_scores)

In [None]:
# Accuracy as percentage
print(f"Heart Disease Classifier Cross-Validated Accuracy: {np.mean(cv_scores) * 100:.2f}%")

In [None]:
# Accuracy answers: given a sample the model hasn't seen before, how likely is it to predict the right label?
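Accuracy is simply the fraction of predictions that match the true labels. The sketch below computes it by hand and with `accuracy_score` on ten made-up labels (the arrays are assumptions for illustration only).

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up true labels and predictions for ten samples
y_true_acc = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred_acc = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

# Fraction of predictions that equal the true label
manual_accuracy = np.mean(y_true_acc == y_pred_acc)

# The same calculation using Scikit-Learn
sklearn_accuracy = accuracy_score(y_true_acc, y_pred_acc)

print(f"Manual accuracy: {manual_accuracy:.2f}")
print(f"accuracy_score: {sklearn_accuracy:.2f}")
```

Two of the ten predictions are wrong, so both calculations give 0.8, or 80% accuracy.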

In [None]:
# We've covered accuracy, so why not leave it there? As we go through the other metrics you'll start to see
# why it's important to use a few different evaluation metrics rather than accuracy alone.


#### Area Under the Receiver Operating Characteristic (ROC) Curve (AUC)

If this one sounds like a mouthful, it's because reading the full name is.

It's usually referred to as AUC for Area Under Curve, and the curve in question is the Receiver Operating Characteristic, or ROC for short.

So if you hear someone talking about AUC or ROC, they're probably talking about what follows.

A ROC curve is a comparison of a model's true positive rate (TPR) versus its false positive rate (FPR).

For clarity:
* True positive = model predicts 1 when truth is 1
* False positive = model predicts 1 when truth is 0
* True negative = model predicts 0 when truth is 0
* False negative = model predicts 0 when truth is 1
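The four outcomes above are what the two rates on a ROC curve are built from. As a rough sketch on made-up labels (the arrays and variable names are assumptions for illustration), the true positive rate is TP / (TP + FN) and the false positive rate is FP / (FP + TN):

```python
import numpy as np

# Made-up labels: four positive samples followed by six negative samples
y_true_rates = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred_rates = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# Count each of the four outcomes
tp = np.sum((y_pred_rates == 1) & (y_true_rates == 1))  # predicts 1, truth is 1
fp = np.sum((y_pred_rates == 1) & (y_true_rates == 0))  # predicts 1, truth is 0
tn = np.sum((y_pred_rates == 0) & (y_true_rates == 0))  # predicts 0, truth is 0
fn = np.sum((y_pred_rates == 0) & (y_true_rates == 1))  # predicts 0, truth is 1

true_positive_rate = tp / (tp + fn)   # share of actual positives found
false_positive_rate = fp / (fp + tn)  # share of actual negatives wrongly flagged

print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")
print(f"TPR: {true_positive_rate:.2f}, FPR: {false_positive_rate:.2f}")
```

A ROC curve repeats this calculation at many probability thresholds and plots the resulting (FPR, TPR) pairs.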

Now we know this, let's see one. Scikit-Learn lets you calculate the information required for a ROC curve using the [`roc_curve`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve) function.


In [None]:
#Create X_test ... etc
X_train , X_test , y_train , y_test = train_test_split(X , y , test_size=0.2)

In [None]:
from sklearn.metrics import roc_curve

# Fit the classifier
clf.fit(X_train , y_train)

# Make predictions with probabilities
y_probs = clf.predict_proba(X_test)

y_probs[:10], len(y_probs)


# ROC curves are a comparison of true positive rate (tpr) versus false positive rate (fpr)

In [None]:
# Keep the probabilities of the positive class only


y_probs_positive = y_probs[:, 1]
y_probs_positive[:10]

In [None]:
# Calculate fpr, tpr and thresholds
fpr, tpr, thresholds = roc_curve(y_test, y_probs_positive)

# Check the false positive rate
fpr  

In [None]:
# The raw arrays are hard to interpret on their own; plotting them as a ROC curve makes more sense.
# Here's an example of a ROC curve plotting function.

import matplotlib.pyplot as plt

def plot_roc_curve(fpr, tpr):
    """
    Plots a ROC curve given the false positive rate (fpr) and 
    true positive rate (tpr) of a classifier.
    """
    # Plot ROC curve
    plt.plot(fpr, tpr, color='orange', label='ROC')
    # Plot line with no predictive power (baseline)
    plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--', label='Guessing')
    # Customize the plot
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC) Curve')
    plt.legend()
    plt.show()
    
plot_roc_curve(fpr, tpr) 

# On this curve, once the false positive rate reaches about 0.6 the true positive rate is 1.0.
# The maximum value on either axis is 1.0. The dashed line from corner to corner is a model that is just guessing.
# Can you guess where the ideal ROC curve would sit? Our model is doing far better than guessing, at roughly 80-85%.

In [19]:

# What is the AUC score?
# AUC stands for area under the curve: ignoring the guessing line, it is the area underneath the ROC curve.
# The maximum value is 1.0.
from sklearn.metrics import roc_auc_score

roc_auc_score(y_test, y_probs_positive)

In [20]:
# Let's check the perfect ROC curve. As mentioned, the AUC can go up to 1.0,
# and for a perfect curve the area under it is exactly 1.0.
# In reality a perfect ROC curve is very unlikely: it means a perfect model with no false positives, only true positives.
# Plot perfect ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_test)
plot_roc_curve(fpr, tpr)
# The main thing a ROC curve shows is the true positive rate versus the false positive rate.
# To boil the curve down to a single number, use the AUC score.

In [21]:
# Perfect ROC AUC score
roc_auc_score(y_test, y_test)

In reality, a perfect ROC curve is unlikely.
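To connect the curve and the score, the sketch below computes the area under a ROC curve in two ways: from the `fpr` and `tpr` arrays with `sklearn.metrics.auc`, and directly with `roc_auc_score`. The four labels and scores are made up for illustration, and the `*_demo` names are chosen so they don't overwrite the notebook's variables.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Made-up true labels and predicted probabilities of the positive class
y_true_auc = np.array([0, 0, 1, 1])
y_scores_auc = np.array([0.1, 0.4, 0.35, 0.8])

# Points on the ROC curve
fpr_demo, tpr_demo, thresholds_demo = roc_curve(y_true_auc, y_scores_auc)

# Area under those points (trapezoidal rule)
area_from_curve = auc(fpr_demo, tpr_demo)

# The same area computed directly from labels and scores
area_direct = roc_auc_score(y_true_auc, y_scores_auc)

# Scoring the labels against themselves gives the perfect case
area_perfect = roc_auc_score(y_true_auc, y_true_auc)

print(f"AUC from curve: {area_from_curve}")
print(f"AUC from roc_auc_score: {area_direct}")
print(f"Perfect AUC: {area_perfect}")
```

Both approaches give the same number, which is why `roc_auc_score` is usually all you need when you only want the summary value.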

#### Confusion matrix
The next way to evaluate a classification model is by using a [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix). 

A confusion matrix is a quick way to compare the labels a model predicts and the actual labels it was supposed to predict. In essence, giving you an idea of where the model is getting confused.

In [22]:
from sklearn.metrics import confusion_matrix

y_preds = clf.predict(X_test)

confusion_matrix(y_test, y_preds)

Again, this is probably easier visualized.

One way to do it is with `pd.crosstab()`.

In [23]:
# Visualize the confusion matrix with pd.crosstab()
pd.crosstab(y_test, 
            y_preds, 
            rownames=["Actual Label"], 
            colnames=["Predicted Label"])

In [24]:
# The rows are the actual labels and the columns are the predicted labels (0 and 1).
# Where the actual label is 0 and the predicted label is 0 we have 24 examples;
# where the actual label is 1 and the predicted label is 1 we have 29 examples.


In [25]:
24 + 5 + 3 + 29

61

In [26]:
len(y_preds) # how many predictions did we make? 61, because there are 61 test samples

In [27]:
# The test set has 61 examples, so we made 61 predictions
len(y_test)

In [28]:
# The off-diagonal cells are the mistakes: in 5 examples the actual label is 0 but the model predicted 1,
# and in 3 examples the actual label is 1 but the model predicted 0.

This is where the name confusion matrix comes from: these are the examples where our model gets confused, predicting 0 when the actual label is 1 or predicting 1 when the actual label is 0.

<img src = "./Screenshot (88).png" />
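For a binary problem, the four cells of the confusion matrix can be pulled out by name. The sketch below uses made-up labels (an assumption for illustration) and relies on Scikit-Learn's layout, where rows are actual labels and columns are predicted labels, so flattening the matrix gives the order TN, FP, FN, TP.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up actual and predicted labels for eight samples
y_true_cm = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred_cm = np.array([0, 1, 0, 1, 0, 1, 1, 0])

demo_conf_mat = confusion_matrix(y_true_cm, y_pred_cm)
print(demo_conf_mat)

# Rows are actual labels, columns are predicted labels,
# so flattening gives: true negatives, false positives, false negatives, true positives
tn_cm, fp_cm, fn_cm, tp_cm = demo_conf_mat.ravel()

print(f"True negatives: {tn_cm}, False positives: {fp_cm}")
print(f"False negatives: {fn_cm}, True positives: {tp_cm}")
```

The diagonal (true negatives and true positives) holds the correct predictions, and the off-diagonal cells are where the model is confused.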

We can make our confusion matrix more visual with Seaborn's heatmap. Seaborn is a visualization library built on top of Matplotlib, and it is relatively easy to use.

In [29]:
# How to install a conda package into the current environment from a Jupyter notebook
import sys # gives access to the current Python environment (e.g. sys.prefix)
!conda install --yes --prefix {sys.prefix} seaborn

In [30]:
# Make our confusion matrix more visual with Seaborn's heatmap()

import seaborn as sns

# Set the font scale
sns.set(font_scale=1.5)

# Create a confusion matrix
conf_mat = confusion_matrix(y_test, y_preds)

# Plot it using Seaborn
sns.heatmap(conf_mat);

#### Creating a confusion matrix using Scikit-Learn

To use the newer Scikit-Learn methods for creating a confusion matrix you will need Scikit-Learn version 1.0+.


In [None]:
import sklearn
sklearn.__version__

Scikit-Learn has multiple implementations for plotting confusion matrices:

* `sklearn.metrics.ConfusionMatrixDisplay.from_estimator(estimator, X, y)` - this takes a fitted estimator (like our `clf` model), features (`X`) and labels (`y`). It uses the trained estimator to make predictions on `X` and compares the predictions to `y` by displaying a confusion matrix.
* `sklearn.metrics.ConfusionMatrixDisplay.from_predictions(y_true, y_pred)` - this takes truth labels and predicted labels and compares them by displaying a confusion matrix.

**Note:** Both of these methods require Scikit-Learn 1.0+. To check your version, run `import sklearn` followed by `sklearn.__version__`, as in the cell above. If you don't have 1.0+, you can upgrade at: https://scikit-learn.org/stable/install.html

In [None]:
#need trained or fitted estimator
clf

In [None]:
from sklearn.metrics import ConfusionMatrixDisplay

ConfusionMatrixDisplay.from_estimator(estimator=clf, X=X, y=y);

# from_estimator() makes the predictions for you from the fitted estimator, so you don't need predictions ready.
# from_predictions() takes y_true and y_pred, just like pd.crosstab() earlier, so the predictions must already exist.

In [None]:
# Plot confusion matrix from predictions
ConfusionMatrixDisplay.from_predictions(y_true=y_test, 
                                        y_pred=y_preds);

#### Classification report

The final major metric you should consider when evaluating a classification model is a classification report.

A classification report is more so a collection of metrics rather than a single one.

You can create a classification report using Scikit-Learn's [`classification_report()`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html) function.

Let's see one.

In [None]:
from sklearn.metrics import classification_report

# Again, compare the true labels of the data with the predictions our model has made
print(classification_report(y_test, y_preds))

<img src = "./Screenshot (90).png" />

In [None]:

# When should you use each of these metrics? Why not use accuracy alone?
# Relying only on accuracy is a trap I fell into when I first started building classification models:
# "my model is getting 99% accuracy, it must be a great model". Let's look at a scenario.

# Where precision and recall become valuable
# (in fact, all the metrics in the classification report become valuable here)
disease_true = np.zeros(10000)
disease_true[0] = 1 # only one positive case in 10,000

disease_preds = np.zeros(10000) # the model predicts 0 for every case, so it misses the one positive case

pd.DataFrame(classification_report(disease_true, 
                                   disease_preds, 
                                   output_dict=True,
                                   zero_division=0))


# This is a prime example of when to use a metric other than accuracy: a very large class imbalance.
# In disease_true only one example has the label 1 and every other label is 0.
# With a single positive sample it's really hard to learn a pattern, so the model predicts 0 for every case.
# Measured on accuracy alone, that model scores 0.9999, or 99.99%.
# Ask yourself: although the model achieves 99.99% accuracy, is it still useful? That's why we look at other metrics.
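The same imbalanced scenario can be broken down metric by metric. This sketch rebuilds the arrays under different names (`imbalanced_true` and `all_zero_preds` are illustrative) and passes `zero_division=0` so that precision is reported as 0 when there are no positive predictions, rather than raising a warning.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# One positive case in 10,000 and a model that predicts 0 for everything
imbalanced_true = np.zeros(10000)
imbalanced_true[0] = 1
all_zero_preds = np.zeros(10000)

imbalanced_accuracy = accuracy_score(imbalanced_true, all_zero_preds)
imbalanced_precision = precision_score(imbalanced_true, all_zero_preds, zero_division=0)
imbalanced_recall = recall_score(imbalanced_true, all_zero_preds, zero_division=0)
imbalanced_f1 = f1_score(imbalanced_true, all_zero_preds, zero_division=0)

print(f"Accuracy: {imbalanced_accuracy:.4f}")
print(f"Precision: {imbalanced_precision:.4f}")
print(f"Recall: {imbalanced_recall:.4f}")
print(f"F1: {imbalanced_f1:.4f}")
```

Accuracy looks excellent, while precision, recall and F1 for the positive class are all 0: the model never finds the one case that matters.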

### 4.2.2 Regression model evaluation metrics

Similar to classification, there are [several metrics you can use to evaluate your regression models](https://scikit-learn.org/stable/modules/model_evaluation.html#regression-metrics).

We'll check out the following.

1. **R^2 (pronounced r-squared) or coefficient of determination** - Compares your model's predictions to the mean of the targets. Values can range from negative infinity (a very poor model) to 1. For example, if all your model does is predict the mean of the targets, its R^2 value would be 0. And if your model perfectly predicts a range of numbers, its R^2 value would be 1.
2. **Mean absolute error (MAE)** - The average of the absolute differences between predictions and actual values. It gives you an idea of how wrong your predictions were.
3. **Mean squared error (MSE)** - The average squared differences between predictions and actual values. Squaring the errors removes negative errors. It also amplifies outliers (samples which have larger errors).
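The three definitions above can be written out by hand and checked against Scikit-Learn. The sketch below uses four made-up target values and predictions (assumptions for illustration, not the housing data).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up targets and predictions
y_true_reg = np.array([3.0, 5.0, 2.5, 7.0])
y_pred_reg = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_pred_reg - y_true_reg

# MAE: average of the absolute errors
mae_manual = np.mean(np.abs(errors))

# MSE: average of the squared errors (larger errors count for more)
mse_manual = np.mean(errors ** 2)

# R^2: 1 minus (squared error of the model / squared error of predicting the mean)
ss_residual = np.sum(errors ** 2)
ss_total = np.sum((y_true_reg - y_true_reg.mean()) ** 2)
r2_manual = 1 - ss_residual / ss_total

print(f"MAE: {mae_manual} (sklearn: {mean_absolute_error(y_true_reg, y_pred_reg)})")
print(f"MSE: {mse_manual} (sklearn: {mean_squared_error(y_true_reg, y_pred_reg)})")
print(f"R^2: {r2_manual:.4f} (sklearn: {r2_score(y_true_reg, y_pred_reg):.4f})")
```

Writing the metrics out this way makes the difference visible: the single error of 1.5 contributes 1.5 to the MAE sum but 2.25 to the MSE sum.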

Let's see them in action. First, we'll bring down our regression model code again.

In [None]:
# Import the RandomForestRegressor model class from the ensemble module
from sklearn.ensemble import RandomForestRegressor

# Setup random seed
np.random.seed(42)

# Create the data
X = housing_df.drop("target", axis=1)
y = housing_df["target"]

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Instantiate and fit the model (on the training set)
model = RandomForestRegressor()
model.fit(X_train, y_train);

In [None]:
**R^2 Score (coefficient of determination)**

Once you've got a trained regression model, the default evaluation metric in the `score()` function is R^2.

In [None]:
# Calculate the model's R^2 score
model.score(X_test, y_test)