## Seung Jun Choi in Urban Information Lab
### Model Evaluation

In [1]:
import sys
import numpy as np

print(f"Python version: {sys.version}")
print(f"NumPy version: {np.__version__}")

Python version: 3.12.11 (main, Jun  4 2025, 08:56:18) [GCC 11.4.0]
NumPy version: 2.0.2


In [2]:
# If you haven't installed scikit-learn yet...
!pip install scikit-learn



![image.png](attachment:image.png)

In [3]:
import sklearn

print(sklearn.__version__)

1.6.1


### How to evaluate your models?

![image.png](attachment:image.png)

### Regression model evaluation is easier (and more intuitive) than evaluating other kinds of models

All you have to do is calculate the loss between the model's predictions and the real (or validation) data.

![image.png](attachment:image.png)
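
As a minimal sketch of what "calculating the loss" can look like, the snippet below computes three common regression losses with NumPy. The choice of MAE, MSE, and RMSE is an assumption made for illustration (the notebook does not name a specific loss), and `y_true` and `y_pred` are made-up values, not output from any model here.

```python
import numpy as np

# Hypothetical ground-truth values and model predictions (illustrative only)
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

# Per-sample errors: [-1, 0, 2, 1]
errors = y_pred - y_true

mae = np.mean(np.abs(errors))   # mean absolute error: (1 + 0 + 2 + 1) / 4
mse = np.mean(errors ** 2)      # mean squared error:  (1 + 0 + 4 + 1) / 4
rmse = np.sqrt(mse)             # root mean squared error

print(f"MAE: {mae:.4f}, MSE: {mse:.4f}, RMSE: {rmse:.4f}")
```

The lower the loss, the closer the predictions are to the real values, which is why regression evaluation tends to feel more direct than classification evaluation.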

### Here we will focus more on Classification Model Evaluation

#### Import BaseEstimator from sklearn.
#### BaseEstimator helps you create customized dummy classifier classes
#### We normally use fit() to train a model; however, in the dummy classifier below fit() doesn't actually learn anything

In [4]:
import numpy as np
from sklearn.base import BaseEstimator

class MyDummyClassifier(BaseEstimator):
    # This dummy model predicts from the Sex feature alone, which is encoded as a binary dummy (1 vs. 0)
    def fit(self, X , y=None):
        pass


    # Predict 0 when the Sex feature is 1; otherwise predict 1
    def predict(self, X):
        pred = np.zeros( ( X.shape[0], 1 ))
        for i in range (X.shape[0]) :
            if X['Sex'].iloc[i] == 1:
                pred[i] = 0
            else :
                pred[i] = 1

        return pred
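
For context on what inheriting from `BaseEstimator` gives you, here is a small sketch. `ConstantClassifier` and its `constant` parameter are hypothetical names introduced only for this illustration. The point is that `BaseEstimator` supplies `get_params()` and `set_params()` based on the `__init__` signature, while `fit()` and `predict()` still have to be written by hand.

```python
import numpy as np
from sklearn.base import BaseEstimator

class ConstantClassifier(BaseEstimator):
    """Hypothetical dummy classifier that always predicts one fixed label."""

    def __init__(self, constant=0):
        self.constant = constant

    def fit(self, X, y=None):
        # Nothing is learned; returning self allows call chaining
        return self

    def predict(self, X):
        # One prediction per row, all equal to the fixed label
        return np.full(len(X), self.constant)

clf = ConstantClassifier(constant=1)
print(clf.get_params())        # parameters discovered from __init__

clf.set_params(constant=0)     # change the parameter after construction
X_toy = [[1], [2], [3]]        # made-up feature rows
print(clf.fit(X_toy).predict(X_toy))
```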


We will use the dummy classifier built on BaseEstimator above as our model.

Here I will be using the Titanic sample data, which can be downloaded from Kaggle.

# Preprocessing Data

In [5]:
#Install pandas in case you do not have it
!pip install pandas



In [6]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Preprocess null values
def fillna(df):
    df = df.fillna({
    'Age': df['Age'].mean(),
    'Cabin': 'N',
    'Embarked': 'N',
    'Fare': 0
    })
    return df

# delete features that are not needed
def drop_features(df):
    df.drop(['PassengerId','Name','Ticket'],axis=1,inplace=True)
    return df

# label-encode the string features as numeric codes
def format_features(df):
    df['Cabin'] = df['Cabin'].str[:1]
    features = ['Cabin','Sex','Embarked']
    for feature in features:
        le = LabelEncoder()
        le = le.fit(df[feature])
        df[feature] = le.transform(df[feature])
    return df

# call the feature transformation functions defined above
def transform_features(df):
    df = fillna(df)
    df = drop_features(df)
    df = format_features(df)
    return df

# 1. First, let's look at the accuracy of your model

In [7]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# load the data, preprocess it, and split it into training and test sets
titanic_df = pd.read_csv('./train.csv')
y_titanic_df = titanic_df['Survived']
X_titanic_df= titanic_df.drop('Survived', axis=1)
X_titanic_df = transform_features(X_titanic_df)
X_train, X_test, y_train, y_test=train_test_split(X_titanic_df, y_titanic_df, \
                                                  test_size=0.2, random_state=0)

# using the dummy classifier defined above
myclf = MyDummyClassifier()
myclf.fit(X_train ,y_train)

mypredictions = myclf.predict(X_test)
print('Dummy Classifier Accuracy: {0:.4f}'.format(accuracy_score(y_test , mypredictions)))

FileNotFoundError: [Errno 2] No such file or directory: './train.csv'

## What is Accuracy?

![image.png](attachment:image.png)

![image.png](attachment:image.png)
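
To make the definition concrete, here is a small sketch: accuracy is the fraction of predictions that match the true labels. The labels below are made up for illustration, and the "model" simply predicts 0 for everyone, which is an assumption chosen to show how accuracy behaves on imbalanced data.

```python
import numpy as np

# Hypothetical labels: 1 = positive, 0 = negative (8 negatives, 2 positives)
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# A "model" that always predicts the majority class
y_pred = np.zeros_like(y_true)

# Accuracy = number of correct predictions / number of predictions
accuracy = np.mean(y_true == y_pred)
print(f"Accuracy: {accuracy:.2f}")   # 8 of 10 correct
```

The score looks respectable even though not a single positive case was found, which is one reason accuracy alone can be misleading.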

### There are other indicators we use

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

### Choosing the evaluation score depends on your research objective. For instance, if you are modelling cancer classification, the risk of a false negative is higher than that of a false positive: a model that fails to detect cancer in a patient who actually has it makes the riskier error

![image.png](attachment:image.png)

### In this case Recall is a more significant indicator than Precision (missing a positive costs more than raising a false alarm)

![image.png](attachment:image.png)
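
As a minimal sketch of how the two scores are derived, the snippet below pulls TN, FP, FN, and TP out of scikit-learn's `confusion_matrix` and computes precision and recall by hand. The labels are made-up values for illustration only.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)   # of the predicted positives, how many are real?
recall = tp / (tp + fn)      # of the real positives, how many were caught?

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
print(f"Precision: {precision:.4f}, Recall: {recall:.4f}")

# The hand-computed values agree with scikit-learn's helpers
assert abs(precision - precision_score(y_true, y_pred)) < 1e-12
assert abs(recall - recall_score(y_true, y_pred)) < 1e-12
```

In the cancer example, the two false negatives are the missed patients, and they lower recall but leave precision untouched.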

## A decent model scores high on both recall and precision; however, the two are actually in a trade-off

# 2. Let's get both precision & recall

### I'm going to define some functions for convenience

In [None]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.base import BaseEstimator
from sklearn.metrics import accuracy_score
import numpy as np
import pandas as pd

class MyFakeClassifier(BaseEstimator):
    def fit(self,X,y):
        pass

    def predict(self,X):
        return np.zeros( (len(X), 1) , dtype=bool)

digits = load_digits()

print(digits.data)
print("### digits.data.shape:", digits.data.shape)
print(digits.target)
print("### digits.target.shape:", digits.target.shape)

In [None]:
y = (digits.target == 7).astype(int)
X_train, X_test, y_train, y_test = train_test_split( digits.data, y, random_state=11)
fakeclf = MyFakeClassifier()
fakeclf.fit(X_train , y_train)
fakepred = fakeclf.predict(X_test)

In [None]:
from sklearn.metrics import accuracy_score, precision_score , recall_score

print("Precision:", precision_score(y_test, fakepred))
print("Recall:", recall_score(y_test, fakepred))

## Below is the code to get the confusion matrix together with accuracy, precision, and recall

In [None]:
from sklearn.metrics import accuracy_score, precision_score , recall_score , confusion_matrix

def get_clf_eval(y_test , pred):
    confusion = confusion_matrix( y_test, pred)
    accuracy = accuracy_score(y_test , pred)
    precision = precision_score(y_test , pred)
    recall = recall_score(y_test , pred)
    print('Confusion Matrix')
    print(confusion)
    print('Accuracy: {0:.4f}, Precision: {1:.4f}, Recall: {2:.4f}'.format(accuracy , precision ,recall))

#### I'm going to use titanic data from kaggle

In [None]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split

# Because the dependent variable is binary, I'm using LogisticRegression
from sklearn.linear_model import LogisticRegression

# reload, preprocess
titanic_df = pd.read_csv('./train.csv')
y_titanic_df = titanic_df['Survived']
X_titanic_df= titanic_df.drop('Survived', axis=1)
X_titanic_df = transform_features(X_titanic_df)

X_train, X_test, y_train, y_test = train_test_split(X_titanic_df, y_titanic_df, \
                                                    test_size=0.20, random_state=11)

lr_clf = LogisticRegression()

lr_clf.fit(X_train , y_train)
pred = lr_clf.predict(X_test)
get_clf_eval(y_test , pred)

In [None]:
#install matplotlib in case you do not have it
!pip install matplotlib


And as I said, precision and recall are in a trade-off

In [None]:
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline

def precision_recall_curve_plot(y_test , pred_proba_c1):
    # get precision and recall ndarray per thresholds
    precisions, recalls, thresholds = precision_recall_curve( y_test, pred_proba_c1)

    # Set X-axis as thresholds; Y-axis for Precision & Recall
    plt.figure(figsize=(8,6))
    threshold_boundary = thresholds.shape[0]
    plt.plot(thresholds, precisions[0:threshold_boundary], linestyle='--', label='precision')
    plt.plot(thresholds, recalls[0:threshold_boundary],label='recall')

    # Plot X-axis in 0.1 interval
    start, end = plt.xlim()
    plt.xticks(np.round(np.arange(start, end, 0.1),2))

    # Set axis labels, legend, and grid
    plt.xlabel('Threshold value'); plt.ylabel('Precision and Recall value')
    plt.legend(); plt.grid()
    plt.show()

precision_recall_curve_plot( y_test, lr_clf.predict_proba(X_test)[:, 1] )


## 1) Method to increase your precision
#### Make the criteria for a positive prediction stricter. For instance, only diagnose cancer when you are 100% sure (older than 80, obesity, cancer cell size in the 99th percentile). The precision equation is TP / (TP + FP), so if you predict only one patient as positive and that patient really is positive, the score becomes 100%

## 2) Method to increase your recall
#### Diagnose every patient as positive. The recall equation is TP / (TP + FN): TN is not part of the metric and FN becomes 0, so the score becomes 100%

## But of course everything should be balanced...
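
The two extreme strategies can be sketched on toy data as follows. The labels and both prediction lists are made up for illustration, and "strict" and "flag everyone" are just names for the two approaches described above.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical ground truth: 3 actual positives, 5 actual negatives
y_true = [1, 1, 1, 0, 0, 0, 0, 0]

# Strategy 1: predict positive only for the single most certain case
pred_strict = [1, 0, 0, 0, 0, 0, 0, 0]

# Strategy 2: predict positive for everyone
pred_all = [1] * len(y_true)

for name, pred in [("strict", pred_strict), ("flag everyone", pred_all)]:
    p = precision_score(y_true, pred)
    r = recall_score(y_true, pred)
    print(f"{name:>13}: precision={p:.3f}, recall={r:.3f}")
```

The strict strategy reaches perfect precision but catches only one of three positives, while flagging everyone reaches perfect recall at a precision of 3/8. Neither model is useful on its own, which is why the two scores need to be read together.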

# 3. F1 Score: Mixture of Precision & Recall

In [None]:
from sklearn.metrics import f1_score
f1 = f1_score(y_test , pred)
print('F1 Score: {0:.4f}'.format(f1))

![image.png](attachment:image.png)
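
The F1 score is the harmonic mean of precision and recall, F1 = 2 * P * R / (P + R). The sketch below uses a hypothetical helper, `f1_from`, and made-up precision/recall pairs to show that a lopsided pair is penalized even when the plain average is the same.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Two hypothetical models with the same arithmetic mean (0.6)
balanced = f1_from(0.6, 0.6)   # precision and recall are equal
lopsided = f1_from(1.0, 0.2)   # perfect precision, poor recall

print(f"balanced F1: {balanced:.4f}")
print(f"lopsided F1: {lopsided:.4f}")
```

Because the harmonic mean is pulled toward the smaller value, a model cannot reach a high F1 by maximizing only one of the two scores.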

In [None]:
def get_clf_eval(y_test , pred):
    confusion = confusion_matrix( y_test, pred)
    accuracy = accuracy_score(y_test , pred)
    precision = precision_score(y_test , pred)
    recall = recall_score(y_test , pred)
    # Adding F1 Score
    f1 = f1_score(y_test,pred)
    print('Confusion Matrix')
    print(confusion)
    # f1 score print
    print('Accuracy: {0:.4f}, Precision: {1:.4f}, Recall: {2:.4f}, F1:{3:.4f}'.format(accuracy, precision, recall, f1))

from sklearn.preprocessing import Binarizer

def get_eval_by_threshold(y_test , pred_proba_c1, thresholds):
    # evaluate the predictions at each threshold in turn
    for custom_threshold in thresholds:
        binarizer = Binarizer(threshold=custom_threshold).fit(pred_proba_c1)
        custom_predict = binarizer.transform(pred_proba_c1)
        print('thresholds:',custom_threshold)
        get_clf_eval(y_test , custom_predict)

thresholds = [0.4 , 0.45 , 0.50 , 0.55 , 0.60]
pred_proba = lr_clf.predict_proba(X_test)
get_eval_by_threshold(y_test, pred_proba[:,1].reshape(-1,1), thresholds)


#### At threshold 0.6 the F1 score is at its highest; however, do note that the recall score lags behind that of the other thresholds

## Lastly, the ROC Curve

![image.png](attachment:image.png)

In [None]:
from sklearn.metrics import roc_curve

def roc_curve_plot(y_test , pred_proba_c1):
    # calculate TPR, FPR per thresholds
    fprs , tprs , thresholds = roc_curve(y_test ,pred_proba_c1)

    # Plot data
    plt.plot(fprs , tprs, label='ROC')
    # Diagonal reference line (a random classifier)
    plt.plot([0, 1], [0, 1], 'k--', label='Random')

    # Axis ticks, limits, labels, and legend
    start, end = plt.xlim()
    plt.xticks(np.round(np.arange(start, end, 0.1),2))
    plt.xlim(0,1); plt.ylim(0,1)
    plt.xlabel('FPR( 1 - Specificity )'); plt.ylabel('TPR( Recall )')
    plt.legend()
    plt.show()

roc_curve_plot(y_test, lr_clf.predict_proba(X_test)[:, 1] )
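
To see where a single point on the ROC curve comes from, here is a minimal sketch that computes TPR and FPR by hand at a few thresholds. The labels, scores, and the helper name `tpr_fpr` are made up for illustration. TPR is TP / (TP + FN), the same as recall, and FPR is FP / (FP + TN), which equals 1 - specificity.

```python
import numpy as np

# Hypothetical true labels and predicted probabilities for the positive class
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
scores = np.array([0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.2, 0.1])

def tpr_fpr(y_true, scores, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted positive."""
    pred = scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    fn = np.sum(~pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    tn = np.sum(~pred & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

for t in (0.25, 0.5, 0.75):
    tpr, fpr = tpr_fpr(y_true, scores, t)
    print(f"threshold={t:.2f}: TPR={tpr:.3f}, FPR={fpr:.3f}")
```

Lowering the threshold moves the point up and to the right (more positives caught, more false alarms); sweeping over all thresholds traces the full curve.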


## You can actually calculate the area under the curve (AUC)

![image.png](attachment:image.png)

In [None]:
from sklearn.metrics import roc_auc_score

pred_proba = lr_clf.predict_proba(X_test)[:, 1]
roc_score = roc_auc_score(y_test, pred_proba)
print('ROC AUC value: {0:.4f}'.format(roc_score))

### There is also something called the PR Curve

![image.png](attachment:image.png)

![image.png](attachment:image.png)

## How should I calculate PR AUC?

### Well, I'll leave it as homework

![image.png](attachment:image.png)