# Credit Card Fraud Detection: Using Decision Tree



In [2]:

# Importing Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier

# from mlxtend.plotting import plot_learning_curves
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, accuracy_score, classification_report
from sklearn.model_selection import KFold, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer, matthews_corrcoef

import warnings
warnings.filterwarnings("ignore")

## Self-built Decision Tree Classifier

A decision tree is a machine learning algorithm used for both classification and regression tasks. It is a tree-like model in which internal nodes represent a decision based on the values of one or more input features, branches represent the possible outcomes of that decision, and leaves represent the final output or class. We built a custom class called _DecisionTreeClassifier_ to train on and predict values for our data set. Its constructor takes two optional parameters: max_depth (default 5), which controls the maximum depth of the decision tree, and min_samples_split (default 2), which controls the minimum number of samples required to split an internal node.

The _fit()_ function takes the training data X and its corresponding labels y, and trains the model by recursively building the decision tree with the _\_build_tree()_ function.

The _predict()_ function takes an input array X and returns an array of predictions from the trained decision tree, calling _\_predict_one()_ on each row.

The _\_build_tree()_ function is the heart of the algorithm and builds the decision tree recursively. It first checks the stopping criterion, which is met when any of the following holds: the depth of the tree has reached the maximum allowed, only one class label is left in the data, or the number of samples is less than the minimum required to split an internal node. If any of these conditions is true, the method returns a leaf node with the most frequent class label.

Otherwise, it selects the best feature and threshold with the _\_best_split()_ function. It then splits the data into two subsets (samples whose feature value is below the threshold go left, the rest go right) and recursively builds a subtree for each subset. If either subset is empty, it returns a leaf node instead. Finally, it returns an internal node holding the selected feature and threshold together with the two subtrees.

The _\_best_split()_ function evaluates every combination of feature and threshold (each unique value of the feature), computes the weighted Gini impurity of the resulting split, and returns the feature and threshold with the lowest impurity. Because the parent node's impurity is the same for every candidate, this is equivalent to choosing the split with the largest impurity decrease.

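The _\_gini_impurity()_ function calculates the impurity of a split under the Gini criterion: the Gini impurity of each side (one minus the sum of squared class proportions), weighted by that side's share of the samples.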
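As a rough illustration of this criterion, the sketch below computes the Gini impurity of a list of labels and the size-weighted impurity of a candidate split, then scans the thresholds of a single toy feature. The helper names (`gini`, `weighted_gini`) and the toy values are assumptions made for this example; they mirror the idea behind the class's methods but are not the class's own code.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of one group: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a split: each side's Gini weighted by its share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy single-feature data (hypothetical values, not taken from the dataset)
values = [1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 1, 1]

# Try every observed value as a threshold, with the same "<" / ">=" rule as the class
candidates = {}
for threshold in sorted(set(values)):
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    if left and right:  # skip splits that leave one side empty
        candidates[threshold] = weighted_gini(left, right)

best_threshold = min(candidates, key=candidates.get)
print(candidates)
print(best_threshold)
```

Here the threshold 3.0 separates the two classes perfectly, so its weighted impurity is 0 and it is selected.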

The _\_decision_node()_ function creates an internal node representing a decision based on the selected feature and threshold, as well as the two subtrees.

The _\_leaf_node()_ function creates a leaf node representing the most frequent class label in the data subset.

The _\_predict_one()_ function takes an input array x and traverses the decision tree model until it reaches a leaf node, which represents the predicted class label for x.
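To make the traversal concrete, here is a minimal sketch with a hand-built tree in the same nested-dictionary layout (keys `feature`, `threshold`, `left`, `right`, with plain class labels as leaves). The tree, its thresholds, and the standalone `predict_one` helper are made up for illustration only.

```python
# A hand-built tree in the nested-dict layout used for decision nodes.
# Feature indices and thresholds are invented for this example.
toy_tree = {
    'feature': 0, 'threshold': 0.5,
    'left': 0,                       # leaf: class 0
    'right': {
        'feature': 1, 'threshold': 2.0,
        'left': 1,                   # leaf: class 1
        'right': 0,                  # leaf: class 0
    },
}

def predict_one(tree, x):
    """Walk from the root until a non-dict node (a leaf label) is reached."""
    node = tree
    while isinstance(node, dict):
        if x[node['feature']] < node['threshold']:
            node = node['left']
        else:
            node = node['right']
    return node

print(predict_one(toy_tree, [0.2, 9.9]))  # feature 0 < 0.5, so the left leaf
print(predict_one(toy_tree, [0.7, 1.0]))  # right subtree, then feature 1 < 2.0
print(predict_one(toy_tree, [0.7, 3.0]))  # right subtree, then feature 1 >= 2.0
```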

In [27]:
import numpy as np

class DecisionTreeClassifier:
    def __init__(self, max_depth=5, min_samples_split=2):
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
    
    def fit(self, X, y):
        print(f"Fitting the Decision Tree Classifier with {len(np.unique(y))} classes...")
        self.tree_ = self._build_tree(X, y)
        
    def predict(self, X):
        X = np.array(X)
        predictions = []
        for i in range(X.shape[0]):
            predictions.append(self._predict_one(X[i]))
        return np.array(predictions)
        
    def _build_tree(self, X, y, depth=0):
        num_samples, num_features = X.shape
        
        if (depth >= self.max_depth or len(np.unique(y)) == 1 or num_samples < self.min_samples_split):
            return self._leaf_node(y)
        
        best_feature, best_threshold = self._best_split(X, y, num_samples, num_features)
        
        # No valid split exists (e.g. every feature is constant): stop with a leaf
        if best_feature is None:
            return self._leaf_node(y)
        
        left_indices = X[:, best_feature] < best_threshold
        right_indices = X[:, best_feature] >= best_threshold
        
        if (len(X[left_indices]) == 0 or len(X[right_indices]) == 0):
            return self._leaf_node(y)
        
        print("Depth:", depth)
        print("Samples:", num_samples)
        print("Features:", num_features)
        print("Best feature:", best_feature)
        print("Best threshold:", best_threshold)
        print("Left samples:", len(X[left_indices]))
        print("Right samples:", len(X[right_indices]))
        print("="*30)
        
        left_tree = self._build_tree(X[left_indices], y[left_indices], depth+1)
        right_tree = self._build_tree(X[right_indices], y[right_indices], depth+1)
        
        
        return self._decision_node(best_feature, best_threshold, left_tree, right_tree)

        
    def _best_split(self, X, y, num_samples, num_features):
        print("Calculating information gain for each feature...")
        best_impurity = float('inf')
        best_feature, best_threshold = None, None
        
        for feature in range(num_features):
            print(f"Feature {feature}")
            feature_values = np.expand_dims(X[:, feature], axis=1)
            unique_values = np.unique(feature_values)
            
            for threshold in unique_values:
                left_indices = X[:, feature] < threshold
                right_indices = X[:, feature] >= threshold
                
                if (np.sum(left_indices) == 0 or np.sum(right_indices) == 0):
                    continue
                
                left_labels = y[left_indices]
                right_labels = y[right_indices]
                
                impurity = self._gini_impurity(left_labels, right_labels, num_samples)
                
                if (impurity < best_impurity):
                    best_impurity = impurity
                    best_feature = feature
                    best_threshold = threshold
                    
        return best_feature, best_threshold
    
    def _gini_impurity(self, left_labels, right_labels, num_samples):
        p_l = len(left_labels) / num_samples
        p_r = len(right_labels) / num_samples
        
        gini_l = 1.0 - np.sum(np.power(np.unique(left_labels, return_counts=True)[1]/len(left_labels), 2))
        gini_r = 1.0 - np.sum(np.power(np.unique(right_labels, return_counts=True)[1]/len(right_labels), 2))
        
        impurity = (p_l * gini_l) + (p_r * gini_r)
        return impurity
        
    def _decision_node(self, feature, threshold, left_tree, right_tree):
        return {'feature': feature, 'threshold': threshold, 'left': left_tree, 'right': right_tree}
        
    def _leaf_node(self, y):
        return np.bincount(y).argmax()
    
    def _predict_one(self, x):
        node = self.tree_
        while isinstance(node, dict):
            if x[node['feature']] < node['threshold']:
                node = node['left']
            else:
                node = node['right']
        # Leaves hold the majority class label, a NumPy integer from np.bincount(...).argmax()
        if isinstance(node, (int, np.integer)):
            return node
        else:
            raise ValueError(f"Unexpected node type {type(node)} with value {node} in prediction")

       


### Data Understanding and Data Preparation
We used the Kaggle Credit Card Fraud Detection dataset: <a href="https://www.kaggle.com/mlg-ulb/creditcardfraud">Link</a>

In [3]:
# Read Data into a Dataframe
df = pd.read_csv('creditcard.csv')

In [4]:
df

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,...,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.166480,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,...,-0.225775,-0.638672,0.101288,-0.339846,0.167170,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.379780,-0.503198,1.800499,0.791461,0.247676,-1.514654,...,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,...,-0.108300,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.50,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,...,-0.009431,0.798278,-0.137458,0.141267,-0.206010,0.502292,0.219422,0.215153,69.99,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
284802,172786.0,-11.881118,10.071785,-9.834783,-2.066656,-5.364473,-2.606837,-4.918215,7.305334,1.914428,...,0.213454,0.111864,1.014480,-0.509348,1.436807,0.250034,0.943651,0.823731,0.77,0
284803,172787.0,-0.732789,-0.055080,2.035030,-0.738589,0.868229,1.058415,0.024330,0.294869,0.584800,...,0.214205,0.924384,0.012463,-1.016226,-0.606624,-0.395255,0.068472,-0.053527,24.79,0
284804,172788.0,1.919565,-0.301254,-3.249640,-0.557828,2.630515,3.031260,-0.296827,0.708417,0.432454,...,0.232045,0.578229,-0.037501,0.640134,0.265745,-0.087371,0.004455,-0.026561,67.88,0
284805,172788.0,-0.240440,0.530483,0.702510,0.689799,-0.377961,0.623708,-0.686180,0.679145,0.392087,...,0.265245,0.800049,-0.163298,0.123205,-0.569159,0.546668,0.108821,0.104533,10.00,0


In [5]:
# Describe Data
df.describe()

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
count,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,...,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0
mean,94813.859575,1.16598e-15,3.416908e-16,-1.37315e-15,2.086869e-15,9.604066e-16,1.490107e-15,-5.556467e-16,1.177556e-16,-2.406455e-15,...,1.656562e-16,-3.44485e-16,2.578648e-16,4.471968e-15,5.340915e-16,1.687098e-15,-3.666453e-16,-1.220404e-16,88.349619,0.001727
std,47488.145955,1.958696,1.651309,1.516255,1.415869,1.380247,1.332271,1.237094,1.194353,1.098632,...,0.734524,0.7257016,0.6244603,0.6056471,0.5212781,0.482227,0.4036325,0.3300833,250.120109,0.041527
min,0.0,-56.40751,-72.71573,-48.32559,-5.683171,-113.7433,-26.16051,-43.55724,-73.21672,-13.43407,...,-34.83038,-10.93314,-44.80774,-2.836627,-10.2954,-2.604551,-22.56568,-15.43008,0.0,0.0
25%,54201.5,-0.9203734,-0.5985499,-0.8903648,-0.8486401,-0.6915971,-0.7682956,-0.5540759,-0.2086297,-0.6430976,...,-0.2283949,-0.5423504,-0.1618463,-0.3545861,-0.3171451,-0.3269839,-0.07083953,-0.05295979,5.6,0.0
50%,84692.0,0.0181088,0.06548556,0.1798463,-0.01984653,-0.05433583,-0.2741871,0.04010308,0.02235804,-0.05142873,...,-0.02945017,0.006781943,-0.01119293,0.04097606,0.0165935,-0.05213911,0.001342146,0.01124383,22.0,0.0
75%,139320.5,1.315642,0.8037239,1.027196,0.7433413,0.6119264,0.3985649,0.5704361,0.3273459,0.597139,...,0.1863772,0.5285536,0.1476421,0.4395266,0.3507156,0.2409522,0.09104512,0.07827995,77.165,0.0
max,172792.0,2.45493,22.05773,9.382558,16.87534,34.80167,73.30163,120.5895,20.00721,15.59499,...,27.20284,10.50309,22.52841,4.584549,7.519589,3.517346,31.6122,33.84781,25691.16,1.0


In [6]:
df.columns

Index(['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',
       'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20',
       'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'Amount',
       'Class'],
      dtype='object')

In [7]:
df.isna().sum()

Time      0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
Class     0
dtype: int64

### Under Sampling Dataset

In [8]:
# separating the data for analysis
legit = df[df.Class == 0]
fraud = df[df.Class == 1]

In [9]:
# statistical measures of the data
legit.Amount.describe()

count    284315.000000
mean         88.291022
std         250.105092
min           0.000000
25%           5.650000
50%          22.000000
75%          77.050000
max       25691.160000
Name: Amount, dtype: float64

In [10]:
fraud.Amount.describe()

count     492.000000
mean      122.211321
std       256.683288
min         0.000000
25%         1.000000
50%         9.250000
75%       105.890000
max      2125.870000
Name: Amount, dtype: float64

In [12]:
# compare the values for both transactions
df.groupby('Class').mean()

Class,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
0,94838.202258,0.008258,-0.006271,0.012171,-0.00786,0.005453,0.002419,0.009637,-0.000987,0.004467,...,-0.000644,-0.001235,-2.4e-05,7e-05,0.000182,-7.2e-05,-8.9e-05,-0.000295,-0.000131,88.291022
1,80746.806911,-4.771948,3.623778,-7.033281,4.542029,-3.151225,-1.397737,-5.568731,0.570636,-2.581123,...,0.372319,0.713588,0.014049,-0.040308,-0.10513,0.041449,0.051648,0.170575,0.075667,122.211321


In [13]:
legit_sample = legit.sample(n=492)

In [14]:
new_dataset = pd.concat([legit_sample, fraud], axis=0)

In [15]:
new_dataset.head()

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
159495,112566.0,2.032093,-1.062764,-1.624383,-0.93636,-0.417103,-0.828675,-0.265871,-0.339898,-0.510784,...,0.495672,1.167804,-0.101243,0.644423,0.196585,0.039942,-0.061544,-0.040043,124.95,0
200573,133488.0,2.30323,-1.408775,-1.192854,-1.767757,-0.936633,-0.163433,-1.206191,-0.003637,-1.366624,...,-0.159411,-0.026633,0.219949,0.311729,-0.152742,-0.18805,-0.004349,-0.060767,10.0,0
28626,35087.0,0.510169,-2.539631,0.176184,-1.613871,-2.020751,-0.279,-0.415543,-0.145152,1.117873,...,-0.298099,-1.042499,-0.393458,-0.075449,0.236097,-0.201469,-0.002073,0.105052,466.16,0
54057,46269.0,1.019608,0.26427,-0.595051,1.287979,0.193063,-0.984771,0.646837,-0.336034,-0.252445,...,-0.028383,-0.296316,-0.278201,-0.039814,0.747511,-0.351371,-0.004832,0.061748,135.51,0
44970,42153.0,-1.199478,0.908758,2.221155,0.504918,0.650757,-0.364402,1.830071,-1.136071,0.313504,...,-0.251924,0.488412,-0.323769,0.540509,0.525682,-0.353342,-0.540957,-0.777533,40.17,0


In [16]:
new_dataset.tail()

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
279863,169142.0,-1.927883,1.125653,-4.518331,1.749293,-1.566487,-2.010494,-0.88285,0.697211,-2.064945,...,0.778584,-0.319189,0.639419,-0.294885,0.537503,0.788395,0.29268,0.147968,390.0,1
280143,169347.0,1.378559,1.289381,-5.004247,1.41185,0.442581,-1.326536,-1.41317,0.248525,-1.127396,...,0.370612,0.028234,-0.14564,-0.081049,0.521875,0.739467,0.389152,0.186637,0.76,1
280149,169351.0,-0.676143,1.126366,-2.2137,0.468308,-1.120541,-0.003346,-2.234739,1.210158,-0.65225,...,0.751826,0.834108,0.190944,0.03207,-0.739695,0.471111,0.385107,0.194361,77.89,1
281144,169966.0,-3.113832,0.585864,-5.39973,1.817092,-0.840618,-2.943548,-2.208002,1.058733,-1.632333,...,0.583276,-0.269209,-0.456108,-0.183659,-0.328168,0.606116,0.884876,-0.2537,245.0,1
281674,170348.0,1.991976,0.158476,-2.583441,0.40867,1.151147,-0.096695,0.22305,-0.068384,0.577829,...,-0.16435,-0.295135,-0.072173,-0.450261,0.313267,-0.289617,0.002988,-0.015309,42.53,1


In [17]:
new_dataset['Class'].value_counts()

1    492
0    492
Name: Class, dtype: int64

In [18]:
new_dataset.groupby('Class').mean()

Class,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
0,92769.207317,-0.1462,-0.033887,0.082706,-0.010498,0.054606,0.012207,-0.121738,-0.090758,0.010059,...,0.042611,-0.033244,-0.000366,-0.001006,-0.001223,0.036735,0.039932,0.005307,-0.018024,82.033333
1,80746.806911,-4.771948,3.623778,-7.033281,4.542029,-3.151225,-1.397737,-5.568731,0.570636,-2.581123,...,0.372319,0.713588,0.014049,-0.040308,-0.10513,0.041449,0.051648,0.170575,0.075667,122.211321


In [19]:
X = new_dataset.drop(columns='Class')
Y = new_dataset['Class']

In [20]:
print(X)


            Time        V1        V2        V3        V4        V5        V6  \
159495  112566.0  2.032093 -1.062764 -1.624383 -0.936360 -0.417103 -0.828675   
200573  133488.0  2.303230 -1.408775 -1.192854 -1.767757 -0.936633 -0.163433   
28626    35087.0  0.510169 -2.539631  0.176184 -1.613871 -2.020751 -0.279000   
54057    46269.0  1.019608  0.264270 -0.595051  1.287979  0.193063 -0.984771   
44970    42153.0 -1.199478  0.908758  2.221155  0.504918  0.650757 -0.364402   
...          ...       ...       ...       ...       ...       ...       ...   
279863  169142.0 -1.927883  1.125653 -4.518331  1.749293 -1.566487 -2.010494   
280143  169347.0  1.378559  1.289381 -5.004247  1.411850  0.442581 -1.326536   
280149  169351.0 -0.676143  1.126366 -2.213700  0.468308 -1.120541 -0.003346   
281144  169966.0 -3.113832  0.585864 -5.399730  1.817092 -0.840618 -2.943548   
281674  170348.0  1.991976  0.158476 -2.583441  0.408670  1.151147 -0.096695   

              V7        V8        V9  .

In [21]:
print(Y)

159495    0
200573    0
28626     0
54057     0
44970     0
         ..
279863    1
280143    1
280149    1
281144    1
281674    1
Name: Class, Length: 984, dtype: int64


In [36]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)

In [37]:
print(X.shape, X_train.shape, X_test.shape)

(984, 30) (787, 30) (197, 30)


The full dataset is highly imbalanced: 284,315 transactions are non-fraudulent (class 0) and only 492 are fraudulent (class 1), about 0.17% of the total. To address this imbalance, we perform under-sampling, which randomly selects a subset of the majority class (here, non-fraudulent transactions) to create a new balanced dataset with an equal number of instances from each class.

In this specific code, the majority class (class 0) is randomly sampled to obtain 492 instances, which is the same as the number of instances in the minority class (class 1). These two subsets are then combined to create a new dataset with equal representation from both classes, which can be used for further analysis and modeling.
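The sampling cell above calls `legit.sample(n=492)` without a seed, so the selected rows, and every metric computed from them, change from run to run. A minimal sketch of a reproducible variant follows, assuming a small synthetic DataFrame in place of the real file; the column values are randomly generated, and the seed value 42 is arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 1,000 rows, 20 of them fraud.
# Column names follow the dataset, but the values are randomly generated.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'Amount': rng.uniform(0, 500, size=1000),
    'Class': [1] * 20 + [0] * 980,
})

fraud_rows = toy[toy.Class == 1]
legit_rows = toy[toy.Class == 0]

# Draw as many legitimate rows as there are fraud rows.
# random_state fixes the draw so that re-running gives the same subset.
legit_subset = legit_rows.sample(n=len(fraud_rows), random_state=42)

balanced = pd.concat([legit_subset, fraud_rows], axis=0)
print(balanced['Class'].value_counts())
```

With a fixed `random_state`, the same 20 legitimate rows are drawn on every run, which makes later comparisons between classifiers repeatable.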

### Evaluation

We use the AUC-ROC score, classification report, accuracy, and F1-score to evaluate the performance of the classifiers.

In [26]:
# Evaluation of Classifiers
def grid_eval(grid_clf):
    """
        Method to Compute the best score and parameters computed by grid search
        Parameter:
            grid_clf: The Grid Search Classifier 
    """
    print("Best Score", grid_clf.best_score_)
    print("Best Parameter", grid_clf.best_params_)
    
def evaluation(y_test, grid_clf, X_test):
    """
        Method to compute the following:
            1. Classification Report
            2. F1-score
            3. AUC-ROC score
            4. Accuracy
        Parameters:
            y_test: The target variable test set
            grid_clf: Grid classifier selected
            X_test: Input Feature Test Set
    """
    y_pred = grid_clf.predict(X_test)
    print('CLASSIFICATION REPORT')
    print(classification_report(y_test, y_pred))
    
    print('AUC-ROC')
    print(roc_auc_score(y_test, y_pred))
      
    print('F1-Score')
    print(f1_score(y_test, y_pred))
    
    print('Accuracy')
    print(accuracy_score(y_test, y_pred))

In [28]:
# create an instance of the DecisionTreeClassifier class
D_tree = DecisionTreeClassifier(max_depth=5, min_samples_split=2)

# fit the classifier on the training data
D_tree.fit(X_train.values, Y_train.values)




Fitting the Decision Tree Classifier with 2 classes...
Calculating information gain for each feature...
Feature 0
Feature 1
Feature 2
Feature 3
Feature 4
Feature 5
Feature 6
Feature 7
Feature 8
Feature 9
Feature 10
Feature 11
Feature 12
Feature 13
Feature 14
Feature 15
Feature 16
Feature 17
Feature 18
Feature 19
Feature 20
Feature 21
Feature 22
Feature 23
Feature 24
Feature 25
Feature 26
Feature 27
Feature 28
Feature 29
Depth: 0
Samples: 787
Features: 30
Best feature: 14
Best threshold: -1.6794082141796398
Left samples: 361
Right samples: 426
Calculating information gain for each feature...
Feature 0
Feature 1
Feature 2
Feature 3
Feature 4
Feature 5
Feature 6
Feature 7
Feature 8
Feature 9
Feature 10
Feature 11
Feature 12
Feature 13
Feature 14
Feature 15
Feature 16
Feature 17
Feature 18
Feature 19
Feature 20
Feature 21
Feature 22
Feature 23
Feature 24
Feature 25
Feature 26
Feature 27
Feature 28
Feature 29
Depth: 1
Samples: 361
Features: 30
Best feature: 9
Best threshold: 1.5648278649946

In [32]:
print(D_tree)

Y_pred = D_tree.predict(X_test)

<__main__.DecisionTreeClassifier object at 0x7ff3f3318160>


In [38]:
y_pred = D_tree.predict(X_test)
print('CLASSIFICATION REPORT')
print(classification_report(Y_test, y_pred))

print('AUC-ROC')
print(roc_auc_score(Y_test, y_pred))

print('F1-Score')
print(f1_score(Y_test, y_pred))

print('Accuracy')
print(accuracy_score(Y_test, y_pred))

CLASSIFICATION REPORT
              precision    recall  f1-score   support

           0       0.87      0.91      0.89        99
           1       0.90      0.87      0.89        98

    accuracy                           0.89       197
   macro avg       0.89      0.89      0.89       197
weighted avg       0.89      0.89      0.89       197

AUC-ROC
0.8882189239332097
F1-Score
0.8854166666666667
Accuracy
0.8883248730964467


With a maximum depth of 5 and a minimum samples split of 2, the precision is 0.87 for class 0 and 0.90 for class 1: of all the transactions predicted as legitimate (class 0), 87% were actually legitimate, and of all the transactions predicted as fraudulent (class 1), 90% were actually fraudulent. The recall is 0.91 for class 0 and 0.87 for class 1: the model correctly identified 91% of the actual legitimate transactions and 87% of the actual fraudulent ones.

The f1-score is the harmonic mean of precision and recall, so it is high only when both are high. The f1-score is 0.89 for both classes, indicating good and balanced performance. The accuracy of the model is 0.89, which means that the model correctly predicted the class labels for 89% of the test instances.
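These figures can be checked by hand. The counts in the sketch below are not printed anywhere in the notebook; they are inferred from the supports (99 and 98), the accuracy, and the F1-score reported for this configuration, so treat them as an assumption.

```python
# Counts inferred from the report above (an assumption, not notebook output):
# rows are the true class, columns the predicted class.
tn, fp = 90, 9    # 99 legitimate transactions: 90 predicted 0, 9 predicted 1
fn, tp = 13, 85   # 98 fraudulent transactions: 13 predicted 0, 85 predicted 1

precision_1 = tp / (tp + fp)   # share of predicted frauds that are real frauds
recall_1 = tp / (tp + fn)      # share of real frauds that were caught
precision_0 = tn / (tn + fn)   # share of predicted legitimate that are legitimate
recall_0 = tn / (tn + fp)      # share of legitimate transactions classified correctly

f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(round(precision_0, 2), round(recall_0, 2))
print(round(precision_1, 2), round(recall_1, 2))
print(f1_1, accuracy)
```

The rounded precision and recall values agree with the classification report, and the F1-score and accuracy agree with the printed 0.8854 and 0.8883.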

The AUC-ROC (Area Under the Receiver Operating Characteristic Curve) is a measure of the model's ability to distinguish between the positive and negative classes. The AUC-ROC is 0.8882, indicating that the model is able to distinguish between the classes moderately well.
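One detail worth keeping in mind: the evaluation code passes hard 0/1 predictions to `roc_auc_score` rather than per-sample scores. In that case the ROC curve has a single operating point, and the area reduces to the average of the two recalls (the balanced accuracy). The sketch below illustrates this with a plain-Python pairwise definition of AUC; the helper name `pairwise_auc` is hypothetical, and the labels are rebuilt from counts inferred from the report, not taken from the notebook.

```python
def pairwise_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.
    Ties count as half a win (the usual rank-based definition)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Labels and hard predictions rebuilt from inferred counts (an assumption):
# 99 legitimate (90 predicted 0, 9 predicted 1), 98 fraud (13 predicted 0, 85 predicted 1).
y_true = [0] * 99 + [1] * 98
y_pred = [0] * 90 + [1] * 9 + [0] * 13 + [1] * 85

recall_0 = 90 / 99
recall_1 = 85 / 98
balanced_accuracy = (recall_0 + recall_1) / 2

auc_hard = pairwise_auc(y_true, y_pred)
print(auc_hard, balanced_accuracy)
```

Both values come out at about 0.8882, matching the printed AUC-ROC. A threshold-free AUC would need a score per sample, for example the fraction of fraud cases in the leaf a sample lands in; the custom class stores only the majority label in each leaf.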

In [58]:
# create an instance of the DecisionTreeClassifier class
D_tree2 = DecisionTreeClassifier(max_depth=10, min_samples_split=4)

# fit the classifier on the training data
D_tree2.fit(X_train.values, Y_train.values)


DecisionTreeClassifier(max_depth=10, min_samples_split=4)

In [59]:
Y_pred = D_tree2.predict(X_test)

In [60]:
y_pred = D_tree2.predict(X_test)
print('CLASSIFICATION REPORT')
print(classification_report(Y_test, y_pred))

print('AUC-ROC')
print(roc_auc_score(Y_test, y_pred))

print('F1-Score')
print(f1_score(Y_test, y_pred))

print('Accuracy')
print(accuracy_score(Y_test, y_pred))

CLASSIFICATION REPORT
              precision    recall  f1-score   support

           0       0.90      0.92      0.91        99
           1       0.92      0.90      0.91        98

    accuracy                           0.91       197
   macro avg       0.91      0.91      0.91       197
weighted avg       0.91      0.91      0.91       197

AUC-ROC
0.9085755514326943
F1-Score
0.9072164948453607
Accuracy
0.9086294416243654


With a maximum depth of 10 and a minimum samples split of 4, the precision for class 0 is 0.90, which means that out of all the transactions predicted to be legitimate by the model, 90% of them are actually legitimate. The precision for class 1 is 0.92, which means that out of all the transactions predicted to be fraudulent by the model, 92% of them are actually fraudulent.

The recall for class 0 is 0.92, which means that out of all the actual legitimate transactions, 92% of them were correctly identified by the model. The recall for class 1 is 0.90, which means that out of all the actual fraudulent transactions, 90% of them were correctly identified by the model. The F1-score for both classes is 0.91, which is the harmonic mean of precision and recall. This metric provides a balance between precision and recall, and in this case, it indicates that the model performs similarly well for both classes.

The AUC-ROC score is 0.9086, which is a measure of how well the model can distinguish between the two classes. The score ranges from 0 to 1, with a score of 0.5 indicating that the model performs no better than random guessing, and a score of 1 indicating perfect classification. In this case, the score is close to 1, which indicates that the model is good at distinguishing between fraudulent and legitimate transactions.

In [64]:
# create an instance of the DecisionTreeClassifier class
D_tree3 = DecisionTreeClassifier(max_depth=20, min_samples_split=10)

# fit the classifier on the training data
D_tree3.fit(X_train.values, Y_train.values)

DecisionTreeClassifier(max_depth=20, min_samples_split=10)

In [65]:

Y_pred = D_tree3.predict(X_test)

In [66]:
y_pred = D_tree3.predict(X_test)
print('CLASSIFICATION REPORT')
print(classification_report(Y_test, y_pred))

print('AUC-ROC')
print(roc_auc_score(Y_test, y_pred))

print('F1-Score')
print(f1_score(Y_test, y_pred))

print('Accuracy')
print(accuracy_score(Y_test, y_pred))

CLASSIFICATION REPORT
              precision    recall  f1-score   support

           0       0.88      0.94      0.91        99
           1       0.93      0.87      0.90        98

    accuracy                           0.90       197
   macro avg       0.91      0.90      0.90       197
weighted avg       0.91      0.90      0.90       197

AUC-ROC
0.9033704390847248
F1-Score
0.8994708994708994
Accuracy
0.9035532994923858


In this experiment using maximum depth as 20 and minimum samples split as 10 we can see that the precision and recall scores for both classes are relatively high, indicating that the model is able to make accurate predictions for both fraudulent and legitimate transactions. The f1-score is also high, which is a balance between precision and recall.

The accuracy of the model is 0.90, which means that the model is able to correctly classify 90% of the transactions. The AUC-ROC score is also high, indicating that the model is able to distinguish between the two classes well.

Overall, the model performs well across the three configurations, with accuracy between 0.89 and 0.91 and the best result at a maximum depth of 10 and a minimum samples split of 4. The precision and recall values for both classes are high, indicating that the model is making accurate predictions for both fraudulent and legitimate transactions. The F1-score is also high, indicating a good balance between precision and recall. Finally, the AUC-ROC scores are close to 0.9, indicating that the model is good at distinguishing between fraudulent and legitimate transactions.

## Sklearn Model

In [41]:
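# The alias keeps the custom DecisionTreeClassifier defined above from being shadowed
from sklearn.tree import DecisionTreeClassifier as SklearnDecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score, f1_score, accuracy_score

classifier = SklearnDecisionTreeClassifier()
classifier.fit(X_train, Y_train)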
Y_pred = classifier.predict(X_test)
print("Classification Report:")
print(classification_report(Y_test, Y_pred))
auc_roc = roc_auc_score(Y_test, Y_pred)
print("AUC-ROC: ", auc_roc)

f1 = f1_score(Y_test, Y_pred)

accuracy = accuracy_score(Y_test, Y_pred)

print(f"AUC-ROC: {auc_roc:.4f}")
print(f"F1-score: {f1:.4f}")
print(f"Accuracy: {accuracy:.4f}")

Classification Report:
              precision    recall  f1-score   support

           0       0.90      0.89      0.89        99
           1       0.89      0.90      0.89        98

    accuracy                           0.89       197
   macro avg       0.89      0.89      0.89       197
weighted avg       0.89      0.89      0.89       197

AUC-ROC:  0.8934240362811792
AUC-ROC: 0.8934
F1-score: 0.8934
Accuracy: 0.8934


The precision for class 0 is 0.90, which means that out of all the samples predicted as class 0, 90% of them are actually class 0. Similarly, the precision for class 1 is 0.89, which means that out of all the samples predicted as class 1, 89% of them are actually class 1. 

The recall for class 0 is 0.89, which means that out of all the samples that are actually class 0, the model identified 89% of them correctly. Similarly, the recall for class 1 is 0.90, which means that out of all the samples that are actually class 1, the model identified 90% of them correctly. The f1-score is the harmonic mean of precision and recall, and it is around 0.89 for both classes. This indicates a good balance between precision and recall for both classes.

The AUC-ROC (Area Under the Receiver Operating Characteristic Curve) is 0.8934, which is a measure of the model's ability to distinguish between positive and negative classes. An AUC-ROC score of 0.5 indicates a random classifier, and a score of 1.0 indicates a perfect classifier. Therefore, an AUC-ROC score of 0.8934 indicates that the model is performing well in distinguishing between the two classes.

Finally, the accuracy of the model is also around 0.89, which means that the model correctly predicted 89% of the samples.

### Conclusion
In terms of precision, recall, and f1-score, both models show similar results for both classes. The best custom configuration (max_depth=10, min_samples_split=4) matches or slightly exceeds the Sklearn model on each of them: precision is 0.90 and 0.92 versus 0.90 and 0.89 for classes 0 and 1, recall is 0.92 and 0.90 versus 0.89 and 0.90, and the f1-score is 0.91 versus 0.89 for both classes.

Regarding accuracy, the custom class classifier performs slightly better than the Sklearn model, with an accuracy of 0.91 compared to 0.89.

Finally, in terms of the AUC-ROC metric, the custom class classifier performs better with a score of 0.9086 compared to the Sklearn model's score of 0.8934.

In conclusion, the custom class classifier outperforms the Sklearn model in terms of accuracy, f1-score, and AUC-ROC. However, the differences are minor: on the 197-sample test set, the gap in accuracy corresponds to three transactions, and the Sklearn model was run with its default hyperparameters rather than a matching max_depth and min_samples_split. Further analysis is required to determine whether the custom class classifier is significantly better than the Sklearn model.
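One possible form of that further analysis is to score each configuration on several stratified folds instead of a single split. The sketch below is only an outline under stated assumptions: it uses synthetic data from `make_classification` in place of the under-sampled set, scikit-learn's tree as a stand-in model, and a hypothetical helper `cv_f1`. Because the helper only calls `fit()` and `predict()`, a factory that builds the custom class could be passed in the same way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier as SklearnTree

# Synthetic two-class data standing in for the 984-row under-sampled set.
X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=0)

def cv_f1(make_model, X, y, n_splits=5, seed=0):
    """Return one F1-score per stratified fold, fitting a fresh model each time."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in folds.split(X, y):
        model = make_model()                     # new, unfitted model per fold
        model.fit(X[train_idx], y[train_idx])
        scores.append(f1_score(y[test_idx], model.predict(X[test_idx])))
    return np.array(scores)

# Two configurations compared on identical folds (same seed).
shallow = cv_f1(lambda: SklearnTree(max_depth=5, random_state=0), X_demo, y_demo)
deep = cv_f1(lambda: SklearnTree(max_depth=10, random_state=0), X_demo, y_demo)

print(f"max_depth=5:  {shallow.mean():.3f} +/- {shallow.std():.3f}")
print(f"max_depth=10: {deep.mean():.3f} +/- {deep.std():.3f}")
```

If the spread across folds is larger than the gap between the two means, the difference seen on a single test split is probably not meaningful.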

### Sources

Data - https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud