# Credit Card Fraud Detection: Using Random Forest


In [1]:

# Importing Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier

# from mlxtend.plotting import plot_learning_curves
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score, accuracy_score, classification_report
from sklearn.model_selection import KFold, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer, matthews_corrcoef

import warnings
warnings.filterwarnings("ignore")

## Self-built Random Forest Classifier

Random forest is a type of ensemble learning method that combines multiple decision trees to make predictions for classification or regression tasks. It works by constructing multiple decision trees on randomly selected subsets of the training data and using the combined predictions of all trees to make a final prediction.

We can build a custom class called _RandomForestClassifier_ to train and predict values for the dataset. The class constructor takes several optional parameters: n_estimators, max_depth, min_samples_split, and bootstrap.

The n_estimators parameter controls the number of decision trees to be constructed. The max_depth and min_samples_split parameters control the maximum depth of each tree and the minimum number of samples required to split an internal node, respectively. The bootstrap parameter controls whether each tree is trained on rows sampled with replacement from the training set (True, the default) or on the full training set (False).

The _fit()_ function takes the training data X and its corresponding labels y, and trains n_estimators decision trees with the custom _DecisionTreeClassifier_, each on its own bootstrap sample when bootstrap is True.

The _predict()_ function takes an input array X, collects one prediction per tree for every row, and returns the majority vote as an array of predictions.
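
The majority vote can be sketched in isolation. The snippet below is a minimal illustration rather than the notebook's own code: the helper name `majority_vote` and the toy vote matrix are assumptions made for the example, and class labels are assumed to be non-negative integers.

```python
import numpy as np

def majority_vote(votes):
    """Return the most frequent label in each column of `votes`.

    `votes` has shape (n_trees, n_samples) and holds non-negative integer
    class labels. Ties go to the smallest label, because argmax returns
    the first maximum.
    """
    votes = np.asarray(votes)
    n_classes = votes.max() + 1
    # Count how many trees voted for each class, one column per sample.
    counts = np.stack([(votes == k).sum(axis=0) for k in range(n_classes)])
    return counts.argmax(axis=0)

# Hypothetical votes from three trees for four samples.
toy_votes = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
print(majority_vote(toy_votes))  # [0 1 0 0]
```

With an even number of trees a tie is possible in a two-class problem, so it helps to make the tie-breaking rule explicit, as the docstring here does.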

In general, the random forest algorithm trains each decision tree on a random subset of the training data and considers a random subset of the features when splitting. This decorrelates the trees, which helps to reduce overfitting and improve the generalization of the model. The trees are built independently and their predictions are combined using a majority vote. The implementation below randomizes only the rows, through bootstrap sampling; every tree sees all features.
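
The row-level randomization (bootstrap sampling) can be illustrated on its own. This is a small sketch under assumed values: the seed and the sample count of 10,000 are arbitrary choices for the demonstration, not values from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
n_samples = 10_000

# Draw n_samples row indices with replacement, as fit() does for each tree.
bootstrap_idx = rng.choice(n_samples, size=n_samples, replace=True)

# Rows that are never drawn are "out-of-bag" for this tree.
in_bag_fraction = np.unique(bootstrap_idx).size / n_samples
print(f"distinct rows in the bootstrap sample: {in_bag_fraction:.3f}")
```

For large samples the fraction of distinct rows approaches 1 - 1/e, roughly 63%, so each tree is trained on a noticeably different view of the data.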

The custom _DecisionTreeClassifier_ class, defined in the same cell, is used to construct each decision tree in the random forest. Its _best_split()_ function scans every feature, tries each unique value of that feature as a threshold, and keeps the split with the lowest weighted Gini impurity. It does not subsample features, so the class has no max_features parameter.
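
Per-split feature subsampling, of the kind a max_features parameter usually controls, could be sketched as follows. The helper `sample_feature_indices` and its arguments are hypothetical and are not part of the classes in this notebook.

```python
import numpy as np

def sample_feature_indices(num_features, max_features="sqrt", rng=None):
    """Pick a random subset of column indices to evaluate at one split.

    Hypothetical helper: `max_features` may be "sqrt" or a positive int.
    """
    rng = np.random.default_rng() if rng is None else rng
    if max_features == "sqrt":
        k = max(1, int(np.sqrt(num_features)))
    else:
        k = min(int(max_features), num_features)
    # Sample without replacement so no feature is evaluated twice.
    return rng.choice(num_features, size=k, replace=False)

candidates = sample_feature_indices(30, rng=np.random.default_rng(0))
print(np.sort(candidates))  # 5 distinct column indices out of 30
```

Inside a split search, the loop over `range(num_features)` would iterate over these sampled indices instead, drawing a fresh subset at every node.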

This implementation provides only the _fit()_ and _predict()_ functions. Functions for calculating feature importance or visualizing the decision trees in the forest are not included.

In [30]:
import numpy as np

class RandomForestClassifier:
    def __init__(self, n_estimators=10, max_depth=5, min_samples_split=2, bootstrap=True):
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.bootstrap = bootstrap
        self.trees = []
    
    def fit(self, X, y):
        print(f"Fitting the Random Forest Classifier with {self.n_estimators} trees...")
        for i in range(self.n_estimators):
            if self.bootstrap:
                sample_indices = np.random.choice(X.shape[0], X.shape[0], replace=True)
            else:
                sample_indices = np.arange(X.shape[0])
            sample_X = X[sample_indices]
            sample_y = y[sample_indices]
            tree = DecisionTreeClassifier(max_depth=self.max_depth, min_samples_split=self.min_samples_split)
            tree.fit(sample_X, sample_y)
            self.trees.append(tree)
        
    def predict(self, X):
        X = np.array(X)
        predictions = []
        for i in range(X.shape[0]):
            votes = []
            for tree in self.trees:
                votes.append(tree._predict_one(X[i]))
            predictions.append(max(set(votes), key=votes.count))
        return np.array(predictions)


        
class DecisionTreeClassifier:
    def __init__(self, max_depth=5, min_samples_split=2):
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
    
    def fit(self, X, y):
        print(f"Fitting the Decision Tree Classifier with {len(np.unique(y))} classes...")
        self.tree_ = self._build_tree(X, y)
        
    def predict(self, X):
        X = np.array(X)
        predictions = []
        for i in range(X.shape[0]):
            predictions.append(self._predict_one(X[i]))
        return np.array(predictions)
        
    def _build_tree(self, X, y, depth=0):
        num_samples, num_features = X.shape
        
        if (depth >= self.max_depth or len(np.unique(y)) == 1 or num_samples < self.min_samples_split):
            return self._leaf_node(y)
        
        best_feature, best_threshold = self._best_split(X, y, num_samples, num_features)
        
        left_indices = X[:, best_feature] < best_threshold
        right_indices = X[:, best_feature] >= best_threshold
        
        if (len(X[left_indices]) == 0 or len(X[right_indices]) == 0):
            return self._leaf_node(y)
        
        print("Depth:", depth)
        print("Samples:", num_samples)
        print("Features:", num_features)
        print("Best feature:", best_feature)
        print("Best threshold:", best_threshold)
        print("Left samples:", len(X[left_indices]))
        print("Right samples:", len(X[right_indices]))
        print("="*30)
        
        left_tree = self._build_tree(X[left_indices], y[left_indices], depth+1)
        right_tree = self._build_tree(X[right_indices], y[right_indices], depth+1)
        
        
        return self._decision_node(best_feature, best_threshold, left_tree, right_tree)

        
    def _best_split(self, X, y, num_samples, num_features):
        print("Calculating information gain for each feature...")
        best_impurity = float('inf')
        best_feature, best_threshold = None, None
        
        for feature in range(num_features):
            print(f"Feature {feature}")
            feature_values = np.expand_dims(X[:, feature], axis=1)
            unique_values = np.unique(feature_values)
            
            for threshold in unique_values:
                left_indices = X[:, feature] < threshold
                right_indices = X[:, feature] >= threshold
                
                if (np.sum(left_indices) == 0 or np.sum(right_indices) == 0):
                    continue
                
                left_labels = y[left_indices]
                right_labels = y[right_indices]
                
                impurity = self._gini_impurity(left_labels, right_labels, num_samples)
                
                if (impurity < best_impurity):
                    best_impurity = impurity
                    best_feature = feature
                    best_threshold = threshold
                    
        return best_feature, best_threshold
    
    def _gini_impurity(self, left_labels, right_labels, num_samples):
        p_l = len(left_labels) / num_samples
        p_r = len(right_labels) / num_samples
        
        gini_l = 1.0 - np.sum(np.power(np.unique(left_labels, return_counts=True)[1]/len(left_labels), 2))
        gini_r = 1.0 - np.sum(np.power(np.unique(right_labels, return_counts=True)[1]/len(right_labels), 2))
        
        impurity = (p_l * gini_l) + (p_r * gini_r)
        return impurity
        
    def _decision_node(self, feature, threshold, left_tree, right_tree):
        return {'feature': feature, 'threshold': threshold, 'left': left_tree, 'right': right_tree}
        
    def _leaf_node(self, y):
        return np.bincount(y).argmax()
    
    def _predict_one(self, x):
        node = self.tree_
        while isinstance(node, dict):
            if x[node['feature']] < node['threshold']:
                node = node['left']
            else:
                node = node['right']
        # Leaves hold the majority class; accept any Python or NumPy integer type
        if isinstance(node, (int, np.integer)):
            return node
        else:
            print(f"Unexpected node type {type(node)} with value {node}")
            raise ValueError(f"Unexpected node type {type(node)} in prediction")
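
The split criterion used in the tree above is the weighted Gini impurity: each child's impurity, one minus the sum of squared class proportions, is weighted by the share of samples it receives. The standalone sketch below checks this on a toy split; the function name `weighted_gini` and the labels are made up for the illustration.

```python
import numpy as np

def weighted_gini(left_labels, right_labels):
    """Weighted Gini impurity of a binary split (lower is purer)."""
    def gini(labels):
        _, counts = np.unique(labels, return_counts=True)
        proportions = counts / counts.sum()
        return 1.0 - np.sum(proportions ** 2)

    n_left, n_right = len(left_labels), len(right_labels)
    n_total = n_left + n_right
    return (n_left / n_total) * gini(left_labels) + (n_right / n_total) * gini(right_labels)

# Toy split: the left child has 3 legitimate + 1 fraud, the right child is pure fraud.
left = np.array([0, 0, 0, 1])
right = np.array([1, 1])
print(weighted_gini(left, right))  # 0.25 up to floating-point rounding
```

Here the left child has impurity 1 - (0.75² + 0.25²) = 0.375 and holds 4 of the 6 samples, while the right child is pure, giving (4/6) × 0.375 = 0.25.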

       


### Data Understanding and Data Preparation
We used the Kaggle Credit Card Fraud Detection dataset: <a href="https://www.kaggle.com/mlg-ulb/creditcardfraud">Link</a>

Since the dataset is highly imbalanced (492 fraudulent transactions out of 284,807), random under-sampling of the majority class is used to balance it.

In [2]:
# Read Data into a Dataframe
df = pd.read_csv('creditcard.csv')

In [3]:
df

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,...,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.166480,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,...,-0.225775,-0.638672,0.101288,-0.339846,0.167170,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.379780,-0.503198,1.800499,0.791461,0.247676,-1.514654,...,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,...,-0.108300,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.50,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,...,-0.009431,0.798278,-0.137458,0.141267,-0.206010,0.502292,0.219422,0.215153,69.99,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
284802,172786.0,-11.881118,10.071785,-9.834783,-2.066656,-5.364473,-2.606837,-4.918215,7.305334,1.914428,...,0.213454,0.111864,1.014480,-0.509348,1.436807,0.250034,0.943651,0.823731,0.77,0
284803,172787.0,-0.732789,-0.055080,2.035030,-0.738589,0.868229,1.058415,0.024330,0.294869,0.584800,...,0.214205,0.924384,0.012463,-1.016226,-0.606624,-0.395255,0.068472,-0.053527,24.79,0
284804,172788.0,1.919565,-0.301254,-3.249640,-0.557828,2.630515,3.031260,-0.296827,0.708417,0.432454,...,0.232045,0.578229,-0.037501,0.640134,0.265745,-0.087371,0.004455,-0.026561,67.88,0
284805,172788.0,-0.240440,0.530483,0.702510,0.689799,-0.377961,0.623708,-0.686180,0.679145,0.392087,...,0.265245,0.800049,-0.163298,0.123205,-0.569159,0.546668,0.108821,0.104533,10.00,0


In [4]:
# Describe Data
df.describe()

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
count,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,...,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0,284807.0
mean,94813.859575,1.16598e-15,3.416908e-16,-1.37315e-15,2.086869e-15,9.604066e-16,1.490107e-15,-5.556467e-16,1.177556e-16,-2.406455e-15,...,1.656562e-16,-3.44485e-16,2.578648e-16,4.471968e-15,5.340915e-16,1.687098e-15,-3.666453e-16,-1.220404e-16,88.349619,0.001727
std,47488.145955,1.958696,1.651309,1.516255,1.415869,1.380247,1.332271,1.237094,1.194353,1.098632,...,0.734524,0.7257016,0.6244603,0.6056471,0.5212781,0.482227,0.4036325,0.3300833,250.120109,0.041527
min,0.0,-56.40751,-72.71573,-48.32559,-5.683171,-113.7433,-26.16051,-43.55724,-73.21672,-13.43407,...,-34.83038,-10.93314,-44.80774,-2.836627,-10.2954,-2.604551,-22.56568,-15.43008,0.0,0.0
25%,54201.5,-0.9203734,-0.5985499,-0.8903648,-0.8486401,-0.6915971,-0.7682956,-0.5540759,-0.2086297,-0.6430976,...,-0.2283949,-0.5423504,-0.1618463,-0.3545861,-0.3171451,-0.3269839,-0.07083953,-0.05295979,5.6,0.0
50%,84692.0,0.0181088,0.06548556,0.1798463,-0.01984653,-0.05433583,-0.2741871,0.04010308,0.02235804,-0.05142873,...,-0.02945017,0.006781943,-0.01119293,0.04097606,0.0165935,-0.05213911,0.001342146,0.01124383,22.0,0.0
75%,139320.5,1.315642,0.8037239,1.027196,0.7433413,0.6119264,0.3985649,0.5704361,0.3273459,0.597139,...,0.1863772,0.5285536,0.1476421,0.4395266,0.3507156,0.2409522,0.09104512,0.07827995,77.165,0.0
max,172792.0,2.45493,22.05773,9.382558,16.87534,34.80167,73.30163,120.5895,20.00721,15.59499,...,27.20284,10.50309,22.52841,4.584549,7.519589,3.517346,31.6122,33.84781,25691.16,1.0


In [5]:
df.columns

Index(['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',
       'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20',
       'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'Amount',
       'Class'],
      dtype='object')

In [6]:
df.isna().sum()

Time      0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
Class     0
dtype: int64

### Under Sampling Dataset

In [7]:
# separating the data for analysis
legit = df[df.Class == 0]
fraud = df[df.Class == 1]

In [8]:
# statistical measures of the data
legit.Amount.describe()

count    284315.000000
mean         88.291022
std         250.105092
min           0.000000
25%           5.650000
50%          22.000000
75%          77.050000
max       25691.160000
Name: Amount, dtype: float64

In [9]:
fraud.Amount.describe()

count     492.000000
mean      122.211321
std       256.683288
min         0.000000
25%         1.000000
50%         9.250000
75%       105.890000
max      2125.870000
Name: Amount, dtype: float64

In [10]:
# compare the mean feature values of legitimate and fraudulent transactions
df.groupby('Class').mean()

Class,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
0,94838.202258,0.008258,-0.006271,0.012171,-0.00786,0.005453,0.002419,0.009637,-0.000987,0.004467,...,-0.000644,-0.001235,-2.4e-05,7e-05,0.000182,-7.2e-05,-8.9e-05,-0.000295,-0.000131,88.291022
1,80746.806911,-4.771948,3.623778,-7.033281,4.542029,-3.151225,-1.397737,-5.568731,0.570636,-2.581123,...,0.372319,0.713588,0.014049,-0.040308,-0.10513,0.041449,0.051648,0.170575,0.075667,122.211321


In [11]:
legit_sample = legit.sample(n=492)

In [12]:
new_dataset = pd.concat([legit_sample, fraud], axis=0)

In [13]:
new_dataset.head()

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
278152,168065.0,-0.412406,0.265221,0.324103,-0.329454,2.952006,4.386999,0.149436,0.972234,0.179437,...,-0.484038,-1.175925,0.080682,0.62125,-0.338391,-1.046873,0.043037,-0.061452,19.93,0
38797,39558.0,-0.951351,0.913805,0.843073,-1.667488,0.294264,0.331837,0.056193,0.741926,-0.358534,...,-0.037619,-0.191765,-0.145952,-1.117513,-0.26871,0.806195,0.144182,0.083973,3.84,0
261116,159871.0,-0.261835,0.321582,0.662906,-2.323372,0.421523,-1.183712,1.040757,-0.477804,-1.105187,...,0.002599,0.239631,-0.261089,-0.077818,0.078991,-0.391798,0.160735,-0.035668,8.38,0
222425,142963.0,-0.679817,1.127994,-2.065994,-0.59563,0.397613,-0.414047,1.984642,0.063166,-0.367593,...,0.019606,0.026068,-0.21431,0.010086,0.480037,0.703271,0.02577,-0.04214,228.0,0
255524,157269.0,-5.90329,5.201291,-5.420179,-1.241619,-2.640632,-2.233447,-2.095874,4.016496,-0.224934,...,0.413652,0.643334,0.405876,0.062922,0.35726,0.124222,0.095446,0.16845,7.7,0


In [14]:
new_dataset.tail()

,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
279863,169142.0,-1.927883,1.125653,-4.518331,1.749293,-1.566487,-2.010494,-0.88285,0.697211,-2.064945,...,0.778584,-0.319189,0.639419,-0.294885,0.537503,0.788395,0.29268,0.147968,390.0,1
280143,169347.0,1.378559,1.289381,-5.004247,1.41185,0.442581,-1.326536,-1.41317,0.248525,-1.127396,...,0.370612,0.028234,-0.14564,-0.081049,0.521875,0.739467,0.389152,0.186637,0.76,1
280149,169351.0,-0.676143,1.126366,-2.2137,0.468308,-1.120541,-0.003346,-2.234739,1.210158,-0.65225,...,0.751826,0.834108,0.190944,0.03207,-0.739695,0.471111,0.385107,0.194361,77.89,1
281144,169966.0,-3.113832,0.585864,-5.39973,1.817092,-0.840618,-2.943548,-2.208002,1.058733,-1.632333,...,0.583276,-0.269209,-0.456108,-0.183659,-0.328168,0.606116,0.884876,-0.2537,245.0,1
281674,170348.0,1.991976,0.158476,-2.583441,0.40867,1.151147,-0.096695,0.22305,-0.068384,0.577829,...,-0.16435,-0.295135,-0.072173,-0.450261,0.313267,-0.289617,0.002988,-0.015309,42.53,1


In [15]:
new_dataset['Class'].value_counts()

1    492
0    492
Name: Class, dtype: int64

In [16]:
new_dataset.groupby('Class').mean()

Class,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
0,92301.871951,-0.044732,0.017763,0.070342,0.011407,-0.046691,0.098298,0.014367,0.033107,-0.032775,...,0.03797,0.000303,-0.000838,-0.007662,0.028937,-0.020593,-0.025006,-0.002289,0.02212,98.323069
1,80746.806911,-4.771948,3.623778,-7.033281,4.542029,-3.151225,-1.397737,-5.568731,0.570636,-2.581123,...,0.372319,0.713588,0.014049,-0.040308,-0.10513,0.041449,0.051648,0.170575,0.075667,122.211321


In [17]:
X = new_dataset.drop(columns='Class')
Y = new_dataset['Class']

In [18]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)

In [19]:
print(X.shape, X_train.shape, X_test.shape, Y_train.shape)

(984, 30) (787, 30) (197, 30) (787,)


The dataset is imbalanced, as the number of non-fraudulent transactions (class 0) is significantly higher than the number of fraudulent transactions (class 1). To address this imbalance, under-sampling is performed which involves randomly selecting a subset of the majority class (in this case, non-fraudulent transactions) to create a new balanced dataset with an equal number of instances from each class.

In this specific code, the majority class (class 0) is randomly sampled to obtain 492 instances, which is the same as the number of instances in the minority class (class 1). These two subsets are then combined to create a new dataset with equal representation from both classes, which can be used for further analysis and modeling.
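
The under-sampling step can also be written as a small reusable function with a fixed seed, so that the same legitimate transactions are drawn on every run. This is a sketch under assumptions: the function name `undersample`, the seed of 42, and the toy frame are illustrative and do not come from the notebook.

```python
import pandas as pd

def undersample(frame, label_col="Class", random_state=42):
    """Down-sample every class to the size of the rarest class."""
    n_minority = frame[label_col].value_counts().min()
    parts = [
        group.sample(n=n_minority, random_state=random_state)
        for _, group in frame.groupby(label_col)
    ]
    # Shuffle so the classes are not stored in contiguous blocks.
    return pd.concat(parts).sample(frac=1.0, random_state=random_state)

# Toy frame standing in for creditcard.csv: 8 legitimate rows, 2 fraud rows.
toy = pd.DataFrame({"Amount": range(10), "Class": [0] * 8 + [1] * 2})
balanced = undersample(toy)
print(balanced["Class"].value_counts().to_dict())
```

Because the notebook under-samples before calling train_test_split, the test set is balanced as well, so the scores reported later describe a 50/50 class mix rather than the original fraud rate of about 0.17%.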

### Evaluation

We use the AUC-ROC score, the classification report, accuracy, and the F1-score to evaluate the performance of the classifiers.

### Experiments

In [31]:
rf = RandomForestClassifier(n_estimators=5, max_depth=5, min_samples_split=2)

# train the random forest classifier on the data
rf.fit(X_train.values, Y_train.values )



Fitting the Random Forest Classifier with 5 trees...
Fitting the Decision Tree Classifier with 2 classes...
Calculating information gain for each feature...
Feature 0
Feature 1
Feature 2
Feature 3
Feature 4
Feature 5
Feature 6
Feature 7
Feature 8
Feature 9
Feature 10
Feature 11
Feature 12
Feature 13
Feature 14
Feature 15
Feature 16
Feature 17
Feature 18
Feature 19
Feature 20
Feature 21
Feature 22
Feature 23
Feature 24
Feature 25
Feature 26
Feature 27
Feature 28
Feature 29
Depth: 0
Samples: 787
Features: 30
Best feature: 14
Best threshold: -1.7347034696806598
Left samples: 379
Right samples: 408
Calculating information gain for each feature...
Feature 0
Feature 1
Feature 2
Feature 3
Feature 4
Feature 5
Feature 6
Feature 7
Feature 8
Feature 9
Feature 10
Feature 11
Feature 12
Feature 13
Feature 14
Feature 15
Feature 16
Feature 17
Feature 18
Feature 19
Feature 20
Feature 21
Feature 22
Feature 23
Feature 24
Feature 25
Feature 26
Feature 27
Feature 28
Feature 29
Depth: 1
Samples: 379
Feature

In [32]:
print(rf)

y_pred = rf.predict(X_test)

<__main__.RandomForestClassifier object at 0x7f6b96a205f8>


In [34]:
print('CLASSIFICATION REPORT')
print(classification_report(Y_test, y_pred))

print('AUC-ROC')
print(roc_auc_score(Y_test, y_pred))

print('F1-Score')
print(f1_score(Y_test, y_pred))

print('Accuracy')
print(accuracy_score(Y_test, y_pred))

CLASSIFICATION REPORT
              precision    recall  f1-score   support

           0       0.87      0.98      0.92        99
           1       0.98      0.86      0.91        98

    accuracy                           0.92       197
   macro avg       0.93      0.92      0.92       197
weighted avg       0.93      0.92      0.92       197

AUC-ROC
0.9184704184704185
F1-Score
0.9130434782608695
Accuracy
0.9187817258883249


The classification report gives the precision, recall, and F1-score for the two classes: class 0 (non-fraudulent transactions) and class 1 (fraudulent transactions). With 5 estimators, a maximum depth of 5, and min_samples_split set to 2, the precision for class 0 is 0.87: 87% of the transactions predicted as non-fraudulent are actually non-fraudulent. The recall for class 0 is 0.98: 98% of the actual non-fraudulent transactions are classified correctly. The F1-score for class 0, the harmonic mean of its precision and recall, is 0.92.

For class 1, the precision is 0.98: 98% of the transactions flagged as fraudulent are actually fraudulent. The recall is 0.86, so 86% of the actual fraudulent transactions are caught and 14% are missed. The F1-score for class 1 is 0.91. The weighted averages are 0.93 for precision and 0.92 for recall and F1-score. The support column shows the number of test samples in each class (99 and 98).

The accuracy of the model is 0.92, meaning that 92% of the test transactions are classified correctly. The AUC-ROC score is 0.918. Because it is computed from hard class predictions rather than predicted probabilities, it equals the mean of the two per-class recalls and does not capture the full tradeoff between true positive rate and false positive rate. The F1-score of 0.913 printed by f1_score is the F1-score of the positive class (class 1).
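
The reported figures can be reproduced by hand from the confusion matrix. The counts below are not printed in the notebook; they are inferred from the class supports (99 and 98) and the unrounded accuracy and F1 values in the output above, so treat them as a reconstruction.

```python
# Counts inferred from the report above (class 1 = fraud is the positive class).
tn, fp = 97, 2    # 99 legitimate test transactions
fn, tp = 14, 84   # 98 fraudulent test transactions

precision_1 = tp / (tp + fp)   # share of fraud predictions that are right
recall_1 = tp / (tp + fn)      # share of actual fraud that is caught
recall_0 = tn / (tn + fp)      # share of legitimate transactions left alone
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy = (tp + tn) / (tp + tn + fp + fn)
# With hard 0/1 predictions, ROC AUC reduces to the mean of the two recalls.
auc_from_labels = (recall_1 + recall_0) / 2

print(f"precision(1)={precision_1:.4f} recall(1)={recall_1:.4f}")
print(f"F1(1)={f1_1:.4f} accuracy={accuracy:.4f} AUC={auc_from_labels:.4f}")
```

These counts give an F1-score of 168/184, an accuracy of 181/197, and an AUC of about 0.9185, matching the printed output.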

In [50]:
rf2 = RandomForestClassifier(n_estimators=10, max_depth=10, min_samples_split=4)

# train the random forest classifier on the data
rf2.fit(X_train.values, Y_train.values )



RandomForestClassifier(max_depth=10, min_samples_split=4, n_estimators=10)

In [51]:
print(rf2)

y_pred = rf2.predict(X_test)

RandomForestClassifier(max_depth=10, min_samples_split=4, n_estimators=10)


In [41]:
print('CLASSIFICATION REPORT')
print(classification_report(Y_test, y_pred))

print('AUC-ROC')
print(roc_auc_score(Y_test, y_pred))

print('F1-Score')
print(f1_score(Y_test, y_pred))

print('Accuracy')
print(accuracy_score(Y_test, y_pred))

CLASSIFICATION REPORT
              precision    recall  f1-score   support

           0       0.89      1.00      0.94        99
           1       1.00      0.88      0.93        98

    accuracy                           0.94       197
   macro avg       0.95      0.94      0.94       197
weighted avg       0.95      0.94      0.94       197

AUC-ROC
0.9387755102040816
F1-Score
0.9347826086956522
Accuracy
0.9390862944162437


With 10 estimators, a maximum depth of 10, and min_samples_split set to 4, the precision is 0.89 for class 0 and 1.00 for class 1, so no legitimate transaction in the test set was flagged as fraud. The recall is 1.00 for class 0 and 0.88 for class 1, so 12% of the fraudulent transactions were still missed. The weighted averages are 0.95 for precision and 0.94 for recall and F1-score. The AUC-ROC score, again computed from hard predictions, is 0.939.

The accuracy of the model is 0.94, meaning that it correctly predicted 94% of the test instances. One reason for the high accuracy is that the under-sampled dataset is balanced, so the model sees as many fraudulent as legitimate examples. Another is that the features V1 to V28 are already PCA-transformed in the source dataset; no further feature engineering is applied in this notebook.



In [47]:
rf3 = RandomForestClassifier(n_estimators=10, max_depth=20, min_samples_split=6)

# train the random forest classifier on the data
rf3.fit(X_train.values, Y_train.values )

RandomForestClassifier(max_depth=20, min_samples_split=6, n_estimators=10)

In [48]:
print(rf3)

y_pred = rf3.predict(X_test)

RandomForestClassifier(max_depth=20, min_samples_split=6, n_estimators=10)


In [49]:
print('CLASSIFICATION REPORT')
print(classification_report(Y_test, y_pred))

print('AUC-ROC')
print(roc_auc_score(Y_test, y_pred))

print('F1-Score')
print(f1_score(Y_test, y_pred))

print('Accuracy')
print(accuracy_score(Y_test, y_pred))

CLASSIFICATION REPORT
              precision    recall  f1-score   support

           0       0.88      0.99      0.93        99
           1       0.99      0.87      0.92        98

    accuracy                           0.93       197
   macro avg       0.94      0.93      0.93       197
weighted avg       0.94      0.93      0.93       197

AUC-ROC
0.9286229643372501
F1-Score
0.9239130434782609
Accuracy
0.9289340101522843


With 10 estimators, a maximum depth of 20, and min_samples_split set to 6, the precision is 0.88 for class 0 and 0.99 for class 1, and the recall is 0.99 for class 0 and 0.87 for class 1. The model therefore raises very few false alarms but misses about 13% of the fraudulent transactions. The weighted averages are 0.94 for precision and 0.93 for recall and F1-score, the accuracy is 0.93, and the AUC-ROC score is 0.929. The F1-score of 0.924 for class 1 is a suitable summary because it accounts for both precision and recall. Raising the maximum depth from 10 to 20 and min_samples_split from 4 to 6 did not improve on the previous configuration. Factors that affect performance include which legitimate transactions end up in the under-sampled training and test sets, as well as the model's hyperparameters, so further experimentation may improve the results.

Overall, the model has achieved good performance in terms of both accuracy and AUC-ROC score, indicating its potential for deployment in practical applications. However, further analysis and evaluation are necessary to ensure the robustness and generalizability of the model.
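
One common way to check robustness is stratified k-fold cross-validation, which replaces the single train/test split with several. The sketch below uses scikit-learn on synthetic data so that it runs on its own; the data, the seeds, the alias `SklearnRandomForest`, and the choice of five folds are all assumptions for the illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier as SklearnRandomForest
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the balanced 984-row, 30-feature dataset.
X_demo, y_demo = make_classification(
    n_samples=984, n_features=30, n_informative=10, random_state=0
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = SklearnRandomForest(n_estimators=10, max_depth=10, random_state=0)

# One F1 score per fold; the spread shows how much a single split can mislead.
scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="f1")
print(f"F1 per fold: {np.round(scores, 3)}")
print(f"mean={scores.mean():.3f} std={scores.std():.3f}")
```

This matters here because the test set has only 197 rows: the three configurations above classify 181, 185, and 183 of them correctly, a gap of at most four transactions.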

## Sklearn Model

In [35]:
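# Note: this import rebinds the name RandomForestClassifier, so the self-built
# class defined earlier is no longer reachable under that name once this cell runs.
from sklearn.ensemble import RandomForestClassifier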
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score

classifier = RandomForestClassifier()
classifier.fit(X_train, Y_train)
Y_pred = classifier.predict(X_test)
print("Classification Report:")
print(classification_report(Y_test, Y_pred))
auc_roc = roc_auc_score(Y_test, Y_pred)
print("AUC-ROC: ", auc_roc)

f1 = f1_score(Y_test, Y_pred)

accuracy = accuracy_score(Y_test, Y_pred)

print(f"AUC-ROC: {auc_roc:.4f}")
print(f"F1-score: {f1:.4f}")
print(f"Accuracy: {accuracy:.4f}")

Classification Report:
              precision    recall  f1-score   support

           0       0.88      0.99      0.93        99
           1       0.99      0.86      0.92        98

    accuracy                           0.92       197
   macro avg       0.93      0.92      0.92       197
weighted avg       0.93      0.92      0.92       197

AUC-ROC:  0.9235209235209235
AUC-ROC: 0.9235
F1-score: 0.9180
Accuracy: 0.9239


The classification report for the scikit-learn model shows a precision of 0.88 for class 0 and 0.99 for class 1, and a recall of 0.99 for class 0 and 0.86 for class 1. As in the earlier runs, false positives are rare, while about 14% of the fraudulent transactions are missed. The F1-score for class 1 is 0.9180 and the overall accuracy is 0.9239, meaning that 92.39% of the test instances were predicted correctly. The AUC-ROC score, computed from hard predictions, is 0.9235. These results suggest that the model has potential for practical use, but further analysis and evaluation are necessary to ensure its robustness and generalizability.
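
Two refinements to this cell are sketched below on synthetic data: importing the scikit-learn class under an alias so that it cannot shadow a custom class of the same name, and computing AUC-ROC from predicted probabilities rather than hard labels. The alias `SklearnRandomForest`, the data, and the seeds are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier as SklearnRandomForest
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data, used only so the snippet runs on its own.
X_demo, y_demo = make_classification(n_samples=1000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

clf = SklearnRandomForest(n_estimators=100, random_state=0).fit(X_tr, y_tr)
hard_labels = clf.predict(X_te)

# AUC from hard labels collapses the ROC curve to a single operating point,
# which makes it identical to balanced accuracy ...
auc_labels = roc_auc_score(y_te, hard_labels)
bal_acc = balanced_accuracy_score(y_te, hard_labels)
# ... whereas class-1 probabilities let it sweep over every threshold.
auc_scores = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

print(f"AUC from labels: {auc_labels:.4f} (balanced accuracy: {bal_acc:.4f})")
print(f"AUC from probabilities: {auc_scores:.4f}")
```

The probability-based score is the one usually meant by AUC-ROC, since it measures how well the model ranks fraudulent above legitimate transactions across all decision thresholds.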

### Conclusion
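Looking at the results, the best self-built configuration (10 estimators, maximum depth 10) scored higher than the scikit-learn model in accuracy (0.9391 vs. 0.9239) and AUC-ROC (0.9388 vs. 0.9235). It also achieved a higher F1-score (0.9348 vs. 0.9180), a slightly higher precision for both class 0 (0.89 vs. 0.88) and class 1 (1.00 vs. 0.99), and a slightly higher recall for class 1 (0.88 vs. 0.86). The first configuration (5 estimators, maximum depth 5) scored slightly lower than the scikit-learn model (accuracy 0.9188 vs. 0.9239).

Overall, the self-built classifier's best configuration achieved better results in accuracy and AUC-ROC, although the differences are small: across all runs they amount to at most four of the 197 test transactions. One caveat applies to this comparison. The second and third configurations were run after the scikit-learn import, and their printed representation, RandomForestClassifier(max_depth=10, ...), is in scikit-learn's estimator format rather than the default Python object representation printed for the first run. Those two runs therefore appear to have used scikit-learn's class, and the comparison should be repeated with the two classes under distinct names. Both approaches achieved F1-scores above 0.9 with high precision for both classes, so both have the potential to be used in practical applications.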

### Sources

Data - https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud