# Credit Card Fraud Detection using Machine Learning

# About Data:
This dataset contains credit card transactions in 31 columns: 30 features and a class label. The features describe various aspects of each transaction, and the class label indicates whether the transaction was fraudulent (class 1) or not (class 0).

The first feature is "Time", which represents the number of seconds elapsed between the transaction and the first transaction in the dataset. The next 28 features, V1 to V28, are anonymized variables resulting from a principal component analysis (PCA) transformation of the original features. Because the original features are not disclosed, the individual components cannot be mapped back to specific transaction attributes such as location or type.

The second-to-last column is "Amount", which represents the transaction amount in USD. The last column is the "Class" label, which indicates whether the transaction is fraudulent (class 1) or not (class 0).

Overall, this dataset is used to train machine learning models that detect fraudulent transactions, ideally in real time. A model learns patterns from the features above and then applies them to flag fraud in new, unseen transactions.

In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

In [3]:
credit_card_data = pd.read_csv('creditcard.csv')
credit_card_data.head(5)

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,...,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,0.0,1.191857,0.266151,0.16648,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,...,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.008983,0.014724,2.69,0
2,1.0,-1.358354,-1.340163,1.773209,0.37978,-0.503198,1.800499,0.791461,0.247676,-1.514654,...,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
3,1.0,-0.966272,-0.185226,1.792993,-0.863291,-0.010309,1.247203,0.237609,0.377436,-1.387024,...,-0.1083,0.005274,-0.190321,-1.175575,0.647376,-0.221929,0.062723,0.061458,123.5,0
4,2.0,-1.158233,0.877737,1.548718,0.403034,-0.407193,0.095921,0.592941,-0.270533,0.817739,...,-0.009431,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,69.99,0


In [4]:
credit_card_data.sample()

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
51169,44825.0,-0.53798,0.425643,0.32431,-1.630769,2.917555,3.222399,0.38508,0.682322,-0.475536,...,-0.261012,-0.890899,-0.104308,0.978814,-0.039591,-0.06913,-0.130644,-0.120316,8.98,0


In [5]:
# dataset information
credit_card_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   Time    284807 non-null  float64
 1   V1      284807 non-null  float64
 2   V2      284807 non-null  float64
 3   V3      284807 non-null  float64
 4   V4      284807 non-null  float64
 5   V5      284807 non-null  float64
 6   V6      284807 non-null  float64
 7   V7      284807 non-null  float64
 8   V8      284807 non-null  float64
 9   V9      284807 non-null  float64
 10  V10     284807 non-null  float64
 11  V11     284807 non-null  float64
 12  V12     284807 non-null  float64
 13  V13     284807 non-null  float64
 14  V14     284807 non-null  float64
 15  V15     284807 non-null  float64
 16  V16     284807 non-null  float64
 17  V17     284807 non-null  float64
 18  V18     284807 non-null  float64
 19  V19     284807 non-null  float64
 20  V20     284807 non-null  float64
 21  V21     28

In [6]:
# checking the number of missing values in each column
credit_card_data.isnull().sum()

Time      0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
Class     0
dtype: int64

In [7]:
# distribution of legit transactions & fraudulent transactions
credit_card_data['Class'].value_counts()

Class
0    284315
1       492
Name: count, dtype: int64

This dataset is highly imbalanced: only 492 of the 284,807 transactions (about 0.17%) are fraudulent.

0 --> normal transaction

1 --> fraudulent transaction
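
To make the imbalance concrete, the short sketch below uses only the class counts printed above. It computes the fraud rate and the accuracy of a hypothetical baseline that labels every transaction as legitimate. The variable names are illustrative and do not appear in the notebook.

```python
# Class counts taken from the value_counts() output above
n_legit = 284315
n_fraud = 492
n_total = n_legit + n_fraud

# Share of fraudulent transactions in the full dataset
fraud_rate = n_fraud / n_total

# A hypothetical baseline that predicts "legit" for every row is right
# on all legitimate rows and wrong on all fraudulent ones
baseline_accuracy = n_legit / n_total
baseline_recall = 0 / n_fraud  # it never catches a single fraud

print(f"Fraud rate:        {fraud_rate:.4%}")
print(f"Baseline accuracy: {baseline_accuracy:.4%}")
print(f"Baseline recall:   {baseline_recall:.4%}")
```

Such a baseline would score above 99.8% accuracy while detecting no fraud at all, which is why accuracy on the raw data says little about fraud detection quality.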

The first line of code creates a new dataframe called "legit" by selecting only the rows from the original "credit_card_data" dataframe where the "Class" label is equal to 0. In other words, it filters out all transactions labeled as fraudulent (Class == 1) and keeps only the legitimate transactions (Class == 0).

The second line of code creates a new dataframe called "fraud" by selecting only the rows from the original "credit_card_data" dataframe where the "Class" label is equal to 1. This filters out all legitimate transactions and keeps only the fraudulent transactions.

By separating the data into two dataframes, it becomes easier to analyze and compare the characteristics of legitimate and fraudulent transactions separately. This can be useful for identifying patterns or features that are more common in fraudulent transactions, which can then be used to develop models for fraud detection.

In [8]:
legit = credit_card_data[credit_card_data.Class==0]
fraud = credit_card_data[credit_card_data['Class']==1]

In [9]:
fraud['Class']

541       1
623       1
4920      1
6108      1
6329      1
         ..
279863    1
280143    1
280149    1
281144    1
281674    1
Name: Class, Length: 492, dtype: int64

In [10]:
# statistical measures of the data
legit.Amount.describe()

count    284315.000000
mean         88.291022
std         250.105092
min           0.000000
25%           5.650000
50%          22.000000
75%          77.050000
max       25691.160000
Name: Amount, dtype: float64

In [11]:
fraud.Amount.describe()

count     492.000000
mean      122.211321
std       256.683288
min         0.000000
25%         1.000000
50%         9.250000
75%       105.890000
max      2125.870000
Name: Amount, dtype: float64

In [12]:
# compare the values for both transactions
credit_card_data.groupby('Class').mean()

Class,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
0,94838.202258,0.008258,-0.006271,0.012171,-0.00786,0.005453,0.002419,0.009637,-0.000987,0.004467,...,-0.000644,-0.001235,-2.4e-05,7e-05,0.000182,-7.2e-05,-8.9e-05,-0.000295,-0.000131,88.291022
1,80746.806911,-4.771948,3.623778,-7.033281,4.542029,-3.151225,-1.397737,-5.568731,0.570636,-2.581123,...,0.372319,0.713588,0.014049,-0.040308,-0.10513,0.041449,0.051648,0.170575,0.075667,122.211321


Build a sample dataset containing an equal number of legitimate and fraudulent transactions by undersampling the legitimate class.

Number of Fraudulent Transactions --> 492

legit_sample = legit.sample(n=492) draws a random sample of 492 rows from the "legit" dataframe so that it matches the 492 rows in "fraud". Because legitimate transactions vastly outnumber fraudulent ones in the original data, a model trained on it can reach high accuracy simply by predicting that every transaction is legitimate. Training on a balanced sample with equal numbers of both classes pushes the model to learn the patterns that separate fraudulent transactions from legitimate ones. Note that no random_state is passed, so a different sample is drawn on every run and the numbers below will vary slightly between runs.

In [13]:
legit_sample = legit.sample(n=492)

In [14]:
new_df = pd.concat([legit_sample,fraud],axis=0)

In [15]:
new_df

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
100166,67439.0,1.231392,0.456721,0.119812,1.119747,-0.056764,-1.048124,0.466232,-0.347707,-0.284420,...,-0.010389,0.033606,-0.118439,0.403716,0.757215,-0.362165,0.007450,0.021077,17.46,0
4326,3760.0,1.401276,-0.683958,-0.915143,-1.616304,1.458234,3.274761,-1.170329,0.710345,0.360104,...,-0.278087,-0.898279,0.081211,0.920023,0.357016,-0.508385,-0.019937,0.015447,42.00,0
250909,155124.0,2.107170,-0.083080,-1.384193,0.243143,0.219118,-0.781873,0.174728,-0.288366,0.591441,...,-0.336084,-0.839181,0.243466,-0.715377,-0.199308,0.242098,-0.072462,-0.066817,5.99,0
283859,171933.0,2.219537,-0.522871,-1.420302,-0.924451,-0.337605,-1.415039,-0.050859,-0.498773,-0.843595,...,0.065885,0.188144,0.196979,-0.057347,-0.039596,-0.310990,-0.039283,-0.060392,17.99,0
18777,29755.0,1.269032,0.188296,-0.054703,0.453526,-0.095770,-0.689943,0.148671,-0.091481,-0.208662,...,-0.039536,-0.149271,-0.099477,0.015003,0.554675,0.373752,-0.061152,-0.009992,1.23,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
279863,169142.0,-1.927883,1.125653,-4.518331,1.749293,-1.566487,-2.010494,-0.882850,0.697211,-2.064945,...,0.778584,-0.319189,0.639419,-0.294885,0.537503,0.788395,0.292680,0.147968,390.00,1
280143,169347.0,1.378559,1.289381,-5.004247,1.411850,0.442581,-1.326536,-1.413170,0.248525,-1.127396,...,0.370612,0.028234,-0.145640,-0.081049,0.521875,0.739467,0.389152,0.186637,0.76,1
280149,169351.0,-0.676143,1.126366,-2.213700,0.468308,-1.120541,-0.003346,-2.234739,1.210158,-0.652250,...,0.751826,0.834108,0.190944,0.032070,-0.739695,0.471111,0.385107,0.194361,77.89,1
281144,169966.0,-3.113832,0.585864,-5.399730,1.817092,-0.840618,-2.943548,-2.208002,1.058733,-1.632333,...,0.583276,-0.269209,-0.456108,-0.183659,-0.328168,0.606116,0.884876,-0.253700,245.00,1


In [16]:
new_df['Class'].value_counts()

Class
0    492
1    492
Name: count, dtype: int64
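
The undersampling step above can be made repeatable by fixing the random seed. The sketch below shows this on a small synthetic frame rather than the real data. The frame `toy`, the seed value 42 and the extra shuffle are assumptions added for illustration, not part of the original notebook.

```python
import pandas as pd

# Small synthetic stand-in for the real data: 20 legit rows, 4 fraud rows
toy = pd.DataFrame({
    "Amount": [float(i) for i in range(24)],
    "Class": [0] * 20 + [1] * 4,
})

legit_toy = toy[toy["Class"] == 0]
fraud_toy = toy[toy["Class"] == 1]

# Undersample the majority class to the minority size; a fixed
# random_state makes the draw repeatable across runs
legit_sample_toy = legit_toy.sample(n=len(fraud_toy), random_state=42)

# Concatenate, then shuffle so the classes are not stored in two blocks
balanced = (
    pd.concat([legit_sample_toy, fraud_toy], axis=0)
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)

print(balanced["Class"].value_counts())
```

With a fixed seed, rerunning the notebook would reproduce the same balanced sample and therefore the same downstream scores.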

In [17]:
new_df.groupby('Class').mean()

Class,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
0,97616.623984,-0.037738,-0.182749,-0.063253,-0.054325,0.072327,0.051236,0.005137,0.022935,0.094994,...,0.024464,0.018819,0.059443,0.053901,-0.025496,0.028224,-0.014082,-0.010925,0.040191,102.365081
1,80746.806911,-4.771948,3.623778,-7.033281,4.542029,-3.151225,-1.397737,-5.568731,0.570636,-2.581123,...,0.372319,0.713588,0.014049,-0.040308,-0.10513,0.041449,0.051648,0.170575,0.075667,122.211321


In [18]:
# separate the features (X) from the target label (Y)
X = new_df.drop(columns='Class')
Y = new_df['Class']

In [19]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)
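
As a quick check of what stratify does in the split above, the following sketch applies the same call to placeholder data with the same class counts (492 per class). The arrays `X` and `y` here are synthetic stand-ins, not the notebook's variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the same class counts as the balanced frame
y = np.array([0] * 492 + [1] * 492)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature, values are irrelevant

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# With stratify=y both parts keep a close-to-50/50 class ratio
print("train size:", len(y_tr), "fraud share:", y_tr.mean())
print("test size: ", len(y_te), "fraud share:", y_te.mean())
```

The split yields 787 training rows and 197 test rows, so each test-set accuracy reported later moves in steps of roughly half a percentage point per transaction.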

# Model Training

Logistic Regression

In [29]:
model = LogisticRegression()
# training the Logistic Regression Model with Training Data
model.fit(X_train, Y_train)

# accuracy on training data (accuracy_score expects y_true first, then y_pred)
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy on Training data (LR) : ', training_data_accuracy)

# accuracy on test data
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy score on Test Data (LR) : ', test_data_accuracy)

Accuracy on Training data (LR) :  0.9466327827191868
Accuracy score on Test Data (LR) :  0.934010152284264


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
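
The warning above suggests scaling the data or raising max_iter. One possible way to do both is sketched below on synthetic data. The inflated first column merely imitates a wide-range feature such as Time, and all names (`X_demo`, `scaled_lr`, and so on) are hypothetical. This is not the configuration used in the notebook.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; one column is inflated to mimic a wide-range
# feature such as Time
X_demo, y_demo = make_classification(n_samples=1000, n_features=10, random_state=0)
X_demo[:, 0] *= 100_000

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

# The scaler is fitted on the training rows only, inside the pipeline
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

with warnings.catch_warnings():
    # Turn a convergence warning into an error so a failure would be visible
    warnings.simplefilter("error", category=ConvergenceWarning)
    scaled_lr.fit(Xd_train, yd_train)

demo_score = scaled_lr.score(Xd_test, yd_test)
print("Test accuracy (scaled LR, synthetic data):", demo_score)
```

Wrapping the scaler and the model in one pipeline keeps the test rows out of the scaling statistics, which avoids a subtle form of leakage.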


In [30]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# NOTE: these metrics are computed on the training data, not on the held-out test set
accuracy_lr = accuracy_score(Y_train, X_train_prediction)
precision_lr = precision_score(Y_train, X_train_prediction)
recall_lr = recall_score(Y_train, X_train_prediction)
f1_lr = f1_score(Y_train, X_train_prediction)

print("Accuracy (LR):", accuracy_lr)
print("Precision (LR):", precision_lr)
print("Recall (LR):", recall_lr)
print("F1 Score (LR):", f1_lr)

Accuracy (LR): 0.9466327827191868
Precision (LR): 0.9631578947368421
Recall (LR): 0.9289340101522843
F1 Score (LR): 0.9457364341085271
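
The four training-set figures above can be reproduced from a single set of confusion counts. The counts below are inferred from the printed metrics, on the assumption that the training split holds 787 rows of which 394 are fraudulent. The notebook itself does not print them.

```python
# Confusion counts inferred from the reported training metrics
# (assumption: 787 training rows, 394 of them fraudulent)
tp, fp, fn, tn = 366, 14, 28, 379

accuracy_check = (tp + tn) / (tp + fp + fn + tn)
precision_check = tp / (tp + fp)  # of the rows flagged as fraud, how many were fraud
recall_check = tp / (tp + fn)     # of the actual fraud rows, how many were flagged
f1_check = 2 * precision_check * recall_check / (precision_check + recall_check)

print("Accuracy: ", accuracy_check)
print("Precision:", precision_check)
print("Recall:   ", recall_check)
print("F1 Score: ", f1_check)
```

Read this way, the logistic regression model missed 28 of the 394 fraudulent training rows and raised 14 false alarms.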


SVM

In [31]:
from sklearn import svm

# Support Vector Machine (SVM) model training and evaluation

svm_model = svm.SVC(kernel='linear')
svm_model.fit(X_train, Y_train)

# Predictions on training and test data
X_train_prediction_svm = svm_model.predict(X_train)
X_test_prediction_svm = svm_model.predict(X_test)

# Accuracy scores (y_true first, then y_pred)
training_data_accuracy_svm = accuracy_score(Y_train, X_train_prediction_svm)
test_data_accuracy_svm = accuracy_score(Y_test, X_test_prediction_svm)

print('Accuracy on Training data (SVM):', training_data_accuracy_svm)
print('Accuracy score on Test Data (SVM):', test_data_accuracy_svm)

Accuracy on Training data (SVM): 0.9097839898348158
Accuracy score on Test Data (SVM): 0.9086294416243654


In [32]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# training-set metrics for the SVM (not the held-out test split)

accuracy = accuracy_score(Y_train, X_train_prediction_svm)
precision = precision_score(Y_train, X_train_prediction_svm)
recall = recall_score(Y_train, X_train_prediction_svm)
f1 = f1_score(Y_train, X_train_prediction_svm)

print("Accuracy (SVM):", accuracy)
print("Precision (SVM):", precision)
print("Recall (SVM):", recall)
print("F1 Score (SVM):", f1)

Accuracy (SVM): 0.9097839898348158
Precision (SVM): 0.9969230769230769
Recall (SVM): 0.8223350253807107
F1 Score (SVM): 0.9012517385257302
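
The precision, recall and F1 values in this notebook are all computed on Y_train. A held-out evaluation might look like the sketch below, which uses synthetic data and hypothetical names (`svm_demo`, `Xs_test`, and so on) rather than the notebook's own split.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (not the credit card dataset)
X_syn, y_syn = make_classification(n_samples=1000, n_features=10, random_state=0)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.2, stratify=y_syn, random_state=2
)

svm_demo = SVC(kernel="linear").fit(Xs_train, ys_train)

# Score on rows the model did not see while fitting
ys_test_pred = svm_demo.predict(Xs_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(ys_test, ys_test_pred)
tn, fp, fn, tp = cm.ravel()

print("Confusion matrix:\n", cm)
print("Precision (test):", precision_score(ys_test, ys_test_pred))
print("Recall (test):   ", recall_score(ys_test, ys_test_pred))
print("F1 Score (test): ", f1_score(ys_test, ys_test_pred))
```

In the notebook, the same pattern would use X_test_prediction_svm and Y_test in place of the synthetic arrays.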


KNN

In [33]:
from sklearn.neighbors import KNeighborsClassifier

knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, Y_train)

# Accuracy on training data
X_train_prediction_knn = knn_model.predict(X_train)
training_data_accuracy_knn = accuracy_score(Y_train, X_train_prediction_knn)
print('Accuracy on Training data (KNN):', training_data_accuracy_knn)

# Accuracy on test data
X_test_prediction_knn = knn_model.predict(X_test)
test_data_accuracy_knn = accuracy_score(Y_test, X_test_prediction_knn)
print('Accuracy score on Test Data (KNN):', test_data_accuracy_knn)

Accuracy on Training data (KNN): 0.7573062261753494
Accuracy score on Test Data (KNN): 0.6345177664974619
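
One plausible reason for the weak KNN scores is that the features are not scaled, although the notebook does not test this. KNeighborsClassifier uses Euclidean distance by default, so a column with a large numeric range can dominate the neighbour search. The sketch below illustrates the effect using three columns from two rows displayed earlier. Restricting the rows to these three columns is a simplification made for the example.

```python
import math

# Three columns from two rows shown earlier: (Time, V1, Amount)
row_a = (0.0, -1.359807, 149.62)   # index 0
row_b = (44825.0, -0.53798, 8.98)  # index 51169

# Euclidean distance on the raw values
raw_distance = math.dist(row_a, row_b)

# Contribution of each column to the squared distance
squared_diffs = [(a - b) ** 2 for a, b in zip(row_a, row_b)]
time_share = squared_diffs[0] / sum(squared_diffs)

print("Raw distance:", round(raw_distance, 2))
print("Share of squared distance due to Time:", round(time_share, 6))
```

Almost the entire distance comes from Time, so on raw values the nearest neighbours are largely the transactions closest in time. Standardizing the columns before fitting KNN is a common remedy.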


In [34]:
accuracy_knn = accuracy_score(Y_train, X_train_prediction_knn)
precision_knn = precision_score(Y_train, X_train_prediction_knn)
recall_knn = recall_score(Y_train, X_train_prediction_knn)
f1_knn = f1_score(Y_train, X_train_prediction_knn)

print("Accuracy (KNN):", accuracy_knn)
print("Precision (KNN):", precision_knn)
print("Recall (KNN):", recall_knn)
print("F1 Score (KNN):", f1_knn)

Accuracy (KNN): 0.7573062261753494
Precision (KNN): 0.7706666666666667
Recall (KNN): 0.733502538071066
F1 Score (KNN): 0.7516254876462939


Random Forest

In [35]:
from sklearn.ensemble import RandomForestClassifier

# Create and train the Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, Y_train)

# Accuracy on training data
X_train_prediction_rf = rf_model.predict(X_train)
training_data_accuracy_rf = accuracy_score(Y_train, X_train_prediction_rf)
print('Accuracy on Training data (Random Forest):', training_data_accuracy_rf)

# Accuracy on test data
X_test_prediction_rf = rf_model.predict(X_test)
test_data_accuracy_rf = accuracy_score(Y_test, X_test_prediction_rf)
print('Accuracy score on Test Data (Random Forest):', test_data_accuracy_rf)

Accuracy on Training data (Random Forest): 1.0
Accuracy score on Test Data (Random Forest): 0.9187817258883249


In [36]:
accuracy_rf = accuracy_score(Y_train, X_train_prediction_rf)
precision_rf = precision_score(Y_train, X_train_prediction_rf)
recall_rf = recall_score(Y_train, X_train_prediction_rf)
f1_rf = f1_score(Y_train, X_train_prediction_rf)

print("Accuracy (RF):", accuracy_rf)
print("Precision (RF):", precision_rf)
print("Recall (RF):", recall_rf)
print("F1 Score (RF):", f1_rf)

Accuracy (RF): 1.0
Precision (RF): 1.0
Recall (RF): 1.0
F1 Score (RF): 1.0
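
Perfect training scores like the ones above mostly show that the forest has memorized its training rows, so they say little about performance on new transactions. Cross-validation is one common way to get a steadier estimate from a small dataset. The sketch below runs on synthetic data, and the fold count and names (`rf_cv`, `cv_scores`) are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (not the credit card dataset)
X_cv, y_cv = make_classification(n_samples=600, n_features=10, random_state=0)

rf_cv = RandomForestClassifier(n_estimators=100, random_state=42)

# Accuracy on the same rows the forest was fitted on
train_score = rf_cv.fit(X_cv, y_cv).score(X_cv, y_cv)

# 5-fold stratified cross-validation: each fold is scored by a model
# that never saw that fold during fitting
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(rf_cv, X_cv, y_cv, cv=cv, scoring="f1")

print("Training accuracy:", train_score)
print("Cross-validated F1 per fold:", cv_scores)
print("Mean cross-validated F1:", cv_scores.mean())
```

The gap between the training score and the cross-validated mean gives a rough sense of how much the model overfits.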


Decision Tree

In [37]:
from sklearn.tree import DecisionTreeClassifier

# Create and train the Decision Tree model
dtree = DecisionTreeClassifier(random_state=42)
dtree.fit(X_train, Y_train)

# Accuracy on training data
X_train_prediction_dtree = dtree.predict(X_train)
training_data_accuracy_dtree = accuracy_score(Y_train, X_train_prediction_dtree)
print('Accuracy on Training data (Decision Tree):', training_data_accuracy_dtree)

# Accuracy on test data
X_test_prediction_dtree = dtree.predict(X_test)
test_data_accuracy_dtree = accuracy_score(Y_test, X_test_prediction_dtree)
print('Accuracy score on Test Data (Decision Tree):', test_data_accuracy_dtree)

Accuracy on Training data (Decision Tree): 1.0
Accuracy score on Test Data (Decision Tree): 0.9035532994923858


In [38]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

accuracy_dtree = accuracy_score(Y_train, X_train_prediction_dtree)
precision_dtree = precision_score(Y_train, X_train_prediction_dtree)
recall_dtree = recall_score(Y_train, X_train_prediction_dtree)
f1_dtree = f1_score(Y_train, X_train_prediction_dtree)

print("Accuracy (DT):", accuracy_dtree)
print("Precision (DT):", precision_dtree)
print("Recall (DT):", recall_dtree)
print("F1 Score (DT):", f1_dtree)

Accuracy (DT): 1.0
Precision (DT): 1.0
Recall (DT): 1.0
F1 Score (DT): 1.0


XGBoost

In [39]:
from xgboost import XGBClassifier

# Create and train the XGBoost model
xgb_model = XGBClassifier(eval_metric='logloss', random_state=42)
xgb_model.fit(X_train, Y_train)

# Accuracy on training data
X_train_prediction_xgb = xgb_model.predict(X_train)
training_data_accuracy_xgb = accuracy_score(Y_train, X_train_prediction_xgb)
print('Accuracy on Training data (XGBoost):', training_data_accuracy_xgb)

# Accuracy on test data
X_test_prediction_xgb = xgb_model.predict(X_test)
test_data_accuracy_xgb = accuracy_score(Y_test, X_test_prediction_xgb)
print('Accuracy score on Test Data (XGBoost):', test_data_accuracy_xgb)

Accuracy on Training data (XGBoost): 1.0
Accuracy score on Test Data (XGBoost): 0.9137055837563451


In [40]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

accuracy_xgb = accuracy_score(Y_train, X_train_prediction_xgb)
precision_xgb = precision_score(Y_train, X_train_prediction_xgb)
recall_xgb = recall_score(Y_train, X_train_prediction_xgb)
f1_xgb = f1_score(Y_train, X_train_prediction_xgb)

print("Accuracy (XGBoost):", accuracy_xgb)
print("Precision (XGBoost):", precision_xgb)
print("Recall (XGBoost):", recall_xgb)
print("F1 Score (XGBoost):", f1_xgb)

Accuracy (XGBoost): 1.0
Precision (XGBoost): 1.0
Recall (XGBoost): 1.0
F1 Score (XGBoost): 1.0


AdaBoost


In [42]:
from sklearn.ensemble import AdaBoostClassifier

# Create and train the AdaBoost model
ada_model = AdaBoostClassifier(n_estimators=100, random_state=42)
ada_model.fit(X_train, Y_train)

# Accuracy on training data
X_train_prediction_ada = ada_model.predict(X_train)
training_data_accuracy_ada = accuracy_score(Y_train, X_train_prediction_ada)
print('Accuracy on Training data (AdaBoost):', training_data_accuracy_ada)

# Accuracy on test data
X_test_prediction_ada = ada_model.predict(X_test)
test_data_accuracy_ada = accuracy_score(Y_test, X_test_prediction_ada)
print('Accuracy score on Test Data (AdaBoost):', test_data_accuracy_ada)

Accuracy on Training data (AdaBoost): 0.9949174078780177
Accuracy score on Test Data (AdaBoost): 0.9137055837563451


In [43]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

ada_train_pred = ada_model.predict(X_train)

accuracy_ada = accuracy_score(Y_train, ada_train_pred)
precision_ada = precision_score(Y_train, ada_train_pred)
recall_ada = recall_score(Y_train, ada_train_pred)
f1_ada = f1_score(Y_train, ada_train_pred)

print("Accuracy (AdaBoost):", accuracy_ada)
print("Precision (AdaBoost):", precision_ada)
print("Recall (AdaBoost):", recall_ada)
print("F1 Score (AdaBoost):", f1_ada)

Accuracy (AdaBoost): 0.9949174078780177
Precision (AdaBoost): 0.9974489795918368
Recall (AdaBoost): 0.9923857868020305
F1 Score (AdaBoost): 0.9949109414758269


Gradient Boosting


In [47]:
from sklearn.ensemble import GradientBoostingClassifier

# Create and train the Gradient Boosting model
gb_model = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb_model.fit(X_train, Y_train)

# Accuracy on training data
X_train_prediction_gb = gb_model.predict(X_train)
training_data_accuracy_gb = accuracy_score(Y_train, X_train_prediction_gb)
print('Accuracy on Training data (GB):', training_data_accuracy_gb)

# Accuracy on test data
X_test_prediction_gb = gb_model.predict(X_test)
test_data_accuracy_gb = accuracy_score(Y_test, X_test_prediction_gb)
print('Accuracy score on Test Data (GB):', test_data_accuracy_gb)

Accuracy on Training data (GB): 1.0
Accuracy score on Test Data (GB): 0.9289340101522843


In [48]:
# Predictions on training data for Gradient Boosting
X_train_prediction_gb = gb_model.predict(X_train)

# Calculate metrics for Gradient Boosting
accuracy_gb = accuracy_score(Y_train, X_train_prediction_gb)
precision_gb = precision_score(Y_train, X_train_prediction_gb)
recall_gb = recall_score(Y_train, X_train_prediction_gb)
f1_gb = f1_score(Y_train, X_train_prediction_gb)

print("Accuracy (GB):", accuracy_gb)
print("Precision (GB):", precision_gb)
print("Recall (GB):", recall_gb)
print("F1 Score (GB):", f1_gb)

Accuracy (GB): 1.0
Precision (GB): 1.0
Recall (GB): 1.0
F1 Score (GB): 1.0


Neural Network (Keras)


In [49]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Build a simple feedforward neural network
model_keras = Sequential([
    Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')
])

model_keras.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
history = model_keras.fit(X_train, Y_train, epochs=100, batch_size=32, validation_data=(X_test, Y_test), verbose=0)

# Evaluate on training data
train_loss, train_acc_keras = model_keras.evaluate(X_train, Y_train, verbose=0)
print('Keras Model Accuracy on Training data:', train_acc_keras)

# Evaluate on test data
test_loss, test_acc_keras = model_keras.evaluate(X_test, Y_test, verbose=0)
print('Keras Model Accuracy on Test data:', test_acc_keras)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Keras Model Accuracy on Training data: 0.566709041595459
Keras Model Accuracy on Test data: 0.4974619150161743
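
To compare the models side by side, the sketch below collects the training and test accuracies printed by the cells above (rounded to four decimals) and ranks them by test accuracy. The dictionary and its labels are written by hand for this summary and are not produced by the notebook.

```python
# (train accuracy, test accuracy) as printed above, rounded to 4 places
reported = {
    "Logistic Regression": (0.9466, 0.9340),
    "SVM (linear)":        (0.9098, 0.9086),
    "KNN (k=5)":           (0.7573, 0.6345),
    "Random Forest":       (1.0000, 0.9188),
    "Decision Tree":       (1.0000, 0.9036),
    "XGBoost":             (1.0000, 0.9137),
    "AdaBoost":            (0.9949, 0.9137),
    "Gradient Boosting":   (1.0000, 0.9289),
    "Keras MLP":           (0.5667, 0.4975),
}

# Sort by test accuracy, best first, and show the train-test gap
ranking = sorted(reported.items(), key=lambda item: item[1][1], reverse=True)

print(f"{'Model':<22}{'Train':>8}{'Test':>8}{'Gap':>8}")
for name, (train_acc, test_acc) in ranking:
    print(f"{name:<22}{train_acc:>8.4f}{test_acc:>8.4f}{train_acc - test_acc:>8.4f}")
```

Logistic regression has the highest test accuracy in this run. With a test split of about 197 rows (20% of 984), its lead over gradient boosting corresponds to a single transaction, so the ordering among the top models should not be over-interpreted. The near-chance Keras result may be related to the unscaled Time and Amount inputs, though the notebook does not investigate this.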
