# Credit Card Fraud Detection using Machine Learning

# About the Data
This dataset contains 7,500 credit card transactions described by 31 columns: an identifier, 28 anonymized features, the transaction amount, and a class label. The class label indicates whether the transaction was fraudulent (class 1) or not (class 0).

The first column is "id", a row identifier (in the rows shown it equals the row index), not a property of the transaction. The "Time" column (seconds elapsed since the first transaction) found in other versions of this dataset is not present in this file. The next 28 features, V1 to V28, are anonymized variables resulting from a principal component analysis (PCA) transformation of the original features. Because the original features are not disclosed, an individual component cannot be tied to a specific attribute such as location or transaction type.

The second-to-last column is "Amount", the transaction amount. The last column is the "Class" label.

Overall, this dataset is used to train machine learning models that detect fraudulent transactions. The models learn patterns in the features that separate fraud from legitimate activity and can then be applied to flag suspicious new transactions.

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

In [2]:
credit_card_data = pd.read_csv('test-4.csv')
credit_card_data.head()

,id,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0,-0.260648,-0.469648,2.496266,-0.083724,0.129681,0.732898,0.519014,-0.130006,0.727159,...,-0.110552,0.217606,-0.134794,0.165959,0.12628,-0.434824,-0.08123,-0.151045,17982.1,0
1,1,0.9851,-0.356045,0.558056,-0.429654,0.27714,0.428605,0.406466,-0.133118,0.347452,...,-0.194936,-0.605761,0.079469,-0.577395,0.19009,0.296503,-0.248052,-0.064512,6531.37,0
2,2,-0.260272,-0.949385,1.728538,-0.457986,0.074062,1.419481,0.743511,-0.095576,-0.261297,...,-0.00502,0.702906,0.945045,-1.154666,-0.605564,-0.312895,-0.300258,-0.244718,2513.54,0
3,3,-0.152152,-0.508959,1.74684,-1.090178,0.249486,1.143312,0.518269,-0.06513,-0.205698,...,-0.146927,-0.038212,-0.214048,-1.893131,1.003963,-0.51595,-0.165316,0.048424,5384.44,0
4,4,-0.20682,-0.16528,1.527053,-0.448293,0.106125,0.530549,0.658849,-0.21266,1.049921,...,-0.106984,0.729727,-0.161666,0.312561,-0.414116,1.071126,0.023712,0.419117,14278.97,0


In [3]:
credit_card_data.sample()

,id,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
7128,7128,0.904764,-0.616737,1.230644,-0.430939,-0.096479,0.546615,0.184914,-0.117618,2.596404,...,-0.243241,-0.554799,0.075115,0.186641,-0.091382,1.864618,-0.302563,-0.076686,3926.59,0


In [4]:
# dataset information
credit_card_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7500 entries, 0 to 7499
Data columns (total 31 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   id      7500 non-null   int64  
 1   V1      7500 non-null   float64
 2   V2      7500 non-null   float64
 3   V3      7500 non-null   float64
 4   V4      7500 non-null   float64
 5   V5      7500 non-null   float64
 6   V6      7500 non-null   float64
 7   V7      7500 non-null   float64
 8   V8      7500 non-null   float64
 9   V9      7500 non-null   float64
 10  V10     7500 non-null   float64
 11  V11     7500 non-null   float64
 12  V12     7500 non-null   float64
 13  V13     7500 non-null   float64
 14  V14     7500 non-null   float64
 15  V15     7500 non-null   float64
 16  V16     7500 non-null   float64
 17  V17     7500 non-null   float64
 18  V18     7500 non-null   float64
 19  V19     7500 non-null   float64
 20  V20     7500 non-null   float64
 21  V21     7500 non-null   float64
 ...

In [5]:
# checking the number of missing values in each column
credit_card_data.isnull().sum()

id        0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
Class     0
dtype: int64

In [6]:
# distribution of legit transactions & fraudulent transactions
credit_card_data['Class'].value_counts()

Class
0    7475
1      25
Name: count, dtype: int64

This dataset is highly imbalanced: only 25 of the 7,500 transactions (about 0.33%) are fraudulent.

0 --> Normal transaction

1 --> Fraudulent transaction
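To see why this imbalance matters, here is a minimal sketch that rebuilds only the label counts reported above (7,475 zeros and 25 ones) and scores a trivial baseline that always predicts "legitimate". It uses scikit-learn's DummyClassifier; the feature matrix is a placeholder, since this baseline ignores the features.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Labels with the same counts as credit_card_data['Class'] above
y = pd.Series(np.r_[np.zeros(7475, dtype=int), np.ones(25, dtype=int)], name='Class')

# Share of each class (normalize=True returns proportions instead of counts)
print(y.value_counts(normalize=True))

# A "model" that always predicts the majority class (0 = legitimate)
X_placeholder = np.zeros((len(y), 1))  # ignored by this baseline
baseline = DummyClassifier(strategy='most_frequent').fit(X_placeholder, y)
y_pred = baseline.predict(X_placeholder)

baseline_accuracy = accuracy_score(y, y_pred)
baseline_recall = recall_score(y, y_pred)  # share of fraud cases caught
print('Baseline accuracy:', baseline_accuracy)
print('Baseline fraud recall:', baseline_recall)
```

The baseline reaches about 99.7% accuracy (7475 / 7500) while catching none of the fraud, which is why accuracy alone is a poor yardstick here and why the notebook also reports precision, recall, and F1.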

The next cell splits the data by label. The first line creates a dataframe called "legit" that keeps only the rows of "credit_card_data" whose "Class" label is 0, i.e. the legitimate transactions. The second line creates "fraud", which keeps only the rows whose "Class" label is 1, i.e. the fraudulent transactions.

Separating the two groups makes it easy to compare them, for example through the summary statistics of "Amount" and the per-class feature means below, and to spot features that behave differently for fraud. Such differences are what a fraud detection model has to learn.

In [7]:
legit = credit_card_data[credit_card_data['Class'] == 0]
fraud = credit_card_data[credit_card_data['Class'] == 1]

In [8]:
fraud['Class']

541     1
623     1
4920    1
6108    1
6329    1
6331    1
6334    1
6336    1
6338    1
6427    1
6446    1
6472    1
6529    1
6609    1
6641    1
6717    1
6719    1
6734    1
6774    1
6820    1
6870    1
6882    1
6899    1
6903    1
6971    1
Name: Class, dtype: int64

In [9]:
# statistical measures of the data
legit.Amount.describe()

count     7475.000000
mean     11980.784062
std       6930.834270
min         50.530000
25%       5964.210000
50%      11819.710000
75%      18047.565000
max      24035.200000
Name: Amount, dtype: float64

In [10]:
fraud.Amount.describe()

count       25.000000
mean      8458.306400
std       5243.111838
min       1415.950000
25%       3875.510000
50%       9890.390000
75%      12162.550000
max      20487.280000
Name: Amount, dtype: float64

In [11]:
# compare the values for both transactions
credit_card_data.groupby('Class').mean()

Class,id,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
0,3741.802943,0.253756,-0.378376,1.143572,-0.650991,0.316449,0.5345,0.474093,-0.157364,0.997445,...,-0.12787,-0.132506,-0.173199,-0.049104,0.181071,0.080637,-0.102907,-0.192528,-0.101273,11980.784062
1,6050.92,-0.040107,0.368566,-0.550948,0.811516,0.184242,-0.79313,0.001613,0.00728,-0.231929,...,0.080662,0.048478,-0.274943,-0.11591,-0.260672,0.131179,0.535532,0.341847,0.470742,8458.3064
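With 30 columns, the group means are easier to scan if the columns are ranked by how far apart the two classes are. A minimal sketch, run on a small made-up frame rather than the real data: the gap between the fraud and legitimate means is divided by each column's standard deviation so that large-valued columns such as "Amount" do not dominate the ranking. Applied to credit_card_data, the "id" column would also show up in this ranking even though it is only a row identifier.

```python
import pandas as pd

# Made-up values; only the layout (feature columns plus "Class") mirrors the notebook
df = pd.DataFrame({
    'V1': [0.1, 0.3, 0.2, -0.9, -1.1],
    'V2': [0.5, 0.4, 0.6, 0.5, 0.4],
    'Amount': [12000.0, 9000.0, 15000.0, 8000.0, 9000.0],
    'Class': [0, 0, 0, 1, 1],
})

class_means = df.groupby('Class').mean()

# Fraud mean minus legitimate mean, in units of each column's overall std
gap = (class_means.loc[1] - class_means.loc[0]) / df.drop(columns='Class').std()
ranking = gap.abs().sort_values(ascending=False)
print(ranking)
```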


Under-sampling: build a sample dataset containing equal numbers of normal and fraudulent transactions

Number of Fraudulent Transactions --> 25

legit_sample = legit.sample(n=25) takes a random sample of 25 rows from the legit dataframe, matching the 25 fraudulent transactions. Because the original data is dominated by legitimate transactions, a model trained on it can reach a high accuracy simply by predicting that every transaction is legitimate. Under-sampling the majority class produces a balanced dataset, so the model has to learn the patterns that separate fraudulent transactions from legitimate ones. The trade-off is that 7,450 legitimate transactions are discarded, leaving only 50 rows for training and testing. Since no random_state is passed to sample(), a different subset is drawn on every run, so the results below will change from run to run.
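The same under-sampling step can be wrapped in a small reusable function. This is a sketch rather than the notebook's code: the helper name undersample is hypothetical, and it adds a fixed random_state (so the sample is reproducible) and a final shuffle so the classes are not stored in blocks.

```python
import numpy as np
import pandas as pd

def undersample(df, label_col='Class', random_state=None):
    """Hypothetical helper: randomly keep as many rows of every class as the
    smallest class has, then shuffle the combined result."""
    n_min = df[label_col].value_counts().min()
    parts = [
        group.sample(n=n_min, random_state=random_state)
        for _, group in df.groupby(label_col)
    ]
    balanced = pd.concat(parts, axis=0)
    # frac=1 returns every row in a random order
    return balanced.sample(frac=1, random_state=random_state)

# Toy data standing in for credit_card_data: 95 legitimate rows, 5 fraudulent ones
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'Amount': rng.uniform(50, 24000, size=100).round(2),
    'Class': [0] * 95 + [1] * 5,
})
balanced = undersample(toy, random_state=2)
print(balanced['Class'].value_counts())
```

Calling it twice with the same random_state returns the same rows, which makes model comparisons repeatable.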

In [12]:
legit_sample = legit.sample(n=25)

In [13]:
new_df = pd.concat([legit_sample,fraud],axis=0)

In [14]:
new_df

,id,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
128,128,-0.610754,-0.149765,0.302544,-1.42133,0.286041,0.170879,0.469659,0.042772,0.741785,...,-0.270997,-0.563318,0.045425,-1.468704,-0.161036,1.478826,-0.298853,-1.130311,14760.58,0
28,28,0.029627,-0.156969,1.686545,-0.051448,0.256393,0.359752,0.721268,-0.15908,0.168199,...,-0.072341,0.391042,-0.062662,1.382419,-0.379041,-0.655254,-0.021582,0.268532,23323.5,0
2994,2994,1.005689,-0.311317,0.717422,-0.303302,0.090903,-0.279546,0.493904,-0.215264,0.413425,...,-0.204713,-0.653335,0.129627,1.510187,0.248351,0.173034,-0.258249,-0.00877,14098.01,0
3393,3393,-0.143532,-0.111575,0.9991,-1.935826,0.28187,-0.461033,0.898202,-0.269078,1.386969,...,-0.068953,0.450173,-0.266352,0.803263,0.059249,-2.293908,-0.322375,0.024324,4973.61,0
7422,7422,0.997534,-0.342724,0.912906,-0.30711,0.087697,0.03155,0.385952,-0.184436,1.251875,...,-0.24423,-0.736661,0.143571,0.987821,0.131187,0.075143,-0.305805,-0.090308,11748.73,0
1942,1942,-0.345661,0.064169,0.95472,-1.005471,0.03592,-0.037934,0.435495,-0.229325,0.850908,...,0.114849,-0.743604,0.099002,0.7568,-0.290592,0.668049,0.089233,0.497558,4931.93,0
3260,3260,0.013419,-0.237575,1.22635,-1.947241,0.396109,-0.016496,0.844823,-0.194349,0.828426,...,-0.040919,0.775316,-0.279963,0.508205,-0.159942,-2.198149,0.066544,-0.159552,23074.54,0
5779,5779,0.960974,-0.445771,1.176734,-0.18799,-0.018479,0.281216,0.285136,-0.183313,1.987363,...,-0.129922,0.223121,-0.147451,0.907639,0.740187,0.999925,-0.260431,-0.055551,17341.91,0
1906,1906,0.924651,-0.283614,0.817879,0.277407,0.175499,0.13659,0.515569,-0.180973,0.09408,...,-0.173689,-0.62222,0.03394,0.699917,0.463023,-0.437077,-0.261566,-0.024991,13139.1,0
3228,3228,-0.140111,-0.156173,1.052054,-1.997431,0.294758,0.051713,0.674685,-0.340592,1.186621,...,0.233818,0.370654,-0.134932,0.385115,-0.128379,-2.152113,0.231118,-0.00183,15979.11,0


In [15]:
new_df['Class'].value_counts()

Class
0    25
1    25
Name: count, dtype: int64

In [16]:
new_df.groupby('Class').mean()

Class,id,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
0,2892.76,0.088962,-0.222053,1.402564,-0.691858,0.341717,0.394498,0.579052,-0.169638,0.927304,...,-0.113478,-0.102039,-0.052382,-0.129851,0.643858,-0.053491,-0.26522,-0.162619,-0.184061,14286.8288
1,6050.92,-0.040107,0.368566,-0.550948,0.811516,0.184242,-0.79313,0.001613,0.00728,-0.231929,...,0.080662,0.048478,-0.274943,-0.11591,-0.260672,0.131179,0.535532,0.341847,0.470742,8458.3064


In [17]:
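# features and target from the full (imbalanced) dataset; the next cell overwrites
# X and Y with the balanced sample, so these values are not used for training
X = credit_card_data.drop(columns='Class')
Y = credit_card_data['Class']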

In [18]:
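# features and target from the balanced sample
# note: "id" is a row identifier, not a transaction attribute, yet it stays in X;
# the fraud rows have a much higher mean id than the legitimate ones (see the group means above)
X = new_df.drop(columns='Class')
Y = new_df['Class']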
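In this file the fraud rows sit at high ids: their mean id is 6050.92 versus 3741.80 for legitimate rows (group means above). Because "id" stays in X, a model can partly learn row position instead of transaction behavior. The sketch below illustrates the risk on synthetic data only (it is not the credit card file): the label is unrelated to five noise features and is separated only by the id, and a decision tree is cross-validated with and without the id column.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in: negative rows get ids 0-4999, positive rows ids 5000-7499,
# and the five V columns are pure noise
y = np.r_[np.zeros(n // 2, dtype=int), np.ones(n // 2, dtype=int)]
ids = np.r_[rng.choice(5000, size=n // 2, replace=False),
            5000 + rng.choice(2500, size=n // 2, replace=False)]
df = pd.DataFrame(rng.normal(size=(n, 5)), columns=[f'V{i}' for i in range(1, 6)])
df.insert(0, 'id', ids)
df['Class'] = y

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
tree = DecisionTreeClassifier(random_state=0)

# Cross-validated accuracy with and without the identifier column
with_id = cross_val_score(tree, df.drop(columns='Class'), df['Class'], cv=cv).mean()
without_id = cross_val_score(tree, df.drop(columns=['Class', 'id']), df['Class'], cv=cv).mean()
print('CV accuracy with id:   ', round(with_id, 3))
print('CV accuracy without id:', round(without_id, 3))
```

Here the id alone yields near-perfect accuracy while the noise features stay near chance. On the real data the effect is weaker because the ids overlap, but it suggests dropping identifier columns, e.g. new_df.drop(columns=['Class', 'id']), before training.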

In [19]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)
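Because the balanced set has only 50 rows, the 20% test split holds just 10 transactions, and stratify=Y keeps the 50/50 class ratio in both parts. A quick check on toy labels with the same 25/25 split (the arrays are placeholders, not the notebook's data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 50 rows with a 25/25 class split, mirroring new_df
X_toy = np.arange(100).reshape(50, 2)
y_toy = np.r_[np.zeros(25, dtype=int), np.ones(25, dtype=int)]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=2
)
print('train class counts:', np.bincount(y_tr))  # 20 and 20
print('test class counts: ', np.bincount(y_te))  # 5 and 5
```

With 10 test rows, a single misclassification changes test accuracy by 0.1, which is worth keeping in mind when comparing the models below.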

# Model Training

Logistic Regression

In [20]:
model = LogisticRegression()
# training the Logistic Regression Model with Training Data
model.fit(X_train, Y_train)

# accuracy on training data
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy on Training data (LR) : ', training_data_accuracy)

# accuracy on test data
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy score on Test Data (LR) : ', test_data_accuracy)

Accuracy on Training data (LR) :  1.0
Accuracy score on Test Data (LR) :  0.9


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
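The warning above means the default lbfgs solver stopped at its iteration limit (max_iter=100) before converging. A likely cause is feature scale: "Amount" is in the thousands while the V columns hold small values. As the warning itself suggests, the usual fix is to standardize the features and, if needed, raise max_iter. The sketch below shows the pattern on synthetic data (make_classification, with one column stretched to mimic "Amount"); fitting the same pipeline on X_train may give scores different from the ones printed above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data; one column is shifted and stretched to look like "Amount"
X_toy, y_toy = make_classification(n_samples=200, n_features=10, random_state=0)
X_toy[:, 0] = X_toy[:, 0] * 5000 + 12000

# StandardScaler gives every column zero mean and unit variance before the
# logistic regression sees it; max_iter is also raised above the default of 100
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_lr.fit(X_toy, y_toy)

n_iter = scaled_lr[-1].n_iter_[0]  # iterations lbfgs actually needed
print('Iterations used:', n_iter)
print('Training accuracy:', scaled_lr.score(X_toy, y_toy))
```

Keeping the scaler inside the pipeline means it is fitted on the training split only, so no statistics from the test rows leak into training.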


In [21]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# metrics on the training data (fraud = class 1 is the positive class)
accuracy_lr = accuracy_score(Y_train, X_train_prediction)
precision_lr = precision_score(Y_train, X_train_prediction)
recall_lr = recall_score(Y_train, X_train_prediction)
f1_lr = f1_score(Y_train, X_train_prediction)

print("Accuracy (LR):", accuracy_lr)
print("Precision (LR):", precision_lr)
print("Recall (LR):", recall_lr)
print("F1 Score (LR):", f1_lr)

Accuracy (LR): 1.0
Precision (LR): 1.0
Recall (LR): 1.0
F1 Score (LR): 1.0


In [22]:
from sklearn.metrics import classification_report

# Generate classification report for test predictions
report = classification_report(Y_test, X_test_prediction)
print(report)

              precision    recall  f1-score   support

           0       0.83      1.00      0.91         5
           1       1.00      0.80      0.89         5

    accuracy                           0.90        10
   macro avg       0.92      0.90      0.90        10
weighted avg       0.92      0.90      0.90        10
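Reading the report: class 1 recall of 0.80 means 4 of the 5 fraudulent test transactions were caught and one was missed, and class 0 precision of 0.83 (5/6) reflects that missed fraud being labeled legitimate. A confusion matrix makes these counts explicit. The labels below are a hypothetical reconstruction consistent with the printed report (the row order is invented), not the notebook's actual Y_test; on the real data the equivalent call would be confusion_matrix(Y_test, X_test_prediction).

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels consistent with the report above:
# 5 legitimate and 5 fraudulent test rows, one fraud predicted as legitimate
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class
fraud_recall = recall_score(y_true, y_pred)                       # 4 / 5
legit_precision = precision_score(y_true, y_pred, pos_label=0)    # 5 / 6
print(cm)
print('Fraud recall:', fraud_recall)
print('Legit precision:', legit_precision)
```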



SVM

In [23]:
from sklearn import svm

# Support Vector Machine (SVM) model training and evaluation

svm_model = svm.SVC(kernel='linear')
svm_model.fit(X_train, Y_train)

# Predictions on training and test data
X_train_prediction_svm = svm_model.predict(X_train)
X_test_prediction_svm = svm_model.predict(X_test)

# Accuracy scores
training_data_accuracy_svm = accuracy_score(Y_train, X_train_prediction_svm)
test_data_accuracy_svm = accuracy_score(Y_test, X_test_prediction_svm)

print('Accuracy on Training data (SVM):', training_data_accuracy_svm)
print('Accuracy score on Test Data (SVM):', test_data_accuracy_svm)

Accuracy on Training data (SVM): 1.0
Accuracy score on Test Data (SVM): 0.9


In [24]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# metrics on the training data (fraud = class 1 is the positive class)
accuracy_svm = accuracy_score(Y_train, X_train_prediction_svm)
precision_svm = precision_score(Y_train, X_train_prediction_svm)
recall_svm = recall_score(Y_train, X_train_prediction_svm)
f1_svm = f1_score(Y_train, X_train_prediction_svm)

print("Accuracy (SVM):", accuracy_svm)
print("Precision (SVM):", precision_svm)
print("Recall (SVM):", recall_svm)
print("F1 Score (SVM):", f1_svm)

Accuracy (SVM): 1.0
Precision (SVM): 1.0
Recall (SVM): 1.0
F1 Score (SVM): 1.0


In [25]:
from sklearn.metrics import classification_report

# Generate classification report for test predictions
report = classification_report(Y_test, X_test_prediction_svm)
print(report)

              precision    recall  f1-score   support

           0       0.83      1.00      0.91         5
           1       1.00      0.80      0.89         5

    accuracy                           0.90        10
   macro avg       0.92      0.90      0.90        10
weighted avg       0.92      0.90      0.90        10



KNN

In [26]:
from sklearn.neighbors import KNeighborsClassifier

knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, Y_train)

# Accuracy on training data
X_train_prediction_knn = knn_model.predict(X_train)
training_data_accuracy_knn = accuracy_score(Y_train, X_train_prediction_knn)
print('Accuracy on Training data (KNN):', training_data_accuracy_knn)

# Accuracy on test data
X_test_prediction_knn = knn_model.predict(X_test)
test_data_accuracy_knn = accuracy_score(Y_test, X_test_prediction_knn)
print('Accuracy score on Test Data (KNN):', test_data_accuracy_knn)

Accuracy on Training data (KNN): 0.9
Accuracy score on Test Data (KNN): 0.7


In [27]:
# metrics on the training data (fraud = class 1 is the positive class)
accuracy_knn = accuracy_score(Y_train, X_train_prediction_knn)
precision_knn = precision_score(Y_train, X_train_prediction_knn)
recall_knn = recall_score(Y_train, X_train_prediction_knn)
f1_knn = f1_score(Y_train, X_train_prediction_knn)

print("Accuracy (KNN):", accuracy_knn)
print("Precision (KNN):", precision_knn)
print("Recall (KNN):", recall_knn)
print("F1 Score (KNN):", f1_knn)

Accuracy (KNN): 0.9
Precision (KNN): 0.9444444444444444
Recall (KNN): 0.85
F1 Score (KNN): 0.8947368421052632


In [28]:
from sklearn.metrics import classification_report

# Generate classification report for test predictions
report = classification_report(Y_test, X_test_prediction_knn)
print(report)

              precision    recall  f1-score   support

           0       0.67      0.80      0.73         5
           1       0.75      0.60      0.67         5

    accuracy                           0.70        10
   macro avg       0.71      0.70      0.70        10
weighted avg       0.71      0.70      0.70        10
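KNN classifies by distance, so columns with large values dominate when the data is not scaled. Here "Amount" (in the thousands) and "id" (up to several thousand) are far larger than the V features, so the neighbors are chosen almost entirely by those two columns, which may partly explain the weaker KNN scores. The sketch below uses made-up data (one small-scale informative column, one large-scale noise column) to show the effect and the usual fix of standardizing inside a pipeline.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
informative = y + rng.normal(0, 0.2, size=n)  # small scale, separates the classes
large_noise = rng.normal(0, 10_000, size=n)    # large scale, unrelated to y
X = np.column_stack([informative, large_noise])

raw_knn = KNeighborsClassifier(n_neighbors=5)
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# 5-fold cross-validated accuracy for each variant
raw_acc = cross_val_score(raw_knn, X, y, cv=5).mean()
scaled_acc = cross_val_score(scaled_knn, X, y, cv=5).mean()
print('KNN without scaling:', round(raw_acc, 3))
print('KNN with scaling:   ', round(scaled_acc, 3))
```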



Random Forest

In [29]:
from sklearn.ensemble import RandomForestClassifier

# Create and train the Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, Y_train)

# Accuracy on training data
X_train_prediction_rf = rf_model.predict(X_train)
training_data_accuracy_rf = accuracy_score(Y_train, X_train_prediction_rf)
print('Accuracy on Training data (Random Forest):', training_data_accuracy_rf)

# Accuracy on test data
X_test_prediction_rf = rf_model.predict(X_test)
test_data_accuracy_rf = accuracy_score(Y_test, X_test_prediction_rf)
print('Accuracy score on Test Data (Random Forest):', test_data_accuracy_rf)

Accuracy on Training data (Random Forest): 1.0
Accuracy score on Test Data (Random Forest): 0.9


In [30]:
# metrics on the training data (fraud = class 1 is the positive class)
accuracy_rf = accuracy_score(Y_train, X_train_prediction_rf)
precision_rf = precision_score(Y_train, X_train_prediction_rf)
recall_rf = recall_score(Y_train, X_train_prediction_rf)
f1_rf = f1_score(Y_train, X_train_prediction_rf)

print("Accuracy (RF):", accuracy_rf)
print("Precision (RF):", precision_rf)
print("Recall (RF):", recall_rf)
print("F1 Score (RF):", f1_rf)

Accuracy (RF): 1.0
Precision (RF): 1.0
Recall (RF): 1.0
F1 Score (RF): 1.0


In [31]:
from sklearn.metrics import classification_report

# Generate classification report for test predictions
report = classification_report(Y_test, X_test_prediction_rf)
print(report)

              precision    recall  f1-score   support

           0       0.83      1.00      0.91         5
           1       1.00      0.80      0.89         5

    accuracy                           0.90        10
   macro avg       0.92      0.90      0.90        10
weighted avg       0.92      0.90      0.90        10



Decision Tree

In [32]:
from sklearn.tree import DecisionTreeClassifier

# Create and train the Decision Tree model
dtree = DecisionTreeClassifier(random_state=42)
dtree.fit(X_train, Y_train)

# Accuracy on training data
X_train_prediction_dtree = dtree.predict(X_train)
training_data_accuracy_dtree = accuracy_score(Y_train, X_train_prediction_dtree)
print('Accuracy on Training data (Decision Tree):', training_data_accuracy_dtree)

# Accuracy on test data
X_test_prediction_dtree = dtree.predict(X_test)
test_data_accuracy_dtree = accuracy_score(Y_test, X_test_prediction_dtree)
print('Accuracy score on Test Data (Decision Tree):', test_data_accuracy_dtree)

Accuracy on Training data (Decision Tree): 1.0
Accuracy score on Test Data (Decision Tree): 0.7


In [33]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# metrics on the training data (fraud = class 1 is the positive class)
accuracy_dtree = accuracy_score(Y_train, X_train_prediction_dtree)
precision_dtree = precision_score(Y_train, X_train_prediction_dtree)
recall_dtree = recall_score(Y_train, X_train_prediction_dtree)
f1_dtree = f1_score(Y_train, X_train_prediction_dtree)

print("Accuracy (DT):", accuracy_dtree)
print("Precision (DT):", precision_dtree)
print("Recall (DT):", recall_dtree)
print("F1 Score (DT):", f1_dtree)

Accuracy (DT): 1.0
Precision (DT): 1.0
Recall (DT): 1.0
F1 Score (DT): 1.0


In [34]:
from sklearn.metrics import classification_report

# Generate classification report for test predictions
report = classification_report(Y_test, X_test_prediction_dtree)
print(report)

              precision    recall  f1-score   support

           0       0.75      0.60      0.67         5
           1       0.67      0.80      0.73         5

    accuracy                           0.70        10
   macro avg       0.71      0.70      0.70        10
weighted avg       0.71      0.70      0.70        10



XGBoost

In [35]:
from xgboost import XGBClassifier

# Create and train the XGBoost model
xgb_model = XGBClassifier(eval_metric='logloss', random_state=42)
xgb_model.fit(X_train, Y_train)

# Accuracy on training data
X_train_prediction_xgb = xgb_model.predict(X_train)
training_data_accuracy_xgb = accuracy_score(Y_train, X_train_prediction_xgb)
print('Accuracy on Training data (XGBoost):', training_data_accuracy_xgb)

# Accuracy on test data
X_test_prediction_xgb = xgb_model.predict(X_test)
test_data_accuracy_xgb = accuracy_score(Y_test, X_test_prediction_xgb)
print('Accuracy score on Test Data (XGBoost):', test_data_accuracy_xgb)

Accuracy on Training data (XGBoost): 1.0
Accuracy score on Test Data (XGBoost): 0.9


In [36]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# metrics on the training data (fraud = class 1 is the positive class)
accuracy_xgb = accuracy_score(Y_train, X_train_prediction_xgb)
precision_xgb = precision_score(Y_train, X_train_prediction_xgb)
recall_xgb = recall_score(Y_train, X_train_prediction_xgb)
f1_xgb = f1_score(Y_train, X_train_prediction_xgb)

print("Accuracy (XGBoost):", accuracy_xgb)
print("Precision (XGBoost):", precision_xgb)
print("Recall (XGBoost):", recall_xgb)
print("F1 Score (XGBoost):", f1_xgb)

Accuracy (XGBoost): 1.0
Precision (XGBoost): 1.0
Recall (XGBoost): 1.0
F1 Score (XGBoost): 1.0


In [37]:
from sklearn.metrics import classification_report

# Generate classification report for test predictions
report = classification_report(Y_test, X_test_prediction_xgb)
print(report)

              precision    recall  f1-score   support

           0       0.83      1.00      0.91         5
           1       1.00      0.80      0.89         5

    accuracy                           0.90        10
   macro avg       0.92      0.90      0.90        10
weighted avg       0.92      0.90      0.90        10



AdaBoost

In [38]:
from sklearn.ensemble import AdaBoostClassifier

# Create and train the AdaBoost model
ada_model = AdaBoostClassifier(n_estimators=100, random_state=42)
ada_model.fit(X_train, Y_train)

# Accuracy on training data
X_train_prediction_ada = ada_model.predict(X_train)
training_data_accuracy_ada = accuracy_score(Y_train, X_train_prediction_ada)
print('Accuracy on Training data (AdaBoost):', training_data_accuracy_ada)

# Accuracy on test data
X_test_prediction_ada = ada_model.predict(X_test)
test_data_accuracy_ada = accuracy_score(Y_test, X_test_prediction_ada)
print('Accuracy score on Test Data (AdaBoost):', test_data_accuracy_ada)

Accuracy on Training data (AdaBoost): 1.0
Accuracy score on Test Data (AdaBoost): 1.0


In [39]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# metrics on the training data (fraud = class 1 is the positive class)
ada_train_pred = ada_model.predict(X_train)

accuracy_ada = accuracy_score(Y_train, ada_train_pred)
precision_ada = precision_score(Y_train, ada_train_pred)
recall_ada = recall_score(Y_train, ada_train_pred)
f1_ada = f1_score(Y_train, ada_train_pred)

print("Accuracy (AdaBoost):", accuracy_ada)
print("Precision (AdaBoost):", precision_ada)
print("Recall (AdaBoost):", recall_ada)
print("F1 Score (AdaBoost):", f1_ada)

Accuracy (AdaBoost): 1.0
Precision (AdaBoost): 1.0
Recall (AdaBoost): 1.0
F1 Score (AdaBoost): 1.0


In [40]:
from sklearn.metrics import classification_report

# Generate classification report for test predictions
report = classification_report(Y_test, X_test_prediction_ada)
print(report)

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         5
           1       1.00      1.00      1.00         5

    accuracy                           1.00        10
   macro avg       1.00      1.00      1.00        10
weighted avg       1.00      1.00      1.00        10



Gradient Boosting

In [41]:
from sklearn.ensemble import GradientBoostingClassifier

# Create and train the Gradient Boosting model
gb_model = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb_model.fit(X_train, Y_train)

# Accuracy on training data
X_train_prediction_gb = gb_model.predict(X_train)
training_data_accuracy_gb = accuracy_score(Y_train, X_train_prediction_gb)
print('Accuracy on Training data (GB):', training_data_accuracy_gb)

# Accuracy on test data
X_test_prediction_gb = gb_model.predict(X_test)
test_data_accuracy_gb = accuracy_score(Y_test, X_test_prediction_gb)
print('Accuracy score on Test Data (GB):', test_data_accuracy_gb)

Accuracy on Training data (GB): 1.0
Accuracy score on Test Data (GB): 0.9


In [42]:
# Predictions on training data for Gradient Boosting
X_train_prediction_gb = gb_model.predict(X_train)

# Calculate metrics for Gradient Boosting
accuracy_gb = accuracy_score(Y_train, X_train_prediction_gb)
precision_gb = precision_score(Y_train, X_train_prediction_gb)
recall_gb = recall_score(Y_train, X_train_prediction_gb)
f1_gb = f1_score(Y_train, X_train_prediction_gb)

print("Accuracy (GB):", accuracy_gb)
print("Precision (GB):", precision_gb)
print("Recall (GB):", recall_gb)
print("F1 Score (GB):", f1_gb)

Accuracy (GB): 1.0
Precision (GB): 1.0
Recall (GB): 1.0
F1 Score (GB): 1.0


In [43]:
from sklearn.metrics import classification_report

# Generate classification report for test predictions
report = classification_report(Y_test, X_test_prediction_gb)
print(report)

              precision    recall  f1-score   support

           0       0.83      1.00      0.91         5
           1       1.00      0.80      0.89         5

    accuracy                           0.90        10
   macro avg       0.92      0.90      0.90        10
weighted avg       0.92      0.90      0.90        10



Neural Network (Keras)

In [44]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import Adam

# Build a simple feedforward neural network.
# The first layer is an explicit Input instead of Dense(..., input_shape=...),
# which avoids the Keras warning about passing input_shape to a layer.
# Note: the features are not scaled ("Amount" is in the thousands);
# neural networks usually train more reliably on standardized inputs.
model_keras = Sequential([
    Input(shape=(X_train.shape[1],)),
    Dense(32, activation='relu'),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')
])

model_keras.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])

# Train the model. validation_data reuses the test set; it is only monitored
# (there is no early stopping), so the weights are not tuned on it.
history = model_keras.fit(X_train, Y_train, epochs=100, batch_size=32, validation_data=(X_test, Y_test), verbose=0)

# Evaluate on training data
train_loss, train_acc_keras = model_keras.evaluate(X_train, Y_train, verbose=0)
print('Keras Model Accuracy on Training data:', train_acc_keras)

# Evaluate on test data
test_loss, test_acc_keras = model_keras.evaluate(X_test, Y_test, verbose=0)
print('Keras Model Accuracy on Test data:', test_acc_keras)


Keras Model Accuracy on Training data: 0.8500000238418579
Keras Model Accuracy on Test data: 0.8999999761581421
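With only 10 test transactions, every test score above moves in steps of 0.1 and depends on which 25 legitimate rows were sampled, so differences between models (for example 0.9 versus 1.0) may not be meaningful. Repeated stratified cross-validation uses every row of the 50-row sample for testing several times and reports a mean and a spread. The sketch below shows the pattern on synthetic data standing in for new_df (with the notebook's data one would pass X and Y instead); the model list is illustrative, and F1 on the fraud class is used as the score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the balanced sample: 50 rows, 30 features, about 25 per class
X_toy, y_toy = make_classification(
    n_samples=50, n_features=30, n_informative=5, weights=[0.5, 0.5], random_state=0
)

models = {
    'Logistic Regression': make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
}

# 5 folds x 5 repeats = 25 train/test splits per model
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_toy, y_toy, cv=cv, scoring='f1')
    results[name] = (scores.mean(), scores.std())
    print(f'{name:20s} F1 = {scores.mean():.3f} +/- {scores.std():.3f}')
```

If two models' mean scores lie within each other's spread, the data at hand does not clearly favor one over the other.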
