# 🧾 Credit Card Transactions Dataset (2023)

This dataset contains **credit card transactions** made by **European cardholders** during **2023**. With **568,630 anonymized records**, it is designed to support the development and evaluation of **fraud detection algorithms**.

> **Note:** All personal and sensitive information has been removed to ensure privacy and compliance with ethical data handling standards.

---

## Key Features

| Column Name | Description |
|-------------|-------------|
| `id`        | Unique identifier for each transaction |
| `V1`-`V28`  | Anonymized features representing various transaction attributes (e.g., time, location, patterns) |
| `Amount`    | The transaction amount (in euros) |
| `Class`     | Binary label: `1` indicates a fraudulent transaction, `0` indicates a legitimate one |
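A quick way to confirm that a downloaded copy of the file matches the schema above is to build the expected column list and compare. This is a minimal sketch: the helper name `check_schema` is hypothetical, and the expected list simply mirrors the table (31 columns, consistent with the `info()` output further down).

```python
# Expected columns, mirroring the Key Features table: id, V1..V28, Amount, Class
EXPECTED_COLUMNS = ["id"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def check_schema(columns):
    """Hypothetical helper: report columns missing from, or extra to, the expected schema."""
    actual = list(columns)
    missing = [c for c in EXPECTED_COLUMNS if c not in actual]
    extra = [c for c in actual if c not in EXPECTED_COLUMNS]
    return missing, extra


# With the real data this would be: check_schema(credit_card_data.columns)
# Here: a column list that lacks "Class" and has a stray index column
missing, extra = check_schema(EXPECTED_COLUMNS[:-1] + ["Unnamed: 0"])
print(len(EXPECTED_COLUMNS), missing, extra)  # 31 ['Class'] ['Unnamed: 0']
```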

---




In [46]:
!pip install fireducks
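FireDucks is intended as a drop-in replacement for pandas, so the rest of the notebook should also run with plain pandas if `fireducks` cannot be installed (for example, if no wheel is available for your platform). A minimal sketch of that fallback, assuming only that at least one of the two packages is installed; the `BACKEND` variable is illustrative:

```python
# Prefer FireDucks' pandas-compatible API; fall back to regular pandas if it is absent.
try:
    import fireducks.pandas as pd
    BACKEND = "fireducks"
except ImportError:
    import pandas as pd
    BACKEND = "pandas"

print(f"Using {BACKEND} as the DataFrame backend")
```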



**Importing Dependencies**

In [47]:
import fireducks.pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import warnings
from sklearn.exceptions import ConvergenceWarning

warnings.filterwarnings("ignore", category=ConvergenceWarning)
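The cell above silences every `ConvergenceWarning` for the whole session, which also hides a useful signal that the solver stopped early. A narrower alternative is to record warnings around a single `fit` call. The sketch below uses a small synthetic dataset (not the credit card data), and `fit_and_report` is a hypothetical helper:

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Small synthetic binary problem, standing in for the real features/labels
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)


def fit_and_report(model, X, y):
    """Fit `model` and return (model, converged) instead of hiding convergence problems."""
    with warnings.catch_warnings(record=True) as caught:
        # Override any global "ignore" filter, but only inside this block
        warnings.simplefilter("always", ConvergenceWarning)
        model.fit(X, y)
    converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return model, converged


_, ok_short = fit_and_report(LogisticRegression(max_iter=1), X_demo, y_demo)
_, ok_long = fit_and_report(LogisticRegression(max_iter=1000), X_demo, y_demo)
print("max_iter=1 converged:", ok_short)
print("max_iter=1000 converged:", ok_long)
```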


In [48]:
# Load the CSV file into a DataFrame (FireDucks' pandas-compatible API)
credit_card_data = pd.read_csv("/content/creditcard_2023.csv")

In [49]:
credit_card_data.head()

|   | id | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | -0.260648 | -0.469648 | 2.496266 | -0.083724 | 0.129681 | 0.732898 | 0.519014 | -0.130006 | 0.727159 | ... | -0.110552 | 0.217606 | -0.134794 | 0.165959 | 0.12628 | -0.434824 | -0.08123 | -0.151045 | 17982.1 | 0 |
| 1 | 1 | 0.9851 | -0.356045 | 0.558056 | -0.429654 | 0.27714 | 0.428605 | 0.406466 | -0.133118 | 0.347452 | ... | -0.194936 | -0.605761 | 0.079469 | -0.577395 | 0.19009 | 0.296503 | -0.248052 | -0.064512 | 6531.37 | 0 |
| 2 | 2 | -0.260272 | -0.949385 | 1.728538 | -0.457986 | 0.074062 | 1.419481 | 0.743511 | -0.095576 | -0.261297 | ... | -0.00502 | 0.702906 | 0.945045 | -1.154666 | -0.605564 | -0.312895 | -0.300258 | -0.244718 | 2513.54 | 0 |
| 3 | 3 | -0.152152 | -0.508959 | 1.74684 | -1.090178 | 0.249486 | 1.143312 | 0.518269 | -0.06513 | -0.205698 | ... | -0.146927 | -0.038212 | -0.214048 | -1.893131 | 1.003963 | -0.51595 | -0.165316 | 0.048424 | 5384.44 | 0 |
| 4 | 4 | -0.20682 | -0.16528 | 1.527053 | -0.448293 | 0.106125 | 0.530549 | 0.658849 | -0.21266 | 1.049921 | ... | -0.106984 | 0.729727 | -0.161666 | 0.312561 | -0.414116 | 1.071126 | 0.023712 | 0.419117 | 14278.97 | 0 |


In [50]:
# dataset information
credit_card_data.info()

<class 'fireducks.pandas.frame.DataFrame'>
RangeIndex: 568630 entries, 0 to 568629
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   id      568630 non-null  int64  
 1   V1      568630 non-null  float64
 2   V2      568630 non-null  float64
 3   V3      568630 non-null  float64
 4   V4      568630 non-null  float64
 5   V5      568630 non-null  float64
 6   V6      568630 non-null  float64
 7   V7      568630 non-null  float64
 8   V8      568630 non-null  float64
 9   V9      568630 non-null  float64
 10  V10     568630 non-null  float64
 11  V11     568630 non-null  float64
 12  V12     568630 non-null  float64
 13  V13     568630 non-null  float64
 14  V14     568630 non-null  float64
 15  V15     568630 non-null  float64
 16  V16     568630 non-null  float64
 17  V17     568630 non-null  float64
 18  V18     568630 non-null  float64
 19  V19     568630 non-null  float64
 20  V20     568630 non-null  float64
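 21  V21     568630 non-null  float64
 22  V22     568630 non-null  float64
 23  V23     568630 non-null  float64
 24  V24     568630 non-null  float64
 25  V25     568630 non-null  float64
 26  V26     568630 non-null  float64
 27  V27     568630 non-null  float64
 28  V28     568630 non-null  float64
 29  Amount  568630 non-null  float64
 30  Class   568630 non-null  int64  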

In [51]:
# checking the number of rows and columns
credit_card_data.shape

(568630, 31)

In [52]:
# Checking for missing values
credit_card_data.isnull().sum()

id        0
V1        0
V2        0
V3        0
V4        0
V5        0
V6        0
V7        0
V8        0
V9        0
V10       0
V11       0
V12       0
V13       0
V14       0
V15       0
V16       0
V17       0
V18       0
V19       0
V20       0
V21       0
V22       0
V23       0
V24       0
V25       0
V26       0
V27       0
V28       0
Amount    0
Class     0
dtype: int64

In [53]:
# Distribution of legitimate (0) and fraudulent (1) transactions
credit_card_data['Class'].value_counts()

Class
0    284315
1    284315
Name: count, dtype: int64
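The two classes are exactly balanced (284,315 each), so a trivial model that always predicts the same class would already score 50% accuracy. That baseline is the right yardstick for the accuracy figures reported later. The arithmetic, using the counts printed above (on a DataFrame, `value_counts(normalize=True)` returns the proportions directly):

```python
# Class counts copied from the value_counts() output above
counts = {0: 284315, 1: 284315}

total = sum(counts.values())
proportions = {label: n / total for label, n in counts.items()}

# Accuracy of a classifier that always predicts the most frequent class
baseline_accuracy = max(counts.values()) / total

print(total, proportions, baseline_accuracy)  # 568630 {0: 0.5, 1: 0.5} 0.5
```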

**Separate the Features and Labels**

In [54]:
# Features: every column except the row identifier and the label
X = credit_card_data.drop(columns=['id', 'Class'])
Y = credit_card_data['Class']
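Dropping `id` along with `Class` matters here. The printed labels suggest the file may be ordered by class (the first rows are legitimate and the last rows fraudulent), and if so a sequential row identifier would leak the label. A minimal check, shown on a tiny hypothetical frame with plain pandas rather than on the real data; `id_separates_classes` is an illustrative helper:

```python
import pandas as pd

# Hypothetical toy frame: ids increase and all fraud rows come last
toy = pd.DataFrame({"id": range(10), "Class": [0] * 5 + [1] * 5})


def id_separates_classes(frame):
    """Return True if every legitimate id is smaller than every fraudulent id."""
    ranges = frame.groupby("Class")["id"].agg(["min", "max"])
    return bool(ranges.loc[0, "max"] < ranges.loc[1, "min"])


print(id_separates_classes(toy))  # True
```

With the notebook's data, the same call would be `id_separates_classes(credit_card_data)`.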

In [55]:
print(X)

              V1        V2        V3        V4        V5        V6        V7  \
0      -0.260648 -0.469648  2.496266 -0.083724  0.129681  0.732898  0.519014   
1       0.985100 -0.356045  0.558056 -0.429654  0.277140  0.428605  0.406466   
2      -0.260272 -0.949385  1.728538 -0.457986  0.074062  1.419481  0.743511   
3      -0.152152 -0.508959  1.746840 -1.090178  0.249486  1.143312  0.518269   
4      -0.206820 -0.165280  1.527053 -0.448293  0.106125  0.530549  0.658849   
...          ...       ...       ...       ...       ...       ...       ...   
568625 -0.833437  0.061886 -0.899794  0.904227 -1.002401  0.481454 -0.370393   
568626 -0.670459 -0.202896 -0.068129 -0.267328 -0.133660  0.237148 -0.016935   
568627 -0.311997 -0.004095  0.137526 -0.035893 -0.042291  0.121098 -0.070958   
568628  0.636871 -0.516970 -0.300889 -0.144480  0.131042 -0.294148  0.580568   
568629 -0.795144  0.433236 -0.649140  0.374732 -0.244976 -0.603493 -0.347613   

              V8        V9       V10  .

In [56]:
print(Y)

0         0
1         0
2         0
3         0
4         0
         ..
568625    1
568626    1
568627    1
568628    1
568629    1
Name: Class, Length: 568630, dtype: int64


**Split the Data into Training and Test Sets**

In [57]:
# Hold out 10% of rows for testing; stratify=Y preserves the class ratio in both splits
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, stratify=Y, random_state=13)
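Passing `stratify=Y` makes `train_test_split` preserve the class ratio in both subsets. The sketch below checks that on synthetic, perfectly balanced labels (a stand-in for `Y`), using the same `test_size` and `random_state` as the cell above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and Y: 1,000 rows with a 50/50 label split
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 5))
y_demo = np.array([0, 1] * 500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, stratify=y_demo, random_state=13
)

# Share of label 1 (fraud) in each split; stratification keeps both at 0.5 here
train_share = float(np.mean(y_tr))
test_share = float(np.mean(y_te))
print(len(y_tr), len(y_te), train_share, test_share)  # 900 100 0.5 0.5
```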

**Model Training: Logistic Regression**

In [58]:
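# Default settings: lbfgs solver, max_iter=100. With the unscaled Amount column the solver
# may stop before converging (the ConvergenceWarning suppressed in the imports cell).
model = LogisticRegression()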

In [59]:
model.fit(X_train, Y_train)
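The lbfgs solver used by `LogisticRegression` converges more reliably when features are on comparable scales, and `Amount` (thousands of euros in the rows shown above) sits on a very different scale from the `V` columns. Standardizing features inside a pipeline and raising `max_iter` is a common alternative to suppressing `ConvergenceWarning`. This is a sketch on synthetic data with a hypothetical euro-scale amount column, not a re-run of the notebook, so it says nothing about accuracy on the real dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: three small-scale features plus one large-scale "amount" feature
rng = np.random.default_rng(0)
n = 2000
V = rng.normal(size=(n, 3))                     # stand-ins for V1..V28
amount = rng.uniform(50, 25000, size=(n, 1))    # hypothetical euro-scale Amount
X_demo = np.hstack([V, amount])
y_demo = (V[:, 0] - V[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, stratify=y_demo, random_state=13
)

# The scaler is fitted on the training split only, then applied to both splits
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_tr, y_tr)

test_acc = scaled_model.score(X_te, y_te)  # accuracy of the final estimator
print("iterations used:", scaled_model[-1].n_iter_[0])
print("test accuracy:", test_acc)
```

Putting the scaler inside the pipeline keeps test-set statistics out of the scaling step, which matters once cross-validation is added.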

**Model Evaluation**

In [60]:
# Accuracy on training data
X_train_prediction = model.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)
print('Accuracy on training data :', training_data_accuracy)

Accuracy on training data : 0.9559741054034355


In [61]:
# Accuracy on test data
X_test_prediction = model.predict(X_test)
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)
print('Accuracy on test data :', test_data_accuracy)

Accuracy on test data : 0.9556653711552328
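Accuracy treats both kinds of error alike, but in fraud detection a missed fraud (false negative) and a wrongly flagged legitimate payment (false positive) usually have different costs. The confusion matrix, precision and recall separate them. A minimal sketch with hand-made labels; in the notebook you would pass `Y_test` and `X_test_prediction` instead:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels (1 = fraud) and predictions, standing in for Y_test and X_test_prediction
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes; ravel() yields tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

precision = precision_score(y_true, y_pred)  # tp / (tp + fp): share of flagged payments that were fraud
recall = recall_score(y_true, y_pred)        # tp / (tp + fn): share of frauds that were caught

print(f"tn={tn} fp={fp} fn={fn} tp={tp}")             # tn=2 fp=2 fn=1 tp=3
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.60 recall=0.75
```

`sklearn.metrics.classification_report(Y_test, X_test_prediction)` prints the same metrics for each class in a single table.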
