# 🔍 Credit Card Fraud Detection


## 📌 Introduction
This project aims to detect fraudulent credit card transactions using machine learning techniques.  
The dataset contains transactions made by **European cardholders** and is **highly imbalanced**: fraudulent transactions make up only a tiny fraction of the records.  

### 🚀 Steps in This Notebook:
1️⃣ **Data Loading & Exploration**  
2️⃣ **Preprocessing & Feature Engineering**  
3️⃣ **Model Training & Evaluation**  
4️⃣ **Results & Insights**  


In [None]:

# 📚 Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')


In [None]:

# 📂 Load Dataset
df = pd.read_csv("data/creditcard.csv")

# Display first few rows
df.head()


## 📊 Exploratory Data Analysis

In [None]:

# 🔍 Check for Missing Values
print("Missing Values:\n", df.isnull().sum())

# 🔍 Class Distribution (Fraud vs Non-Fraud)
class_counts = df['Class'].value_counts()
plt.figure(figsize=(6,4))
sns.barplot(x=class_counts.index, y=class_counts.values, palette="coolwarm")
plt.xlabel("Transaction Type (0: Non-Fraud, 1: Fraud)")
plt.ylabel("Count")
plt.title("Class Distribution in Credit Card Transactions")
plt.show()
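Because fraud is such a small minority, plain accuracy is a misleading metric for this problem. The following is a minimal sketch on hypothetical labels (not the real dataset): a "model" that never predicts fraud still reaches 99.8% accuracy while catching no fraud at all.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical, heavily imbalanced labels: 998 legitimate (0) and 2 fraudulent (1)
y_true = np.array([0] * 998 + [1] * 2)

# A trivial "classifier" that labels every transaction as legitimate
y_naive = np.zeros_like(y_true)

naive_accuracy = accuracy_score(y_true, y_naive)
naive_recall = recall_score(y_true, y_naive)  # TP / (TP + FN) = 0 / 2

print(f"Fraud rate:     {y_true.mean():.3%}")
print(f"Accuracy:       {naive_accuracy:.3f}")
print(f"Recall (fraud): {naive_recall:.3f}")
```

This is why the evaluation below reports precision, recall, F1 and ROC-AUC alongside accuracy.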


## 🛠️ Data Preprocessing

In [None]:

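# Handling class imbalance using random undersampling:
# keep every fraud case and sample an equal number of non-fraud cases
fraud_cases = df[df['Class'] == 1]
non_fraud_cases = df[df['Class'] == 0].sample(len(fraud_cases), random_state=42)

# Combining the balanced dataset
balanced_df = pd.concat([fraud_cases, non_fraud_cases])

# Separating features and target variable
X = balanced_df.drop(columns=['Class'])
y = balanced_df['Class']

# Splitting the data into training and testing sets first
# (stratify keeps the same class ratio in both sets)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Standardizing 'Amount' and 'Time': fit the scaler on the training set only,
# then apply it to the test set, so no test-set statistics leak into training
X_train, X_test = X_train.copy(), X_test.copy()
scaler = StandardScaler()
X_train[['Amount', 'Time']] = scaler.fit_transform(X_train[['Amount', 'Time']])
X_test[['Amount', 'Time']] = scaler.transform(X_test[['Amount', 'Time']])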
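The undersampling step can be seen in isolation on a small toy table. This sketch uses a hypothetical DataFrame (`toy`, with made-up `Time`, `Amount` and `Class` values), not the real dataset; it applies the same "keep all frauds, sample as many non-frauds" rule and shows how much data is discarded.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical toy data: 95 legitimate and 5 fraudulent transactions
toy = pd.DataFrame({
    "Time": rng.uniform(0, 1000, size=100),
    "Amount": rng.exponential(50, size=100),
    "Class": [0] * 95 + [1] * 5,
})

# Same rule as the notebook: all fraud rows + an equal-size random sample of non-fraud rows
fraud = toy[toy["Class"] == 1]
legit = toy[toy["Class"] == 0].sample(len(fraud), random_state=42)
balanced = pd.concat([fraud, legit])

print("before:", toy["Class"].value_counts().sort_index().to_dict())
print("after: ", balanced["Class"].value_counts().sort_index().to_dict())
print("discarded legitimate rows:", (toy["Class"] == 0).sum() - len(legit))
```

The trade-off is that most legitimate transactions are thrown away. An alternative that keeps all rows is to pass `class_weight="balanced"` to `LogisticRegression` or `RandomForestClassifier`, which reweights the classes during training instead of resampling them.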


## 🤖 Model Training & Evaluation

In [None]:

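# Define models (max_iter raised so Logistic Regression converges reliably;
# random_state fixed so Random Forest results are reproducible)
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42)
}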

# Train models and evaluate performance
results = []
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_pred_prob = model.predict_proba(X_test)[:,1]

    results.append({
        "Model": name,
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred),
        "Recall": recall_score(y_test, y_pred),
        "F1 Score": f1_score(y_test, y_pred),
        "ROC-AUC": roc_auc_score(y_test, y_pred_prob)
    })

# Creating a DataFrame to compare model performance
results_df = pd.DataFrame(results)
results_df
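Because the test split is drawn from the undersampled 50/50 data, its precision does not carry over directly to real traffic, where fraud is rare. Holding the true positive rate (TPR, i.e. recall) and false positive rate (FPR) fixed, expected precision at a fraud prevalence p is TPR·p / (TPR·p + FPR·(1 − p)). The sketch below evaluates this formula for a hypothetical operating point (TPR = 0.90, FPR = 0.02) and an assumed real-world fraud rate of 0.17%; neither number comes from the results above.

```python
def precision_at_prevalence(tpr: float, fpr: float, prevalence: float) -> float:
    """Expected precision for a classifier with the given true/false positive
    rates when a fraction `prevalence` of all transactions is fraudulent."""
    true_pos = tpr * prevalence          # expected share of rows that are caught frauds
    false_pos = fpr * (1.0 - prevalence)  # expected share of rows that are false alarms
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point, as if measured on a balanced test set
tpr, fpr = 0.90, 0.02

for prevalence in (0.5, 0.0017):  # balanced test set vs. assumed real-world fraud rate
    p = precision_at_prevalence(tpr, fpr, prevalence)
    print(f"prevalence={prevalence:.4f} -> expected precision={p:.3f}")
```

With these assumed rates, precision drops from roughly 0.98 on the balanced set to under 0.1 at the low fraud rate, even though the classifier itself is unchanged.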


## 📌 Conclusion


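- **Logistic Regression** and **Random Forest** were trained to classify transactions as fraudulent or legitimate.  
- **Precision** is the share of transactions flagged as fraud that were actually fraudulent; a **precision of 1.0** means every flagged transaction was indeed fraud (no false positives). Since the test set comes from the undersampled 50/50 data, precision at the real-world fraud rate will generally be lower.  
- **Recall** is the share of actual fraud cases that the model detected.  
- Further improvements could come from **hyperparameter tuning, other ensemble methods (e.g. gradient boosting), and deep learning techniques**.  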
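Both metrics come straight from the confusion matrix: precision = TP / (TP + FP) and recall = TP / (TP + FN). The small worked example below uses hypothetical labels for 10 transactions to show that a model can have perfect precision while still missing a fraud case, and checks the hand computation against scikit-learn's `precision_score` and `recall_score`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels and predictions for 10 transactions (1 = fraud)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])  # one fraud missed, no false alarms

# For binary labels [0, 1], ravel() returns the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # share of flagged transactions that are fraud
recall = tp / (tp + fn)     # share of actual frauds that were flagged

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f}")
print("sklearn:", precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```

Here precision is 1.0 but recall is only 0.75, which is why both numbers should be read together.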

🔍 **Next Steps:** Model Deployment & Real-time Fraud Detection System 🚀  
