
# **Fraud Detection: Predicting Credit Card Fraud Using Logistic Regression**

## **Introduction**
Financial institutions use **Logistic Regression** to detect fraudulent transactions.  
In this notebook, we apply **feature engineering, hyperparameter tuning, and evaluation metrics** on a **Credit Card Fraud Dataset**.

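## **Fixing the Convergence Issue**
To address the **"lbfgs failed to converge"** warning, we:
✔ **Scale features** using `StandardScaler()` so every feature has zero mean and unit variance  
✔ **Increase `max_iter` to 5000** to allow more iterations for convergence  
✔ **Use the `saga` solver**, which handles large datasets well but converges quickly only when features share a similar scale  
✔ **Balance the dataset** by undersampling legitimate transactions (this targets class bias in predictions rather than convergence)  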
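
The scaling and solver settings above can be sketched as a single scikit-learn `Pipeline`. This is a minimal illustration on synthetic data from `make_classification`, which is an assumption standing in for the transaction features; the exaggerated feature scales are also invented for the demo and are not part of the notebook's data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the transaction features (assumption, not the real data)
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    n_clusters_per_class=1, random_state=42,
)
# Put the columns on very different scales to mimic raw, unscaled features
X_demo = X_demo * np.logspace(0, 4, X_demo.shape[1])

# Scaling lives inside the pipeline, so it is always fit on the data passed to fit()
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="saga", max_iter=5000, random_state=42),
)
pipe.fit(X_demo, y_demo)

# n_iter_ records how many iterations the solver actually ran
n_iter_used = int(pipe.named_steps["logisticregression"].n_iter_[0])
print(f"saga stopped after {n_iter_used} iterations (limit 5000)")
```

If `n_iter_used` equals `max_iter`, the solver hit the limit without converging. Wrapping the scaler in a pipeline also means the pipeline can be passed to `GridSearchCV` directly, so the scaler is refit on each training fold.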

## **Dataset**
The dataset consists of anonymized transaction details, with labels indicating fraud (1) or legitimate (0).  
We load the Kaggle **Credit Card Fraud Detection** dataset from a public copy hosted on Google Cloud Storage.
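
Because fraud labels are rare in data like this, it helps to inspect the class counts before modeling. The sketch below uses a tiny hand-made DataFrame (`toy`, with hypothetical values) instead of the real file, and shows undersampling the legitimate class, which is the balancing approach the loading cell takes.

```python
import pandas as pd

# Toy stand-in for the transactions table (hypothetical values, not real data)
toy = pd.DataFrame({
    "Amount": [12.5, 3.0, 250.0, 8.9, 40.0, 1.0, 99.9, 15.0, 7.5, 620.0],
    "Class":  [0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
})

# Inspect how rare the positive (fraud) class is
print(toy["Class"].value_counts())

# Undersample: keep every fraud row plus an equally sized random sample of legitimate rows
fraud = toy[toy["Class"] == 1]
legit = toy[toy["Class"] == 0].sample(n=len(fraud), random_state=42)
balanced = pd.concat([legit, fraud]).sample(frac=1, random_state=42)  # shuffle the rows

print(balanced["Class"].value_counts())
```

Undersampling discards most legitimate rows, which keeps training fast but throws away information; oversampling the fraud class or using `class_weight="balanced"` in `LogisticRegression` are alternatives worth comparing.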


In [None]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_curve, auc

# Load dataset from a verified source
url = "https://storage.googleapis.com/download.tensorflow.org/data/creditcard.csv"
df = pd.read_csv(url)

# Reduce dataset size for faster processing
df = df.sample(frac=0.2, random_state=42)

# Balance the dataset (fraud cases are very few, so we undersample the legitimate class)
df_fraud = df[df['Class'] == 1]
df_legit = df[df['Class'] == 0].sample(n=len(df_fraud), random_state=42)
df_balanced = pd.concat([df_legit, df_fraud])

# Feature engineering: log-transform the right-skewed transaction amount
df_balanced["Transaction_Amount_Log"] = np.log1p(df_balanced["Amount"])  # log(1 + x), safe for zero amounts

# Select features and target variable
X = df_balanced.drop(columns=["Class", "Amount"])  # Excluding Amount since we transformed it
y = df_balanced["Class"]  # 1 = Fraud, 0 = Legitimate

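# Split first so the scaler never sees the test set (avoids data leakage)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale the features to improve convergence: fit on the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)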

# Hyperparameter tuning
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}
# saga is stochastic, so fix random_state for reproducible results
model = GridSearchCV(LogisticRegression(max_iter=5000, solver='saga', random_state=42), param_grid, cv=5)
model.fit(X_train, y_train)

# Best model selection
best_model = model.best_estimator_

# Make Predictions
y_pred = best_model.predict(X_test)
y_pred_proba = best_model.predict_proba(X_test)[:, 1]


In [None]:

# Evaluate Model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy Score: {accuracy:.2f}")
print("\nClassification Report:\n")
print(classification_report(y_test, y_pred))

# Confusion Matrix
plt.figure(figsize=(5,4))
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, cmap="Reds", fmt='d', 
            xticklabels=["Legitimate", "Fraud"], yticklabels=["Legitimate", "Fraud"])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()

# ROC curve and area under the curve (AUC)
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(6,4))
plt.plot(fpr, tpr, color="red", label=f"ROC curve (area = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", color="gray")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.show()



## **Conclusion**
✔ **Fixed convergence issue** by scaling features, increasing `max_iter`, and using the `saga` solver  
✔ **Balanced the dataset** by undersampling legitimate transactions, so the model cannot score well by always predicting the majority class  
✔ **Feature Engineering:** **Log transformation** reduced skewness in transaction amounts  
✔ **Evaluation Metrics:** Confusion matrix and **ROC curve** summarize performance on the balanced test set; precision on real, heavily imbalanced traffic will typically be lower  
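
The effect of the log transformation on skewness can be checked directly. The sketch below is a rough illustration on simulated amounts drawn from a log-normal distribution (an assumption chosen because it has a long right tail, not the real `Amount` column), comparing sample skewness before and after `np.log1p`.

```python
import numpy as np
import pandas as pd

# Simulated right-skewed amounts (log-normal draw; an assumption, not the real data)
rng = np.random.default_rng(42)
amounts = pd.Series(rng.lognormal(mean=3.0, sigma=1.5, size=10_000))

# log1p computes log(1 + x), so a zero amount maps to 0 instead of -inf
log_amounts = np.log1p(amounts)

skew_before = amounts.skew()
skew_after = log_amounts.skew()
print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```

Running the same two `.skew()` calls on `df["Amount"]` and the transformed column would show how much the transformation helps on the actual data.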

### **Next Steps**
🔹 Experiment with **ensemble models** (Random Forest, Gradient Boosting)  
🔹 Try **deep learning methods** for better fraud detection  
🔹 Improve real-time detection speed for financial institutions  

