# Credit Card Fraud Detection

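This notebook analyzes a dataset of credit card transactions to detect fraud using machine learning. The goal is to catch as many fraudulent transactions as possible while keeping false alarms low, and to suggest next steps for reducing financial losses. Because fraud is rare, plain accuracy is a misleading yardstick here; the evaluation relies on the confusion matrix, per-class precision and recall, and ROC AUC.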

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# Load dataset
file_path = 'creditcard.csv'
df = pd.read_csv(file_path)

# Basic information
df.info()  # info() prints its summary directly and returns None
print(df.head())
print(df.describe())

### Class Distribution
The target variable `Class` indicates:
- 0: Legitimate transaction
- 1: Fraudulent transaction

In [None]:
# Class distribution
class_distribution = df['Class'].value_counts()
plt.figure(figsize=(6, 4))
class_distribution.plot(kind='bar', color=['green', 'red'])
plt.title("Class Distribution")
plt.xlabel("Class (0: Legitimate, 1: Fraudulent)")
plt.ylabel("Count")
plt.show()

print(class_distribution)
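
The bar chart shows the imbalance but does not quantify it. The sketch below computes the fraud share and the accuracy that a trivial "always legitimate" classifier would reach. The helper `imbalance_summary` and the `toy_labels` series are hypothetical stand-ins for illustration; in this notebook the same function could be called on `df['Class']`.

```python
import pandas as pd


def imbalance_summary(labels: pd.Series) -> dict:
    """Summarize a binary 0/1 label column, where 1 marks fraud."""
    counts = labels.value_counts()
    n_legit = int(counts.get(0, 0))
    n_fraud = int(counts.get(1, 0))
    total = n_legit + n_fraud
    return {
        "legit": n_legit,
        "fraud": n_fraud,
        "fraud_share": n_fraud / total,
        # Accuracy of a model that always predicts the majority class
        "majority_baseline_accuracy": max(n_legit, n_fraud) / total,
    }


# Hypothetical toy labels standing in for df['Class']
toy_labels = pd.Series([0] * 995 + [1] * 5)
summary = imbalance_summary(toy_labels)
print(summary)
```

A baseline accuracy this close to 1 is why accuracy alone says little about how well fraud is detected.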

### Data Preprocessing
- Standardize `Time` and `Amount`.
- Handle class imbalance by randomly undersampling the legitimate class in the training set only; the test set keeps its original distribution.

In [None]:
# Features and target
X = df.drop('Class', axis=1)
y = df['Class']

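# Train-test split (done before scaling so the test data does not leak into the scaler)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
X_train, X_test = X_train.copy(), X_test.copy()

# Standardize 'Time' and 'Amount' using statistics from the training set only
scaler = StandardScaler()
X_train[['Time', 'Amount']] = scaler.fit_transform(X_train[['Time', 'Amount']])
X_test[['Time', 'Amount']] = scaler.transform(X_test[['Time', 'Amount']])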

# Combine train data
train_data = pd.concat([X_train, y_train], axis=1)

# Separate majority and minority classes
legit = train_data[train_data['Class'] == 0]
fraud = train_data[train_data['Class'] == 1]

# Undersample majority class
legit_undersampled = resample(legit, 
                              replace=False, 
                              n_samples=len(fraud), 
                              random_state=42)

# Combine undersampled majority and minority classes
train_balanced = pd.concat([legit_undersampled, fraud])
X_train_bal = train_balanced.drop('Class', axis=1)
y_train_bal = train_balanced['Class']

# Check new class distribution
print(y_train_bal.value_counts())
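
One way to standardize only `Time` and `Amount` while keeping the scaling statistics confined to the training split is scikit-learn's `ColumnTransformer`. The sketch below is an illustration on synthetic data, not the notebook's own pipeline: the `toy` frame, its single `V1` column, and the value ranges are assumptions made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the transaction table (values are made up)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Time": rng.uniform(0, 172_800, size=200),
    "V1": rng.normal(size=200),
    "Amount": rng.exponential(scale=80.0, size=200),
})

toy_train, toy_test = train_test_split(toy, test_size=0.3, random_state=42)

# Scale only 'Time' and 'Amount'; every other column passes through unchanged.
# The output is a NumPy array with the scaled columns first, then the rest.
scale_cols = ColumnTransformer(
    transformers=[("scale", StandardScaler(), ["Time", "Amount"])],
    remainder="passthrough",
)

# Fit on the training split only, then reuse the same statistics on the test split
toy_train_scaled = scale_cols.fit_transform(toy_train)
toy_test_scaled = scale_cols.transform(toy_test)

print(toy_train_scaled[:, :2].mean(axis=0))  # close to 0 on the training split
print(toy_test_scaled[:, :2].mean(axis=0))   # generally not exactly 0
```

The test split is transformed with the training mean and standard deviation, so its scaled columns are only approximately centered. That is the expected behavior when no test information reaches the scaler.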

### Model Building
Train a **Random Forest Classifier** to detect fraudulent transactions.

In [None]:
# Train Random Forest Classifier
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train_bal, y_train_bal)

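# Predictions on test set
y_pred = rf_model.predict(X_test)
y_proba = rf_model.predict_proba(X_test)[:, 1]  # predicted probability of fraud

# Evaluate the model
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
# ROC AUC is a ranking metric, so it needs scores rather than hard 0/1 labels
print("ROC AUC Score:", roc_auc_score(y_test, y_proba))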
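
With rare positives, ROC AUC can look optimistic, and the area under the precision-recall curve (average precision) is a common complement. The sketch below shows both metrics computed from predicted probabilities. It runs on synthetic data from `make_classification`, so the names `X_toy`, `y_toy`, `clf`, and `fraud_scores` are illustrative and the printed numbers say nothing about the credit card dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    average_precision_score,
    precision_recall_curve,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (about 5% positives), used only for illustration
X_toy, y_toy = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)

clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(X_tr, y_tr)

# Column 1 of predict_proba holds the probability of the positive class
fraud_scores = clf.predict_proba(X_te)[:, 1]

roc_auc = roc_auc_score(y_te, fraud_scores)
pr_auc = average_precision_score(y_te, fraud_scores)
print("ROC AUC:", roc_auc)
print("Average precision (PR AUC):", pr_auc)

# Precision and recall at every candidate threshold, useful for choosing
# an operating point other than the default 0.5 cut-off
precision, recall, thresholds = precision_recall_curve(y_te, fraud_scores)
```

The `precision`, `recall`, and `thresholds` arrays can then be scanned for a threshold that matches the acceptable trade-off between missed fraud and false alarms.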

### Conclusion
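The Random Forest model trained on the undersampled data should be judged by the confusion matrix, classification report, and ROC AUC above. Undersampling discards most legitimate transactions, so fraud-class precision on the imbalanced test set deserves particular attention. Further improvements can be made by:
- Using oversampling techniques (e.g., SMOTE) on the training set instead of undersampling, so that no legitimate transactions are discarded.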
- Tuning hyperparameters for better performance.
- Trying other algorithms like XGBoost, LightGBM, or anomaly detection methods.
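
As an illustration of the hyperparameter tuning suggestion, the sketch below runs a small grid search with stratified cross-validation, scored by average precision. The grid values, the synthetic data, and the choice of scoring metric are all assumptions made for the example rather than settings taken from this notebook. In practice the search would be fitted on `X_train_bal` and `y_train_bal`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic imbalanced data (about 10% positives), used only for illustration
X_toy, y_toy = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=0
)

# A deliberately tiny, hypothetical grid so the example runs quickly
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [4, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    scoring="average_precision",  # better suited to rare positives than accuracy
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
)
search.fit(X_toy, y_toy)

print("Best parameters:", search.best_params_)
print("Best cross-validated average precision:", search.best_score_)
```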