# Exploratory Data Analysis (EDA)

## Credit Card Fraud Detection

This notebook examines the structure of the dataset, quantifies class imbalance, and explores feature distributions. Careful EDA matters in fraud detection because fraudulent transactions usually make up a very small fraction of all records, which affects both how models are trained and how they are evaluated.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('default')

In [None]:
df = pd.read_csv('../data/raw/creditcard.csv')
df.head()

## Dataset Overview

- The dataset contains anonymized transaction features (`V1`–`V28`)
- `Amount` represents transaction value
- `Class` is the target variable (0 = legitimate, 1 = fraud)

In [None]:
df.info()
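The column layout described above can also be checked programmatically. The following is a minimal sketch, not part of the original notebook: `check_schema` and `EXPECTED_COLUMNS` are hypothetical names, the expected columns are taken only from the bullet list above, and the toy frame is synthetic. Any additional columns in the real file are simply ignored by this check.

```python
import pandas as pd

# Columns named in the overview above; the real file may contain more.
EXPECTED_COLUMNS = [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def check_schema(frame: pd.DataFrame) -> dict:
    """Report missing expected columns, null cells, and whether Class is binary."""
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    labels = set(frame["Class"].unique()) if "Class" in frame.columns else set()
    return {
        "missing_columns": missing,
        "null_cells": int(frame.isna().sum().sum()),
        "binary_target": len(labels) > 0 and labels <= {0, 1},
    }


# Synthetic stand-in for the real data, used only to exercise the function.
toy = pd.DataFrame(0.0, index=range(4), columns=EXPECTED_COLUMNS)
toy["Class"] = [0, 0, 0, 1]

report = check_schema(toy)
print(report)
```

In the notebook itself, the same function would be called as `check_schema(df)` after loading the CSV.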

## Class Imbalance Analysis

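Fraud detection datasets are typically highly imbalanced: legitimate transactions far outnumber fraudulent ones. Visualizing the class distribution shows how severe the imbalance is. It also motivates imbalance-aware evaluation metrics (for example precision, recall, or the area under the precision-recall curve) in place of plain accuracy, along with resampling techniques.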

In [None]:
sns.countplot(x='Class', data=df)
plt.title('Class Distribution')
plt.show()
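A count plot shows the imbalance visually, but exact counts and percentages are easier to cite. The sketch below is an illustrative addition: `class_balance` is a hypothetical helper, and the labels are synthetic toy values rather than figures from the real dataset. It also computes the accuracy of a trivial model that always predicts the majority class, which illustrates why accuracy alone is misleading on imbalanced data.

```python
import pandas as pd


def class_balance(labels: pd.Series) -> pd.DataFrame:
    """Return per-class counts and percentage shares."""
    counts = labels.value_counts().sort_index()
    percent = (counts / counts.sum() * 100).round(3)
    return pd.DataFrame({"count": counts, "percent": percent})


# Toy labels for illustration only; these are not the real dataset's numbers.
toy_labels = pd.Series([0] * 995 + [1] * 5, name="Class")

balance = class_balance(toy_labels)
print(balance)

# Accuracy of a model that always predicts the majority class.
majority_baseline = balance["count"].max() / balance["count"].sum()
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

In the notebook, `class_balance(df['Class'])` would give the corresponding table for the loaded data.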

## Feature Correlation

Correlation analysis helps identify redundant features (pairs that are strongly correlated with each other) and features that vary with the target, both of which are useful to know before training a model.

In [None]:
plt.figure(figsize=(12, 8))
sns.heatmap(df.corr(), cmap='coolwarm', linewidths=0.5)
plt.title('Feature Correlation Heatmap')
plt.show()
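A heatmap gives an overall picture, but the two questions raised above can also be answered numerically. The following sketch is an assumption-laden illustration, not the notebook's own method: `correlation_with_target` and `highly_correlated_pairs` are hypothetical helpers, the 0.9 threshold is arbitrary, and the toy frame is synthetic. Both helpers use Pearson correlation, the default for `DataFrame.corr()`.

```python
import pandas as pd


def correlation_with_target(frame: pd.DataFrame, target: str = "Class") -> pd.Series:
    """Correlation of each numeric feature with the target, strongest first."""
    corr = frame.select_dtypes("number").corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


def highly_correlated_pairs(frame: pd.DataFrame, threshold: float = 0.9,
                            target: str = "Class") -> list:
    """Feature pairs whose absolute correlation meets the threshold."""
    corr = frame.select_dtypes("number").drop(columns=[target]).corr().abs()
    cols = list(corr.columns)
    pairs = []
    # Walk the upper triangle so each pair is reported once.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            value = float(corr.iloc[i, j])
            if value >= threshold:
                pairs.append((cols[i], cols[j], value))
    return pairs


# Synthetic data for illustration; V2 is deliberately the mirror image of V1.
toy = pd.DataFrame({
    "V1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "V2": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "Amount": [10.0, 12.0, 9.0, 11.0, 10.0, 12.0],
    "Class": [0, 0, 0, 0, 1, 1],
})

ranked = correlation_with_target(toy)
pairs = highly_correlated_pairs(toy)
print(ranked)
print(pairs)
```

Applied to the loaded data, `correlation_with_target(df)` ranks features by their linear association with `Class`, and `highly_correlated_pairs(df)` lists candidate redundant pairs. Pearson correlation only captures linear relationships, so a low value does not rule out a feature being useful.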