# Exploratory Data Analysis (EDA) — Bank Transaction Activity Insights

**Name:** Anthony Roca  
**Email:** aroca@charlotte.edu  
**Due Date:** January 21, 2026

---

## Overview

**Dataset:** Kaggle – Bank Transaction Dataset for Fraud Detection

This dataset contains 2,512 transaction records and 16 columns covering behavioral, transactional, and demographic attributes:
- TransactionID
- AccountID
- TransactionAmount
- TransactionDate
- TransactionType
- Location
- DeviceID
- IP Address
- MerchantID
- AccountBalance
- Channel
- CustomerAge
- CustomerOccupation
- TransactionDuration
- LoginAttempts
- PreviousTransactionDate

The dataset is synthetic and intended for exploratory pattern analysis.

**Link:** https://www.kaggle.com/datasets/valakhorasani/bank-transaction-dataset-for-fraud-detection/data

---

## Objectives

1. **Data Health & Understanding:** Establish dataset quality, schema, and basic distributions.
2. **Business Lens EDA:** Quantify activity by account, city, merchant, channel, device/IP, and time; surface operational anomalies and plausible explanations.

---

## Setup & Constraints

- Use Python (Pandas/NumPy/Seaborn/Matplotlib) in Google Colab.
- Document every data cleaning decision using comments in your notebook.
- The dataset is synthetic; focus on method and reasoning rather than conclusions about real-world volumes.

# Part A — Data Intake & Quality Audit

## A1) Load & Schema Check

**Action:** Inspect column names, types, and nulls; parse timestamps for TransactionDate and PreviousTransactionDate.

**Steps:**
- Load the dataset using pandas
- Check column names, data types, and null values
- Parse date columns properly
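
One way to approach the steps above is sketched here. The inline CSV is a tiny illustrative stand-in for the Kaggle file (its rows are made up), and `load_transactions` and `DATE_COLS` are hypothetical names chosen for this example.

```python
import io

import pandas as pd

# Illustrative sample only; in Colab, pass the path of the uploaded CSV instead.
SAMPLE_CSV = """TransactionID,AccountID,TransactionAmount,TransactionDate,PreviousTransactionDate
TX000001,AC00128,14.09,2023-04-11 16:29:14,2023-04-10 09:12:00
TX000002,AC00455,376.24,2023-06-27 16:44:19,2023-06-20 18:05:41
TX000003,AC00019,,2023-07-10 18:16:08,not-a-date
"""

DATE_COLS = ["TransactionDate", "PreviousTransactionDate"]


def load_transactions(source) -> pd.DataFrame:
    """Read the CSV and parse the two timestamp columns."""
    df = pd.read_csv(source)
    for col in DATE_COLS:
        # errors="coerce" turns unparseable strings into NaT so they count as nulls
        df[col] = pd.to_datetime(df[col], errors="coerce")
    return df


df = load_transactions(io.StringIO(SAMPLE_CSV))

# One row per column: its dtype and how many values are missing
schema = pd.DataFrame({"dtype": df.dtypes.astype(str), "nulls": df.isna().sum()})
print(schema)
```

Coercing bad timestamps to `NaT` keeps the load from failing and makes parsing problems visible in the null counts, which is a cleaning decision worth noting in a comment.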

In [None]:
# Import packages and load data


In [None]:
# Check info, null values, and parse dates


## A2) Integrity & Consistency

**Action:** Check uniqueness (TransactionID), reasonableness of numeric ranges (e.g., amounts non-negative), and categorical value sets (TransactionType, Channel).

**Steps:**
- Assert TransactionID uniqueness
- Check for negative transaction amounts
- Review categorical value distributions
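
A minimal sketch of these checks follows. The rows are illustrative, with a duplicate ID, a negative amount, and an inconsistent label planted on purpose; `integrity_report` is a hypothetical helper name.

```python
import pandas as pd

# Illustrative rows only; the problems are planted so the checks have something to find.
df = pd.DataFrame({
    "TransactionID": ["TX1", "TX2", "TX2", "TX4"],
    "TransactionAmount": [25.0, -10.0, 310.5, 42.0],
    "TransactionType": ["Debit", "Credit", "Debit", "debit "],
    "Channel": ["ATM", "Online", "Branch", "ATM"],
})


def integrity_report(df: pd.DataFrame) -> dict:
    """Collect integrity findings in one place instead of stopping at the first failure."""
    return {
        "duplicate_ids": int(df["TransactionID"].duplicated().sum()),
        "negative_amounts": int((df["TransactionAmount"] < 0).sum()),
        "type_values": sorted(df["TransactionType"].unique()),
        "channel_values": sorted(df["Channel"].unique()),
    }


report = integrity_report(df)
print(report)

# On clean data, a hard check could follow:
# assert df["TransactionID"].is_unique
```

Listing the raw category values makes variants such as stray whitespace or mixed case easy to spot before any grouping is done.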

In [None]:
# Check TransactionID uniqueness and data integrity


## A3) Basic Profiling

**Action:** Produce summary statistics and visualize missing data.

**Steps:**
- Generate descriptive statistics for all columns
- Create a heatmap to visualize missing values
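
The profiling steps might look like the sketch below. It uses made-up rows and stops at the boolean missing-value mask, which is the input a heatmap would need.

```python
import numpy as np
import pandas as pd

# Illustrative rows only, with two missing values planted
df = pd.DataFrame({
    "TransactionAmount": [14.09, 376.24, np.nan, 92.15],
    "Channel": ["ATM", None, "Online", "ATM"],
    "CustomerAge": [70, 68, 19, 26],
})

# include="all" covers numeric and categorical columns; transpose for one row per column
summary = df.describe(include="all").T

# True where a value is missing; this mask is what a heatmap would display
missing_mask = df.isna()

# Fraction of missing values per column, worst first
missing_share = missing_mask.mean().sort_values(ascending=False)

print(summary)
print(missing_share)
```

In the notebook, one common way to draw the mask is `seaborn.heatmap(missing_mask, cbar=False)`. If the dataset turns out to have no nulls, the heatmap will be a single color, which is itself a finding worth stating.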

In [None]:
# Generate summary statistics


In [None]:
# Visualize missing data with heatmap


# Part B — Business-Oriented EDA

## B1) Activity by Account

**Action:** Compute transactions per account; identify top accounts by volume; assess distribution shape (long tail vs even spread).

**Steps:**
- Group by AccountID and count transactions
- Identify top 10 accounts by transaction volume
- Visualize distribution of transactions per account
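
As a rough sketch of the grouping above, using illustrative rows; `top_n_share` is a hypothetical helper for judging how concentrated activity is.

```python
import pandas as pd

# Illustrative rows only
df = pd.DataFrame({
    "TransactionID": [f"TX{i}" for i in range(1, 9)],
    "AccountID": ["AC1", "AC1", "AC1", "AC1", "AC2", "AC2", "AC3", "AC4"],
})

# Transactions per account, busiest first
tx_per_account = (
    df.groupby("AccountID")["TransactionID"].count().sort_values(ascending=False)
)
top_accounts = tx_per_account.head(10)


def top_n_share(counts: pd.Series, n: int) -> float:
    """Share of all transactions generated by the n busiest accounts."""
    return counts.nlargest(n).sum() / counts.sum()


# How many accounts have 1, 2, 3, ... transactions (the shape of the distribution)
count_distribution = tx_per_account.value_counts().sort_index()

print(top_accounts)
print(top_n_share(tx_per_account, 1))
print(count_distribution)
```

A high top-n share with many single-transaction accounts points to a long tail; a flat `count_distribution` points to an even spread.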

In [None]:
# Compute transactions per account and identify top accounts


In [None]:
# Visualize distribution of transactions per account


## B2) Time Spacing Between Transactions

**Action:** Compute delta_t = TransactionDate - PreviousTransactionDate; profile high-frequency bursts and typical intervals.

**Steps:**
- Calculate the time difference between each transaction and the account's previous one (TransactionDate - PreviousTransactionDate)
- Analyze distribution of inter-transaction times
- Identify high-frequency patterns
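
The delta_t computation could be sketched as follows, on illustrative rows. The 10-minute `BURST_THRESHOLD` is an assumed cutoff, not a value from the assignment, and the negative-gap flag is an extra sanity check in case a previous date falls after the transaction date.

```python
import pandas as pd

# Illustrative rows only; the third row has a "previous" date after the transaction date
df = pd.DataFrame({
    "AccountID": ["AC1", "AC2", "AC3"],
    "TransactionDate": pd.to_datetime(
        ["2023-04-11 16:29:14", "2023-06-27 16:44:19", "2023-07-10 18:16:08"]
    ),
    "PreviousTransactionDate": pd.to_datetime(
        ["2023-04-11 16:24:14", "2023-06-20 16:44:19", "2023-07-12 18:16:08"]
    ),
})

BURST_THRESHOLD = pd.Timedelta(minutes=10)  # assumed cutoff for a "burst"; tune to the data

df["delta_t"] = df["TransactionDate"] - df["PreviousTransactionDate"]
df["delta_hours"] = df["delta_t"].dt.total_seconds() / 3600

# A negative gap means the timestamps are out of order and needs a documented decision
df["negative_gap"] = df["delta_t"] < pd.Timedelta(0)

# High-frequency activity: non-negative gaps at or below the threshold
df["is_burst"] = (df["delta_t"] >= pd.Timedelta(0)) & (df["delta_t"] <= BURST_THRESHOLD)

print(df[["AccountID", "delta_hours", "negative_gap", "is_burst"]])
```

Converting to hours gives a plain numeric column that is easier to describe and plot than raw timedeltas.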

In [None]:
# Calculate inter-transaction time and visualize


## B3) Geographic Sanity Checks (City)

**Action:** Count distinct accounts per city and total transactions per city; flag cities with unexpected ranks (e.g., Raleigh exceeding NYC).

**Steps:**
- Count unique accounts per city (Location)
- Count total transactions per city
- Visualize and identify anomalies
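
One possible sketch of the city counts is shown below on illustrative rows. `EXPECTED_RANK` is a hypothetical analyst-supplied prior (for example, a ranking by population) and is not part of the dataset.

```python
import pandas as pd

# Illustrative rows only
df = pd.DataFrame({
    "TransactionID": ["TX1", "TX2", "TX3", "TX4", "TX5"],
    "AccountID": ["AC1", "AC2", "AC1", "AC3", "AC3"],
    "Location": ["Raleigh", "Raleigh", "Raleigh", "New York", "New York"],
})

# Hypothetical prior: 1 = expected to be the busiest city
EXPECTED_RANK = {"New York": 1, "Raleigh": 2}

city_stats = df.groupby("Location").agg(
    accounts=("AccountID", "nunique"),
    transactions=("TransactionID", "count"),
)
city_stats["tx_per_account"] = city_stats["transactions"] / city_stats["accounts"]

# Observed rank by volume (1 = most transactions), compared with the prior
city_stats["observed_rank"] = city_stats["transactions"].rank(ascending=False, method="min")
city_stats["expected_rank"] = pd.Series(EXPECTED_RANK)  # aligns on the city index
city_stats["rank_gap"] = city_stats["expected_rank"] - city_stats["observed_rank"]

# Cities that rank higher than the prior suggests
flagged = city_stats[city_stats["rank_gap"] > 0]
print(city_stats.sort_values("transactions", ascending=False))
print(flagged)
```

Because the data is synthetic, a flagged city is a prompt to check accounts, devices, and merchants in that city, not evidence of a real-world pattern.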

In [None]:
# Analyze accounts and transactions per city


In [None]:
# Visualize geographic distribution


## B4) Merchant Concentration

**Action:** Identify top merchants by count and amount; compute merchant diversity per account (unique merchants/account).

**Steps:**
- Identify top merchants by transaction count and total amount
- Calculate merchant diversity (unique merchants per account)
- Visualize merchant concentration patterns
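
A minimal sketch of the merchant metrics, using illustrative rows:

```python
import pandas as pd

# Illustrative rows only
df = pd.DataFrame({
    "AccountID": ["AC1", "AC1", "AC1", "AC2", "AC3"],
    "MerchantID": ["M1", "M1", "M2", "M3", "M1"],
    "TransactionAmount": [10.0, 20.0, 300.0, 5.0, 15.0],
})

# Count and total amount per merchant
merchant_stats = df.groupby("MerchantID").agg(
    tx_count=("TransactionAmount", "count"),
    total_amount=("TransactionAmount", "sum"),
)
top_by_count = merchant_stats.sort_values("tx_count", ascending=False).head(10)
top_by_amount = merchant_stats.sort_values("total_amount", ascending=False).head(10)

# Merchant diversity: number of distinct merchants each account has used
merchants_per_account = df.groupby("AccountID")["MerchantID"].nunique()

print(top_by_count)
print(top_by_amount)
print(merchants_per_account.describe())
```

Ranking by count and by amount separately matters because the two lists can disagree: here `M1` leads on count while `M2` leads on amount.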

In [None]:
# Analyze top merchants by count and amount


In [None]:
# Calculate and visualize merchant diversity per account


## B5) Channel Mix & Journey Metrics

**Action:** Compare volume and amounts across channels (Online/ATM/Branch); analyze typical durations and login attempts across channels.

**Steps:**
- Aggregate transactions by channel
- Compare transaction counts, amounts, durations, and login attempts
- Visualize channel-based patterns
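
The channel aggregation might be sketched like this, on illustrative rows. The choice of median for amounts and durations is an assumption made here to reduce the influence of outliers.

```python
import pandas as pd

# Illustrative rows only
df = pd.DataFrame({
    "Channel": ["ATM", "ATM", "Online", "Online", "Branch"],
    "TransactionAmount": [100.0, 50.0, 20.0, 40.0, 500.0],
    "TransactionDuration": [60, 80, 30, 50, 200],
    "LoginAttempts": [1, 1, 1, 5, 1],
})

channel_stats = df.groupby("Channel").agg(
    tx_count=("TransactionAmount", "count"),
    total_amount=("TransactionAmount", "sum"),
    median_amount=("TransactionAmount", "median"),
    median_duration=("TransactionDuration", "median"),
    mean_login_attempts=("LoginAttempts", "mean"),
)

# Each channel's share of total transaction volume
channel_stats["tx_share"] = channel_stats["tx_count"] / channel_stats["tx_count"].sum()

print(channel_stats)
```

Putting all metrics in one table makes it easy to plot a single grouped bar chart per metric and to compare channels side by side.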

In [None]:
# Analyze channel metrics


In [None]:
# Visualize channel comparisons


## B6) Device & IP Reuse

**Action:** Measure the number of accounts per DeviceID and per IP address; inspect cross-account sharing patterns.

**Steps:**
- Count unique accounts per DeviceID
- Count unique accounts per IP Address
- Identify devices/IPs with multiple accounts
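
As a sketch of the reuse counts, with illustrative rows and documentation-range IP addresses; `accounts_per_key` is a hypothetical helper name.

```python
import pandas as pd

# Illustrative rows only; IPs come from the 192.0.2.0/24 documentation range
df = pd.DataFrame({
    "AccountID": ["AC1", "AC2", "AC1", "AC3"],
    "DeviceID": ["D1", "D1", "D2", "D3"],
    "IP Address": ["192.0.2.1", "192.0.2.2", "192.0.2.1", "192.0.2.2"],
})


def accounts_per_key(df: pd.DataFrame, key: str) -> pd.Series:
    """Number of distinct accounts seen for each value of `key`."""
    return df.groupby(key)["AccountID"].nunique().sort_values(ascending=False)


device_counts = accounts_per_key(df, "DeviceID")
ip_counts = accounts_per_key(df, "IP Address")

# Devices and IPs used by more than one account
shared_devices = device_counts[device_counts > 1]
shared_ips = ip_counts[ip_counts > 1]

print(shared_devices)
print(shared_ips)
```

Shared devices or IPs have benign explanations too (family devices, branch terminals, shared networks), so the counts are best read alongside channel and location.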

In [None]:
# Analyze device and IP reuse patterns


## B7) Temporal Patterns

**Action:** Analyze hourly, daily, and weekday trends; identify spikes around odd hours, weekends, and end-of-month.

**Steps:**
- Extract hour, day, and weekday from TransactionDate
- Visualize transaction patterns by hour and weekday
- Analyze daily trends over time
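
The feature extraction could look like the sketch below, on illustrative timestamps. It assumes TransactionDate has already been parsed to a datetime column.

```python
import pandas as pd

# Illustrative timestamps only
df = pd.DataFrame({
    "TransactionDate": pd.to_datetime(
        ["2023-04-11 16:29:14", "2023-04-11 03:05:00", "2023-04-30 23:10:00"]
    ),
})

ts = df["TransactionDate"]
df["hour"] = ts.dt.hour
df["day"] = ts.dt.day
df["weekday"] = ts.dt.day_name()
df["is_weekend"] = ts.dt.dayofweek >= 5  # Monday is 0, so 5 and 6 are the weekend
df["is_month_end"] = ts.dt.is_month_end

# Counts per hour of day and per weekday
hourly = df.groupby("hour").size()
by_weekday = df.groupby("weekday").size()

# Daily trend: count transactions per calendar date
daily = df.groupby(ts.dt.normalize()).size()

print(hourly)
print(by_weekday)
print(daily)
```

The boolean flags make it straightforward to compare, for example, the share of weekend or month-end transactions against what a uniform spread would give.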

In [None]:
# Extract temporal features and analyze hourly patterns


In [None]:
# Analyze daily transaction trends


## B8) Amount vs Balance Dynamics

**Action:** Check reasonableness of AccountBalance changes relative to TransactionAmount and TransactionType (debit vs credit).

**Steps:**
- Analyze TransactionAmount by TransactionType
- Create derived features (e.g., balance_to_amount_ratio)
- Validate business logic of balance changes
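
A rough sketch of these checks follows, on illustrative rows. It assumes AccountBalance is the balance after the transaction, which should be confirmed against the dataset description; `implied_prior_balance` and `suspicious` are hypothetical column names.

```python
import numpy as np
import pandas as pd

# Illustrative rows only
df = pd.DataFrame({
    "TransactionType": ["Debit", "Credit", "Credit"],
    "TransactionAmount": [100.0, 50.0, 900.0],
    "AccountBalance": [400.0, 1050.0, 500.0],
})

# Amount profile per transaction type
by_type = df.groupby("TransactionType")["TransactionAmount"].agg(["count", "mean", "median"])

# Derived ratio; zero amounts become NaN to avoid division by zero
df["balance_to_amount_ratio"] = df["AccountBalance"] / df["TransactionAmount"].replace(0, np.nan)

# ASSUMPTION: AccountBalance is post-transaction, so a debit lowered it and a credit raised it
sign = df["TransactionType"].map({"Debit": -1, "Credit": 1})
df["implied_prior_balance"] = df["AccountBalance"] - sign * df["TransactionAmount"]

# Under that assumption, a negative implied prior balance deserves a closer look
df["suspicious"] = df["implied_prior_balance"] < 0

print(by_type)
print(df)
```

If the assumption about AccountBalance is wrong, the sign convention flips, so the flagged rows are only as reliable as that reading of the column.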

In [None]:
# Analyze amount vs balance dynamics


---

## Deliverables

1. **Google Colab Notebook:** Reproducible EDA with narrative, code, and plots.
2. **Executive Summary (2–3 pages):** Top activity insights and suggested analytical follow-ups.

---

## Key Reminders

- **Synthetic Data Caveat:** Treat odd findings as opportunities to practice reasoning and method design, not ground truth.
- **Validate Anomalies:** Cross-check findings across multiple dimensions (city + device + merchant + time) before drawing conclusions.
- **Document Everything:** Use comments to explain all data cleaning decisions and analytical choices.
- **Keep Charts Readable:** Limit categories per plot, rotate labels, and annotate key takeaways.