# GRC Big Data Visuals — Churn, Inventory, Fraud, and Decision Latency

This notebook contains **four visualizations** that illustrate how a modern big data stack drives business value for **Global Retail Corporation (GRC)**:

1. **Churn Rate by Risk Decile** — enables targeted retention campaigns.
2. **Inventory Imbalance Heatmap (DC × SKU)** — guides stock rebalancing to avoid stock-outs/overstock.
3. **Fraud Anomaly Score Over Time** — surfaces suspicious spikes for review in near-real time.
4. **Decision Latency: Before vs After** — shows the step-change from batch to streaming analytics.

> Notes: All data is synthetic, generated with NumPy under a fixed seed (42). Plots use only `matplotlib` (no seaborn), one chart per figure.


In [None]:
# Setup (Deepnote/Jupyter)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
np.random.seed(42)


## 1) Customer Retention — Churn Rate by Risk Decile
**Why this matters:** Sorting customers into deciles by predicted churn risk lets GRC run **precision retention** (offers/outreach) where it returns the most value, instead of blanket discounts.

In [None]:
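n_customers = 10000
# Synthetic predicted churn risk in [0, 1]; Beta(2, 5) skews most customers toward low risk
risk_scores = np.random.beta(2, 5, size=n_customers)
# Simulated outcome: churn probability is the risk score scaled by 0.7 (so at most 70%)
churn = (np.random.rand(n_customers) < risk_scores * 0.7).astype(int)
df_churn = pd.DataFrame({"risk": risk_scores, "churn": churn})
# Decile 1 = lowest predicted risk, decile 10 = highest
df_churn["decile"] = pd.qcut(df_churn["risk"], 10, labels=False) + 1
churn_by_decile = df_churn.groupby("decile")["churn"].mean().reset_index()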

plt.figure(figsize=(10, 6))
plt.bar(churn_by_decile["decile"], churn_by_decile["churn"] * 100)
plt.title("Churn Rate by Risk Decile (Higher = Riskier)")
plt.xlabel("Risk Decile")
plt.ylabel("Observed Churn Rate (%)")
plt.xticks(range(1, 11))
plt.tight_layout()
plt.show()
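
**From chart to targeting (sketch):** The decile chart shows where churn concentrates. A natural follow-up question is how many churners a campaign reaches if it contacts only the top-k deciles. The snippet below is a minimal, self-contained sketch of that calculation. The helper name `capture_by_decile` is hypothetical, and the data is regenerated with its own NumPy generator, so the numbers will not match the cell above exactly.

```python
import numpy as np

def capture_by_decile(risk, churn, n_bins=10):
    """Cumulative share of churners reached when contacting the top-k risk bins.

    Returns two arrays of length n_bins: (captured, contacted), where entry k-1
    corresponds to targeting the k highest-risk bins.
    """
    risk = np.asarray(risk, dtype=float)
    churn = np.asarray(churn, dtype=int)
    order = np.argsort(-risk)                  # highest risk first
    groups = np.array_split(churn[order], n_bins)
    churners = np.array([g.sum() for g in groups])
    sizes = np.array([len(g) for g in groups])
    captured = np.cumsum(churners) / max(churn.sum(), 1)
    contacted = np.cumsum(sizes) / len(churn)
    return captured, contacted

# Synthetic data with the same shape as the notebook's churn simulation (assumption)
rng = np.random.default_rng(42)
risk = rng.beta(2, 5, size=10_000)
churn = (rng.random(10_000) < risk * 0.7).astype(int)

captured, contacted = capture_by_decile(risk, churn)
for k in (1, 3, 5):
    print(f"Top {k} decile(s): contact {contacted[k - 1]:.0%} of customers, "
          f"reach {captured[k - 1]:.0%} of churners")
```

If the top few deciles reach a much larger share of churners than their share of customers, a targeted campaign is likely to outperform a blanket discount for the same budget.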


## 2) Inventory Optimization — Imbalance Heatmap (DC × SKU)
**Why this matters:** Near-real-time DC×SKU imbalance highlights **where to move stock now**. Negative values imply **stock-out pressure**; positive values imply **overstock** — powering automated rebalancing.

In [None]:
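n_dc, n_sku = 10, 10
# Synthetic imbalance scores: 0 = balanced, < 0 = stock-out pressure, > 0 = overstock
imbalance = np.random.normal(loc=0, scale=1, size=(n_dc, n_sku))

# Diverging colormap with symmetric limits so zero sits at the neutral midpoint
vmax = np.abs(imbalance).max()
plt.figure(figsize=(8, 7))
plt.imshow(imbalance, aspect='auto', cmap='RdBu_r', vmin=-vmax, vmax=vmax)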
plt.title("Inventory Imbalance Heatmap (DC x SKU)")
plt.xlabel("SKU Index")
plt.ylabel("Distribution Center Index")
plt.colorbar(label="Imbalance (− stockout  |  + overstock)")
plt.tight_layout()
plt.show()
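
**From heatmap to transfers (sketch):** One simple way to turn the imbalance matrix into rebalancing suggestions is to pair, for each SKU, the most overstocked DC with the DC under the most stock-out pressure. The snippet below sketches that greedy rule. The function name `suggest_transfers` and the `min_gap` cutoff are assumptions made for illustration. A production rebalancer would also account for transport cost, lead time, and quantities.

```python
import numpy as np

def suggest_transfers(imbalance, min_gap=1.0):
    """Suggest one DC-to-DC transfer per SKU from a (DC x SKU) imbalance matrix.

    A transfer is suggested only when the donor is overstocked (> 0), the
    receiver is understocked (< 0), and their gap is at least min_gap.
    Returns (sku, from_dc, to_dc, gap) tuples sorted by gap, largest first.
    """
    imbalance = np.asarray(imbalance, dtype=float)
    transfers = []
    for sku in range(imbalance.shape[1]):
        col = imbalance[:, sku]
        donor = int(np.argmax(col))       # most overstocked DC for this SKU
        receiver = int(np.argmin(col))    # DC with the most stock-out pressure
        gap = col[donor] - col[receiver]
        if col[donor] > 0 and col[receiver] < 0 and gap >= min_gap:
            transfers.append((sku, donor, receiver, float(gap)))
    return sorted(transfers, key=lambda t: t[3], reverse=True)

# Synthetic matrix with the same shape as the heatmap above (assumption)
rng = np.random.default_rng(42)
imbalance = rng.normal(loc=0, scale=1, size=(10, 10))
for sku, src, dst, gap in suggest_transfers(imbalance)[:5]:
    print(f"SKU {sku}: move stock from DC {src} to DC {dst} (gap {gap:.2f})")
```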


## 3) Fraud Detection — Daily Anomaly Score with Threshold
**Why this matters:** Streaming anomaly detection flags **suspicious periods** for analyst review and automated holds, cutting fraud losses. The threshold controls the trade-off: raising it produces fewer false positives but risks missing weaker fraud signals.

In [None]:
days = 60
# Last 60 days, ending yesterday
dates = [datetime.today().date() - timedelta(days=days - i) for i in range(days)]
# Synthetic baseline score around 0.3 with small day-to-day noise
baseline = np.random.normal(0.3, 0.05, size=days)
# Inject 5 simulated fraud spikes on randomly chosen days
spikes_idx = np.random.choice(range(days), size=5, replace=False)
anomaly_scores = baseline.copy()
anomaly_scores[spikes_idx] += np.random.uniform(0.3, 0.6, size=5)
threshold = 0.5

plt.figure(figsize=(12, 5))
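plt.plot(dates, anomaly_scores, marker='o', label="Anomaly score")
# Style the threshold so it stands out from the score line
plt.axhline(threshold, color='red', linestyle='--', label=f"Threshold ({threshold})")
plt.legend()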
plt.title("Daily Fraud Anomaly Score (with Threshold)")
plt.xlabel("Date")
plt.ylabel("Anomaly Score")
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
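
**From chart to review queue (sketch):** The chart shows the threshold visually. In practice the same rule would also produce a list of flagged days for analysts. The snippet below is a minimal sketch of that step. The helper name `flag_anomalies` and the fixed start date are assumptions used to keep the example reproducible.

```python
import numpy as np
from datetime import date, timedelta

def flag_anomalies(dates, scores, threshold):
    """Return (date, score) pairs whose score is strictly above the threshold."""
    scores = np.asarray(scores, dtype=float)
    idx = np.flatnonzero(scores > threshold)
    return [(dates[i], float(scores[i])) for i in idx]

# Synthetic scores built the same way as in the cell above (assumption)
rng = np.random.default_rng(42)
days = 60
start = date(2024, 1, 1)  # hypothetical fixed start date
dates = [start + timedelta(days=i) for i in range(days)]
scores = rng.normal(0.3, 0.05, size=days)
spike_idx = rng.choice(days, size=5, replace=False)
scores[spike_idx] += rng.uniform(0.3, 0.6, size=5)

flagged = flag_anomalies(dates, scores, threshold=0.5)
for d, s in flagged:
    print(f"{d.isoformat()}  score={s:.2f}  flagged for analyst review")
```

Rerunning `flag_anomalies` with a higher or lower `threshold` shows directly how the size of the review queue changes.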


## 4) Decision Acceleration — Latency Before vs After Big Data Stack
**Why this matters:** Moving from batch ETL to a streaming stack (Kafka/Kinesis + Spark/Flink) collapses time-to-insight from **hours to minutes**, enabling timely pricing, promo, and ops decisions.

In [None]:
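labels = ["Batch Legacy", "Streaming Stack"]
# Hard-coded example values: 720 minutes (12 hours) for batch vs 5 minutes for streaming
latency_minutes = [720, 5]

plt.figure(figsize=(7, 5))
plt.bar(labels, latency_minutes)
# Label each bar, since the 5-minute bar is barely visible next to the 720-minute bar
for i, v in enumerate(latency_minutes):
    plt.text(i, v, f"{v} min", ha="center", va="bottom")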
plt.title("Decision Latency Before vs After Big Data Stack")
plt.ylabel("Minutes to Insight")
plt.tight_layout()
plt.show()
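
**Quantifying the step-change (sketch):** The bar chart shows the gap visually. The same two numbers can also be expressed as a speedup factor and a percentage reduction, which are often easier to quote. The helper name `latency_improvement` is hypothetical, and the inputs are the example values used in the chart above.

```python
def latency_improvement(before_minutes, after_minutes):
    """Return (speedup factor, percentage reduction) between two latencies."""
    if before_minutes <= 0 or after_minutes <= 0:
        raise ValueError("latencies must be positive")
    speedup = before_minutes / after_minutes
    reduction_pct = (1 - after_minutes / before_minutes) * 100
    return speedup, reduction_pct

# Example values from the chart: 720 minutes (batch) vs 5 minutes (streaming)
speedup, reduction_pct = latency_improvement(720, 5)
print(f"Speedup: {speedup:.0f}x, latency reduction: {reduction_pct:.1f}%")
```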
