# Build fact_customer_churn

Purpose:
- Construct a customer-level churn fact table at a fixed quarterly snapshot
- Store churn outcome, revenue proxies, and lifecycle context required for decision-oriented segmentation
- Support cross-sectional churn exposure and revenue-at-risk analysis

Scope Notes:
- The dataset represents a single quarterly snapshot
- Churn is treated as a customer state, not a temporal event
- Time-series analysis is intentionally out of scope

In [9]:
import pandas as pd

# Input and output folders, relative to the notebook directory
raw = "../data/raw/"
processed = "../data/processed/"

# Source workbooks: customer (charges, tenure), services (total revenue), status (churn flag)
customer = pd.read_excel(raw + "CustomerChurn.xlsx")
services = pd.read_excel(raw + "Telco_customer_churn_services.xlsx")
status = pd.read_excel(raw + "Telco_customer_churn_status.xlsx")

In [10]:
customer.shape, services.shape, status.shape

((7043, 21), (7043, 31), (7043, 12))

In [11]:
# Keep only the customer key and the binary churn flag (1 = churned, 0 = retained)
fact_status = status[["Customer ID", "Churn Value"]].copy()
fact_status.columns = ["customer_id", "churned"]

In [12]:
# Snapshot date (last day of Q3 2023), applied identically to every row
fact_status["churn_reference_date"] = pd.Timestamp("2023-09-30")

In [13]:
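# Only the last expression in a cell is displayed, so the value_counts()
# result is discarded here; wrap it in print() to inspect the class balance.
fact_status["churned"].value_counts()
fact_status["customer_id"].nunique()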

7043
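
The checks in the cell above can be made explicit so that a bad flag or a duplicated key fails loudly instead of being read off the output by eye. The following is a minimal sketch on a toy stand-in for `fact_status`; the four rows and the `toy_status` name are invented for illustration and are not taken from the source data.

```python
import pandas as pd

# Toy stand-in for fact_status; these rows are invented for illustration.
toy_status = pd.DataFrame(
    {"customer_id": ["A-1", "B-2", "C-3", "D-4"], "churned": [0, 1, 0, 1]}
)

# The churn flag should only ever take the values 0 and 1.
flag_is_binary = bool(toy_status["churned"].isin([0, 1]).all())

# One row per customer: the key column must contain no duplicates.
key_is_unique = bool(toy_status["customer_id"].is_unique)

# Share of customers in the churned state at the snapshot.
churn_rate = float(toy_status["churned"].mean())

print(flag_is_binary, key_is_unique, churn_rate)
```

On the real table, the same three expressions would be applied to `fact_status` directly.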

In [14]:
# Customer-level measures: current monthly charge and tenure in months
fact_customer = customer[["Customer ID", "Monthly Charges", "Tenure"]].copy()
fact_customer.columns = ["customer_id", "monthly_charges", "tenure_months"]

In [15]:
# Total revenue per customer, taken from the services workbook
fact_services = services[["Customer ID", "Total Revenue"]].copy()
fact_services.columns = ["customer_id", "total_revenue"]

In [16]:
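# Left-join onto the status table so the grain stays one row per customer;
# validate="one_to_one" makes pandas raise if either side has duplicate keys.
fact_customer_churn = (
    fact_status
    .merge(fact_customer, on="customer_id", how="left", validate="one_to_one")
    .merge(fact_services, on="customer_id", how="left", validate="one_to_one")
)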

In [18]:
fact_customer_churn.isna().sum()

customer_id             0
churned                 0
churn_reference_date    0
monthly_charges         0
tenure_months           0
total_revenue           0
dtype: int64

In [19]:
fact_customer_churn.shape

(7043, 6)

In [20]:
fact_customer_churn["customer_id"].nunique()

7043
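
The null counts, shape, and unique-key count above together confirm that the joins neither dropped nor duplicated customers. As a hedged sketch, the same guarantees can be enforced at merge time with the `validate` and `indicator` parameters of `DataFrame.merge`. The frames `left`, `right`, and `dup_right` below are toy data with invented values, not the notebook's tables.

```python
import pandas as pd

# Toy inputs; all values are invented for illustration.
left = pd.DataFrame({"customer_id": ["A-1", "B-2", "C-3"], "churned": [0, 1, 0]})
right = pd.DataFrame(
    {"customer_id": ["A-1", "B-2", "C-3"], "monthly_charges": [29.85, 56.95, 53.85]}
)

# validate="one_to_one" raises if keys repeat on either side;
# indicator=True adds a "_merge" column recording where each row matched.
merged = left.merge(
    right, on="customer_id", how="left", validate="one_to_one", indicator=True
)

# Rows present in the left table but missing from the right table.
unmatched = int((merged["_merge"] != "both").sum())
merged = merged.drop(columns="_merge")

# A right-hand table with a repeated key should be rejected.
dup_right = pd.concat([right, right.iloc[[0]]], ignore_index=True)
try:
    left.merge(dup_right, on="customer_id", how="left", validate="one_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True

print(len(merged), unmatched, duplicate_detected)
```

With these checks in place, a row-count or key-uniqueness problem surfaces as an exception at the join rather than in a later inspection cell.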

In [21]:
# Final column order: key, outcome, snapshot date, then revenue and tenure measures
fact_customer_churn = fact_customer_churn[
    [
        "customer_id",
        "churned",
        "churn_reference_date",
        "monthly_charges",
        "total_revenue",
        "tenure_months",
    ]
]

In [22]:
fact_customer_churn.to_csv(
    processed + "fact_customer_churn.csv",
    index=False
)
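
CSV does not store column types, so `churn_reference_date` comes back as plain text unless the reader asks for date parsing. The sketch below illustrates this with an in-memory buffer instead of the real file; `toy_fact` and its two rows are invented for illustration.

```python
import io

import pandas as pd

# Toy stand-in for the fact table; the rows are invented for illustration.
toy_fact = pd.DataFrame(
    {
        "customer_id": ["A-1", "B-2"],
        "churned": [0, 1],
        "churn_reference_date": pd.Timestamp("2023-09-30"),
    }
)

# Write to an in-memory buffer rather than to disk.
buffer = io.StringIO()
toy_fact.to_csv(buffer, index=False)

# A plain read leaves the date column as text.
buffer.seek(0)
naive = pd.read_csv(buffer)

# parse_dates restores a datetime column.
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=["churn_reference_date"])

print(naive["churn_reference_date"].dtype, parsed["churn_reference_date"].dtype)
```

Downstream notebooks that load `fact_customer_churn.csv` would presumably pass the same `parse_dates` argument.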

## Fact Table Usage Notes

- This fact table represents a single quarterly snapshot (Q3 2023, reference date 2023-09-30)
- Churn is modeled as a binary customer state at the snapshot date
- The table is designed for:
  - revenue-based prioritization
  - tenure-based lifecycle segmentation
  - concentration and exposure analysis
- The table is not intended for time-series or trend analysis
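
The intended uses listed above can be combined in one cross-sectional summary. The following is a rough sketch, not the project's actual segmentation: the tenure band edges, the `tenure_band`, `mrr_at_risk`, and `share_of_mrr_at_risk` names, and the choice of churned customers' monthly charges as the revenue-at-risk proxy are all assumptions, and the six rows of `toy_fact` are invented.

```python
import pandas as pd

# Toy stand-in for fact_customer_churn; the rows are invented for illustration.
toy_fact = pd.DataFrame(
    {
        "customer_id": ["A-1", "B-2", "C-3", "D-4", "E-5", "F-6"],
        "churned": [1, 0, 1, 0, 0, 1],
        "monthly_charges": [70.0, 30.0, 90.0, 50.0, 20.0, 40.0],
        "tenure_months": [2, 8, 15, 30, 60, 70],
    }
)

# Assumed lifecycle bands in months; the edges are illustrative only.
bins = [0, 12, 24, 48, float("inf")]
labels = ["0-12", "13-24", "25-48", "49+"]
toy_fact["tenure_band"] = pd.cut(
    toy_fact["tenure_months"], bins=bins, labels=labels, include_lowest=True
)

# Assumed proxy: monthly charges attached to customers in the churned state.
toy_fact["mrr_at_risk"] = toy_fact["monthly_charges"] * toy_fact["churned"]

# Exposure per band: customer count, churn rate, and revenue proxy.
summary = toy_fact.groupby("tenure_band", observed=False).agg(
    customers=("customer_id", "count"),
    churn_rate=("churned", "mean"),
    mrr_at_risk=("mrr_at_risk", "sum"),
)

# Concentration: each band's share of the total revenue proxy.
summary["share_of_mrr_at_risk"] = summary["mrr_at_risk"] / summary["mrr_at_risk"].sum()

print(summary)
```

Because every row shares one reference date, the summary describes exposure at the snapshot only and says nothing about how it changes over time.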