# Notebook 1: Why Survival Analysis in Marketing

---

## 1. Why Survival Analysis (Time-to-Event Analysis) in Marketing

Many marketing questions take the form **how long until...?**
- How long do subscribers stay before cancelling?
- How long does it take a free user to **upgrade**?
- How long before a shopper places the **next order**?
- When will a re-engagement campaign bring a user back?

Traditional churn/conversion models reduce these to a **yes/no** problem within a fixed time window (e.g., will this customer churn within 90 days?). That throws away the timing of the event and mishandles customers who are still active when the data is pulled.

**Time-to-event analysis**, historically called *survival analysis*, was developed in **medicine** and **engineering** to study time until death or time until failure.

> **References**  
> • [Wikipedia — Survival analysis](https://en.wikipedia.org/wiki/Survival_analysis)  
> • Kleinbaum & Klein, *Survival Analysis: A Self-Learning Text*, Springer 2012   

---

## 2. Why it's different from simple churn classification

A churn classifier asks: *Will this customer churn within 90 days, yes or no?*

A survival model asks: *What is the probability that this customer is still active at any time $t$?*

It keeps customers who are **still active** by marking them as **censored** rather than discarding or mislabelling them.

This lets us:
- Forecast **retention or adoption curves** over time.
- Compare strategies or cohorts fairly.
- Plan timings of **offers or interventions**.
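
To make the contrast concrete, here is a minimal sketch that encodes the same customers both ways. The four customers, the `WINDOW` constant, and both helper functions are hypothetical and exist only for illustration.

```python
# Hypothetical customers: (days observed so far, has the customer cancelled?)
customers = {
    "u1": (30, True),    # cancelled on day 30
    "u2": (200, True),   # cancelled on day 200
    "u3": (60, False),   # still active, only 60 days of history
    "u4": (400, False),  # still active after 400 days
}

WINDOW = 90  # classification horizon in days (assumed for this example)


def window_label(days, cancelled, window=WINDOW):
    """Fixed-window churn label: 1 = churned within the window,
    0 = survived the whole window, None = label cannot be determined."""
    if cancelled and days <= window:
        return 1
    if days >= window:
        return 0  # lasted past the window, regardless of what happened later
    return None   # still active but observed for less than the window


def survival_record(days, cancelled):
    """Time-to-event encoding: (duration, event indicator)."""
    return (days, 1 if cancelled else 0)


labels = {k: window_label(*v) for k, v in customers.items()}
records = {k: survival_record(*v) for k, v in customers.items()}

print(labels)
print(records)
```

In this toy data, `u2` and `u4` receive the same window label even though one cancelled and the other did not, and `u3` cannot be labelled at all. The `(duration, event)` encoding keeps all four customers and their timing.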

---

## 3. Marketing Use Cases

Survival (time-to-event) analysis is useful well beyond churn:
- **Subscription cancellation** - length of paid membership
- **Repeat purchase** - time from first to second/third order
- **Upgrade timing** - free -> premium conversion times
- **Campaign response** - how long until a user acts after a promotion
- **Referral** - time until a customer refers a friend

All share the same pattern: **time until an event, with many customers who have not yet experienced the event**.

---

## 4. Censoring - The Key Data Feature

Often, when we pull data, many customers haven't yet had the event.
Instead of discarding them, we keep them as **censored**:
we know they lasted *at least* this long.

For each customer $i$:

$$
T_i^{\mathrm{true}}=\text{actual event time},\quad
C_i=\text{censoring time},\quad
T_i=\min(T_i^{\mathrm{true}},C_i),\quad
\delta_i=\mathbf{1}\{T_i^{\mathrm{true}}\le C_i\}.
$$

We observe $(T_i,\delta_i,X_i)$ where $X_i$ are customer features (channel, spend, tenure…).
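
The definitions above can be sketched in a few lines of Python. This is an illustrative helper, not part of any library: the function name `observe`, its arguments, and the example dates are all assumptions, and time is measured in days from each customer's start date.

```python
import math
from datetime import date


def observe(start, event_date, cutoff):
    """Return (T_i, delta_i) in days for one customer.

    start      -- date the customer entered (time origin)
    event_date -- date of the event, or None if none has been recorded
    cutoff     -- date the data was pulled (defines the censoring time C_i)
    """
    # True event time; infinite if the event has not been recorded
    t_true = (event_date - start).days if event_date is not None else math.inf
    # Censoring time: how long we have been able to watch this customer
    c = (cutoff - start).days
    t_obs = min(t_true, c)      # T_i = min(T_i^true, C_i)
    delta = int(t_true <= c)    # delta_i = 1{T_i^true <= C_i}
    return (t_obs, delta)


cutoff = date(2023, 6, 1)
cancelled = observe(date(2023, 1, 1), date(2023, 3, 1), cutoff)
still_active = observe(date(2023, 1, 1), None, cutoff)

print(cancelled)
print(still_active)
```

An event dated after the cutoff is treated the same as no event: the customer is censored at the cutoff.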

---

### 4.1 Right Censoring (the most common)

| Customer | Start | Observed Time (mo) | Event ($\delta$) | Meaning |
|----------|-------|--------------------|--------------------|---------|
| A        | Jan-01| 2                  | 1                  | Cancelled after 2 mo |
| B        | Jan-01| 3                  | 1                  | Cancelled after 3 mo |
| C        | Jan-01| 5                  | 0                  | Still subscribed at 5 mo (censored) |
| D        | Jan-01| 7                  | 0                  | Still subscribed at 7 mo (censored) |

Right censoring = the user has **not yet** taken action when observation stops.
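
As a worked example on the four customers in the table, the sketch below compares two naive summaries with a hand-rolled product-limit (Kaplan-Meier) estimate. The Kaplan-Meier estimator is a standard tool that this section has not introduced, so treat its use here as an assumption of the example; the function `kaplan_meier` is written for illustration and is not a library call.

```python
from fractions import Fraction

# (observed time in months, event indicator) taken from the table above
data = {"A": (2, 1), "B": (3, 1), "C": (5, 0), "D": (7, 0)}
records = list(data.values())

# Naive option 1: drop censored customers and average the rest
event_times = [t for t, d in records if d == 1]
mean_events_only = Fraction(sum(event_times), len(event_times))

# Naive option 2: pretend every observed time is an event time
mean_all = Fraction(sum(t for t, _ in records), len(records))


def kaplan_meier(records):
    """Return {event time: estimated S(t)} via the product-limit formula."""
    surv = Fraction(1)
    curve = {}
    for t in sorted({t for t, d in records if d == 1}):
        at_risk = sum(1 for u, _ in records if u >= t)             # still observed at t
        events = sum(1 for u, d in records if u == t and d == 1)   # events exactly at t
        surv *= Fraction(at_risk - events, at_risk)
        curve[t] = surv
    return curve


km = kaplan_meier(records)
print(float(mean_events_only), float(mean_all))
print({t: float(s) for t, s in km.items()})
```

Both naive averages understate how long customers last, because C and D are known to have stayed at least 5 and 7 months. The product-limit estimate uses them in the at-risk counts without pretending they cancelled.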

---

### 4.2 Left Censoring

The event happened **before** we started observing, so we know only that it occurred earlier than the start of tracking.

| Customer | Joined | Tracking Began | Event Status | Meaning |
|----------|--------|----------------|--------------|---------|
| E        | Jan-01 | Jun-01         | Already upgraded | Upgrade time < June but unknown (left-censored) |
| F        | Feb-15 | Jun-01         | Upgraded Jul | Observed upgrade at 5 mo |
| G        | Mar-01 | Jun-01         | Still free | No upgrade yet; still under observation |


This occurs when tracking starts mid-journey (e.g., a referral program launched late).

---

### 4.3 Interval Censoring

We only check status at fixed intervals, so we don't know the exact event date.
Suppose you run a **loyalty campaign** and check customer upgrade status **every quarter** (Q1, Q2, Q3…).
If a customer was free at the Q1 check but premium by the Q2 check, the true upgrade date lies **somewhere between those two checks**.

| Customer | Status at Q1 | Status at Q2 | Status at Q3 | Interpretation |
|----------|--------------|--------------|--------------|----------------|
| H        | Free         | **Premium**  | Premium      | Upgraded sometime **between Q1 and Q2** (interval-censored) |
| I        | Free         | Free         | Premium      | Upgraded **between Q2 and Q3** (interval-censored) |
| J        | Premium      | Premium      | Premium      | Already premium before Q1 (could be left-censored) |
| K        | Free         | Free         | Free         | Still free at Q3 (right-censored at Q3) |

We don’t know the exact upgrade day for H and I — only that it occurred between the quarterly checks.  
This is **interval censoring**: the event is known to have happened *within a time interval*.
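
The quarterly table can be turned into lower and upper bounds on the upgrade time with a small helper. This is a sketch under stated assumptions: the function name `upgrade_bounds` is hypothetical, check times are coded as 1, 2, 3 for Q1, Q2, Q3, and `None` marks a bound that the data does not reveal.

```python
def upgrade_bounds(statuses):
    """Map statuses observed at checks 1, 2, ... to (lower, upper, kind).

    lower / upper are check indices bracketing the upgrade time;
    None means that side is unknown.
    """
    for k, status in enumerate(statuses, start=1):
        if status == "Premium":
            if k == 1:
                # Already premium at the first check: upgrade happened earlier
                return (None, 1, "left")
            # Free at check k-1, premium at check k
            return (k - 1, k, "interval")
    # Never seen as premium: upgrade, if any, comes after the last check
    return (len(statuses), None, "right")


table = {
    "H": ["Free", "Premium", "Premium"],
    "I": ["Free", "Free", "Premium"],
    "J": ["Premium", "Premium", "Premium"],
    "K": ["Free", "Free", "Free"],
}
bounds = {name: upgrade_bounds(s) for name, s in table.items()}
print(bounds)
```

Left and right censoring fall out as the two edge cases of the same representation: one bound is missing instead of both being known.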

---

## 5. Quick Look at a Dataset

We can use the small `rossi` dataset (Rossi et al., 1980; available in [`lifelines`](https://lifelines.readthedocs.io/en/latest/Examples.html)).  
It tracks time until re-arrest — structurally identical to many marketing datasets.

```python
from lifelines.datasets import load_rossi

# Load the recidivism data as a pandas DataFrame
rossi = load_rossi()
# `week` is the observed duration; `arrest` is the event flag (1 = re-arrested, 0 = censored)
rossi.head()
```