Survival analysis isn’t just for clinical trials. It’s widely used in churn prediction, equipment failure analysis, credit default modeling, and customer behavior modeling. The goal? To estimate the time until an event happens, like a customer churning or a machine breaking down. It also accounts for censored observations: subjects for whom the event has not happened yet when the observation window ends.

In this notebook, we’ll walk through how to:

- Create synthetic time-to-event data
- Use the Kaplan-Meier estimator
- Implement survival analysis inside Snowflake using Snowpark Python
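
To make the Kaplan-Meier estimator less of a black box before handing it to a library, here is a minimal hand-rolled sketch in plain Python. It assumes the standard product-limit form, where the survival probability at time t is the product of (1 - d_i / n_i) over all event times t_i up to t, with d_i the number of events at t_i and n_i the number of subjects still at risk just before t_i. The function name `kaplan_meier` and the toy data are hypothetical and only for illustration; the notebook itself relies on `lifelines`.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    durations: observed time for each subject
    events:    1 if the event (e.g. churn) was observed, 0 if censored
    """
    # Number of observed events at each time
    event_counts = Counter(t for t, e in zip(durations, events) if e == 1)
    # Every subject (event or censored) leaves the risk set at their own time
    exit_counts = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d > 0:
            # Product-limit update: multiply by the fraction that survives time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Subjects with this duration are no longer at risk afterwards
        at_risk -= exit_counts[t]
    return curve


# Toy data (hypothetical): six users, two of them censored (churned == 0)
toy_durations = [5, 8, 8, 12, 15, 20]
toy_churned = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(toy_durations, toy_churned)
for t, s in curve:
    print(f"day {t:>2}: S(t) = {s:.3f}")
```

Censored users never trigger a drop in the curve, but they still count toward the at-risk denominator until their own duration is reached. That is how the estimator uses partial information instead of discarding it.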



In [None]:
# Import Python packages
import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from snowflake.snowpark import functions as F

# Snowflake notebooks provide an active Snowpark session; grab it here
from snowflake.snowpark.context import get_active_session
session = get_active_session()


In [None]:
import pandas as pd
import numpy as np

np.random.seed(42)
n = 500

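# Simulate tenure in whole days (exponential with a mean of about one year)
# and churn status. churned = 1 means the churn event was observed;
# churned = 0 means the user is censored, i.e. still active when observation
# ended (roughly 30% of users with p=0.7).
durations = np.random.exponential(scale=365, size=n).astype(int)
event_observed = np.random.binomial(1, p=0.7, size=n)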

df = pd.DataFrame({
    "user_id": range(1, n+1),
    "duration_days": durations,
    "churned": event_observed
})


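# Write the data to a Snowflake table. auto_create_table=True creates the table
# if it does not exist yet; overwrite=True replaces any existing contents.
session.write_pandas(df, "CHURN_SURVIVAL_DATA", auto_create_table=True, overwrite=True)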

In [None]:


# Pull the data from Snowflake into a pandas DataFrame
sdf = session.table("CHURN_SURVIVAL_DATA")
pdf = sdf.to_pandas()

# Fit the Kaplan-Meier estimator; rows with churned == 0 are treated as censored
kmf = KaplanMeierFitter()
kmf.fit(durations=pdf["duration_days"], event_observed=pdf["churned"])
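
Once a survival curve has been estimated, a common summary is the median survival time: the first point at which the estimated survival probability drops to 0.5 or below. After fitting, `lifelines` exposes this as `kmf.median_survival_time_`. The sketch below shows the same idea in plain Python on a hypothetical toy curve, so it runs without a Snowflake session. The helper name `median_survival_time` and the numbers are assumptions for illustration only.

```python
def median_survival_time(curve):
    """Return the smallest time at which survival is <= 0.5.

    curve: list of (time, survival_probability) pairs sorted by time.
    Returns None if the curve never drops to 0.5 or below.
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical step curve: (day, estimated survival probability)
toy_curve = [(5, 0.833), (8, 0.667), (12, 0.444), (20, 0.0)]

median_days = median_survival_time(toy_curve)
print("Median survival time (days):", median_days)

# A curve that stays above 0.5 has no defined median within the observed window
never_reached = median_survival_time([(5, 0.9), (8, 0.8)])
print("Median when the curve stays above 0.5:", never_reached)
```

For the churn data in this notebook, the median would be read as the number of days after signup by which an estimated half of users have churned.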

In [None]:


# Plot the estimated survival curve (the shaded band is the confidence interval)
plt.figure(figsize=(10, 6))
kmf.plot_survival_function()
plt.title("Kaplan-Meier Survival Curve")
plt.xlabel("Days Since Signup")
plt.ylabel("Survival Probability")
plt.grid(True)
plt.show()