# Survival Analysis

`Survival Analysis` is a collection of statistical methods for examining and predicting the time until an event of interest occurs. It originated in healthcare, where the focus was time-to-death, and has since been applied to use cases across many industries.


Telco Use Case Examples:

1. `Customer Retention`: It is widely accepted that the cost of retention is lower than the cost of acquisition. With the event of interest being a service cancellation, Telco companies can manage churn more effectively by using Survival Analysis to predict when specific customers are likely to be at risk.

2. `Hardware Failures`: The quality of experience a customer has with your products and services plays a key role in the decision to renew or cancel. The network itself is at the center of this experience. With a hardware failure as the event of interest, Survival Analysis can be used to predict when hardware will need to be repaired or replaced.

3. `Device and Data Plan Upgrades`: There are key moments in a customer's lifecycle when changes to their plan take place. With the event of interest being a plan change, Survival Analysis can be used to predict when such a change will occur, so that action can be taken to influence which products or services the customer selects.


We will apply and review several techniques that are commonly used for Survival Analysis:

1. Kaplan-Meier & the Log-Rank Test
2. Cox Proportional Hazards
3. Accelerated Failure Time
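To make the first technique concrete before touching the data, here is a minimal pure-Python sketch of the Kaplan-Meier estimator. It is not the implementation used in this notebook; the function name `kaplan_meier` and the toy durations and event flags are made up for illustration. At each time where an event occurs, the running survival probability is multiplied by `1 - d / n`, where `d` is the number of events at that time and `n` is the number of subjects still at risk. Censored subjects (event flag 0) leave the risk set without counting as events.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    durations: observed time for each subject
    events: 1 if the event occurred at that time, 0 if right-censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaving the risk set at t (events + censored)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Toy data (hypothetical): tenure in months, 1 = churned, 0 = still active
durations = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

For the Telco data, a natural mapping (an assumption here) would be `tenure` as the duration and `Churn == 'Yes'` as the event flag, with customers who have not churned treated as right-censored.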

In [1]:
import pandas as pd

#### Download the data

In [2]:
! wget https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv

--2022-08-02 17:54:21--  https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 970457 (948K) [text/plain]
Saving to: ‘Telco-Customer-Churn.csv’


2022-08-02 17:54:21 (190 MB/s) - ‘Telco-Customer-Churn.csv’ saved [970457/970457]



### EDA

In [3]:
df = pd.read_csv('/content/Telco-Customer-Churn.csv')
print('Data Shape : ', df.shape)
df.head()

Data Shape :  (7043, 21)


,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


In [4]:
print('Number of Unique customers : ', df['customerID'].nunique())

Number of Unique customers :  7043


Total customers churned

In [6]:
df['Churn'].value_counts()

No     5174
Yes    1869
Name: Churn, dtype: int64

Gender vs Churn

In [7]:
pd.crosstab(df['gender'], df['Churn'])

gender,Churn: No,Churn: Yes
Female,2549,939
Male,2625,930


Churn looks well balanced across genders: roughly 27% of female customers and 26% of male customers churned.

Do senior citizens have a higher tendency to churn?

In [8]:
pd.crosstab(df['SeniorCitizen'], df['Churn'])

SeniorCitizen,Churn: No,Churn: Yes
0,4508,1393
1,666,476


Senior citizens churned at a rate of roughly 42% (476 of 1,142), compared with about 24% (1,393 of 5,901) for all other customers.
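The churn rates can be read directly from a row-normalized crosstab rather than worked out by hand. The sketch below rebuilds a stand-in frame from the counts printed above so that it runs on its own; the names `counts`, `toy`, and `rates` are illustrative. In the notebook, the same `pd.crosstab(..., normalize='index')` call could be applied to `df` directly.

```python
import pandas as pd

# Counts copied from the SeniorCitizen vs Churn crosstab above
counts = {
    (0, 'No'): 4508, (0, 'Yes'): 1393,
    (1, 'No'): 666, (1, 'Yes'): 476,
}

# Expand the counts back into one row per customer (stand-in for df)
rows = [
    {'SeniorCitizen': senior, 'Churn': churn}
    for (senior, churn), n in counts.items()
    for _ in range(n)
]
toy = pd.DataFrame(rows)

# normalize='index' divides each row by its row total,
# so every row sums to 1 and the 'Yes' column is the churn rate
rates = pd.crosstab(toy['SeniorCitizen'], toy['Churn'], normalize='index')
print(rates.round(3))
```

The same call with `df['gender']` in place of `df['SeniorCitizen']` would give the per-gender churn rates.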