In [2]:
import numpy as np
import pandas as pd

We'll work with data where:
- Time: number of days from diagnosis until the patient either died or left the hospital's supervision.
- Event:
    - 1 if the patient died
    - 0 if the patient was not observed to die; we only know they were alive up to the given `Time` (their data is censored)
    
Notice that these are the same numbers that you see in the lecture video about estimating survival.

In [3]:
df = pd.DataFrame({'Time': [10,8,60,20,12,30,15],
                   'Event': [1,0,1,1,0,1,0]
                  })
df

   Time  Event
0    10      1
1     8      0
2    60      1
3    20      1
4    12      0
5    30      1
6    15      0


### Count number of censored patients

In [6]:
sum(df['Event']==0)

3

### Count number of patients who definitely survived past time t

This assumes that any censored patient died at the moment they were censored (**died immediately**).

If a patient survived past time `t`:
- Their `Time` of event should be greater than `t`.  
- Notice that they can have an `Event` of either 1 or 0.  What matters is their `Time` value.

In [8]:
t = 25

# Count patients whose recorded Time is strictly greater than t
sum(df['Time'] > t)

2

### Count the number of patients who may have survived past t

This assumes that censored patients **never die**.

A patient may have survived past time `t` if either:
- The patient was censored (`Event` is 0) at any time, since we assume that they live forever.
- Or, the patient died (`Event` is 1), but after time `t`.

In [10]:
t = 25
# Censored at any time, or Time strictly greater than t
sum((df['Event'] == 0) | (df['Time'] > t))

5
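
The two counts above can be read as bounds on how many patients survived past `t`: the "died immediately" assumption gives the smallest possible count and the "never die" assumption gives the largest. The following is a minimal sketch of that comparison; the variable names `lower`, `upper`, and `n` are introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd

# Same data as the first example in this notebook
df = pd.DataFrame({'Time': [10, 8, 60, 20, 12, 30, 15],
                   'Event': [1, 0, 1, 1, 0, 1, 0]})
t = 25
n = len(df)

# Pessimistic count: censored patients are treated as dying when censored,
# so only Time > t counts as surviving past t
lower = int((df['Time'] > t).sum())

# Optimistic count: censored patients are treated as never dying,
# so every censored patient counts as surviving past t
upper = int(((df['Event'] == 0) | (df['Time'] > t)).sum())

print(f"Between {lower} and {upper} of {n} patients survived past t={t}")
print(f"Fraction surviving past t is between {lower / n:.3f} and {upper / n:.3f}")
```

The true number of survivors past `t` cannot fall outside this range, because each censored patient either did or did not survive past `t`.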

### Count number of patients who were not censored before time t

If a patient was not censored before time `t`:
- They either had an event (death) at any time: before `t`, at `t`, or after `t`.
- Or, their `Time` is greater than `t` (they may have either died or been censored at some later time after `t`).

In [13]:
t = 25
# Died at any time, or Time strictly greater than t
sum((df['Event'] == 1) | (df['Time'] > t))

4
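
One way to check this condition is to look at its complement. Under the code's convention, the patients excluded by the mask are exactly those with `Event` equal to 0 and `Time` less than or equal to `t`, that is, those censored at or before `t`. The sketch below verifies this equivalence on the same data; the mask names are illustrative and not from the original notebook.

```python
import pandas as pd

# Same data as the first example in this notebook
df = pd.DataFrame({'Time': [10, 8, 60, 20, 12, 30, 15],
                   'Event': [1, 0, 1, 1, 0, 1, 0]})
t = 25

# Mask used above: died at any time, or Time strictly greater than t
not_censored_before_t = (df['Event'] == 1) | (df['Time'] > t)

# Complement by De Morgan's law: censored, with Time at or before t
censored_by_t = (df['Event'] == 0) & (df['Time'] <= t)

# The two masks should be exact opposites on every row
print((not_censored_before_t == ~censored_by_t).all())

# Rows excluded from the count
print(df[censored_by_t])
```

On this data the excluded rows are the three censored patients, all of whom were censored before day 25.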

<a name="kaplan-meier"></a>
## Kaplan-Meier

The Kaplan-Meier estimate of survival probability is:

$$
S(t) = \prod_{t_i \leq t} \left(1 - \frac{d_i}{n_i}\right)
$$

- $t_i$ are the event times observed in the dataset
- $d_i$ is the number of deaths at time $t_i$
- $n_i$ is the number of people who we know have survived up to time $t_i$ (their `Time` is at least $t_i$)
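
As a worked expansion (an illustration, not part of the original notebook): if exactly two event times $t_1 < t_2$ fall at or before $t$, the product has two factors:

$$
S(t) = \left(1 - \frac{d_1}{n_1}\right)\left(1 - \frac{d_2}{n_2}\right)
$$

Each factor can be read as the estimated probability of surviving past $t_i$, given survival up to $t_i$. If no event time is at or before $t$, the product is empty and $S(t) = 1$.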


In [14]:
df = pd.DataFrame({'Time': [3,3,2,2],
                   'Event': [0,1,0,1]
                  })
df

   Time  Event
0     3      0
1     3      1
2     2      0
3     2      1


### Find those who survived up to time $t_i$

If a patient survived up to time $t_i$:
- Their `Time` is greater than $t_i$
- Or, their `Time` is equal to $t_i$

In [16]:
t_i = 2
df['Time'] >= t_i

0    True
1    True
2    True
3    True
Name: Time, dtype: bool

### Find those who died at time $t_i$

If they died at $t_i$:
- Their `Event` value is 1.
- Also, their `Time` should be equal to $t_i$.

In [17]:
t_i = 2
(df['Event'] == 1) & (df['Time'] == t_i)

0    False
1    False
2    False
3     True
dtype: bool
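
The two masks above are the ingredients for $n_i$ and $d_i$, so they can be combined into the Kaplan-Meier product. The following is a minimal sketch under some assumptions: `kaplan_meier` is a hypothetical helper name, and taking $t_i$ to be the distinct times at which at least one death occurred is a choice made here (times with only censoring would contribute a factor of 1 anyway).

```python
import pandas as pd

# Same small dataset as above
df = pd.DataFrame({'Time': [3, 3, 2, 2],
                   'Event': [0, 1, 0, 1]})

def kaplan_meier(df, t):
    """Hypothetical helper: Kaplan-Meier estimate of S(t) for one value of t."""
    # Distinct times at which at least one death was observed (assumption)
    event_times = sorted(df.loc[df['Event'] == 1, 'Time'].unique())
    s = 1.0
    for t_i in event_times:
        if t_i > t:
            break  # only event times at or before t enter the product
        # n_i: patients known to have survived up to t_i
        n_i = (df['Time'] >= t_i).sum()
        # d_i: patients who died exactly at t_i
        d_i = ((df['Event'] == 1) & (df['Time'] == t_i)).sum()
        s *= 1 - d_i / n_i
    return float(s)

print(kaplan_meier(df, 1))  # no event times at or before 1 -> 1.0
print(kaplan_meier(df, 2))  # (1 - 1/4) -> 0.75
print(kaplan_meier(df, 3))  # (1 - 1/4) * (1 - 1/2) -> 0.375
```

At $t_i = 2$ all four patients are still at risk and one dies; at $t_i = 3$ only the two patients with `Time` equal to 3 remain at risk and one of them dies.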