# Chapter 11 Survival Analysis and Censored Data

* Survival analysis and censored data both concern the time until an event occurs, such as the survival time after a cancer diagnosis.

* The outcome is censored when the event time is not fully observed, for example when the study ends or a patient drops out before the event occurs.



## 11.1 Survival and Censoring Times

* The survival time $T$ is the time at which the event of interest occurs, for example when the patient dies or the subscription is canceled. The censoring time $C$ is the time at which the patient leaves the study or the study ends.
  * We observe $Y = \min(T, C)$
* $σ = \begin{cases}
      1 & T ≤ C \\
      0 & T > C
   \end{cases}$
   * Thus if $σ = 1$ we observe the true survival time; otherwise we observe the censoring time.
* Therefore $n$ observations give $n$ pairs $(y_{1}, σ_{1}), \dots, (y_{n}, σ_{n})$
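
As a small illustration of how the observed pairs arise, here is a sketch in Python. The event and censoring times are made up for the example, and `observe` is a hypothetical helper name. It assumes the convention that $σ = 1$ means the true survival time was observed.

```python
def observe(t, c):
    """Return (y, sigma) for a true event time t and a censoring time c."""
    y = min(t, c)
    sigma = 1 if t <= c else 0  # 1: event observed, 0: censored
    return y, sigma

# Hypothetical event and censoring times (in months) for four patients.
event_times = [5.0, 12.0, 7.5, 20.0]
censor_times = [10.0, 8.0, 7.5, 15.0]

pairs = [observe(t, c) for t, c in zip(event_times, censor_times)]
print(pairs)
```

In practice only the pairs are available: $T$ is never seen for a censored patient.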


## 11.2 A Closer Look at Censoring
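* Must assume that the censoring mechanism is independent of the survival time: conditional on the features, $T$ is independent of $C$.
* Right censoring: $T ≥ Y$, the true event time is at least as large as the observed time.
* Left censoring: $T ≤ Y$, the true event time is no larger than the observed time.
* Interval censoring: we don't know the exact $T$, only an interval that contains it.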

## 11.3 The Kaplan-Meier Survival Curve
* The survival function is $S(t) = Pr(T > t)$
* This is difficult to estimate directly because the data mix censored and uncensored observations.
  * To handle this, let $d_{1} < d_{2} < \dots < d_{K}$ be the $K$ unique death times among the non-censored patients
  * Let $q_{k}$ denote the number of patients who died at time $d_{k}$
  * For $k = 1, \dots, K$ let $r_{k}$ denote the number of patients alive and in the study just before $d_{k}$ (the risk set)
  * Applying the law of total probability and simplifying gives the Kaplan-Meier survival curve
* $\hat{S}(d_{k}) = \prod_{j=1}^{k}\left(\frac{r_{j} - q_{j}}{r_{j}}\right)$
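
The product formula can be sketched in plain Python as follows. The function name `kaplan_meier` and the toy data are assumptions made for illustration; `events` holds the status indicator, with 1 meaning the death was observed.

```python
def kaplan_meier(times, events):
    """Return [(d_k, S_hat(d_k))] for the unique observed death times."""
    death_times = sorted({y for y, s in zip(times, events) if s == 1})
    curve = []
    surv = 1.0
    for d in death_times:
        # r: patients still at risk just before d (observed time >= d).
        r = sum(1 for y in times if y >= d)
        # q: patients whose death was observed exactly at d.
        q = sum(1 for y, s in zip(times, events) if y == d and s == 1)
        surv *= (r - q) / r
        curve.append((d, surv))
    return curve

# Toy data: observed times and status (1 = death observed, 0 = censored).
y = [2, 3, 3, 5, 6, 8]
sigma = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(y, sigma)
for d, s in km_curve:
    print(d, round(s, 4))
```

Censored patients never contribute a death, but they stay in the risk set until their censoring time, which is how the estimator uses the partial information they carry.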


## 11.4 The Log Rank Test
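* Tests whether two groups have the same survival curve. At each death time $d_{k}$, $r_{1k}$ is the number at risk in group 1 and $q_{1k}$ is the number of deaths in group 1; $r_{k}$ and $q_{k}$ are the totals over both groups.
* $W = \frac{X - μ}{\sqrt{Var(X)}}$
  * $X = \sum_{k=1}^{K} q_{1k}$
  * $μ = \sum_{k=1}^{K} \frac{r_{1k}}{r_{k}} q_{k}$
  * $Var(X) = \sum_{k=1}^{K} \frac{q_{k}(r_{1k}/r_{k})(1 - r_{1k}/r_{k})(r_{k} - q_{k})}{r_{k} - 1}$
* Under the null hypothesis of no difference between the groups, $W$ is approximately standard normal. Compute a p-value from $W$ to decide whether to reject the null hypothesis.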
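
The statistic can be computed directly from the three sums. Below is a minimal sketch, assuming groups are labeled 1 and 2, a status of 1 means the death was observed, and a two-sided p-value from the standard normal approximation. The function name `log_rank` and the data are illustrative only.

```python
import math

def log_rank(times, events, groups):
    """Return (W, p_value) for the log-rank test of group 1 vs. the rest."""
    death_times = sorted({y for y, s in zip(times, events) if s == 1})
    x = mu = var = 0.0
    for d in death_times:
        r = sum(1 for y in times if y >= d)
        r1 = sum(1 for y, g in zip(times, groups) if y >= d and g == 1)
        q = sum(1 for y, s in zip(times, events) if y == d and s == 1)
        q1 = sum(1 for y, s, g in zip(times, events, groups)
                 if y == d and s == 1 and g == 1)
        x += q1
        mu += (r1 / r) * q
        if r > 1:  # the variance term is undefined when only one patient is at risk
            var += q * (r1 / r) * (1 - r1 / r) * (r - q) / (r - 1)
    w = (x - mu) / math.sqrt(var)
    # Two-sided p-value under the standard normal approximation.
    p_value = math.erfc(abs(w) / math.sqrt(2))
    return w, p_value

# Toy data: group 1 dies early, group 2 dies late, no censoring.
y = [1, 2, 3, 4, 5, 6]
sigma = [1, 1, 1, 1, 1, 1]
group = [1, 1, 1, 2, 2, 2]
w, p_value = log_rank(y, sigma, group)
print(w, p_value)
```

A positive $W$ here means group 1 had more deaths than expected under the null hypothesis.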

## 11.5 Regression Model with a Survival Response
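* It is tempting to fit a linear regression to the $(Y, σ)$ pairs from Section 11.1, but censoring poses a problem: for a censored observation we care about the unobserved $T$, not the observed $Y$.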

### 11.5.1 The Hazard Function

* The hazard function is the death rate in the instant after time $t$, given survival past that time
* $h(t) = \frac{f(t)}{S(t)}$ where
  * $h(t)$ is the hazard function
  * $f(t)$ is the probability density function of $T$
  * $S(t)$ is the survival function
* The likelihood contribution of observation $i$ is $L_{i} = f(y_{i})^{σ_{i}} S(y_{i})^{1 - σ_{i}}$. If the observation is not censored then $σ_{i} = 1$ and $L_{i} = f(y_{i})$; if it is censored then $σ_{i} = 0$ and $L_{i} = S(y_{i})$
  * For $n$ independent observations this yields $L = \prod_{i=1}^{n} f(y_{i})^{σ_{i}} S(y_{i})^{1 - σ_{i}}$
  * One option is to assume an exponential density $f(t) = λ \exp(-λt)$ and choose the $λ$ that maximizes $L$
* To bring in the features we can assume a form for the hazard such as $h(t|x_{i}) = \exp(β_{0} + \sum_{j=1}^{p} β_{j} x_{ij})$, compute the corresponding $S(t|x_{i})$ and $f(t|x_{i})$, and then find the $β_{0}, \dots, β_{p}$ that maximize $L$. This isn't the best option because it requires a strong assumption about the form of $h(t)$
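
For the exponential case the maximization has a closed form. With $f(t) = λ\exp(-λt)$ and $S(t) = \exp(-λt)$, the log-likelihood is $\log L = \sum_{i} σ_{i} \log λ - λ \sum_{i} y_{i}$, which is maximized at $\hat{λ} = \sum_{i} σ_{i} / \sum_{i} y_{i}$. The sketch below checks this on toy data, assuming $σ_{i} = 1$ marks an observed event. The function names are hypothetical.

```python
import math

def exp_log_likelihood(lam, times, events):
    """Censored-data log-likelihood under an exponential model with rate lam."""
    # Observed events contribute log f(y) = log(lam) - lam * y;
    # censored observations contribute log S(y) = -lam * y.
    return sum(s * math.log(lam) - lam * y for y, s in zip(times, events))

def exp_mle(times, events):
    """Closed-form maximizer: number of observed events / total observed time."""
    return sum(events) / sum(times)

y = [2, 3, 3, 5, 6, 8]
sigma = [1, 1, 0, 1, 0, 1]
lam_hat = exp_mle(y, sigma)
print(lam_hat, exp_log_likelihood(lam_hat, y, sigma))
```

The constant hazard $h(t) = λ$ implied by this model is exactly the kind of strong assumption the proportional hazards approach avoids.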

### 11.5.2 Proportional Hazards
* $h(t|x_{i}) = h_{0}(t) \exp\left(\sum_{j=1}^{p} x_{ij} β_{j}\right)$ where $h_{0}(t) ≥ 0$ is the baseline hazard, the hazard for an individual with $x_{i1} = \dots = x_{ip} = 0$
* The baseline hazard is left unspecified, which means it can take any form.


#### Cox Proportional Hazard Model
* Since $h_{0}(t)$ is unknown, we can't plug $h(t|x_{i})$ into the likelihood $L$ to estimate $β = (β_{1}, \dots, β_{p})$
* The magic of the Cox proportional hazards model is that it is possible to estimate $β$ without knowing $h_{0}(t)$
* The $h_{0}(t)$ terms cancel in the derivation of the partial likelihood, so it can be computed without the baseline hazard.
* Estimate $β$ as the values that maximize the partial likelihood.
* Can then obtain p-values and confidence intervals as in linear and logistic regression, and use them to reject or fail to reject null hypotheses about particular coefficients.
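
To make the cancellation concrete: assuming no tied event times, the partial likelihood multiplies, over each observed death $i$, the ratio $\exp(\sum_{j} x_{ij}β_{j}) / \sum_{i': y_{i'} ≥ y_{i}} \exp(\sum_{j} x_{i'j}β_{j})$, and $h_{0}(t)$ appears nowhere in it. The sketch below evaluates its logarithm and maximizes it over a coarse grid for a single binary predictor. The grid search is only for illustration (real software uses Newton-type optimizers), and all names and data are hypothetical.

```python
import math

def log_partial_likelihood(beta, times, events, x):
    """Cox log partial likelihood, assuming no ties; x[i] lists the features of i."""
    scores = [sum(b * v for b, v in zip(beta, row)) for row in x]
    total = 0.0
    for i, (y_i, s_i) in enumerate(zip(times, events)):
        if s_i == 1:  # only observed deaths contribute a term
            # Risk set: everyone still under observation at time y_i.
            risk = sum(math.exp(scores[k])
                       for k, y_k in enumerate(times) if y_k >= y_i)
            total += scores[i] - math.log(risk)
    return total

# Toy data with one binary predictor.
y = [1, 2, 3, 4, 5, 6]
sigma = [1, 1, 0, 1, 1, 1]
x = [[1], [0], [1], [1], [0], [0]]

# Crude maximization over a grid of candidate coefficients.
grid = [b / 100 for b in range(-300, 301)]
beta_hat = max(grid, key=lambda b: log_partial_likelihood([b], y, sigma, x))
print(beta_hat)
```

A positive `beta_hat` indicates a higher estimated hazard for observations with the predictor equal to 1.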

#### Connection with the Log-Rank Test
* For a Cox proportional hazards model with a single binary predictor, the score test of $H_{0}: β = 0$ is exactly the log-rank test.
* So to test for a difference in survival times between two groups there are two options: fit a Cox model and test whether $β = 0$, or run a log-rank test. With the score test the two approaches coincide.

#### Additional Details
* There is no intercept in the Cox model since it is absorbed into $h_{0}(t)$
* $h_{0}(t)$ can be estimated afterwards in order to recover the survival function $S(t|x)$; this can be implemented in Python.
* The partial likelihood is not exactly a likelihood, but it is a very good approximation to one.

### 11.5.3 Example: Brain Cancer Data
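* Fit the Cox model with all predictors together: each coefficient and its p-value is then adjusted for the other predictors, which can change the conclusions drawn from any single predictor on its own.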

## 11.6 Shrinkage of the Cox Model
* Can add a penalty $λP(β)$ to the negative log partial likelihood of the Cox model and minimize the sum.
  * Can pick $P(β) = \sum_{j=1}^{p} β_{j}^{2}$ (ridge)
  * or $P(β) = \sum_{j=1}^{p} |β_{j}|$ (lasso)
  * Lasso sets some coefficients exactly to 0, while ridge shrinks all coefficients toward 0 without zeroing them
* We can use cross-validation to choose $λ$.
* Assessing the fitted model on test data is harder than usual because the true survival time of a censored observation is unknown, so predicted and actual survival times can't be compared directly. Instead, compute the risk score $\sum_{j=1}^{p} x_{ij} \hat{β}_{j}$ for each test observation, split the observations into groups such as high, medium, and low risk, and check how well the survival curves of the groups separate.
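
The grouping step can be sketched as follows. The coefficients, the test features, and the helper name `stratify_by_risk` are all hypothetical. The risk score used is the linear predictor $\sum_{j} x_{ij}\hat{β}_{j}$, and groups are formed by rank so that each holds roughly the same number of observations.

```python
def stratify_by_risk(beta_hat, x_test, n_groups=3):
    """Label each test observation 0 (lowest risk) .. n_groups - 1 (highest)."""
    scores = [sum(b * v for b, v in zip(beta_hat, row)) for row in x_test]
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * n_groups // len(scores)
    return labels

# Hypothetical fitted coefficients and test-set features.
beta_hat = [0.8, -0.5]
x_test = [[1.0, 0.2], [0.1, 1.5], [2.0, 0.0],
          [0.5, 0.5], [1.5, 0.0], [0.0, 0.0]]
labels = stratify_by_risk(beta_hat, x_test)
print(labels)
```

A survival curve can then be estimated within each group; well-separated curves, ordered by risk, suggest the model predicts well on the test set.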


## 11.7 Additional Topics

### 11.7.1 Area Under the Curve for Survival Analysis
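* Harrell's concordance index (C-index) generalizes the AUC: among all pairs of observations whose ordering in time can be determined, it gives the proportion for which the model correctly predicts which one survives longer.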
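
A minimal sketch of the concordance index, assuming a higher risk score should mean a shorter survival time and that a pair is comparable only when the earlier of the two times is an observed event. The function name and data are illustrative, and ties in the risk scores are simply counted as not concordant.

```python
def concordance_index(times, events, risk_scores):
    """Proportion of comparable pairs that the risk scores order correctly."""
    concordant = comparable = 0
    n = len(times)
    for i in range(n):
        for k in range(n):
            # Comparable pair: k had an observed event strictly before time y_i.
            if times[i] > times[k] and events[k] == 1:
                comparable += 1
                if risk_scores[k] > risk_scores[i]:
                    concordant += 1
    return concordant / comparable

y = [2, 4, 6, 8]
sigma = [1, 0, 1, 1]
risk = [3.0, 2.5, 1.0, 2.0]
c_index = concordance_index(y, sigma, risk)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of the comparable pairs.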

### 11.7.2 Choice of Time Scale
* Must be careful defining time zero: it could be the day the disease was discovered, the patient's date of birth, the age at which the disease developed, etc., and the choice changes what the survival time means.

### 11.7.3 Time-Dependent Covariates
* The value of predictor $j$ for observation $i$ may change over time, written $x_{ij}(t)$. The Cox model handles this by replacing $x_{ij}$ with $x_{ij}(t)$ in the partial likelihood.

### 11.7.4 Checking the Proportional Hazard Assumption
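* Should check that the proportional hazards assumption holds before relying on the Cox model, for example by plotting the log hazard for each level of a qualitative predictor: the curves should differ only by a constant.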

### 11.7.5 Survival Trees
* Can make survival trees like in the regression and classification case.
* Splits are chosen to maximize the difference between the survival curves in the resulting daughter nodes.
* Can combine several trees to make a random survival forest.