# Bayesian Survival Analysis

This notebook tests a Bayesian survival analysis for non-small cell lung cancer (NSCLC) patients. It attempts to 
replicate, or perform an analysis similar to, that in [Jochems _et al._, International Journal of Radiation Oncology, **99** 
(2017)](https://doi.org/10.1016/j.ijrobp.2017.04.021) using an MCMC methodology like that in the [PyMC3 Bayesian Survival 
Analysis example](https://docs.pymc.io/notebooks/survival_analysis.html). The data from Jochems _et al._ are openly available 
[here](https://www.cancerdata.org/publication/developing-and-validating-survival-prediction-model-nsclc-patients-through-distributed).

In [1]:
%matplotlib inline

import pandas as pd
import requests
import os

First, download the raw data CSV file from https://www.cancerdata.org/system/files/publications/Jochems-2017-MaastroDataUnbinned.csv if it is not already present locally.

In [2]:
csvfile = 'Jochems-2017-MaastroDataUnbinned.csv'
dataurl = 'https://www.cancerdata.org/system/files/publications/{}'

# only download the file if it is not already present locally
if not os.path.isfile(csvfile):
    # download the data, raising an error if the request failed
    data = requests.get(dataurl.format(csvfile))
    data.raise_for_status()

    # write the raw bytes to a file (closed automatically by the with block)
    with open(csvfile, 'wb') as fp:
        fp.write(data.content)

# read in the data using pandas
table = pd.read_csv(csvfile)

In [3]:
table.head()

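| Unnamed: 0 | yearrt | med | maxeso | gender | intake_who | age | chemo | ott | chemo3g | gtv1 | ... | CumultativeTotalTumorDose | meanlungdose | lungv20 | CumOTT | OverallBaselineDysp | OverallPostRTDyspFullScore | DyspGT2 | DeltaDyspGe1 | TreatmentType | TwoYearSurvival |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010 | 17.0346 | 44.8989 | 0 | 2 | 67 | 1.0 | 21 | 2.0 |  | ... | 45.0 | 15.4178 | 33.8485 | 21 | 0 | 0 | 0 | 0 |  | 0 |
| 1 | 2014 |  |  | 0 | 1 | 69 |  | 36 |  |  | ... | 67.0 |  |  | 36 | 1 | 1 | 0 | 0 | 2.0 | 0 |
| 2 | 2010 |  |  | 1 | 1 | 82 |  | 24 |  |  | ... | 52.25 |  |  | 24 | 0 | 1 | 0 | 1 |  | 0 |
| 3 | 2013 | 17.2298 | 48.9217 | 1 | 1 | 77 | 1.0 | 36 | 2.0 |  | ... | 69.0 | 25.0428 | 11.9888 | 36 | 1 | 1 | 0 | 0 | 2.0 | 0 |
| 4 | 2013 |  |  | 1 | 2 | 83 |  | 28 |  |  | ... | 72.0 |  |  | 28 | 2 | 2 | 0 | 0 | 2.0 | 0 |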
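
Several entries in the preview are blank, which suggests that some columns are only partially recorded. A minimal sketch of how the missingness could be quantified is given below. It rebuilds a few columns of the five preview rows as a stand-in for `table`, and assumes that the blank cells correspond to missing values (`NaN`); the name `preview` is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for `table`: four columns of the five preview rows shown above.
# Assumption: blank cells in the preview are missing values (NaN).
preview = pd.DataFrame({
    "med": [17.0346, np.nan, np.nan, 17.2298, np.nan],
    "gtv1": [np.nan, np.nan, np.nan, np.nan, np.nan],
    "age": [67, 69, 82, 77, 83],
    "TreatmentType": [np.nan, 2.0, np.nan, 2.0, 2.0],
})

# Fraction of missing entries in each column, largest first.
missing_fraction = preview.isna().mean().sort_values(ascending=False)
print(missing_fraction)
```

Applying the same expression to the full `table` should show which covariates are complete enough to include in a model without further treatment.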


In [6]:
table.age.mean()

68.28980322003578
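
The mean alone says little about how ages are spread. As a rough sketch, `Series.describe()` gives the count, mean, standard deviation and quartiles in one call. The example below uses only the five ages visible in the preview rows as a stand-in for `table.age`, so its numbers are not those of the full data set; the names `ages` and `summary` are illustrative.

```python
import pandas as pd

# Stand-in for `table.age`: the five ages from the preview rows only.
ages = pd.Series([67, 69, 82, 77, 83], name="age")

# Count, mean, standard deviation, min/max and quartiles in one call.
summary = ages.describe()
print(summary)
```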