# Import libraries

Our data analysis is dependent on the following packages:
- [pandas](https://pandas.pydata.org/) for data manipulation
- [PyEnsembl](https://github.com/openvax/pyensembl) for working with genome data
- [seaborn](https://seaborn.pydata.org/) for drawing attractive statistical graphs
- [Matplotlib](https://matplotlib.org/) for visualizing data relationships
- [methylcheck](https://pypi.org/project/methylcheck/) for the density plot (they do it nice)
- [methylprep](https://pypi.org/project/methylprep/) to use methylcheck
- [umap-learn](https://pypi.org/project/umap-learn/) for building UMAPs
- [scikit-learn](https://scikit-learn.org/stable/index.html) for machine learning


You can install these modules by running the below script inside Jupyter:
```
%pip install pandas
%pip install pyensembl
%pip install seaborn
%pip install methylcheck
%pip install methylprep
%pip install umap-learn
%pip install scikit-learn
```

In [2]:
from pyensembl import EnsemblRelease
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import methylcheck
import pickle
from sklearn.decomposition import PCA
import umap.umap_ as umap

# Change Ensembl IDs to HGNC

In [None]:
# Open counts file
df = pd.read_pickle('../Data/Raw/BCCA_Counts.pickle')    

# Open clinical data
clinical_data = pd.read_pickle("../Data/clinical_data.pickle")

In [None]:
clinical_data

## Gather meta data of all identified genes

In [None]:
# Reference genome = GRCh38
# 9,033 Ensembl IDs were not identified
# 11,309 Ensembl IDs were identified but have no Gene ID (for the reference genome GRCh38)
    # These 11,309 were included in meta_data.pickle (gene_meta_data dataframe)

# Using pyensembl to get Ensembl Gene IDs
Genome = EnsemblRelease()
gene = Genome.gene_by_id(df.index[0])
gene_meta_data = pd.Series([gene.gene_id, gene.gene_name, gene.biotype, gene.contig, gene.start, gene.end, gene.strand]).to_frame().transpose()

# Getting all the genes in the count file
unnamed_genes = []
for id, ensembl in enumerate(df.index):
    try:

        if id == 0:
            continue
        
        gene = Genome.gene_by_id(ensembl)

        temp_list = pd.Series([gene.gene_id, gene.gene_name, gene.biotype, gene.contig, gene.start, gene.end, gene.strand]).to_frame().transpose()
        gene_meta_data = gene_meta_data.append(temp_list)

    except:
        unnamed_genes.append(ensembl)

# Making it pretty
gene_meta_data = gene_meta_data.rename(columns={0:'Gene_ID', 1:'Gene_Name', 2:'Gene_Biotype', 3:'Contig_Chromosome', 4:'Start', 5:'End', 6:'Strand'})
gene_meta_data = gene_meta_data.set_index("Gene_ID")

## Dropping nameless genes from dataframe

In [None]:
identified_nameless_genes = []

for id, ensembl in enumerate(df.index):
    try:
        gene = Genome.gene_by_id(ensembl)

        if gene.gene_name == "":
            identified_nameless_genes.append(ensembl)
            continue
    except:
        continue

# Dropping the genes that have info about them, but no name (interesting ones)
gene_meta_data2 = gene_meta_data.drop(identified_nameless_genes)

# Gathering only the name of the genes (instead of the metadata (where can we find them, chr, etc.))
gene_names_column = gene_meta_data2["Gene_Name"]

# Dropping unnamed genes
df2 = df.drop(unnamed_genes)

# Finally making the HGNC genes in the index (where Ensembl lied)
df2 = df2.join(gene_names_column.to_frame()).set_index("Gene_Name")

# Log Scale the data

In [None]:
df2 = np.log2(df2 + 1)
description_log_scaled_dataset = df2.T.describe().T

# Adjusting for non-expressed genes (37,688 in total)
HGNC_Log_Scaled = df2[description_log_scaled_dataset["std"] > 0]

# Creating a final, cleaned data frame

In [None]:
# Gather Log-Scaled data
df0 = HGNC_Log_Scaled.T.reset_index()
df1 = df0.join(df0['index'].str.rsplit(pat='-',n=2,expand=True)[0])
df2 = df1.set_index(0)

# Change index of clinical data, then drop duplicates
clinical_data = clinical_data.rename(columns={"TARGET USI":"Patient_ID"}).set_index("Patient_ID")
clinical_data2 = clinical_data.reset_index().drop_duplicates(subset='Patient_ID').set_index('Patient_ID')

# Join Log-Scale and Clinical Data
df3 = df2.join(clinical_data2, how='inner')

df4 = df3.set_index("index")

# Only protocol/cohort we are going to use because it has >1000 samples and cohorts are not batch-corrected
df5 = df4[df4["Protocol"] == "AAML1031"]

# Get clinical data from the dataframe
df5_clinical  = df5.iloc[:,df3.shape[1]-clinical_data2.shape[1]-1:]

# Get the data from the dataframe
x = df5.iloc[:,:df3.shape[1]-clinical_data2.shape[1]-2]

# Export

In [None]:
# Clinical Data
clinical_data.to_pickle("../Data/pickled/clinical_data.pickle")

# Processed Data
x.to_pickle("../Data/pickled/log_scaled_data.pickle")

# Gene meta data (where it lies, which chromosome, what it does, etc)
gene_meta_data.to_pickle('../Data/Pre_Processed/meta_data.pickle')

# Genes with no HGNC
unnamed_genes.to_pickle('../Data/Pre_Processed/unnamed_genes.pickle')

# Plotting

## Load stuff

In [49]:
log_scaled_data = pd.read_pickle('../Data/log_scaled_data.pickle')
log_scaled_data

Unnamed: 0,TARGET-20-PAUHFI-09A-01R,TARGET-20-PAUHXN-03A-01R,TARGET-20-PAUIIB-09A-01R,TARGET-20-PAUIPM-09A-01R,TARGET-20-PAUJCF-09A-01R,TARGET-20-PAUJMC-09A-01R,TARGET-20-PAUJNJ-09A-01R,TARGET-20-PAUKDH-09A-01R,TARGET-20-PAUKEZ-03A-01R,TARGET-20-PAUKTH-09A-01R,...,TARGET-20-PAXMKU-09A-01R,TARGET-20-PAXMLI-09A-01R,TARGET-20-PAXMLN-09A-01R,TARGET-20-PAXMLW-09A-01R,TARGET-20-PAXMNG-09A-01R,TARGET-20-PAXMNR-09A-01R,TARGET-20-PAXMPF-09A-01R,TARGET-20-PAXMPG-09A-01R,TARGET-20-PAXMPW-09A-01R,TARGET-20-PAXMRA-09A-01R
SCYL3,10.068778,8.569856,10.117643,9.881114,9.436712,10.262095,9.754888,10.022368,8.144658,8.672425,...,8.348728,8.797662,9.207014,9.511753,9.095397,8.353147,8.194757,10.539159,9.047124,9.451211
C1orf112,7.294621,11.408330,9.862637,9.199672,8.413628,9.084808,9.746514,9.411511,8.523562,9.977280,...,6.741467,9.654636,9.812177,9.729621,9.672425,9.169925,7.665336,10.889504,9.865733,10.274960
FGR,12.414157,14.326359,12.146250,14.136190,13.954015,12.002463,10.926296,9.584963,13.896900,8.820179,...,11.756139,13.592807,13.118617,12.899735,12.976206,13.574239,9.622052,10.595257,9.430453,13.702714
CFH,5.087463,2.807355,8.108524,4.247928,2.807355,4.906891,0.000000,4.459432,0.000000,3.584963,...,3.700440,5.044394,3.584963,9.310613,5.000000,5.459432,4.459432,12.147205,5.554589,7.924813
STPG1,7.417853,6.894818,7.930737,7.475733,5.781360,6.507795,6.599913,9.361944,6.554589,7.523562,...,6.965784,7.483816,7.087463,7.366322,7.727920,6.930737,5.321928,7.754888,8.066089,7.761551
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
DUX4L16,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
DUX4L19,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
DUX4L18,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
DUX4L17,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


In [50]:
clinical = pd.read_pickle('../Data/clinical_data.pickle')
clinical

Unnamed: 0_level_0,Gender,Race,Ethnicity,Age at Diagnosis in Days,First Event,Event Free Survival Time in Days,Vital Status,Overall Survival Time in Days,Year of Diagnosis,Year of Last Follow Up,...,Chloroma Site of Relapse/Induction Failure,Cytogenetic Site of Relapse/Induction Failure,Other Site of Relapse/Induction Failure,Gene Fusion,Gemtuzumab ozogamicin treatment,Refractory Timepoint sent for Induction Failure Project,Comment,Gene Fusion...62,Blast count used for RNA seq,Gene Fusion...67
index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
TARGET-20-PAUHFI-09A-01R,Female,Black or African American,Not Hispanic or Latino,722.0,Relapse,257.0,Dead,276.0,2011.0,2012.0,...,No,No,No,,,,WHO Classification (final: path then study ent...,,,
TARGET-20-PAUHXN-03A-01R,Male,Black or African American,Not Hispanic or Latino,5704.0,Censored,1145.0,Alive,1145.0,2011.0,2014.0,...,Not done,Not done,Not done,,,,WHO Classification (final: path then study ent...,,,
TARGET-20-PAUIIB-09A-01R,Female,White,Not Hispanic or Latino,4395.0,Censored,2468.0,Alive,2468.0,2011.0,2018.0,...,Not done,Not done,Not done,,,,WHO Classification (final: path then study ent...,,,
TARGET-20-PAUIPM-09A-01R,Male,White,Not Hispanic or Latino,3561.0,Censored,2721.0,Alive,2721.0,2011.0,2019.0,...,Not done,Not done,Not done,,,,WHO Classification (final: path then study ent...,,,
TARGET-20-PAUJCF-09A-01R,Female,White,Hispanic or Latino,3976.0,Relapse,742.0,Alive,2681.0,2011.0,2018.0,...,No,No,No,,,,WHO Classification (final: path then study ent...,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
TARGET-20-PAXMNR-09A-01R,Male,White,Not Hispanic or Latino,721.0,Relapse,168.0,Dead,454.0,2016.0,2017.0,...,Yes,Yes,No,,,,WHO Classification (final: path then study ent...,,,
TARGET-20-PAXMPF-09A-01R,Female,White,Hispanic or Latino,4071.0,Death,103.0,Dead,103.0,2016.0,2016.0,...,Not done,Not done,Not done,,,,WHO Classification (final: path then study ent...,,,
TARGET-20-PAXMPG-09A-01R,Female,White,Not Hispanic or Latino,334.0,Relapse,270.0,Dead,743.0,2016.0,2018.0,...,No,No,No,,,,WHO Classification (final: path then study ent...,,,
TARGET-20-PAXMPW-09A-01R,Male,White,Not Hispanic or Latino,2945.0,Censored,911.0,Alive,911.0,2016.0,2018.0,...,Not done,Not done,Not done,,,,WHO Classification (final: path then study ent...,,,


## Density plot

In [None]:
# TODO Find size of expression matrix, identify degree of variation; basically analyze this chart below
methylcheck.beta_density_plot(data, full_range=True, plot_title="Density Plot")

## PCA

In [None]:
# Fit PCA
pca_decomp = PCA(random_state=42, n_components=2)
data_PCs = pca_decomp.fit_transform(data)

In [None]:
def draw_PCAplot(score, hue=None):
    
    sns.set_theme(style="white", color_codes=True)

    # Define variables
    score2 = score[:,0:2]
    xs = score2[:,0]
    ys = score2[:,1]

    scalex = 1.0/(xs.max() - xs.min())
    scaley = 1.0/(ys.max() - ys.min())

    # Define scatterplot

    sns.scatterplot(x=xs * scalex, y=ys * scaley,
                    palette='husl', s=10,
                    linewidth=0, alpha=1,
                    data=clinical, hue=hue)

    # Define plot specs
    plt.title("PCA Decomposition by RNA-seq",
               fontsize = 12)        
               
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.tight_layout()

    # Save figure
    #plt.savefig('test' + title + '.png',
    #bbox_inches='tight', dpi=300)

    return(plt.show())

In [None]:
draw_PCAplot(data_PCs)
draw_PCAplot(data_PCs, hue="Primary Cytogenetic Code")

## UMAP

In [None]:
reducer = umap.UMAP(n_neighbors=15, min_dist=0.01, n_epochs=100, random_state=42)
mp_umap = reducer.fit_transform(data)

In [None]:
def draw_UMAPplot(score, hue=None):
    
    sns.set_theme(style="white", color_codes=True)

    # Define variables
    score2 = score[:,0:2]
    xs = score2[:,0]
    ys = score2[:,1]

    # Define scatterplot

    sns.scatterplot(data=clinical, x=xs, y=ys,
                    palette='husl', s=10,
                    linewidth=0, alpha=1, hue=hue)

    # Define plot specs
    plt.title("UMAP by RNA-seq",
               fontsize = 12)        
               
    plt.xlabel("UMAP 1")
    plt.ylabel("UMAP 2")
    plt.tight_layout()

    # Save figure
    #plt.savefig('test2' + title + '.png',
    #bbox_inches='tight', dpi=300)

    return(plt.show())

In [None]:
#Call the UMAP function.
draw_UMAPplot(mp_umap)
draw_UMAPplot(mp_umap, hue="Primary Cytogenetic Code")

## Preparing data for R

In [12]:
# Read pickled data into data frame
counts = pd.read_pickle('../Data/raw_counts_data.pickle')
# Read data frame into .csv for exporting to R
counts.to_csv('../Data/raw_counts_data.csv', mode ='w+',index_label="Gene")
#Excluding the genes on the .pickle as well - 
counts.to_pickle('../Data/raw_counts_data.pickle')

# Same for clinical data
clinical = pd.read_pickle('../Data/clinical_data.pickle')
clinical.to_csv('../Data/clinical_data.csv')

## Heat Map
TODO

In [51]:
significant_genes = pd.read_csv('../Data/wrong_significant_genes.csv').rename(columns={"Unnamed: 0":"Genes"}).set_index("Genes") # From R
log_scaled_data = pd.read_pickle('../Data/log_scaled_data.pickle')
clinical = pd.read_pickle('../Data/clinical_data.pickle')

In [52]:
log_scaled_data

Unnamed: 0,TARGET-20-PAUHFI-09A-01R,TARGET-20-PAUHXN-03A-01R,TARGET-20-PAUIIB-09A-01R,TARGET-20-PAUIPM-09A-01R,TARGET-20-PAUJCF-09A-01R,TARGET-20-PAUJMC-09A-01R,TARGET-20-PAUJNJ-09A-01R,TARGET-20-PAUKDH-09A-01R,TARGET-20-PAUKEZ-03A-01R,TARGET-20-PAUKTH-09A-01R,...,TARGET-20-PAXMKU-09A-01R,TARGET-20-PAXMLI-09A-01R,TARGET-20-PAXMLN-09A-01R,TARGET-20-PAXMLW-09A-01R,TARGET-20-PAXMNG-09A-01R,TARGET-20-PAXMNR-09A-01R,TARGET-20-PAXMPF-09A-01R,TARGET-20-PAXMPG-09A-01R,TARGET-20-PAXMPW-09A-01R,TARGET-20-PAXMRA-09A-01R
SCYL3,10.068778,8.569856,10.117643,9.881114,9.436712,10.262095,9.754888,10.022368,8.144658,8.672425,...,8.348728,8.797662,9.207014,9.511753,9.095397,8.353147,8.194757,10.539159,9.047124,9.451211
C1orf112,7.294621,11.408330,9.862637,9.199672,8.413628,9.084808,9.746514,9.411511,8.523562,9.977280,...,6.741467,9.654636,9.812177,9.729621,9.672425,9.169925,7.665336,10.889504,9.865733,10.274960
FGR,12.414157,14.326359,12.146250,14.136190,13.954015,12.002463,10.926296,9.584963,13.896900,8.820179,...,11.756139,13.592807,13.118617,12.899735,12.976206,13.574239,9.622052,10.595257,9.430453,13.702714
CFH,5.087463,2.807355,8.108524,4.247928,2.807355,4.906891,0.000000,4.459432,0.000000,3.584963,...,3.700440,5.044394,3.584963,9.310613,5.000000,5.459432,4.459432,12.147205,5.554589,7.924813
STPG1,7.417853,6.894818,7.930737,7.475733,5.781360,6.507795,6.599913,9.361944,6.554589,7.523562,...,6.965784,7.483816,7.087463,7.366322,7.727920,6.930737,5.321928,7.754888,8.066089,7.761551
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
DUX4L16,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
DUX4L19,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
DUX4L18,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
DUX4L17,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000


# Sorting by Relapse

In [72]:
clinical['First Event'].value_counts() # Censored: 0 / Relapse: 1 (Censored means patient dropped out of the study = Patient is likely healthy)

Censored                   475
Relapse                    407
Induction failure           83
Death                       54
Death without remission     19
Name: First Event, dtype: int64

In [None]:
clinical_sorted = clinical.sort_values(by=["Relapse"])
overall_sorted = clinical_sorted.join(log_scaled_data.T, how='inner')
clinical_sorted = overall_sorted.iloc[:,:66]
data_sorted = overall_sorted.iloc[:,67:]

# Sorting by Vital Status

In [67]:
clinical_sorted = clinical.sort_values(by=["Vital Status"])
overall_sorted = clinical_sorted.join(log_scaled_data.T, how='inner')
clinical_sorted = overall_sorted.iloc[:,:66]
data_sorted = overall_sorted.iloc[:,67:]

In [70]:
data_sorted

Unnamed: 0,SCYL3,C1orf112,FGR,CFH,STPG1,NIPAL3,AK2,KDM1A,TTC22,ST7L,...,RNU6-941P,RNA5SP519,RNU6-521P,RN7SKP282,RNU6-255P,DUX4L16,DUX4L19,DUX4L18,DUX4L17,TTTY25P
TARGET-20-PAXMRA-09A-01R,9.451211,10.274960,13.702714,7.924813,7.761551,11.526010,13.180375,11.894439,6.700440,10.133142,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
TARGET-20-PAWBYK-09A-01R,8.060696,9.411511,12.939579,3.321928,7.139551,10.531381,11.976564,11.929258,4.906891,10.952013,...,1.584963,0.0,0.0,0.0,2.807355,0.0,0.0,0.0,0.0,0.0
TARGET-20-PAWCAW-03A-01R,10.507795,10.951285,11.803324,7.118941,8.820179,12.396605,13.947089,12.898223,3.807355,10.501837,...,5.643856,0.0,0.0,0.0,4.643856,0.0,0.0,0.0,0.0,0.0
TARGET-20-PAWCBJ-09A-01R,9.710806,10.369597,14.461863,7.982994,8.129283,11.830515,13.612753,12.262095,7.294621,10.238405,...,5.491853,0.0,0.0,0.0,4.643856,0.0,0.0,0.0,0.0,0.0
TARGET-20-PAWCBZ-09A-01R,9.202124,10.832099,14.617697,8.375039,6.392317,9.839204,12.150065,11.397675,3.321928,10.760720,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
TARGET-20-PAVZZC-03A-01R,10.020980,10.122828,14.768391,3.169925,8.471675,11.975131,13.352457,12.001056,7.238405,9.859535,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
TARGET-20-PAWAFT-03A-01R,9.550747,10.256209,11.467606,10.112440,8.071462,10.757390,13.524174,12.687376,2.807355,10.516685,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0
TARGET-20-PAWAIG-09A-01R,9.667112,7.988685,10.821774,6.539159,7.727920,10.705632,11.290595,11.821774,4.523562,9.592457,...,5.807355,0.0,0.0,0.0,3.000000,0.0,0.0,0.0,0.0,0.0
TARGET-20-PAVZFT-09A-01R,9.036174,8.700440,11.525521,8.124121,7.383704,10.479780,12.364408,11.892543,5.209453,9.688250,...,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0


In [69]:
clinical_sorted

Unnamed: 0,Gender,Race,Ethnicity,Age at Diagnosis in Days,First Event,Event Free Survival Time in Days,Vital Status,Overall Survival Time in Days,Year of Diagnosis,Year of Last Follow Up,...,CNS Site of Relapse/Induction Failure,Chloroma Site of Relapse/Induction Failure,Cytogenetic Site of Relapse/Induction Failure,Other Site of Relapse/Induction Failure,Gene Fusion,Gemtuzumab ozogamicin treatment,Refractory Timepoint sent for Induction Failure Project,Comment,Gene Fusion...62,Blast count used for RNA seq
TARGET-20-PAXMRA-09A-01R,Female,White,Not Hispanic or Latino,8799.0,Relapse,223.0,Alive,878.0,2016.0,2018.0,...,No,No,No,No,,,,WHO Classification (final: path then study ent...,,
TARGET-20-PAWBYK-09A-01R,Male,Unknown,Hispanic or Latino,1422.0,Induction failure,76.0,Alive,1611.0,2013.0,2018.0,...,No,No,No,Yes,,,,WHO Classification (final: path then study ent...,,
TARGET-20-PAWCAW-03A-01R,Male,White,Not Hispanic or Latino,1871.0,Induction failure,96.0,Alive,1601.0,2013.0,2018.0,...,No,No,No,No,,,,WHO Classification (final: path then study ent...,,
TARGET-20-PAWCBJ-09A-01R,Male,White,Not Hispanic or Latino,6539.0,Censored,1605.0,Alive,1605.0,2013.0,2018.0,...,Not done,Not done,Not done,Not done,,,,WHO Classification (final: path then study ent...,,
TARGET-20-PAWCBZ-09A-01R,Female,Black or African American,Not Hispanic or Latino,5361.0,Relapse,444.0,Alive,1658.0,2013.0,2018.0,...,No,No,No,No,,,,WHO Classification (final: path then study ent...,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
TARGET-20-PAVZZC-03A-01R,Female,White,Not Hispanic or Latino,3373.0,Induction failure,32.0,Dead,480.0,2013.0,2015.0,...,Yes,No,No,No,,,,WHO Classification (final: path then study ent...,,
TARGET-20-PAWAFT-03A-01R,Female,Black or African American,Not Hispanic or Latino,4561.0,Death,270.0,Dead,270.0,2013.0,2014.0,...,Not done,Not done,Not done,Not done,,,,WHO Classification (final: path then study ent...,,
TARGET-20-PAWAIG-09A-01R,Male,White,Hispanic or Latino,6449.0,Relapse,237.0,Dead,266.0,2013.0,2014.0,...,No,No,Yes,No,,,,WHO Classification (final: path then study ent...,,
TARGET-20-PAVZFT-09A-01R,Female,White,Not Hispanic or Latino,4922.0,Death,672.0,Dead,672.0,2013.0,2015.0,...,Not done,Not done,Not done,Not done,,,,WHO Classification (final: path then study ent...,,
