# HNSCC Progressors and NonProgressors

## Annotate HNSCC TCGA Tumors as Progressors or NonProgressors

### McWeeney Lab, Oregon Health & Science University

**Author: Gabrielle Choonoo (choonoo@ohsu.edu)**

## Introduction

This is the step-by-step workflow for annotating HNSCC tumors as Progressors or NonProgressors based on clinical data from TCGA. 

Required Files:
* Clinical Data (.txt): [[Can download from TCGA using this method]](https://github.com/gchoonoo/HNSCC_Clinical_Data_Notebook)
* This notebook (Annotate_Progressors_Notebook.ipynb): [[Download here]](https://raw.githubusercontent.com/gchoonoo/HNSCC_Annotate_Tumor_Progression/master/Annotate_Progressors_Notebook.ipynb)

**Note:** This notebook can also be downloaded as an R script (only the code blocks seen below will be included): [[Download R script here]](https://raw.githubusercontent.com/gchoonoo/HNSCC_Annotate_Tumor_Progression/master/annotate_progressors.r)

**All code is available on GitHub:** [https://github.com/gchoonoo/HNSCC_Annotate_Tumor_Progression](https://github.com/gchoonoo/HNSCC_Annotate_Tumor_Progression)

## Annotate HNSCC Tumor Progressors and NonProgressors

### Original criteria

Patients were first classified as progressor or nonprogressor based on follow-up annotation, specifically the presence or absence of a new tumor event. We required annotation to confirm the tumor event (days to new tumor and/or new tumor anatomical location). All patients were required to have treatment annotation in addition to the follow-up data. Reference: https://www.ncbi.nlm.nih.gov/pubmed/26747525

# Read in cleaned clinical data

In [None]:
# Read the tab-delimited clinical table; keep text columns as character vectors
hnsc_data <- read.delim(file = "raw_clinical_data.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Add new column for annotation

In [None]:
# Every patient starts unlabeled (NA); the steps below fill this column in
hnsc_data$Progression_FINAL_new <- NA

### Criteria:

### Progressor if they had any one of these (checked in every occurrence of the column: the unsuffixed column and its numbered repeats):
* "days_to_new_tumor_event_after_initial_treatment" not NA
* "new_tumor_event_after_initial_treatment" = YES
* "new_neoplasm_event_occurrence_anatomic_site" not NA
* "followup_treatment_success" = Progressive Disease or Persistent Disease

### NonProgressor if they had any one of these (checked in every occurrence of the column: the unsuffixed column and its numbered repeats) and weren't annotated as Progressor based on the above criteria:
* "days_to_new_tumor_event_after_initial_treatment" = NA
* "new_tumor_event_after_initial_treatment"  = NO
* "new_neoplasm_event_occurrence_anatomic_site" = NA
* "followup_treatment_success"  = Complete Remission/Response, Stable Disease, Partial Remission/Response, or NA

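The two rule sets above can be sketched on a toy table. This is a minimal illustration under stated assumptions, not the annotation code of this notebook: the data frame `toy` and its four patients are made up, and only the unsuffixed version of each column is included. `%in%` is used instead of `==` because it returns `FALSE` rather than `NA` for missing values.

```r
# Toy data: four hypothetical patients, one version of each column.
toy <- data.frame(
  days_to_new_tumor_event_after_initial_treatment = c(120, NA, NA, NA),
  new_tumor_event_after_initial_treatment         = c("YES", "NO", NA, NA),
  new_neoplasm_event_occurrence_anatomic_site     = c("Lung", NA, NA, NA),
  followup_treatment_success = c(NA, "Complete Remission/Response",
                                 "Persistent Disease", NA),
  stringsAsFactors = FALSE
)

# Progressor: any positive evidence of a new tumor event or ongoing disease.
is_progressor <-
  !is.na(toy$days_to_new_tumor_event_after_initial_treatment) |
  toy$new_tumor_event_after_initial_treatment %in% "YES" |
  !is.na(toy$new_neoplasm_event_occurrence_anatomic_site) |
  toy$followup_treatment_success %in% c("Progressive Disease",
                                        "Persistent Disease")

# NonProgressor: any of the "no event" conditions, and not already a Progressor.
is_nonprogressor <-
  (is.na(toy$days_to_new_tumor_event_after_initial_treatment) |
     toy$new_tumor_event_after_initial_treatment %in% "NO" |
     is.na(toy$new_neoplasm_event_occurrence_anatomic_site) |
     is.na(toy$followup_treatment_success) |
     toy$followup_treatment_success %in% c("Complete Remission/Response",
                                           "Stable Disease",
                                           "Partial Remission/Response")) &
  !is_progressor

toy$label <- ifelse(is_progressor, "Progressor",
                    ifelse(is_nonprogressor, "NonProgressor", NA))
print(toy$label)
```

The third patient is labeled Progressor on the strength of `followup_treatment_success` alone, and the fourth, with no annotation at all, falls into NonProgressor.
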
In [None]:
# Observe all occurrences of the above columns in hnsc_data:

grep("days_to_new_tumor_event_after_initial_treatment", names(hnsc_data), value = TRUE)

# Drop the "days_to_..." columns, which also match this pattern.
# Logical indexing is used because x[-grep(...)] returns nothing when grep finds no match.
new_tumor_cols <- grep("new_tumor_event_after_initial_treatment", names(hnsc_data), value = TRUE)
new_tumor_cols[!grepl("days", new_tumor_cols)]

grep("new_neoplasm_event_occurrence_anatomic_site", names(hnsc_data), value = TRUE)

grep("followup_treatment_success", names(hnsc_data), value = TRUE)


# Annotate Progressors

In [None]:
hnsc_data[
  
  which(!is.na(hnsc_data[, "days_to_new_tumor_event_after_initial_treatment"]) | 
          
          !is.na(hnsc_data[, "days_to_new_tumor_event_after_initial_treatment1"]) |
          
          !is.na(hnsc_data[, "days_to_new_tumor_event_after_initial_treatment2"]) |
          
          !is.na(hnsc_data[, "days_to_new_tumor_event_after_initial_treatment3"]) |
          
          !is.na(hnsc_data[, "days_to_new_tumor_event_after_initial_treatment4"]) |
          
          !is.na(hnsc_data[, "days_to_new_tumor_event_after_initial_treatment5"]) |
          
          hnsc_data[,"new_tumor_event_after_initial_treatment"] == "YES" | 
          
          hnsc_data[,"new_tumor_event_after_initial_treatment1"] == "YES" |
          
          hnsc_data[,"new_tumor_event_after_initial_treatment2"] == "YES" |
          
          hnsc_data[,"new_tumor_event_after_initial_treatment3"] == "YES" |
          
          hnsc_data[,"new_tumor_event_after_initial_treatment4"] == "YES" |
          
          hnsc_data[,"new_tumor_event_after_initial_treatment5"] == "YES" |
          
          hnsc_data[,"new_tumor_event_after_initial_treatment6"] == "YES" |
          
          !is.na(hnsc_data[,"new_neoplasm_event_occurrence_anatomic_site"]) | 
          
          !is.na(hnsc_data[,"new_neoplasm_event_occurrence_anatomic_site1"]) |  
          
          !is.na(hnsc_data[,"new_neoplasm_event_occurrence_anatomic_site2"]) |
          
          !is.na(hnsc_data[,"new_neoplasm_event_occurrence_anatomic_site3"]) |
          
          !is.na(hnsc_data[,"new_neoplasm_event_occurrence_anatomic_site4"]) |
          
          !is.na(hnsc_data[,"new_neoplasm_event_occurrence_anatomic_site5"]) |
          
          hnsc_data[,"followup_treatment_success"] == "Progressive Disease" | 
          
          hnsc_data[,"followup_treatment_success"] == "Persistent Disease" |
          
          hnsc_data[,"followup_treatment_success1"] == "Progressive Disease" | 
          
          hnsc_data[,"followup_treatment_success1"] == "Persistent Disease" | 
          
          hnsc_data[,"followup_treatment_success2"] == "Progressive Disease" | 
          
          hnsc_data[,"followup_treatment_success2"] == "Persistent Disease" |
          
          hnsc_data[,"followup_treatment_success3"] == "Progressive Disease" | 
          
          hnsc_data[,"followup_treatment_success3"] == "Persistent Disease" |
          
          hnsc_data[,"followup_treatment_success4"] == "Progressive Disease" | 
          
          hnsc_data[,"followup_treatment_success4"] == "Persistent Disease" |
          
          hnsc_data[,"followup_treatment_success5"] == "Progressive Disease" | 
          
          hnsc_data[,"followup_treatment_success5"] == "Persistent Disease"),
  
  "Progression_FINAL_new"] <- "Progressor"

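The long condition above can also be written more compactly by selecting columns with a name pattern. The sketch below is an assumed alternative, not the code used in the original analysis: `any_match` and `is_progressor` are hypothetical helpers, the toy data frame stands in for `hnsc_data`, and the patterns assume the repeated columns carry a numeric suffix as in the code above.

```r
# Hypothetical helper: TRUE for rows where at least one column whose name
# matches `pattern` passes `test`. `test` must return TRUE/FALSE, never NA.
any_match <- function(df, pattern, test) {
  cols <- grep(pattern, names(df), value = TRUE)
  if (length(cols) == 0) return(rep(FALSE, nrow(df)))
  Reduce(`|`, lapply(df[cols], test))
}

# Hypothetical helper: the four Progressor criteria over all column versions.
# The leading ^ keeps the "days_to_..." columns out of the second pattern.
is_progressor <- function(df) {
  any_match(df, "^days_to_new_tumor_event_after_initial_treatment[0-9]*$",
            function(x) !is.na(x)) |
    any_match(df, "^new_tumor_event_after_initial_treatment[0-9]*$",
              function(x) x %in% "YES") |
    any_match(df, "^new_neoplasm_event_occurrence_anatomic_site[0-9]*$",
              function(x) !is.na(x)) |
    any_match(df, "^followup_treatment_success[0-9]*$",
              function(x) x %in% c("Progressive Disease", "Persistent Disease"))
}

# Toy stand-in for hnsc_data with a few column versions (made-up values).
toy <- data.frame(
  days_to_new_tumor_event_after_initial_treatment  = c(NA, NA, NA),
  days_to_new_tumor_event_after_initial_treatment1 = c(NA, 200, NA),
  new_tumor_event_after_initial_treatment          = c("NO", NA, NA),
  followup_treatment_success1 = c(NA, NA, "Persistent Disease"),
  stringsAsFactors = FALSE
)

toy$Progression_FINAL_new <- ifelse(is_progressor(toy), "Progressor", NA)
print(toy$Progression_FINAL_new)
```

Because each test returns `FALSE` for missing values, no `which()` wrapper is needed to discard `NA` results, and the helper keeps working if a download contains more or fewer repeats of a column.
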

# Annotate NonProgressors

In [None]:
hnsc_data[which((
  
  is.na(hnsc_data[, "days_to_new_tumor_event_after_initial_treatment"]) | 
    
    is.na(hnsc_data[, "days_to_new_tumor_event_after_initial_treatment1"]) | 
    
    is.na(hnsc_data[, "days_to_new_tumor_event_after_initial_treatment2"]) | 
    
    is.na(hnsc_data[, "days_to_new_tumor_event_after_initial_treatment3"]) | 
    
    is.na(hnsc_data[, "days_to_new_tumor_event_after_initial_treatment4"]) | 
    
    is.na(hnsc_data[, "days_to_new_tumor_event_after_initial_treatment5"]) | 
    
    hnsc_data[,"new_tumor_event_after_initial_treatment"] == "NO" | 
    
    hnsc_data[,"new_tumor_event_after_initial_treatment1"] == "NO" | 
    
    hnsc_data[,"new_tumor_event_after_initial_treatment2"] == "NO" | 
    
    hnsc_data[,"new_tumor_event_after_initial_treatment3"] == "NO" | 
    
    hnsc_data[,"new_tumor_event_after_initial_treatment4"] == "NO" | 
    
    hnsc_data[,"new_tumor_event_after_initial_treatment5"] == "NO" | 
    
    hnsc_data[,"new_tumor_event_after_initial_treatment6"] == "NO" | 
    
    is.na(hnsc_data[,"new_neoplasm_event_occurrence_anatomic_site"]) | 
    
    is.na(hnsc_data[,"new_neoplasm_event_occurrence_anatomic_site1"]) | 
    
    is.na(hnsc_data[,"new_neoplasm_event_occurrence_anatomic_site2"]) | 
    
    is.na(hnsc_data[,"new_neoplasm_event_occurrence_anatomic_site3"]) | 
    
    is.na(hnsc_data[,"new_neoplasm_event_occurrence_anatomic_site4"]) | 
    
    is.na(hnsc_data[,"new_neoplasm_event_occurrence_anatomic_site5"]) | 
    
    hnsc_data[,"followup_treatment_success"] == "Complete Remission/Response" |
    
    hnsc_data[,"followup_treatment_success"] == "Stable Disease" | 
    
    hnsc_data[,"followup_treatment_success"] == "Partial Remission/Response" | 
    
    is.na(hnsc_data[,"followup_treatment_success"]) |
    
    hnsc_data[,"followup_treatment_success1"] == "Complete Remission/Response" |
    
    hnsc_data[,"followup_treatment_success1"] == "Stable Disease" | 
    
    hnsc_data[,"followup_treatment_success1"] == "Partial Remission/Response" | 
    
    is.na(hnsc_data[,"followup_treatment_success1"]) |
    
    hnsc_data[,"followup_treatment_success2"] == "Complete Remission/Response" |
    
    hnsc_data[,"followup_treatment_success2"] == "Stable Disease" | 
    
    hnsc_data[,"followup_treatment_success2"] == "Partial Remission/Response" | 
    
    is.na(hnsc_data[,"followup_treatment_success2"]) |
    
    hnsc_data[,"followup_treatment_success3"] == "Complete Remission/Response" |
    
    hnsc_data[,"followup_treatment_success3"] == "Stable Disease" | 
    
    hnsc_data[,"followup_treatment_success3"] == "Partial Remission/Response" | 
    
    is.na(hnsc_data[,"followup_treatment_success3"]) |
    
    hnsc_data[,"followup_treatment_success4"] == "Complete Remission/Response" |
    
    hnsc_data[,"followup_treatment_success4"] == "Stable Disease" | 
    
    hnsc_data[,"followup_treatment_success4"] == "Partial Remission/Response" | 
    
    is.na(hnsc_data[,"followup_treatment_success4"]) |
    
    hnsc_data[,"followup_treatment_success5"] == "Complete Remission/Response" |
    
    hnsc_data[,"followup_treatment_success5"] == "Stable Disease" | 
    
    hnsc_data[,"followup_treatment_success5"] == "Partial Remission/Response" | 
    
    is.na(hnsc_data[,"followup_treatment_success5"])) & is.na(hnsc_data[,"Progression_FINAL_new"])),"Progression_FINAL_new"] <- "NonProgressor"

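Before saving, it can help to tabulate the labels. A patient who was not labeled Progressor has NA in every `days_to_new_tumor_event_after_initial_treatment` column, which already satisfies the NonProgressor rule, so no patient should remain unlabeled after the two steps above. The sketch below uses a hypothetical helper, `summarize_labels`, and a made-up label vector in place of `hnsc_data$Progression_FINAL_new`.

```r
# Hypothetical helper: count each label and report rows left unlabeled.
summarize_labels <- function(labels) {
  list(
    counts    = table(labels, useNA = "ifany"),
    unlabeled = which(is.na(labels))
  )
}

# Toy label vector; the final NA is included only to show how an
# unlabeled row would be reported.
toy_labels <- c("Progressor", "NonProgressor", "NonProgressor", NA)
res <- summarize_labels(toy_labels)
print(res$counts)
print(res$unlabeled)
```

On the real data the same call would be `summarize_labels(hnsc_data$Progression_FINAL_new)`; a non-empty `unlabeled` element would point to rows worth inspecting by hand.
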
# Save data

In [None]:
# Write the annotated table as tab-delimited text, without quotes or row names
write.table(x = hnsc_data, file = "clinical_data_annotated.txt", sep = "\t", quote = FALSE, row.names = FALSE)
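
A quick round-trip check can confirm that the saved file reads back with the same shape and labels. This is a minimal sketch on a made-up two-row table written to a temporary file; the `patient` column and its values are assumptions for illustration only.

```r
# Toy annotated table (hypothetical patients).
toy <- data.frame(
  patient = c("A", "B"),
  Progression_FINAL_new = c("Progressor", "NonProgressor"),
  stringsAsFactors = FALSE
)

# Write with the same options as the save step, then read the file back.
out <- tempfile(fileext = ".txt")
write.table(x = toy, file = out, sep = "\t", quote = FALSE, row.names = FALSE)
back <- read.delim(file = out, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Same dimensions and identical labels after the round trip.
same_shape  <- identical(dim(back), dim(toy))
same_labels <- identical(back$Progression_FINAL_new, toy$Progression_FINAL_new)
print(c(same_shape, same_labels))
```
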