# Analyzing colon tumor gene expression data
Data sources:
- https://dx.doi.org/10.1038%2Fsdata.2018.61
- https://www.ncbi.nlm.nih.gov/gds?term=GSE8671
- https://www.ncbi.nlm.nih.gov/gds?term=GSE20916

### 1. Initialize the environment and variables
After launching this page, initialize the analysis environment by selecting the code cell below and pressing `Shift + Enter`.

In [None]:
#Set path to this directory for accessing and saving files
import os
import warnings
warnings.filterwarnings('ignore')

__path__  = os.getcwd() + os.path.sep
print('Current path: ' + __path__)

from local_utils import init_tcga, init_GSE8671, init_GSE20916, sort_data
from local_utils import eval_gene, make_heatmap

%matplotlib inline

# Read data
print("Loading data. Please wait...")
tcga_scaled, tcga_data, tcga_info, tcga_palette = init_tcga()
GSE8671_scaled, GSE8671_data, GSE8671_info, GSE8671_palette = init_GSE8671()
GSE20916_scaled, GSE20916_data, GSE20916_info, GSE20916_palette = init_GSE20916()
print("Data import complete. Continue below...")

### 2a. Explore a gene of interest in the unified TCGA, GSE8671, and GSE20916 data
- In the first line of the cell, edit the human gene name between the quotation marks
- Press `Shift + Enter`

In [None]:
gene = "FABP1" # <-- edit between the quotation marks here

# Do not edit below this line
# ------------------------------------------------------------------------
print("Running analysis. Please wait...")
eval_gene(gene, tcga_data, tcga_info, tcga_palette, 'TCGA (unified)')
eval_gene(gene, GSE8671_data, GSE8671_info, GSE8671_palette, 'GSE8671')
eval_gene(gene, GSE20916_data, GSE20916_info, GSE20916_palette, 'GSE20916')
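
A mistyped gene name is the most likely way for the cell above to fail. The following is a minimal sketch, not part of `local_utils`, of how a name could be checked against a list of known symbols before running the analysis. The helper `resolve_gene` and the list `known` are hypothetical; how to obtain the real symbol list from the loaded tables depends on `local_utils` and is not shown here.

```python
import difflib


def resolve_gene(symbol, known_symbols, max_suggestions=3):
    """Return (cleaned_symbol, suggestions).

    `suggestions` is empty when the cleaned symbol is present in
    `known_symbols`; otherwise it holds up to `max_suggestions`
    similar-looking symbols. This helper is illustrative only.
    """
    # Remove accidental leading/trailing whitespace; case is left unchanged
    cleaned = symbol.strip()
    if cleaned in known_symbols:
        return cleaned, []
    # Sort so that the result does not depend on the input ordering
    suggestions = difflib.get_close_matches(
        cleaned, sorted(known_symbols), n=max_suggestions
    )
    return cleaned, suggestions


# Hypothetical symbol list, for illustration only
known = ["FABP1", "ME1", "ME2", "PC"]
print(resolve_gene(" FABP1 ", known))
print(resolve_gene("FABP", known))
```

If the returned suggestion list is not empty, the name was not found as typed, and one of the suggestions may be the intended gene.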

### 2b. Explore a set of genes in the unified TCGA, GSE8671, and GSE20916 data
- Between the brackets, edit the human gene names within the quotes
- To use fewer genes than provided, delete the lines you do not need
- To use more genes than provided, add one line per gene, with the gene name in quotes followed by a comma outside the quotes
- Press `Shift + Enter`

In [None]:
gene_list = [
    "FABP1", # <-- edit between the quote marks here
    "ME1",
    "ME2",
    "PC", # <-- add more genes by adding a line, the gene name between quotes, and a comma after that quote
    
]

# Do not edit below this line
# ------------------------------------------------------------------------
print("Running analysis. Please wait...")
make_heatmap(gene_list, tcga_scaled, tcga_info, tcga_palette, 'TCGA (unified)')
make_heatmap(gene_list, GSE8671_scaled, GSE8671_info, GSE8671_palette, 'GSE8671')
# Reorder the GSE20916 data with sort_data before plotting
GSE20916_sorted = sort_data(
    GSE20916_scaled, GSE20916_info, ['adenoma', 'adenocarcinoma', 'normal_colon']
)
make_heatmap(gene_list, GSE20916_sorted, GSE20916_info, GSE20916_palette, 'GSE20916')
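
When a gene list is edited by hand, stray spaces, empty entries, and repeated names are easy to introduce. The sketch below shows one way such a list could be tidied before it is passed on. The helper `clean_gene_list` is hypothetical and is not part of `local_utils`; whether `make_heatmap` already tolerates such entries is not known from this notebook.

```python
def clean_gene_list(genes):
    """Strip whitespace, drop empty entries, and remove duplicates.

    The order of first appearance is preserved. Illustrative helper only.
    """
    seen = set()
    cleaned = []
    for name in genes:
        symbol = name.strip()
        # Skip blanks and anything already kept
        if symbol and symbol not in seen:
            seen.add(symbol)
            cleaned.append(symbol)
    return cleaned


# Example input with a stray space, a duplicate, and an empty entry
example = clean_gene_list(["FABP1", " ME1", "ME2", "FABP1", ""])
print(example)
```

The cleaned list could then be used in place of `gene_list` in the cell above.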