## Loading in the data

In [1]:


Moriarty_SuppTable1_name = 'Moriarty_SuppTable1.txt' # Define the file name for Moriarty_SuppTable1


tpmdata = {}  # Empty dictionary: gene name -> list of TPM values at the five time points

with open(Moriarty_SuppTable1_name) as infile:  # 'with' closes the file when the loop finishes
    for line in infile:  # Iterate through each line in the file
        if not line.strip() or line.startswith('#'):
            continue  # Skip blank lines and the '#'-prefixed header line
        fields = line.split()  # Split on whitespace (this also drops the trailing newline)

        tpmdata[fields[0]] = [float(s) for s in fields[1:6]]  # gene name -> TPM at the five time points

Moriarty_SuppTable1 = tpmdata


In [2]:
Adler_SuppTable2_name = 'Adler_SuppTable2.txt' # Define the file name for Adler_SuppTable2

tpmdata = {}  # Empty dictionary: gene name -> [synth_rate, halflife]

with open(Adler_SuppTable2_name) as infile:  # 'with' closes the file when the loop finishes
    for line in infile:
        if not line.strip() or line.startswith('#'):
            continue  # Skip blank lines and the '#'-prefixed header line
        fields = line.split()  # Split on whitespace (this also drops the trailing newline)

        tpmdata[fields[0]] = [float(s) for s in fields[1:3]]  # gene name -> [synth_rate, halflife]

Adler_SuppTable2 = tpmdata
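The two loading cells differ only in the file name and the number of numeric columns. A minimal sketch of a shared loader, assuming both files are whitespace-delimited with the gene name in the first field and `#`-prefixed header lines; `load_table` is a hypothetical name, and the demo file contents are made up.

```python
import os
import tempfile

def load_table(path, n_values):
    """Hypothetical helper: read a whitespace-delimited table into a dict.

    Skips blank lines and lines starting with '#', and maps the first field
    (gene name) to the next n_values fields converted to float.
    """
    table = {}
    with open(path) as infile:
        for line in infile:
            if not line.strip() or line.startswith('#'):
                continue
            fields = line.split()
            table[fields[0]] = [float(s) for s in fields[1:1 + n_values]]
    return table

# Demo on a small temporary file with made-up values.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('# gene synth_rate halflife\ngeneA 3.5 12.0\n\ngeneB 1.0 2.0\n')
    path = tmp.name
demo = load_table(path, 2)
os.remove(path)
print(demo)  # {'geneA': [3.5, 12.0], 'geneB': [1.0, 2.0]}
```

Under these assumptions, `load_table('Moriarty_SuppTable1.txt', 5)` and `load_table('Adler_SuppTable2.txt', 2)` would replace the two cells.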



## 1. Check that the gene names match between the two tables
### Gene names in the form of a date, such as '8-Sep', were likely gene symbols such as 'Sept8' that Excel's automatic date conversion erroneously changed to '8-Sep'; the '-Mar' and '1-Dec' names below follow the same pattern. Therefore, Excel should not be part of the preprocessing pipeline.


In [3]:
Moriarty_SuppTable_gene_names = set(Moriarty_SuppTable1.keys()) # Turn gene names from dict_keys into a set
Adler_SuppTable2_gene_names = set(Adler_SuppTable2.keys()) # Turn gene names from dict_keys into a set
Moriarty_SuppTable_gene_names.difference(Adler_SuppTable2_gene_names) # Perform the difference between the two sets to get the gene names unique to Moriarty_SuppTable1



{'1-Dec',
 '1-Mar',
 '1-Sep',
 '10-Mar',
 '10-Sep',
 '11-Mar',
 '11-Sep',
 '12-Sep',
 '14-Sep',
 '15-Sep',
 '2-Mar',
 '2-Sep',
 '3-Mar',
 '3-Sep',
 '4-Mar',
 '4-Sep',
 '5-Mar',
 '5-Sep',
 '6-Mar',
 '6-Sep',
 '7-Mar',
 '7-Sep',
 '8-Mar',
 '8-Sep',
 '9-Mar',
 '9-Sep'}
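To confirm that every mismatched name looks like an Excel date, each name can be tested against a "<day>-<Mon>" pattern. This sketch assumes Excel renders the mangled names as one or two digits, a hyphen, and a three-letter English month abbreviation; `looks_like_excel_date` is a hypothetical helper, not part of the original analysis.

```python
import re

# Assumption: Excel renders mangled gene names as "<day>-<Mon>", e.g. '8-Sep'.
EXCEL_DATE = re.compile(
    r'^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$'
)

def looks_like_excel_date(name):
    """Hypothetical helper: True if name matches the '<day>-<Mon>' pattern."""
    return EXCEL_DATE.match(name) is not None

# Toy input mixing date-like strings with ordinary gene symbols.
names = {'8-Sep', '1-Dec', '10-Mar', 'TFRC', 'Sept8'}
suspect = sorted(n for n in names if looks_like_excel_date(n))
print(suspect)  # ['1-Dec', '10-Mar', '8-Sep']
```

Applied to the set difference above, `all(looks_like_excel_date(n) for n in ...)` would return True, since every listed name has this form.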

## 2. Explore the data
### I observed that genes with the highest mRNA synthesis rates tend to have shorter half-lives, whereas genes with the longest mRNA half-lives tend to have lower mRNA synthesis rates.
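Comparing only the top five genes is a weak test of this trend. One way to check it across all genes is a rank correlation between synthesis rate and half-life, where a negative value would support the observation. The sketch below is a plain-Python Spearman correlation (Pearson correlation of average ranks) with no external libraries; the toy table is hypothetical and only mimics the `[synth_rate, halflife]` layout of `Adler_SuppTable2`.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical table in the Adler_SuppTable2 layout: gene -> [synth_rate, halflife]
toy = {'geneA': [10.0, 1.0], 'geneB': [5.0, 2.0], 'geneC': [1.0, 8.0], 'geneD': [4.0, 2.0]}
synth = [v[0] for v in toy.values()]
halflife = [v[1] for v in toy.values()]
rho = spearman(synth, halflife)
print(round(rho, 3))  # -0.949
```

With the real data, `spearman([v[0] for v in Adler_SuppTable2.values()], [v[1] for v in Adler_SuppTable2.values()])` would give the correlation over all genes.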


### 2.1 Output the five genes with the highest mRNA synthesis rate. 

In [4]:
"""
Using the sorted() python built in function
pass in the dictionary Adler_SuppTable2 as the iterable, and the lambda function passed into the 'key' argument tells the function to sort
based on synth_rate. Subset only the first five.
"""
top_five_synth_rate = sorted(Adler_SuppTable2.items(), key=lambda gene: gene[1][0],reverse=True)[0:5] 

print('Genes with top five synthesis rate:')
for gene in top_five_synth_rate: 
    print(gene[0])



Genes with top five synthesis rate:
CCDC169-SOHLH2
DDX60L
LRRK1
SLC25A45
FARP1
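Sections 2.1 to 2.3 repeat the same sort-then-slice pattern with a different column index. A sketch of a shared helper, assuming the dictionary-of-lists layout used in this notebook, could use the standard library's `heapq.nlargest`, which is documented as equivalent to `sorted(..., reverse=True)[:n]` but avoids sorting the whole table; `top_genes` is a hypothetical name.

```python
import heapq

def top_genes(table, column, n=5):
    """Return the names of the n genes with the largest value in `column`.

    `table` maps gene name -> list of numbers, as in Adler_SuppTable2.
    """
    best = heapq.nlargest(n, table.items(), key=lambda item: item[1][column])
    return [name for name, _ in best]

# Hypothetical toy data in the [synth_rate, halflife] layout.
toy = {'g1': [3.0, 9.0], 'g2': [7.0, 1.0], 'g3': [5.0, 4.0], 'g4': [1.0, 6.0]}
print(top_genes(toy, column=0, n=2))  # ['g2', 'g3']
print(top_genes(toy, column=1, n=2))  # ['g1', 'g4']
```

Under these assumptions, `top_genes(Adler_SuppTable2, 0)` matches cell [4] and `top_genes(Adler_SuppTable2, 1)` matches cell [5].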


### 2.2 Output the five genes with the longest mRNA half-life (from Adler_SuppTable2).

In [5]:
"""
Using the sorted() python built in function
pass in the dictionary Adler_SuppTable2 as the iterable, and the lambda function passed into the 'key' argument tells the function to sort
based on halflife. Subset only the first five.
"""
top_five_halflife = sorted(Adler_SuppTable2.items(), key=lambda gene: gene[1][1],reverse=True)[0:5] 

print('Genes with top five half life:')
for gene in top_five_halflife:
    print(gene[0])




Genes with top five half life:
TFRC
SPINK8
DIRC1
PLA1A
SAMSN1


### 2.3 Output the five genes that have the highest ratio of expression at t=96 hours post-mortem vs. t=0 (from Moriarty_SuppTable1)


In [6]:
# First iterate through the dictionary and append the t=96 / t=0 expression ratio
# (index 4 over index 0) to each gene's list; it becomes index 5.
# Note: re-running this cell appends the ratio a second time.
for gene in Moriarty_SuppTable1.keys():
    Moriarty_SuppTable1[gene].append(Moriarty_SuppTable1[gene][4] / Moriarty_SuppTable1[gene][0])

# Use sorted() again to get the five genes with the highest ratio of expression
top_five_ratio = sorted(Moriarty_SuppTable1.items(), key=lambda gene: gene[1][5], reverse=True)[:5]

print('Genes with top five highest ratio of expression at t=96 vs t=0:')
for gene in top_five_ratio:  # Print the names of the five genes with the highest ratio
    print(gene[0])
   


Genes with top five highest ratio of expression at t=96 vs t=0:
TFRC
SPINK8
DIRC1
PLA1A
RSPRY1
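The overlap between this list and the half-life list from section 2.2 can be counted with a set intersection. In this sketch the two lists are stubbed with the gene names printed by cells [5] and [6]; in the notebook they would come from `top_five_halflife` and `top_five_ratio`.

```python
# Gene names copied from the printed outputs of cells [5] and [6].
halflife_names = ['TFRC', 'SPINK8', 'DIRC1', 'PLA1A', 'SAMSN1']
ratio_names = ['TFRC', 'SPINK8', 'DIRC1', 'PLA1A', 'RSPRY1']

shared = set(halflife_names) & set(ratio_names)
print(len(shared), sorted(shared))  # 4 ['DIRC1', 'PLA1A', 'SPINK8', 'TFRC']
```

In the notebook itself, `{g for g, _ in top_five_halflife} & {g for g, _ in top_five_ratio}` computes the same set.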


## 3. Figure out what happened
### I observed that four of the five genes with the highest ratio of expression at t=96 hours post-mortem vs. t=0 are also among the five genes with the longest mRNA half-life. This suggests that the large number of genes that appear upregulated after the mouse dies in the Moriarty experiment reflects mRNA stability rather than new transcription: TPM is a relative measure, so once transcription stops, transcripts with long half-lives decay more slowly than the rest, make up a growing fraction of the remaining mRNA, and therefore show rising TPM hours after death. Therefore, until further evidence is provided, we can reject the author's hypothesis that an ancient program of cortical gene expression causes the sand mouse's life to flash before its eyes.
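A small worked example of this mechanism, under the assumption that transcription stops completely at death and each mRNA then decays exponentially with its half-life. The two genes and all their numbers are hypothetical; the point is that both transcripts lose molecules, yet the stable one's TPM rises because TPM is renormalized to sum to one million.

```python
def tpm_after_death(abundance, halflife, t):
    """Hypothetical model: no new synthesis after t=0, exponential decay.

    abundance: gene -> mRNA amount at death (arbitrary units)
    halflife:  gene -> mRNA half-life in hours
    Returns gene -> TPM at time t (hours), i.e. renormalized to sum to 1e6.
    """
    remaining = {g: a * 0.5 ** (t / halflife[g]) for g, a in abundance.items()}
    total = sum(remaining.values())
    return {g: 1e6 * r / total for g, r in remaining.items()}

abundance = {'stable': 100.0, 'unstable': 900.0}
halflife = {'stable': 48.0, 'unstable': 4.0}

for t in (0, 12, 96):
    tpm = tpm_after_death(abundance, halflife, t)
    print(t, round(tpm['stable']), round(tpm['unstable']))
```

In this toy model the stable transcript's absolute amount falls to a quarter by t=96, but its TPM climbs from 100,000 to nearly 1,000,000.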

In [7]:
gene_names_unique_to_Moriarty = Moriarty_SuppTable_gene_names.difference(Adler_SuppTable2_gene_names) # Gene names unique to Moriarty (set difference)

# Loop through each gene name in Moriarty_SuppTable1. If the gene is also in Adler_SuppTable2,
# append the ratio of each later time point to t=0 (indices 6-9, assuming cell [6] ran once),
# then synth_rate (index 10) and halflife (index 11) from Adler_SuppTable2.
for gene in Moriarty_SuppTable1.keys():
    if gene not in gene_names_unique_to_Moriarty:
        values = Moriarty_SuppTable1[gene]  # Same list object, so appends update the dictionary
        for i in range(1, 5):
            values.append(values[i] / values[0])  # Ratio of time point i to t=0
        values.append(Adler_SuppTable2[gene][0])  # synth_rate
        values.append(Adler_SuppTable2[gene][1])  # halflife
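Appending to the lists in place means that re-running cells [6] or [7] silently adds duplicate columns and shifts every index used later. A sketch of an alternative that builds a fresh merged dictionary containing only genes present in both tables, assuming the five-time-point layout of `Moriarty_SuppTable1` and the `[synth_rate, halflife]` layout of `Adler_SuppTable2`; `merge_tables` and the toy inputs are hypothetical.

```python
def merge_tables(moriarty, adler):
    """Return gene -> [ratio_t1, ratio_t2, ratio_t3, ratio_t4, synth_rate, halflife].

    moriarty: gene -> TPM at five time points (index 0 is t=0)
    adler:    gene -> [synth_rate, halflife]
    Only genes present in both tables are kept; the inputs are not modified.
    """
    merged = {}
    for gene in moriarty.keys() & adler.keys():
        tpm = moriarty[gene]
        ratios = [tpm[i] / tpm[0] for i in range(1, 5)]  # assumes tpm[0] != 0
        merged[gene] = ratios + list(adler[gene][:2])
    return merged

# Hypothetical toy inputs, including a date-mangled name that will not match.
moriarty = {'geneA': [2.0, 4.0, 6.0, 8.0, 10.0], '8-Sep': [1.0, 1.0, 1.0, 1.0, 1.0]}
adler = {'geneA': [3.5, 12.0], 'Sept8': [1.0, 2.0]}
print(merge_tables(moriarty, adler))
# {'geneA': [2.0, 3.0, 4.0, 5.0, 3.5, 12.0]}
```

With this structure, the columns written to the output file would be `value[0]` through `value[5]`, and re-running the cell gives the same result.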


In [10]:
# Iterate through each gene in the updated Moriarty_SuppTable1. If the gene is not unique to
# Moriarty_SuppTable1, write its name, the four expression ratios (indices 6-9), synth_rate (10)
# and halflife (11) to one whitespace-delimited line.
with open('merged_table.txt', 'w') as output:  # 'with' closes the file even if an error occurs
    for key, value in Moriarty_SuppTable1.items():
        if key not in gene_names_unique_to_Moriarty:
            output.write("{0:15s} {1:4.2f} {2:4.2f} {3:4.2f} {4:4.2f} {5:4.2f} {6:4.2f}\n".format(
                key, value[6], value[7], value[8], value[9], value[10], value[11]))
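To make the column meaning explicit for anyone reading the file later, Python's standard `csv` module can write a tab-delimited table with a header row. This sketch assumes a merged dictionary in the six-column layout written above (four expression ratios, synthesis rate, half-life); `write_merged` and the header names are illustrative, not taken from the source tables.

```python
import csv
import io

def write_merged(merged, handle):
    """Write gene -> six numbers as tab-delimited rows with a header line."""
    writer = csv.writer(handle, delimiter='\t', lineterminator='\n')
    # Illustrative column names; ratio_tN is expression at time point N over t=0.
    writer.writerow(['gene', 'ratio_t1', 'ratio_t2', 'ratio_t3', 'ratio_t4',
                     'synth_rate', 'halflife'])
    for gene in sorted(merged):
        writer.writerow([gene] + [f'{x:.2f}' for x in merged[gene]])

# Write to an in-memory buffer here; in the notebook, a handle from
# open('merged_table.txt', 'w', newline='') would be used instead.
buffer = io.StringIO()
write_merged({'geneA': [2.0, 3.0, 4.0, 5.0, 3.5, 12.0]}, buffer)
print(buffer.getvalue())
```

Sorting the gene names also makes the output file identical across runs, which simplifies comparing versions of the merged table.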