# Organizing fold-changes of differentially expressed elements in a heatmap-formatted dataframe

## Prerequisite
* diffexpanalysis module

## Purpose
Here, we show the user how to organize their differentially expressed elements into a heatmap format for further analysis. In addition, we split the fold-changes by replicate for each comparison. This module assumes that the user already has dataframe files (or Excel files) containing the differentially expressed elements.

First, we import the necessary packages for this module.

In [36]:
import pandas as pd
import numpy as np

## Trial 1
$$\Delta cheA4\ versus\ WT\ (SP7)$$
$$\Delta cheA1\Delta cheA4\ versus\ WT\ (SP7)$$
$$\Delta cheA1\Delta cheA4\ versus\ \Delta cheA4$$
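
For reference, the per-replicate quantity computed for each of these comparisons can be written as below. The notation is introduced here for illustration only: $x_{i,r}^{S}$ stands for the normalized abundance of protein $i$ in biological replicate $r$ of strain $S$, and $A$ versus $B$ is any one of the comparisons listed above.

$$\log_2 FC_{i,r}^{A/B} = \log_2\left(\frac{x_{i,r}^{A}}{x_{i,r}^{B}}\right),\quad r = 1, 2, 3$$

Replicates are paired by number, so replicate 1 of the numerator strain is divided by replicate 1 of the denominator strain, and so on.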

In [37]:
data = pd.read_excel('cleaned_trial1_analyzed.xlsx')
data.head()

Unnamed: 0.1,Unnamed: 0,Accession,Gene function,"Abundances (Normalized): F7: Sample, Bio Rep1, SP7","Abundances (Normalized): F8: Sample, Bio Rep2, SP7","Abundances (Normalized): F9: Sample, Bio Rep3, SP7","Abundances (Normalized): F4: Sample, Bio Rep1, CheA4","Abundances (Normalized): F5: Sample, Bio Rep2, CheA4","Abundances (Normalized): F6: Sample, Bio Rep3, CheA4","Abundances (Normalized): F1: Sample, Bio Rep1, CheA1CheA4","Abundances (Normalized): F2: Sample, Bio Rep2, CheA1CheA4","Abundances (Normalized): F3: Sample, Bio Rep3, CheA1CheA4",p_correct_A4vssp7,p_correct_A1A4vssp7,p_correct_A1A4vsA4,log2_A4/sp7,log2_A1A4/sp7,log2_A1A4/A4
0,0,A0A0P0F6W5,Uncharacterized protein,11553830.0,6643362.0,13026900.0,34197220.0,37878350.0,36943400.0,15995740.0,11215370.0,20378440.0,0.005841,0.303602,0.023782,1.803848,0.607985,-1.195863
1,1,A0A0N7I7H6,Uncharacterized protein,2349336000.0,2385109000.0,2635075000.0,657277462.9,695382200.0,645693400.0,2195988000.0,2295658000.0,2148146000.0,0.009008,0.213539,0.00339,-1.882759,-0.150433,1.732327
2,2,A0A0P0FD78,Peptide ABC transporter substrate-binding protein,385277000.0,388884000.0,177702100.0,600323064.0,494605300.0,488110000.0,162910100.0,137931700.0,187121000.0,0.101209,0.276157,0.016554,0.73387,-0.963983,-1.697853
3,3,A0A0P0EW12,DNA helicase,2759189000.0,2906706000.0,2672235000.0,222301899.3,253353000.0,193287200.0,2600452000.0,3042813000.0,764031400.0,0.003224,0.553611,0.15411,-3.639771,-0.380008,3.259763
4,4,A0A0P0F5R5,Glutathione S-transferase,2089823000.0,2056875000.0,4708423000.0,655938818.3,798519700.0,780025300.0,9891769000.0,8797010000.0,11740200000.0,0.153216,0.047262,0.023764,-1.98657,1.780862,3.767432


In [38]:
# Column labels for the three biological replicates of each strain
sp7_index = data.columns[3:6]     # WT (SP7)
a4_index = data.columns[6:9]      # ΔcheA4
a1a4_index = data.columns[9:12]   # ΔcheA1ΔcheA4

# Row labels: accession followed by gene function
name = data['Accession'] + ' ' + data['Gene function']

# Start the heatmap dataframe with the protein-name column
heatmap_df = pd.DataFrame({'protein name': name})

# Per-replicate log2 fold-change for each comparison (numerator, denominator),
# pairing replicate 1 with replicate 1, and so on
comparisons = {'A4/sp7': (a4_index, sp7_index),
               'A1A4/sp7': (a1a4_index, sp7_index),
               'A1A4/A4': (a1a4_index, a4_index)}
for label, (num, den) in comparisons.items():
    for rep in range(3):
        heatmap_df[f'{label}_rep{rep + 1}'] = np.log2(data[num[rep]] / data[den[rep]])

heatmap_df.head()


Unnamed: 0,protein name,A4/sp7_rep1,A4/sp7_rep2,A4/sp7_rep3,A1A4/sp7_rep1,A1A4/sp7_rep2,A1A4/sp7_rep3,A1A4/A4_rep1,A1A4/A4_rep2,A1A4/A4_rep3
0,A0A0P0F6W5 Uncharacterized protein,1.565508,2.511388,1.503823,0.469317,0.755492,0.64555,-1.096192,-1.755896,-0.858273
1,A0A0N7I7H6 Uncharacterized protein,-1.837679,-1.778177,-2.028923,-0.097383,-0.055148,-0.294752,1.740296,1.72303,1.734171
2,A0A0P0FD78 Peptide ABC transporter substrate-b...,0.639843,0.346938,1.457745,-1.24182,-1.495386,0.074511,-1.881663,-1.842324,-1.383235
3,A0A0P0EW12 DNA helicase,-3.633652,-3.520165,-3.789229,-0.085482,0.06602,-1.806343,3.54817,3.586185,1.982886
4,A0A0P0F5R5 Glutathione S-transferase,-1.671748,-1.365054,-2.593651,2.242848,2.096559,1.318141,3.914595,3.461613,3.911792
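
Note that the three replicate columns do not simply average to the summary column `log2_A4/sp7` in the input file. The sketch below uses the first protein's abundances, copied from the `data.head()` output above, to compare two ways of collapsing replicates. The summary value 1.803848 appears to match the log2 of the ratio of mean abundances rather than the mean of the per-replicate log2 values; how the upstream module actually computed it is an assumption inferred from these numbers.

```python
import numpy as np

# Normalized abundances for accession A0A0P0F6W5, copied from data.head() above
sp7 = np.array([11553830.0, 6643362.0, 13026900.0])   # WT (SP7), Bio Rep1-3
a4 = np.array([34197220.0, 37878350.0, 36943400.0])   # CheA4, Bio Rep1-3

# Per-replicate log2 fold-changes, as stored in heatmap_df
per_replicate = np.log2(a4 / sp7)

# Two ways to collapse the replicates into a single number
mean_of_logs = per_replicate.mean()               # average of the heatmap columns
log_of_means = np.log2(a4.mean() / sp7.mean())    # log2 of the ratio of mean abundances

print(per_replicate.round(6))
print(round(mean_of_logs, 6), round(log_of_means, 6))
```

The two summaries differ because the logarithm of a mean is not the mean of the logarithms, so the heatmap columns should be read as per-replicate values rather than as a decomposition of the summary column.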


## Trial 2
$$\Delta cheA1\ versus\ WT\ (SP7)$$
$$\Delta cheA1(pBBRTMX)\ versus\ WT\ (SP7)$$
$$\Delta cheA1\ versus\ \Delta cheA1(pBBRTMX)$$

In [39]:
data = pd.read_excel('cleaned_trial2_analyzed.xlsx')
data.head()

Unnamed: 0.1,Unnamed: 0,CheA1_pBBR_TMX_Rep01,CheA1_pBBR_TMX_Rep02,CheA1_pBBR_TMX_Rep03,CheA1_Rep01,CheA1_Rep02,CheA1_Rep03,sp7_Rep01,sp7_Rep02,sp7_Rep03,Accession,Gene function,p_correct_A1vssp7,p_correct_A1TMXvssp7,p_correct_A1vsA1TMX,log2_A1/sp7,log2_A1TMX/sp7,log2_A1/A1TMX
0,0,29.682458,28.780629,28.815896,30.209004,29.42238,30.518898,30.092997,30.733791,30.465057,A0A060DBW2,Acyl carrier protein (ACP),0.381844,0.025316,0.095875,-0.380521,-1.337621,0.9571
1,1,26.588165,26.235722,26.671305,28.03411,26.608943,27.684494,28.101245,27.734091,27.279743,A0A060DEF4,Phosphatidylserine decarboxylase proenzyme (EC...,0.628135,0.019331,0.149397,-0.262511,-1.206629,0.944118
2,2,27.14637,27.508063,27.661259,28.390654,27.960768,29.168877,28.979082,29.503765,30.593822,A0A060DFR0,MucR family transcriptional regulator (Transcr...,0.121887,0.032033,0.077444,-1.185457,-2.253659,1.068202
3,3,26.829908,24.609909,24.092496,24.461508,23.922084,26.188228,28.217229,28.071609,27.76966,A0A060DG81,Urease subunit gamma (EC 3.5.1.5) (Urea amidoh...,0.039455,0.073926,0.782729,-3.162226,-2.842062,-0.320164
4,4,31.203365,31.392614,30.631203,30.528273,29.868579,30.030057,30.056165,29.67491,29.384763,A0A060DGY9,Protein HflC,0.190911,0.010952,0.03789,0.437023,1.370448,-0.933425
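
The abundance values in this file (roughly 24 to 31) are much smaller than the raw abundances in Trial 1, so it is worth checking which scale they are on before computing fold-changes. The sketch below uses the first protein's values, copied from the `data.head()` output above, and compares the reported `log2_A1/sp7` with a difference of replicate means and with a log2 of their ratio. The interpretation is an inference from these numbers, not something stated by the upstream module.

```python
import numpy as np

# Abundances for accession A0A060DBW2, copied from data.head() above
a1 = np.array([30.209004, 29.42238, 30.518898])    # CheA1_Rep01-03
sp7 = np.array([30.092997, 30.733791, 30.465057])  # sp7_Rep01-03
reported = -0.380521                                # log2_A1/sp7 from the file

diff_of_means = a1.mean() - sp7.mean()          # appropriate if values are already log2
log_of_ratio = np.log2(a1.mean() / sp7.mean())  # appropriate if values are raw abundances

print(round(diff_of_means, 6), round(log_of_ratio, 6))
print(np.isclose(diff_of_means, reported, atol=1e-5))
```

The difference of means reproduces the reported value, which suggests the abundances are already on a log2 scale. Under that assumption, a log2 fold-change between two samples is the difference of their values rather than the log2 of their ratio.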


In [40]:
# Column labels for the three biological replicates of each strain
a1TMX_index = data.columns[1:4]   # ΔcheA1(pBBRTMX)
a1_index = data.columns[4:7]      # ΔcheA1
sp7_index = data.columns[7:10]    # WT (SP7)

# Row labels: accession followed by gene function
name = data['Accession'] + ' ' + data['Gene function']

# Start the heatmap dataframe with the protein-name column
heatmap_df = pd.DataFrame({'protein name': name})

# The Trial 2 abundances are already log2-transformed: the precomputed
# log2_A1/sp7 column equals mean(CheA1) - mean(sp7). The per-replicate log2
# fold-change is therefore a difference, not np.log2 of a ratio.
comparisons = {'A1/sp7': (a1_index, sp7_index),
               'A1pbbrTMX/sp7': (a1TMX_index, sp7_index),
               'A1/A1pbbrTMX': (a1_index, a1TMX_index)}
for label, (num, den) in comparisons.items():
    for rep in range(3):
        heatmap_df[f'{label}_rep{rep + 1}'] = data[num[rep]] - data[den[rep]]

heatmap_df.head()

Unnamed: 0,protein name,A1/sp7_rep1,A1/sp7_rep2,A1/sp7_rep3,A1pbbrTMX/sp7_rep1,A1pbbrTMX/sp7_rep2,A1pbbrTMX/sp7_rep3,A1/A1pbbrTMX_rep1,A1/A1pbbrTMX_rep2,A1/A1pbbrTMX_rep3
0,A0A060DBW2 Acyl carrier protein (ACP),0.116007,-1.311411,0.053841,-0.410539,-1.953162,-1.649161,0.526546,0.641751,1.703002
1,A0A060DEF4 Phosphatidylserine decarboxylase pr...,-0.067135,-1.125148,0.404751,-1.51308,-1.498369,-0.608438,1.445945,0.373221,1.013189
2,A0A060DFR0 MucR family transcriptional regulat...,-0.588428,-1.542997,-1.424945,-1.832712,-1.995702,-2.932563,1.244284,0.452705,1.507618
3,A0A060DG81 Urease subunit gamma (EC 3.5.1.5) (...,-3.755721,-4.149525,-1.581432,-1.387321,-3.4617,-3.677164,-2.3684,-0.687825,2.095732
4,A0A060DGY9 Protein HflC,0.472108,0.193669,0.645294,1.1472,1.717704,1.24644,-0.675092,-1.524035,-0.601146
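
Both trials repeat the same steps: pick the replicate columns for each strain, build the protein labels, and compute one fold-change column per replicate and comparison. A minimal sketch of how this could be wrapped in a reusable function is shown below. The function name `replicate_fold_changes`, its parameters, and the toy data are hypothetical and not part of the original module; the `already_log2` flag is an assumption for inputs whose abundances are stored on a log2 scale.

```python
import numpy as np
import pandas as pd


def replicate_fold_changes(data, groups, comparisons, already_log2=False):
    """Build a heatmap-style dataframe of per-replicate log2 fold-changes.

    groups: dict mapping a strain label to its list of replicate column names.
    comparisons: list of (numerator, denominator) strain labels.
    already_log2: set to True if abundances are stored on a log2 scale.
    """
    out = pd.DataFrame({'protein name': data['Accession'] + ' ' + data['Gene function']})
    for num, den in comparisons:
        # Pair replicate 1 with replicate 1, replicate 2 with replicate 2, ...
        for rep, (n_col, d_col) in enumerate(zip(groups[num], groups[den]), start=1):
            if already_log2:
                fc = data[n_col] - data[d_col]
            else:
                fc = np.log2(data[n_col] / data[d_col])
            out[f'{num}/{den}_rep{rep}'] = fc
    return out


# Toy input with made-up numbers (not taken from either trial)
toy = pd.DataFrame({
    'Accession': ['P1', 'P2'],
    'Gene function': ['Protein one', 'Protein two'],
    'wt_1': [8.0, 4.0], 'wt_2': [16.0, 2.0],
    'mut_1': [16.0, 1.0], 'mut_2': [16.0, 4.0],
})
groups = {'wt': ['wt_1', 'wt_2'], 'mut': ['mut_1', 'mut_2']}
result = replicate_fold_changes(toy, groups, [('mut', 'wt')])
print(result)
```

Passing the column names explicitly, instead of positional ranges such as `data.columns[3:6]`, keeps the function from silently picking the wrong columns if the layout of the input file changes.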
