# Growth rate analysis with Bioscreen or any 96-well plate

This notebook contains a workflow to prepare and analyse microbial growth-rate results from Bioscreen Honeycomb plates or any 96-well plate.

### Steps:
1. Install requirements.
2. Import raw data.
3. Prepare raw data for visualisation (numpy, pandas).
4. Visualisation of growth curves.
5. Use [pyphe-growthcurves made by Stephan Kamrad from Bahler Lab](https://github.com/Bahler-Lab/pyphe-growthcurves) to determine the maximal slope of the growth curves and the duration of the microbial lag phase.
6. Visualisation of max_slope and lag phase as box plots (matplotlib, seaborn).



### Requirements: 
    . numpy >= 1.8.0
    . scipy >= 0.17.0
    . pysam >= 0.8
    . matplotlib >= 1.4.0
    . seaborn
    . pytime
    . parsedatetime
    . pytimeparse

### Install requirements:

In [None]:
!pip install -r requirements.txt

In [None]:
# Install the other requirements with pip install
!pip install parsedatetime

In [None]:
!pip install pytimeparse

In [None]:
!pip install pytime

### Import libraries and modules:

In [3]:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import datetime as DT
from datetime import timedelta
import parsedatetime
from pytimeparse import parse

%matplotlib inline

### Import metadata file with layout of the plate(s):

Prepare the layout in a separate file that contains only the sample names, arranged in rows and columns as below:

In [25]:
metadata = pd.read_csv("Metadata_layout.csv", sep = ',')

## Look at the head of the DataFrame (first 5 rows only):
# metadata.head() 

### Import the file with layout prepared beforehand:

In [26]:
metadata = pd.read_csv("Metadata_example_results_dots.csv", sep = ',')

### This step will make one column with the names of the samples from all columns in the layout:

In [27]:
column_names = pd.concat([metadata[col_name] for col_name in metadata.columns],ignore_index=True)
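
To illustrate what this concatenation does, here is a minimal sketch with a made-up 2 x 3 layout. The sample names and the `layout` DataFrame are placeholders, not the real plate layout. The layout columns are stacked one after another, so the resulting order follows the plate column by column.

```python
import pandas as pd

# Hypothetical 2-row x 3-column layout; the names are placeholders
layout = pd.DataFrame({
    "1": ["WT.1", "WT.2"],
    "2": ["CID1.1", "CID1.2"],
    "3": ["Blank.1", "Blank.2"],
})

# Stack the layout columns into one Series, column by column
flat_names = pd.concat([layout[col] for col in layout.columns], ignore_index=True)
print(list(flat_names))
```

If the plate reader exports wells row by row instead, the layout would probably need to be transposed first so that the order of the names matches the order of the data columns.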

### Import the raw results and replace the column headers (the first row) with the names of the samples:

In [28]:
## Import raw data as raw_results_d:
raw_results_d = pd.read_csv("raw_results_example.csv", sep=';')

## Make a list from your column names:
lista_nowych_nazw = list(column_names)

## Add "time" at the start of the list:
lista_nowych_nazw = ["time"] + lista_nowych_nazw

## Use the code below if you want to change the name of a sample (here the 3rd sample):
# lista_nowych_nazw[3] = "Other name"

## Use the prepared list of names as the column headers of the DataFrame with raw results:
raw_results_d.columns = lista_nowych_nazw

## Check the result if you want:
# raw_results_d.head()
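
Assigning a list to `.columns` fails with a `ValueError` when the number of names does not match the number of columns, which can happen if the layout file and the raw results cover different numbers of wells. The sketch below shows one way to check this up front. The helper `rename_columns_checked` and the tiny `raw` table are hypothetical and only serve as an illustration.

```python
import pandas as pd

def rename_columns_checked(df, sample_names):
    """Return a copy of df whose columns are 'time' followed by sample_names."""
    new_names = ["time"] + list(sample_names)
    if len(new_names) != df.shape[1]:
        raise ValueError(
            "Layout gives {} names but the results have {} columns".format(
                len(new_names), df.shape[1]))
    renamed = df.copy()
    renamed.columns = new_names
    return renamed

# Made-up raw results: one time column and two wells
raw = pd.DataFrame({"a": ["00:00:05"], "b": [0.176], "c": [0.165]})
renamed = rename_columns_checked(raw, ["WT.1", "WT.2"])
print(list(renamed.columns))
```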

### Prepare the data to obtain a DataFrame with time in the desired format:

In [48]:
## Drop the "time" column; a new column with time in hours (instead of '%H:%M:%S') is added below:
raw_results_bezczas = raw_results_d.drop(['time'], axis=1)

## Work on a copy so that adding the "Time" column does not modify raw_results_bezczas:
indexed = raw_results_bezczas.copy()

## Use the parse function to convert each '%H:%M:%S' string to seconds, then divide to get hours.
## Building a new Series avoids assigning into a slice of raw_results_d:
column_times = raw_results_d['time'].map(parse) / 3600.0

## Add the column with time in hours to the DataFrame:
indexed['Time'] = column_times

## Set the "Time" column as the index:
indexed_plus_time = indexed.set_index('Time')

## Export the new DataFrame to a tab-separated .csv file:
indexed_plus_time.to_csv("probna_tabelka.csv", sep='\t')

indexed_plus_time.head()


| Time | WT.1 | WT.2 | CID1.1 | CID1.2 | CID11.1 | CID11.2 | CID13.1 | CID13.2 | CID16.1 | CID16.2 | ... | WT.etOH.1 | WT.etOH.2 | CID1.etOH.1 | CID1.etOH.2 | CID11.etOH.1 | CID11.etOH.2 | CID13.etOH.1 | CID13.etOH.2 | CID16.etOH.1 | CID16.etOH.2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.001389 | 0.176 | 0.165 | 0.055 | 0.103 | 0.078 | 0.147 | 0.099 | 0.122 | 0.12 | 0.061 | ... | 0.144 | 0.165 | 0.146 | 0.114 | 0.175 | 0.138 | 0.15 | 0.171 | 0.191 | 0.108 |
| 0.192778 | 0.086 | 0.094 | 0.15 | 0.123 | 0.118 | 0.08 | 0.051 | 0.056 | 0.061 | 0.102 | ... | 0.093 | 0.134 | 0.143 | 0.149 | 0.097 | 0.142 | 0.122 | 0.129 | 0.143 | 0.164 |
| 0.441667 | 0.102 | 0.108 | 0.18 | 0.133 | 0.134 | 0.093 | 0.065 | 0.064 | 0.069 | 0.122 | ... | 0.106 | 0.117 | 0.136 | 0.131 | 0.092 | 0.128 | 0.117 | 0.118 | 0.139 | 0.154 |
| 0.691389 | 0.112 | 0.107 | 0.186 | 0.137 | 0.132 | 0.093 | 0.065 | 0.066 | 0.071 | 0.116 | ... | 0.12 | 0.141 | 0.148 | 0.153 | 0.103 | 0.147 | 0.129 | 0.128 | 0.157 | 0.157 |
| 0.940556 | 0.102 | 0.095 | 0.145 | 0.115 | 0.109 | 0.082 | 0.048 | 0.053 | 0.058 | 0.104 | ... | 0.121 | 0.125 | 0.144 | 0.142 | 0.1 | 0.139 | 0.127 | 0.12 | 0.148 | 0.148 |
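
The conversion from `%H:%M:%S` strings to hours can also be sketched with pandas alone, as an alternative to `pytimeparse`. The timestamps below are example values, and `pd.to_timedelta` is swapped in here purely for illustration; it is not what the cell above uses.

```python
import pandas as pd

# Example timestamps in the '%H:%M:%S' format used by the raw export
times = pd.Series(["00:00:05", "00:11:34", "00:26:30"])

# Parse to timedeltas, take the total number of seconds, convert to hours
hours = pd.to_timedelta(times).dt.total_seconds() / 3600.0
print(hours.round(6).tolist())
```

Because the whole column is converted in one vectorised step, no element-wise assignment into the original DataFrame is needed.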


## THE PLOTS

In [None]:
krzywe_Time = pd.read_csv("probna_tabelka.csv", sep = "\t")

#krzywe_Time.set_index("Time")

#column_names=krzywe_Time.columns


In [None]:
krzywe_Time.head()

In [None]:
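def przeszukaj(nazwa, column_names):
    """Count how many entries of column_names contain nazwa as a substring.

    nazwa (str): sample name to look for
    column_names (iterable of str): all sample names
    """
    ile_kopi = sum([nazwa in column for column in column_names])
    return ile_kopi


## Keep only the names that match more than one column, i.e. names shared by several replicates:
nazwy_orinalow = [orginal for orginal in column_names if przeszukaj(orginal, column_names) > 1]
print(nazwy_orinalow)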
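
One caveat of substring matching is that a short name also matches longer names that merely start with it. The sketch below uses a made-up list of names to show the effect and one possible stricter rule; the functions `count_loose` and `count_strict` are hypothetical and not part of the notebook.

```python
# Made-up sample names; "CID1" is a prefix of "CID11"
names = ["WT", "WT.1", "CID1", "CID1.1", "CID11", "CID11.1", "Blank"]

def count_loose(name, all_names):
    """Substring match, as in przeszukaj."""
    return sum(name in other for other in all_names)

def count_strict(name, all_names):
    """Match only the name itself or the name followed by a dot suffix."""
    return sum(other == name or other.startswith(name + ".") for other in all_names)

loose_counts = {name: count_loose(name, names) for name in names}
strict_counts = {name: count_strict(name, names) for name in names}

print(loose_counts["CID1"], strict_counts["CID1"])
```

With the loose rule, "CID1" also counts the "CID11" columns, so its replicates would be pooled with those of another sample. Whether that matters depends on how the samples are named in the layout file.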

In [None]:
column_names= krzywe_Time.columns
time=krzywe_Time["Time"]

In [None]:
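def wyciagnij_wszystkie_powtorzenia(nazwa_orginalu, column_names=column_names):
    """Collect all columns whose name contains nazwa_orginalu into one long DataFrame."""
    ## OD values of every matching column, stacked end to end:
    kolumny=np.concatenate([np.array(krzywe_Time[name]) for name in column_names if nazwa_orginalu in name])

    ## The time column, repeated once per matching column:
    czasy=np.concatenate([np.array(time) for name in column_names if nazwa_orginalu in name])

    zebrana_dataframe= pd.DataFrame(data={'Time': czasy, "OD":kolumny})
    return zebrana_dataframe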
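
The same wide-to-long reshaping can be sketched with `DataFrame.melt`, shown here as an alternative to the `np.concatenate` approach. The small `wide` table and its values are made up for illustration.

```python
import pandas as pd

# Made-up wide table: one time column, two replicate columns and a blank
wide = pd.DataFrame({
    "Time": [0.0, 0.5, 1.0],
    "WT": [0.10, 0.12, 0.15],
    "WT.1": [0.11, 0.13, 0.16],
    "Blank": [0.05, 0.05, 0.05],
})

# Select the replicate columns by substring, as the function above does
replicate_cols = [c for c in wide.columns if c != "Time" and "WT" in c]

# Stack them into long format: one row per (time point, replicate)
long_df = wide.melt(id_vars="Time", value_vars=replicate_cols,
                    var_name="replicate", value_name="OD")
print(long_df)
```

Unlike the concatenation above, `melt` keeps the name of the source column for each row, which can help when tracing an outlier back to a well.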

In [None]:
proba=wyciagnij_wszystkie_powtorzenia(nazwy_orinalow[1])
proba.head()

In [None]:
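## Strain labels that can occur in the sample names:
mozliwe_szczepy=['WT', 'CID','Blank']

def zrob_nowe_wiersze(column_name):
    """Return the long-format replicates of column_name with "strain" and "name" columns added."""
    proba=wyciagnij_wszystkie_powtorzenia(column_name)

    ## Part of the sample name before the first dot:
    przed_p_kropka=column_name.split(".")[0]

    ## First strain label found in that part (raises StopIteration if none matches):
    proba["strain"]=next(szczep for szczep in mozliwe_szczepy if szczep in przed_p_kropka)
    proba["name"]=column_name
    return proba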

lista_nowy_df=[zrob_nowe_wiersze(column_name) for column_name in nazwy_orinalow]

splaszczony_df=pd.concat(lista_nowy_df)
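
The strain lookup relies on `next()` over a generator, which raises `StopIteration` when a sample name contains none of the listed strains. A minimal sketch of the same lookup with a fallback value is given below; the function `strain_of` and the `"unknown"` label are hypothetical.

```python
possible_strains = ["WT", "CID", "Blank"]

def strain_of(column_name, strains=possible_strains, default="unknown"):
    """Return the first strain label found before the first dot, else default."""
    prefix = column_name.split(".")[0]
    return next((s for s in strains if s in prefix), default)

print(strain_of("CID11.etOH.1"))
print(strain_of("WT.etOH.2"))
print(strain_of("XYZ.1"))
```

Passing a default as the second argument of `next()` keeps the loop over all samples running, and the unmatched names can be inspected afterwards by filtering on the fallback label.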

In [None]:
### For all strains

for szczep_name in mozliwe_szczepy:
    print(szczep_name)
    tylko_szczep_32=splaszczony_df[splaszczony_df["strain"]==szczep_name]
    f = plt.figure(figsize=(10,10))
    plt.title("Growth curve of {}".format(szczep_name))
    plt.axvline(x=10, linewidth=2, color='r')
    sns.lineplot(x="Time", y="OD", data=tylko_szczep_32, hue="name",err_style="bars")
    # f.savefig("{}_8.png".format(szczep_name))
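
When several rows share the same `Time` value, `sns.lineplot` aggregates them (by default it draws the mean together with a confidence interval). To inspect the underlying numbers, the aggregation can be sketched explicitly with a pandas `groupby`. The data below are made up, and the standard deviation is used here only as an example of a spread measure; it is not what seaborn draws by default.

```python
import pandas as pd

# Made-up long-format data: two replicates of one sample at two time points
long_df = pd.DataFrame({
    "Time": [0.0, 0.0, 1.0, 1.0],
    "OD": [0.10, 0.12, 0.20, 0.24],
    "name": ["WT", "WT", "WT", "WT"],
})

# Mean and sample standard deviation of OD per sample and time point
summary = (long_df.groupby(["name", "Time"])["OD"]
           .agg(["mean", "std"])
           .reset_index())
print(summary)
```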

In [None]:
#column_names= krzywe_Time.columns
#time=krzywe_Time["Time"]

f = plt.figure(figsize=(30,30))

len(column_names)
for siatka_n, column_name in enumerate(nazwy_orinalow):
    proba=wyciagnij_wszystkie_powtorzenia(column_name)

    plt.subplot(20,10, siatka_n+1)
    plt.title(column_name)
    plt.ylim([0,1])
    plt.axvline(x=10, linewidth=2, color='r')
    sns.lineplot(data=proba, x="Time", y="OD",err_style="bars")
    #plt.plot(time, krzywe_Time[column_name])
    
plt.tight_layout()
#f.savefig("all_figures_mean.pdf")

In [None]:
#column_names= krzywe_Time.columns
#time=krzywe_Time["Time"]

f = plt.figure(figsize=(30,30))

len(column_names)
for siatka_n, column_name in enumerate(nazwy_orinalow):
    proba=wyciagnij_wszystkie_powtorzenia(column_name)

    plt.subplot(20, 10, siatka_n+1)
    plt.title(column_name)
    plt.ylim([0,1])
    plt.axvline(x=10, linewidth=2, color='r')
    sns.lineplot(data=proba, x="Time", y="OD",err_style="band")
    #plt.plot(time, krzywe_Time[column_name])
    
plt.tight_layout()
#f.savefig("all_figures.pdf")

In [None]:
column_names= krzywe_Time.columns
time=krzywe_Time["Time"]

f = plt.figure(figsize=(30,30))

len(column_names)
for siatka_n, column_name in enumerate(column_names[1:]):
    plt.subplot(20, 10, siatka_n+1)
    plt.title(column_name)
#    plt.ylim([0,0.8])
    plt.axvline(x=10, linewidth=2, color='r')
    plt.plot(time, krzywe_Time[column_name])
    
plt.tight_layout()
#f.savefig("all_figures_2.pdf")
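
The cells above hard-code a 20 x 10 subplot grid. As a sketch, the number of rows can instead be derived from the number of curves to plot; the helper `grid_shape` is hypothetical and the default of 10 columns simply mirrors the grid used above.

```python
import math

def grid_shape(n_plots, n_cols=10):
    """Return (n_rows, n_cols) large enough to hold n_plots subplots."""
    n_rows = max(1, math.ceil(n_plots / n_cols))
    return n_rows, n_cols

print(grid_shape(200))
print(grid_shape(96))
print(grid_shape(5))
```

The result can be passed to `plt.subplot(n_rows, n_cols, siatka_n+1)`, so that a 96-well plate does not leave half of the figure empty.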

In [None]:
# Unsuccessful attempts at preparing the data for pyphe

In [None]:
krzywe_Time_without_first_cell = krzywe_Time
krzywe_Time_without_first_cell.head()

In [None]:
kkw = krzywe_Time_without_first_cell.rename(columns={"Time": ""})
kkw.head()

In [None]:
time_as_index = kkw.set_index(kkw.columns[0])
time_as_index.head()

In [None]:
kk = time_as_index.rename(index=str, columns={"Time": ""})
kk.head()

## Use pyphe-growthcurves module

"pyphe-growthcurves
Python module, including command line interface, for non-parametric characterisation of microbial growth curves", written by Stephan Kamrad (stephan.kamrad@crick.ac.uk).

Source and code: https://github.com/Bahler-Lab/pyphe-growthcurves

In [None]:
pyphe_analysis = pd.read_csv("table_without index.csv", sep = '\t')
pyphe_analysis.head()

In [None]:
#Pyphe growthcurves
!python pyphe-growthcurves_3pkt.py --input pyphe_table.csv --plots
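
To give an idea of what "maximal slope" and "lag phase" mean for a growth curve, here is a minimal sketch of one common non-parametric approach: fit a straight line to log(OD) in a sliding window, keep the steepest fit, and take the lag as the time at which that line crosses the initial log(OD). This is not the pyphe-growthcurves implementation, which may differ in its details. The function `max_slope_and_lag`, the window of 3 points and the synthetic curve are all assumptions made for illustration.

```python
import numpy as np

def max_slope_and_lag(time, od, window=3):
    """Return (max_slope, lag) from sliding-window linear fits on log(OD)."""
    time = np.asarray(time, dtype=float)
    log_od = np.log(np.asarray(od, dtype=float))
    if len(time) < window:
        raise ValueError("Need at least `window` time points")

    best_slope, best_intercept = -np.inf, 0.0
    for start in range(len(time) - window + 1):
        t_win = time[start:start + window]
        y_win = log_od[start:start + window]
        # Least-squares line through the points in this window
        slope, intercept = np.polyfit(t_win, y_win, 1)
        if slope > best_slope:
            best_slope, best_intercept = slope, intercept

    # Lag: time at which the steepest line crosses the initial log(OD)
    lag = (log_od[0] - best_intercept) / best_slope
    return best_slope, lag

# Synthetic curve: flat for 2 h, then exponential growth at rate 0.5 per hour
t = np.arange(0.0, 10.0, 0.5)
od = 0.1 * np.exp(0.5 * np.clip(t - 2.0, 0.0, None))

slope, lag = max_slope_and_lag(t, od)
print(round(slope, 3), round(lag, 3))
```

On real, noisy data a larger window or some smoothing would probably be needed, since a 3-point fit is sensitive to single outlying reads.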

In [None]:
wyniki_pyphe_2 = pd.read_csv("probna_tabelka_bezczas.csv", sep = ',')
wyniki_pyphe_2.head()


In [None]:
wyniki_pyphe_transposed = wyniki_pyphe_2.transpose()
wyniki_pyphe_transposed.head()

In [None]:
wyniki_pyphe_transposed = wyniki_pyphe_transposed.reset_index(drop=False)
wyniki_pyphe_transposed.head()

In [None]:
pyphe_wyniki = pd.read_csv("Bioscreen_stresy_lagphase_maxslope_short.csv", sep = '\t')
pyphe_wyniki.head()

In [None]:
# Lag phase
f = plt.figure(figsize=(45,7))
sns.boxplot(x=pyphe_wyniki["Mutant"], y=pyphe_wyniki["lag"])
f.savefig("pyphe_lag_stres_20190517.png")

In [None]:
# max_slope
f = plt.figure(figsize=(45,7))
sns.boxplot(x=pyphe_wyniki["Mutant"], y=pyphe_wyniki["max_slope"])
f.savefig("pyphe_max_slope_stres_20190517.png")
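
The box plots show the median and quartiles of each group. The same summary can be sketched as a table with a pandas `groupby`. The column names `Mutant`, `lag` and `max_slope` are the ones used in the plotting cells, while the values below are made up for illustration.

```python
import pandas as pd

# Made-up results table with the same columns as used for the box plots
results = pd.DataFrame({
    "Mutant": ["WT", "WT", "CID1", "CID1"],
    "lag": [2.0, 2.4, 3.0, 3.6],
    "max_slope": [0.50, 0.54, 0.40, 0.44],
})

# Median lag and max_slope per mutant
medians = results.groupby("Mutant")[["lag", "max_slope"]].median()
print(medians)
```

Replacing `.median()` with `.describe()` would additionally give the quartiles and the number of replicates per group.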

In [None]:
new_krzywe_Time_without_first_cell =krzywe_Time_without_first_cell.rename(index=str, columns={"Time": ""})
new_krzywe_Time_without_first_cell.head()