<p><font size="6"><b> CASE - Bacterial resistance experiment</b></font></p>


> *DS Data manipulation, analysis and visualisation in Python*  
> *December, 2017*

> *© 2017, Joris Van den Bossche and Stijn Van Hoey  (<mailto:jorisvandenbossche@gmail.com>, <mailto:stijnvanhoey@gmail.com>). Licensed under [CC BY 4.0 Creative Commons](http://creativecommons.org/licenses/by/4.0/)*

---

In this case study, we use the open data accompanying the following [journal article](http://rsbl.royalsocietypublishing.org/content/12/5/20160064):

>Arias-Sánchez FI, Hall A (2016) Effects of antibiotic resistance alleles on bacterial evolutionary responses to viral parasites. Biology Letters 12(5): 20160064. https://doi.org/10.1098/rsbl.2016.0064



<img src="http://blogs.discovermagazine.com/notrocketscience/files/2011/05/Bacteriophage.jpg">

The full paper is available in the [web version](http://rsbl.royalsocietypublishing.org/content/12/5/20160064). The authors summarise the study as follows:
> Antibiotic resistance has wide-ranging effects on bacterial phenotypes and evolution. However, the influence of antibiotic resistance on bacterial responses to parasitic viruses remains unclear, despite the ubiquity of such viruses in nature and current interest in therapeutic applications. We experimentally investigated this by exposing various Escherichia coli genotypes, including eight antibiotic-resistant genotypes and a mutator, to different viruses (lytic bacteriophages). Across 960 populations, we measured changes in population density and sensitivity to viruses, and tested whether variation among bacterial genotypes was explained by their relative growth in the absence of parasites, or mutation rate towards phage resistance measured by fluctuation tests for each phage

In [None]:
%matplotlib inline

import pandas as pd
import matplotlib.pyplot as plt

In [None]:
import seaborn as sns
# Note: matplotlib >= 3.6 renamed this style to 'seaborn-v0_8-white'
plt.style.use('seaborn-white')

## Reading and processing the data

The data is available on [Dryad](http://www.datadryad.org/resource/doi:10.5061/dryad.90qb7.3), a general-purpose repository for data sets linked to journal papers. A downloaded copy is included in this repository, in the `data` folder, as an Excel file called `Dryad_Arias_Hall_v3.xlsx`.

For the exercises, two sheets of the Excel file will be used:
* `Main experiment`: 


| Variable name | Description |
|---------------:|:-------------|
|**AB_r** | Antibiotic resistance |
|**Bacterial_genotype** | Bacterial genotype |
|**Phage_t** | Phage treatment |
|**OD_0h** | Optical density at the start of the experiment (0h) |
|**OD_20h** | Optical density after 20h |
|**OD_72h** | Optical density at the end of the experiment (72h) |
|**Survival_72h** | Population survival at 72h (1=survived, 0=extinct) |
|**PhageR_72h** | Bacterial sensitivity to the phage they were exposed to (0=no bacterial growth, 1=colony formation in the presence of phage) |

* `Falcor`: we focus on a subset of the columns:

| Variable name | Description |
|---------------:|:-------------|
| **Phage**  | Bacteriophage used in the fluctuation test (T4, T7 and lambda) |
| **Bacterial_genotype** | Bacterial genotype |
| **log10 Mc** | Log 10 of corrected mutation rate |
| **log10 UBc** | Log 10 of corrected upper bound |
| **log10 LBc** | Log 10 of corrected lower bound |

Read the `Main experiment` data set from the corresponding sheet:

In [None]:
main_experiment = pd.read_excel("../data/Dryad_Arias_Hall_v3.xlsx", sheet_name="Main experiment")
main_experiment.head()

Read the `Falcor` data and subset the columns of interest:

In [None]:
falcor = pd.read_excel("../data/Dryad_Arias_Hall_v3.xlsx", sheet_name="Falcor", 
                       skiprows=1)
falcor = falcor[["Phage", "Bacterial_genotype", "log10 Mc", "log10 UBc", "log10 LBc"]]
falcor.head()

## Tidy the `main_experiment` data

*(If you're wondering what `tidy` data representations are, revisit the `pandas_07_reshaping.ipynb` notebook.)*

The columns `OD_0h`, `OD_20h` and `OD_72h` all represent the same variable (i.e. `optical_density`), and the column names themselves encode a second variable, i.e. `experiment_time_h`. Hence, the table is stored in *wide* format, and we can *tidy* these columns by converting them to 2 columns: `experiment_time_h` and `optical_density`.
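
To make the wide-to-long idea concrete, here is a minimal sketch on a small made-up table. The `toy` data frame, its column names and its values are invented for illustration and are not part of the Dryad data. It uses `melt`, which is one way to do this kind of reshape in pandas:

```python
import pandas as pd

# Hypothetical wide table: one column per measurement day
toy = pd.DataFrame({
    "plant": ["p1", "p2"],
    "height_day1": [2.0, 2.5],
    "height_day7": [5.1, 6.0],
})

# Keep `plant` as identifier; stack all other columns into two columns:
# one with the former column names, one with the corresponding values
toy_long = toy.melt(id_vars="plant",
                    var_name="measurement",
                    value_name="height_cm")
print(toy_long)
```

Each row of `toy_long` now holds a single observation: one plant, one measurement label and one value.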

<div class="alert alert-success">

<b>EXERCISE</b>:

 <ul>
  <li>Convert the columns `OD_0h`, `OD_20h` and `OD_72h` to a long format, with the values stored in a column `optical_density` and the time in the experiment in a column `experiment_time_h`. Save the result as `tidy_experiment`.</li>

</ul>
</div>

In [None]:
# %load _solutions/case3_bacterial_resistance_lab_experiment5.py

## Visual data exploration

In [None]:
tidy_experiment.head()

<div class="alert alert-success">

<b>EXERCISE</b>:

 <ul>
  <li>Make a histogram to check the distribution of the `optical_density`</li>
  <li>Change the border color of the bars to `lightgrey`</li>

</ul>
</div>

In [None]:
# %load _solutions/case3_bacterial_resistance_lab_experiment7.py

<div class="alert alert-success">

<b>EXERCISE</b>:

 <ul>
  <li>Use a *violin plot* to check the distribution of the `optical_density` in each of the experiment time phases (`experiment_time_h`)</li>

</ul>
</div>

In [None]:
# %load _solutions/case3_bacterial_resistance_lab_experiment8.py

<div class="alert alert-success">

<b>EXERCISE</b>:

 <ul>
  <li>Create a summary table of the average `optical_density` with the `Bacterial_genotype` in the rows and the `experiment_time_h` in the columns</li>
</ul>
</div>
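
As a reminder of how such a summary table can be built, the sketch below applies `pivot_table` to a small invented data frame. The names `toy_long`, `plant`, `day` and `height_cm` are hypothetical and only serve the illustration:

```python
import pandas as pd

# Hypothetical long-format table with repeated measurements
toy_long = pd.DataFrame({
    "plant": ["p1", "p1", "p2", "p2", "p2"],
    "day": ["day1", "day7", "day1", "day7", "day7"],
    "height_cm": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# One row per plant, one column per day, cells hold the mean height
summary = toy_long.pivot_table(values="height_cm",
                               index="plant",
                               columns="day",
                               aggfunc="mean")
print(summary)
```

Where several rows share the same `plant` and `day` (here `p2` on `day7`), they are combined by the aggregation function.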



In [None]:
# %load _solutions/case3_bacterial_resistance_lab_experiment9.py

In [None]:
# %load _solutions/case3_bacterial_resistance_lab_experiment10.py

<div class="alert alert-success">

<b>EXERCISE</b>:

 <ul>
  <li>Calculate for each combination of `Bacterial_genotype`, `Phage_t` and `experiment_time_h` the *mean* `optical_density` and store the result as a dataframe called `density_mean`</li>
  <li>Based on `density_mean`, make a *barplot* of the mean values for each `Bacterial_genotype`, with a separate bar in a different color for each `Phage_t` (grouped bar chart).</li>
  <li>Use the `experiment_time_h` to split into subplots. As we mainly want to compare the values within each subplot, make sure the scales in each of the subplots are adapted to the data range, and put the subplots on different rows.</li>

</ul>
</div>
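
The first step, a mean per combination of several columns stored as a regular dataframe, can be sketched on invented data as follows. The names `toy_long`, `plant`, `treatment` and `height_cm` are hypothetical, and `groupby` followed by `reset_index` is only one possible approach:

```python
import pandas as pd

# Hypothetical long-format table with two grouping columns
toy_long = pd.DataFrame({
    "plant": ["p1", "p1", "p1", "p2", "p2"],
    "treatment": ["A", "A", "B", "A", "A"],
    "height_cm": [1.0, 3.0, 5.0, 2.0, 4.0],
})

# Mean per (plant, treatment) combination; reset_index turns the
# group keys back into ordinary columns, giving a flat dataframe
toy_mean = (toy_long
            .groupby(["plant", "treatment"])["height_cm"]
            .mean()
            .reset_index())
print(toy_mean)
```

Keeping the group keys as ordinary columns is convenient when the result is passed on to a plotting function that maps column names to axes and colors.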



In [None]:
# %load _solutions/case3_bacterial_resistance_lab_experiment11.py

In [None]:
density_mean.head()

In [None]:
# %load _solutions/case3_bacterial_resistance_lab_experiment13.py