In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Overview

In this exercise, we are going to look at a calibration dataset taken from the [Miriam device](https://github.com/strawlab/miriam).

The purpose of this experiment was to validate that the 96 locations in the device are homogeneous in terms of their measurement abilities. Therefore, 96-well plates of fluorescent dye were prepared in which every well was pipetted from the same stock solution, so any differences between wells should be due to differences in the device (or to pipetting differences when loading individual wells). Measurements were performed at several different dilutions of the stock solution.

# Q1 Read the metadata file

The file `data_list_heat_2.csv` contains the "metadata" for the various measurements made. Read this into the dataframe `df_meta` using the Pandas `read_csv` function.
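
As a reminder of how `read_csv` behaves, here is a minimal sketch that parses a small piece of made-up CSV text instead of a file on disk. The column names `sample` and `volume_ul` are hypothetical and are not the columns of `data_list_heat_2.csv`.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk; these column
# names are made up and are not those of data_list_heat_2.csv.
csv_text = """sample,volume_ul
a,10
b,20
c,30
"""

# read_csv accepts a file path or a file-like object.
# By default the first row is used as the column header.
toy = pd.read_csv(io.StringIO(csv_text))

print(toy.shape)           # (number of rows, number of columns)
print(list(toy.columns))   # column names taken from the header row
```

For a real file, pass the filename as a string in place of the `io.StringIO` object.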

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert isinstance(df_meta,pd.DataFrame)
assert len(df_meta.columns)==2
assert len(df_meta)==9

# Q2 Read all individual experimental CSV files and `concat` them into one large dataframe `df` with a new column `Concentration` indicating their concentration (called `dilution` in `df_meta` above).

In [None]:
# Hint, here is an example of concatenating multiple dataframes into a bigger one

sample_a = pd.DataFrame({'col 1':[1,2,3],'col 2':[4, 5, 6]})
sample_b = pd.DataFrame({'col 1':[11,12,13],'col 2':[24, 25, 26]})
df = None
for sample_letter, sample_df in [('A', sample_a), ('B', sample_b)]:
    dfi = sample_df.copy()
    dfi['Sample'] = sample_letter
    if df is None:
        df = dfi
    else:
        df = pd.concat([df, dfi])
display(sample_a)
display(sample_b)
display(df)

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert len(df)==144
assert len(df.columns)==105
assert 'Time' in df.columns
assert 'A1' in df.columns
assert 'H12' in df.columns
assert 'Concentration' in df.columns

# Q3 Create a variable `letters` which is a sequence type in which each item is a character letter from `A` through `H` (8 letters total). Create a variable `numbers` which is a sequence type in which each item is an integer from `1` through `12` (12 numbers total). Create a variable `wells` which is a sequence of strings like `A1`, `A2`, .., `A12`, `B1`, .., `H12`

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert len(letters)==8
assert letters[0]=='A'
assert letters[7]=='H'
assert len(numbers)==12
assert numbers[0]==1
assert numbers[11]==12
assert len(wells)==96
assert wells[0]=='A1'
assert wells[95]=='H12'

# Looking at the data

Let's have a look at the data. Run the cell below to examine the layout of the large dataframe we have created, which is just the concatenation of the individual CSV files with the `Concentration` column added.

In [None]:
df

Note that there are various columns of data: the time, the temperature measured in the device, and the heat commands for the three separately controlled heating elements in the device. Most of the columns are the raw fluorescence values from the 96 wells `A1` through `H12`.

# Q4 Let's get an overview of the fluorescence as a function of concentration by plotting the values for well A1.

Your plot should look like this:

![well-A1.png](well-A1.png)

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

(Strangely, fluorescence decreases at concentrations above 0.25. We are not certain of the cause, but we think this is a phenomenon related to the fluorescent dye used, which was taken from an Edding highlighter for these tests, rather than a problem with the Miriam device itself.)

# Let's now visualize the fluorescence values from all wells across space.

Here is a function `plot_plate` which is a glorified call to matplotlib's `imshow` function to plot values from a 96 well plate.

In [None]:
def plot_plate(well_data_2d, title=None, vmin=0, vmax=1300, ax=None):
    assert well_data_2d.shape == (8, 12)
    if ax is None:
        (fig, ax) = plt.subplots(nrows=1,ncols=1)
    im = ax.imshow(well_data_2d, vmin=vmin, vmax=vmax)
    if title:
        ax.set_title(title)
    ax.set_xticks(np.arange(0, 12, 1))
    ax.set_yticks(np.arange(0, 8, 1))
    ax.set_yticklabels(list('ABCDEFGH'))
    ax.set_xticklabels(list(range(1,13,1)))
    ax.set_xlabel('column')
    ax.set_ylabel('row')
    plt.colorbar(im, ax=ax, label='fluorescence intensity (raw units)')

To be able to use this function, we need to create an array of the correct shape (8 rows, 12 columns). How do we do this from our DataFrame?

Let's create a `test_df` to see the concept:

In [None]:
test_df = pd.DataFrame({'color': ['red', 'green', 'blue', 'purple'], 'column 1':[1,2,3,4], 'column 2': [4,5,6,7], 'column 3': [8,9,10,11]})
test_df

Now, we can extract just some columns by using the DataFrame's `__getitem__` method with a list of column names:

In [None]:
extracted_numerical_columns = test_df[ ['column 1', 'column 2', 'column 3']]
extracted_numerical_columns

In [None]:
# Convert to numpy
np.array(extracted_numerical_columns)

So, with this concept and our `wells` variable we defined earlier, we can extract just the well data and convert it to numpy.

Note that we will have 96 columns in such an array and, if we take only the `n` rows for a given concentration, we will have an `(n,96)` shape numpy array. We want to average across all `n` measurements and then reshape the remaining array of shape `(96,)` to shape `(8,12)`.
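
The average-then-reshape step can be sketched on a much smaller example. The block below assumes a hypothetical plate of 2 rows by 3 columns (6 wells) measured 3 times; the numbers are made up purely for illustration.

```python
import numpy as np

# Toy stand-in: n = 3 measurements of a hypothetical 2 x 3 plate (6 wells),
# stored with one row per measurement and one column per well.
n_rows, n_cols = 2, 3
measurements = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
])  # shape (3, 6)

# Average over axis 0 (the measurements), leaving one value per well.
per_well_mean = measurements.mean(axis=0)  # shape (6,)

# Reshape row by row: the first n_cols values become the first plate row.
plate = per_well_mean.reshape(n_rows, n_cols)  # shape (2, 3)
print(plate)
```

The reshape fills the result one row at a time, so it only gives the correct plate layout when the columns are ordered row by row, as in `A1`, `A2`, .., `A12`, `B1`, .., `H12`.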


# Q5 Create a variable `df_025` which has all the rows of `df` with a concentration of 0.25.

Plot the luminance values for concentration 0.25.

In [None]:
df_025 = df[ df['Concentration']==0.25 ]



In [None]:
assert df_025.shape == (16, 105)

# Q6 Now extract the numerical values of luminance from the well columns, convert to a numpy array, and take the average across all timepoints for each well.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert well_data.shape == (96,)

# Q7 Now plot this using `plot_plate` from above

Your answer should look like this:

![plate025.png](plate025.png)

In [None]:
# Reshape the 96 per-well means into the 8 x 12 plate layout without mutating well_data
plot_plate(well_data.reshape(8, 12), title='concentration {}'.format(0.25))
#plt.savefig('plate025.png')

# Q8 Now using groupby and looping over all concentrations, plot all plates.
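
Hint: iterating over a pandas `groupby` object yields one `(key, sub-DataFrame)` pair per group. Here is a minimal sketch on toy data, where the column names `Sample` and `value` are hypothetical and unrelated to the experimental data.

```python
import pandas as pd

# Toy data with a hypothetical grouping column 'Sample'.
toy = pd.DataFrame({
    'Sample': ['A', 'A', 'B', 'B', 'B'],
    'value': [1.0, 3.0, 10.0, 20.0, 30.0],
})

# Each iteration gives the group key and the rows belonging to that group.
group_means = {}
for key, group in toy.groupby('Sample'):
    group_means[key] = group['value'].mean()
    print(key, len(group), group_means[key])
```

Inside such a loop, each `group` is an ordinary DataFrame, so anything you did with a single concentration's rows can be repeated for each group.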

In [None]:
# YOUR CODE HERE
raise NotImplementedError()