## Credits: 
John Heath wrote code and explanations. 

## Problem 4.1: Analysis of FRAP data (40 pts)

Both problems in this homework set consist of the image processing portion of a greater inference problem. For this problem, you will perform image analysis of data from a fluorescence recovery after photobleaching (FRAP) experiment. The data set comes from [Nate Goehring](https://goehringlab.crick.ac.uk). The images are taken of a *C. elegans* one-cell embryo expressing a GFP fusion to the PH domain of Phospholipase C delta 1 (PH-PLCd1). This domain binds PIP2, a lipid enriched in the plasma membrane. By using FRAP, we can investigate the dynamics of diffusion of the PH-PLCd1/PIP2 complex on the cell membrane, as well as the binding/unbinding dynamics of PH-PLCd1. In the FRAP experiment, a square patch of the membrane is exposed to intense light, thereby photobleaching the fluorescent molecules. If we quantify how the fluorescence returns to that region, we can infer the diffusion coefficient of the PH-PLCd1/PIP2 complex as well as the binding rate of the two molecules.

We will be taking a simplified approach, but there is more sophisticated analysis we can do to get better estimates for the phenomenological coefficients. To motivate why you are processing the images, I will work through a physical model connecting the diffusion coefficient and binding constants to fluorescence recovery.

If $c$ is the concentration of the PH-PLCd1/PIP2 complex on the membrane and $c_\mathrm{cyto}$ is the concentration of PH-PLCd1 in the cytoplasm (assumed to be spatially uniform since diffusion in the cytoplasm is very fast), the dynamics are described by a reaction-diffusion equation.

\begin{align}
\frac{\partial c}{\partial t} = D\left(\frac{\partial^2 c}{\partial x^2} + \frac{\partial^2 c}{\partial y^2}\right) + k_\mathrm{on} c_\mathrm{cyto} - k_\mathrm{off} c.
\end{align}

Here, $k_\mathrm{on}$ and $k_\mathrm{off}$ are the phenomenological rate constants for binding and unbinding to PIP2 on the membrane, and $D$ is the diffusion coefficient for the PH-PLCd1/PIP2 complex on the membrane.

In [their paper](http://dx.doi.org/10.1016/j.bpj.2010.08.033), the authors discuss techniques for analyzing the data taking into account the fluorescence recovery of the bleached region in time and space. For simplicity here, we will only consider recovery of the normalized mean fluorescence. If $I(t)$ is the mean fluorescence of the bleached region and $I_0$ is the mean fluorescence of the bleached region immediately before photobleaching, we have, as derived in the paper,

\begin{align}
I_\mathrm{norm}(t) \equiv \frac{I(t)}{I_0} &= 
1 - f_b\,\frac{4 \mathrm{e}^{-k_\mathrm{off}t}}{d_x d_y}\,\psi_x(t)\,\psi_y(t),\\[1mm]
\text{where } \psi_i(t) &= \frac{d_i}{2}\,\mathrm{erf}\left(\frac{d_i}{\sqrt{4Dt}}\right)
-\sqrt{\frac{D t}{\pi}}\left(1 - \mathrm{e}^{-d_i^2/4Dt}\right),
\end{align}

where $d_x$ and $d_y$ are the extent of the photobleached box in the $x$- and $y$-directions, $f_b$ is the fraction of fluorophores that were bleached, and $\mathrm{erf}(x)$ is the [error function](http://en.wikipedia.org/wiki/Error_function).  Note that this function is defined such that the photobleaching event occurs at time $t = 0$.

We measure $I(t)$, $d_x$, and $d_y$. We can also measure $f_b$ as

\begin{align}
f_b \approx 1 - \frac{I(0^+)}{I_0},
\end{align}

though we will consider this a parameter to estimate.  In practice, the normalized fluorescent recovery does not go all the way to unity.  This is because the FRAP area is a significant portion of the membrane, and we have depleted fluorescent molecules.  We should thus adjust our equation to be

\begin{align}
I_\mathrm{norm}(t) \equiv \frac{I(t)}{I_0} &= 
f_f\left(1 - f_b\,\frac{4 \mathrm{e}^{-k_\mathrm{off}t}}{d_x d_y}\,\psi_x(t)\,\psi_y(t)\right),
\end{align}

where $f_f$ is the fraction of fluorescent species left.  So, we have four parameters to use in regression: the physical parameters of interest, $D$ and $k_\mathrm{off}$, and the fractions $f_f$ and $f_b$.
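To make the model concrete, here is a minimal sketch that evaluates the adjusted expression for $I_\mathrm{norm}(t)$ at a single time point. The function names `psi` and `norm_fluor_recov` and the parameter values in the example are illustrative assumptions, not part of the original analysis. The box extent uses the 40-pixel bleach square and the 0.138 µm/pixel spacing quoted in the solution.

```python
import math


def psi(d, D, t):
    """psi_i(t) for a bleached box of extent d, diffusion coefficient D, time t > 0."""
    return (d / 2) * math.erf(d / math.sqrt(4 * D * t)) - math.sqrt(
        D * t / math.pi
    ) * (1 - math.exp(-(d ** 2) / (4 * D * t)))


def norm_fluor_recov(t, D, k_off, f_f, f_b, d_x, d_y):
    """Normalized mean fluorescence I(t)/I_0; bleaching occurs at t = 0."""
    if t <= 0:
        raise ValueError("The model is only defined for t > 0.")
    prefactor = 4 * math.exp(-k_off * t) / (d_x * d_y)
    return f_f * (1 - f_b * prefactor * psi(d_x, D, t) * psi(d_y, D, t))


# Illustrative (made-up) parameter values; d in µm, D in µm²/s, k_off in 1/s
d = 40 * 0.138
params = dict(D=1.0, k_off=0.1, f_f=0.9, f_b=0.6, d_x=d, d_y=d)

just_after_bleach = norm_fluor_recov(1e-9, **params)  # approaches f_f * (1 - f_b)
long_time = norm_fluor_recov(1e4, **params)           # approaches f_f
```

The two limits are a quick sanity check on the formula: immediately after bleaching $\psi_i \to d_i/2$, so the curve starts near $f_f(1 - f_b)$, and at long times the exponential factor drives it to $f_f$.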

The FRAP images come in a **TIFF stack**, which is a single TIFF file containing multiple frames.  You can load these with the `skimage.io.ImageCollection` class. Note that for this TIFF stack, the image collection is a list that contains a single image, which has all 149 frames.

Your task in this problem is to extract the mean normalized fluorescence versus time from each of the TIFF stacks for the experimental repeats.  Note that important information is contained in the associated README file.  You can download the data set [here](../data/goehring_FRAP_data.zip).

The rest of the inference will be tackled in a subsequent homework. Be sure to store the results of your analysis in a CSV file so you can use it in future homework. (This is a good idea, anyway.)

## Solution

We start by importing the tools of the trade. 

In [None]:
import numpy as np

import os
import glob

# Image processing tools
import skimage
import skimage.io
import skimage.filters
import skimage.morphology

import bebi103

import bokeh
import bokeh.plotting
import bokeh.io
from bokeh.palettes import Dark2_5 as palette
from bokeh.models import Legend, LegendItem
import itertools
bokeh.io.output_notebook()

In [None]:
# Load in TIFF stack
# The directory containing the FRAP data
data_dir = '../data/goehring_FRAP_data'

# Glob string for images
im_glob = os.path.join(data_dir, '*.tif')

# Get list of files in directory
im_list = sorted(glob.glob(im_glob))


image_collections = np.asarray(skimage.io.ImageCollection(im_list,
                                          conserve_memory=False))

I want to take a look at the metadata for this dataset, because this will give hints as to how we should start to probe for the photobleaching region.  

In [None]:
! textutil -convert txt '../data/goehring_FRAP_data/readme_PHdata.rtf' -output './4.1_readme.txt'
! head -200 '4.1_readme.txt'
! rm './4.1_readme.txt'

Okay, so the first photobleached frame is number 21. This will probably be the most valuable frame for determining the ROI. Let's load this image!

In [None]:
ic = image_collections[0]
print("Max: %i, Min: %i" % (np.max(ic), np.min(ic)))

These are 12-bit images, which max out at the value $2^{12} - 1 = 4095$. 

In [None]:
INTERPIXEL_DIST = 0.138 # in μm / pixel

def display_im(image, title):
    "Displays a passed image."
    plot = bebi103.viz.imshow(image,
                              plot_height=300,
                              title=title,
                              interpixel_distance=INTERPIXEL_DIST,
                              length_units='µm')
    return plot
plots = [display_im(ic[i], "Frame %i"%(i+1)) for i in range(19, 22)]
bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=3))

Cool! The metadata was correct: photobleaching clearly first appears in frame 21, and we can even see recovery beginning in frame 22. We know that the photobleached area is a 40x40 pixel square, and we need to locate it in order to establish the ROI. I will do this by passing a mean filter over the image using a 40x40 structuring element, so that the darkest pixel in the filtered image marks the 40x40 square containing the darkest 1600 pixels. 

In [None]:
# I will use a 40x40 square structuring element
sq = skimage.morphology.square(40, dtype= np.uint16)

# Perform the mean filter, using an offset so that the darkest pixel 
# is the top left corner of the ROI
bleach_start_filt = skimage.filters.rank.mean(ic[20],
                                              sq, 
                                              shift_x = -20, 
                                              shift_y = -20)
# Display the result of the mean filter
bokeh.io.show(display_im(bleach_start_filt, "Mean filter"))

Let's find the minimum within this image! One problem with the way I set up this filter is that when it calculates the values for the bottom of the image, most of the values are zero (because they are out of the bounds of the image). We know *roughly* where the ROI is, so I will just slice off the bottom of the filtered image before finding the minimum. Slicing 39 pixels off the bottom cannot cause us to miss the actual ROI in any of the images, since the ROI must be a full 40x40 pixel square within the image. 

In [None]:
# Cut off the bottom because of distortion due to filter. 
sliced = bleach_start_filt[:-39]
# Display the sliced image to be sure I didn't cut out the ROI
bokeh.io.show(display_im(sliced, "Mean filter"))
# Get indices of the minimum-valued pixel
ind = np.unravel_index(np.argmin(sliced, axis=None), sliced.shape)
print("The top-left corner of the ROI is found at (%i, %i)"%ind)

Great. Let's see how our ROI,
$$
\text{rows} \rightarrow (12, 51)\\
\text{columns} \rightarrow (22, 61)
$$
matches up to the original photobleaching frame. 

In [None]:
# Make grayscale image that is now RGB
im = np.dstack(3*[ic[20]])

# Pump up blue channel to highlight roi. 
im[12:52, 22:62, 2] = 1500

# Create image with ROI highlighted
p = bebi103.viz.imshow(im, 
                       color_mapper='rgb',
                       interpixel_distance=INTERPIXEL_DIST,
                       length_units='µm')

# Create original image with the same color scheme
gray_color_mapper = bebi103.viz.mpl_cmap_to_color_mapper('gray')
plots = [p, bebi103.viz.imshow(ic[20], 
                               color_mapper = gray_color_mapper,
                               interpixel_distance=INTERPIXEL_DIST,
                               length_units='µm')]

# Display both side-by-side
bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

Our ROI looks fantastic! I will now traverse through the entire image collection and collect summed raw intensity data from this ROI. 

In [None]:
raw_intensity = np.zeros(len(ic))
for i, image in enumerate(ic):
    raw_intensity[i] = np.sum(image[12:52, 22:62])

Now we want to normalize the intensity to be between 0 and 1. 

In [None]:
print("Maximum intensity: %i\nMinimum Intensity:  %i"
      %(np.max(raw_intensity), np.min(raw_intensity)))

In [None]:
# subtract the min and divide by the new max to normalize
intensity = ((raw_intensity - np.min(raw_intensity)) / 
                    (np.max(raw_intensity) - np.min(raw_intensity)))

# Time in seconds. Frames are spaced 0.188 seconds apart
# according to the metadata (printed above)
time = 0.188 * np.arange(len(ic))

Now we can plot average normalized intensity over time!

In [None]:
p = bokeh.plotting.Figure(width = 600, 
                          height = 400,
                          title = "Normalized Fluorescence Intensity over Time",
                          x_axis_label = "Time (seconds)")
p.circle(time, intensity)
bokeh.io.show(p)

Now I shall streamline this process for each image collection. I will start by making a more general function to determine ROIs. 

In [None]:
def get_roi(ic):
    """Accepts an image collection and finds the pixel that corresponds to 
       the upper left corner of the ROI for the FRAP experiment of that
       image collection."""
    # I will use a 40x40 square structuring element
    sq = skimage.morphology.square(40, dtype= np.uint16)
    
    # Perform the mean filter, using an offset so that the darkest pixel 
    # is the top left corner of the ROI
    bleach_start_filt = skimage.filters.rank.mean(ic[20], 
                                                  sq, 
                                                  shift_x = -20, 
                                                  shift_y = -20)
    
    # Cut off the bottom because of distortion due to filter. I had to 
    # include some additional bounds in order to correctly find all ROIs in
    # the dataset. 
    sliced = bleach_start_filt[:-49][10:]
    
    # Get indices of the minimum-valued pixel
    ind = np.asarray(np.unravel_index(np.argmin(sliced, axis=None), sliced.shape))
    
    # Correction for slicing
    ind += np.array([10,0])
    return ind

Now I will write a function to streamline the process of extracting and normalizing intensity data from each image collection. 

In [None]:
def get_intensity(ic, roi):
    """Computes the normalized intensities within the previously determined ROI
    over the duration of the experiment."""
    # Extract raw intensity values, summed over the ROI
    raw_intensity = np.zeros(len(ic))
    for i, image in enumerate(ic):
        raw_intensity[i] = np.sum(image[roi[0]:roi[0] + 40, roi[1]:roi[1] + 40])

    # subtract the min and divide by the new max to normalize
    intensity = ((raw_intensity - np.min(raw_intensity)) / 
                        (np.max(raw_intensity) - np.min(raw_intensity)))
    return intensity

Now I can cycle through the set of image collections and collect intensities.

In [None]:
# This will store intensity arrays for each experiment. 
intensities = [0]*len(image_collections)

# Time values are the same for each experiment.
time = 0.188 * np.arange(len(image_collections[0]))
    
for index, ic in enumerate(image_collections):
    
    # Obtain indices of ROI
    roi = get_roi(ic)
    
    # Compute all total normalized fluorescence intensities
    intensity = get_intensity(ic, roi)
    intensities[index] = intensity

Now I will plot all intensities in a very similar way to the method I used for HW 3.2, because I love interactive legends. 

In [None]:
# Deals with coloring of different lines
colors = itertools.cycle(palette)

first = True # used to only show the first plot. 

p = bokeh.plotting.Figure(width = 800, 
                          height = 500,
                          title = "Normalized Fluorescence Intensity over Time",
                          x_axis_label = "time (seconds)")

plots = [0] * len(image_collections)

for index, color in zip(range(0,len(image_collections)), colors):
    
    # This is the experiment name (A-H)
    name = im_list[index][-5:-4]
    
    q = p.line(time, 
               intensities[index], 
               line_width=2,
               color = color,
               visible = first, # used to only show the first plot. 
               line_join='bevel')
    
    # I must store the plots as legendItems so that I can make a cool legend. 
    plots[index] = LegendItem(label=name, renderers=[q])
    
    first = False # used to only show the first plot. 

# Make a super cool legend!!
legend = Legend(items=plots,
                location=(10,200),
                click_policy = "hide")

# Place the legend outside the plot area
p.add_layout(legend, 'right')

bokeh.io.show(p)

The data looks fabulous. The photobleaching clearly occurs at the same time point for each experiment, and slight discrepancies in recovery times can be easily discerned. For example, it appears that experiment E had a slower recovery than did experiment A, and experiments D and H showed the least complete recovery. 

# Grading

**Grade: 37/40**

Good job overall!

Note: You could have applied the filter only at positions where the structuring element lies entirely within the image. This avoids the artificially low values that arise near the edges when pixels outside the image frame are taken into account.
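One possible way to follow this note is sketched below. It replaces the rank mean filter with a summed-area table, which is a swapped-in technique and not what the solution used, so that only windows fully contained in the image are ever evaluated. The function name `darkest_window` and the synthetic test image are hypothetical.

```python
import numpy as np


def darkest_window(im, size=40):
    """Return (row, col) of the top-left corner of the size x size window
    with the smallest pixel sum, considering only windows fully inside im."""
    im = np.asarray(im, dtype=np.int64)

    # Summed-area table, padded with a leading row and column of zeros
    sat = np.zeros((im.shape[0] + 1, im.shape[1] + 1), dtype=np.int64)
    sat[1:, 1:] = im.cumsum(axis=0).cumsum(axis=1)

    # Sum over every fully contained window from four table lookups
    window_sums = (sat[size:, size:] - sat[:-size, size:]
                   - sat[size:, :-size] + sat[:-size, :-size])

    return np.unravel_index(np.argmin(window_sums), window_sums.shape)


# Synthetic stand-in for a bleached frame: bright background, dark 40x40 patch
fake_frame = np.full((100, 120), 1000, dtype=np.uint16)
fake_frame[12:52, 22:62] = 100
corner = darkest_window(fake_frame)

# A patch touching the bottom-right edge is still found, with no edge cropping
edge_frame = np.full((100, 120), 1000, dtype=np.uint16)
edge_frame[60:100, 80:120] = 100
edge_corner = darkest_window(edge_frame)
```

Because no out-of-bounds pixels enter any window sum, there is no need to slice rows off the filtered image before taking the minimum.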

When calculating mean normalized intensity, as the equation in the problem statement shows, you only need to divide the raw intensity by $I_0$ such that the value immediately before photobleaching (or the maximum value, however you choose to define $I_0$) is 1.  Normalizing the intensity as per the mathematical model does not require making the minimum equal to 0, and this would affect subsequent analysis. (-2 pts)
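As a minimal sketch of the normalization described here, the function below divides the raw ROI intensity by $I_0$ only. It assumes that $I_0$ is taken from the last frame before bleaching (one of several reasonable definitions) and that the first bleached frame, index 20 according to the README as reported in the solution, sits at $t = 0$. The function name and the toy intensity trace are hypothetical.

```python
import numpy as np


def normalize_by_prebleach(raw_intensity, first_bleached_index=20, dt=0.188):
    """Return (t, I_norm), where I_norm = I / I_0 and t = 0 at the first
    bleached frame. I_0 is the intensity of the last pre-bleach frame."""
    raw = np.asarray(raw_intensity, dtype=float)

    # I_0: ROI intensity immediately before photobleaching (assumed definition)
    I_0 = raw[first_bleached_index - 1]
    I_norm = raw / I_0

    # Shift the time axis so the bleach event is at t = 0, as the model requires
    t = dt * (np.arange(len(raw)) - first_bleached_index)
    return t, I_norm


# Toy trace: 20 pre-bleach frames, then a bleached frame and partial recovery
toy_raw = [200.0] * 20 + [80.0, 100.0, 120.0]
t, I_norm = normalize_by_prebleach(toy_raw)

# Rough starting guess for f_b from the problem statement: 1 - I(0+)/I_0
f_b_guess = 1 - I_norm[20]
```

With this normalization the pre-bleach value is 1 and the minimum is not forced to 0, so $f_b$ and $f_f$ remain free to be estimated in the later regression.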

You were told to store the data as a CSV file. (-1 pt)
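A minimal sketch of one way to store the traces is shown below, using Python's standard `csv` module and a tidy layout with one row per experiment and time point. The function name `write_frap_csv` and the column names are hypothetical choices, and the example writes to an in-memory buffer with made-up values rather than the real data.

```python
import csv
import io


def write_frap_csv(fileobj, names, time, intensities):
    """Write one row per (experiment, time point) to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["experiment", "time (s)", "normalized intensity"])
    for name, trace in zip(names, intensities):
        for t, val in zip(time, trace):
            writer.writerow([name, t, val])


# Example with made-up values, written to an in-memory buffer
buf = io.StringIO()
write_frap_csv(buf, ["A", "B"], [0.0, 0.188], [[1.0, 0.4], [1.0, 0.5]])

# Read it back to confirm the layout
rows = list(csv.reader(io.StringIO(buf.getvalue())))
```

To write to disk instead, the same function can be given a file opened with `open(path, "w", newline="")`, where `path` is whatever file name you choose.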
