In [157]:
%matplotlib inline

# Virtual FCS, using MD Simulation Data

The goal of this notebook is to generate a simulated FCS measurement using data from a GROMACS simulation.

The setup of the virtual system is a membrane with a circularly symmetric incident beam.

Data is read in from an .xtc or .trr file, using a .gro file to define the system topology. Data frames from the input file are iterated through, and for each frame, a detected intensity from each lipid is calculated. The intensity trace and autocorrelation functions for each lipid, and for the total at each frame, are plotted.

## TODOs


#### ☒ Why don't autocorrelation curves look right? Flatness in middle is unexpected
- Done. See below
 
#### ☒ Generate and plot autocorrelation curve prediction. I'm still not understanding something about autocorrelation curves - I don't see where the sigmoid shape comes from.
 - Done. Curves need to be plotted on a semilog scale in order to see the expected sigmoid shape.
 
#### ☒ Try autocorrelating the data myself instead of using acorr. Maybe something weird is happening with the way matplotlib does autocorrelation that isn't desirable.
 - Done. The manually autocorrelated data is identical to calling plt.acorr with normed=True
 
#### ☒ Parallelize data analysis
 - Tried it. The analysis is quite computationally cheap compared to the extra overhead from spawning new threads -- not worth the time saved by running the analysis in parallel.
 
#### ☒ Handle breaking data up into bins better -- currently ignores the remainder of lipids that don't fall into a bin. (I.e. 11 lipids, bin size 3, the remaining 2 that don't evenly fall into bins are discarded.)
 - Done. The leftover lipids are put into their own bin, which may be smaller than the BIN_SIZE.
 
#### ☒ How do PBC vs unwrapped affect prediction of diffusion constant?
 - Done. Doesn't appear to significantly affect results.
 
#### ☐ Curve fit method seems to work better for small spot sizes, but 0.5-crossing method seems better for larger spots? Why?

#### ☐ Come up with a way to score the different tests in the Excel spreadsheet. Factor in how close the average is, and how big the error is.
     
[Checkbox symbols]:<> (☒ ☐)

It's recommended to 'unwrap' simulation data to remove potential artifacts from periodic boundary conditions in the simulation.

### Method 1 (Better for big systems)
The *best* way of doing this requires a trajectory file (i.e. `.xtc`, `.trr`), a `.gro`, and a `.tpr`. By doing this, the input data filesizes can be significantly reduced off the bat by selecting only the lipid groups to keep in the new trajectory file. This can be accomplished by running the following.

When prompted to select a group, select only the group of lipids.

`$>gmx trjconv -f <trajectory file> -s <.gro file> -o <output trajectory file> -pbc nojump`

`$>gmx trjconv -f <.gro file> -s <.tpr file> -o <output .gro file> -pbc nojump`

### Method 2
This can also be done in one step, with only a trajectory file and a `.gro` file by selecting the group of ALL atoms when prompted.

`$>gmx trjconv -f <trajectory file> -s <.gro file> -o <output trajectory file> -pbc nojump`

## Notes on input data

In [158]:
trajectory_file = "40nm/run_nojump.xtc"
trr_file = "40nm/run.trr"
topology_file = "40nm/system.gro"

In [159]:
import mdtraj as md
import numpy as np
from scipy.optimize import curve_fit, fsolve
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt

import multiprocessing
from multiprocessing.pool import ThreadPool

from threading import Lock # For print statements

import trajectory

plt.rcParams['figure.figsize'] = (20, 12)

## Simulation constants

#### Detection area parameters

`spot_radius` is used to determine whether a particle is within the detection area. Currently, this is unused, as cutoff is determined by the sigma of the beam Gaussian.

In [160]:
# Radius of detection area (in nanometers)
spot_radius = 50 # Currently unused

# Coordinates of detection area center (in nanometers)
spotX = 0.0
spotY = 0.0
spotZ = 10.0 # This isn't used, since we're looking at 2D membranes

#### Gaussian parameters

Note that what's being set here is the std. deviation `w_xy` of the Gaussian - spot size is probably more accurately represented by the FWHM, or `w_xy * 2.3548`
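
As a quick check of this conversion, the following sketch converts a FWHM to a standard deviation and confirms that a Gaussian with that standard deviation falls to half its peak at a radius of FWHM / 2. The helper name `fwhm_to_sigma` is hypothetical and not part of this notebook.

```python
import numpy as np

def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full width at half maximum to its standard deviation."""
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))

fwhm = 250.0  # nm, same value used in this notebook
sigma = fwhm_to_sigma(fwhm)

# Relative intensity of a unit-height Gaussian at r = FWHM / 2
half_max = np.exp(-(fwhm / 2.0) ** 2 / (2.0 * sigma ** 2))

print("sigma = %.2f nm, I(FWHM/2) / I(0) = %.3f" % (sigma, half_max))
```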

In [161]:
#   Radial and axial std. dev.s of the Gaussian beam profile
FWHM = 250.
w_xy = FWHM / (2 * np.sqrt(2 * np.log(2)))
w_z = 2 # Unused
k = w_z/w_xy # Unused, just considering a 2-D membrane

#### Various simulation parameters

`STEP` is the stride used when iterating through data frames. Data frames are taken every timestep.

`INTENSITY` is a scaling constant used to determine the maximum intensity of fluorescence.

`SAMPLING_RATIO` defines the fraction (between 0 and 1) of particles to be 'tagged'. Untagged particles are discarded at the beginning of the simulation.

`CUTOFF` defines a cutoff for the beam profile. This may be useful to avoid artifacts from periodic boundary conditions.
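
To get a feel for how much of the beam a given `CUTOFF` discards, the sketch below uses the standard result that a 2-D Gaussian has a fraction $1 - e^{-c^2/2}$ of its integrated intensity inside a radius of $c\sigma$. The function name `retained_fraction` is an assumption made for illustration, not something defined in this notebook.

```python
import numpy as np

def retained_fraction(cutoff):
    """Fraction of a 2-D Gaussian's integrated intensity inside r <= cutoff * sigma."""
    return 1.0 - np.exp(-cutoff ** 2 / 2.0)

for c in (1.0, 2.0, 2.5, 3.0):
    print("cutoff = %.1f sigma -> %.1f%% of beam retained" % (c, 100 * retained_fraction(c)))
```

With the value `CUTOFF = 2.5` used here, roughly 96% of the integrated beam profile is kept.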

In [162]:
# Step size for iterating through data frames
STEP = 1

# Scaling constant for the intensity of a fluorescing particle
INTENSITY = 1

# Fraction of tagged particles (1 = all particles tagged)
SAMPLING_RATIO = 1

# How many sigmas out from the beam center to truncate the beam's Gaussian profile at
CUTOFF = 2.5

# How many lipids to bin into a single trajectory
BIN_SIZE = 1

# How many random spots to use
N_SPOTS = 1

# Radius within which to randomly place spots
SPOT_RANGE = 750

#### Diffusion parameters

`D` is the diffusion constant for POPC, ~~using data from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1303347/~~ using values from Andrew's STRD paper$^{[2]}$, in units of nm^2/ns.

`tauD` is the expected diffusion time, or the time of the half-max for the autocorrelation curve.
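
The "half-max" statement can be checked directly: with the normalized model $(1 + t/\tau_D)^{-1}$ used by `autocorr_model` later in this notebook, the curve equals 0.5 at $t = \tau_D$. The sketch below assumes the same beam width (`FWHM = 250`) and `D = 4` as the code cell that follows; the helper names are hypothetical.

```python
def diffusion_time(width, D):
    """Diffusion time for a beam of the given width (nm) and diffusion constant D (nm^2/ns)."""
    return width ** 2 / (4.0 * D)

def model(t, tau):
    """Normalized 2-D autocorrelation model, (1 + t / tau)^-1."""
    return 1.0 / (1.0 + t / tau)

tau = diffusion_time(250.0, 4.0)   # FWHM = 250 nm, D = 4 nm^2/ns
print("tau_D = %.2f ns" % tau)
print("G(tau_D) = %.2f" % model(tau, tau))
```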

In [163]:
#D = .01 From NIH paper
D = .0245
D = 4
tauD = FWHM**2 / (4*D)
print("Expected diffusion time is %.2f nanoseconds" % tauD)

## Expected diffusion times for various beam waists

In [164]:
beam_waists = np.arange(10,100,5) # In nanometers
print("beam FWHM (nm)".ljust(16) + "|" + "tau_D (ns)".rjust(15))
print("-"*27)
for v in beam_waists:
    print(str(v).ljust(16) + "|" + str(v ** 2 / (4*D)).rjust(15) )

## Useful functions

### `check_in_detection_volume`
Function to check if a given lipid is within the detection volume.

Right now, we're interested in contributions from all lipids regardless of position, so the `return True` short circuits it. 

Furthermore, this is somewhat deprecated by implementing the Gaussian `CUTOFF`.

In [165]:
def check_in_detection_volume(t, frame_index, residue):

    # For now, pay attention to all atoms, regardless of whether or not they're
    #   in the detection volume. 
    # Keep this function as a placeholder, in case this changes.
    return True

    x, y, z = t.xyz[frame_index, residue._atoms[0].index]

    # Get magnitude of distance to the spot center
    distance = (x - spotX)**2 + (y - spotY)**2

    # Check if the distance is within the spot radius
    in_detection_area = distance <= spot_radius**2

    return in_detection_area

### `generate_detection`
Defines what happens when a detection is generated from a lipid. Right now, `INTENSITY` is weighted by a 2-D Gaussian determined by the particle's position relative to the spot center:

$I = I_0 \exp{ \left( - \frac{(x-x_0)^2 + (y-y_0)^2 }{2 \sigma^2} \right) }$

#### Inputs
- `t`: mdtraj.Trajectory object
- `frame_index`: index of frame to analyze in trajectory  
- `atom`: mdtraj.Atom object to be used for detection
- `spotcenter`: (x,y) coordinates of the center of the detection spot

#### Returns
- Intensity contribution from `atom`
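
For comparison, the same truncated Gaussian weighting can be evaluated for every particle in a frame at once with NumPy, rather than one atom per call. This is only a sketch under assumed inputs: the function `frame_intensities` and its argument layout are hypothetical and are not used elsewhere in this notebook.

```python
import numpy as np

def frame_intensities(xy, center, sigma, cutoff, i0=1.0):
    """Gaussian-weighted intensity of every particle in one frame.

    xy     : array of shape (n_particles, 2) with x, y positions
    center : (x0, y0) of the beam center
    """
    xy = np.asarray(xy, dtype=float)
    r2 = (xy[:, 0] - center[0]) ** 2 + (xy[:, 1] - center[1]) ** 2
    intensity = i0 * np.exp(-r2 / (2.0 * sigma ** 2))
    # Zero out particles beyond the cutoff radius, as generate_detection does
    intensity[r2 > (cutoff * sigma) ** 2] = 0.0
    return intensity

positions = np.array([[0.0, 0.0], [100.0, 0.0], [1000.0, 0.0]])
result = frame_intensities(positions, center=(0.0, 0.0), sigma=100.0, cutoff=2.5)
print(result)
```

The first particle sits at the beam center and gets the full intensity, the second is one sigma away, and the third lies outside the cutoff and contributes nothing.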

In [166]:
def generate_detection(t, frame_index, atom, spotcenter):
    
#     print(np.shape(t.xyz))
    _x = spotcenter[0]
    _y = spotcenter[1]

    # Get coordinates of residue (more correctly, of the P atom)
    #x, y, z = t.xyz[frame_index, residue._atoms[0].index]
    try:
        x, y, z = t.xyz[frame_index, atom.index]
    except IndexError:
        raise IndexError("Indexing error. Attempting to use frame index %d and atom index %d. Shape is %s" % (frame_index, atom.index, t.xyz.shape))
    
    # Get squared distance to the spot center
    distance = (x - _x)**2 + (y - _y)**2
    
    # Truncate at CUTOFF sigma
    if distance > CUTOFF**2 * w_xy**2:
        return 0.0

    # Calculate contribution to intensity from an atom, based on the Gaussian
    #   profile of the incident beam and the particle's position.
    intensity = \
        INTENSITY * np.exp(
        -( distance ) # + ((z - spotZ)/k)**2)
        / (2 * w_xy**2) )

    return intensity

### `analyze_frame`

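Analyzes the positions of atoms in a given frame, and adds each atom's intensity contribution to the `detections` array.

#### Inputs
- `t`: mdtraj.Trajectory object
- `frame_index`: index of frame to analyze in trajectory  
- `detections`: shared array of all detections, indexed as `[spot_number][atom index][frame]`
- `spot_number`: index of the spot being analyzed. Default - `0`
- `spotcenter`: (x,y) tuple of coordinates of the spot center. Default - `(spotX, spotY)`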

In [167]:
def analyze_frame(t, frame_index, detections, spot_number=0, spotcenter=(spotX, spotY)): 
    
    print("\rProcessing frame %d out of %d     " % (frame_index/STEP, len(t)/STEP), end="\r")

    # Iterate through each atom remaining in the topology
    for atom in t.topology.atoms:

        # Do analysis if atom is in detection volume
        if not check_in_detection_volume(t, frame_index, atom):
            print("Not in detection volume, skipping.")
            continue

        detected = generate_detection(t, frame_index, atom, spotcenter)
        
        detections[spot_number][atom.index][int(frame_index/STEP)] += detected

In [168]:
def analyze_frame_mp(t, frame_index, detections, spotcenter=(spotX, spotY)): 
    
#     print("\rProcessing frame %d out of %d     " % (frame_index/STEP, len(t)/STEP), end="\r")

    # Iterate through each atom remaining in the topology
    for atom in t.topology.atoms:

        # Do analysis if atom is in detection volume
        if not check_in_detection_volume(t, frame_index, atom):
#             print("Not in detection volume, skipping.")
            continue

        detected = generate_detection(t, frame_index, atom, spotcenter)
        
#         print(atom.index)
        detections[atom.index][int(frame_index/STEP)] += detected

### `autocorr_model`

Models the expected autocorrelation curve at a given time, for a given diffusion constant.

#### Inputs
- `t`: Time to calculate autocorrelation at
- `D`: Diffusion constant

#### Returns
- Value of autocorrelation curve at time `t`

In [169]:
def autocorr_model(t, D):
    tauD = FWHM**2 / (4*D)
    
    return (1 + t / tauD)**(-1)

### `autocorrelate`
Computes the autocorrelation curve for a set of data.

#### Inputs
- `_data`: 1-D array of data
- `normed`: Boolean, determines whether to normalize data to unity. Default - `True`

#### Returns
- `autocorrelated`: Autocorrelation for input data
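
For long intensity traces, the direct correlation can become slow. A possible alternative, sketched here as an assumption rather than as what this notebook does, is to compute the same normalized autocorrelation with FFTs. The name `autocorrelate_fft` is hypothetical; the comparison at the end checks it against the direct `np.correlate` result on random data.

```python
import numpy as np

def autocorrelate_fft(data):
    """Normalized autocorrelation for non-negative lags, computed with FFTs."""
    x = np.asarray(data, dtype=float)
    x = x / np.linalg.norm(x)
    n = x.size
    # Zero-pad to at least 2n - 1 samples so the circular correlation
    # computed by the FFT equals the linear correlation.
    spectrum = np.fft.rfft(x, n=2 * n)
    acf = np.fft.irfft(spectrum * np.conj(spectrum), n=2 * n)
    return acf[:n]

rng = np.random.default_rng(0)
signal = rng.normal(size=256)

unit = signal / np.linalg.norm(signal)
direct = np.correlate(unit, unit, mode="full")[signal.size - 1:]
print(np.allclose(autocorrelate_fft(signal), direct))
```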

In [170]:
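def autocorrelate(_data, normed=True):
    # Scale to unit norm so that the zero-lag value is 1; otherwise use the raw data
    if normed:
        normalized = _data/np.linalg.norm(_data)
    else:
        normalized = _data
        
    # Full correlation has 2N-1 lags; keep only the non-negative ones
    autocorrelated = np.correlate(normalized, normalized, mode='full')
    autocorrelated = autocorrelated[(autocorrelated.size-1)//2 :]
    
    return autocorrelated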

### `gauss2d`

Computes the value of a 2-D Gaussian.

#### Inputs
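- `xy`: pair of arrays `(x, y)`, e.g. an array of shape (2, N) holding the x values in the first row and the y values in the second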
- `spot`: (x,y) tuple of coordinates of center of detection area
- `sigma`: Standard deviation of Gaussian
- `cutoff`: Number of standard deviations after which to truncate Gaussian and return 0. Default - `CUTOFF`

#### Returns
- `z`: A flat list of the value of the Gaussian at each input point.

In [171]:
def gauss2d(xy, spot, sigma, cutoff=CUTOFF):
    x, y = xy
    x0, y0 = spot
    distance = (x - x0)**2 + (y - y0)**2
    
    # Set values outside the cutoff to 0
    s2 = (cutoff * sigma)**2 # Define this so I don't have to keep recomputing it in the list comprehension
    distance = np.array([d if d < s2 else 1e10 for d in distance])
    
    z = np.exp(-(distance)/(2 * sigma**2))
    
    # Returns an array of the Z values
    return z

### `get_crossing`
Gets the 0.5-crossing of a set of normalized data, by finding the root of the interpolated data shifted down by 0.5. This will only find one crossing and can be quite sensitive to the initial guess provided to the solver.

#### Inputs
- `x`: List of x values
- `data`: Normalized set of data
- `guess`: Initial guess to solver

#### Returns
- `crossing`: x coordinate of the 0.5 crossing
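
Because the solver-based approach depends on the initial guess, one alternative worth considering is to scan for the first sample at or below 0.5 and interpolate linearly between the two bracketing samples. This is a sketch of that swapped-in technique, not the method used in this notebook; the function `first_half_crossing` is hypothetical.

```python
import numpy as np

def first_half_crossing(x, y, level=0.5):
    """x position where y first drops to `level`, by linear interpolation.

    Returns None if y never reaches `level`.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    below = y <= level
    if not below.any():
        return None
    i = int(np.argmax(below))      # index of the first sample at or below level
    if i == 0:
        return x[0]
    # Interpolate linearly between the two samples bracketing the crossing
    frac = (y[i - 1] - level) / (y[i - 1] - y[i])
    return x[i - 1] + frac * (x[i] - x[i - 1])

t = np.arange(0.0, 100.0)
curve = 1.0 / (1.0 + t / 10.0)     # model curve with tau_D = 10
print(first_half_crossing(t, curve))
```

This needs no initial guess, but it assumes the curve is sampled finely enough that linear interpolation between neighbouring points is reasonable.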

In [172]:
def get_crossing(x, data, guess):
    
    # Shift the data down by 0.5 so that the 0.5-crossing becomes a root
    interpolated = interp1d(x, data - .5, fill_value='extrapolate')
#     interpolated = np.interp(x, data - .5, left=1, right=0)
    
    try:
        # Start the solver from the caller's guess; fsolve returns a length-1 array
        crossing = fsolve(interpolated, guess)[0]
    except ValueError:
        print("**** BAD GUESS *****")
        print("Skipping this spot...")
        raise ValueError("Bad guess.")
        
    return crossing

### `analyze_spot_mp`

Function to analyze detections from a given spot. Suitable for multiprocessed applications.

#### Inputs
- `t`: Trajectory (or FakeTrajectory) object
- `spot`: (x,y) tuple of center of spot to use
- `detections`: Queue object to store detections in
- `spot_num`: Index of spot.

#### Returns
- None. However, will add a tuple of (spot_num, \_detections) to the queue, where `_detections` is a NumPy array of shape (n_residues, number of timesteps) storing the detections for each residue.

In [173]:
def analyze_spot(t, spot, detections, spot_num):
    for frame_index in range(0, len(t), STEP):
        analyze_frame(t, frame_index, detections, spot_num, spot)

def analyze_spot_mp(t, spot, detections, spot_num):
    _detections = np.full(shape=(t.topology.n_residues, int(np.ceil(len(t)/STEP))), fill_value=0.0) 
    for frame_index in range(0, len(t), STEP):
        analyze_frame_mp(t, frame_index, _detections, spot)
    detections.put((spot_num, _detections))

## Load trajectory data

Import FCS data from .xtc file. Also specify a .gro file for the system topology.

**NB:** A .trr file can be used for better resolution, since an .xtc typically uses some compression. However, a .trr is also much larger.

In [174]:
%%time
#t = md.load(trajectory_file, top=topology_file)
#t = md.load_trr(trr_file, top=topology_file)

Calculate timestep for data analysis, given the simulation data timestep and current stride.

In [175]:
# print("Timestep for data analysis is %.2f picoseconds (%.2f nanoseconds)" % (t.timestep * STEP, t.timestep * STEP / 1000))

## Reduce atom selection to only phosphorus atoms

This is a bit of a simplification, but significantly reduces the number of atoms to iterate over, since we only consider the phosphorus at the center of the phosphate group. The error from this would be on the order of the bond lengths, so roughly 1.5 angstrom.

In [176]:
# print("Starting with %d atoms" % t.topology.n_atoms)

# phosphorous_atoms = [a.index for a in t.topology.atoms if a.element.symbol == 'P']
# t.atom_slice(phosphorous_atoms, inplace=True)

# print("Reduced to %d phosphorous atoms" % t.topology.n_atoms)

# # Reduce to the sampling ratio * number of phosphorous atoms
# num_sampled = int(t.topology.n_atoms * SAMPLING_RATIO)

# # Randomly select the sampled atoms
# sampled = np.random.choice([a.index for a in t.topology.atoms], num_sampled, replace=False)
# t.atom_slice(sampled, inplace=True)

# print("Reduced to %d \"tagged\" phosphorous atoms" % t.topology.n_atoms)

## Import data from Pickle file

In [177]:
import pickle
with open('../../windrive/linux/output.pkl', 'rb') as f:
    t = pickle.load(f)
print("Shape is %s, reducing by %d%%." % (np.shape(t.xyz), 100-SAMPLING_RATIO*100))
# t.xyz = t.xyz[:10000]
t.reduce(SAMPLING_RATIO)
print(np.shape(t.xyz))

trajectories = []
print(int(t.topology.n_residues / BIN_SIZE))

# BIN_SIZE is used a little differently with this technique -- applies to binning trajectories in the experiment itself, rather than binning experimental data during analysis.
import copy
for traj in range(int(t.topology.n_residues / BIN_SIZE)):
    _t = trajectory.FakeTrajectory()
    _t.initialize(t.xyz[:,BIN_SIZE*traj:BIN_SIZE*(traj+1)], BIN_SIZE)
#     _t = copy.deepcopy(t)
#     _t.xyz = t.xyz[:,BIN_SIZE*traj:BIN_SIZE*(traj+1)]
#     _t.topology.n_residues = BIN_SIZE
#     _t.topology.atoms = t.topology.atoms[BIN_SIZE*traj:BIN_SIZE*(traj+1)]
#     _t.reindex()
    trajectories.append(_t)
N_SPOTS = len(trajectories)
print(N_SPOTS)

Create an array to store the detected intensity at each timestep for each lipid in each trajectory bin.

In [178]:
%%time
# detections = np.full(shape=(N_SPOTS, t.topology.n_residues, int(np.ceil(len(t)/STEP))), fill_value=0.0)
detections = np.full(shape=(len(trajectories), BIN_SIZE, int(np.ceil(len(t)/STEP))), fill_value=0.0)

## Data Analysis

Iterate through each frame of data in the trajectory file, and generate a detected intensity from each lipid (represented by its head group P atom).

In [179]:
# spot_centers = np.random.uniform(-SPOT_RANGE, SPOT_RANGE, size=(N_SPOTS,2))
spot_center = np.random.uniform(-SPOT_RANGE, SPOT_RANGE, size=(3,2))

### Legacy threaded implementation

Much slower than the process-based code below. Keeping it *just in case*.

This threaded code dispatches each `analyze_spot` call to a worker in a thread pool, with all workers writing into the shared `detections` array.

In [180]:
# spot_centers = np.random.uniform(-SPOT_RANGE, SPOT_RANGE, size=(N_SPOTS,2))

# def analyze_spot(t, spot, detections, spot_num):
#     for frame_index in range(0, len(t), STEP):
#         analyze_frame(t, frame_index, detections, spot_num, spot)

# pool = ThreadPool()

# print("Performing analysis for %d randomly assigned spots, parallelized over %d cores." % (N_SPOTS, multiprocessing.cpu_count()))

# for i in range(N_SPOTS):
#     pool.apply_async(analyze_spot, args=(t, spot_centers[i], detections, i))
    
# pool.close()
# pool.join()

# print("\n\n", end='', flush=True) # Flush stdout so the printed output doesn't spread itself into the next few output cells

### Process implementation

#### Iterate over a number of random spots

In [181]:
# print("Performing analysis for %d randomly assigned spots, parallelized over %d cores." % (N_SPOTS, multiprocessing.cpu_count()))

# # Set up a pool to draw workers from.
# pool = multiprocessing.Pool()

# # Need to use a Manager to share the Queue across multiple workers in different processes.
# #   Using a Queue allows each process to access a shared object. Typically processes cannot do this
# #   but threads can. However, implementing this parallelization in processes yields a ~6x speedup
# #   over the threaded equivalent.
# m = multiprocessing.Manager()
# detections_Q = m.Queue(N_SPOTS)


# # Make a list to hold AsyncResult objects. This is useful for debugging - call job.get() for an element to see
# #    any exceptions that were raised. Otherwise, they'll fail silently.
# jobs = [] 
# for i in range(N_SPOTS):
#     jobs += [pool.apply_async(analyze_spot_mp, args=(t, spot_centers[i], detections_Q, i))]
# pool.close()
# pool.join()


# # Process the queued data, and add it to the detections list.
# print("Processing queued data")

# # detections = [x for x in iter(detections_Q.get, None)] # This should work, but throws a ValueError. Whatever.
# for i in range(N_SPOTS):
#     if detections_Q.empty():
#         print("Queue prematurely empty..")
#         break
#     _spot, _detection = detections_Q.get()
#     detections[_spot] = _detection
    
# print("Done")

#### Iterate over entire dataset, vs over multiple random spots

In [182]:
print("Performing analysis for %d sets of trajectories, parallelized over %d cores." % (len(trajectories), multiprocessing.cpu_count()))

# Set up a pool to draw workers from.
pool = multiprocessing.Pool()

# Need to use a Manager to share the Queue across multiple workers in different processes.
#   Using a Queue allows each process to access a shared object. Typically processes cannot do this
#   but threads can. However, implementing this parallelization in processes yields a ~6x speedup
#   over the threaded equivalent.
m = multiprocessing.Manager()
detections_Q = m.Queue(len(trajectories))


# Make a list to hold AsyncResult objects. This is useful for debugging - call job.get() for an element to see
#    any exceptions that were raised. Otherwise, they'll fail silently.
jobs = []
    
# print(np.shape(trajectories[0].xyz))
# analyze_spot_mp(trajectories[0], spot_center[0], detections_Q, 0)
# Instead of iterating over each spot, iterate over slices of the trajectory
for i in range(len(trajectories)):
    jobs += [pool.apply_async(analyze_spot_mp, args=(trajectories[i], [0,0], detections_Q, i))]
# print([job.get() for job in jobs])
pool.close()
pool.join()

In [183]:
# Process the queued data, and add it to the detections list.
print("Processing queued data")

# detections = [x for x in iter(detections_Q.get, None)] # This should work, but throws a ValueError. Whatever.
for i in range(len(trajectories)):
#     print("Analyzing queue entry %d" % i)
    if detections_Q.empty():
        print("Queue prematurely empty..")
        break
    _spot, _detection = detections_Q.get()
    detections[_spot] = _detection
    
print("Done")

In [184]:
detections.shape


### Bin lipids in data

`binned_tots[bin][timestep]` is the total intensity for a certain bin at a certain timestep

`binned_avgs[bin]` is the average intensity for a bin, over the whole time

`binned_dI[bin][timestep]` is the difference of the total intensity from the average at a given timestep

`binned_tot[timestep]` is the average dI from all bins at a certain timestep
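
The binning described above, including the smaller leftover bin mentioned in the TODOs, can be sketched on toy data as follows. The helper `bin_slices` and the toy array are hypothetical and not part of this notebook; each bin's deviation is taken from that bin's own time-averaged total.

```python
import numpy as np

def bin_slices(n_items, bin_size):
    """Slices covering range(n_items) in bins of bin_size; the last bin may be smaller."""
    return [slice(start, min(start + bin_size, n_items))
            for start in range(0, n_items, bin_size)]

# Toy data: 11 lipids, 5 timesteps, each lipid with constant intensity 1
toy = np.ones((11, 5))
bins = bin_slices(toy.shape[0], 3)

totals = np.array([toy[s].sum(axis=0) for s in bins])   # shape (n_bins, n_timesteps)
delta = totals - totals.mean(axis=1, keepdims=True)     # deviation from each bin's time average

bin_sizes = [s.stop - s.start for s in bins]
print(bin_sizes)
```

For 11 lipids and a bin size of 3, this gives three full bins and one leftover bin of 2.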

In [185]:
n_bins = int(np.ceil(trajectories[0].topology.n_residues/BIN_SIZE))

print("Attempting to bin %d residues into bins of size %d." % (t.topology.n_residues, BIN_SIZE))

if not t.topology.n_residues%BIN_SIZE == 0:
    print("Number of residues is not evenly divisible by bin size. Desired size is %d, one bin will contain %d." % (BIN_SIZE, t.topology.n_residues%BIN_SIZE) )

binned_tots = np.ndarray(shape=(N_SPOTS, n_bins, len(t)))
binned_avgs = np.ndarray(shape=(N_SPOTS, n_bins, len(t)))
binned_dI = np.ndarray(shape=(N_SPOTS, n_bins, len(t)))
binned_tot = np.ndarray(shape=(N_SPOTS, len(t)))
    
for spot_num in range(N_SPOTS):
#     print("Sorting data into %d groups" % n_bins)

    binned =  [ [] for x in range(t.topology.n_residues//BIN_SIZE) ]


    # TODO: May want to use np.random.choice to randomly select the binned lipids, though the choice of lipids to sample is already random
    # For each group...
    for g in range(n_bins):

        # Pick the slice of detections that are relevant to it
        _detections = [x for x in detections[spot_num][g*BIN_SIZE:BIN_SIZE*(g+1)]]
#         print("Detections_spot_num length is %d " % len(detections[spot_num]))
#         print(len(_detections))


        # Sum over the lipids in this bin, then average that total over time
        tot_I = np.sum(_detections, axis=0)
        avg_I = np.mean(tot_I)

        delta_I = tot_I - avg_I


        binned_tots[spot_num][g] = tot_I
        binned_avgs[spot_num][g] = avg_I
        binned_dI[spot_num][g] = delta_I

    # binned_tot = np.sum(binned_tots, axis=0) - np.mean(binned_tots)
    binned_tot[spot_num] = np.mean(binned_dI[spot_num], axis=0)

Compute average numbers of particles contributing to the intensity. (I.e., that have a nonzero intensity)
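
One way to count contributing particles, sketched here under the assumption that the last axis of the detections array is time and every leading axis indexes a particle, is to test each particle for any nonzero entry. The function `count_contributing` is hypothetical and shown on a toy array.

```python
import numpy as np

def count_contributing(detections):
    """Number of particles with a nonzero intensity in at least one frame.

    detections : array whose last axis is time; all leading axes index particles.
    """
    detections = np.asarray(detections)
    return int(np.any(detections != 0, axis=-1).sum())

toy = np.zeros((3, 1, 4))       # 3 trajectories x 1 particle x 4 frames
toy[0, 0, 2] = 0.7              # only the first particle is ever detected
print(count_contributing(toy))
```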

In [186]:
detections.shape

In [187]:
n_nonzero = 0
n_tot = 0
n_zero = 0

# Get the number of particles that don't contribute
for i in range(N_SPOTS):
    n_zero += np.all(detections[i,:,:] == 0)
print("%d particles contribute intensities to this spot" % (t.topology.n_residues - n_zero))


# # Get total number of nonzero contributions
# for spot_num in range(N_SPOTS):
#     for particle in detections[spot_num]:
#         _nonzero = [x for x in particle if not x == 0.0 ]
#         n_nonzero += len(_nonzero)
#         n_tot += len(particle)
    
# # Get total number of detections
# # n_detections = N_SPOTS * t.topology.n_residues * len(t)

# print("Average percentage of the %d sampled particles that contribute to intensity is %f%%" % (t.topology.n_residues, 100 * n_nonzero / n_tot))

## Plotting

### Intensity Traces
Plot the intensity traces for the individual lipids, and for the summed intensities.

In [188]:
# plt.plot(np.arange(0, len(t), 1), binned_tot[11], marker='None', label="Spot %d" % spot_num)

### Autocorrelations
Plot the autocorrelation functions for the individual lipids tracked, and for the summed intensities of all of them.

## Determine diffusion coefficient from FCS data

The following cells determine the diffusion coefficient from the FCS data using two different techniques: a curve fit and a 0.5-crossing method. The binned and summed data are treated separately. In the first technique, a model is fit to the autocorrelation curve using the diffusion constant as the free parameter.

### Curve Fit Method

Using the equation for diffusion in a membrane presented by Schwille$^{[1]}$, attempt to fit the FCS data to a function of the form

$G(t) = \frac{1}{N} \left(1 + \frac{t}{\tau_D} \right)^{-1}$, where

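$\tau_D = \frac{w_{xy}^2}{4 D}$

Note that the code cells in this notebook use `FWHM`, rather than the standard deviation `w_xy`, as the beam width when evaluating this expression.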

Since the autocorrelation curves are normalized, $N$ is set to 1.
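
As a sanity check of this fitting approach, the sketch below fits the model to synthetic data generated from a known diffusion constant and confirms that `curve_fit` recovers it. The beam width, noise level, and variable names are assumptions chosen for illustration only; they are not values from the simulation.

```python
import numpy as np
from scipy.optimize import curve_fit

WIDTH = 50.0  # nm, beam width used in tau_D (assumed value for this example)

def model(t, D):
    """Normalized 2-D autocorrelation model with tau_D = WIDTH**2 / (4 * D)."""
    tau_d = WIDTH ** 2 / (4.0 * D)
    return 1.0 / (1.0 + t / tau_d)

# Synthetic "measurement": the model at a known D plus a little noise
rng = np.random.default_rng(1)
t = np.arange(0.0, 2000.0)
true_D = 4.0
data = model(t, true_D) + rng.normal(scale=0.01, size=t.size)

# Keep D positive during the fit so tau_D stays well defined
popt, pcov = curve_fit(model, t, data, p0=[1.0], bounds=(1e-9, np.inf))
fit_D = popt[0]
fit_err = np.sqrt(np.diag(pcov))[0]
print("Fitted D = %.3f +- %.3f (true value %.1f)" % (fit_D, fit_err, true_D))
```

The reported uncertainty comes from the diagonal of the covariance matrix returned by `curve_fit`, the same quantity the fitting cells below use.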


#### Binned data

In [189]:
# _x = np.arange(0, len(t), 1)

# plt.xscale('log')
# optimals = []
# covariances = []
# i = 0
# for _data in binned_dI:
#     i+= 1
#     print(i, end="\r")
    
#     autocorrelated = autocorrelate(_data)
    
#     _optimal, _covariance = curve_fit(autocorr_model, _x, autocorrelated, p0=D)
#     optimals.append(_optimal)
#     covariances.append(_covariance)
    
#     plt.plot(_x, autocorrelated)
#     plt.plot(_x, autocorr_model(_x, _optimal), linestyle='--')
    
# var = [np.sqrt(np.diag(x)) for x in covariances]
    
# print("Average D from data was %f +- %f" % (np.mean(optimals), np.mean(var)))
# plt.xlim([0,len(t)])

#### Summed data

In [None]:
# autocorrelated = autocorrelate(binned_tot)

# normalized = binned_tot/np.linalg.norm(binned_tot)

# optimal, covariance = curve_fit(autocorr_model, _x, autocorrelated, p0=D)

# print("D was set at %f" % D)
# print("D estimated at %f +- %f." % (optimal, np.sqrt(np.diag(covariance))))

# plt.xscale('log')

# # Orange line is using the known diffusion constant
# plt.plot(_x, autocorr_model(_x, D), linestyle='--', label="Model - Known D")
# plt.acorr(binned_tot, maxlags=max_lag, usevlines=False, linestyle='-', marker="None", normed=True, label="Data")

# # Black line is using the calculated diffusion constant
# plt.plot(_x, autocorr_model(_x, optimal), linestyle='--', color='black', label="Model - Calced D")
# var = np.sqrt(np.diag(covariance))

# _y1 = autocorr_model(_x, optimal+var)
# _y2 = autocorr_model(_x, optimal-var)

# plt.fill_between(_x, _y1, _y2)

# plt.xlim([0,len(t)])
# plt.legend()

### .5 crossing method
This method determines $\tau_D$ first by looking for where the normalized autocorrelation crosses 0.5, then computes the diffusion coefficient from that using the same formula as above, solved for $D$:

$D = \frac{w_{xy}^2}{4 \tau_D}$

#### Binned data

In [None]:
_x = np.arange(0, len(t), 1)
ax = plt.gca()

crossings = []

plt.xscale('log')
for spot_num in range(N_SPOTS):
    for _data in binned_dI[spot_num]:

        autocorrelated = autocorrelate(_data)
    
        try:
            crossings.append(get_crossing(_x, autocorrelated, tauD))
        except ValueError:
            print("Bad guess. Skipping a bin in spot %d" % (spot_num))
            continue

        color = ax._get_lines.get_next_color()
        
        plt.plot(_x, autocorrelated, color=color)

        calced_D = FWHM**2 / (4 * crossings[-1])
        plt.plot(_x, autocorr_model(_x, calced_D), color=color, linestyle='--')
    

# calced_D = [FWHM**2 / (4 * x) for x in crossings]
calced_D = FWHM**2 / (4 * np.mean(crossings))


plt.plot(_x, autocorr_model(_x, D), color='k', linestyle='-', linewidth=2, label="Model - Expected")
    

# Plot tau_D, diffusion time
calc_tauD = FWHM**2 / (4 * np.mean(calced_D))
plt.axvline(calc_tauD)
plt.axvline(tauD)

plt.xticks(list(plt.xticks()[0]) + [calc_tauD, tauD], list(plt.xticks()[0]) + ['calculated tau_D', 'expected tau_D'])

plt.xlim([0,len(t)])

print("Average tau_D is %f +- %f" % (np.mean(crossings), np.std(crossings)))
print("Average diffusion constant is %.3f +- %.4f" % (np.mean(calced_D), np.std(calced_D)))

#### Summed data for each spot

In [None]:
plt.xscale('log')
ax = plt.gca()
_x = np.arange(0, len(t), 1)

Ds = []

for spot_num in range(N_SPOTS):
    
    print("--- Spot %d ---" % spot_num, end="\r")
    
    if np.sum(binned_tot[spot_num]) == 0.0:
#         print("This spot is all 0, skipping", end="")
        continue
    
    autocorrelated = autocorrelate(binned_tot[spot_num])
    
    
    try:
        crossing = get_crossing(_x, autocorrelated, tauD*.01)
    except ValueError:
        print("Bad guess. Skipping a bin in spot %d" % (spot_num))
        continue
    
#     print(crossing)

    calced_D = FWHM**2 / (4 * crossing)

    color = ax._get_lines.get_next_color()
    plt.plot(_x, autocorrelated, color=color, label="Data - Spot %d" % spot_num)
    plt.plot(_x, autocorr_model(_x, calced_D), color=color, linestyle='--', label="Model - Spot %d" % spot_num)

    print("\nCalculated D is %.3f, tau_D is %.3f" %( calced_D, crossing))
    
    Ds.append(calced_D)


plt.plot(_x, autocorr_model(_x, D), color='k', linestyle='--', linewidth=2, label="Model - Expected")
    
# Plot tau_D, diffusion time

calc_tauD = FWHM**2 / (4 * np.mean(Ds))
print("\n\nAvg. D is %.3f +- %.4f, yielding a tau_D of %.2f compared to the expected %.2f for D=%d" % (np.mean(Ds), np.std(Ds), calc_tauD, tauD, D))

plt.axvline(calc_tauD)
plt.axvline(tauD)

plt.xticks(list(plt.xticks()[0]) + [calc_tauD, tauD], list(plt.xticks()[0]) + ['calculated tau_D', 'expected tau_D'])

# plt.legend()

plt.xlim([0,len(t)])

#### Average autocorrelations for each spot

In [None]:
plt.xscale('log')
ax = plt.gca()
_x = np.arange(0, len(t), 1)

autocorrelated = np.ndarray(shape=(N_SPOTS,len(t)))

# Average autocorrelations for each spot
for spot_num in range(N_SPOTS):
    
    if np.sum(binned_tot[spot_num]) == 0.0:
#         print("Spot %d is all 0, setting to NaN" % spot_num)
        
        # If the spot didn't capture any intensity contributions, just set it to NaN.
        # NaN values will be ignored when taking the mean. (Thanks, np.nanmean)
        autocorrelated[spot_num] = np.full(len(t), np.nan)
        continue
    
    autocorrelated[spot_num] = autocorrelate(binned_tot[spot_num])
    
autocorrelated = np.nanmean(autocorrelated, axis=0)

crossing = get_crossing(_x, autocorrelated, tauD)

calced_D = FWHM**2 / (4 * crossing)

print(calced_D)

plt.plot(_x, autocorrelated, label="Averaged spot data")

plt.plot(_x, autocorr_model(_x, calced_D), linestyle='-', linewidth=1, label="Model - Calculated")

plt.plot(_x, autocorr_model(_x, D), color='k', linestyle='--', linewidth=2, label="Model - Expected")

plt.axvline(crossing)
plt.axvline(tauD)
plt.xticks(list(plt.xticks()[0]) + [crossing, tauD], list(plt.xticks()[0]) + ['calculated tau_D', 'expected tau_D'])

plt.legend()

## Plot Heatmap

The goal of this is to plot the spots overlaid on the trajectories of the sampled particles, with a heatmap indicating the Gaussian profile that's being used to sample.

In [None]:
# Make the aspect ratio of the plot square so that the spot circles are circles and not ellipses
plt.axis('equal') 

############## Plot trajectories ############## 
for w in range(0, t.topology.n_residues, 1):
    plt.plot(t.xyz[:,w,0], t.xyz[:,w,1], linestyle='-', linewidth=.2, zorder=1, alpha=.6)


############## Plot spots ############## 
ax = plt.gca()
for spot in spot_centers:
    circle = plt.Circle(spot, FWHM, fill=False, linewidth=2, ec='white', zorder=3)
    ax.add_artist(circle)
    
    
############## Plot heatmap ############## 
# Get axis ranges to use for heatmap point grid
xrange = ax.get_xlim()
yrange = ax.get_ylim()

# Set the number of points. Make this imaginary so that np.mgrid includes the endpoints.
npoints = 500j
zpreds = []

# Make a grid of points that will be used to display the Gaussians. May also be able to do this as a contour.
yi, xi = np.mgrid[yrange[1]:yrange[0]:npoints, xrange[0]:xrange[1]:npoints]
xyi = np.vstack([xi.ravel(), yi.ravel()])

# Determine Gaussian profiles for each spot
for spot in spot_centers:
    zpred = gauss2d(xyi, spot, w_xy, cutoff=CUTOFF)
    zpred.shape = xi.shape # Convert Z values from a flat list to a 2-D (x,y) array
    zpreds.append(zpred)
    plt.annotate("Spot %d" % np.where(spot_centers==spot)[0][0], xy=spot, color='w') # The `where` is some magic to get the index of the spot
    
# Set each Z value to the max at that position, most meaningful way of plotting multiple profiles (?)
zpreds = np.maximum.reduce(zpreds)

# Display Gaussian profiles
im = ax.imshow(zpreds, extent=[xi.min(), xi.max(), yi.min(), yi.max()], aspect='equal', zorder=0, alpha=1, cmap='magma')
cbar = plt.colorbar(im)
cbar.set_label("Gaussian profile - Intensity Contribution",size=18)

# Bibliography

[1] Chiantia, Salvatore, Jonas Ries, and Petra Schwille. "Fluorescence correlation spectroscopy in membrane structure elucidation." Biochimica et Biophysica Acta (BBA)-Biomembranes 1788.1 (2009): 225-233.

[2] Zgorski, Andrew, and Edward Lyman. "Toward Hydrodynamics with Solvent Free Lipid Models: STRD Martini." Biophysical journal 111.12 (2016): 2689-2697.