This example notebook demonstrates how a normal processing pipeline works without batch processing. The processing steps are outlined as follows:

1. Clean up the data
2. Perform drift correction
3. Apply light filtering to the data to prepare for merging
4. Merge localizations into one
5. Apply any final filtering

### Import the necessary libraries

In [1]:
%pylab
from DataSTORM import processors as proc
from pathlib import Path
import pandas as pd

Using matplotlib backend: Qt4Agg
Populating the interactive namespace from numpy and matplotlib


# Read in a file
Python's *pathlib* library makes reading files easy inside Jupyter Notebooks because of its tab-completion feature. Simply start typing the directory inside the Path object, press TAB, and it will list files and folders in the current directory. Use `..` to go up one directory.

In [2]:
# Press TAB inside the quotation marks
filePath = Path('../test-data/Centrioles/FOV_1_noPB_1500mW_10ms_1/FOV_1_noPB_1500mW_10ms_1_MMStack_locResults.dat')
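
Outside of interactive tab-completion, the same kind of directory listing can be done programmatically. The following is a minimal sketch that assumes a throwaway folder with hypothetical file names (`FOV_1_locResults.dat`, `notes.txt`), not the real test data; it lists only the `.dat` files and confirms that the first one exists.

```python
import tempfile
from pathlib import Path

# Build a temporary directory tree so the example runs anywhere.
# All folder and file names below are hypothetical stand-ins.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / 'test-data' / 'FOV_1'
    data_dir.mkdir(parents=True)
    (data_dir / 'FOV_1_locResults.dat').write_text('x [nm],y [nm]\n1.0,2.0\n')
    (data_dir / 'notes.txt').write_text('not a localization file\n')

    # List only the .dat files in the folder, sorted by name
    dat_names = [f.name for f in sorted(data_dir.glob('*.dat'))]

    # Confirm that the first match is a regular file before loading it
    first_is_file = (data_dir / dat_names[0]).is_file()

print(dat_names, first_is_file)
```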

Now, we will load the data into a DataFrame. A DataFrame is a container for data from the Pandas library that essentially acts as a spreadsheet. It's very fast and supports loading from multiple file formats. It also supports out-of-core processing by loading chunks of data into memory one at a time. We will use its simplest method with standard arguments, `read_csv()`.

In [3]:
# str() converts the Path to a string
# 'r' means to open the file in read-mode
# df holds the DataFrame returned from pd.read_csv()
with open(str(filePath), 'r') as file:
    df = pd.read_csv(file)
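
For files too large to fit in memory, the chunked loading mentioned above can be sketched with the `chunksize` argument of `read_csv()`. The example uses a small in-memory CSV with made-up values in place of a real localization file and accumulates a running mean of `x [nm]` without ever holding all rows at once.

```python
import io
import pandas as pd

# A small in-memory CSV standing in for a large localization file
# (the values are synthetic and only serve the illustration).
csv_text = 'x [nm],y [nm],frame\n' + ''.join(
    '{0}.0,{1}.0,{2}\n'.format(i, 2 * i, i // 2) for i in range(10)
)

# With chunksize set, read_csv returns an iterator of DataFrames
total_rows = 0
x_sum = 0.0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total_rows += len(chunk)
    x_sum += chunk['x [nm]'].sum()

x_mean = x_sum / total_rows
print(total_rows, x_mean)
```

Recent versions of pandas also accept a `Path` object directly in `read_csv()`, so the explicit `open()` call is optional.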

To get a summary of the data, you can use the `describe()` method.

In [4]:
df.describe()

| | x [nm] | y [nm] | z [nm] | frame | uncertainty [nm] | intensity [photon] | offset [photon] | loglikelihood | sigma [nm] |
|---|---|---|---|---|---|---|---|---|---|
| count | 1128672.0 | 1128672.0 | 1128672 | 1128672.0 | 1120018.0 | 1128672.0 | 1128672.0 | 1128672.0 | 1128672.0 |
| mean | 25114.914616 | 31378.310512 | 0 | 29113.635001 | 766.523967 | 4130.626808 | 186.466802 | 235.056396 | 139.046655 |
| std | 16600.385105 | 19042.742324 | 0 | 24341.727052 | 10230.296829 | 2859.651891 | 38.698657 | 443.057482 | 25.664258 |
| min | 85.238 | 0.10644 | 0 | 100.0 | 0.63245 | 1.0 | 68.953 | -37.829 | 54.0 |
| 25% | 9274.2 | 12213.0 | 0 | 6756.0 | 3.7525 | 2296.4 | 167.7 | 87.016 | 127.15 |
| 50% | 24997.0 | 27570.0 | 0 | 23178.0 | 5.1529 | 3252.7 | 179.74 | 119.16 | 134.8 |
| 75% | 36672.0 | 52188.0 | 0 | 49093.0 | 6.6376 | 5070.9 | 193.48 | 202.19 | 143.99 |
| max | 65011.0 | 65004.0 | 0 | 79999.0 | 172430.0 | 67290.0 | 1315.4 | 33596.0 | 378.0 |


# Clean up the data
Many data files might have a few rows that contain NaN's, Inf's, or incorrectly formatted data. DataSTORM provides a processor called CleanUp that fixes these. It is not necessary to clean up the data in this example, but we will do it anyway to demonstrate how it's done.

In [5]:
cleaner = proc.CleanUp()
df      = cleaner(df)
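
To give a feel for what such a cleanup step involves, here is a minimal sketch in plain pandas. It is an assumption about the general technique, not DataSTORM's actual `CleanUp` implementation: badly formatted entries are coerced to NaN, infinities are treated as missing, and incomplete rows are dropped. The data are synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic rows with a malformed string, an Inf, and a NaN
raw = pd.DataFrame({
    'x [nm]': ['10.5', '20.1', 'bad', '40.0'],
    'uncertainty [nm]': [5.0, np.inf, 3.0, np.nan],
})

# Coerce every column to numbers; unparseable entries become NaN
numeric = raw.apply(pd.to_numeric, errors='coerce')

# Treat +/-Inf as missing, then drop rows with any missing value
cleaned = numeric.replace([np.inf, -np.inf], np.nan).dropna()
print(cleaned)
```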

# Perform fiducial-based drift correction
Fiducial-based drift correction requires the most work because of the large number of parameters you can tune. If your fiducials are present in nearly every frame and highly visible, the easiest option is to simply set the interactive search to True and skip linking and spatial clustering.

To see all options you can set, press `SHIFT-TAB` twice with the cursor just after the first parenthesis of FiducialDriftCorrect.

In [6]:
corrector = proc.FiducialDriftCorrect(minFracFiducialLength = 0.75, # Fiducials must span 75% of number of frames
                                      interactiveSearch     = True, # Select fiducials by eye
                                      noLinking             = True, # Do not perform Crocker-Grier linking
                                      noClustering          = True) # Do not spatially cluster fiducials

When the corrector is run, it will display a 2D histogram image. You may zoom in and out of regions and draw a rectangle around areas with large counts. Areas with counts that are approximately equal to the number of frames are likely to be fiducials. There is a fiducial in this dataset in three bins around (x = 28, y = 55.5).
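
The idea behind this histogram can be sketched with NumPy alone. The example below uses synthetic localizations and hypothetical bin sizes and coordinates; it is not how the corrector builds its image. A fiducial that is localized in every frame produces a bin whose count is close to the number of frames, while the sparse background stays far below that.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 1000

# Sparse background: 2000 localizations spread over a 10 um x 10 um field
background = rng.uniform(0, 10000, size=(2000, 2))

# One synthetic fiducial near (2800, 5550) nm, localized once per frame
fiducial = np.array([2800.0, 5550.0]) + rng.normal(0, 10, size=(n_frames, 2))
xy = np.vstack([background, fiducial])

# Bin the localizations into 1 um x 1 um bins
edges = np.arange(0, 10001, 1000)
counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=[edges, edges])

# Bins holding at least 75% as many counts as there are frames
candidates = np.argwhere(counts >= 0.75 * n_frames)
print(candidates)
```

Only the bin containing the fiducial passes the threshold, which is why regions with counts near the number of frames are good places to draw the selection rectangle.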

With the selection rectangle around a region, press `SPACE` to add the region to the list of areas to search for fiducials. Press `r` if you want to reset the regions to empty. When you are done, simply close the window.

If no region is selected, the fiducial search will be performed over the whole set of localizations, which can either be slow or lead to completely wrong results if linking and clustering are turned off.

Also note that the corrector removes fiducials, so it is best to save the output to another DataFrame, in this case `corrDF`.

In [7]:
corrDF = corrector(df)

1 fiducial(s) detected.
Performing spline fits...




We can check the quality of the drift correction curves using `plotFiducials()`.

In [8]:
corrector.plotFiducials()

It's not necessary for this dataset, but if the drift correction could be improved, we can adjust some of the smoothing parameters and rerun the drift correction. For this example, we'll turn on linking and throw out trajectories shorter than ten consecutive frames. Additionally, we'll shrink the size of the smoothing window and filters to better capture changes in the fiducial trajectory. There are a few parameters for linking and clustering, but we'll leave them at their defaults.

Note that clustering the fiducials can often help get rid of noisy points. However, DBSCAN breaks down if a fiducial track is longer than about 50,000 frames, so it is preferable to turn clustering off if you have a long fiducial track, as in this example.

In [9]:
corrector = proc.FiducialDriftCorrect(minFracFiducialLength = 0.75,   # Fiducials must span 75% of number of frames
                                      interactiveSearch     = True,   # Select fiducials by eye
                                      noLinking             = False,  # Perform Crocker-Grier linking
                                      noClustering          = True,   # Do not perform DBSCAN clustering of fiducials
                                      smoothingWindowSize   = 750,    # Set the moving window size for smoothing
                                      smoothingFilterSize   = 100)    # Set Gaussian filter std. dev. for smoothing
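
To build intuition for how a window size and a Gaussian width interact, here is a rough sketch of a Gaussian-weighted moving average in NumPy. This is an assumed smoothing scheme for illustration only and may differ from what `FiducialDriftCorrect` does internally; `gaussian_smooth`, `window_size`, and `sigma` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(1)
frames = np.arange(2000)
true_drift = 0.1 * frames                      # slow linear drift in nm
noisy = true_drift + rng.normal(0, 10, frames.size)

def gaussian_smooth(y, window_size, sigma):
    """Gaussian-weighted moving average; edges are padded with the end values."""
    half = window_size // 2
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()                     # weights sum to one
    padded = np.pad(y, half, mode='edge')
    return np.convolve(padded, kernel, mode='valid')

smooth = gaussian_smooth(noisy, window_size=101, sigma=20)

# Compare residuals away from the padded edges
interior = slice(100, -100)
rms_noisy = np.sqrt(np.mean((noisy[interior] - true_drift[interior]) ** 2))
rms_smooth = np.sqrt(np.mean((smooth[interior] - true_drift[interior]) ** 2))
print(rms_noisy, rms_smooth)
```

A smaller window and width follow fast changes in the trajectory more closely but suppress less noise, which is the trade-off being tuned here.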

In [10]:
corrDF = corrector(df)

Frame 79999: 1 trajectories present
1 fiducial(s) detected.
Performing spline fits...


In [11]:
corrector.plotFiducials()

We can now investigate the corrected localizations. The x and y columns now contain the corrected localizations. `dx` and `dy` contain the amount of the correction. To get the original data back, one can simply add `dx` to `x` and the same for `y`.

Note that the new count is less than the original one. This is because the drift correction removed localizations belonging to the fiducial marker.

In [12]:
corrDF.describe()

| | frame | intensity [photon] | loglikelihood | offset [photon] | sigma [nm] | uncertainty [nm] | x [nm] | y [nm] | z [nm] | dx [nm] | dy [nm] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1038065.0 | 1038065.0 | 1038065.0 | 1038065.0 | 1038065.0 | 1038065.0 | 1038065.0 | 1038065.0 | 1038065 | 1038065.0 | 1038065.0 |
| mean | 28369.02327 | 3742.794742 | 188.487044 | 183.713188 | 139.266048 | 826.460206 | 25045.116519 | 29098.806386 | 0 | -306.818476 | 165.845133 |
| std | 24240.787384 | 2145.875535 | 260.687906 | 34.693614 | 24.418035 | 10621.837356 | 17120.038472 | 18317.52788 | 0 | 237.007375 | 130.143315 |
| min | 100.0 | 1.0 | -37.829 | 68.953 | 54.0 | 0.87085 | 118.719992 | -323.895629 | 0 | -716.349865 | 0.0 |
| 25% | 6277.0 | 2235.5 | 84.861 | 166.52 | 128.28 | 4.0644 | 9413.261986 | 10843.727038 | 0 | -524.920259 | 34.514715 |
| 50% | 22220.0 | 3095.1 | 113.04 | 178.49 | 135.77 | 5.3558 | 24527.129021 | 27045.403647 | 0 | -294.999218 | 159.071978 |
| 75% | 47965.0 | 4598.7 | 171.91 | 191.23 | 144.63 | 6.7473 | 38484.616667 | 42788.851947 | 0 | -71.026208 | 301.076087 |
| max | 79999.0 | 42861.0 | 14267.0 | 953.39 | 378.0 | 172430.0 | 65708.69565 | 64992.385253 | 0 | 0.0 | 383.677938 |
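
Recovering the uncorrected positions, following the convention stated above (original = corrected + correction), takes one line per axis. The sketch below uses a tiny DataFrame with made-up values rather than `corrDF` itself.

```python
import pandas as pd

# Made-up corrected localizations with their stored corrections
corrected = pd.DataFrame({
    'x [nm]':  [100.0, 205.0],
    'y [nm]':  [50.0, 48.0],
    'dx [nm]': [-5.0, -10.0],
    'dy [nm]': [2.0, 4.0],
})

# Work on a copy so the corrected data stay untouched
original = corrected.copy()
original['x [nm]'] = corrected['x [nm]'] + corrected['dx [nm]']
original['y [nm]'] = corrected['y [nm]'] + corrected['dy [nm]']
print(original[['x [nm]', 'y [nm]']])
```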


# Filtering the data
At this point, we can filter the data by setting criteria on the columns. First we define the filters. After that, we apply them one after the other to the DataFrame by nesting the calls; the innermost filter is applied first.

In [13]:
filter1 = proc.Filter('sigma [nm]', '<', 200)
filter2 = proc.Filter('sigma [nm]', '>', 100)

fcDF = filter2(filter1(corrDF)) # First filter1 is applied, then filter2 is applied.
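
For comparison, the same selection can be written as a boolean mask in plain pandas. The sketch below also defines a hypothetical helper, `make_filter`, that mimics the call pattern above (it is not DataSTORM code) to show that the order of these two filters does not change the result.

```python
import operator
import pandas as pd

df = pd.DataFrame({'sigma [nm]': [80.0, 120.0, 150.0, 250.0]})

# Direct equivalent of the two filters using a boolean mask
mask = (df['sigma [nm]'] < 200) & (df['sigma [nm]'] > 100)
masked = df[mask]

# Hypothetical helper mimicking the call pattern above (not DataSTORM code)
OPS = {'<': operator.lt, '>': operator.gt}

def make_filter(column, op, value):
    compare = OPS[op]
    return lambda frame: frame[compare(frame[column], value)]

f1 = make_filter('sigma [nm]', '<', 200)
f2 = make_filter('sigma [nm]', '>', 100)

nested = f2(f1(df))      # f1 runs first, then f2
swapped = f1(f2(df))     # f2 runs first, then f1
print(nested.equals(swapped), nested.equals(masked))
```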

In [14]:
fcDF.describe()

| | frame | intensity [photon] | loglikelihood | offset [photon] | sigma [nm] | uncertainty [nm] | x [nm] | y [nm] | z [nm] | dx [nm] | dy [nm] |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1025062.0 | 1025062.0 | 1025062.0 | 1025062.0 | 1025062.0 | 1025062.0 | 1025062.0 | 1025062.0 | 1025062 | 1025062.0 | 1025062.0 |
| mean | 28448.567049 | 3753.470622 | 180.940362 | 183.674606 | 137.453151 | 6.02789 | 24994.342718 | 28887.124122 | 0 | -307.522946 | 166.241694 |
| std | 24277.797105 | 2127.239115 | 250.288653 | 34.805369 | 15.042487 | 92.310791 | 17134.877659 | 18186.228443 | 0 | 237.357656 | 130.331834 |
| min | 100.0 | 1.0 | -37.829 | 68.953 | 100.01 | 0.87085 | 118.719992 | -323.895629 | 0 | -716.349865 | 0.0 |
| 25% | 6270.0 | 2245.6 | 84.531 | 166.44 | 128.2 | 4.0486 | 9405.271247 | 10565.351991 | 0 | -526.422254 | 34.455312 |
| 50% | 22342.0 | 3097.3 | 112.17 | 178.41 | 135.62 | 5.3303 | 24487.483017 | 27022.838042 | 0 | -296.816074 | 161.121653 |
| 75% | 48138.0 | 4583.2 | 167.97 | 191.15 | 144.21 | 6.7188 | 38501.893255 | 42786.781681 | 0 | -70.935131 | 301.407909 |
| max | 79999.0 | 42861.0 | 14267.0 | 953.39 | 199.99 | 27045.0 | 65708.69565 | 64992.385253 | 0 | 0.0 | 383.677938 |


# Merging localizations
The last step in the analysis pipeline typically involves merging localizations that are on for several frames into one. This is performed by the Crocker-Grier algorithm in trackpy, but all you have to worry about is defining a Merge processor and applying it to the DataFrame.


In [15]:
merger = proc.Merge(tOff            = 1,  # Number of frames that a molecule can be missing and still be part of a track
                    mergeRadius     = 40) # Maximum distance between successive molecules

mfcDF = merger(fcDF)

Frame 79999: 11 trajectories present
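
Once linking has assigned each localization to a track, collapsing a track into a single row is a group-by operation. The sketch below assumes a `particle` column holding track labels and uses illustrative aggregates (mean position, summed intensity, first frame, number of localizations); the statistics that the Merge processor actually computes may differ. The data are synthetic.

```python
import pandas as pd

# Synthetic linked localizations; 'particle' is an assumed track label.
# Track 2 skips frame 251, the kind of one-frame gap that tOff = 1 allows.
locs = pd.DataFrame({
    'particle':           [0, 0, 0, 1, 2, 2],
    'frame':              [100, 101, 102, 100, 250, 252],
    'x [nm]':             [10.0, 12.0, 11.0, 500.0, 90.0, 92.0],
    'y [nm]':             [20.0, 20.0, 23.0, 700.0, 40.0, 44.0],
    'intensity [photon]': [1000.0, 1500.0, 500.0, 2000.0, 800.0, 1200.0],
})

grouped = locs.groupby('particle')

# One row per track: mean position, total photons, first frame
merged = grouped.agg({
    'x [nm]': 'mean',
    'y [nm]': 'mean',
    'intensity [photon]': 'sum',
    'frame': 'min',
})

# Here the length counts localizations in the track, not the frame span
merged['length [frames]'] = grouped.size()
print(merged)
```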


In [16]:
mfcDF.describe()

| | x [nm] | y [nm] | z [nm] | sigma [nm] | offset [photon] | frame | loglikelihood | intensity [photon] | length [frames] |
|---|---|---|---|---|---|---|---|---|---|
| count | 385070.0 | 385070.0 | 385070 | 385070.0 | 385070.0 | 385070.0 | 385070.0 | 385070.0 | 385070.0 |
| mean | 24713.677501 | 27930.754369 | 0 | 142.534721 | 488.944501 | 28046.657558 | 231.616838 | 9991.793968 | 2.662015 |
| std | 16340.133254 | 17942.480379 | 0 | 17.048999 | 811.964806 | 23759.556005 | 327.729059 | 17450.924492 | 4.665393 |
| min | 118.719992 | -323.895629 | 0 | 100.01 | 68.953 | 100.0 | -27.349 | 1.0 | 1.0 |
| 25% | 9491.498679 | 9836.69301 | 0 | 131.261354 | 180.3 | 6509.0 | 87.2135 | 3056.6 | 1.0 |
| 50% | 24651.934876 | 26380.560311 | 0 | 138.97 | 253.09 | 22379.0 | 116.507851 | 5935.6 | 1.0 |
| 75% | 37917.686645 | 39721.250516 | 0 | 150.4 | 538.4375 | 46636.0 | 215.34 | 11077.4 | 3.0 |
| max | 65708.69565 | 64989.381891 | 0 | 199.99 | 168286.11 | 79999.0 | 8967.8 | 4117516.9 | 1086.0 |


### Final filtering and saving
At this point, the data may be filtered once more in the same manner as above. Let's skip this part and save the data to disk.

In [17]:
outputFile = filePath.parent / Path(filePath.stem + '_DC_Merged' + filePath.suffix)
print(outputFile)

../test-data/Centrioles/FOV_1_noPB_1500mW_10ms_1/FOV_1_noPB_1500mW_10ms_1_MMStack_locResults_DC_Merged.dat
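
The same output name can also be built with `Path.with_name()`, which replaces only the final path component. The sketch below uses a hypothetical, shortened path and checks that both constructions agree.

```python
from pathlib import Path

# Hypothetical input path, shortened for the illustration
file_path = Path('../test-data/example_locResults.dat')
new_name = file_path.stem + '_DC_Merged' + file_path.suffix

# Construction used above: parent folder joined with the new name
via_join = file_path.parent / Path(new_name)

# Alternative: swap out only the file name, keeping the folder
via_with_name = file_path.with_name(new_name)

print(via_with_name.name)
print(via_join == via_with_name)
```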


In [18]:
with open(str(outputFile), 'w') as file:
    mfcDF.to_csv(file)
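
One detail worth knowing about `to_csv()`: by default it also writes the DataFrame's index as a first column without a header, and `read_csv()` names that column `Unnamed: 0` when the file is loaded again. The sketch below, using a small made-up DataFrame and in-memory buffers, shows the effect and how `index=False` avoids it.

```python
import io
import pandas as pd

df = pd.DataFrame({'x [nm]': [1.0, 2.0], 'frame': [100, 101]})

# Default behaviour: the index is written as an extra, unnamed column
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)

# index=False writes only the data columns
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
without_index = pd.read_csv(buf)

print(list(with_index.columns))
print(list(without_index.columns))
```

Whether to keep the index is a design choice; dropping it is reasonable when the row numbers carry no meaning.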