# Semi-automated Fiducial Detection

Unfortunately, the automatic fiducial detection algorithm is prohibitively slow for large datasets. This is a problem, especially if you don't set the detection parameters correctly the first time and have to rerun it. To speed things up, we can ask the user to identify areas to search for fiducials, which should drastically reduce the detection time of the algorithm.

One way to find the fiducials would be to make a 2D histogram of the localizations and search for the bright spots. The user would then identify rectangular regions where the fiducials are apparent and only these regions would be searched. In this workbook I am going to play with this idea on a large dataset containing two fiducials. The automatic algorithm took multiple hours to run, so reducing the search region should be very beneficial.

## Clean up of data

In [1]:
%pylab
import DataSTORM.processors as ds
import pandas as pd
from pathlib import Path

Using matplotlib backend: Qt4Agg
Populating the interactive namespace from numpy and matplotlib


In [2]:
filename = Path('../test-data/MicroTubules_LargeFOV/FOV1_1500_10ms_1_MMStack_locResults.dat')
with open(str(filename.resolve()), 'r') as file:
    df = pd.read_csv(file, engine = 'c')



Unfortunately, the data in the uncertainty column was saved such that most of the numbers are floats, but some are strings representing floats and some are in a strange complex exponential form. Let's filter out these rows to make working with the data frame easier and to prototype a clean-up routine.

First, I'll generate a mask to pick out the rows containing strings.

In [3]:
# .values replaces .as_matrix(), which has been removed from recent pandas releases
stringMask = df['uncertainty [nm]'].map(lambda x: isinstance(x, str)).values

Let's see what the strings look like:

In [4]:
df['uncertainty [nm]'][stringMask]

7995392    5.5184
7995393    5.4603
7995394    8.7009
7995395    7.2815
7995396    8.0866
7995397    8.9921
7995398    8.8527
7995399    10.908
7995400    3.7348
7995401    11.423
7995402    12.652
7995403    11.685
7995404    2.6682
7995405    6.9114
7995406    7.5201
7995407    6.5698
7995408    4.4594
7995409    4.9046
7995410    6.7301
7995411    6.8057
7995412    7.6371
7995413    6.6477
7995414    9.4678
7995415    9.9622
7995416    9.2404
7995417         7
7995418    6.5284
7995419    9.4526
7995420    10.175
7995421    5.1579
            ...  
8060898    8.2222
8060899    8.4614
8060900    3.2069
8060901     7.711
8060902    4.4853
8060903    9.9555
8060904    8.3152
8060905    10.502
8060906    9.7958
8060907    5.1087
8060908    8.9102
8060909    9.5619
8060910    8.5585
8060911    10.727
8060912    7.9335
8060913    8.7758
8060914    8.7189
8060915     7.841
8060916    6.6401
8060917    11.829
8060918    9.8223
8060919    5.0113
8060920    9.5929
8060921    7.1712
8060922   

So, strangely, about 11,000 localizations in this data set were interpreted as strings. Let's cast them to a numeric data type. Some of the strings cannot be recognized by the parser, so we'll convert those to NaNs by using the `errors='coerce'` argument.

In [5]:
df['uncertainty [nm]'] = pd.to_numeric(df['uncertainty [nm]'], errors='coerce')
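
To make the effect of `errors='coerce'` concrete, here is a minimal sketch on a toy column. The values are made up for illustration and are not taken from the dataset; the names `raw`, `is_string`, and `cleaned` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy column mixing a float, numeric strings, and an unparseable string.
raw = pd.Series([5.5184, '7', '10.908', 'not-a-number'], dtype=object)

# Same kind of mask as above: True where the entry is a Python str.
is_string = raw.map(lambda x: isinstance(x, str))

# errors='coerce' turns anything the parser cannot read into NaN
# instead of raising an exception.
cleaned = pd.to_numeric(raw, errors='coerce')
print(cleaned)
```

The numeric strings become ordinary floats, and only the entry that cannot be parsed ends up as NaN.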

Finally, we need to replace any Infs with NaNs and then drop the NaNs. We'll reindex the final result.

In [6]:
df.replace([np.inf, -np.inf], np.nan, inplace = True)
# dropna() returns a new DataFrame, so this only summarizes the NaN-free rows; df itself is unchanged
df.dropna().describe()

| | x [nm] | y [nm] | z [nm] | frame | uncertainty [nm] | intensity [photon] | offset [photon] | loglikelihood | sigma [nm] |
|---|---|---|---|---|---|---|---|---|---|
| count | 8667397.0 | 8667397.0 | 8667397 | 8667397.0 | 8667397.0 | 8667397.0 | 8667397.0 | 8667397.0 | 8667397.0 |
| mean | 34426.251067 | 34906.420891 | 0 | 20088.62312 | 805529200000.0 | 3642.044266 | 344.474718 | 203.382508 | 134.408034 |
| std | 22145.547453 | 13027.020305 | 0 | 14416.319868 | 1344216000000000.0 | 2569.207387 | 69.663961 | 824.384742 | 18.041903 |
| min | 7.4708 | 2.1061 | 0 | 100.0 | 0.44138 | 1.0 | 98.01 | -29.974 | 54.0 |
| 25% | 12235.0 | 24933.0 | 0 | 6720.0 | 4.9379 | 2256.8 | 289.83 | 96.946 | 123.7 |
| 50% | 36195.0 | 35067.0 | 0 | 19463.0 | 6.6686 | 3023.2 | 336.66 | 124.96 | 131.56 |
| 75% | 55460.0 | 45292.0 | 0 | 31084.0 | 8.4971 | 4289.3 | 391.2 | 174.81 | 141.41 |
| max | 67067.0 | 67033.0 | 0 | 49999.0 | 3.5605e+18 | 86565.0 | 2294.8 | 454360.0 | 378.0 |


In [7]:
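# reindex() without arguments leaves the row labels as they are;
# reset_index(drop=True) would renumber the rows from 0 after a drop
df.reindex()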

| | x [nm] | y [nm] | z [nm] | frame | uncertainty [nm] | intensity [photon] | offset [photon] | loglikelihood | sigma [nm] |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 151.05 | 19343.0 | 0 | 100 | 8.6886 | 4111.8 | 472.33 | 137.820 | 170.48 |
| 1 | 367.18 | 21417.0 | 0 | 100 | 6.6719 | 3815.7 | 394.68 | 146.180 | 150.83 |
| 2 | 422.42 | 28225.0 | 0 | 100 | 8.6193 | 1847.1 | 388.10 | 190.480 | 113.18 |
| 3 | 519.29 | 15155.0 | 0 | 100 | 10.2410 | 2570.1 | 372.53 | 79.677 | 162.12 |
| 4 | 590.09 | 24756.0 | 0 | 100 | 8.2979 | 2400.6 | 365.81 | 92.517 | 133.72 |
| 5 | 685.17 | 2734.3 | 0 | 100 | 3.3740 | 7808.7 | 379.33 | 243.630 | 142.11 |
| 6 | 607.30 | 7347.8 | 0 | 100 | 6.1314 | 3491.5 | 332.58 | 120.590 | 140.65 |
| 7 | 701.78 | 29090.0 | 0 | 100 | 9.5573 | 1899.1 | 357.28 | 135.030 | 127.62 |
| 8 | 822.21 | 31915.0 | 0 | 100 | 7.6322 | 5612.2 | 354.24 | 575.820 | 210.79 |
| 9 | 745.04 | 56281.0 | 0 | 100 | 7.0961 | 1887.1 | 324.88 | 180.240 | 104.79 |
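
As a rough prototype of the clean-up routine, the steps in this section could be bundled into one function. This is only a sketch: `clean_localizations` is a hypothetical name, not part of DataSTORM, and the toy frame below is made up. Note that it assigns the result of `dropna()` and uses `reset_index(drop=True)` so that the returned frame is NaN-free and renumbered from 0.

```python
import numpy as np
import pandas as pd

def clean_localizations(df, column='uncertainty [nm]'):
    """Return a copy of df with unparseable, infinite, and missing rows removed."""
    out = df.copy()
    # Strings that cannot be parsed as numbers become NaN.
    out[column] = pd.to_numeric(out[column], errors='coerce')
    # Treat +/-Inf as missing too, then drop every row with a missing value.
    out = out.replace([np.inf, -np.inf], np.nan).dropna()
    # Renumber the surviving rows 0..N-1.
    return out.reset_index(drop=True)

# Made-up example: one good float, one numeric string, one bad string, one Inf.
toy = pd.DataFrame({
    'x [nm]': [1.0, 2.0, 3.0, 4.0],
    'uncertainty [nm]': [5.5, '7', 'bad', np.inf],
})
cleaned = clean_localizations(toy)
print(cleaned)
```

Returning a new frame rather than modifying the input in place keeps the raw data available in case the clean-up criteria need to change.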


## Normal localization processing
Now that the data is cleaned up a bit, we'll proceed with our normal processing.

In [8]:
df.describe()

| | x [nm] | y [nm] | z [nm] | frame | uncertainty [nm] | intensity [photon] | offset [photon] | loglikelihood | sigma [nm] |
|---|---|---|---|---|---|---|---|---|---|
| count | 8688783.0 | 8688783.0 | 8688783 | 8688783.0 | 8667397.0 | 8688783.0 | 8688783.0 | 8688783.0 | 8688783.0 |
| mean | 34492.653808 | 34934.850122 | 0 | 20092.557556 | 805529200000.0 | 3642.38282 | 344.541196 | 204.765151 | 134.693483 |
| std | 22168.604059 | 13035.587121 | 0 | 14416.324588 | 1344216000000000.0 | 2566.772999 | 69.656499 | 824.324347 | 19.02017 |
| min | 7.4708 | 2.1061 | 0 | 100.0 | 0.44138 | 1.0 | 98.01 | -29.974 | 54.0 |
| 25% | 12279.0 | 24959.0 | 0 | 6723.0 | 4.9379 | 2258.5 | 289.88 | 97.013 | 123.72 |
| 50% | 36313.0 | 35114.0 | 0 | 19466.0 | 6.6686 | 3024.9 | 336.74 | 125.12 | 131.6 |
| 75% | 55547.0 | 45321.0 | 0 | 31090.0 | 8.4971 | 4289.5 | 391.27 | 175.44 | 141.51 |
| max | 67068.0 | 67056.0 | 0 | 49999.0 | 3.5605e+18 | 86565.0 | 2294.8 | 454360.0 | 378.0 |


In [9]:
FilterLLR  = ds.Filter('loglikelihood', '<', 250)
FilterSig1 = ds.Filter('sigma [nm]',    '>', 115)
FilterSig2 = ds.Filter('sigma [nm]',    '<', 150)
df2        = FilterSig2(FilterSig1(FilterLLR(df)))

In [10]:
df2.describe()

| | x [nm] | y [nm] | z [nm] | frame | uncertainty [nm] | intensity [photon] | offset [photon] | loglikelihood | sigma [nm] |
|---|---|---|---|---|---|---|---|---|---|
| count | 6163083.0 | 6163083.0 | 6163083 | 6163083.0 | 6163077.0 | 6163083.0 | 6163083.0 | 6163083.0 | 6163083.0 |
| mean | 34768.443722 | 35015.085803 | 0 | 20069.532182 | 6.83715 | 3231.41758 | 338.149178 | 123.854047 | 131.693001 |
| std | 21979.205049 | 12983.403491 | 0 | 14174.307946 | 10.667165 | 1368.089529 | 59.360179 | 42.007099 | 8.459992 |
| min | 70.389 | 14.671 | 0 | 100.0 | 1.5897 | 1.0 | 204.24 | -29.974 | 115.01 |
| 25% | 12781.0 | 25080.0 | 0 | 7099.0 | 5.1211 | 2222.15 | 285.75 | 93.209 | 125.28 |
| 50% | 36964.0 | 35343.0 | 0 | 19379.0 | 6.6573 | 2875.1 | 329.8 | 115.52 | 131.33 |
| 75% | 55525.0 | 45305.0 | 0 | 30737.0 | 8.3819 | 3890.1 | 387.05 | 147.0 | 137.88 |
| max | 67060.0 | 67031.0 | 0 | 49999.0 | 25931.0 | 21836.0 | 637.96 | 249.99 | 149.99 |
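
For readers without DataSTORM at hand, the three filters can be approximated with plain pandas boolean masks. This sketch assumes that `ds.Filter` simply keeps the rows for which the comparison is true, which is consistent with the min and max values in the summary above but is not confirmed from the library's source. The toy values are made up.

```python
import pandas as pd

# Made-up localizations; only the first row passes all three conditions.
toy = pd.DataFrame({
    'loglikelihood': [100.0, 300.0, 120.0, 90.0],
    'sigma [nm]':    [130.0, 130.0, 160.0, 110.0],
})

# Combine the conditions with & (element-wise AND); each comparison
# needs its own parentheses because & binds tighter than < and >.
keep = (
    (toy['loglikelihood'] < 250)
    & (toy['sigma [nm]'] > 115)
    & (toy['sigma [nm]'] < 150)
)
filtered = toy[keep]
print(filtered)
```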


## Display the 2D histogram to visually identify fiducials

Now we need to make a 2D histogram to see whether the fiducial localizations are apparent.

In [24]:
# Find maximum x or y coordinate
maxPos    = np.max([df2['x [nm]'].max(), df2['y [nm]'].max()])
pixelSize = 100 # nm

numBins = int(maxPos / pixelSize)
plt.hist2d(df2['x [nm]'], df2['y [nm]'], bins = numBins)
plt.show()
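
The bright spots could also be located numerically rather than by eye. Below is a minimal sketch using `np.histogram2d` on synthetic data: a uniform background plus one tight cluster standing in for a fiducial. All numbers and variable names here are made up. Unlike the cell above, I pass `range` explicitly so that the bin edges fall exactly on multiples of the pixel size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic localizations in a 10 x 10 um field (units of nm).
background = rng.uniform(0, 10000, size=(2000, 2))
fiducial = rng.normal(loc=[2550.0, 7550.0], scale=20.0, size=(500, 2))
xy = np.vstack([background, fiducial])

pixel_size = 100    # nm
field_size = 10000  # nm
num_bins = field_size // pixel_size

counts, x_edges, y_edges = np.histogram2d(
    xy[:, 0], xy[:, 1],
    bins=num_bins,
    range=[[0, field_size], [0, field_size]],
)

# Locate the most populated bin and report its extent in nm.
ix, iy = np.unravel_index(np.argmax(counts), counts.shape)
brightest_x = (x_edges[ix], x_edges[ix + 1])
brightest_y = (y_edges[iy], y_edges[iy + 1])
print(brightest_x, brightest_y, counts[ix, iy])
```

A fiducial that stays on for most of the acquisition should stand far above the background counts, so the few brightest bins are natural candidates for the search regions.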

In [12]:
plt.close()

In [13]:
# Filter localizations around one of the fiducials for testing
df3 = df2[(df2['x [nm]'] < 4000) & (df2['y [nm]'] > 32000) & (df2['y [nm]'] < 35000)]
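
With more than one fiducial, the same kind of selection could be generalized to a list of rectangles. Here is a minimal sketch; `select_regions` is a hypothetical helper and the second rectangle is invented for illustration. Note that `Series.between` includes both endpoints by default, whereas the cell above uses strict inequalities.

```python
import pandas as pd

def select_regions(df, regions, x_col='x [nm]', y_col='y [nm]'):
    """Keep rows inside at least one (x_min, x_max, y_min, y_max) rectangle."""
    keep = pd.Series(False, index=df.index)
    for x_min, x_max, y_min, y_max in regions:
        # OR the rectangles together so a row only needs to fall in one of them.
        keep |= df[x_col].between(x_min, x_max) & df[y_col].between(y_min, y_max)
    return df[keep]

# Made-up localizations; rows 0 and 3 each fall inside one rectangle.
toy = pd.DataFrame({
    'x [nm]': [1000.0, 5000.0, 3000.0, 60000.0],
    'y [nm]': [33000.0, 33000.0, 40000.0, 10000.0],
})
regions = [
    (0, 4000, 32000, 35000),      # the region used above
    (59000, 61000, 9000, 11000),  # a second, invented region
]
subset = select_regions(toy, regions)
print(subset)
```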

In [14]:
df3.describe()

| | x [nm] | y [nm] | z [nm] | frame | uncertainty [nm] | intensity [photon] | offset [photon] | loglikelihood | sigma [nm] |
|---|---|---|---|---|---|---|---|---|---|
| count | 48035.0 | 48035.0 | 48035 | 48035.0 | 48035.0 | 48035.0 | 48035.0 | 48035.0 | 48035.0 |
| mean | 1933.308282 | 33389.368939 | 0 | 23115.984345 | 7.156153 | 3036.496126 | 366.468969 | 131.905763 | 129.129871 |
| std | 1028.43847 | 842.652218 | 0 | 13883.912319 | 2.192142 | 1193.398861 | 64.358035 | 43.347646 | 8.489086 |
| min | 84.517 | 32001.0 | 0 | 100.0 | 2.2287 | 1086.1 | 230.78 | 1.8872 | 115.01 |
| 25% | 969.73 | 32605.0 | 0 | 9561.5 | 5.3492 | 2157.25 | 304.86 | 100.24 | 122.35 |
| 50% | 2288.3 | 33530.0 | 0 | 24827.0 | 7.0594 | 2719.2 | 371.87 | 123.14 | 128.23 |
| 75% | 2479.2 | 34044.0 | 0 | 34330.5 | 8.7482 | 3677.65 | 419.71 | 156.48 | 134.98 |
| max | 3999.7 | 34999.0 | 0 | 49998.0 | 16.708 | 11865.0 | 589.08 | 249.99 | 149.99 |


In [16]:
corrector = ds.FiducialDriftCorrect(offTime = 1, minSegmentLength = 15, minFracFiducialLength = 0.1, neighborRadius = 1000)

In [17]:
df4 = corrector(df3)

Frame 49998: 1 trajectories present


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return func(*args, **kwargs)
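
The internals of `ds.FiducialDriftCorrect` are not shown in this notebook. As a rough illustration of the general idea only, and not of DataSTORM's implementation, the sketch below subtracts a fiducial's displacement from each localization according to its frame number. Linear interpolation with `np.interp` is swapped in for the spline fit, and all names and numbers are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical fiducial track: x position drifting by 0.2 nm per frame.
fid_frames = np.array([0, 10, 20, 30])
fid_x = np.array([100.0, 102.0, 104.0, 106.0])

# Drift is measured relative to the fiducial's first position.
drift_x = fid_x - fid_x[0]

# Made-up localizations of a static emitter that drifts with the sample.
locs = pd.DataFrame({
    'frame':  [0, 5, 20, 30],
    'x [nm]': [500.0, 501.0, 504.0, 506.0],
})

# Interpolate the drift at each localization's frame and subtract it.
locs['x corrected [nm]'] = locs['x [nm]'] - np.interp(locs['frame'], fid_frames, drift_x)
print(locs)
```

After the correction, all four localizations collapse back onto the same x position, which is what a drift-free acquisition of a static emitter would show.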


In [18]:
corrector.avgSpline['xS'].plot()

<matplotlib.axes._subplots.AxesSubplot at 0x7f09fbc137f0>

In [None]:
# Find maximum x or y coordinate
maxPos    = np.max([df4['x [nm]'].max(), df4['y [nm]'].max()])
pixelSize = 500 # nm

numBins = int(maxPos / pixelSize)
plt.hist2d(df4['x [nm]'], df4['y [nm]'], bins = numBins)
plt.show()

In [23]:
plt.plot(corrector.fiducialTrajectories[0]['frame'], corrector.fiducialTrajectories[0]['x'])

[<matplotlib.lines.Line2D at 0x7f09fb9f1c50>]

In [26]:
df2

| | x [nm] | y [nm] | z [nm] | frame | uncertainty [nm] | intensity [photon] | offset [photon] | loglikelihood | sigma [nm] |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 590.09 | 24756.0 | 0 | 100 | 8.2979 | 2400.6 | 365.81 | 92.517 | 133.72 |
| 1 | 685.17 | 2734.3 | 0 | 100 | 3.3740 | 7808.7 | 379.33 | 243.630 | 142.11 |
| 2 | 607.30 | 7347.8 | 0 | 100 | 6.1314 | 3491.5 | 332.58 | 120.590 | 140.65 |
| 3 | 701.78 | 29090.0 | 0 | 100 | 9.5573 | 1899.1 | 357.28 | 135.030 | 127.62 |
| 4 | 887.76 | 32529.0 | 0 | 100 | 6.5520 | 3589.7 | 384.85 | 108.780 | 143.63 |
| 5 | 903.38 | 34491.0 | 0 | 100 | 6.8811 | 2683.2 | 372.54 | 144.680 | 124.75 |
| 6 | 962.35 | 6728.2 | 0 | 100 | 7.2275 | 2431.8 | 357.69 | 143.140 | 121.62 |
| 7 | 996.53 | 13430.0 | 0 | 100 | 6.7943 | 3220.3 | 420.29 | 120.290 | 134.69 |
| 8 | 919.37 | 64056.0 | 0 | 100 | 9.5226 | 2173.8 | 325.24 | 83.735 | 143.56 |
| 9 | 1119.20 | 10249.0 | 0 | 100 | 6.3653 | 3793.8 | 377.86 | 198.620 | 147.14 |


In [59]:
# BE SURE TO DROP NULLS FIRST
# df2.dropna(inplace = True)
df2.to_csv('fullData.csv', index = False)

In [55]:
df5 = df2[df2['frame'] > 4000]

In [56]:
df5.to_csv('partialData.csv', index = False)

In [57]:
df2.dropna(inplace = True)

