This notebook demonstrates how to batch process localization files that fit in memory, which is the case for most applications. One exception is the very large datasets from large field-of-view microtubule data; these are sometimes too large to fit into memory and must be processed out-of-core.

Because the drift correction step is interactive, it takes place before the steps shown here. A separate tutorial describes how to do drift correction in batch. This tutorial assumes the data is already drift-corrected or does not need drift correction.

### Import the software libraries

In [1]:
%pylab
import DataSTORM.processors as proc
import DataSTORM.batch      as batch
import pandas               as pd
from pathlib import Path

Using matplotlib backend: Qt4Agg
Populating the interactive namespace from numpy and matplotlib


# Outline of batch processing steps

1. Define the list of operations that will be performed on the localizations
2. Set the input and output folders
3. Run the batch processor

# Define the list of operations
Let's say we have a folder which contains a few subfolders with localization files in them. Each localization file ends in `.dat` and we want to clean up the data in them, filter by the uncertainty, merge the localizations, and then filter by loglikelihood. We first need to define processors for each step.

In [2]:
cleanup = proc.CleanUp()
filter1 = proc.Filter('uncertainty [nm]', '<', 30)
filter2 = proc.Filter('loglikelihood',    '<', 250)
merger  = proc.Merge(tOff = 1, mergeRadius = 40)

After defining the processors, we build a pipeline by placing each processor in a Python list. The processors are applied to the data in the same order in which they appear in the list.

In [3]:
pipeline = [cleanup,
            filter1,
            merger,  # Data is merged before filter2 is applied!
            filter2]
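
To illustrate what "applied in list order" means, here is a minimal sketch that uses plain pandas rather than DataSTORM itself. It assumes each step behaves like a function that takes a DataFrame of localizations and returns a new one; the functions `filter_uncertainty`, `filter_loglikelihood`, and `apply_pipeline` are hypothetical stand-ins and not part of the DataSTORM API. The thresholds mirror `filter1` and `filter2` above.

```python
import pandas as pd

# Hypothetical stand-ins for processors: each takes a DataFrame
# of localizations and returns a new, processed DataFrame.
def filter_uncertainty(df):
    # Keep rows with uncertainty below 30 nm (same threshold as filter1).
    return df[df['uncertainty [nm]'] < 30]

def filter_loglikelihood(df):
    # Keep rows with loglikelihood below 250 (same threshold as filter2).
    return df[df['loglikelihood'] < 250]

def apply_pipeline(df, steps):
    # Feed the output of each step into the next, in list order.
    for step in steps:
        df = step(df)
    return df

# Made-up example data, for illustration only.
locs = pd.DataFrame({'uncertainty [nm]': [10, 45, 20, 25],
                     'loglikelihood':    [100, 50, 300, 200]})

result = apply_pipeline(locs, [filter_uncertainty, filter_loglikelihood])
print(result)
```

Only the rows that pass both filters survive. For steps such as merging, which change the rows seen by later steps, moving a step within the list can change the final result.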

# Set the input and output directories
Next, the batch processor needs to know which parent directory to search and where to save the results. Setting an output directory is optional. By default, the data is saved in a subfolder of the parent folder called `processed_data`. If we want the output files to be saved in the same folder(s) as the input files, we set the `useSameFolder` flag of `BatchProcessor` to `True` (see a few lines below).

In [4]:
inputData = Path('../test-data/Centrioles/') # All folders in this folder will be searched
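
The output-location rule described above can be sketched with `pathlib` alone. This is only an illustration of the rule as stated in the text, not DataSTORM's actual code; the helper `choose_output_dir` and the example file name are hypothetical.

```python
from pathlib import Path

def choose_output_dir(input_file, parent, use_same_folder=False):
    """Hypothetical helper: pick the folder where results would be saved."""
    input_file = Path(input_file)
    if use_same_folder:
        # Save next to the input file.
        return input_file.parent
    # Default: a 'processed_data' subfolder of the parent folder.
    return Path(parent) / 'processed_data'

parent = Path('../test-data/Centrioles')
# Hypothetical input file inside one of the subfolders.
example = parent / 'subfolder' / 'example_locResults.dat'

default_dir = choose_output_dir(example, parent)
same_dir = choose_output_dir(example, parent, use_same_folder=True)
print(default_dir)
print(same_dir)
```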

# Initialize the batch processor and start it

In [5]:
# We will use the default output directory
bp = batch.BatchProcessor(inputData,
                          pipeline,
                          useSameFolder = False,
                          suffix = 'locResults.dat') # Look for files ending in 'locResults.dat'
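
For intuition about how a suffix-based search through subfolders might work, here is a small, self-contained sketch using `pathlib`. It is an assumption about the general approach, not a description of how `BatchProcessor` is implemented; `find_localization_files` and the temporary folder layout are made up for this example.

```python
import tempfile
from pathlib import Path

def find_localization_files(parent, suffix):
    """Hypothetical helper: recursively list files whose names end in `suffix`."""
    return sorted(p for p in Path(parent).rglob('*' + suffix) if p.is_file())

# Build a throwaway folder tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'cell_1').mkdir()
    (root / 'cell_2').mkdir()
    (root / 'cell_1' / 'a_locResults.dat').touch()
    (root / 'cell_2' / 'b_locResults.dat').touch()
    (root / 'cell_2' / 'notes.txt').touch()  # Does not match the suffix

    found = [p.name for p in find_localization_files(root, 'locResults.dat')]

print(found)
```

Files that do not end in the given suffix, such as `notes.txt`, are skipped.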

The last step is to start the batch processor.

In [6]:
bp.go()

Frame 101: 74 trajectories present
