BStore comes with a built-in data Parser known as an MMParser. This Parser understands how to convert data from Micro-Manager acquisitions into a format that works with BStore's DatabaseAtom interface.

Sometimes it is useful to override MMParser's default read behavior without writing an entirely new Parser. For example, when interpreting localization results, MMParser reads csv files by calling the `read_csv` function from the [Pandas](http://pandas.pydata.org/) library. If you want to change the arguments passed to `read_csv`, you can pass a custom-written function to the MMParser constructor that tells it how to read data. You could also use a different read function entirely.
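As a rough sketch of swapping in a different read function, the stdlib-only reader below uses `csv.DictReader` to parse a semicolon-delimited localization file.

- The semicolon delimiter and the name `read_delimited_locs` are hypothetical.
- Inside a custom getter, a call like this would replace the default `pd.read_csv(file)` line. If you stay with pandas, `pd.read_csv(file, sep=';')` would do the same job.
- Unlike `read_csv`, this sketch returns a plain dict of columns rather than a DataFrame.

```python
import csv
import io


def read_delimited_locs(file_obj, delimiter=';'):
    """Read delimited localization data into a dict of float columns.

    Hypothetical stand-in for the pd.read_csv(file) call in a custom getter;
    the semicolon default is an assumption for illustration.
    """
    reader = csv.DictReader(file_obj, delimiter=delimiter)
    # Accessing fieldnames consumes the header row.
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns


# Simulate a small semicolon-delimited locResults file in memory.
sample = io.StringIO('x [nm];y [nm];frames\n10;20;0\n20;5;1\n')
locs = read_delimited_locs(sample)
print(locs)
```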

Another case where this is useful is renaming columns before the DatabaseAtom is written into a database. BStore's `ConvertHeader` processor can do the renaming, and we'll walk through this example to demonstrate the functionality.

In [1]:
```python
from bstore.parsers import MMParser
import pandas as pd
```

# Define the custom read behavior

By default, MMParser reads data through a function exposed with a [property decorator](https://docs.python.org/3.5/library/functions.html#property). For example, suppose DefaultParser is your MMParser instance and it has parsed a locResults file. Then `DefaultParser.data` returns a DataFrame. By default, the `data` property runs this function:

In [2]:
```python
import inspect

DefaultParser = MMParser()
print(inspect.getsource(DefaultParser._getDataDefault))
```

    def _getDataDefault(self):
        """Default function used for reading the data in a database atom.
        
        This function defines the default behaviors for reading data.
        It may be overriden by this Parser's constructor to allow for
        more specialized reading, such as converting DataFrame column
        names upon import.
        
        """
        if self._uninitialized:
            raise ParserNotInitializedError(('Error: this parser has not yet'
                                             ' been initialized.'))
        
        if self.datasetType == 'locResults':
            # Loading the csv file when data() is called reduces the
            # chance that large DataFrames do not needlessly
            # remain in memory.
            with open(str(self._fullPath), 'r') as file:            
                df = pd.read_csv(file)
                return df
                
        elif self.datasetType == 'locMetadata':
            # self._metadata is set

The function `_getDataDefault()` uses the pandas `read_csv` function to read locResults data into memory. When the dataset type is `locMetadata`, it returns metadata that was parsed elsewhere inside the class. Finally, it uses `imread` from the [matplotlib](http://matplotlib.org/) library to read image data.

We will rewrite this function to use the `ConvertHeader` processor to change the column names of the DataFrame.

## Write the customGetter function

First, we'll define the mapping between column names using `parsers.FormatMap`. Since the column name **frames** will remain unchanged, we don't need to define a mapping for it.

In [3]:
```python
from bstore.parsers import FormatMap

myMapping = FormatMap({'x [nm]' : 'x',
                       'y [nm]' : 'y'})
```
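The step above relies on unmapped names such as **frames** passing through unchanged. Assuming FormatMap behaves that way, its lookup can be pictured with the hypothetical dict subclass below. This illustrates the behavior only; it is not FormatMap's actual implementation.

```python
class PassThroughMap(dict):
    """Hypothetical illustration of a pass-through column mapping.

    Not BStore's FormatMap: a dict whose missing keys map to themselves.
    """

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ for absent keys in subclasses,
        # so unmapped column names come back unchanged.
        return key


mapping = PassThroughMap({'x [nm]': 'x', 'y [nm]': 'y'})
new_names = [mapping[name] for name in ['x [nm]', 'y [nm]', 'frames']]
print(new_names)  # ['x', 'y', 'frames']
```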

Next, we will define a ConvertHeader processor with this mapping.

In [4]:
```python
from bstore.processors import ConvertHeader
converter = ConvertHeader(mapping = myMapping)
```

We can now build the header conversion into a custom getter that runs whenever the `data` property is accessed. We'll start from the default getter and modify it slightly.

In [5]:
```python
from matplotlib.pyplot import imread
# The default getter raises this error, so it is available from bstore.parsers
from bstore.parsers import ParserNotInitializedError

# pd (cell 1) and converter (cell 4) are defined in earlier cells.
def customGetter(self):
    # Note that the customGetter requires an argument that holds
    # the class instance, in this case `self`.
    if self._uninitialized:
        raise ParserNotInitializedError(('Error: this parser has not yet'
                                         ' been initialized.'))

    if self.datasetType == 'locResults':
        # Load the csv file only when data is accessed and
        # convert its column names.
        with open(str(self._fullPath), 'r') as file:
            df = pd.read_csv(file)
            return converter(df)

    elif self.datasetType == 'locMetadata':
        # self._metadata is set by self._parseLocMetadata
        return self._metadata

    elif self.datasetType == 'widefieldImage':
        # Load the image data only when called
        return imread(str(self._fullPath))
```

If you look closely, the only substantive change is that the line `return df` became `return converter(df)`. All that remains is to pass this function to the MMParser constructor.

In [6]:
```python
CustomParser = MMParser(dataGetter = customGetter)
```
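The comment in `customGetter` says the function must accept the class instance as its first argument. The toy class below shows one hypothetical way a parser could support a pluggable getter behind a `data` property. It is not MMParser's source and does not show how MMParser stores the getter. It does show why the instance has to be passed explicitly when a plain function is stored on an object.

```python
class ToyParser:
    """Hypothetical stand-in for a parser with a pluggable data getter.

    This is not BStore's MMParser; it only mimics the dataGetter idea.
    """

    def __init__(self, dataGetter=None):
        # Use the built-in default unless a custom getter is supplied.
        self._dataGetter = dataGetter or ToyParser._getDataDefault
        self._raw = None

    def parseFilename(self, raw):
        self._raw = raw

    def _getDataDefault(self):
        return self._raw

    @property
    def data(self):
        # A function stored on the instance is not bound as a method,
        # so the instance is passed by hand, just like customGetter(self).
        return self._dataGetter(self)


def upper_getter(self):
    # Hypothetical custom getter: transforms the parsed value.
    return self._raw.upper()


default = ToyParser()
default.parseFilename('frames')
custom = ToyParser(dataGetter=upper_getter)
custom.parseFilename('frames')
print(default.data, custom.data)  # frames FRAMES
```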

# Create the test dataset
The raw localization data will use the column names **x [nm]**, **y [nm]**, and **frames**, but we want to store the data in an `HDFDatabase` with column names **x**, **y**, and **frames**. We can implement this by telling the parser to convert the column names before the DataFrame even reaches the database. Let's start by generating a small test dataset.

In [7]:
```python
# Create a test dataset
test_data = pd.DataFrame({'x [nm]' : [10, 20, 5],
                          'y [nm]' : [20, 5, 10],
                          'frames' : [0, 1, 2]})

# Save it to a csv file without the index column
test_data.to_csv('Cells_TestData_1_MMStack_Pos0_locResults.csv', index = False)

# Display the dataset
test_data.head()
```

|   | frames | x [nm] | y [nm] |
|---|--------|--------|--------|
| 0 | 0      | 10     | 20     |
| 1 | 1      | 20     | 5      |
| 2 | 2      | 5      | 10     |


# Testing the parsers
When we parse the test data file with the default parser, the column names will remain unchanged.

In [8]:
```python
# Parse the filename and load the data into memory
DefaultParser.parseFilename('Cells_TestData_1_MMStack_Pos0_locResults.csv')
DefaultParser.data
```

|   | frames | x [nm] | y [nm] |
|---|--------|--------|--------|
| 0 | 0      | 10     | 20     |
| 1 | 1      | 20     | 5      |
| 2 | 2      | 5      | 10     |


And here is what happens when we use our custom parser.

In [9]:
```python
CustomParser.parseFilename('Cells_TestData_1_MMStack_Pos0_locResults.csv')
CustomParser.data
```

|   | frames | x | y  |
|---|--------|---|----|
| 0 | 0      | 10 | 20 |
| 1 | 1      | 20 | 5  |
| 2 | 2      | 5  | 10 |
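Copying the whole default getter means the copy must be kept in sync with any later changes to MMParser. An alternative is to wrap the default getter and convert only locResults data.

- The helper `make_wrapping_getter` is hypothetical.
- Using it with BStore would rely on the private method `_getDataDefault` shown earlier, which may change between versions.
- The runnable part below uses toy stand-ins instead of a real parser.

```python
from types import SimpleNamespace


def make_wrapping_getter(default_getter, convert):
    """Return a getter that reuses default_getter and converts locResults only.

    Hypothetical helper, not part of BStore. With BStore (an untested
    assumption): make_wrapping_getter(MMParser._getDataDefault, converter)
    """
    def getter(self):
        data = default_getter(self)
        if self.datasetType == 'locResults':
            return convert(data)
        # Other dataset types pass through unchanged.
        return data
    return getter


# Toy stand-ins for a parser's default getter and its instances.
def toy_default(self):
    return self.payload


getter = make_wrapping_getter(
    toy_default, lambda cols: [c.replace(' [nm]', '') for c in cols])
locs = SimpleNamespace(datasetType='locResults',
                       payload=['x [nm]', 'y [nm]', 'frames'])
meta = SimpleNamespace(datasetType='locMetadata', payload={'Frames': 3})
print(getter(locs))  # ['x', 'y', 'frames']
print(getter(meta))  # {'Frames': 3}
```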


# Summary
+ We can override the default method used by an MMParser to read in experimental data.
+ To override the default behavior, pass a custom-written function to the MMParser constructor, as in `CustomParser = MMParser(dataGetter = customGetter)`.
+ The custom function must tell the MMParser how to read every dataset type, such as **locResults**, **locMetadata**, and **widefieldImage**.
+ This functionality allows you to modify the built-in MMParser without having to write an entirely new Parser yourself.
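`customGetter` ends with an `elif` and has no final `else`. As a result, if a dataset type is not listed, the getter returns `None` without raising an error. The hypothetical helper below flags such gaps by running a getter against stand-in objects. It only suits getters that don't touch the filesystem; a getter that opens files would need real test data instead.

```python
from types import SimpleNamespace

# Dataset types handled by customGetter in this example.
DATASET_TYPES = ('locResults', 'locMetadata', 'widefieldImage')


def missing_branches(getter, dataset_types=DATASET_TYPES):
    """Return the dataset types for which `getter` returns None.

    Hypothetical check: a getter with no branch for a type falls off the
    end of its if/elif chain and implicitly returns None.
    """
    missing = []
    for dataset_type in dataset_types:
        # Stand-in instance exposing only the attributes the getter reads.
        fake = SimpleNamespace(datasetType=dataset_type, _metadata={'Frames': 3})
        if getter(fake) is None:
            missing.append(dataset_type)
    return missing


def incomplete_getter(self):
    # Toy getter that forgets the widefieldImage branch.
    if self.datasetType == 'locResults':
        return 'table'
    elif self.datasetType == 'locMetadata':
        return self._metadata


print(missing_branches(incomplete_getter))  # ['widefieldImage']
```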

In [10]:
```python
# Remove the test dataset
%rm 'Cells_TestData_1_MMStack_Pos0_locResults.csv'
```