# Writing custom parsers (WORK IN PROGRESS)
B-Store was designed to work with your data by not enforcing strict rules about file formats. This means, for example, that you are not required to follow a certain column naming convention or to use .csv files when generating your raw data.

While this gives you a lot of flexibility when acquiring your data in the lab, it does come at a cost: you must write your own parser to translate your files into a format that can be organized by B-Store.

B-Store comes with a built-in parser known as a `SimpleParser` to provide out-of-the-box functionality for simple datasets. In this tutorial, we'll write the SimpleParser from scratch to demonstrate how you may write your own parsers for B-Store.

## The logic of B-Store
B-Store was designed to take localization data, widefield images, and metadata and convert them into a format that is easy to store and easy for both humans and machines to interpret. This logic is illustrated below:

<img src="../design/dataset_logic.png" width="50%"/>

The role of the `Parser` is to take these raw datasets and assign to each one a name (known as a `prefix`) that identifies which datasets belong together, for example grouping data from controls and treatments into separate groups. Within these groups, which are known as acquisition groups, each dataset is identified by a number known as the `acqID` and by the type of data it contains, the `datasetType`. Finally, a number of other fields may further identify a dataset if more precise IDs are required.
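To make the grouping concrete, the snippet below is a minimal sketch that represents a few hypothetical datasets by their `prefix`, `acqID`, and `datasetType` and collects them into acquisition groups keyed by prefix. The dataset names and the tuple layout are made up for illustration; in practice this bookkeeping is automated by B-Store once a `Parser` has supplied the IDs.

```python
from collections import defaultdict

# Hypothetical dataset IDs, each written as (prefix, acqID, datasetType)
datasets = [
    ("HeLa_Control", 1, "locResults"),
    ("HeLa_Control", 1, "locMetadata"),
    ("HeLa_Control", 2, "locResults"),
    ("HeLa_Treated", 1, "locResults"),
    ("HeLa_Treated", 1, "widefieldImage"),
]

# Group datasets by prefix (the acquisition group); within a group,
# each dataset is identified by its acqID and its datasetType.
groups = defaultdict(list)
for prefix, acqID, datasetType in datasets:
    groups[prefix].append((acqID, datasetType))

for prefix, members in sorted(groups.items()):
    print(prefix, sorted(members))
# HeLa_Control [(1, 'locMetadata'), (1, 'locResults'), (2, 'locResults')]
# HeLa_Treated [(1, 'locResults'), (1, 'widefieldImage')]
```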

When provided with a file, a `Parser` is required to specify the following fields:

- `acqID` - a unique integer for a given prefix
- `prefix` - a string that gives a descriptive name to the dataset
- `datasetType` - one of the strings listed in the `__Types_Of_Atoms__` variable in *config.py*; at the time of writing, these are 'locResults', 'locMetadata', or 'widefieldImage'
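A parser can catch malformed IDs early by checking these three fields before handing them to B-Store. The helper below is a hypothetical sketch, not part of B-Store's API, and its `TYPES_OF_ATOMS` tuple is an assumed copy of the types listed above; the authoritative list lives in `__Types_Of_Atoms__` in *config.py*.

```python
# Assumed copy of the dataset types in B-Store's config.py (__Types_Of_Atoms__);
# check your installed version for the real values.
TYPES_OF_ATOMS = ("locResults", "locMetadata", "widefieldImage")


def check_required_fields(acqID, prefix, datasetType):
    """Raise an error if one of the three required ID fields is malformed.

    Hypothetical helper for illustration only; not part of B-Store.
    """
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(acqID, int) or isinstance(acqID, bool):
        raise TypeError("acqID must be an integer, got {!r}".format(acqID))
    if not isinstance(prefix, str):
        raise TypeError("prefix must be a string, got {!r}".format(prefix))
    if datasetType not in TYPES_OF_ATOMS:
        raise ValueError("datasetType must be one of {}, got {!r}".format(
            TYPES_OF_ATOMS, datasetType))


check_required_fields(2, "HeLa", "locResults")  # valid, so nothing is raised
```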

Additionally, the `Parser` must provide a way to access the actual data contained in a file. Depending on the `datasetType`, the data from a file is represented internally as one of these data types after it is loaded into memory:

- `locResults` - [Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe)
- `locMetadata` - [JSON](http://www.json.org/) string-value pairs
- `widefieldImage` - 2D [Numpy](http://www.numpy.org/) array

# The `Parser` interface
The reason that B-Store needs this information is that organization in the database can be automated only if the data matches the database interface. In B-Store, this interface is known as a `DatabaseAtom`.

To make their creation easier, parsers must also implement an interface known as `Parser`. The `Parser` interface is simply a set of methods and properties that a Python class must implement to be called a `Parser`. Let's start by looking at the code for this interface:

```python
# Import B-Store's parsers module
from bstore import parsers

# Used to retrieve the code
import inspect
```

```python
print(inspect.getsource(parsers.Parser))
```

```
class Parser(metaclass = ABCMeta):
    """Translates files to machine-readable data structures with acq. info.
    
    Attributes
    ----------
    acqID       : int
        The number identifying the Multi-D acquisition for a given prefix name.
    channelID   : str
        The color channel associated with the dataset.
    dateID      : str
        The date of the acquistion in the format YYYY-mm-dd.
    posID       : (int,) or (int, int)
        The position identifier. It is a single element tuple if positions were
        manually set; otherwise, it's a 2-tuple indicating the x and y
        identifiers.
    prefix      : str
        The descriptive name given to the dataset by the user.
    sliceID     : int
        The number identifying the z-axis slice of the dataset.
    datasetType : str
        The type of data contained in the dataset. Can be one of 'locResults',
        'locMetadata', or 'widefieldImage'.
       
    """
    def __init__(self, acqID, channelID, dateID,
```


Examining the code above, we can see that a `Parser` has two functions:

- `__init__()` : the constructor that assigns the class fields
- `getBasicInfo()` : returns a dictionary with the Parser's information

Furthermore, there are a few methods and properties decorated with `abstractproperty` or `abstractmethod` that don't actually do anything (their bodies contain only the word `pass`). These are the methods and properties that our custom `Parser` must define to work with our data. They are:

- `data` - contains the actual data from a file
- `getDatabaseAtom()` - returns a DatabaseAtom instance that can be put inside a B-Store database
- `parseFilename()` - generates the DatabaseAtom ID fields from a file or filename
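To see how abstract members constrain a subclass, here is a stripped-down, hypothetical interface written with Python's standard `abc` module. It mimics the three required members listed above but is not B-Store's actual `Parser` class. It stacks `@property` on top of `@abstractmethod`, which is the modern replacement for the deprecated `abstractproperty` decorator.

```python
from abc import ABC, abstractmethod


class MiniParser(ABC):
    """Hypothetical stand-in for B-Store's Parser interface."""

    @property
    @abstractmethod
    def data(self):
        """The data contained in the parsed file."""

    @abstractmethod
    def getDatabaseAtom(self):
        """Return an object that can be inserted into a database."""

    @abstractmethod
    def parseFilename(self, filename, datasetType):
        """Set the ID fields from a filename."""


class IncompleteParser(MiniParser):
    # Only parseFilename is overridden, so this class is still abstract
    def parseFilename(self, filename, datasetType):
        pass


class CompleteParser(MiniParser):
    # Every abstract member is overridden, so instances can be created
    @property
    def data(self):
        return None

    def getDatabaseAtom(self):
        return self

    def parseFilename(self, filename, datasetType):
        pass


try:
    IncompleteParser()
except TypeError as err:
    print("Cannot instantiate:", err)

parser = CompleteParser()  # succeeds
```

Python refuses to create an instance of any class that leaves an abstract member undefined, so forgetting to implement `data`, `getDatabaseAtom()`, or `parseFilename()` is caught as soon as you try to construct the parser.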

# Designing the `SimpleParser`

## File naming conventions
For the sake of this tutorial, let's suppose that our acquisition software produces files that follow this naming convention:

- **prefix_acqID.csv** : `locResults` come in .csv files whose names consist of a common prefix, followed by an underscore and then an integer identifier. For example, HeLa_2.csv
- **prefix_acqID.txt** : `locMetadata` is found in .txt files whose prefixes and acquisition IDs match those of their corresponding localization data
- **prefix_acqID.tiff** : `widefieldImage`s are found in .tiff files whose names also match the corresponding localization data
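Under this convention, all three required ID fields can be recovered from the filename alone. The function below is a hypothetical sketch of that step using only the standard library. It splits on the last underscore so that prefixes may themselves contain underscores, and it infers `datasetType` from the extension, whereas the `SimpleParser` design described below passes `datasetType` in explicitly.

```python
import os

# Extension-to-datasetType mapping, taken from the naming convention above
EXTENSION_TO_TYPE = {
    ".csv": "locResults",
    ".txt": "locMetadata",
    ".tiff": "widefieldImage",
}


def split_filename(filename):
    """Return (prefix, acqID, datasetType) for a name like 'HeLa_2.csv'.

    Hypothetical helper for illustration only; not part of B-Store.
    """
    # Drop any directories, then separate the extension from the stem
    stem, ext = os.path.splitext(os.path.basename(filename))
    if ext not in EXTENSION_TO_TYPE:
        raise ValueError("Unrecognized file extension: {!r}".format(ext))
    # Split on the LAST underscore so the prefix may contain underscores
    prefix, sep, acqID = stem.rpartition("_")
    if not sep or not prefix or not acqID.isdigit():
        raise ValueError("Expected prefix_acqID, got {!r}".format(stem))
    return prefix, int(acqID), EXTENSION_TO_TYPE[ext]


print(split_filename("HeLa_2.csv"))                # ('HeLa', 2, 'locResults')
print(split_filename("data/HeLa_Control_7.tiff"))  # ('HeLa_Control', 7, 'widefieldImage')
```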

## SimpleParser inputs and outputs
Our `SimpleParser` will be a relatively simple class for converting these files into a format that B-Store can organize. Hopefully it will give you the main idea of how to write your own.

The parser's constructor will take no arguments. Its main method, `parseFilename()`, will take as input a string representing a file's name and another string representing the file's `datasetType`. This method will set the ID fields of the `Parser` and also tell the `Parser` how to read the data.

Let's write an outline of this class that follows this design but doesn't actually do anything yet.

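```python
from bstore.parsers import Parser

class SimpleParser(Parser):
    """A simple parser for extracting acquisition information.
    
    The SimpleParser converts files of the format prefix_acqID.* into
    DatabaseAtoms for insertion into a database. * may represent .csv files
    (for locResults), .json (for locMetadata), and .tiff (for widefieldImages).
    
    """
    def __init__(self):
        # The constructor takes no arguments
        pass
    
    def getDatabaseAtom(self):
        # Will return a DatabaseAtom built from the parsed ID fields and data
        pass
    
    def parseFilename(self, filename, datasetType):
        # Will set the ID fields and decide how the file's data is read
        pass
    
    @property
    def data(self):
        # Will return the data contained in the parsed file
        pass
```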

The skeleton above contains all of the methods, plus the `data` property, that the interface requires. The problem is that it doesn't have any actual functionality yet.
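One possible way for `parseFilename()` to "tell the Parser how to read the data" is to store the file's path together with a reader chosen by `datasetType`, and to read the file only when `data` is accessed. The class below is a hypothetical, standalone sketch of that idea, not the finished `SimpleParser`: it does not subclass B-Store's `Parser`, it skips the ID fields, its `locResults` reader assumes pandas is installed, and it includes no `widefieldImage` reader because reading TIFF files requires a third-party library.

```python
import json
import os
import tempfile


def _read_loc_results(path):
    # Imported lazily so the rest of the sketch runs without pandas installed
    import pandas as pd
    return pd.read_csv(path)


def _read_loc_metadata(path):
    with open(path, "r") as f:
        return json.load(f)


# Hypothetical mapping from datasetType to a reader function
READERS = {
    "locResults": _read_loc_results,
    "locMetadata": _read_loc_metadata,
}


class LazyFileData:
    """Hypothetical sketch: remember how to read a file, read it on demand."""

    def __init__(self):
        self._path = None
        self._reader = None

    def parseFilename(self, filename, datasetType):
        if datasetType not in READERS:
            raise NotImplementedError(
                "No reader for datasetType {!r}".format(datasetType))
        # Store the path and the reader; nothing is read from disk yet
        self._path = filename
        self._reader = READERS[datasetType]

    @property
    def data(self):
        if self._reader is None:
            raise RuntimeError("Call parseFilename() first")
        return self._reader(self._path)


# Usage: write a made-up metadata file, then read it back lazily
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "HeLa_2.txt")
    with open(path, "w") as f:
        json.dump({"comment": "example"}, f)

    parser = LazyFileData()
    parser.parseFilename(path, "locMetadata")  # nothing is read yet
    print(parser.data)  # {'comment': 'example'}
```

Deferring the read keeps `parseFilename()` cheap, which may help when many files are parsed before any of their data is needed; the trade-off is that a missing or corrupt file is only detected when `data` is first accessed.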

### The \_\_init\_\_ constructor
Unlike