# Welcome to the Biotek Package

The Biotek package of `murraylab_tools` is designed to make your life easier when analyzing Biotek time series data. Specifically, `murraylab_tools.biotek` converts Biotek data to tidy format for easy analysis with Pandas and easy plotting with Seaborn. The package also contains a few convenience functions for some of the things you're likely to do with TX-TL time series data, currently including:
* Background subtraction
* Endpoint summarization

### Data Tidying

Converting from Biotek output to a tidy, Pandas-readable format is simple:

In [1]:
import murraylab_tools.biotek as mt_biotek
import os

# Note that the input file must be the Excel output of a Biotek experiment,
# saved in CSV format.
data_filename = os.path.join("biotek_examples", "RFP_GFP_traces.csv")
mt_biotek.tidy_biotek_data(data_filename)

A tidied version of the data will be created in the same location as the original time trace file, with "_tidy" appended to the name just before the file extension. Here, `RFP_GFP_traces.csv` becomes `RFP_GFP_traces_tidy.csv`.
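If you want to compute the tidy file's name in your own scripts, the naming rule above can be sketched with the standard library. The helper `tidy_path` is hypothetical and not part of `murraylab_tools`; it only mirrors the "_tidy before the extension" convention described here.

```python
import os

def tidy_path(data_filename, tag="_tidy"):
    """Hypothetical helper: insert `tag` between the file name and its extension."""
    root, ext = os.path.splitext(data_filename)
    return root + tag + ext

data_filename = os.path.join("biotek_examples", "RFP_GFP_traces.csv")
tidy_filename = tidy_path(data_filename)
print(tidy_filename)
```

This saves you from hard-coding the tidy file's name next to the raw file's name.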

Usually you will also want access to some metadata on your experiment: most importantly, which plasmids were put in each well, and at what concentration. You can add this metadata in the form of a supplementary spreadsheet. The first column of the supplementary spreadsheet is assumed to contain a well number (e.g., "D4" or "A08"). Every other column contains one kind of metadata keyed to that well number, with the name of that metadata given in a header row. You can write that supplementary data file yourself in Excel or in a plain-text editor such as Notepad.
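To make the layout concrete, here is a minimal sketch that parses a file in this format into a dictionary keyed by well. It assumes the supplementary file is a CSV. The function `load_supplementary` and the example column names are hypothetical; this is not how `murraylab_tools` reads the file internally.

```python
import csv
import io

# Illustrative contents only: a header row, then one row per well.
example_csv = """Well,Plasmid (nM),Replicate
D4,0.5,1
A08,1.0,2
"""

def load_supplementary(handle):
    """Hypothetical reader: map each well to a dict of its metadata columns."""
    reader = csv.reader(handle)
    header = next(reader)          # first row names each metadata column
    metadata = {}
    for row in reader:
        if not row:                # skip blank lines
            continue
        well, values = row[0], row[1:]
        if well in metadata:
            raise ValueError("duplicate well: %s" % well)
        metadata[well] = dict(zip(header[1:], values))
    return metadata

metadata = load_supplementary(io.StringIO(example_csv))
print(metadata["D4"])
```

A check like this catches duplicated wells before you tidy the data.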

Here is an example of how to write a supplementary file programmatically. This experiment contains three replicated 2D titrations of a GFP plasmid on one axis (at nominal concentrations of 0.25, 0.5, 1, and 2 nM) and an RFP plasmid on the other axis (at nominal concentrations of 0.5, 1, 2, and 4 nM). The code records slightly different values for each concentration (for example, 0.24 nM rather than 0.25 nM).

In [2]:
import csv, string

rfp_concs = [0.44, 0.88, 2.20, 3.97]
gfp_concs = [0.24, 0.49, 0.98, 2.01]
replicates = [1,2,3]

supplementary_filename = os.path.join("biotek_examples", "RFP_GFP_supplementary.csv")
# newline='' keeps the csv module from writing blank rows on Windows (Python 3).
with open(supplementary_filename, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Well', 'RFP Plasmid (nM)', 'GFP Plasmid (nM)', 'Replicate'])
    for row in range(4):
        for col in range(4):
            well_names = [string.ascii_uppercase[row] + "%d" % (col+6),
                          string.ascii_uppercase[row] + "%d" % (col+11),
                          string.ascii_uppercase[row+6] + "%d" % (col+1)]
            for rep in replicates:
                writer.writerow([well_names[rep-1], rfp_concs[row], gfp_concs[col], 
                                  rep])
    # Also include metadata for three negative control wells.
    writer.writerow(["E10", 0, 0, 1])
    writer.writerow(["E15", 0, 0, 2])
    writer.writerow(["K5", 0, 0, 3])
                

To use that supplementary data, pass its filename as the second argument to `tidy_biotek_data`:

In [3]:
mt_biotek.tidy_biotek_data(data_filename, supplementary_filename)

Now we can easily read our Biotek data using Pandas:

In [4]:
import pandas as pd

tidy_filename = os.path.join("biotek_examples", "RFP_GFP_traces_tidy.csv")
df = pd.read_csv(tidy_filename)

Let's peek at the data in its tidy format:

In [5]:
df.head()

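| Unnamed: 0 | Channel | Gain | Time (sec) | Well | AFU | uM | Excitation | Emission | Replicate | GFP Plasmid (nM) | RFP Plasmid (nM) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| RFP | 100 | 0 | 0.0 | A6 | 16 | 0.009473 | 580 | 610 | 1 | 0.24 | 0.44 |
| RFP | 100 | 0 | 0.0 | A7 | 13 | 0.007697 | 580 | 610 | 1 | 0.49 | 0.44 |
| RFP | 100 | 0 | 0.0 | A8 | 7 | 0.004144 | 580 | 610 | 1 | 0.98 | 0.44 |
| RFP | 100 | 0 | 0.0 | A9 | 12 | 0.007105 | 580 | 610 | 1 | 2.01 | 0.44 |
| RFP | 100 | 0 | 0.0 | A11 | 10 | 0.005921 | 580 | 610 | 2 | 0.24 | 0.44 |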


As you can see, each row of tidy data describes one well's read value for a single channel and gain at a single time, along with some metadata about that well.
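Because every row is a single read, ordinary Pandas filtering and grouping are all you need to slice the data. The sketch below uses a small stand-in DataFrame with made-up values (including the GFP gain of 61) so that it runs on its own. Only the column names `Channel`, `Gain`, `Time (sec)`, `Well`, and `AFU` are taken from the output above.

```python
import pandas as pd

# Toy stand-in for the tidy DataFrame; the numbers are illustrative, not real data.
toy = pd.DataFrame(
    [
        ("RFP", 100, 0, "A6", 16),
        ("RFP", 100, 0, "A11", 10),
        ("GFP", 61, 0, "A6", 200),
        ("GFP", 61, 0, "A11", 220),
        ("RFP", 100, 300, "A6", 40),
        ("RFP", 100, 300, "A11", 44),
        ("GFP", 61, 300, "A6", 500),
        ("GFP", 61, 300, "A11", 540),
    ],
    columns=["Channel", "Gain", "Time (sec)", "Well", "AFU"],
)

# Keep only the reads from one channel at one gain.
rfp = toy[(toy["Channel"] == "RFP") & (toy["Gain"] == 100)]

# Average across wells at each time point.
mean_by_time = rfp.groupby("Time (sec)")["AFU"].mean()
print(mean_by_time)
```

With the real data you would run the same two steps on `df` instead of `toy`.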

Now we can use Seaborn to start plotting data quickly and (relatively) easily:

### Automatic Unit Conversion

Note in the example above 

# Convenience Functions

### Background Subtraction

### Endpoint Averaging

### Manual Unit Conversion