# dknlab_tools demo

In [1]:
import dknlab_tools

Let's start by opening some data from an experiment I did a while back. 

dknlab_tools > example_data > tecan_data_double_measurement.xlsx

Here are the data, which I took on the Tecan plate reader. From looking at this file, we know that I took an OD500 reading and a PI fluorescence reading, but that's really it. What's in these wells, though? For both a human and a computer to know, I need to have filled out a 96-well plate condition map. I've made a standard one that the program recognizes, and you'll have to fill it out if you want to use this on your own data.

dknlab_tools > example_data > tecan_condmap_double_measurement.xlsx

I know it looks a little busy, but this actually isn't that hard to fill out, especially when you have copy-paste, and it opens up some really cool functionality later on. So within each well, you provide the strain, the medium, and the conditions. Here you can see that in some wells, I have multiple conditions separated by commas and their concentrations separated by semicolons. This is important because those commas and semicolons are delimiters that the program uses to separate out all this metadata.
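To make the delimiter idea concrete, here's a minimal sketch of how a single condition cell could be split apart. This is not the package's actual parser: the cell layout (`'PYO;100, PI;5'`) and the function name `parse_conditions` are my assumptions for illustration only.

```python
def parse_conditions(cell):
    """Split one condition-map cell into {condition: concentration}.

    Assumed (hypothetical) layout: conditions separated by commas, each
    condition separated from its concentration by a semicolon,
    e.g. 'PYO;100, PI;5'.
    """
    parsed = {}
    for entry in cell.split(','):
        entry = entry.strip()
        if not entry:
            continue  # skip empty pieces, e.g. from a trailing comma
        # partition() returns an empty concentration if there is no ';'
        name, _, conc = entry.partition(';')
        parsed[name.strip()] = float(conc) if conc.strip() else None
    return parsed


example = parse_conditions('PYO;100, PI;5')
print(example)
```

A cell with no concentration, like `'No addition'`, comes back with `None` as its concentration in this sketch.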

From the condition map, we can see what the experiment actually was. I had multiple different strains in multiple different combinations of phenazines and propidium iodide, all in LB. Great, now we're oriented and we can get cooking.

In [2]:
data = '/Users/John.Ciemniecki/git/dknlab_tools/example_data/tecan_data_double_measurement.xlsx'
cond_map = '/Users/John.Ciemniecki/git/dknlab_tools/example_data/tecan_condmap_double_measurement.xlsx'

In [3]:
dknlab_tools.tecan.import_growthcurves(data, cond_map)

Unnamed: 0,Time [hr],Cycle Nr.,Time [s],Temp. [°C],Well,OD500,PI,Strain,Medium,Condition,Condition Conc. (µM)
0,0.0,1.0,0.000,36.9,A1,0.0742,130,WT,LB,No addition,
1,1.0,2.0,3593.499,37.1,A1,0.0764,128,WT,LB,No addition,
2,2.0,3.0,7193.521,36.9,A1,0.0845,128,WT,LB,No addition,
3,3.0,4.0,10793.533,37.0,A1,0.1320,131,WT,LB,No addition,
4,4.0,5.0,14393.575,37.1,A1,0.2385,128,WT,LB,No addition,
5,5.0,6.0,17993.569,37.1,A1,0.3218,125,WT,LB,No addition,
6,6.0,7.0,21593.562,36.9,A1,0.4309,128,WT,LB,No addition,
7,7.0,8.0,25193.580,37.1,A1,0.5859,115,WT,LB,No addition,
8,8.0,9.0,28793.616,37.1,A1,0.7676,119,WT,LB,No addition,
9,9.0,10.0,32393.628,37.0,A1,0.8864,119,WT,LB,No addition,


Boom! We've got our data imported in what seems to be a special format. It's kinda weird, right? Each row has a time, an OD500, and a PI reading, but then the wells repeat? This is called a tidy dataframe, and I would love to talk anytime about how it's organized and all the advantages of organizing your data this way, but in the interest of lab meeting time, suffice it to say that when data is organized this way, the computer can work with it a lot more easily. Now let's save this in a variable df.

In [4]:
df = dknlab_tools.tecan.import_growthcurves(data, cond_map)
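If the tidy layout still feels abstract, here's a small sketch in plain pandas of going from a plate-reader-style "wide" table (one column per well) to a tidy one (one row per well per timepoint). The numbers are made up for illustration, and I'm not claiming this is how `import_growthcurves` works internally.

```python
import pandas as pd

# Toy "wide" table: one column per well (values invented for illustration)
wide = pd.DataFrame({
    'Time [hr]': [0.0, 1.0, 2.0],
    'A1': [0.07, 0.08, 0.09],
    'A2': [0.07, 0.10, 0.15],
})

# melt() stacks the well columns so each row is one (time, well, reading)
tidy = wide.melt(id_vars='Time [hr]', var_name='Well', value_name='OD500')
print(tidy)
```

Each well's timepoints now sit in their own rows, which is why the well labels repeat down the `Well` column.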

So that's all well and good, but we want to see the data! Let's plot!

In [5]:
dknlab_tools.viz.plot_growthcurves(data=df,
                                   yaxis='OD500',
                                   colorby='Strain',
                                   plotby='Condition')

Cool! So we can see the program intelligently identified all the different strains, separated them by color, and then within those strain collections, separated them again by condition, and plotted each of them on a different plot. In addition, the program automatically recognized the data contained technical replicates, found the mean of those replicates, and plotted all the data. Now we have a great overview to explore. 
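For the curious, averaging technical replicates is the kind of thing a tidy dataframe makes easy. Here's a rough sketch with plain pandas on a toy table (values invented); I'm only illustrating the idea, not showing what `plot_growthcurves` actually does under the hood.

```python
import pandas as pd

# Toy tidy data: wells A1 and A2 are technical replicates of the same
# strain and condition (values invented for illustration)
toy = pd.DataFrame({
    'Time [hr]': [0.0, 0.0, 1.0, 1.0],
    'Well': ['A1', 'A2', 'A1', 'A2'],
    'Strain': ['WT'] * 4,
    'Condition': ['No addition'] * 4,
    'OD500': [0.10, 0.20, 0.30, 0.50],
})

# Rows that share strain, condition, and timepoint are replicates,
# so grouping on those columns and taking the mean collapses them
replicate_means = (
    toy.groupby(['Strain', 'Condition', 'Time [hr]'], as_index=False)['OD500']
       .mean()
)
print(replicate_means)
```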

By a quick scan, we can see *E. coli* is consistently different from *Pseudomonas* in all the conditions and, oddly, that *Pseudomonas* starts to go wonky after 48 hours in any of the conditions that contained PYO. Let's take a closer look at that. Hm, weird, each replicate took a different trajectory. But if we pan along here, we see it's pretty much the same otherwise. OK, maybe just some bubbles, who knows, and besides, we're interested in the differences in growth. So the way we've plotted it here is nice, but it's hard to see the effect of the conditions on the strains. So let's try something different.

In [6]:
dknlab_tools.viz.plot_growthcurves(data=df,
                                   yaxis='OD500',
                                   colorby='Condition',
                                   plotby='Strain')

Great! We can see that besides the stuff happening past 48 hours, the *Pseudomonas* strains seemed to not care what we threw at them. There's some effect on *E. coli*, but it's kinda hard to see. We can zoom in to make it a little better, but let's turn some off to really get a good idea of what's going on.

[turn off subset of results with legend click]

Pretty cool! As a final note with this dataset, you can also pass multiple groupings to the plotby or colorby arguments. So for example, if you wanted to plot every unique strain-condition combination separately, you would type:

In [7]:
dknlab_tools.viz.plot_growthcurves(data=df,
                                   yaxis='OD500',
                                   plotby=['Strain', 'Condition']
                                   )

Let's look at one more dataset. I showed you import and plotting of Tecan data, but let's look at some Biotek data. I got this dataset from Lucas (thanks again Lucas), and it's one from his first paper.

dknlab_tools > example_data > biotek_data_triple_measurement.xls

Now the Biotek doesn't have a standard output format like the Tecan does, but everyone who's shared datasets with me seems to use this style of output, with all the wells as columns and different measurements separated in these vertically stacked chunks. So by popular use, I've decided with zero authority that this is now the official output format for all Biotek users who want to use the dknlab_tools package. You must output your data in this format for the program to work, and I hope you can all forgive that necessary act of despotism. Now Lucas has annotated these data a little bit at the top, but we'll still have to fill out the standardized condition map that I showed you before. These annotations won't affect the data import because the program will recognize them as extraneous to the format of this Biotek data output. Let's see what kind of experiment Lucas was doing.

dknlab_tools > example_data > biotek_condmap_triple_measurement.xlsx

OK, so we only have ∆*phz* in this experiment with some no-cell controls. There's glucose minimal medium all around, with different concentrations of glucose, and some cells got PYO but others didn't. Now, notice that unlike in my condition map, Lucas stated that there was zero PYO in his samples. In my condition map, when there wasn't PYO, I just didn't even list it. In this case, we'll call what Lucas has done here "verbose", and that will actually add some functionality a little later. OK, let's get to it.

In [8]:
lucas_data = '/Users/John.Ciemniecki/git/dknlab_tools/example_data/biotek_data_triple_measurement.xls'
lucas_cond_map = '/Users/John.Ciemniecki/git/dknlab_tools/example_data/biotek_condmap_triple_measurement.xlsx'

In [9]:
dknlab_tools.biotek.import_growthcurves(lucas_data, lucas_cond_map, verbose=True)

(      Time [hr] Well  Read 4:500    Strain   Medium  Medium Conc. (mM)  \
 0         0.000   A1       0.067       NaN      NaN                NaN   
 1         0.964   A1       0.070       NaN      NaN                NaN   
 2         1.964   A1       0.074       NaN      NaN                NaN   
 3         2.964   A1       0.083       NaN      NaN                NaN   
 4         3.964   A1       0.100       NaN      NaN                NaN   
 5         4.964   A1       0.139       NaN      NaN                NaN   
 6         5.964   A1       0.224       NaN      NaN                NaN   
 7         6.964   A1       0.330       NaN      NaN                NaN   
 8         7.964   A1       0.412       NaN      NaN                NaN   
 9         8.964   A1       0.486       NaN      NaN                NaN   
 10        9.964   A1       0.542       NaN      NaN                NaN   
 11       10.964   A1       0.600       NaN      NaN                NaN   
 12       11.964   A1    

This output is different from before. The reason is a fundamental difference between the Tecan and the Biotek: the Tecan records time as the start of a measurement cycle, i.e., all measurements share the same set of timestamps. In contrast, the Biotek records time as the start of each unique measurement. So each block of measurements that I pointed out before has to be treated separately, and in this case we got not one dataframe but multiple dataframes. So if I save that output and look at just the first entry, it looks more familiar.

In [10]:
df_500, df_310, df_533 = dknlab_tools.biotek.import_growthcurves(lucas_data, lucas_cond_map, verbose=True)
df_500

Unnamed: 0,Time [hr],Well,Read 4:500,Strain,Medium,Medium Conc. (mM),PYO Conc. (µM)
0,0.000,A1,0.067,,,,
1,0.964,A1,0.070,,,,
2,1.964,A1,0.074,,,,
3,2.964,A1,0.083,,,,
4,3.964,A1,0.100,,,,
5,4.964,A1,0.139,,,,
6,5.964,A1,0.224,,,,
7,6.964,A1,0.330,,,,
8,7.964,A1,0.412,,,,
9,8.964,A1,0.486,,,,


So in the column headers we can see this "Read 4:500" label, which corresponds to the OD500 reading. We also see well A1 has a lot of "not a number" (NaN) placeholders stored, which is good, because Lucas didn't have anything in that well. We also notice that the Condition column from before is now replaced: the verbose formatting that Lucas provided allowed the program to make a dedicated column for PYO concentration. Awesome, let's plot with that.
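One quick aside before the plot: if you ever want to drop the rows from unannotated wells yourself, plain pandas can do it. This is just a sketch on a toy table (values and the `B2` well contents are invented), not a step the package requires.

```python
import numpy as np
import pandas as pd

# Toy tidy data: A1 has no strain annotation (NaN), B2 does
toy = pd.DataFrame({
    'Well': ['A1', 'A1', 'B2', 'B2'],
    'Time [hr]': [0.0, 1.0, 0.0, 1.0],
    'Read 4:500': [0.067, 0.070, 0.065, 0.090],
    'Strain': [np.nan, np.nan, '∆phz', '∆phz'],
})

# Keep only the rows whose 'Strain' entry is not NaN
annotated = toy.dropna(subset=['Strain'])
print(annotated)
```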

In [11]:
dknlab_tools.viz.plot_growthcurves(data=df_500,
                                   yaxis='Read 4:500',
                                   plotby=['Strain', 'PYO Conc. (µM)'],
                                   colorby='Medium Conc. (mM)')

Interesting. So consistently, Lucas seems to have had a bubble in his wells in the beginning. When did the bubble end? Looks like at the 2 hour mark. We can fix that.

In [12]:
good_times = df_500['Time [hr]'] >= 2.0
df_good500 = df_500[good_times]

In [13]:
dknlab_tools.viz.plot_growthcurves(data=df_good500,
                                   yaxis='Read 4:500',
                                   plotby=['Strain', 'PYO Conc. (µM)'],
                                   colorby='Medium Conc. (mM)')

That's looking good. But there were other data, right? What about PI? For the sake of time, I'll just tell you it's the third dataframe in our collection, here:

In [14]:
df_533

Unnamed: 0,Time [hr],Well,"Read 6:533,617",Strain,Medium,Medium Conc. (mM),PYO Conc. (µM)
0,0.000,A1,47,,,,
1,0.981,A1,53,,,,
2,1.981,A1,54,,,,
3,2.981,A1,56,,,,
4,3.981,A1,61,,,,
5,4.981,A1,54,,,,
6,5.981,A1,55,,,,
7,6.981,A1,59,,,,
8,7.981,A1,59,,,,
9,8.981,A1,74,,,,


In [15]:
dknlab_tools.viz.plot_growthcurves(data=df_533,
                                   yaxis='Read 6:533,617',
                                   colorby=['Strain', 'PYO Conc. (µM)'],
                                   plotby='Medium Conc. (mM)',
                                   yaxis_log=False,
                                   palette=['orange', 'dodgerblue', 'red', 'green'])

Wow! This is a finding! Now let's say Lucas is seeing these data for the first time, and he needs to get Dianne's eyes on this ASAP! Well, it's kinda hard to send this notebook to her: she might not have all the stuff she needs installed, and it would just be kind of cumbersome even if she did. We could do the handy screenshot, but then we'd only get part of the whole dataset... annoying. Let's not do any of that. Instead,

In [16]:
amazing_discovery = dknlab_tools.viz.plot_growthcurves(data=df_533,
                                   yaxis='Read 6:533,617',
                                   colorby=['Strain', 'PYO Conc. (µM)'],
                                   plotby='Medium Conc. (mM)',
                                   yaxis_log=False,
                                   palette=['orange', 'dodgerblue', 'red', 'green'])

In [17]:
dknlab_tools.viz.save(amazing_discovery, '200731_mingluc_PYO')

Now if we open that file, it will render the exact graphs we just looked at in any browser. While we lose some of the interactivity (no more zoom or panning) we do retain the on-off clicks. And, if you're old school and want to just save it as a static image, you can pass in a 'filetype' argument and output it as a png.

In [18]:
dknlab_tools.viz.save(amazing_discovery, '200731_mingluc_PYO', filetype='png')

And that's all I've got for today! I hope those of you who already know Python will use what's been built so far and maybe even build up your own contributions to add to the dknlab_tools package. For those of you who don't already use Python, I hope you're not intimidated and that this will encourage you to learn at least enough to make use of this program.

I'll send everyone instructions for getting this all working on your own machines later today, and I hope it makes everyone's growth curve experiments easier and more fun :)

In [19]:
%reload_ext watermark
%watermark -v -p jupyterlab,dknlab_tools,pandas,numpy

CPython 3.7.6
IPython 7.12.0

jupyterlab 1.2.6
dknlab_tools 1.0.1
pandas 0.24.2
numpy 1.18.1
