# Explore Mother Machine Data

We use data from an experiment in which two carbon switches were performed (details to come).

---

## Import packages

Before starting the code we need to import all the required packages.

We use a number of important Python packages:
- [NumPy](https://numpy.org): Go-to package for vector/matrix-based calculations (heavily inspired by Matlab)
- [Pandas](https://pandas.pydata.org): Go-to package for handling data tables (heavily inspired by R)
- [Matplotlib](https://matplotlib.org): Go-to package for plotting data
- [Seaborn](https://seaborn.pydata.org): Fancy plots made easy (similar to ggplot in R)
- [pathlib](https://docs.python.org/3/library/pathlib.html): Path handling made easy

In [None]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
%gui qt

import pandas as pd
import numpy as np
import seaborn as sns

import matplotlib.pyplot as plt

import pathlib

---

## Import Data from BACMMAN

Set the path to the CSV file exported from Bacmman so that it can be loaded into Python.

Note that there is also a Python package that allows direct interaction between Python and Bacmman, for example to find and select problematic or interesting cells that you want to correct manually. A detailed explanation can be found [here](https://github.com/jeanollion/bacmman/wiki/Selections#create-selections-from-python).

If needed you can manually edit segmentation and tracking using the Bacmman GUI, see [here](https://github.com/jeanollion/bacmman/wiki/Data-Curation) for instructions. You can also look [at this screencast](https://www.github.com/jeanollion/bacmman/wiki/resources/screencast/manual_correction_dataset2.webm).

To save time, we will skip these steps and use the data as is.

In [None]:
# Course folder inside the user's home directory
root = pathlib.Path.home() / 'I2ICourse'
proj_dir = root / 'Project2C'

data_set_name = "MM_test"  # change to the actual name of the dataset
objectClassIdx = 1  # object class 1 = bacteria

# Bacmman export name: <dataset name>_<object class index>.csv
file_name = f'{data_set_name}_{objectClassIdx}.csv'
file_path = proj_dir / file_name

print(file_path)
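
If the path is wrong, `pd.read_csv` in the next cell raises a `FileNotFoundError`. As a minimal sketch, the path can be checked first; `describe_export` is a hypothetical helper introduced here for illustration and is not part of Bacmman or the course code.

```python
import pathlib

def describe_export(path):
    """Return a short message saying whether the export file exists (hypothetical helper)."""
    path = pathlib.Path(path)
    if path.is_file():
        return f"Found {path.name} ({path.stat().st_size} bytes)"
    return f"Missing: {path} (check data_set_name and objectClassIdx)"

# Same naming scheme as above: <dataset name>_<object class index>.csv
demo_path = pathlib.Path.home() / 'I2ICourse' / 'Project2C' / 'MM_test_1.csv'
print(describe_export(demo_path))
```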

Now we read the file with Pandas.

In [None]:
df = pd.read_csv(file_path, sep=';') 
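
The export is read with `sep=';'` because its fields are separated by semicolons rather than commas. The sketch below illustrates the effect on a tiny in-memory table; the sample values are made up for illustration and are not taken from the real dataset.

```python
import io
import pandas as pd

# Made-up, semicolon-separated sample (illustrative values, not real data)
sample_csv = (
    "Position;PositionIdx;Indices;Frame;Idx\n"
    "pos0;0;0-2-0;0;0\n"
    "pos0;0;1-2-0;1;0\n"
)

# With the matching separator, each field becomes its own column
df_demo = pd.read_csv(io.StringIO(sample_csv), sep=';')

# With the default comma separator, each row collapses into a single column
df_wrong = pd.read_csv(io.StringIO(sample_csv))

print(df_demo.shape)
print(df_wrong.shape)
```

If `df.head()` shows a single wide column, the separator is the first thing to check.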

---

## Inspect Bacmman data format

Let's have a look at how Bacmman stores cell property data.

In [None]:
df.head()

There is quite a lot of information here, but some of it is a bit obscure:
- `Position` is the name of the position (image)
- `PositionIdx` is an integer keeping track of which position you are in
- `Indices` corresponds to `frame_nr-channel_nr-cell_nr`
- `Frame` is the frame number
- `Idx` is the cell number (1 = mother cells)
- `BacteriaLineage` keeps track of the cell lineage (after each division a letter is added)

Annoyingly, there is no field for the channel, so let's add it.

> **Exercise** 
> 
> Think about how you could do this.
> 
> Hint: you can use the Python package [`re`](https://docs.python.org/3/library/re.html#) to extract it from the `Indices` field.

In [None]:
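import re

# Indices has the form frame-channel-cell; the channel is the second part.
# "-" needs no escaping here ("\-" is an invalid escape in a normal string).
ChIdx = [int(re.split("-", ind)[1]) for ind in df['Indices']]
df['ChannelIdx'] = ChIdx
df.head()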
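
As an alternative to `re`, the same extraction can be done with the vectorized string methods in Pandas. This is a sketch on made-up `Indices` values that follow the frame-channel-cell format described above; `df_demo` is an illustrative table, not the course data.

```python
import pandas as pd

# Made-up Indices values in the frame-channel-cell format
df_demo = pd.DataFrame({'Indices': ['0-2-0', '0-2-1', '1-3-0']})

# Split each string on '-', take the middle part, and convert it to an integer
df_demo['ChannelIdx'] = df_demo['Indices'].str.split('-').str[1].astype(int)

print(df_demo)
```

This avoids the explicit Python loop, which can matter for large tables.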

---

## Inspect cell lineage information
Now let's look at the mother cell and first offspring in the first channel. Try to understand how lineages are connected.

As you might notice, lineages in different channels have the same `BacteriaLineage` code. It is often very useful to have a unique lineage id: a value that stays constant throughout a cell's life and is not shared with any other cell in the data table. Can you come up with a good idea of how to implement this?

In [None]:
# Rows from position 0, channel 2, with Idx below 1 and Frame below 6
df.loc[(df['PositionIdx'] == 0) & (df['ChannelIdx'] == 2) & (df['Idx'] < 1) & (df['Frame'] < 6)]

To uniquely identify a cell lineage we need three pieces of information:
- `PositionIdx`
- `ChannelIdx`
- `BacteriaLineage`

> **Exercise** 
> 
> Think about how you could add a unique lineage id to the dataframe.

Below we give an example of how to combine these fields to make a unique identifier.

In [None]:
# Combine PositionIdx, ChannelIdx, and BacteriaLineage into a single string
# and store it in the new column lin_id_str
df['lin_id_str'] = df['PositionIdx'].map(str) + '-' + df['ChannelIdx'].map(str) + '-' + df['BacteriaLineage'].map(str)

# Show the dataframe
df.head()
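
If a numeric id is preferred over a string, one possible approach is to map each distinct `lin_id_str` to an integer with `pd.factorize`. The sketch below uses made-up rows, and the column name `lin_id` is an assumption introduced here for illustration.

```python
import pandas as pd

# Made-up rows: the same BacteriaLineage code 'A' occurs in two channels
df_demo = pd.DataFrame({
    'PositionIdx': [0, 0, 0, 0],
    'ChannelIdx': [1, 1, 2, 2],
    'BacteriaLineage': ['A', 'A', 'A', 'AH'],
})
df_demo['lin_id_str'] = (
    df_demo['PositionIdx'].map(str) + '-'
    + df_demo['ChannelIdx'].map(str) + '-'
    + df_demo['BacteriaLineage'].map(str)
)

# factorize gives one integer per distinct string, in order of first appearance
codes, uniques = pd.factorize(df_demo['lin_id_str'])
df_demo['lin_id'] = codes  # hypothetical column name

print(df_demo[['lin_id_str', 'lin_id']])
```

The integer codes depend on row order, so they are only comparable within one table.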

Now we can extract a cell lineage (e.g. `0-2-AH`, which is position 0, channel 2, first offspring of the mother cell) as:

In [None]:
df_sub = df.loc[df['lin_id_str']=='0-2-AH']
df_sub.head()
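
To see which lineage ids are available before picking one, it can help to count the rows per id. The sketch below uses made-up values and assumes the table holds one row per cell per frame, so that the row count reflects how many frames a lineage was tracked.

```python
import pandas as pd

# Made-up lineage table (illustrative values only)
df_demo = pd.DataFrame({
    'lin_id_str': ['0-2-A', '0-2-A', '0-2-A', '0-2-AH', '0-2-AH'],
    'Frame': [0, 1, 2, 1, 2],
})

# Number of rows for each lineage id
rows_per_lineage = df_demo['lin_id_str'].value_counts()
print(rows_per_lineage)

# Select one lineage and order its rows by frame
df_sub_demo = df_demo.loc[df_demo['lin_id_str'] == '0-2-AH'].sort_values('Frame')
print(df_sub_demo)
```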

---

## Save data to disk
This would be a good time to save your data. 

In [None]:
save_name = proj_dir / 'cell_data.pkl'
df.to_pickle(save_name)
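
A saved pickle can be loaded again with `pd.read_pickle`. The following sketch checks the round trip on a small made-up table in a temporary folder, so it does not touch the project directory; the file name `cell_data_demo.pkl` is illustrative.

```python
import pathlib
import tempfile
import pandas as pd

# Small made-up table standing in for the real data
df_demo = pd.DataFrame({'lin_id_str': ['0-2-A', '0-2-AH'], 'Frame': [0, 1]})

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = pathlib.Path(tmp_dir) / 'cell_data_demo.pkl'
    df_demo.to_pickle(demo_path)          # write to disk
    df_loaded = pd.read_pickle(demo_path)  # read it back

# True if the reloaded table matches the original
print(df_loaded.equals(df_demo))
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.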

---

## Next Step: Explore data

**Before the next step, the tutors will give an introduction. If you are ready for this step, please let them know!**

We continue in the next notebook, `1_explore_data_bacmman`.