# Loading the data
Not much to say here.

In [None]:
from loading_data import loading_data
from IPython.display import display
path_to_data = './Data/'
(lin_tree, fates, fates2,
 fates3, vol, inv_lin_tree,
 surf_ex, surfaces, names,
 properties, ColorMap) = loading_data(path_to_data)

# Lineage tree data structure manipulation

## Cell ids

A cell id is a unique identifier for every cell at a given time point.

These ids are not random: each one encodes the time point of the cell and its label in the segmented image of that time point (the label alone is not unique across the whole movie).

- `cell_id = origin_time * 10**4 + origin_label`

So, to get back the time and the label of a cell, you can do the following:
- `origin_label = cell_id % 10**4` is the corresponding label in the image
- `origin_time = cell_id // 10**4` is the corresponding time (note that `//` is the floor division)

Practically, if a cell id is `1230341` then it means that its time in the movie is `123` and its label in the corresponding segmented image is `341`
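The encoding and decoding rules above can be wrapped in two small helpers. This is only a minimal sketch: the names `make_cell_id`, `split_cell_id` and `LABEL_RANGE` are hypothetical and are not part of the course code.

```python
# Hypothetical helpers, not part of the course code.
LABEL_RANGE = 10**4  # labels are assumed to stay below 10**4 within one image


def make_cell_id(origin_time, origin_label):
    """Build a cell id from a time point and an image label."""
    return origin_time * LABEL_RANGE + origin_label


def split_cell_id(cell_id):
    """Return (origin_time, origin_label) for a cell id."""
    # divmod gives the floor division and the remainder in one call
    return divmod(cell_id, LABEL_RANGE)


origin_time, origin_label = split_cell_id(1230341)
print(origin_time, origin_label)
```

Both directions are consistent: `make_cell_id(123, 341)` gives back `1230341`.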

## Dictionary data structure
[Dictionary python doc 1](https://docs.python.org/2/tutorial/datastructures.html#dictionaries)
[Dictionary python doc 2](https://docs.python.org/2/library/stdtypes.html#dict)

[A bit more on dictionaries](http://www.tutorialspoint.com/python/python_dictionary.htm)

The main methods that we will use are the following:
- [iteritems](https://docs.python.org/2/library/stdtypes.html#dict.iteritems) (Python 2 only; the Python 3 equivalent is `items`)
- [get](https://docs.python.org/2/library/stdtypes.html#dict.get)
- [setdefault](https://docs.python.org/2/library/stdtypes.html#dict.setdefault)
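As a quick illustration of these methods, here is a small sketch on toy data. The dictionary `toy_vol` and its values are made up for the example; `items` is used because it exists in both Python 2 and Python 3.

```python
# Toy data: a few cell ids mapped to made-up volumes (in voxels)
toy_vol = {10001: 500, 10002: 450, 20001: 260}

# get: return a default value instead of raising a KeyError
print(toy_vol.get(10001, 0))
print(toy_vol.get(99999, 0))

# setdefault: create the entry on first access, then reuse it.
# Here the cell ids are grouped by time point (cell_id // 10**4).
cells_per_time = {}
for cell_id in toy_vol:
    cells_per_time.setdefault(cell_id // 10**4, []).append(cell_id)

# items iterates over the (key, value) pairs of the dictionary
for time, cells in sorted(cells_per_time.items()):
    print(time, sorted(cells))
```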

## Lineage tree data structure

The lineage tree data structure is a dictionary where the key is a cell id at time `t` and the value is the list of its successors at time `t+1`.
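To make the structure concrete, here is a toy lineage tree with made-up cell ids. The last part sketches how an inverse mapping (from a cell id to its predecessor) could be derived from it; whether the loaded `inv_lin_tree` is built exactly this way is an assumption.

```python
# Toy lineage tree with made-up cell ids (time * 10**4 + label)
toy_lin_tree = {
    10001: [20001],          # one successor: no division
    20001: [30001, 30002],   # two successors: division
    30001: [40001],
    30002: [40002],
}

print(toy_lin_tree[10001])
print(toy_lin_tree[20001])

# Inverse mapping: each cell id points to its unique predecessor
toy_inv_lin_tree = {}
for cell_id, successors in toy_lin_tree.items():
    for s in successors:
        toy_inv_lin_tree[s] = cell_id

print(toy_inv_lin_tree[30002])
```

Cells of the first time point (here `10001`) have no predecessor and therefore do not appear as keys of the inverse mapping.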

Let's take the cell with the id `920257`:

In [None]:
cell_id = 920257

Its successors can then be found this way:

In [None]:
succ = lin_tree[cell_id]
print(succ)

In general, in our dataset (and in any of the datasets that we will generate during this course), the number of successors is either `1` or `2`.

In the previous case, `cell_id` has `1` successor, meaning that the cell didn't divide.

Let's now look at the successors of the cell `1180194`:

In [None]:
succ2 = lin_tree[1180194]
print(succ2)

It has two successors, meaning that the cell divided into two sister cells.

Therefore the length of the list of successors of a given cell tells us whether or not the cell will divide at the next time point.

Knowing that, if we want to gather in a list all the cell ids that will divide at the next time point, we can do it the following way:

In [None]:
list_of_dividing_cells = []
for cell_id, succ in lin_tree.items():
    if len(succ) == 2:
        list_of_dividing_cells.append(cell_id)

`list_of_dividing_cells` contains the ids of all the cells that will divide at the next time point.

Another way to build this list, more "pythonic" and usually faster, is a list comprehension:

In [None]:
list_of_dividing_cells_2 = [cell_id for cell_id, succ in lin_tree.items() if len(succ) == 2]

if list_of_dividing_cells == list_of_dividing_cells_2:
    print("The two lists are equal.")
else:
    print("The two lists are not equal.")

print("Number of dividing cells across the movie: %d" % len(list_of_dividing_cells))

## Basic operations on lineage trees (and tree graphs in general)

We will look at
- how to extract all the cell_ids of a given cell (meaning from when it appears to when it divides)
- how to extract all the cell_ids of the progeny of a given cell

### How to extract all the cell_ids of a given cell
- from the first cell id of the cell cycle:

We first get a cell that just divided:

In [None]:
cell_id = lin_tree[list_of_dividing_cells[100]][0]

Then, the idea here is to retrieve all the successors until we get a division:

In [None]:
# Initialisation of the list of the cell cycle with the first cell
cell_cycle = [cell_id]

# The current cell is the first cell
current_cell = cell_id

# While current cell does not divide at the next time point,
while len(lin_tree.get(current_cell, [])) == 1:
    # we update the current cell to its successor
    current_cell = lin_tree[current_cell][0]
    # we add the successor to the list of the ids
    cell_cycle.append(current_cell)

Another version, (slightly) more efficient but less readable:

In [None]:
cell_cycle2 = [cell_id]
while len(lin_tree.get(cell_cycle2[-1], [])) == 1:
    cell_cycle2 += lin_tree[ cell_cycle2[-1] ]
    
print(cell_cycle == cell_cycle2)
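The same traversal can be packaged as a function so that it can be reused on any lineage tree. The following is a minimal sketch run on a toy tree with made-up ids; `follow_cell_cycle` is a hypothetical name, not a function of the course code.

```python
def follow_cell_cycle(lin_tree, first_cell):
    """Return the ids of a cell from first_cell until it divides (or disappears)."""
    cell_cycle = [first_cell]
    # keep following the unique successor while there is exactly one
    while len(lin_tree.get(cell_cycle[-1], [])) == 1:
        cell_cycle.append(lin_tree[cell_cycle[-1]][0])
    return cell_cycle


# Toy lineage tree with made-up cell ids
toy_lin_tree = {
    10001: [20001, 20002],
    20001: [30001],
    30001: [40001],
    40001: [50001, 50002],  # division: the cell cycle stops at 40001
    20002: [30002],         # 30002 has no entry: end of the movie
}

print(follow_cell_cycle(toy_lin_tree, 20001))
print(follow_cell_cycle(toy_lin_tree, 20002))
```

Using `get` with an empty list as default makes the loop stop cleanly for cells that have no entry in the tree.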

- from the last cell of the cell cycle:

We first get a cell that will divide:

In [None]:
cell_id = cell_cycle[-1]

Then again, the idea is to retrieve all the predecessors until the cell "fuses" with its sister cell to form the mother:

In [None]:
cell_cycle_reverse = [cell_id]
current_cell = inv_lin_tree[cell_id]

# while the current cell does not divide at the next time point
# we add it to the list and get its predecessor
while len(lin_tree.get(current_cell, [])) == 1:
    cell_cycle_reverse.append(current_cell)
    current_cell = inv_lin_tree[current_cell]

Another version, (slightly) more efficient but less readable:

In [None]:
cell_cycle_reverse2 = [cell_id]
while len(lin_tree.get(inv_lin_tree[cell_cycle_reverse2[-1]], [])) == 1:
    cell_cycle_reverse2.append(inv_lin_tree[cell_cycle_reverse2[-1]])
    
print("Check if the two lists are the same")
print(cell_cycle_reverse == cell_cycle_reverse2)

print("Check if the two created lists are the same in the reverse order")
print(cell_cycle_reverse == cell_cycle[::-1])

- from any given cell during the cell cycle:

We first get a cell id in the cell cycle:

In [None]:
cell_id = cell_cycle[10]

Then the idea is to combine the two previous methods:

In [None]:
# I am a bit lazy but you can probably do it yourself as an exercise
# I just start for you:
cell_cycle_mid = []

In [None]:
# your code here. Careful there is a trick with the order of the list.
# At some point you might want to insert cell ids to your list instead of appending them

In [None]:
# Then we check if you are correct:
print(cell_cycle_mid == cell_cycle)

### Getting all the clonal cells from a given cell id
To do so we will do what is called a [Depth-First Search](https://en.wikipedia.org/wiki/Depth-first_search) on our lineage tree from our cell id.

In [None]:
cell_id = cell_cycle[0]

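# Cells already visited (the clone) and cells still to visit (used as a stack)
clonal_cells = []
cells_to_treat = [cell_id]

while cells_to_treat:
    # take the most recently added cell, record it,
    # then schedule its successors (none if it has no entry in the tree)
    current_cell = cells_to_treat.pop()
    clonal_cells.append(current_cell)
    cells_to_treat += lin_tree.get(current_cell, [])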
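To see in which order a depth-first search visits the cells, the same loop can be run on a small toy tree. This is a sketch with made-up ids; the function name `get_clonal_cells` is hypothetical.

```python
def get_clonal_cells(lin_tree, cell_id):
    """Return cell_id and all its descendants, in depth-first order."""
    clonal_cells = []
    cells_to_treat = [cell_id]  # used as a stack
    while cells_to_treat:
        current_cell = cells_to_treat.pop()
        clonal_cells.append(current_cell)
        cells_to_treat += lin_tree.get(current_cell, [])
    return clonal_cells


# Toy lineage tree with made-up cell ids
toy_lin_tree = {
    10001: [20001, 20002],
    20001: [30001],
    20002: [30002, 30003],
}

print(get_clonal_cells(toy_lin_tree, 10001))
# → [10001, 20002, 30003, 30002, 20001, 30001]
```

Because the last successor added is the first one popped, the branch of the second daughter (`20002`) is explored entirely before the branch of the first daughter (`20001`).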

# Accessing cell properties

All the cell properties are stored in dictionaries where each key is a cell id, mapped to the value of the property for that cell.

For example, `vol` is the dictionary of the volumes, and the volume of the cell `cell_id` can be retrieved this way:

In [None]:
vol[cell_id]

The volume is in voxels. Knowing the dimensions of a voxel, it is possible to convert the volume to $\mu m^3$ (here a voxel is $ 0.3\mu m \times 0.3\mu m \times 0.3\mu m $):

In [None]:
vol[cell_id] * 0.3**3

Now it is possible to retrieve all the volumes for a given cell through its cell cycle.

In [None]:
cell_cycle_volume = []
# the cell_cycle list is the list previously generated
for c in cell_cycle:
    cell_cycle_volume.append(vol[c])
    
# Another, more pythonic way:
cell_cycle_volume = [vol[c] for c in cell_cycle]

Using the numpy library we can then compute some statistics on these volumes:

In [None]:
import numpy as np

print("The average:")
print(np.mean(cell_cycle_volume))

print("The median:")
print(np.median(cell_cycle_volume))

print("The standard deviation:")
print(np.std(cell_cycle_volume))

And we can plot the evolution of volume using matplotlib tools:

In [None]:
from matplotlib import pyplot as plt
%matplotlib inline

plt.plot(cell_cycle_volume)

### Accessing the surface of contact

The dictionary that links a cell id to its surfaces of contact is slightly more complicated since it is a dictionary of dictionaries.

This dictionary is called `surf_ex` (for surface exchange).

This is the value for a given cell id:

In [None]:
surf_ex[cell_id]

So it is a dictionary. Its keys are the ids of all the neighbors of this cell at this time point.

Each neighbor is mapped to the surface of contact it shares with the cell `cell_id`.
The cell with the id `time * 10**4 + 1` is the "cell" that represents the exterior.

Therefore the cell `cell_id` shares an area of contact with the exterior of:

In [None]:
cell_id_time = cell_id//10**4
surf_ex[cell_id][cell_id_time * 10**4 + 1]

Which gives in $\mu m^2$:

In [None]:
surf_ex[cell_id][cell_id_time * 10**4 + 1] * 0.3**2
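The lookup and the unit conversion can be combined in a small helper. This is a minimal sketch on toy data: the function name `exterior_contact` and the toy surfaces are made up, and it assumes that a cell that does not touch the exterior simply has no exterior key in its dictionary.

```python
def exterior_contact(surf_ex, cell_id, voxel_size=0.3):
    """Surface shared with the exterior, in square micrometers (0 if none)."""
    # the exterior "cell" has the label 1 at every time point
    exterior_id = (cell_id // 10**4) * 10**4 + 1
    return surf_ex[cell_id].get(exterior_id, 0) * voxel_size**2


# Toy dictionary of dictionaries: made-up neighbors and surfaces (in voxels)
toy_surf_ex = {
    1230341: {1230001: 120, 1230340: 80},  # touches the exterior (1230001)
    1230340: {1230341: 80},                # assumed interior cell: no exterior key
}

print(exterior_contact(toy_surf_ex, 1230341))
print(exterior_contact(toy_surf_ex, 1230340))
```

Using `get` with a default of `0` avoids a `KeyError` for cells that have no contact with the exterior.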

### Accessing the different cell fates

The dictionary to access the cell fates is `fates3`.
It has cell ids as keys, to which it maps a string that represents the fate of the cell.

In [None]:
fates3[cell_id]

Since the fate of some cells might not be determined at all times during development, not all cells have a mapped fate in the `fates3` dictionary:

In [None]:
cell_with_no_fate = 410097

print(cell_with_no_fate in fates3)

To avoid execution errors, it is preferable to query the fate dictionary with `get` and a default value:

In [None]:
print(fates3.get(cell_id, 'Undetermined'))
print(fates3.get(cell_with_no_fate, 'Undetermined'))

Even though the cell `cell_with_no_fate` does not have a specified fate yet, its daughters might:

In [None]:
daughter1, daughter2 = lin_tree[cell_with_no_fate]
print('Fate of the first daughter:')
print(fates3.get(daughter1, 'Undetermined'))
print('Fate of the second daughter:')
print(fates3.get(daughter2, 'Undetermined'))

### Exercise:

We can now try to compute the surface of contact between the cell `cell_id` and the 'Epidermis Tail' tissue:

In [None]:
tissue_name = 'Epidermis Tail'
surface_of_contact = surf_ex[cell_id]

# Now compute how much surface the cell cell_id shares with the Tail epidermis tissue