# A CNN for cell fate classification of hematopoietic stem cells

A short tutorial/redo of the analysis done in the following paper: 

[Prospective identification of hematopoietic lineage choice by deep learning](http://www.nature.com/nmeth/journal/v14/n4/full/nmeth.4182.html)    
Felix Buggenthin, Florian Buettner, Philipp S Hoppe, Max Endele, Manuel Kroiss, Michael Strasser, Michael Schwarzfischer, Dirk Loeffler, Konstantinos D Kokkaliaris, Oliver Hilsenbeck, Timm Schroeder, Fabian J Theis, Carsten Marr; *Nature Methods 14, 403–406* (2017)


[Code](https://github.com/QSCD/HematoFatePrediction)



In [None]:
import keras
import numpy as np
from talk_utils import tile_raster_images
%matplotlib inline
import matplotlib.pyplot as plt
import pickle
import pandas as pd

## Construct the CNN in Keras
- Conv1
- Conv2
- Conv3
- merge with speed
- fc6
- fc7
- fc8/softmax

![HematoCNN](images/hemato_cnn.png)

In [None]:
from talk_hemato_utils import create_hemato_cnn
CNN = create_hemato_cnn()
CNN.summary()

Load the pretrained weights into the model.

In [None]:
import caffe2keras_conversion
CNN = caffe2keras_conversion.load_weights_caffe2keras('../pretrained_hemato_net.hdf5', CNN, bn_trainable=True, other_param_trainable=True)

# Loading the retrained weights and the corresponding data
It turns out that the pretrained Caffe weights are hard to transfer to Keras (batch normalization etc.). Hence, I retrained the network on a subset of the data provided in `images_round3_test_annotated.pickle`.
<img src="images/latent_cells.png" alt="latent cells" style="width: 600px;"/>

**Note**: This is **VERY different** from the "across-movie" training/prediction done in the original paper.
Here, I instead train and evaluate on the same experiment (the samples in the training and test sets are still disjoint, but come from the same experiment).
Therefore, the results in this notebook are **overoptimistic**. But this notebook serves for demonstration only anyway :)
![roundRobin](images/roundrobin.png)

In [None]:
CNN.load_weights('../retrained_hemato_net.h5')

We need the exact same data split that I used for training, so that our test set is disjoint from the training set.


In [None]:
FULL_ANNOTATED = False 

if FULL_ANNOTATED:  # load the full annotation data. careful, 1GB on disc
    with open('../data/retrained_datasplit.pickle', 'rb') as fh:
        X_train,X_val,X_test,\
        y_train,y_val,y_test,\
        mov_train, mov_val, mov_test,\
        cell_train, cell_val, cell_test = pickle.load(fh)
    
    print(" %d train data\n %d val. data\n %d test data" % (len(X_train), len(X_val), len(X_test)))

    # for fast computation, restrict to just 10000 test-samples
    X_test   = X_test[:10000]
    y_test   = y_test[:10000]
    mov_test = mov_test[:10000]
    cell_test= cell_test[:10000]

# load a smaller subset of the testset
else:
    with open('../data_small/small_retrained_datasplit.pickle', 'rb') as fh:
        X_test, y_test, mov_test, cell_test = pickle.load(fh)

print('loaded a testset of %d samples, containing %d cells' % (len(X_test), len(np.unique(cell_test)))) 

# A first look at the data
Let's look at a few examples of the two classes:
- they look very similar to non-experts
- sometimes differentiated cells can be observed, which are distinct (e.g. megakaryocytes)

In [None]:
X_class0 = X_test[y_test[:,1]==0]
X_class1 = X_test[y_test[:,1]==1]

tile_raster_images(X_class0[:1000,:,:,0], img_dim=(1,2), 
                   tile_shape=(20,50), scale_rows_to_unit_interval=False, figsize=(20,20))

tile_raster_images(X_class1[:1000,:,:,0], img_dim=(1,2), 
                   tile_shape=(20,50), scale_rows_to_unit_interval=False, figsize=(20,20))

In addition, we have the movement speed as a feature. Note that the speed was already standardized (mean=0, std=1), hence the negative values.
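
As a reminder of what standardization does, here is a minimal sketch on made-up speeds (the values and variable names are illustrative and not taken from the dataset). Any raw speed below the mean ends up negative after the transform.

```python
import numpy as np

# Hypothetical raw speeds (arbitrary units); not taken from the dataset
raw_speed = np.array([0.5, 1.0, 1.5, 2.0, 10.0])

# z-score: subtract the mean, then divide by the standard deviation
standardized = (raw_speed - raw_speed.mean()) / raw_speed.std()

print(standardized.round(2))
print("mean: %.2f, std: %.2f" % (standardized.mean(), standardized.std()))
```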

In [None]:
# density=True replaces the `normed` argument, which was removed from recent matplotlib versions
plt.hist(mov_test[y_test[:,1]==0], bins = np.linspace(-1,10,100), histtype='step', density=True);
plt.hist(mov_test[y_test[:,1]==1], bins = np.linspace(-1,10,100), histtype='step', density=True);
plt.xlabel('Movement speed')
plt.ylabel('Relative frequency')
plt.legend(['Class 0', 'Class 1']);

**Task 1**: Also compare the two classes in terms of their average intensity

In [None]:
...

**Solution 1**

In [None]:
%load solutions/hemato-01.py

# Prediction of annotated cells



## Single image prediction

**Task 2**
- Predict all samples from the test set (takes ~10 sec).
- Look at the histogram of the scores.
- What are the accuracy and the confusion matrix?
- What is the area under the ROC curve?

**Hint**: 
- the model has two inputs, the image and speed. Feed them into the model as a list.
- `sklearn.metrics` already has implementations of the confusion matrix and the AUC!
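
For intuition about what the AUC measures: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes this on toy scores with plain NumPy. `pairwise_auc` is a hypothetical helper written for illustration; it is not part of this notebook or of sklearn.

```python
import numpy as np

def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Score difference for every (positive, negative) pair
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Toy data: 3 of the 4 positive/negative pairs are ordered correctly
toy_labels = np.array([0, 0, 1, 1])
toy_scores = np.array([0.1, 0.4, 0.35, 0.8])
print(pairwise_auc(toy_scores, toy_labels))
```

The pairwise matrix grows with the product of the class sizes, so for the full test set `sklearn.metrics.roc_auc_score` is the more practical choice.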

In [None]:
%load solutions/hemato-02.py

## Aggregate cells over multiple timepoints
So far, we have predicted a class for each single image patch. However, thanks to tracking, we can pool over image patches that belong to the same cell. That should make the classification more robust.

<img src="images/lineage_Score_over_time.png" alt="latent cells" style="width: 600px;"/>




There are a couple of ways to aggregate:
- **hard voting**: each sample is first classified (into 0 or 1), then we average the votes. E.g. if 80% of the samples of a cell were classified as 1, the cell is assigned to class 1.
- **soft voting**: we first average the class scores of all samples of the same cell, then discretize the average into 0 or 1.
- **neural network**: [Buggenthin et al.] use an RNN for the aggregation, which also incorporates the time dependence of the images. (We skip this for simplicity.)
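
The two voting rules can disagree. A minimal sketch on made-up scores for a single cell (the numbers are illustrative only):

```python
import numpy as np

# Hypothetical class-1 scores of five patches from one cell
patch_scores = np.array([0.55, 0.60, 0.52, 0.05, 0.10])

# Hard voting: threshold each patch first, then average the 0/1 votes
hard_vote = (patch_scores > 0.5).mean()

# Soft voting: average the raw scores first, then threshold once
soft_vote = patch_scores.mean()

print("hard vote: %.2f -> class %d" % (hard_vote, int(hard_vote > 0.5)))
print("soft vote: %.2f -> class %d" % (soft_vote, int(soft_vote > 0.5)))
```

Under hard voting, three weakly positive patches outvote two strongly negative ones. Under soft voting, the confident negative patches pull the average below 0.5.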


**Note**: We're somewhat cheating here: the train/val/test split was agnostic of the image patches being linked together. For example, cell 1 could have 10 patches in the training set and 20 patches in the test set. If the patches of a cell are strongly correlated, this causes **some leakage from training to test**, and our results are overoptimistic.

In [Buggenthin et al.], the training, validation and test sets come from **different experiments** to avoid this and similar issues!
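
A lighter-weight way to reduce this kind of leakage is to split by cell rather than by patch, so that all patches of a cell land on the same side. A minimal sketch with toy IDs follows. `split_by_group` is a hypothetical helper, and this is not how the split used in this notebook was produced.

```python
import numpy as np

def split_by_group(group_ids, test_fraction=0.3, seed=0):
    """Return boolean (train, test) masks such that no group appears in both."""
    rng = np.random.default_rng(seed)
    groups = np.unique(group_ids)
    n_test = max(1, int(round(test_fraction * len(groups))))
    # Draw whole groups (cells) for the test side, never single patches
    test_groups = rng.choice(groups, size=n_test, replace=False)
    test_mask = np.isin(group_ids, test_groups)
    return ~test_mask, test_mask

# Toy example: 10 patches belonging to 4 cells
toy_cell_ids = np.array([1, 1, 1, 2, 2, 3, 3, 3, 4, 4])
train_mask, test_mask = split_by_group(toy_cell_ids)

print("train cells:", np.unique(toy_cell_ids[train_mask]))
print("test cells: ", np.unique(toy_cell_ids[test_mask]))
```

The same idea is available in sklearn as `sklearn.model_selection.GroupShuffleSplit`. Passing experiment or movie IDs instead of cell IDs would hold out whole experiments.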

In [None]:
cellid = cell_test[1]
plt.plot(yhat[cell_test==cellid, 1]); 
plt.xlabel('image patch')
plt.ylabel('lineage score')
plt.title('Cell %d with true label %d' % (cellid, y_test[cell_test==cellid,1][0]))
plt.show()

### Pandas aggregation magic
Let's aggregate the different samples using pandas.
First, put our data (cell ID, true labels and predictions) into a dataframe.

In [None]:
df = pd.DataFrame(np.hstack([y_test, yhat, cell_test[:,np.newaxis]]), columns=['y0', 'y1', 'score0', 'score1', 'cellid'])
df['yhat'] = (df['score1'] > 0.5).values.astype('int')
df.head()

Group the samples by `cellid` and calculate the mean of each group (very similar to SQL's `GROUP BY`).

In [None]:
aggr = df.groupby('cellid').mean()
aggr = aggr.rename(columns={'score1': 'softvote', 'yhat': 'hardvote'}) 
aggr.head()

In [None]:
# to comply with the usual two column scores
softvoted_yhat = np.vstack([1-aggr['softvote'], aggr['softvote']]).T
hardvoted_yhat = np.vstack([1-aggr['hardvote'], aggr['hardvote']]).T
voted_y = np.vstack([aggr['y0'], aggr['y1']]).T

Putting it all together into a single function:

In [None]:
def aggregate_cell_scores(df):
    "aggregates all scores from the same cell; returns both the soft and the hard votes"
    # per-patch hard decision (note: adds the column to the passed dataframe in place)
    df['yhat'] = (df['score1'] > 0.5).values.astype('int')
    aggr = df.groupby('cellid').mean()
    aggr = aggr.rename(columns={'score1': 'softvote1', 'yhat': 'hardvote1'})
    aggr['softvote0'] = 1-aggr['softvote1']
    aggr['hardvote0'] = 1-aggr['hardvote1']

    return aggr

### Soft voting

In [None]:
aggr = aggregate_cell_scores(df)
aggr.head()

In [None]:
plt.figure()
plot_confusion_matrix(aggr[['softvote0', 'softvote1']].values,  
                      aggr[['y0', 'y1']].values, 
                      classes=[0,1]);

get_auc(aggr['softvote1'].values, aggr['y1'], do_plot=True)

### Hard voting

In [None]:
plt.figure()
plot_confusion_matrix(aggr[['hardvote0', 'hardvote1']].values,  
                      aggr[['y0', 'y1']].values, 
                      classes=[0,1]);

get_auc(aggr['hardvote1'].values, aggr['y1'], do_plot=True)

# Applying the CNN to "latent" cells
So far, we trained the NN on cells that already expressed some cell fate markers (hence they were already differentiated).
The most important contribution of [Buggenthin et al.] is that this classifier can also be applied to cells that express no marker yet (latent cells) and still correctly predict what will happen in the future (we know the future via tracking).

<img src="images/latent_cells.png" alt="latent cells" style="width: 600px;"/>


In [None]:
# small dataset, containing only 100 cells
dataset_fname = '../data_small/small_images_round3_test_latent_inverted_generations.pickle'  

# that's the full dataset; careful, it's about 1GB on disk
# dataset_fname = '../data/images_round3_test_latent_inverted_generations.pickle' 

with open(dataset_fname, 'rb') as fh:
    X_l, y_l, movement_l, cellIDs_l, gens_l = pickle.load(fh)
print("%d patches, %d cells in total" % (len(X_l), len(np.unique(cellIDs_l))))

Put the "meta"-data into a pandas dataframe. This will come in very handy later, especially the **inverted generation**, i.e. the position of the cell with respect to marker onset in the lineage tree (inv.gen=-2 means that the cell fate marker will turn on two generations downstream of that cell).
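
To make the inverted generation concrete, here is a toy sketch that walks up a small lineage tree from the cell in which the marker turns on. The tree, the cell IDs and the helper `inverted_generations` are all hypothetical. It also assumes that the onset cell itself is counted as generation 0, which the dataset may or may not do.

```python
# Toy lineage: child -> mother (hypothetical cell IDs)
mother_of = {2: 1, 3: 1, 4: 2, 5: 2, 8: 4, 9: 4}

def inverted_generations(onset_cell, mother_of):
    """Walk from the marker-onset cell up to the root, counting generations backwards.

    Assumption: the onset cell gets 0, its mother -1, its grandmother -2, and so on.
    """
    inv_gen = {}
    cell, gen = onset_cell, 0
    while cell is not None:
        inv_gen[cell] = gen
        cell = mother_of.get(cell)  # None once we are past the root
        gen -= 1
    return inv_gen

print(inverted_generations(8, mother_of))
```

In this toy tree, the marker turns on in cell 8, so cell 2 (two divisions upstream) gets an inverted generation of -2.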

In [None]:
df = pd.DataFrame(np.hstack([y_l, cellIDs_l[:,np.newaxis], gens_l[:,np.newaxis]]), columns=['y0', 'y1', 'cellid', 'gens'])
del cellIDs_l, gens_l  # so that we don't use these by accident later on
df.head()  # last expression in the cell, so the table is actually displayed

## Prediction of latent cells
Takes 2 min on a quad-core machine.

In [None]:
%time yhat_l = CNN.predict([X_l, movement_l], batch_size=128, verbose=1)

df['score0'] = yhat_l[:,0]  # put the predictions into the dataframe
df['score1'] = yhat_l[:,1]

## Aggregation + AUC
Again, we want to aggregate the scores of patches belonging to the same cell.

What does the distribution of patch scores look like for a single cell?

In [None]:
cellid = df.cellid[0]
df_cell = df.query("cellid==@cellid")
plt.hist(df_cell['score1'],100);
plt.title('Cell %d (%d patches) with true label %f' % (cellid, len(df_cell),df_cell.y1.values[0]))
plt.xlabel('lineage score');

**Task 3**: Aggregate the predictions, calculate the AUC and compare it to the unaggregated AUC.

In [None]:
...

**Solution 3**

In [None]:
%load solutions/hemato-03.py

Finally, let's check how our predictive performance changes as we move further away from the observed marker onset.
However, keep in mind that the further into the past we go, the fewer cells we have to evaluate the performance on!
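
To see how quickly the evaluation set shrinks, one can count the distinct cells per inverted generation. A sketch on made-up arrays (in this notebook the corresponding columns would be `df['gens']` and `df['cellid']`):

```python
import numpy as np

# Toy per-patch metadata: inverted generation and cell ID of each patch
toy_gens   = np.array([-1, -1, -1, -1, -2, -2, -2, -3])
toy_cellid = np.array([10, 10, 11, 12, 20, 20, 21, 30])

# Reduce to one row per (generation, cell) pair, then count cells per generation
pairs = np.unique(np.column_stack([toy_gens, toy_cellid]), axis=0)
gens, n_cells = np.unique(pairs[:, 0], return_counts=True)

for g, n in zip(gens, n_cells):
    print("inv. generation %d: %d cells" % (g, n))
```

An AUC computed from only a handful of cells is noisy, so differences between distant generations should be read with care.
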
<img src="images/latent_cells.png" alt="latent cells" style="width: 600px;"/>


**Task 4**: 
- stratify the samples according to their inverted generation (`df['gens']`)
- calculate the AUC for each generation. How long before marker onset is the cell fate predictable?

In [None]:
...

**Solution 4**

In [None]:
%load solutions/hemato-04.py

**Note**: Again, keep in mind that we didn't evaluate on a totally different test set (I trained the classifier on a subset of the "annotated" cells in a single experiment). So applying the classifier to "latent" cells of the same movie might be **too optimistic** (e.g. due to overfitting on lighting conditions etc.).