# UCL BIOC0016 - Bioinformatics
Alan R. Lowe (a.lowe@ucl.ac.uk)

---

### Bioimage informatics / Machine learning

The aim of the exercises in this notebook is to familiarise you with the different steps of evaluating the performance of a simple convolutional neural network (CNN). You will use a real CNN to classify different cell states using image data provided to you. A more sophisticated version of this neural network has been used in recently published research. The architecture of the network is similar to those you will have learnt about in the lectures.

The network is able to tell whether a cell is proliferating or dead.  This could be useful to understand whether a drug is effective or not, or to understand the normal cellular behaviour.

The CNN has been pre-trained on a dataset of real images collected at UCL. This is a very simple CNN, and you are going to assess its performance by using it to predict the cell state from image data, and comparing these predictions with your own ground truth annotations. The images that you are provided with are called a 'hold out' set: they come from the much larger original data set, but have not been used to train the neural network. The CNN has not 'seen' these data during the training phase, so they provide a real test of the performance of the network.

The practical contains the following sections:

1. Data annotation
2. Make predictions with a convolutional neural network
3. Compare these with the data that you have annotated
4. Determine the accuracy of the model

### Instructions

Code sections are coloured according to the following scheme:

* <div class="task_red"> Code that needs to be written by you. </div>
* <div class="task_blue"> Code that needs to be edited by you, perhaps changing some parameters. </div>
* <div class="task_green"> A task that needs to be completed by you. This may be recording the results. </div>


### Be part of the research project!

This is based on a real research project, and you can contribute to the project by recording your results as part of the practical. You can read more about the research project [here](http://lowe.cs.ucl.ac.uk/cellx.html).

As you complete different sections of the practical, your annotations will be recorded anonymously. By recording these results, you are contributing to the research programme.


### IMPORTANT NOTES:

1. This notebook will 'timeout' if you do not interact with it for more than 10-20 minutes. Make sure you save your progress occasionally.
2. If the colors above are not showing, you can run the first line of the notebook.

---
### Setting up the environment

In [None]:
import bioimageml

from bioimageml.api import validate_api_token
from bioimageml.dataset import CellDataset
from bioimageml.model import inspect, load_model, predict
from bioimageml.visualise import visualize_confusion_matrix, visualize_random_batch, visualise_outputs, visualize_predictions, visualize_report
from bioimageml.widgets import MitoticInstanceLabeller

Now we need to provide a token to the API. It will be something like this:
```python
validate_api_token("6FxFD67hFDSMw")
```


Please enter the token listed on the Moodle page:

In [None]:
validate_api_token("...")

In [None]:
bioimageml._set_css_style()

---
## PART 1 - Data Annotation


To test how well the neural network is able to perform on unseen images, we need to manually annotate some new and unseen images with a label. We can then analyse how many labels the neural network correctly predicts. In this section, you will be provided with a random sample of unlabeled images. You will need at least 50, although the more the better. Using the guide below, please annotate each image with one of the six labels provided. If you are unsure, use the 'Unknown' label.

Here are some examples of cells and their corresponding labels:

![Cell_state_labels](./files/cell_states.png "Cell state labels")

In [None]:
d = CellDataset()

In [None]:
visualize_random_batch(d, batch_size=16)

<div class="task_blue"> <b>TASK</b>: Try changing the number of images by changing the batch size.</div>

Now we're going to start an in-browser widget that lets you generate some testing labels. Each time a new image is shown, choose the label from those above that you think best represents it.

Every answer that you submit will be recorded and used to evaluate the model.

In [None]:
widget = MitoticInstanceLabeller(d)
widget.start()

In [None]:
widget.widget

At any point you can see which annotations you've already made, using the following command:

In [None]:
widget.statistics()

When you think you have enough annotations, you can collect the annotations for further analysis, using the following command:

In [None]:
x = widget.annotations

---
## PART 2 - Using the CNN to make predictions

### Loading the pre-trained model

In the following cell, `load_model()` builds the convolutional neural network (CNN) that we will test, and sets the weights and biases using the pre-trained values. This model has been trained with thousands of images, but the network is very simple. The goal is to assess the performance of the network.

In [None]:
model = load_model()

Now that you have loaded the model, try to work out the architecture.

<div class='task_red'> <b>TASK:</b> Summarise the model and get the number of parameters. Answer the following questions:
    <ul>
        <li>How many convolutional layers are there? </li>
        <li>How many kernels are used in each convolutional layer? </li>
        <li>How many output classes are there? </li>
        <li>What is the size of the input image, and after the convolutional layers?</li>
    </ul>
</div>

**HINT**: You can use the command `model` to get the details of the model.

In [None]:
model
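
As a rough guide for the questions above, the parameter count and output size of a convolutional layer follow from simple arithmetic. The sketch below assumes square kernels and a standard 2D convolution. The helper names and the example numbers are hypothetical and are not taken from the pre-trained model, so compare the results with what `model` actually reports.

```python
def conv2d_num_parameters(in_channels, out_channels, kernel_size, bias=True):
    """Number of learnable parameters in one 2D convolutional layer."""
    # each kernel has in_channels * kernel_size * kernel_size weights
    weights = in_channels * out_channels * kernel_size * kernel_size
    # one bias per kernel (output channel), if biases are used
    biases = out_channels if bias else 0
    return weights + biases


def conv2d_output_size(input_size, kernel_size, stride=1, padding=0):
    """Width (or height) of the feature map produced by a convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# hypothetical layer: 1 input channel, 8 kernels of 3x3, on a 64x64 image
print(conv2d_num_parameters(1, 8, 3))  # 1*8*3*3 + 8 = 80
print(conv2d_output_size(64, 3))       # 64 - 3 + 1 = 62
```

Pooling layers, if the model has any, shrink the feature maps further and would need to be accounted for separately.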

## Visualise activations within the network

When the network is given an example image, we can visualize the activations that the image produces within the layers of the network. First compute the activations using the `inspect()` command, then display them using the `visualise_outputs()` command.

The `inspect()` function needs three (3) arguments:
```python
inspect(model, image, layer)
```

You can only use one image at a time, so you need to pass just one image as an argument. You could do this by selecting the first image (0), the second (1), the third (2), and so on:
```python
image = x[0]   # select the first image from the list
```

<br />

<div class='task_red'><b>TASK:</b> Visualize the activations within the network for the first, second and third convolutional layers (layer=1, 2 or 3). What can you see? </div>

In [None]:
image = x[0]

In [None]:
activations = inspect(model, [image], layer=1)
visualise_outputs(activations)

Note that the bright regions signify high activations, and the dark regions signify low activations. How many different images are there for one input at this level of the CNN?
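
To see where these activation images come from, here is a minimal NumPy sketch of what a single convolutional layer does: each kernel slides over the input and produces one activation image (feature map). The image and kernels are random stand-ins rather than real cell data or the trained weights, and the ReLU non-linearity is an assumption about the network.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Slide a kernel over an image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # weighted sum of the image patch under the kernel
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


rng = np.random.default_rng(0)
toy_image = rng.random((16, 16))          # synthetic stand-in for a cell image
toy_kernels = rng.normal(size=(4, 3, 3))  # 4 hypothetical 3x3 kernels

# one feature map per kernel; ReLU sets negative activations to zero
feature_maps = np.stack(
    [np.maximum(correlate2d_valid(toy_image, k), 0) for k in toy_kernels]
)
print(feature_maps.shape)  # (4, 14, 14)
```

The first dimension of the result equals the number of kernels, which is the quantity the question above asks you to find for the real network.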

### Make the predictions


Now that we have loaded the pre-trained model, we can feed it the same images that you have labeled and generate a prediction for each one. The predictions are returned as the probability of the label given the image data, $\text{P}(\text{label} | \text{data})$. In fact, the model returns the probability for *all* labels given the data.
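
The sketch below illustrates how a set of per-label probabilities can be turned into a single predicted label and a confidence value. It assumes that the probabilities come from a softmax over the raw network outputs, and it uses made-up numbers for three images and four classes. The actual format returned by `predict()` may differ.

```python
import numpy as np

# hypothetical raw network outputs (logits): 3 images x 4 classes
toy_logits = np.array([
    [2.0, 0.5, 0.1, -1.0],
    [0.2, 0.1, 0.3, 0.25],
    [-1.0, 3.0, 0.0, 0.5],
])


def softmax(z):
    """Convert each row of logits into probabilities that sum to 1."""
    z = z - z.max(axis=1, keepdims=True)  # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


toy_probs = softmax(toy_logits)
toy_labels = toy_probs.argmax(axis=1)   # most probable class for each image
toy_confidence = toy_probs.max(axis=1)  # probability assigned to that class

print(toy_labels)  # [0 2 1]
print(np.round(toy_confidence, 2))
```

In this toy example the second image receives nearly equal probabilities for every class, so its predicted label is much less trustworthy than the other two.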

<br />
<div class="task_blue"> <b>TASK:</b> Using the model, make predictions for the images that you have annotated.</div>

In [None]:
predictions = predict(model, x)

Try looking at the predictions by typing `predictions`. What do you see?

In [None]:
predictions

---
## PART 3 - Compare the predictions with your annotations

We can visualize the images and their predictions using one of the helper functions:

<div class='task_blue'><b>TASK:</b> Print out the predictions from the model using the helper function below. How well do they compare to the annotations you have made? Which predictions is the model confident in, and which is it less confident in? </div>

**HINT:** You can use the following helper function to do this:
`visualize_predictions(x, predictions)`

In [None]:
visualize_predictions(x, predictions)

### Calculate the confusion matrix

A *confusion matrix* is used to assess how well a multi-class classifier performs. The matrix is square; it has the same number of rows and columns, $n \times n$, where $n$ is the number of classes.

By definition, a confusion matrix $\mathbf{C}$ is such that $\mathbf{C}_{i,j}$ is equal to the number of observations known to be in group $i$ and predicted to be in group $j$.

If the predictions for every label were correct, i.e. all predictions equal the ground truth, then all of the non-zero entries would lie on the diagonal of the matrix:

![Confusion_matrix](./files/confusion.png "Confusion matrix")
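
The definition above can be sketched in a few lines of NumPy. The function and the toy labels are illustrative only. They are not the implementation used by the `bioimageml` helper, and classes are assumed to be encoded as integers from 0 to $n-1$.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """C[i, j] counts observations in true class i that were predicted as class j."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    for i, j in zip(true_labels, predicted_labels):
        C[i, j] += 1
    return C


# hypothetical ground truth and predictions for 7 images and 3 classes
toy_true = [0, 0, 1, 1, 2, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0, 2]

toy_C = confusion_matrix(toy_true, toy_pred, n_classes=3)
print(toy_C)
# [[1 1 0]
#  [0 2 0]
#  [1 0 2]]
```

Here five of the seven predictions sit on the diagonal (correct), and the two off-diagonal entries show which classes were confused with each other.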

<div class='task_blue'><b>TASK: </b> Use the helper function `visualize_confusion_matrix` to create the confusion matrix. </div>

In [None]:
visualize_confusion_matrix(x, predictions)

Which predictions are good? Which are bad?

---
## PART 4 - Calculate the accuracy of the model using your ground truth data

We need to determine the overall accuracy of the model. We will make several simple calculations:


1. **Accuracy** is a single number that captures the fraction of correct predictions among the total number of examples examined.
2. **Precision** for label $i$ is the fraction of instances predicted as label $i$ whose true label really is $i$. Precision is a good metric when we want to be very sure that the predictions our model makes are correct.
3. **Recall** for label $i$ is the fraction of instances whose true label is $i$ that the model also predicted as label $i$. Recall is a good metric when we want our model to capture as many of the true instances of a label as possible.

It's important to think about all of these metrics. Ideally we would want our model to correctly predict the label (precision) and to label all occurrences (recall).
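
All three metrics can be read directly from a confusion matrix. The sketch below uses a made-up 3-class matrix with rows as true classes and columns as predicted classes, matching the definition of $\mathbf{C}_{i,j}$ given earlier. The variable names are illustrative.

```python
import numpy as np

# hypothetical confusion matrix: rows = true class, columns = predicted class
toy_cm = np.array([
    [1, 1, 0],
    [0, 2, 0],
    [1, 0, 2],
])

true_positives = np.diag(toy_cm)  # correct predictions for each class

accuracy = true_positives.sum() / toy_cm.sum()
precision = true_positives / toy_cm.sum(axis=0)  # column sums: how often each class was predicted
recall = true_positives / toy_cm.sum(axis=1)     # row sums: how often each class truly occurred

print(f"accuracy = {accuracy:.3f}")        # accuracy = 0.714
print("precision =", np.round(precision, 3))  # approximately 0.5, 0.667, 1.0
print("recall    =", np.round(recall, 3))     # approximately 0.5, 1.0, 0.667
```

Note that a class that is never predicted (or never occurs) gives a zero column (or row) sum, so the division is undefined for that class and needs special handling in practice.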

#### F1-score

The final metric we will calculate is called the $F_1$-score. This metric tries to balance the precision and recall into a single score, for each label of the classifier.  This is important to enable us to assess the performance of the model and determine whether it works well enough for our purposes, or to compare different models.

The $F_{1}$ score can be calculated according to the following equation:

\begin{equation}
F_1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}
\end{equation}
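
As a quick worked example of this equation, the sketch below computes the $F_1$ score for a few made-up precision and recall values. Returning 0 when both are zero is a common convention and is an assumption here, not part of the equation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall for a single class."""
    if precision + recall == 0:
        return 0.0  # convention: avoid dividing by zero
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.5, 1.0), 3))  # 0.667
print(round(f1_score(0.9, 0.1), 3))  # 0.18
print(round(f1_score(0.5, 0.5), 3))  # 0.5
```

The second case shows why $F_1$ is useful: a class with high precision but very low recall still receives a low score, whereas a simple average of the two would give 0.5.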

<br />

<div class='task_red'><b>TASK:</b> Use the function below to calculate the F1 score for each class </div>

In [None]:
visualize_report(x, predictions)

Is this a good model? What are the challenges? How would you change things to get a better result?

---

# End of practical

### Further reading

Here are some publications that use these data:

**Learning biophysical determinants of cell fate using deep neural networks**  
Soelistyo C, Vallardi G, Charras G and Lowe AR.  
*Nature Machine Intelligence* (2022)  
[![doi:10.1038/s42256-022-00503-6](https://img.shields.io/badge/doi-10.1038%2Fs42256--022--00503--6-blue)](https://doi.org/10.1038/s42256-022-00503-6)  


**Automated deep lineage tree analysis using a Bayesian single cell tracking approach**  
Ulicna K, Vallardi G, Charras G and Lowe AR.  
*Frontiers in Computer Science* (2021)  
[![doi:10.3389/fcomp.2021.734559](https://img.shields.io/badge/doi-10.3389%2Ffcomp.2021.734559-blue)](https://doi.org/10.3389/fcomp.2021.734559)


**Local cellular neighbourhood controls proliferation in cell competition**  
Bove A, Gradeci D, Fujita Y, Banerjee S, Charras G and Lowe AR.  
*Molecular Biology of the Cell* (2017)  
[![doi:10.1091/mbc.E17-06-0368](https://img.shields.io/badge/doi-10.1091%2Fmbc.E17--06--0368-blue)](https://doi.org/10.1091/mbc.E17-06-0368)