# Featurizer Performance

In this notebook, we compare the performance of each featurizer and discuss their pros and cons.

## Metric

To benchmark our featurizers, we use the following two metrics for evaluation:

- IoU

<img src="./figs/iou.png" style="float: left" width="200"/>

- Precision

<img src="./figs/precision.png" style="float: left" width="200"/>
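To compute precision, we compare the IoU of every predicted connected component with every ground-truth instance segmentation.

A prediction counts as a true positive (TP) if its best IoU is greater than a threshold, and as a false positive (FP) otherwise. We use IoU thresholds from 0.5 to 1.0 in steps of 0.05.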
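
The matching rule above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the notebook's actual evaluation code: the function names are hypothetical, both inputs are assumed to be integer label images with 0 as background, precision is assumed to be TP / (TP + FP), and the threshold grid is assumed to stop at 0.95, since the text does not say whether the endpoint 1.0 is included.

```python
import numpy as np

# Assumption: thresholds 0.5, 0.55, ..., 0.95 (endpoint 1.0 left out).
THRESHOLDS = np.linspace(0.5, 0.95, 10)


def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks of the same shape."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(intersection / union) if union > 0 else 0.0


def precision_at_threshold(pred_labels, true_labels, threshold):
    """Fraction of predicted components whose best IoU exceeds `threshold`."""
    pred_ids = [i for i in np.unique(pred_labels) if i != 0]
    true_ids = [i for i in np.unique(true_labels) if i != 0]
    if not pred_ids:
        return 0.0
    true_positives = 0
    for p in pred_ids:
        pred_mask = pred_labels == p
        # Best IoU of this prediction against every ground-truth instance.
        best = max((iou(pred_mask, true_labels == t) for t in true_ids), default=0.0)
        if best > threshold:
            true_positives += 1
    # Every prediction that is not a TP is an FP, so TP + FP == len(pred_ids).
    return true_positives / len(pred_ids)


def precision_over_thresholds(pred_labels, true_labels, thresholds=THRESHOLDS):
    """Precision at each IoU threshold, in the same order as `thresholds`."""
    return [precision_at_threshold(pred_labels, true_labels, t) for t in thresholds]
```

The brute-force comparison is quadratic in the number of components, which is fine for a sketch but may be slow on images with many instances.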

To generate the connected components, we used the [`label`](https://www.pydoc.io/pypi/scipy-1.0.1/autoapi/ndimage/measurements/index.html#ndimage.measurements.label) function from scipy. This function treats every non-zero value in the input as part of a feature and every zero value as background, and assigns a unique integer label to each connected region.


| Before connected component | After connected component | 
| --- | --- |
| ![](./figs/before_cc.png) |  ![](./figs/after_cc.png) |
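
As a small illustration of the labeling step, the sketch below runs `scipy.ndimage.label` on a toy binary mask. The mask is made up for the example. The connectivity used in our experiments is not stated here, so both the default (4-connectivity in 2D) and an 8-connected structuring element are shown.

```python
import numpy as np
from scipy import ndimage

# Toy binary segmentation: non-zero pixels are foreground, zeros are background.
binary = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
])

# Default structure: pixels are connected only through shared edges.
labels_4, count_4 = ndimage.label(binary)

# Full 3x3 structure: diagonal neighbours are connected as well.
labels_8, count_8 = ndimage.label(binary, structure=np.ones((3, 3), dtype=int))

print(count_4, count_8)
```

The two pixels that touch only diagonally form separate components under the default structure and a single component under the 8-connected one, so the choice of connectivity changes how many predictions are scored.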


## Post processing

The raw segmentation from either featurizer achieved an acceptable IoU score yet very low precision. Further analysis revealed that tiny islands of segmentation were contributing a large number of false positives. We removed all small islands before further evaluation.


| Before post processing | After post processing | 
| --- | --- |
| ![](./figs/no_remove_island.png) |  ![](./figs/remove_island.png) |
| ![](./figs/b_remove.png) |  ![](./figs/a_remove.png) |
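
One way to remove small islands is to label the mask and drop every component below a size cutoff. The sketch below assumes this approach. The function name and the `min_size` parameter are hypothetical, and the cutoff used in our experiments is not given here.

```python
import numpy as np
from scipy import ndimage


def remove_small_islands(mask, min_size):
    """Return a boolean mask without components smaller than `min_size` pixels.

    `min_size` is an illustrative parameter, not a value from the experiments.
    """
    labels, _ = ndimage.label(mask)
    # sizes[k] is the pixel count of component k; index 0 is the background.
    sizes = np.bincount(labels.ravel())
    keep = sizes >= min_size
    keep[0] = False  # never keep the background
    # Look up, for every pixel, whether its component is kept.
    return keep[labels]
```

For example, `remove_small_islands(mask, min_size=4)` keeps a 3x3 blob and discards an isolated single pixel.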

## Minimum number of training examples

We want to know the minimum number of examples that our users need to label to train a model that generalizes across their dataset. We tested the performance of our featurizers when trained with different numbers of training examples, ranging from 1 to 10. For each training example, we randomly picked 20% of the pixels to train our Random Forest Classifier.

Based on our experiments, the performance gain plateaus after 4 examples.

![](./figs/number_example.png)
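
The pixel sampling step described above could look roughly like the following. This is a sketch under assumptions: the function name, the `(H, W, C)` feature layout, and uniform sampling without replacement over all pixels are not confirmed by the text. The returned arrays are in the `(n_samples, n_features)` shape that a classifier such as a random forest typically expects.

```python
import numpy as np


def sample_training_pixels(features, labels, fraction=0.2, seed=0):
    """Pick a random subset of pixels from one training example.

    features: array of shape (H, W, C) with one feature vector per pixel (assumed layout).
    labels:   array of shape (H, W) with one class label per pixel.
    Returns (X, y) with shapes (n_samples, C) and (n_samples,).
    """
    rng = np.random.default_rng(seed)
    height, width, channels = features.shape
    n_pixels = height * width
    n_samples = max(1, int(round(fraction * n_pixels)))
    # Sample pixel indices without replacement so that no pixel is repeated.
    idx = rng.choice(n_pixels, size=n_samples, replace=False)
    X = features.reshape(n_pixels, channels)[idx]
    y = labels.reshape(n_pixels)[idx]
    return X, y
```

Samples from several training examples can then be concatenated with `np.concatenate` before fitting the classifier.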

## Minimum percentage of pixels per training image

Having found the minimum number of examples needed, we also want to know the minimum percentage of pixels needed. We tested the performance of our featurizers when trained with different fractions of training pixels, ranging from 0.05 to 1.0 (5% to 100%). Since the performance is stable with 4 or more examples, we randomly picked 4 training examples to evaluate the effect of varying the number of training pixels.

Based on our experiments, the performance gain plateaus after 10% of the pixels are selected.

![](./figs/percent_pixel.png)

## Time to featurize images

We also want to know how long it takes to featurize images with the different featurizers. Since all of the HPA and nuclei featurizers use the same UNet architecture, we did not measure the run time of each individual setting. We primarily tested the time difference between UNet and image filter featurization, as well as between 8 and 16 feature dimensions.

|  |  | 
| --- | --- |
| ![](./figs/1_time.png) |  ![](./figs/100_time.png) |


Based on the chart on the left, it might seem that the 8-dimensional filter featurizer requires the least amount of time. It is worth noting, however, that most of the run time for the UNet featurizer is spent building the network. As the chart on the right shows, our UNet featurizer outperforms the image filter featurizers when scaled up to 100 training examples. The UNet run time is even better for users who have access to GPUs.
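
To separate the one-off cost of building the network from the per-image cost, a timing helper along these lines can be used. It is only a sketch: `build` and `featurize` are hypothetical callables standing in for whichever featurizer is being measured, and this is not the code that produced the charts above.

```python
import time


def time_featurizer(build, featurize, images):
    """Measure setup time and average per-image time for one featurizer.

    build:     hypothetical callable that constructs the model (e.g. loads a network).
    featurize: hypothetical callable taking (model, image) and returning features.
    """
    start = time.perf_counter()
    model = build()
    setup_s = time.perf_counter() - start

    start = time.perf_counter()
    for image in images:
        featurize(model, image)
    total_s = time.perf_counter() - start

    # Guard against an empty image list when averaging.
    return {"setup_s": setup_s, "per_image_s": total_s / max(len(images), 1)}
```

With a large fixed setup cost and a small per-image cost, a featurizer that looks slow on a single image can still be the fastest overall once the setup is amortized over many images.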

## Results

We want to know which featurizer out of all the different configurations produces the best performance.

First, we need to determine whether we should take the max or the mean across channels for the HPA-based UNet featurizer:

![](./figs/test_max_mean.png)

Based on the graph above, the multi-channel UNet tends to perform better with mean compression, while the UNet that only outputs the nuclei mask achieves better results with max compression.
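
For readers unfamiliar with the two compression modes, the sketch below shows what taking the max or the mean across channels means in NumPy terms. The array layout `(n_channels, H, W, n_features)` and the function name are assumptions made for illustration. The actual tensor layout inside the featurizer is not described here.

```python
import numpy as np


def compress_channels(per_channel_features, mode="mean"):
    """Reduce per-channel feature maps to a single feature map.

    per_channel_features: array of shape (n_channels, H, W, n_features) (assumed layout).
    Returns an array of shape (H, W, n_features).
    """
    if mode == "mean":
        return per_channel_features.mean(axis=0)
    if mode == "max":
        return per_channel_features.max(axis=0)
    raise ValueError(f"unknown mode: {mode!r}")
```

Mean compression averages the response of every input channel, while max compression keeps only the strongest response per pixel and feature.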

We also want to know whether the performance gain from using more dimensions (16) is enough to compensate for the additional run time required. For this experiment, we used the mean/max configuration determined above for our HPA UNet:

![](./figs/test_8_16.png)

Surprisingly, all the featurizers besides the HPA UNet with 3 channels perform better with 8 dimensions than with 16. This is most likely due to the 16-dimensional configuration overfitting to the dataset used to train the model.

Using the optimal configuration (mean/max, 8/16) for each featurizer, we want to determine the best overall model for Segmentify:

![](./figs/test_overall.png)

Even though the Nuclei UNet has by far the best result, it is worth noting that the Nuclei UNet was trained on the same dataset. Another dataset is needed to evaluate the performance of the Nuclei UNet accurately and without bias.

The second-best featurizer is based on the image filters. However, images from the Nuclei dataset have a strong visual difference between background and target, which favors image filters. Again, another dataset is required to draw a better conclusion.

HPA_4 might seem to perform better than HPA_3 based on their average IoUs, but HPA_3 actually has a much smaller range. In fact, HPA_3 also comes out ahead on precision and run time.

In conclusion, it is still hard to tell which featurizer is the most suitable for Segmentify. Not only does it depend on the user's use case, but the performance also needs to be evaluated on more datasets to determine how well each featurizer generalizes.