# Computer Vision: Object Counting

Counting objects is relatively simple for people, but it is challenging from a computer vision perspective and in algorithm implementations, because objects vary in color, shape, size, and texture. Tasks that people solve easily are often a good fit for deep learning models, which can help solve these problems. Counting is one of the fundamental tasks in computer vision (CV) and has many applications, including:

- Surveillance
- Medicine
- Biology
- Microbiology
- Agriculture

Deep learning (DL) methods provide state-of-the-art performance in digital image processing, but they require large amounts of labeled data, which are time-consuming to collect and prone to errors. The usual DL approach to counting is to first detect the objects with a convolutional neural network (CNN) and then count all detected instances. Although effective, this method requires bounding box annotations, which are hard to obtain. Alternative approaches address this issue by using point-like annotations of object positions, which are much cheaper to collect.

<img src="./fig/object_count_bounding_boxes_01.png" alt="Count with bounding boxes" width="500"/>
<center>Example of bounding boxes</center>

Now we are going to practice counting objects in images with fully convolutional networks (FCN) trained on data with point-like annotations.

Let's start by counting objects indirectly, by estimating a density map (this approach is described in [1]). First we must prepare the training samples: every image needs a corresponding density map.

A density map can be obtained by convolving the point-like annotations with a Gaussian kernel, normalized so that integrating the map gives the number of objects.

<img src="./fig/object_density_map.png" alt="Object Density Map" width="500"/>
<center>Example of a density map</center>
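
The density map construction described above can be sketched with NumPy alone. This is a minimal illustration, not the procedure used in [1]: the function names `gaussian_kernel` and `density_map`, the `sigma` value, the 3-sigma kernel radius, and the renormalization of blobs clipped at the image border are all assumptions made for this example. Points are given as integer `(row, col)` pairs that lie inside the image.

```python
import numpy as np


def gaussian_kernel(sigma, radius):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def density_map(points, shape, sigma=2.0):
    """Build a density map from integer (row, col) point annotations.

    Each point contributes one normalized Gaussian blob, so the map sums
    to the number of annotated objects.
    """
    height, width = shape
    radius = int(3 * sigma)  # assumption: truncate the kernel at 3 sigma
    kernel = gaussian_kernel(sigma, radius)
    density = np.zeros(shape, dtype=np.float64)
    for row, col in points:
        # Image region covered by the kernel, clipped to the image borders.
        r0, r1 = max(row - radius, 0), min(row + radius + 1, height)
        c0, c1 = max(col - radius, 0), min(col + radius + 1, width)
        # Matching part of the kernel for the clipped region.
        patch = kernel[r0 - (row - radius): r1 - (row - radius),
                       c0 - (col - radius): c1 - (col - radius)]
        # Renormalize so a blob clipped at the border still integrates to 1.
        density[r0:r1, c0:c1] += patch / patch.sum()
    return density


# Three hypothetical annotations, one of them in the image corner.
points = [(10, 12), (30, 40), (0, 63)]
dmap = density_map(points, shape=(64, 64), sigma=2.0)
print(dmap.shape, dmap.sum())
```

Summing the map recovers the object count (here approximately 3, up to floating-point error), which is what a network trained to predict density maps is expected to reproduce.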

To achieve that, we have several architectures to consider. For example, let's consider the following FCN architectures: **U-Net** and **Fully Convolutional Regression Network (FCRN)**.

### U-Net
U-Net is a widely used FCN for image segmentation, very often applied to biomedical data. It has an autoencoder-like structure. An input image is processed by a block of convolutional layers, followed by a pooling layer (downsampling). This procedure is repeated several times on the outputs of subsequent blocks, as shown on the left side of Fig. 4. In this way the network encodes (and compresses) the key features of the input image. The second part of U-Net is symmetric, but the pooling layers are replaced with upsampling, so that the output dimensions match the size of the input image. The information from higher resolution layers in the downsampling part is passed to the corresponding layers in the upsampling part, which allows the network to reuse learned higher-level features and decode the contracted layers more precisely.
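
To make the encoder, decoder, and skip-connection bookkeeping concrete, here is a small NumPy sketch that only traces feature-map shapes. It is not a U-Net implementation: the learned convolutions are omitted (treated as identity), and the helper names `max_pool2x2`, `upsample2x`, and `unet_shape_walkthrough`, the depth of 3, and the 64x64 input are assumptions chosen for illustration. The input height and width must be divisible by `2 ** depth`.

```python
import numpy as np


def max_pool2x2(x):
    """Downsample a (channels, height, width) array with 2x2 max pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def upsample2x(x):
    """Upsample a (channels, height, width) array by nearest-neighbour repetition."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def unet_shape_walkthrough(image, depth=3):
    """Trace feature-map shapes through a U-Net-like encoder and decoder.

    Convolutions are left out, so only pooling, upsampling and the
    skip-connection concatenation are shown.
    """
    skips = []
    x = image
    # Encoder: remember each resolution level, then downsample.
    for _ in range(depth):
        skips.append(x)
        x = max_pool2x2(x)
    bottleneck_shape = x.shape
    # Decoder: upsample, then concatenate the matching encoder output
    # along the channel axis (the skip connection).
    for skip in reversed(skips):
        x = upsample2x(x)
        x = np.concatenate([skip, x], axis=0)
    return bottleneck_shape, x.shape


image = np.random.rand(1, 64, 64)
bottleneck_shape, output_shape = unet_shape_walkthrough(image, depth=3)
print(bottleneck_shape)  # (1, 8, 8)
print(output_shape)      # (4, 64, 64)
```

The output has the same height and width as the input, as the text describes. In a real network, convolutions after each concatenation would mix the channels, and a final convolution would typically reduce them to the desired number of output channels.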

### FCRN
Fully Convolutional Regression Network (FCRN) was proposed in [1]. The architecture is very similar to U-Net. The main difference is that the information from the higher resolution layers in the downsampling part is not passed directly to the corresponding layers in the upsampling part. Two networks are proposed in the paper, FCRN-A and FCRN-B, which differ in downsampling intensity: FCRN-A performs pooling after every convolutional layer, while FCRN-B does so after every second layer.
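
The difference in downsampling intensity can be illustrated by tracking the spatial size of the feature maps along the downsampling path. The sketch below is only a schedule calculator: the function name `feature_map_sizes`, the input size of 64, and the choice of four convolutional layers are assumptions for illustration and are not taken from the paper. Convolutions are assumed to preserve the spatial size, and each pooling step halves it.

```python
def feature_map_sizes(input_size, n_conv_layers, pool_every):
    """Spatial size after each convolutional layer of a downsampling path.

    A 2x2 pooling step (halving the size) follows every `pool_every`-th
    convolution; convolutions are assumed to preserve the spatial size.
    """
    sizes = []
    size = input_size
    for layer in range(1, n_conv_layers + 1):
        if layer % pool_every == 0:
            size //= 2
        sizes.append(size)
    return sizes


# Pooling after every convolution (FCRN-A-like schedule).
fcrn_a_like = feature_map_sizes(input_size=64, n_conv_layers=4, pool_every=1)
# Pooling after every second convolution (FCRN-B-like schedule).
fcrn_b_like = feature_map_sizes(input_size=64, n_conv_layers=4, pool_every=2)
print(fcrn_a_like)  # [32, 16, 8, 4]
print(fcrn_b_like)  # [64, 32, 32, 16]
```

With the same number of layers, the second schedule keeps larger feature maps, at the cost of more computation per layer.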

```python
%load_ext autoreload
%autoreload 2
%matplotlib inline
%load_ext watermark
%watermark -v -m -p numpy,pandas,sklearn -g

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import watermark
import sklearn
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
```

```
CPython 3.7.3
IPython 7.8.0

numpy 1.17.2
pandas 0.25.1
sklearn 0.21.3

compiler   : Clang 4.0.1 (tags/RELEASE_401/final)
system     : Darwin
release    : 19.0.0
machine    : x86_64
processor  : i386
CPU cores  : 16
interpreter: 64bit
Git hash   : 3691fc4c659a85976ac343a5349964e06ecc8a71
```


## References



[1] Xie, W., Noble, J. A., & Zisserman, A. (2015). Microscopy cell counting with fully convolutional regression networks. In 1st Deep Learning Workshop, Medical Image Computing and Computer-Assisted Intervention (MICCAI).