# Large-scale Content-Based Image Retrieval
---

## 1. Introduction

#### 1.1 The image retrieval problem
<img src="images/CBIR.png" height="10" width="400" align="right">


Content-based image retrieval (CBIR) is the process of searching a large database for images that match a given query. Technically, there are three key issues in CBIR:  
  1. Image representation
  2. Database organisation
  3. Image distance measurement



We can further specify the definition of CBIR above.

> CBIR makes use of the ***representation*** of visual content to identify relevant images in a database.


<u>In this lesson we will focus on a **Query-based Image Retrieval** problem, which uses an example image as query.</u>
<br>

> By the end of this session, we expect to have a function like the one below.  
As we go through the notebook, we will learn all the ingredients needed to implement it.

In [87]:
def retrieve(database, query):
    raise NotImplementedError

#### 1.2 CBIR pipeline
We can distinguish two main stages in a CBIR framework: an **offline** stage, and an **online** stage [ref].  

<div style="img {align: left}">
<img src="images/pipeline.png">
<em>Add image caption with reference</em>
</div>

The objective of the **offline** stage is to use the image stack to build an indexed database. We concentrate the computational effort in this stage so that the online stage stays cheap.
It is concerned with two main operations:
- Create a representation of the images
- Efficiently index the images 



Given a *query* image, the objective of the **online** stage is to score part (or all) of the images in the database, and return the ones with the highest scores.
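The online stage described above can be sketched in a few lines. This is a minimal illustration only, assuming that every image has already been encoded as a fixed-length vector and that cosine similarity is an acceptable score; the names `cosine_similarity`, `top_n` and `toy_database` are hypothetical and are not part of the notebook's tasks.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n(database, query_vector, n=2):
    """Return the n (image_id, score) pairs with the highest scores, best first.

    `database` is assumed to be a dict {image_id: embedding}.
    """
    scored = ((image_id, cosine_similarity(vector, query_vector))
              for image_id, vector in database.items())
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


# Toy embeddings (made-up numbers, for illustration only).
toy_database = {
    "beach.jpg": [4, 0, 1],
    "forest.jpg": [0, 5, 1],
    "harbour.jpg": [3, 1, 1],
}
ranking = top_n(toy_database, query_vector=[5, 0, 1], n=2)
print(ranking)
```

This version scores every image in the database. The indexing step of the offline stage exists so that only part of the database needs to be scored.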



This is how we would implement a CBIR model.  
We will divide the CBIR system into separate functions and tackle each one as we go through the notebook. In the last part of the session, we will put everything together.

In [96]:
class CBIR:
    def __init__(self, dataset):
        self.dataset = dataset
        self.database = None  # we haven't built an indexed database yet

    #-- TASK 1
    def extract_features(self, image):
        # Implement me using SIFT, please
        raise NotImplementedError

    #-- TASK 2
    def encode(self, image):
        # Implement me using extract_features and BOW, please
        raise NotImplementedError

    #-- TASK 3
    def build_index(self, dataset):
        # Implement me using vocabulary tree, please
        raise NotImplementedError

    #-- TASK 4
    def score(self, image):
        raise NotImplementedError

    #-- TASK 5
    def retrieve(self, query):
        # this is the function we have described above
        raise NotImplementedError


## 2. Offline stage: building the database

**2.1 Features extraction**

<div style="margin:auto; float:right; margin-left: 50px; width: 45%">
<img src="images/features_extraction_only.jpeg">
<em>Add image caption with reference</em>
</div>


<p style="color: #a00; font-weight: 700">>> TASK 1</p>

> In this section we are going to learn how to implement the function:
```python
def extract_features(self, image):
    return features
```

We implement feature extraction using the [Scale Invariant Feature Transform (SIFT)](https://link.springer.com/content/pdf/10.1023/B:VISI.0000029664.99615.94.pdf) method. SIFT is an image descriptor for image-based matching and recognition developed by David Lowe (1999, 2004). This descriptor, like related image descriptors, is used for many purposes in computer vision that involve point matching between different views of a 3-D scene, as well as view-based object recognition. The SIFT descriptor is invariant to translation, rotation and scaling in the image domain, and robust to moderate perspective transformations and illumination variations. [ref]
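To give some intuition for the robustness to illumination mentioned above, here is a small sketch of one building block of the SIFT descriptor: a histogram of gradient orientations, weighted by gradient magnitude and normalised to unit length. It is not SIFT itself. Keypoint detection, scale selection, rotation normalisation and the 4x4 grid of cells that produces the 128-dimensional descriptor are all left out, and the function name `orientation_histogram` is hypothetical.

```python
import math


def orientation_histogram(patch, n_bins=8):
    """Magnitude-weighted histogram of gradient orientations over a grayscale patch.

    `patch` is a list of equal-length rows of pixel intensities. Gradients are
    central differences, so the one-pixel border of the patch is skipped.
    The histogram is L2-normalised before being returned.
    """
    hist = [0.0] * n_bins
    bin_width = 2 * math.pi / n_bins
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            dx = patch[r][c + 1] - patch[r][c - 1]
            dy = patch[r + 1][c] - patch[r - 1][c]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
            hist[int(angle / bin_width) % n_bins] += magnitude
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist


# A patch whose intensity increases from left to right.
ramp = [[10 * c for c in range(5)] for _ in range(5)]
brighter = [[value + 50 for value in row] for row in ramp]       # uniform brightness shift
higher_contrast = [[2 * value for value in row] for row in ramp]  # contrast scaling

print(orientation_histogram(ramp))
print(orientation_histogram(brighter) == orientation_histogram(ramp))
print(orientation_histogram(higher_contrast) == orientation_histogram(ramp))
```

A uniform brightness shift leaves the gradients untouched, and a contrast change is cancelled by the normalisation, so the three patches share the same histogram. For the actual task, a library implementation is the more likely choice, for example OpenCV's `cv2.SIFT_create()` followed by `detectAndCompute(gray_image, None)`, assuming a recent OpenCV build is installed.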

In [91]:
def extract_features(self, image):
    # Implement me using SIFT, please
    raise NotImplementedError

<div style="margin: auto; float: left; margin-right: 50px; width: 19%">
<img src="images/bow.png">
<em>Add image caption with reference</em>
</div>

**2.2 Image representation**
<div style="margin-left: 230px">
<br>
    
    
<p style="color: #a00; font-weight: 700">>> TASK 2</p>
    
> In this section we are going to learn how to implement the function:
```python
def encode(self, image):
    return embedding
```
</div>

In CBIR an image is transformed to some kind of feature space. The motivation is to achieve an implicit alignment so as to eliminate the impact of background and potential transformations or changes while keeping the intrinsic visual content distinguishable. [ref]


Once we have obtained the features of an image, we will implement the "Bag of visual words" model [ref]. It borrows the concept of "bag of words" from the natural language processing field and implements its visual counterpart.
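As a rough sketch of the idea, the encoding step assigns each local descriptor to its nearest "visual word" and counts how often each word occurs. The example below assumes that a vocabulary of cluster centres already exists (it would typically come from running k-means on descriptors from the whole dataset) and uses toy 2-D descriptors instead of real SIFT vectors. The names `nearest_word` and `bow_histogram` are hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster centre) closest to `descriptor`."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))


def bow_histogram(descriptors, vocabulary):
    """Count how many descriptors fall on each visual word; one bin per word."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(vocabulary))]


# Toy vocabulary of three visual words and four descriptors from one image.
toy_vocabulary = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
toy_descriptors = [[1.0, 1.0], [9.0, 0.5], [8.0, 2.0], [0.5, 9.0]]

embedding = bow_histogram(toy_descriptors, toy_vocabulary)
print(embedding)  # [1, 2, 1]
```

The resulting histogram has one entry per visual word regardless of how many descriptors the image produced, which is what makes images comparable with each other.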

In [92]:
def encode(self, image):
    # Implement me using extract_features and BOW, please
    raise NotImplementedError

**2.3 Features indexing**

<div style="margin:auto; float:right; margin-left: 50px; width: 30%">
<img src="images/vocabulary_tree.png">
<em>Add image caption with reference</em>
</div>
<br>
    
<p style="color: #a00; font-weight: 700">>> TASK 3</p>

> In this section we are going to learn how to implement the function:
```python
def build_index(dataset):
    return indexed_database
```

In this section we provide a structure to the set of image representations collected in **2.2**.  
We implement the [**Vocabulary tree structure**](https://ieeexplore.ieee.org/document/1641018) proposed by Nistér and Stewénius [ref], which uses inverted indices and hierarchical k-means to build the tree.
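The inverted-index half of this structure can be sketched on its own. The example below assumes each image has already been reduced to a list of visual-word ids (in a vocabulary tree these would be the leaf nodes reached by its descriptors); the hierarchical k-means that builds the tree is not shown. The names `build_inverted_index` and `candidates` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(word_ids_per_image):
    """Map each visual-word id to the set of images that contain it.

    `word_ids_per_image` is assumed to be a dict {image_id: iterable of word ids}.
    """
    index = defaultdict(set)
    for image_id, word_ids in word_ids_per_image.items():
        for word_id in word_ids:
            index[word_id].add(image_id)
    return index


def candidates(index, query_word_ids):
    """Images that share at least one visual word with the query."""
    found = set()
    for word_id in query_word_ids:
        found |= index.get(word_id, set())
    return found


# Toy data: visual-word ids observed in three images.
toy_words = {
    "a.jpg": [0, 3, 3],
    "b.jpg": [1, 3],
    "c.jpg": [2],
}
toy_index = build_inverted_index(toy_words)
print(sorted(candidates(toy_index, [3])))     # ['a.jpg', 'b.jpg']
print(sorted(candidates(toy_index, [2, 7])))  # ['c.jpg']
```

With such an index, a query only touches the images listed under its own visual words, so the rest of the database never has to be scored.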

In [88]:
def build_index(dataset):
    # Implement me using vocabulary tree, please
    raise NotImplementedError

## 2. Online stage: get the _n_ most similar instances from a query image

**2.1 Image Scoring**  
<!-- <div style="margin-left: 230px"> -->
<br>


<p style="color: #a00; font-weight: 700">>> TASK 4</p>

> In this section we are going to learn how to implement the function:
```python
def score(database, image):
    return score
```
<!-- </div> -->


In [5]:
# We add some code here

**2.2 Reindexing**

Maybe this is not necessary

In [6]:
# We add some code here

## 3. End-to-end image retrieval
Here we bring together all the concepts we have illustrated above to build our Large-Scale CBIR system.

In [12]:
# We add some code here

## 4. [Optional] Features extraction with Deep Convolutional Neural Networks
The exercises below this point are not mandatory. They provide a wider picture of how to build an efficient image representation. We will illustrate two techniques that require elements of Deep Learning:
- 4.1 Using pretrained deep artificial neural networks to build a representation of the image
- 4.2 Fine-tuning a pretrained model on our database

**4.1 Using a pre-trained network**

We can either:
- Illustrate http://www.cs.toronto.edu/~fritz/absps/esann-deep-final.pdf and use denoising autoencoders,
- Or go more basic and use a DCNN (https://arxiv.org/abs/1404.1777)

Regardless, I think that the best option is to use a pre-trained network (ResNet50?).  

We can also add another step, if we have the time and people have the will, in which we fine-tune the network (only the last layer?) on our database.

In [14]:
# we add some code here

**4.2 Fine-tuning a pre-trained network**

In [13]:
# We add some code here

In [101]:
from IPython.display import Markdown as md
md(open("../README.md", "r").read())

# sbearbank

### Literature

#### Datasets:
- [Hamming embedding and weak geometric consistency for large scale image search (INRIA Holidays)](http://lear.inrialpes.fr/people/jegou/data.php#holidays) - [download](https://lear.inrialpes.fr/pubs/2008/JDS08/jegou_hewgc08.pdf)
- [Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset](https://arxiv.org/abs/1906.04087)
- [INSTRE: a New Benchmark for Instance-Level Object Retrieval and Recognition](https://dl.acm.org/doi/pdf/10.1145/2700292)

#### Features search:
- [PQk-means: Billion-scale Clustering for Product-quantized Codes](https://arxiv.org/pdf/1709.03708.pdf)
- [Scalable Recognition with a Vocabulary Tree](https://ieeexplore.ieee.org/document/1641018)
- [Object retrieval with large vocabularies and fast spatial matching](https://ieeexplore.ieee.org/document/4270197)

#### Features extraction:
- [Neural Codes for Image Retrieval](https://arxiv.org/pdf/1404.1777.pdf)
- [Scale-Invariant Feature Transform (SIFT)](https://pdfs.semanticscholar.org/0129/3b985b17154fbb178cd1f944ce3cc4fc9266.pdf)
- [Speeded Up Robust Features (SURF)](https://www.vision.ee.ethz.ch/~surf/eccv06.pdf)
- [Using very deep autoencoders for content-based image retrieval](http://www.cs.toronto.edu/~fritz/absps/esann-deep-final.pdf)
- [Video Google: A Text Retrieval Approach to Object Matching in Videos](http://www.robots.ox.ac.uk/~vgg/publications/papers/sivic03.pdf)

#### End-to-end
- [Large Scale Online Learning of Image Similarity Through Ranking (Triplet loss)](http://www.jmlr.org/papers/volume11/chechik10a/chechik10a.pdf)
- [In Defense of the Triplet Loss for Person Re-Identification](https://arxiv.org/abs/1703.07737)

#### Surveys
- [A survey on Image Retrieval Methods](http://cogprints.org/9815/1/Survey%20on%20Image%20Retrieval%20Methods.pdf)
- [Recent Advance in Content-based Image Retrieval: A Literature Survey](https://arxiv.org/pdf/1706.06064.pdf)