# Search Similar Images

Given an example image, can we find similar images without the need for any labels? Leveraging Jina, we have the advantage that
we don't need any labels or textual information about the images in order to build a search for similar images.

In this tutorial we are going to create an image search system that retrieves similar images. We are going to
use the test split of the [Dogs vs. Cats](https://www.kaggle.com/c/dogs-vs-cats/data?select=test1.zip) dataset, which we
will subsequently refer to as the pets dataset. It contains 12.5K images of cats and dogs. We can define our
problem as: given an image of a cat or a dog, retrieve similar images of cats or dogs respectively.

Jina searches semantically, and the results will vary depending on the neural network that we use for image encoding. Since our
task is to search for similar images, we will consider visually-similar images as semantically-related.

## Build the Flow

The solution uses a simple pipeline that can be subdivided into two steps: **Index** and **Query**.

### Index

To search something out of the full dataset, we first need to index the data. This means that we store the embeddings
of all the images from the dataset in some form of storage. Each image can be read as a numpy array, which is then
fed to the neural network of our choice. This neural network encodes the input images into some latent space which we call
"embeddings". We then use an **Indexer** to store these embeddings in memory.

### Query

Once the data is indexed, i.e. our database is built, we simply need to feed our query (an image or set of
images) to the model to encode it into embeddings, and then use the **Indexer** to retrieve the matching images. The matching
can be based on any type of metric, but without going deeper into this, we will focus only on the Euclidean distance between
two embeddings (corresponding to two images).
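
To make the metric concrete, here is a minimal sketch of the Euclidean distance between two embeddings, computed with numpy on made-up vectors:

```python
import numpy as np

# Two made-up embeddings of the same dimensionality
emb_a = np.array([0.1, 0.9, 0.3])
emb_b = np.array([0.2, 0.8, 0.5])

# Euclidean distance: the smaller the value, the more similar the images
print(np.linalg.norm(emb_a - emb_b))
```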

We will use the **SimpleIndexer** Executor as
our indexer (the one that stores and retrieves data). This Executor also returns the matching `Document` when we make
a query. The search part is done using the built-in `match` function of `DocumentArrayMemmap`. To encode the images into
embeddings we will use our own Executor, which uses the pre-trained `ResNet101` model.

## Flow Overview

We have one Flow defined for this tutorial. However, it handles requests to `/index` and `/search` differently by
defining different endpoints using `requests` decorators. Below we see the Flow, which consists of an `Encoder` to encode
the images as the first step, followed by an `Indexer` to store/retrieve data.

```{figure} ../../../.github/images/image_search_flow.svg
:align: center
```

## Insights

Our first task is to wrap the image data as `Document`s and form a `DocumentArray`. This can be done easily with the
following code snippet. `from_files` creates an iterator over a list of image paths and yields `Document`s:

```python
from jina import DocumentArray
from jina.types.document.generators import from_files

# Location and format of the extracted pets dataset (adjust to your setup)
image_dir = 'data/pets'
image_format = 'jpg'

docs_array = DocumentArray(from_files(f'{image_dir}/*.{image_format}'))
```
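
As a quick sanity check, you can peek at what `from_files` produced. Each `Document` stores the path of its image in the `uri` attribute; no pixels are loaded at this point:

```python
print(len(docs_array))    # number of images found
print(docs_array[0].uri)  # path of the first image
```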

Once the images are loaded, our next step is to encode them into embeddings. As stated earlier, you can use
Executors from [Jina Hub](https://hub.jina.ai) off-the-shelf or you can define an Executor of your own in
just a few steps. For this tutorial we will write our own Executor:

```python
from flash.image import ImageEmbedder
from jina import DocumentArray, Executor, requests


class FlashImageEncoder(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Pre-trained ResNet101 backbone from lightning-flash; the weights
        # are downloaded automatically when the Executor is instantiated
        self._embedder = ImageEmbedder(backbone='resnet101')

    @requests
    def predict(self, docs: DocumentArray, **kwargs):
        # Encode every image (referenced by its uri) and store the vector
        # in the `embedding` attribute of the respective Document
        embeds = self._embedder.predict(docs.get_attributes('uri'))
        for doc, embed in zip(docs, embeds):
            doc.embedding = embed.numpy()
```

To build an Encoder Executor we inherit the base `Executor` and use a decorator
to define endpoints. As this `requests` decorator is empty, the function will be called regardless of the
endpoint invoked, i.e., on both the `/index` and `/search` endpoints. We
leverage [lightning-flash](https://github.com/PyTorchLightning/lightning-flash) to use the pre-trained `ResNet101` model for
getting the embeddings. You can replace this model with any other pre-trained model of your choice. When this
Executor is instantiated, the pre-trained weights are downloaded automatically. The `predict` function takes in
the `DocumentArray` and extracts embeddings, each of which is then stored in the `embedding` attribute of the
respective `Document`.
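
Because an Executor is a plain Python class, you can also try the encoder outside of a Flow. A minimal sketch, assuming the `docs_array` from above and enough memory to hold the model:

```python
encoder = FlashImageEncoder()
encoder.predict(docs_array)

# Each Document now carries an embedding vector
print(docs_array[0].embedding.shape)
```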

Finally comes the storage/retrieval step. We do this with an **Indexer** Executor. You can use any of the
available indexers on [Jina Hub](https://hub.jina.ai) or define your own. To create an **Indexer** you need two
endpoints: `/index` and `/search`. For this tutorial we will define a `SimpleIndexer`, which is [also available on Jina
Hub](https://hub.jina.ai/executor/zb38xlt4).

```python
from jina import DocumentArrayMemmap, DocumentArray, Executor, requests


class SimpleIndexer(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Persist indexed Documents in a memory-mapped store inside the workspace
        self._dam = DocumentArrayMemmap(self.workspace)

    @requests(on='/index')
    def index(self, docs: DocumentArray, **kwargs):
        # Store the encoded Documents
        self._dam.extend(docs)

    @requests(on='/search')
    def search(self, docs: DocumentArray, **kwargs):
        # Attach the nearest indexed Documents to each query Document
        docs.match(self._dam)
```

`SimpleIndexer` stores all the Documents with a memory map when invoked via the `/index` endpoint. During the search
Flow, it matches the query `Document` with the indexed `Document`s using the built-in `match` function
of `DocumentArrayMemmap`.
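
To see `match` in isolation, here is a self-contained toy example with random embeddings. The sizes are made up, and a `DocumentArrayMemmap` behaves the same way as the plain `DocumentArray` used here:

```python
import numpy as np
from jina import Document, DocumentArray

# A toy "index" of five Documents and a single toy "query"
index = DocumentArray(Document(embedding=np.random.rand(8)) for _ in range(5))
query = DocumentArray([Document(embedding=np.random.rand(8))])

query.match(index, metric='euclidean')  # the metric we chose earlier

# Each query Document now carries its neighbours, closest first
print(len(query[0].matches))
print(query[0].matches[0].scores)
```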

## Putting it all together in a Flow

So far we have seen the individual components of the Flow and how to define them. Next comes putting all of this together in a Flow:

```python
from jina import Flow

f = (
    Flow(protocol='http', port_expose=12345)  # the port is arbitrary; the client must use the same one
    .add(uses=FlashImageEncoder, name='encoder')
    .add(uses=SimpleIndexer, name='indexer', workspace='workspace')
)

with f:
    f.post('/index', docs_array, request_size=64)  # index the pets dataset in batches
    f.block()  # keep the server running to answer queries
```

### Query from Python

Keeping the server running, we can start a simple client to make a query:

```python
from jina import Client, Document


def print_matches(resp):
    # Print the uri of the top-3 matches for the query image
    for match in resp.docs[0].matches[:3]:
        print(match.uri)


c = Client(host='localhost', protocol='http', port_expose=12345)  # same protocol and port as the Flow
c.post('/search', Document(uri='path/to/an/image/'), on_done=print_matches)
```

## Results

The returned response contains the matching `Document`s, which in turn contain the `uri`s of the images. Below we can see the
matching images returned for the query:

```{figure} image-search.png
:align: center
```
