Docs snippets optimizer (#2308)
* docs: added optimizer snippet

* docs: reformatting

* docs: cleanup optimization

* docs: fix links
maximilianwerk committed Apr 17, 2021
1 parent dbe5a59 commit 05bd0f1
Showing 1 changed file (`.github/pages/snippets.md`) with 127 additions and 4 deletions.
@@ -6,7 +6,7 @@ These code snippets provide a short introduction to Jina's functionality and des
| --- |---|
| 🥚 | [CRUD Functions](#crud-functions)[Document](#document)[Flow](#flow) |
| 🐣 | [Feed Data](#feed-data)[Fetch Result](#fetch-result)[Add Logic](#add-logic)[Inter & Intra Parallelism](#inter--intra-parallelism)[Decentralize](#decentralized-flow)[Asynchronous](#asynchronous-flow) |
| 🐥 | [Customize Encoder](#customize-encoder)[Test Encoder](#test-encoder-in-flow)[Parallelism & Batching](#parallelism--batching)[Add Data Indexer](#add-data-indexer)[Compose Flow from YAML](#compose-flow-from-yaml)[Search](#search)[Evaluation](#evaluation)[Flow Optimization](#flow-optimization)[REST Interface](#rest-interface) |

## 🥚 Fundamentals

@@ -100,7 +100,7 @@ with f:
</tr>
</table>

For further details about CRUD functionality, check out [docs.jina.ai](https://docs.jina.ai/chapters/crud/).


### Document
@@ -230,7 +230,7 @@ Get the vibe? Now we're talking! Let's learn more about the basic concepts and f
| --- |---|
| 🥚 | [CRUD Functions](#crud-functions)[Document](#document)[Flow](#flow) |
| 🐣 | [Feed Data](#feed-data)[Fetch Result](#fetch-result)[Add Logic](#add-logic)[Inter & Intra Parallelism](#inter--intra-parallelism)[Decentralize](#decentralized-flow)[Asynchronous](#asynchronous-flow) |
| 🐥 | [Customize Encoder](#customize-encoder)[Test Encoder](#test-encoder-in-flow)[Parallelism & Batching](#parallelism--batching)[Add Data Indexer](#add-data-indexer)[Compose Flow from YAML](#compose-flow-from-yaml)[Search](#search)[Evaluation](#evaluation)[Flow Optimization](#flow-optimization)[REST Interface](#rest-interface) |


## 🐣 Basic
@@ -509,7 +509,7 @@ That's all you need to know for understanding the magic behind `hello-world`. No
| --- |---|
| 🥚 | [CRUD Functions](#crud-functions)[Document](#document)[Flow](#flow) |
| 🐣 | [Feed Data](#feed-data)[Fetch Result](#fetch-result)[Add Logic](#add-logic)[Inter & Intra Parallelism](#inter--intra-parallelism)[Decentralize](#decentralized-flow)[Asynchronous](#asynchronous-flow) |
| 🐥 | [Customize Encoder](#customize-encoder)[Test Encoder](#test-encoder-in-flow)[Parallelism & Batching](#parallelism--batching)[Add Data Indexer](#add-data-indexer)[Compose Flow from YAML](#compose-flow-from-yaml)[Search](#search)[Evaluation](#evaluation)[Flow Optimization](#flow-optimization)[REST Interface](#rest-interface) |

## 🐥 Breakdown of `hello-world`

@@ -675,6 +675,129 @@ f.search(query_iterator, ...)


### Flow Optimization
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jina-ai/jupyter-notebooks/blob/main/basic-optimizer/basic-optimizer.ipynb)


Flow Optimization helps you get the most out of your data.
It allows hyperparameter optimization on a complete search Flow, including indexing and querying.
For example, choosing a middle layer of a model often results in richer semantic embeddings.
Let's search over all layers of a model to find the best one.

Before starting, we need to install the optimizer requirements:

```bash
pip install jina[optimizer]
```

First, let's add all the needed imports and the Flow definition:

```python
import numpy as np
from jina import Document
from jina.executors.encoders import BaseEncoder
from jina.optimizers import FlowOptimizer, MeanEvaluationCallback
from jina.optimizers.flow_runner import SingleFlowRunner

flow = '''jtype: Flow
version: '1'
pods:
  - uses:
      jtype: SimpleEncoder
      with:
        layer: ${{JINA_ENCODER_LAYER}}
  - uses: EuclideanEvaluator
'''
```

`JINA_ENCODER_LAYER` lets the optimizer change the encoder configuration on each iteration.
The `EuclideanEvaluator` scores the Documents against a given groundtruth.
Note that the Pod is defined via Jina's inline syntax.
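
Conceptually, each trial picks a value and fills it into the JAML variable before the Flow is built. The snippet below only illustrates that substitution; it is not the optimizer's internal API:

```python
# Purely illustrative: pretend one trial picked layer 1 and substitute it
# into the Flow definition the same way the optimizer fills JAML variables.
trial_value = 1
flow_for_trial = flow.replace('${{JINA_ENCODER_LAYER}}', str(trial_value))
print(flow_for_trial)  # Flow YAML with `layer: 1`
```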

Now we will fake a model with three layers.
For simplicity, each layer consists of a single integer, which is taken as the embedding.

```python
class SimpleEncoder(BaseEncoder):

    ENCODE_LOOKUP = {
        '🐲': [1, 3, 5],
        '🐦': [2, 4, 7],
        '🐢': [0, 2, 5],
    }

    def __init__(self, layer=0, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._layer = layer

    def encode(self, data, *args, **kwargs) -> 'np.ndarray':
        return np.array([[self.ENCODE_LOOKUP[data[0]][self._layer]]])
```
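
As a quick sanity check outside any Flow (this only exercises the lookup above; in a real run the encoder is instantiated inside its Pod):

```python
encoder = SimpleEncoder(layer=1)
# '🐲' maps to [1, 3, 5]; layer 1 selects the middle value
print(encoder.encode(['🐲']))  # -> [[3]]
```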

Furthermore, we define the optimization parameters in `parameter.yml`:

```yaml
- !IntegerParameter
  jaml_variable: JINA_ENCODER_LAYER
  high: 2
  low: 0
  step_size: 1
```
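
Assuming the bounds are inclusive (which the three-layer setup suggests), this spans the search space `{0, 1, 2}`, one candidate per layer of our fake model:

```python
# Hypothetical expansion of the parameter above, assuming inclusive bounds:
# low=0, high=2, step_size=1
candidate_layers = list(range(0, 2 + 1, 1))
print(candidate_layers)  # [0, 1, 2]
```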

For the optimization, we need to run nearly identical Flows again and again on the same data.
This is realized with a `SingleFlowRunner`.

```python
documents = [
    (Document(content='🐲'), Document(embedding=np.array([2]))),
    (Document(content='🐦'), Document(embedding=np.array([3]))),
    (Document(content='🐢'), Document(embedding=np.array([3])))
]

runner = SingleFlowRunner(
    flow, documents, 1, 'search', overwrite_workspace=True
)
```

The same Documents are used in each Flow Optimization step.
`documents` consists of `(document, groundtruth)` pairs.
The embedding given in each groundtruth represents the perfect semantic embedding.

Now we are ready to start the optimization:

```python
optimizer = FlowOptimizer(
    flow_runner=runner,
    parameter_yaml='parameter.yml',
    evaluation_callback=MeanEvaluationCallback(),
    n_trials=3,
    direction='minimize',
    seed=1
)

optimizer.optimize_flow()
```

The `MeanEvaluationCallback` gathers the evaluations of all three Documents sent per run.
After each run, it returns the mean of the single evaluations.

Finally...

```text
...
JINA@15892[I] Trial 2 finished with value: 1.6666666666666667
and parameters: {'JINA_ENCODER_LAYER': 0}.
Best is trial 0 with value: 1.0.
JINA@15892[I]:Number of finished trials: 3
JINA@15892[I]:Best trial: {'JINA_ENCODER_LAYER': 1}
JINA@15892[I]:Time to finish: 0:00:02.081710
```

Tada! Layer 1 is the best one.
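
To see why, here is a small back-of-the-envelope check, assuming the `EuclideanEvaluator` measures the Euclidean distance between the produced and the groundtruth embedding, and the callback averages it over the three Documents:

```python
lookup = {'🐲': [1, 3, 5], '🐦': [2, 4, 7], '🐢': [0, 2, 5]}
targets = {'🐲': 2, '🐦': 3, '🐢': 3}  # groundtruth embeddings from `documents`

for layer in range(3):
    # for 1-d embeddings the Euclidean distance is just the absolute difference
    distances = [abs(lookup[c][layer] - targets[c]) for c in lookup]
    print(layer, sum(distances) / len(distances))
# layer 0 -> ~1.67, layer 1 -> 1.0, layer 2 -> 3.0, so layer 1 minimizes the evaluation
```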

For a more detailed guide, please read [our docs](https://docs.jina.ai/chapters/optimization/?highlight=optimization).

### REST Interface

In practice, the query Flow and the client (i.e. data sender) are often physically separated. Moreover, the client may prefer to use a REST API rather than gRPC when querying. You can set `port_expose` to a public port and turn on [REST support](https://api.jina.ai/rest/) with `restful=True`:
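
A minimal sketch of such a setup, assuming an arbitrary public port:

```python
from jina import Flow

f = Flow(port_expose=45678, restful=True)

with f:
    f.block()  # keep serving the REST API until interrupted
```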
