Docs snippets optimizer #2308

Merged
merged 4 commits into from
Apr 17, 2021
131 changes: 127 additions & 4 deletions .github/pages/snippets.md
@@ -6,7 +6,7 @@ These code snippets provide a short introduction to Jina's functionality and des
| --- |---|
| 🥚 | [CRUD Functions](#crud-functions) • [Document](#document) • [Flow](#flow) |
| 🐣 | [Feed Data](#feed-data) • [Fetch Result](#fetch-result) • [Add Logic](#add-logic) • [Inter & Intra Parallelism](#inter--intra-parallelism) • [Decentralize](#decentralized-flow) • [Asynchronous](#asynchronous-flow) |
| 🐥 | [Customize Encoder](#customize-encoder) • [Test Encoder](#test-encoder-in-flow) • [Parallelism & Batching](#parallelism--batching) • [Add Data Indexer](#add-data-indexer) • [Compose Flow from YAML](#compose-flow-from-yaml) • [Search](#search) • [Evaluation](#evaluation) • [REST Interface](#rest-interface) |
| 🐥 | [Customize Encoder](#customize-encoder) • [Test Encoder](#test-encoder-in-flow) • [Parallelism & Batching](#parallelism--batching) • [Add Data Indexer](#add-data-indexer) • [Compose Flow from YAML](#compose-flow-from-yaml) • [Search](#search) • [Evaluation](#evaluation) • [Flow Optimization](#flow-optimization) • [REST Interface](#rest-interface) |

## 🥚 Fundamentals

@@ -100,7 +100,7 @@ with f:
</tr>
</table>

For further details about CRUD functionality, checkout [docs.jina.ai.](https://docs.jina.ai/chapters/crud/)
For further details about CRUD functionality, checkout [docs.jina.ai.](https://docs.jina.ai/chapters/crud/)


### Document
@@ -230,7 +230,7 @@ Get the vibe? Now we're talking! Let's learn more about the basic concepts and f
| --- |---|
| 🥚 | [CRUD Functions](#crud-functions) • [Document](#document) • [Flow](#flow) |
| 🐣 | [Feed Data](#feed-data) • [Fetch Result](#fetch-result) • [Add Logic](#add-logic) • [Inter & Intra Parallelism](#inter--intra-parallelism) • [Decentralize](#decentralized-flow) • [Asynchronous](#asynchronous-flow) |
| 🐥 | [Customize Encoder](#customize-encoder) • [Test Encoder](#test-encoder-in-flow) • [Parallelism & Batching](#parallelism--batching) • [Add Data Indexer](#add-data-indexer) • [Compose Flow from YAML](#compose-flow-from-yaml) • [Search](#search) • [Evaluation](#evaluation) • [REST Interface](#rest-interface) |
| 🐥 | [Customize Encoder](#customize-encoder) • [Test Encoder](#test-encoder-in-flow) • [Parallelism & Batching](#parallelism--batching) • [Add Data Indexer](#add-data-indexer) • [Compose Flow from YAML](#compose-flow-from-yaml) • [Search](#search) • [Evaluation](#evaluation) • [Flow Optimization](#flow-optimization) • [REST Interface](#rest-interface) |


## 🐣 Basic
@@ -509,7 +509,7 @@ That's all you need to know for understanding the magic behind `hello-world`. No
| --- |---|
| 🥚 | [CRUD Functions](#crud-functions) • [Document](#document) • [Flow](#flow) |
| 🐣 | [Feed Data](#feed-data) • [Fetch Result](#fetch-result) • [Add Logic](#add-logic) • [Inter & Intra Parallelism](#inter--intra-parallelism) • [Decentralize](#decentralized-flow) • [Asynchronous](#asynchronous-flow) |
| 🐥 | [Customize Encoder](#customize-encoder) • [Test Encoder](#test-encoder-in-flow) • [Parallelism & Batching](#parallelism--batching) • [Add Data Indexer](#add-data-indexer) • [Compose Flow from YAML](#compose-flow-from-yaml) • [Search](#search) • [Evaluation](#evaluation) • [REST Interface](#rest-interface) |
| 🐥 | [Customize Encoder](#customize-encoder) • [Test Encoder](#test-encoder-in-flow) • [Parallelism & Batching](#parallelism--batching) • [Add Data Indexer](#add-data-indexer) • [Compose Flow from YAML](#compose-flow-from-yaml) • [Search](#search) • [Evaluation](#evaluation) • [Flow Optimization](#flow-optimization) • [REST Interface](#rest-interface) |

## 🐥 Breakdown of `hello-world`

@@ -675,6 +675,129 @@ f.search(query_iterator, ...)
```


### Flow Optimization
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jina-ai/jupyter-notebooks/blob/main/basic-optimizer/basic-optimizer.ipynb)


Flow Optimization helps you get the most out of your data.
It runs hyperparameter optimization on a complete search Flow, covering both indexing and querying.
For example, choosing a middle layer of a model often results in richer semantic embeddings.
Let's try every layer of a model and see which one works best.

Before starting, we need the optimizer requirements installed:

```bash
pip install "jina[optimizer]"
```

First, let's get all needed imports and the Flow definition:

```python
import numpy as np
from jina import Document
from jina.executors.encoders import BaseEncoder
from jina.optimizers import FlowOptimizer, MeanEvaluationCallback
from jina.optimizers.flow_runner import SingleFlowRunner

flow = '''jtype: Flow
version: '1'
pods:
  - uses:
      jtype: SimpleEncoder
      with:
        layer: ${{JINA_ENCODER_LAYER}}
  - uses: EuclideanEvaluator
'''
```

`JINA_ENCODER_LAYER` allows the optimizer to change the Encoder configuration with each iteration.
The `EuclideanEvaluator` scores the Documents against a given groundtruth.
Note that the Pods are defined using Jina's inline syntax.
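
To get a feeling for what happens to the placeholder, here is a minimal sketch in plain Python. It is not how Jina resolves variables internally: `substitute` and `FLOW_TEMPLATE` are hypothetical names used only for illustration, and the sketch simply assumes that each trial swaps `${{JINA_ENCODER_LAYER}}` for a concrete value before the Flow is loaded.

```python
import re

# Hypothetical copy of the Flow definition, used only for this illustration.
FLOW_TEMPLATE = '''jtype: Flow
version: '1'
pods:
  - uses:
      jtype: SimpleEncoder
      with:
        layer: ${{JINA_ENCODER_LAYER}}
  - uses: EuclideanEvaluator
'''

# Matches placeholders of the form ${{NAME}}.
PLACEHOLDER = re.compile(r'\$\{\{\s*(\w+)\s*\}\}')


def substitute(template: str, values: dict) -> str:
    """Replace every ${{NAME}} placeholder with str(values[NAME])."""
    return PLACEHOLDER.sub(lambda match: str(values[match.group(1)]), template)


# Render the template once per candidate layer and show the changed line.
for layer in range(3):
    rendered = substitute(FLOW_TEMPLATE, {'JINA_ENCODER_LAYER': layer})
    layer_line = next(line for line in rendered.splitlines() if 'layer:' in line)
    print(layer_line.strip())
```

Each iteration yields the same Flow with a different `layer` value, which is the only thing that changes between trials in this example.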

Now we will fake a model with three layers.
For simplicity, each layer consists of a single integer, which is used as the embedding.

```python
class SimpleEncoder(BaseEncoder):

    ENCODE_LOOKUP = {
        '🐲': [1, 3, 5],
        '🐦': [2, 4, 7],
        '🐢': [0, 2, 5],
    }

    def __init__(self, layer=0, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._layer = layer

    def encode(self, data, *args, **kwargs) -> 'np.ndarray':
        return np.array([[self.ENCODE_LOOKUP[data[0]][self._layer]]])
```

Furthermore, we define the optimization parameters in `parameter.yml`.

```yaml
- !IntegerParameter
  jaml_variable: JINA_ENCODER_LAYER
  high: 2
  low: 0
  step_size: 1
```
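
Read as a search space, this parameter covers only a handful of values. The sketch below enumerates them in plain Python, assuming that `high` is inclusive (which matches the three layers indexed 0 to 2). `candidate_values` is a hypothetical helper and not part of Jina.

```python
def candidate_values(low: int, high: int, step_size: int) -> list:
    """List every integer from low to high (inclusive) in steps of step_size."""
    return list(range(low, high + 1, step_size))


# The values from parameter.yml above.
print(candidate_values(low=0, high=2, step_size=1))  # [0, 1, 2]
```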

Optimization means running nearly identical Flows again and again with the same data.
The `SingleFlowRunner` takes care of this.

```python
documents = [
    (Document(content='🐲'), Document(embedding=np.array([2]))),
    (Document(content='🐦'), Document(embedding=np.array([3]))),
    (Document(content='🐢'), Document(embedding=np.array([3])))
]

runner = SingleFlowRunner(
    flow, documents, 1, 'search', overwrite_workspace=True
)
```

The same Documents are used for each Flow Optimization step.
`documents` consists of `(document, groundtruth)` pairs.
The embedding of each groundtruth represents the perfect semantic embedding.

Now we are ready to start the optimization:

```python
optimizer = FlowOptimizer(
    flow_runner=runner,
    parameter_yaml='parameter.yml',
    evaluation_callback=MeanEvaluationCallback(),
    n_trials=3,
    direction='minimize',
    seed=1
)

optimizer.optimize_flow()
```

The `MeanEvaluationCallback` gathers the evaluations of all three Documents sent per run.
After each run, it returns the mean of the individual evaluations.
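
The collect-then-average idea can be sketched in a few lines of plain Python. `RunningMean` is a hypothetical stand-in written for this illustration; it does not mirror the interface of Jina's `MeanEvaluationCallback`. The three sample values are the per-Document distances for layer 0, worked out by hand from the lookup table and the groundtruth above, assuming plain Euclidean distance.

```python
class RunningMean:
    """Hypothetical helper: collects scores and reports their mean."""

    def __init__(self):
        self._total = 0.0
        self._count = 0

    def add(self, value: float) -> None:
        # Called once per evaluated Document.
        self._total += value
        self._count += 1

    def result(self) -> float:
        # Called once at the end of a run.
        if self._count == 0:
            raise ValueError('no evaluations collected')
        return self._total / self._count


collector = RunningMean()
for distance in (1.0, 1.0, 3.0):  # |1-2|, |2-3|, |0-3| for layer 0
    collector.add(distance)

print(collector.result())  # 1.6666666666666667
```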

Finally...

```text
...
JINA@15892[I] Trial 2 finished with value: 1.6666666666666667
and parameters: {'JINA_ENCODER_LAYER': 0}.
Best is trial 0 with value: 1.0.
JINA@15892[I]:Number of finished trials: 3
JINA@15892[I]:Best trial: {'JINA_ENCODER_LAYER': 1}
JINA@15892[I]:Time to finish: 0:00:02.081710

```

Tada! Layer 1 is the best one.
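
The result is easy to double-check by hand. The sketch below recomputes the mean distance per layer in plain Python, without Jina. It assumes that the evaluator measures the Euclidean distance between the encoded value and the groundtruth embedding. `GROUNDTRUTH` and `mean_distance` are hypothetical names introduced for this check.

```python
import math

# Same lookup table as in SimpleEncoder above.
ENCODE_LOOKUP = {
    '🐲': [1, 3, 5],
    '🐦': [2, 4, 7],
    '🐢': [0, 2, 5],
}

# Groundtruth embeddings from the `documents` list above.
GROUNDTRUTH = {'🐲': 2, '🐦': 3, '🐢': 3}


def mean_distance(layer: int) -> float:
    """Mean Euclidean distance between one layer's output and the groundtruth."""
    distances = [
        math.dist([ENCODE_LOOKUP[key][layer]], [GROUNDTRUTH[key]])
        for key in ENCODE_LOOKUP
    ]
    return sum(distances) / len(distances)


scores = {layer: mean_distance(layer) for layer in range(3)}
best_layer = min(scores, key=scores.get)  # direction='minimize'

for layer, score in scores.items():
    print(layer, score)
print('best layer:', best_layer)
```

Under these assumptions layer 0 scores about 1.67 and layer 1 scores 1.0, which agrees with the values in the log above.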

For a more detailed guide, please read [our docs](https://docs.jina.ai/chapters/optimization/?highlight=optimization).

### REST Interface

In practice, the query Flow and the client (i.e. data sender) are often physically separated. Moreover, the client may prefer to use a REST API rather than gRPC when querying. You can set `port_expose` to a public port and turn on [REST support](https://api.jina.ai/rest/) with `restful=True`: