## Flow Optimization

Flow Optimization tunes the hyperparameters of a complete search Flow, covering both indexing and querying.
For example, embeddings taken from a middle layer of a model are often semantically richer than those from the first or last layer.
Let's try every layer of a model and see which one works best.

### Setup

Before we start, we need to install the required dependencies.

In [None]:
%%bash
pip install "jina[optimizer]"

### Imports

First, let's import everything we need.

In [None]:
import numpy as np
from jina import Document
from jina.executors.encoders import BaseEncoder
from jina.optimizers import FlowOptimizer, MeanEvaluationCallback
from jina.optimizers.flow_runner import SingleFlowRunner


### Flow definition

For simplicity, the Flow consists of two parts: an Encoder and an Evaluator.
The `SimpleEncoder` attaches an embedding to each given Document.
The `EuclideanEvaluator` scores the embedding against a given groundtruth.

The `JINA_ENCODER_LAYER` variable allows the optimizer to change the Encoder configuration with each iteration.
Note that the Encoder Pod is defined with Jina's inline syntax.

In [None]:
flow = '''jtype: Flow
version: '1'
pods:
  - uses:
      jtype: SimpleEncoder
      with:
        layer: ${{JINA_ENCODER_LAYER}}
  - uses: EuclideanEvaluator
'''
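
Conceptually, each optimization step fills the `${{JINA_ENCODER_LAYER}}` placeholder with a concrete value before the Flow is built. The sketch below mimics that with plain string replacement. Jina resolves the variable internally, so `fill_placeholders` is a hypothetical helper used only for illustration and is not part of Jina's API.

```python
FLOW_TEMPLATE = '''jtype: Flow
version: '1'
pods:
  - uses:
      jtype: SimpleEncoder
      with:
        layer: ${{JINA_ENCODER_LAYER}}
  - uses: EuclideanEvaluator
'''


def fill_placeholders(template: str, values: dict) -> str:
    """Replace every ${{NAME}} placeholder with its value (hypothetical helper)."""
    for name, value in values.items():
        template = template.replace('${{' + name + '}}', str(value))
    return template


# One concrete configuration per candidate layer.
configs = [
    fill_placeholders(FLOW_TEMPLATE, {'JINA_ENCODER_LAYER': layer})
    for layer in range(3)
]

for config in configs:
    # Show only the line that differs between the three configurations.
    print(next(line.strip() for line in config.splitlines() if 'layer' in line))
```

Everything except the `layer` value stays identical across the three configurations, which is why a single template is enough.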

### Encoder definition

Now we fake a model with three layers.
For simplicity, each layer outputs a single integer, which is used as the embedding.


In [None]:
class SimpleEncoder(BaseEncoder):

    ENCODE_LOOKUP = {
        '🐲': [1, 3, 5],
        '🐦': [2, 4, 7],
        '🐢': [0, 2, 5],
    }

    def __init__(self, layer=0, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._layer = layer

    def encode(self, data, *args, **kwargs) -> 'np.ndarray':
        # One row per input, holding the value of the selected layer.
        return np.array([[self.ENCODE_LOOKUP[content][self._layer]] for content in data])


### Parameter definition

The optimizer loads its search space from the `parameter.yml` file.
Let's create it with a single integer parameter for the layer, ranging from 0 to 2.


In [None]:
with open('parameter.yml', 'w') as param_file:
    param_file.write('''- !IntegerParameter
  jaml_variable: JINA_ENCODER_LAYER
  high: 2
  low: 0
  step_size: 1
''')

### Defining rerunnable Flows

During optimization, nearly identical Flows are run again and again on the same data.
A `SingleFlowRunner` takes care of this.

The same Documents are used for each Flow Optimization step.
`documents` consists of `(document, groundtruth)` pairs.
The embedding of each groundtruth represents the perfect semantic embedding.

In [None]:
documents = [
    (Document(content='🐲'), Document(embedding=np.array([2]))),
    (Document(content='🐦'), Document(embedding=np.array([3]))),
    (Document(content='🐢'), Document(embedding=np.array([3])))
]

runner = SingleFlowRunner(
    flow, documents, 1, 'search', overwrite_workspace=True
)
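
To see what the Evaluator will measure, the distances can be worked out by hand. The sketch below assumes that the `EuclideanEvaluator` computes the plain Euclidean distance between the Encoder output and the groundtruth embedding. The lookup values are copied from `SimpleEncoder.ENCODE_LOOKUP` and listed in the order of the `documents` list.

```python
import math

# Encoder outputs per Document (rows) and layer (columns), copied from
# SimpleEncoder.ENCODE_LOOKUP in the order of the `documents` list.
encoder_outputs = [
    [1, 3, 5],
    [2, 4, 7],
    [0, 2, 5],
]
# Groundtruth embeddings, in the same order.
groundtruths = [2, 3, 3]

# distances[layer] lists the Euclidean distance of every Document for that layer.
# Assumption: the Evaluator uses the plain Euclidean distance.
distances = {
    layer: [
        math.dist([row[layer]], [truth])
        for row, truth in zip(encoder_outputs, groundtruths)
    ]
    for layer in range(3)
}

for layer, values in distances.items():
    print(f'layer {layer}: {values}')
```

Under this assumption, layer 1 misses every groundtruth by exactly 1, while layers 0 and 2 each contain larger errors.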


### Run the optimization

Now we are ready to run the optimization.
In each run, the `MeanEvaluationCallback` gathers the evaluations of all three Documents sent through the Flow.
After each run, it returns the mean of the individual evaluations.
Because the evaluation is a distance, the optimizer is told to minimize it.

In [None]:
optimizer = FlowOptimizer(
    flow_runner=runner,
    parameter_yaml='parameter.yml',
    evaluation_callback=MeanEvaluationCallback(),
    n_trials=3,
    direction='minimize',
    seed=1
)

optimizer.optimize_flow()
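
The aggregation and selection logic can be illustrated without running a Flow. The sketch below is not Jina's implementation: `MeanScore` is a hypothetical stand-in for `MeanEvaluationCallback`, the candidate list assumes that `low` and `high` are both inclusive, and the per-Document distances are worked out by hand from the lookup table and the groundtruths, assuming a plain Euclidean distance. It also tries every layer exactly once, which the real optimizer may or may not do within three trials.

```python
# Search space as described in parameter.yml.
low, high, step_size = 0, 2, 1
# Assumption: both bounds are inclusive.
candidate_layers = list(range(low, high + 1, step_size))

# Per-Document distances for each layer, worked out by hand,
# e.g. layer 0: |1 - 2|, |2 - 3|, |0 - 3|.
distances_per_layer = {
    0: [1, 1, 3],
    1: [1, 1, 1],
    2: [3, 4, 2],
}


class MeanScore:
    """Hypothetical stand-in: collects single evaluations, returns their mean."""

    def __init__(self):
        self._scores = []

    def add(self, score):
        self._scores.append(score)

    def result(self):
        return sum(self._scores) / len(self._scores)


trial_scores = {}
for layer in candidate_layers:
    callback = MeanScore()
    for distance in distances_per_layer[layer]:
        callback.add(distance)
    trial_scores[layer] = callback.result()

# direction='minimize': the lowest mean distance wins.
best_layer = min(trial_scores, key=trial_scores.get)
print(trial_scores, best_layer)
```

Under these assumptions the middle layer, layer 1, has the lowest mean distance (1.0, against roughly 1.67 for layer 0 and 3.0 for layer 2).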
