# Example 2: Preprocessing, Postprocessing, and Model Scripts

In this example, you will learn how to define the custom preprocessing, postprocessing, and model inference scripts that Open Seismic uses. Custom scripts are useful when you want to use a model that does not ship with Open Seismic but still want to run it through our OpenVINO-powered inference pipeline.

### Sections
2.1 **Overview of Inference Tasks in Open Seismic** <br/>
2.2 **Defining Pre, Post, and Model Inference Scripts**  <br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2.2.1 **Regular, Coarse, and Fine Cube Inference**<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2.2.2 **Section Inference**<br/>
2.3 **Defining Conversion Scripts**<br/>

**Note:** This example does not have cells to run. Instead, the goal of this example is to go over the structure of the custom scripts. In the next example, we will use the scripts defined here in Open Seismic.

## Section 2.1: Overview of Inference Tasks in Open Seismic

There are four main inference tasks that Open Seismic supports:
1. **Regular Inference:** An all-purpose inference task that loops through your input data.
2. **Coarse Cube Inference:** This task is mainly used for fault segmentation. A small mini cube taken from a much larger 3D seismic volume is used as input, and the network is expected to output a mini cube of the same size. Each output cube is written back at its matching 3D coordinates in an output cube that is the same size as the larger input.
3. **Fine Cube Inference:** This task is mainly used for salt identification. A small mini cube taken from a much larger 3D seismic volume is used as input, and the network is expected to output a single scalar value. Each scalar is stored in a scaled-down cube, which can optionally be interpolated to match the size of the larger input.
4. **Section Inference:** This task is mainly used for facies classification. A small 2D section of a much larger seismic slice is used as input, and the network is expected to output a 2D mask of the same size. Each output mask is written back at its matching location in an output mask that is the same size as the larger slice. The section input size is maximized so that inference runs on the largest area the model can ingest, since larger sections have been shown to produce more accurate results.
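To make the difference between these tasks concrete, here is a minimal NumPy sketch of the tiling logic. It is not Open Seismic's implementation: `stitch_inference` and `fine_cube_inference` are hypothetical names, `infer_fn` stands in for the whole preprocess, model, and postprocess chain, tiles do not overlap in the stitching case, and any remainder at the edges that does not fill a whole tile is left at zero.

```python
import itertools

import numpy as np


def stitch_inference(array, tile, infer_fn):
    """Coarse cube (3D) or section (2D) style inference.

    `infer_fn` returns an array the same shape as its input tile, and that
    output is written back at the coordinates the tile came from.
    """
    out = np.zeros(array.shape, dtype=np.float32)
    starts = [range(0, n - t + 1, t) for n, t in zip(array.shape, tile)]
    for corner in itertools.product(*starts):
        region = tuple(slice(c, c + t) for c, t in zip(corner, tile))
        out[region] = infer_fn(array[region])
    return out


def fine_cube_inference(volume, cube, stride, infer_fn):
    """Fine cube style inference.

    `infer_fn` returns one scalar per cube; scalars fill a scaled-down grid
    (one cell per cube position) that could later be interpolated back up.
    """
    grid_shape = tuple((n - cube) // stride + 1 for n in volume.shape)
    grid = np.zeros(grid_shape, dtype=np.float32)
    for idx in itertools.product(*(range(g) for g in grid_shape)):
        region = tuple(slice(i * stride, i * stride + cube) for i in idx)
        grid[idx] = infer_fn(volume[region])
    return grid
```

For example, `stitch_inference(volume, (64, 64, 64), run_network)` covers a coarse cube pass and `stitch_inference(slice_2d, (h, w), run_network)` a section pass, where `run_network` is whatever callable wraps your scripts.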

Furthermore, there are synchronous and asynchronous modes for each inference task. However, we expect that most use cases will involve asynchronous inference. 

Next, we will break down the preprocessing, postprocessing, and model inference scripts individually. This level of granularity is needed because each script performs a fundamentally different job.

## Section 2.2: Defining Pre, Post, and Model Inference Scripts

Below is a breakdown of the expected definitions for each inference task. Regular, coarse, and fine cube inference are grouped together because they use the scripts in the same way.

In general, this is what each script should include:
1. The preprocessing script `preprocessor.py` must define at least a function named `preprocess`.
2. The postprocessing script `postprocessor.py` must define at least a function named `postprocess`.
3. The model script `model.py` must define at least a class named `model`.
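Open Seismic's own loading code is not shown in this example. As a hedged illustration of why these exact names matter, here is one way a pipeline could import the three scripts by file path with Python's standard `importlib`. The helper names `load_custom_scripts` and `_load_file` are hypothetical.

```python
import importlib.util
import os


def _load_file(path, module_name):
    """Import the Python file at `path` as a fresh module object."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def load_custom_scripts(script_dir):
    """Return (preprocess, postprocess, model_class) from `script_dir`.

    Raises AttributeError naming the file if a required definition is missing.
    """
    required = [("preprocessor.py", "preprocess"),
                ("postprocessor.py", "postprocess"),
                ("model.py", "model")]
    found = []
    for filename, attr in required:
        path = os.path.join(script_dir, filename)
        module = _load_file(path, os.path.splitext(filename)[0])
        if not hasattr(module, attr):
            raise AttributeError(f"{path} must define `{attr}`")
        found.append(getattr(module, attr))
    return tuple(found)
```

A check like this fails fast with a clear message if, for instance, `model.py` defines `Model` instead of `model`.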

### Subsection 2.2.1: Regular, Coarse, and Fine Cube Inference

**Preprocessing:** `preprocessor.py` should define `preprocess` as follows:
```
def preprocess(data, input_layers, input_shape=(...), model=None):
    ...
    return {input_layer_1: data_1, ..., input_layer_n: data_n}
```
The inputs are defined as follows:
1. `data`: Input data passed from the inference task. 
2. `input_layers`: A list of the input layers that exist within the model.
3. `input_shape`: A tuple specifying the shape of the input, or a list of tuples specifying the shapes of multiple input layers. The i-th shape in the list corresponds to the i-th entry of `input_layers`.
4. `model`: The model object defined in `model.py`. 
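As a concrete instance of this signature, here is a minimal sketch of a `preprocess` for a single-input cube model. The standardization step, the NCDHW layout, and the default shape of `(1, 1, 64, 64, 64)` are assumptions for illustration, not requirements of Open Seismic; `model` is accepted but unused, as allowed for these tasks.

```python
import numpy as np


def preprocess(data, input_layers, input_shape=(1, 1, 64, 64, 64), model=None):
    """Standardize a seismic mini cube and reshape it for the network.

    Assumes one input layer; the default shape is a hypothetical NCDHW example.
    """
    data = np.asarray(data, dtype=np.float32)
    std = data.std()
    # Zero mean, unit variance; guard against division by zero on a flat cube.
    data = (data - data.mean()) / (std if std > 0 else 1.0)
    # Accept either a single tuple or a list with one tuple per input layer.
    shape = input_shape[0] if isinstance(input_shape, list) else input_shape
    return {input_layers[0]: data.reshape(shape)}
```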

**Postprocessing:** `postprocessor.py` should define `postprocess` as follows:
```
def postprocess(output_dict, output_shape=(...)):
    ...
    return {output_layer_1: data_1, ..., output_layer_n: data_n}
```
The inputs are defined as follows:
1. `output_dict`: A dictionary mapping each output layer name to its output data. In practice, this dictionary should contain only one output layer key.
2. `output_shape`: A tuple specifying the shape of the output.
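A minimal sketch of a matching `postprocess` is shown below. It assumes the network returns one output with leading batch and channel axes of size 1 (typical for a single-class fault probability cube, but an assumption here) and uses `None` as a stand-in for the template's placeholder default.

```python
import numpy as np


def postprocess(output_dict, output_shape=None):
    """Drop size-1 axes from the single output and reshape it if asked."""
    # Exactly one output layer is expected; more raises ValueError.
    (name, out), = output_dict.items()
    out = np.squeeze(np.asarray(out))
    if output_shape is not None:
        out = out.reshape(output_shape)
    return {name: out}
```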

**Model:** `model.py` should define the model class with the following methods:
```
from openvino.inference_engine import IECore


class model(object):
    def __init__(self, xml_path, bin_path, requests=1, input_shape=(...)):
        # Initialize the model. The following attributes are required:
        self.ie = IECore()
        self.requests = requests
        self.read_net = self.ie.read_network(
            model=xml_path, weights=bin_path)
        self.exec_net = self.ie.load_network(
            network=self.read_net, device_name="...",
            num_requests=requests)
        self.name = "..."
        ...

        # Run one warm-up inference so first-call overhead does not slow later requests
        self.exec_net.requests[0].infer()
```

Here are the methods for synchronous inference:
```
    # Continuing model definition...
    def infer(self, input_dict, flexible_infer=False):
        # Use flexible infer condition for varying-shape inference
        ...
        return output_dict, latency
    
    def reshape_input(self, shapes):
        # Reshape input layer(s) to specific shape(s)
        ...
```
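One possible way to fill in these two methods with the legacy OpenVINO Inference Engine API (`ExecutableNetwork.infer` and `IENetwork.reshape`) is sketched below as a mixin so it can be read on its own. It relies on the attributes set in `__init__` above plus a hypothetical `self.device` attribute holding the device name. Treating `flexible_infer` as "reshape the network to match the incoming arrays" is an assumption about what that flag is meant to do, and the keyword name must match whatever your Open Seismic version passes.

```python
import time

import numpy as np


class SyncInferSketch:
    """Hypothetical bodies for `infer` and `reshape_input`.

    Assumes `self.ie`, `self.read_net`, `self.exec_net`, `self.requests`
    from `__init__`, plus a hypothetical `self.device` string.
    """

    def infer(self, input_dict, flexible_infer=False):
        if flexible_infer:
            new_shapes = {name: tuple(np.shape(arr))
                          for name, arr in input_dict.items()}
            # Reshaping forces a reload, so only do it when shapes change.
            if new_shapes != getattr(self, "_current_shapes", None):
                self.reshape_input(new_shapes)
        start = time.perf_counter()
        output_dict = self.exec_net.infer(inputs=input_dict)
        latency = time.perf_counter() - start
        return output_dict, latency

    def reshape_input(self, shapes):
        # `shapes` is assumed to map input layer name -> shape tuple.
        self.read_net.reshape(shapes)
        # The reshaped network must be loaded again before it can run.
        self.exec_net = self.ie.load_network(
            network=self.read_net, device_name=self.device,
            num_requests=self.requests)
        self._current_shapes = dict(shapes)
```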

Here are the getter methods for general usage:
```
    # Continuing model definition...
    def get_input_shape(self):
        # May be list instead of tuple for multiple input layers
        ...
        return input_shape
        
    def get_inputs(self):
        # Return list of input layer names
        ...
        return input_layer_names
        
    def get_outputs(self):
        # Return list of output layer names
        ...
        return output_layer_names
```
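Here is a hedged sketch of the three getters using the legacy OpenVINO `IENetwork` attributes `input_info` and `outputs`, both dictionaries keyed by layer name. It is written as a standalone mixin (the class name is hypothetical) and assumes `self.read_net` from `__init__`.

```python
class GetterSketch:
    """Hypothetical getter bodies; assumes `self.read_net` is an IENetwork."""

    def get_inputs(self):
        # Input layer names, in the network's dictionary order.
        return list(self.read_net.input_info)

    def get_outputs(self):
        return list(self.read_net.outputs)

    def get_input_shape(self):
        shapes = [tuple(self.read_net.input_info[name].input_data.shape)
                  for name in self.get_inputs()]
        # One input layer -> a single tuple; several -> a list of tuples.
        return shapes[0] if len(shapes) == 1 else shapes
```

Returning a bare tuple for a single input and a list otherwise mirrors the `input_shape` convention described for `preprocess`.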

Finally, here are the methods for asynchronous inference:
```
    # Continuing model definition...
    def get_requests(self):
        # Return requests from exec net
        return self.exec_net.requests # or self.requests
        
    def get_idle_request_id(self):
        # Return idle request
        return self.exec_net.get_idle_request_id()
```
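When developing `preprocessor.py` and `postprocessor.py`, it can help to test them without an OpenVINO installation. The hypothetical `DummyModel` below exposes the same method names as the `model` class described above, but its "inference" simply copies the single input array to an output layer named `output`. It assumes one input layer, and, like the other sketches here, that `reshape_input` receives a `{layer name: shape}` dictionary.

```python
import time

import numpy as np


class DummyModel:
    """Hypothetical stand-in with the same interface as `model`."""

    def __init__(self, input_shape=(1, 1, 8, 8, 8), requests=1):
        self.name = "dummy"
        self.requests = requests
        self._input_shape = tuple(input_shape)
        self._requests = list(range(requests))  # placeholder request handles

    def infer(self, input_dict, flexible_infer=False):
        start = time.perf_counter()
        (array,) = input_dict.values()  # single input layer assumed
        output_dict = {"output": np.array(array, copy=True)}
        return output_dict, time.perf_counter() - start

    def reshape_input(self, shapes):
        (self._input_shape,) = [tuple(s) for s in shapes.values()]

    def get_input_shape(self):
        return self._input_shape

    def get_inputs(self):
        return ["input"]

    def get_outputs(self):
        return ["output"]

    def get_requests(self):
        return self._requests

    def get_idle_request_id(self):
        # Placeholder requests are never busy.
        return 0
```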

### Subsection 2.2.2: Section Inference

This subsection describes the subtle ways in which the section inference task differs from the other inference tasks.

**Preprocessing:** `preprocessor.py` should define `preprocess` as follows:
```
def preprocess(data, input_layers, model, input_shape=(...)):
    ...
    return {input_layer_1: data_1, ..., input_layer_n: data_n}
```
The inputs are defined as follows:
1. `data`: Input data passed from the inference task. 
2. `input_layers`: A list of the input layers that exist within the model.
3. `input_shape`: A tuple specifying the shape of the input, or a list of tuples specifying the shapes of multiple input layers. The i-th shape in the list corresponds to the i-th entry of `input_layers`.
4. `model`: The model object defined in `model.py`. 

Notice that `model` is now the third positional parameter, placed before `input_shape`, and has no default value: it must always be passed and must not be `None`.
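Why might section preprocessing need the model? One plausible reason, given that section input size is maximized, is that `preprocess` resizes the network to fit the whole section before inference. The sketch below does that; it assumes a single input layer, an NCHW layout, and that `model.reshape_input` accepts a `{layer name: shape}` dictionary. Those are illustrative assumptions, not confirmed Open Seismic behavior, and `None` replaces the template's placeholder default for `input_shape`.

```python
import numpy as np


def preprocess(data, input_layers, model, input_shape=None):
    """Feed a whole 2D section, reshaping the model's input to fit it.

    `input_shape` is ignored here: the target shape comes from the section.
    """
    section = np.asarray(data, dtype=np.float32)
    target = (1, 1) + section.shape  # N=1, C=1, H, W
    if tuple(model.get_input_shape()) != target:
        # Grow (or shrink) the network input to cover the full section.
        model.reshape_input({input_layers[0]: target})
    return {input_layers[0]: section.reshape(target)}
```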

**Postprocessing:** `postprocessor.py` should define `postprocess` as follows:
```
def postprocess(output_dict, output_shape):
    ...
    return {output_layer_1: data_1, ..., output_layer_n: data_n}
```
The inputs are defined as follows:
1. `output_dict`: A dictionary mapping each output layer name to its output data. In practice, this dictionary should contain only one output layer key.
2. `output_shape`: A tuple specifying the shape of the output.

Notice that the only difference is that `output_shape` has no default value, so it must always be provided and must not be `None`.
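For facies classification, a section `postprocess` typically turns per-class scores into a label mask. The sketch below assumes the single output has layout `(1, n_classes, H, W)`; that layout is an assumption for illustration. The required `output_shape` is used to reshape the label mask.

```python
import numpy as np


def postprocess(output_dict, output_shape):
    """Convert per-class scores into a mask of facies class ids."""
    (name, scores), = output_dict.items()  # exactly one output layer expected
    # Drop the batch axis, then pick the highest-scoring class per pixel.
    labels = np.argmax(np.asarray(scores)[0], axis=0)
    return {name: labels.reshape(output_shape)}
```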

## Section 2.3: Defining Conversion Scripts

In this section, we will show you how to define your conversion scripts. Recall that in Example 1 we converted models to popular frameworks such as TensorFlow or ONNX and then optimized them with OpenVINO's Model Optimizer. Instead of doing these steps separately, Open Seismic provides a pipeline that handles conversion and optimization before running inference. To use it, you need to define the conversion scripts.

Open Seismic expects your conversion scripts to include:
1. A `.sh` file that calls a Python conversion script with the appropriate parameters
2. A Python conversion script that will convert the original model to a popular framework equivalent

### Conversion.sh Script
Use the following snippet as your `conversion.sh` script:
```
#!/bin/bash

for ARGUMENT in "$@"
do

    KEY=$(echo $ARGUMENT | cut -f1 -d=)
    VALUE=$(echo $ARGUMENT | cut -f2 -d=)

    case "$KEY" in
            *) ARGS="$ARGS $KEY" ;;
    esac
done

python3 $PWD/path/to/convert.py $ARGS
```

The only thing you need to edit is the path to your Python conversion script. Note that this path must be given from the perspective of your mounted volume when using Open Seismic's Docker container. The arguments for the Python conversion script are read by the for loop in the `.sh` file, and you control which arguments are passed through the JSON config that you also need to specify. JSON configuration and other usage topics for Open Seismic's Docker container are covered in Example 3!

### Conversion.py Script

The Python conversion script needs to read arguments from the command line, so Python's `argparse` module is useful here. It can also help to make one of the arguments the path where the converted graph is written to disk. This makes writing the JSON config file easier, because you can reuse that output path as the input path for the Model Optimizer.
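A minimal skeleton along those lines is sketched below. The argument names `--model_path` and `--output_path` are hypothetical, and `convert` is a placeholder you would replace with the framework-specific export (for example, to ONNX). The script's entry point is left to you: call `main()` under an `if __name__ == "__main__":` guard at the bottom of your `convert.py`.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Convert a trained model to an intermediate framework format.")
    # Both argument names are hypothetical; choose your own.
    parser.add_argument("--model_path", required=True,
                        help="Path to the original trained model.")
    parser.add_argument("--output_path", required=True,
                        help="Where to write the converted graph; reuse this "
                             "path as the Model Optimizer input in the JSON config.")
    return parser.parse_args(argv)


def convert(model_path, output_path):
    # Placeholder: load the original model and export it to output_path.
    raise NotImplementedError("fill in the framework-specific export")


def main(argv=None):
    args = parse_args(argv)
    out_dir = os.path.dirname(args.output_path)
    if out_dir:
        # Make sure the destination directory exists before exporting.
        os.makedirs(out_dir, exist_ok=True)
    convert(args.model_path, args.output_path)
```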

## Summary

Congratulations! You have finished Example 2. This example was mainly to prepare you for Example 3 where we will put this knowledge into action and use the defined scripts in Open Seismic. Here is what you have learned in this example:
1. The purpose of each inference task.
2. The signatures, return values, and purposes for each required function and class.

For complete examples of the scripts discussed here, see the directory `examples/assets/example3/assets/example3_assets`. Conversion scripts are in the `example3_optimization` subdirectory, and preprocessing, postprocessing, and model scripts are in the `example3_scripts` subdirectory.