## Reference Implementation

### E2E Architecture

![use_case_flow](assets/e2e-flow-orig.png)

### Solution Setup

Use the following cell to switch to the correct kernel, then check that you are in the stock kernel. If not, navigate to `Kernel > Change kernel > Python [conda env:stock]`. Note that the cell's execution indicator will remain as `*`, but you can continue running the following cells.

In [None]:
%%javascript
Jupyter.notebook.session.restart({kernel_name: 'conda-env-stock-py'})
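To confirm which environment the kernel is actually using, you can inspect the interpreter prefix. This is a minimal sketch, assuming the kernel names follow the `nb_conda_kernels` convention `conda-env-<name>-py`, so that `conda-env-stock-py` corresponds to a conda environment named `stock`. The helpers `kernel_env_name` and `check_kernel` are hypothetical, not part of this reference kit.

```python
import os
import sys


def kernel_env_name(prefix=None):
    """Return the last component of the interpreter prefix.

    For a non-base conda environment this is the environment name,
    e.g. /opt/conda/envs/stock -> "stock".
    """
    prefix = sys.prefix if prefix is None else prefix
    return os.path.basename(os.path.normpath(prefix))


def check_kernel(expected, prefix=None):
    """Print a hint and return False if the kernel is not in the expected env."""
    name = kernel_env_name(prefix)
    if name != expected:
        print(f"Kernel is running in '{name}', expected '{expected}'. "
              "Use Kernel > Change kernel to switch.")
        return False
    return True


# Hypothetical env name inferred from the kernel name 'conda-env-stock-py'
check_kernel("stock")
```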

#### Setting up the data

Use the `data/generate_data.py` script to generate synthetic data.

In [None]:
!cd data && python generate_data.py

Once we have data, we can view a few samples.

In [None]:
# inspect generated data
import pandas as pd
train_data = pd.read_csv("data/demand/train.csv")
train_data.tail()

In [None]:
# inspect generated data
import pandas as pd
test_data = pd.read_csv("data/demand/test_full.csv")
test_data.tail()

#### Model Building Process

We first transform the data into the expected regression format and feed it into our CNN-LSTM model. The `run_training.py` script *reads and preprocesses the data*, *trains the model*, and *saves the model* for future inference.
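The exact preprocessing in `run_training.py` is not shown in this notebook. As an illustration only, a common way to turn a demand time series into a supervised regression problem for a CNN-LSTM is a sliding window: each input holds the previous `lookback` observations and each target holds the next `horizon` values. The function `make_windows` and its parameters are hypothetical and may differ from what the script actually does.

```python
import numpy as np


def make_windows(series, lookback, horizon=1):
    """Split a 1-D series into (inputs, targets) using a sliding window.

    Returns X with shape (n_windows, lookback, 1), the (samples, timesteps,
    features) layout expected by Keras Conv1D/LSTM layers, and y with shape
    (n_windows, horizon).
    """
    series = np.asarray(series, dtype=np.float32)
    n_windows = len(series) - lookback - horizon + 1
    if n_windows <= 0:
        raise ValueError("series is too short for the given lookback and horizon")
    X = np.stack([series[i:i + lookback] for i in range(n_windows)])
    y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n_windows)])
    return X[..., np.newaxis], y


X, y = make_windows(np.arange(30), lookback=7)
print(X.shape, y.shape)  # (23, 7, 1) (23, 1)
```

A single feature channel is used here for simplicity; a real pipeline would typically add store and item features as extra channels.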

The script takes the following arguments:

```shell
usage: run_training.py [-h] [-l LOGFILE] [-s SAVE_MODEL_DIR] [-i] [-b BATCH_SIZE]

optional arguments:
  -h, --help            show this help message and exit
  -l LOGFILE, --logfile LOGFILE
                        log file to output benchmarking results to
  -s SAVE_MODEL_DIR, --save_model_dir SAVE_MODEL_DIR
                        directory to save model to
  -i, --intel           use intel configs
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        training batch size
```
As an example, we can train a model by running the following commands from the `src` directory:

```shell
conda activate demand_stock
python run_training.py --save_model_dir saved_models/stock --batch_size 512
```
which will produce a saved model in TensorFlow Keras format in the `saved_models/stock` directory, to be used in the next step for running inference.

In [None]:
%%javascript
Jupyter.notebook.session.restart({kernel_name: 'conda-env-stock-py'})

In [None]:
!cd src && python run_training.py --save_model_dir saved_models/stock --batch_size 512
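Before converting the model, it can help to confirm that training actually wrote something to disk. This is a sketch under an assumption: that `run_training.py` saves in the TensorFlow 2 SavedModel directory layout (the text only says "TensorFlow Keras format"). If the script writes a single `.h5` or `.keras` file instead, this check does not apply. The helper `looks_like_savedmodel` is hypothetical.

```python
from pathlib import Path


def looks_like_savedmodel(path):
    """Heuristic: a TF2 SavedModel directory holds saved_model.pb and a variables/ subdirectory."""
    p = Path(path)
    return (p / "saved_model.pb").is_file() and (p / "variables").is_dir()


# Path is relative to the notebook directory, because the training cell ran inside `src`
print(looks_like_savedmodel("src/saved_models/stock"))
```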

### Running Inference

The above script trains and saves a model to `save_model_dir`. To use this model to make predictions on new data, a 2-step process is necessary to optimize performance.

1. Convert the saved model from a Keras saved model to a TensorFlow frozen graph. To do this, we provide a utility script, `convert_keras_to_frozen_graph.py`, which takes the following arguments:

```shell
usage: convert_keras_to_frozen_graph.py [-h] -s KERAS_SAVED_MODEL_DIR -o OUTPUT_SAVED_DIR

optional arguments:
  -h, --help            show this help message and exit
  -s KERAS_SAVED_MODEL_DIR, --keras_saved_model_dir KERAS_SAVED_MODEL_DIR
                        directory with saved keras model.
  -o OUTPUT_SAVED_DIR, --output_saved_dir OUTPUT_SAVED_DIR
                        directory to save frozen graph to.
```

For the above saved model, we would run the command

```shell
python convert_keras_to_frozen_graph.py -s saved_models/stock -o saved_models/stock
```
which **takes in the saved Keras model** and outputs a **frozen graph** to the same directory as `saved_models/stock/saved_frozen_model.pb`.

In [None]:
%%javascript
Jupyter.notebook.session.restart({kernel_name: 'conda-env-stock-py'})

In [None]:
!cd src && python convert_keras_to_frozen_graph.py -s saved_models/stock -o saved_models/stock

2. Once the frozen graph is saved, it can be used to perform inference with the `run_inference.py` script, which has the following arguments:

```shell
usage: run_inference.py [-h] [-l LOGFILE] [-s SAVED_FROZEN_MODEL] [-b BATCH_SIZE] --input_file INPUT_FILE [--benchmark_mode] [-n NUM_ITERS]

optional arguments:
  -h, --help            show this help message and exit
  -l LOGFILE, --logfile LOGFILE
                        log file to output benchmarking results to
  -s SAVED_FROZEN_MODEL, --saved_frozen_model SAVED_FROZEN_MODEL
                        saved frozen graph.
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        batch size to use
  --input_file INPUT_FILE
  --benchmark_mode      benchmark inference time
  -n NUM_ITERS, --num_iters NUM_ITERS
                        number of iterations to use when benchmarking
```

To run inference on a new data file, included for reference as `../data/demand/test_full.csv`:

```shell
python run_inference.py --input_file ../data/demand/test_full.csv --saved_frozen_model saved_models/stock/saved_frozen_model.pb --batch_size 512
```

which outputs a JSON representation of the predicted values.

In [None]:
!cd src && python run_inference.py --input_file ../data/demand/test_full.csv --saved_frozen_model saved_models/stock/saved_frozen_model.pb --batch_size 512 --benchmark_mode
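How `run_inference.py` times itself in `--benchmark_mode` is not shown here. The sketch below illustrates one common approach that mirrors the `--batch_size` and `--num_iters` arguments: split the input into batches, run a few untimed warm-up passes, then average the wall-clock time of `num_iters` full passes. The `benchmark` function, the `warmup` parameter, and the `predict_fn` callable are all hypothetical stand-ins for the script's internals.

```python
import time
from statistics import mean


def benchmark(predict_fn, inputs, batch_size, num_iters=10, warmup=2):
    """Time full passes over `inputs` in batches of `batch_size`.

    Returns the mean seconds per pass, the resulting throughput in
    samples per second, and the number of batches per pass.
    """
    batches = [inputs[i:i + batch_size] for i in range(0, len(inputs), batch_size)]

    # Warm-up passes are not timed (first calls often include one-off setup costs)
    for _ in range(warmup):
        for batch in batches:
            predict_fn(batch)

    timings = []
    for _ in range(num_iters):
        start = time.perf_counter()
        for batch in batches:
            predict_fn(batch)
        timings.append(time.perf_counter() - start)

    avg = mean(timings)
    return {
        "mean_seconds": avg,
        "samples_per_second": len(inputs) / avg if avg > 0 else float("inf"),
        "num_batches": len(batches),
    }


# Toy predictor standing in for a call into the frozen graph
print(benchmark(lambda batch: [x * 2 for x in batch], list(range(10_000)), batch_size=512, num_iters=5))
```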

## Optimizing the E2E Reference Solution with Intel® oneAPI

#### Model Building Process with Intel® Optimizations

The Intel optimizations are enabled by default when using TensorFlow v2.9 or later, so the `run_training.py` script runs without any code changes. The same training process, optimized with Intel® oneAPI, can be run as follows:

```shell
conda activate demand_intel
python run_training.py --save_model_dir saved_models/intel --batch_size 512
```
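To see whether the Intel kernel is likely to pick up the oneDNN optimizations, you can check the installed TensorFlow version and the `TF_ENABLE_ONEDNN_OPTS` environment variable, which TensorFlow reads to turn oneDNN optimizations on (`1`) or off (`0`). This is a best-effort sketch, assuming a Linux x86 build where oneDNN is on by default from v2.9; the function names are hypothetical, and the alternative distribution names checked below are an assumption about how the environment may have been installed.

```python
import os
import re
from importlib import metadata


def onednn_enabled(tf_version, env=None):
    """Guess whether oneDNN optimizations are active for a TensorFlow version.

    An explicit TF_ENABLE_ONEDNN_OPTS setting wins; otherwise assume they are
    on by default for TensorFlow >= 2.9 (Linux x86 builds assumed).
    """
    env = os.environ if env is None else env
    flag = env.get("TF_ENABLE_ONEDNN_OPTS")
    if flag is not None:
        return flag.strip().lower() in ("1", "true")
    match = re.match(r"(\d+)\.(\d+)", tf_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {tf_version!r}")
    return (int(match.group(1)), int(match.group(2))) >= (2, 9)


def installed_tf_version():
    """Return the installed TensorFlow version, or None if it is not found."""
    for dist in ("tensorflow", "intel-tensorflow", "tensorflow-cpu"):
        try:
            return metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


version = installed_tf_version()
if version is None:
    print("TensorFlow is not installed in this kernel")
else:
    print(f"TensorFlow {version}, oneDNN expected: {onednn_enabled(version)}")
```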

In [None]:
%%javascript
Jupyter.notebook.session.restart({kernel_name: 'conda-env-intel-py'})

In [None]:
!cd src && python run_training.py --save_model_dir saved_models/intel --batch_size 512

#### Model Inference with Intel® Optimizations

As with model training, the 2 inference steps (saving a frozen graph and running inference) are identical to the stock workflow; only the model paths change. Specifically, we can run

```shell
python convert_keras_to_frozen_graph.py -s saved_models/intel -o saved_models/intel
python run_inference.py --input_file ../data/demand/test_full.csv --saved_frozen_model saved_models/intel/saved_frozen_model.pb --batch_size 512
```

on the model saved in the previous training step. On larger datasets and more complex models, the performance gains are expected to become more apparent.

In [None]:
%%javascript
Jupyter.notebook.session.restart({kernel_name: 'conda-env-intel-py'})

In [None]:
!cd src && python convert_keras_to_frozen_graph.py -s saved_models/intel -o saved_models/intel

In [None]:
!cd src && python run_inference.py --input_file ../data/demand/test_full.csv --saved_frozen_model saved_models/intel/saved_frozen_model.pb --batch_size 512 --benchmark_mode

#### Post Training Optimization with Intel® Neural Compressor

In scenarios where the model or data become very large, such as when there is a huge number of stores and items and the model is expanded to capture more complex phenomena, it may be desirable to further optimize the latency and throughput of the model. One method for these scenarios is *model quantization* via Intel® Neural Compressor.

Model quantization is the practice of converting the FP32 weights in deep neural networks to a lower precision, such as INT8, in order **to reduce computation time and the storage space of trained models**. This may be useful if latency and throughput are critical. Intel® offers multiple algorithms and packages for quantizing trained models. In this reference implementation, we include a script, `run_quantize_inc.py`, which can be executed *after saving the frozen graph* to attempt accuracy-aware quantization of the trained model.
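To make the storage argument concrete, the sketch below applies a simplified per-tensor symmetric INT8 scheme to a random FP32 weight matrix: each weight is stored as an 8-bit integer `q` plus one shared FP32 `scale`, with `w ≈ scale * q`. This is not the algorithm Intel® Neural Compressor uses (INC adds calibration and accuracy-aware tuning); the function names and the random weights are illustrative assumptions.

```python
import numpy as np


def quantize_int8_symmetric(w):
    """Per-tensor symmetric quantization: map w to int8 q in [-127, 127] with w ≈ scale * q."""
    w = np.asarray(w, dtype=np.float32)
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate FP32 weights from the int8 values and the shared scale."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(256, 128)).astype(np.float32)
q, scale = quantize_int8_symmetric(w)

print("FP32 bytes:", w.nbytes, "INT8 bytes:", q.nbytes)  # 131072 vs 32768 bytes
print("max abs rounding error:", float(np.max(np.abs(w - dequantize(q, scale)))))
```

The rounding error per weight is bounded by about half the scale, which is why a well-behaved model often loses only a little accuracy.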

The `run_quantize_inc.py` script takes the following arguments:

```shell
usage: run_quantize_inc.py [-h] --saved_frozen_graph SAVED_FROZEN_GRAPH --output_dir OUTPUT_DIR --inc_config_file INC_CONFIG_FILE

optional arguments:
  -h, --help            show this help message and exit
  --saved_frozen_graph SAVED_FROZEN_GRAPH
                        saved pretrained frozen graph to quantize
  --output_dir OUTPUT_DIR
                        directory to save quantized model.
  --inc_config_file INC_CONFIG_FILE
                        INC conf yaml
```

which can be used as follows:

```shell
python run_quantize_inc.py --saved_frozen_graph saved_models/intel/saved_frozen_model.pb --output_dir saved_models/intel --inc_config_file conf.yaml
```

and outputs a quantized model, if successful, to `saved_models/intel/saved_frozen_int8_model.pb`. This model is typically smaller, at a minor cost to accuracy. In our case, the RMSE increases slightly from 7.59 to 7.60.
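The RMSE comparison above can be reproduced with a few lines of NumPy, given the true values and the predictions from each model. As a worked figure, going from 7.59 to 7.60 is a relative increase of 0.01 / 7.59, roughly 0.13% (the reported values are rounded, so this is approximate). The helpers `rmse` and `relative_change` are hypothetical; how you extract predictions from the script's JSON output is left open here.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def relative_change(before, after):
    """Fractional change from `before` to `after`."""
    return (after - before) / before


print(f"{relative_change(7.59, 7.60):.2%}")  # 0.13%
```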

In [None]:
%%javascript
Jupyter.notebook.session.restart({kernel_name: 'conda-env-intel-py'})

In [None]:
!cd src && python run_quantize_inc.py --saved_frozen_graph saved_models/intel/saved_frozen_model.pb --output_dir saved_models/intel --inc_config_file conf.yaml
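To check the claim that the quantized graph is smaller, you can compare the two `.pb` files directly. This sketch assumes it is run from the notebook directory (the cells above run the scripts inside `src`), and the helper `size_ratio` is hypothetical.

```python
import os


def size_ratio(fp32_path, int8_path):
    """Return FP32 file size divided by INT8 file size (values > 1 mean the INT8 graph is smaller)."""
    return os.path.getsize(fp32_path) / os.path.getsize(int8_path)


fp32_pb = "src/saved_models/intel/saved_frozen_model.pb"
int8_pb = "src/saved_models/intel/saved_frozen_int8_model.pb"
if os.path.exists(fp32_pb) and os.path.exists(int8_pb):
    print(f"FP32: {os.path.getsize(fp32_pb)} bytes, INT8: {os.path.getsize(int8_pb)} bytes, "
          f"ratio: {size_ratio(fp32_pb, int8_pb):.2f}")
else:
    print("Quantized graph not found; quantization may not have succeeded.")
```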

Inference on this newly quantized model can be performed identically as before, pointing the script to the saved quantized graph.

```shell
python run_inference.py --input_file ../data/demand/test_full.csv --saved_frozen_model saved_models/intel/saved_frozen_int8_model.pb --batch_size 512
```

In [None]:
!cd src && python run_inference.py --input_file ../data/demand/test_full.csv --saved_frozen_model saved_models/intel/saved_frozen_int8_model.pb --batch_size 512 --benchmark_mode

## Performance Experiments

**Experiment**: The model is trained with `batch_size` values of 64, 128, and 256, and the trained models are then used for inference.
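The helpers in `benchmarking_utils` are not shown in this notebook. As a rough illustration of what a batch-size sweep involves, the sketch below builds one `run_training.py` command per batch size using only the documented `--save_model_dir`, `--batch_size`, and `--logfile` flags. The output naming scheme (`saved_models/<flavour>_<batch_size>`, `logs/<flavour>_training_<batch_size>.log`) and the functions `training_commands` and `run_all` are hypothetical. Using `sys.executable` keeps the commands on the active kernel's Python, so the stock and Intel environments are not mixed.

```python
import subprocess
import sys


def training_commands(batch_sizes, intel, log_dir="logs"):
    """Build one run_training.py command (as an argument list) per batch size."""
    flavour = "intel" if intel else "stock"
    commands = []
    for bs in batch_sizes:
        commands.append([
            sys.executable, "run_training.py",
            "--save_model_dir", f"saved_models/{flavour}_{bs}",
            "--batch_size", str(bs),
            "--logfile", f"{log_dir}/{flavour}_training_{bs}.log",
        ])
    return commands


def run_all(commands, cwd="src", dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


run_all(training_commands([64, 128, 256], intel=False))
```

The log directory must exist before a real (non-dry) run if the script does not create it.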

First, we run training and inference using the stock version.

**Change kernel to Python [conda env:stock]**

In [None]:
%%javascript
Jupyter.notebook.session.restart({kernel_name: 'conda-env-stock-py'})

We first remove any logs left over from previous runs.

In [None]:
!rm -rf ./logs

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide TensorFlow C++ INFO, WARNING and ERROR logs

# Record the notebook directory once so that re-running cells does not nest `src/src`
if 'workbookDir' not in globals():
    workbookDir = os.getcwd()

os.chdir(os.path.join(workbookDir, 'src'))
from notebooks.utils import benchmarking_utils
benchmarking_utils.run_training_benchmark(intel=False)

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide TensorFlow C++ INFO, WARNING and ERROR logs

# Record the notebook directory once so that re-running cells does not nest `src/src`
if 'workbookDir' not in globals():
    workbookDir = os.getcwd()

os.chdir(os.path.join(workbookDir, 'src'))
from notebooks.utils import benchmarking_utils
benchmarking_utils.convert_keras_to_frozen_graph_benchmark(intel=False)

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide TensorFlow C++ INFO, WARNING and ERROR logs

# Record the notebook directory once so that re-running cells does not nest `src/src`
if 'workbookDir' not in globals():
    workbookDir = os.getcwd()

os.chdir(os.path.join(workbookDir, 'src'))
from notebooks.utils import benchmarking_utils
benchmarking_utils.run_inference_benchmark(intel=False)

Second, we run training using Intel® oneAPI Optimizations for TensorFlow, which accelerates performance through oneDNN optimizations.

**Change kernel to Python [conda env:intel]**

In [None]:
%%javascript
Jupyter.notebook.session.restart({kernel_name: 'conda-env-intel-py'})

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide TensorFlow C++ INFO, WARNING and ERROR logs

# Record the notebook directory once so that re-running cells does not nest `src/src`
if 'workbookDir' not in globals():
    workbookDir = os.getcwd()

os.chdir(os.path.join(workbookDir, 'src'))
from notebooks.utils import benchmarking_utils
benchmarking_utils.run_training_benchmark(intel=True)

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide TensorFlow C++ INFO, WARNING and ERROR logs

# Record the notebook directory once so that re-running cells does not nest `src/src`
if 'workbookDir' not in globals():
    workbookDir = os.getcwd()

os.chdir(os.path.join(workbookDir, 'src'))
from notebooks.utils import benchmarking_utils
benchmarking_utils.convert_keras_to_frozen_graph_benchmark(intel=True)

Finally, we run inference with Intel® oneAPI Optimizations for TensorFlow, quantize the model with Intel® Neural Compressor (INC), and run inference on the quantized model.

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide TensorFlow C++ INFO, WARNING and ERROR logs

# Record the notebook directory once so that re-running cells does not nest `src/src`
if 'workbookDir' not in globals():
    workbookDir = os.getcwd()

os.chdir(os.path.join(workbookDir, 'src'))
from notebooks.utils import benchmarking_utils
benchmarking_utils.run_inference_benchmark(intel=True)

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide TensorFlow C++ INFO, WARNING and ERROR logs

# Record the notebook directory once so that re-running cells does not nest `src/src`
if 'workbookDir' not in globals():
    workbookDir = os.getcwd()

os.chdir(os.path.join(workbookDir, 'src'))
from notebooks.utils import benchmarking_utils
benchmarking_utils.run_quantize_inc_benchmark()

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide TensorFlow C++ INFO, WARNING and ERROR logs

# Record the notebook directory once so that re-running cells does not nest `src/src`
if 'workbookDir' not in globals():
    workbookDir = os.getcwd()

os.chdir(os.path.join(workbookDir, 'src'))
from notebooks.utils import benchmarking_utils
benchmarking_utils.run_inference_quantized_model_benchmark(intel=True)

Now, we can create tables and graphs to illustrate the performance benefits in training and inference.
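How `benchmarking_utils` computes its tables is not shown here, but the usual headline number is a speedup ratio: stock time divided by Intel-optimized time for the same batch size, so values above 1 mean the optimized run was faster. The sketch below shows that arithmetic on dictionaries keyed by batch size; the functions `speedups` and `format_table` are hypothetical, and you would fill the dictionaries with your own measured timings.

```python
def speedups(stock_seconds, intel_seconds):
    """Map each batch size present in both runs to stock_time / intel_time."""
    common = sorted(set(stock_seconds) & set(intel_seconds))
    return {bs: stock_seconds[bs] / intel_seconds[bs] for bs in common}


def format_table(ratios):
    """Render the speedups as a small plain-text table."""
    lines = ["batch_size | speedup", "---------- | -------"]
    lines += [f"{bs:>10} | {ratio:.2f}x" for bs, ratio in ratios.items()]
    return "\n".join(lines)


# Example usage with your own measurements (seconds per run, keyed by batch size):
# stock = {64: ..., 128: ..., 256: ...}
# intel = {64: ..., 128: ..., 256: ...}
# print(format_table(speedups(stock, intel)))
```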


In [None]:
%%javascript
Jupyter.notebook.session.restart({kernel_name: 'conda-env-intel-py'})

**Training performance**

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide TensorFlow C++ INFO, WARNING and ERROR logs

# Record the notebook directory once so that re-running cells does not nest `src/src`
if 'workbookDir' not in globals():
    workbookDir = os.getcwd()

os.chdir(os.path.join(workbookDir, 'src'))
from notebooks.utils import benchmarking_utils
benchmarking_utils.print_training_benchmark_table()

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide TensorFlow C++ INFO, WARNING and ERROR logs

# Record the notebook directory once so that re-running cells does not nest `src/src`
if 'workbookDir' not in globals():
    workbookDir = os.getcwd()

os.chdir(os.path.join(workbookDir, 'src'))
from notebooks.utils import benchmarking_utils
benchmarking_utils.print_training_benchmark_bargraph()

**Inference performance**

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide TensorFlow C++ INFO, WARNING and ERROR logs

# Record the notebook directory once so that re-running cells does not nest `src/src`
if 'workbookDir' not in globals():
    workbookDir = os.getcwd()

os.chdir(os.path.join(workbookDir, 'src'))
from notebooks.utils import benchmarking_utils
benchmarking_utils.print_inference_benchmark_table()

In [None]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide TensorFlow C++ INFO, WARNING and ERROR logs

# Record the notebook directory once so that re-running cells does not nest `src/src`
if 'workbookDir' not in globals():
    workbookDir = os.getcwd()

os.chdir(os.path.join(workbookDir, 'src'))
from notebooks.utils import benchmarking_utils
benchmarking_utils.print_inference_benchmark_bargraph()