## **Reference Implementation**
### ***E2E Architecture***
### **Use Case E2E flow**

<img src="assets/e2e_flow.png" width="500" height="500">

### ***Solution setup***
Check that you are in the `ner_stock` kernel. If not, navigate to `Kernel > Change kernel > Python [conda env:ner_stock]` \
After activating the environment for the stock TensorFlow framework, make sure the oneDNN flag is disabled: the cell below sets the flag and prints its value.

In [2]:
%%bash
export TF_ENABLE_ONEDNN_OPTS=0
echo $TF_ENABLE_ONEDNN_OPTS

0
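A `%%bash` cell runs in its own subprocess, so the `export` above only lasts for that cell. It does not change the environment of the notebook kernel, or of scripts launched later from it. A minimal sketch, assuming the flag is meant to apply to the scripts run from this kernel, sets it in the kernel's own environment instead (IPython's `%env TF_ENABLE_ONEDNN_OPTS=0` magic has the same effect). `set_onednn_flag` is a hypothetical helper. To be safe, the flag should be set before TensorFlow is imported in the process that runs the script. For the Intel environment, call it with `True`.

```python
import os


def set_onednn_flag(enabled):
    """Set TF_ENABLE_ONEDNN_OPTS in this kernel's environment and return the value.

    Scripts started afterwards with %run (same process) or with ! commands
    (child processes) see this value. Set it before TensorFlow is imported.
    """
    os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1" if enabled else "0"
    return os.environ["TF_ENABLE_ONEDNN_OPTS"]


print(set_onednn_flag(False))  # prints 0 (stock environment)
```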


### ***Solution implementation***
#### **Model building process**
The cell below executes a Python script that starts training in the kernel selected above.

The script runs the benchmark for the given parameters and displays the corresponding training time in seconds. The script usage and its arguments are as follows:

```shell
python src/run_modeltraining.py --batch_size <batch size> --dataset_file <dataset file> --intel <0 or 1> --save_model_path <save directory>
```

  Arguments:<br>
```
  --help                   Show this help message and exit
  --batch_size             Batch size used for training
  --dataset_file           Path of the dataset file
  --intel                  0 for the stock environment, 1 for the Intel environment
  --save_model_path        Directory in which to save the trained model
```
<b>Note:</b> 
1) The `--dataset_file` and `--save_model_path` arguments are mandatory; the remaining arguments take their default values if omitted.
2) The `--help` option prints the details of the arguments.
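The documented arguments correspond to a command-line interface like the hypothetical `argparse` parser sketched below, which accepts the flags used in the example cell (`--batch_size`, `--dataset_file`, `--intel`, `--save_model_path`). It is an illustration, not the contents of `src/run_modeltraining.py`; the default batch size of 32 is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented training arguments."""
    parser = argparse.ArgumentParser(description="Train the NER model (sketch).")
    parser.add_argument("--batch_size", type=int, default=32,  # assumed default
                        help="Batch size used for training")
    parser.add_argument("--dataset_file", required=True,
                        help="Path of the dataset file")
    parser.add_argument("--intel", type=int, choices=[0, 1], default=0,
                        help="0 for stock TensorFlow, 1 for Intel-optimized TensorFlow")
    parser.add_argument("--save_model_path", required=True,
                        help="Directory in which to save the trained model")
    return parser


args = build_parser().parse_args(
    ["--batch_size", "128", "--dataset_file", "./data/ner_dataset.csv",
     "--intel", "0", "--save_model_path", "./models/trainedmodels/"]
)
print(args.batch_size, args.intel)  # 128 0
```

With a parser like this, `argparse` accepts unambiguous prefixes by default, so `--i 1` would resolve to `--intel 1`, whereas `--batchsize` is not a prefix of `--batch_size` and would be rejected as an unrecognized argument.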

In [None]:
!python src/run_modeltraining.py --batch_size 128 --dataset_file "./data/ner_dataset.csv" --intel 0 --save_model_path "./models/trainedmodels/" | tee stock_training_log.txt
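IPython's `%run` passes everything after the script name to the script's `sys.argv` and does not hand the line to a shell, so a `| tee` suffix is not applied as a pipe there. One alternative, sketched under the assumption that the scripts only need their documented arguments, runs a script with the kernel's own interpreter (`sys.executable`), sets environment variables such as `TF_ENABLE_ONEDNN_OPTS` for the child process, and copies its output to a log file while echoing it. `run_and_tee` is a hypothetical helper, not part of this repository.

```python
import os
import subprocess
import sys


def run_and_tee(script_args, log_path, env_overrides=None):
    """Run `python <script_args>` with this kernel's interpreter.

    Every output line is printed and also written to `log_path`;
    the child's exit code is returned.
    """
    env = dict(os.environ, **(env_overrides or {}))
    cmd = [sys.executable, *script_args]
    with open(log_path, "w") as log, subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, env=env
    ) as proc:
        for line in proc.stdout:
            print(line, end="")
            log.write(line)
    return proc.returncode  # Popen's context manager has already waited for exit


# Hypothetical usage for the stock training run:
# run_and_tee(["src/run_modeltraining.py", "--batch_size", "128",
#              "--dataset_file", "./data/ner_dataset.csv", "--intel", "0",
#              "--save_model_path", "./models/trainedmodels/"],
#             "stock_training_log.txt", {"TF_ENABLE_ONEDNN_OPTS": "0"})
```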

### **Expected Output**
Example of the expected output stored in `stock_training_log.txt`:

In [1]:
%cat stock_training_log.txt

# of sentences:  25001
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 128)]        0           []                               
                                                                                                  
 input_2 (InputLayer)           [(None, 128)]        0           []                               
                                                                                                  
 tf_bert_model (TFBertModel)    TFBaseModelOutputWi  109482240   ['input_1[0][0]',                
                                thPoolingAndCrossAt               'input_2[0][0]']                
                                tentions(last_hidde                                               
                                n_state=(None, 128,         
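The `Param #` of 109482240 reported for `tf_bert_model` matches the size of a BERT-base encoder with its pooler layer. The notebook does not name the checkpoint, so the configuration below (30,522-token vocabulary, hidden size 768, 12 layers, feed-forward size 3072, 512 positions, 2 token types, as in `bert-base-uncased`) is an assumption. The sketch simply adds up the dense and layer-normalization weights; the helper names are hypothetical.

```python
def dense(n_in, n_out):
    """Parameters of a fully connected layer: weight matrix plus bias."""
    return n_in * n_out + n_out


def bert_param_count(vocab=30522, hidden=768, layers=12, intermediate=3072,
                     max_positions=512, type_vocab=2):
    """Parameter count of a BERT encoder with pooler (assumed BERT-base defaults)."""
    layer_norm = 2 * hidden  # scale (gamma) and offset (beta)
    embeddings = (vocab + max_positions + type_vocab) * hidden + layer_norm
    attention = 4 * dense(hidden, hidden) + layer_norm  # query, key, value, output
    feed_forward = dense(hidden, intermediate) + dense(intermediate, hidden) + layer_norm
    pooler = dense(hidden, hidden)  # dense layer applied to the [CLS] token
    return embeddings + layers * (attention + feed_forward) + pooler


print(bert_param_count())  # 109482240
```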

#### **Model Inference process**
The cell below executes a Python script that starts inference in the kernel selected above.

The script runs the benchmark for the given parameters and displays the corresponding inference time in seconds. The script usage and its arguments are as follows:

```shell
python src/run_inference.py --batch_size <batch size> --dataset_file <test dataset file> --intel <0 or 1> --model_path <trained model path>
```

  Arguments:<br>

```
  --help                   Show this help message and exit
  --batch_size             Batch size used for inference
  --dataset_file           Path of the test dataset file
  --intel                  0 for the stock environment, 1 for the Intel environment
  --model_path             Path of the trained model
```

<b>Note:</b>
1) All arguments except the test dataset file (`--dataset_file`) and `--model_path` are optional and take their default values if omitted.
2) The `--help` option prints the details of the arguments.<br>


In [None]:
!python src/run_inference.py --batch_size 128 --dataset_file "./data/ner_test_dataset.csv" --intel 0 --model_path "./models/trainedmodels/stock/model_b128/model_checkpoint" | tee stock_inference_log.txt

### **Expected Output**
Example of the expected output stored in `stock_inference_log.txt`:

In [2]:
%cat stock_inference_log.txt

# of sentences:  2500
Testing dataset size: 2500
Inference Time:  1.678523302078247
              precision    recall  f1-score   support

           0       0.99      1.00      1.00       104
           2       1.00      1.00      1.00         2
           3       0.00      0.00      0.00         1
          16       0.95      0.95      0.95        21

    accuracy                           0.98       128
   macro avg       0.74      0.74      0.74       128
weighted avg       0.98      0.98      0.98       128

Inference Time:  0.12709403038024902
              precision    recall  f1-score   support

           0       0.99      1.00      1.00       104
           2       1.00      1.00      1.00         2
           3       0.00      0.00      0.00         1
          16       0.95      0.95      0.95        21

    accuracy                           0.98       128
   macro avg       0.74      0.74      0.74       128
weighted avg       0.98      0.98      0
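The report above appears to be the output of scikit-learn's `classification_report`. Its `macro avg` row is the unweighted mean of the per-class scores, while `weighted avg` weights each class by its `support`. Recomputing both from the rounded per-class values of the first report (so small rounding differences are expected) shows why class 3, with a single sample and zero scores, pulls the macro average down to about 0.74 while the weighted average stays near 0.98. For weighted recall the support weights cancel, so it equals overall accuracy. The helper names are hypothetical.

```python
# Per-class (precision, recall, support), copied from the first report above (rounded values).
rows = {
    0: (0.99, 1.00, 104),
    2: (1.00, 1.00, 2),
    3: (0.00, 0.00, 1),
    16: (0.95, 0.95, 21),
}


def macro_avg(rows, idx):
    """Unweighted mean of column `idx` over all classes."""
    return sum(r[idx] for r in rows.values()) / len(rows)


def weighted_avg(rows, idx):
    """Mean of column `idx`, weighting each class by its support."""
    total = sum(r[2] for r in rows.values())
    return sum(r[idx] * r[2] for r in rows.values()) / total


print(f"macro precision    {macro_avg(rows, 0):.3f}")     # 0.735
print(f"weighted precision {weighted_avg(rows, 0):.3f}")  # 0.976
print(f"weighted recall    {weighted_avg(rows, 1):.3f}")  # 0.984, i.e. accuracy 0.98 up to rounding
```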

## **Optimizing the E2E solution with Intel® Optimizations for TensorFlow**
### ***E2E Architecture***
### **Use Case E2E flow**
<img src="assets/e2e_flow_optimized.png" width="500" height="500">

### ***Solution setup***
Check that you are in the `ner_intel` kernel. If not, navigate to `Kernel > Change kernel > Python [conda env:ner_intel]` \
After activating the environment for Intel-optimized TensorFlow, make sure the oneDNN flag is enabled: the cell below sets the flag and prints its value.

In [4]:
%%bash
export TF_ENABLE_ONEDNN_OPTS=1
echo $TF_ENABLE_ONEDNN_OPTS

1


### ***Solution implementation***
#### **Model building process with Intel® optimizations**
The same <i>run_modeltraining.py</i> script used for training in the stock environment is also used to train the model in the Intel environment, with `--intel 1`, as shown in the example below.

In [None]:
!python src/run_modeltraining.py --batch_size 128 --dataset_file "./data/ner_dataset.csv" --intel 1 --save_model_path "./models/trainedmodels/" | tee intel_training_log.txt

### **Expected Output**
Example of the expected output stored in `intel_training_log.txt`:

In [1]:
%cat intel_training_log.txt

# of sentences:  25001
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 128)]        0           []                               
                                                                                                  
 input_2 (InputLayer)           [(None, 128)]        0           []                               
                                                                                                  
 tf_bert_model (TFBertModel)    TFBaseModelOutputWi  109482240   ['input_1[0][0]',                
                                thPoolingAndCrossAt               'input_2[0][0]']                
                                tentions(last_hidde                                               
                                n_state=(None, 128,         

#### **Model Inference process with Intel® optimizations**
The same <i>run_inference.py</i> script used to obtain the inference benchmarks in the stock environment is also used to obtain them in the Intel environment, with `--intel 1`, as shown in the example below.

In [None]:
!python src/run_inference.py --batch_size 128 --dataset_file "./data/ner_test_dataset.csv" --intel 1 --model_path "./models/trainedmodels/intel/model_b128/model_checkpoint" | tee intel_inference_log.txt

### **Expected Output**
Example of the expected output stored in `intel_inference_log.txt`:

In [2]:
%cat intel_inference_log.txt

# of sentences:  2500
Testing dataset size: 2500
Inference Time:  1.862060785293579
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       104
           2       1.00      1.00      1.00         2
           3       0.00      0.00      0.00         1
          16       0.95      1.00      0.98        21

    accuracy                           0.99       128
   macro avg       0.74      0.75      0.74       128
weighted avg       0.98      0.99      0.99       128

Inference Time:  0.12018132209777832
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       104
           2       1.00      1.00      1.00         2
           3       0.00      0.00      0.00         1
          16       0.95      1.00      0.98        21

    accuracy                           0.99       128
   macro avg       0.74      0.75      0.74       128
weighted avg       0.98      0.99      0
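Each inference log prints one `Inference Time:` line per timed run. A small hypothetical helper can extract those values from the stock and Intel logs and compare them. In both example logs the first timing is more than ten times larger than the second, which is consistent with one-off warm-up cost, so the sketch drops the first run by default; that choice is an assumption, not something the scripts document.

```python
import re

INFERENCE_TIME = re.compile(r"Inference Time:\s*([0-9.]+)")


def inference_times(log_text):
    """Return every 'Inference Time:' value (in seconds) found in a log."""
    return [float(value) for value in INFERENCE_TIME.findall(log_text)]


def speedup(stock_log, intel_log, skip_first=True):
    """Mean stock time divided by mean Intel time, optionally dropping the first (warm-up) run."""
    stock, intel = inference_times(stock_log), inference_times(intel_log)
    if skip_first:
        stock, intel = stock[1:], intel[1:]
    if not stock or not intel:
        raise ValueError("not enough timed runs in the logs")
    return (sum(stock) / len(stock)) / (sum(intel) / len(intel))


# Hypothetical usage with the two log files written above:
# with open("stock_inference_log.txt") as s, open("intel_inference_log.txt") as i:
#     print(f"speedup: {speedup(s.read(), i.read()):.2f}x")
```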

#### **Model Conversion process with Intel® Neural Compressor**
Intel® Neural Compressor (INC) is used to quantize the FP32 model to an INT8 model, and the quantized model is then used for evaluation and timing analysis.
Intel® Neural Compressor supports many optimization methods; in this case, post-training quantization with the default quantization method is used to quantize the FP32 model.
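How INC calibrates and quantizes the BERT graph is not shown in this notebook. As a conceptual illustration only, the arithmetic at the heart of INT8 post-training quantization can be sketched with min-max (asymmetric) affine quantization, which maps a float range onto the 256 int8 levels through a scale and a zero point. This is a generic textbook scheme with hypothetical helper names, not INC's implementation; a real toolkit chooses the ranges from calibration data and quantizes whole graphs.

```python
def quantize_int8(values):
    """Asymmetric (min-max) affine quantization of floats to the int8 range [-128, 127]."""
    lo = min(min(values), 0.0)  # the range must include 0 so that 0.0 is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 or 1.0  # 256 levels span 255 steps; avoid a zero scale
    zero_point = -128 - round(lo / scale)  # int8 code that represents the real value 0.0
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, -0.25, 0.0, 0.5, 2.0]
q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)
print(q)
print(max(abs(a - b) for a, b in zip(weights, restored)))  # about scale / 2 at most here
```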

Before the trained model is quantized, it is converted to the frozen graph format using the <i>run_create_frozen_graph.py</i>
Python script. This script is used to generate the frozen graph as follows:

```
python src/run_create_frozen_graph.py --model_path <trained model file path> --save_model_path <path to save the model frozen graph>
```
where,<br>
<b>model_path</b> - The path of the FP32 trained model <br>
<b>save_model_path</b> - The path to save the frozen graph format of the model given in model_path<br>

The cell below shows an example execution of this script.


In [None]:
%run src/run_create_frozen_graph.py --model_path "./models/trainedmodels/intel/model_b128/model_checkpoint" --save_model_path "./models/frozen_models/intel/model_b32/"

Once the frozen graph of the model is created, the <i>run_neural_compressor_conversion.py</i> Python script is used to
quantize the FP32 trained model. The syntax for using the script
is as follows:

```
python src/INC/run_neural_compressor_conversion.py --dataset_file <test dataset file name> --model_path <path of the frozen graph> --config_file <configuration file> --save_model_path <path to save the model>
```
where,
```
--dataset_file        The path of the test dataset file
--model_path          The path of the model file in the frozen graph format
--config_file         The path of the configuration file which contains the settings for the quantization
--save_model_path     The path to save the quantized model
```

The cell below shows an example execution of this script.

>Note:
If running the script below fails with the error "Unable to run due to ImportError: libGL.so.1:
cannot open shared object file: No such file or directory", install libGL using the command
`!sudo apt-get install libgl1`

In [None]:
%run src/INC/run_neural_compressor_conversion.py --dataset_file "./data/ner_test_quan_dataset.csv" --model_path "./models/frozen_models/intel/model_b32/frozen_graph.pb" --config_file "./env/deploy.yaml" --save_model_path "./models/quantized_models/intel/model_b128_d100/inc_model_b128_d100/"

If the quantized model needs to be tuned to meet a specific accuracy target relative to the FP32 trained model, set the corresponding parameters in the configuration file and use the Python script <i>run_neural_compressor_tune_conversion.py</i>. It is used in the same way as <i>run_neural_compressor_conversion.py</i>, except for the configuration file. The syntax and usage of the script are as follows:

```
python src/INC/run_neural_compressor_tune_conversion.py --dataset_file <test dataset file name> --model_path <path of the frozen graph> --config_file <configuration file> --save_model_path <path to save the model>
```
where,
```
--dataset_file        The path of the test dataset file
--model_path          The path of the model file in the frozen graph format
--config_file         The path of the configuration file which contains the settings for the quantization
--save_model_path     The path to save the quantized model
```
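Accuracy-driven tuning stops once a quantized model satisfies the accuracy criterion set in the configuration file. In Intel® Neural Compressor's 1.x YAML configuration this is expressed under `tuning.accuracy_criterion` as either a `relative` or an `absolute` tolerance. The hypothetical helper below illustrates our reading of that rule for a higher-is-better metric; the contents of `./env/deploy_accuracy.yaml` are not shown in this notebook, so the tolerance and accuracies used here are assumptions.

```python
def meets_accuracy_criterion(fp32_acc, int8_acc, relative=None, absolute=None):
    """Return True if the INT8 accuracy is within tolerance of the FP32 baseline.

    Assumes a higher-is-better metric. Set exactly one of `relative` (a fraction
    of the FP32 accuracy, e.g. 0.01 for 1%) or `absolute` (accuracy points).
    """
    if (relative is None) == (absolute is None):
        raise ValueError("set exactly one of relative or absolute")
    drop = fp32_acc - int8_acc
    allowed = relative * fp32_acc if relative is not None else absolute
    return drop <= allowed


# Hypothetical accuracies: a 0.005 drop from 0.97 is about 0.5%, within a 1% relative budget.
print(meets_accuracy_criterion(0.97, 0.965, relative=0.01))  # True
print(meets_accuracy_criterion(0.97, 0.95, relative=0.01))   # False
```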

The cell below shows an example execution of this script.


In [None]:
%run src/INC/run_neural_compressor_tune_conversion.py --dataset_file "./data/ner_test_quan_dataset.csv" --model_path "./models/frozen_models/intel/model_b32/frozen_graph.pb" --config_file "./env/deploy_accuracy.yaml" --save_model_path "./models/acc_quantized_models/intel/model_b128_d100/inc_model_b128_d100/"

#### **Model Inference process with Intel® Quantization**
Now that the quantized model has been created with INC, it can be used for inference on the test data and for benchmarking.
Inference is run on both the FP32 model and the INC-quantized model, and the real-time and batch inference results are used
for benchmarking.

The Python script <i>run_neural_compressor_inference.py</i> is used to perform predictions on the test data. The syntax to use the script
is given below.

```
python src/INC/run_neural_compressor_inference.py --batch_size <batch size> --dataset_file <test dataset file> --model_path <FP32 or INC frozen graph file>
```
where,
```
--batch_size      Batch size used for inference
--dataset_file    Path of the test dataset file
--model_path      Path of the FP32 or quantized frozen graph model
```
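The script reports a real-time figure and an average time over batches. How <i>run_neural_compressor_inference.py</i> measures them is not shown here; a generic way to collect such numbers is to time each model call on fixed-size batches with `time.perf_counter`, after some untimed warm-up calls. Real-time latency is commonly measured with a batch size of 1. In the sketch below `predict` stands in for the FP32 or INT8 model call, and both helper names are hypothetical.

```python
import time
from statistics import mean


def make_batches(samples, batch_size):
    """Split a list of samples into consecutive batches of at most `batch_size`."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def time_batches(predict, batches, warmup=1):
    """Return the wall-clock seconds of predict(batch) for every batch.

    The first `warmup` batches are also run once beforehand, untimed, so that
    one-off start-up costs do not distort the measurements.
    """
    for batch in batches[:warmup]:
        predict(batch)
    timings = []
    for batch in batches:
        start = time.perf_counter()
        predict(batch)
        timings.append(time.perf_counter() - start)
    return timings


# Demo with a dummy "model" (sum) on 300 fake samples and batch size 128.
batches = make_batches(list(range(300)), 128)
timings = time_batches(sum, batches)
print(len(batches), [len(b) for b in batches])  # 3 [128, 128, 44]
print(f"mean seconds per batch: {mean(timings):.6f}")
```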

The cell below shows an example execution of this script.

In [None]:
!python src/INC/run_neural_compressor_inference.py --batch_size 128 --dataset_file "./data/ner_test_quan_dataset.csv" --model_path "./models/quantized_models/intel/model_b128_d100/inc_model_b128_d100.pb" | tee INC_inference_log.txt

### **Expected Output**
Example of the expected output stored in `INC_inference_log.txt`:

In [3]:
%cat INC_inference_log.txt

# of sentences:  2500
Testing dataset size: 2500
Total Time Taken for model inference with                           batch size 128 in seconds                               ---> 62.66711711883545
Total Time Taken for model inference with                           batch size 128 in seconds                               ---> 62.51118612289429
Total Time Taken for model inference with                           batch size 128 in seconds                               ---> 62.3470025062561

                ----------------------------------------
                # Model Inference details:
                # Real time inference (in seconds): 0.06799769401550293
                # Average Inference Time (in seconds): 3.289917644701506
                # Accuracy: 0.9701987818667762
                ----------------------------------------
                
