# 1. Download the sample training data
The dataset contains images of 37 breeds of dogs and cats.

In [None]:
!wget https://hiouchiystorage.blob.core.windows.net/share/data.zip

In [None]:
!mkdir train_data

In [None]:
!mv data.zip train_data/

In [None]:
!unzip train_data/data.zip -d train_data/
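
After unpacking, it can help to check how many images each split contains. The following is a minimal sketch, assuming the archive extracts to `train_data/{train,val,test}/<class name>/<image file>` (the layout the later cells read from). `count_images_per_class` is a hypothetical helper, not part of the original notebook.

```python
import os
from collections import Counter


def count_images_per_class(split_dir):
    """Return {class_name: file_count} for a directory laid out as split_dir/<class>/<file>."""
    counts = Counter()
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if os.path.isdir(class_dir):
            # Count regular files only; nested folders are ignored
            counts[class_name] = sum(
                1 for f in os.listdir(class_dir)
                if os.path.isfile(os.path.join(class_dir, f))
            )
    return dict(counts)


# Summarize each split if it exists (assumed directory names)
for split in ("train", "val", "test"):
    split_dir = os.path.join("train_data", split)
    if os.path.isdir(split_dir):
        counts = count_images_per_class(split_dir)
        print(split, len(counts), "classes,", sum(counts.values()), "images")
```

If the data matches the description, each split should report 37 classes.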

# 2. Train the model (transfer learning with ResNet50)
Starting from the pre-trained ResNet50 model in TensorFlow v2.x, we build a custom model that classifies 37 breeds of dogs and cats.

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.resnet import preprocess_input, decode_predictions
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard, CSVLogger
from tensorflow.keras import optimizers, models
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras import applications
from tensorflow.keras import backend as K
import tensorflow as tf
import os
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

In [None]:
test_path = 'train_data/test/'
train_path = 'train_data/train/'
val_path = 'train_data/val/'
WIDTH=224
HEIGHT=224
BATCH_SIZE=64

# Training data generator (ResNet50 preprocessing only; no augmentation is applied)
print("\nTraining Data Set")
train_generator = ImageDataGenerator(preprocessing_function=preprocess_input)
train_flow = train_generator.flow_from_directory(
    train_path,
    target_size=(HEIGHT, WIDTH),
    batch_size = BATCH_SIZE
)

# Validation data generator (ResNet50 preprocessing only; no augmentation is applied)
print("\nValidation Data Set")
val_generator = ImageDataGenerator(preprocessing_function=preprocess_input)
val_flow = val_generator.flow_from_directory(
    val_path,
    target_size=(HEIGHT, WIDTH),
    batch_size = BATCH_SIZE
)

# Test data generator (ResNet50 preprocessing only; no augmentation is applied)
print("\nTest Data Set")
test_generator = ImageDataGenerator(preprocessing_function=preprocess_input)
test_flow = test_generator.flow_from_directory(
    test_path,
    target_size=(HEIGHT, WIDTH),
    batch_size = BATCH_SIZE
)


# Initialize ResNet50 (ImageNet weights, without the classification head) for transfer learning
base_model = applications.ResNet50(weights='imagenet',
                                   include_top=False,
                                   input_shape=(HEIGHT, WIDTH, 3))

# add a global spatial average pooling layer
x = base_model.output

x = GlobalAveragePooling2D()(x)
# and a dense layer
x = Dense(1024, activation='relu')(x)
predictions = Dense(len(train_flow.class_indices), activation='softmax')(x)

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)

# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional ResNet50 layers
for layer in base_model.layers:
    layer.trainable = False

# compile the model (should be done *after* setting layers to non-trainable)
# `learning_rate` replaces the deprecated `lr` argument
model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              metrics=['accuracy', 'top_k_categorical_accuracy'],
              loss='categorical_crossentropy')
model.summary()

import math
top_layers_file_path="resnet50.hdf5"

# CSVLogger writes into ./logs, so make sure the directory exists
os.makedirs('./logs', exist_ok=True)

checkpoint = ModelCheckpoint(top_layers_file_path, monitor='loss', verbose=1, save_best_only=True, mode='min')
tb = TensorBoard(log_dir='./logs', write_graph=True, update_freq='batch')
early = EarlyStopping(monitor="loss", mode="min", patience=5)
csv_logger = CSVLogger('./logs/mn-log.csv', append=True)

# Model.fit accepts generators directly; fit_generator is deprecated in recent TF 2.x releases
history = model.fit(train_flow,
                    epochs=1,
                    verbose=1,
                    validation_data=val_flow,
                    validation_steps=math.ceil(val_flow.samples/val_flow.batch_size),
                    steps_per_epoch=math.ceil(train_flow.samples/train_flow.batch_size),
                    callbacks=[checkpoint, early, tb, csv_logger])


# Reload the best checkpoint and evaluate it on the test set
model.load_weights(top_layers_file_path)
loss, acc, top_5 = model.evaluate(
    test_flow,
    verbose=1,
    steps=math.ceil(test_flow.samples/test_flow.batch_size))
print("Loss: ", loss)
print("Acc: ", acc)
print("Top 5: ", top_5)


label = [k for k,v in train_flow.class_indices.items()]
with open('labels.txt', 'w+') as file:
    file.write("\n".join(label))
 
tf.saved_model.save(model, 'resnet50_model')
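
The evaluation above reports both plain accuracy and `top_k_categorical_accuracy`, which in Keras defaults to k=5. As a rough illustration of what that metric measures, here is a minimal pure-Python sketch; `top_k_accuracy` and the toy probabilities are hypothetical and are not how Keras implements it internally.

```python
def top_k_accuracy(probabilities, true_indices, k=5):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for probs, true_idx in zip(probabilities, true_indices):
        # Indices of the k largest scores, highest first
        top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        if true_idx in top_k:
            hits += 1
    return hits / len(true_indices)


# Toy example with 3 classes and 3 samples (made-up numbers)
probs = [
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.2, 0.3, 0.5],
]
labels = [1, 2, 0]
print("top-1:", top_k_accuracy(probs, labels, k=1))
print("top-2:", top_k_accuracy(probs, labels, k=2))
```

With 37 classes, top-5 accuracy will usually be noticeably higher than top-1, so the two numbers are best read together.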

Training takes a while, so if you would rather not wait, download the trained model files from here.

In [None]:
!wget https://hiouchiystorage.blob.core.windows.net/share/resnet50_model.zip; unzip resnet50_model.zip

## Bonus 1: Code for further training the model downloaded above

In [None]:
test_path = 'train_data/test/'
train_path = 'train_data/train/'
val_path = 'train_data/val/'
WIDTH=224
HEIGHT=224
BATCH_SIZE=64

# Training data generator (same ResNet50 preprocessing as the original training; no augmentation)
print("\nTraining Data Set")
train_generator = ImageDataGenerator(preprocessing_function=preprocess_input)
train_flow = train_generator.flow_from_directory(
    train_path,
    target_size=(HEIGHT, WIDTH),
    batch_size = BATCH_SIZE
)

# Validation data generator (same ResNet50 preprocessing as the original training; no augmentation)
print("\nValidation Data Set")
val_generator = ImageDataGenerator(preprocessing_function=preprocess_input)
val_flow = val_generator.flow_from_directory(
    val_path,
    target_size=(HEIGHT, WIDTH),
    batch_size = BATCH_SIZE
)

# Test data generator (same ResNet50 preprocessing as the original training; no augmentation)
print("\nTest Data Set")
test_generator = ImageDataGenerator(preprocessing_function=preprocess_input)
test_flow = test_generator.flow_from_directory(
    test_path,
    target_size=(HEIGHT, WIDTH),
    batch_size = BATCH_SIZE
)

new_model = tf.keras.models.load_model('resnet50_model')    
model = new_model
model.summary()

import math
top_layers_file_path="resnet50.hdf5"

# CSVLogger writes into ./logs, so make sure the directory exists
os.makedirs('./logs', exist_ok=True)

checkpoint = ModelCheckpoint(top_layers_file_path, monitor='loss', verbose=1, save_best_only=True, mode='min')
tb = TensorBoard(log_dir='./logs', write_graph=True, update_freq='batch')
early = EarlyStopping(monitor="loss", mode="min", patience=5)
csv_logger = CSVLogger('./logs/mn-log.csv', append=True)

# Model.fit accepts generators directly; fit_generator is deprecated in recent TF 2.x releases
history = model.fit(train_flow,
                    epochs=1,
                    verbose=1,
                    validation_data=val_flow,
                    validation_steps=math.ceil(val_flow.samples/val_flow.batch_size),
                    steps_per_epoch=math.ceil(train_flow.samples/train_flow.batch_size),
                    callbacks=[checkpoint, early, tb, csv_logger])


# Reload the best checkpoint and evaluate it on the test set
model.load_weights(top_layers_file_path)
loss, acc, top_5 = model.evaluate(
    test_flow,
    verbose=1,
    steps=math.ceil(test_flow.samples/test_flow.batch_size))
print("Loss: ", loss)
print("Acc: ", acc)
print("Top 5: ", top_5)


label = [k for k,v in train_flow.class_indices.items()]
with open('labels.txt', 'w+') as file:
    file.write("\n".join(label))
 
tf.saved_model.save(model, 'resnet50_model')

## Bonus 2: The model can also be exported as a frozen graph (note: in this notebook it is saved in SavedModel format)

In [None]:
# Convert Keras model to ConcreteFunction
full_model = tf.function(lambda x: model(x))
concrete_function = full_model.get_concrete_function(
    x=tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))

# Get frozen ConcreteFunction
frozen_model = convert_variables_to_constants_v2(concrete_function)

# Generate frozen pb
tf.io.write_graph(graph_or_graph_def=frozen_model.graph,
                   logdir="./frozen_models",
                   name="simple_frozen_graph.pb",
                   as_text=False)

# 3. Run inference with the model
First we run inference directly in TensorFlow and check its performance.

In [None]:
import tensorflow as tf
import os
import numpy as np
import cv2
import time
import glob
import random
import PIL.Image
import io
import IPython.display
from IPython.display import clear_output

In [None]:
def inference_on_tf(numOfImages=50):
    loaded = tf.saved_model.load('resnet50_model')
    #print(list(loaded.signatures.keys()))  # ["serving_default"]

    infer = loaded.signatures["serving_default"]
    output_node_name = next(iter(infer.structured_outputs.keys()))

    # Read in labels (one class name per line); the file is closed automatically
    arg_labels="train_data/labels.txt"
    with open(arg_labels, "r") as label_file:
        labels = label_file.read().split('\n')

    total_spent_time = 0
    total_infer_spent_time = 0
    file_list = glob.glob("train_data/test/*/*")
    for i in range(numOfImages):
        img_path = random.choice(file_list)
        img_cat = os.path.split(os.path.dirname(img_path))[1]

        start1 = time.time()
        img = tf.keras.preprocessing.image.load_img(img_path, target_size=[224, 224])
        x = tf.keras.preprocessing.image.img_to_array(img)
        x = tf.keras.applications.resnet.preprocess_input(x[tf.newaxis,...])

        start2 = time.time()  # start of the inference-only timer
        labeling = infer(tf.constant(x))[output_node_name]
        infer_time = time.time() - start2
        
        label_index = np.argsort(labeling)[0,::-1][0]
        pred_label = labels[label_index]
        total_time = time.time() - start1
        
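        # Skip only the first (warm-up) iteration, so that dividing by numOfImages-1 below gives a true average
        if i > 0: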
            total_infer_spent_time += infer_time
            total_spent_time += total_time
        
        #print("Filename:{}, Prediction:{}, ProcTime:{}, InferTime:{}".format(img_path, pred_label, int(total_time*1000), int(infer_time*1000)))

        clear_output(wait=True)
        frame = cv2.imread(img_path)
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        if frame.shape[:-1] != (224, 224):
            frame = cv2.resize(frame, (224, 224))
        # Draw each text twice (thick white, then thin black) to get an outlined label
        cv2.putText(frame,'No ' + str(i+1) + ':' + str(int(total_time*1000)) + ',' + str(int(infer_time*1000)), (10,30), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 4)
        cv2.putText(frame,'No ' + str(i+1) + ':' + str(int(total_time*1000)) + ',' + str(int(infer_time*1000)), (10,30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0,0,0), 2)
        cv2.putText(frame,str(img_cat), (10,80), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 4)
        cv2.putText(frame,str(img_cat), (10,80), cv2.FONT_HERSHEY_SIMPLEX, 1, (0,0,0), 2)
        cv2.putText(frame,str(pred_label), (10,130), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 4)
        cv2.putText(frame,str(pred_label), (10,130), cv2.FONT_HERSHEY_SIMPLEX, 1, (0,0,0), 2)
        f = io.BytesIO()
        PIL.Image.fromarray(frame).save(f, 'jpeg')
        IPython.display.display(IPython.display.Image(data=f.getvalue()))
        
    print()
    print('All ' + str(numOfImages) + ' images done!')
    print()
    print("Average total time: " + str(int((total_spent_time / (numOfImages-1))*1000.0)) + " ms/image")
    print("Average inference time: " + str(int((total_infer_spent_time / (numOfImages-1))*1000.0)) + " ms/image")

In [None]:
inference_on_tf(50)
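
The function above leaves the earliest iterations out of its averages, presumably because the first call tends to include one-time setup cost. The same idea can be written as a small reusable helper. This is a minimal sketch; `average_latency_ms` and its parameters are hypothetical names, and the lambda stands in for a real inference call.

```python
import time


def average_latency_ms(fn, runs=50, warmup=1):
    """Call fn() `runs` times and return the mean latency in ms, ignoring the first `warmup` calls."""
    if runs <= warmup:
        raise ValueError("runs must be greater than warmup")
    total = 0.0
    for i in range(runs):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        if i >= warmup:
            # Only iterations after the warm-up phase contribute to the average
            total += elapsed
    return total / (runs - warmup) * 1000.0


# Stand-in workload; replace the lambda with an actual inference call
print(average_latency_ms(lambda: sum(range(10000)), runs=20, warmup=2))
```

Dividing by `runs - warmup` keeps the divisor equal to the number of measurements that were actually summed.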

That concludes the TensorFlow part.

---
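From here on: OpenVINO ↓
# 4. Optimize the FP32 model for CPU with OpenVINO (convert to IR)

From here on, we show how to quantize the model with the Intel® OpenVINO™ Toolkit.

Before quantizing, though, we first convert the original TensorFlow model (FP32) to OpenVINO's IR (Intermediate Representation) format.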

In [None]:
!python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_tf.py --saved_model_dir ./resnet50_model --input_shape=[1,224,224,3] --data_type FP32 --model_name resnet50_fp32

Confirm that the IR files (xml + bin) have been generated.
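
One way to do this check from Python is sketched below. It assumes the Model Optimizer wrote `resnet50_fp32.xml` and `resnet50_fp32.bin` to the current directory, which is where the later cell loads them from; `missing_ir_files` is a hypothetical helper.

```python
from pathlib import Path


def missing_ir_files(model_name, directory="."):
    """Return the expected IR files (.xml and .bin) for model_name that are not present in directory."""
    base = Path(directory)
    expected = [base / (model_name + ".xml"), base / (model_name + ".bin")]
    return [str(p) for p in expected if not p.is_file()]


missing = missing_ir_files("resnet50_fp32")
print("IR is complete" if not missing else "Missing: " + ", ".join(missing))
```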

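# 5. Run the optimized model (IR) on the OpenVINO Inference Engine

Let's run the IR on the OpenVINO Inference Engine (IE). The model is still FP32, but converting it to IR optimizes its internal structure for the CPU, so you should see a substantial performance improvement.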

In [None]:
import tensorflow as tf
import os
import numpy as np
import cv2
import time
import glob
import random
import pandas as pd
import PIL.Image
import io
from openvino.inference_engine import IECore
import IPython.display
from IPython.display import clear_output
from tensorflow.keras.applications.resnet import preprocess_input

In [None]:
class BaseModel(object):
    # Named BaseModel so that it does not shadow the Keras Model class imported for training

    def __init__(self):
        # Read in labels (one class name per line); the file is closed automatically
        arg_labels="train_data/labels.txt"
        with open(arg_labels, "r") as label_file:
            self.labels = label_file.read().split('\n')

    def predict(self, imageFile):
        raise NotImplementedError

class OpenVINOModel(BaseModel):

    def __init__(self, target_device, modelFilePath):
        super(OpenVINOModel, self).__init__()

        # These are set to the default names from exported models, update as needed.
        model_xml = modelFilePath
        model_bin = modelFilePath.replace('.xml', '.bin')

        # Plugin initialization for specified device and load extensions library if specified
        # Set the desired device name as 'device' parameter. This sample support these 3 names: CPU, GPU, MYRIAD
        ie = IECore()

        # Read IR
        self.net = ie.read_network(model=model_xml, weights=model_bin)

        self.input_blob = next(iter(self.net.inputs))
        self.out_blob = next(iter(self.net.outputs))
        self.net.batch_size = 1

        # Load the model onto the requested device (previously hard-coded to 'CPU')
        self.exec_net = ie.load_network(network=self.net, device_name=target_device, num_requests=1)

    def predict(self, imageFile):
        start1 = time.time()  # start of the end-to-end timer (load + preprocess + inference)
        
        image = cv2.imread(imageFile)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        if image.shape[:-1] != (224, 224):
            image = cv2.resize(image, (224, 224))
        frame = image
        image = preprocess_input(image)
        image = image.transpose((2, 0, 1))  # Change data layout from HWC to CHW
        image = image.reshape((1, 3, 224, 224))
        images = image
        

        start2 = time.time()  # start of the inference-only timer
        predictions = self.exec_net.infer(inputs={self.input_blob: images})
        infer_time = time.time() - start2

        # Pick the index of the highest-probability class
        predictions = predictions[self.out_blob]
        highest_probability_index = predictions[0].argmax()

        total_time = time.time() - start1

        return total_time, infer_time, self.labels[highest_probability_index], frame


def inference_on_ov(modelFile, model_type="openvino", target_device='CPU', total=500):
    if model_type in ('tf', 'tf_int8'):
        # TFModel is not defined in this notebook, so only OpenVINO IR models can be run here
        raise NotImplementedError("TFModel is not defined in this notebook; use model_type='openvino'")
    # Any device other than GPU or MYRIAD falls back to CPU
    device = target_device if target_device in ('GPU', 'MYRIAD') else 'CPU'
    model = OpenVINOModel(device, modelFile)

    total_infer_spent_time = 0
    total_spent_time = 0
    list_df = pd.DataFrame( columns=['True label','Predicted label','Total time (msec)','Inference time (msec)'] )

    match = 0
    #file_list = glob.glob(os.path.join(dataset_dir, "*"))
    file_list = glob.glob("train_data/test/*/*")
    for i in range(total):
        img_path = random.choice(file_list)
        img_cat = os.path.split(os.path.dirname(img_path))[1]
        total_time, infer_time, pred_label, frame = model.predict(img_path)

        # Skip only the first (warm-up) iteration, so that dividing by total-1 below gives a true average
        if i > 0:
            total_infer_spent_time += infer_time
            total_spent_time += total_time

        #print(img_path, str(int(total_time*1000.0)) + 'msec', str(int(infer_time*1000.0)) + 'msec', pred_label)
        clear_output(wait=True)
        frame = cv2.imread(img_path)
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        if frame.shape[:-1] != (224, 224):
            frame = cv2.resize(frame, (224, 224))
        # Draw each text twice (thick white, then thin black) to get an outlined label
        cv2.putText(frame,'No ' + str(i+1) + ':' + str(int(total_time*1000)) + ',' + str(int(infer_time*1000)), (10,30), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 4)
        cv2.putText(frame,'No ' + str(i+1) + ':' + str(int(total_time*1000)) + ',' + str(int(infer_time*1000)), (10,30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0,0,0), 2)
        cv2.putText(frame,str(img_cat), (10,80), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 4)
        cv2.putText(frame,str(img_cat), (10,80), cv2.FONT_HERSHEY_SIMPLEX, 1, (0,0,0), 2)
        cv2.putText(frame,str(pred_label), (10,130), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 4)
        cv2.putText(frame,str(pred_label), (10,130), cv2.FONT_HERSHEY_SIMPLEX, 1, (0,0,0), 2)
        f = io.BytesIO()
        PIL.Image.fromarray(frame).save(f, 'jpeg')
        IPython.display.display(IPython.display.Image(data=f.getvalue()))

        # Append one row per image; DataFrame.append is deprecated and removed in newer pandas releases
        list_df.loc[len(list_df)] = [img_cat, pred_label, str(int(total_time * 1000)), str(int(infer_time * 1000))]

    print()
    print('All ' + str(total) + ' images done!')
    print()
    print("Average total time: " + str(int((total_spent_time / (total-1))*1000.0)) + " ms/image")
    print("Average inference time: " + str(int((total_infer_spent_time / (total-1))*1000.0)) + " ms/image")
    return int((total_spent_time / (total-1))*1000.0), int((total_infer_spent_time / (total-1))*1000.0)

In [None]:
inference_on_ov('resnet50_fp32.xml', model_type='openvino', total=50)
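
The loop in `inference_on_ov` has both the true label (`img_cat`) and the predicted label (`pred_label`) for every image, but it reports only timings. When comparing models, it is also useful to look at accuracy. The following is a minimal sketch of that calculation; `accuracy` is a hypothetical helper and the label lists are made-up examples.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both lists must have the same length")
    if not true_labels:
        return 0.0
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Made-up labels for illustration only
true_labels = ["Abyssinian", "beagle", "pug", "Bengal"]
predicted = ["Abyssinian", "beagle", "Bengal", "Bengal"]
print("Accuracy: {:.2%}".format(accuracy(true_labels, predicted)))
```

Collecting the two labels into lists inside the loop and passing them to a function like this would give one accuracy figure per run.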

---
From here on: quantization with OpenVINO ↓

# 6. Quantize the IR with OpenVINO's POT
The IR is quantized with OpenVINO's POT (Post-Training Optimization Toolkit). Complete the POT [setup](https://docs.openvinotoolkit.org/latest/_README.html#install_post_training_optimization_toolkit) beforehand.

Next, prepare a config file (JSON) that describes the quantization settings. Here we use 'resnet50_int8.json', which is already provided in the repository. Let's look at its contents.

In [None]:
!cat resnet50_int8.json

Two supplementary notes on POT:

1. POT builds on an existing tool called AccuracyChecker

    AccuracyChecker, as its name suggests, is a tool for measuring a model's accuracy. It can run models converted to OpenVINO IR as well as models in their original format (TensorFlow, PyTorch, ONNX, and so on). POT is an extension of AccuracyChecker and therefore depends on it, which is why the first half of the config file above is AccuracyChecker configuration.
    See [here](https://docs.openvinotoolkit.org/latest/_README.html) for details.


2. POT provides two quantization algorithms

    One of the following two quantization algorithms can be used. See [here](https://docs.openvinotoolkit.org/latest/_compression_algorithms_quantization_README.html) for details.
    - DefaultQuantization: used in this sample. Prioritizes a short quantization run time. See [here](https://docs.openvinotoolkit.org/latest/_compression_algorithms_quantization_default_README.html) for details.
    - AccuracyAwareQuantization: prioritizes accuracy after quantization, and can take longer to run. See [here](https://docs.openvinotoolkit.org/latest/_compression_algorithms_quantization_accuracy_aware_README.html) for details.
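
To give a feel for what quantizing FP32 values to INT8 means, here is a deliberately simplified sketch of symmetric per-tensor quantization in pure Python. It is an illustration only: it is not the algorithm that DefaultQuantization or AccuracyAwareQuantization implements, and `quantize_int8` and `dequantize` are hypothetical helper names.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single scale for the whole list."""
    max_abs = max(abs(v) for v in values)
    # One quantization step; fall back to 1.0 when every value is zero
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print("quantized:", q)
print("max error:", max(abs(w - r) for w, r in zip(weights, restored)))
```

In this scheme the round-trip error per value is at most about half a quantization step (`scale / 2`), which hints at why accuracy should be rechecked after quantization.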

Next, run the quantization with POT.

In [None]:
!pot -c resnet50_int8.json

If the run succeeds, a folder named results is created, and the quantized IR is stored inside it.

results/resnet50_int8_DefaultQuantization/(timestamped folder)/optimized/**.xml and **.bin
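
Because the run folder name contains a timestamp, the path changes on every run. A minimal sketch for locating the most recent quantized IR follows. It assumes the run folders use a sortable timestamp such as 2020-12-03_05-22-17, the form used later in this notebook; `latest_quantized_ir` is a hypothetical helper.

```python
import glob
import os


def latest_quantized_ir(results_root="results/resnet50_int8_DefaultQuantization"):
    """Return the .xml path from the newest timestamped run folder, or None if there is none."""
    pattern = os.path.join(results_root, "*", "optimized", "*.xml")
    # Timestamps in YYYY-MM-DD_HH-MM-SS form sort chronologically as plain strings
    candidates = sorted(glob.glob(pattern))
    return candidates[-1] if candidates else None


print(latest_quantized_ir())
```

The returned path could then be passed to `inference_on_ov` in place of a hand-edited one.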

Incidentally, instead of the POT command you can achieve the same thing by writing a [Python script](https://docs.openvinotoolkit.org/latest/_sample_README.html#how_to_run_the_sample). This is worth using when you need finer-grained customization.

# 7. Run the quantized IR

Before running the command below, replace the timestamp part (2020-12-03_05-22-17) with the one from your own run.

In [None]:
inference_on_ov('results/resnet50_int8_DefaultQuantization/2020-12-03_05-22-17/optimized/resnet50_int8.xml', model_type='openvino', total=50)

Finally, copy the finished model to the shared folder.

In [None]:
!cp -r results/resnet50_int8_DefaultQuantization/2020-12-03_05-22-17/optimized/ /workspace

# That's all!