Update the lib code and example for review comment
Signed-off-by: khalid-davis <huangqinkai1@huawei.com>
khalid-huang authored and llhuii committed Jan 28, 2021
1 parent 595291e commit 7cecee8
Showing 13 changed files with 234 additions and 339 deletions.
@@ -10,4 +10,4 @@ ENV PYTHONPATH "/home/lib"
WORKDIR /home/work
COPY ./lib /home/lib

ENTRYPOINT ["python"]
ENTRYPOINT ["python"]
70 changes: 0 additions & 70 deletions examples/helmet_detection/training/train.py

This file was deleted.

@@ -1,34 +1,51 @@
# Using Incremental Learning Job in Helmet Detection Scenario

This document introduces how to use an incremental learning job in the helmet detection scenario. Using the incremental learning job, our application can automatically retrain, evaluate, and update models based on the data generated at the edge.
This document introduces how to use an incremental learning job in the helmet detection scenario.
Using the incremental learning job, our application can automatically retrain, evaluate,
and update models based on the data generated at the edge.

## Helmet Detection Experiment

### Prepare Worker Image
Build the worker image by referring to the [dockerfile](/build/worker/base_images/tensorflow/tensorflow-1.15.Dockerfile)
and set that image as the `imageHub` value in `gm-config.yaml` when you [Install Neptune](#install-neptune).
In this demo, we need to replace the contents of `requirement.txt` with:
```
flask==1.1.2
keras==2.4.3
opencv-python==4.4.0.44
websockets==8.1
Pillow==8.0.1
requests==2.24.0
tqdm==4.56.0
matplotlib==3.3.3
```
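A minimal sketch of building and publishing the worker image (the image name and tag here are only examples; use whatever name you later configure as `imageHub` in `gm-config.yaml`):
```
# Build the TensorFlow 1.15 worker base image from the repository root
docker build -f build/worker/base_images/tensorflow/tensorflow-1.15.Dockerfile \
    -t <your-registry>/neptune-tf-worker:1.15 .
# Push it so that the cloud and edge nodes can pull it
docker push <your-registry>/neptune-tf-worker:1.15
```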
### Install Neptune

Follow the [Neptune installation document](/docs/setup/install.md) to install Neptune.

### Prepare Data and Model

Download dataset and model to your node:
* step 1: download [dataset](https://edgeai-neptune.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/dataset.tar.gz)
* step 1: create the dataset directory and extract the dataset into it:
```
mkdir -p /data/helmet_detection
cd /data/helmet_detection
wget https://edgeai-neptune.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/dataset.tar.gz
tar -zxvf dataset.tar.gz
```

* step 2: download [base model](https://edgeai-neptune.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/model.tar.gz)
```
mkdir /model
cd /model
wget https://edgeai-neptune.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/model.tar.gz
tar -zxvf model.tar.gz
```
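A quick check that the data and the base model landed in the expected directories on the node:
```
ls /data/helmet_detection
ls /model
```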
### Prepare Script
Download the [scripts](/examples/helmet_detection/training) to the `code` path of your node.
Download the [scripts](/examples/helmet_detection_incremental_train/training) to the `code` path of your node.
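One possible way to fetch those scripts (the repository URL and the target directory are assumptions; adjust them to your setup):
```
# Clone the repository and copy the incremental-learning example scripts to the node
git clone https://github.com/edgeai-neptune/neptune.git
mkdir -p /code
cp -r neptune/examples/helmet_detection_incremental_train/training/* /code/
```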


### Create Incremental Job

Create Namespace: `kubectl create ns neptune-test`

Create Dataset

```
@@ -45,7 +62,7 @@ spec:
EOF
```
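To confirm that the Dataset resource was created (assuming the CRD exposes the plural name `datasets`):
```
kubectl get datasets -n neptune-test
```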

Create Initial Model
Create Initial Model to simulate the initial model in the incremental learning scenario.

```
kubectl create -f - <<EOF
@@ -163,10 +180,10 @@ EOF

### Mock Video Stream for Inference in Edge Side

* step1: install the open source video streaming server [EasyDarwin](https://github.com/EasyDarwin/EasyDarwin/tree/dev).
* step2: start EasyDarwin server.
* step3: download [video](https://edgeai-neptune.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/video.tar.gz).
* step4: push a video stream to the url (e.g., `rtsp://localhost/video`) that the inference service can connect to.
* step 1: install the open source video streaming server [EasyDarwin](https://github.com/EasyDarwin/EasyDarwin/tree/dev).
* step 2: start EasyDarwin server.
* step 3: download [video](https://edgeai-neptune.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/video.tar.gz).
* step 4: push a video stream to the url (e.g., `rtsp://localhost/video`) that the inference service can connect to.

```
wget https://github.com/EasyDarwin/EasyDarwin/releases/download/v8.1.0/EasyDarwin-linux-8.1.0-1901141151.tar.gz --no-check-certificate
@@ -180,13 +197,41 @@ tar -zxvf video.tar.gz
ffmpeg -re -i /data/video/helmet-detection.mp4 -vcodec libx264 -f rtsp rtsp://localhost/video
```
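An optional sanity check of the stream, assuming `ffprobe` is installed alongside ffmpeg:
```
# Stream metadata should be printed if the push is working
ffprobe rtsp://localhost/video
```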


### Check Incremental Job Result

### Check Incremental Learning Job
Query the service status:
```
kubectl get incrementallearningjob helmet-detection-demo -n neptune-test
```
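To follow the job as it progresses through its train, eval, and deploy stages, the standard kubectl options can be used:
```
# Watch for state changes and inspect the job's conditions
kubectl get incrementallearningjob helmet-detection-demo -n neptune-test -w
kubectl describe incrementallearningjob helmet-detection-demo -n neptune-test
```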
In the `IncrementalLearningJob` resource helmet-detection-demo, the following trigger is configured:
```
trigger:
  checkPeriodSeconds: 60
  timer:
    start: 02:00
    end: 04:00
  condition:
    operator: ">"
    threshold: 500
    metric: num_of_samples
```
In the real world, we need to label the hard examples saved under `HE_SAVED_URL` with annotation tools and then add the labeled examples to the `Dataset`'s url.
Without annotation tools, we can simulate the `num_of_samples` condition in the following way:
Download the [dataset](https://edgeai-neptune.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/dataset.tar.gz) to the cloud0 node:
```
cd /data/helmet_detection
wget https://edgeai-neptune.obs.cn-north-1.myhuaweicloud.com/examples/helmet-detection/dataset.tar.gz
tar -zxvf dataset.tar.gz
```
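A rough way to check how many samples are now available (the index-file path here is an assumption about this dataset's layout):
```
# The train trigger fires once num_of_samples exceeds the threshold of 500
wc -l /data/helmet_detection/train_data/train_data.txt
```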
The LocalController component will check the number of samples, find that the trigger conditions are met, and notify the GlobalManager component to start the train worker.
When the train worker finishes, we can view the updated model in the `/output` directory on the cloud0 node.
Then the eval worker will start to evaluate the model that the train worker generated.

After the job completes, we can view the updated model in the `/output` directory on the cloud0 node.

If the eval result satisfies the `deploySpec`'s trigger
```
trigger:
  condition:
    operator: ">"
    threshold: 0.1
    metric: precision_delta
```
the deploy worker will load the new model and provide the inference service.
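A generic way to check the state of the worker pods after the update (pod names will differ per cluster):
```
kubectl get pods -n neptune-test -o wide
```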
@@ -215,13 +215,10 @@ def read_data(self, annotation_line, input_shape=416, random=True, max_boxes=50,
return image_data, box_data

def preprocess_true_boxes(self, true_boxes, in_shape=416):
"""
Introduction
------------
Preprocess the ground truth boxes of the training data
Parameters
----------
true_boxes: ground truth box with shape [boxes, 5], x_min, y_min, x_max, y_max, class_id
"""Preprocesses the ground truth box of the training data
:param true_boxes: ground truth box shape is [boxes, 5], x_min, y_min,
x_max, y_max, class_id
"""

num_layers = self.anchors.shape[0] // 3
@@ -238,20 +235,21 @@ def preprocess_true_boxes(self, true_boxes, in_shape=416):
grid_shapes = [input_shape // 32, input_shape // 16, input_shape // 8]
y_true = [np.zeros((m, grid_shapes[l][0], grid_shapes[l][1], len(anchor_mask[l]), 5 + self.num_classes),
dtype='float32') for l in range(num_layers)]
# The dimension is expanded here so that broadcasting can later be used to compute the IoU between every box and every anchor in each image
# The dimension is expanded to calculate the IOU between the
# anchors of all boxes in each graph by broadcasting
anchors = np.expand_dims(self.anchors, 0)
anchors_max = anchors / 2.
anchors_min = -anchors_max
# Because the boxes were padded earlier, the all-zero rows need to be removed
# Because we padded the box before, we need to remove all 0 lines
valid_mask = boxes_wh[..., 0] > 0

for b in range(m):
wh = boxes_wh[b, valid_mask[b]]
if len(wh) == 0: continue

# Expand the dimensions to apply broadcasting
# Expanding dimensions for broadcasting applications
wh = np.expand_dims(wh, -2)
# The shape of wh is [box_num, 1, 2]
# wh shape is [box_num, 1, 2]
boxes_max = wh / 2.
boxes_min = -boxes_max

@@ -263,7 +261,10 @@ def preprocess_true_boxes(self, true_boxes, in_shape=416):
anchor_area = anchors[..., 0] * anchors[..., 1]
iou = intersect_area / (box_area + anchor_area - intersect_area)

# Find the anchor box with the largest IoU with the ground truth box, then set the positions at the corresponding scales responsible for that ground truth box to the ground truth box coordinates
# Find the anchor box that has the largest IoU with the ground truth
# box, and then set the corresponding positions at the different
# scales responsible for that ground truth box to the
# coordinates of the ground truth box
best_anchor = np.argmax(iou, axis=-1)
for t, n in enumerate(best_anchor):
for l in range(num_layers):
@@ -19,13 +19,10 @@ def main():

model = validate

model = neptune.incremental_learning.evaluate(model=model,
test_data=test_data,
class_names=class_names,
input_shape=input_shape)

# Save the model based on the config.
# kubeedge_ai.incremental_learning.save_model(model)
neptune.incremental_learning.evaluate(model=model,
test_data=test_data,
class_names=class_names,
input_shape=input_shape)


if __name__ == '__main__':
@@ -1,9 +1,9 @@
import logging
import os
import time

import cv2
import numpy as np
import os

import neptune
from neptune.incremental_learning import InferenceResult
@@ -165,7 +165,7 @@ def avg_checkpoints(self):

logging.info("average checkpoints end .......")

def save_model_pb(self):
def save_model_pb(self, saved_model_name):
"""
save model as a single pb file from checkpoint
"""
@@ -189,6 +189,6 @@ def save_model_pb(self):
print('output_tensors : ', output_tensors)
output_tensors = [t.op.name for t in output_tensors]
graph = tf.graph_util.convert_variables_to_constants(sess, input_graph_def, output_tensors)
tf.train.write_graph(graph, model.model_dir, 'model.pb', False)
tf.train.write_graph(graph, model.model_dir, saved_model_name, False)

logging.info("save model as .pb end .......")
