# Object Following - Live Demo

In this notebook we'll show how you can follow an object with JetBot!  We'll use a pre-trained neural network
that was trained on the [COCO dataset](http://cocodataset.org) to detect 90 different common objects.  These include

* Person (index 1)
* Cup (index 47)

and many others (you can check [this file](https://github.com/tensorflow/models/blob/master/research/object_detection/data/mscoco_complete_label_map.pbtxt) for a full list of class indices).  The model is sourced from the [TensorFlow object detection API](https://github.com/tensorflow/models/tree/master/research/object_detection),
which also provides utilities for training object detectors for custom tasks.  Once the model is trained, we optimize it using NVIDIA TensorRT on the Jetson Nano.

This makes the network very fast, capable of real-time execution on Jetson Nano!  We won't run through all of the training and optimization steps in this notebook though.

Anyway, let's get started.  First, we'll import the ``ObjectDetector`` class, which takes our pre-trained SSD engine.

### Compute detections on a single camera image

Load the object detection model (``ssd_mobilenet_v2_coco.engine``).  
``ssd_mobilenet_v2_coco.engine`` was obtained from this page: https://faboplatform.github.io/JetbotDocs/07.Object%20Following/02.JetPack4.3/02.run/ 

> There are several versions of this model, **and you must use the one that matches your JetPack (the Jetson Nano OS) version**.  
You can check the version with ``$ cat /etc/nv_tegra_release``;  
see https://nisshingeppo.com/ai/jetson-install-os/#toc6 for how its output maps to a JetPack version.  Note: this machine runs **JetPack 4.3**.
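
If you would rather check from Python, the sketch below parses that file. It assumes the first line has the usual ``# R32 (release), REVISION: 3.1, ...`` layout and returns the L4T release string (JetPack 4.3, for example, ships L4T R32.3.1); you still need the table linked above to map it to a JetPack version. ``parse_l4t_release`` is a hypothetical helper, not part of jetbot.

```python
import re

# Hypothetical helper: extract the L4T release from /etc/nv_tegra_release text.
# Assumes a first line like "# R32 (release), REVISION: 3.1, GCID: ..., BOARD: ..."
_RELEASE_RE = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*(\d+(?:\.\d+)*)")


def parse_l4t_release(text):
    """Return the L4T release as a string such as '32.3.1'."""
    match = _RELEASE_RE.search(text)
    if match is None:
        raise ValueError("unrecognized nv_tegra_release format")
    major, revision = match.groups()
    return f"{major}.{revision}"


# On the Jetson itself:
#   with open("/etc/nv_tegra_release") as f:
#       print(parse_l4t_release(f.read()))
```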

In [1]:
from jetbot import ObjectDetector

model = ObjectDetector('ssd_mobilenet_v2_coco.engine')  # load the TensorRT engine so we can detect objects

Internally, the ``ObjectDetector`` class uses the TensorRT Python API to execute the engine that we provide.  It also takes care of preprocessing the input to the neural network, as
well as parsing the detected objects.  Right now it will only work for engines created using the ``jetbot.ssd_tensorrt`` package. That package has the utilities for converting
the model from the TensorFlow object detection API to an optimized TensorRT engine.

Next, let's initialize our camera.  Our detector takes 300x300 pixel input, so we'll set this when creating the camera.

> Internally, the Camera class uses GStreamer to take advantage of Jetson Nano's Image Signal Processor (ISP).  This is super fast and offloads
> a lot of the resizing computation from the CPU. 
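
For a sense of what that looks like, here is a hedged sketch of a Jetson GStreamer pipeline string in which ``nvvidconv`` does the resize in hardware before the frame reaches the CPU. The function name, capture size, and frame rate are illustrative assumptions, not necessarily the values the jetbot ``Camera`` class uses.

```python
def jetson_camera_pipeline(width=300, height=300,
                           capture_width=1280, capture_height=720, fps=30):
    """Hypothetical helper: build a GStreamer pipeline string for a CSI camera.

    nvarguscamerasrc captures through the ISP into NVMM (GPU-accessible) memory,
    nvvidconv resizes and converts the color format in hardware, and only the
    small BGR frame is handed to the CPU through appsink.
    """
    return (
        "nvarguscamerasrc ! "
        f"video/x-raw(memory:NVMM), width=(int){capture_width}, height=(int){capture_height}, "
        f"format=(string)NV12, framerate=(fraction){fps}/1 ! "
        "nvvidconv ! "
        f"video/x-raw, width=(int){width}, height=(int){height}, format=(string)BGRx ! "
        "videoconvert ! video/x-raw, format=(string)BGR ! appsink"
    )
```

With an OpenCV build that has GStreamer support, such a string can be opened with ``cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)``.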

In [2]:
from jetbot import Camera

camera = Camera.instance(width=300, height=300)  # start the camera at the detector's 300x300 input size

Now, let's execute our network using some camera input.  By default the ``ObjectDetector`` class expects ``bgr8`` format that the camera produces.  However,
you could override the default pre-processing function if your input is in a different format.
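
If your frames arrive as RGB rather than ``bgr8``, a replacement preprocessing function could look like the sketch below. It assumes the engine expects a float32 batch in NCHW layout with pixel values scaled to [-1, 1], which is common for SSD-MobileNet models but is an assumption here; check how your jetbot version's ``ObjectDetector`` preprocesses input, and how its constructor accepts a custom function, before passing this in. ``rgb8_to_ssd_input`` is a hypothetical name.

```python
import numpy as np


def rgb8_to_ssd_input(frame):
    """Hypothetical preprocessing for an RGB uint8 frame of shape (H, W, 3).

    The input is already RGB, so no BGR->RGB conversion is needed.
    """
    x = frame.astype(np.float32)
    x = x.transpose((2, 0, 1))   # HWC -> CHW
    x = (x - 127.5) / 127.5      # [0, 255] -> [-1, 1] (assumed input range)
    return x[None, ...]          # add batch dimension -> (1, 3, H, W)
```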

For the mapping from label values to class names, see: https://github.com/tensorflow/models/blob/master/research/object_detection/data/mscoco_complete_label_map.pbtxt

In [5]:
# Detect objects in the current camera frame
detections = model(camera.value)

print(detections)  # each detection has a label, a confidence score, and a bbox

[[{'label': 62, 'confidence': 0.8789390921592712, 'bbox': [0.6567592024803162, 0.31460779905319214, 0.7879235148429871, 0.5521348118782043]}, {'label': 62, 'confidence': 0.8735388517379761, 'bbox': [0.16717860102653503, 0.35617706179618835, 0.7373985052108765, 0.967941164970398]}, {'label': 62, 'confidence': 0.643105149269104, 'bbox': [0.26959696412086487, 0.47950971126556396, 0.9691752195358276, 0.9773321151733398]}, {'label': 62, 'confidence': 0.6047490835189819, 'bbox': [0.05567103624343872, 0.6592293977737427, 0.9747286438941956, 0.980849027633667]}, {'label': 62, 'confidence': 0.5992506742477417, 'bbox': [0.8408224582672119, 0.3700416684150696, 0.9882800579071045, 0.6402749419212341]}, {'label': 62, 'confidence': 0.5626571774482727, 'bbox': [0.21289262175559998, 0.3200993537902832, 0.3429218828678131, 0.5343102812767029]}, {'label': 62, 'confidence': 0.5131805539131165, 'bbox': [0.34787604212760925, 0.3361283838748932, 0.7043672800064087, 0.8462303876876831]}, {'label': 1, 'confid

If there are any COCO objects in the camera's field of view, they should now be stored in the ``detections`` variable.
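
The printed list can be long and includes low-confidence guesses. As a quick way to see what is actually in view, the hypothetical helper below counts detections per label above a confidence threshold; the 0.5 threshold is an arbitrary assumption, so tune it for your scene.

```python
from collections import Counter


def summarize(detections, min_confidence=0.5):
    """Count detections per label, keeping only those at or above min_confidence.

    Returns one Counter per image in the batch (this notebook has a single image).
    """
    return [
        Counter(d['label'] for d in image_dets if d['confidence'] >= min_confidence)
        for image_dets in detections
    ]


# Usage in the notebook: print(summarize(detections))
```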

### Display detections in text area

We'll use the code below to print out the detected objects.

In [6]:
# Display the detected objects
from IPython.display import display
import ipywidgets.widgets as widgets

detections_widget = widgets.Textarea()
detections_widget.value = str(detections)

display(detections_widget)  # show them in a text area

You should see the label, confidence, and bounding box of each object detected in each image.  There's only one image (our camera) in this example.  
Bounding box: the rectangle that encloses an object (https://aiacademy.jp/media/?p=1173).
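
Judging by how the drawing code later in this notebook uses it, each ``bbox`` is ``[x_min, y_min, x_max, y_max]`` expressed as fractions of the image width and height. The hypothetical helper below converts it to integer pixel corners in the form OpenCV's ``cv2.rectangle`` takes.

```python
def bbox_to_pixels(bbox, width, height):
    """Convert a normalized [x_min, y_min, x_max, y_max] bbox to pixel corners."""
    x_min, y_min, x_max, y_max = bbox
    top_left = (int(width * x_min), int(height * y_min))
    bottom_right = (int(width * x_max), int(height * y_max))
    return top_left, bottom_right


# Usage: top_left, bottom_right = bbox_to_pixels(det['bbox'], 300, 300)
#        cv2.rectangle(image, top_left, bottom_right, (255, 0, 0), 2)
```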

To print just the first object detected in the first image, run the following:

> This may throw an error if no objects are detected

In [7]:
# Print the first object detected in the first image
image_number = 0
object_number = 0

print(detections[image_number][object_number])  # detections[<image index>][<object index within that image>]

{'label': 62, 'confidence': 0.8789390921592712, 'bbox': [0.6567592024803162, 0.31460779905319214, 0.7879235148429871, 0.5521348118782043]}
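
Because indexing raises an ``IndexError`` when nothing was detected, a guarded lookup such as the hypothetical helper below returns ``None`` instead, which is easier to handle inside a loop.

```python
def first_detection(detections, image_number=0, object_number=0):
    """Return detections[image_number][object_number], or None if it does not exist."""
    try:
        return detections[image_number][object_number]
    except IndexError:
        return None


# Usage: det = first_detection(detections)
#        print(det if det is not None else "no objects detected")
```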


### Control robot to follow central object

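The robot follows an object of the selected class with these steps:

1.  Detect the objects that match the selected class
2.  Take the matching object closest to the center of the camera view as the target
3.  Drive toward the target chosen in step 2; if there is no target, wander (keep driving forward)
4.  If an obstacle is detected, turn left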
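
The decision logic in these steps can be sketched as a pure function that needs no camera or motors, which makes it easy to test. It mirrors the ``execute`` function defined later, which checks for an obstacle before anything else; ``plan_step`` and its 0.5 threshold default are assumptions for illustration.

```python
import math


def detection_center(det):
    """Offset of the bbox center from the image center, each axis in [-0.5, 0.5]."""
    x_min, y_min, x_max, y_max = det['bbox']
    return ((x_min + x_max) / 2.0 - 0.5, (y_min + y_max) / 2.0 - 0.5)


def plan_step(prob_blocked, detections, target_label, block_threshold=0.5):
    """Decide what to do for one frame.

    Returns ('avoid', None), ('wander', None) or ('track', target_detection).
    """
    if prob_blocked > block_threshold:                               # step 4: obstacle ahead, turn left
        return ('avoid', None)
    matches = [d for d in detections if d['label'] == target_label]  # step 1: filter by class
    if not matches:                                                  # step 3: no target, keep wandering
        return ('wander', None)
    # step 2: the target is the matching detection closest to the image center
    target = min(matches, key=lambda d: math.hypot(*detection_center(d)))
    return ('track', target)
```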

We'll also create some widgets to control the target object label, the robot speed, and
a "turn gain", which controls how fast the robot turns based on the distance between the target object
and the center of the robot's field of view.
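
Concretely, the motor commands used later in ``execute`` are ``speed + turn_gain * x`` for the left wheel and ``speed - turn_gain * x`` for the right, where ``x`` is the target's horizontal offset from the image center (from -0.5 at the left edge to 0.5 at the right). The sketch below, with a hypothetical ``steer`` helper, works through the widget defaults.

```python
def steer(speed, turn_gain, center_x):
    """Return (left, right) motor values for a target at horizontal offset center_x."""
    left = speed + turn_gain * center_x   # target on the right (center_x > 0) speeds up the left wheel
    right = speed - turn_gain * center_x  # and slows the right wheel, so the robot turns right
    return left, right


# Widget defaults: speed 0.2, turn gain 0.4; target a quarter of the frame width right of center
left, right = steer(0.2, 0.4, 0.25)
print(round(left, 3), round(right, 3))  # 0.3 0.1
```

A larger turn gain turns harder for the same offset. At the slider maximum (2.0) with a target near the edge, one wheel's value goes negative, so that wheel drives backwards and the robot pivots sharply.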

First, let's load our collision detection model.  The code below loads ``../collision_avoidance/best_model.pth``, the model produced by the collision avoidance example, which should be tuned to your robot's environment.  If your model file is stored somewhere else, change the path; otherwise ``torch.load`` raises a ``FileNotFoundError``.

In [8]:
# Load the collision detection model
import torch
import torchvision
import torch.nn.functional as F
import cv2
import numpy as np

# Build an AlexNet with random weights; the trained weights come from the checkpoint below
collision_model = torchvision.models.alexnet(pretrained=False)
# Replace the final layer with a 2-output linear layer (in_features -> 2 classes) to match the checkpoint
collision_model.classifier[6] = torch.nn.Linear(collision_model.classifier[6].in_features, 2)
collision_model.load_state_dict(torch.load('../collision_avoidance/best_model.pth'))
device = torch.device('cuda')
collision_model = collision_model.to(device)
collision_model.eval()  # inference mode: disables AlexNet's dropout layers

# ImageNet mean/std, scaled by 255 because preprocess() does not divide pixels by 255
mean = 255.0 * np.array([0.485, 0.456, 0.406])
stdev = 255.0 * np.array([0.229, 0.224, 0.225])

normalize = torchvision.transforms.Normalize(mean, stdev)

def preprocess(camera_value):
    """Convert a BGR camera frame into a normalized 1x3x224x224 tensor on the GPU."""
    x = cv2.resize(camera_value, (224, 224))
    x = cv2.cvtColor(x, cv2.COLOR_BGR2RGB)
    x = x.transpose((2, 0, 1))        # HWC -> CHW
    x = torch.from_numpy(x).float()
    x = normalize(x)
    x = x.to(device)
    return x[None, ...]               # add batch dimension

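
``preprocess`` never divides pixels by 255, yet the constants are the standard ImageNet mean and standard deviation. Scaling both by 255 compensates exactly, since (p - 255m) / (255s) = (p/255 - m) / s. The NumPy check below confirms this on one sample pixel (the pixel values are arbitrary).

```python
import numpy as np

# Standard ImageNet statistics, the same numbers as in the collision model cell
imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_std = np.array([0.229, 0.224, 0.225])

pixel = np.array([200.0, 120.0, 30.0])  # one arbitrary RGB pixel in [0, 255]

# What the notebook does: raw 0-255 values normalized with statistics scaled by 255
notebook_style = (pixel - 255.0 * imagenet_mean) / (255.0 * imagenet_std)
# The usual recipe: scale to [0, 1] first, then normalize with the unscaled statistics
imagenet_style = (pixel / 255.0 - imagenet_mean) / imagenet_std

print(np.allclose(notebook_style, imagenet_style))  # True
```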

Great, now let's initialize our robot so we can control the motors.

In [7]:
# Prepare to drive the motors
from jetbot import Robot

robot = Robot()

Finally, let's display all the control widgets and connect the network execution function to the camera updates.

In [8]:
from jetbot import bgr8_to_jpeg

# Create the control widgets
blocked_widget = widgets.FloatSlider(min=0.0, max=1.0, value=0.0, description='blocked')
image_widget = widgets.Image(format='jpeg', width=300, height=300)
label_widget = widgets.IntText(value=1, description='tracked label')
speed_widget = widgets.FloatSlider(value=0.2, min=0.0, max=1.0, description='speed')
turn_gain_widget = widgets.FloatSlider(value=0.4, min=0.0, max=2.0, description='turn gain')

# Display the widgets
display(widgets.VBox([
    widgets.HBox([image_widget, blocked_widget]),
    label_widget,
    speed_widget,
    turn_gain_widget
]))

width = int(image_widget.width)
height = int(image_widget.height)

def detection_center(detection):
    """Computes the x, y offset of the object's center from the image center (each in [-0.5, 0.5])"""
    bbox = detection['bbox']
    center_x = (bbox[0] + bbox[2]) / 2.0 - 0.5
    center_y = (bbox[1] + bbox[3]) / 2.0 - 0.5
    return (center_x, center_y)

def norm(vec):
    """Computes the length of the 2D vector"""
    return np.sqrt(vec[0]**2 + vec[1]**2)

def closest_detection(detections):
    """Finds the detection closest to the image center (None if the list is empty)"""
    closest = None
    for det in detections:
        if closest is None or norm(detection_center(det)) < norm(detection_center(closest)):
            closest = det
    return closest
        
def execute(change):
    image = change['new']
    
    # execute collision model to determine if blocked
    collision_output = collision_model(preprocess(image)).detach().cpu()
    prob_blocked = float(F.softmax(collision_output.flatten(), dim=0)[0])
    blocked_widget.value = prob_blocked
    
    # Obstacle avoidance: turn left when the collision model says we're blocked
    if prob_blocked > 0.5:
        robot.left(0.2)
        image_widget.value = bgr8_to_jpeg(image)
        return
        
    # compute all detected objects
    detections = model(image)
    
    # draw all detections on image
    for det in detections[0]:
        bbox = det['bbox']
        cv2.rectangle(image, (int(width * bbox[0]), int(height * bbox[1])), (int(width * bbox[2]), int(height * bbox[3])), (255, 0, 0), 2)
    
    # select detections that match selected class label
    matching_detections = [d for d in detections[0] if d['label'] == int(label_widget.value)]
    
    # get detection closest to center of field of view and draw it
    det = closest_detection(matching_detections)
    if det is not None:
        bbox = det['bbox']
        cv2.rectangle(image, (int(width * bbox[0]), int(height * bbox[1])), (int(width * bbox[2]), int(height * bbox[3])), (0, 255, 0), 5)

    # go forward if no target is detected
    if det is None:
        robot.forward(float(speed_widget.value))

    # otherwise steer towards the target
    else:
        # move forward and steer in proportion to the target's x-offset from the image center
        center = detection_center(det)
        robot.set_motors(
            float(speed_widget.value + turn_gain_widget.value * center[0]),
            float(speed_widget.value - turn_gain_widget.value * center[0])
        )
    
    # update image widget
    image_widget.value = bgr8_to_jpeg(image)
    
execute({'new': camera.value})  # run once on the current frame to test the pipeline (this may move the robot)

Run the block below to connect the ``execute`` function to each camera frame update.

In [9]:
camera.unobserve_all()
camera.observe(execute, names='value')
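
The camera notifies observers through a traitlets-style ``observe`` callback that receives a ``change`` dict whose ``'new'`` key holds the latest frame, which is why ``execute`` reads ``change['new']``. The toy class below is a hypothetical stand-in, not the jetbot or traitlets implementation. It illustrates why ``unobserve_all()`` runs first: re-running the cell that defines ``execute`` creates a new function object, and observing it again without clearing would leave the old handler attached too, so each frame would be processed twice.

```python
class FakeCamera:
    """Hypothetical stand-in for an observable camera; not the jetbot or traitlets code."""

    def __init__(self):
        self.value = None
        self._handlers = []

    def observe(self, handler, names='value'):
        # Only the 'value' trait exists here; `names` is accepted to mirror the call style.
        self._handlers.append(handler)

    def unobserve_all(self):
        self._handlers.clear()

    def push_frame(self, frame):
        """Store a new frame and notify every handler with a traitlets-style change dict."""
        old, self.value = self.value, frame
        for handler in list(self._handlers):
            handler({'name': 'value', 'old': old, 'new': frame, 'owner': self, 'type': 'change'})


calls = []

def execute_v1(change):
    calls.append(change['new'])

def execute_v2(change):  # e.g. `execute` redefined by re-running its cell
    calls.append(change['new'])

cam = FakeCamera()
cam.observe(execute_v1, names='value')
cam.observe(execute_v2, names='value')  # without unobserve_all(), both versions stay attached
cam.push_frame('frame-1')               # calls == ['frame-1', 'frame-1']

cam.unobserve_all()                     # what the notebook cell does first
cam.observe(execute_v2, names='value')
cam.push_frame('frame-2')               # only one handler runs now
```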

Awesome!  If the robot is not blocked you should see boxes drawn around the detected objects in blue.  The target object (which the robot follows) will be displayed in green.

The robot should steer towards the target when it is detected.  If it is blocked by an object it will simply turn left.

``tracked label``: selects the label to track; the default is 1 (person).  
Changing this value changes which object the robot follows.  
Label list: https://github.com/tensorflow/models/blob/master/research/object_detection/data/mscoco_complete_label_map.pbtxt
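
To look up an id by name instead of scanning that file by eye, the hypothetical helper below pulls ``id`` and ``display_name`` pairs out of the label map text with regular expressions. It assumes the simple ``item { ... }`` layout the linked file uses and is not a full protobuf text-format parser.

```python
import re

_ITEM_RE = re.compile(r"item\s*\{(.*?)\}", re.DOTALL)
_ID_RE = re.compile(r"\bid:\s*(\d+)")
_NAME_RE = re.compile(r'display_name:\s*"([^"]*)"')


def parse_label_map(pbtxt_text):
    """Hypothetical helper: map display names to label ids in a label map .pbtxt string."""
    labels = {}
    for body in _ITEM_RE.findall(pbtxt_text):
        id_match = _ID_RE.search(body)
        name_match = _NAME_RE.search(body)
        if id_match and name_match:
            labels[name_match.group(1)] = int(id_match.group(1))
    return labels
```

After downloading the file, something like ``label_widget.value = parse_label_map(text)['cup']`` would switch the robot to following cups (id 47).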

You can call the code block below to manually disconnect the processing from the camera and stop the robot.

In [2]:
# Stop the camera and the robot
import time

camera.unobserve_all()
camera.stop()
time.sleep(1.0)
robot.stop()
