## Project Title: Building the BrainAI Emotion Recognition AI System
### Hands-on: Developing an Emotion Recognition AI System with OpenVINO Pre-trained Models

### (1) Read about the face detection model (<span style="color:blue">face-detection-adas-0001</span>) [here](https://docs.openvino.ai/2022.1/omz_models_model_face_detection_adas_0001.html).

<img src="https://docs.openvino.ai/2022.1/_images/face-detection-adas-0001.png" style="width:400px; float:left;" />
<div style="clear:both;"></div>

The network (model) outputs a blob with shape [1, 1, N, 7], <br>
where N is the number of detected bounding boxes. Each detection is described in the following format:<br>
<b>[image_id, label, conf, x_min, y_min, x_max, y_max]</b>

A very basic introduction to using face detection models with OpenVINO™.

The [Model/face-detection-adas-0001](https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/intel/face-detection-adas-0001/README.md) model from [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo/) is used. It detects faces in images and returns a blob of data with shape `1, 1, 200, 7`, in the format `1, 1, N, 7`, where `N` is the number of detected bounding boxes. The results are sorted by confidence in decreasing order. Each detection has the format [`image_id`, `label`, `conf`, `x_min`, `y_min`, `x_max`, `y_max`], where:

- `image_id` - ID of the image in the batch
- `label` - predicted class ID (1 - face)
- `conf` - confidence for the predicted class
- (`x_min`, `y_min`) - coordinates of the top left bounding box corner
- (`x_max`, `y_max`) - coordinates of the bottom right bounding box corner
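
The detection format above can be unpacked with a few lines of NumPy. The following is a minimal sketch, not the notebook's own code: `parse_detections` is a hypothetical helper, and `fake_blob` holds made-up values rather than real model output. It assumes the coordinates are normalized to the range 0 to 1, which is how the notebook scales them later.

```python
import numpy as np

def parse_detections(blob, frame_w, frame_h, conf_threshold=0.5):
    """Turn a [1, 1, N, 7] blob into pixel-space (xmin, ymin, xmax, ymax, conf) tuples."""
    detections = blob[0, 0]                                   # shape (N, 7)
    keep = detections[detections[:, 2] > conf_threshold]      # filter on the conf column
    scale = np.array([frame_w, frame_h, frame_w, frame_h])    # normalized -> pixels
    results = []
    for det in keep:
        xmin, ymin, xmax, ymax = (det[3:7] * scale).astype(int)
        results.append((int(xmin), int(ymin), int(xmax), int(ymax), float(det[2])))
    return results

# Synthetic blob with N = 3 (illustrative values only).
fake_blob = np.array([[[
    [0, 1, 0.98, 0.125, 0.25, 0.375, 0.75],
    [0, 1, 0.75, 0.5,   0.25, 0.75,  0.75],
    [0, 1, 0.12, 0.0,   0.0,  0.125, 0.125],
]]], dtype=np.float32)

faces = parse_detections(fake_blob, frame_w=640, frame_h=480)
for face in faces:
    print(face)
```

Only the first two rows pass the 0.5 threshold; the third is dropped as a low-confidence detection.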

### (2) Read about the emotion recognition model (<span style="color:blue">emotions-recognition-retail-0003</span>) [here](https://docs.openvino.ai/2024/omz_models_model_emotions_recognition_retail_0003.html).
<b>Use Case and High-Level Description</b><br>
Fully convolutional network for recognition of <b><span style="color:red">five emotions (‘neutral’, ‘happy’, ‘sad’, ‘surprise’, ‘anger’).</span></b>

<b>Validation Dataset</b><br>
For the metrics evaluation, the validation part of the AffectNet dataset is used. A subset containing only images of the five aforementioned emotions is chosen. The total number of images used in validation is 2,500.

<b>Inputs</b><br>
Image, name: data, <b><span style="color:red">shape: 1, 3, 64, 64</span></b> in 1, C, H, W format, where:

C - number of channels

H - image height

W - image width

Expected color order is BGR.

<b>Outputs</b><br>
Name: prob_emotion, <b><span style="color:red">shape: 1, 5, 1, 1</span></b> - Softmax output across <b><span style="color:red">five emotions (0 - ‘neutral’, 1 - ‘happy’, 2 - ‘sad’, 3 - ‘surprise’, 4 - ‘anger’)</span></b>.
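
As a rough illustration of how the `1, 5, 1, 1` softmax output maps to a label, the sketch below flattens the tensor and takes the arg-max. `decode_emotion` and `EMOTIONS` are names introduced here for illustration; the sample probabilities are copied from one of the prediction lines printed later in this notebook.

```python
import numpy as np

# Index order as documented for prob_emotion.
EMOTIONS = ("neutral", "happy", "sad", "surprise", "anger")

def decode_emotion(prob_emotion):
    """Return (label, probability) for the most likely emotion."""
    probs = np.asarray(prob_emotion).reshape(-1)   # (1, 5, 1, 1) -> (5,)
    index = int(np.argmax(probs))
    return EMOTIONS[index], float(probs[index])

sample = np.array([0.0036957, 0.9840232, 0.00687443, 0.00285215, 0.00255447],
                  dtype=np.float32).reshape(1, 5, 1, 1)
label, score = decode_emotion(sample)
print(label, round(score, 3))
```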

## Download Models

In [4]:
!omz_downloader --name face-detection-adas-0001 --precision FP16

################|| Downloading face-detection-adas-0001 ||################

... 100%, 304 KB, 78 KB/s, 3 seconds passed

... 49%, 1024 KB, 169 KB/s, 6 seconds passed
... 99%, 2048 KB, 277 KB/s, 7 seconds passed
... 100%, 2056 KB, 277 KB/s, 7 seconds passed



In [5]:
!omz_downloader --name emotions-recognition-retail-0003 --precision FP16

################|| Downloading emotions-recognition-retail-0003 ||################

... 100%, 54 KB, 47 KB/s, 1 seconds passed

... 21%, 1024 KB, 728 KB/s, 1 seconds passed
... 42%, 2048 KB, 1236 KB/s, 1 seconds passed
... 63%, 3072 KB, 1611 KB/s, 1 seconds passed
... 84%, 4096 KB, 1913 KB/s, 2 seconds passed
... 100%, 4848 KB, 2096 KB/s, 2 seconds passed
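
The load cells further down read the models from `./models/`, so it helps to know where the downloader put the files. The sketch below builds the expected path under the assumption that `omz_downloader` uses its usual layout of `<output_dir>/intel/<name>/<precision>/<name>.xml`; verify this against your installed version. `omz_model_path` is a hypothetical helper, not part of Open Model Zoo.

```python
from pathlib import Path

def omz_model_path(name, precision="FP16", output_dir=".", source="intel"):
    """Build the assumed location of a downloaded model's .xml file."""
    return Path(output_dir) / source / name / precision / f"{name}.xml"

face_xml = omz_model_path("face-detection-adas-0001")
emotion_xml = omz_model_path("emotions-recognition-retail-0003")
print(face_xml.as_posix())
print(emotion_xml.as_posix())
```

If the files are found there, either copy the `.xml` and `.bin` pairs into `./models/` or pass these paths to `core.read_model` instead.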



## Required Libraries

In [6]:
import openvino as ov
import cv2
import numpy as np

import matplotlib.pyplot as plt
from pathlib import Path

## Check Available Inference Devices

In [8]:
core = ov.Core()
options = core.available_devices
options

['CPU', 'GPU']
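
The notebook hard-codes `"CPU"` when compiling the models. One possible way to choose from the list above is sketched here; `pick_device` and its preference order are assumptions for illustration, and the function works on plain strings so it can be tried without OpenVINO installed.

```python
def pick_device(available, preferred=("GPU", "CPU")):
    """Return the first preferred device name that appears in `available`."""
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError(f"none of {preferred} found in {available}")

print(pick_device(["CPU", "GPU"]))
print(pick_device(["CPU"]))
```

The result could then be passed as `device_name` to `core.compile_model`. Systems with several GPUs may report names such as `GPU.0`, which this simple exact match does not handle.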

## Load Models

In [9]:
model = core.read_model(model='./models/face-detection-adas-0001.xml')
face_model = core.compile_model(model=model, device_name="CPU")

face_input_layer = face_model.input(0)
face_output_layer = face_model.output(0)

print("Input layer shape: ", face_input_layer.shape)
print("Output layer shape:", face_output_layer.shape)

Input layer shape:  [1,3,384,672]
Output layer shape: [1,1,200,7]


In [10]:
model = core.read_model(model='./models/emotions-recognition-retail-0003.xml')
emotion_model = core.compile_model(model=model, device_name="CPU")

emotion_input_layer = emotion_model.input(0)
emotion_output_layer = emotion_model.output(0)

print("Input layer shape: ", emotion_input_layer.shape)
print("Output layer shape:", emotion_output_layer.shape)

Input layer shape:  [1,3,64,64]
Output layer shape: [1,5,1,1]
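
The printed input shapes are in N, C, H, W order, while `cv2.resize` expects `dsize` as (width, height). The sketch below derives the resize target from a shape instead of hard-coding `(672, 384)` and `(64, 64)`. `resize_dsize` is a hypothetical helper, and it assumes the shape is static and can be iterated as four integers.

```python
def resize_dsize(input_shape):
    """Convert an [N, C, H, W] shape into the (width, height) order used by cv2.resize."""
    _, _, height, width = (int(d) for d in input_shape)
    return (width, height)

print(resize_dsize([1, 3, 384, 672]))  # (672, 384)
print(resize_dsize([1, 3, 64, 64]))    # (64, 64)
```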


## Load Image

In [22]:
frame = cv2.imread("images/emotions.jpg")

resized_frame = cv2.resize(src=frame, dsize=(672, 384)) 
transposed_frame = resized_frame.transpose(2, 0, 1)
input_frame = np.expand_dims(transposed_frame, 0)
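
To see what the transpose and `expand_dims` steps do to the array layout, the following sketch runs them on a blank array in place of a real image. `to_nchw` is a name introduced here; the dummy array stands in for the output of `cv2.resize`, which is in height, width, channel order.

```python
import numpy as np

def to_nchw(image_hwc):
    """Reorder an (H, W, C) image to (1, C, H, W) for the model input."""
    chw = image_hwc.transpose(2, 0, 1)   # (H, W, C) -> (C, H, W)
    return np.expand_dims(chw, 0)        # add the batch dimension

dummy = np.zeros((384, 672, 3), dtype=np.uint8)   # same layout as a resized BGR frame
batch = to_nchw(dummy)
print(batch.shape)  # (1, 3, 384, 672)
```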

## DrawBoundingBoxes

In [17]:
def DrawBoundingBoxes(output, frame, conf=0.5):
    boxes = []
    h, w = frame.shape[:2]

    predictions = output[0][0]            # (N, 7) array of detections
    confidence = predictions[:, 2]        # conf column of [image_id, label, conf, x_min, y_min, x_max, y_max]

    top_predictions = predictions[confidence > conf]   # keep only detections above the confidence threshold

    for detection in top_predictions:
        box = (detection[3:7] * np.array([w, h, w, h])).astype("int")   # scale normalized coordinates to pixels
        box = np.clip(box, 0, [w, h, w, h])                             # keep the box inside the frame so later crops stay valid
        xmin, ymin, xmax, ymax = (int(v) for v in box)
        cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 0, 255), 2)   # draw the rectangle on the frame
        boxes.append(box)                 # store the position of the face that was drawn

    return boxes

## DrawText

In [18]:
def DrawText(output, frame, face_position):
    # Map class indices to emotion labels
    emotions = {
        0:"neutral",
        1:"happy",
        2:"sad",
        3:"surprise",
        4:"anger"
    }
    # Print the dictionary (debugging aid)
    #for key, value in emotions.items():
    #    print(key, value, end='      ')
    #print()
        
    predictions = output[0,:,0,0]              # the five emotion probabilities
    print("predictions : " + str(predictions))
    
    topresult_index = np.argmax(predictions)   # index of the highest probability
    #print("topresult_index : " + str(topresult_index))
    
    emotion = emotions[topresult_index]        # look up the label for that index
    #print("emotion : " + emotion)
    
    cv2.putText(frame, emotion,                 # draw the predicted emotion on the image
                (face_position[0],face_position[1]),    # anchor the text at (xmin, ymin)
                cv2.FONT_HERSHEY_SIMPLEX, 1, 
                (255, 0,0), 2)
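
`cv2.putText` treats its position argument as the bottom-left corner of the text, so a face near the top edge of the image can push the label out of view. One possible adjustment is sketched below. `label_origin` is a hypothetical helper, and the 30-pixel minimum is an illustrative guess for font scale 1, not a measured value (`cv2.getTextSize` could be used to measure it).

```python
def label_origin(box, min_baseline=30):
    """Pick a text anchor at the box's top-left corner, pushed down if it is too close to the top edge."""
    xmin, ymin = int(box[0]), int(box[1])
    return (max(xmin, 0), max(ymin, min_baseline))

print(label_origin((50, 5, 120, 90)))     # face touching the top edge
print(label_origin((50, 200, 120, 300)))  # face well inside the frame
```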

## Emotion Recognition

In [23]:
face_output = face_model([input_frame])[face_output_layer]
boxes = DrawBoundingBoxes(face_output, frame, conf=0.5)

In [24]:
if boxes is not None:
    
    for box in boxes:          # iterate over the stored face positions
    
        xmin, ymin, xmax, ymax = box      # unpack the box coordinates
        emotion_input = frame[ymin:ymax,xmin:xmax]         # crop the face region from the image
        if emotion_input.size == 0:       # skip degenerate boxes; cv2.resize fails on an empty crop
            continue
        
        # Preprocess the crop for the emotion recognition model
        # Input layer shape:  [1,3,64,64]
        resized_image = cv2.resize(src=emotion_input, dsize=(64, 64))      # resize the crop   (64,64,3)
        transposed_image = resized_image.transpose(2, 0, 1)                # HWC -> CHW        (3,64,64)
        input_image = np.expand_dims(transposed_image, 0)                  # add batch dim     (1,3,64,64)

        emotion_output = emotion_model([input_image])[emotion_output_layer]  # run emotion inference
        DrawText(emotion_output, frame, box)   # draw the inference result on the image

predictions : [0.47297916 0.00922722 0.03208939 0.12683502 0.35886925]
predictions : [1.8001848e-04 1.5403839e-03 9.2567629e-01 2.4386642e-04 7.2359510e-02]
predictions : [5.2245539e-03 7.2852598e-04 8.5551449e-04 9.9174678e-01 1.4446168e-03]
predictions : [1.6682535e-01 8.2122606e-01 9.5314868e-03 1.8027051e-04 2.2368992e-03]
predictions : [3.9358577e-04 9.7996449e-01 1.5637865e-02 2.5980247e-03 1.4059979e-03]
predictions : [5.8812596e-04 5.6197669e-02 1.8172298e-04 9.1954255e-01 2.3489935e-02]
predictions : [6.7086010e-05 9.9930334e-01 7.4545591e-05 4.5859793e-04 9.6366552e-05]
predictions : [5.3750433e-04 1.4339949e-04 1.6935905e-02 4.8260146e-05 9.8233497e-01]
predictions : [0.0036957  0.9840232  0.00687443 0.00285215 0.00255447]
predictions : [1.05269134e-01 3.57439509e-04 7.96646699e-02 1.47449886e-04
 8.14561188e-01]
predictions : [4.2153094e-03 8.6598024e-02 1.0233668e-04 8.9234114e-01 1.6743178e-02]
predictions : [1.6983317e-03 5.4251018e-04 3.2775686e-03 1.5957810e-05 9.94465

In [25]:
cv2.imshow("emotion-recognition", frame)

cv2.waitKey(0)
cv2.destroyAllWindows()

## Add Background

In [30]:
def AddBackground(frame, bg):

    frame_h, frame_w = frame.shape[0], frame.shape[1]
    new_h = 500
    new_w = int((new_h/frame_h)*frame_w)
    frame_resize = cv2.resize(frame, (new_w, new_h))

    xmax = bg.shape[1] - 350
    ymax = bg.shape[0] - 175
    xmin = xmax - new_w
    ymin = ymax - new_h

    bg[ymin:ymax, xmin:xmax] = frame_resize

    return bg
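
`AddBackground` pastes the frame at a fixed offset from the bottom-right corner of the background, which only works if the background is large enough. The sketch below reproduces the same arithmetic and reports a clear error when the region does not fit. `paste_region` and its parameter names are introduced here for illustration; the defaults of 500, 350 and 175 are the constants used in the function above.

```python
def paste_region(frame_shape, bg_shape, new_h=500, right_margin=350, bottom_margin=175):
    """Compute (xmin, ymin, xmax, ymax) of the paste area, or raise if it falls outside the background."""
    frame_h, frame_w = frame_shape[:2]
    new_w = int((new_h / frame_h) * frame_w)   # keep the frame's aspect ratio
    xmax = bg_shape[1] - right_margin
    ymax = bg_shape[0] - bottom_margin
    xmin, ymin = xmax - new_w, ymax - new_h
    if xmin < 0 or ymin < 0:
        raise ValueError("background is too small for the resized frame")
    return xmin, ymin, xmax, ymax

# A 640x480 frame on a 1920x1080 background.
print(paste_region((480, 640, 3), (1080, 1920, 3)))  # (904, 405, 1570, 905)
```

With a background of 800x600, the same call raises `ValueError` instead of silently slicing with negative indices.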

In [31]:
background = "./images/background.jpg"  # path to the background image
bg = cv2.imread(background)
deployment = AddBackground(frame, bg)
cv2.imshow("Deployment", deployment)

cv2.waitKey(0)
cv2.destroyAllWindows()

## Webcam

In [34]:
camera = cv2.VideoCapture(0) #create a VideoCapture object with the 'first' camera (your webcam)
background = "./images/background.jpg"  # path to the background image
bg = cv2.imread(background)

while(True):
    ret, frame = camera.read()             # Capture frame by frame      
    if not ret:                            # stop when no frame could be read
        break
    
    resized_frame = cv2.resize(src=frame, dsize=(672, 384)) 
    transposed_frame = resized_frame.transpose(2, 0, 1)
    input_frame = np.expand_dims(transposed_frame, 0)    
    
    face_output = face_model([input_frame])[face_output_layer]
    
    boxes = DrawBoundingBoxes(face_output, frame, conf=0.5)
    
    if boxes is not None:
    
        for box in boxes:          # iterate over the stored face positions
    
            xmin, ymin, xmax, ymax = box      # unpack the box coordinates
            emotion_input = frame[ymin:ymax,xmin:xmax]         # crop the face region from the frame
            if emotion_input.size == 0:       # skip degenerate boxes; cv2.resize fails on an empty crop
                continue
        
            # Preprocess the crop for the emotion recognition model
            # Input layer shape:  [1,3,64,64]
            resized_image = cv2.resize(src=emotion_input, dsize=(64, 64))      # resize the crop   (64,64,3)
            transposed_image = resized_image.transpose(2, 0, 1)                # HWC -> CHW        (3,64,64)
            input_image = np.expand_dims(transposed_image, 0)                  # add batch dim     (1,3,64,64)

            emotion_output = emotion_model([input_image])[emotion_output_layer]  # run emotion inference
            DrawText(emotion_output, frame, box)   # draw the inference result on the frame
    
    deployment = AddBackground(frame, bg)
    
    cv2.imshow('Press Spacebar to Exit', deployment)

    if cv2.waitKey(1) & 0xFF == ord(' '):  # Stop if spacebar is detected
        break

camera.release()                           # Cleanup after spacebar is detected.
cv2.destroyAllWindows()

predictions : [0.62678134 0.08138942 0.21381725 0.01847814 0.05953386]
predictions : [0.5588023  0.06464918 0.3125371  0.00914041 0.05487103]
predictions : [0.3425343  0.036699   0.5783002  0.00812694 0.0343396 ]
predictions : [0.52509075 0.06271542 0.3453611  0.0130449  0.05378788]
predictions : [0.3234275  0.04206432 0.60158473 0.0071773  0.02574611]
predictions : [0.3234275  0.04206432 0.60158473 0.0071773  0.02574611]
predictions : [0.4247596  0.06004936 0.45343512 0.00860119 0.05315476]
predictions : [0.57248807 0.06858908 0.28727996 0.0085057  0.06313722]
predictions : [0.26459247 0.03520987 0.6704773  0.00353215 0.0261882 ]
predictions : [0.33127198 0.04806687 0.589548   0.00537501 0.02573819]
predictions : [0.33406308 0.05574457 0.5665071  0.00432561 0.0393596 ]
predictions : [0.33406308 0.05574457 0.5665071  0.00432561 0.0393596 ]
predictions : [0.35994962 0.03350279 0.55322325 0.0052195  0.0481048 ]
predictions : [0.39089224 0.07050619 0.4705818  0.00886252 0.05915722]
predic