# Inference 
<!-- 
---

Inference consists of three parts: a face detector, a spectrum translator (NIR to VIS and vice versa), and a facial expression recognition (FER) part.

### Face Detector
Face detection currently supports the following two models:
* **RetinaFace** - model from [here](https://github.com/serengil/deepface). Accurate, but slower. RetinaFace can also align the face.
* **CenterFace** - original CenterFace from [here](https://gitlab.fit.cvut.cz/vadlemar/real-time-facial-expression-recognition-in-the-wild). It is fast, but less accurate than RetinaFace. The ONNX model is stored in `models/face_detection/centerface.onnx`. Unlike RetinaFace, this option does not align faces in the Inference class.

Additional detectors from the DeepFace module, such as *MTCNN* or the *Viola-Jones* algorithm, can be added easily.

### Spectrum Translator
Translates images between the NIR and VIS spectra.

* **CycleGAN on CASIA+OuluCasia db** - translates between both spectra.
* **FFE-CycleGAN on BUAA db** - This translates between NIR images and averaged grayscale image (averaged from color green illuminated RGB image). The translation is good, however not 

### Facial expression recognition
Classifies the categorical emotion and regresses the valence-arousal labels.
* **MobileNet** - FER net from original work [here](https://github.com/serengil/deepface). Does not work on NIR. Onnx in `models/pretrained/mobilenet_simultaneous.onnx`.
* **MobileNet-NIR** - Original MobileNet pretrained on Oulu-Casia database NIR images. Several onnx versions are stored in a folder `models/mobilenet_NIR/`. Additional information below. -->

## Definitions

Imports

In [1]:
from skeleton.inference import Inference

2024-03-14 13:38:29.035588: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-03-14 13:38:29.089118: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-03-14 13:38:29.089776: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


## Inference examples

***

Inference first requires a definition for each of the *face detector*, *spectrum translator* and *facial expression recognition* (FER) parts. Each part takes either `"net_type": None`, which skips it, or a model type together with the path to its ONNX file and additional options.
These model definitions are shown below and are then passed to the `Inference` object.

Available model types:
* **Face detector** - `Inference.net_type.FACE_DETECTOR_CENTERFACE` or `Inference.net_type.FACE_DETECTOR_RETINAFACE`
* **FER** - `Inference.net_type.FER_MOBILENET`
* **Spectrum translation** - `Inference.net_type.SPECTRUM_TRANSLATOR_ORIG_CYCLEGAN`

Each model is stored in the folder *models/*.
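
The structure described above can be checked before the dictionary is handed to `Inference`. The following is only a minimal sketch, not part of the `skeleton` package: the helper `check_models` is hypothetical, plain strings stand in for the `Inference.net_type` members, and the assumption that only the spectrum translator and FER parts need `pth_to_onnx` is taken from the examples in this notebook.

```python
REQUIRED_STAGES = ("face_detector", "spectrum_translator", "fer")
# Assumption: in this notebook's examples only these stages carry an ONNX path.
STAGES_WITH_ONNX = ("spectrum_translator", "fer")


def check_models(models):
    """Return a list of human-readable problems found in a models dictionary."""
    problems = []
    for stage in REQUIRED_STAGES:
        cfg = models.get(stage)
        if cfg is None:
            problems.append(f"missing stage '{stage}'")
            continue
        if "net_type" not in cfg:
            problems.append(f"stage '{stage}' has no 'net_type' (use None to skip it)")
            continue
        if cfg["net_type"] is None:
            continue  # stage is skipped, nothing else to check
        if stage in STAGES_WITH_ONNX and not cfg.get("pth_to_onnx"):
            problems.append(f"stage '{stage}' is enabled but has no 'pth_to_onnx'")
    return problems


# Plain strings stand in for Inference.net_type members so the sketch runs on its own.
example = {
    "face_detector": {"net_type": "FACE_DETECTOR_RETINAFACE"},
    "spectrum_translator": {"net_type": None},  # skipped
    "fer": {"net_type": "FER_MOBILENET"},  # enabled, but the ONNX path is missing
}
print(check_models(example))
```

An empty list means the dictionary has the expected shape; it does not guarantee that the ONNX files exist or load.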

***

Once the `Inference` object is defined, inference can be run in one of three ways: *from_folder* loads images from a folder, *from_filenames* loads images from a list of file paths, and *from_array* takes an image already loaded as an `np.array`.
Additionally, *infer_video* writes the FER results directly into the original video.

<!-- Also the inference can be run as *infer_instant* where each image is immediately passed through all parts of Inference. -->
<!-- Inference, as opposed to *infer* mode, where all images are first detected, then translated, and then passed to FER. -->

<!-- Thus following functions are available:
* `inf.infer_from_folder('path/to/folder')`
* `inf.infer_from_filenames(['path/to/image1', 'path/to/image2'])`
* `inf.infer_from_array(img_np_array)` -->

<!-- And for Inference of each image separately is: -->
* `inf.infer_instant_from_folder('path/to/folder', save_to_folder='pth/to/save_fld')`
* `inf.infer_instant_from_filenames(['path/to/image1', 'path/to/image2'], save_to_folder='pth/to/save_fld')`
* `inf.infer_instant_from_array(img_np_array, save_to_folder='pth/to/save_fld')`
* `inf.infer_video('path/to/input_video.mp4', 'path/where/to/save/output_video.mp4', frames_per_second)`
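
When only some files of a folder should be processed, the list for the *from_filenames* variant can be built by hand. The following is a minimal sketch using only the standard library; `list_image_files` and the set of accepted extensions are assumptions for illustration, not part of the `Inference` class.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the data at hand.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_image_files(folder):
    """Return sorted image paths (as strings) found directly inside `folder`."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


# Usage with an already constructed Inference object `inf`:
# filenames = list_image_files("example_data/aff_vis/")
# output = inf.infer_instant_from_filenames(filenames, save_to_folder="pth/to/save_fld")
```

The helper does not descend into subfolders, so nested datasets would need `Path.rglob` instead of `iterdir`.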

### VIS->NIR Spectrum translation example and FER

This example translates images to NIR using the model from Experiment1.2. To use Experiment1.1 instead, change the path to `models/spectrum_translation/experiment1_1_VIS2NIR.onnx`. The translated image is then passed to the FER model from *Experiment2.2*. Images can of course be evaluated by the FER model without spectrum translation, which is *Approach 2* discussed in the thesis; translation is used here for demonstration purposes.

In [35]:
from skeleton.inference import Inference

# Define here which models to use. To skip a model, set its 'net_type' to None.
# To apply a model, pick a type from Inference.net_type as listed above.
models = {
    "face_detector": {
        "net_type": Inference.net_type.FACE_DETECTOR_RETINAFACE,
        # When using the RetinaFace detector, True aligns the faces.
        # Should not be combined with 'remove_black_stripes' == True (artifacts will be created)
        "retina_face_align": True,
        # Returned images are square, but detectors return rectangles,
        # so there are black stripes on the sides. Set True to remove those stripes.
        "remove_black_stripes": False,
        # Display detected images in notebook
        "display_images": False,
        # Save images - when loaded from a filepath, saved as <original_filestem>_<face_idx>.<original_extension>,
        # otherwise as <detection_id>.jpg starting from 0
        "save_image_to_folder": None,
    },
    "spectrum_translator": {
        "net_type": Inference.net_type.SPECTRUM_TRANSLATOR_ORIG_CYCLEGAN,
        "pth_to_onnx": 'models/spectrum_translation/experiment1_2_VIS2NIR.onnx',
        # First converts the input image to grayscale, then translates spectra
        "input_as_avg_grayscale": False,
        # Converts the translated image to averaged grayscale
        "output_as_avg_grayscale": True,
        # Display the translated images
        "display_images": True,
        # Saves as <global idx_count>_<id of face starting from 0>.jpg
        "save_image_to_folder": "transtmp",
    },
    "fer": {
        "net_type": Inference.net_type.FER_MOBILENET,
        "pth_to_onnx": 'models/fer/experiment2.2-mobilent-affnir.onnx',
        "display_images": True,
        # Saves as <global idx_count>_<id of face starting from 0>.jpg
        "save_image_to_folder": None,
    }
}
# The debug flag displays images; verbose prints secondary information such as inference time.
inf = Inference(models, None, verbose=True)

Using '{'net_type': <net_type.FACE_DETECTOR_RETINAFACE: 'G'>, 'retina_face_align': True, 'remove_black_stripes': False, 'display_images': False, 'save_image_to_folder': None}' as face detector model
Using '{'net_type': <net_type.SPECTRUM_TRANSLATOR_ORIG_CYCLEGAN: 'B'>, 'pth_to_onnx': 'models/spectrum_translation/experiment1_2_VIS2NIR.onnx', 'input_as_avg_grayscale': False, 'output_as_avg_grayscale': True, 'display_images': True, 'save_image_to_folder': 'transtmp'}' as spectrum transfer model
Orig-CycleGAN model from 'models/spectrum_translation/experiment1_2_VIS2NIR.onnx' loaded.
Using '{'net_type': <net_type.FER_MOBILENET: 'E'>, 'pth_to_onnx': 'models/fer/experiment1.2-mobilent-affnir.onnx', 'display_images': True, 'save_image_to_folder': None}' as FER model
model 'models/fer/experiment1.2-mobilent-affnir.onnx' loaded


Applied to VIS images from AffectNet

In [None]:
output = inf.infer_instant_from_folder('example_data/aff_vis/')

### NIR->VIS Spectrum translation and FER (Approach 1)

This example translates images to VIS using the model from Experiment1.2. To use Experiment1.1 instead, change the path to `models/spectrum_translation/experiment1_1_NIR2VIS.onnx`.
The translated image is then passed to the FER model from the original work, trained on VIS.
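
Switching between the two experiments and the two translation directions only changes the file name, so the path can be built from two parameters. This is a minimal sketch; the helper `translator_onnx_path` is hypothetical and only reproduces the four file names mentioned in this notebook.

```python
def translator_onnx_path(experiment, direction):
    """Build the ONNX path of a spectrum translator used in this notebook.

    experiment: "1_1" or "1_2"; direction: "VIS2NIR" or "NIR2VIS".
    """
    if experiment not in ("1_1", "1_2"):
        raise ValueError(f"unknown experiment: {experiment!r}")
    if direction not in ("VIS2NIR", "NIR2VIS"):
        raise ValueError(f"unknown direction: {direction!r}")
    return f"models/spectrum_translation/experiment{experiment}_{direction}.onnx"


print(translator_onnx_path("1_1", "NIR2VIS"))
# models/spectrum_translation/experiment1_1_NIR2VIS.onnx
```

The returned string can be used as the `"pth_to_onnx"` value of the `"spectrum_translator"` entry.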

In [4]:
from skeleton.inference import Inference

# Define here which models to use. To skip a model, set its 'net_type' to None.
# To apply a model, pick a type from Inference.net_type as listed above.
models = {
    "face_detector": {
        "net_type": Inference.net_type.FACE_DETECTOR_RETINAFACE,
        # When using the RetinaFace detector, True aligns the faces.
        # Should not be combined with 'remove_black_stripes' == True (artifacts will be created)
        "retina_face_align": True,
        # Returned images are square, but detectors return rectangles,
        # so there are black stripes on the sides. Set True to remove those stripes.
        "remove_black_stripes": False,
        # Display detected images in notebook
        "display_images": False,
        # Save images - when loaded from a filepath, saved as <original_filestem>_<face_idx>.<original_extension>,
        # otherwise as <detection_id>.jpg starting from 0
        "save_image_to_folder": None,
    },
    "spectrum_translator": {
        "net_type": Inference.net_type.SPECTRUM_TRANSLATOR_ORIG_CYCLEGAN,
        "pth_to_onnx": 'models/spectrum_translation/experiment1_2_NIR2VIS.onnx',
        # First converts the input image to grayscale, then translates spectra
        "input_as_avg_grayscale": False,
        # Converts the translated image to averaged grayscale
        "output_as_avg_grayscale": False,
        # Display the translated images
        "display_images": True,
        # Saves as <global idx_count>_<id of face starting from 0>.jpg
        "save_image_to_folder": None,
    },
    "fer": {
        "net_type": Inference.net_type.FER_MOBILENET,
        "pth_to_onnx": 'models/fer/mobilenet_on_aff.onnx',
        "display_images": True,
        # Saves as <global idx_count>_<id of face starting from 0>.jpg
        "save_image_to_folder": None,
    }
}
# The debug flag displays images; verbose prints secondary information such as inference time.
inf = Inference(models, None, verbose=True)

Using '{'net_type': <net_type.FACE_DETECTOR_RETINAFACE: 'G'>, 'retina_face_align': True, 'remove_black_stripes': False, 'display_images': False, 'save_image_to_folder': None}' as face detector model
Using '{'net_type': <net_type.SPECTRUM_TRANSLATOR_ORIG_CYCLEGAN: 'B'>, 'pth_to_onnx': 'models/spectrum_translation/experiment1_2_NIR2VIS.onnx', 'input_as_avg_grayscale': False, 'output_as_avg_grayscale': False, 'display_images': True, 'save_image_to_folder': None}' as spectrum transfer model
Orig-CycleGAN model from 'models/spectrum_translation/experiment1_2_NIR2VIS.onnx' loaded.
Using '{'net_type': <net_type.FER_MOBILENET: 'E'>, 'pth_to_onnx': 'models/fer/mobilenet_on_aff.onnx', 'display_images': True, 'save_image_to_folder': None}' as FER model
model 'models/fer/mobilenet_on_aff.onnx' loaded


Inference on images from AffectNetNIR (AffectNet images translated to NIR).
These images were not in the training dataset.

In [None]:
output = inf.infer_instant_from_folder('example_data/aff_nir/')

These are a mixture of training and testing data with stripes. Experiment1.1 performs better on data with stripes than on data without them. Experiment1.2, on the other hand, performs better on data without stripes than on data with stripes.
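
The stripes come from placing a rectangular detection on a square image, as noted in the `remove_black_stripes` comment of the model definition. The arithmetic can be sketched as follows; the function `stripe_widths` is hypothetical, and centering the crop on the canvas is an assumption, since the placement actually used by the `Inference` class is not shown here.

```python
def stripe_widths(width, height):
    """Return (left, right, top, bottom) padding in pixels that centers a
    width x height crop on a square canvas with side max(width, height)."""
    side = max(width, height)
    pad_w = side - width    # total horizontal padding
    pad_h = side - height   # total vertical padding
    left, top = pad_w // 2, pad_h // 2
    return left, pad_w - left, top, pad_h - top


print(stripe_widths(90, 120))  # tall face crop: stripes appear on the left and right
# (15, 15, 0, 0)
```

A crop that is already square yields no padding at all, which corresponds to the images without stripes.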

In [None]:
output = inf.infer_instant_from_folder('example_data/train_nir/')

Images mostly without stripes

In [None]:
output = inf.infer_instant_from_folder('example_data/train_nir_wo_stripes/')

### Video example - FER

Inference loads the video and applies the specified models to it. The video feature draws the FER results directly into the original video (the model definitions must include both FER and face detection, otherwise nothing happens).
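
Because the video feature silently does nothing without both parts, it can help to test the definitions first. This is a minimal sketch; `can_annotate_video` is a hypothetical helper, and plain strings stand in for the `Inference.net_type` members.

```python
def can_annotate_video(models):
    """Return True when both the face detector and the FER part are enabled."""
    return all(
        models.get(stage, {}).get("net_type") is not None
        for stage in ("face_detector", "fer")
    )


video_models = {
    "face_detector": {"net_type": "FACE_DETECTOR_CENTERFACE"},
    "spectrum_translator": {"net_type": None},  # translation is optional here
    "fer": {"net_type": "FER_MOBILENET"},
}
print(can_annotate_video(video_models))
# True
```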

In [1]:
from skeleton.inference import Inference

# Define here which models to use. To skip a model, set its 'net_type' to None.
# To apply a model, pick a type from Inference.net_type as listed above.
models = {
    "face_detector": {
        "net_type": Inference.net_type.FACE_DETECTOR_CENTERFACE,
        # When using the RetinaFace detector, True aligns the faces.
        # Should not be combined with 'remove_black_stripes' == True (artifacts will be created)
        "retina_face_align": True,
        # Returned images are square, but detectors return rectangles,
        # so there are black stripes on the sides. Set True to remove those stripes.
        "remove_black_stripes": False,
        # Display detected images in notebook
        "display_images": False,
        # Save images - when loaded from a filepath, saved as <original_filestem>_<face_idx>.<original_extension>,
        # otherwise as <detection_id>.jpg starting from 0
        "save_image_to_folder": None,
    },
    "spectrum_translator": {
        "net_type": None, #Inference.net_type.SPECTRUM_TRANSLATOR_ORIG_CYCLEGAN,
        "pth_to_onnx": 'path/to/model.onnx',
        # First converts the input image to grayscale, then translates spectra
        "input_as_avg_grayscale": False,
        # Converts the translated image to averaged grayscale
        "output_as_avg_grayscale": True,
        "display_images": True,
        # Saves as <global idx_count>_<id of face starting from 0>.jpg
        "save_image_to_folder": "transtmp",
    },
    "fer": {
        "net_type": Inference.net_type.FER_MOBILENET,
        "pth_to_onnx": 'models/fer/experiment2.2-mobilent-affnir.onnx',
        "display_images": False,
        # Saves as <global idx_count>_<id of face starting from 0>.jpg
        "save_image_to_folder": None,
    }
}
# The debug flag displays images; verbose prints other secondary information.
inf = Inference(models, None, verbose=False)

2024-05-07 01:37:18.171109: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-07 01:37:18.225673: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-05-07 01:37:18.226894: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Using '{'net_type': <net_type.FACE_DETECTOR_CENTERFACE: 'F'>, 'retina_face_align': True, 'remove_black_stripes': False, 'display_images': False, 'save_image_to_folder': None}' as face detector model
Using '{'net_type': <net_type.FER_MOBILENET: 'E'>, 'pth_to_onnx': 'models/_old/mobilenet_NIR/mobilenet_on_AffectNet-NIR/mobilenet_aff_nir-aff_continue.onnx', 'display_images': False, 'save_image_to_folder': None}' as FER model


The function below processes the video. The method accepts the input video path, the output video path where the result will be stored, and the number of frames per second. The example input video is intentionally "doubled" to demonstrate that inference can handle multiple people in a video.

In [2]:
inf.infer_video("example_data/multiemotions_example_doubled.mp4", "example_data/output2.mp4", 0.5)

Processing frames: 100%|████████████████████| 1630/1630 [00:41<00:00, 39.07it/s]
