## AI Inference from many video files
This notebook is an example of how to use DeGirum PySDK to run AI inference on multiple video files by multiplexing their frames into a single stream. It demonstrates how to achieve the lowest possible stable inference latency while maintaining decent throughput. This is achieved by using synchronous prediction mode and by offloading video decoding to a separate thread.
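The pattern described above (a decoding thread feeding a bounded queue while the main thread runs blocking inference one frame at a time) can be sketched without any SDK calls. This is a minimal sketch, not the notebook's implementation: `fake_decode` and `fake_infer` are hypothetical stand-ins for video decoding and for the model call, and are not part of DeGirum PySDK.

```python
import queue
import threading
import time


def fake_decode(n_frames):
    """Hypothetical stand-in for video decoding: yields frame ids."""
    for i in range(n_frames):
        yield i


def fake_infer(frame):
    """Hypothetical stand-in for a blocking (synchronous) model call."""
    return frame * 2


def producer(n_frames, frame_queue):
    # Runs in a separate thread so decoding overlaps with inference.
    for frame in fake_decode(n_frames):
        frame_queue.put(frame)  # blocks while the queue is full
    frame_queue.put(None)  # poison pill: tells the consumer to stop


def run(n_frames=20):
    frame_queue = queue.Queue(maxsize=10)  # bounded: decoder cannot run far ahead
    worker = threading.Thread(target=producer, args=(n_frames, frame_queue))
    worker.start()

    results, latencies = [], []
    while True:
        frame = frame_queue.get()
        if frame is None:
            break  # got poison pill
        t0 = time.perf_counter()
        results.append(fake_infer(frame))  # only one frame in flight at a time
        latencies.append(time.perf_counter() - t0)

    worker.join()
    return results, latencies


results, latencies = run()
print(len(results), "frames processed")
```

Because only one frame is in flight at any moment, the measured latency of each call is not inflated by frames waiting inside an inference pipeline, while the separate producer thread keeps the next frame ready.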

This script works with the following inference options:

1. Run inference on DeGirum Cloud Platform;
2. Run inference on DeGirum AI Server deployed on localhost or on a computer in your LAN or VPN;
3. Run inference on DeGirum ORCA accelerator directly installed on your computer.

To try different options, you just need to uncomment **one** of the lines in the code below.

You also need to specify your cloud API access token, cloud zoo URLs, and AI server hostname in [env.ini](env.ini) file, located in the same directory as this notebook.
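For orientation, a settings file of this kind can be read with a few lines of Python. The sketch below is only an illustration under stated assumptions: the `KEY = VALUE` layout and the key names `CLOUD_TOKEN`, `CLOUD_ZOO_URL`, and `AI_SERVER_HOSTNAME` are hypothetical placeholders, and `parse_env` is not the helper that `mytools` provides. Check the actual `env.ini` shipped with the notebook for the real keys.

```python
def parse_env(text):
    """Parse simple KEY = VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


# Hypothetical file content: key names and values are placeholders
sample = """
# cloud access
CLOUD_TOKEN = "my-token"
CLOUD_ZOO_URL = /my-org/my-zoo
AI_SERVER_HOSTNAME = localhost
"""

settings = parse_env(sample)
print(settings["AI_SERVER_HOSTNAME"])
```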

#### Specify video file names, model name, and other options here

In [None]:
# video files to process
input_filenames = [
    "./images/Traffic.mp4",
    "./images/Traffic.mp4",
    "./images/Traffic.mp4",
    "./images/Traffic.mp4",
]

model_name = "mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1" # model to be used for inference

# options:
offload_preprocessing = True # True to do image preprocessing outside of inference call
do_image_compression = True # True to do JPEG compression before sending image for inference

#### Specify where you want to run your inferences

In [None]:
import degirum as dg, mytools

cloud_token = mytools.get_token() # get cloud API access token from env.ini file
cloud_zoo_url = mytools.get_cloud_zoo_url() # get cloud zoo URL from env.ini file

#
# Please UNCOMMENT only ONE of the following lines to specify where to run AI inference
#

# 1. Inference on the DeGirum Cloud Platform
zoo = dg.connect_model_zoo("dgcps://cs.degirum.com" + cloud_zoo_url, cloud_token)

# 2. Inference on DeGirum AI Server deployed on localhost or on a computer in your LAN or VPN
# zoo = dg.connect_model_zoo((mytools.get_ai_server_hostname(), "https://cs.degirum.com" + cloud_zoo_url), cloud_token)

# 3. Inference on DeGirum ORCA accelerator installed on your computer
# zoo = dg.connect_model_zoo("https://cs.degirum.com" + cloud_zoo_url, cloud_token)

#### The rest of the cells below should run without any modifications

In [None]:
import cv2, numpy, time, threading, queue
from contextlib import ExitStack

In [None]:
# Define stream multiplexing source:
# it reads frames from given video files in round robin fashion
# and puts them into given queue.
# If offload_preprocessing is enabled, it also performs image resizing
def mux_source(streams, frame_queue, model):

    phase = 0 # stream multiplexing phase counter

    try:
        while True:
            ret, frame = streams[phase].read()
            if not ret:
                break # stop as soon as any stream runs out of frames

            if offload_preprocessing:
                # do image resizing
                frame = model._preprocessor.forward(frame)[0]

            frame_queue.put((frame, phase))

            phase = (phase + 1) % len(streams) # advance mux phase
    finally:
        # always send poison pill, even after an error, so the consumer loop terminates
        frame_queue.put(None)
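
The round-robin order produced by the multiplexer, and the point at which it stops, can be checked in isolation. The following is a minimal sketch that uses plain lists in place of video streams; `round_robin` is a hypothetical helper written for this illustration and is not part of the notebook or of PySDK.

```python
def round_robin(streams):
    """Yield (frame, stream_index) pairs, taking one frame per stream in turn.

    Stops as soon as any stream is exhausted, mirroring the `break`
    on a failed read in the multiplexing loop.
    """
    iterators = [iter(s) for s in streams]
    phase = 0
    while True:
        try:
            frame = next(iterators[phase])
        except StopIteration:
            return
        yield frame, phase
        phase = (phase + 1) % len(iterators)  # advance mux phase


order = list(round_robin([["a0", "a1", "a2"], ["b0"], ["c0", "c1"]]))
print(order)
# → [('a0', 0), ('b0', 1), ('c0', 2), ('a1', 0)]
```

Note that streams of unequal length end the whole sequence at the shortest one, and streams earlier in the list may contribute one more frame than later ones.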

In [None]:
with zoo.load_model(model_name) as model, ExitStack() as stack:
    # create model object in `with` block to avoid server disconnections on each frame inference

    model.image_backend = 'opencv' # select OpenCV backend: needed to have overlay image in OpenCV format
    model.input_numpy_colorspace = 'BGR'
    model._model_parameters.InputImgFmt = ['JPEG'] if do_image_compression else ['RAW']
    model.measure_time = True
    
    # open video streams
    streams = [stack.enter_context(mytools.open_video_stream(fn)) for fn in input_filenames]
  
    frame_queue = queue.Queue(maxsize=10) # queue to enqueue frames
    start_times = [] # list of frame starting times
    end_times = [] # list of frame result receiving times
    
    # start frame retrieving thread
    mux_thread = threading.Thread(target=mux_source, args=(streams, frame_queue, model))
    mux_thread.start()

    # initialize progress indicator
    # (CAP_PROP_FRAME_COUNT is returned as a float, so convert it to int)
    steps = int(min(stream.get(cv2.CAP_PROP_FRAME_COUNT) for stream in streams))
    progress = mytools.Progress(steps * len(streams))

    # inference loop
    start_time = time.time()
    while True:
        # get (frame, stream index) tuple from queue
        item = frame_queue.get()
        if item is None:
            break # got poison pill: end loop
        frame, _stream_index = item

        # do inference and record times
        start_times.append(time.time())
        res = model(frame)
        end_times.append(time.time())

        progress.step()

    mux_thread.join()
        
    # print time statistics
    for s in model.time_stats().items():
        print(s[1])

In [None]:
# process latency times
end_times = numpy.array(end_times)
start_times = numpy.array(start_times)
latency_times_ms = (end_times - start_times) * 1000

print("\nLatency Histogram")
latency_hist = numpy.histogram(latency_times_ms)
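# numpy.histogram returns one more bin edge than counts; zip pairs each count with its left edge
for count, bin_start in zip(latency_hist[0], latency_hist[1]):
    print(f"{bin_start:4.0f} ms:     {count:4}")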
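
Besides the histogram, a few summary numbers are often easier to compare between runs. The sketch below is one possible way to compute them from the recorded start and end times using only the standard library; `percentile` and `summarize` are hypothetical helpers, the nearest-rank percentile definition is a choice made for this illustration, and the timestamps are made-up values rather than measured results.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(start_times, end_times):
    """Return latency percentiles (ms) and overall throughput (frames/s)."""
    latencies_ms = sorted((e - s) * 1000 for s, e in zip(start_times, end_times))
    elapsed = end_times[-1] - start_times[0]  # wall time of the whole loop
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p90_ms": percentile(latencies_ms, 90),
        "max_ms": latencies_ms[-1],
        "fps": len(latencies_ms) / elapsed,
    }


# Made-up timestamps in seconds, for illustration only
starts = [0.0, 1.0, 2.0, 3.0, 4.0]
ends = [0.25, 1.5, 2.75, 3.125, 5.0]
stats = summarize(starts, ends)
print(stats)
```

The same function accepts the `start_times` and `end_times` collected in the inference loop, either as plain lists or as NumPy arrays.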