## Example script illustrating asynchronous parallel execution of sound classification on an audio stream and object detection on a video stream
This notebook shows how to use DeGirum PySDK to run parallel inferences on two asynchronous data streams with different frame rates. To achieve maximum performance, this example uses non-blocking batch prediction mode.

This script works with the following inference options:

1. Run inference on the DeGirum Cloud Platform;
2. Run inference on a DeGirum AI Server deployed on localhost or on another computer in your LAN or VPN;
3. Run inference on a DeGirum ORCA accelerator installed directly on your computer.

To try different options, you just need to uncomment **one** of the lines in the code below.

You also need to specify your cloud API access token, cloud zoo URLs, and AI server hostname in the [env.ini](env.ini) file, located in the same directory as this notebook.

**The pyaudio package with portaudio is required to run this sample.**

**Access to a microphone is required to run this sample.**

**Access to a camera is required to run this sample.**

The script needs either a web camera or a local camera connected to the machine running this code. Specify the camera index or URL either by assigning `camera_id` in the code below, or by setting `camera_id = None` and defining the `CAMERA_ID` variable in the [env.ini](env.ini) file.
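
The selection rule described above can be sketched in a few lines. This is only an illustration: `resolve_camera_id` is a hypothetical helper, the `settings` dictionary stands in for whatever is read from env.ini, and the fallback to camera 0 when nothing is defined is an assumption. It does not show how `mytools` implements this.

```python
def resolve_camera_id(camera_id, settings):
    """Hypothetical helper: choose the camera index or URL to open.

    camera_id -- value assigned in the notebook (int index, URL string, or None)
    settings  -- dict standing in for the variables defined in env.ini
    """
    if camera_id is None:
        # Nothing set in the notebook: take CAMERA_ID from the settings.
        # Falling back to "0" (default local camera) is an assumption.
        camera_id = settings.get("CAMERA_ID", "0")

    # Values read from a text file arrive as strings; a purely numeric
    # string is treated as a local camera index.
    if isinstance(camera_id, str) and camera_id.strip().isdigit():
        return int(camera_id.strip())

    # Anything else (an int index or a URL string) is used unchanged.
    return camera_id


print(resolve_camera_id(None, {"CAMERA_ID": "1"}))
print(resolve_camera_id("rtsp://example.invalid/stream", {}))
```

Converting numeric strings to integers keeps the two ways of specifying a local camera (in code or in the file) equivalent.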

#### Specify camera id

In [None]:
camera_id = None         # camera index or URL; 0 to use default local camera, None to take from env.ini file

#### Specify where you want to run your inferences

In [None]:
import degirum as dg, mytools

cloud_token = mytools.get_token() # get cloud API access token from env.ini file
cloud_zoo_url = mytools.get_cloud_zoo_url() # get cloud zoo URL from env.ini file

#
# Please UNCOMMENT only ONE of the following lines to specify where to run AI inference
#

# 1. Inference on the DeGirum Cloud Platform
zoo = dg.connect(dg.CLOUD, cloud_zoo_url, cloud_token)

# 2. Inference on a DeGirum AI Server deployed on localhost or on another computer in your LAN or VPN
# zoo = dg.connect(mytools.get_ai_server_hostname(), cloud_zoo_url, cloud_token)

# 3. Inference on DeGirum ORCA accelerator installed on your computer
# zoo = dg.connect(dg.LOCAL, cloud_zoo_url, cloud_token)

#### The rest of the cells below should run without any modifications

In [None]:
import mytools
import sys

In [None]:
# load YAMNET sound classification model for DeGirum Orca AI accelerator
# (change model name to "...n2x_cpu_1" to run it on CPU)
sound_model = zoo.load_model("mobilenet_v1_yamnet_sound_cls--96x64_quant_n2x_orca_1")

In [None]:
# load MobileNetv2+SSD object detection model for DeGirum Orca AI accelerator
# (change model name to "...n2x_cpu_1" to run it on CPU)
detection_model = zoo.load_model("mobilenet_v2_ssd_coco--300x300_quant_n2x_orca_1")

# set model parameters
detection_model.image_backend = 'opencv' # select OpenCV backend: needed to have overlay image in OpenCV format
detection_model.input_numpy_colorspace = 'BGR' # OpenCV delivers frames in BGR channel order

In [None]:
# set non-blocking mode for both models
sound_model.non_blocking_batch_predict = True
detection_model.non_blocking_batch_predict = True
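
In non-blocking mode, a result generator can return `None` when no result is ready yet, so the main loop can poll several streams without one stalling the other. The sketch below imitates that pattern with toy generators. `non_blocking_results` and `poll_streams` are hypothetical stand-ins written for illustration and do not call PySDK.

```python
def non_blocking_results(ready_every, prefix):
    """Toy stand-in for a non-blocking predictor (not PySDK code).

    Yields a result on every `ready_every`-th poll and None otherwise.
    """
    poll = 0
    while True:
        poll += 1
        if poll % ready_every == 0:
            yield f"{prefix}-{poll // ready_every}"
        else:
            yield None


def poll_streams(n_polls):
    """Poll a slow and a fast stream together, never waiting on either."""
    sound = non_blocking_results(3, "sound")   # slow stream
    frames = non_blocking_results(1, "frame")  # fast stream
    last_sound = ""
    shown = []
    for _ in range(n_polls):
        sound_result = next(sound)
        frame_result = next(frames)
        if sound_result is not None:
            last_sound = sound_result  # remember the latest sound label
        if frame_result is not None:
            shown.append((frame_result, last_sound))
    return shown


for frame, label in poll_streams(4):
    print(frame, label)
```

Each frame is paired with the most recent sound result, which is the same idea the main loop uses when it draws the last sound label onto every annotated video frame.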

In [None]:
audio_sampling_rate_hz = sound_model.model_info.InputSamplingRate[0]
audio_buffer_size = sound_model.model_info.InputWaveformSize[0] // 2 # read half a waveform at a time: two read buffers form one waveform, giving half-length overlap

with mytools.Display("Async Streams") as display, \
    mytools.open_audio_stream(audio_sampling_rate_hz, audio_buffer_size) as audio_stream, \
    mytools.open_video_stream(camera_id) as video_stream:
    
    # create prediction result generators:
    sound_predictor = sound_model.predict_batch(mytools.audio_overlapped_source(audio_stream, lambda: False, True))
    detection_predictor = detection_model.predict_batch(mytools.video_source(video_stream))
    
    sound_label = ""
    while True: # press 'x' or 'q' to abort
        
        # do asynchronous ML inferences for both models (each one can be None if not ready):
        sound_result = next(sound_predictor)
        detection_result = next(detection_predictor)

        # process sound classification result (just remember the text)
        if sound_result is not None:
            sound_label = f"{sound_result.results[0]['label']}: {sound_result.results[0]['score']}"
        
        # process video detection result (just display the annotated frame)
        if detection_result is not None:
            img = detection_result.image_overlay
            mytools.Display.put_text(img, sound_label, (1, img.shape[0] - 40), (0,0,0), (255,255,255))
            display.show(img)            
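

The audio buffer above is half the model's waveform size, so each waveform is assembled from two consecutive reads and neighbouring waveforms share half their samples. A minimal sketch of that windowing, assuming this is what half-length overlapping means here, follows. `overlapped_windows` is a hypothetical function and is not the implementation of `mytools.audio_overlapped_source`.

```python
def overlapped_windows(half_buffers):
    """Build full-length windows from consecutive half-length buffers.

    Each yielded window is the previous buffer followed by the current
    one, so neighbouring windows overlap by half their length.
    """
    previous = None
    for current in half_buffers:
        if previous is not None:
            yield previous + current
        previous = current


# Three half-length reads produce two overlapping full-length windows.
reads = [[1, 2], [3, 4], [5, 6]]
for window in overlapped_windows(reads):
    print(window)
```

With this scheme a new full-length window becomes available after every half-length read, instead of after every full-length read.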