Start with the why:
Several of the real-world applications desired of the DepthAI platform are actually series or parallel (or both) combinations of neural networks, with regions of interest (ROI) passed from one network to one or more subsequent networks.
The Myriad X hardware is capable of multi-stage neural inference in parallel with computer vision functions, disparity depth, video encoding, etc., but no system exists that makes it easy to use this functionality to solve real-world problems. If a user can modularly piece these together (i.e. in a pipeline builder), this gives super-interesting capabilities, an example of which is below for sports filming:
Detecting action in a scene (neural inference, say detecting where a soccer ball is)
Automatically tracking the action (say tracking the ball)
Automatically digitally zooming (Digital PTZ Support (Lossless Zoom) #135) using the 12MP camera dynamically (lossless zoom up to 6x while producing 1080p encoded video), say by running motion detection and only encoding the subset of the video that has the motion. In sports, no motion probably means no action.
Running parallel neural inference for ball/player detection and tracking them in 3D space, to produce game statistics such as the total distance traveled (in miles) by the ball, by each player, etc.
Running re-identification (neural inference-based) on players as they move (and occlude each other) so that each player is tracked individually.
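As a rough illustration of the arithmetic behind two of the steps above (digital zoom and distance statistics), here is a minimal host-side sketch in plain Python. It is not DepthAI API code. The sensor size of 4056x3040 is an assumption (the text only says "12MP"), and `crop_window` and `distance_traveled_m` are hypothetical helper names.

```python
import math

# Assumed sizes: the text only says "12MP" and "1080p".
SENSOR_W, SENSOR_H = 4056, 3040
OUT_W, OUT_H = 1920, 1080
METERS_PER_MILE = 1609.344


def crop_window(cx, cy, out_w=OUT_W, out_h=OUT_H,
                frame_w=SENSOR_W, frame_h=SENSOR_H):
    """Return (x, y, w, h) of an out_w x out_h window centered on (cx, cy),
    shifted as needed so that it stays inside the frame."""
    x = int(round(cx - out_w / 2))
    y = int(round(cy - out_h / 2))
    x = max(0, min(x, frame_w - out_w))
    y = max(0, min(y, frame_h - out_h))
    return x, y, out_w, out_h


def distance_traveled_m(positions_m):
    """Sum the straight-line distances between consecutive (x, y, z)
    positions, all given in meters."""
    return sum(math.dist(a, b) for a, b in zip(positions_m, positions_m[1:]))


# Ball detected near the bottom-right corner: the window is clamped to the frame.
window = crop_window(4000, 3000)
# Three tracked 3D positions of the ball, converted to miles for the statistic.
miles = distance_traveled_m([(0, 0, 0), (3, 4, 0), (3, 4, 12)]) / METERS_PER_MILE
```

Because the crop is taken 1:1 from the full-resolution frame, the 1080p output needs no upscaling, which is what makes the zoom "lossless" in the sense used above.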
This is just one example of how the pipeline builder can be used to string together really interesting functionality. The core value of the builder is that it would allow many hardware/firmware capabilities to be strung together in series/parallel combinations to solve real-world problems easily:
Neural inference (e.g. Object detection, image classification of the ROI of a detected object, etc.)
3D object localization (both monocular object detection plus stereo depth and stereo neural inference supported)
Digital zoom (leveraging the full 12MP sensor resolution, which holds about 6x the pixels of a full 1080p stream)
Background subtraction
Feature tracking
Motion estimation
Arbitrary crop/rescale/reformat and ROI return
In many of these pipeline flows of multiple nodes, there is a need for custom rules and logic between nodes (e.g. filtering which ROI 'make the cut' for the next stage). In many cases the pipeline is not doable without these rules, as they are often a key encoding of the designer's a-priori knowledge, without which the solution is not tractable.
As such, support for custom code/functions/etc. to enable rules is a critical feature, and it is equally necessary whether DepthAI is used with or without a host.
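To make the idea of a rule between nodes concrete, here is a minimal sketch of an ROI filter in plain Python. The `Detection` structure, the `select_rois` name, and all thresholds are hypothetical and only for illustration; a real rule would encode whatever the designer knows about the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str
    confidence: float
    # Normalized bounding-box corners in [0, 1] (an assumed convention).
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self):
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


def select_rois(detections, wanted_label="person", min_confidence=0.6,
                min_area=0.01, max_rois=3):
    """Keep only the ROIs worth passing to the next network,
    most confident first."""
    keep = [d for d in detections
            if d.label == wanted_label
            and d.confidence >= min_confidence
            and d.area >= min_area]
    keep.sort(key=lambda d: d.confidence, reverse=True)
    return keep[:max_rois]


detections = [
    Detection("person", 0.80, 0.3, 0.3, 0.5, 0.5),
    Detection("person", 0.50, 0.1, 0.1, 0.6, 0.6),    # too uncertain
    Detection("car", 0.95, 0.0, 0.0, 0.5, 0.5),       # wrong class
    Detection("person", 0.70, 0.0, 0.0, 0.05, 0.05),  # too small
    Detection("person", 0.90, 0.5, 0.2, 0.7, 0.7),
]
selected = select_rois(detections)
```

Only the two confident, reasonably sized person detections survive, so the second-stage network runs on two crops instead of five.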
DepthAI used with host
When using DepthAI and megaAI with a host, it is very convenient to be able to implement these rules/functions/etc. on the host, as the engineer can then leverage the full convenience of the host for running rules, functions, and even CV capabilities.
To facilitate this most flexibly, the pipeline builder should be architected so that every node (including the camera node(s)) can optionally send its output to the host and optionally receive its input from the host.
Importantly, the ability of each node to send/receive information to/from the host also enables easier development workflows:
Debugging (testing each node for accuracy/performance by itself)
QA (capability to test thousands (or millions) of images through the whole pipeline, or parts of it, from existing datasets)
Model refinement and accuracy testing (being able to test the node accuracy fully on the hardware, after conversion, in a quantitative way)
Visualization (being able to see on a computer the output of each stage to easily see how things are looking in each stage)
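The per-node host connection behind these workflows can be sketched as follows. This is a toy model in plain Python, not the DepthAI API: `Node`, `run_pipeline`, and the `send_to_host` / `host_inputs` names are all hypothetical, and the stages are stand-in functions.

```python
class Node:
    """A hypothetical pipeline stage wrapping a plain Python function."""

    def __init__(self, name, fn, send_to_host=False):
        self.name = name
        self.fn = fn
        self.send_to_host = send_to_host


def run_pipeline(nodes, data, host_inputs=None):
    """Run nodes in series.

    host_inputs maps a node name to data injected from the host, which
    replaces the upstream output for that node (e.g. dataset images for QA).
    Returns (final_output, host_log), where host_log holds the output of
    every node that was asked to send its result to the host.
    """
    host_inputs = host_inputs or {}
    host_log = {}
    for node in nodes:
        if node.name in host_inputs:
            data = host_inputs[node.name]
        data = node.fn(data)
        if node.send_to_host:
            host_log[node.name] = data
    return data, host_log


def scale(x):
    return x * 2


def offset(x):
    return x + 1


nodes = [Node("scale", scale, send_to_host=True),
         Node("offset", offset, send_to_host=True)]

# Normal run: every stage is visible on the host for debugging/visualization.
full_out, full_log = run_pipeline(nodes, 3)
# QA run: feed known data straight into the second stage from the host.
qa_out, qa_log = run_pipeline(nodes, 3, host_inputs={"offset": 10})
```

The same two hooks cover all four workflows listed above: tapping outputs gives debugging and visualization, while injecting inputs gives dataset-driven QA and accuracy testing of a single node.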
DepthAI used without host (i.e. embedded use-case)
When there is no host present - for example when DepthAI is running completely standalone and directly actuating IO or communicating over SPI/UART/I2C - it is still equally necessary to allow such rules/custom code/etc.
To support this, the capability for the user to run arbitrary code on DepthAI (as nodes) is critical.
It is worth noting that when DepthAI is deployed without a host, one could still use the with-host mode described above for debugging, while still running the full embedded flow.
The how:
To support such arbitrary pipeline builds in both with-host and without-host use-cases, we architect the pipeline builder so that every node can send data to and receive data from the host, and so that CPython code can be run directly as nodes.
Integrating this, we have settled on the following approach, which breaks into 3 modalities of nodes that are used in the pipeline builder to solve embedded CV/AI problems and to leverage the results to interact with the physical world.
Node modalities:
Fast, easy, limited flexibility: The accelerated blocks listed above, like neural inference, 3D object localization, etc. These come pre-packaged and are trivial to make use of. But they often need application-specific logic between them, hence modality 2. And if your CV algorithm isn't on that list (or maybe you've invented your own proprietary algorithm) and you need it to run performantly on DepthAI, see modality 3.
Slow, easy, quite flexible: CPython bindings for scripts running directly on DepthAI as a node (issue Scripting Support on DepthAI #207).
This allows you to apply custom rules to metadata from neural inference results, write custom protocols that run on-chip as part of the pipeline, communicate with sensors/actuators or other systems over SPI, UART, I2C, etc. based on pipeline results, and so on. For example, you can make rules that make sense of neural-inference metadata, which then control performant crop/resize/reformat to connect layers of accelerated CV functions.
Fast, hard, quite flexible: OpenCL (here), G-API (more details soon) and ML frameworks for vectorized math are used to compile custom computer vision functions to run performantly on the SHAVES in DepthAI. So you can take your computer vision function, write it in OpenCL, G-API, or, say, PyTorch, and drop it as a node in the pipeline builder. This allows custom algorithms, including proprietary ones, to be hardware accelerated in the pipeline as a node. The pipeline builder leverages the hardware-accelerated crop/rescale/reformat to match inputs and outputs. This could even be used for non-CV functions, for example to run custom arbitrary mathematical functions on audio data brought in via CPython over I2C. For an EXCELLENT example of how to run custom CV code on DepthAI using PyTorch, see this guide by Rahul Ravikumar.
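As an illustration of the kind of glue logic modality 2 is meant for, here is a minimal sketch of a rule that turns a detection's bounding box into a crop request for the next accelerated stage. It is plain Python with hypothetical names (`padded_crop`, `to_pixels`) and an assumed normalized-coordinate convention. It does not use the actual on-device scripting API.

```python
def padded_crop(bbox, pad=0.1):
    """Expand a normalized (xmin, ymin, xmax, ymax) box by `pad` times its
    width/height on each side, clamped to the [0, 1] frame."""
    xmin, ymin, xmax, ymax = bbox
    dx = (xmax - xmin) * pad
    dy = (ymax - ymin) * pad
    return (max(0.0, xmin - dx), max(0.0, ymin - dy),
            min(1.0, xmax + dx), min(1.0, ymax + dy))


def to_pixels(crop, frame_w, frame_h):
    """Convert a normalized crop to integer pixel corners."""
    xmin, ymin, xmax, ymax = crop
    return (int(xmin * frame_w), int(ymin * frame_h),
            int(xmax * frame_w), int(ymax * frame_h))


# A face box touching the left and bottom edges: padding is clamped there.
crop = padded_crop((0.0, 0.5, 0.5, 1.0), pad=0.5)
pixels = to_pixels(crop, 1920, 1080)
```

The rule itself is cheap and easy to write, while the crop/resize it requests is the part that would run on the accelerated hardware.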
The what:
If we support the following with our pipeline builder, it seems it would be sufficiently flexible.
So the goal is to implement a pipeline builder which can be used to implement the flows below.
Cross Road Camera Demo: detects people, vehicles, and bikes, and then runs person attributes and person re-identification on the ROI of detected people.
UPDATE 16 March 2020: ArduCam actually implemented this, here, and we have our WIP version here. (We started before we realized ArduCam had already produced this example!)
Gaze Estimation: does face detection, the ROI of which goes to both head pose estimation and facial landmarks.
The outputs of head pose estimation and facial landmarks are then passed to the gaze estimation model.
UPDATE Oct 23 2020: Initially implemented in Gen2, here.
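The gaze estimation flow is a good test of the series/parallel requirement, since its two middle networks both depend on face detection and both feed the final model. A minimal sketch of how a builder could derive the execution stages from such a graph is below, using Python's standard `graphlib`. The node names and the `stages` helper are hypothetical, and this is not how the Gen2 implementation is structured.

```python
from graphlib import TopologicalSorter

# Node name -> set of nodes whose output it consumes.
GAZE_GRAPH = {
    "face_detection": set(),
    "head_pose": {"face_detection"},
    "facial_landmarks": {"face_detection"},
    "gaze_estimation": {"head_pose", "facial_landmarks"},
}


def stages(graph):
    """Group nodes into stages. Nodes within one stage do not depend on
    each other, so they could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


plan = stages(GAZE_GRAPH)
```

Here `plan` has three stages: face detection alone, then head pose and facial landmarks side by side, then gaze estimation.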
Of the examples on the OpenVINO repository, the following seems like it should not be implemented, as it’s the only one that does series, parallel, and output of parallel back to a single model. So it seems much more specialized.
This will then cover the following items which were previously independently on the DepthAI roadmap:
Get two-stage inference working: face detection followed by age-gender or emotion recognition (prototype here)
Person detection, tracking, and reidentification.
Add capability to run multiple neural networks in parallel (prototype here)
Integrate face detection and identification AP with Python API (e.g. here)
First step: without depth
Second step: with depth.
Most common complement to object detection
Be able to run multiple models in sequence (e.g. facial detection -> facial landmark -> landmark tracking) (prototype here)
This is different from a multiple-output tensor (which is already implemented, PR here).
UPDATE 20 Nov. 2020: The first example of the host-integrated (with-host) use-case is here: https://github.com/luxonis/depthai-experiments/blob/master/gaze-estimation
UPDATE 26 December 2021: The docs for Gen2 are materializing here: https://docs.luxonis.com/projects/api/en/gen2_develop/
Example Neural Pipelines To support:
The OpenVINO security barrier demo (here).
UPDATE 26 January 2021: The GitHub issue for this example pipeline is Gen2 License Plate Detection and OCR oak-examples#47
Interactive Face Detection Demo (here)
Interactive Face Recognition Demo (here)
Cross Road Camera Demo (here)
Pedestrian Tracker (i.e. Person ReID here)
Text Detection and Recognition (OCR) (here)
Gaze Estimation (here and here)
To keep in mind, but maybe not support initially:
A smart motion (DepthAI and megaAI 'SmartMotion' Feature #132) sort of pipeline, here, which uses motion detection to determine what subset of a scene to pass into object detection, followed by object tracking on the detected objects.
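To show what "motion detection determines the subset of the scene" could mean in practice, here is a minimal sketch using simple frame differencing on grayscale frames stored as nested lists. Frame differencing is a stand-in chosen for illustration, not necessarily the method the SmartMotion feature uses, and `motion_bbox` and its threshold are hypothetical.

```python
def motion_bbox(prev, curr, threshold=25):
    """Return (xmin, ymin, xmax, ymax), in inclusive pixel coordinates,
    covering every pixel whose absolute grayscale change exceeds
    `threshold`, or None if nothing moved."""
    xs, ys = [], []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (a, b) in enumerate(zip(row_prev, row_curr)):
            if abs(a - b) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


# Two tiny 5x4 frames: two pixels change between them.
prev_frame = [[0] * 5 for _ in range(4)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[1][1] = 200
curr_frame[2][3] = 200

roi = motion_bbox(prev_frame, curr_frame)
```

The returned box is the only region that would need to be cropped and passed to object detection. A `None` result means the detector can be skipped for that frame.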