Platform for Situated Intelligence
Platform for Situated Intelligence (or, in short, \psi, pronounced like the Greek letter) is an open, extensible framework for the development and research of multimodal, integrative-AI systems. Examples include multimodal interactive systems such as social robots and embodied conversational agents, systems for ambient intelligence and smart spaces, applications based on small devices that work with streaming sensor data, and so on. In essence, any application that processes streaming sensor data (such as audio, video, depth, etc.), combines multiple (AI) technologies, and operates under latency constraints can benefit from the affordances the framework provides.
The framework accelerates the development of these applications by providing:
- a modern, performant infrastructure for working with multimodal, temporally streaming data
- a set of tools for multimodal data visualization, annotation, and processing
- an ecosystem of components for various sensors, processing technologies, and effectors
A high-level overview of the framework is available in this blog post. An on-demand webinar containing a brief introduction and a tutorial on how to code with \psi is also available – here is the registration link.
12/07/2020: We have published a new beta release, version 0.14.35.3, which includes a new ONNX model runner for ImageNet models, new components for screen and window capture, updates to annotation editing in PsiStudio, as well as a number of bug fixes and updates -- see the full release notes for more details.
09/30/2020: We have added three additional samples: a basic HelloWorld sample illustrating the simplest starting point for a \psi application, a more complex one demonstrating how to do some basic audio capture and processing to construct a simple voice activity detector, and a third sample that combines information from Azure Kinect with Cognitive Services vision and speech to detect objects that a person is pointing to.
09/02/2020: We published a blog post with a high-level overview of the framework.
08/31/2020: We released version 0.13.38.2, which brings important updates to Platform for Situated Intelligence Studio (including data annotation), updates to the runtime to support 3rd party data store sources, and components for running ONNX models. See the release notes for a more complete description of updates.
The core \psi infrastructure is built on .NET Standard and therefore runs both on Windows and Linux. Some components and tools are more specific and are available only on one or the other operating system. You can build \psi applications either by leveraging \psi NuGet packages, or by cloning and building the source code.
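For instance, a new \psi application can be bootstrapped from the command line with the .NET SDK. In the sketch below the project name is arbitrary, and `Microsoft.Psi.Runtime` is the core runtime NuGet package (sensor- and technology-specific components ship as separate packages):

```shell
# Create a new .NET console project (the name MyPsiApp is illustrative)
dotnet new console -n MyPsiApp
cd MyPsiApp

# Reference the core \psi runtime package from NuGet
dotnet add package Microsoft.Psi.Runtime
```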
A Quick Introduction. To learn more about \psi and how to build applications with it, we recommend you start with the Brief Introduction tutorial, which will walk you through some of the main concepts. It shows how to create a simple program, describes the core concept of a stream, and explains how to transform, synchronize, visualize, and persist streams, as well as replay them from disk.
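As a small taste of the concepts the tutorial covers, the sketch below shows a minimal \psi pipeline. It assumes the `Microsoft.Psi.Runtime` NuGet package is referenced, and the generator parameters are purely illustrative: it creates a pipeline, produces a timed stream of integers, transforms it with a stream operator, and consumes the results.

```csharp
using System;
using Microsoft.Psi;

class Program
{
    static void Main()
    {
        // Create the pipeline that hosts and schedules all components
        using (var pipeline = Pipeline.Create())
        {
            // Generate a stream of integers 0..9, one message every 100 ms
            var sequence = Generators.Sequence(pipeline, 0, x => x + 1, 10, TimeSpan.FromMilliseconds(100));

            // Transform the stream (square each value) and consume the results
            sequence
                .Select(x => x * x)
                .Do(x => Console.WriteLine($"square: {x}"));

            // Run the pipeline to completion
            pipeline.Run();
        }
    }
}
```

The `Select` and `Do` operators used here are examples of the stream transformation and consumption primitives that the Brief Introduction tutorial describes in detail.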
Samples. If you would like to directly start from sample code, a number of small sample applications are also available, and several of them have walkthroughs that explain how the sample was constructed and point to additional documentation. We recommend you start with the samples below, listed in increasing order of complexity:
| Sample | Description | Cross-platform | Requirements |
| --- | --- | --- | --- |
| HelloWorld | This sample provides the simplest starting point for creating a \psi application: it illustrates how to create and run a simple \psi pipeline containing a single stream. | Yes | None |
| | This sample captures audio from a microphone and performs voice activity detection, i.e., it computes a boolean signal indicating whether or not the audio contains voiced speech. | Yes | Microphone |
| WebcamWithAudio for Windows or Linux | This sample shows how to display images from a camera and the audio energy level from a microphone and illustrates the basics of stream synchronization. | Yes | Webcam and Microphone |
| | This sample implements a simple application that uses an Azure Kinect sensor to detect the objects a person is pointing to. | Windows-only | Azure Kinect + Cognitive Services |
Documentation. The documentation for \psi is available in the GitHub project wiki. It contains many additional resources, including tutorials, other specialized topics, and a full API reference that can help you learn more about the framework.
If you find a bug, or if you would like to request a new feature or additional documentation, please file an issue on GitHub. Use the bug label when filing issues that represent code defects, and provide enough information to reproduce the bug. Use the feature request label to request new features, and use the documentation label to request additional documentation.
We are looking forward to engaging with the community to improve and evolve Platform for Situated Intelligence! We welcome contributions in many forms: from simply using it and filing issues and bugs, to writing and releasing your own new components, to creating pull requests for bug fixes or new features. The Contributing Guidelines page in the wiki describes many ways in which you can get involved, and some useful things to know before contributing to the code base.
To find more information about our future plans, please see the Roadmap document.
Who is Using \psi
Platform for Situated Intelligence has been and is currently used in several industry and academic research labs, including (but not limited to):
- the Situated Interaction project, as well as other research projects at Microsoft Research.
- the MultiComp Lab at Carnegie Mellon University.
- the Speech Language and Interactive Machines research group at Boise State University.
- the Qualitative Reasoning Group, Northwestern University.
- the Intelligent Human Perception Lab, at USC Institute for Creative Technologies.
- the Teledia research group, at Carnegie Mellon University.
- the F&M Computational, Affective, Robotic, and Ethical Sciences (F&M CARES) lab, at Franklin and Marshall College.
- the Transportation, Bots, & Disability Lab at Carnegie Mellon University.
The codebase is currently in beta and various aspects of the framework are under active development. There are probably still bugs in the code and we may make breaking API changes.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
We would like to thank our internal collaborators and external early adopters, including (but not limited to): Daniel McDuff, Kael Rowan, Lev Nachmanson, and Mike Barnett at MSR, Chirag Raman and Louis-Philippe Morency in the MultiComp Lab at CMU, as well as researchers in the SLIM research group at Boise State and the Qualitative Reasoning Group at Northwestern University.