Creating Applications

Matthew Taylor edited this page May 23, 2016 · 60 revisions

☝︎ Table of Contents

Introduction

So, you want to build an application with NuPIC? This page provides practical guidance and advice for the budding NuPIC application developer.

Requirements

This reference is meant to support NuPIC application development, so you will need to have NuPIC installed. For some sections below, you should have some Python experience. However, even without Python experience you can do a lot by using the HTM Engine, which allows you to create your own NuPIC client application in any programming environment you like.

For some apps, you may need to run a Swarm, which requires that you have MySQL installed and running. In general, individual tools listed below may have additional requirements, so be sure to check their README pages for installation instructions.



Data

The nature of the brain and HTM dictate the best form of data for NuPIC: streaming temporal data. HTM works best with data having a strong temporal component that changes continuously. Some examples:

Note that every example above has changing values of data over time. The patterns in this data are temporal patterns. For example, the energy consumption of a building will have obvious daily and weekly patterns, depending on the purpose of the building. These patterns can be learned by NuPIC models and acted upon given NuPIC's predictions of future data and anomaly indications.
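Streaming temporal data for NuPIC is usually just a timestamped CSV. As a sketch, here is the three-row header convention used by NuPIC's file-based input format (field names, field types, and special flags, where "T" marks the timestamp field); the field names and values below are made up for illustration:

```python
import csv
import io

# NuPIC's file-based input format uses three header rows: field names,
# field types, and special flags ("T" marks the timestamp field).
raw = io.StringIO()
writer = csv.writer(raw)
writer.writerow(["timestamp", "consumption"])
writer.writerow(["datetime", "float"])
writer.writerow(["T", ""])
writer.writerow(["2016-05-23 00:00:00", "21.2"])
writer.writerow(["2016-05-23 00:15:00", "16.4"])

raw.seek(0)
rows = list(csv.reader(raw))
names, types, flags = rows[0], rows[1], rows[2]
records = rows[3:]
```

Each data row after the three headers is one sample in the stream; models consume these rows one at a time, in order.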

Getting Data

It can be difficult to find high quality streaming temporal data in a format that is easy to consume. The NuPIC community has been collecting possible data sources for a long time. Below are some resources we've put together to help provide data for NuPIC applications.

River View


River View collects publicly accessible temporal data streams over time and caches them in an easy-to-query interface. Many public data streams only expose real-time data, neglecting to allow the querying of historical data. River View caches data from the source, making near-real-time data available as well as up to 3 months of historical data. The data.numenta.org River View instance is currently caching nearly 7,000 data streams, including:

These are just some examples of existing "Rivers" available for anyone to consume. Anyone can add a new River to River View by following these instructions. As soon as the new river is merged into the codebase and redeployed to http://data.numenta.org, the River will start collecting data. Give it a few weeks and see what kinds of temporal patterns you can uncover!

If you think you might need to set up a River to collect data for your NuPIC application, don't delay! We are accepting pull requests at https://github.com/nupic-community/river-view. The sooner you get your River started, the more accumulated data you'll have to use.
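Querying a River is a simple HTTP GET. The helper below sketches building a query URL; the `/{river}/{stream}/data.{format}` path layout and `limit` parameter mirror River View's query interface, but the river and stream names here are placeholders, so check the live API at data.numenta.org for real ones:

```python
from urllib.parse import urlencode

def river_url(river, stream, limit=100, fmt="json"):
    """Build a River View data query URL.

    The /{river}/{stream}/data.{format} path layout and the `limit`
    parameter follow River View's query interface; the river and
    stream names passed in are up to the caller.
    """
    query = urlencode({"limit": limit})
    return "http://data.numenta.org/%s/%s/data.%s?%s" % (river, stream, fmt, query)

# Hypothetical river/stream names, for illustration only:
url = river_url("nyc-traffic", "169", limit=500)
```

The JSON response can then be fed directly into a NuPIC model, one row per `model.run()` call.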

Other Public Data Sets

There are a lot of potential public data sources that have not been exploited. We've been keeping a list of them at Data Sets for NuPIC.

Portable Device Sensors

There are a plethora of hackable sensor devices today with APIs for getting streams of live data. Here are a few examples:

SparkFun also offers a data service, much like River View, that allows users to upload their own data streams for anyone to use: https://data.sparkfun.com/streams/



NuPIC APIs

NuPIC exposes two primary interfaces: the Network API and the Online Prediction Framework (OPF). The OPF is the easier of the two to use, but it sacrifices the flexibility to create custom network structures. The Network API is a lower-level API that gives users the power and flexibility to construct hierarchical structures of nodes.

The OPF takes a typical network design and wraps it in a model class so it can be easily created. Many of our examples use the OPF because it is easier to set up new experiments.

There is also a third API for Swarming, which allows users to find the best model parameters for a particular data set using a particle swarm optimization algorithm. This API is not always needed, however, so it is listed in the Advanced section.

The Online Prediction Framework (OPF)

The OPF is a Python-only convenience library that uses the Network API. The primary classes it exposes to users are the Model and the ModelFactory.

The OPF is a framework for working with and deriving predictions from online learning algorithms, including HTM. It is designed to work within a larger architecture as well as in standalone mode (i.e. run directly from the command line), and new model algorithms and functionality can be added with minimal code changes.
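The per-record loop at the heart of the OPF can be sketched with a toy stand-in. To be clear, this is not NuPIC code: a real model is built with `ModelFactory.create(model_params)` and fed records through `Model.run()`. The class below merely predicts the previous value, to show the shape of that online learn-as-you-go loop:

```python
# A toy stand-in for the OPF per-record interface (NOT NuPIC code).
# A real NuPIC model is created via ModelFactory.create(model_params)
# and fed one record at a time through model.run(); this toy version
# just predicts the previous value, to show the loop's shape.
class ToyModel:
    def __init__(self, predicted_field):
        self.predicted_field = predicted_field
        self._last = None  # state carried forward from record to record

    def run(self, record):
        prediction = self._last                    # infer from past input
        self._last = record[self.predicted_field]  # "learn" the new value
        return {"bestPrediction": prediction}

model = ToyModel(predicted_field="consumption")
results = [model.run({"timestamp": t, "consumption": v})
           for t, v in [(0, 21.2), (1, 16.4), (2, 18.0)]]
```

The key property illustrated here is that there is no separate training phase: every record both updates the model and yields an inference.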

More complete documentation on the OPF can be found on the Online Prediction Framework wiki page. Here are some examples of applications using the OPF interface:

The Network API

(Video: Network API overview)

The Network API is defined in C++ and also exported through a Python interface, so it can be used from either language.

An HTM Network is a collection of Regions implementing HTM and other algorithms. The Network Engine allows users to create and manipulate such networks.

See the Network API wiki page for a complete description. Examples of Network API usages can be found at examples/network.



Tools

These tools exist outside of the core NuPIC codebase. They are not HTM implementations, but they can help users to create HTM applications.

HTM Engine

HTM Engine is a framework used for creating and running hundreds of NuPIC anomaly detection models simultaneously. It manages memory and CPU usage by serializing models to disk when they are inactive. This means models only utilize system resources when they are learning new data and returning anomaly indications.

First, some disclaimers. HTM Engine:

  • only runs anomaly detection models
  • does not generate predictions, only anomaly scores and likelihoods
  • provides a model interface that can only monitor one field of data per model

That said, HTM Engine is a very useful tool for anomaly detection problems when hundreds of potential scalar metrics are involved because it allows you to easily stand up a server that handles scaling models automatically.
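Feeding data to HTM Engine is lightweight: samples are sent over a plain TCP socket in a Graphite-style plaintext format, one line per sample. The sketch below formats such a line; the metric name is made up, and the listener port (conventionally 2003) depends on your HTM Engine configuration:

```python
def plaintext_sample(metric, value, timestamp):
    """Format one sample for HTM Engine's metric listener, which accepts
    Graphite-style plaintext: metric name, value, and UNIX timestamp,
    newline-terminated. The metric name passed in is up to the caller."""
    return "%s %s %d\n" % (metric, value, timestamp)

# Hypothetical metric name, for illustration:
line = plaintext_sample("building1.energy", 21.2, 1464000000)
# To send it, open a TCP socket to the HTM Engine host's metric
# listener port and write `line`; one line per sample.
```

Because each metric is one named stream of scalars, standing up hundreds of models is just a matter of sending hundreds of distinct metric names.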

See the README for complete installation instructions, including a tutorial screencast (mentioned below).

Cortical.IO

Cortical.IO is a partner of Numenta. Their basic API provides SDRs (called fingerprints) for words, sentences, and paragraphs. They have additional services for more advanced processing. The demos on their website are comprehensive. You can sign up for a free API key for experimentation.

There are two Python clients for the Cortical.IO API:

The SDRs returned from Cortical.IO's APIs can be used directly by NuPIC if passed into the temporal memory module or temporal pooler. You can see an example of this in the Fluent library.
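A Cortical.io fingerprint arrives as a list of active bit indices, while HTM algorithms consume dense binary arrays, so a small expansion step sits between the two. A sketch of that step, assuming the standard 128 × 128 = 16,384-bit retina size (verify against your API settings; the bit positions below are made up):

```python
def fingerprint_to_sdr(positions, size=16384):
    """Expand a Cortical.io "fingerprint" (a list of active bit indices
    in a 128 x 128 = 16384-bit space, the standard retina size) into a
    dense binary list of the kind an HTM algorithm consumes. Adjust
    `size` if your retina dimensions differ."""
    sdr = [0] * size
    for p in positions:
        sdr[p] = 1
    return sdr

# Hypothetical fingerprint positions, for illustration:
sdr = fingerprint_to_sdr([3, 7, 4095])
```

The resulting sparse binary vector can then be passed into the temporal memory or temporal pooler as described above.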

Fluent

Fluent is a platform for building language / NLP-based applications using NuPIC and Cortical.io's API. This project is currently in research mode, so it is not yet stable enough to recommend to NuPIC application developers. We plan to release a future version of Fluent that does not depend on nupic.research.

Details coming soon...



Example Apps

The NuPIC community keeps several experiments and example applications on the NuPIC Community GitHub Organization. Anyone interested in sharing their example application is welcome to chat with us on HTM Forum or our Gitter chat room.

Reference Apps

These are sample applications that don't include detailed tutorials. They are more for reference. You might want to use them as examples for how to do certain things with NuPIC.

Taurus

"Taurus" is the development code word for Grok for Stocks.

Grok for Stocks is an example HTM application that continually monitors hundreds of publicly traded companies and alerts you if something unusual is happening to any of them. Grok for Stocks uses HTM machine intelligence algorithms to model stock price, stock volume, and Twitter data related to 200 of the largest publicly traded companies. Companies monitored include Apple, Google, Amazon, and Starbucks. Grok for Stocks is a mobile application that runs on Android-based phones.

This complete application is available for free download on the Google Play Store for Android devices.

The complete source code for Grok for Stocks includes several components:

taurus

A server application that implements HTM Engine for the purpose of collecting and reporting on company metrics. Custom metrics are used for Stock Price, Stock Volume, and Twitter handle tweet volume. A RESTful API is provided to support the Taurus Mobile application.

taurus.monitoring

Implements several monitors of the Taurus infrastructure and a supporting database.

taurus.metric_collectors

Implements metric collection agents for twitter and xignite data sources which forward data to a running Taurus instance.

taurus-mobile

Application-specific Android source code for the mobile app.

mobile-core

The Mobile App is composed of reusable components that are used by all Numenta mobile client applications.

HTM for IT

"Grok" is the development codeword for HTM for IT.

The complete source code for HTM for IT includes several components:

grok

Grok is an application for monitoring IT infrastructure and notifying on anomalous behavior. This is the server running HTM Engine.

grok-cli

This repository contains the Grok Command line interface (CLI). grokcli allows you to easily interact with a Grok server through the command line including creating instances, etc.

grok-mobile

Application-specific Android source code for the mobile app.

mobile-core

The Mobile App is composed of reusable components that are used by all Numenta mobile client applications.

Rogue Behavior

The metrics collection agent for the Numenta Rogue showcase application. It consists of two primary components: a long-running metric collection agent, which periodically polls various metrics and records the results to a local database, and a separate process that forwards metrics to a Grok server for analysis.

Audio Signal Analysis

A couple of projects may be of interest in this area:

Skeleton HTMEngine App



Tutorial Apps

Hot Gym

The "hot gym" sample application has been around for a long time, and was one of the first real-world applications of NuPIC that proved the value of cortically-inspired learning algorithms. The data used is real energy consumption data from a gym in Australia, which simply contains a timestamp and float value for energy consumption.

This collection of tutorials uses the "Hot Gym" premise to illustrate many ways users can set up and run a NuPIC application against real-world data.

HTM Engine Traffic Anomalies


This tutorial application comes complete with an instructional video, fully commented example codebase, and a runtime that pulls live data from River View.

Geospatial Tracking

Shows examples of geospatial anomaly detection using canned and manually recorded GPS paths.

NuPIC Geospatial Tracking Application Tutorial

Predicting a Sine Wave

Predicting Sine Waves with NuPIC

Not the most practical example, but an example nonetheless.



Ideas

Looking for inspiration? Check out these videos of demos from our previous hackathons:

Additionally, here are some topical suggestions for projects:

Biometric

  • Look for anomalies in human heartbeats. You can get this data from sound recordings (there are more online if you poke around), electrocardiograms (EKG), or other heart monitoring devices. Try to identify the onset of an irregular heartbeat.
  • Electroencephalogram (EEG) data from devices like OpenBCI or Muse.
  • Collect accelerometer data from human movements. Attempt to classify the movements.

Geospatial

  • Attach a location-tracking device to a pet. See if you can correlate high anomaly indications with abnormal behaviors.
  • Attach a GPS device to your car to identify anomalies in your routine.
  • Track satellites through space and identify anomalies in their movements.

Earth Science

Human Behavior



Advanced

Swarming

Swarming is a process that automatically determines the best model for a given dataset. By "best", we mean the model that most accurately produces the desired output. Swarming figures out which optional components should go into a model (encoders, spatial pooler, temporal pooler, classifier, etc.), as well as the best parameter values to use for each component.
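A swarm is driven by a swarm description that names the input fields, the data stream, and what to predict. Below is a minimal sketch following the documented description format; the data file path and the min/max value range are hypothetical and should be replaced with your own:

```python
# A minimal swarm description sketch. The file path and the
# min/max values for "consumption" are hypothetical placeholders.
SWARM_DESCRIPTION = {
    "includedFields": [
        {"fieldName": "timestamp", "fieldType": "datetime"},
        {"fieldName": "consumption", "fieldType": "float",
         "minValue": 0.0, "maxValue": 100.0},
    ],
    "streamDef": {
        "info": "consumption",
        "version": 1,
        "streams": [
            {"info": "energy data",
             "source": "file://data/energy.csv",  # hypothetical path
             "columns": ["*"]},
        ],
    },
    # Predict the "consumption" field one step ahead.
    "inferenceType": "TemporalMultiStep",
    "inferenceArgs": {
        "predictionSteps": [1],
        "predictedField": "consumption",
    },
    "swarmSize": "medium",
}
```

The swarm's output is a set of model parameters that can be passed to the OPF's ModelFactory to instantiate the winning model.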

Swarming in NuPIC



