Google Summer of Code 2014 Ideas List for NuPIC

Richard Crowder edited this page Dec 23, 2015 · 1 revision

This is a list of ideas for projects students could attempt with NuPIC for Google Summer of Code 2014.

For more information about NuPIC and what it's about see numenta.org.

For further reference, see Google's response to our GSoC 2014 application and the things to improve.

1. C++ Client for nupic.core

Create a C++ client that uses the nupic.core API. The client should provide some usability enhancements for specific use-cases like streaming data from files and printing predictions to stdout or plotting them. This task does not require creating language bindings.

Expected Outcome

A repository on GitHub called nupic.c++ that contains the basic client with instructions for usage and documented code.

Knowledge Prerequisite

2. Create a language binding for nupic.core

As described in our core extraction plan, nupic.core will support language bindings. This is an opportunity to create a language binding for NuPIC in your favorite development environment.

Some interesting target languages are:

  • java
  • javascript
  • .net
  • ruby
  • ROS
  • Matlab

Expected Outcome

A library in a programming language of your choice that gives access to the complete nupic.core API.

Knowledge Prerequisite

  • nupic.core API
  • Experience with C++ language bindings (or a willingness to learn)
  • Working knowledge of the target language

Mentor: Marek Otahal (~breznak)

3. Create a new client library for nupic.core

Once a language binding exists for a particular environment, client libraries can be created on top of it. Each client should use the language binding and provide its own API for a specific use-case or domain. One example of a current client is the Online Prediction Framework in Python.

Expected Outcome

An API for NuPIC in the target language that addresses a specific use-case of NuPIC, or a specific domain in which NuPIC could be used (NLP or streaming data, for example).

Knowledge Prerequisite

  • Working knowledge of target language
  • Use-case for the client
  • Understanding of the CLA White Paper

Mentor: Matt Taylor

4. Implement the new temporal pooling and apply the modified NuPIC to action recognition in videos

Action recognition in videos is a very difficult problem in computer vision and machine learning. Two of the reasons are that current state-of-the-art approaches cannot make use of the sequential structure of videos, and that none of them supports temporal pooling. NuPIC with temporal pooling support has the potential to overcome both challenges.

Expected Outcome

If temporal pooling works, NuPIC should be able to achieve better results than the current state-of-the-art.

Knowledge Prerequisite

Mentor: Chetan Surpur

5. Create a compelling NuPIC demo

NuPIC is fairly new to open source, and one of its aims for this year is to attract new users. However, it currently lacks a really good demonstration application to show off its abilities.

A good demo could focus on NuPIC's differences from other machine learning technologies such as adapting to streaming data. Or it could just be cool. Look at the past hackathon projects for ideas.

This is a fairly open-ended project that can be adapted to the expertise of the student.

Expected Outcome

One or more working demo applications to impress new users and help them understand what NuPIC can do.

Knowledge Prerequisite

  • Knowledge of the strengths/features of NuPIC/CLA
  • Familiarity with the NuPIC code base, or willingness to become familiar

Mentor: Matt Taylor

6. Create tools for visualizing and understanding the algorithm behind NuPIC

The algorithm underlying NuPIC (the Cortical Learning Algorithm) is powerful and has a lot of potential. However, it is still in development and is currently complex to understand, which makes it hard to both develop and use.

This project would look to implement a faster way to gain an understanding of the CLA. It could include visualizations, interactive control, and ease of experimenting, something along the lines of Bret Victor's ideas: http://worrydream.com/LadderOfAbstraction/. There is an existing tool called Cerebro (here's a video introduction) for some visualization, but it could be enhanced or replaced with a better tool.

Expected Outcome

A set of tools or an environment for experimentation to help understand how the CLA behaves.

Knowledge Prerequisite

  • A good knowledge and understanding of the current CLA (CLA White Paper)
  • Familiarity with the NuPIC code base

7. Update the nupic-setup.chef repo to support non-CentOS systems

  • General Background: nupic-setup.chef is a Chef cookbook that updates CentOS 6.* or Amazon Linux machines to have all the pre-requisites for NuPIC development.

  • Background Specific to the task: Basic Chef knowledge, and basic administrative experience with a Red Hat-based Linux and at least one other Linux flavor (Ubuntu/Debian/Mint and/or Gentoo). Preferably, OS X experience as well.

Expected Outcome

An updated nupic-setup.chef cookbook that can be used with chef-solo to install NuPIC's prerequisites on CentOS, Amazon Linux, and at least one other Linux flavor.

Knowledge Prerequisite

  • Basic Chef
  • Basic system administration skills

Mentor: Joe Block

8. Update the nupic-setup.chef repo to include live machine tests

  • General Background: nupic-setup.chef is a Chef cookbook that updates CentOS 6.* or Amazon Linux machines to have all the pre-requisites for NuPIC development. There are currently no tests to ensure that a machine is still compliant with NuPIC's pre-requisites. Agamotto is a Python module that makes it easier to test running systems. We would like to add a test suite that can be installed by nupic-setup.chef so that when users have problems with their NuPIC machines they can run the test suite and get a preliminary health check before emailing the list for help.

  • Background Specific to the task: Basic Python programming skills and basic Chef knowledge.

Expected Outcome

A set of Agamotto tests to run on a machine to confirm it still provides all the prerequisites needed by NuPIC.
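To make the idea of a preliminary health check concrete, here is a minimal sketch in Python 3 that uses only the standard library. It deliberately does not use Agamotto's own API, and the function names (`check_python_version`, `missing_commands`, `health_report`) and any prerequisite list passed to them are hypothetical; the real checks would be derived from what nupic-setup.chef installs.

```python
import shutil
import sys


def check_python_version(minimum):
    """Return True if the running interpreter is at least `minimum` (a tuple)."""
    minimum = tuple(minimum)
    return sys.version_info[:len(minimum)] >= minimum


def missing_commands(commands):
    """Return the entries of `commands` that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


def health_report(commands, minimum_python):
    """Collect the check results into a dict a user could paste into a help request."""
    python_ok = check_python_version(minimum_python)
    missing = missing_commands(commands)
    return {
        "python_ok": python_ok,
        "missing_commands": missing,
        "healthy": python_ok and not missing,
    }
```

A user could call `health_report` with the list of required commands and attach the resulting dict to a mailing-list post. A real suite would express the same checks as Agamotto tests.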

Knowledge Prerequisite

  • Basic Chef
  • Basic Python
  • Basic system administration skills

Mentor: Joe Block

9. Implement performance benchmarks for NuPIC

  • General Background: We would like to add a set of Python scripts that characterize the speed and scalability of NuPIC and the algorithms under various situations. Ideally we would also implement C++ benchmarks.
  • Background Specific to the task: Basic Python skills; basic C++ programming skills for the C++ benchmarks.

Expected Outcome

A set of platform-independent Python scripts that run all the benchmarks and output a performance report. (Optional) C++ code that runs C++-specific benchmarks and outputs a performance report.
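As a rough illustration of what such a script might look like, the sketch below times named workloads and prints a plain-text report. It assumes Python 3 and the standard library only; `placeholder_workload` is a hypothetical stand-in, since a real benchmark would feed records to a NuPIC model instead.

```python
import platform
import time


def time_call(func, repeats=5):
    """Run `func` `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def run_benchmarks(benchmarks, repeats=5):
    """Time each named workload and return a dict of {name: best_seconds}."""
    return {name: time_call(func, repeats) for name, func in benchmarks.items()}


def format_report(results):
    """Render results as text: a platform line, then one line per benchmark."""
    lines = ["Platform: %s" % platform.platform()]
    for name in sorted(results):
        lines.append("%-30s %.6f s" % (name, results[name]))
    return "\n".join(lines)


def placeholder_workload(size):
    """Hypothetical stand-in: a real benchmark would run a model on `size` records."""
    def work():
        sum(i * i for i in range(size))
    return work


if __name__ == "__main__":
    results = run_benchmarks({
        "workload, 1k records": placeholder_workload(1000),
        "workload, 100k records": placeholder_workload(100000),
    })
    print(format_report(results))
```

Taking the best of several repeats reduces noise from other processes; running the same workload at several sizes gives a first impression of scalability.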

Knowledge Prerequisite

  • Python skills
  • Basic familiarity with the CLA algorithms will be helpful but not required.
  • C++ skills required for implementing C++ benchmarks.

Mentor: Subutai Ahmad

10. Benchmark NuPIC CLA with other Machine Learning techniques on standard datasets

The HTM/CLA theory is a novel approach to machine learning (ML), and the NuPIC implementation is still developing rapidly. To gain more interest from academia and from potential users, it is important to compare the real performance of NuPIC CLA (*) with that of other ML algorithms (the best and most commonly used ones) on standard datasets.

Example ML algorithms to compare could be: SVMs, (recurrent) NNs, echo-state NNs, LSTM (long short-term memory), HMMs, rule-based approaches, forests of classification trees, anomaly detection, ...

Interesting datasets could be: the "Iris dataset" (classification, 4 features), sequence mining datasets, "shopping cart" analysis data, signal predictions (medical, financial, ...); with more work, possibly even extending NuPIC to domains it does not cover yet (vision data), such as digit recognition, ...

Expected Outcome

A repository with code that allows running NuPIC and the other ML algorithms (existing libraries can be used) on the chosen datasets.

Time needs to be taken to select an appropriate ML algorithm for each dataset, tune both the CLA and the other ML algorithm for the "best" results in the given domain, and evaluate the results.

A summary of the experiments in the form of a wiki page or a paper (PDF).

Knowledge Prerequisite

  • good knowledge of Artificial Intelligence/Machine Learning to be able to choose appropriate datasets, and best algorithms, tune them and interpret results
  • basic programming needed to implement the experiments
  • (optional) medium C++/Python programming for extending NuPIC on new domains (vision, ..)

Mentor: from dept. Cybernetics, CTU

11. Implement, document and evaluate hierarchical CLAs in NuPIC

Hierarchies of CLAs play an integral role in the HTM/CLA theory. However, the state of the implementation in NuPIC is currently incomplete. The task would be to explore the existing C++ code we have, finalize it, provide documentation and examples of creating and running hierarchies of CLAs, and provide some performance benchmarks.

Expected Outcome

Submit PRs to NuPIC (and have them reviewed and accepted) that implement support for forming hierarchies of CLAs (some older code is already present in NuPIC), and provide documentation and examples of hierarchical CLAs. Report the performance of running the hierarchical CLA.

Knowledge Prerequisites

  • good C++ programming (to review and optionally finalize code for hierarchies)
  • medium Python (to write tests and examples)
  • knowledge of the HTM/CLA theory (CLA White Paper)

Mentor: Marek Otahal (~breznak)

12. Research connection capabilities between CLAs and biological (wet) neural networks (and spiking ANNs)

This is more of a research idea. As HTM/CLA is a model strongly inspired by nature, specifically the human neocortex, it would be interesting to evaluate (theoretically and on real data) its capability to communicate with real (human, wet, biological) neural networks. This could be very interesting for medical/"cyborg" research.

Expected Outcome

  • provide the capability to operate between two different neural network models (CLA and spiking NNs), for example through an encoder (?)
  • model the communication between CLA and artificial NN models that are "biologically complex, accurate NNs" (SNNs, other advanced architectures, project BlueBrain)
  • model ability of CLA to process biologically obtained (from a neural field brain probe) data
  • write a paper about the research

Knowledge Prerequisites

  • strong background in AI and biology
  • good knowledge of HTM/CLA theory
  • (medium) programming needed to conduct the experiments
  • proactive attitude (to obtain the data, etc)

Mentor: from dept. Cybernetics, CTU

13. Refactor nupic.core for a flow based programming model

Refactor nupic.core so that its parts behave more like flow based programming components, or Lego pieces that can be snapped together.

  • General Background: NuPIC consists of three main parts: encoders, regions and classifiers (henceforth named components). Encoders are like our sensory organs, regions are the layers in the brain, and the classifier essentially gives meaning to the streams of data.
  • Background Specific to the task: nupic.core has a graph component manager written in C++. It manages execution of the components. One needs to explicitly specify the types of links between components (2D or 3D). Each of these links is a bounded buffer and is set up at compile time.
  • Goal: These bounded buffer links should be implicit in the linking of components. At no point should one have to mention a link (edge). All one does is say that the output of this component goes to the input of that component. Components should know what type of link they need (2D or 3D) at runtime. Secondly, these components should be able to talk over TCP/IP networks via sockets (preferably using ROS for that).
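The intended programming model can be sketched in a few lines of Python. This is only an illustration of the "output of this component goes to the input of that component" idea, not the nupic.core API: the `Component` class, its `to`, `push` and `step` methods, and `run_pipeline` are all hypothetical names, the transforms are toy stand-ins for an encoder, a region and a classifier, and communication over sockets or ROS is left out.

```python
from collections import deque


class Component:
    """A processing step that owns its own bounded input buffer."""

    def __init__(self, name, transform, buffer_size=16):
        self.name = name
        self.transform = transform
        # The bounded buffer belongs to the component, so callers never name a link.
        self.inbox = deque(maxlen=buffer_size)
        self.downstream = []

    def to(self, other):
        """Route this component's output to `other`; returns `other` for chaining."""
        self.downstream.append(other)
        return other

    def push(self, item):
        """Queue one item on this component's input."""
        self.inbox.append(item)

    def step(self):
        """Process everything waiting in the inbox and forward the results."""
        outputs = []
        while self.inbox:
            result = self.transform(self.inbox.popleft())
            outputs.append(result)
            for component in self.downstream:
                component.push(result)
        return outputs


def run_pipeline(components, items):
    """Feed `items` to the first component, stepping `components` in dataflow order.

    Returns the outputs produced by the last component.
    """
    results = []
    for item in items:
        components[0].push(item)
        for component in components:
            produced = component.step()
        results.extend(produced)
    return results


# Toy stand-ins for the three kinds of component.
encoder = Component("encoder", lambda value: [value, value + 1])
region = Component("region", lambda bits: sum(bits))
classifier = Component("classifier", lambda total: "high" if total > 10 else "low")

# Wiring mentions only components, never links.
encoder.to(region).to(classifier)
```

Calling `run_pipeline([encoder, region, classifier], [1, 7])` pushes each value through all three components in turn. Because each component creates its own buffer when it is constructed, the wiring code never has to declare a link type.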

Expected Outcome

The graph component manager class is no longer part of nupic.core and is no longer used. Components may talk to each other without explicitly mentioning the types of links, and will communicate via sockets/ROS. Link information is determined at runtime. The components can be used separately, mixed with "other components", and with all sorts of connections.

Knowledge Prerequisite

Mentor: Marek Otahal (~breznak)