Dan Morales edited this page Mar 27, 2018 · 14 revisions

Background

Detecting changes in statistical properties of a time series is important in a large number of fields. The original motivation for changepoint analysis was in sequential process control, where a decision as to whether a change has occurred must be made in real time. With the increasing amount of sensor and similar online data, there is a clear need for online changepoint detection methods to be available through R packages.

Similarly, more and more users of our changepoint and ecp packages are requesting online functionality.

Related work

There are many R packages available for offline changepoint detection but, to our knowledge, only one for online changepoint detection (cpm). That package implements the traditional “resetting” methodology, whereby previous data is forgotten once a change has occurred. In contrast, this project would bring the accuracy benefits of the offline methodology to the online setting, allowing users to apply state-of-the-art offline methods in a computationally efficient manner for online use.

Details of your coding project

This project will create the changepoint.online R package. The package will mirror the functionality of the changepoint and ecp packages, but for an online setting. More specifically:

  • Set up a GitHub repo for changepoint.online, with TravisCI for GNU/Linux testing, AppVeyor for Windows testing and Coveralls for code coverage.
  • Break the back-end code for the PELT and ecp algorithms into initialization and update functions.
  • Write user-facing functions and plotting tools.
  • Write extensive test cases using testthat, building on the tests in the changepoint and ecp packages. Goal: 100% coverage in both R and C code by the end of summer. If time allows, port applicable tests back to changepoint and ecp to increase coverage.
  • Test on Windows via win-builder.
  • Write a vignette describing how to use the package.
  • Create a shiny app demonstrating the functionality of the package for real time analysis.
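
As a rough illustration of the initialization/update split mentioned in the list above, the following base R sketch keeps a small state object and revises a changepoint estimate as each observation arrives, without forgetting earlier data. It is not PELT or ecp: it uses a simple least-squares cost for at most one change in mean, and the names online_init, online_update, seg_cost and online_changepoint, as well as the penalty value, are hypothetical and chosen only for this example.

```r
# Hypothetical sketch: init/update pattern for online change-in-mean detection.
# Assumption: at most one changepoint, scored with a least-squares cost.

# Initialization: cumulative sums with a leading zero, so that the sum of
# observations from..to is csum[to + 1] - csum[from].
online_init <- function() {
  list(n = 0L, csum = 0, csum2 = 0)
}

# Update: fold one new observation into the running sums.
online_update <- function(state, x) {
  state$csum  <- c(state$csum,  state$csum[state$n + 1L]  + x)
  state$csum2 <- c(state$csum2, state$csum2[state$n + 1L] + x^2)
  state$n <- state$n + 1L
  state
}

# Residual sum of squares of observations from..to around their own mean.
seg_cost <- function(state, from, to) {
  s  <- state$csum[to + 1L]  - state$csum[from]
  s2 <- state$csum2[to + 1L] - state$csum2[from]
  s2 - s^2 / (to - from + 1L)
}

# Query: best single split of all data seen so far, or NA if the cost
# reduction over "no change" does not exceed the penalty.
online_changepoint <- function(state, penalty) {
  n <- state$n
  if (n < 2L) return(NA_integer_)
  taus <- seq_len(n - 1L)
  split_cost <- vapply(
    taus,
    function(tau) seg_cost(state, 1L, tau) + seg_cost(state, tau + 1L, n),
    numeric(1)
  )
  best <- which.min(split_cost)
  if (seg_cost(state, 1L, n) - split_cost[best] > penalty) {
    taus[best]
  } else {
    NA_integer_
  }
}

# Usage: feed a noiseless step signal one point at a time.
x <- c(rep(0, 20), rep(5, 20))
state <- online_init()
estimates <- integer(length(x))
for (i in seq_along(x)) {
  state <- online_update(state, x[i])
  estimates[i] <- online_changepoint(state, penalty = 10)
}
```

Here the estimate is recomputed from all data seen so far, in contrast to the “resetting” approach. The query step is linear in the number of observations, so a real implementation would need the pruning ideas of the offline algorithms to stay efficient.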

Expected impact

The package will provide a new and important alternative to the “resetting” algorithms currently available. It will also include parametric models, which are not included in cpm. Additionally, we have received a considerable number of requests for this functionality over the last year, so we expect the package to be well used by the community.

Mentors

Students, please contact the mentors listed below after completing at least one of the tests.

  • Rebecca Killick <r.killick@lancs.ac.uk>
  • David Matteson <matteson@cornell.edu>

Tests

Easy: Download and install the changepoint and ecp packages. Write a for loop to analyze a data set with an increasing number of data points. Graph the output, adding a new timepoint in each iteration of the loop and updating the best changepoint locations.

Medium: Fork the changepoint, changepoint.np, EnvCpt or ecp package on GitHub and write some new tests to increase the code coverage. Commit these back to the main repository.

Hard: Turn your easy task into R functions, remembering to include checks on your code. Write a package which includes tests for your functions. Upload it to GitHub and link in TravisCI testing and code coverage via covr.

Solutions of tests

Students, please post a link to your test results here.


Name: Andrew Connell

Email: a.connell1@lancaster.ac.uk

University: Lancaster University

Course: BSc Mathematics and Statistics

Solution to Easy Test: Easy Test


Name: Daniel Morales

Email: dm9450@gmail.com

University: Instituto Tecnológico de Querétaro, MX

Degree: Computer Systems Engineering

Solution to Easy Test: Easy Test
