
cpVis: Interactive visualization for change point exploration and labeling

Benjamin Bach edited this page Apr 9, 2019 · 5 revisions

Background

A changepoint is typically defined as a point in time where the distribution of a data-stream changes in a distinct manner; for example, one may look for changepoints in the mean and/or variance. Usually, detection is performed in an unsupervised setting where we have no labelled examples of true changepoints. In practice, however, we often have examples of periods of time where we know no changes should be present or, conversely, where changes are expected to exist. Where such information is available, we can potentially use it to aid our judgement of how to set complexity penalties in the changepoint estimation task and thus decide on an appropriate number of changepoints, a task which currently requires time-consuming parameter tuning by domain experts. This process is severely hampered by a lack of streamlined tools for the task, namely visualisation of changepoint solutions (across tuning parameters), interactive labelling of data-streams, and, finally, taking this feedback into account when learning penalty functions.
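
To make the role of the complexity penalty concrete, the following is a minimal, language-agnostic sketch written in Python (not the API of any R package). It assumes a change-in-mean model with a squared-error cost and uses plain optimal partitioning, which takes quadratic time; PELT finds the same optimum but prunes candidates for speed. The function names and the toy data are hypothetical.

```python
from itertools import accumulate


def segment_cost(prefix, prefix_sq, start, end):
    """Squared-error cost of fitting a single mean to data[start:end]."""
    n = end - start
    total = prefix[end] - prefix[start]
    total_sq = prefix_sq[end] - prefix_sq[start]
    return total_sq - total * total / n


def detect_changepoints(data, penalty):
    """Minimise (sum of segment costs) + penalty * (number of changepoints).

    A changepoint t means a new segment starts at index t (0-based).
    """
    n = len(data)
    prefix = [0.0] + list(accumulate(data))
    prefix_sq = [0.0] + list(accumulate(x * x for x in data))
    best = [0.0] * (n + 1)   # best[t]: optimal penalised cost of data[:t]
    best[0] = -penalty       # the first segment carries no penalty
    last = [0] * (n + 1)     # last[t]: start of the final segment of data[:t]
    for t in range(1, n + 1):
        best[t], last[t] = min(
            (best[s] + segment_cost(prefix, prefix_sq, s, t) + penalty, s)
            for s in range(t)
        )
    # Walk back through the stored segment starts to recover the changepoints.
    changepoints = []
    t = n
    while last[t] > 0:
        t = last[t]
        changepoints.append(t)
    return sorted(changepoints)


# Toy series (an assumption): three levels, with a large and a small jump.
data = [0.0] * 10 + [4.0] * 10 + [5.0] * 10
for penalty in (1.0, 20.0, 200.0):
    print(penalty, detect_changepoints(data, penalty))
# prints:
# 1.0 [10, 20]
# 20.0 [10]
# 200.0 []
```

A small penalty recovers both changes, a moderate one keeps only the large jump, and a large one returns no changepoints at all. Choosing between these solutions is the tuning task described above.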

Taken together, these components fall into recent efforts to produce explainable AI systems within the growing community of research that involves 'the human in the loop' to monitor and control complicated algorithms. Such approaches aim to complement the system's capabilities with the contextual domain knowledge, creativity, and decision-making capabilities of humans. To help humans understand and control an algorithmic system, interactive visualizations offer considerable potential by leveraging humans' capabilities for parallel and simultaneous perception, pattern detection, and exploratory analysis. In this particular project, we seek a simple visualization interface to support i) human labeling of the data with the aid of several complementary measures on the data-stream, such as mean, trends, min, max, variance, etc., and ii) interactive exploration of the result space suggested by a changepoint detection algorithm. Eventually, any visualization will help communicate the data and respective decisions to peers and larger audiences in the form of reports, posters, slideshows, or open web documentation.

Related work

Two key packages related to this work are discussed below:

penaltyLearning - https://cran.r-project.org/web/packages/penaltyLearning/index.html Provides a mechanism for learning the penalty level for a given univariate sequence and labelled changepoint regions. While the package provides a useful method to suggest an optimal penalty level for defining a changepoint segmentation, it is geared largely towards the genetics community. We aim to utilise the penalty learning method in this package, but integrate it into a more general labelling framework with an enhanced focus on visualisation and interaction. This will allow for quick user comparison between unsupervised and supervised changepoint methods.

changepoint - https://cran.r-project.org/web/packages/changepoint/index.html Provides various methods for segmenting individual time-series based on mean and variance. We plan to use the included methods (mainly PELT) to perform unsupervised segmentation, but extend the visualisation of such solutions to enable better interpretation of changepoint output. Experience with end-users suggests that finding an appropriate penalty parameter with these methods is time-consuming, in large part due to the lack of coherent visualisation of solution paths. There is currently no labelled/supervised learning capability within the changepoint package. The final part of this proposal aims to examine changepoint detection in multivariate series, and to use visualisation tools to help highlight changepoints which may be shared across data streams.

A range of recent work focuses on interactive visualization of AI systems [1] and can be summarized under the terms 'explainable' or 'interactive AI'. Examples include interactive playgrounds such as TensorFlow Playground (http://playground.tensorflow.org) and Momentum (https://distill.pub/2017/momentum/), tools for interactive machine learning (https://learningfromusersworkshop.github.io/), as well as more story-like descriptions of studies and analysis cases (http://formafluens.io/client/mix10.html). A great variety of further tools and research is summarized online: http://visxai.io/program.html. More specifically, tools such as Small MultiPiles [2] and BayesPiles [3] use simple segmentation methods (far less sophisticated than changepoint detection in R) as a proof-of-concept to demonstrate interactive visualization approaches to detect states in temporal networks. In both cases, visualization is used to provide a user with a holistic view of the data (i.e., a time sequence of networks), including the more specific information required to aid a user in making decisions about temporal states. Interaction complements automatic segmentation: it allows a user to explore a found segmentation solution (explore states in the network) as well as to quickly refine an automatic solution by splitting and combining states. Finally, time curves [4] are a far more generic way to visualize changes over time, e.g., for multiple time-series. To the best of our knowledge, no tool or visualization interface exists that allows analysts to explore the solution path of changepoint detection methods in single and multiple time-series. Through this project, we will lay the foundations for interfaces and methods that enable changepoint detection across a variety of domains.

Details of your coding project

The project outputs will take the form of an R package (tentatively named cpViz), documentation (via .Rd files), and a corresponding vignette in the form of a comprehensive R-markdown document.

Goals:

  1. Visualize relevant information on time series, such as means, trends, etc., for varying numbers of changepoints as determined by a penalty threshold.
  2. Allow for manual labeling of a time series so that researchers can label time points based on the patterns they see in the data, or prior knowledge.
  3. Use labelling (and max-min labelling) to learn “optimal” penalty levels and provide consistent changepoint/model specification.
  4. Extend 1 to work with multiple time series.

Milestone 1 (Goals 1-2):

  • Extend the visualisation methods included for outputs of the changepoint R package. Take the output object of this package as an input and produce visualisations across the solution path. This should implement visualisation of the segmented series (and mean/variance estimates) at individual points on the penalty-changepoint solution curve.
  • Enable a user to select a range of points on the solution path to visualise overlays of multiple segmentations. This can be implemented via either plotly or Shiny.
  • Develop interactive visualisation for specification of labelled changepoints. Generically, these labels will be of the interval type; however, they may be intervals of size one, i.e. the changepoint is required to be exactly at a given time-point. This should also allow labelling of periods in which no changepoints occur. The labelling process should be interactive in that a user can drag labels to be overlaid on the raw data-stream. This can be enabled via development of a Shiny App. It will also allow the user to save the labels once applied for future reference.
  • Create documentation set (roxygen)
  • Create an R markdown script demonstrating the above functionality
  • The above functionality should be packaged as an R package cpViz which will form the basis for the project outputs.
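
As an illustration of the per-segment estimates that such a visualisation would overlay on the raw series, here is a minimal sketch in Python (a language-agnostic illustration, not the planned R implementation). It assumes a changepoint t means that a new segment starts at index t (0-based) and that every changepoint lies strictly between 0 and the series length; the function names are hypothetical.

```python
from statistics import fmean, pvariance


def segment_summaries(data, changepoints):
    """Return the mean and variance of each segment of a segmentation."""
    bounds = [0, *sorted(changepoints), len(data)]
    summaries = []
    for start, end in zip(bounds, bounds[1:]):
        chunk = data[start:end]
        summaries.append({
            "start": start,
            "end": end,  # exclusive
            "mean": fmean(chunk),
            "variance": pvariance(chunk),  # population variance
        })
    return summaries


def fitted_means(data, changepoints):
    """Piecewise-constant fit: one value per observation, for overlay plots."""
    fitted = []
    for seg in segment_summaries(data, changepoints):
        fitted.extend([seg["mean"]] * (seg["end"] - seg["start"]))
    return fitted


data = [1.0, 1.0, 3.0, 3.0, 3.0, 10.0]
print(fitted_means(data, [2, 5]))
# prints: [1.0, 1.0, 3.0, 3.0, 3.0, 10.0]
print([seg["mean"] for seg in segment_summaries(data, [2])])
# prints: [1.0, 4.75]
```

Calling `fitted_means` once per point on the solution path yields the step functions that could be drawn on top of each other when a user selects a range of penalties.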

Milestone 2 (Goal 3):

  • Using the solution path provided by the changepoint package, we will allow a user to visualise the performance of segmentations when judged against the user labelled sequences.
  • A penalty threshold, as specified via the changepoint package, will automatically be recommended to the user based on the empirical performance with respect to the labelling.
  • Finally, we will use the penaltyLearning package to attempt to learn a penalty function and perform supervised segmentation on the same data. This can be contrasted with the unsupervised (but tuned) segmentation from the first two steps of this milestone.
  • The documentation and R-markdown vignette should be appropriately extended and above steps included in the package.
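
The first two steps of this milestone can be sketched as follows, in Python as a language-agnostic illustration. The error rule is an assumption made for this example: a "change" label counts as an error if no changepoint falls inside its (inclusive) interval, and a "no_change" label counts as an error if any changepoint does. Ties are broken in favour of the larger penalty, i.e. the simpler model. The solution path and labels below are made-up inputs, and the function names are hypothetical.

```python
def label_errors(changepoints, labels):
    """Count the labels that a segmentation violates."""
    errors = 0
    for start, end, kind in labels:
        inside = sum(start <= cp <= end for cp in changepoints)
        if kind == "change" and inside == 0:
            errors += 1  # missed an expected change (false negative)
        elif kind == "no_change" and inside > 0:
            errors += 1  # change where none should be (false positive)
    return errors


def recommend_penalty(solution_path, labels):
    """Score every penalty on the path and return the best one with all scores."""
    scored = {pen: label_errors(cpts, labels)
              for pen, cpts in solution_path.items()}
    # Fewest errors first; among ties prefer the larger penalty.
    best = min(scored, key=lambda pen: (scored[pen], -pen))
    return best, scored


# Hypothetical solution path: penalty -> changepoint locations.
solution_path = {
    1.0: [10, 14, 20, 25],
    5.0: [10, 20, 25],
    20.0: [10, 20],
    100.0: [10],
    500.0: [],
}
# Hypothetical user labels: (start, end, kind), both ends inclusive.
labels = [
    (0, 7, "no_change"),
    (8, 12, "change"),
    (18, 22, "change"),
    (23, 29, "no_change"),
]

best, scored = recommend_penalty(solution_path, labels)
print(best)    # prints: 20.0
print(scored)  # prints: {1.0: 1, 5.0: 1, 20.0: 0, 100.0: 1, 500.0: 2}
```

Plotting the error counts in `scored` against the penalty gives the kind of performance view described above, with the recommended penalty at its minimum.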

Milestone 3 (Goal 4):

  • While there are packages for performing multivariate changepoint segmentation (e.g. ecp, inspect, and changepoint.mv), this project aims to focus on applying relatively fast univariate segmentation to multivariate sequences and enabling the user to make their own judgement on grouped changepoints via visual inspection of the solutions. This will extend the work in the first two steps of Milestone 1 to the multivariate setting.
  • An alternative way to visualise high-dimensional (multivariate) data is to project it onto a 2-dimensional subspace. This enables one to intuitively follow the dynamics of the process and can aid interpretation and storytelling via state characterisation.
  • Package the multivariate visualisation tools and extend documentation and vignette.
  • Submit release version of the package and documentation to CRAN for general usage. Note: This may be initially limited to features developed in Milestones 1 and 2.
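
One simple way to surface candidate shared changepoints from independent univariate segmentations is to pool the changepoints of all streams and group those that lie close together. The sketch below, in Python as a language-agnostic illustration, does this greedily; the tolerance, the minimum number of streams, the stream names, and the function names are all assumptions made for this example, and the final judgement is still meant to be left to the user's visual inspection.

```python
def group_changepoints(per_stream, tolerance=2):
    """Group pooled changepoints whose gap to the previous one is <= tolerance."""
    pooled = sorted(
        (cp, name) for name, cpts in per_stream.items() for cp in cpts
    )
    groups = []
    for cp, name in pooled:
        if groups and cp - groups[-1]["positions"][-1] <= tolerance:
            groups[-1]["positions"].append(cp)
            groups[-1]["streams"].add(name)
        else:
            groups.append({"positions": [cp], "streams": {name}})
    return groups


def shared_changepoints(per_stream, tolerance=2, min_streams=2):
    """Return (first, last, streams) for groups seen in at least min_streams."""
    return [
        (g["positions"][0], g["positions"][-1], sorted(g["streams"]))
        for g in group_changepoints(per_stream, tolerance)
        if len(g["streams"]) >= min_streams
    ]


# Hypothetical univariate results for three streams.
per_stream = {"a": [10, 40], "b": [11, 25], "c": [9, 41]}
print(shared_changepoints(per_stream))
# prints: [(9, 11, ['a', 'b', 'c']), (40, 41, ['a', 'c'])]
```

The changepoint at 25 appears in only one stream and is therefore not reported as shared. In an interface, the returned ranges could be highlighted as bands across all stream plots.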

Expected impact

The envisaged package cpViz will enable the enhanced visualisation of solutions from the changepoint package as well as dynamic and interactive labelling and investigation of changepoint solutions. It will allow for quick comparison of unsupervised and supervised changepoint estimation methods. This should drastically cut the time required for end users to perform model tuning, saving the valuable time of domain experts and increasing the accessibility and usability of changepoint methods.

The project will give the student a chance to develop a coherent piece of packaged software with the goal of submitting the package to CRAN at the end of the project. This will enhance their overall software development capabilities, and give them experience of working within a group via the project mentors. Beyond developing the skills required to create advanced visualisations within R (RShiny, JS), the student will also gain enhanced understanding of changepoint methods, and the general optimisation tools used for implementing these methods.

In future, we plan to extend the capabilities of cpVis, especially with respect to multiple time series. Another application case we are currently planning is the analysis of dynamic networks, e.g., to detect and explore changes in a variety of networks, from social networks to brain connectivity networks. This summer project will help us build the necessary collaboration skills to apply for independent funding for further development at the interface of visualisation and exploratory statistical analysis.

Mentors

  • Alex Gibberd (a.gibberd@lancaster.ac.uk). Lecturer in Statistics at Lancaster University with expertise in changepoint and dynamic network analysis methodology. Teaches R via the MSc Data-Science masters programme.
  • Benjamin Bach (bbach@inf.ed.ac.uk). Lecturer in Design Informatics at the University of Edinburgh. Expertise in data visualization and human computer interaction, visual analytics, as well as visualization of temporal and relational data.
  • Dan Grose (dan.grose@lancaster.ac.uk). Research Software Engineer at Lancaster University. Maintainer for RcppEigenAD, IndepTest, anomaly, and changepoint.mv R packages.

References (if not already in the text)

[1] Li, Tianyi, Gregorio Convertino, Wenbo Wang, Haley Most, Tristan Zajonc, and Yi-Hsun Tsai. "HyperTuner: Visual Analytics for Hyperparameter Tuning by Professionals."

[2] Bach, Benjamin, Nathalie Henry‐Riche, Tim Dwyer, Tara Madhyastha, J‐D. Fekete, and Thomas Grabowski. "Small MultiPiles: Piling time to explore temporal patterns in dynamic networks." In Computer Graphics Forum, vol. 34, no. 3, pp. 31-40. 2015.

[3] Vogogias, Athanasios, Jessie Kennedy, Daniel Archambault, Benjamin Bach, V. Anne Smith, and Hannah Currant. "BayesPiles: Visualisation Support for Bayesian Network Structure Learning." ACM Transactions on Intelligent Systems and Technology (TIST) 10, no. 1 (2018): 5.

[4] Bach, Benjamin, Conglei Shi, Nicolas Heulot, Tara Madhyastha, Tom Grabowski, and Pierre Dragicevic. "Time curves: Folding time to visualize patterns of temporal evolution in data." IEEE transactions on visualization and computer graphics 22, no. 1 (2016): 559-568.
