<img src="imgs/header.png" width="100%">

---------------

# Unit III: Probabilistic filtering
#### Inferring user intention in a noisy world
<b>[John Williamson](http://johnhw.com)</b> 

----

    All theorems are true. 
    All models are wrong. 
    And all data are inaccurate. 

    What are we to do? 
    We must be sure to remain uncertain.

-- *[Leonard A. Smith, Proc. International School of Physics "Enrico Fermi" (1997)](http://www2.maths.ox.ac.uk/~lenny/fermi96_main_abs.html)*

<img  src="imgs/Capture.PNG"/>
*A probabilistic filter-based gesture recogniser*

# Introduction 

-----------------

### What is probabilistic filtering?
One view on interaction is to see user intentions as **unknown values** which are partially observed through input sensors. The time series of inputs from the user only give a partial, noisy, incomplete view of intention inside the user's head. 

### Interaction model
<img src="imgs/brainspace.png" width="100%">

#### Probabilistic filtering in HCI
Probabilistic filtering **(PF)** tracks the evolution of some unknown variables [user intentions] given observed evidence [user input], in a way that is **robust**. Probabilistic filters infer a **distribution** over possible hidden (unobserved) variables, and update this distribution over time as new evidence arrives. They are inherently **uncertain** (they represent degrees of belief) and **dynamic** (they explicitly model changing state over time).

Probabilistic filtering is an **inverse probability** approach, and it requires us to think about interaction from a particular perspective. We must be able to write down explicitly:

* what we want to know (i.e. the **state space of intention**);
* how that will change over time (i.e. the **dynamics of intention**);
* a model of what the user's behavior would be *if we knew their intention* (i.e. a **function mapping intention -> expected user inputs**).

Note that this last point is the **inverse** of the typical way of approaching the problem, where we try to find a mapping from sensor values to intention, either by design or by learning.
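As a concrete, purely hypothetical illustration of these three ingredients, suppose the intention is a position on a 1D slider, that it drifts slowly over time, and that a noisy sensor reports it. The function names, the clamping to [0, 1] and the noise levels below are assumptions made for this sketch. Note that the code only runs *forward*, from intention to input, so it can also be used to synthesize plausible user behavior.

```python
import math
import random

# State space of intention (hypothetical): the position in [0, 1]
# that the user is trying to select on a 1D slider.
def clamp(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))

# Dynamics of intention: g(x_{t-1}) plus process noise.
# Assumption: intention drifts as a small Gaussian random walk.
def dynamics(x_prev, rng, process_sd=0.02):
    return clamp(x_prev + rng.gauss(0.0, process_sd))

# Observation model: if we knew the intention x, what would the sensor report?
# Assumption: the intended position plus Gaussian sensor noise.
def observe(x, rng, sensor_sd=0.1):
    return x + rng.gauss(0.0, sensor_sd)

def likelihood(y, x, sensor_sd=0.1):
    """p(y | x): Gaussian density of a sensor reading y given intention x."""
    z = (y - x) / sensor_sd
    return math.exp(-0.5 * z * z) / (sensor_sd * math.sqrt(2.0 * math.pi))

def simulate(n_steps, x0=0.5, seed=0):
    """Synthesize a (hidden intention, observed input) trajectory from the forward model."""
    rng = random.Random(seed)
    xs, ys = [], []
    x = x0
    for _ in range(n_steps):
        x = dynamics(x, rng)
        xs.append(x)
        ys.append(observe(x, rng))
    return xs, ys

xs, ys = simulate(50)
print(xs[:3], ys[:3])
```

A filter never calls `observe` on real data; it uses `likelihood` to score how well each candidate intention explains what the sensor actually reported.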

### Why is this computational HCI?
Probabilistic filtering means writing down an **executable, statistical model** of user behavior, then **running an inference algorithm** that updates beliefs based on the way observations evolve. The **parameters** of the filter can be **learned from data**.

This has four key elements of computational interaction:
* an explicit mathematical model of user-system behavior;
* a way of updating that model with observed data from users;
* an algorithmic element that, using this model, can apply computational power to improving interaction;
* the ability to simulate or synthesize elements of the expected user-system behavior.

It satisfies the requirement that better interfaces can be achieved via:
* improved modeling;
* better data collection;
* more powerful algorithms;  
* or increased computational power, 

rather than the workhorses of traditional HCI:
* more design ingenuity;
* and stronger evaluation.


### What are competitive approaches?
* **Crafted mappings**, where we try to find (by hand) transforms from sensors to intentions that are simple or obvious. **Example:** a button, which has two physical states and maps onto two intentional states via two electrical states. Pushed down = current flows = user intended to switch on. The mapping from electrical states to intentional states is **designed.**

* **Machine learned mappings**, where we train a system to recognize a class of input patterns as being representative of an intended behavior. **Example:** a finger gesture recognizer. Hundreds of examples of many users performing one of N multi-touch gestures are recorded, and these are used to train a random forest to classify the intended gesture. The mapping from electrical states (capacitive sensors) to intentional states is **learned**.

### Benefits
* **Robustness to noise** PFs work well even with input sensors that are noisy.
* **Robustness to poorly specified models** PFs can cope predictably even if our models are bad.
* **Robustness to intermittence** PFs can continue to sensibly interpolate when input cuts out.
* **Uncertainty estimates** PFs *know how certain they are* and this can be used in the interaction design.
* **Decoupled from real-time** PFs can infer past (smoothing), present (filtering) and future (forecasting).
* **Easy fusion of multiple input sensors** PFs are often used solely to fuse together multiple inputs from different sensors.
* **Better feedback** PFs offer the opportunity to give users rich insight into the process of intention decoding.
* **Flexible modeling** PFs can incorporate both fundamental modeling (e.g. physiological or cognitive models) and data-driven machine learning.
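To make the intermittence, uncertainty and forecasting points concrete, here is a minimal sketch of a one-dimensional Kalman filter (one kind of probabilistic filter), assuming random-walk dynamics and Gaussian sensor noise. The noise variances `q` and `r` and the readings are made up for illustration; `None` marks a step where the sensor dropped out.

```python
def kalman_1d(observations, mean0=0.0, var0=1.0, q=0.01, r=0.1):
    """1D Kalman filter for a random-walk state x_t = x_{t-1} + noise (variance q),
    observed as y_t = x_t + noise (variance r). None means the input dropped out.
    Returns a list of (posterior mean, posterior variance), one per step."""
    mean, var = mean0, var0
    history = []
    for y in observations:
        # Predict: the random walk keeps the mean, but uncertainty grows by q.
        var += q
        if y is not None:
            # Update: the gain k weighs the observation against the prediction.
            k = var / (var + r)
            mean += k * (y - mean)
            var *= (1.0 - k)
        # With no observation, the prediction is simply carried forward (forecasting).
        history.append((mean, var))
    return history

# Hypothetical sensor readings with a three-step dropout.
readings = [0.9, 1.1, 1.0, None, None, None, 1.05]
history = kalman_1d(readings)
for t, (m, v) in enumerate(history):
    print(f"t={t}  mean={m:.3f}  var={v:.4f}")
```

During the dropout the estimate holds steady while its variance grows, so the filter "knows" it is becoming less certain; the next real reading shrinks the variance again.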

### History
* 1960s Kalman filter (Swerling, Kalman, Bucy), Extended Kalman Filter (Schmidt)
* late 1960s-1990s Particle filter / sequential Monte Carlo
* 1993 Bootstrap filter (Gordon, Salmond and Smith)
* 1995 Unscented Kalman Filter (Julier and Uhlmann)
* 1998 Condensation: particle filter for vision problems (Isard and Blake) 

We will base our model on that proposed by Isard and Blake.

# Principles 
-------
### Overview diagram
<img src="imgs/control_loop.png">



Notation:
* We have a sequence of states over time, indexed by $t$
* $X_t$ the variable we want to know (at time $t$). 
* $Y_t$ the variable we can observe.
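* $\hat{X}_t$ our estimate of the variable we want to know.

* We want to compute the posterior $P(X_t|Y_1,\dots,Y_t)$ (the **inverse problem**), from which our estimate $\hat{X}_t$ is derived (for example, as its mean).
* We use a **forward model** $P(Y_t|X_t)$ to infer this.
* We need to define two functions: $Y_t = f(X_t)$ (the **observation function**) and $X_{t} = g(X_{t-1})$ (the **dynamics** or **process function**). Both are corrupted by noise, which is what turns them into the distributions $P(Y_t|X_t)$ and $P(X_t|X_{t-1})$.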

* $f$ and $g$ are often very simple functions.
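The inversion from the forward model to the posterior is done with Bayes' rule, applied recursively one timestep at a time. Writing $Y_{1:t}$ for all observations $Y_1, \dots, Y_t$ up to time $t$, and treating the noisy dynamics and observation functions as the distributions $P(X_t \mid X_{t-1})$ and $P(Y_t \mid X_t)$, each step is a prediction followed by an update:

$$
\begin{aligned}
\text{predict:}\quad P(X_t \mid Y_{1:t-1}) &= \int P(X_t \mid X_{t-1})\, P(X_{t-1} \mid Y_{1:t-1})\, dX_{t-1} \\
\text{update:}\quad P(X_t \mid Y_{1:t}) &= \frac{P(Y_t \mid X_t)\, P(X_t \mid Y_{1:t-1})}{P(Y_t \mid Y_{1:t-1})}
\end{aligned}
$$

Only the forward model appears on the right-hand side. Particle filters approximate these distributions with weighted samples; Kalman filters restrict them to Gaussians so that the integral has a closed form.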

<img src="imgs/stochastic.png" width="50%">


### Use case
### Problem description
We are going to solve xxx

#### Algorithm
We will use the **particle filter** algorithm, although I will briefly explain how an unscented Kalman filter could be used for part of the estimation.
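A minimal sketch of a bootstrap particle filter (predict, weight, resample, in the spirit of Gordon's bootstrap filter and Condensation) for the hypothetical 1D slider intention, assuming random-walk dynamics and Gaussian sensor noise. It uses NumPy only and does not reproduce the API of the `pfilter` package; the function name, parameters and noise levels are illustrative assumptions.

```python
import numpy as np

def particle_filter(observations, n_particles=500, process_sd=0.02,
                    sensor_sd=0.1, seed=0):
    """Bootstrap particle filter for a 1D random-walk intention seen through Gaussian noise.
    Returns a list of (posterior mean, posterior std), one per observation."""
    rng = np.random.default_rng(seed)
    # Prior: the intention could be anywhere on [0, 1].
    particles = rng.uniform(0.0, 1.0, n_particles)
    estimates = []
    for y in observations:
        # 1. Predict: push every particle through the noisy dynamics.
        particles = particles + rng.normal(0.0, process_sd, n_particles)
        # 2. Weight: score each particle by the Gaussian likelihood p(y | x).
        log_w = -0.5 * ((y - particles) / sensor_sd) ** 2
        w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
        w /= w.sum()
        # 3. Estimate: weighted mean and standard deviation of the particle cloud.
        mean = np.sum(w * particles)
        std = np.sqrt(np.sum(w * (particles - mean) ** 2))
        estimates.append((mean, std))
        # 4. Resample: draw particles in proportion to their weights.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return estimates

# Usage: a simulated user holding a steady intention of 0.7, seen through a noisy sensor.
rng = np.random.default_rng(1)
readings = 0.7 + rng.normal(0.0, 0.1, size=30)
estimates = particle_filter(readings)
mean, std = estimates[-1]
print(f"final estimate: {mean:.3f} +/- {std:.3f}")
```

Resampling at every step is the simplest choice: it keeps the particle set concentrated on plausible intentions, at the cost of some extra Monte Carlo noise.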

```python
# import the things we need
from __future__ import print_function, division
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import pykalman, pfilter
import ipywidgets
import IPython
%matplotlib inline
```

### Meat
Meat goes here!

### Key algorithm summary
| Algorithm       | Dynamics       | State distribution | Efficiency | Optimizable |
|-----------------|----------------|--------------------|------------|-----------|
| Particle        | Arbitrary      | Arbitrary          | Low        | No        |
| Kalman          | Linear         | Gaussian           | Very high  | Yes       |
| Extended Kalman | Locally linear | Gaussian           | High       | Yes       |
| Unscented Kalman| Arbitrary      | Gaussian           | High       | ?         |
| HMM             | Transitions    | Discrete           | High       | Yes       |

* Dynamics: permissible state transition functions (i.e. how we go from now to the next timestep).
* State distribution: distribution type for representing current state. Gaussian distributions are very efficient, but can't represent multiple modes.
* Efficiency: computational efficiency.
* Optimizable: is there an algorithm to optimize the parameters of the filter *automatically*, given training data?
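For the discrete case in the last row of the table, filtering reduces to the HMM forward algorithm: the same predict/update recursion, with sums in place of integrals. A minimal sketch, assuming a hypothetical user choosing among three targets, a "sticky" transition matrix and a sensor that reports the correct target 70% of the time (all numbers invented for illustration):

```python
def hmm_forward(observations, prior, transition, emission):
    """Discrete Bayes filter (HMM forward algorithm), normalised at each step.

    prior[i]         = P(X_0 = i)
    transition[i][j] = P(X_t = j | X_{t-1} = i)
    emission[i][k]   = P(Y_t = k | X_t = i)
    Returns the filtered distribution P(X_t | Y_1..Y_t) after each observation.
    """
    n = len(prior)
    belief = list(prior)
    history = []
    for y in observations:
        # Predict: propagate the belief through the transition matrix.
        predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
        # Update: weight each state by how well it explains y, then normalise.
        unnorm = [predicted[j] * emission[j][y] for j in range(n)]
        z = sum(unnorm)
        belief = [u / z for u in unnorm]
        history.append(belief)
    return history

stay = 0.9  # hypothetical probability that the intended target does not change
transition = [[stay if i == j else (1 - stay) / 2 for j in range(3)] for i in range(3)]
emission = [[0.7 if i == k else 0.15 for k in range(3)] for i in range(3)]
beliefs = hmm_forward([2, 2, 0, 2, 2], [1 / 3] * 3, transition, emission)
for b in beliefs:
    print([round(p, 3) for p in b])
```

The single contradictory observation at the third step shifts some belief towards target 0 without overturning the decision; a mapping that looked only at the latest input would have switched immediately.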


### Gallery
Research papers here (thumbnail + link), short description of why cool

### Pitfalls
Hands-on guru knowledge goes here.

# Outlook
---------------------
### Scope and limitations
#### Scope

#### Limitations
* PFs can be computationally intensive to run. 
* The curse of dimensionality means that the attractive simplicity of PFs can work poorly in practice as the state space grows.
* Sometimes the inverse probability model can be hard to formulate.
* Particle filters are simple and elegant, but inferentially weak.
* Kalman filters are rigid and restrictive, but very inferentially efficient.
* Hybrid approaches (Ensemble Kalman filter, Unscented Kalman Filter, hybrid particle/Kalman filters) can trade these qualities off, but they aren't off-the-shelf solutions (i.e. you need an expert!).
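The curse-of-dimensionality limitation can be seen directly in the importance weights of a particle filter. A common diagnostic is the effective sample size (ESS) $1/\sum_i w_i^2$ of the normalised weights. The sketch below assumes a standard Gaussian prior and Gaussian sensor noise in every dimension (both arbitrary choices) and weights a fixed number of prior particles against one observation:

```python
import numpy as np

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalised importance weights."""
    w = weights / weights.sum()
    return 1.0 / np.sum(w ** 2)

def ess_vs_dimension(dims, n_particles=1000, sensor_sd=0.5, seed=0):
    """Weight prior particles against one noisy observation, for each state dimension d."""
    rng = np.random.default_rng(seed)
    ess = {}
    for d in dims:
        particles = rng.normal(0.0, 1.0, size=(n_particles, d))  # samples from the prior
        true_x = rng.normal(0.0, 1.0, size=d)                     # hidden state
        y = true_x + rng.normal(0.0, sensor_sd, size=d)           # noisy observation
        # Gaussian log-likelihood of y for every particle (constant terms dropped).
        log_w = -0.5 * np.sum(((y - particles) / sensor_sd) ** 2, axis=1)
        ess[d] = effective_sample_size(np.exp(log_w - log_w.max()))
    return ess

results = ess_vs_dimension([1, 5, 20])
for d, value in results.items():
    print(f"dimension {d:2d}: ESS = {value:7.1f} of 1000 particles")
```

With the same number of particles, the ESS typically collapses towards 1 as the dimension grows, meaning almost all of the weight lands on a single particle.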


### Resources
#### Basic
* Read the [Condensation paper](http://vision.stanford.edu/teaching/cs231b_spring1415/papers/isard-blake-98.pdf).
* Read [the Kalman filter in pictures](http://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/)
* Watch [the particle filter without equations](https://www.youtube.com/watch?v=aUkBa1zMKv4)

#### Advanced
* [A technical but succinct and clear explanation of the particle filter](http://www.cns.nyu.edu/~eorhan/notes/particle-filtering.pdf)
* [A bibliography of particle filter papers](http://www.stats.ox.ac.uk/~doucet/smc_resources.html)

**some more HCI related resources**

### Future of probabilistic filtering

#### Learned models

Much use of probabilistic filters has depended on strong mathematical models of the fundamental process. For example, in rocket science, sophisticated physics models were used to specify the Kalman filters used for stable control. 

However, it is becoming increasingly possible to **infer** these models from observations. Techniques such as deep learning (for example variational autoencoders or generative adversarial networks) make it possible to learn very sophisticated *generative models* from observations of data.

These models can be dropped into probabilistic filters to produce robust inferential engines for user interaction.