![8th International Summer School on Computational Interaction](imgs/header.png)

# 1: Introduction to Forward and Inverse Modelling

## Who am I?

I'm **John H. Williamson**, from the University of Glasgow. 

* JohnH.Williamson@glasgow.ac.uk   @jhnhw  | [github.com/johnhw](https://github.com/johnhw)

<img src="imgs/uofg.jpg">

### Why am I here?
* I'm interested in computational approaches to HCI, particularly probabilistic and Bayesian methods and active inference, as well as control theoretic and unsupervised learning approaches.
* I've been doing this for a long time (>20 years!)
* (I founded the Summer School on Computational Interaction back in 2015, so it's nice to be back!).

## Advert
If you know a (Masters/Bachelors) student graduating soon who might be interested in a PhD in computational HCI, I have a position open in building "Optimal Mechanisms" for HCI. 


### Interactive elements

Interact via `sli.do` with the code `#compint24_jhw`.

## Notebooks

All of the notes for this course are executable Jupyter notebooks, using Python. See [readme.md](readme.md) for installation and setup instructions.

## Timeline for the morning

* 0900-1100: [Introduction to Forward and Inverse Modelling](01_Introduction.ipynb)
* 1100-1200: [Hands-on session](02_Exercise.ipynb)
* 1200-1300: [A review](03_Review.ipynb)




# An example: rich touch sensing

![A finger being sensed by a capacitive board](imgs/finger_track.png)
*A finger being sensed by a capacitive sensor. From the vector of values we get back, how can we recover the finger pose? Or, more generally, what the user was trying to do?*

What is going on here?

* We get a fixed-sized array of values $\bf{x_t} = [x_0, x_1, \dots]$ at each time step $t$ -- a **frame**. 
* This somehow captures the configuration of conductive bodies, like fingers, above the sensor.
* We want to turn this into something that a user can use to *convey intention* -- get the system to do what they want.

Usually we divide this into two steps:

* **Cursor** Recover some intermediate state from $\bf{x_t}$, like a "cursor" that indicates something related to the *physical configuration* of the world (but not usually an *actual* position!), outputting a simpler $\bf{y_t}$.
* **State** UI components respond to ${\bf y_t}$ representing cursor locations and state changes (e.g. touch up/down, swipes, etc.)

In some cases these are conflated into a single step, such as a gesture system that takes a sequence of frames and directly actuates commands.
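
As a rough illustration of the two steps, here is a minimal sketch. It assumes the frame is a small 2D array, uses an intensity-weighted centroid as the "cursor", and uses a simple threshold for touch detection. The names `frame_to_cursor` and `update_state` are hypothetical, and real touch controllers do something considerably more sophisticated.

```python
import numpy as np

def frame_to_cursor(frame, threshold=0.5):
    """Step 1 (cursor): reduce a 2D sensor frame x_t to a simpler y_t.

    Returns (row, col, touching). (row, col) is the intensity-weighted
    centroid; `touching` is True if total activation exceeds `threshold`.
    """
    frame = np.asarray(frame, dtype=float)
    total = frame.sum()
    if total <= threshold:
        return None, None, False
    rows, cols = np.indices(frame.shape)
    return (rows * frame).sum() / total, (cols * frame).sum() / total, True

def update_state(was_touching, touching):
    """Step 2 (state): turn changes in the cursor into UI events."""
    if touching and not was_touching:
        return "touch_down"
    if was_touching and not touching:
        return "touch_up"
    return None

# A made-up 4x4 frame with a blob centred on row 1, column 2
frame = np.zeros((4, 4))
frame[1, 2] = 1.0
frame[1, 1] = 0.5
frame[1, 3] = 0.5

row, col, touching = frame_to_cursor(frame)
event = update_state(False, touching)
```

Note how much is thrown away: the whole frame collapses to two numbers and a flag, with no indication of how confident we are in any of them.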

## Questions

* How do we do this transformation from $\bf x_t$ to $\bf y_t$?
* What makes this hard? 
* **How can we use computational methods to address this?**
* What do we do if the input device changes? For example, if the density of the sensor array changes, or we add a mm-wave radar tracker?

I'll use the "finger tracking" problem as an example of an input problem we can understand easily, but the ideas generalise to many input problems.

# What is the input problem?

![The input problem](imgs/brain_inference.png)

We are going to consider the problem of **input**. We'll assume the "classical dyad", where a single human user interacts with a single system, such as someone browsing on their phone. The user has an intention, which they wish to communicate to the system. The user and system are connected via an interface. The user and system are both embedded in the physical world, and this environment influences the way in which information propagates between the user and the system.

**Input** is the process by which intention -- what a user wishes to do -- is transduced into a change of state in the system. It is a fundamental problem in human-computer interaction. This is typically a closed-loop process, where the user observes the system's response and adjusts their input accordingly. Intention is rarely expressed ballistically, where an intention is packaged up as a single command and sent to the system. Instead, it is a continuous process, where the user's intention is continually updated based on feedback.

## Sensing
The system observes its environment via sensors. Some part of the signal those sensors transduce is related to the user's intended input and thus indirectly to their intention. Common, traditional sensors like mice are very *selective*. Their physical form and sensing properties mean that they are effective at rejecting environment states that are unrelated to intention. But many sensors we'd like to be able to use for input are not so selective. Cameras, for example, pick up a lot of information that is not related to intention. They might capture millions of pixels in a frame and dozens of frames in a second. A user, however, is likely to be generating slow changes in a subspace of very low dimension -- manipulating a slider, for example. All of the other information is irrelevant: the lighting conditions, the background environment, the clothing of the user, and tiny physiological movements like breathing or tremor.

## Tangling with the environment

![Tangling with the environment](imgs/brainspace.png)

What a sensor vector captures is typically a convoluted mixture of the user's intention and the environment. The environment is a source of noise, but it is also a source of contextual information. Intentions originate in a user's mind, and then propagate forth via their motor system, which influences the physical state of the environment. Those changes are detected by sensors, which update the system's internal state. That updated state is then fed back via a display channel and thence into the user's mind via the user's perceptual system.




## The computational interaction approach
We want to approach this *computationally*. What tools do we have available to tackle this problem? There are two distinct approaches:

![Forward and inverse models](imgs/fwd_inv_finger.png)

* **Direct inverse models** are models that take sensor data and directly infer the user's intention. 
    * These are typically machine learning models, such as deep networks, that are trained on large datasets of sensor data and corresponding intentions. 
    * They are typically trained in a supervised manner, where the intention is provided as a label for the sensor data. 
    * They are usually very computationally efficient.
    * They can adapt to very complex mappings from input to state.
    * These models are typically *black boxes* -- they are not interpretable, and they do not provide any insight into the underlying interaction phenomena. 
    * They are also typically *deterministic* -- they provide a single answer for a given input. 
    * They are commonly *static* -- they do not update their beliefs about the user's intention as new data arrives. Instead, they map a sensor vector to an intention. 
    * Such ML based models are **data-driven** and usually derive their power from the volume of training data used to train them.

* **Bayesian inversion** uses *forward* models to invert the process of intention generation. We build a model of how a user's intention *would* be expressed in terms of sensing, and then use the sensor vectors to infer the parameters of that model. 
    * This is a *generative* approach -- we build a model that can simulate sensor data given an intention. 
    * This model is typically *interpretable* -- we can understand the parameters of the model in terms of the underlying interaction phenomena. 
    * It is also *probabilistic* -- it provides a distribution over possible intentions, given the sensor data. 
    * It is also *dynamic* -- it updates its beliefs about the user's intention as new data arrives. But it is computationally expensive and requires expert knowledge. 
    * These models are **model-driven** and derive their power from the representation of uncertainty and the ability to update beliefs sequentially and to fuse information from multiple sources.

As we'll later see, this is not an either/or choice. We can use both approaches in concert, with the Bayesian inversion model providing a prior over the space of possible intentions, and the direct inverse model providing a fast, approximate estimate of the user's intention. We can also use ML approaches to build fast *forward* models that can be used in the Bayesian inversion process (*emulation approaches*).



# Demo example

## Direct inversion

## Bayesian inversion




# Computational interaction
Let's get back to computational interaction. A tenet of the approach is that it puts models *first*. Every model in computational interaction will be a bit of code that is executed in order to gain insight into an interaction phenomenon that we cannot directly access. Not all things that are called models are equivalent, however. We need to think about the characteristics of models of interaction: are some better than others?

## On the virtues of models
Given two models that model some interaction phenomena equally well, we'd prefer the model that:

* is easily implemented computationally and fits with software engineering practices;
* is conveniently parameterised, with *interpretable* parameters;
* is generative, and expressed in terms of generating synthetic observations;
* is capable of propagating uncertainty correctly.


## Data generating processes
We need *models* to do *computational interaction*, and they need to be *executable*. We'd further like them to be *generative*. That implies code that simulates or emulates some part of an interactive system -- a **forward model** that transforms unknown states into the observable quantities they imply. At the heart of Bayesian modelling we have the idea of a **data generating process**, a process which we believe is generating data we observe. We implement this as an algorithm which generates synthetic observations. 

> This is just a function!


Every application of Bayesian ideas starts with the data generating process: write down code that will spit out plausible simulations, given some configurable parameters.
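
To make "just a function" concrete, here is a toy data generating process for the finger sensing example. Everything about it is an assumption made for illustration: the function name `forward_model`, the Gaussian bump response, and the way the response weakens and widens with finger height are not taken from any real sensor.

```python
import numpy as np

def forward_model(finger_xy, height, grid_size=8, noise_sd=0.05, rng=None):
    """Simulate one sensor frame for a finger at `finger_xy` hovering at `height`.

    Toy assumption: each pad responds with a Gaussian bump centred under the
    finger, which gets weaker and wider as the finger rises.
    """
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = np.indices((grid_size, grid_size))
    d2 = (rows - finger_xy[0]) ** 2 + (cols - finger_xy[1]) ** 2
    width = 1.0 + height
    clean = np.exp(-d2 / (2 * width ** 2)) / (1.0 + height)
    # Sensor noise makes this a stochastic simulator, not a lookup table
    return clean + rng.normal(0.0, noise_sd, size=clean.shape)

# Configurable parameters in, one plausible synthetic observation out
rng = np.random.default_rng(0)
frame = forward_model((3.0, 4.0), height=0.5, rng=rng)
```

Calling it repeatedly with the same parameters gives different frames. That spread is part of the model, not a nuisance.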

### What is uncertainty and where does it come from?
Uncertainty exists in all systems that make contact with the real world. The physical world is not the domain of absolute logical truth, and the human social world is even less so. This is especially true when we project into the future (prediction, or forecasting), but even when reasoning about the present or the past, we must account for and be aware of the uncertainty involved.

In interaction, we have, in the simplest case, two parties, or agents: 

* a brain, embedded in a human, embedded in a physical world
* and software, embedded in computer hardware, embedded in the same physical world. 

Each of these "agents" has uncertainty about the other. Some of this uncertainty is due to the world that separates them (e.g. noise in the motor system). Some of it is because they do not (yet) have knowledge of each other's states.

#### Epistemic, aleatoric and approximation

We can separate out some *types* of uncertainty:

* **Epistemic uncertainty** is uncertainty about what we know (hence epistemic) arising from the limitations of our knowledge (as encoded by a model).  If I've only ever met one person, my epistemic uncertainty about the height of people is likely to be large -- I don't *know* how tall people are.
* **Aleatoric uncertainty** is that which arises from (presumed) randomness in the world. If I toss a coin, my uncertainty about which side lands face up is aleatoric. This type of uncertainty cannot be resolved by better modelling, more data, etc.; it is irreducible. Even if I have an excellent model of people's heights, any given person's height won't be precisely predicted by that model.
* **Approximation uncertainty** arises from the limitations of computation to approximate inference. In general, Bayesian methods cannot be applied exactly, and so the results are subject to additional uncertainty.
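
The heights example can be sketched numerically. This is a deliberately simplified setup: it assumes heights are Normal with a *known* spread and a flat prior on the mean, so the posterior over the mean has standard deviation `TRUE_SD / sqrt(n)`. The population values and the name `height_uncertainty` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_MEAN, TRUE_SD = 170.0, 10.0   # hypothetical population of heights (cm)

def height_uncertainty(n):
    """Summarise uncertainty about heights after meeting `n` people.

    Assumes TRUE_SD is known and a flat prior on the mean, so the posterior
    over the mean is Normal(sample mean, TRUE_SD / sqrt(n)).
    """
    heights = rng.normal(TRUE_MEAN, TRUE_SD, size=n)
    epistemic_sd = TRUE_SD / np.sqrt(n)   # about the mean: shrinks with data
    aleatoric_sd = TRUE_SD                # person-to-person variation: stays
    # Uncertainty about the *next* person's height combines both
    predictive_sd = np.sqrt(epistemic_sd ** 2 + aleatoric_sd ** 2)
    return heights.mean(), epistemic_sd, aleatoric_sd, predictive_sd

few = height_uncertainty(1)       # I've only ever met one person
many = height_uncertainty(1000)   # I've met lots of people
```

Under these assumptions, more data drives the epistemic term towards zero, but the predictive uncertainty about any single person never drops below the aleatoric term.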



# Bayesian Inversion

### A mysterious entity

We can imagine that the phenomenon we are interested in (some interaction problem, say) is a mysterious entity who emits observable quantities (like the time taken to click on a menu item) but whose internal operation is inscrutable.

<img src="imgs/entity.png">

We can see the **data generating process** (our model) as a tame mysterious entity, who generates samples when simulating and can also judge the quality of observations (likelihood) when fed them.  The mysterious entity is controlled by parameters (dials) which adjust the simulation and its opinion of the quality of observations. What we want is to know *which* mysterious entity parameters are compatible with the true (but unseen) mysterious entity.

### Bayesian inversion

This is a problem of **inversion**; working out what was happening in the unobserved realm by deducing plausible behaviours compatible with the observations. Working out what age someone is given how tall they are is an inverse problem. Working out how tall they are given their age is a forward problem. In Bayesian modelling we use the **forward** model (the data generating process) as the key step to build our inversion model. 

> Other approaches solve inversion directly; for example we might build a machine learning model that predicts ages given heights by fitting a deep network to lots of paired `(age, height)` examples. We could then, at inference time, feed it a height and it would return an age. Critically, it would only return *one* age -- the best predicted age (as directed by the objective function used to train the network). This is very much **not** what we will do in the Bayesian models we will see later!

> * ML models: typically invert by optimising to find a single inverse function.
> * Bayesian models: invert by forming a distribution over inverse functions that are plausible, given observations.
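
Here is a small sketch of the age/height inversion done the Bayesian way, on a grid of candidate ages. The growth curve `mean_height`, the noise level, and the flat prior are all invented for illustration and are not real growth data.

```python
import numpy as np

def mean_height(age):
    """Hypothetical growth curve (cm): fast growth that saturates in adulthood."""
    return 50.0 + 125.0 * (1.0 - np.exp(-age / 6.0))

def likelihood(height, age, sd=7.0):
    """Forward model used as a judge: how plausible is `height` at each `age`?"""
    return np.exp(-0.5 * ((height - mean_height(age)) / sd) ** 2)

ages = np.linspace(0, 80, 801)           # candidate ages (years)
prior = np.ones_like(ages) / len(ages)   # flat prior over age

def posterior_over_age(height):
    """Invert the forward model: posterior is proportional to prior x likelihood."""
    unnorm = prior * likelihood(height, ages)
    return unnorm / unnorm.sum()

def posterior_sd(post):
    """Spread of a posterior defined on the `ages` grid."""
    mean = (ages * post).sum()
    return np.sqrt((((ages - mean) ** 2) * post).sum())

post_child = posterior_over_age(110.0)   # a short person
post_adult = posterior_over_age(175.0)   # an adult-sized person
```

With this toy curve, a height of 110 cm pins the age down fairly tightly, while 175 cm is compatible with almost any adult age. A single "best" age would hide that difference; the posterior keeps it.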


## What is a Bayesian inverse model?

A Bayesian model:

* Is a generative model of the phenomena under consideration, which simulates plausible observations.
* Represents, preserves and manipulates uncertainty about unknown parameters. Uncertainty is **first-class**.
* Reasons about the unknown parameters that modulate the behaviour of those generative models.
* Uses *likelihood* to invert forward models.
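
Written out, the last point is Bayes' rule. The notation here is an assumption introduced for this sketch: $\theta$ stands for the unknown parameters and ${\bf x}$ for the observations.

$$
p(\theta \mid {\bf x}) = \frac{p({\bf x} \mid \theta)\, p(\theta)}{\int p({\bf x} \mid \theta')\, p(\theta')\, d\theta'}
$$

The likelihood $p({\bf x} \mid \theta)$ is the forward model judging how plausible the observations are under a given parameter setting, $p(\theta)$ is the prior belief about the parameters, and the posterior $p(\theta \mid {\bf x})$ is the inverted model: a distribution over parameters, not a single answer.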

## What is a forward model?



## Stochastic filtering

### Particle filter

### Particle filter terms

## Fusion


# Blending forward and inverse models

We can *combine* some of the benefits of a Bayesian and direct inverse model. There are several ways to do this:

* Using ML to learn a latent representation of sensor states (either unsupervised, or as a byproduct from a supervised direct inverse model)
* Incorporating direct inverse model outputs as evidence sources in a Bayesian inverse model.

![Combining a forward and inverse model.](imgs/fwd_inv_bottleneck.png)
*Combining a forward and inverse model using a learned latent representation.*

![Combining a forward and inverse model.](imgs/fwd_inv.png)
*Fusing a direct inverse model in the evidence update of a stochastic filter.*
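
A minimal sketch of the second idea, using a bootstrap particle filter on a 1D strip of sensor pads. Everything here is a toy assumption: `forward` is an invented Gaussian response, `direct_inverse` is a weighted centroid standing in for a trained ML regressor, and the noise levels are arbitrary. Note also that multiplying the two evidence terms treats them as independent, which is an approximation because both are computed from the same frame.

```python
import numpy as np

rng = np.random.default_rng(2)
pads = np.arange(8.0)   # pad centres of a hypothetical 1D sensor strip

def forward(pos):
    """Toy forward model: expected pad responses for finger position(s) `pos`."""
    pos = np.atleast_1d(pos)[:, None]
    return np.exp(-0.5 * (pads[None, :] - pos) ** 2)

def direct_inverse(frame):
    """Stand-in for a trained direct inverse model: a weighted centroid."""
    w = np.clip(frame, 0.0, None)
    return (w * pads).sum() / w.sum()

def filter_step(particles, frame, drift_sd=0.3, sensor_sd=0.2, inverse_sd=0.5):
    # Predict: diffuse particles with simple random-walk dynamics
    particles = particles + rng.normal(0.0, drift_sd, size=particles.shape)
    # Evidence 1: does each particle's simulated frame match the real one?
    err = ((forward(particles) - frame) ** 2).sum(axis=1)
    log_w = -0.5 * err / sensor_sd ** 2
    # Evidence 2: treat the direct inverse output as a noisy position reading
    log_w += -0.5 * ((particles - direct_inverse(frame)) / inverse_sd) ** 2
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Resample particles in proportion to their weights
    return rng.choice(particles, size=len(particles), p=w)

true_pos = 3.4
particles = rng.uniform(0.0, 7.0, size=500)   # flat initial belief
for _ in range(10):
    frame = forward(true_pos)[0] + rng.normal(0.0, 0.05, size=pads.shape)
    particles = filter_step(particles, frame)

estimate, spread = particles.mean(), particles.std()
```

The particle cloud is the belief: `estimate` summarises where the finger probably is, and `spread` how sure the filter is about it.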

