# Multi-fidelity Modelling

Deep neural networks are a powerful approach to dealing with images, speech and language, all interesting domains where we are gathering a large amount of data.

Early in my career, I learned an important lesson. I had discovered Gaussian processes, and was interested in approaches for making them practical, particularly for the 'popular' tasks of the day. One of these was classification of handwritten digits: the MNIST data.

With Matthias Seeger and Ralf Herbrich I worked on an algorithm for fitting Gaussian process classifier models that could scale as well as the dominant methodology of the day, support vector machines, while still performing well. The paper was published at NIPS and the work is known as the "informative vector machine". It's a moderately successful piece of work, but it did not have the hoped-for response of alerting the wider SVM community to the utility and flexibility of Gaussian process models.

It's horses for courses, and whereas GP classifiers are interesting models, it seems fairly pointless to direct efforts at making a methodology work well in a domain where we already have good coverage. The real success of Gaussian process models came in domains where other frameworks were not appropriate. Hyperparameter optimization springs to mind [@Snoek:practical12], a paper with over a thousand citations. My own most successful work, the GP-LVM, exploits characteristics of GPs that are not associated with other models [@Lawrence:pnpca05].

With that in mind, my own philosophy is to look explicitly for domains where the characteristics of Gaussian process models can be more usefully exploited. That is not to say that these models can't be used in the more traditional domains, but a lot of effort could be spent there merely making them also-rans, whereas in domains where uncertainty is critical the balance is tilted in favour of GPs.

Broadly, I'm excited by the domains of probabilistic numerics, surrogate modelling, emulation and uncertainty quantification. There are two reasons for this. Firstly, while I'm not a fan of the term artificial intelligence, or the image instigated in the public mind by the term, or the narrative futures it implies, the truth is that we are faced with a need for an increasing amount of algorithmic decision making: decision making that is based on data. The dominant framework for achieving those ends, at the moment, is machine learning, that is, the combination of data with a model to form a prediction, followed by *decision making*. The term AI definitely includes decision making as part of its requirements. But a critical challenge is decision making in the presence of uncertainty.

Data in our phones and our devices is all well and good, but things get more interesting when we consider the interaction between our virtual world and the physical world. Interaction is key, because we can choose to spend more effort on data acquisition, computation or inference. Each of these requires a decision.

Wikipedia defines [Uncertainty Quantification](https://en.wikipedia.org/wiki/Uncertainty_quantification) as:

>Uncertainty quantification (UQ) is the science of quantitative characterization and reduction of uncertainties in both computational and real world applications. It tries to determine how likely certain outcomes are if some aspects of the system are not exactly known.

And this cuts to the heart of what we require for practical decision making systems. Probabilistic numerics associates uncertainty (in the form of probabilities) with our computations. Emulation (or surrogate modelling) is a key component of this approach because an emulator is a model (we can think of it as a machine learning model) of a computational process. For example, we can think of the Gaussian process model in Bayesian optimization as a *surrogate* for the process of optimization: a way of assessing our expectations of what the answer will be without running the optimization. Bayesian optimization then involves an *acquisition function*, which encodes our decision making framework about what parameters to try next.
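
To make the surrogate and acquisition function idea concrete, here is a minimal sketch in Python with NumPy. It is illustrative only and rests on assumptions that are not part of the text above: the `objective` function is a hypothetical stand-in for an expensive system, the covariance is an exponentiated quadratic with fixed (unoptimised) hyperparameters, candidates come from a fixed one-dimensional grid, and expected improvement is just one common choice of acquisition function.

```python
import math
import numpy as np


def rbf_kernel(a, b, lengthscale=0.3, variance=1.0):
    """Exponentiated quadratic covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(x_test, x_test).diagonal() - np.sum(v**2, axis=0)
    # Clip tiny negative variances caused by round-off.
    return mean, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (minimisation)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def objective(x):
    """Hypothetical expensive function standing in for the system of interest."""
    return np.sin(3.0 * x) + x**2 - 0.7 * x


def bayes_opt(n_iter=10):
    """Alternate between fitting the surrogate and deciding where to evaluate."""
    x_grid = np.linspace(-1.0, 2.0, 301)   # candidate inputs (assumed grid)
    x_train = np.array([-0.9, 1.1])        # arbitrary initial design
    y_train = objective(x_train)
    for _ in range(n_iter):
        mean, std = gp_posterior(x_train, y_train, x_grid)
        ei = expected_improvement(mean, std, y_train.min())
        x_next = x_grid[np.argmax(ei)]     # the decision: where to evaluate next
        x_train = np.append(x_train, x_next)
        y_train = np.append(y_train, objective(x_next))
    return x_train, y_train


if __name__ == "__main__":
    xs, ys = bayes_opt()
    print("best input found:", xs[np.argmin(ys)], "value:", ys.min())
```

The surrogate's standard deviation is what makes the decision interesting: expected improvement is large both where the predicted mean is low and where the model is uncertain, so the loop trades off exploiting what it knows against gathering more data.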

Bayesian optimization, therefore, presents a microcosm of the world of uncertainty quantification, where we are taking the particular approach of emulating the system of interest, often with a Gaussian process. We know Gaussian processes perform well in these domains. We have

The interaction between the physical and virtual worlds was, for me, a major reason for working at Amazon. When I was exploring Amazon, a colleague explained to me that Amazon is a 'bits and atoms' company. It moves stuff and it moves information. For me, an excellent definition of intelligence is the use of *information* to achieve goals with *less resource*. Amazon is therefore an excellent ecosystem for exploring intelligence. However, to do so we need good models of the physical world, and approaches to decision making in the physical world. 

But let's move away from personal corporate allegiances and into a domain that is more neutral territory for most machine learning practitioners: Formula One motor racing. How can we use information and compute to improve motor racing?

For our practical example I want to 


in the public mind  reinterpretation of numerical algorithms as probabilistic. Why? Well in artificial intelligence we are interested in decision making 
As an example of a domain where deep Gaussian process models are an interest

@incollection{Snoek:practical12,
title = {Practical Bayesian Optimization of Machine Learning Algorithms},
author = {Snoek, Jasper and Larochelle, Hugo and Adams, Ryan P},
booktitle = {Advances in Neural Information Processing Systems 25},
editor = {F. Pereira and C. J. C. Burges and L. Bottou and K. Q. Weinberger},
pages = {2951--2959},
year = {2012},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf}
}
@Article{Lawrence:pnpca05,
  author =	 {Neil D. Lawrence},
  title =	 {Probabilistic Non-linear Principal Component
                  Analysis with {G}aussian Process Latent Variable
                  Models},
  journal =	 jmlr,
  year =	 2005,
  volume =	 6,
  pages =	 {1783--1816},
  month =	 11,
  pdf = {http://www.jmlr.org/papers/volume6/lawrence05a/lawrence05a.pdf},
  abstract =	 {Summarising a high dimensional data set with a low
                  dimensional embedding is a standard approach for
                  exploring its structure. In this paper we provide an
                  overview of some existing techniques for discovering
                  such embeddings. We then introduce a novel
                  probabilistic interpretation of principal component
                  analysis (PCA) that we term dual probabilistic PCA
                  (DPPCA). The DPPCA model has the additional
                  advantage that the linear mappings from the embedded
                  space can easily be non-linearised through Gaussian
                  processes. We refer to this model as a Gaussian
                  process latent variable model (GP-LVM). Through
                  analysis of the GP-LVM objective function, we relate
                  the model to popular spectral techniques such as
                  kernel PCA and multidimensional scaling. We then
                  review a practical algorithm for GP-LVMs in the
                  context of large data sets and develop it to also
                  handle discrete valued data and missing
                  attributes. We demonstrate the model on a range of
                  real-world and artificially generated data sets.}
}
