# Machine Learning and the Physical World

### [Neil D. Lawrence](http://inverseprobability.com), University of Cambridge

### 2021-07-07

**Abstract**: Machine learning technologies have underpinned the recent
revolution in artificial intelligence. But at their heart, they are
simply data-driven decision-making algorithms. While the popular press
is filled with the achievements of these algorithms in important domains
such as object detection in images, machine translation and speech
recognition, there are still many open questions about how these
technologies might be implemented in domains where we have existing
solutions but are constantly looking for improvements. Roughly
speaking, we characterise this domain as “machine learning in the
physical world.” How do we design, build and deploy machine learning
algorithms that are part of a decision-making system that interacts with
the physical world around us? In particular, machine learning is a
data-driven endeavour, but real world systems are physical and mechanistic.
In this talk we will introduce some of the challenges for this domain
and propose some ways forward in terms of solutions.


<!--


\editme

\subsection{Laplace's Demon}

\includegooglebook{1YQPAAAAQAAJ}{PR17-IA2}


\speakernotes{This notion is known as *Laplace's demon* or *Laplace's superman*.}

\newslide{Laplace's Demon}

\figure{\includepng{https://inverseprobability.com/talks/slides/diagrams//physics/philosophicaless00lapliala_16_cropped}{60%}}{}{laplaces-demon-cropped}

> *Philosophical Essay on Probabilities* @Laplace-essai14 pg 3


\newslide{Machine Learning}

\aligncenter{
$$
\text{data} + \text{model} \stackrel{\text{compute}}{\rightarrow} \text{prediction}$$}

\newslide{Theory of Everything}

> If we do discover a theory of everything ... it would be the ultimate triumph of human reason-for then we would truly know the mind of God
>
> Stephen Hawking in *A Brief History of Time* 1988

-->
\\figure{\\columns{\\threeColumns{\\aligncenter{\\includediagramclass{https://inverseprobability.com/talks/slides/diagrams//simulation/life-rules-1-0}{100%}}}{\\aligncenter{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//util/right-arrow}{60%}}}{\\aligncenter{\\includediagramclass{https://inverseprobability.com/talks/slides/diagrams//simulation/life-rules-1-1}{100%}}}{30%}{39%}{30%}}{\\aligncenter{\\includejpg{https://inverseprobability.com/talks/slides/diagrams//maths/John-Conway}{100%}{}{right}}}{70%}{30%}}{‘Death’
through loneliness in Conway’s game of life. If a cell is surrounded by
fewer than two cells, it ‘dies’ through
loneliness.}{life-rules-loneliness}

\\figure{\\columns{\\threeColumns{\\aligncenter{\\includediagramclass{https://inverseprobability.com/talks/slides/diagrams//simulation/life-rules-2-0}{100%}}}{\\aligncenter{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//util/right-arrow}{60%}}}{\\aligncenter{\\includediagramclass{https://inverseprobability.com/talks/slides/diagrams//simulation/life-rules-2-1}{100%}}}{30%}{39%}{30%}}{\\aligncenter{\\includejpg{https://inverseprobability.com/talks/slides/diagrams//maths/John-Conway}{100%}{}{right}}}{70%}{30%}}{‘Death’
through overpopulation in Conway’s game of life. If a cell is surrounded
by more than three cells, it ‘dies’ through
overcrowding.}{life-rules-crowding}

\\figure{\\columns{\\threeColumns{\\aligncenter{\\includediagramclass{https://inverseprobability.com/talks/slides/diagrams//simulation/life-rules-3-0}{100%}}}{\\aligncenter{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//util/right-arrow}{60%}}}{\\aligncenter{\\includediagramclass{https://inverseprobability.com/talks/slides/diagrams//simulation/life-rules-3-1}{100%}}}{30%}{39%}{30%}}{\\aligncenter{\\includejpg{https://inverseprobability.com/talks/slides/diagrams//maths/John-Conway}{100%}{}{right}}}{70%}{30%}}{Birth
in Conway’s life. Any position surrounded by precisely three live cells
will give birth to a new cell at the next turn.}{life-rules-crowding}
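
The three rules pictured above can be written as a single update step. The following is a minimal Python sketch, not code from the talk. It assumes the standard rules of Life: a live cell with fewer than two or more than three live neighbours dies, and an empty position with exactly three live neighbours gives birth. The function name `life_step` and the set-of-coordinates representation are illustrative choices.

```python
from collections import Counter


def life_step(live):
    """Advance one generation; `live` is a set of (row, col) live cells."""
    # For every position next to a live cell, count its live neighbours.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    next_live = set()
    for cell, n in neighbour_counts.items():
        if n == 3:
            # Birth of a new cell, or survival with three neighbours.
            next_live.add(cell)
        elif n == 2 and cell in live:
            # Survival with two neighbours.
            next_live.add(cell)
        # Otherwise the position is empty next turn:
        # loneliness (n < 2) or overcrowding (n > 3).
    return next_live


# A glider, written as (row, col) coordinates.
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

state = glider
for _ in range(4):
    state = life_step(state)

# The same shape, moved one row down and one column right.
shifted = {(r + 1, c + 1) for (r, c) in glider}
print(state == shifted)
```

Storing only the live cells keeps the board unbounded, so the glider can travel indefinitely without hitting an edge. After four generations it reappears one cell further along the diagonal, which is what the final comparison checks.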

\\figure{\\columns{\\aligncenter{\\includegif{https://inverseprobability.com/talks/slides/diagrams//simulation/Glider}{80%}{}{left}}}{\\includejpg{https://inverseprobability.com/talks/slides/diagrams//maths/John-Conway}{80%}{}{right}}{45%}{45%}}{*Left*
A Glider pattern, discovered in 1969 by Richard K. Guy. *Right*. John Horton
Conway (1937-2020), creator of *Life*.}{glider-loafer-conway}

\\figure{\\columns{\\aligncenter{\\includegif{https://inverseprobability.com/talks/slides/diagrams//simulation/Loafer}{80%}{}{left}}}{\\includejpg{https://inverseprobability.com/talks/slides/diagrams//maths/John-Conway}{80%}{}{right}}{45%}{45%}}{*Left*
A Loafer pattern, discovered by Josh Ball in 2013. *Right*. John Horton
Conway (1937-2020), creator of *Life*.}{glider-loafer-conway}

<!--


\editme


\subsection{Laplace's Gremlin}

\figure{\includepng{https://inverseprobability.com/talks/slides/diagrams//physics/philosophicaless00lapliala_18_cropped}{60%}}{To Laplace, determinism is a strawman. Ignorance of mechanism and data leads to uncertainty which should be dealt with through probability.}{probability-relative-in-part}
>
> *Philosophical Essay on Probabilities* @Laplace-essai14 pg 5


\newslide{}

\figure{\includejpg{https://inverseprobability.com/talks/slides/diagrams//ai/gremlins-think-its-fun-to-hurt-you}{40%}}{Gremlins are seen as the cause of a number of challenges in this World War II poster.}{germlins-think-its-fun-to-hurt-you}

-->
\\figure{\\includejpg{https://inverseprobability.com/talks/slides/diagrams//science-holborn-viaduct}{50%}}{Centrifugal
governor as held by “Science” on Holborn
Viaduct}{science-holborn-viaduct}

\\figure{\\includepng{https://inverseprobability.com/talks/slides/diagrams//SteamEngine_Boulton&Watt_1784}{50%}{negate}}{Watt’s
steam engine, which made steam power efficient and
practical.}{steam-engine-boulton-watt}

\\figure{\\includepng{https://inverseprobability.com/talks/slides/diagrams//Centrifugal_governor}{70%}{negate}}{The
centrifugal governor, an early example of a decision making system. The
parameters of the governor include the lengths of the linkages (which
affect how far the throttle opens in response to movement in the balls),
the weight of the balls (which affects inertia) and the limits to
which the balls can rise.}{centrifugal-governor}
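
To make the role of those parameters concrete, here is a toy feedback loop in Python. It is a sketch under strong assumptions, not a physical model of Watt's governor: `gain` stands in for the linkage lengths, the clamp on `throttle` stands in for the limits on ball travel, `inertia` is a single lumped value, and every name and number is hypothetical.

```python
def simulate_governor(gain, load, target_speed=1.0, inertia=5.0,
                      friction=0.5, dt=0.01, steps=5000):
    """Return the engine speed reached under a governor-like throttle rule."""
    speed = 0.0
    for _ in range(steps):
        # Faster rotation lifts the balls, which closes the throttle.
        # The throttle can only move between fully closed and fully open.
        throttle = min(1.0, max(0.0, 0.5 - gain * (speed - target_speed)))
        # Net torque: steam drive minus friction and the external load.
        torque = 2.0 * throttle - friction * speed - load
        # Simple Euler update of the rotation speed.
        speed += dt * torque / inertia
    return speed


# Compare a strongly coupled governor with a weakly coupled one
# when the load differs from the value the engine was tuned for.
stiff = simulate_governor(gain=5.0, load=0.3)
loose = simulate_governor(gain=0.5, load=0.3)
print(abs(stiff - 1.0), abs(loose - 1.0))
```

In this toy model the decision rule is fixed entirely by its parameters: a larger `gain` holds the speed closer to the target when the load changes, which is the sense in which the governor is an early decision making system.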

<!--

\editme

\subsection{Process Automation}

\newslide{Efficiency}
\slides{
* Economies driven by 'production'.
* Greater production comes with better efficiency.
    * E.g. moving from gathering food to settled agriculture.
* In the modern era one approach to becoming more efficient is automation of processes.
    *  E.g. manufacturing production lines
}
\newslide{Physical Processes}
\slides{
* Production lines, robotic automation
* Supply chain, logistics
* Efficiency through automation.
}
\newslide{Goods and Information}
\slides{
* Manage flow of goods and information.
* Flow of information is highly automated.
* Processing of data is decomposed into stages in computer code. 
}
\newslide{Intervention}
\slides{
* For all cases:  manufacturing, logistics, data management
* Pipeline requires human intervention from an operator.
* Interventions create bottlenecks, slow the process.
* Machine learning is a key technology in automating these manual stages.
}
\newslide{Long Grass}
\slides{
* Easy to replicate interventions have already been dealt with.
* Components that still require human intervention are the knottier problems.
* Difficult to decompose into stages which could then be further automated.
* These components are 'process-atoms'.
* These are the "long grass" regions of technology.
}
\newslide{Nature of Challenge}
\slides{
*  In manufacturing or logistics settings atoms are flexible manual skills.
    * Requires emulation of a human's motor skills.
* In information processing: our flexible cognitive skills.
    * Our ability to mentally process an image or some text. 
}







\editme

\subsection{Artificial Intelligence and Data Science}
\slides{
* AI aims to equip computers with human capabilities
    * Image understanding
    * Computer vision
    * Speech recognition
    * Natural language understanding
    * Machine translation
}
\notes{Artificial intelligence has the objective of endowing computers with human-like intelligent capabilities. For example, understanding an image (computer vision) or the contents of some speech (speech recognition), the meaning of a sentence (natural language processing) or the translation of a sentence (machine translation).}

\subsubsection{Supervised Learning for AI}
\slides{
* Dominant approach today:
    * Generate large labelled data set from humans.
    * Use *supervised learning* to emulate that data.
        * *E.g.* [ImageNet](www.image-net.org) @Russakovsky-imagenet15
* Significant advances due to *deep learning*
    * *E.g.* Alexa, Amazon Go
}
\notes{The machine learning approach to artificial intelligence is to collect and annotate a large data set from humans. The problem is characterized by input data (e.g. a particular image) and a label (e.g. is there a car in the image yes/no). The machine learning algorithm fits a mathematical function (I call this the *prediction function*) to map from the input image to the label. The parameters of the prediction function are set by minimizing an error between the function’s predictions and the true data. The mathematical function that encapsulates this error is known as the *objective function*.}

\notes{This approach to machine learning is known as *supervised learning*.  Various approaches to supervised learning use different prediction functions, objective functions or different optimization algorithms to fit them.}

\notes{For example, *deep learning* makes use of *neural networks* to form the predictions. A neural network is a particular type of mathematical function that allows the algorithm designer to introduce invariances into the function.}

\notes{An invariance is an important way of including prior understanding in a machine learning model. For example, in an image, a car is still a car regardless of whether it’s in the upper left or lower right corner of the image. This is known as translation invariance. A neural network encodes translation invariance in *convolutional layers*. Convolutional neural networks are widely used in image recognition tasks.}

\notes{An alternative structure is known as a recurrent neural network (RNN). RNNs encode temporal structure. They use auto-regressive connections in their hidden layers, so they can be seen as time series models which have non-linear auto-regressive basis functions. They are widely used in speech recognition and machine translation.}

\notes{Machine learning has been deployed in speech recognition (e.g. Alexa: deep neural networks and convolutional neural networks for speech recognition) and in computer vision (e.g. Amazon Go: convolutional neural networks for person recognition and pose detection).}

\newslide{Data Science}
\slides{
* Arises from *happenstance data*.
* Differs from statistics in that the question comes *after* data collection.
}
\notes{The field of data science is related to AI, but philosophically different. It arises because we are increasingly creating large amounts of data through *happenstance* rather than active collection. In the modern era data is laid down by almost all our activities. The objective of data science is to extract insights from this data.}

\notes{Classically, in the field of statistics, data analysis proceeds by assuming that the question (or scientific hypothesis) comes before the data is created. E.g., if I want to determine the effectiveness of a particular drug, I perform a *design* for my data collection. I use foundational approaches such as randomization to account for confounders. This made a lot of sense in an era where data had to be actively collected. The reduction in cost of data collection and storage now means that many data sets are available which weren’t collected with a particular question in mind. This is a challenge because bias in the way data was acquired can corrupt the insights we derive. We can perform randomized controlled trials (referred to as A/B tests in modern internet parlance) to verify our conclusions, but the opportunity is to use data science techniques to better guide our question selection or even answer a question without the expense of a full randomized controlled trial.}


-->
-   There is a gap between the world of data science and AI.
-   The mapping of the virtual onto the physical world.
-   E.g. Causal understanding.

\\centerdiv{\\gurKimchiPicture{15%}\\paulViolaPicture{15%}\\davidMoroPicture{15%}}

\\figure{\\includejpg{https://inverseprobability.com/talks/slides/diagrams//ai/amazon-prime-air-remars-june-2019}{80%}}{Picture
of the drone from the Amazon Re-MARS event in
2019.}{amazon-prime-air-remars}

\\figure{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//software/buying-schematic}{40%}}{The
components of a putative automated buying
system}{buying-system-components}

\\figure{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//ai/ml-system-monolith-purchasing}{60%}}{A
potential path of models in a machine learning
system.}{ml-system-monolith}

\\figure{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//ai/ml-system-downstream-purchasing000}{60%}}{A
potential path of models in a machine learning
system.}{ml-system-downstream-purchasing}

\\figure{\\includepng{https://inverseprobability.com/talks/slides/diagrams//ai/2020-02-12-intellectual-debt}{70%}}{Jonathan
Zittrain’s term to describe the challenges of explanation that come with
AI is Intellectual Debt.}{intellectual-debt}

-   Technical debt is the inability to *maintain* your complex software
    system.

-   Intellectual debt is the inability to *explain* your software
    system.

\\figure{\\includepng{https://inverseprobability.com/talks/slides/diagrams//simulation/unified_model_systems_13022018_1920}{60%}{negate}}{The
UK Met Office runs a shared code base for its simulations of climate and
the weather. This plot shows the different spatial and temporal scales
used.}{met-office-unified-model}

\\figure{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//uq/statistical-emulation000}{80%}}{Real
world systems consist of simulators that capture our domain knowledge
about how our systems operate. Different simulators run at different
speeds and granularities.}{statistical-emulation-1}

\\figure{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//uq/statistical-emulation001}{80%}}{A
statistical emulator is a system that reconstructs the simulation with a
statistical model.}{statistical-emulation-2}

\\slides{\\figure{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//uq/statistical-emulation002}{80%}}{A
statistical emulator is a system that reconstructs the simulation with a
statistical model.}{statistical-emulation-3}}

\\slides{\\figure{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//uq/statistical-emulation003}{80%}}{As
well as reconstructing the simulation, a statistical emulator can be
used to correlate with the real world.}{statistical-emulation-4}}

\\figure{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//uq/statistical-emulation004}{80%}}{A
statistical emulator is a system that reconstructs the simulation with a
statistical model. As well as reconstructing the simulation, a
statistical emulator can be used to correlate with the real
world.}{statistical-emulation-5}
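
The idea of reconstructing a simulation with a statistical model can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not the method used in the talk: `simulator` is a hypothetical toy function standing in for an expensive simulation, the emulator is a zero-mean Gaussian process with an exponentiated quadratic covariance, and the lengthscale, jitter and input locations are arbitrary choices. Only NumPy is used.

```python
import numpy as np


def simulator(x):
    """Stand-in for an expensive simulation (hypothetical toy function)."""
    return np.sin(3.0 * x) + 0.5 * x


def rbf_kernel(a, b, lengthscale=0.3, variance=1.0):
    """Exponentiated quadratic covariance between two 1-D input arrays."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def fit_emulator(x_train, y_train, jitter=1e-8):
    """Fit a Gaussian process to simulator runs; return a predict function."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} y, computed through the Cholesky factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

    def predict(x_new):
        K_s = rbf_kernel(x_train, x_new)
        mean = K_s.T @ alpha
        v = np.linalg.solve(L, K_s)
        var = rbf_kernel(x_new, x_new).diagonal() - np.sum(v**2, axis=0)
        # Clip tiny negative values caused by rounding.
        return mean, np.maximum(var, 0.0)

    return predict


# A handful of simulator runs is all the emulator sees.
x_train = np.linspace(0.0, 2.0, 8)
emulate = fit_emulator(x_train, simulator(x_train))

# Query the emulator at inputs the simulator was never run at.
x_new = np.array([0.4, 1.0, 1.6])
mean, var = emulate(x_new)
print(mean, var)
```

The predictive variance is what distinguishes an emulator from a plain curve fit: it is close to zero at inputs where the simulator was run and grows away from them, which indicates where another simulation run would be most informative.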

\\slides{\\figure{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//uq/statistical-emulation005}{80%}}{In
modern machine learning system design, the emulator may also consider
the output of ML models (for monitoring bias or accuracy) and Operations
Research models.}{statistical-emulation-6}}

\\figure{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//ai/ml-system-downstream-purchasing000}{75%}}{A
potential path of models in a machine learning
system.}{ml-system-downstream-purchasing0}

<!-- This structural learning allows us to associate data with the relevant -->
<!-- layer of the model, rather than merely on the leaf nodes of the output -->
<!-- model. When deploying the deep Gaussian process as an emulator, this -->
<!-- allows for the possibility of learning the structure of the different -->
<!-- component parts of the underlying system. This should aid the user in -->
<!-- determining the ideal system decomposition. -->
\\figure{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//ai/ml-system-downstream-purchasing001}{75%}}{A
potential path of models in a machine learning
system.}{ml-system-downstream-purchasing1}

\\figure{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//ai/ml-system-downstream-purchasing002}{75%}}{A
potential path of models in a machine learning
system.}{ml-system-downstream-purchasing2}

\\figure{\\includediagram{https://inverseprobability.com/talks/slides/diagrams//ai/ml-system-downstream-purchasing003}{75%}}{A
potential path of models in a machine learning
system.}{ml-system-downstream-purchasing3}

\\figure{\\includepng{https://inverseprobability.com/talks/slides/diagrams//accelerate/accelerate-website}{70%}}{The
Accelerate Programme for Scientific Discovery covers research, education
and training, and engagement. Our aim is to bring about a step change in
scientific discovery through AI.
<http://acceleratescience.github.io>}{accelerate-website}

\\figure{\\includepng{https://inverseprobability.com/talks/slides/diagrams//ml/ml-and-the-physical-world-course}{70%}}{Machine
Learning and the Physical World is a course focussed on teaching the
principles and techniques of emulation. It’s freely available online.
<http://mlatcl.github.io/mlphysical/>}{ml-physical-world-course}

\\thanks