# The Challenges of Data Science

### [Neil D. Lawrence](http://inverseprobability.com), University of Cambridge

### 2023-10-30

**Abstract**: In the first lecture, we laid out the underpinning
phenomena that give us the landscape of data science. In this lecture we
unpick the challenges that landscape presents us with. The material
gives you context for why data science is very different from standard
software engineering, and how data science problems need to be
approached including the many different aspects that need to be
considered. We will look at the challenges of deploying data science
solutions in practice. We categorize them into three groups.


::: {.cell .markdown}


# Introduction

Data science is an emerging discipline. That makes it harder to make
clean decisions about what any given individual will need to know to
become a data scientist. Those of you who are studying now will be those
that define the discipline. As we deploy more data driven decision
making in the world, the role will be refined. Until we achieve that
refinement, your knowledge needs to be broad based.

In this lecture we will first continue our theme of how our limitations
as humans mean that our analysis of data can be affected, and I will
introduce an analogy that should help you understand *how* data science
differs significantly from traditional software engineering. We’ll then
categorize some of the challenges of the domain into three different
groups.

## The Gartner Hype Cycle


<img src="https://mlatcl.github.io/advds/./slides/diagrams//Gartner_Hype_Cycle.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>The Gartner Hype Cycle places technologies on a graph that
relates the expectations we have of a technology to its actual
influence. Early hope for a new technology is often displaced by
disillusionment due to the time it takes for a technology to be usefully
deployed.</i>

The [Gartner Hype Cycle](https://en.wikipedia.org/wiki/Hype_cycle) tries
to assess where an idea is in terms of maturity and adoption. It splits
the evolution of technology into a technological trigger, a peak of
expectations followed by a trough of disillusionment and a final
ascension into a useful technology. It looks rather like a classical
control response to a final set point.

## Cycle for ML Terms


## Google Trends


In [None]:
%pip install pytrends

In [None]:
import mlai.plot as plot

In [None]:
plot.google_trends(terms=['artificial intelligence', 'big data', 'data mining', 'deep learning', 'machine learning', 'large language model'], 
                  initials='ai-bd-dm-dl-ml-llm', 
                  diagrams='./data-science')

In [None]:
import notutils as nu
from ipywidgets import IntSlider

In [None]:
nu.display_plots('ai-bd-dm-dl-ml-llm-google-trends{sample:0>3}.svg', 
                            './data-science/', sample=IntSlider(0, 0, 4, 1))

<img src="https://mlatcl.github.io/advds/./slides/diagrams//data-science/ai-bd-dm-dl-ml-llm-google-trends.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>A Google trends search for ‘artificial intelligence’, ‘big
data’, ‘data mining’, ‘deep learning’, ‘machine learning’, ‘large
language model’ as different technological terms gives us insight into
their popularity over time.</i>

Google Trends gives us insight into the interest in different terms
over time.

Examining the Google Trends search for ‘artificial intelligence’, ‘big
data’, ‘data mining’, ‘deep learning’, ‘machine learning’ and ‘large
language model’ we can see that ‘artificial intelligence’ was entering a
plateau of productivity, ‘big data’ is entering the trough of
disillusionment, and ‘data mining’ seems to be deeply within the trough.
On the other hand, ‘deep learning’ and ‘machine learning’ appear to be
ascending to the peak of inflated expectations having experienced a
technology trigger.

For deep learning that technology trigger was the ImageNet result of
2012 (Krizhevsky et al., n.d.). This step change in performance on
object detection in images was achieved through convolutional neural
networks, popularly known as ‘deep learning’.

Now there is a second technology trigger associated with large language
models.

## Data Science as Debugging


One challenge for existing information technology professionals is
realizing the extent to which a software ecosystem based on data differs
from a classical ecosystem. In particular, by ingesting data we bring
unknowns/uncontrollables into our decision-making system. This presents
opportunity for adversarial exploitation and unforeseen operation.

See the blog post on [Data Science as
Debugging](http://inverseprobability.com/2017/03/14/data-science-as-debugging).

Starting with the analysis of a data set, the nature of data science is
somewhat different from classical software engineering.

One analogy I find helpful for understanding the depth of change we need
is the following. Imagine as a software engineer, you find a USB stick
on the ground. And for some reason you *know* that on that USB stick is
a particular API call that will enable you to make a significant
positive difference on a business problem. You don’t know which of the
many library functions on the USB stick are the ones that will help. And
it could be that some of those library functions will hinder, perhaps
because they are just inappropriate or perhaps because they have been
placed there maliciously. The most secure thing to do would be to *not*
introduce this code into your production system at all. But what if your
manager told you to do so? How would you go about incorporating this
code base?

The answer is *very* carefully. You would have to engage in a process
more akin to debugging than regular software engineering. As you
understood the code base, for your work to be reproducible, you should
be documenting it, not just what you discovered, but how you discovered
it. In the end, you typically find a single API call that is the one
that most benefits your system. But more thought has been placed into
this line of code than any line of code you have written before.

An enormous amount of debugging would be required. As the nature of the
code base is understood, software tests to verify it also need to be
constructed. At the end of all your work, the lines of software you
write to interact with the software on the USB stick are likely to be
minimal. But more thought would be put into those lines than perhaps any
other lines of code in the system.

Even then, when your API code is introduced into your production system,
it needs to be deployed in an environment that monitors it. We cannot
rely on an individual’s decision making to ensure the quality of all our
systems. We need to create an environment that includes quality
controls, checks, and bounds, tests, all designed to ensure that
assumptions made about this foreign code base are remaining valid.

This situation is akin to what we are doing when we incorporate data in
our production systems. When we are consuming data from others, we
cannot assume that it has been produced in alignment with our goals for
our own systems. Worst case, it may have been adversarially produced. A
further challenge is that data is dynamic. So, in effect, the code on
the USB stick is evolving over time.

It might seem that this process is easy to formalize now: we simply need
to check what the formal software engineering process is for debugging,
because that is the current software engineering activity that data
science is closest to. But when we look for a formalization of
debugging, we find that there is none. Indeed, modern software
engineering mainly focusses on ensuring that code is written without
bugs in the first place.

### Lessons

1.  When you begin an analysis, behave as a debugger.

-   Write test code as you go. Document those tests and ensure they are
    accessible by others.
-   Understand the landscape of your data. Be prepared to try several
    different approaches to the data set.
-   Be constantly skeptical.
-   Use the best tools available, and develop a deep understanding of
    how they work.
-   Share your experience of what challenges you’re facing. Have others
    (software engineers, fellow data analysts, your manager) review your
    work.
-   Never go straight for the goal: you’d never try and write the API
    call straight away on the discarded USB stick, so why are you
    launching your classification algorithm before visualizing the data?
-   Ensure your analysis is documented and accessible. If your code does
    go wrong in production, you’ll need to be able to retrace to where
    the error crept in.

2.  When managing the data science process, don’t treat it as standard
    code development.

-   Don’t deploy a traditional agile development pipeline and expect it
    to work the same way it does for standard code development. Think
    about how you handle bugs, think about how you would handle very
    many bugs.
-   Don’t leave the data scientist alone to wade through the mess.
-   Integrate the data analysis with your other team activities. Have
    the software engineers and domain experts work closely with the data
    scientists. This is vital for providing the data scientists with the
    technical support they need, but also managing the expectations of
    the engineers in terms of when and how the data will be able to
    deliver.
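
The advice to "write test code as you go" can be made concrete with a small sketch. The records, field names and valid ranges below are hypothetical, chosen only to illustrate the habit of encoding your assumptions about incoming data as checks that others can read and rerun.

```python
# Hypothetical records standing in for data ingested from an external source.
records = [
    {"age": 34, "latitude": 52.2, "bought": 1},
    {"age": 51, "latitude": 55.9, "bought": 0},
    {"age": -3, "latitude": 48.1, "bought": 1},     # impossible age
    {"age": 27, "latitude": 123.0, "bought": 0},    # impossible latitude
    {"age": 45, "latitude": 51.5, "bought": None},  # missing label
]


def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    age = record.get("age")
    if age is None or not 0 <= age <= 120:
        problems.append(f"age out of range: {age!r}")
    latitude = record.get("latitude")
    if latitude is None or not -90.0 <= latitude <= 90.0:
        problems.append(f"latitude out of range: {latitude!r}")
    if record.get("bought") not in (0, 1):
        problems.append(f"label not in {{0, 1}}: {record.get('bought')!r}")
    return problems


def audit(records):
    """Map record index to its problems, keeping only records that fail."""
    report = {}
    for index, record in enumerate(records):
        problems = check_record(record)
        if problems:
            report[index] = problems
    return report


report = audit(records)
for index, problems in report.items():
    print(index, problems)
```

Because the data is dynamic, checks like these are worth rerunning on every new batch rather than only once during the initial analysis.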

**Recommendation**: Anecdotally, resolving a machine learning challenge
requires 80% of the resource to be focused on the data and perhaps 20%
to be focused on the model. But many companies are too keen to employ
machine learning engineers who focus on the models, not the data. We
should change our hiring priorities and training. Universities cannot
provide the understanding of how to data-wrangle. Companies must fill
this gap.

## Statistics to Deep Learning


# What is Machine Learning?


What is machine learning? At its most basic level machine learning is a
combination of

$$\text{data} + \text{model} \stackrel{\text{compute}}{\rightarrow} \text{prediction}$$

where *data* is our observations. They can be actively or passively
acquired (meta-data). The *model* contains our assumptions, based on
previous experience. That experience can be other data, it can come from
transfer learning, or it can merely be our beliefs about the
regularities of the universe. In humans our models include our inductive
biases. The *prediction* is an action to be taken or a categorization or
a quality score. The reason that machine learning has become a mainstay
of artificial intelligence is the importance of predictions in
artificial intelligence. The data and the model are combined through
computation.

In practice we normally perform machine learning using two functions. To
combine data with a model we typically make use of:

**a prediction function**: it is used to make the predictions. It
includes our beliefs about the regularities of the universe, our
assumptions about how the world works, e.g., smoothness, spatial
similarities, temporal similarities.

**an objective function**: it defines the ‘cost’ of misprediction.
Typically, it includes knowledge about the world’s generating processes
(probabilistic objectives) or the costs we pay for mispredictions
(empirical risk minimization).

The combination of data and model through the prediction function and
the objective function leads to a *learning algorithm*. The class of
prediction functions and objective functions we can make use of is
restricted by the algorithms they lead to. If the prediction function or
the objective function are too complex, then it can be difficult to find
an appropriate learning algorithm. Much of the academic field of machine
learning is the quest for new learning algorithms that allow us to bring
different types of models and data together.
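
As a minimal sketch of how a prediction function and an objective function combine into a learning algorithm, the example below assumes a straight-line prediction function, a sum-of-squares objective and plain gradient descent. The data, learning rate and step count are illustrative choices, not something prescribed by the lecture.

```python
def predict(x, w, b):
    """Prediction function: a straight line (a deliberately simple assumed model)."""
    return w * x + b


def objective(xs, ys, w, b):
    """Objective function: sum of squared differences between predictions and data."""
    return sum((y - predict(x, w, b)) ** 2 for x, y in zip(xs, ys))


def fit(xs, ys, learning_rate=0.01, steps=5000):
    """Learning algorithm: gradient descent on the objective above."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Gradients of the sum-of-squares objective with respect to w and b.
        grad_w = sum(-2 * x * (y - predict(x, w, b)) for x, y in zip(xs, ys))
        grad_b = sum(-2 * (y - predict(x, w, b)) for x, y in zip(xs, ys))
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so we know what the fit should recover.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit(xs, ys)
print(w, b, objective(xs, ys, w, b))
```

Swapping in a more complex prediction function or objective changes the gradients, and may mean that simple gradient descent no longer works well. That is the sense in which the choice of functions is restricted by the algorithms they lead to.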

A useful reference for state of the art in machine learning is the UK
Royal Society Report, [Machine Learning: Power and Promise of Computers
that Learn by
Example](https://royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf).

You can also check my blog post on [What is Machine
Learning?](http://inverseprobability.com/2017/07/17/what-is-machine-learning).

## Classical Statistical Analysis


Despite the shift of emphasis, traditional statistical techniques are
more important than ever. One of the few ways we have to validate the
analyses we create is to make use of visualizations, randomized testing
and other forms of statistical analysis. You will have explored some of
these ideas in earlier courses in machine learning. In this unit we
provide some review material in a practical sheet to bring some of those
ideas together in the context of data science.

## Artificial Intelligence and Data Science


Machine learning technologies have been the driver of two related, but
distinct disciplines. The first is *data science*. Data science is an
emerging field that arises from the fact that we now collect so much
data by happenstance, rather than by *experimental design*. Classical
statistics is the science of drawing conclusions from data, and to do so
statistical experiments are carefully designed. In the modern era we
collect so much data that there’s a desire to draw inferences directly
from the data.

As well as machine learning, the field of data science draws from
statistics, cloud computing, data storage (e.g. streaming data),
visualization and data mining.

In contrast, artificial intelligence technologies typically focus on
emulating some form of human behaviour, such as understanding an image,
or some speech, or translating text from one form to another. The recent
advances in artificial intelligence have come from machine learning
providing the automation. But in contrast to data science, in artificial
intelligence the data is normally collected with the specific task in
mind. In this sense it has strong relations to classical statistics.

Classically artificial intelligence worried more about *logic* and
*planning* and focused less on data driven decision making. Modern
machine learning owes more to the field of *Cybernetics* (Wiener, 1948)
than artificial intelligence. Related fields include *robotics*, *speech
recognition*, *language understanding* and *computer vision*.

There are strong overlaps between the fields: the wide availability of
data by happenstance makes it easier to collect data for designing AI
systems. These relations are coming through wide availability of sensing
technologies that are interconnected by cellular networks, WiFi and the
internet. This phenomenon is sometimes known as the *Internet of
Things*, but this feels like a dangerous misnomer. We must never forget
that we are interconnecting people, not things.

<center>

Convention for the Protection of *Individuals* with regard to Automatic
Processing of *Personal Data* (1981/1/28)

</center>

## What does Machine Learning do?


Any process of automation allows us to scale what we do by codifying a
process in some way that makes it efficient and repeatable. Machine
learning automates by emulating human (or other) actions found in data.
Machine learning codifies in the form of a mathematical function that is
learnt by a computer. If we can create these mathematical functions in
ways in which they can interconnect, then we can also build systems.

Machine learning works through codifying a prediction of interest into a
mathematical function. For example, we can try and predict the
probability that a customer wants to buy a jumper given knowledge of
their age, and the latitude where they live. The technique known as
logistic regression estimates the log-odds that someone will buy a
jumper as a linear weighted sum of the features of interest.

$$ \text{odds} = \frac{p(\text{bought})}{p(\text{not bought})} $$

$$ \log \text{odds}  = \beta_0 + \beta_1 \text{age} + \beta_2 \text{latitude}.$$
Here $\beta_0$, $\beta_1$ and $\beta_2$ are the parameters of the model.
If $\beta_1$ and $\beta_2$ are both positive, then the log-odds that
someone will buy a jumper increase with increasing latitude and age, so
the further north you are and the older you are the more likely you are
to buy a jumper. The parameter $\beta_0$ is an offset parameter and
gives the log-odds of buying a jumper at zero age and on the equator. It
is likely to be negative[1] indicating that the purchase is
odds-against. This is also a classical statistical model, and models
like logistic regression are widely used to estimate probabilities from
ad-click prediction to risk of disease.

This is called a generalized linear model. We can also think of it as
estimating the *probability* of a purchase as a nonlinear function of
the features (age, latitude) and the parameters (the $\beta$ values).
The function is known as the *sigmoid* or [logistic
function](https://en.wikipedia.org/wiki/Logistic_regression), thus the
name *logistic* regression.

$$ p(\text{bought}) =  \sigma\left(\beta_0 + \beta_1 \text{age} + \beta_2 \text{latitude}\right).$$
In the case where we have *features* to help us predict, we sometimes
denote such features as a vector, $\mathbf{ x}$, and we then use an
inner product between the features and the parameters,
$\boldsymbol{\beta}^\top \mathbf{ x}= \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 ...$,
to represent the argument of the sigmoid.

$$ p(\text{bought}) =  \sigma\left(\boldsymbol{\beta}^\top \mathbf{ x}\right).$$
More generally, we aim to predict some aspect of our data, $y$, by
relating it through a mathematical function, $f(\cdot)$, to the
parameters, $\boldsymbol{\beta}$ and the data, $\mathbf{ x}$.

$$ y=  f\left(\mathbf{ x}, \boldsymbol{\beta}\right).$$

We call $f(\cdot)$ the *prediction function*.
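
To make the logistic regression prediction function concrete, here is a minimal sketch in Python. The parameter values in `beta` are hypothetical, picked only so that the offset is negative and both slopes are positive as in the discussion above. They are not fitted to any data.

```python
import math


def sigmoid(z):
    """Logistic function mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(age, latitude, beta):
    """Linear weighted sum of the features: beta_0 + beta_1*age + beta_2*latitude."""
    beta_0, beta_1, beta_2 = beta
    return beta_0 + beta_1 * age + beta_2 * latitude


def probability_bought(age, latitude, beta):
    """Prediction function: probability of buying a jumper."""
    return sigmoid(log_odds(age, latitude, beta))


# Hypothetical parameters: negative offset, positive effect of age and latitude.
beta = (-4.0, 0.03, 0.04)

p_young_south = probability_bought(20, 10.0, beta)
p_old_north = probability_bought(70, 60.0, beta)
print(p_young_south, p_old_north)
```

With these assumed values the older, more northerly customer has positive log-odds (odds-on) while the younger, more southerly customer has negative log-odds (odds-against).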

To obtain the fit to data, we use a separate function called the
*objective function* that gives us a mathematical representation of the
difference between our predictions and the real data.

$$E(\boldsymbol{\beta}, \mathbf{Y}, \mathbf{X})$$
A commonly used example (for instance in a regression problem) is least
squares,
$$E(\boldsymbol{\beta}, \mathbf{Y}, \mathbf{X}) = \sum_{i=1}^n\left(y_i - f(\mathbf{ x}_i, \boldsymbol{\beta})\right)^2.$$
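
The least squares objective can be sketched in a few lines of NumPy. The example assumes a linear prediction function and synthetic data generated for illustration. `np.linalg.lstsq` is used here as one convenient way to minimize the objective, not as the only option.

```python
import numpy as np


def predict(X, beta):
    """Linear prediction function f(x, beta) = beta^T x, applied to every row of X."""
    return X @ beta


def least_squares_objective(beta, y, X):
    """Sum of squared differences between observations and predictions."""
    residuals = y - predict(X, beta)
    return float(np.sum(residuals ** 2))


# Synthetic data (an assumption for this sketch): y = 1 + 2x plus a little noise.
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])  # a column of ones provides the offset
true_beta = np.array([1.0, 2.0])
y = X @ true_beta + 0.1 * rng.standard_normal(n)

# Minimize the least squares objective for the linear model.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(least_squares_objective(beta_hat, y, X))
```

The fitted parameters give an objective value no larger than that of any other parameter vector, including the one used to generate the data.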

If a linear prediction function is combined with the least squares
objective function, then that gives us a classical *linear regression*,
another classical statistical model. Statistics often focusses on linear
models because it makes interpretation of the model easier.
Interpretation is key in statistics because the aim is normally to
validate questions by analysis of data. Machine learning has typically
focused more on the prediction function itself and worried less about
the interpretation of parameters, which are normally denoted by
$\mathbf{w}$ instead of $\boldsymbol{\beta}$. As a result, *non-linear*
functions are explored more often as they tend to improve quality of
predictions but at the expense of interpretability.

-   These are interpretable models: vital for disease etc.

-   Modern machine learning methods are less interpretable

-   Example: face recognition

[1] The logarithm of a number less than one is negative, for a number
greater than one the logarithm is positive. So if odds are greater than
evens (odds-on) the log-odds are positive, if the odds are less than
evens (odds-against) the log-odds will be negative.

# Deep Learning


Classical statistical models and simple machine learning models have a
great deal in common. The main difference between the fields is
philosophical. Machine learning practitioners are typically more
concerned with the quality of prediction (e.g. measured by an ROC curve)
while statisticians tend to focus more on the interpretability of the
model and the validity of any decisions drawn from that interpretation.
For example, a statistical model may be used to validate whether a large
scale intervention (such as the mass provision of mosquito nets) has had
a long term effect on disease (such as malaria). In this case one of the
covariates is likely to be the provision level of nets in a particular
region. The response variable would be the rate of malaria disease in
the region. The parameter, $\beta_1$, associated with that covariate will
demonstrate a positive or negative effect which would be validated in
answering the question. The focus in statistics would be less on the
accuracy of the response variable and more on the validity of the
interpretation of the effect variable, $\beta_1$.

A machine learning practitioner on the other hand would typically denote
the parameter $w_1$, instead of $\beta_1$ and would only be interested
in the output of the prediction function, $f(\cdot)$ rather than the
parameter itself. The general formalism of the prediction function
allows for *non-linear* models. In machine learning, the emphasis on
prediction over interpretability means that non-linear models are often
used. The parameters, $\mathbf{w}$, are a means to an end (good
prediction) rather than an end in themselves (interpretable).


## DeepFace


<img class="" src="https://mlatcl.github.io/advds/./slides/diagrams//deepface_neg.png" style="width:100%">

Figure: <i>The DeepFace architecture (Taigman et al., 2014), visualized
through colors to represent the functional mappings at each layer. There
are 120 million parameters in the model.</i>

The DeepFace architecture (Taigman et al., 2014) consists of layers that
deal with *translation* invariances, known as convolutional layers.
These layers are followed by three locally-connected layers and two
fully-connected layers. Color illustrates feature maps produced at each
layer. The neural network includes more than 120 million parameters,
where more than 95% come from the local and fully connected layers.

### Deep Learning as Pinball


<img class="" src="https://mlatcl.github.io/advds/./slides/diagrams//576px-Early_Pinball.jpg" style="width:50%">

Figure: <i>Deep learning models are compositions of simple functions. We
can think of a pinball machine as an analogy. Each layer of pins
corresponds to one of the layers of functions in the model. Input data
is represented by the location of the ball from left to right when it is
dropped in from the top. Output class comes from the position of the
ball as it leaves the pins at the bottom.</i>

Sometimes deep learning models are described as being like the brain, or
too complex to understand, but one analogy I find useful to help convey
the gist of these models is to think of them as being similar to early
pinball machines.

In a deep neural network, we input a number (or numbers), whereas in
pinball, we input a ball.

Think of the location of the ball on the left-right axis as a single
number. Our simple pinball machine can only take one number at a time.
As the ball falls through the machine, each layer of pins can be thought
of as a different layer of ‘neurons’. Each layer acts to move the ball
from left to right.

In a pinball machine, when the ball gets to the bottom it might fall
into a hole defining a score, in a neural network, that is equivalent to
the decision: a classification of the input object.

An image has more than one number associated with it, so it is like
playing pinball in a *hyper-space*.
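
The pinball picture of a deep model as a composition of simple functions can be sketched for the one-number case. The layer form (`tanh` of a scaled, shifted input) and the parameter values are assumptions made for illustration. They play the role of pin positions that have not yet been learned.

```python
import math


def make_layer(weight, bias):
    """One 'row of pins': a simple function that nudges the ball left or right."""
    def layer(x):
        return math.tanh(weight * x + bias)
    return layer


def compose(layers):
    """Stack the rows: the output of each layer is the input to the next."""
    def network(x):
        for layer in layers:
            x = layer(x)
        return x
    return network


# Hypothetical pin positions (parameters); in practice these would be learned.
layers = [make_layer(1.5, -0.5), make_layer(-2.0, 0.3), make_layer(3.0, 0.0)]
network = compose(layers)


def classify(x):
    """The 'hole' the ball falls into is decided by its final position."""
    return "left hole" if network(x) < 0 else "right hole"


for start in (-1.0, 0.0, 1.0):
    print(start, network(start), classify(start))
```

For an image, `x` would be a vector with one entry per pixel rather than a single number, which is what makes the space the ball moves through so large.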

<img src="https://mlatcl.github.io/advds/./slides/diagrams//pinball001.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>At initialization, the pins, which represent the parameters
of the function, aren’t in the right place to bring the balls to the
correct decisions.</i>

<img src="https://mlatcl.github.io/advds/./slides/diagrams//pinball002.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>After learning the pins are now in the right place to bring
the balls to the correct decisions.</i>

Learning involves moving all the pins to be in the correct position, so
that the ball ends up in the right place when it’s fallen through the
machine. But moving all these pins in hyperspace can be difficult.

In a hyper-space you have to put a lot of data through the machine to
explore the positions of all the pins. Even when you feed many
millions of data points through the machine, there are likely to be
regions in the hyper-space where no ball has passed. When future test
data passes through the machine in a new route unusual things can
happen.

*Adversarial examples* exploit this high dimensional space. If you have
access to the pinball machine, you can use gradient methods to find a
position for the ball in the hyper space where the image looks like one
thing, but will be classified as another.
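
A rough sketch of the idea, under strong simplifying assumptions: the model below is a linear classifier with hypothetical parameters rather than a deep network, so the gradient of its output with respect to the input is simply the weight vector. Moving every input dimension a small step against the sign of that gradient is enough to flip the decision, because many small changes add up in a high dimensional space.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(x, w, b):
    """Linear model: weighted sum of the inputs plus an offset."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    return (v > 0) - (v < 0)


# Hypothetical model with 100 input dimensions.
dim = 100
w = [0.1 if i % 2 == 0 else -0.1 for i in range(dim)]
b = 0.0

# An input the model assigns to class 1 (probability above one half).
x = [0.5 + 0.05 * sign(wi) for wi in w]

# For a linear model the gradient of the logit with respect to x is w itself,
# so stepping against sign(w) lowers the logit as fast as possible per dimension.
epsilon = 0.1
x_adv = [xi - epsilon * sign(wi) for xi, wi in zip(x, w)]

p_original = sigmoid(logit(x, w, b))
p_adversarial = sigmoid(logit(x_adv, w, b))
largest_change = max(abs(a - o) for a, o in zip(x_adv, x))
print(p_original, p_adversarial, largest_change)
```

No single dimension changes by more than `epsilon`, yet the predicted class switches. For a deep network the gradient has to be computed by backpropagation, but the principle is the same.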

Probabilistic methods explore more of the space by considering a range
of possible paths for the ball through the machine. This helps to make
them more data efficient and gives some robustness to adversarial
examples.

## What are Large Language Models?


## Probability Conversations


<img src="https://mlatcl.github.io/advds/./slides/diagrams//ai/anne-probability-conversation.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>The focus so far has been on reducing uncertainty to a few
representative values and sharing numbers with human beings. We forget
that most people can be confused by basic probabilities for example the
prosecutor’s fallacy.</i>

In practice we know that probabilities can be very unintuitive, for
example in court there is a fallacy known as the “prosecutor’s fallacy”
that confuses conditional probabilities. This can cause problems in jury
trials (Thompson, 1989).
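
A small worked example shows how the two conditional probabilities come apart. All the numbers are hypothetical: a pool of one million possible sources, a one in ten thousand chance that an innocent person matches the evidence, and no other evidence, so that everyone in the pool is equally likely a priori.

```python
# Hypothetical numbers, chosen only to illustrate the fallacy.
population = 1_000_000           # people who could have been the source
p_match_given_innocent = 1e-4    # chance an innocent person matches the evidence
p_match_given_guilty = 1.0       # the true source always matches

# With no other evidence, each person is equally likely to be the source.
prior_guilty = 1 / population

# Total probability of a match, then Bayes' rule.
p_match = (p_match_given_guilty * prior_guilty
           + p_match_given_innocent * (1 - prior_guilty))
p_guilty_given_match = p_match_given_guilty * prior_guilty / p_match
p_innocent_given_match = 1 - p_guilty_given_match

print(p_match_given_innocent)   # P(match | innocent): very small
print(p_innocent_given_match)   # P(innocent | match): very large
```

The fallacy is to read the first number as if it were the second. Under these assumptions a matching person is still overwhelmingly likely to be innocent, with a probability of guilt of about 1%, because so many innocent people are expected to match by chance.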

## LLM Conversations


<img src="https://mlatcl.github.io/advds/./slides/diagrams//ai/anne-llm-conversation.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>The focus so far has been on reducing uncertainty to a few
representative values and sharing numbers with human beings. We forget
that most people can be confused by basic probabilities, for example by
the prosecutor’s fallacy.</i>

    from IPython.lib.display import YouTubeVideo
    YouTubeVideo('0sJjdxn5kcI')

Figure: <i>The Inner Monologue paper suggests using LLMs for robotic
planning (Huang et al., 2023).</i>

By interacting directly with machines that have an understanding of
human cultural context, it should be possible to share the nature of
uncertainty in the same way humans do. See, for example, the paper [Inner
Monologue: Embodied Reasoning through Planning with Language
Models](https://innermonologue.github.io/) (Huang et al., 2023).

## The MONIAC


[The MONIAC](https://en.wikipedia.org/wiki/MONIAC) was an analogue
computer designed to simulate the UK economy. Analogue computers work
through analogy; the analogy in the MONIAC is that both money and water
flow. The MONIAC exploits this through a system of tanks, pipes, valves
and floats that represent the flow of money through the UK economy.
Water flowed from the treasury tank at the top of the model to other
tanks representing government spending, such as health and education.
The machine was initially designed for teaching support but was also
found to be a useful economic simulator. Several were built. Today
you can see the original at Leeds Business School; there is also one in
the London Science Museum and one [in the University of Cambridge’s
economics
faculty](https://www.econ.cam.ac.uk/economics-alumni/drip-down-economics-phillips-machine).
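
The water analogy can be mimicked in a few lines of code. The sketch below
is a toy digital stand-in, not a model of the real machine: the tank names
come from the description above, while the valve openings, the linear flow
rule and the function name `step` are assumptions made purely for
illustration.

```python
def step(levels, valves, dt=1.0):
    """Advance the tank levels by one time step.

    levels: dict mapping tank name to volume of water (money).
    valves: dict mapping a spending tank to the fraction of the
            treasury level that flows into it per unit time (assumed).
    """
    treasury = levels["treasury"]
    new_levels = dict(levels)
    for tank, opening in valves.items():
        flow = opening * treasury * dt  # linear flow rule (an assumption)
        new_levels["treasury"] -= flow
        new_levels[tank] += flow
    return new_levels


# Hypothetical starting levels and valve settings.
levels = {"treasury": 100.0, "health": 0.0, "education": 0.0}
valves = {"health": 0.05, "education": 0.03}

for _ in range(10):
    levels = step(levels, valves)

for tank, volume in levels.items():
    print(f"{tank:>10}: {volume:6.2f}")
```

Because every unit that leaves the treasury tank arrives in another tank,
the total volume is conserved, which is the property the physical machine
gets for free from the water.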

<img class="" src="https://mlatcl.github.io/advds/./slides/diagrams//simulation/Phillips_and_MONIAC_LSE.jpg" style="width:40%">

Figure: <i>Bill Phillips and his MONIAC (completed in 1949). The machine
is an analogue computer designed to simulate the workings of the UK
economy.</i>

## Donald MacKay


<img class="" src="https://mlatcl.github.io/advds/./slides/diagrams//people/DonaldMacKay1952.jpg" style="width:40%">

Figure: <i>Donald MacKay (1922-1987) was a physicist who focussed his
work on understanding the eye and the brain.</i>

Donald MacKay was a physicist who worked on naval gun targeting during
the Second World War. The challenge with gun targeting for ships is
that both the target and the gun platform are moving. This challenge was
tackled using analogue computers, for example in the US the [Mark I fire
control
computer](https://en.wikipedia.org/wiki/Mark_I_Fire_Control_Computer),
which was a mechanical computer. MacKay worked on radar systems for gun
laying, where the velocity and distance of the target could be assessed
through radar and an electro-mechanical analogue computer.
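
The core of the targeting problem can be sketched digitally. The example
below is a heavily simplified illustration and not a description of how any
historical fire control computer worked: it assumes a flat two-dimensional
sea, constant velocities, and a shell that travels in a straight line at
constant speed relative to the firing ship. The function name
`intercept_time` and all numbers are hypothetical.

```python
import math


def intercept_time(rel_pos, rel_vel, shell_speed):
    """Smallest positive t with |rel_pos + rel_vel * t| = shell_speed * t.

    rel_pos, rel_vel: target position and velocity relative to the
    firing ship. Returns None if no intercept exists.
    """
    rx, ry = rel_pos
    vx, vy = rel_vel
    # Squaring both sides gives a quadratic a*t^2 + b*t + c = 0.
    a = vx * vx + vy * vy - shell_speed ** 2
    b = 2.0 * (rx * vx + ry * vy)
    c = rx * rx + ry * ry
    if abs(a) < 1e-12:  # shell speed equals relative target speed
        if abs(b) < 1e-12:
            return None
        t = -c / b
        return t if t > 0 else None
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    candidates = [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]
    positive = [t for t in candidates if t > 0]
    return min(positive) if positive else None


# Hypothetical scenario: both ships are moving.
own_vel = (10.0, 0.0)
target_pos = (3000.0, 4000.0)   # relative to own ship, in metres
target_vel = (0.0, -10.0)
rel_vel = (target_vel[0] - own_vel[0], target_vel[1] - own_vel[1])
shell_speed = 500.0

t_hit = intercept_time(target_pos, rel_vel, shell_speed)
aim_point = (target_pos[0] + rel_vel[0] * t_hit,
             target_pos[1] + rel_vel[1] * t_hit)
bearing = math.degrees(math.atan2(aim_point[1], aim_point[0]))
print(f"time of flight: {t_hit:.2f} s, aim bearing: {bearing:.2f} degrees")
```

Working in the firing ship's frame of reference is what lets a single
relative velocity capture the motion of both the target and the gun
platform.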

## Fire Control Systems


Naval gunnery systems deal with targeting guns while taking into account
movement of ships. The Royal Navy’s Gunnery Pocket Book (The Admiralty,
1945) gives details of one system for gun laying.

<img class="" src="https://mlatcl.github.io/advds/./slides/diagrams//ai/low-angle-fire-control-team.jpg" style="width:80%">

Figure: <i>The fire control computer set at the centre of a system of
observation and tracking (The Admiralty, 1945).</i>

<img class="" src="https://mlatcl.github.io/advds/./slides/diagrams//ai/the-measurement-of-inclination.jpg" style="width:80%">

Figure: <i>Measuring inclination between two ships (The Admiralty,
1945). Sophisticated fire control computers allowed the ship to continue
to fire while manoeuvring.</i>

<img class="" src="https://mlatcl.github.io/advds/./slides/diagrams//ai/typical-modern-fire-control-table.jpg" style="width:80%">

Figure: <i>A Second World War gun computer’s control table (The
Admiralty, 1945).</i>

## *Behind the Eye*

> Later in the 1940's, when I was doing my Ph.D. work, there was much
> talk of the brain as a computer and of the early digital computers
> that were just making the headlines as "electronic brains." As an
> analogue computer man I felt strongly convinced that the brain,
> whatever it was, was not a digital computer. I didn't think it was an
> analogue computer either in the conventional sense.

## Human Analogue Machine

<img class="" src="https://mlatcl.github.io/advds/./slides/diagrams//ai/human-analogue-machine.png" style="width:60%">

Figure: <i>The human analogue machine creates a feature space which is
analogous to the one we use to reason. One way of doing this is to have a
machine attempt to compress all human generated text in an
auto-regressive manner.</i>

-   A human-analogue machine is a machine that has created a feature
    space that is analogous to the “feature space” our brain uses to
    reason.

-   The latest generation of LLMs are exhibiting this characteristic,
    giving them the ability to converse.

-   Perils of this include *counterfeit people*.

-   Daniel Dennett has described the challenges these bring in [an
    article in The
    Atlantic](https://www.theatlantic.com/technology/archive/2023/05/problem-counterfeit-people/674075/).

-   But if correctly done, the machine can be appropriately
    “psychologically represented”.

-   This might allow us to deal with the challenge of *intellectual
    debt* where we create machines we cannot explain.
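
The idea of compressing text in an auto-regressive manner, mentioned in the
figure caption above, can be illustrated with a drastically simplified
stand-in for an LLM. The sketch below fits a character bigram model, which
predicts each character from the one before it, and measures how many bits
it would need to encode a text. The toy text, the add-one smoothing and the
function names are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def code_length_bits(text, counts, alphabet):
    """Bits needed to encode text one character at a time.

    Each character costs -log2 p(char | previous char), using add-one
    smoothing so unseen pairs still get non-zero probability.
    """
    vocab = len(alphabet)
    bits = math.log2(vocab)  # first character: uniform code
    for prev, nxt in zip(text, text[1:]):
        following = counts.get(prev, Counter())
        total = sum(following.values())
        p = (following[nxt] + 1) / (total + vocab)
        bits -= math.log2(p)
    return bits


# Toy corpus chosen only for illustration.
text = "the cat sat on the mat. " * 20
alphabet = sorted(set(text))

counts = train_bigram(text)
model_bits = code_length_bits(text, counts, alphabet)
uniform_bits = len(text) * math.log2(len(alphabet))
print(f"uniform code: {uniform_bits:.0f} bits")
print(f"bigram model: {model_bits:.0f} bits")
```

The better the model predicts the next character, the fewer bits it needs,
so learning to compress and learning the structure of the text are two
views of the same task.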

<img src="https://mlatcl.github.io/advds/./slides/diagrams//data-science/new-flow-of-information004.svg" class="" width="70%" style="vertical-align:middle;">

Figure: <i>The trinity of human, data, and computer highlights the
modern phenomenon. The communication channel between computer and data
now has an extremely high bandwidth. The channels between human and
computer and between data and human are narrow. There is a new direction
of information flow: information is reaching us mediated by the
computer. The focus on classical statistics reflected the importance of
the direct communication between human and data. The modern challenges
of data science emerge when that relationship is being mediated by the
machine.</i>

<img src="https://mlatcl.github.io/advds/./slides/diagrams//data-science/new-flow-of-information-ham.svg" class="" width="70%" style="vertical-align:middle;">

Figure: <i>The human analogue machine (HAM) now sits between us and the
traditional digital computer.</i>

## Networked Interactions

Our modern society intertwines the machine with human interactions. The
key question is who has control over these interfaces between humans and
machines.

<img src="https://mlatcl.github.io/advds/./slides/diagrams//ai/human-computers-interacting.svg" class="" width="80%" style="vertical-align:middle;">

Figure: <i>Humans and computers interacting should be a major focus of
our research and engineering efforts.</i>

## Conclusions

In today’s lecture we’ve drilled down further on a difficult aspect of
data science. By focusing too much on the data and the technical
challenges we face, we can forget the context. But to do data science
well, we must not forget the context of the data. We need to pay
attention to domain experts and introduce their understanding to our
analysis. Above all we must not forget that data is almost always (in
the end) about people.

## Thanks!

For more information on these subjects and more you might want to check
the following resources.

-   twitter: [@lawrennd](https://twitter.com/lawrennd)
-   podcast: [The Talking Machines](http://thetalkingmachines.com)
-   newspaper: [Guardian Profile
    Page](http://www.theguardian.com/profile/neil-lawrence)
-   blog:
    [http://inverseprobability.com](http://inverseprobability.com/blog.html)

## References

Huang, W., Xia, F., Xiao, T., Chan, H., Liang, J., Florence, P., Zeng,
A., Tompson, J., Mordatch, I., Chebotar, Y., Sermanet, P., Jackson, T.,
Brown, N., Luu, L., Levine, S., Hausman, K., Ichter, B., 2023. [Inner
monologue: Embodied reasoning through planning with language
models](https://proceedings.mlr.press/v205/huang23c.html), in: Liu, K.,
Kulic, D., Ichnowski, J. (Eds.), Proceedings of the 6th Conference on
Robot Learning, Proceedings of Machine Learning Research. PMLR, pp.
1769–1782.

Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. ImageNet
classification with deep convolutional neural networks, in: Advances in
Neural Information Processing Systems 25. pp. 1097–1105.

Taigman, Y., Yang, M., Ranzato, M., Wolf, L., 2014. DeepFace: Closing
the gap to human-level performance in face verification, in: Proceedings
of the IEEE Computer Society Conference on Computer Vision and Pattern
Recognition. <https://doi.org/10.1109/CVPR.2014.220>

The Admiralty, 1945. [The gunnery pocket book, b.r.
224/45](https://www.maritime.org/doc/br224/).

Thompson, W.C., 1989. [Are juries competent to evaluate statistical
evidence?](http://www.jstor.org/stable/1191906) Law and Contemporary
Problems 52, 9–41.

Wiener, N., 1948. Cybernetics: Control and communication in the animal
and the machine. MIT Press, Cambridge, MA.