# 06.00 Machine Learning

Until now we have seen many ways to describe data
(including plotting, descriptive statistics and dimensionality),
which allowed us to gain some insight into the processes that created that data.
We can also argue that we gained insight into the process of measuring this data.
The analysis and interpretation of data is a branch of statistics,
therefore we can classify what we have been doing until now as an exercise in statistics.

From now on, we will look at techniques that are less concerned with
understanding the data and more concerned with the practical use of the information
contained within that data.  We enter the realm of Machine Learning (ML).
Some would name it Artificial Intelligence (AI), and often no distinction is drawn
between what is machine learning and what is artificial intelligence.
We will take the practical approach and not be pedantic about our naming conventions.
For our purposes we will explore the algorithms and techniques in front of us
and not worry about distinctions between machine learning and artificial intelligence.

![Machine](skl-terminator.svg)

<div style="text-align:right;"><sup>skl-terminator.svg</sup></div>

Machine learning can be understood as a branch of statistics and
is classified as such by some.
On the other hand, several people argue that machine learning
is a different area that overlaps with statistics.
As with the overlap with artificial intelligence,
we will not care about the semantics and we will simply say that
machine learning is a collection of techniques that extract and
use information contained in the data to **predict the behavior**
of similar data.

Note that this is different from the general goal of statistics.

- Statistics' goal is to interpret the resulting model of the data
  and from there understand the inherent process that creates the data.
  Whether we can construct a similar process and create new data
  in similar fashion is not a requirement.

- Machine learning's goal is to construct a model that will predict the
  behavior of new inputs just as the inherent process would,
  without necessarily performing or understanding the inherent process
  that creates the data.

Despite the goals outlined above, it is viable to, and we often do,
perform statistics on the products of machine learning.
For example, after running thousands of models we perform statistics
to try to understand relations in the *hyperparameters* of an ML model.
We will come back to what these *hyperparameters* are shortly.

## Forms of Machine Learning

Although vastly outdated, the common classification of machine learning techniques
into groups follows.
The terminology that we often see in ML consists of a *problem* - which is
our data from which we want to build a model; a *model* - which is a trained
machine learning algorithm that behaves in a manner that mirrors a real
process that turns the problem into a solution; and a *solution*
or *answer* - which is the result that the mirrored real process builds from the problem.
For example, in a bakery the problem is flour, salt, eggs and water,
the model is the baking process and the solution is bread.

In **Supervised Learning** one has some answers to the problem
and plans to automate the solution to this problem.
The algorithms will attempt to find how data inputs map to the solutions.
Afterwards, the resulting models will be able to give a solution
to data never seen before.
We often subdivide supervised learning further into:

- *Classification*, where one predicts crisp classes.
  In other words one identifies, for example,
  ships from among cars.
  But there is no middle ground: an amphibious vehicle
  will be identified either as a car or as a ship.
  Classification is the most common problem in the world around us,
  e.g. is the figure I'm walking towards a person or a lamp post?
  Are they moving away from me or towards me?
  These are problems we humans face thousands of times during a day.
  Hence classification is also the most common ML implementation
  out there.

- *Regression*, where the answers are ordered numbers.
  The difference from classification is that the answer
  from a regression algorithm is a value anywhere within
  a reasonable range.
  We can have $27$ as the answer to a problem, and we can also have $42$,
  or any value in between, such as $32.64$.
  Regression is often used for ranking lists of items.
  Yes, your web search results and your recommendation lists on shopping
  websites are ordered according to regressions.
  Another common use for regression algorithms is in the area
  of physical control of matter.
  Since matter has a continuous nature in the world we see,
  managing water levels, wind speeds, temperature, or tremor intensity
  is often done against regression predictions.
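
The difference between the two kinds of answer can be sketched in a few lines of plain Python.
This is only a toy illustration under stated assumptions:
the vehicle measurements and the numbers on the line are made up,
and the nearest-neighbour classifier and the least-squares line
are chosen here for brevity, not because they are the techniques we will study.

```python
import math

# Made-up training data: (length in m, weight in t) -> known answer.
vehicles = [
    ((4.2, 1.3), "car"),
    ((4.8, 1.9), "car"),
    ((3.9, 1.1), "car"),
    ((18.0, 40.0), "ship"),
    ((30.0, 150.0), "ship"),
    ((55.0, 900.0), "ship"),
]

def classify(point, training):
    """Classification: return the label of the closest known example."""
    nearest = min(training, key=lambda item: math.dist(point, item[0]))
    return nearest[1]

def fit_line(xs, ys):
    """Regression: least-squares straight line y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# An amphibious vehicle (hypothetical measurements) still gets a crisp class.
label = classify((7.5, 3.0), vehicles)

# Made-up regression data lying on the line y = 2x + 1.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
prediction = slope * 2.5 + intercept  # an input never seen during fitting

print(label)       # always one of the known classes
print(prediction)  # can be any value in a continuous range
```

The classifier can only ever answer with a label it has seen,
whilst the regression answers with a number that need not appear anywhere in the training data.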

In **Unsupervised Learning** we do not have solutions to the problem
we are attempting to solve.
But we are going to try to solve it anyway.
Moreover, if we identify patterns that may turn out to be useful,
then we can use these patterns to find identifiers for new data.
Even if we do not know what these identifiers are
or what they may mean in the real world,
they tell us how the new data relates to the data we already have.
We subdivide this search for patterns into:

- *Dimensionality Reduction*, which isn't a collection of techniques
  for pattern search itself but one closely related to it.
  Most patterns we search using ML are not easy to see,
  otherwise we would have seen them already.
  Patterns in high-dimensional spaces are particularly difficult
  for us humans to visualize.
  Dimensionality reduction techniques attempt to reduce a
  high-dimensional dataset to a manageable number of dimensions
  without losing the patterns within.
  This may be achieved by projecting the data onto fewer dimensions,
  from where the group of techniques gets its name.
  But many other techniques exist that preserve the distances between
  data points instead of using general distance projections,
  or define close and far away points probabilistically and
  place points in a new projection according to how
  far or close they are from each other.

- *Clustering* is the ML group of techniques to find patterns in data
  about which we can tell little at first sight.
  We can find groupings of data points that are similar to each other,
  despite the fact that we may not know the real reasons why these
  groupings are close or far apart.
  We can then use the groupings we discover to determine to which of
  the groupings new data points belong.
  The classic example of clustering is social networks.
  We can collect large numbers of features (dimensions)
  about individuals (data points), and then cluster them together.
  The result will be circles of friends among these individuals,
  yet we have little understanding of why exactly these specific
  groups of friends form, as opposed to other completely different
  circles of friends between the same people.
  Interpretation of clustering results is often hard.
  Do not be fooled into thinking that a clustering technique magically presents
  you with the best groupings if you run it enough times.
  For example, based on a clustering feature count alone,
  arguing that playing golf is more likely to win you friends
  than playing squash is *not* knowledge.
  It is just a statistical artifact that some scientists
  publish as click bait and then consider "science".
A single technique does not necessarily fit one specific bullet point,
e.g. SVMs can be used for classification or regression,
and Neural Networks can be used for any of the points above.
We will use this grouping of ML techniques as we explore some algorithms.
Later we will come back and add several new groupings,
needed due to the fact that some ML techniques do not fit
into any of the above.

Also note that the bakery analogy can be extended further to the terminology
used in machine learning.
A problem does not necessarily have only one solution, for example,
from eggs, flour and salt one can make bread but also make cake.
Or one could attempt to be creative in the kitchen and attempt to invent
a completely new dish in the style of unsupervised learning.

## Validation

Machine learning is often more practical than a statistical approach.
And since less rigor is required to achieve its goals,
powerful techniques exist in ML that can find solutions to very hard problems.
On the other hand, this premise does not come without its drawbacks.
Since there is no full rigor in how many of the most powerful ML techniques operate,
it is difficult to know whether the model one builds is a good model or not.
Moreover, several ML techniques have free parameters - that are often referred to
as *hyperparameters* - for which one needs to find appropriate values for
the specific problem.
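
To make the idea of a free parameter concrete, here is a small sketch in plain Python.
It assumes a k-nearest-neighbours classifier, chosen purely for illustration,
where the number of neighbours `k` is the hyperparameter.
The data points and the query value are made up.

```python
from collections import Counter

# Made-up 1-D training data: (value, known label).
training = [(1.0, "a"), (2.0, "a"), (3.0, "a"),
            (4.2, "b"), (9.0, "b"), (10.0, "b")]

def knn_predict(query, training, k):
    """Majority vote among the k training points closest to the query."""
    neighbours = sorted(training, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Same data, same query, different value of the hyperparameter k.
predictions = {k: knn_predict(4.0, training, k) for k in (1, 3)}
print(predictions)
```

The data did not change between the two runs, only `k` did, and yet the answers differ.
Nothing in the algorithm itself tells us which value of `k` is the appropriate one.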

Since we do not have mathematical rigor on the selection of these hyperparameters,
we need some form of checking whether our model is (more-or-less) right:
a *validation* of the model.
But that was a lot of theory and vocabulary.
We should first try things out and then come back to the tuning of these hyperparameters.
First we will see how to build a model in Python,
then how those so-called hyperparameters work,
and then we will see if we can think of a way of evaluating our models.