 # <b>1 <span style='color:#F76241'>|</span> What is machine learning?</b>

<font size="9">M</font>achine learning (or `ML` for short) is the art of making programs that can *learn automatically from data*. 

What does `learn` mean in this context, you may be asking? 

> The "learn" in machine learning means a program can **teach itself** how to make decisions by analyzing large amounts of data and looking for patterns. Machine learning systems are **never given explicit rules** for making decisions. Instead, they infer those rules from the examples they see.

The process of learning from data is called `training`.
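The following toy example contrasts a rule that is programmed explicitly with a rule that is learned from data. Everything in it is hypothetical. It imagines a spam filter that looks only at the number of links in an email, and a human picks a cutoff of 5 for the hand-written rule. For the learned rule, "training" tries every observed value as a cutoff and keeps the one that makes the fewest mistakes. This is a one-feature threshold, sometimes called a decision stump.

```python
# Hypothetical labeled examples: (number of links in an email, is it spam?).
examples = [(0, False), (1, False), (2, False), (3, True), (7, True), (9, True)]


def hand_written_rule(num_links: int) -> bool:
    """Explicit programming: a human chose the cutoff of 5 links."""
    return num_links > 5


def learn_threshold(data: list[tuple[int, bool]]) -> int:
    """Training: try each observed value as a cutoff and keep the one with the fewest mistakes."""

    def mistakes(cutoff: int) -> int:
        # Count examples where "more links than the cutoff" disagrees with the label.
        return sum((x > cutoff) != is_spam for x, is_spam in data)

    return min(sorted({x for x, _ in data}), key=mistakes)


cutoff = learn_threshold(examples)
print("learned cutoff:", cutoff)  # learned cutoff: 2
print("hand-written rule mistakes:", sum(hand_written_rule(x) != s for x, s in examples))  # 1
```

Nobody told `learn_threshold` that 2 is the right cutoff. It found that value by checking the candidates against the labeled data.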

When talking about `ML` systems, people often use the term `pipeline`, which I define as follows:

> **A sequence of steps that an ML system requires in order to function**. 

Wow, now that's an abstract, hard-to-visualize definition. Here is a visualization that shows it instead:

<img src="assets/images/pipeline.jpg"  width="600" height="200">
<font size="1"> (image <a href="https://valohai.com/machine-learning-pipeline/">credits</a>) </font>

Each step in the `pipeline` performs an **important function** that the machine learning system needs in order to operate. The name "pipeline" fits because you can imagine data as water flowing through a series of connected pipes, each of which, in this case, does a different job:

<img src="assets/images/real_pipeline.jpg"  width="600" height="200">
<font size="1"> (image <a href="https://www.apollotechnical.com/what-is-pipeline-management-why-it-matters/">credits</a>) </font>
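The water analogy translates directly into code. One way to sketch a pipeline is as an ordered list of functions, where each function's output becomes the next function's input. This is a minimal sketch under that assumption. The step functions `clean_rows` and `drop_duplicates` and the sample data are hypothetical stand-ins, not the steps shown in the images above.

```python
from typing import Any, Callable

# A pipeline step is any function that takes data in and hands data out.
Step = Callable[[Any], Any]


def run_pipeline(data: Any, steps: list[Step]) -> Any:
    """Send data through each step in order; one step's output is the next step's input."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical steps, for illustration only.
def clean_rows(rows: list[str]) -> list[str]:
    """Trim whitespace, lowercase, and drop empty rows."""
    return [row.strip().lower() for row in rows if row.strip()]


def drop_duplicates(rows: list[str]) -> list[str]:
    """Remove repeated rows while keeping their original order."""
    return list(dict.fromkeys(rows))


raw = ["  Circle", "square ", "", "circle", "Triangle"]
print(run_pipeline(raw, [clean_rows, drop_duplicates]))  # ['circle', 'square', 'triangle']
```

Because each step only needs to agree with its neighbors on the shape of the data, you can add or remove steps without touching the rest. Libraries offer the same idea in more complete forms. For example, scikit-learn provides a `sklearn.pipeline.Pipeline` class that chains processing steps with a final model.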


<div class="alert alert-block alert-info"><b>Note:</b> The pipeline visualizations above show only the <em>beginning</em> of a pipeline, because later steps involve more complex tasks like model deployment and maintenance. These notebooks <b>only</b> cover <b>steps 1-4</b>. Also, the beginning of a pipeline may sometimes have <b>more</b> or <b>fewer</b> steps, depending on what you're working on, but most pipelines follow the structure above. </div>


Before diving into step 1 of the pipeline, there's some more machine learning background I need to cover. I *promise* we'll get to the fun predictive modeling soon. Trust me!

# <b>2 <span style='color:#F76241'>|</span> Types of machine learning systems</b>

Not all machine learning models are the same. The following two subsections define what **`supervised`** and **`unsupervised`** learning are and how they differ from each other.
 

<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px;">
    <p style="padding: 8px;color:white;text-align: center;">2.1.<em> Supervised</em></p>
</div>

**`Supervised learning`** is a learning technique where you provide an ML model with two things:

- `Data points` (**X**) - the information you want the model to learn from
- `Labels` (**y**) - the correct output (the "answer") attached to each instance of **X**.

The data points and labels are represented as an (**X**, **y**) pair.
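In code, **X** and **y** are often kept as two parallel sequences, and pairing them up gives the (**X**, **y**) pairs. This is a minimal sketch with a hypothetical toy dataset. Each data point is `[hours studied, hours slept]`, and each label records whether that student passed an exam. The numbers are made up for illustration.

```python
# Hypothetical toy dataset: each data point is [hours studied, hours slept],
# and each label records whether that student passed an exam.
X = [[2.0, 9.0], [6.0, 7.0], [1.0, 4.0], [8.0, 8.0]]
y = ["fail", "pass", "fail", "pass"]

# Every data point must have exactly one label.
assert len(X) == len(y)

# zip lines each data point up with its label, giving the (X, y) pairs.
pairs = list(zip(X, y))
for features, label in pairs:
    print(features, "->", label)
```

The first line printed is `[2.0, 9.0] -> fail`. The uppercase **X** and lowercase **y** follow a common convention: **X** usually holds several values per data point, while **y** holds one label per data point.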

The **`ultimate goal`** of training a supervised ML system is for it to predict the **correct labels**, including for data it has never seen before. We work toward this by showing the model many instances of data in the hope that it learns the patterns we want it to learn. Each time the model makes a prediction, we **compare** it to the **correct label** to see how well it did. This comparison against known answers is what the `supervised` part refers to.
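One simple way to score this comparison is **accuracy**, which is the fraction of predictions that match the correct labels. This is a minimal sketch. The `accuracy` helper and the sample labels are hypothetical. Libraries ship ready-made versions of this metric, for example scikit-learn's `sklearn.metrics.accuracy_score`. During training, many models optimize a smoother loss function instead, but accuracy is the easiest measure to read.

```python
def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of predictions that exactly match the correct label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need the same, non-zero number of predictions and labels")
    # True counts as 1 and False as 0, so this sums the number of correct predictions.
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical model output compared against the correct labels.
y_true = ["pass", "fail", "pass", "fail"]
y_pred = ["pass", "pass", "pass", "fail"]
print(accuracy(y_pred, y_true))  # 0.75
```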

<div class="alert alert-block alert-info"><b>Note:</b> There are numerous ways to refer to <b>y</b> such as: <b>labels, targets, gold labels, classes, truths,</b> and possibly more. They all refer to the same idea: the correct labels that the model learns how to predict. I will use the terms interchangeably.</div>

<img style="float: left; padding: 0px 10px 0px 0px;" src="assets/images/shapes.jpg"  width="200" height="50">

Let's look at a very simple example. Say we want to train a model to recognize shapes. The training data would consist of (**X**, **y**) pairs: **X** holds the shapes we want the model to recognize, and **y** holds the correct label for each shape instance.

In this example, our **true labels** are _circle_, _square_, and _triangle_. These are the labels our model needs to learn to predict correctly. To accomplish this, imagine we have 10,000 of these (**shape**, **label**) pairs. The training process might look something like this:

```
(🟥, square) -> Model predicts "circle" -> Check if "circle" == square -> Incorrect
(🟥, square) -> Model predicts "triangle" -> Check if "triangle" == square -> Incorrect
(🟥, square) -> Model predicts "square" -> Check if "square" == square -> Correct
(🔴, circle) -> Model predicts "circle" -> Check if "circle" == circle -> Correct

```

This process is repeated for **every instance in our training data**. Over time, as the model sees more examples, it should **gradually get better at predicting**. Thankfully, we don't need to do this by hand, because machine learning libraries handle it for us. But the intuition is important.
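The trace above can be turned into runnable code with the multiclass perceptron, one classic learner that follows exactly this predict, compare, update loop. It is only an illustration, and many models train differently. This sketch makes several assumptions. Each shape is reduced to a single hypothetical feature (its number of corners) plus a constant bias term. The names `CORNERS`, `features`, `predict`, and `train` are made up for this example.

```python
# Hypothetical feature: each shape is described only by its number of corners.
# (Real models usually learn from raw pixels; this keeps the example tiny.)
CORNERS = {"circle": 0, "square": 4, "triangle": 3}
LABELS = list(CORNERS)


def features(shape: str) -> list[float]:
    """Feature vector for a shape: [number of corners, constant bias term]."""
    return [float(CORNERS[shape]), 1.0]


def score(weights: list[float], x: list[float]) -> float:
    """Dot product between one label's weights and a feature vector."""
    return sum(w * xi for w, xi in zip(weights, x))


def predict(weights: dict[str, list[float]], x: list[float]) -> str:
    """Return the label whose weights give x the highest score."""
    return max(LABELS, key=lambda label: score(weights[label], x))


def train(pairs: list[tuple[list[float], str]], max_epochs: int = 10_000) -> dict[str, list[float]]:
    """Multiclass perceptron: predict, compare with the label, nudge the weights when wrong."""
    weights = {label: [0.0, 0.0] for label in LABELS}
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in pairs:
            guess = predict(weights, x)  # "Model predicts ..."
            if guess != label:  # "Check if guess == label -> Incorrect"
                mistakes += 1
                # Pull the correct label's weights toward x, push the wrong guess's away.
                weights[label] = [w + xi for w, xi in zip(weights[label], x)]
                weights[guess] = [w - xi for w, xi in zip(weights[guess], x)]
        if mistakes == 0:  # a full pass with no mistakes: stop training
            break
    return weights


training_data = [(features(s), s) for s in ["square", "circle", "triangle"]] * 10
weights = train(training_data)
for shape in LABELS:
    print(shape, "->", predict(weights, features(shape)))
```

The loop stops after the first full pass through the data with no mistakes. For these three shapes, a weight setting exists that gets every example right, so the perceptron is guaranteed to reach such a pass, and each shape then prints next to its own name. If no such weights existed, the loop would simply run until `max_epochs`.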


<div class="alert alert-block alert-success">You can think of this as <em>similar</em> to how humans learn: the more we practice something, the better we get at it. It's (roughly) the same concept, although humans need <em>significantly</em> less "training data" to learn things.</div>

If this all seems too abstract at the moment, don't fret! When we start constructing machine learning models and actually applying these concepts, it will be easier to visualize.





<div style="color:white;display:fill;border-radius:8px;
            background-color:#323232;font-size:150%;
            font-family:Nexa;letter-spacing:0.5px;">
    <p style="padding: 8px;color:white;text-align: center;">2.2.<em> Unsupervised</em></p>
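</div>

**`Unsupervised learning`** is a learning technique where you provide an ML model with **only data points** (**X**) and **no labels** (**y**). Because there are no correct answers to compare against, the model has to find patterns or structure in the data on its own. A common example is `clustering`, which means grouping similar data points together.

Going back to our shapes example, an unsupervised model would receive the shapes without the _circle_, _square_, and _triangle_ labels. It could still learn to group similar-looking shapes together, but it would have no way of knowing what each group is called.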
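Clustering is a classic unsupervised technique, and k-means (Lloyd's algorithm) is one well-known way to do it. Below is a minimal plain-Python sketch of k-means. It makes several illustrative assumptions:

- The data points are 2-D.
- The first `k` points are used as the starting cluster centres. This is a naive choice, and real libraries use smarter initialization.
- The loop runs for a fixed number of iterations.
- The `kmeans` function and the sample points are hypothetical.

Notice that only data points go in. There are no labels.

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iterations: int = 10) -> tuple[list[int], list[Point]]:
    """Group unlabeled points into k clusters with plain k-means."""
    # Naive initialization: use the first k points as the starting cluster centres.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centre.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        # Update step: move each centre to the mean of the points assigned to it.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centre if a cluster ends up empty
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5), (1.2, 0.8), (8.5, 9.0)]
labels, centres = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

The output consists only of cluster numbers (0 and 1), not names. Deciding what each cluster *means* is still up to us.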


<div class="alert alert-block alert-info"><b>Note:</b> There are more types of learning, but they are more advanced and build on the two described above. Here are some of them: <a href="https://ai.stackexchange.com/questions/10623/what-is-self-supervised-learning-in-machine-learning" target="_blank">self-supervision</a>, <a href="https://www.altexsoft.com/blog/semi-supervised-learning/" target="_blank">semi-supervision</a>, <a href="https://www.snorkel.org/blog/weak-supervision" target="_blank">weak supervision</a>, and <a href="https://www.synopsys.com/ai/what-is-reinforcement-learning.html" target="_blank">reinforcement learning</a>. They're not needed for these notebooks, but feel free to explore the links.</div>

 # <b>3 <span style='color:#F76241'>|</span> Train, dev, test</b>