# Machine Learning / AI Foundations



## What ML/AI is, and what it is not!

### Typical misconception

<br>
<figure align="center">
  <img src = "./images/ai01.jpg" width = 40%>
      <figcaption style = "text-align: center; font-style: italic">Fig 1. Scooby-Doo.</figcaption>
</figure>

<br>
<figure align="center">
  <img src = "./images/ai01_01.png" width = 40%>
      <figcaption style = "text-align: center; font-style: italic">Fig 2. Scooby-Doo.</figcaption>
</figure>

### Most probable

<br>
<figure align="center">
  <img src = "./images/ai02.png" width = 40%>
      <figcaption style = "text-align: center; font-style: italic">Fig 3. ML Enthusiast.</figcaption>
</figure>

<br>
<figure align="center">
  <img src = "./images/ai03.png" width = 40%>
      <figcaption style = "text-align: center; font-style: italic">Fig 4. Challenges.</figcaption>
</figure>

### The real thing

- Understanding data
  - Do I have the right data?
  - What is missing?
  - Is this data enough?
- Knowing what **building blocks** can help to solve my ML/AI problems (i.e. Models)
  - How much of Statistics, Mathematics, Probability do I need?
  - How many models can help? Only one? 
  - Has anyone dealt with this problem in the past?
- Other words for **models**
  - Equations
  - Formulas
  - Blackbox



## Machine Learning Techniques

Regardless of the technique, you always need data.

<br>
<figure align="center">
  <img src = "./images/ml_techniques.png" width = 90%>
      <figcaption style = "text-align: center; font-style: italic">Fig 5. ML Techniques.</figcaption>
</figure>

### Supervised Learning

- You have data where you know the output
- You want to predict the output for unseen data
- The output could be one of the following:
  - **Number**: Problems of this kind are **Regression** problems.
  - **Label** or **class**: A category assigned to the data. Problems of this kind are **Classification** problems.
  

#### Regression

- Input data is typically numeric
- The output is a prediction based on the input data
- The output is also a number
- The model is a curve (i.e. an equation) that is intended to represent the data.

**Algorithms**

- Linear Regression
- Support Vector Regression
- Autoregressive Integrated Moving Average, ARIMA
- ...


Examples: 

- Time series
- Housing prices


<br>
<figure align="center">
  <img src = "./images/regression01.png" width = 70%>
      <figcaption style = "text-align: center; font-style: italic">Fig 6. Regression.</figcaption>
</figure>
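
As a minimal sketch of the idea that a regression model is a curve fitted to the data, the following fits a straight line by ordinary least squares using only the Python standard library. The function names (`fit_line`, `predict`) and the house-size data are made up for illustration; they do not come from a real dataset.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line (the model) at a new input."""
    return slope * x + intercept


# Hypothetical data: house size in m^2 -> price in thousands.
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(sizes, prices)
print(predict(100, slope, intercept))  # prediction for an unseen house size
```

The inputs and the output are both numbers, and the "model" is nothing more than the two parameters of the line.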

#### Classification

- Input data can be numerical and categorical
- Output data must be a label
- The model is also a curve, one that separates one label/class from the others

**Algorithms**

- Neural Networks
- Decision Trees
- K-Nearest Neighbors, KNN
- Support Vector Machines, SVM
- ...

Examples:

- Optical Character Recognition, OCR
- Disease identification (e.g. cancer)
- Image recognition
- Default and risk analysis (e.g. fraud detection)

<br>
<figure align="center">
  <img src = "./images/iris01.png" width = 70%>
      <figcaption style = "text-align: center; font-style: italic">Fig 7. Classification.</figcaption>
</figure>
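
To make the classification setting concrete, here is a small sketch of K-Nearest Neighbors written with the standard library only. The measurements and labels are invented, iris-like values chosen for illustration, and `knn_predict` is a hypothetical helper name, not a library function.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical (petal length, petal width) measurements with known labels.
points = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.3), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
labels = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]

print(knn_predict(points, labels, (1.6, 0.25)))
print(knn_predict(points, labels, (4.6, 1.4)))
```

Unlike regression, the output here is a label rather than a number.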

A comparison of the two can be seen in the following figure.

<br>
<figure align="center">
  <img src = "./images/supervised01.png" width = 70%>
      <figcaption style = "text-align: center; font-style: italic">Fig 8. Comparison.</figcaption>
</figure>






### Unsupervised Learning

- It is about understanding data by trying to identify the underlying structure
  - Correlations
  - Clustering
  - Visualization
  - Descriptive statistics
- There are no labels and nothing to predict

**Algorithms**

- Clustering algorithms such as K-Means
- Principal Component Analysis, PCA
- ...


<figure align="center">
  <img src = "images/clustering01.png" width = 70%>
      <figcaption style = "text-align: center; font-style: italic">Fig 9. Clustering.</figcaption>
</figure>

<figure align="center">
  <img src = "images/clustering02.png" width = 70%>
      <figcaption style = "text-align: center; font-style: italic">Fig 10. Clustering.</figcaption>
</figure>
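
The clustering idea can be sketched with a bare-bones K-Means loop. This is an illustrative version under simplifying assumptions: the starting centroids are passed in by hand (real implementations usually pick them randomly or with a seeding heuristic), the number of iterations is fixed, and the points are invented.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and moving the centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled 2-D points forming two visible groups.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(data, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

No labels are given to the algorithm; the grouping comes only from the distances between points.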


### Reinforcement Learning

- It is about learning from the **environment**
  - States in the environment can be discrete or continuous
- The models make decisions based on what they perceive: what is the next **action**?
  - Actions are predetermined (i.e. Finite)
  - Actions are taken based on a **Policy** (What is the current state and what action is the best option)
- Let's call the models **agents**
- Agents get a **reward**, but this reward sometimes comes late
  - You are doing well
  - You are doing badly
  - Good and bad are numbers

**Algorithms**

- Bellman Equation
- Deep Q-Learning
- Searches: 
  - Breadth-First Search
  - Depth-First Search
  - A*
- Game Theory 
- ...

<br>
<figure align="center">
  <img src = "./images/reinforcement01.png" width = 70%>
      <figcaption style = "text-align: center; font-style: italic">Fig 11. Reinforcement.</figcaption>
</figure>
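
The vocabulary above (environment, state, action, policy, reward) can be tied together with a small tabular Q-learning sketch. The environment is a made-up corridor of five states in which only reaching the last state gives a reward, so the reward arrives late. All names and hyperparameter values here are assumptions chosen for illustration.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Environment: return (next_state, reward) for taking `action` in `state`."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values Q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            # Epsilon-greedy policy: explore sometimes (and on ties), else exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy: the best-known action in each non-goal state.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy is expected to choose "move right" in every state, even though only the final step is ever rewarded directly.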


#### Examples

- Autonomous vehicles
- Games such as Chess, Go, Checkers, Tic-tac-toe

<br>
<figure align="center">
  <img src = "./images/noLeftTurn.jpg" width = 50%>
      <figcaption style = "text-align: center; font-style: italic">Fig 12. No Left Turn.</figcaption>
</figure>


## Required Background

"If Your Only Tool Is a Hammer Then Every Problem Looks Like a Nail"

(Almost) **ALL** Machine Learning problems are based on **OPTIMIZATION**. With this in mind, an enthusiast should embrace the following concepts:

- Programming
- Linear Algebra
- Calculus
- Statistics and Probability
- Heuristics
- ...

Optimization means:

- The best combination of resources (e.g. Minimum cost, a cost-benefit trade-off)
- Shortest path
- Better performance
- The smaller the error, the better!
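
As a rough illustration of "minimizing the error", the sketch below uses gradient descent to find the single weight `w` of the model `y = w * x` that minimizes the mean squared error. The data, learning rate, and step count are arbitrary choices for the example, not recommended settings.

```python
def mean_squared_error(w, xs, ys):
    """Average squared difference between predictions w*x and targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, w=0.0, learning_rate=0.01, steps=200):
    """Repeatedly move w a small step in the direction that reduces the error."""
    for _ in range(steps):
        w -= learning_rate * gradient(w, xs, ys)
    return w


# Hypothetical data generated from y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = gradient_descent(xs, ys)
print(w, mean_squared_error(w, xs, ys))
```

The same pattern (define an error, then adjust parameters to reduce it) underlies most of the techniques listed earlier, and it is where calculus and linear algebra come in.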
  

## Software

### Typical Programming Languages

- [Python](https://www.python.org/)
- [Julia](https://juliaacademy.com/courses)
- [R](https://www.r-project.org/)
- [Octave](https://octave.org/) / [MATLAB](https://nl.mathworks.com/products/matlab.html)

### IDEs

- [Jupyter](https://jupyter.org/)
- [RStudio](https://posit.co/products/open-source/rstudio/)

### Cloud Tools

- [Jovian](https://jovian.com/)
- [JetBrains Datalore](https://datalore.jetbrains.com/)
- [AWS SageMaker](https://aws.amazon.com/sagemaker)
- [Azure ML Studio](https://ml.azure.com/)
- [Google GCP Dataproc](https://cloud.google.com/dataproc/)
- [Google Vertex AI](https://cloud.google.com/vertex-ai)

<figure align="center">
  <img src = "./images/IA-Intro.png" width = 50%>
      <figcaption style = "text-align: center; font-style: italic">IA-Intro</figcaption>
</figure>