# Machine Learning (ML)

**"Can machines think [in the way that we do]?"** [1]


- The term "machine learning" was <font color='dodgerblue'>first used in 1959</font> by Arthur Samuel (an IBM researcher)


- Mathematical Foundation
    - <font color='dodgerblue'>Statistics</font> (the "work-horse" of ML)
    - Calculus (derivatives; optimizations)
    - Linear algebra (vectors, matrices, tensors)


- The individual components were developed by researchers over many years. Only recently have they been collected into libraries that make the ideas more accessible.

## Machine Learning Categories

1. <font color='dodgerblue'>Shallow learning</font> (e.g. **s**ci**k**it-**learn** - a.k.a. **sklearn**)
    - <font color='dodgerblue'>predefined features</font>

1. Deep learning (e.g. TensorFlow, PyTorch)
    - <font color='dodgerblue'>feature learning</font>
    - mostly <font color='dodgerblue'>combines shallow learning instances together</font> into <font color='dodgerblue'>"layers"</font>


**Sources**:
1. Turing, Alan M. "Computing machinery and intelligence." Parsing the Turing test. Springer, Dordrecht, 2009. 23-65.

**Additional Resources**:
1. https://en.wikipedia.org/wiki/Machine_learning

<hr style="border:2px solid gray"></hr>

# Shallow Learning

## Categories

| Regression | Classification | Clustering | Dimension Reduction|
| :-: | :-: | :-: | :-: |
| <font color='dodgerblue'>Linear</font> | Logistic Regression | <font color='dodgerblue'>K-means</font> | <font color='dodgerblue'>Principal Component Analysis</font> |
| <font color='dodgerblue'>Polynomial</font> | <font color='dodgerblue'>Support Vector Machine</font> | Mean-Shift | Linear Discriminant Analysis |
| Stepwise | Naive Bayes | DBSCAN | Generalized Discriminant Analysis |
| Ridge | Nearest Neighbor | Agglomerative Hierarchical | Autoencoder |
| Lasso | Decision Tree | Spectral Clustering | Non-Negative Matrix Factorization |
| ElasticNet | <font color='dodgerblue'>Random Forest</font> | Gaussian Mixture | UMAP |

## Supervised vs. Unsupervised Learning

1. **Supervised** - the **target information is known** in the data set, and we **train to reproduce** that information
    - <font color='dodgerblue'>regression</font>
    - <font color='dodgerblue'>classification</font>

1. **Unsupervised** - the **target information is unknown**, and the goal is to
    - group the data by similarity (<font color='dodgerblue'>clustering</font>)
    - determine the distribution of the data (<font color='dodgerblue'>density estimation</font>)
    - reduce the number of dimensions (<font color='dodgerblue'>dimensionality reduction</font>) for exploration and visualization
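
The contrast between the two can be sketched in plain Python with toy data. The data values and the helper names `fit_line` and `kmeans_1d` are illustrative assumptions, not part of any library. The supervised fit uses the known targets `y`, while the clustering step sees only `x`. In scikit-learn the same contrast shows up as `fit(X, y)` for supervised estimators versus `fit(X)` for unsupervised ones.

```python
def fit_line(x, y):
    """Supervised: least-squares slope and intercept from known targets y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def kmeans_1d(x, centers, n_iter=10):
    """Unsupervised: group 1-D points around centers; no targets are used."""
    centers = list(centers)
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for xi in x:
            nearest = min(range(len(centers)), key=lambda k: abs(xi - centers[k]))
            groups[nearest].append(xi)
        # Move each center to the mean of its group (keep it if the group is empty)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [2.0 * xi + 1.0 for xi in x]             # known target information

slope, intercept = fit_line(x, y)            # trained to reproduce y from x
centers = kmeans_1d(x, centers=[0.0, 5.0])   # finds structure in x alone
```

Here the fit recovers the slope and intercept used to build `y`, and the two cluster centers settle on the means of the low and high groups of points.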

<p><img alt="scikit-learn machine learning map" width="800" src="00_images/31_machine_learning/scikit_learn_ml_map.png" align="center" hspace="10px" vspace="0px"></p>

Image Source (interactive): https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

<hr style="border:2px solid gray"></hr>

# Deep Learning - Unsupervised Learning


#### Neural network
- **Input Layer**: <font color='dodgerblue'>features (observables)</font> should have some degree of correlation (i.e., structure; nonlinear relationships)
- Encoder: input $\rightarrow$ hidden layers (<font color='dodgerblue'>data reduction</font>)
- **Hidden Layer**: a <font color='dodgerblue'>compressed knowledge representation</font> of the original input
- Decoder: hidden layers $\rightarrow$ <font color='dodgerblue'>output</font>
- **Output Layer**
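
The layer structure above can be sketched as a forward pass in plain Python. This is a minimal illustration under stated assumptions: the layer sizes (4 inputs, 2 hidden nodes), the `tanh` activation, and the helper names `dense` and `random_layer` are all hypothetical choices, and the weights are random rather than trained.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer with a tanh activation."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out, rng):
    """Hypothetical helper: small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
enc_w, enc_b = random_layer(4, 2, rng)   # encoder: 4 inputs -> 2 hidden nodes
dec_w, dec_b = random_layer(2, 4, rng)   # decoder: 2 hidden nodes -> 4 outputs

x = [0.1, 0.4, -0.3, 0.8]                # input layer (features)
hidden = dense(x, enc_w, enc_b)          # compressed representation
output = dense(hidden, dec_w, dec_b)     # output layer, same size as the input
```

Training would then adjust the weights so that `output` reproduces `x` as closely as possible despite the narrower hidden layer.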

<p><img alt="neural network" width="800" src="00_images/31_machine_learning/deep_neural_network.png" align="center" hspace="10px" vspace="0px"></p>

Image Source: https://www.studytonight.com/post/understanding-deep-learning

#### Autoencoders - generative models (i.e., they <font color='dodgerblue'>create new things</font>)

Autoencoders are neural networks used in deep learning. They **encode an input** (i.e., something that is human-relatable), **transforming it into a different representation** within the latent space, and then **decode** it back into something **human-relatable**. Decoding points in the latent space other than those produced by the training inputs is what allows new things to be generated.


- https://www.jeremyjordan.me/autoencoders/
- <font color='dodgerblue'>Sparse</font> Autoencoder
    - **hidden** layers can have the **same number of nodes** as the **input** and **output** layers (no reduction in node count is required)
    - loss function includes a penalty for "activating" a node within the hidden layer
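
A minimal sketch of such a loss, assuming an L1 penalty on the hidden activations (one common choice; a KL-divergence penalty is another). The function name `sparse_loss` and the `sparsity_weight` value are illustrative assumptions.

```python
def sparse_loss(x, x_hat, hidden, sparsity_weight=0.01):
    """Mean squared reconstruction error plus an L1 penalty on hidden activations."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    penalty = sum(abs(h) for h in hidden)
    return reconstruction + sparsity_weight * penalty


x = [1.0, 0.0, 1.0]        # input
x_hat = [0.9, 0.1, 0.8]    # reconstruction from the decoder

# Same reconstruction, but different numbers of active hidden nodes
quiet = sparse_loss(x, x_hat, hidden=[0.0, 0.0, 0.7])
busy = sparse_loss(x, x_hat, hidden=[0.6, 0.5, 0.7])
```

With identical reconstructions, `busy` is larger than `quiet`, so training is pushed toward encodings that activate only a few hidden nodes.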

<br>

- <font color='dodgerblue'>Denoising</font> Autoencoder
    - slightly **corrupt** the **input data** (i.e., add noise) to help make the encoding/decoding more generalizable
    - **target data** remains **uncorrupted**
    - make the decoding (reconstruction function) insensitive to small changes in the input
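
The training pairs described above can be sketched as follows. Gaussian noise, the `noise_std` value, and the helper name `corrupt` are assumptions made for illustration; other corruption schemes (e.g., randomly zeroing inputs) are also used.

```python
import random


def corrupt(x, noise_std, rng):
    """Return a noisy copy of x; the original list is left untouched."""
    return [xi + rng.gauss(0.0, noise_std) for xi in x]


rng = random.Random(42)
clean = [[0.0, 1.0, 0.5], [1.0, 0.0, 0.5]]

# Each training pair is (corrupted input, uncorrupted target)
pairs = [(corrupt(x, 0.1, rng), x) for x in clean]
```

The network is fed the first element of each pair, while the loss compares its output to the second, so it has to learn to remove the noise rather than copy the input.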

<br>

- <font color='dodgerblue'>Contractive</font> Autoencoder
    - make the **encoding** (feature extraction function) **less sensitive** to **small changes** within the **input data**
    - learn similar encodings (hidden-layer values) for inputs that differ only slightly
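
One way to express this sensitivity is a penalty on the derivatives of the hidden values with respect to the inputs (the squared Frobenius norm of the encoder's Jacobian). The sketch below assumes a single sigmoid encoding layer, for which each derivative is $h_i (1 - h_i) W_{ij}$. The function names and example weights are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, weights, biases):
    """Single sigmoid encoding layer: h = sigmoid(W x + b)."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def contractive_penalty(x, weights, biases):
    """Sum of squared derivatives dh_i/dx_j for the sigmoid encoder above."""
    hidden = encode(x, weights, biases)
    return sum(
        (h * (1.0 - h)) ** 2 * sum(w ** 2 for w in row)
        for h, row in zip(hidden, weights)
    )


weights = [[0.5, -0.2], [0.1, 0.3]]   # example values, not trained
biases = [0.0, 0.0]
penalty = contractive_penalty([1.0, 2.0], weights, biases)
```

Adding a multiple of `penalty` to the reconstruction loss discourages encodings that change quickly when the input changes slightly.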

<br>

- <font color='dodgerblue'>Variational</font> Autoencoder (VAE)
    - https://arxiv.org/abs/1606.05908
    - trained using **backpropagation** (aka **backward propagation of error**)
        - backpropagation - https://www.ibm.com/think/topics/backpropagation
        - starting from the **output**, compute the gradient of the **loss function** (the **error** in the model's predicted values) with respect to each neural network **parameter**; the gradient measures how much each parameter contributes to that error
    - encoding is **regularized** (a penalty term is added to the model's loss function during training) so that the latent space has good properties (e.g., nearby points decode to similar outputs), which is what allows new samples to be generated
        - regularization - https://en.wikipedia.org/wiki/Regularization_(mathematics)
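
Both ideas can be sketched in a few lines. The example assumes the usual VAE setup in which the encoder outputs a mean `mu` and log-variance `log_var` per latent dimension, and the penalty is the KL divergence to a standard normal. The function names, the `beta` weight, and the one-weight encoder `mu = w * x` are illustrative assumptions, not a full VAE.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, exp(log_var)) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m ** 2 + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction error plus the KL regularization penalty."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return reconstruction + beta * kl_to_standard_normal(mu, log_var)


def backprop_kl_to_weight(x, w):
    """Chain rule for a one-weight encoder mu = w * x: dKL/dw = dKL/dmu * dmu/dw."""
    mu = w * x                 # forward pass
    dkl_dmu = mu               # gradient of the KL term at the encoder output
    dmu_dw = x                 # local gradient of the layer
    return dkl_dmu * dmu_dw    # error signal propagated back to the parameter


loss = vae_loss(x=[1.0, 0.0], x_hat=[0.8, 0.1], mu=[0.5], log_var=[0.0])
grad = backprop_kl_to_weight(x=2.0, w=0.3)
```

The penalty is zero only when `mu` is 0 and `log_var` is 0, which pulls the encodings toward a common, well-behaved region of the latent space. In practice, deep learning libraries compute these gradients automatically for every parameter.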



<!-- - Generative Adversarial Networks (GANs)
    - two networks oppose each other (a generator and a discriminator), for which both iteratively improve -->