# Machine Learning (ML; a.k.a. self-teaching computers)

**"Can machines think [in the way that we do]?"** [1]


- The term "machine learning" was first used in 1959 by Arthur Samuel (an IBM researcher)


- Mathematical Foundation
    - Statistics (the "work-horse" of ML)
    - Calculus (derivatives; optimizations)
    - Algebra (vectors, matrices, tensors)


- The individual components have been developed by researchers over a long time, but only recently have they been collected into libraries that make the ideas more accessible.
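
As a rough illustration of how the three mathematical ingredients listed above fit together, the sketch below fits a straight line by gradient descent: the mean squared error is the statistical part, its derivatives are the calculus part, and the sums of products are the algebra part. The data, the variable names, and the learning rate are all made up for this example.

```python
# Toy data generated from y = 2x + 1 (hypothetical values for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def mean_squared_error(w, b):
    """Statistics: average squared difference between prediction and target."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w, b):
    """Calculus: partial derivatives of the mean squared error w.r.t. w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


# Optimization: repeatedly step against the gradient.
w, b = 0.0, 0.0
learning_rate = 0.05  # assumed value, small enough for this data
for _ in range(2000):
    dw, db = gradient(w, b)
    w -= learning_rate * dw
    b -= learning_rate * db

print(f"w = {w:.3f}, b = {b:.3f}, error = {mean_squared_error(w, b):.2e}")
```

Libraries such as scikit-learn wrap this kind of loop behind a `fit` method, which is part of what makes the ideas more accessible.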

## Machine Learning Categories

1. Shallow learning (e.g. scikit-learn - a.k.a. sklearn)
    - predefined features

1. Deep learning (e.g. TensorFlow, PyTorch)
    - feature learning (features are learned from the data)
    - mostly combines shallow learners into stacked "layers"


**Sources**:
1. Turing, Alan M. "Computing machinery and intelligence." Parsing the Turing Test. Springer, Dordrecht, 2009. 23-65.

**Additional Resources**:
1. https://en.wikipedia.org/wiki/Machine_learning

---

# Shallow Learning

## Categories

| Regression | Classification | Clustering | Dimension Reduction |
| :-: | :-: | :-: | :-: |
| Linear | Logistic Regression | K-means | Principal Component Analysis |
| Polynomial | Support Vector Machine | Mean-Shift | Linear Discriminant Analysis |
| Stepwise | Naive Bayes | DBSCAN | Generalized Discriminant Analysis |
| Ridge | Nearest Neighbor | Agglomerative Hierarchical | Autoencoder |
| Lasso | Decision Tree | Spectral Clustering | Non-Negative Matrix Factorization |
| ElasticNet | Random Forest | Gaussian Mixture | UMAP |

## Supervised vs. Unsupervised Learning

1. **Supervised** - the **target information is known** in the data set, and we **train to reproduce** that information
    - regression
    - classification

1. **Unsupervised** - the **target information is unknown**, with the goal to
    - group the data by similarity (clustering)
    - determine the distribution of the data (density estimation)
    - reduce the number of dimensions, e.g. for the purpose of visualization (dimension reduction)
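
The difference between the two settings can be sketched with scikit-learn. The example below is only an illustration under assumed conditions: the two well-separated point clouds are synthetic, and `LogisticRegression` and `KMeans` are picked from the table above as one representative each. The supervised model is given the labels `y`; the unsupervised model sees only `X`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Synthetic data: two well-separated 2D point clouds (assumed for illustration).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
group_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])
y = np.array([0] * 50 + [1] * 50)  # known target information

# Supervised: train to reproduce the known labels.
classifier = LogisticRegression().fit(X, y)
train_accuracy = classifier.score(X, y)

# Unsupervised: no labels are passed; the algorithm groups similar points.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = kmeans.labels_

# Cluster ids are arbitrary (0/1 may be swapped), so compare up to relabeling.
agreement = max(np.mean(cluster_ids == y), np.mean(cluster_ids != y))

print("supervised training accuracy:", train_accuracy)
print("cluster/label agreement:", agreement)
```

Note that the clustering result can only be compared to `y` here because the toy data happens to come with labels; in a real unsupervised problem that check is not available.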

<p><img alt="scikit-learn machine learning map" width="800" src="00_images/31_machine_learning/scikit_learn_ml_map.png" align="center" hspace="10px" vspace="0px"></p>

Source (interactive): https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

---
# Deep Learning - Unsupervised Learning


#### Neural network
- Input Layer - features should have some degree of correlation (i.e. structure; nonlinear relationships)
- Hidden Layer ("bottleneck") - a compressed knowledge representation of the original input
- Output Layer
- Encoder - input -> hidden layers (data reduction)
- Decoder - hidden layers -> output
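
The encoder/bottleneck/decoder structure above can be sketched in a few lines of NumPy. This is a deliberately simplified, assumed setup: the network is purely linear (no activation functions), the 4-feature data is synthetic and built from 2 underlying variables so that its features are correlated, and names such as `W_enc` and `W_dec` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic input: 4 correlated features generated from 2 hidden variables.
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.0, 1.0, 0.5],
                   [0.0, 1.0, -1.0, 0.5]])
X = latent @ mixing  # shape (200, 4)

n_inputs, n_hidden = 4, 2  # the hidden layer is the 2-node "bottleneck"
W_enc = rng.normal(scale=0.1, size=(n_inputs, n_hidden))  # encoder weights
W_dec = rng.normal(scale=0.1, size=(n_hidden, n_inputs))  # decoder weights


def reconstruction_loss(X, W_enc, W_dec):
    """Mean squared difference between the input and its reconstruction."""
    return np.mean((X @ W_enc @ W_dec - X) ** 2)


initial_loss = reconstruction_loss(X, W_enc, W_dec)

learning_rate = 0.05  # assumed value
for _ in range(2000):
    code = X @ W_enc              # encoder: input -> hidden layer
    residual = code @ W_dec - X   # decoder output minus the original input
    scale = 2.0 / residual.size
    grad_dec = scale * code.T @ residual
    grad_enc = scale * X.T @ (residual @ W_dec.T)
    W_enc -= learning_rate * grad_enc
    W_dec -= learning_rate * grad_dec

final_loss = reconstruction_loss(X, W_enc, W_dec)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
print("compressed representation shape:", (X @ W_enc).shape)
```

Because the four features here are fully determined by two underlying variables, a 2-node bottleneck can reconstruct them almost perfectly; with uncorrelated features the same bottleneck would lose information.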


<!-- 2. Generative models
    - High dimensional data
    - Usually interested in generating data that is like the input data (but not the same) -->

#### Autoencoders - generative models (i.e. they create new things)
- https://www.jeremyjordan.me/autoencoders/
- Sparse Autoencoder
    - hidden layers can have the same number of nodes as the input and output layers (no reduction in nodes is required)
    - loss function includes a penalty for "activating" a node within the hidden layer

- Denoising Autoencoder
    - slightly corrupt the input data (i.e. add noise) to help make the encoding/decoding more generalizable
    - target data remains uncorrupted
    - make the decoding (reconstruction function) insensitive to small changes in the input
- Contractive Autoencoder
    - make the encoding (feature extraction function) less sensitive to small changes within the input data
    - learn similar encodings (hidden layer) for different inputs that vary slightly

- Variational Autoencoder (VAE)
    - https://arxiv.org/abs/1606.05908
    - trained using backpropagation
    - the encoding is regularized (during the learning process) to ensure that the latent space has good properties, which in turn allows generative models to be created

- Generative Adversarial Networks (GANs)
    - two networks (a generator and a discriminator) oppose each other, and both improve iteratively
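
Several of the variants above differ mainly in what is added to, or changed around, the plain reconstruction loss. The NumPy sketch below shows three of those ingredients in isolation (sparse, denoising, and variational); contractive autoencoders and GANs are not covered. The function names, the choice of an L1 penalty for sparsity, and the default weights are assumptions made for illustration, not a fixed recipe.

```python
import numpy as np

rng = np.random.default_rng(0)


def reconstruction_error(target, reconstruction):
    """Shared by all variants: mean squared error against the target."""
    return np.mean((reconstruction - target) ** 2)


def sparsity_penalty(hidden_activations, weight=1e-3):
    """Sparse autoencoder: penalize active hidden nodes (L1 is one option)."""
    return weight * np.sum(np.abs(hidden_activations))


def corrupt(inputs, noise_scale=0.1):
    """Denoising autoencoder: return a noisy copy to feed into the encoder.

    The original `inputs` array is left untouched and stays the target.
    """
    return inputs + rng.normal(scale=noise_scale, size=inputs.shape)


def kl_to_standard_normal(mean, log_var):
    """VAE regularizer: KL divergence from N(mean, exp(log_var)) to N(0, I)."""
    return 0.5 * np.sum(np.exp(log_var) + mean ** 2 - 1.0 - log_var)


# Denoising setup: noisy input, clean target.
clean = np.ones((3, 4))
noisy = corrupt(clean)
print("error between noisy input and clean target:",
      reconstruction_error(clean, noisy))

# A latent code that already matches N(0, I) incurs no VAE penalty.
print("KL at mean=0, log_var=0:", kl_to_standard_normal(np.zeros(2), np.zeros(2)))
```

In a full training loop each penalty would be added to the reconstruction error before backpropagation, so the network trades reconstruction quality against the extra constraint.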