### What is Machine Learning?

Machine learning (ML) is a subfield of artificial intelligence (AI) that uses specialized algorithms to uncover meaningful information and hidden patterns in observed data, in order to support rational decision-making.

![AI-ML-DS.png](attachment:e84b7f0c-61fb-40eb-a24d-eb6eb51813a6.png)

### Supervised, Unsupervised and Reinforcement Learning

Most ML algorithms fall into one of 3 categories:

1. Supervised Learning. This is the most common type of ML. The model is trained on a labeled dataset and learns to predict those labels, which can be categories (a classification problem) or real values (a regression problem).

    Each data point is called a feature vector and its entries are called features. The input to a supervised model is a feature vector, and the output of the model is called a "label" or a "target". After training, we want the model to predict labels from the features of previously unseen data. Examples of such models include Linear Regression, Logistic Regression, Decision Trees, Support Vector Machines for classification, some Neural Networks, etc.

2.  Unsupervised Learning. The model is trained on an unlabeled dataset, so it tries to find patterns in the data itself. This often involves clustering and dimensionality reduction. Examples include PCA, K-means clustering, t-SNE, etc.

3.  Reinforcement Learning. The model is trained by interacting with its environment. It "sees" the state of the environment as a feature vector, performs actions in it, is rewarded or penalized for those actions, and tries to maximize the reward. Such models work well when the data is sequential and the model needs to learn a sequence of actions. Applications include game-playing AI, robotics, tuning language models like ChatGPT, etc.
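
To make the difference between the first two categories concrete, here is a minimal NumPy sketch on made-up data. The supervised part uses a nearest-centroid classifier, which is not one of the models listed above and was chosen only because it fits in a few lines; the unsupervised part is a bare-bones k-means loop. The blob positions, the helper `nearest_centroid`, and the initialisation are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated 2-D blobs; each row of X is a feature vector.
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[4.0, 4.0], scale=0.5, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)  # labels, used only by the supervised model


def nearest_centroid(points, centroids):
    """Index of the closest centroid for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Supervised: the labels tell us which points belong to which class.
class_centroids = np.array([X[y == k].mean(axis=0) for k in (0, 1)])
new_points = np.array([[0.2, -0.1], [3.8, 4.3]])  # previously unseen data
predicted_labels = nearest_centroid(new_points, class_centroids)

# Unsupervised: k-means sees only X and has to discover the two groups.
centroids = X[[0, -1]].copy()  # simple initialisation: first and last point
for _ in range(10):
    assignment = nearest_centroid(X, centroids)
    centroids = np.array([X[assignment == k].mean(axis=0) for k in (0, 1)])

print("supervised predictions:", predicted_labels)
print("k-means cluster sizes:", np.bincount(assignment, minlength=2))
```

Both parts work on the same feature vectors; the only difference is whether `y` is available during training.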



### The main workflow of ML is as follows:

1. Data collection, which includes organizing data in a suitable format.
2. Data pre-processing, which includes cleaning the data, normalizing it, and reducing its dimensionality. The last step is splitting the data into training and testing datasets (the training dataset may be split further into training and validation datasets).
3. Choosing and training the model. The choice depends on the type of data, the questions we want to answer, the amount of data we have, etc. Training also includes hyperparameter tuning, usually with cross-validation. Training relies on a "loss function" that measures how badly the model is performing; the goal is to minimize it.
4. Assessing model performance on the testing dataset.
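
The four steps above can be sketched end to end in NumPy. Everything specific here is an assumption made for illustration: the data is synthetic, the model is ridge regression solved in closed form, the split is 60/20/20, and the hyperparameter is tuned on a single validation split rather than with full cross-validation. The helpers `prepare`, `fit_ridge` and `mse` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1. Data collection: synthetic data standing in for a real dataset.
n_samples, n_features = 300, 5
X = rng.normal(loc=10.0, scale=3.0, size=(n_samples, n_features))
true_weights = np.array([2.0, -1.0, 0.0, 0.5, 3.0])
y = X @ true_weights + rng.normal(scale=1.0, size=n_samples)

# 2. Pre-processing: shuffle, split 60/20/20, then standardise the features
#    with statistics computed on the training set only.
order = rng.permutation(n_samples)
train_idx, val_idx, test_idx = np.split(order, [180, 240])
mean = X[train_idx].mean(axis=0)
std = X[train_idx].std(axis=0)


def prepare(idx):
    """Standardised features with an added bias column, plus targets."""
    scaled = (X[idx] - mean) / std
    return np.column_stack([np.ones(len(idx)), scaled]), y[idx]


X_train, y_train = prepare(train_idx)
X_val, y_val = prepare(val_idx)
X_test, y_test = prepare(test_idx)


def fit_ridge(features, targets, alpha):
    """Closed-form ridge regression; alpha is the hyperparameter."""
    penalty = alpha * np.eye(features.shape[1])
    penalty[0, 0] = 0.0  # do not penalise the bias term
    return np.linalg.solve(features.T @ features + penalty, features.T @ targets)


def mse(features, targets, weights):
    """Mean squared error, used here as the loss function."""
    return float(np.mean((features @ weights - targets) ** 2))


# 3. Training and hyperparameter tuning on the validation set.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
val_errors = [mse(X_val, y_val, fit_ridge(X_train, y_train, a)) for a in candidates]
best_alpha = candidates[int(np.argmin(val_errors))]
weights = fit_ridge(X_train, y_train, best_alpha)

# 4. Final assessment on the held-out test set.
test_error = mse(X_test, y_test, weights)
print(f"best alpha: {best_alpha}, test MSE: {test_error:.3f}")
```

Note that the test set is touched only once, at the very end; using it to pick `alpha` would make the reported error too optimistic.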


### The training step in more detail

In the actual training part, we have to make a few decisions: we have to choose

1. the model,
2. its hyperparameters (parameters that do not change during training),
3. the loss function,
4. the optimizer (a tool for finding minima of the loss function).

Then we perform the forward pass: we evaluate the model and the loss function.
Then we perform the backward pass: we compute the gradients ("derivatives") of the loss function with respect to the parameters, and the optimizer uses them to work out how to change the parameters so that the loss gets smaller.
Finally, we update the parameters.

This process is repeated until we reach some stopping criterion.
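
The loop described above can be sketched for the simplest possible case: a linear model with two parameters, mean squared error as the loss, and plain gradient descent as the optimizer. The data, the learning rate, and the stopping rule (stop when the loss improves by less than `tolerance`) are assumptions chosen for illustration; in a deep learning framework the gradients would be computed automatically rather than written out by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data from y = 3x + 1 plus a little noise.
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

# Decisions made before training.
learning_rate = 0.1   # hyperparameter: fixed during training
max_epochs = 5000     # hyperparameter: upper bound on the number of updates
tolerance = 1e-10     # stopping criterion: minimal improvement of the loss

w, b = 0.0, 0.0       # model parameters: these are what training changes
previous_loss = np.inf

for epoch in range(max_epochs):
    # Forward pass: evaluate the model and the loss function.
    predictions = w * x + b
    errors = predictions - y
    loss = np.mean(errors ** 2)

    # Stopping criterion: the loss has stopped improving.
    if previous_loss - loss < tolerance:
        break
    previous_loss = loss

    # Backward pass: gradients of the loss with respect to w and b.
    grad_w = 2.0 * np.mean(errors * x)
    grad_b = 2.0 * np.mean(errors)

    # Update: plain gradient descent step.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"stopped after {epoch} epochs: w={w:.3f}, b={b:.3f}, loss={loss:.4f}")
```

The learned `w` and `b` should end up close to the values 3 and 1 used to generate the data, with a final loss close to the noise level.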

### Overfitting and Underfitting

The overall goal of learning is to obtain a model that generalizes well to new and unseen data. For us, the testing set usually plays the role of this unseen data. We usually assume that the training and testing sets are identically distributed and are representative of the underlying population. Overfitting and underfitting are two terms we use to diagnose a model based on its training and testing performance.

A model underfits when it performs poorly on both the training set and the testing set. A model overfits when it performs extremely well on the training set but poorly on the testing set.

Two related terms are variance and bias. To simplify a bit, we can say that "high variance" (we are not being very precise) is associated with overfitting, and "high bias" (we are not being very accurate) is associated with underfitting.

We want to avoid both of these problems, but in practice reducing one of them tends to increase the other. This is called the "Bias-Variance Tradeoff", and we generally want to find a good balance. Both problems are related to the size of the parameter space: the more parameters we use, the more likely we are to overfit; the fewer parameters we use, the more likely we are to underfit. Consider the following regression example. Here, the green line underfits (it is a linear function, so it has 2 parameters) and the blue line overfits (it is a polynomial of degree 10, so it has 11 parameters). The red line is a reasonable fit (it is a quadratic function, which has 3 parameters).

![over-underfit.png](attachment:df27c5ee-0a1b-402d-a43c-0624b6da722c.png)
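
A similar experiment can be reproduced numerically. The sketch below does not regenerate the figure; it assumes a quadratic ground truth `y = 3x^2` with Gaussian noise (both invented for this example), fits polynomials of degree 1, 2 and 10 with `np.polyfit`, and compares the error on the training set with the error on unseen data.

```python
import numpy as np

rng = np.random.default_rng(1)


def sample(n):
    """Noisy samples of the (assumed) true relationship y = 3x^2."""
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, 3.0 * x ** 2 + rng.normal(scale=0.3, size=n)


x_train, y_train = sample(20)    # small training set
x_test, y_test = sample(200)     # unseen data from the same distribution

train_mse, test_mse = {}, {}
for degree in (1, 2, 10):
    # A polynomial of this degree has degree + 1 parameters.
    coefficients = np.polyfit(x_train, y_train, deg=degree)
    train_mse[degree] = np.mean((np.polyval(coefficients, x_train) - y_train) ** 2)
    test_mse[degree] = np.mean((np.polyval(coefficients, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse[degree]:.3f}, "
          f"test MSE {test_mse[degree]:.3f}")
```

The training error cannot go up as the degree grows, because each model contains the previous one as a special case. The quantity to watch is the gap between training and testing error: a high error on both sets points to underfitting, while a very low training error paired with a noticeably higher testing error points to overfitting.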

### Shallow vs Deep

A shallow learning algorithm learns the parameters of the model directly from the features
of the training examples. Most supervised learning algorithms are shallow. The common exceptions are neural network learning algorithms, specifically those that build neural
networks with more than one layer between input and output. Such neural networks are
called deep neural networks. In deep neural network learning (or, simply, deep learning),
contrary to shallow learning, most model parameters are learned not directly from the features
of the training examples, but from the outputs of the preceding layers.
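
The structural difference can be seen in a forward pass alone. In this sketch the parameters are random and untrained, the layer sizes are arbitrary, and logistic regression stands in for a shallow model; the point is only to show which inputs each set of parameters acts on.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # 4 training examples, 3 features each


def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Shallow model (logistic regression): the parameters act on the features directly.
w = rng.normal(size=3)
b = 0.0
shallow_output = sigmoid(X @ w + b)              # shape (4,)

# Deep model: two hidden layers between input and output.
W1, b1 = rng.normal(size=(3, 5)), np.zeros(5)    # acts on the features
W2, b2 = rng.normal(size=(5, 5)), np.zeros(5)    # acts on the outputs of layer 1
W3, b3 = rng.normal(size=(5, 1)), np.zeros(1)    # acts on the outputs of layer 2

h1 = np.tanh(X @ W1 + b1)                        # shape (4, 5)
h2 = np.tanh(h1 @ W2 + b2)                       # shape (4, 5)
deep_output = sigmoid(h2 @ W3 + b3).ravel()      # shape (4,)

print(shallow_output.shape, deep_output.shape)
```

Only `W1` and `b1` ever touch the raw features; `W2` and `W3`, which hold most of the deep model's parameters here, are applied to `h1` and `h2`, the outputs of the preceding layers.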