# Machine learning

# Learning
* Enables problem solving:
    * the transition of a state with respect to a quality factor
    * state x is changed with some function
    * f(speech) = emotion
    * f(x) = x'
    * e.g. f(x) = a x + b
    * find a and b so that x' is optimal
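As a minimal sketch of the last point, assuming "optimal" means the smallest squared error against known target values, a and b can be computed in closed form with ordinary least squares. The function name `fit_line` and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing sum((a*x + b - y)^2) over all pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    # Intercept: the fitted line passes through the point of means
    b = mean_y - a * mean_x
    return a, b


# Illustrative data generated from x' = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
print(a, b)
```

Neural nets solve the same kind of problem, but with many more parameters and by iterative optimization instead of a closed formula.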

# Terminology

## Loss function
is the function that artificial neural nets use to track progress, i.e. the function that compares the predicted outcome with the desired one. Finding a good loss function is crucial for your task.
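Two common loss functions, sketched in plain Python: mean squared error for regression and cross entropy for classification. The function names and the example numbers are chosen for illustration only.

```python
import math


def mean_squared_error(predicted, desired):
    """Average squared difference between predictions and targets."""
    return sum((p - d) ** 2 for p, d in zip(predicted, desired)) / len(predicted)


def cross_entropy(probabilities, true_class):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(probabilities[true_class])


mse = mean_squared_error([2.5, 0.0, 2.0], [3.0, -0.5, 2.0])

# The same predicted distribution is cheap if class 0 is correct
# and expensive if class 2 is correct.
loss_confident_and_right = cross_entropy([0.7, 0.2, 0.1], true_class=0)
loss_confident_and_wrong = cross_entropy([0.7, 0.2, 0.1], true_class=2)
print(mse, loss_confident_and_right, loss_confident_and_wrong)
```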

## Backpropagation
Fundamental way to train neural networks: the error is evaluated with the loss function and then propagated backwards towards the input layer by taking the derivative of the loss with respect to each weight (chain rule).
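To make the chain rule concrete, here is a hand-written sketch for a toy net with one hidden neuron (tanh activation) and one output weight, trained on a single sample with squared error. The architecture, starting weights and learning rate are assumptions for illustration. Real frameworks compute these derivatives automatically.

```python
import math


def forward(w1, w2, x):
    h = math.tanh(w1 * x)  # hidden activation
    y = w2 * h             # network output
    return h, y


def backward(w1, w2, x, target):
    """Gradients of the squared error w.r.t. w1 and w2 via the chain rule."""
    h, y = forward(w1, w2, x)
    d_y = 2.0 * (y - target)         # dLoss/dy
    d_w2 = d_y * h                   # gradient for the output weight
    d_h = d_y * w2                   # error propagated back to the hidden neuron
    d_w1 = d_h * (1.0 - h ** 2) * x  # tanh'(z) = 1 - tanh(z)^2
    return d_w1, d_w2


w1, w2 = 0.5, -0.3
x, target = 1.0, 0.8
learning_rate = 0.1
losses = []
for _ in range(200):
    _, y = forward(w1, w2, x)
    losses.append((y - target) ** 2)
    d_w1, d_w2 = backward(w1, w2, x, target)
    # Gradient descent: move each weight against its gradient
    w1 -= learning_rate * d_w1
    w2 -= learning_rate * d_w2

print(losses[0], losses[-1])
```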

## Batch size
Number of samples in one training batch that are used together to compute the error (see loss function) and to perform one backpropagation step.
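A small sketch of how a data set is cut into batches. The helper `iterate_batches` is hypothetical, and frameworks usually also shuffle the data before each epoch.

```python
def iterate_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


samples = list(range(10))
batches = list(iterate_batches(samples, batch_size=4))
# 10 samples with batch size 4 give 3 batches, i.e. 3 weight updates per epoch;
# the last batch is smaller than the others.
print(batches)
```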

## Embeddings
are learned representations of data, usually taken from the penultimate layer of a pretrained artificial neural net.

## Latent space
 means the property of deep artificial neural nets to represent specific features of the data within the higher layers, for example speaker characteristics or expressed emotion in a net trained for speech synthesis. This is often used to influence the output in a desired way, for example simulating a specific speaking style.

## Freezing
layers in an artificial neural net (ANN) means not updating their weights, either because they contain knowledge that should not be forgotten (from a pretrained net) or to make the training faster.


## Drop out
is the technique of temporarily deactivating a number of randomly selected neurons in a hidden layer during training to prevent [overfitting](http://blog.syntheticspeech.de/2022/02/16/kinds-of-machine-learning/#Overfitting).
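A plain Python sketch of the idea, applied to a list of activations. It assumes the "inverted" variant that many frameworks use, where the surviving activations are scaled up so that the expected sum stays the same. The function signature is made up for illustration.

```python
import random


def dropout(activations, drop_probability, training=True, rng=random):
    """Zero each activation with drop_probability (assumed inverted dropout)."""
    if not training or drop_probability == 0.0:
        # At test time all neurons are kept
        return list(activations)
    keep_probability = 1.0 - drop_probability
    return [
        a / keep_probability if rng.random() < keep_probability else 0.0
        for a in activations
    ]


rng = random.Random(0)  # fixed seed so the example is repeatable
hidden = [1.0] * 1000
dropped = dropout(hidden, drop_probability=0.5, rng=rng)
print(dropped.count(0.0), "of", len(dropped), "neurons were switched off")
```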

## Patience 
* Number of epochs with no improvement after which training will be stopped.
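How patience works can be sketched on a list of per-epoch losses on the dev set. The function `early_stopping_epoch` and the loss values are made up for illustration.

```python
def early_stopping_epoch(dev_losses, patience):
    """Return the index of the epoch at which training stops."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(dev_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Patience was never exhausted: training ran through all epochs
    return len(dev_losses) - 1


dev_losses = [0.9, 0.7, 0.6, 0.65, 0.61, 0.62, 0.5]
stop_epoch = early_stopping_epoch(dev_losses, patience=3)
print(stop_epoch)
```

With a patience of 3, training stops after three epochs without a new best value, so the later improvement to 0.5 is never reached. A larger patience would have found it, at the price of longer training.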

## Overfitting
* Means that the machine learner performs well on the training data but not on any other data.
* This is usually the case when the model has enough complexity to distinguish all training samples and is trained for enough epochs (one epoch is one pass through the training data).
* Measures against this are subsumed under the label *regularization*. 

## Vanishing / exploding gradient 
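Means that the gradients computed during backpropagation become too small (vanishing) or too large (exploding), so that the weights either barely change or become too unstable for the net to train.

This happens especially with very deep networks (many layers).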
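A rough numeric illustration, not a real network: during backpropagation the error signal is multiplied by roughly one factor per layer. The factors 0.25 (the maximum slope of the sigmoid function) and 1.5 are example values, and `gradient_scale` is a made-up helper.

```python
def gradient_scale(per_layer_factor, num_layers):
    """Scaling of an error signal after passing back through num_layers layers."""
    return per_layer_factor ** num_layers


vanishing = gradient_scale(0.25, 20)  # shrinks towards zero
exploding = gradient_scale(1.5, 20)   # grows by several orders of magnitude
print(vanishing, exploding)
```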

## Bias vs. variance 
means the trade-off between generalization (high bias, underfitting) and specialization (high variance, overfitting). You can either have

* simple models, e.g. linear regression, that make similar systematic errors (strong bias) irrespective of the training set, or
* very complex models (e.g. a neural net with many layers) that are more exact but very specific to your training data.

[Here](https://mlu-explain.github.io/bias-variance/)'s a nice visualization of bias vs. variance. 

# How to split your data
In supervised machine learning, you usually need three kinds of data sets:
* train data: to teach the model the relation between data and labels
* dev data: (short for *development*) to tune meta parameters of your model, e.g. 
    * *number of neurons*, 
    * *batch size* or 
    * *learning rate*.
* test data: to evaluate your model ONCE at the end to check on generalization

Of course all this is to prevent [*overfitting*](http://blog.syntheticspeech.de/2022/02/16/kinds-of-machine-learning/#Overfitting) on your train and/or dev data.

If you've used your test data for a while, you might need to find a new set, as chances are high that you have overfitted on your test set during experiments.

So what's a good split?

Some rules apply:
* train and dev can be from the same set, but the test set is ideally from a different database.
* if you don't have so much data, an 80/10/10 % split (train/dev/test) is normal
* if you have masses of data, use only as much dev and test data as needed to cover your population.
* if you have really little data, use [x-fold cross validation](http://blog.syntheticspeech.de/2022/11/28/how-to-evaluate-your-model/#X_fold_cross_validation) for train and dev; the test set should still be separate
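A minimal sketch of a random train/dev/test split in plain Python. The function `split_dataset`, the fractions and the seed are illustrative assumptions. Note that a purely random split does not give you a test set from a different database, and for speech data it does not keep speakers apart either.

```python
import random


def split_dataset(samples, dev_fraction, test_fraction, seed=0):
    """Shuffle the samples and cut them into train, dev and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatable splits
    n_test = round(len(shuffled) * test_fraction)
    n_dev = round(len(shuffled) * dev_fraction)
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test


train, dev, test = split_dataset(range(100), dev_fraction=0.1, test_fraction=0.1)
print(len(train), len(dev), len(test))
```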

## Nkululeko exercise 1


Edit the [demo configuration](https://github.com/felixbur/nkululeko/blob/main/demos/exp_emodb.ini)

Set/keep *emotion* as target, *os* as FEAT type and *xgb* as MODEL type.

Use emodb as both train and test set, but try out all [split methods](https://github.com/felixbur/nkululeko/blob/main/ini_file.md#data):
* specified
* speaker split
* random
* loso
* logo 
* 5_fold_cross_validation

Which works best and why?

## Nkululeko exercise 2
Set the following options in the configuration file:
```
[EXP]
epochs = 200
[MODEL]
type = mlp
layers = {'l1':1024, 'l2':64} 
save = True
[PLOT]
epoch_progression = True
best_model = True
```
Then run the experiment.
Find the epoch progression plot and see at which epoch overfitting starts.

# Evaluation
This section is about the evaluation of machine learning models. Obviously, the answer to the question of whether a model is any good depends a lot on how you test it.

## Criteria
Depending on whether you have a classification or a regression problem, you can choose from a multitude of measures.

# Classification
Most of these measures are derived from the confusion matrix:
* **Confusion Matrix** : Matrix with results: rows represent the real values and columns the predictions. 
* In the binary case, the cells are called *True Positive* (TP), *False Negative* (FN: Type 2 error), *False Positive* (FP: Type 1 error) and *True Negative* (TN)

<img src=images/Prec-recall.png width=20%>
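The four cells can be counted with a few lines of Python. This sketch uses made-up labels and a hypothetical helper `binary_confusion_counts`, and derives precision and recall with their standard definitions.

```python
def binary_confusion_counts(true_labels, predicted_labels, positive=1):
    """Count TP, FN, FP and TN for a binary classification result."""
    tp = fn = fp = tn = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        if truth == positive and prediction == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # positive sample missed (type 2 error)
        elif prediction == positive:
            fp += 1  # false alarm (type 1 error)
        else:
            tn += 1
    return tp, fn, fp, tn


true_labels = [1, 1, 1, 0, 0, 0, 0, 1]
predicted_labels = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fn, fp, tn = binary_confusion_counts(true_labels, predicted_labels)

# Rows represent the real values, columns the predictions
matrix = [[tp, fn],
          [fp, tn]]
precision = tp / (tp + fp)  # how many predicted positives are correct
recall = tp / (tp + fn)     # how many real positives were found
print(matrix, precision, recall)
```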

