## Fundamentals of Machine Learning

### 4 branches of machine learning

* supervised learning
    - binary and multiclass classification
    - scalar regression
    - sequence generation: given a picture, predict a caption describing it.
    - syntax tree prediction: given a sentence, predict its decomposition into a syntax tree.
    - object detection: given a picture, draw a bounding box around certain objects inside the picture.
    - image segmentation: given a picture, draw a pixel-level mask on a specific object.
    
* unsupervised learning
    - finding interesting transformations of the input data without the help of any targets, for purposes such as data visualization, data compression, data denoising, or better understanding the correlations present in the data.
    - dimensionality reduction
    - clustering
    
* self-supervised learning
    - supervised learning without human-annotated labels.
    - autoencoders: a well-known instance where the generated targets are the input.
    - temporally supervised learning: predict the next frame given past frames, or the next word in a text given previous words.
    
* reinforcement learning
    - an agent receives information about its environment and learns to choose actions that maximize some reward.
    

### Evaluating ML models

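- split the data into three sets: training, validation and test.
- tune hyperparameters on the validation set, never on the test set
- information leaks: every time you tune a hyperparameter based on validation performance, some information about the validation data leaks into the model.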
- simple hold out validation
    - train, validation, test
    - not reliable with little data: the validation and test sets may contain too few samples to be statistically representative

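```python
import numpy as np

# `data`, `test_data` and `get_model()` are assumed to be defined elsewhere.
num_validation_samples = 10000

np.random.shuffle(data)  # shuffle before splitting
validation_data = data[:num_validation_samples]
training_data = data[num_validation_samples:]

# Train on the training set, evaluate on the validation set.
model = get_model()
model.train(training_data)
validation_score = model.evaluate(validation_data)

# Tune the model, retrain, evaluate, tune again...

# Once hyperparameters are fixed, retrain from scratch on all non-test data.
model = get_model()
model.train(np.concatenate([training_data, validation_data]))
test_score = model.evaluate(test_data)
```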

- k-fold validation
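```python
import numpy as np

# `data`, `test_data` and `get_model()` are assumed to be defined elsewhere.
k = 4
num_validation_samples = len(data) // k

np.random.shuffle(data)
validation_scores = []
for fold in range(k):
    # Select the validation partition for this fold.
    start = num_validation_samples * fold
    stop = num_validation_samples * (fold + 1)
    validation_data = data[start:stop]
    # Everything else is training data (`+` on arrays would add element-wise).
    training_data = np.concatenate([data[:start], data[stop:]])
    model = get_model()  # fresh, untrained model for each fold
    model.train(training_data)
    validation_scores.append(model.evaluate(validation_data))

# The validation score is the average over the k folds.
validation_score = np.average(validation_scores)

# Train the final model on all non-test data.
model = get_model()
model.train(data)
test_score = model.evaluate(test_data)
```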
    

- iterated k-fold validation with shuffling
    - for situations in which you have relatively little data available and you need to evaluate the model as precisely as possible
    - applying k-fold validation multiple times, shuffling the data every time before splitting it k ways. 
    - final score is the average of the scores obtained at each run of k-fold validation. 
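
The iterated procedure above can be sketched as follows. This is a minimal illustration, not a reference implementation: `iterated_k_fold`, `fit_and_score` and `mean_baseline_score` are hypothetical names, and the "model" is a toy baseline that predicts the training mean and reports validation mean absolute error.

```python
import numpy as np


def iterated_k_fold(data, targets, fit_and_score, k=4, iterations=3, seed=0):
    """Average validation score over `iterations` shuffled runs of k-fold."""
    rng = np.random.default_rng(seed)
    run_scores = []
    for _ in range(iterations):
        order = rng.permutation(len(data))   # reshuffle before every run
        folds = np.array_split(order, k)     # k nearly equal groups of indices
        fold_scores = []
        for i in range(k):
            val_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
            fold_scores.append(
                fit_and_score(data[train_idx], targets[train_idx],
                              data[val_idx], targets[val_idx])
            )
        run_scores.append(np.mean(fold_scores))
    # Final score: average of the per-run k-fold scores.
    return float(np.mean(run_scores))


def mean_baseline_score(x_train, y_train, x_val, y_val):
    """Hypothetical stand-in for training a model: predict the training mean."""
    prediction = y_train.mean()
    return float(np.mean(np.abs(y_val - prediction)))


x = np.arange(20, dtype="float32").reshape(20, 1)
y = np.full(20, 2.0, dtype="float32")
score = iterated_k_fold(x, y, mean_baseline_score, k=4, iterations=3)
```

Note that this trains and evaluates `iterations * k` models, so it can be expensive with a real network.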
    
Things to keep in mind:
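- data representativeness: randomly shuffle the data before splitting, so that both the training and test sets are representative of the data at hand
- arrow of time: do not randomly shuffle if trying to predict the future given the past; make sure all test data is posterior to the training data
- redundancy in data: e.g. if data points appear twice, shuffling and splitting can put the same point in both sets; make sure the training and validation sets are disjoint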
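
Two of the points above lend themselves to a short sketch: splitting time-ordered data without shuffling, and removing duplicated samples before splitting. The helper names `temporal_split` and `drop_duplicate_rows` are hypothetical, and the data is assumed to be a 2D NumPy array whose rows are already sorted from oldest to newest.

```python
import numpy as np


def temporal_split(data, validation_fraction=0.2):
    """Split time-ordered rows without shuffling; validation is the newest slice."""
    num_val = int(round(len(data) * validation_fraction))
    split_at = len(data) - num_val
    return data[:split_at], data[split_at:]


def drop_duplicate_rows(data):
    """Remove repeated rows, keeping first occurrences in their original order."""
    _, first_idx = np.unique(data, axis=0, return_index=True)
    return data[np.sort(first_idx)]


series = np.arange(10).reshape(10, 1)          # rows ordered oldest to newest
train, val = temporal_split(series, validation_fraction=0.2)

samples = np.array([[3, 4], [1, 2], [3, 4], [0, 0]])
unique_samples = drop_duplicate_rows(samples)  # the second [3, 4] is dropped
```

Deduplicating before the split is one simple way to keep the two sets disjoint; it assumes exact row equality is the right notion of "the same data point".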

### Data preprocessing, feature engineering and feature learning

Data preprocessing:
- vectorization: inputs and targets need to be tensors of floating-point data
- normalization: 
    - take small values: typically in the 0-1 range
    - homogeneous: all features take values in roughly the same range
    - normalize each feature independently to have a mean of 0 and a standard deviation of 1

```python
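# x is assumed to be a 2D float array of shape (samples, features)
x -= x.mean(axis=0)  # center each feature at 0
x /= x.std(axis=0)   # scale each feature to unit standard deviation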
```
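
When a separate test set exists, the mean and standard deviation are normally computed on the training data only and then reused for the test data, so that no test information leaks into preprocessing. A minimal sketch, using randomly generated arrays as a stand-in for real data (the shapes and distribution are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 3 features with values far from the 0-1 range.
train = rng.normal(loc=50.0, scale=10.0, size=(100, 3)).astype("float32")
test = rng.normal(loc=50.0, scale=10.0, size=(20, 3)).astype("float32")

# Statistics come from the training set only.
mean = train.mean(axis=0)
std = train.std(axis=0)

train_norm = (train - mean) / std
test_norm = (test - mean) / std  # reuse training statistics for the test set
```

After this step each training feature has mean close to 0 and standard deviation close to 1; the test features will be close to, but not exactly at, those values.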

- handling missing values
    - with neural networks, it's generally safe to input missing values as 0, on the condition that 0 is not already a meaningful value. The network will learn from exposure that 0 means missing.
    - if the test data has missing values but the training data does not, the model will not have learned to ignore missing values. In that case, artificially generate training samples with missing entries.
- feature extraction
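
The missing-value handling described above can be sketched as follows, assuming missing entries are encoded as `NaN` and that 0 is not a meaningful value in the data. The helper names and the `drop_rate` parameter are hypothetical, and dropping entries uniformly at random is a simplification; in practice you would drop the features you expect to be missing at test time.

```python
import numpy as np


def fill_missing_with_zero(x):
    """Replace NaN entries with 0 (assumes 0 is not already meaningful)."""
    return np.where(np.isnan(x), 0.0, x)


def add_artificial_missing(x, drop_rate=0.1, seed=0):
    """Return a copy of x with a random fraction of entries set to 0."""
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) < drop_rate
    x_missing = x.copy()
    x_missing[mask] = 0.0
    return x_missing


raw = np.array([[1.0, np.nan], [2.0, 3.0]])
filled = fill_missing_with_zero(raw)

# Augment complete training data with copies that contain "missing" entries.
x_train = np.ones((100, 4))
augmented = np.concatenate(
    [x_train, add_artificial_missing(x_train, drop_rate=0.25)]
)
```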

### Overfitting and Underfitting

regularization:
- reducing the network size
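
Network size here means the number of learnable parameters, which is set by the number of layers and the number of units per layer. As a rough illustration, the sketch below counts the parameters of a stack of fully connected layers; `dense_param_count`, the input dimension and the layer sizes are all hypothetical choices for the example.

```python
def dense_param_count(input_dim, layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    previous_units = input_dim
    for units in layer_sizes:
        total += previous_units * units + units  # weight matrix + bias vector
        previous_units = units
    return total


# Hypothetical networks on a 10,000-dimensional input.
original = dense_param_count(10000, [16, 16, 1])
smaller = dense_param_count(10000, [4, 4, 1])
```

Here the smaller network has roughly a quarter of the parameters of the original, which limits how much of the training data it can memorize.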