# Mini-project



## General guidelines

We want to carry out a typical study of an ML problem.

We're going to use Fashion-MNIST ("*fashion-mnist-reshaped.npz*") as the data set, and try to classify the pictures correctly.

There are 2 parts in the project:
- use `DecisionTreeClassifier` and PCA from sklearn to classify the data
- make your own multi-class classifier, deriving its updates from scratch

The first part weighs more in the total grade than the second one.

In the first part, the goal is to showcase typical hyper-parameter tuning. We will simulate having different tasks by restricting ourselves to different data set sizes, and comment on how the choice of hyper-parameters can depend strongly on how much data we have at hand.

General advice: **write clean code**, well factored into functions/classes, for each question, as much as possible.
This will make your code **easier to read and also easier to run!** You may re-use code across several questions. If it is well factored, the next questions will be easier to code.

Tips: you may want to use 
- `sklearn.tree.DecisionTreeClassifier`
- `sklearn.model_selection.train_test_split`
- `sklearn.decomposition.PCA`
- `sklearn.model_selection.cross_validate` 

to lighten your code.

## Part 1: using `sklearn.tree.DecisionTreeClassifier`

## (about 15 points out of 20 total)

Decision trees are powerful methods, but they can easily overfit. The number of parameters in the model essentially grows like $\sim O(2^{maxDepth})$, i.e. exponentially with the depth of the tree.
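
As a rough illustration of this growth, the sketch below counts the nodes of a *complete* binary tree of a given depth. This is only an upper bound: a fitted `DecisionTreeClassifier` is usually far from complete, since pure leaves are not split further and a tree cannot have more leaves than training samples. The helper name `complete_tree_size` is hypothetical.

```python
def complete_tree_size(max_depth):
    """Return (n_internal_nodes, n_leaves) for a complete binary tree."""
    n_leaves = 2 ** max_depth        # every level doubles the number of leaves
    n_internal = 2 ** max_depth - 1  # one (feature, threshold) split per internal node
    return n_internal, n_leaves


for depth in (2, 5, 12):
    n_internal, n_leaves = complete_tree_size(depth)
    print(f"max_depth={depth:2d}: up to {n_internal} splits and {n_leaves} leaves")
```

With only about a thousand training samples, a tree of depth 12 therefore has more potential leaves than data points, which is why `max_depth` needs tuning.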

### Part 1.1: `Ntrain+Nval=1000, Nvalid=1000, Ntest=10000`

In this part we use this amount of data.
- import the data, split the "train+validation" sets. Keep the test set for the **very** end.
- attempt direct classification using a `sklearn.tree.DecisionTreeClassifier`. Optimize the hyper-parameter `max_depth`. Measure and store the validation accuracy for the best choice of `max_depth`.
Are you concerned that you may be overfitting? Explain your answer.
- Now, let's add some PCA as pre-processing. 
    - Using `max_depth=5`, what is the best number of PCA components (`nComp_PCA`) to keep? Hint: you may use something like `nComp_range = np.array(list(np.arange(1,50))+[50,100,200,400,783,784])` as the range of `nComp_PCA` values to explore.
    - Using `max_depth=12`, what is the best number of PCA components (`nComp_PCA`) to keep?
    - Can you explain why this optimal number changes with depth?
- Find the best (max_depth, nComp_PCA) pair. 
- Can you explain how the optimal `max_depth`, let's call it $m^*$, behaves as a function of `nComp_PCA`, at **small** `nComp_PCA`?
- Can you explain how the optimal `max_depth`, $m^*$, behaves as a function of `nComp_PCA`, at **large** `nComp_PCA`?
- Measure the cross-validation error for this best pair. Are you surprised by the result?
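
One possible way to factor the search over (`max_depth`, `nComp_PCA`) is sketched below. It is only a sketch under stated assumptions, not the expected solution: the function name `grid_search_pca_tree` is hypothetical, the data is a small synthetic stand-in generated with `make_classification` (not Fashion-MNIST), and the score is a k-fold cross-validated accuracy from `cross_validate` rather than the accuracy on a single fixed validation set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier


def grid_search_pca_tree(X, y, depth_range, ncomp_range, cv=3, seed=0):
    """Return (val_scores, best_depth, best_ncomp) for a PCA + tree pipeline.

    val_scores[i, j] is the mean cross-validated accuracy obtained with
    depth_range[i] and ncomp_range[j].
    """
    val_scores = np.zeros((len(depth_range), len(ncomp_range)))
    for i, depth in enumerate(depth_range):
        for j, n_comp in enumerate(ncomp_range):
            # PCA is refit inside each fold, so validation data never
            # leaks into the pre-processing step.
            model = make_pipeline(
                PCA(n_components=int(n_comp)),
                DecisionTreeClassifier(max_depth=int(depth), random_state=seed),
            )
            res = cross_validate(model, X, y, cv=cv)
            val_scores[i, j] = res["test_score"].mean()
    i_best, j_best = np.unravel_index(np.argmax(val_scores), val_scores.shape)
    return val_scores, depth_range[i_best], ncomp_range[j_best]


# Synthetic stand-in data (an assumption): replace with the Fashion-MNIST subset.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=3, random_state=0
)
scores, best_depth, best_ncomp = grid_search_pca_tree(
    X_demo, y_demo, depth_range=[2, 5, 8], ncomp_range=[2, 5, 10, 20]
)
print(f"best max_depth={best_depth}, best nComp_PCA={best_ncomp}")
```

Returning the full `val_scores` array, rather than only the best pair, makes it easy to plot the accuracy against `nComp_PCA` at fixed depth and to reuse the same call for other data set sizes.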

In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
LoadObject = np.load("../../data/fashion-mnist-reshaped.npz") # please put your data over there so it's easy for me to run your code
linearPictureLength = 28
X = LoadObject['train_images']
y = LoadObject['train_labels']
## we do not use the TEST SET for now:
# Xtest = LoadObject['test_images']
# ytest = LoadObject['test_labels']

### Part 1.2: `Ntrain+Nval=2000, Nvalid=2000`

If you factored your code decently in the previous questions, this should be very easy and fast to do. Ideally, it should take a couple of lines and a single function call (for the core computation, excluding plots and presentation).
- split the "train+validation" sets.
- Find the best (max_depth, nComp_PCA) pair.
- Measure the cross-validation error for this best pair. Are you surprised by the result?


### Part 1.3: `Ntrain+Nval=20000, Nvalid=10000`

If you factored your code decently in the previous questions, this should be very easy and fast to do. Ideally, it should take a couple of lines and a single function call (for the core computation, excluding plots and presentation).
- split the "train+validation" sets.
- Find the best (max_depth, nComp_PCA) pair.
- Measure the cross-validation error for this best pair. Are you surprised by the result?

**Hint: to save compute time, you can use a smaller hyper-parameter search space, i.e. you can reduce the number of values explored in your hyper-parameter optimization.**
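
As an illustration of such a reduced search space, the sketch below builds a coarse grid: a handful of depths and log-spaced PCA dimensions. The specific bounds and number of values are arbitrary choices made for the example, not recommended settings.

```python
import numpy as np

# Hypothetical coarse grids: a few depths, and log-spaced PCA dimensions
# between 2 and 784 (the number of pixels in a 28x28 picture).
depth_range = np.arange(4, 21, 4)
nComp_range = np.unique(np.geomspace(2, 784, num=8).round().astype(int))

print(depth_range)
print(nComp_range)
print(f"{len(depth_range) * len(nComp_range)} (max_depth, nComp_PCA) pairs to evaluate")
```

A log-spaced grid keeps good resolution at small `nComp_PCA`, where the accuracy tends to change quickly, while spending only a few evaluations on the large, expensive values.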

### Part 1.4: The test (with `Ntest=10000`)
Use your best model to make a prediction:
- Which model do you prefer, among the 3 "best models" you have found? Why? How confident are you with your choice?
- Using the `Ntest=10000` samples that you carefully set aside (and NEVER used), compute the test error. How surprised are you by the result?
- If a client asked you "what level of accuracy can you achieve?", what would your answer be?


In [None]:
# Xtest = LoadObject['test_images']
# ytest = LoadObject['test_labels']


### Part 1.4 - Bonus question:
- Also compute the cross-validation error for the best choice of hyper-parameters with `N_train=200`
- Plot the cross-validation error as a function of `N_train = 200, 2000, 20000`
- People often say "let's just get more data". How effective does that seem to be?

## Part 2: make your own classifier !

## (about 5 points out of 20 total)

### Part 2.1
- write down (on paper, and then briefly here, just the final result) the loss function for the multi-class perceptron, using the softmax as output activation function.
- derive the gradient update steps (you can get inspiration from TD4.1).
- think of all the functions you need to write, and put them in a class (you can get inspiration from the correction of TP3.2): first write a class skeleton, and only then write the methods inside.
- test your algorithm on Fashion-MNIST: make a train / validation / test split, fit the model, compute the cross-validation error, and the test error.
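
To make the structure concrete, here is a minimal sketch of what such a class could look like. It uses the standard softmax cross-entropy loss $L = -\frac{1}{N}\sum_n \log p_{n, y_n}$ with full-batch gradient descent. The class name `SoftmaxClassifier`, its method names, the learning rate and the synthetic blob data are all assumptions made for illustration; they are not taken from TD4.1 or TP3.2, and your own derivation should be checked independently.

```python
import numpy as np


class SoftmaxClassifier:
    """Multi-class linear classifier with softmax output (hypothetical skeleton)."""

    def __init__(self, n_features, n_classes, lr=0.1, n_epochs=200, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_features, n_classes))
        self.b = np.zeros(n_classes)
        self.lr = lr
        self.n_epochs = n_epochs

    @staticmethod
    def _softmax(Z):
        # Subtract the row-wise max for numerical stability.
        Z = Z - Z.max(axis=1, keepdims=True)
        expZ = np.exp(Z)
        return expZ / expZ.sum(axis=1, keepdims=True)

    def predict_proba(self, X):
        return self._softmax(X @ self.W + self.b)

    def loss(self, X, y):
        # Mean cross-entropy: -(1/N) sum_n log p_{n, y_n}
        P = self.predict_proba(X)
        return -np.mean(np.log(P[np.arange(len(y)), y]))

    def gradients(self, X, y):
        # dL/dW = X^T (P - Y) / N and dL/db = mean_n (P - Y), with Y one-hot.
        P = self.predict_proba(X)
        P[np.arange(len(y)), y] -= 1.0
        return X.T @ P / len(y), P.mean(axis=0)

    def fit(self, X, y):
        # Full-batch gradient descent.
        for _ in range(self.n_epochs):
            grad_W, grad_b = self.gradients(X, y)
            self.W -= self.lr * grad_W
            self.b -= self.lr * grad_b
        return self

    def predict(self, X):
        return np.argmax(self.predict_proba(X), axis=1)


# Toy check on three well-separated Gaussian blobs (synthetic data, an assumption).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
y_toy = rng.integers(0, 3, size=300)
X_toy = centers[y_toy] + rng.standard_normal((300, 2))

clf = SoftmaxClassifier(n_features=2, n_classes=3)
loss_before = clf.loss(X_toy, y_toy)
clf.fit(X_toy, y_toy)
print(f"loss: {loss_before:.3f} -> {clf.loss(X_toy, y_toy):.3f}")
print(f"training accuracy: {np.mean(clf.predict(X_toy) == y_toy):.3f}")
```

A useful sanity check before running on Fashion-MNIST is to compare `gradients` against a finite-difference estimate of `loss` on a few entries of `W`; for the real data, rescaling the pixel values (for example to the range [0, 1]) is likely needed for a fixed learning rate to behave well.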