# Mini-project



## General guidelines

We want to carry out a typical study of an ML problem.

We will use Fashion-MNIST ("*fashion-mnist-reshaped.npz*") as the data set and try to classify the pictures correctly.

There are 2 parts in the project:
- use `DecisionTreeClassifier` and PCA from sklearn to classify the data
- make your own multi-class classifier, deriving its updates from scratch

The first part weighs more in the total grade than the second one.

In the first part, the goal is to showcase typical hyper-parameter tuning. We will simulate having different tasks by restricting ourselves to different dataset sizes, and comment on how the choice of hyper-parameters can depend strongly on how much data we have at hand.

General advice: **write clean code**, well factored into functions/classes, for each question, as much as possible.
This will make your code **easier to read and also easier to run!** You may re-use code across several questions. If it is well factored, the later questions will be easier to code.

Tips: you may want to use 
- `sklearn.tree.DecisionTreeClassifier`
- `sklearn.model_selection.train_test_split`
- `sklearn.decomposition.PCA`
- `sklearn.model_selection.cross_validate` 

to lighten your code.
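As a rough illustration of how the four tools listed above fit together, here is a minimal sketch. It runs on the small digits data set bundled with scikit-learn as a stand-in (an assumption made so the snippet is self-contained; it is not the project data), and the values `n_components=10` and `max_depth=8` are arbitrary placeholders.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): 8x8 digit images shipped with scikit-learn.
X_demo, y_demo = load_digits(return_X_y=True)

# Hold out a validation set; stratify keeps class proportions similar.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.5, stratify=y_demo, random_state=0
)

# Fit PCA on the training split only, then apply the same projection to both.
pca = PCA(n_components=10).fit(X_tr)
Z_tr, Z_va = pca.transform(X_tr), pca.transform(X_va)

tree = DecisionTreeClassifier(max_depth=8, random_state=0).fit(Z_tr, y_tr)
train_acc = tree.score(Z_tr, y_tr)
valid_acc = tree.score(Z_va, y_va)
print("train accuracy:", train_acc, " valid accuracy:", valid_acc)

# cross_validate refits a fresh clone of the estimator on each fold
# (here on the raw pixels, without PCA).
cv = cross_validate(
    DecisionTreeClassifier(max_depth=8, random_state=0),
    X_tr, y_tr, cv=5, return_train_score=True,
)
print("CV mean validation accuracy:", cv["test_score"].mean())
```

A gap between `train_acc` and `valid_acc` is the usual first sign of overfitting to look for.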

## Part 1: using `sklearn.tree.DecisionTreeClassifier`

## (about 15 points out of 20 total)

Decision trees are powerful methods; however, they can easily overfit. The number of parameters in the model essentially grows like $O(2^{maxDepth})$, i.e. exponentially with the depth of the tree.
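To make the exponential bound concrete: a binary tree of depth $d$ has at most $2^{d+1}-1$ nodes. The sketch below compares that bound with the number of nodes actually grown by scikit-learn, again on the bundled digits data as a stand-in (an assumption, not the project data). In practice the fitted tree is usually well below the bound at large depth, because it stops splitting once leaves are pure.

```python
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = load_digits(return_X_y=True)  # stand-in data (assumption)

sizes = []  # (max_depth, actual node count, full-binary-tree bound)
for depth in [1, 2, 4, 8, 12]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_demo, y_demo)
    n_nodes = clf.tree_.node_count      # internal nodes + leaves actually grown
    upper_bound = 2 ** (depth + 1) - 1  # nodes in a full binary tree of this depth
    sizes.append((depth, n_nodes, upper_bound))
    print("max_depth", depth, " nodes:", n_nodes, " bound:", upper_bound,
          " reached depth:", clf.get_depth())
```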

### Part 1.1: `Ntrain+Nval=1000, Nvalid=1000, Ntest=10000`

In this part we use this amount of data.
- import the data and split the "train+validation" sets. Keep the test set for the **very** end.
- attempt direct classification using a `sklearn.tree.DecisionTreeClassifier`. Optimize the hyper-parameter `max_depth`. Measure and store the validation accuracy for the best choice of `max_depth`.
Do you fear you may be overfitting? Explain your answer.
- Now, let's add some PCA as pre-processing.
    - Using `max_depth=5`, what is the best number of PCA components (nComp_PCA) to keep? Hint: you may use something like `nComp_range = np.array(list(np.arange(1,50))+[50,100,200,400,783,784])` as the range of nComp_PCA values to be explored.
    - Using `max_depth=12`, what is the best number of PCA components (nComp_PCA) to keep?
    - Can you explain why this optimal number changes with depth?
- Find the best (max_depth, nComp_PCA) pair.
- Can you explain the behavior of the optimal `max_depth`, let's call it $m^*$, as a function of `nComp_PCA`, at **small** `nComp_PCA`?
- Can you explain the behavior of the optimal `max_depth`, let's call it $m^*$, as a function of `nComp_PCA`, at **large** `nComp_PCA`?
- Measure the cross-validation error for this best pair. Are you surprised by the result?
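The search over (max_depth, nComp_PCA) pairs asked for above can be factored into a single helper. What follows is one possible sketch, not a reference solution: the function name `grid_search_pca_tree`, its arguments, and the small grids are all hypothetical, and the bundled digits data stands in for Fashion-MNIST.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def grid_search_pca_tree(X_tr, y_tr, X_va, y_va, depths, n_comps, seed=123):
    """Return (best_score, best_depth, best_n_comp, scores) on the validation set.

    scores[i, j] is the validation accuracy for n_comps[i] and depths[j].
    """
    scores = np.zeros((len(n_comps), len(depths)))
    for i, n_comp in enumerate(n_comps):
        # PCA does not depend on max_depth, so fit it once per n_comp.
        pca = PCA(n_components=n_comp).fit(X_tr)
        Z_tr, Z_va = pca.transform(X_tr), pca.transform(X_va)
        for j, depth in enumerate(depths):
            clf = DecisionTreeClassifier(max_depth=depth, random_state=seed)
            scores[i, j] = clf.fit(Z_tr, y_tr).score(Z_va, y_va)
    i_best, j_best = np.unravel_index(np.argmax(scores), scores.shape)
    return scores[i_best, j_best], depths[j_best], n_comps[i_best], scores


# Usage on stand-in data (assumption: digits instead of Fashion-MNIST).
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.5, random_state=0)
depths = [3, 5, 8, 12, None]
n_comps = [2, 5, 10, 20, 40]
best_score, best_depth, best_n_comp, scores = grid_search_pca_tree(
    X_tr, y_tr, X_va, y_va, depths, n_comps
)
print("best valid score:", best_score, " max_depth:", best_depth, " n_comp:", best_n_comp)
```

Returning the full `scores` array, rather than only the maximum, makes it straightforward to plot the optimal depth as a function of the number of components.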

In [86]:
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.decomposition  # makes sklearn.decomposition.PCA available below
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

In [87]:
LoadObject = np.load("C:/Users/dadinho/Desktop/projet/fashion-mnist-reshaped.npz") # please put your data over there so it's easy for me to run your code
linearPictureLength = 28
X = LoadObject['train_images']
y = LoadObject['train_labels']
## we do not use the TEST SET for now:
# Xtest = LoadObject['test_images']
# ytest = LoadObject['test_labels']
ratio_train = 0.016667
ratio_valid = 0.016667
ratio_test = 0.16667

In [175]:
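def split1(X, ratio_train, ratio_valid, ratio_test):
    """Slice the first rows of X into a training set and a validation set.

    The labels are read from the module-level array `y` (same row order as X).
    Returns X_train, y_train, X_valid, y_valid and Ntest, the number of test
    samples implied by ratio_test.
    """
    Ntot   = X.shape[0]
    Ntrain = int(ratio_train*Ntot)
    Nvalid = int(ratio_valid*Ntot)
    Ntest  = int(ratio_test*Ntot)
    # rows [0, Ntrain) go to training, rows [Ntrain, Ntrain+Nvalid) to validation
    X_train = X[0: Ntrain].copy()
    y_train = y[0: Ntrain].copy()
    X_valid = X[Ntrain:Ntrain+Nvalid].copy()
    y_valid = y[Ntrain:Ntrain+Nvalid].copy()
    return X_train, y_train, X_valid, y_valid, Ntest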

In [198]:
X_train, y_train, X_valid, y_valid,n = split1(X, ratio_train, ratio_valid,ratio_test)
print(X_train.shape)
print(X_valid.shape)
print(n)


(2000, 784)
(2000, 784)
10000
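The split above takes consecutive rows without shuffling. A possible alternative, sketched here under the assumption that a shuffled, class-stratified split is acceptable for the task, is to let `train_test_split` draw both subsets by absolute size. The helper name `split_train_valid` is hypothetical and the digits data is a stand-in.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split


def split_train_valid(X, y, n_train, n_valid, seed=0):
    """Draw disjoint, shuffled, class-stratified train and validation subsets."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y,
        train_size=n_train,  # absolute number of training samples
        test_size=n_valid,   # absolute number of validation samples
        stratify=y,
        random_state=seed,
    )
    return X_tr, y_tr, X_va, y_va


X_demo, y_demo = load_digits(return_X_y=True)  # stand-in data (assumption)
X_tr, y_tr, X_va, y_va = split_train_valid(X_demo, y_demo, n_train=500, n_valid=300)
print(X_tr.shape, y_tr.shape, X_va.shape, y_va.shape)
```

Passing `y` explicitly also removes the dependence on a global label array.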


In [208]:


max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
for depth in max_depth:
    clf=DecisionTreeClassifier(random_state=123,max_depth=depth)
    clf.fit(X_train, y_train)
    trainscore = clf.score(X_train,y_train)
    validscore = clf.score(X_valid,y_valid)
    print("max_depth", depth , "   training score:",trainscore, ". valid score:", validscore)

max_depth 10    training score: 0.9525 . valid score: 0.7045
max_depth 20    training score: 1.0 . valid score: 0.6985
max_depth 30    training score: 1.0 . valid score: 0.6985
max_depth 40    training score: 1.0 . valid score: 0.6985
max_depth 50    training score: 1.0 . valid score: 0.6985
max_depth 60    training score: 1.0 . valid score: 0.6985
max_depth 70    training score: 1.0 . valid score: 0.6985
max_depth 80    training score: 1.0 . valid score: 0.6985
max_depth 90    training score: 1.0 . valid score: 0.6985
max_depth 100    training score: 1.0 . valid score: 0.6985
max_depth 110    training score: 1.0 . valid score: 0.6985
max_depth None    training score: 1.0 . valid score: 0.6985


In [204]:

clf = DecisionTreeClassifier(random_state=123,max_depth=10)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_valid)
score = balanced_accuracy_score(y_valid, y_pred)  # mean of the per-class recalls
print("Balanced accuracy score: {}".format(score))

Balanced accuracy score: 0.7046193385247219
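For reference, balanced accuracy is the unweighted mean of the per-class recalls, so it only departs from plain accuracy when classes are unevenly represented. The toy labels below are made up purely to illustrate this.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Toy, deliberately imbalanced labels (made up for illustration).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 2, 0])

acc = accuracy_score(y_true, y_pred)                          # fraction of correct predictions
bal = balanced_accuracy_score(y_true, y_pred)                 # mean per-class recall
macro_recall = recall_score(y_true, y_pred, average="macro")  # same quantity, computed directly
print("accuracy:", acc, " balanced accuracy:", bal, " macro recall:", macro_recall)
```

On a roughly balanced subset the two scores stay close, which is consistent with the 0.7045 and 0.7046 values reported above for `max_depth=10`.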


In [206]:
nComp_range = np.array(list(np.arange(1,50))+[50,100,200,400,783,784])
t=[]  # training scores
v=[]  # validation scores
for nC in nComp_range:
    ## pre-processing 
    preProc = sklearn.decomposition.PCA(n_components=nC, copy=True)
    preProc.fit(X_train)
    X_train_Transformed = preProc.transform(X_train)
    X_valid_Transformed = preProc.transform(X_valid) ## apply the SAME transform to the validation data
    clf = DecisionTreeClassifier(random_state=123,max_depth=5)
    clf.fit(X_train_Transformed, y_train)
    trainscore = clf.score(X_train_Transformed,y_train)
    validscore = clf.score(X_valid_Transformed,y_valid)
    t.append(trainscore)
    v.append(validscore)
    print("n_components", nC , "   training score:",trainscore, ". valid score:", validscore)
print(max(t))
print(max(v))

n_components 1    training score: 0.354 . valid score: 0.3125
n_components 2    training score: 0.551 . valid score: 0.5115
n_components 3    training score: 0.618 . valid score: 0.5705
n_components 4    training score: 0.649 . valid score: 0.611
n_components 5    training score: 0.654 . valid score: 0.61
n_components 6    training score: 0.6715 . valid score: 0.625
n_components 7    training score: 0.6715 . valid score: 0.6275
n_components 8    training score: 0.634 . valid score: 0.5905
n_components 9    training score: 0.6335 . valid score: 0.5945
n_components 10    training score: 0.639 . valid score: 0.5905
n_components 11    training score: 0.639 . valid score: 0.591
n_components 12    training score: 0.6395 . valid score: 0.592
n_components 13    training score: 0.6385 . valid score: 0.59
n_components 14    training score: 0.6385 . valid score: 0.5895
n_components 15    training score: 0.639 . valid score

In [211]:
nComp_range = np.array(list(np.arange(1,50))+[50,100,200,400,783,784])
t=[]  # training scores
v=[]  # validation scores
for nC in nComp_range:
    ## pre-processing 
    preProc = sklearn.decomposition.PCA(n_components=nC, copy=True)
    preProc.fit(X_train)
    X_train_Transformed = preProc.transform(X_train)
    X_valid_Transformed = preProc.transform(X_valid) ## apply the SAME transform to the validation data
    clf = DecisionTreeClassifier(random_state=123,max_depth=12)
    clf.fit(X_train_Transformed, y_train)
    trainscore = clf.score(X_train_Transformed,y_train)
    validscore = clf.score(X_valid_Transformed,y_valid)
    t.append(trainscore)
    v.append(validscore)
    print("n_components", nC , "   training score:",trainscore, ". valid score:", validscore)
print(max(t))
print(max(v))

n_components 1    training score: 0.5385 . valid score: 0.274
n_components 2    training score: 0.7905 . valid score: 0.4965
n_components 3    training score: 0.8595 . valid score: 0.554
n_components 4    training score: 0.91 . valid score: 0.6215
n_components 5    training score: 0.907 . valid score: 0.6415
n_components 6    training score: 0.9285 . valid score: 0.6605
n_components 7    training score: 0.936 . valid score: 0.6575
n_components 8    training score: 0.9015 . valid score: 0.6645
n_components 9    training score: 0.9 . valid score: 0.685
n_components 10    training score: 0.912 . valid score: 0.683
n_components 11    training score: 0.9135 . valid score: 0.681
n_components 12    training score: 0.915 . valid score: 0.6835
n_components 13    training score: 0.917 . valid score: 0.678
n_components 14    training score: 0.92 . valid score: 0.6895
n_components 15    training score: 0.925 . valid score: 

In [212]:
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
nComp_range = np.array(list(np.arange(1,50))+[50,100,200,400,783,784])
L=[]  # validation scores, in loop order
for nC in nComp_range:
    ## pre-processing: PCA does not depend on max_depth, so fit it once per nC
    preProc = sklearn.decomposition.PCA(n_components=nC, copy=True)
    preProc.fit(X_train)
    X_train_Transformed = preProc.transform(X_train)
    X_valid_Transformed = preProc.transform(X_valid) ## apply the SAME transform to the validation data

    for mp in max_depth:
        ## classification
        clf = DecisionTreeClassifier(random_state=123,max_depth=mp)
        clf.fit(X_train_Transformed, y_train)

        ## measure of performance
        trainscore = clf.score(X_train_Transformed,y_train)
        validscore = clf.score(X_valid_Transformed,y_valid)
        L.append(validscore)
        print("max_depth", mp ,"n_components", nC , "   training score:",trainscore, ". valid score:", validscore)

max_depth 10 n_components 1    training score: 0.452 . valid score: 0.304
max_depth 20 n_components 1    training score: 0.871 . valid score: 0.243
max_depth 30 n_components 1    training score: 0.9845 . valid score: 0.2275
max_depth 40 n_components 1    training score: 1.0 . valid score: 0.225
max_depth 50 n_components 1    training score: 1.0 . valid score: 0.225
max_depth 60 n_components 1    training score: 1.0 . valid score: 0.225
max_depth 70 n_components 1    training score: 1.0 . valid score: 0.225
max_depth 80 n_components 1    training score: 1.0 . valid score: 0.225
max_depth 90 n_components 1    training score: 1.0 . valid score: 0.225
max_depth 100 n_components 1    training score: 1.0 . valid score: 0.225
max_depth 110 n_components 1    training score: 1.0 . valid score: 0.225
max_depth None n_components 1    training score: 1.0 . valid score: 0.225
max_depth 10 n_components 2    training score: 0.721 . vali

max_depth 100 n_components 9    training score: 1.0 . valid score: 0.6865
max_depth 110 n_components 9    training score: 1.0 . valid score: 0.6865
max_depth None n_components 9    training score: 1.0 . valid score: 0.6865
max_depth 10 n_components 10    training score: 0.85 . valid score: 0.671
max_depth 20 n_components 10    training score: 0.998 . valid score: 0.6715
max_depth 30 n_components 10    training score: 1.0 . valid score: 0.6725
max_depth 40 n_components 10    training score: 1.0 . valid score: 0.6725
max_depth 50 n_components 10    training score: 1.0 . valid score: 0.6725
max_depth 60 n_components 10    training score: 1.0 . valid score: 0.6725
max_depth 70 n_components 10    training score: 1.0 . valid score: 0.6725
max_depth 80 n_components 10    training score: 1.0 . valid score: 0.6725
max_depth 90 n_components 10    training score: 1.0 . valid score: 0.6725
max_depth 100 n_components 10    training sc

max_depth 60 n_components 18    training score: 1.0 . valid score: 0.696
max_depth 70 n_components 18    training score: 1.0 . valid score: 0.687
max_depth 80 n_components 18    training score: 1.0 . valid score: 0.6845
max_depth 90 n_components 18    training score: 1.0 . valid score: 0.682
max_depth 100 n_components 18    training score: 1.0 . valid score: 0.6835
max_depth 110 n_components 18    training score: 1.0 . valid score: 0.6875
max_depth None n_components 18    training score: 1.0 . valid score: 0.689
max_depth 10 n_components 19    training score: 0.8665 . valid score: 0.6805
max_depth 20 n_components 19    training score: 0.9995 . valid score: 0.6845
max_depth 30 n_components 19    training score: 1.0 . valid score: 0.6805
max_depth 40 n_components 19    training score: 1.0 . valid score: 0.676
max_depth 50 n_components 19    training score: 1.0 . valid score: 0.6795
max_depth 60 n_components 19    training s

max_depth 20 n_components 27    training score: 1.0 . valid score: 0.6765
max_depth 30 n_components 27    training score: 1.0 . valid score: 0.677
max_depth 40 n_components 27    training score: 1.0 . valid score: 0.677
max_depth 50 n_components 27    training score: 1.0 . valid score: 0.6775
max_depth 60 n_components 27    training score: 1.0 . valid score: 0.68
max_depth 70 n_components 27    training score: 1.0 . valid score: 0.679
max_depth 80 n_components 27    training score: 1.0 . valid score: 0.677
max_depth 90 n_components 27    training score: 1.0 . valid score: 0.6815
max_depth 100 n_components 27    training score: 1.0 . valid score: 0.677
max_depth 110 n_components 27    training score: 1.0 . valid score: 0.678
max_depth None n_components 27    training score: 1.0 . valid score: 0.68
max_depth 10 n_components 28    training score: 0.8685 . valid score: 0.684
max_depth 20 n_components 28    training score: 0.9

max_depth 100 n_components 35    training score: 1.0 . valid score: 0.6785
max_depth 110 n_components 35    training score: 1.0 . valid score: 0.6825
max_depth None n_components 35    training score: 1.0 . valid score: 0.6875
max_depth 10 n_components 36    training score: 0.8725 . valid score: 0.685
max_depth 20 n_components 36    training score: 1.0 . valid score: 0.68
max_depth 30 n_components 36    training score: 1.0 . valid score: 0.675
max_depth 40 n_components 36    training score: 1.0 . valid score: 0.6765
max_depth 50 n_components 36    training score: 1.0 . valid score: 0.6815
max_depth 60 n_components 36    training score: 1.0 . valid score: 0.6745
max_depth 70 n_components 36    training score: 1.0 . valid score: 0.684
max_depth 80 n_components 36    training score: 1.0 . valid score: 0.6835
max_depth 90 n_components 36    training score: 1.0 . valid score: 0.6765
max_depth 100 n_components 36    training sco

max_depth 60 n_components 44    training score: 1.0 . valid score: 0.6655
max_depth 70 n_components 44    training score: 1.0 . valid score: 0.6765
max_depth 80 n_components 44    training score: 1.0 . valid score: 0.6625
max_depth 90 n_components 44    training score: 1.0 . valid score: 0.6745
max_depth 100 n_components 44    training score: 1.0 . valid score: 0.666
max_depth 110 n_components 44    training score: 1.0 . valid score: 0.6735
max_depth None n_components 44    training score: 1.0 . valid score: 0.67
max_depth 10 n_components 45    training score: 0.8745 . valid score: 0.676
max_depth 20 n_components 45    training score: 0.9995 . valid score: 0.668
max_depth 30 n_components 45    training score: 1.0 . valid score: 0.6615
max_depth 40 n_components 45    training score: 1.0 . valid score: 0.671
max_depth 50 n_components 45    training score: 1.0 . valid score: 0.6795
max_depth 60 n_components 45    training sc

max_depth 20 n_components 400    training score: 1.0 . valid score: 0.642
max_depth 30 n_components 400    training score: 1.0 . valid score: 0.645
max_depth 40 n_components 400    training score: 1.0 . valid score: 0.6315
max_depth 50 n_components 400    training score: 1.0 . valid score: 0.6345
max_depth 60 n_components 400    training score: 1.0 . valid score: 0.6375
max_depth 70 n_components 400    training score: 1.0 . valid score: 0.638
max_depth 80 n_components 400    training score: 1.0 . valid score: 0.646
max_depth 90 n_components 400    training score: 1.0 . valid score: 0.6395
max_depth 100 n_components 400    training score: 1.0 . valid score: 0.632
max_depth 110 n_components 400    training score: 1.0 . valid score: 0.6315
max_depth None n_components 400    training score: 1.0 . valid score: 0.6415
max_depth 10 n_components 783    training score: 0.881 . valid score: 0.6135
max_depth 20 n_components 783    t

In [213]:
res1 = max(L)  # best validation score over the whole grid

# printing result
print("Best validation score: " + str(res1))

Best validation score: 0.696


In [216]:
preProc = sklearn.decomposition.PCA(n_components=18, copy=True)
preProc.fit(X_train)
X_train_Transformed = preProc.transform(X_train)
X_valid_Transformed = preProc.transform(X_valid) ## apply the SAME transform to the validation data

## classification
clf = DecisionTreeClassifier(random_state=123,max_depth=60)
clf.fit(X_train_Transformed, y_train)

## measure of performance
## note: cross_val_score refits clones of clf on folds of the validation set (5 folds by default),
## so the fit on the training set above is not used for these scores
scores = cross_val_score(clf, X_valid_Transformed, y_valid)
print(scores)
print("mean: {:.3f} (std: {:.3f})".format(scores.mean(), scores.std()), end="\n\n")

[0.6625 0.695  0.6825 0.675  0.7125]
mean: 0.685 (std: 0.017)
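The cell above cross-validates the tree on the validation features only, with the PCA fitted once beforehand. One possible variant, sketched here as an assumption about what "cross-validation error for the best pair" could mean, wraps PCA and the tree in a `Pipeline` so that the projection is refitted inside every fold. The digits data is a stand-in, and the pair (18, 60) is simply copied from the cell above.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = load_digits(return_X_y=True)  # stand-in data (assumption)

# PCA followed by the tree, treated as a single estimator.
model = make_pipeline(
    PCA(n_components=18),
    DecisionTreeClassifier(max_depth=60, random_state=123),
)

# Each fold fits PCA and the tree on the training part of that fold only.
cv = cross_validate(model, X_demo, y_demo, cv=5, return_train_score=True)
print("fold scores:", cv["test_score"])
print("mean: {:.3f} (std: {:.3f})".format(cv["test_score"].mean(), cv["test_score"].std()))
print("mean training score: {:.3f}".format(cv["train_score"].mean()))
```

Comparing the mean training score with the mean fold score gives a direct view of the overfitting gap for the selected pair.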



### Part 1.2: `Ntrain+Nval=2000, Nvalid=2000`

If you factored your code decently in the last questions, this should be very easy/fast to do. Ideally, it should be a couple of lines and a single function call. (For the core computation, excluding plots and presentation)
- split the "train+validation" sets. 
- Find the best (max_depth, nComp_PCA) pair. 
- Measure the cross-validation error for this best pair. Are you surprised by the result?
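As an illustration of the "single function call per dataset size" idea, the sketch below tunes only `max_depth` for several training-set sizes. The helper name `best_depth_for_size`, the sizes, and the depth grid are hypothetical, and the digits data is again a stand-in for Fashion-MNIST.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier


def best_depth_for_size(X, y, n_train, n_valid, depths, seed=0):
    """Shuffle, keep n_train + n_valid rows, and return (best_depth, best_valid_score)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))[: n_train + n_valid]
    tr, va = idx[:n_train], idx[n_train:]
    results = []
    for depth in depths:
        clf = DecisionTreeClassifier(max_depth=depth, random_state=seed)
        clf.fit(X[tr], y[tr])
        results.append((clf.score(X[va], y[va]), depth))
    # Pick the depth with the highest validation accuracy.
    best_score, best_depth = max(results, key=lambda r: r[0])
    return best_depth, best_score


X_demo, y_demo = load_digits(return_X_y=True)  # stand-in data (assumption)
depths = [2, 4, 6, 8, 12, 16]
summary = {}
for n_train in (100, 400, 1000):
    summary[n_train] = best_depth_for_size(X_demo, y_demo, n_train, 500, depths)
    print("n_train", n_train, "-> best max_depth, valid score:", summary[n_train])
```

Changing the amount of data then amounts to changing one argument, which is the point of factoring the earlier cells.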


In [219]:
ratio_train= 0.0333339
ratio_valid= 0.0333339
X_train, y_train, X_valid, y_valid,n = split1(X, ratio_train, ratio_valid,ratio_test)
print(X_train.shape)
print(X_valid.shape)

(2000, 784)
(2000,)
(2000, 784)


In [268]:
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
nComp_range = np.array(list(np.arange(1,50))+[50,100,200,400,783,784])
L=[]  # validation scores, in loop order
for nC in nComp_range:
    ## pre-processing: PCA does not depend on max_depth, so fit it once per nC
    preProc = sklearn.decomposition.PCA(n_components=nC, copy=True)
    preProc.fit(X_train)
    X_train_Transformed = preProc.transform(X_train)
    X_valid_Transformed = preProc.transform(X_valid) ## apply the SAME transform to the validation data

    for mp in max_depth:
        ## classification
        clf = DecisionTreeClassifier(random_state=123,max_depth=mp)
        clf.fit(X_train_Transformed, y_train)

        ## measure of performance
        trainscore = clf.score(X_train_Transformed,y_train)
        validscore = clf.score(X_valid_Transformed,y_valid)
        L.append(validscore)
        print("max_depth", mp ,"n_components", nC , "   training score:",trainscore, ". valid score:", validscore)

max_depth 10 n_components 1    training score: 0.35005 . valid score: 0.2928
max_depth 20 n_components 1    training score: 0.5897 . valid score: 0.2589
max_depth 30 n_components 1    training score: 0.84595 . valid score: 0.2331
max_depth 40 n_components 1    training score: 0.97395 . valid score: 0.2201
max_depth 50 n_components 1    training score: 0.9986 . valid score: 0.2192
max_depth 60 n_components 1    training score: 0.99995 . valid score: 0.2195
max_depth 70 n_components 1    training score: 0.99995 . valid score: 0.2195
max_depth 80 n_components 1    training score: 0.99995 . valid score: 0.2195
max_depth 90 n_components 1    training score: 0.99995 . valid score: 0.2195
max_depth 100 n_components 1    training score: 0.99995 . valid score: 0.2195
max_depth 110 n_components 1    training score: 0.99995 . valid score: 0.2195
max_depth None n_components 1    training score: 0.99995 . valid score: 0.2195
max_depth 10 n

max_depth 90 n_components 9    training score: 1.0 . valid score: 0.7242
max_depth 100 n_components 9    training score: 1.0 . valid score: 0.7217
max_depth 110 n_components 9    training score: 1.0 . valid score: 0.7231
max_depth None n_components 9    training score: 1.0 . valid score: 0.7239
max_depth 10 n_components 10    training score: 0.773 . valid score: 0.7205
max_depth 20 n_components 10    training score: 0.9643 . valid score: 0.7386
max_depth 30 n_components 10    training score: 0.9974 . valid score: 0.7346
max_depth 40 n_components 10    training score: 1.0 . valid score: 0.7348
max_depth 50 n_components 10    training score: 1.0 . valid score: 0.7348
max_depth 60 n_components 10    training score: 1.0 . valid score: 0.7348
max_depth 70 n_components 10    training score: 1.0 . valid score: 0.7348
max_depth 80 n_components 10    training score: 1.0 . valid score: 0.7335
max_depth 90 n_components 10    trainin

max_depth 40 n_components 18    training score: 1.0 . valid score: 0.7509
max_depth 50 n_components 18    training score: 1.0 . valid score: 0.7487
max_depth 60 n_components 18    training score: 1.0 . valid score: 0.7523
max_depth 70 n_components 18    training score: 1.0 . valid score: 0.7484
max_depth 80 n_components 18    training score: 1.0 . valid score: 0.7513
max_depth 90 n_components 18    training score: 1.0 . valid score: 0.7517
max_depth 100 n_components 18    training score: 1.0 . valid score: 0.7484
max_depth 110 n_components 18    training score: 1.0 . valid score: 0.7517
max_depth None n_components 18    training score: 1.0 . valid score: 0.7519
max_depth 10 n_components 19    training score: 0.78085 . valid score: 0.721
max_depth 20 n_components 19    training score: 0.9829 . valid score: 0.7566
max_depth 30 n_components 19    training score: 1.0 . valid score: 0.7481
max_depth 40 n_components 19    train

max_depth 110 n_components 26    training score: 1.0 . valid score: 0.7503
max_depth None n_components 26    training score: 1.0 . valid score: 0.7565
max_depth 10 n_components 27    training score: 0.79015 . valid score: 0.7334
max_depth 20 n_components 27    training score: 0.9807 . valid score: 0.7613
max_depth 30 n_components 27    training score: 0.99975 . valid score: 0.7562
max_depth 40 n_components 27    training score: 1.0 . valid score: 0.7533
max_depth 50 n_components 27    training score: 1.0 . valid score: 0.7593
max_depth 60 n_components 27    training score: 1.0 . valid score: 0.7587
max_depth 70 n_components 27    training score: 1.0 . valid score: 0.7518
max_depth 80 n_components 27    training score: 1.0 . valid score: 0.7584
max_depth 90 n_components 27    training score: 1.0 . valid score: 0.7527
max_depth 100 n_components 27    training score: 1.0 . valid score: 0.7551
max_depth 110 n_components 27   

max_depth 60 n_components 35    training score: 1.0 . valid score: 0.7538
max_depth 70 n_components 35    training score: 1.0 . valid score: 0.7515
max_depth 80 n_components 35    training score: 1.0 . valid score: 0.7555
max_depth 90 n_components 35    training score: 1.0 . valid score: 0.7553
max_depth 100 n_components 35    training score: 1.0 . valid score: 0.7499
max_depth 110 n_components 35    training score: 1.0 . valid score: 0.7527
max_depth None n_components 35    training score: 1.0 . valid score: 0.75
max_depth 10 n_components 36    training score: 0.7951 . valid score: 0.7329
max_depth 20 n_components 36    training score: 0.98525 . valid score: 0.76
max_depth 30 n_components 36    training score: 0.99885 . valid score: 0.7508
max_depth 40 n_components 36    training score: 1.0 . valid score: 0.7465
max_depth 50 n_components 36    training score: 1.0 . valid score: 0.7515
max_depth 60 n_components 36    trai

max_depth 10 n_components 44    training score: 0.796 . valid score: 0.7312
max_depth 20 n_components 44    training score: 0.98445 . valid score: 0.7545
max_depth 30 n_components 44    training score: 0.9996 . valid score: 0.7441
max_depth 40 n_components 44    training score: 0.99995 . valid score: 0.7455
max_depth 50 n_components 44    training score: 1.0 . valid score: 0.7451
max_depth 60 n_components 44    training score: 1.0 . valid score: 0.7486
max_depth 70 n_components 44    training score: 1.0 . valid score: 0.7481
max_depth 80 n_components 44    training score: 1.0 . valid score: 0.7475
max_depth 90 n_components 44    training score: 1.0 . valid score: 0.748
max_depth 100 n_components 44    training score: 1.0 . valid score: 0.7467
max_depth 110 n_components 44    training score: 1.0 . valid score: 0.7439
max_depth None n_components 44    training score: 1.0 . valid score: 0.7476
max_depth 10 n_components 45   

max_depth 70 n_components 200    training score: 1.0 . valid score: 0.7348
max_depth 80 n_components 200    training score: 1.0 . valid score: 0.7352
max_depth 90 n_components 200    training score: 1.0 . valid score: 0.7312
max_depth 100 n_components 200    training score: 1.0 . valid score: 0.7315
max_depth 110 n_components 200    training score: 1.0 . valid score: 0.7397
max_depth None n_components 200    training score: 1.0 . valid score: 0.7314
max_depth 10 n_components 400    training score: 0.79855 . valid score: 0.7275
max_depth 20 n_components 400    training score: 0.98675 . valid score: 0.7324
max_depth 30 n_components 400    training score: 0.99755 . valid score: 0.7286
max_depth 40 n_components 400    training score: 0.9995 . valid score: 0.7256
max_depth 50 n_components 400    training score: 1.0 . valid score: 0.728
max_depth 60 n_components 400    training score: 1.0 . valid score: 0.726
max_depth 70 n_com

In [221]:
res1 = max(L)  # best validation score over the whole grid
# printing result
print("Best validation score: " + str(res1))

Best validation score: 0.6925


In [222]:
preProc = sklearn.decomposition.PCA(n_components=20, copy=True)
preProc.fit(X_train)
X_train_Transformed = preProc.transform(X_train)
X_valid_Transformed = preProc.transform(X_valid) ## apply the SAME transform to the validation data

## classification
clf = DecisionTreeClassifier(random_state=123,max_depth=60)
clf.fit(X_train_Transformed, y_train)

## measure of performance
## note: cross_val_score refits clones of clf on folds of the validation set (5 folds by default),
## so the fit on the training set above is not used for these scores
scores = cross_val_score(clf, X_valid_Transformed, y_valid)
print(scores)
print("mean: {:.3f} (std: {:.3f})".format(scores.mean(), scores.std()), end="\n\n")

[0.6625 0.7    0.7025 0.6775 0.715 ]
mean: 0.691 (std: 0.019)



### Part 1.3: `Ntrain+Nval=20000, Nvalid=10000`

If you factored your code decently in the last questions, this should be very easy/fast to do. Ideally, it should be a couple of lines and a single function call. (For the core computation, excluding plots and presentation)
- split the "train+validation" sets.
- Find the best (max_depth, nComp_PCA) pair. 
- Measure the cross-validation error for this best pair. Are you surprised by the result?

**Hint: to save compute time, you can use a smaller hyper-parameter search space, i.e. you can reduce the number of values explored in your hyper-parameter optimization.**
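One way to act on this hint is to search a coarse grid. The sketch below uses `GridSearchCV` over a PCA + tree pipeline; note that this swaps the single validation split used elsewhere in this notebook for k-fold cross-validation, and that the grid values and the digits stand-in data are assumptions chosen only to keep the example quick.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = load_digits(return_X_y=True)  # stand-in data (assumption)

# make_pipeline names each step after its lowercased class name.
pipe = make_pipeline(PCA(), DecisionTreeClassifier(random_state=123))

# Coarse grid: 4 x 4 = 16 pairs instead of a dense scan over both axes.
coarse_grid = {
    "pca__n_components": [5, 10, 20, 40],
    "decisiontreeclassifier__max_depth": [5, 10, 20, None],
}

search = GridSearchCV(pipe, coarse_grid, cv=3)
search.fit(X_demo, y_demo)
print("best pair:", search.best_params_)
print("mean CV accuracy of best pair:", search.best_score_)
```

A second, finer grid centred on the best coarse pair can then refine the estimate at a fraction of the cost of a dense scan.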

In [224]:
ratio_train= 0.333339
ratio_valid= 0.16667
X_train, y_train, X_valid, y_valid,n = split1(X, ratio_train, ratio_valid,ratio_test)
print(X_train.shape)
print(X_valid.shape)

(20000, 784)
(10000, 784)


In [225]:
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
nComp_range = np.array(list(np.arange(1,50))+[50,100,200,400,783,784])
L=[]  # validation scores, in loop order
for nC in nComp_range:
    ## pre-processing: PCA does not depend on max_depth, so fit it once per nC
    preProc = sklearn.decomposition.PCA(n_components=nC, copy=True)
    preProc.fit(X_train)
    X_train_Transformed = preProc.transform(X_train)
    X_valid_Transformed = preProc.transform(X_valid) ## apply the SAME transform to the validation data

    for mp in max_depth:
        ## classification
        clf = DecisionTreeClassifier(random_state=123,max_depth=mp)
        clf.fit(X_train_Transformed, y_train)

        ## measure of performance
        trainscore = clf.score(X_train_Transformed,y_train)
        validscore = clf.score(X_valid_Transformed,y_valid)
        L.append(validscore)
        print("max_depth", mp ,"n_components", nC , "   training score:",trainscore, ". valid score:", validscore)

max_depth 10 n_components 1    training score: 0.35005 . valid score: 0.2928
max_depth 20 n_components 1    training score: 0.5897 . valid score: 0.2589
max_depth 30 n_components 1    training score: 0.84595 . valid score: 0.2331
max_depth 40 n_components 1    training score: 0.97395 . valid score: 0.2201
max_depth 50 n_components 1    training score: 0.9986 . valid score: 0.2192
max_depth 60 n_components 1    training score: 0.99995 . valid score: 0.2195
max_depth 70 n_components 1    training score: 0.99995 . valid score: 0.2195
max_depth 80 n_components 1    training score: 0.99995 . valid score: 0.2195
max_depth 90 n_components 1    training score: 0.99995 . valid score: 0.2195
max_depth 100 n_components 1    training score: 0.99995 . valid score: 0.2195
max_depth 110 n_components 1    training score: 0.99995 . valid score: 0.2195
max_depth None n_components 1    training score: 0.99995 . valid score: 0.2195
max_depth 10 n

max_depth 90 n_components 9    training score: 1.0 . valid score: 0.725
max_depth 100 n_components 9    training score: 1.0 . valid score: 0.7248
max_depth 110 n_components 9    training score: 1.0 . valid score: 0.7201
max_depth None n_components 9    training score: 1.0 . valid score: 0.7238
max_depth 10 n_components 10    training score: 0.773 . valid score: 0.7197
max_depth 20 n_components 10    training score: 0.96425 . valid score: 0.7409
max_depth 30 n_components 10    training score: 0.99905 . valid score: 0.7352
max_depth 40 n_components 10    training score: 1.0 . valid score: 0.7342
max_depth 50 n_components 10    training score: 1.0 . valid score: 0.7334
max_depth 60 n_components 10    training score: 1.0 . valid score: 0.7341
max_depth 70 n_components 10    training score: 1.0 . valid score: 0.7345
max_depth 80 n_components 10    training score: 1.0 . valid score: 0.7347
max_depth 90 n_components 10    traini

max_depth 40 n_components 18    training score: 1.0 . valid score: 0.7542
max_depth 50 n_components 18    training score: 1.0 . valid score: 0.7498
max_depth 60 n_components 18    training score: 1.0 . valid score: 0.7537
max_depth 70 n_components 18    training score: 1.0 . valid score: 0.7476
max_depth 80 n_components 18    training score: 1.0 . valid score: 0.7525
max_depth 90 n_components 18    training score: 1.0 . valid score: 0.7486
max_depth 100 n_components 18    training score: 1.0 . valid score: 0.7519
max_depth 110 n_components 18    training score: 1.0 . valid score: 0.7509
max_depth None n_components 18    training score: 1.0 . valid score: 0.7508
max_depth 10 n_components 19    training score: 0.77905 . valid score: 0.7174
max_depth 20 n_components 19    training score: 0.9834 . valid score: 0.7542
max_depth 30 n_components 19    training score: 1.0 . valid score: 0.7484
max_depth 40 n_components 19    trai

max_depth 110 n_components 26    training score: 1.0 . valid score: 0.7509
max_depth None n_components 26    training score: 1.0 . valid score: 0.7514
max_depth 10 n_components 27    training score: 0.7895 . valid score: 0.7301
max_depth 20 n_components 27    training score: 0.98055 . valid score: 0.7646
max_depth 30 n_components 27    training score: 0.99965 . valid score: 0.752
max_depth 40 n_components 27    training score: 1.0 . valid score: 0.7546
max_depth 50 n_components 27    training score: 1.0 . valid score: 0.7557
max_depth 60 n_components 27    training score: 1.0 . valid score: 0.7568
max_depth 70 n_components 27    training score: 1.0 . valid score: 0.7558
max_depth 80 n_components 27    training score: 1.0 . valid score: 0.7563
max_depth 90 n_components 27    training score: 1.0 . valid score: 0.7564
max_depth 100 n_components 27    training score: 1.0 . valid score: 0.7539
max_depth 110 n_components 27    

max_depth 60 n_components 35    training score: 1.0 . valid score: 0.7522
max_depth 70 n_components 35    training score: 1.0 . valid score: 0.7508
max_depth 80 nombre Composants 35    training score: 1.0 . valid score: 0.7524
max_depth 90 nombre Composants 35    training score: 1.0 . valid score: 0.7571
max_depth 100 nombre Composants 35    training score: 1.0 . valid score: 0.753
max_depth 110 nombre Composants 35    training score: 1.0 . valid score: 0.7574
max_depth None nombre Composants 35    training score: 1.0 . valid score: 0.7518
max_depth 10 nombre Composants 36    training score: 0.79525 . valid score: 0.7329
max_depth 20 nombre Composants 36    training score: 0.98585 . valid score: 0.7559
max_depth 30 nombre Composants 36    training score: 0.99925 . valid score: 0.7482
max_depth 40 nombre Composants 36    training score: 1.0 . valid score: 0.7531
max_depth 50 nombre Composants 36    training score: 1.0 . valid score: 0.7501
max_depth 60 nombre Composants 36    

max_depth 10 nombre Composants 44    training score: 0.7959 . valid score: 0.7332
max_depth 20 nombre Composants 44    training score: 0.98555 . valid score: 0.7543
max_depth 30 nombre Composants 44    training score: 0.9996 . valid score: 0.7504
max_depth 40 nombre Composants 44    training score: 1.0 . valid score: 0.7477
max_depth 50 nombre Composants 44    training score: 1.0 . valid score: 0.7438
max_depth 60 nombre Composants 44    training score: 1.0 . valid score: 0.7531
max_depth 70 nombre Composants 44    training score: 1.0 . valid score: 0.7466
max_depth 80 nombre Composants 44    training score: 1.0 . valid score: 0.7502
max_depth 90 nombre Composants 44    training score: 1.0 . valid score: 0.7435
max_depth 100 nombre Composants 44    training score: 1.0 . valid score: 0.7486
max_depth 110 nombre Composants 44    training score: 1.0 . valid score: 0.7439
max_depth None nombre Composants 44    training score: 1.0 . valid score: 0.7456
max_depth 10 nombre Composants 45    t

max_depth 70 nombre Composants 200    training score: 1.0 . valid score: 0.7313
max_depth 80 nombre Composants 200    training score: 1.0 . valid score: 0.7355


KeyboardInterrupt: 

In [226]:
res1 = max(L) 
# printing result  
print ("The indices wise maximum number : " +  str(res1)) 

The indices wise maximum number : 0.7646


In [232]:
preProc = sklearn.decomposition.PCA(n_components=27, copy=True)
preProc.fit(X_train)
X_train_Transformed = preProc.transform(X_train)
X_valid_Transformed = preProc.transform(X_valid)  ## apply the SAME transform to the validation data

## classification
clf = DecisionTreeClassifier(random_state=123, max_depth=20)
clf.fit(X_train_Transformed, y_train)

## measure of performance
## note: cross_val_score clones clf and re-fits it on folds of the validation data
scores = cross_val_score(clf, X_valid_Transformed, y_valid)
print(scores)
print("mean: {:.3f} (std: {:.3f})".format(scores.mean(), scores.std()),
      end="\n\n")

[0.731  0.738  0.7215 0.7355 0.717 ]
mean: 0.729 (std: 0.008)
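
Since `cross_val_score` clones the estimator and re-fits it on folds of whatever data it receives, the scores above come from trees trained on parts of the validation set, not from the `clf` fitted on the training set. A minimal sketch of one possible alternative, assuming a scikit-learn pipeline is acceptable here: PCA and the tree are wrapped together so that PCA is re-fitted inside each fold. The synthetic data from `make_classification` only stands in for Fashion-MNIST, and the hyper-parameter values are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the image data (assumption: 600 samples, 40 features, 3 classes).
X_demo, y_demo = make_classification(n_samples=600, n_features=40, n_informative=10,
                                     n_classes=3, random_state=0)

# PCA + tree as one estimator: each CV fold fits PCA on its own training part only.
model = make_pipeline(PCA(n_components=10),
                      DecisionTreeClassifier(max_depth=8, random_state=123))

scores = cross_val_score(model, X_demo, y_demo, cv=5)
print("mean: {:.3f} (std: {:.3f})".format(scores.mean(), scores.std()))
```

With this layout the data passed to `cross_val_score` would be the train+validation set, and the test set stays untouched.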



### Part 1.4: The test (with `Ntest=10000`)
Use your best model to make a prediction:
- Which model do you prefer, among the 3 "best models" you have found? Why? How confident are you in your choice?
- Using the `Ntest=10000` samples that you carefully set aside (and NEVER used), compute the test error. How surprised are you by the result?
- If a client asked you, "what level of accuracy can you achieve?", what would be your answer?


In [256]:
from sklearn.linear_model import LogisticRegression

X_train, y_train, X_valid, y_valid,n = split1(X, ratio_train, ratio_valid,ratio_test)
Xtest = LoadObject['test_images']
ytest = LoadObject['test_labels']
X_test  = Xtest[-n:].copy()
y_test  = ytest[-n:].copy()

preProc = sklearn.decomposition.PCA(n_components=27, copy=True)
preProc.fit(X_train)
X_train_Transformed = preProc.transform(X_train)
X_valid_Transformed = preProc.transform(X_valid)
X_test_Transformed  = preProc.transform(X_test)

clf = DecisionTreeClassifier(max_depth=20)
clf.fit(X_train_Transformed, y_train)
trainscore = clf.score(X_train_Transformed,y_train)
validscore = clf.score(X_valid_Transformed,y_valid)
testscore  = clf.score(X_test_Transformed,y_test)
print("   training score:",trainscore, ". valid score:", validscore)
print("test score: ", testscore)


   training score: 0.98015 . valid score: 0.76
test score:  0.7527
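
One hedged way to turn the test score above into an answer for a client is to attach a rough uncertainty to it. The sketch below assumes the 10000 test samples are independent draws, so that the number of correct predictions follows a binomial law; the helper name `accuracy_interval` is made up for illustration.

```python
import math

def accuracy_interval(accuracy, n_samples, z=1.96):
    """Normal-approximation interval for an accuracy measured on n_samples.

    Assumes independent test samples (binomial model); z=1.96 gives roughly 95%.
    """
    std_err = math.sqrt(accuracy * (1.0 - accuracy) / n_samples)
    return accuracy - z * std_err, accuracy + z * std_err, std_err

# Test score reported above: 0.7527 on Ntest = 10000 samples.
low, high, std_err = accuracy_interval(0.7527, 10000)
print("std err: {:.4f}, ~95% interval: [{:.3f}, {:.3f}]".format(std_err, low, high))
```

Under these assumptions the interval is less than one percentage point wide on either side of the measured score. It only covers sampling noise in the test set, not the variability from re-drawing the training set.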


In [254]:
from sklearn.linear_model import LogisticRegression

X_train, y_train, X_valid, y_valid,n = split1(X, ratio_train, ratio_valid,ratio_test)
Xtest = LoadObject['test_images']
ytest = LoadObject['test_labels']
X_test  = Xtest[-n:].copy()
y_test  = ytest[-n:].copy()

preProc = sklearn.decomposition.PCA(n_components=27, copy=True)
preProc.fit(X_train)
X_train_Transformed = preProc.transform(X_train)
X_valid_Transformed = preProc.transform(X_valid)
X_test_Transformed  = preProc.transform(X_test)
clf = LogisticRegression()
clf.fit(X_train_Transformed, y_train)
trainscore = clf.score(X_train_Transformed,y_train)
validscore = clf.score(X_valid_Transformed,y_valid)
testscore  = clf.score(X_test_Transformed,y_test)
print("   training score:",trainscore, ". valid score:", validscore)
print("test score: ", testscore)

   training score: 0.8166 . valid score: 0.813
test score:  0.804


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [258]:
preProc = sklearn.decomposition.PCA(n_components=27, copy=True)
preProc.fit(X_train)
X_train_Transformed = preProc.transform(X_train)
X_valid_Transformed = preProc.transform(X_valid)
X_test_Transformed  = preProc.transform(X_test)
clf = DecisionTreeClassifier(max_depth=20)
clf.fit(X_train_Transformed, y_train)
y_pred = clf.predict(X_valid_Transformed)
score = balanced_accuracy_score(y_valid, y_pred)  # scoring
print("Balanced accuracy score: {}".format(score))

Balanced accuracy score: 0.760298425462633


In [259]:
preProc = sklearn.decomposition.PCA(n_components=27, copy=True)
preProc.fit(X_train)
X_train_Transformed = preProc.transform(X_train)
X_valid_Transformed = preProc.transform(X_valid)
X_test_Transformed  = preProc.transform(X_test)
clf = LogisticRegression()
clf.fit(X_train_Transformed, y_train)
y_pred = clf.predict(X_valid_Transformed)
score = balanced_accuracy_score(y_valid, y_pred)  # scoring
print("Balanced accuracy score: {}".format(score))

Balanced accuracy score: 0.8141267241632418




### Part 1.4 - Bonus question:
- Also compute the cross-validation error for the best choice of hyper-parameters with `N_train=200`
- Plot the cross-validation error as a function of `N_train = 200, 2000, 20000`
- People often say "let's just get more data". How efficient does that seem to be?

## Part 2: make your own classifier !

## (about 5 points over 20 total)

The multi-class perceptron can be implemented as follows.
We denote $K$ the number of classes, $N$ the number of (training) examples, $D$ the dimension of the data (after feature augmentation, at least with a "1" as first component).

The **output** of the network *(not equal to the predicted label)* can be taken as the **softmax** among the $K$ separating hyperplanes (each hyperplane $\vec{w}_k$ separates class $k$ from the others).
$$ y_k^{(n)} = \text{softmax}\big( (\vec{w}_{\ell} \cdot \vec{x}^{(n)})_{\ell=1...K} \big)_k = \frac{ \exp(  \vec{w}_k\cdot\vec{x}^{(n)}   )}{\sum_\ell \exp(  \vec{w}_\ell\cdot\vec{x}^{(n)})}$$
This output can be **interpreted as the probability** that example $\vec{x}^{(n)}$ belongs to class $k$, according to the classifier's current parameters.
Indeed, one can easily check that for any $\vec{x}$, the probabilities sum to one: $\sum_k y_k = 1$.
The **total output of the network** is a vector $\vec{y}^{(n)} = \begin{pmatrix}y_1^{(n)} \\ y_2^{(n)} \\ \vdots \\ y_K^{(n)} \end{pmatrix}$ (for the sample number $n$).
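
As an illustration of the output defined above, here is a minimal numpy sketch that computes $\vec{y}^{(n)}$ for all examples at once. The toy sizes and the random `X` and `W` are assumptions made for the demo; storing one $\vec{w}_k$ per row of `W` is just one possible layout.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 5, 4, 3                      # toy sizes (assumptions for the demo)
X = rng.normal(size=(N, D))
X[:, 0] = 1.0                          # feature augmentation: "1" as first component
W = rng.normal(size=(K, D))            # one weight vector w_k per class (rows)

scores = X @ W.T                       # shape (N, K): entry (n, k) is w_k . x^(n)
Y = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

print(Y.sum(axis=1))                   # each row sums to 1 (up to rounding)
print(Y.argmax(axis=1))                # predicted labels: the most probable class
```

This direct formula is fine for small scores; it can overflow when the scores are large.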

The **true label (ground truth)** of example $\vec{x}^{(n)}$ is then encoded as a one-hot vector, so that if the example is of the second class, it may be written: $\vec{t}^{n} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$ (where $\vec{t}^{(n)}$ or $\vec{t}^{n}$ stands for **T**ruth and is shorter to write than $\vec{y}^{GT,(n)}$). More generally, the components $t_{n,k}$ of vector $\vec{t}_n$ may be written using the Kronecker delta: $t_{n,k} = \delta(k, k_{true}^n)$, where $k_{true}^n$ is the true class of example number $n$.

From now on, **we drop the superscript $a^{(n)}$ and instead write $a_n$ or just $a$**, when it's clear enough that the quantity $a$ relates to a single example, of generic index $n$. This helps to lighten the notation.

The Loss function that we should use is called the **cross-entropy loss function**, and is:

$$J = \frac1N \sum_n^N H(\vec{t}_{n}, \vec{y}_{n})$$

where the cross-entropy is a non-symmetric function: $$H(\vec{t}_{n}, \vec{y}_{n}) = -\sum_k^K t_{n,k} \log (y_{n,k})$$ 
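
A small numerical sketch of the loss above, assuming integer class labels starting at 0. The helper names `one_hot` and `cross_entropy_loss` and the three hand-written probability rows are illustrative only.

```python
import numpy as np

def one_hot(labels, n_classes):
    """t[n, k] = 1 if k is the true class of example n, else 0."""
    T = np.zeros((labels.size, n_classes))
    T[np.arange(labels.size), labels] = 1.0
    return T

def cross_entropy_loss(T, Y):
    """J = (1/N) sum_n H(t_n, y_n), with H(t, y) = -sum_k t_k log(y_k)."""
    return -np.mean(np.sum(T * np.log(Y), axis=1))

labels = np.array([1, 0, 2])            # true classes of three examples
T = one_hot(labels, n_classes=3)
Y = np.array([[0.2, 0.7, 0.1],          # network outputs, one row per example
              [0.5, 0.3, 0.2],
              [0.1, 0.1, 0.8]])
print(cross_entropy_loss(T, Y))
```

Because each $\vec{t}_n$ is one-hot, only the probability assigned to the true class contributes: here the loss equals $-\frac13(\log 0.7 + \log 0.5 + \log 0.8)$.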

Make sure you understand all of the above. Write down the loss function for the multi-class perceptron.

### Part 2.1
- What are the parameters of the model? **How many real numbers is that?** Count them in terms of $N$, $K$, $D$, etc.
- (3-4 points) **Derive the update steps for the gradient**. (you can get inspiration from TD4.1)
- Some Hints:
    - It is recommended to compute the quantity $\nabla_{w_\ell} y_k$ ($\ell\neq k$) and the quantity $\nabla_{w_k} y_k$. Try to express these simply, by recognizing $y$ when it appears. First treat the two cases separately, then try to combine them into a single expression, using the Kronecker delta: $\delta(i,j)= 1$ if $i=j$, else $0$.
    - When there is a sum $\sum_\ell f(w_\ell)$ and you differentiate with respect to $w_k$, only the term $f(w_k)$ contributes to the result.
    - In the sum above, $\sum_\ell f(w_\ell)$, the index $\ell$ is a "dummy" index: you can use any letter for it. Be careful not to use a letter that already exists outside the sum ($\ell$ is like a local variable, don't use the same name for a "global variable" from outside the function!)
    - For any functions $u,v$ that admit derivatives, $\partial_x \frac{u(x)}{v(x)} = \frac{u'(x)v(x)-u(x)v'(x)}{(v(x))^2}$. It extends to $\nabla_x$ without problem.
    - $\nabla_x \exp(u(x)) =  \exp(u(x)) \nabla_x u(x)$.
    - $\frac{a}{1+a} = 1- \frac{1}{1+a}$
    - $\partial_x \log(u(x)) = \frac{u'(x)}{u(x)}$ 
    - If you are really stuck, you can ask me (via Discord, in a private message) for the solution of $\nabla_{w_k} y_k$ and/or the solution for $\nabla_{w_\ell} y_k$ ($\ell\neq k$).
    - In the end, the update step for the parameters that you should find is: $$ \vec{w}_\ell \mapsto \vec{w}_\ell - \eta \frac1N \sum_n^N \vec{x}_n (y_{n,\ell} - \delta_{\ell, k_{true}^n})$$
    - If you cannot derive the equation above, you can skip this question and use the equation directly to write your program.
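
A derived gradient can be sanity-checked numerically before it is used for training. The sketch below assumes the gradient of the cross-entropy loss is $\nabla_{w_\ell} J = \frac1N \sum_n \vec{x}_n (y_{n,\ell} - t_{n,\ell})$, so that a descent step subtracts $\eta$ times this quantity, and compares it with central finite differences on random toy data. The function names and sizes are illustrative.

```python
import numpy as np

def softmax(A):
    A = A - A.max(axis=1, keepdims=True)    # shift each row for numerical stability
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def loss(W, X, T):
    return -np.mean(np.sum(T * np.log(softmax(X @ W.T)), axis=1))

def gradient(W, X, T):
    """Row l is grad_{w_l} J = (1/N) sum_n x_n (y_{n,l} - t_{n,l})."""
    Y = softmax(X @ W.T)
    return (Y - T).T @ X / X.shape[0]

rng = np.random.default_rng(0)
N, D, K = 20, 5, 3                           # toy sizes (assumptions)
X = rng.normal(size=(N, D))
T = np.eye(K)[rng.integers(0, K, size=N)]    # one-hot targets
W = 0.1 * rng.normal(size=(K, D))

# Central finite differences, one weight at a time (loops are fine for a check).
eps = 1e-6
numeric = np.zeros_like(W)
for l in range(K):
    for d in range(D):
        step = np.zeros_like(W)
        step[l, d] = eps
        numeric[l, d] = (loss(W + step, X, T) - loss(W - step, X, T)) / (2 * eps)

max_diff = np.abs(numeric - gradient(W, X, T)).max()
print("max difference:", max_diff)
```

If the analytical expression had the wrong sign or a misplaced index, the difference would be of the same order as the gradient itself instead of close to zero.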
    
    

### Part 2.2
- (3 points) **Think of all the functions you need to write**, and **put them in a class** (you can get inspiration from the solution of TP3.2): first write a class skeleton, and **only then write the methods** inside
- Hints:
    - there may be numerical errors (NaNs) because $\exp(..)$ is too large. You can mitigate this by noticing the following: for any positive constant $C$, we have $$\frac{ \exp( a_k  )}{\sum_\ell \exp (a_\ell) }  = \frac{C \exp( a_k  )}{C \sum_\ell \exp (a_\ell) }= \frac{\exp( a_k +\log C )}{\sum_\ell \exp (a_\ell +\log C) }$$
    - with this trick, when the arguments of the softmax are too large, you can simply subtract a large constant from all of them (i.e. take $\log C$ large and negative); this reduces the chances of numerical error without changing the result. It's a good idea to change the $w$'s with this kind of trick.
    - it's a good idea to encode the target labels (ground truth) as one-hot vectors (as said above): compute them once and for all, and you never have to compute them again. In practice, for an example with label $k_{true}$, the generic component number $k$ of the vector $\vec{t}$ reads: $t_{k} = \delta_{k, k_{true}}$
    - the initial $w$ should preferably be random (not all zeros), but not too big. A good idea is to have their dispersion be of order $1/D$ at most.
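
The shift trick from the hints can be sketched as follows, taking $\log C = -\max_\ell a_\ell$, which is one common choice and not the only valid one. The function names are illustrative, and `D = 785` assumes 28x28 images flattened to 784 pixels plus the constant "1" feature.

```python
import numpy as np

def softmax_naive(a):
    e = np.exp(a)
    return e / e.sum()

def softmax_shifted(a):
    # Choose log C = -max(a): the largest exponent becomes exp(0) = 1.
    e = np.exp(a - np.max(a))
    return e / e.sum()

a = np.array([800.0, 799.0, 790.0])
with np.errstate(over="ignore", invalid="ignore"):
    print(softmax_naive(a))      # overflow: inf / inf gives nan
print(softmax_shifted(a))        # finite probabilities that sum to 1

# Small random initial weights, with a spread of order 1/D (K classes, D features).
rng = np.random.default_rng(0)
K, D = 10, 785
W0 = rng.normal(scale=1.0 / D, size=(K, D))
```

The shifted version returns the same values as the naive one whenever the naive one does not overflow, since the constant cancels between numerator and denominator.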
    
For this question, the main goal is to write theoretically sound, reasonably clean code, using numpy array operations (`np.dot`) rather than loops, as much as possible. If you manage to do that, you will most likely have working (and fast!) code.
- (1 point) Test your algorithm on Fashion-MNIST: make a train / validation / test split, fit the model, compute the cross-validation error and the test error. Don't waste time on optimizing hyper-parameters (just take an $\eta$ small enough that you more or less converge). The goal is really to prove that your algorithm does not always crash :)
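
One possible shape for such a class, given as a minimal sketch and not as the expected solution. The class name `SoftmaxClassifier`, its method names, the values of `eta` and `n_iter`, and the toy Gaussian-blob data (a stand-in for Fashion-MNIST) are all assumptions. Training uses full-batch gradient descent on the cross-entropy loss.

```python
import numpy as np

class SoftmaxClassifier:
    """Multi-class perceptron with softmax outputs, trained by batch gradient descent."""

    def __init__(self, eta=0.1, n_iter=200, seed=0):
        self.eta = eta
        self.n_iter = n_iter
        self.seed = seed

    @staticmethod
    def _augment(X):
        # Prepend a constant "1" feature so that each w_k includes a bias term.
        return np.hstack([np.ones((X.shape[0], 1)), X])

    @staticmethod
    def _softmax(A):
        E = np.exp(A - A.max(axis=1, keepdims=True))   # shifted for stability
        return E / E.sum(axis=1, keepdims=True)

    def fit(self, X, labels):
        Xa = self._augment(X)
        N, D = Xa.shape
        K = int(labels.max()) + 1
        T = np.eye(K)[labels]                          # one-hot targets, computed once
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(scale=1.0 / D, size=(K, D))
        self.losses = []
        for _ in range(self.n_iter):
            Y = self._softmax(Xa @ self.W.T)
            self.losses.append(-np.mean(np.log(Y[np.arange(N), labels])))
            self.W -= self.eta * (Y - T).T @ Xa / N    # gradient-descent step
        return self

    def predict_proba(self, X):
        return self._softmax(self._augment(X) @ self.W.T)

    def predict(self, X):
        return self.predict_proba(X).argmax(axis=1)

    def score(self, X, labels):
        return np.mean(self.predict(X) == labels)

# Toy data: three Gaussian blobs in 2D (a stand-in, not Fashion-MNIST).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
labels = rng.integers(0, 3, size=300)
X = centers[labels] + rng.normal(scale=0.7, size=(300, 2))

clf = SoftmaxClassifier(eta=0.1, n_iter=200).fit(X[:200], labels[:200])
print("train accuracy:", clf.score(X[:200], labels[:200]))
print("held-out accuracy:", clf.score(X[200:], labels[200:]))
```

The only Python loop is over iterations; everything inside it is array operations. For image data, rescaling the pixel values (for example to $[0,1]$) before fitting would probably be needed for a fixed $\eta$ to behave well.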

In [260]:
## remark:
import numpy as np
print(np.exp(100))

2.6881171418161356e+43


In [261]:
print(np.exp(800)- np.exp(800))

nan


