# CIFAR-10: Neural nets vs classic algorithms

In this notebook we are going to compare the performance of neural networks against classic algorithms on the same task.

## The dataset:
As mentioned, we are going to use the CIFAR-10 data. Let's take a look:

The dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. These are the image classes:
- airplane 
- automobile 
- bird
- cat
- deer
- dog
- frog
- horse
- ship
- truck

In [3]:
import tensorflow as tf

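# Enable memory growth on every visible GPU; the loop is a no-op on CPU-only machines.
physical_devices = tf.config.experimental.list_physical_devices('GPU')
for device in physical_devices:
    tf.config.experimental.set_memory_growth(device, True)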

tf.keras.backend.clear_session()  # For easy reset of notebook state.

from tensorflow import keras
from tensorflow.keras.datasets import cifar10

num_classes = 10
# The data, split between train and test sets:
(trainX, y_train), (testX, y_test) = cifar10.load_data()
x_train = trainX.reshape(trainX.shape[0],-1)
x_test = testX.reshape(testX.shape[0], -1)

print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train shape: (50000, 3072)
50000 train samples
10000 test samples


<font color=red><b>Plot some examples of the dataset.
<br>Hint: use the imshow function of the pyplot package</b>
</font>

In [None]:
from matplotlib import pyplot as plt
%matplotlib inline
...
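
As a starting point for the exercise above, here is a minimal sketch of a plotting helper. The names `plot_examples`, `CLASS_NAMES`, `demo_images` and `demo_labels` are hypothetical and not part of the notebook, and random pixels stand in for the real images so the snippet runs on its own.

```python
import numpy as np
from matplotlib import pyplot as plt

# Class names in CIFAR-10 label order (index 0 = airplane, ..., 9 = truck).
CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

def plot_examples(images, labels, n_rows=2, n_cols=5):
    """Show the first n_rows * n_cols images, titled with their class names.

    images: array of shape (N, 32, 32, 3)
    labels: integer class ids, shape (N,) or (N, 1)
    """
    labels = np.asarray(labels).reshape(-1)
    fig, axes = plt.subplots(n_rows, n_cols,
                             figsize=(2 * n_cols, 2 * n_rows), squeeze=False)
    for i, ax in enumerate(axes.flat):
        ax.imshow(images[i])
        ax.set_title(CLASS_NAMES[int(labels[i])])
        ax.axis("off")
    fig.tight_layout()
    return fig

# Stand-in data (assumption): random pixels instead of the real dataset.
rng = np.random.default_rng(0)
demo_images = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)
demo_labels = np.arange(10).reshape(-1, 1)
fig = plot_examples(demo_images, demo_labels)
```

In the notebook, the unflattened `trainX` would be passed as `images`. Because `y_train` has been one-hot encoded by that point, something like `y_train.argmax(axis=1)` would recover the integer labels.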

## Classic approach
One of the main takeaways from deep learning is that it extracts features from the data on its own. We are working with images, so to reduce the disadvantage of the classic algorithms, let's first extract some useful features. We will do the feature extraction with PCA and then test algorithms such as random forest, SVM, k-NN and logistic regression on the result.

### PCA

In [5]:
import numpy as np
from sklearn.metrics import accuracy_score

<font color=red><b>Fit a PCA to the data and find the smallest number of components that explains at least 95% of the variance
<br>Hint: find the appropriate attribute on the PCA object</b>
</font>

In [6]:
from sklearn.decomposition import PCA
...
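
One possible way to approach this step is to fit a full PCA and read the cumulative sum of `explained_variance_ratio_`. The sketch below uses synthetic stand-in data (`demo_x`, an assumption) rather than `x_train`, so it runs on its own; `pca_full`, `cumulative` and `k` are illustrative names.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data (assumption): 200 samples, 20 features with decreasing scale.
# In the notebook this would be x_train, of shape (50000, 3072).
rng = np.random.default_rng(0)
demo_x = rng.normal(size=(200, 20)) * np.linspace(5.0, 0.1, 20)

# Fit a PCA that keeps every component, then accumulate the variance ratios.
pca_full = PCA().fit(demo_x)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Index of the first cumulative value >= 0.95, plus one, is the component count.
k = int(np.searchsorted(cumulative, 0.95) + 1)
print("components needed:", k)
```

scikit-learn also accepts a float, as in `PCA(n_components=0.95)`, and then exposes the chosen count as `n_components_`; either route should give the same number.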

<font color=red><b>Now create the PCA with the number of components found above. Fit it on the training data and use it to transform the test data as well.
<br>Hint: Use whiten=True</b>
</font>

In [None]:
## Applying PCA with k calculated above
...
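
A minimal sketch of the fit/transform pattern, under the assumption that the PCA is fitted on the training split only and then reused for the test split. The arrays `demo_train` and `demo_test` and the value of `k` are placeholders, not values from the notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data (assumption); in the notebook these would be x_train and x_test.
rng = np.random.default_rng(0)
demo_train = rng.normal(size=(200, 20))
demo_test = rng.normal(size=(50, 20))
k = 5  # placeholder; use the component count found in the previous step

pca = PCA(n_components=k, whiten=True)
train_pca = pca.fit_transform(demo_train)  # learn the projection on training data only
test_pca = pca.transform(demo_test)        # reuse the same projection on the test data

print(train_pca.shape, test_pca.shape)
```

With `whiten=True` each projected training component is rescaled to unit variance, which can help distance-based models such as k-NN.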

<font color=red><b>  Train and evaluate the random forest
<br>Hint: What data are you using?</b>
</font>

In [None]:
from sklearn.ensemble import RandomForestClassifier
...
random_forest_score
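
The hint above may be pointing at two things: the features should be the PCA-transformed ones, and the labels were one-hot encoded earlier in the notebook. scikit-learn treats a 2-D one-hot label matrix as a multi-output problem, and `accuracy_score` then reports exact-match accuracy over whole rows, so converting back to integer class ids with `argmax` is one way to get a plain multiclass score. The sketch below uses synthetic stand-in data; all names in it are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Stand-in data (assumption): 3 well-separated classes in 5 dimensions.
rng = np.random.default_rng(0)
n_classes, n_per_class = 3, 60
centers = 10.0 * np.eye(n_classes, 5)
labels = np.repeat(np.arange(n_classes), n_per_class)
features = centers[labels] + rng.normal(size=(labels.size, 5))
one_hot = np.eye(n_classes)[labels]  # mimics keras.utils.to_categorical output

# Shuffle, then hold out the last 60 samples for evaluation.
order = rng.permutation(labels.size)
train_idx, test_idx = order[:120], order[120:]

# argmax(axis=1) turns each one-hot row back into an integer class id.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(features[train_idx], one_hot[train_idx].argmax(axis=1))
pred = clf.predict(features[test_idx])
demo_score = accuracy_score(one_hot[test_idx].argmax(axis=1), pred)
print("accuracy:", demo_score)
```

The same fit, predict and score pattern applies to any other scikit-learn classifier.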

<font color=red><b>  Train and evaluate the knn
<br>Hint: What data are you using?</b>
</font>

In [None]:
from sklearn.neighbors import KNeighborsClassifier
...
knn_score

<font color=red><b>  Train and evaluate the logistic regression
<br>Hint: What data are you using?</b>
</font>

In [None]:
from sklearn.linear_model import LogisticRegression
...
logistic_regression_score

<font color=red><b>Train and evaluate the SVM
<br>Hint: What data are you using?
<br>Hint: Don't do it. It takes forever.
<br>Hint: Really, please don't. It doesn't improve that much.</b>
</font>

### Results:

In [19]:
print("RandomForestClassifier : ", random_forest_score)
print("K Nearest Neighbors : ", knn_score)
print("Logistic Regression : ", logistic_regression_score)
# print("Support Vector Classifier : ", svc_score) # 0.4838

RandomForestClassifier :  0.0048
K Nearest Neighbors :  0.106
Logistic Regression :  0.402
Support Vector Classifier :  0.4838


## The neural nets
Will a neural net be able to improve on those results? Let's see.

<font color=red><b>Build whatever architecture you like for the neural network. Is it better or worse than the classic algorithms?</b>
</font>
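
As one possible starting point, here is a small fully connected network over the flattened 3072-value inputs. The layer sizes, optimizer and loss are assumptions chosen for illustration, not a recommended architecture; a convolutional network on the unflattened images is another option.

```python
from tensorflow import keras

num_classes = 10
input_dim = 32 * 32 * 3  # 3072 flattened pixel values per image

# A plain multilayer perceptron; the layer widths are arbitrary choices.
model = keras.Sequential([
    keras.Input(shape=(input_dim,)),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(num_classes, activation="softmax"),
])

# categorical_crossentropy matches the one-hot labels produced by to_categorical.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Training would then be a call such as `model.fit(x_train / 255.0, y_train, validation_data=(x_test / 255.0, y_test))` with an epoch count and batch size of your choosing; dividing by 255 scales the pixel values to the range [0, 1].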