# Introduction
## 6.1  Define the Task

#### Dataset Information
The dataset is CIFAR-100, loaded through `tensorflow.keras.datasets`. This is a multi-class classification problem with 100 image classes. Each class has 500 training images and 100 test images, so the dataset is balanced.

#### Accuracy Metric
For a balanced classification problem such as this, accuracy and the area under the receiver operating characteristic curve (ROC AUC) are good measures of success. Always guessing a single class would yield an accuracy of 1%, since there are 100 equally sized classes.
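
As a quick sanity check on the 1% figure, the sketch below builds a hypothetical balanced label vector shaped like the test set (100 images for each of 100 classes) and scores a classifier that always predicts class 0. The array `labels` is a synthetic stand-in, not the real CIFAR-100 labels.

```python
import numpy as np

num_classes = 100
images_per_class = 100  # per-class test count from the dataset description

# Synthetic stand-in for a balanced label vector (not the real CIFAR-100 labels).
labels = np.repeat(np.arange(num_classes), images_per_class)

# A trivial classifier that always guesses class 0.
constant_predictions = np.zeros_like(labels)

chance_accuracy = float(np.mean(constant_predictions == labels))
print(f"accuracy of always guessing one class: {chance_accuracy:.2%}")
```

Any model worth keeping should clear this chance level comfortably.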

#### Evaluation
The model will be evaluated using a holdout test set of 10,000 images consisting of 100 of each class. The training set will consist of 500 images of each class for a total of 50,000 images.




In [1]:
### Show data class information metrics
## show that the counts of the labels are balanced

In [2]:
from tensorflow.keras.datasets import cifar100

This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).

## 6.2 Develop a Model
### Data Preparation

The data consists of 32x32 RGB images (three color channels). There are 50,000 training images and 10,000 test images.

In [3]:
(train_images, train_labels), (test_images, test_labels) = cifar100.load_data()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz


In [4]:
print('tensor shape')
print('\ttraining images:', train_images.shape)
print('\ttraining labels:', train_labels.shape)
print('\ttraining images dtype:', train_images.dtype)
print('\ttraining labels dtype:', train_labels.dtype)
print('\ttest images:', test_images.shape)
print('\ttest labels:', test_labels.shape)

tensor shape
	training images: (50000, 32, 32, 3)
	training labels: (50000, 1)
	training images dtype: uint8
	training labels dtype: int32
	test images: (10000, 32, 32, 3)
	test labels: (10000, 1)
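
One way to check the claim that the classes are balanced is to count how often each integer label occurs. The sketch below is a minimal illustration: `class_counts` is a hypothetical helper, and `fake_labels` is a synthetic array that only mimics the `(n, 1)` layout of `train_labels`.

```python
import numpy as np

def class_counts(labels, num_classes=100):
    """Count how many examples carry each integer class label."""
    # Labels arrive with shape (n, 1), so flatten before counting.
    return np.bincount(np.asarray(labels).ravel(), minlength=num_classes)

# Synthetic stand-in with the same (n, 1) layout as train_labels.
rng = np.random.default_rng(0)
fake_labels = rng.permutation(np.repeat(np.arange(100), 500)).reshape(-1, 1)

counts = class_counts(fake_labels)
is_balanced = bool(counts.min() == counts.max())
print("min count:", counts.min(), "max count:", counts.max(), "balanced:", is_balanced)
```

With the real data, `class_counts(train_labels)` would need to run before the labels are one-hot encoded, since it expects integer labels.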


#### Preprocess
1. Reshape each 32x32x3 image into a flat vector of 3,072 values
2. Cast the vector to floats
3. Rescale pixel values from [0, 255] to [0, 1]

In [6]:
# Flatten each image to a vector, cast to float32, and scale pixel values to [0, 1]
train_images_flat = train_images.reshape((len(train_images), -1)).astype('float32') / 255.
test_images_flat = test_images.reshape((len(test_images), -1)).astype('float32') / 255.


In [7]:
train_images_flat[1]

array([1., 1., 1., ..., 1., 1., 1.], dtype=float32)

In [8]:
train_labels

array([[19],
       [29],
       [ 0],
       ...,
       [ 3],
       [ 7],
       [73]])

In [9]:
from tensorflow.keras.utils import to_categorical

orig_label = train_labels[0]
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)


In [10]:

print(f"'{orig_label}' as one-hot vector:\n{train_labels[0]}")

'[19]' as one-hot vector:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0.]
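
To make the encoding concrete, here is a minimal NumPy sketch of what one-hot encoding does to a column of integer labels. The helper `one_hot` is a hypothetical illustration, not the Keras implementation of `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels of shape (n, 1) or (n,) into an (n, num_classes) float matrix."""
    flat = np.asarray(labels).ravel()
    encoded = np.zeros((flat.size, num_classes), dtype="float32")
    # Put a 1 in the column that matches each label.
    encoded[np.arange(flat.size), flat] = 1.0
    return encoded

sample = np.array([[19], [29], [0]])  # the first three training labels
encoded = one_hot(sample, num_classes=100)
print(encoded.shape)           # (3, 100)
print(encoded[0].argmax())     # 19
```

Taking `argmax` along axis 1 recovers the original integer labels, which is handy when a metric needs class indices rather than one-hot vectors.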


### Baseline Model
Start small: use the fewest layers and the fewest units that can do the job, then increase the size.

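A model only counts as a baseline if it has "statistical power", meaning it beats a trivial baseline. Here that means scoring above the 1% accuracy obtained by always guessing one class.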

**Selecting loss function**
Chose categorical crossentropy because it is the standard loss for multi-class, single-label classification.
ROC AUC cannot be optimized directly because it is not differentiable, so crossentropy is used as a differentiable proxy. The hope is that the lower the crossentropy, the higher the ROC AUC will be.

**Final Layer**  
Multi-class single label classification => 
* last layer activation: **softmax**  
* Loss function: **categorical_crossentropy**
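
The pairing of softmax with categorical crossentropy can be sketched in NumPy. The function names below are illustrative and are not the Keras internals. The example also shows that a model spreading its probability evenly over 100 classes scores a loss of ln(100) ≈ 4.605, a useful reference point when reading the training loss.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(one_hot_targets, probabilities, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    clipped = np.clip(probabilities, eps, 1.0)
    return float(-np.mean(np.sum(one_hot_targets * np.log(clipped), axis=-1)))

num_classes = 100
targets = np.eye(num_classes)[[19, 29, 0]]  # three one-hot targets

# All-zero logits give a uniform distribution over the classes.
uniform_probs = softmax(np.zeros((3, num_classes)))
uniform_loss = categorical_crossentropy(targets, uniform_probs)
print(round(uniform_loss, 3))  # 4.605
```

A training loss that stays near 4.6 suggests the network has not yet learned anything beyond uniform guessing.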


In [42]:
# build
from tensorflow.keras import models, layers

# create empty network
network = models.Sequential()

# add 2 layers
network.add(layers.Dense(128, activation='relu', input_shape=(32 * 32 * 3, )))
network.add(layers.Dense(100, activation='softmax'))


In [43]:
network.compile(optimizer='rmsprop',
               loss='categorical_crossentropy', 
               metrics=['accuracy'])


In [44]:
network.fit(train_images_flat, train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x216857907c0>

### 6.2.4 Develop a model that overfits
Once the model has statistical power, the question becomes: is the model sufficiently powerful?  
Does it have enough layers and enough parameters to properly model the problem at hand?  
The universal tension in ML is between optimization and generalization.  

To figure out how big a model is needed, first develop a model that overfits:  
1. Add layers
2. Make the layers bigger
3. Train for more epochs

Monitor the training loss and validation loss, as well as the training and validation values of any metrics that you care about.  

**Overfitting:** the point at which the model's performance on validation data begins to degrade.
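
One way to read the overfitting point off a training run is to look for the epoch with the lowest validation loss. The loss values below are made up for illustration. In Keras they would typically come from the `history` dictionary of the object returned by `fit` when validation data is supplied.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

# Hypothetical loss curves, invented for illustration only.
train_loss = [4.10, 3.70, 3.40, 3.15, 2.95, 2.80]
val_loss = [4.05, 3.75, 3.60, 3.58, 3.66, 3.79]

epoch = best_epoch(val_loss)
# If the minimum is not the final epoch, validation loss rose afterwards.
overfitting = epoch < len(val_loss)
print(f"validation loss bottoms out at epoch {epoch}; overfitting afterwards: {overfitting}")
```

In this made-up run the training loss keeps falling while the validation loss turns upward after epoch 4, which is the signature described above.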




### 6.2.5 Hyperparameter tuning

The goal now is to maximize generalization performance.  
Repeatedly modify the model, train it, and evaluate it (**on validation data, NOT test data**).  



**Search Space**
1. Learning rate
2. Units per layer
3. Number of layers
    * Add or remove layers
4. Dropout
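
The search space above can be enumerated as a simple grid. The candidate values and dictionary keys below are hypothetical placeholders, not settings used or recommended in this notebook.

```python
from itertools import product

# Hypothetical candidate values, chosen only to illustrate the structure.
search_space = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "units": [128, 256, 512],
    "num_hidden_layers": [1, 2, 3],
    "dropout_rate": [0.0, 0.25, 0.5],
}

# Every combination of one value per hyperparameter.
configurations = [
    dict(zip(search_space, values))
    for values in product(*search_space.values())
]

print(len(configurations))  # 81
print(configurations[0])
```

Each configuration would be trained and scored on validation data, with the test set left untouched until a final model is chosen.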