# ResNet Test: create_dataset()
This file tests the **create_dataset()** function in resnet_lib.py.

The **create_dataset()** function loads TFRecord files and constructs a dataset from them.

The experiment loads the CIFAR-10 dataset with both create_dataset() and the Keras built-in loader. If create_dataset() works properly, the two datasets should have the same type and size, and training the same model on each should give similar results.

The results of the test are in this [**Link**](https://docs.google.com/document/d/1EK9qNcKYkrYBxs-wUujxkuLCt5x21l8CccGkfoz2Njg/edit?usp=sharing)

In [17]:
from keras.utils import np_utils
from keras.datasets import cifar10
import numpy as np
from resnet_lib import create_dataset, create_model, test_callback, tune_model, visualize_model

In [8]:
# Global constants
NUM_CLASSES = 10
IMAGE_SIZE = 32
BATCH_SIZE = 64
EPOCHS = 10

# Directories
INPUT_TF_TRAIN = '../Mixmatch/ML_DATA/cifar10-train.tfrecord'
INPUT_TF_TEST = '../Mixmatch/ML_DATA/cifar10-test.tfrecord'

In [9]:
# Create dataset using both methods

# The control group: create dataset using the Keras built-in function
(X_keras_train, y_keras_train), (X_keras_test, y_keras_test) = cifar10.load_data()
y_keras_train = np_utils.to_categorical(y_keras_train, NUM_CLASSES)
y_keras_test = np_utils.to_categorical(y_keras_test, NUM_CLASSES)

# The experiment group: create dataset using create_dataset()
X_train, y_train = create_dataset(INPUT_TF_TRAIN)
X_test, y_test = create_dataset(INPUT_TF_TEST)
y_train = np_utils.to_categorical(y_train, NUM_CLASSES)
y_test = np_utils.to_categorical(y_test, NUM_CLASSES)

## Experiment 1
Check the shape and range of the datasets

In [23]:
print('X_train shape: ', X_train.shape, X_keras_train.shape)
print('y_train shape: ', y_train.shape, y_keras_train.shape)
print('image range: ', (np.amin(X_train), np.amax(X_train)), (np.amin(X_keras_train), np.amax(X_keras_train)))

X_train shape:  (50000, 32, 32, 3) (50000, 32, 32, 3)
y_train shape:  (50000, 10) (50000, 10)
image range:  (0, 255) (0, 255)
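
The printed values above are compared by eye. A minimal sketch of the same check done programmatically follows. The helper names `summarize` and `compare` are hypothetical and not part of resnet_lib.py, and the arrays are small synthetic stand-ins rather than real CIFAR-10 data. It also compares the dtype and the per-class sample counts, which the printout does not cover and which stay the same even if the two loaders return samples in a different order.

```python
import numpy as np


def summarize(X, y_onehot):
    """Return shape, dtype, value range, and per-class counts for one dataset."""
    return {
        "X_shape": X.shape,
        "y_shape": y_onehot.shape,
        "dtype": X.dtype,
        "range": (int(X.min()), int(X.max())),
        # One-hot labels sum column-wise to the number of samples per class.
        "class_counts": y_onehot.sum(axis=0).astype(int).tolist(),
    }


def compare(summary_a, summary_b):
    """Return the keys whose values differ between two summaries."""
    return [k for k in summary_a if summary_a[k] != summary_b[k]]


# Synthetic stand-ins for the two loaders (NOT real CIFAR-10 data).
rng = np.random.default_rng(0)
X_a = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
labels = rng.integers(0, 10, size=100)
y_a = np.eye(10)[labels]

# Same samples in a different order, as a second loader might return them.
perm = rng.permutation(100)
X_b, y_b = X_a[perm], y_a[perm]

mismatches = compare(summarize(X_a, y_a), summarize(X_b, y_b))
print(mismatches)
```

In this file the call would presumably be `compare(summarize(X_train, y_train), summarize(X_keras_train, y_keras_train))`, with an empty list meaning the two datasets agree on every summarized property.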


## Experiment 2
Train the same model on both datasets; the results should be similar.

In [None]:
# Train on the create_dataset dataset
model = create_model(IMAGE_SIZE, NUM_CLASSES, lr=0.0001)
model.fit(x=X_train, y=y_train, epochs=EPOCHS, batch_size=BATCH_SIZE, verbose=0, validation_split=0.1, validation_freq=1, callbacks=[test_callback((X_test, y_test))])

# Train on the keras dataset
model = create_model(IMAGE_SIZE, NUM_CLASSES, lr=0.0001)
model.fit(x=X_keras_train, y=y_keras_train, epochs=EPOCHS, batch_size=BATCH_SIZE, verbose=0, validation_split=0.1, validation_freq=1, callbacks=[test_callback((X_keras_test, y_keras_test))])
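
Both `model.fit` calls above discard their return value. Keras returns a `History` object from `fit`, and its `.history` attribute is a dict that maps each metric name to a list of per-epoch values. Assuming the two return values are captured, a rough sketch of a "similar results" check could look like the following. The function names, the tolerance, and the numbers in the dicts are placeholders for illustration only, not measured results, and the available keys beyond the loss depend on how `create_model` compiles the model.

```python
def final_metric_gap(history_a, history_b, key="val_loss"):
    """Absolute difference between the last-epoch values of one metric."""
    return abs(history_a[key][-1] - history_b[key][-1])


def similar(history_a, history_b, key="val_loss", tol=0.05):
    """True if the final values of `key` differ by at most `tol`."""
    return final_metric_gap(history_a, history_b, key) <= tol


# Placeholder dicts shaped like `History.history`; the values are made up.
hist_tfrecord = {"val_loss": [1.9, 1.5, 1.2]}
hist_keras = {"val_loss": [1.8, 1.5, 1.23]}

print(similar(hist_tfrecord, hist_keras))
```

Because weight initialization and batch shuffling are random, two runs on identical data will not match exactly either, so the tolerance should be chosen with that run-to-run variation in mind.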