# Introduction
For our final project we are using the CIFAR-10 dataset. It consists of 60,000 RGB images, each 32x32 pixels, divided evenly into ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. 50,000 images are designated for training and 10,000 for testing, so each class has 6,000 images in total: 5,000 for training and 1,000 for testing. The labels are integers 0-9, one per class.
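In the Python version of CIFAR-10, each image is stored as one row of 3,072 `uint8` values: the first 1,024 are the red channel, the next 1,024 green, and the last 1,024 blue, each a 32x32 plane in row-major order. A direct reshape of a row to (32, 32, 3) would therefore interleave the channel planes incorrectly. The minimal NumPy sketch below (the helper name `rows_to_images` is our own) recovers the planes first and then moves channels to the last axis, checked on a synthetic row.

```python
import numpy as np

# CIFAR-10 rows are channel-first: 1024 red, then 1024 green, then 1024 blue
# values, each plane stored row-major as 32x32.
HEIGHT, WIDTH, CHANNELS = 32, 32, 3

def rows_to_images(rows):
    """Convert an (N, 3072) array of CIFAR-10 rows to (N, 32, 32, 3) images."""
    rows = np.asarray(rows)
    # Recover (N, channels, height, width) first, then move channels last.
    return rows.reshape(-1, CHANNELS, HEIGHT, WIDTH).transpose(0, 2, 3, 1)

# Synthetic row: red plane all 10, green plane all 20, blue plane all 30.
row = np.concatenate([np.full(1024, 10), np.full(1024, 20), np.full(1024, 30)])
images = rows_to_images(row[np.newaxis, :])
print(images.shape)     # (1, 32, 32, 3)
print(images[0, 0, 0])  # [10 20 30]
```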

# Plan
Our goal is to build a convolutional neural network that achieves the highest possible accuracy on the test set. One concern is whether the dataset is too large for the computing power we currently have. Several paid options are available, but computing time remains a concern for us.

First, we plan to feed our data into the basic MNIST CNN we have been using in class and make changes from there. We would also like to look into data augmentation, that is, generating new training images from the ones we already have, and see whether it helps training.
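Data augmentation could start from two common transforms: random horizontal flips and random shifts (random crops from a zero-padded copy). The sketch below is a minimal NumPy version under our own assumptions: images are already in (N, 32, 32, 3) layout, and the helper name `augment_batch` and the 4-pixel pad are illustrative choices, not part of the notebook.

```python
import numpy as np

def augment_batch(images, rng, pad=4):
    """Randomly shift and flip a batch of (N, H, W, C) images.

    Shifts are random h x w crops from a zero-padded copy; each image is
    also flipped left-right with probability 0.5.
    """
    n, h, w, _ = images.shape
    out = np.empty_like(images)
    # Zero-pad height and width so a random crop shifts the image content.
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="constant")
    for i in range(n):
        top = rng.integers(0, 2 * pad + 1)
        left = rng.integers(0, 2 * pad + 1)
        crop = padded[i, top:top + h, left:left + w]
        if rng.random() < 0.5:
            crop = crop[:, ::-1]  # horizontal flip (reverse the width axis)
        out[i] = crop
    return out

rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(8, 32, 32, 3)).astype(np.uint8)
augmented = augment_batch(batch, rng)
print(augmented.shape, augmented.dtype)  # (8, 32, 32, 3) uint8
```

Augmented batches would typically be generated fresh each epoch, so the network rarely sees exactly the same image twice.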

## Project Status
We got an initial model running, but its accuracy was low and variable, ranging from 20% to 30%. That is still well above chance: random guessing among ten classes would give about 10% accuracy. The model was the generic CNN we had used for the MNIST data, which may explain the weak performance, since MNIST consists of single-channel grayscale digits and that network was not designed with color images in mind. We then used some of the starter code from the in-class lab to build a model aimed at more accurate predictions.

During early training, the training accuracy at one point dropped from 1.0 to 0.10. Small batch sizes, such as 64, produced this kind of erratic behavior. Increasing the batch size to around 1,000 significantly increased both training and testing accuracy.

One possible explanation for this sharp drop is that the network was fed batches that happened to contain images mostly from the same class, so it performed poorly once the next batch contained different classes. We therefore think that combining dropout with larger batch sizes may remove this issue. Preliminary testing suggests that dropout does not help, though we have not experimented with it enough to say definitively.
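Another factor worth checking is the batching code itself. A batch function that reseeds NumPy with a value depending only on the epoch (as `np.random.seed(epoch * batch_size)` would) returns the same indices on every iteration of that epoch, so the network fits one small batch repeatedly, and a training accuracy measured on that same batch can approach 1.0 before falling when the next epoch draws a new batch. The sketch below reproduces that seeding pattern on index arithmetic alone (the helper name is ours), and also shows that a uniformly random batch of 64 drawn from balanced labels is not dominated by a single class.

```python
import numpy as np

N_TRAIN = 50000

def epoch_seeded_indices(epoch, batch_size):
    # The seed depends only on epoch and batch_size, never on the iteration.
    np.random.seed(epoch * batch_size)
    return np.random.randint(N_TRAIN, size=batch_size)

first = epoch_seeded_indices(epoch=3, batch_size=64)
second = epoch_seeded_indices(epoch=3, batch_size=64)
print(np.array_equal(first, second))  # True: every call in epoch 3 gives the same batch

# Class counts of one random batch of 64 drawn from balanced labels
# (5,000 per class, as in the CIFAR-10 training set).
labels = np.repeat(np.arange(10), 5000)
rng = np.random.default_rng(0)
counts = np.bincount(labels[rng.integers(0, N_TRAIN, size=64)], minlength=10)
print(counts)
```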

## Further Work
For the rest of the project we plan to dig deeper into the specific difficulties of this dataset and work out what the model needs to succeed. Rearranging the existing convolutional and pooling layers did not seem to help, so we are considering adding more convolutional and pooling layers.
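Adding layers changes the spatial size reaching the fully connected layer, and the `pool3_flat` reshape hardcodes `pool3_fmaps * 8 * 8`. The small helper below (names are ours; it assumes square inputs) applies TensorFlow's output-size rules, `ceil(in / stride)` for `"SAME"` and `ceil((in - kernel + 1) / stride)` for `"VALID"`, to recompute the flatten length for the current stack and for one hypothetical deeper stack.

```python
import math

def conv_out(size, kernel, stride, padding):
    """Spatial output size of a conv or pool layer under TensorFlow padding rules."""
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError(f"unknown padding: {padding}")

def flattened_size(input_size, layers, last_fmaps):
    """Apply (kernel, stride, padding) layers in order and return the flatten length."""
    size = input_size
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
    return size * size * last_fmaps

# Current stack: conv1 (3, 1, SAME), conv2 (3, 2, SAME), pool3 (2, 2, VALID).
current = [(3, 1, "SAME"), (3, 2, "SAME"), (2, 2, "VALID")]
print(flattened_size(32, current, 64))  # 4096 == 64 * 8 * 8

# Hypothetical deeper stack: one more stride-1 conv with 128 maps, then another pool.
deeper = current + [(3, 1, "SAME"), (2, 2, "VALID")]
print(flattened_size(32, deeper, 128))  # 2048 == 128 * 4 * 4
```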

In [6]:
import numpy as np
import seaborn as sns
import tensorflow as tf  # this notebook uses the TensorFlow 1.x API (tf.placeholder, tf.layers, tf.contrib)
# The __future__ import is not needed on Python 3 (and must be the first statement in a cell).

In [7]:
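def unpickle(file):
    """Load one CIFAR-10 python batch file into a dict with bytes keys."""
    import pickle
    with open(file, 'rb') as fo:
        batch = pickle.load(fo, encoding='bytes')  # avoid shadowing the built-in dict
    return batch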

In [8]:
data_path = "cifar-10-batches-py/data_batch_"
batches = []

for i in range(1, 6):
    batches.append(unpickle(data_path + str(i)))
    
test_path = "cifar-10-batches-py/test_batch"
test = unpickle(test_path)

In [9]:
# Stack the five training batches into one (50000, 3072) array and one label vector.
X_train = np.concatenate([b[b'data'] for b in batches])
y_train = np.concatenate([b[b'labels'] for b in batches])
    
test_data = test[b'data']
test_labels = test[b'labels']

X_test_full = test_data
y_test_full = np.array(test_labels)
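`X_train` and `X_test_full` hold raw `uint8` pixel values in [0, 255], and the graph feeds them into the network unscaled. Standardizing inputs is a common step that often makes training with Adam steadier. The sketch below (the helper name `standardize` is ours, and per-feature statistics are one choice among several) scales to [0, 1], then subtracts the training-set mean and divides by its standard deviation, reusing the same statistics for every other array.

```python
import numpy as np

def standardize(X_train, *others):
    """Scale uint8 rows to [0, 1], then standardize each feature with
    training-set statistics; apply the same statistics to `others`."""
    train = X_train.astype(np.float32) / 255.0
    mean = train.mean(axis=0)
    std = train.std(axis=0) + 1e-7  # guard against constant features
    scaled = [(train - mean) / std]
    for X in others:
        scaled.append((X.astype(np.float32) / 255.0 - mean) / std)
    return scaled

# Synthetic stand-ins for the CIFAR-10 row arrays.
rng = np.random.default_rng(0)
fake_train = rng.integers(0, 256, size=(100, 3072)).astype(np.uint8)
fake_test = rng.integers(0, 256, size=(20, 3072)).astype(np.uint8)
train_s, test_s = standardize(fake_train, fake_test)
print(train_s.shape, test_s.shape)  # (100, 3072) (20, 3072)
```

In the notebook this would be applied before the test/validation split, e.g. `X_train, X_test_full = standardize(X_train, X_test_full)`. Computing the statistics on the training set only keeps test information out of preprocessing.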

In [10]:
# Randomly split the 10,000 official test images in half: one half for
# testing, the other for validation. A permutation avoids the slow
# per-element membership test and guarantees the halves are disjoint.
shuffled = np.random.permutation(X_test_full.shape[0])
half = X_test_full.shape[0] // 2
indices, other_indices = shuffled[:half], shuffled[half:]

X_test = X_test_full[indices]
y_test = y_test_full[indices]

X_val = X_test_full[other_indices]
y_val = y_test_full[other_indices]

In [11]:
height = 32
width = 32
channels = 3
n_inputs = height * width * channels

conv1_fmaps = 32
conv1_ksize = 3
conv1_stride = 1
conv1_pad = "SAME"

conv2_fmaps = 64
conv2_ksize = 3
conv2_stride = 2
conv2_pad = "SAME"

pool3_fmaps = conv2_fmaps

n_fc1 = 64
n_outputs = 10

In [12]:
tf.reset_default_graph()

he_init = tf.contrib.layers.variance_scaling_initializer()

with tf.name_scope("inputs"):
    X = tf.placeholder(tf.float32, shape=[None, n_inputs], name="X")
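    # CIFAR-10 rows are channel-first (1024 red, 1024 green, 1024 blue values),
    # so reshape to [N, C, H, W] and then transpose to channels-last [N, H, W, C].
    X_reshaped = tf.transpose(tf.reshape(X, shape=[-1, channels, height, width]), perm=[0, 2, 3, 1])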
    y = tf.placeholder(tf.int32, shape=[None], name="y")

conv1 = tf.layers.conv2d(X_reshaped, filters=conv1_fmaps, kernel_size=conv1_ksize,
                         strides=conv1_stride, padding=conv1_pad,
                         activation=tf.nn.relu, name="conv1")
conv2 = tf.layers.conv2d(conv1, filters=conv2_fmaps, kernel_size=conv2_ksize,
                         strides=conv2_stride, padding=conv2_pad,
                         activation=tf.nn.relu, name="conv2")

with tf.name_scope("pool3"):
    pool3 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
    pool3_flat = tf.reshape(pool3, shape=[-1, pool3_fmaps * 8 * 8])

with tf.name_scope("fc1"):
    fc1 = tf.layers.dense(pool3_flat, n_fc1, kernel_initializer=he_init, activation=tf.nn.relu, name="fc1")

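# Dropout (currently disabled). tf.layers.dropout only drops units when training=True;
# with the default training=False it returns its input unchanged, so it needs a flag:
# training = tf.placeholder_with_default(False, shape=(), name="training")
# drop = tf.layers.dropout(fc1, rate=0.2, training=training, name="drop")
# fc2 = tf.layers.dense(drop, n_fc1*2, kernel_initializer=he_init, activation=tf.nn.relu, name="fc2")
# (feed {training: True} in the training sess.run, and build the output layer on fc2)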

with tf.name_scope("output"):
    logits = tf.layers.dense(fc1, n_outputs, name="output")
    Y_proba = tf.nn.softmax(logits, name="Y_proba")

with tf.name_scope("train"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y)
    loss = tf.reduce_mean(xentropy)
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001,
                                       beta1=0.9,
                                       beta2=0.999,
                                       epsilon=1e-06)
    
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.name_scope("init_and_save"):
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
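`tf.layers.dropout` only drops units when its `training` argument is True; it defaults to False, so a dropout layer created without that argument leaves activations unchanged, which may be one reason dropout appeared not to help. The NumPy sketch below (our own helper, not the TensorFlow implementation) shows the same "inverted dropout" behavior: during training each unit is zeroed with probability `rate` and survivors are scaled by `1 / (1 - rate)`, while at inference the input passes through untouched.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero units with probability `rate` and rescale the
    survivors by 1 / (1 - rate) during training; identity at inference."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 64))
print(np.array_equal(dropout(activations, 0.2, training=False, rng=rng), activations))  # True
trained = dropout(activations, 0.2, training=True, rng=rng)
print((trained == 0).mean())  # fraction of dropped units, close to 0.2
```

The rescaling keeps the expected activation the same in training and inference, so no adjustment is needed at test time.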

In [13]:
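batch_rng = np.random.RandomState(42)  # seeded once, so successive calls return different batches

def fetch_batch(epoch, batch_size):
    # Draw batch_size random training examples (with replacement).
    # `epoch` is kept only so existing calls still work; reseeding with
    # epoch * batch_size here would make every iteration of an epoch reuse one batch.
    indices = batch_rng.randint(X_train.shape[0], size=batch_size)
    X_batch = X_train[indices]
    y_batch = y_train[indices]
    return X_batch, y_batch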
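`fetch_batch` samples with replacement, so within one epoch some images are drawn several times and others not at all. A common alternative is to shuffle once per epoch and walk through the permutation in slices. The generator below is a minimal sketch of that approach (the name `iterate_minibatches` is ours, and dropping the last partial batch is an arbitrary choice).

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield (X_batch, y_batch) pairs over a fresh random permutation, so
    each example is used exactly once per epoch (last partial batch dropped)."""
    order = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0] - batch_size + 1, batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(42)
X_demo = np.arange(20).reshape(10, 2)  # row i is [2i, 2i + 1]
y_demo = np.arange(10)
seen = np.concatenate([yb for _, yb in iterate_minibatches(X_demo, y_demo, 5, rng)])
print(sorted(seen.tolist()))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In the training loop, `for X_batch, y_batch in iterate_minibatches(X_train, y_train, batch_size, rng):` would replace the inner `range` loop and the `fetch_batch` call.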

In [14]:
n_epochs = 50
batch_size = 4000

train_acc = []
test_acc = []

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(X_train.shape[0] // batch_size):
            
            X_batch, y_batch = fetch_batch(epoch, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        print(epoch, "Train accuracy:", acc_train)
        train_acc.append(acc_train)
        
        acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
        print(epoch, "Test accuracy:", acc_test)
        test_acc.append(acc_test)
        
    acc_val = accuracy.eval(feed_dict={X: X_val, y: y_val})
    print("Validation accuracy:", acc_val)
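The loop reports training accuracy on only the last minibatch of each epoch, which is noisy and can be optimistic when that batch was just trained on. Evaluating over the whole training set in fixed-size chunks gives a steadier number while keeping memory bounded. In the sketch below, `predict_fn` is a hypothetical callable returning predicted class ids for a chunk; in this notebook it could wrap a `sess.run` of an argmax-over-logits op built once in the graph.

```python
import numpy as np

def chunked_accuracy(predict_fn, X, y, chunk_size=5000):
    """Fraction of correct predictions over all of X, evaluated chunk by chunk."""
    correct = 0
    for start in range(0, len(X), chunk_size):
        preds = predict_fn(X[start:start + chunk_size])
        correct += int(np.sum(preds == y[start:start + chunk_size]))
    return correct / len(X)

# Toy check: the "model" is right except on inputs divisible by 4.
X_demo = np.arange(12)
y_demo = X_demo % 3
predict = lambda chunk: np.where(chunk % 4 == 0, -1, chunk % 3)
print(chunked_accuracy(predict, X_demo, y_demo, chunk_size=5))  # 0.75
```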

0 Train accuracy: 0.1085
0 Test accuracy: 0.1068
1 Train accuracy: 0.114
1 Test accuracy: 0.1142
2 Train accuracy: 0.14725
2 Test accuracy: 0.1344
3 Train accuracy: 0.19475
3 Test accuracy: 0.1816
4 Train accuracy: 0.26125
4 Test accuracy: 0.242
5 Train accuracy: 0.3055
5 Test accuracy: 0.2878
6 Train accuracy: 0.365
6 Test accuracy: 0.3302
7 Train accuracy: 0.4015
7 Test accuracy: 0.3522
8 Train accuracy: 0.418
8 Test accuracy: 0.373
9 Train accuracy: 0.44725
9 Test accuracy: 0.378
10 Train accuracy: 0.474
10 Test accuracy: 0.3908
11 Train accuracy: 0.485
11 Test accuracy: 0.3984
12 Train accuracy: 0.5
12 Test accuracy: 0.41
13 Train accuracy: 0.4635
13 Test accuracy: 0.4056
14 Train accuracy: 0.51175
14 Test accuracy: 0.4288
15 Train accuracy: 0.52325
15 Test accuracy: 0.4274
16 Train accuracy: 0.5485
16 Test accuracy: 0.435
17 Train accuracy: 0.53425
17 Test accuracy: 0.4372
18 Train accuracy: 0.569
18 Test accuracy: 0.4444
19 Train accuracy: 0.5625
19 Test accuracy: 0.4418
20 Train