This notebook aims to save you the mundane work of data extraction and preprocessing.

In [1]:
import pickle
import numpy as np

# first let's provide some helper functions
def unpickle(file_path):
    '''This function unpickles the training/testing file.'''
    # the with-block closes the file even if unpickling fails
    with open(file_path, 'rb') as fo:
        raw_dict = pickle.load(fo, encoding='bytes')
    return raw_dict

def onehot_labels(labels):
    '''This function turns numeric labels into one-hot encoding.'''
    return np.eye(100)[labels]

def get_proper_images(raw):
    '''This function turns the raw pixel rows from the dict into a float array of shape (N, 32, 32, 3).'''
    raw_float = np.array(raw, dtype=float)
    images = raw_float.reshape([-1, 3, 32, 32])
    images = images.transpose([0, 2, 3, 1])
    return images
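
To see what the two array helpers do, here is a small sketch on made-up data. It assumes, as the reshape in get_proper_images implies, that each image is a flat row of 3 * 32 * 32 = 3072 values stored channel by channel. The name fake_raw is only for illustration.

```python
import numpy as np

# Stand-in for two flattened 32x32 RGB images (3072 values each).
fake_raw = np.arange(2 * 3072).reshape(2, 3072)

# Same reshape/transpose as in get_proper_images.
images = np.array(fake_raw, dtype=float).reshape([-1, 3, 32, 32])
images = images.transpose([0, 2, 3, 1])
print(images.shape)           # (2, 32, 32, 3)

# The three channel values of the top-left pixel of image 0 sit 1024 entries
# apart in the flat row, because each channel plane holds 32 * 32 values.
print(images[0, 0, 0])        # values 0, 1024 and 2048

# One-hot encoding by picking rows of an identity matrix, as in onehot_labels.
labels = [0, 2, 99]
onehot = np.eye(100)[labels]
print(onehot.shape)           # (3, 100)
print(onehot.argmax(axis=1))  # recovers 0, 2, 99
```

The transpose moves the channel axis to the end, which is the (height, width, channels) layout that the Keras model later reads from x_train.shape[1:].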

The next steps assume you have downloaded test_cleaned and train_cleaned and placed them in the same directory as your notebook/script.

Both test_cleaned and train_cleaned are pickled Python dictionaries.
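
If you want to look inside such a dictionary before preprocessing it, a loop over its items is enough. The sketch below does not touch the real files: it pickles a tiny made-up dictionary (toy, saved as the hypothetical file toy_cleaned) that only borrows the key names this notebook reads, then loads it back the same way unpickle does.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical miniature stand-in for train_cleaned; all values are made up.
toy = {
    'data': np.zeros((4, 3072), dtype=np.uint8),
    'fine_labels': [3, 14, 15, 92],
    'filenames': [b'a.png', b'b.png', b'c.png', b'd.png'],
}

path = os.path.join(tempfile.mkdtemp(), 'toy_cleaned')
with open(path, 'wb') as fo:
    pickle.dump(toy, fo)

with open(path, 'rb') as fo:
    loaded = pickle.load(fo, encoding='bytes')

# Summarize each entry: its type plus its shape (arrays) or length (lists).
summary = {}
for key, value in loaded.items():
    size = getattr(value, 'shape', None) or len(value)
    summary[key] = (type(value).__name__, size)
    print(key, summary[key])
```

Running the same loop on the result of unpickle('train_cleaned') should tell you which keys and sizes the real file has.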

In [2]:
# now let's unpickle and preprocess the data to train the model.
# note you don't have the labels of testing data.
train_dict = unpickle('train_cleaned')  # load the training file only once
x_train = get_proper_images(train_dict['data'])
y_train = onehot_labels(train_dict['fine_labels'])
x_test = get_proper_images(unpickle('test_cleaned')['data'])

The data is ready. I will show you how to generate your submission using a simple Keras model as an example.

In [5]:
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D

Using TensorFlow backend.


In [6]:
# here are some constants for the model
batch_size = 32
num_classes = 100
epochs = 1
num_predictions = 20

In [7]:
# here we define the actual model
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
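
To get a feel for the model above, you can follow the spatial size of a 32x32 input through the layers by hand. This is a rough sketch, assuming the Keras defaults the cell relies on (stride 1 for convolutions, 'valid' padding unless 'same' is given, and pooling that rounds down). The helpers conv_out and pool_out are made up for the illustration.

```python
def conv_out(size, kernel=3, padding='valid'):
    '''Output width/height of a stride-1 convolution.'''
    return size if padding == 'same' else size - kernel + 1

def pool_out(size, pool=2):
    '''Output width/height of non-overlapping max pooling.'''
    return size // pool

size = 32
size = conv_out(size, padding='same')  # 32
size = conv_out(size)                  # 30
size = pool_out(size)                  # 15
size = conv_out(size, padding='same')  # 15
size = conv_out(size)                  # 13
size = pool_out(size)                  # 6

# The last convolution has 64 filters, so Flatten sees 6 * 6 * 64 values.
flattened = size * size * 64
print(size, flattened)                 # 6 2304
```

Calling model.summary() on the real model is the way to confirm these numbers.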

In [8]:
# initiate RMSprop optimizer
opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
# let's train the model using RMSprop
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])

In [9]:
# data preprocessing
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

In [10]:
# actually train the model
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          shuffle=False)

Epoch 1/1


<keras.callbacks.History at 0x7fd060f1ea58>

In [11]:
# here we generate predictions of our model on testing set (unlabeled)
predictions = model.predict(x_test, verbose=1)



In [15]:
import pandas as pd

# now we need some Pandas magic to create the .csv submission file
# argmax over axis 1 picks the most probable class for every test image
model_predictions = np.argmax(predictions, axis=1)
filenames = unpickle('test_cleaned')['filenames']
submission_df = pd.DataFrame({'filename': filenames,
                              'predicted_class': model_predictions})
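
Each row of predictions holds one score per class, so the predicted class of an image is simply the column index of the largest score in its row. Here is a minimal sketch on a made-up 3x4 array (toy_predictions is a stand-in, not real model output):

```python
import numpy as np

# Toy stand-in for model.predict output: 3 images, 4 classes, rows sum to 1.
toy_predictions = np.array([
    [0.10, 0.70, 0.15, 0.05],
    [0.25, 0.25, 0.40, 0.10],
    [0.90, 0.05, 0.03, 0.02],
])

# axis=1 takes the argmax within each row, giving one class index per image.
predicted_class = np.argmax(toy_predictions, axis=1)
print(predicted_class)  # classes 1, 2 and 0
```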

In [17]:
# make sure columns are in right order and we are good to go!
submission_df = submission_df.reindex(columns=['filename', 'predicted_class'])
submission_df.to_csv('my_submission.csv', index=False)
submission_df.head(5)

                       filename  predicted_class
0       b'volcano_s_000012.png'               49
1         b'woods_s_000412.png'               80
2          b'seal_s_001803.png'               90
3      b'mushroom_s_001755.png'               54
4  b'adriatic_sea_s_000653.png'               36


The submission file must follow the format shown above. The first column must be 'filename', holding the name of each file (including the b'...' prefix that Python puts on byte strings), and the second column must be 'predicted_class', holding the model's prediction for the image with that filename.
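
Before uploading, it can be worth checking a file against these rules. The helper below, check_submission, is a hypothetical sketch and not part of any Kaggle tooling. It assumes class indices run from 0 to 99, as num_classes = 100 suggests, and uses only the standard csv module.

```python
import csv
import io

def check_submission(csv_file, num_classes=100):
    '''Return a list of problems found in a submission CSV (empty if none).'''
    reader = csv.reader(csv_file)
    header = next(reader, None)
    problems = []
    if header != ['filename', 'predicted_class']:
        problems.append('unexpected header: %r' % (header,))
    for line_no, row in enumerate(reader, start=2):
        if len(row) != 2:
            problems.append('line %d: expected 2 fields' % line_no)
            continue
        filename, predicted = row
        if not (filename.startswith("b'") and filename.endswith("'")):
            problems.append("line %d: filename lacks the b'...' form" % line_no)
        if not predicted.isdigit() or int(predicted) >= num_classes:
            problems.append('line %d: bad class %r' % (line_no, predicted))
    return problems

# Two rows taken from the output above, held in memory instead of on disk.
sample = io.StringIO(
    "filename,predicted_class\n"
    "b'volcano_s_000012.png',49\n"
    "b'woods_s_000412.png',80\n"
)
print(check_submission(sample))  # an empty list means no problems were found
```

For the real file you would pass an open file object instead, for example `with open('my_submission.csv', newline='') as f: print(check_submission(f))`.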

Running the cell above produces my_submission.csv, which is the file to send to Kaggle for evaluation. The score you get is the accuracy of your model on the test set.
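
Accuracy here means the fraction of images whose predicted class equals the true class. You do not have the test labels, but the same number can be computed wherever you do have labels, for instance on a part of the training data you set aside. A minimal sketch with made-up labels (the accuracy helper is hypothetical and assumes plain classification accuracy, as described above):

```python
import numpy as np

def accuracy(true_labels, predicted_labels):
    '''Fraction of positions where the two label sequences agree.'''
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    return float(np.mean(true_labels == predicted_labels))

# Three of the four made-up predictions match the made-up true labels.
print(accuracy([49, 80, 90, 54], [49, 80, 12, 54]))  # 0.75
```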