#CIFAR-10: Object Recognition in Images Tutorial

##Synopsis
The CIFAR-10 dataset consists of 60,000 labeled images. There are a total of 10 classes and each image is 32x32 pixels. The original data can be found [here](https://www.cs.toronto.edu/~kriz/cifar.html). Alternatively, one can download a friendlier version from the Kaggle website located [here](https://www.kaggle.com/c/cifar-10). The dataset was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton [[1]](http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf). The objective of this tutorial is to train convolutional neural nets on the 50,000 training png images and check the misclassification error against the test set of 10,000. If you are downloading the version from Kaggle, please read the fine print: they have added 290,000 "junk" images to the test set to prevent hand-labeling. These are ignored during the scoring process. Because of this, we will submit our results to the Kaggle leaderboard instead of manually calculating the error on the test set.
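
For reference, the misclassification error is the fraction of images whose predicted class differs from the true class. A minimal sketch follows; the function name and the labels in the example are made up for illustration and are not part of the tutorial's code.

```python
def misclassification_error(predicted, actual):
    """Return the fraction of predictions that differ from the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / float(len(actual))


# Made-up example: one of the four predictions is wrong
example_error = misclassification_error(
    ["frog", "truck", "deer", "cat"],
    ["frog", "truck", "deer", "dog"],
)
```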

##Downloading and Loading Data
1\. Let's first download and unzip the training and testing datasets from Kaggle.

*Note: To install the **PIL** module, you can run "**pip install PIL**" from the command line, or the extended form of the command, "**pip install PIL --allow-external PIL --allow-unverified PIL**". Documentation for the second form can be found [here](http://stackoverflow.com/questions/21242107/pip-install-pil-dont-install-into-virtualenv). If neither works with your version of pip, the maintained fork **Pillow** ("**pip install Pillow**") supports the same **from PIL import Image** import.*

In [1]:
from PIL import Image
import sys
import os
import urllib
import zipfile
import webbrowser

In [2]:
# TODO: add a script to download the data from Kaggle; see
# http://ramhiser.com/2012/11/23/how-to-download-kaggle-data-with-python-and-requests-dot-py/

def download_file(url):
    # Split on the rightmost / and take everything on the right side of that
    filename = url.rsplit('/', 1)[-1]

    # Download the file only if it is not already present
    if not os.path.isfile(filename):
        urllib.urlretrieve(url, filename)

        # Unzip if it is a zip archive. Note that zipfile cannot read
        # .7z archives, so those must be extracted with a separate tool
        # such as 7-Zip.
        if zipfile.is_zipfile(filename):
            with zipfile.ZipFile(filename, "r") as z:
                z.extractall()

*Please be patient while the code below runs as you are downloading a large amount of data (~700MB).*

In [7]:
urls = ["https://www.kaggle.com/c/cifar-10/download/test.7z", "https://www.kaggle.com/c/cifar-10/download/train.7z",
    "https://www.kaggle.com/c/cifar-10/download/trainLabels.csv"]
for url in urls:
    download_file(url)
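
The Kaggle archives are in 7z format, which Python's `zipfile` module cannot open. One possible way to unpack them is sketched below, assuming a Python 3 environment with the third-party `py7zr` package installed. `archive_kind` and `extract_archive` are hypothetical helper names, not part of the original tutorial.

```python
import zipfile


def archive_kind(filename):
    """Guess the archive type from the file extension (hypothetical helper)."""
    lowered = filename.lower()
    if lowered.endswith(".7z"):
        return "7z"
    if lowered.endswith(".zip"):
        return "zip"
    return None


def extract_archive(filename, dest="."):
    """Extract a .zip or .7z archive into dest; other files are left alone."""
    kind = archive_kind(filename)
    if kind == "zip":
        with zipfile.ZipFile(filename, "r") as z:
            z.extractall(dest)
    elif kind == "7z":
        # Assumption: py7zr is installed (pip install py7zr). It is imported
        # lazily so that the zip path works without it.
        import py7zr
        with py7zr.SevenZipFile(filename, mode="r") as z:
            z.extractall(path=dest)
    return kind
```

For example, `extract_archive("train.7z")` would unpack the training images into the current directory. The 7-Zip command-line tool is an alternative if installing a package is not an option.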

2\. We will convert the downloaded png files to csv by reading in the red-green-blue (RGB) values, following a helpful [script](https://github.com/chrispy645/kaggle-cifar10-extract/blob/master/getpx.py) created by Christopher Beckham.

  * Read in the training labels by splitting each row on the comma and selecting the second item in the returned list (i.e. the actual label).

In [3]:
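labels = []
# The first row is assumed to be the header ("id,label"), so labels[0] holds
# the header text and labels[x] lines up with train/x.png
with open("trainLabels.csv") as f:
    for line in f:
        labels.append( '"' + line.rstrip().split(',')[1] + '"' )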
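
To make the indexing concrete, the same split can be tried on a few in-memory rows. This is a small sketch with made-up sample rows that assume the file starts with an `id,label` header; `sample_rows` and `sample_labels` are illustrative names.

```python
# Made-up rows in the assumed layout of trainLabels.csv (header, then id,label)
sample_rows = ["id,label\n", "1,frog\n", "2,truck\n"]

sample_labels = []
for line in sample_rows:
    # Second comma-separated field, wrapped in double quotes for the csv output
    sample_labels.append('"' + line.rstrip().split(',')[1] + '"')

# Index 0 holds the header text, so index n is the label of image n
first_image_label = sample_labels[1]
```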

  * Each image contains 32x32 pixels. Create a list of column names to store the RGB values of each of those 32x32 pixels.

In [4]:
arr = []
for i in range(0,32*32):
    for j in ['r','g','b']:
        arr.append('"px' + j + str(i) + '"')
arr.append('"class"')
#print ",".join(arr)
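
As a quick check of the naming scheme, the sketch below rebuilds the header in a hypothetical helper, `column_names` (not part of the original script), and looks at its size and first entries.

```python
def column_names(width=32, height=32):
    """Build quoted column names: three per pixel, plus a final class column."""
    names = []
    for i in range(width * height):
        for channel in ['r', 'g', 'b']:
            names.append('"px' + channel + str(i) + '"')
    names.append('"class"')
    return names


header = column_names()
header_length = len(header)   # 32*32*3 pixel columns + 1 class column
first_columns = header[:3]
last_column = header[-1]
```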

  * Read in the training data

In [5]:
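csv = []
for x in range(1, 50000+1):
    im = Image.open('train/' + str(x) + '.png')

    #Open an image (only try after adjusting range to range(1,2+1) )
    #webbrowser.open('train\\' + str(x) + '.png')

    # Collect the r, g, b values of every pixel. getpixel takes (x, y)
    # coordinates, so the image is walked one column at a time.
    arr = []
    for i in range(0, 32):
        for j in range(0, 32):
            tp = im.getpixel((i,j))
            arr.append( str(tp[0]) )
            arr.append( str(tp[1]) )
            arr.append( str(tp[2]) )

    # Store the pixel values followed by the label as one csv row
    # (len(arr) == 32*32*3 at this point)
    csv.append( ",".join(arr) + "," + labels[x] )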

*Each image yields 32 x 32 x 3 = 3072 values: three channel values for each of its 1024 pixels.*
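
The traversal order matters if the rows are later reshaped back into images. The sketch below mimics the pixel loop on a tiny 2x2 stand-in image, using a plain dictionary in place of a PIL image so that it runs without any image files. `tiny_image` and `flatten_pixels` are hypothetical names, and the label is made up.

```python
# Stand-in for a 2x2 image: maps (x, y) coordinates to (r, g, b) tuples
tiny_image = {
    (0, 0): (1, 2, 3),    (1, 0): (4, 5, 6),
    (0, 1): (7, 8, 9),    (1, 1): (10, 11, 12),
}


def flatten_pixels(get_pixel, width, height):
    """Flatten an image to strings in the same order as the pixel loop:
    x in the outer loop, y in the inner loop, then r, g, b."""
    values = []
    for x in range(width):
        for y in range(height):
            r, g, b = get_pixel((x, y))[:3]
            values.extend([str(r), str(g), str(b)])
    return values


row = flatten_pixels(tiny_image.get, 2, 2)
csv_line = ",".join(row) + ',"frog"'   # made-up label
```

With a real image opened through PIL, `flatten_pixels(im.getpixel, 32, 32)` would produce the 3072 values for one row.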