# **Getting Images into the TuriCreate framework**
In the example we provided in `image_classifier.ipynb`, we used the Kaggle Cats and Dogs dataset, provided as a `.tar.gz` file. However, your image data may be in a different format or file structure. This notebook shows how to take a dataset stored in a different format and bring it into the Turi Create framework.

Turi Create loads images from a directory: you pass the directory path to the function `tc.load_images('directory_name')`.

In this example, we use the popular CIFAR-10 dataset, which can be downloaded [here](https://www.cs.toronto.edu/~kriz/cifar.html). You will need the CIFAR-10 Python version.

Importing this data into the Turi Create framework requires the following steps:

1. Batch load all of the CIFAR-10 images from the link above into a dictionary
2. Save each image in this dictionary to a local folder as an image file. This is because loading in-memory images into Turi Create is currently difficult, but loading from a file is easy. For more context, see: https://github.com/apple/turicreate/issues/119
3. Load the images from this folder into Turi Create using `tc.load_images('directory_name')`
4. Identify the labels for each image based on the file path. 

In [1]:
# First, we will need to import the necessary libraries and create some helper functions. 
import os
import pickle
from PIL import Image

import pandas as pd
import numpy as np
import turicreate as tc


In [2]:
# Helper functions

# CIFAR images come in 'batches' stored as pickle files; this helper loads one batch
def unpickle(file):
    with open(file, 'rb') as fo:
        # encoding='bytes' makes the dictionary keys come back as bytes (e.g. b'data')
        _batch = pickle.load(fo, encoding='bytes')
    return _batch

# Write a single image to a local file.
# Loading in-memory images into a Turi Create SFrame is not easy; loading from a file is.
def save_cifar_image(img, img_name, folder):
    try:
        # each image is a flat row of 3072 values: the red plane, then green, then blue,
        # so reshape to (channel, row, col) and transpose to (row, col, channel)
        img_reshaped = np.transpose(np.reshape(img, (3, 32, 32)), (1, 2, 0))
        # pixel values are unsigned bytes in the range 0-255, so use uint8 rather than int8
        image = Image.fromarray(img_reshaped.astype(np.uint8), 'RGB')
        image.save(os.path.join(folder, img_name.decode('utf-8')))
        return {'success' : True}
    except Exception:
        return {'success' : False}
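
To make the reshape and transpose in `save_cifar_image` concrete, here is a minimal pure-Python sketch of the layout described on the CIFAR-10 page: each image is a flat row of 3072 values holding the red plane, then the green plane, then the blue plane, each in row-major order. The names `flat_index` and `to_rows_cols_channels` are hypothetical helpers used only for illustration; they are not part of Turi Create or NumPy.

```python
# Hypothetical helpers illustrating the CIFAR-10 flat image layout.
IMAGE_SIZE = 32
PLANE_SIZE = IMAGE_SIZE * IMAGE_SIZE  # 1024 values per colour channel


def flat_index(row, col, channel):
    """Position of pixel (row, col) of a given channel in the flat 3072-value row."""
    return channel * PLANE_SIZE + row * IMAGE_SIZE + col


def to_rows_cols_channels(flat):
    """Pure-Python equivalent of reshape to (3, 32, 32) followed by transpose (1, 2, 0)."""
    return [
        [[flat[flat_index(r, c, ch)] for ch in range(3)] for c in range(IMAGE_SIZE)]
        for r in range(IMAGE_SIZE)
    ]


# Use the flat positions themselves as fake pixel values so the mapping is visible.
flat = list(range(3 * PLANE_SIZE))
pixels = to_rows_cols_channels(flat)
print(pixels[0][0])    # red, green and blue entries of the top-left pixel
print(pixels[31][31])  # red, green and blue entries of the bottom-right pixel
```

The three values of one pixel sit 1024 positions apart in the flat row, which is why the data has to be regrouped before PIL can treat it as an RGB image.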

### **1. Batch Load the data**
Before running this step, make sure you have downloaded and extracted the **cifar-10-batches-py** folder from the CIFAR-10 website linked [here](https://www.cs.toronto.edu/~kriz/cifar.html).

In [3]:
# iterate over the batches, storing the image data and file names
master_batch = {'data': [], 'filenames' : []}
for i in range(5):
    batch = unpickle(f"./cifar-10-batches-py/data_batch_{i+1}")
    # append batch data and filenames to our master batch
    master_batch['data'] += list(batch[b'data'])
    master_batch['filenames'] += list(batch[b'filenames']) 
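
The CIFAR-10 page documents that each batch dictionary also holds a `labels` list with a class index from 0 to 9 for every image, which the loop above does not keep. The sketch below is one possible variation that carries those indexes along. It runs on two tiny stand-in batches written to a temporary folder, so it does not need the real dataset; `merge_batches` and the fake file names are hypothetical and exist only for this illustration.

```python
import os
import pickle
import tempfile


def merge_batches(paths):
    """Hypothetical helper: merge several CIFAR-style batch files into one dict."""
    merged = {'data': [], 'filenames': [], 'labels': []}
    for path in paths:
        with open(path, 'rb') as fo:
            batch = pickle.load(fo, encoding='bytes')
        merged['data'] += list(batch[b'data'])
        merged['filenames'] += list(batch[b'filenames'])
        merged['labels'] += list(batch[b'labels'])
    return merged


# Build two fake one-image batches that mimic the byte-string keys of the real files.
tmp_dir = tempfile.mkdtemp()
paths = []
for i in range(2):
    fake_batch = {
        b'data': [[i] * 3072],
        b'filenames': [f"example_s_{i:06d}.png".encode('utf-8')],
        b'labels': [i],
    }
    path = os.path.join(tmp_dir, f"data_batch_{i + 1}")
    with open(path, 'wb') as fo:
        pickle.dump(fake_batch, fo)
    paths.append(path)

merged = merge_batches(paths)
print(len(merged['data']), merged['filenames'], merged['labels'])
```

With the real dataset, `paths` would be the five `data_batch_*` files inside `cifar-10-batches-py`.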

### **2. Save the data to a folder**
We take each image in the `master_batch` object we created and save it as a file in a folder titled **cifar-10**.

In [4]:
# iterate over all images in master batch, saving to a folder titled cifar-10
write_results = []
os.makedirs("cifar-10", exist_ok=True)  # does not fail if the folder already exists
for i in range(len(master_batch['data'])):
    # we save the images to file, keeping track of successes and failures
    write_results.append(save_cifar_image(master_batch['data'][i], master_batch['filenames'][i], './cifar-10')['success'])

print(f"Write success for {100 * (np.sum(write_results)/len(write_results))} % of results")

Write success for 100.0 % of results


### **3. Load into Turi Create**
We load the images using Turi Create's built-in `tc.load_images` function.

In [5]:
cifar = tc.load_images('cifar-10') # Turi Create recursively finds all the images in 'cifar-10' and loads them into an SFrame

### **4. Create a label column**
To build a classifier, you need a target for each image. The target is a label that identifies the image as an automobile, bird, cat, etc. For this data, the file name at the end of each path encodes the label for the image.

In [6]:
# create a label column based on the path
cifar['label'] = cifar['path'].apply(lambda x: '_'.join(x.split("/")[-1].split("_")[0:-2]))
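
The lambda above can be hard to read at a glance, so here is the same string logic as a standalone sketch. It assumes file names of the form `<name>_s_<number>.png`; the example paths and the function name `label_from_path` are hypothetical and only illustrate that pattern.

```python
def label_from_path(path):
    """Hypothetical helper: same logic as the lambda applied to cifar['path']."""
    file_name = path.split("/")[-1]   # drop the folder part of the path
    parts = file_name.split("_")      # e.g. ['automobile', 's', '000123.png']
    return "_".join(parts[0:-2])      # drop the trailing 's' and number parts


# Made-up example paths that follow the assumed naming pattern.
examples = [
    "cifar-10/automobile_s_000123.png",
    "/home/user/cifar-10/tipper_truck_s_001250.png",
]
for path in examples:
    print(path, "->", label_from_path(path))
```

Names that contain underscores are kept whole because only the last two underscore-separated parts are dropped. Since the label comes straight from the file name, it is worth inspecting the distinct values with `cifar['label'].unique()` to confirm they match the classes you expect before training.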

### **Fit a model**
To build a model with this data, you would call the `tc.image_classifier.create` function as shown below. 

In [None]:
# build an image classifier (uncomment the line below to train)
# image_classifier = tc.image_classifier.create(cifar, target='label')