# **Getting Images into the Turi Create Framework**
Getting images into Turi Create is the first step to building a model, whether that is an image classifier, an image similarity model or a style transfer model. It can be very easy if you work within Turi Create the way it expects, which means loading image files from disk. To show you how to do this with your own data, we use the popular CIFAR-10 dataset, which can be downloaded [here](https://www.cs.toronto.edu/~kriz/cifar.html). Make sure to download the Python version of the batches.

**TL;DR** Turi Create works best when your images are files in a directory: pass that directory to `tc.load_images('directory_name')` and it loads the images it finds into an SFrame.

In [1]:
from PIL import Image
import numpy as np
import turicreate as tc
import pickle
import os

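Why all this code? Loading in-memory images (such as NumPy arrays) into a Turi Create SFrame is not straightforward, so we write every image to disk first and let Turi Create load the files. See https://github.com/apple/turicreate/issues/119 for background.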


In [2]:
# CIFAR-10 images come in 'batches' that are pickle files; this helper loads one batch.
# encoding='bytes' is needed under Python 3, so the dict keys come back as bytes (b'data', ...).
def unpickle(file):
    with open(file, 'rb') as fo:
        _batch = pickle.load(fo, encoding='bytes')
    return _batch

# Write one flat CIFAR-10 row to disk as an image file.
# Loading in-memory images into a Turi Create SFrame is not easy; loading from files is.
def save_cifar_image(img, img_name, folder):
    try:
        # 3072 values -> (channel, height, width) -> (height, width, channel)
        img_reshaped = np.transpose(np.reshape(img, (3, 32, 32)), (1, 2, 0))
        # pixel values are 0-255, so the correct dtype is uint8 (int8 only holds -128..127)
        image = Image.fromarray(img_reshaped.astype(np.uint8), 'RGB')
        image.save(os.path.join(folder, img_name.decode('utf-8')))
        return {'success': True}
    except Exception:
        return {'success': False}
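The reshape and transpose in `save_cifar_image` rely on how each CIFAR-10 row is laid out: 3072 values, where the first 1024 are the red channel, the next 1024 green and the last 1024 blue, each stored row by row for a 32x32 image. The sketch below spells out that index arithmetic in plain Python; `cifar_pixel` and `to_hwc` are hypothetical helpers for illustration only, and the row used in the example is synthetic (each value equals its own index) rather than real CIFAR data.

```python
# Sketch: map one flat CIFAR-10 row (3072 values) to a 32x32 RGB image.
# Assumed layout: [1024 red][1024 green][1024 blue], each plane row-major.
SIDE = 32
PLANE = SIDE * SIDE  # 1024 values per colour channel


def cifar_pixel(row, r, c):
    """Return the (R, G, B) values of pixel (r, c) from a flat CIFAR-10 row."""
    offset = r * SIDE + c
    return tuple(row[ch * PLANE + offset] for ch in range(3))


def to_hwc(row):
    """Build a nested [height][width][channel] list. This is the same ordering
    np.transpose(np.reshape(row, (3, 32, 32)), (1, 2, 0)) produces."""
    return [[list(cifar_pixel(row, r, c)) for c in range(SIDE)] for r in range(SIDE)]


if __name__ == "__main__":
    # Synthetic row whose value equals its index, so positions are easy to read.
    row = list(range(3 * PLANE))
    print(cifar_pixel(row, 0, 1))  # (1, 1025, 2049)
    print(to_hwc(row)[1][0])       # [32, 1056, 2080]
```

Pixel (0, 1) pulls one value from each of the three 1024-value planes, which is exactly what moving the channel axis last with `(1, 2, 0)` achieves.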

### **Load the data**
Before running this step, make sure you've downloaded and extracted **cifar-10-batches-py** from the CIFAR-10 website linked [here](https://www.cs.toronto.edu/~kriz/cifar.html). We then:
- create our own master batch object,
- load each of the five training batch pickle files, and
- append their image data and file names to the master batch.

In [3]:
# iterate over the batches, storing the image data and file names
master_batch = {'data': [], 'filenames' : []}
for i in range(5):
    batch = unpickle(f"./cifar-10-batches-py/data_batch_{i+1}")
    # append batch data and filenames to our master batch
    master_batch['data'] += list(batch[b'data'])
    master_batch['filenames'] += list(batch[b'filenames'])
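The loop above keeps only the pixel data and the file names. Each Python batch file is a dict that also carries `b'labels'` (integer class ids 0-9), and `batches.meta` in the same folder holds `b'label_names'` (e.g. `b'frog'`). The following is a minimal sketch, assuming that standard layout, of collecting the human-readable CIFAR-10 class for every image as well; `load_cifar_with_labels` is a hypothetical helper, not part of Turi Create.

```python
import os
import pickle


def unpickle(file):
    # Same helper as earlier: encoding='bytes' keeps the dict keys as bytes.
    with open(file, 'rb') as fo:
        return pickle.load(fo, encoding='bytes')


def load_cifar_with_labels(root="./cifar-10-batches-py"):
    """Collect pixel rows, file names and class names from the five training batches.

    Assumes data_batch_1 .. data_batch_5 (dicts with b'data', b'labels',
    b'filenames') and batches.meta (dict with b'label_names') live in `root`.
    """
    meta = unpickle(os.path.join(root, "batches.meta"))
    label_names = [name.decode("utf-8") for name in meta[b'label_names']]

    master = {'data': [], 'filenames': [], 'labels': []}
    for i in range(1, 6):
        batch = unpickle(os.path.join(root, f"data_batch_{i}"))
        master['data'] += list(batch[b'data'])
        master['filenames'] += list(batch[b'filenames'])
        # map integer class ids to names such as 'frog'
        master['labels'] += [label_names[k] for k in batch[b'labels']]
    return master
```

With the real dataset, `master['labels']` lines up index by index with `master['data']`, so each image can be saved together with its class.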

    

### **Save the data to a folder**
We take the object we created from loading the data and save each image to a folder named **cifar-10**.

In [4]:
# iterate over all images in the master batch, saving each to a folder named cifar-10
write_results = []
os.makedirs("cifar-10", exist_ok=True)  # unlike os.mkdir, does not fail if the folder already exists
for img, img_name in zip(master_batch['data'], master_batch['filenames']):
    # save each image to file, keeping track of successes and failures
    write_results.append(save_cifar_image(img, img_name, './cifar-10')['success'])

print(f"Write success for {100 * np.mean(write_results)} % of results")

Write success for 100.0 % of results
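Writing every image into one flat folder means the class must later be recovered from the file name. A common alternative is one sub-folder per class (e.g. `cifar-10/frog/...`), so the label is simply the parent folder name. The sketch below assumes you have a class name for each image (for example, each batch's `b'labels'` ids mapped through `b'label_names'` in `batches.meta`); `class_image_path` and `label_from_parent_dir` are hypothetical helpers.

```python
import os


def class_image_path(folder, class_name, filename):
    """Hypothetical layout: one sub-folder per class, e.g. cifar-10/frog/<file>.png."""
    if isinstance(filename, bytes):
        filename = filename.decode("utf-8")  # CIFAR-10 file names are stored as bytes
    return os.path.join(folder, class_name, filename)


def label_from_parent_dir(path):
    """Recover the class from the name of the folder that contains the image."""
    return os.path.basename(os.path.dirname(path))
```

With this layout you would create each class folder with `os.makedirs(..., exist_ok=True)` before saving. After loading, you could derive the label with `cifar['path'].apply(label_from_parent_dir)`.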


### **Load into Turi Create**
We load the images using Turi Create's built-in `tc.load_images` function.

In [5]:
cifar = tc.load_images('cifar-10')  # recursively finds the images under 'cifar-10' and loads them into an SFrame with 'path' and 'image' columns
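`tc.load_images` searches the directory recursively and, by default, skips files it cannot read. To sanity-check how many files it should pick up, you can list the candidate image files yourself. The sketch below is an illustration, not Turi Create's own logic: it assumes JPEG and PNG (the formats `tc.load_images` documents) identified by extension, and `list_image_files` is a hypothetical helper.

```python
import os

# Extensions assumed to correspond to the JPEG and PNG formats tc.load_images handles.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_image_files(root):
    """Walk `root` recursively and return the sorted paths of JPEG/PNG files."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

If `len(list_image_files('cifar-10'))` is larger than `len(cifar)`, some files were probably skipped as unreadable.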

In [6]:
# create a label column from the file name in the path,
# e.g. 'leptodactylus_pentadactylus_s_000004.png' -> 'leptodactylus_pentadactylus'
cifar['label'] = cifar['path'].apply(lambda x: '_'.join(x.split("/")[-1].split("_")[0:-2]))
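The lambda above keeps everything in the file name except its last two `_`-separated parts (such as `s` and `000004.png`). The CIFAR-10 python batches store file names of the form `leptodactylus_pentadactylus_s_000004.png`, so the resulting column holds these fine-grained names rather than the ten CIFAR-10 class names such as `frog`. The sketch below restates the same rule as a named function for testing; `label_from_filename` is a hypothetical helper, and it swaps `split("/")` for `os.path.basename`, which also handles Windows paths.

```python
import os


def label_from_filename(path):
    """Same rule as the lambda: drop the last two '_'-separated parts of the
    file name and re-join the rest with '_'."""
    name = os.path.basename(path)  # portable alternative to path.split("/")[-1]
    return "_".join(name.split("_")[:-2])


if __name__ == "__main__":
    print(label_from_filename("cifar-10/leptodactylus_pentadactylus_s_000004.png"))
    # leptodactylus_pentadactylus
```

If you want the ten standard classes instead, the integer `b'labels'` stored in each batch are the reliable source.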

### **Fit a model**
Building the model is simple once the data is loaded. Image similarity and image classification models are very similar in both concept and implementation; the main difference is that the image classifier requires a target column name (here `label`), while the image similarity model does not.

In [None]:
# build an image classifier (needs a target column) and an image similarity model
image_classifier = tc.image_classifier.create(cifar, target='label')
image_similarity = tc.image_similarity.create(cifar)
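Before fitting the classifier, it can be worth checking how many distinct values the `label` column holds, since a target with hundreds of fine-grained names trains a very different model from one with ten classes. This is a minimal sketch using only the standard library; `summarize_labels` is a hypothetical helper, and you could feed it `list(cifar['label'])`.

```python
from collections import Counter


def summarize_labels(labels, top=5):
    """Return the number of distinct labels and the `top` most common (label, count) pairs."""
    counts = Counter(labels)
    return len(counts), counts.most_common(top)


if __name__ == "__main__":
    print(summarize_labels(["frog", "frog", "cat", "truck"], top=1))
    # (3, [('frog', 2)])
```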