<a href="https://colab.research.google.com/github/mcgovey/compvision-playing-card-detection/blob/master/TuriCreate_PlayingCard_sFrameCreate.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Set Creation for Object Detection Model using Turi Create
Author: Kevin McGovern

Last Updated: May 28, 2020

This notebook creates an SFrame that Turi Create uses to train an object detection model.



## Dependencies Install

In [0]:
%%capture
!pip install turicreate

## Imports and Create Reference Files

In [0]:
import turicreate as tc
import os

In [0]:
# input locations: image folder and the annotation CSV stored inside it
file_name = "img_class_train"
image_path = 'drive/My Drive/Data/card_train/'
csv_path = image_path + file_name + ".csv"

# output location used below to save the SFrame
sFrameName = 'ig03.sframe'
model_path = 'drive/My Drive/Data/model_data/'
sFramePath = model_path + sFrameName

## Load the data and manipulate csv
Read the CSV annotations into an SFrame, then convert each bounding box to the format Turi Create requires.

In [6]:
# import csv with annotations
csv_sf = tc.SFrame.read_csv(csv_path)

------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[str,int,str,int,int,int,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
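
Because `read_csv` infers column types from the first rows, it can help to confirm that the CSV header contains the columns the later cells rely on before loading. The following is a minimal sketch using only the standard library; the helper name `missing_columns` and the sample row are hypothetical, and the column order in the real file may differ.

```python
import csv
import io

# Column names used by the cells in this notebook.
REQUIRED_COLUMNS = {"id", "image", "name", "xMin", "xMax", "yMin", "yMax"}


def missing_columns(csv_file):
    """Return the required column names absent from the CSV header, sorted."""
    header = next(csv.reader(csv_file), [])
    return sorted(REQUIRED_COLUMNS - set(header))


# Hypothetical stand-in for the annotation CSV; order and values are invented.
sample = io.StringIO(
    "image,id,name,xMin,xMax,yMin,yMax\n"
    "example.jpg,0,3d,60,140,60,180\n"
)
print(missing_columns(sample))  # prints []
```

For the real file, the same helper could be called inside `with open(csv_path, newline="") as f:`.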


In [7]:
def row_to_bbox_coordinates(row):
    """
    Convert a row's corner coordinates (xMin, xMax, yMin, yMax) into the
    center-based bounding box dictionary that Turi Create expects:
    (center_x, center_y, width, height), e.g. {'x': 100, 'y': 120, 'width': 80, 'height': 120}
    """
    return {'x': row['xMin'] + (row['xMax'] - row['xMin'])/2, 
            'width': (row['xMax'] - row['xMin']),
            'y': row['yMin'] + (row['yMax'] - row['yMin'])/2, 
            'height': (row['yMax'] - row['yMin'])}

csv_sf['coordinates'] = csv_sf.apply(row_to_bbox_coordinates)
# delete no longer needed columns
del csv_sf['id'], csv_sf['xMin'], csv_sf['xMax'], csv_sf['yMin'], csv_sf['yMax']
# rename columns: the card class becomes 'label', the image filename becomes 'name'
csv_sf = csv_sf.rename({'name': 'label', 'image': 'name'})
csv_sf

name,label,coordinates
000077131.jpg,3d,"{'x': 434.5, 'width': 59, 'y': 412.5, 'height': ..."
000077131.jpg,3d,"{'x': 436.5, 'width': 59, 'y': 84.5, 'height': 67} ..."
000077131.jpg,3d,"{'x': 467.0, 'width': 52, 'y': 176.0, 'height': ..."
000077131.jpg,3d,"{'x': 281.5, 'width': 51, 'y': 270.0, 'height': ..."
000119307.jpg,10c,"{'x': 472.0, 'width': 74, 'y': 328.5, 'height': ..."
000119307.jpg,2h,"{'x': 473.0, 'width': 80, 'y': 387.0, 'height': ..."
000119307.jpg,Ks,"{'x': 463.5, 'width': 83, 'y': 456.0, 'height': ..."
000119307.jpg,Ks,"{'x': 189.0, 'width': 78, 'y': 614.0, 'height': ..."
000254235.jpg,8c,"{'x': 419.0, 'width': 56, 'y': 59.5, 'height': 71} ..."
000254235.jpg,8d,"{'x': 481.5, 'width': 61, 'y': 114.5, 'height': ..."
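
As a quick check of the corner-to-center conversion, the sketch below reproduces the same arithmetic in plain Python and inverts it. The function names and the input values are hypothetical; the values were chosen to match the example in the docstring above.

```python
def corners_to_center(x_min, x_max, y_min, y_max):
    """Convert corner coordinates to a center-based box dictionary."""
    width = x_max - x_min
    height = y_max - y_min
    return {
        "x": x_min + width / 2,   # horizontal center
        "y": y_min + height / 2,  # vertical center
        "width": width,
        "height": height,
    }


def center_to_corners(box):
    """Invert corners_to_center: return (x_min, x_max, y_min, y_max)."""
    half_w = box["width"] / 2
    half_h = box["height"] / 2
    return (box["x"] - half_w, box["x"] + half_w,
            box["y"] - half_h, box["y"] + half_h)


box = corners_to_center(60, 140, 60, 180)
print(box)  # {'x': 100.0, 'y': 120.0, 'width': 80, 'height': 120}
```

Round-tripping a box through `center_to_corners` is a cheap way to catch swapped axes or a missing division by two.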


## Load Images from Drive Folder

In [8]:
# Load all images in random order
sf_images = tc.image_analysis.load_images(image_path, recursive=True,
                                       random_order=True)

## SFrame Manipulation
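Prepare both SFrames for joining: extract the filename from each image path, group the annotations per image, then join the two on the filename.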

In [9]:
# Extract the filename from each path into a 'name' column
# (os.path.basename already drops the directories, so no further split is needed)
sf_images['name'] = sf_images['path'].apply(lambda path: os.path.basename(path))

# Original path no longer needed
del sf_images['path']

sf_images

image,name
Height: 720 Width: 720,232416588.jpg
Height: 720 Width: 720,673916767.jpg
Height: 720 Width: 720,576948951.jpg
Height: 720 Width: 720,266277850.jpg
Height: 720 Width: 720,020525758.jpg
Height: 720 Width: 720,462193768.jpg
Height: 720 Width: 720,594121308.jpg
Height: 720 Width: 720,208175088.jpg
Height: 720 Width: 720,227039224.jpg
Height: 720 Width: 720,833251029.jpg
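
The join key is the bare filename, so it is worth seeing what `os.path.basename` returns for a path like the ones `load_images` produces. This is a small sketch; the example path is hypothetical, assembled from the `image_path` folder and a filename that appears in the CSV output above.

```python
import os

# Hypothetical path; the real 'path' values depend on where the images live.
path = "drive/My Drive/Data/card_train/000077131.jpg"

name = os.path.basename(path)
print(name)         # 000077131.jpg
print("/" in name)  # False: the directories are already gone
```

Because `recursive=True` also loads images from subfolders, two files with the same name in different folders would produce the same key.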


In [10]:
# Combine label and coordinates into a bounding box dictionary
csv_sf = csv_sf.pack_columns(['label', 'coordinates'], new_column_name='bbox', dtype=dict)

# Combine bounding boxes of the same 'name' into lists
sf_annotations = csv_sf.groupby('name', 
    {'annotations': tc.aggregate.CONCAT('bbox')})
sf_annotations

name,annotations
764998605.jpg,"[{'label': '6c', 'coordinates': {'x': ..."
924090913.jpg,"[{'label': '7d', 'coordinates': {'x': ..."
542643312.jpg,"[{'label': '10c', 'coordinates': {'x': ..."
014614477.jpg,"[{'label': '5s', 'coordinates': {'x': ..."
765215291.jpg,"[{'label': 'Kd', 'coordinates': {'x': ..."
611073900.jpg,"[{'label': 'Jc', 'coordinates': {'x': ..."
031798526.jpg,"[{'label': '7d', 'coordinates': {'x': ..."
132050878.jpg,"[{'label': '10c', 'coordinates': {'x': ..."
500000102.jpg,"[{'label': '9s', 'coordinates': {'x': ..."
215858432.jpg,"[{'label': '8d', 'coordinates': {'x': ..."
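
To make the target structure concrete, the following plain-Python sketch mimics what the pack and group steps produce: one list of `{'label', 'coordinates'}` dictionaries per image. It does not use Turi Create; the helper name `group_annotations` and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_annotations(rows):
    """Collect one list of {'label', 'coordinates'} dicts per image name."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["name"]].append(
            {"label": row["label"], "coordinates": row["coordinates"]}
        )
    return dict(grouped)


# Hypothetical rows in the shape of csv_sf before packing.
rows = [
    {"name": "a.jpg", "label": "3d",
     "coordinates": {"x": 100.0, "y": 120.0, "width": 80, "height": 120}},
    {"name": "a.jpg", "label": "Ks",
     "coordinates": {"x": 300.0, "y": 200.0, "width": 60, "height": 90}},
    {"name": "b.jpg", "label": "8c",
     "coordinates": {"x": 50.0, "y": 40.0, "width": 20, "height": 30}},
]

grouped = group_annotations(rows)
print(len(grouped["a.jpg"]))  # 2
print(grouped["b.jpg"][0]["label"])  # 8c
```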


In [11]:
# Join annotations with the images. Note: images without annotations should
# stay in the dataset, which is why it is important to use a LEFT join.
sf = sf_images.join(sf_annotations, on='name', how='left')
sf

image,name,annotations
Height: 720 Width: 720,764998605.jpg,"[{'label': '6c', 'coordinates': {'x': ..."
Height: 720 Width: 720,924090913.jpg,"[{'label': '7d', 'coordinates': {'x': ..."
Height: 720 Width: 720,542643312.jpg,"[{'label': '10c', 'coordinates': {'x': ..."
Height: 720 Width: 720,014614477.jpg,"[{'label': '5s', 'coordinates': {'x': ..."
Height: 720 Width: 720,765215291.jpg,"[{'label': 'Kd', 'coordinates': {'x': ..."
Height: 720 Width: 720,611073900.jpg,"[{'label': 'Jc', 'coordinates': {'x': ..."
Height: 720 Width: 720,031798526.jpg,"[{'label': '7d', 'coordinates': {'x': ..."
Height: 720 Width: 720,132050878.jpg,"[{'label': '10c', 'coordinates': {'x': ..."
Height: 720 Width: 720,500000102.jpg,"[{'label': '9s', 'coordinates': {'x': ..."
Height: 720 Width: 720,215858432.jpg,"[{'label': '8d', 'coordinates': {'x': ..."


In [0]:
# The LEFT join fills missing matches with None, so we replace these with empty
# lists instead using fillna.
sf['annotations'] = sf['annotations'].fillna([])
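
The combined effect of the LEFT join and `fillna` can be sketched in plain Python: every image is kept, and images without a match receive an empty annotation list. The function name `left_join_annotations` and the sample data are hypothetical, and this only illustrates the semantics rather than how Turi Create implements them.

```python
def left_join_annotations(image_names, annotations_by_name):
    """Keep every image; unmatched images get an empty annotation list."""
    return [
        {"name": name, "annotations": list(annotations_by_name.get(name, []))}
        for name in image_names
    ]


# Hypothetical data: 'c.jpg' has no annotations.
image_names = ["a.jpg", "b.jpg", "c.jpg"]
annotations_by_name = {
    "a.jpg": [{"label": "3d",
               "coordinates": {"x": 100.0, "y": 120.0, "width": 80, "height": 120}}],
    "b.jpg": [{"label": "8c",
               "coordinates": {"x": 50.0, "y": 40.0, "width": 20, "height": 30}}],
}

joined = left_join_annotations(image_names, annotations_by_name)
print(len(joined))  # 3
print(joined[2])    # {'name': 'c.jpg', 'annotations': []}
```

An inner join would instead drop `c.jpg`, removing images that contain no cards from the dataset.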

## Save SFrame to folder specified above

In [0]:
# Save SFrame to drive folder
sf.save(sFramePath)

## References

- [Turi Create API SFrame Docs](https://apple.github.io/turicreate/docs/api/generated/turicreate.SFrame.html)
- [Turi Create Sample Code for Object Detection](https://github.com/apple/turicreate/tree/master/userguide/object_detection)
- [Turi Create SFrame Example Using CSV instead of image masks](https://apple.github.io/turicreate/docs/userguide/object_detection/data-preparation.html)