# Formatting Labelbox Data Into Text Files

### Overview
For each `.jpg` image file used for training, there should be a second file with the same name but a `.txt` extension. The text file contains one line per label on the image, giving the object class and the object's coordinates converted to YOLO format. If the image has 1 label, the text file has 1 line; if it has 3 labels, the text file has 3 lines.
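As a minimal sketch of the pairing rule above, the hypothetical helpers below derive the `.txt` path for an image with `pathlib` and list images in a directory that have no matching label file. They assume images use a lowercase `.jpg` extension; nothing else in this notebook defines or calls them.

```python
from pathlib import Path

def label_path_for(image_path):
    # Same directory and base name as the image, but with a .txt extension
    return Path(image_path).with_suffix(".txt")

def images_missing_labels(directory):
    # Names of .jpg files in `directory` that have no matching .txt file
    return sorted(
        p.name for p in Path(directory).glob("*.jpg")
        if not label_path_for(p).exists()
    )
```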

A single line in the text file should look like this: `<object-class> <x> <y> <width> <height>`

Where:
* `<object-class>` - integer ID of the object's class, from 0 to (classes-1)
* `<x>` - float value of the x position of the center of the label relative to the width of the image (between 0.0 and 1.0)
* `<y>` - float value of the y position of the center of the label relative to the height of the image (between 0.0 and 1.0)
* `<width>` - float value of the width of the label relative to the width of the image (between 0.0 and 1.0)
* `<height>` - float value of the height of the label relative to the height of the image (between 0.0 and 1.0)
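To make the conversion concrete, here is a sketch of turning one bounding box into a YOLO line. It assumes, as the doctests in Step 2 do, that each label stores its box under `"bbox"` with `top`, `left`, `height` and `width` in pixels measured from the top-left corner of the image. The function name `label_to_yolo_line` is hypothetical.

```python
def label_to_yolo_line(object_class, label, image_width, image_height):
    # Assumes label["bbox"] holds pixel values measured from the top-left corner
    bbox = label["bbox"]
    # Center of the box in pixels, then divided once by the image size
    x_center = (bbox["left"] + bbox["width"] / 2) / image_width
    y_center = (bbox["top"] + bbox["height"] / 2) / image_height
    # Box size relative to the image size
    rel_width = bbox["width"] / image_width
    rel_height = bbox["height"] / image_height
    return f"{object_class} {x_center} {y_center} {rel_width} {rel_height}"
```

With the box used in the Step 2 doctests on a 1000 × 1000 image, `label_to_yolo_line(0, label, 1000, 1000)` returns `'0 0.3 0.15 0.4 0.2'`. Computing each center in pixels before dividing once keeps the floats identical to those doctest outputs.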

### Step 1: Replace Constants
First, we will replace the constants in this notebook so it works for our project! Change the following constants in the code cells below:
* `CSV_FILE`: This is the path to your Labelbox CSV
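* `IMAGE_DIRECTORY`: This is the path to the directory where you want to store your images and text files. The directory must already exist, and the path should end with a `/`, because filenames are built by appending `IMAGE_PREFIX` and the row index directly to this string
* `IMAGE_PREFIX`: Prefix to be used for image and text filenames
* `LABEL_TO_ID`: Dictionary that maps each object title used in Labelbox to an integer ID from 0 to (classes-1), e.g. `{'dog': 0, 'cat': 1, 'bird': 2}`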
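One way to build `LABEL_TO_ID` so the IDs are automatically contiguous from 0 is to number an ordered list of class names. The helper `build_label_map` and the class names are illustrative assumptions; the names must match the object titles in your Labelbox export exactly, including case.

```python
def build_label_map(class_names):
    # The position of each name in the list becomes its class ID (0, 1, 2, ...)
    label_map = {name: class_id for class_id, name in enumerate(class_names)}
    # A repeated name would silently drop an ID, so reject duplicates
    if len(label_map) != len(class_names):
        raise ValueError("class_names contains duplicates")
    return label_map

LABEL_TO_ID_EXAMPLE = build_label_map(["dog", "cat", "bird"])  # hypothetical classes
```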

In [20]:
# Import necessary packages
import pandas as pd
import numpy as np
import json
import urllib.request
from PIL import Image

In [21]:
# Load the CSV data into a pandas dataframe
CSV_FILE = ''
labelbox_data = pd.read_csv(CSV_FILE)

In [22]:
IMAGE_DIRECTORY = '' # Directory to store images and text files (must exist and end with '/')
IMAGE_PREFIX = '' # Prefix for image and text filenames

In [23]:
# Create a dictionary that maps labels to IDs
LABEL_TO_ID = {}

### Step 2: Complete Functions
Once you have prepared your constants, you are ready for Step 2! As a team, complete the seven functions below. Pay special attention to the instructions in the function comment, the parameters being passed into the function, and the requested return value (if any). You can work together or divide and conquer.

In [24]:
def save_image(labelbox_data, ind):
    '''
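    Download the image at a particular row index into IMAGE_DIRECTORY (for example,
    with urllib.request), naming it IMAGE_DIRECTORY + IMAGE_PREFIX + str(ind) + '.jpg'
    so that it matches the .txt file written in Step 4.
    Return the path where the image is stored.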

    Parameters
    ----------
    labelbox_data : pandas dataframe
        Data from the labelbox CSV stored in a pandas dataframe
    ind : int
        The row index of the image we want to save

    Returns
    -------
    image_path : string
        The path where the image is stored
    '''
    pass
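A minimal sketch of the download step, assuming each row of the CSV carries the image URL in a column such as `Labeled Data` (check the column name in your own export). The hypothetical `download_image` helper uses `urllib.request.urlretrieve`, builds the filename with the same string concatenation as the Step 4 loop, and expects the target directory to exist already.

```python
import urllib.request

def download_image(image_url, image_directory, image_prefix, ind):
    # Same naming scheme as the .txt file in Step 4, but with a .jpg extension
    image_path = image_directory + image_prefix + str(ind) + ".jpg"
    # Fetch the URL and write its contents to image_path
    urllib.request.urlretrieve(image_url, image_path)
    return image_path
```

Inside `save_image` this might be called as `download_image(labelbox_data['Labeled Data'][ind], IMAGE_DIRECTORY, IMAGE_PREFIX, ind)`, again assuming that column name.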

In [33]:
def get_object_class(label, LABEL_TO_ID):
    '''
    Gets integer value of object class from 0 to (classes-1) by looking up
    the label's title in LABEL_TO_ID

    Parameters
    ----------
    label : dictionary
        Dictionary containing the label data
    LABEL_TO_ID : dictionary
        Dictionary that maps each label title to its integer ID

    Returns
    -------
    object_class : int
        ID of object class (should be between 0 and classes-1)

    >>> label = {"title": "C"}
    >>> LABEL_TO_ID = {"A": 0, "B": 1, "C": 2}
    >>> get_object_class(label, LABEL_TO_ID)
    2
    '''
    pass

In [16]:
def get_relative_x(label, image_width):
    '''
    Gets x position of the center of the label relative to the width of the image

    Parameters
    ----------
    label : dictionary
        Dictionary containing the label data
    image_width : int
        Width of the image

    Returns
    -------
    x_pos : float
        x position of the center of the label relative to the width of the image

    >>> label = {"bbox": {"top": 50, "left": 100, "height": 200, "width": 400}}
    >>> image_width = 1000
    >>> get_relative_x(label, image_width)
    0.3
    '''
    pass

In [25]:
def get_relative_y(label, image_height):
    '''
    Gets y position of the center of the label relative to the height of the image

    Parameters
    ----------
    label : dictionary
        Dictionary containing the label data
    image_height : int
        Height of the image

    Returns
    -------
    y_pos : float
        y position of the center of the label relative to the height of the image

    >>> label = {"bbox": {"top": 50, "left": 100, "height": 200, "width": 400}}
    >>> image_height = 1000
    >>> get_relative_y(label, image_height)
    0.15
    '''
    pass

In [18]:
def get_relative_width(label, image_width):
    '''
    Gets width of the label relative to the width of the image

    Parameters
    ----------
    label : dictionary
        Dictionary containing the label data
    image_width : int
        Width of the image

    Returns
    -------
    width : float
        Width of the label relative to the width of the image

    >>> label = {"bbox": {"top": 50, "left": 100, "height": 200, "width": 400}}
    >>> image_width = 1000
    >>> get_relative_width(label, image_width)
    0.4
    '''
    pass

In [23]:
def get_relative_height(label, image_height):
    '''
    Gets height of the label relative to the height of the image

    Parameters
    ----------
    label : dictionary
        Dictionary containing the label data
    image_height : int
        Height of the image

    Returns
    -------
    height : float
        Height of the label relative to the height of the image

    >>> label = {"bbox": {"top": 50, "left": 100, "height": 200, "width": 400}}
    >>> image_height = 1000
    >>> get_relative_height(label, image_height)
    0.2
    '''
    pass

In [20]:
def write_yolo_label(object_class, x, y, width, height):
    '''
    Write one label as a single line in YOLO format, ending with a newline, to the
    open text file `text` (the file object opened in the Step 4 loop).
    YOLO format reminder: <object_class> <x> <y> <width> <height>

    Parameters
    ----------
    object_class : int
        Integer number of object from 0 to (classes-1)
    x : float
        x position of the center of the label relative to the width of the image
    y : float
        y position of the center of the label relative to the height of the image
    width : float
        Width of the label relative to the width of the image
    height : float
        Height of the label relative to the height of the image

    Returns
    -------
    None
    '''
    pass
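A sketch of the line format, written as a hypothetical `write_label_line` that receives the open file object as a parameter instead of relying on the global `text`. The six-decimal formatting is a choice for tidy, fixed-width output, not something the format requires.

```python
def write_label_line(text_file, object_class, x, y, width, height):
    # One label per line: "<object-class> <x> <y> <width> <height>"
    text_file.write(f"{object_class} {x:.6f} {y:.6f} {width:.6f} {height:.6f}\n")
```

Passing the file explicitly makes the function easy to test with an in-memory `io.StringIO` object.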

### Step 3: Test Functions
Next, test the functions to make sure they produce the correct values. We use doctests: each `>>>` example in a docstring is run, and its output is compared exactly with the expected value on the line below it. A correctly written function passes its doctests; a function that returns a different value fails. Debug any failed doctests before moving on to the final step.

**Note**: The functions `save_image` and `write_yolo_label` don't have doctests, so ask your instructor to check them before moving on to the final step.
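Doctests compare the printed result exactly, so two formulas that are equal on paper can still fail. The snippet below, using the numbers from the `get_relative_x` doctest, shows that dividing once matches the expected `0.3` while dividing each term separately does not.

```python
left, box_width, image_width = 100, 400, 1000

combined = (left + box_width / 2) / image_width            # center in pixels, divide once
split = left / image_width + (box_width / 2) / image_width  # divide each term, then add

print(combined)                # 0.3
print(split)                   # 0.30000000000000004
print(round(split, 6) == 0.3)  # True
```

If one of your relative-position doctests fails with a value like `0.30000000000000004`, compute the center in pixels first and divide once.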

In [31]:
import doctest
doctest.testmod()

TestResults(failed=0, attempted=15)

### Step 4: Putting It All Together
Congrats on reaching the final step in the notebook. If everything else has been done correctly, all you need to do for this step is run the cell below. After you run the cell, go check your `IMAGE_DIRECTORY` directory. You should see your images and text files being saved there!

**Important: We have provided the code for you here, but look it over to make sure you understand it. If you don't, make sure you ask your instructor.**

In [33]:
# Loop through each image in the Labelbox CSV data
for ind in labelbox_data.index:

    # Extract the label info from a single image (at index ind)
    try:
        label_json = json.loads(labelbox_data['Label'][ind])
    except (TypeError, ValueError):
        continue # Missing (NaN) or non-JSON label, e.g. a skipped image
    if label_json == {}:
        continue # Skipped labels

    # Save the image to your images directory and read its size
    image_path = save_image(labelbox_data, ind)
    with Image.open(image_path) as image:
        image_width, image_height = image.size

    # Open the txt file in 'w' mode so re-running this cell overwrites it instead of
    # appending duplicate lines; write_yolo_label writes to it through the name `text`
    with open(IMAGE_DIRECTORY + IMAGE_PREFIX + str(ind) + '.txt', 'w') as text:

        # Loop through each label in the image (no 'objects' key means no boxes)
        for label in label_json.get('objects', []):

            # Get necessary components for the text file
            object_class = get_object_class(label, LABEL_TO_ID)
            x = get_relative_x(label, image_width)
            y = get_relative_y(label, image_height)
            width = get_relative_width(label, image_width)
            height = get_relative_height(label, image_height)

            # Write the Labelbox data to the text file
            write_yolo_label(object_class, x, y, width, height)
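After the loop finishes, it can help to sanity-check the generated files. The hypothetical `check_yolo_file` below reads one label file and reports lines that do not have exactly five fields, class IDs outside 0 to (classes-1), or coordinates outside [0, 1]. It is a sketch, not part of the assignment.

```python
def check_yolo_file(txt_path, num_classes):
    """Return a list of problems found in one YOLO label file (empty if none)."""
    problems = []
    with open(txt_path) as f:
        for line_no, line in enumerate(f, start=1):
            fields = line.split()
            # Every line must be "<object-class> <x> <y> <width> <height>"
            if len(fields) != 5:
                problems.append(f"line {line_no}: expected 5 fields, got {len(fields)}")
                continue
            try:
                class_id = int(fields[0])
                values = [float(v) for v in fields[1:]]
            except ValueError:
                problems.append(f"line {line_no}: non-numeric field")
                continue
            if not 0 <= class_id < num_classes:
                problems.append(f"line {line_no}: class {class_id} out of range")
            # All four coordinates are fractions of the image size
            if any(not 0.0 <= v <= 1.0 for v in values):
                problems.append(f"line {line_no}: coordinate outside [0, 1]")
    return problems
```

For example, after `import glob`, you could run `check_yolo_file(path, len(LABEL_TO_ID))` for every `path` in `glob.glob(IMAGE_DIRECTORY + IMAGE_PREFIX + '*.txt')` and print any non-empty results.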