# Machine Learning Engineer Nanodegree Program

### Capstone Project: Object Detection, Classification and Recognition of Corvette Generations

#### By: Joel Haas
#### October 2019

## Data Labeling

This script was used to label the dataset so that I could measure the accuracy of my predictions.

### Import Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import os
import csv

In [2]:
path = r"D:\D Drive\DataScience\MachineLearningEngineerNanodegree\capstoneProject\\"

### Create CSV Output File

In [3]:
# path to save the labels
save_path = r"D:\D Drive\DataScience\MachineLearningEngineerNanodegree\capstoneProject\\"

# header for the labels
header = ['id','car','corvette','c1','c2','c3','c4','c5','c6','c7','c8','classification']

# create a new csv for the subset of training data
#labels = open(save_path + "labels\\labels_training_data.csv", 'a', newline="")

# create a new csv for the test dataset
#labels = open(save_path + "labels\\labels_test_data.csv", 'a', newline="")

# create a new csv for the full training dataset
labels = open(save_path + "labels\\labels_training_10000.csv", 'a', newline="")

with labels:
    writer = csv.writer(labels)
    writer.writerow(header)
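
Because the file is opened in append mode (`'a'`), re-running the cell above adds another header row to an existing CSV. One way to guard against that is sketched below. The helper name `ensure_header` is hypothetical and not part of the original notebook; it assumes the header should be written only when the file is missing or empty.

```python
import csv
import os

HEADER = ['id', 'car', 'corvette', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'classification']


def ensure_header(csv_path, header=HEADER):
    """Write the header row only if the CSV is missing or empty.

    Hypothetical helper: returns True when a header was written.
    """
    needs_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    if needs_header:
        with open(csv_path, 'a', newline="") as f:
            csv.writer(f).writerow(header)
    return needs_header
```

Calling `ensure_header(...)` a second time on the same file leaves it unchanged, so the cell can be re-run safely.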

### Image Labeling

Load each image, parse its filename, and label it accordingly. Before this step, once I had collected all the images, I manually renamed them so that the names could be parsed into labels.

In [4]:
def load_images(folder):
    #images = []
    
    # iterate through the folder and each image filename
    for filename in os.listdir(folder):
        try:
            # try to read each image
            img = mpimg.imread(os.path.join(folder, filename))
            
            # if able to read the image, then create the label
            if img is not None:
                #images.append(img)
                labels = filename.split('_')
                split_labels(filename, labels)
        except Exception:
            # skip files that cannot be read as images
            continue
    
    #return images

### File Name Parsing and Labeling

I labeled the data based on my file naming convention. First, I organized all the images into their respective folders, e.g. 1st generation corvettes went into a 'C1' folder, pictures of beaches and mountains went into a 'no-cars' folder, etc.

I manually reviewed the images in each folder to ensure they were categorized correctly. For example, when I scraped for 1957 corvettes (1st generation, C1), some of the returned images were 3rd generation corvettes (C3), so every image in every folder needed a manual QA pass.

Once I completed QA of the images, I renamed them in bulk: I selected all the images in a folder and renamed one of them. For example, for the C1 folder, I used the following name: 'corvette_1_1_1_ '

My PC then renamed all the images in the C1 folder as follows:

* corvette_1_1_1_ (1)
* corvette_1_1_1_ (2)
* corvette_1_1_1_ (3)
* etc.

I did the same with the images in the other folders.

When the function below splits the filename on underscores, the resulting tokens tell me that the image is a car (the first '1'), that the car is a corvette (the second '1'), and that the corvette is a C1 (the third '1').
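
To make the parsing concrete, here is a minimal sketch of how such a filename breaks into tokens. The helper `parse_filename` is hypothetical. It assumes a `.jpg` extension and that the trailing underscore from the rename pattern 'corvette_1_1_1_ ' is kept in the final filename.

```python
import os


def parse_filename(filename):
    """Split a filename into its category and flag tokens (hypothetical helper)."""
    stem, _ext = os.path.splitext(filename)  # drop the assumed '.jpg' extension
    tokens = stem.split('_')
    category = tokens[0]   # 'corvette', 'cars', ...
    flags = tokens[1:4]    # car flag, corvette flag, generation number
    return category, flags


print(parse_filename('corvette_1_1_1_ (1).jpg'))  # ('corvette', ['1', '1', '1'])
```

The trailing underscore matters here: for a name like 'corvette_1_1_1 (1).jpg', the third flag would come back as '1 (1)' and would not compare equal to '1'.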

In [5]:
def split_labels(filename,labels):
    """ use the filename to determine the appropriate labels and classification"""
    
    label_set = []
    
    # if filename begins with 'cars', then assign the following label
    # image is a car, but not a corvette
    if labels[0] == 'cars':
        label_set.append((filename, '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', ["car", "not_corvette"]))
    
    # if filename begins with 'corvette', then assign the following labels
    elif labels[0] == 'corvette':
        if labels[3] == '1':  # C1
            label_set.append((filename, '1', '1', '1', '0', '0', '0', '0', '0', '0', '0', ["car", "corvette", "c1"]))

        elif labels[3] == '2':  # C2
            label_set.append((filename, '1', '1', '0', '1', '0', '0', '0', '0', '0', '0', ["car", "corvette", "c2"]))

        elif labels[3] == '3':  # C3
            label_set.append((filename, '1', '1', '0', '0', '1', '0', '0', '0', '0', '0', ["car", "corvette", "c3"]))

        elif labels[3] == '4':  # C4
            label_set.append((filename, '1', '1', '0', '0', '0', '1', '0', '0', '0', '0', ["car", "corvette", "c4"]))

        elif labels[3] == '5':  # C5
            label_set.append((filename, '1', '1', '0', '0', '0', '0', '1', '0', '0', '0', ["car", "corvette", "c5"]))

        elif labels[3] == '6':  # C6
            label_set.append((filename, '1', '1', '0', '0', '0', '0', '0', '1', '0', '0', ["car", "corvette", "c6"]))

        elif labels[3] == '7':  # C7
            label_set.append((filename, '1', '1', '0', '0', '0', '0', '0', '0', '1', '0', ["car", "corvette", "c7"]))

        elif labels[3] == '8':  # C8
            label_set.append((filename, '1', '1', '0', '0', '0', '0', '0', '0', '0', '1', ["car", "corvette", "c8"]))
    
    # not a car
    else:
        label_set.append((filename,'0', '0', '0', '0', '0', '0', '0', '0', '0', '0', ["not_car", "not_corvette"]))
    
    save_labels(label_set)
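
The eight near-identical branches above all build the same one-hot row, so the row could also be generated from a list of generations. The sketch below is an alternative, not the code used for the project. `GENERATIONS` and `one_hot_row` are hypothetical names, and the function takes the category and generation as arguments rather than reading them from the split filename.

```python
GENERATIONS = ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8']


def one_hot_row(filename, category, generation=None):
    """Build one CSV row: id, car, corvette, c1..c8, classification."""
    zeros = ['0'] * len(GENERATIONS)
    if category == 'cars':
        # a car, but not a corvette
        return (filename, '1', '0', *zeros, ['car', 'not_corvette'])
    if category == 'corvette':
        if generation not in GENERATIONS:
            # mirrors the function above: an unrecognized generation yields no row
            return None
        flags = ['1' if g == generation else '0' for g in GENERATIONS]
        return (filename, '1', '1', *flags, ['car', 'corvette', generation])
    # anything else is treated as not a car
    return (filename, '0', '0', *zeros, ['not_car', 'not_corvette'])
```

For example, `one_hot_row('corvette_1_1_3_ (1).jpg', 'corvette', 'c3')` sets only the `c3` column to '1' among the generation columns.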

### Append the CSV File with the Labels

In [6]:
def save_labels(label_set):
    save_path = r"D:\D Drive\DataScience\MachineLearningEngineerNanodegree\capstoneProject\labels\\"
    
    # subset of training data labels
    #labels = open(save_path + "labels_training_data.csv", 'a', newline="")
    
    # test data labels
    #labels = open(save_path + "labels_test_data.csv", 'a', newline="")
    
    # full set of training data labels
    labels = open(save_path + "labels_training_10000.csv", 'a', newline="")
        
    with labels:
        writer = csv.writer(labels)
        writer.writerows(label_set)
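
After the labels are written, a quick sanity check is to read the CSV back and count rows per class. The sketch below is one way to do that. `count_classifications` is a hypothetical helper. It relies on the fact that `csv.writer` stores the classification list as its string form (for example "['car', 'corvette', 'c1']"), which `ast.literal_eval` can turn back into a list.

```python
import ast
import csv
from collections import Counter


def count_classifications(csv_path):
    """Count rows per most specific class tag (hypothetical helper)."""
    counts = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row['id'] == 'id':
                # skip header rows repeated by re-running in append mode
                continue
            tags = ast.literal_eval(row['classification'])
            counts[tags[-1]] += 1  # last tag, e.g. 'c1' or 'not_corvette'
    return counts
```

Comparing these counts against the number of images in each folder would show whether any files were skipped during labeling.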

### Run the Functions Above

I needed to create labeled data for the following:

1) a small subset of the training data that I used for testing,

2) my held-out test set that I used to measure my model's performance, and

3) my full training set.
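
Instead of commenting lines in and out for each of the three datasets, the image folder and CSV name could be looked up by dataset name. This is only a sketch of that idea. The `DATASETS` keys and the `dataset_paths` helper are hypothetical, while the folder and file names are the ones used in this notebook.

```python
import os

# dataset name -> (image folder under 'images', CSV file under 'labels')
DATASETS = {
    'training_subset': ('dataSet_training', 'labels_training_data.csv'),
    'test': ('testSet', 'labels_test_data.csv'),
    'training_full': ('dataSet_10000', 'labels_training_10000.csv'),
}


def dataset_paths(base_path, name):
    """Return (image_folder, labels_csv) for one named dataset."""
    image_folder, labels_file = DATASETS[name]
    return (os.path.join(base_path, 'images', image_folder),
            os.path.join(base_path, 'labels', labels_file))
```

Using `os.path.join` also avoids hand-written backslashes in the paths.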

In [7]:
# subset of training data labels
#load_images(path+'images\\dataSet_training')

# test data labels
#load_images(path+'images\\testSet')

# full set of training data labels
load_images(path+'images\\dataSet_10000')
