# Prepare Images

This notebook helps you prepare training and test sets from the provided images.

## 1. Copy images

To make the images accessible from the notebook, copy them to a directory inside your notebook root directory.
Then set `images_dir` to the path of that directory.


In [1]:
images_dir = './images'
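
Before labeling, it can help to confirm that `images_dir` points at an existing directory and actually contains image files. The following is a minimal sketch using only the standard library; the helper name `list_image_files` and the set of accepted extensions are assumptions for illustration, not part of the notebook.

```python
from pathlib import Path

# Hypothetical helper; the extension list is an assumption, adjust it to your data.
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png'}

def list_image_files(directory):
    """Return the sorted names of image files found directly inside `directory`."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError('Image directory not found: %s' % directory)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Calling `list_image_files(images_dir)` returns the file names that would be treated as images, so stray files can be spotted before they end up in `labels.csv`.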

## 2. Label images

Run the code below to create a CSV file called `labels.csv` that lists all images in the `images_dir` directory.
Then open the CSV file and fill in the `is_guitar` column: `1` if the image contains a guitar, `0` otherwise.

In [9]:
import os
import pandas as pd

if not os.path.exists('labels.csv'):
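    # Build all rows at once; DataFrame.append was removed in pandas 2.0.
    # Every image starts with is_guitar = 0 and is labeled by hand afterwards.
    rows = [{'file_name': filename, 'is_guitar': 0}
            for filename in os.listdir(images_dir)]
    df = pd.DataFrame(rows, columns=['file_name', 'is_guitar'])

    # index=False keeps the pandas row index out of the CSV file.
    df.to_csv('labels.csv', index=False)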
    print('Labels saved')
else:
    print("labels.csv exists. Skipping")

labels.csv exists. Skipping
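
After editing `labels.csv` by hand, a quick check can catch typos in the `is_guitar` column. This sketch uses the standard `csv` module; the helper name `find_invalid_labels` is an assumption, and it relies only on the `file_name` and `is_guitar` columns described above.

```python
import csv

def find_invalid_labels(csv_path):
    """Return (file_name, value) pairs whose is_guitar value is not 0 or 1."""
    invalid = []
    with open(csv_path, newline='') as handle:
        for row in csv.DictReader(handle):
            value = (row.get('is_guitar') or '').strip()
            if value not in ('0', '1'):
                invalid.append((row.get('file_name'), value))
    return invalid
```

An empty list from `find_invalid_labels('labels.csv')` suggests every row carries a usable label; any returned pair names the file whose label needs another look.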


## 3. Load and resize images
Open the `labels.csv` file, read all listed images into memory, and resize them to 100x100 px.

In [25]:
import numpy as np
import matplotlib.pyplot as plt
import skimage.transform

data = pd.read_csv('./labels.csv')
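# Collect the resized images in a list and stack them once at the end;
# np.append inside a loop copies the whole array on every iteration.
resized = []
for filename in data.file_name:
    path = os.path.join(images_dir, filename)
    image = plt.imread(path)
    # Assumes every image is RGB; resize returns float values.
    resized.append(skimage.transform.resize(image, (100, 100), mode='reflect'))

# Both arrays keep the row order of labels.csv.
images = np.stack(resized)          # shape: (number of images, 100, 100, 3)
labels = data.is_guitar.to_numpy()  # shape: (number of images,)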
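
The loading code expects every image to have exactly three color channels, yet `plt.imread` can also return grayscale (2-D) or RGBA (4-channel) arrays depending on the file. One possible way to normalize them before resizing is sketched below; the helper name `to_rgb` is hypothetical, and dropping the alpha channel is an assumption that may not suit images with meaningful transparency.

```python
import numpy as np

def to_rgb(image):
    """Return `image` as an array of shape (height, width, 3)."""
    image = np.asarray(image)
    if image.ndim == 2:
        # Grayscale: repeat the single channel three times.
        return np.stack([image, image, image], axis=-1)
    if image.ndim == 3 and image.shape[2] == 4:
        # RGBA: drop the alpha channel (assumption: transparency is not needed).
        return image[:, :, :3]
    if image.ndim == 3 and image.shape[2] == 3:
        return image
    raise ValueError('Unsupported image shape: %s' % (image.shape,))
```

In the loading loop, this could be applied as `image = to_rgb(plt.imread(path))` before the call to `skimage.transform.resize`.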

## 4. Create training and test sets
Now split the images and labels into a training set and a test set. The training set contains the first 70% of the images and the test set contains the remaining 30%.


In [26]:
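images_count = images.shape[0]
training_examples_count = int(0.7 * images_count)
test_examples_count = images_count - training_examples_count

# Split at the 70% mark; the order of the examples is preserved.
(training_set_x, test_set_x) = np.split(images, [training_examples_count])
(training_set_y, test_set_y) = np.split(labels, [training_examples_count])

# Print the shapes to confirm the split sizes.
print(training_set_x.shape)
print(test_set_x.shape)
print(training_set_y.shape)
print(test_set_y.shape)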

(110, 100, 100, 3)
(48, 100, 100, 3)
(110,)
(48,)
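
`np.split` keeps the examples in their original order, so if `labels.csv` happens to be ordered by class, the training and test sets may end up with very different proportions of guitar images. A common remedy is to shuffle images and labels with the same random permutation before splitting. The sketch below illustrates this; the function name `shuffled_split`, the default fraction, and the fixed seed are assumptions, not part of the notebook.

```python
import numpy as np

def shuffled_split(images, labels, train_fraction=0.7, seed=0):
    """Shuffle images and labels together, then split them into train and test parts."""
    if len(images) != len(labels):
        raise ValueError('images and labels must have the same length')
    # A seeded generator makes the split reproducible across runs.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    split_at = int(train_fraction * len(images))
    train_idx, test_idx = order[:split_at], order[split_at:]
    return images[train_idx], images[test_idx], labels[train_idx], labels[test_idx]
```

It returns the arrays in the order `training_set_x, test_set_x, training_set_y, test_set_y`, so it could stand in for the two `np.split` calls if a shuffled split is preferred.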


## 5. Save dataset
Save the training and test sets to the `dataset.h5` file.

In [27]:
import h5py

# The context manager closes the file even if writing a dataset fails.
with h5py.File('dataset.h5', 'w') as f:
    f.create_dataset('training_set_x', data=training_set_x)
    f.create_dataset('training_set_y', data=training_set_y)
    f.create_dataset('test_set_x', data=test_set_x)
    f.create_dataset('test_set_y', data=test_set_y)
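
To confirm the file was written as intended, the datasets can be read back and their shapes inspected. This is a minimal sketch; the helper name `load_dataset` is an assumption, while the dataset names are the ones used above.

```python
import h5py

DATASET_NAMES = ('training_set_x', 'training_set_y', 'test_set_x', 'test_set_y')

def load_dataset(path):
    """Read the four arrays stored in the HDF5 file and return them as a dict."""
    with h5py.File(path, 'r') as f:
        # Indexing with [()] reads the whole dataset into a NumPy array.
        return {name: f[name][()] for name in DATASET_NAMES}
```

For example, `load_dataset('dataset.h5')['training_set_x'].shape` should match the shape printed for `training_set_x` in the previous step.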