# Convert images read from URLs in a CSV file to ImageNet Version 2
**This file processes and transforms the filtered input CSV file into a formal input for machine learning.**
- Apply different methods to process images: **SciPy, scikit-image, PIL**
    1. Get each image by reading its URL from the CSV file
    2. Resize the image
    3. Convert the image to a numpy array with shape: (1, IMSIZE, IMSIZE, 3)
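
The resize and convert steps can be sketched with Pillow and numpy alone. This is a minimal sketch, not the notebook's own method: the helper name `image_to_batch_array` and the synthetic test image are assumptions made for illustration, and `IMSIZE = 64` is taken from the image size used later in this notebook. Pillow is used here because `scipy.misc.imresize` was removed in SciPy 1.3.

```python
import numpy as np
from PIL import Image

IMSIZE = 64  # assumed target edge length, matching img_width/img_height used later


def image_to_batch_array(img, size=IMSIZE):
    """Resize a PIL image and return a uint8 array of shape (1, size, size, 3)."""
    rgb = img.convert("RGB")             # force 3 channels (grayscale and RGBA inputs included)
    resized = rgb.resize((size, size))   # Pillow expects (width, height)
    arr = np.asarray(resized, dtype=np.uint8)
    return arr[np.newaxis, ...]          # prepend the batch axis


# Synthetic 200x100 image standing in for a downloaded one
demo = Image.new("RGB", (200, 100), color=(255, 0, 0))
batch = image_to_batch_array(demo)
print(batch.shape)
```

An array returned by `io.imread(url)` can be wrapped with `Image.fromarray(image)` before being passed to the helper.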

### 1 - Packages
- [numpy](http://www.numpy.org) is the main package for scientific computing with Python.
- [matplotlib](http://matplotlib.org) is a library to plot graphs in Python.
- [h5py](http://www.h5py.org) is a common package to interact with a dataset that is stored in an H5 file.
- [cv2](http://opencv.org/) (OpenCV) is a library of programming functions mainly aimed at real-time computer vision.

In [2]:
import csv
import numpy as np
import h5py
#import urllib
#import cv2

In [3]:
# METHOD #2: scikit-image
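import scipy.misc  # provides imresize; deprecated in SciPy 1.0 and removed in 1.3, so an older SciPy (with Pillow) is required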
from PIL import Image
from skimage import io
from scipy import ndimage
import matplotlib.pyplot as plt

In [None]:
urls = [
   'https://tookan.s3.amazonaws.com/task_images/G0gC1486365254968-TOOKAN06022017061412.jpg',
   'https://tookan.s3.amazonaws.com/task_images/rQZD1486365263088-TOOKAN06022017061420.jpg',
   'https://tookan.s3.amazonaws.com/task_images/B7JB1486365339438-TOOKAN06022017061537.jpg',
]


In [None]:
# loop over the image URLs
for url in urls:
    # download the image using scikit-image
    print("downloading %s" % url)
    image = io.imread(url)
    
    plt.imshow(image)
    plt.show()
    
    #image = np.array(ndimage.imread(fname, flatten=False))
    #my_image = scipy.misc.imresize(image, size=(64,64)).reshape((1, 64*64*3)).T
    my_image = scipy.misc.imresize(image, size=(128,128))
    plt.imshow(my_image)
    plt.show()

## Convert Function

In [4]:
import timeit

img_width = 64
img_height = 64

def file_to_imageArray(filename, end_idx):
    start_time = timeit.default_timer()
    
    with open('../images/' + str(filename), 'rU') as f:
        readCSV = csv.reader(f, delimiter=',')
        # keep only the first end_idx rows
        interestingrows = [row for idx, row in enumerate(readCSV) if idx < end_idx]
    
        images_array = [] # store final result
        y = []            # store label result

        i = 0
        print("Start Processing")
        
        for row in interestingrows: #for row in readCSV:
            label = row[2]     # class label 0 or 1
            imageURL = row[1]  # image url
            i = i+1
            image = io.imread(imageURL)  # read image from url
            # resize and reshape to: (1, img_height, img_width, image_depth)
            # imresize takes size as (height, width); the reshape assumes a 3-channel (RGB) image
            img_array = scipy.misc.imresize(image, size=(img_height, img_width)).reshape((1, img_height, img_width, 3))

            if i % 50 == 0:
                print("Processed the "+str(i)+"th image.")

            # Add label to list
            y.append(label)

            # Add img_array to result by Concatenating image_array to images_array
            if len(images_array) == 0:
                images_array = img_array
            else:
                images_array = np.concatenate([images_array, img_array])
            
    elapsed = timeit.default_timer() - start_time
    print("Complete processing:")
    print(str(elapsed/(60*60)) + "hr")
    return [images_array, np.array(y)]
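
Calling `np.concatenate` inside the loop copies the accumulated array on every iteration, which gets slower as the array grows. The sketch below shows one alternative under stated assumptions: batches are collected in a list and concatenated once, rows are limited with `itertools.islice`, and labels are converted to integers because `csv.reader` yields strings. The names `rows_to_arrays`, `load_image`, and `fake_loader` are hypothetical; `load_image` stands in for the `io.imread` plus resize step, and the CSV is assumed to have no header row, with the URL in column 1 and the label in column 2 as in the function above.

```python
import csv
from io import StringIO
from itertools import islice

import numpy as np


def rows_to_arrays(csv_file, end_idx, load_image):
    """Read the first end_idx rows and return (X, y).

    load_image is a hypothetical callable mapping a URL to an array of
    shape (1, H, W, 3).
    """
    reader = csv.reader(csv_file, delimiter=',')
    batches, labels = [], []
    for row in islice(reader, end_idx):      # stop after end_idx rows
        batches.append(load_image(row[1]))   # column 1: image URL
        labels.append(int(row[2]))           # column 2: class label, 0 or 1
    X = np.concatenate(batches) if batches else np.empty((0,))
    return X, np.array(labels, dtype=np.int64)


# In-memory CSV and a dummy loader so the sketch runs without network access
fake_csv = StringIO(
    "a,http://example.com/1.jpg,0\n"
    "b,http://example.com/2.jpg,1\n"
    "c,http://example.com/3.jpg,0\n"
)
fake_loader = lambda url: np.zeros((1, 64, 64, 3), dtype=np.uint8)

X, y = rows_to_arrays(fake_csv, 2, fake_loader)
print(X.shape, y)
```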

#### Convert 50 Outside Front images to imageNets
**imageNet_array: Concatenated result of 50 images**
- 50 samples with label 0 (normal data)
- 50 samples with label 1 (outlier data)


In [5]:
outside_front_50 = file_to_imageArray('outside_front_50.csv', 50)

Start Processing
Processed the 50th image.
Complete processing:
0.0443932836586hr


In [6]:
outside_front_X = outside_front_50[0]
outside_front_Y = outside_front_50[1]
print("X shape:" + str(outside_front_X.shape))
print("Y shape:" + str(outside_front_Y.shape))

X shape:(50, 64, 64, 3)
Y shape:(50,)


#### Save to h5py file under group "outside_front_50"

In [23]:
f = h5py.File('data.h5','w')
#f = h5py.File('data.h5','r+')
group=f.create_group('outside_front_50')
group.create_dataset('X', data = outside_front_X)    # could add compression="gzip", compression_opts=9 to compress
group.create_dataset('Y', data = outside_front_Y)
f.close()

In [24]:
f = h5py.File('data.h5','r')
group = f['outside_front_50']
X = group['X'][:]
Y = group['Y'][:]
f.close()

In [10]:
print(X.shape, Y.shape)

((50, 64, 64, 3), (50,))
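
The save and load cells above can also be written with context managers, so the file is closed even if an error occurs, and with the gzip compression mentioned in the comment switched on. This is a sketch under assumptions: the zero-filled arrays and the temporary file path are placeholders for illustration, not the notebook's data.

```python
import os
import tempfile

import h5py
import numpy as np

# Placeholder data with the same shapes as the 50-image set
X_demo = np.zeros((50, 64, 64, 3), dtype=np.uint8)
y_demo = np.zeros((50,), dtype=np.int64)

path = os.path.join(tempfile.mkdtemp(), "demo.h5")  # throwaway file for the demo

# Write: the file is closed automatically when the block exits
with h5py.File(path, "w") as f:
    group = f.create_group("outside_front_50")
    group.create_dataset("X", data=X_demo, compression="gzip", compression_opts=9)
    group.create_dataset("Y", data=y_demo)

# Read back: [:] copies the dataset into memory before the file closes
with h5py.File(path, "r") as f:
    X_back = f["outside_front_50/X"][:]
    Y_back = f["outside_front_50/Y"][:]

print(X_back.shape, Y_back.shape)
```

Compression level 9 trades write speed for file size; whether that is worthwhile depends on the images.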


In [12]:
outside_front_12330 = file_to_imageArray('outside_front.csv', 12330)

Start Processing
Processed the 50th image.
Processed the 100th image.
Processed the 150th image.
Processed the 200th image.
Processed the 250th image.
Processed the 300th image.
Processed the 350th image.
Processed the 400th image.
Processed the 450th image.
Processed the 500th image.
Processed the 550th image.
Processed the 600th image.
Processed the 650th image.
Processed the 700th image.
Processed the 750th image.
Processed the 800th image.
Processed the 850th image.
Processed the 900th image.
Processed the 950th image.
Processed the 1000th image.
Processed the 1050th image.
Processed the 1100th image.
Processed the 1150th image.
Processed the 1200th image.
Processed the 1250th image.
Processed the 1300th image.
Processed the 1350th image.
Processed the 1400th image.
Processed the 1450th image.
Processed the 1500th image.
Processed the 1550th image.
Processed the 1600th image.
Processed the 1650th image.
Processed the 1700th image.
Processed the 1750th image.
Processed the 1800th im

In [13]:
outside_front_12330_X = outside_front_12330[0]
outside_front_12330_Y = outside_front_12330[1]
print("X shape:" + str(outside_front_12330_X.shape))
print("Y shape:" + str(outside_front_12330_Y.shape))

X shape:(12330, 64, 64, 3)
Y shape:(12330,)


In [25]:
#f = h5py.File('data.h5','w')
f = h5py.File('data.h5','r+')
group=f.create_group('outside_front_12330')
group.create_dataset('X', data = outside_front_12330_X)    # could add compression="gzip", compression_opts=9 to compress
group.create_dataset('Y', data = outside_front_12330_Y)
f.close()

In [28]:
f = h5py.File('data.h5','r')
group = f['outside_front_12330']
X = group['X'][:]
Y = group['Y'][:]
f.close()

In [29]:
print(X.shape, Y.shape)

((12330, 64, 64, 3), (12330,))
