<a href="https://colab.research.google.com/github/emilyrlong/oddy-test/blob/main/Dissertation_0_Image_Loading.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Dissertation Step 0: Image Loading

This Colab notebook loads JPG images and saves them as NumPy arrays (`.npy` files) to test whether pre-converted arrays speed up training for future models that identify corrosion in Oddy test images.

## **Step 1**: Set Up TensorFlow, Check for a GPU, and Connect to Google Drive

In [None]:
# !pip install tensorflow
import tensorflow as tf
print(tf.__version__)

In [None]:
#!pip install tensorflow-gpu
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

In [None]:
# Connect colab to Google Drive
from google.colab import drive
drive.mount('/content/drive')

## **Step 2**: Import Packages

Import the packages used in this notebook.

In [None]:
import matplotlib
import matplotlib.pyplot as plt

import os
import random
import zipfile
import io
import numpy as np
import pandas as pd
import time

import glob
import imageio
# io.BytesIO replaces six.BytesIO (six is only needed for Python 2)
from io import BytesIO
from PIL import Image, ImageDraw, ImageFont
from IPython.display import display, Javascript
from IPython.display import Image as IPyImage

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass

import tensorflow as tf
tf.get_logger().setLevel('ERROR')

## **Step 3**: Function Converting Image to Numpy Array

In [None]:
def load_image_into_numpy_array(path):
    """Load an image from file into a numpy array.
    Shape: (height, width, channels), where channels=3 for RGB.

    Args: path - a file path.
    Returns: uint8 numpy array with shape (img_height, img_width, 3)
    """
    
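    # Read the raw bytes (tf.io.gfile also handles non-local paths);
    # the `with` block closes the file afterwards
    with tf.io.gfile.GFile(path, 'rb') as f:
        img_data = f.read()
    # convert('RGB') guarantees 3 channels even for grayscale or RGBA files
    image = Image.open(BytesIO(img_data)).convert('RGB')

    # np.array on a PIL image gives (height, width, 3) directly, which is much
    # faster than building the array pixel by pixel from image.getdata()
    return np.array(image, dtype=np.uint8)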
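Pillow reports `image.size` as `(width, height)`, but a NumPy image array is indexed `(row, column, channel)`, i.e. `(height, width, 3)`. `np.array(image)` follows that convention, and so must any manual reshape of the row-by-row pixel sequence returned by `Image.getdata()`. A NumPy-only sketch with a made-up 4 × 2 "image" (each pixel stores its own row and column) showing what happens when the two axes are swapped:

```python
import numpy as np

# Hypothetical tiny image: Pillow would report its size as (width, height) = (4, 2)
im_width, im_height = 4, 2

# Emulate Image.getdata(): one (R, G, B) tuple per pixel, row 0 first, then row 1.
# Each pixel encodes its own (row, column) so we can see where it lands.
flat_pixels = [(row, col, 0) for row in range(im_height) for col in range(im_width)]

# Correct: reshape as (height, width, channels)
arr = np.array(flat_pixels).reshape((im_height, im_width, 3)).astype(np.uint8)
print(arr.shape)    # (2, 4, 3)
print(arr[1, 3])    # [1 3 0] -> row 1, column 3, as expected

# Swapped axes still reshape without error but scramble the pixels
wrong = np.array(flat_pixels).reshape((im_width, im_height, 3))
print(wrong[1, 0])  # [0 2 0] -> not the pixel at row 1, column 0
```

Because a wrong reshape raises no error, checking one known pixel (or plotting the result) is a cheap way to catch it.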

## **Step 4**: Function to Load Folder of Images

This function reads every image in a Google Drive folder, converts each one to a NumPy array, and saves it as a `.npy` file with the same base name. The images are large, so the conversion step takes a while.
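Because `.npy` files store raw, uncompressed pixels, their size is predictable: height × width × 3 bytes for a `uint8` RGB image, plus a small header. A full-resolution 6000 × 4000 image therefore becomes 72 MB on disk, versus about 1.8 MB at 960 × 640, which adds up quickly for hundreds of files on Google Drive. A sketch that computes the payload for the resolutions mentioned in this notebook and checks it against a real `np.save` of a zero-filled placeholder array (`expected_npy_bytes` is a helper written for this example):

```python
import os
import tempfile

import numpy as np


def expected_npy_bytes(height, width, channels=3, itemsize=1):
    """Raw pixel payload of an uncompressed image array (header excluded)."""
    return height * width * channels * itemsize


# Resolutions mentioned in this notebook's timing notes
for h, w in [(4000, 6000), (1024, 1536), (640, 960)]:
    mb = expected_npy_bytes(h, w) / 1e6
    print(f'{w} x {h}: {mb:.1f} MB per .npy file')

# Confirm against an actual np.save of a placeholder 960 x 640 image
img = np.zeros((640, 960, 3), dtype=np.uint8)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.npy')
    np.save(path, img)
    file_size = os.path.getsize(path)

# Whatever is not pixel data is the .npy header
header_bytes = file_size - img.nbytes
print(f'file: {file_size} bytes, pixels: {img.nbytes} bytes, header: {header_bytes} bytes')
```

The JPGs are compressed, so the `.npy` copies trade disk space for skipping JPG decoding at training time.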

In [None]:
# A FUNCTION FOR LOADING IMAGES
def load_image_set(image_dir, new_dir):
    """Load a folder of images, convert them to numpy arrays, and save them in a new folder.
    Args:
      image_dir - a path to a folder of training, validation, or test images.
      new_dir - a path to the folder for the numpy arrays
    """
    # Create the output folder if it does not exist yet
    os.makedirs(new_dir, exist_ok=True)
    # Iterate over the JPG files in the image folder, skipping anything else
    for file in sorted(os.listdir(image_dir)):
        stem, ext = os.path.splitext(file)
        if ext.lower() not in ('.jpg', '.jpeg'):
            continue
        # define the path (string) for each image
        image_path = os.path.join(image_dir, file)
        print(image_path)
        # load the image into a numpy array
        img_np = load_image_into_numpy_array(image_path)
        # Swap only the file extension for .npy
        new_path = os.path.join(new_dir, stem + '.npy')
        # Save the new file
        np.save(new_path, img_np)
    print('Done Loading and Saving!')
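When deriving `.npy` names from image names, a plain substitution such as `name.replace('jpg', 'npy')` changes every occurrence of `jpg`, not just the extension, and leaves `.JPG` or `.jpeg` untouched. Since `np.save` appends `.npy` to names that lack it, `IMG_0042.JPG` would be saved as `IMG_0042.JPG.npy`. Splitting off the extension with `os.path.splitext` avoids both problems. A stdlib sketch with hypothetical file names (`npy_name` is a helper written for this example):

```python
import os


def npy_name(filename):
    """Swap only the final extension for .npy, whatever its case."""
    stem, _ext = os.path.splitext(filename)
    return stem + '.npy'


# Hypothetical file names that trip up a plain str.replace('jpg', 'npy')
examples = ['jpg_batch_01.jpg', 'IMG_0042.JPG', 'sample.jpeg', 'coupon.jpg']

for name in examples:
    print(f"{name:18} replace: {name.replace('jpg', 'npy'):18} splitext: {npy_name(name)}")
```

Only `coupon.jpg` comes out the same both ways; the other three names show the substring and case pitfalls.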

## **Step 5**: Load Training, Validation, and Test Images

In [None]:
# Time the conversion of one folder
start = time.perf_counter()
# path of the directory containing the images
image_dir = '/content/drive/MyDrive/Dissertation/non_met_images/non-met-unanimous'
# path of the new directory to hold the npy files
npy_dir = '/content/drive/MyDrive/Dissertation/non_met_images/unanimous_npy'
# Convert every image in image_dir and save one .npy file per image in npy_dir
load_image_set(image_dir, npy_dir)
print(f'Elapsed: {time.perf_counter() - start:.1f} s')

# Old: ~3 mins for 5 images (6000 x 4000 images)
# New: 12 seconds for 5 images (1536 x 1024 images)
# Test Set: 191 images in 8m 34s
# Val Set: 191 images in 7m 52s
# Training Set (1/3): 510 images in 23m 38s
# Training Set (2/3): 510 images in 21m 11s
# Training Set (3/3): 508 images in 21m 23s
# Training Set (1/4) - 960 x 640: 510 images in 13m 46s
# Training Set (2/4) - 960 x 640: 510 images in 14m 34s
# Training Set (3/4) - 960 x 640: 508 images in 13m 53s
# Training Set (4/4) - 960 x 640: 180 images in 
# Test Set - Batch 3 - 960 x 640: 88 images in 2m 3s (191 images - 3m 47s)
# Val Set  - Batch 3 - 960 x 640: 30 images in 41s
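The timing notes are easier to compare as seconds per image. A stdlib sketch that parses the `Xm Ys` durations recorded above and divides by the image count; `parse_duration` is a helper written for this example, and only entries with a recorded time are included:

```python
import re


def parse_duration(text):
    """Convert strings like '8m 34s', '41s' or '3m' into seconds."""
    match = re.fullmatch(r'\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*', text)
    if match is None or not any(match.groups()):
        raise ValueError(f'unrecognised duration: {text!r}')
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return 60 * minutes + seconds


# (label, image count, duration) copied from the timing notes above
runs = [
    ('Test Set', 191, '8m 34s'),
    ('Training Set (1/3)', 510, '23m 38s'),
    ('Training Set (1/4) - 960 x 640', 510, '13m 46s'),
    ('Test Set - 960 x 640 (191 images)', 191, '3m 47s'),
    ('Val Set - Batch 3 - 960 x 640', 30, '41s'),
]

for label, n_images, duration in runs:
    secs = parse_duration(duration)
    print(f'{label:34} {secs:5d} s  {secs / n_images:.2f} s/image')
```

On these numbers, the 960 × 640 test-set run works out to about 1.2 s per image, versus about 2.7 s for the earlier test-set run.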

## **Step 6**: Double-Checking the .npy Files

In [None]:
npy_dir = '/content/drive/MyDrive/Dissertation/new_test_npy'
files = os.listdir(npy_dir)

In [None]:
len(files)

In [None]:
a = pd.DataFrame(files)

In [None]:
a.to_csv('/content/drive/MyDrive/Dissertation/labels/test_set.csv')
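`len(files)` confirms how many arrays exist, but not which images are missing. A sketch that compares file stems between an image folder and its `.npy` folder; `stems` and `find_unconverted` are hypothetical helpers, and the demo uses temporary folders with made-up file names instead of the Drive paths:

```python
import os
import tempfile

IMAGE_EXTS = {'.jpg', '.jpeg'}


def stems(folder, extensions):
    """Base names (without extension) of files in `folder` whose extension is in `extensions`."""
    return {
        os.path.splitext(name)[0]
        for name in os.listdir(folder)
        if os.path.splitext(name)[1].lower() in extensions
    }


def find_unconverted(image_dir, npy_dir):
    """Image stems that have no corresponding .npy file, sorted for readability."""
    return sorted(stems(image_dir, IMAGE_EXTS) - stems(npy_dir, {'.npy'}))


# Demo with throwaway folders and hypothetical file names
with tempfile.TemporaryDirectory() as image_dir, tempfile.TemporaryDirectory() as npy_dir:
    for name in ['a.jpg', 'b.JPG', 'c.jpg']:
        open(os.path.join(image_dir, name), 'wb').close()
    for name in ['a.npy', 'c.npy']:
        open(os.path.join(npy_dir, name), 'wb').close()
    missing = find_unconverted(image_dir, npy_dir)

print(missing)  # ['b']
```

With the real folders, `find_unconverted(image_dir, npy_dir)` would list any images whose conversion failed or was interrupted.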

In [None]:
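start = time.perf_counter()

npy_dir = '/content/drive/MyDrive/Dissertation/val_npy'
# Only pick up .npy files, in a stable order
files = sorted(f for f in os.listdir(npy_dir) if f.endswith('.npy'))

# Load every .npy file back into memory
images_np = []
for file in files:
  npy_path = os.path.join(npy_dir, file)
  print(npy_path)
  img = np.load(npy_path)
  images_np.append(img)

print(f'Loaded {len(images_np)} arrays in {time.perf_counter() - start:.1f} s')

# Less than a second!!
# Test Set: 1 second!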
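Loading is fast largely because `np.load` only copies raw bytes instead of decoding JPGs. If a full set does not fit comfortably in memory, `np.load(..., mmap_mode='r')` returns a read-only memory-mapped array that reads pixels from disk on demand. A sketch that writes a few small random placeholder arrays to a temporary folder, loads them both ways, and stacks them into one `(N, height, width, 3)` batch with `np.stack`, which requires every array to have the same shape:

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as npy_dir:
    # Placeholder "images": small random uint8 arrays standing in for real .npy files
    rng = np.random.default_rng(0)
    originals = [rng.integers(0, 256, size=(64, 96, 3), dtype=np.uint8) for _ in range(4)]
    for i, arr in enumerate(originals):
        np.save(os.path.join(npy_dir, f'img_{i:03d}.npy'), arr)

    files = sorted(f for f in os.listdir(npy_dir) if f.endswith('.npy'))

    # Eager load: every array is read fully into RAM
    eager = [np.load(os.path.join(npy_dir, f)) for f in files]

    # Lazy load: read-only memory maps backed by the files on disk
    lazy = [np.load(os.path.join(npy_dir, f), mmap_mode='r') for f in files]
    is_memmap = isinstance(lazy[0], np.memmap)

    # np.stack needs identical shapes; it copies the data into one new batch array
    shapes = {a.shape for a in lazy}
    assert len(shapes) == 1, f'mixed image shapes: {shapes}'
    batch = np.stack(lazy)
    del lazy  # drop the memory maps before the temporary folder is removed

print(batch.shape, batch.dtype, is_memmap)
same = all(np.array_equal(e, o) for e, o in zip(eager, originals))
```

Memory mapping helps most when only a subset of images is touched at a time; stacking everything into one batch still needs enough RAM for the whole set.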

Double-checking that the images look correct:

In [None]:
# configure plot settings via rcParams
plt.rcParams['axes.grid'] = False
plt.rcParams['xtick.labelbottom'] = False
plt.rcParams['ytick.labelleft'] = False
plt.rcParams['xtick.top'] = False
plt.rcParams['xtick.bottom'] = False
plt.rcParams['ytick.left'] = False
plt.rcParams['ytick.right'] = False
plt.rcParams['figure.figsize'] = [14, 7]

# plot images
for idx, image_np in enumerate(images_np[20:26]):
    plt.subplot(2, 3, idx+1)
    plt.imshow(image_np)

plt.show()
