# MIS780 - Advanced Artificial Intelligence for Business

## Week 5 - Part 2: Dealing with real digital photos

In this session, you will practise classifying real digital photos with CNN deep learning packages (TensorFlow and Keras).

To do:

*  [Task 1.1. Data Loading and Exploration](#cell_1.1)
*  [Task 1.2. Practise: CNN Training and Evaluation with the real digital photos](#cell_1.2)  

<a id = "cell_1.1"></a>
## Task 1.1. Data Loading and Exploration

In this task, we download the images from CloudDeakin, upload them to your own `Google Drive`, and then load them in a Google Colab Jupyter Notebook.

The provided color images are divided into two folders, `City` and `Country`. Each folder has 200 color images.

First, we load some basic Python libraries.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import random
from tensorflow import keras
import tensorflow as tf

tf.config.list_physical_devices('GPU')

For this lab, we upload the two image folders to Google Drive. Create a folder named `dataset` in your Google Drive so that its path is "/My Drive/Colab Notebooks/dataset/". Then upload the two folders, `City` and `Country`, into `dataset`. Use the same capitalization for the folder names as the paths in the code cells that follow.

Now we can access those images from Google Colab with the code below:

In [None]:
from google.colab import drive
# This opens a web page to verify your Google account. If verification succeeds, Colab can access your Google Drive.
drive.mount('/content/drive')

# List the folders under dataset
!ls "/content/drive/MyDrive/Colab Notebooks/dataset/"

We check the total number of files in the `City` and `Country` folders, respectively.

In [None]:
import os

# Set the paths to the folders containing the image files
city_path = '/content/drive/MyDrive/Colab Notebooks/dataset/City'
country_path = '/content/drive/MyDrive/Colab Notebooks/dataset/Country'

# Get a list of all entries in each folder
city_file_list = os.listdir(city_path)
country_file_list = os.listdir(country_path)

# Print the total number of files
print(f'Total number of files in the City folder: {len(city_file_list)}')
print(f'Total number of files in the Country folder: {len(country_file_list)}')
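
The loading step keeps only files whose names end in `.jpg` or `.jpeg`, so the raw counts from `os.listdir` can differ from the number of images actually loaded. The following is a minimal sketch, assuming a hypothetical helper `count_jpegs` (not part of the original notebook), that counts only matching files using the same extension test:

```python
import os

def count_jpegs(folder):
    """Count files in `folder` whose names end in .jpg or .jpeg.

    Hypothetical helper for illustration; it mirrors the case-sensitive
    extension check used when the images are loaded.
    """
    return sum(
        1 for name in os.listdir(folder)
        if name.endswith(('.jpeg', '.jpg'))
    )
```

If every file in a folder is a JPEG with a lowercase extension, `count_jpegs(city_path)` equals `len(city_file_list)`. A smaller number points to files that the loading loop will skip.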

We read the raw data from the two folders and decode each file into a tensor with `TensorFlow`. Each file is assigned a label based on the folder it is in. The images are also resized to 50x50 pixels using `tf.image.resize`. Image data and labels are stored in a list called `data`.

In [None]:
import os
import tensorflow as tf

# Create a list to store the image data and labels
data = []

# Iterate through the files in the first folder
for file in os.listdir(city_path):
  # Check if the file is a jpeg or jpg file
  if file.endswith('.jpeg') or file.endswith('.jpg'):
    # Load the image data from the file using TensorFlow
    img = tf.io.read_file(os.path.join(city_path, file))
    img = tf.image.decode_jpeg(img,channels=3)
    img = tf.image.resize(img, (50, 50))
    # Assign a label to the file
    label = 'City'
    # Add the image data and label to the data list
    data.append((img, label))

# Iterate through the files in the second folder
for file in os.listdir(country_path):
  # Check if the file is a jpeg or jpg file
  if file.endswith('.jpeg') or file.endswith('.jpg'):
    # Load the image data from the file using TensorFlow
    img = tf.io.read_file(os.path.join(country_path, file))
    img = tf.image.decode_jpeg(img,channels=3)
    img = tf.image.resize(img, (50, 50))
    # Assign a label to the file
    label = 'Country'
    # Add the image data and label to the data list
    data.append((img, label))

The data is shuffled and split into a training set and a test set using list slicing. The training set consists of the first 80% of the data, and the test set consists of the remaining 20%.

In [None]:
# Shuffle the data and split into train/test sets
random.shuffle(data)
train_data, test_data = data[:int(len(data) * 0.8)], data[int(len(data) * 0.8):]
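
The shuffle-and-slice split can be illustrated on a small stand-in list. This is a sketch only: the helper name `split_data`, the `seed` parameter, and the toy data are assumptions for illustration. It uses a seeded `random.Random` so that the split is repeatable, which the cell above does not do.

```python
import random

def split_data(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of `items` and split it into train and test lists."""
    shuffled = list(items)                 # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded generator gives a repeatable order
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy stand-in for the (image, label) tuples: 10 items
toy = [(i, 'City' if i < 5 else 'Country') for i in range(10)]
train, test = split_data(toy)
print(len(train), len(test))  # 8 2
```

With the 400 images in this lab, the same slicing gives 320 training items and 80 test items. Because the split is random, the two classes are not guaranteed to be evenly balanced in either set.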

We unpack the image data and labels into `X_train`, `Y_train`, `X_test`, and `Y_test`, and convert them into NumPy arrays for CNN model training later.

In [None]:

# Extract the image data and labels from the training data
X_train, Y_train = zip(*train_data)

# Extract the image data and labels from the testing data
X_test, Y_test = zip(*test_data)

# Convert the image data and labels into NumPy arrays
X_train = np.array(X_train)
Y_train = np.array(Y_train)
X_test = np.array(X_test)
Y_test = np.array(Y_test)
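
To see what `zip(*...)` does here, the sketch below unpacks a toy list of (image, label) pairs. The random arrays stand in for the real image tensors and are an assumption for illustration; their shape follows the 50x50 RGB images produced in the loading step.

```python
import numpy as np

rng = np.random.default_rng(0)
# Three stand-in "images" of shape (50, 50, 3), each paired with a label
pairs = [(rng.random((50, 50, 3), dtype=np.float32), label)
         for label in ['City', 'Country', 'City']]

# zip(*pairs) regroups the pairs into one tuple of images and one tuple of labels
images, labels = zip(*pairs)

X = np.array(images)  # stacks the images along a new first axis
Y = np.array(labels)
print(X.shape)  # (3, 50, 50, 3)
print(Y.shape)  # (3,)
```

The first axis indexes the samples, so `X[i]` and `Y[i]` still refer to the same photo after unpacking.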

We normalize the input data, `X_train` and `X_test`, by scaling the pixel values from the range [0, 255] to [0, 1].

In [None]:
# Make sure the arrays hold 32-bit floating point numbers
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# Scale every pixel value from [0, 255] to [0, 1]
X_train /= 255
X_test /= 255

# Print the shape of the normalized data
print("Training matrix shape", X_train.shape)
print("Testing matrix shape", X_test.shape)
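
As a quick check of the scaling, the sketch below applies the same cast and division to a few stand-in pixel values. The array `pixels` is an assumption for illustration, not data from the lab.

```python
import numpy as np

# Stand-in 8-bit pixel intensities covering the full range
pixels = np.array([0, 64, 128, 255], dtype='uint8')

# Cast to float first, then divide, as in the cell above
scaled = pixels.astype('float32') / 255

print(scaled.dtype)                # float32
print(scaled.min(), scaled.max())  # 0.0 1.0
```

The cast matters for the in-place form used above: NumPy refuses `/=` on an integer array because the fractional results cannot be stored as integers.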

We encode the two classes, `City` and `Country`, as unique integers (0 and 1).

In [None]:
print('The original class format of the first element in the training dataset is: ', Y_train[0], '\n')

# Create a mapping from category strings to integers
category_map = {'City': 0, 'Country': 1}

# Encode the categories
Y_train = np.array([category_map[category] for category in Y_train])
Y_test = np.array([category_map[category] for category in Y_test])

print('The integer-encoded class of the first element in the training dataset is: ', Y_train[0])
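
The encoding can be reversed when integer predictions need to be shown as class names. The sketch below is illustrative: `inverse_map` and the toy `labels` array are assumptions, not part of the original notebook. It also uses `np.bincount` to count how many samples fall into each class.

```python
import numpy as np

category_map = {'City': 0, 'Country': 1}
# Invert the mapping so integer labels can be turned back into names
inverse_map = {index: name for name, index in category_map.items()}

labels = np.array(['Country', 'City', 'City', 'Country'])
encoded = np.array([category_map[name] for name in labels])
decoded = [inverse_map[int(index)] for index in encoded]

print(encoded)               # [1 0 0 1]
print(decoded)               # ['Country', 'City', 'City', 'Country']
print(np.bincount(encoded))  # [2 2]
```

Running `np.bincount(Y_train)` after the encoding cell gives the number of `City` and `Country` images in the training set, which is a simple way to check class balance after the random split.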

We plot some color images from the training data.

In [None]:
# change the default figure size for all plots created in the program
plt.rcParams['figure.figsize'] = (9,9)

labels =  ['City', 'Country']

for i in range(25):
    # plt.subplot() takes three integer arguments: the number of rows, the number of columns, and the index of the subplot.
    plt.subplot(5,5,i+1)
    # plt.imshow() displays the image at index i of the X_train array as a color (RGB) image, with no interpolation applied.
    plt.imshow(X_train[i], interpolation='none')
    plt.title("{}".format(labels[int(Y_train[i])]))

plt.tight_layout()

<a id = "cell_1.2"></a>
## Task 1.2. Practise: CNN Training and Evaluation with the real digital photos

Apply a CNN model to classify the photos and evaluate the model. You can refer to the workshop materials in Part 1.

In [None]:
# Write your code here