<a href="https://colab.research.google.com/github/shiernee/AI_Tutorial/blob/main/AI_Workshop_Day1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Part 1: Google Colab Introductory Workshop**

The first part of the workshop introduces important features available in Colab. <br>

Colab is an interactive coding environment that makes it easy to get started with coding. To run the code in a cell, click the *play* button on that cell. <br>

Let's get started.


 **Let's try to print *Hello World***


In [None]:
print('Hello World')

In [None]:
# It's your turn to try. 
# Print your name. 
# Type your code below and click the play button to execute the cell. 
# You should see your name appear.


In [None]:
#@title Solution
print('Richard Hunter')

**Perform Calculation**

In [None]:
seconds_in_a_day = 24 * 60 * 60
seconds_in_a_day

In [None]:
# It's your turn to try. 
# Calculate 45 divided by 20, then multiply the result by 8.
# Type your code below and click the play button to execute the cell.
# You should get an answer of 18.0.

In [None]:
#@title Solution
45/20*8

**Create a Loop**
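
As a quick illustration of how `range(start, stop, step)` behaves, the sketch below turns a few ranges into lists so the generated numbers are visible at once. The variable names are chosen for this example only.

```python
# range(stop): start defaults to 0 and stop itself is excluded
first_five = list(range(5))

# range(start, stop, step): count from 1 up to (but not including) 10 in steps of 3
every_third = list(range(1, 10, 3))

# a negative step counts downwards; stop (here 0) is still excluded
countdown = list(range(10, 0, -5))

print(first_five)   # [0, 1, 2, 3, 4]
print(every_third)  # [1, 4, 7]
print(countdown)    # [10, 5]
```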

In [None]:
# Creating a loop
# range() starts counting from zero by default
# range(start, stop, step): stop itself is not included

for i in range(10):
  print(i)

In [None]:
# It's your turn to try. 
# Create a loop to print from 2 to 10 
# Type your code below and click the play button to execute the cell. 
# You should get an answer of 2,3,4,5,6,7,8,9,10.


In [None]:
#@title Solution
for i in range(2, 11):
  print(i)

In [None]:
# It's your turn to try. 
# Create a loop to print from 2 to 10, in steps of 2
# Type your code below and click the play button to execute the cell. 
# You should get an answer of 2,4,6,8,10


In [None]:
#@title Solution
for i in range(2, 11, 2):
  print(i)

**Create a list of values**

In [None]:
# Create a list of values from 0 to 9
x = []
for i in range(10):
  x.append(i)

print(x)

In [None]:
# A more compact way to create a list of values (a list comprehension)
x = [i for i in range(10)]
print(x)

In [None]:
# It's your turn to try. 
# Create a list of values ranging from 2 to 10, in steps of 2
# Type your code below and click the play button to execute the cell. 
# Print out the list. You should get [2,4,6,8,10]


In [None]:
#@title Solution

x = [i for i in range(2, 11, 2)]
print(x)

**Graph Visualization**

In [None]:
# We need to import packages:
# numpy for creating arrays and matplotlib for plotting

import numpy as np
from matplotlib import pyplot as plt

ys = 200 + np.random.randn(100)  # 100 random values scattered around 200
x = np.arange(len(ys))           # x-values 0, 1, ..., 99

plt.plot(x, ys, '-')
plt.title("Sample Visualization")
plt.xlabel('x')
plt.ylabel('y')
plt.show()

In [None]:
# It's your turn to try. 
# Create a list of x-values ranging from 0 to 500, in steps of 2
# Create a list of y-values using the formula y = 5x^2
# power --> **; multiply --> *; divide --> /; plus --> +; minus --> -
# Plot y against x.
# Type your code below and click the play button to execute the cell. 


In [None]:
#@title Solution
import numpy as np
from matplotlib import pyplot as plt

x = [i for i in range(0, 501, 2)]  # x-values 0, 2, 4, ..., 500
ys = [5 * i**2 for i in x]         # y = 5x^2 for every x-value

plt.plot(x, ys, '-')
plt.title("y = 5x^2")
plt.xlabel('x')
plt.ylabel('y')
plt.show()

**Uploading files from your local file system** <br>
`files.upload` returns a dictionary of the uploaded files. The dictionary is keyed by file name, and each value is the data of the uploaded file. <br>

Refresh the folder in the left panel and you will see the dataset you have uploaded.
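
Because each value in that dictionary is the raw file content as bytes, it can be parsed directly in memory. The following is a minimal sketch, assuming a small CSV file was uploaded; the `uploaded` dictionary here is a hand-made stand-in for the return value of `files.upload()`, and the file name and contents are invented for illustration.

```python
import csv
import io

# Stand-in for the dictionary returned by files.upload():
# keys are file names, values are the file contents as bytes (made-up data).
uploaded = {"scores.csv": b"name,score\nAda,90\nLin,85\n"}

for filename, data in uploaded.items():
    text = data.decode("utf-8")                       # bytes -> str
    rows = list(csv.DictReader(io.StringIO(text)))    # one dict per CSV row
    print(filename, "contains", len(rows), "rows")
    print("first row:", rows[0]["name"], rows[0]["score"])
```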

In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

# **Part 2: Classification of MNIST Dreams with Convolutional Neural Networks**



Let's go back to the PowerPoint slides to understand the concepts.



Now we are ready. Let's build a convolutional neural network (CNN) classifier for images of handwritten digits in the MNIST dataset, with a twist: we test our classifier on high-resolution handwritten digits from outside the dataset.

## 1. Import Data

In [None]:
from sklearn.datasets import fetch_openml

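# as_frame=False returns NumPy arrays, so single images can be indexed and reshaped below
mnist = fetch_openml('mnist_784', as_frame=False, cache=False)
X = mnist.data.astype('float32')  # images, one flattened 28x28 image per row
y = mnist.target.astype('int64')  # labels (digits 0-9)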

In [None]:
# View a randomly chosen image from the dataset
# Re-execute the cell to view another image
import matplotlib.pyplot as plt
import numpy as np

index = np.random.randint(0,len(X))

plt.imshow(X[index].reshape([28, 28]))
plt.title(y[index])

## 2. Data Splitting


In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

(train_images_load, train_labels_load), (test_images_load, test_labels_load) = keras.datasets.mnist.load_data()

# reshape images to specify that it's a single channel
train_images_load = train_images_load.reshape(train_images_load.shape[0], 28, 28, 1)
test_images_load = test_images_load.reshape(test_images_load.shape[0], 28, 28, 1)

# Take only the first 10,000 training images and the first 6,000 test images due to limited computational resources

no_train = 10000
no_test = 6000

train_images = train_images_load[:no_train]
train_labels = train_labels_load[:no_train]
test_images = test_images_load[:no_test]
test_labels = test_labels_load[:no_test]


### **Exercise**

Check that the numbers of samples in the training and test images are 10,000 and 6,000, respectively.

In [None]:
# type your code here


In [None]:
#@title Solution
print('train_images: ', len(train_images))
print('test_images: ', len(test_images))


The pixel values of the images are integers from 0 to 255. We scale them to a range of 0 to 1 before feeding them to the neural network model by dividing the values by 255. It's important that the *training set* and the *testing set* are preprocessed in the same way:

In [None]:
def preprocess_images(imgs): # works for a single 28x28 image or a batch of images
    sample_img = imgs if len(imgs.shape) == 2 else imgs[0]
    assert sample_img.shape in [(28, 28, 1), (28, 28)], sample_img.shape # make sure images are 28x28 and single-channel (grayscale)
    return imgs / 255.0

train_images = preprocess_images(train_images)
test_images = preprocess_images(test_images)

Display the first 5 images from the *training set* with the class label below each image. This verifies that the data is in the correct format and that we're ready to build and train the network.

In [None]:
plt.figure(figsize=(10,2))
for i in range(5):
    plt.subplot(1,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i].reshape(28, 28), cmap=plt.cm.binary)
    plt.xlabel(train_labels[i])

## 3. Build the model

Building the neural network requires configuring the layers of the model, then compiling the model. In many cases, this can be reduced to simply stacking together layers:
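
To get a feel for what stacking layers does to the data, the sketch below traces the image shape through the convolution and pooling layers used in this section's model. It assumes two 3x3 convolutions without padding (32 and 64 filters) followed by 2x2 max pooling; the helper functions are written for this illustration and are not part of Keras.

```python
def conv_output_size(size, kernel, stride=1):
    """Width/height after a convolution without padding."""
    return (size - kernel) // stride + 1

def pool_output_size(size, pool):
    """Width/height after non-overlapping pooling."""
    return size // pool

side = 28                                # MNIST images are 28x28 pixels
side = conv_output_size(side, 3)         # first 3x3 convolution, 32 filters
after_conv1 = (side, side, 32)
side = conv_output_size(side, 3)         # second 3x3 convolution, 64 filters
after_conv2 = (side, side, 64)
side = pool_output_size(side, 2)         # 2x2 max pooling halves width and height
after_pool = (side, side, 64)
flattened = side * side * 64             # length of the vector produced by flattening

print(after_conv1, after_conv2, after_pool, flattened)
```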

In [None]:
model = keras.Sequential()
# 32 convolution filters, each of size 3x3
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
# 64 convolution filters, each of size 3x3
model.add(Conv2D(64, (3, 3), activation='relu'))
# keep the strongest response in each 2x2 region via max pooling
model.add(MaxPooling2D(pool_size=(2, 2)))
# randomly switch off 25% of the units during training to reduce overfitting
model.add(Dropout(0.25))
# flatten the feature maps into a 1-D vector for the fully connected layers
model.add(Flatten())
# fully connected layer that combines the extracted features
model.add(Dense(128, activation='relu'))
# one more dropout, this time switching off 50% of the units
model.add(Dropout(0.5))
# softmax output: one probability for each of the 10 digits
model.add(Dense(10, activation='softmax'))

Before the model is ready for training, it needs a few more settings. These are added during the model's *compile* step:

* *Loss function* - measures how far the model's predictions are from the true labels during training; the optimizer tries to minimize it.
* *Optimizer* - how the model is updated based on the data it sees and its loss function.
* *Metrics* - used to monitor the training and testing steps. "accuracy" is the fraction of images that are correctly classified.
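
To make the loss and accuracy definitions concrete, here is a minimal sketch in plain Python. It assumes the loss is sparse categorical cross-entropy (the loss named in this notebook's compile step), averaged over the batch. The probabilities and labels are made-up values with only 3 classes to keep the example short, and both functions are written for illustration rather than taken from Keras.

```python
import math

# Hypothetical softmax outputs for 3 images (one row per image, one column per class)
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
labels = [0, 1, 0]  # true classes given as integers, like train_labels

def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(probability assigned to the true class)."""
    losses = [-math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)

def accuracy(probs, labels):
    """Fraction of samples whose most probable class equals the label."""
    predicted = [max(range(len(p)), key=p.__getitem__) for p in probs]
    correct = sum(int(a == b) for a, b in zip(predicted, labels))
    return correct / len(labels)

loss = sparse_categorical_crossentropy(probs, labels)
acc = accuracy(probs, labels)
print('loss:', round(loss, 4), 'accuracy:', round(acc, 4))
```

The third image is misclassified (class 2 is predicted, the label is 0), which both lowers the accuracy and contributes the largest term to the loss.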

In [None]:
model.compile(optimizer=tf.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

## 4. Train the model

Training the neural network model requires the following steps:

1. Feed the training data to the model (in this example, the `train_images` and `train_labels` arrays).
2. The model learns to associate images and labels.
3. We ask the model to make predictions about a test set (in this example, the `test_images` array) and verify that the predictions match the labels from the `test_labels` array.

To start training, call the `model.fit` method, which "fits" the model to the training data:

In [None]:
history = model.fit(train_images, train_labels, epochs=5)

As the model trains, the loss and accuracy metrics are displayed. This model reaches an accuracy of about 98% on the training data; the exact figure varies from run to run.

## 5. Evaluate accuracy

Next, compare how the model performs on the test dataset:

In [None]:
print(test_images.shape)
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy:', test_acc)

Often, the accuracy on the test dataset is a little lower than the accuracy on the training dataset. Such a gap between training accuracy and test accuracy is a sign of *overfitting*: the model has adapted to the training data in ways that do not carry over to new data.

## 6. Visualize Prediction Results

For each test image, the model outputs one probability per digit. We take the most likely digit as the prediction; its probability indicates the model's relative confidence in that prediction:
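
As a minimal sketch of this step in plain Python, the example below picks the most likely digit and its confidence from rows of probabilities. The `predictions` values are invented stand-ins for the output of `model.predict`, and `most_likely_digit` is a helper written for this illustration.

```python
# Hypothetical model output for two images: 10 probabilities per image (digits 0-9)
predictions = [
    [0.01, 0.02, 0.05, 0.02, 0.01, 0.03, 0.01, 0.80, 0.03, 0.02],
    [0.10, 0.05, 0.40, 0.05, 0.05, 0.05, 0.05, 0.05, 0.15, 0.05],
]

def most_likely_digit(row):
    """Return (digit, confidence) for one row of class probabilities."""
    digit = max(range(len(row)), key=lambda k: row[k])
    return digit, row[digit]

results = [most_likely_digit(row) for row in predictions]
for digit, confidence in results:
    print('predicted digit:', digit, 'confidence:', confidence)
```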

In [None]:
y_predict = model.predict(test_images)
y_predict = np.argmax(y_predict, axis=1)

Display 5 randomly chosen images from the *testing set* with the predicted class below each image.

In [None]:
import random 

plt.figure(figsize=(10,2))
plt.suptitle('Prediction')  # figure-level title shared by all 5 subplots

randomlist = random.sample(range(0, len(test_images)), 5)

for i in range(5):
    plt.subplot(1,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(test_images[randomlist[i]].reshape(28, 28), cmap=plt.cm.binary)
    plt.xlabel(y_predict[randomlist[i]])

## Acknowledgements

The content of Part 2 (the classification tutorial) is inspired by and based on Lex Fridman's [tutorial_deep_learning_basic.ipynb](https://colab.research.google.com/github/lexfridman/mit-deep-learning/blob/master/tutorial_deep_learning_basics/deep_learning_basics.ipynb#scrollTo=IysPmcOBHBE9)