# <img style="float: left; padding-right: 10px; width: 45px" src="https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/iacs.png"> APCOMP 295 Advanced Practical Data Science
## Exercise 4: Deep Learning workflow using Transfer Learning and Docker Containers



**Harvard University**<br/>
**Fall 2020**<br/>
**Instructors**: Pavlos Protopapas


<hr style="height:2pt">

**Each assignment is graded out of 5 points.  The topic for this assignment is Transfer Learning and using Docker containers for your Deep Learning workflow.**

**Due:** 10/13/2020 10:15 AM EDT

**Submit:** We won't be re-running your notebooks, so please ensure all output is visible in the notebook.

#### Exercise Flow:

Module 1 introduced you to various tools outside of Jupyter notebooks. As you move into Module 2 you will explore using Colab notebooks alongside Docker containers. As a Data Scientist you should be comfortable working with multiple tools and integrating them into your workflow.

<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-wf-3.png" alt="Excercise Workflow" width="900"/>

- **1:** Use a prebuilt Docker container that can scrape and download images from Google image search (optional).
- **2:** The images are zipped and uploaded to a Google Storage Bucket in GCP.
- **3:** Use a Colab notebook to access this data and build image classification models using transfer learning. Once training is complete, we pick the best model and save it to the Google Storage Bucket.
- **4:** Finally, we want to share our awesome model with our colleagues. For this we will use Flask to expose our model as an API.


#### Learning Objectives

In this exercise you will cover the following topics:  
- Docker containers
- Building data input pipelines using `tf.data`
- Building an image classification model without Transfer Learning
- Building image classification models with Transfer Learning from `tf.keras.applications` and TensorFlow Hub
- Serving a trained model using a Flask app


---

## Question 1 : Setup your Project (1.0 Point)

Unzip exercise4.zip and ensure your directory structure looks like this:

In this step we will be creating a Google Storage Bucket so that our two docker containers and Colab notebook can all have a common place to store images & models.

**Steps:**  
- In your [GCP Console](https://console.cloud.google.com/home/dashboard), select "Storage" from the top-right menu and create a bucket called **ac295-exercise4-< your initials >**. Bucket names need to be globally unique, so pick one that is available.
- Select "IAM & Admin" > "Service accounts" from the top-right menu and create a new service account called "storage-service-account". For "Service account permissions" select "Cloud Storage" > "Storage Admin". Then click Done.
- This will create a service account similar to storage-service-account@ac295-data-science-xxxxxx.iam.gserviceaccount.com
- In the "Actions" column on the right, click the vertical ... and select "Create key". A prompt titled Create private key for "storage-service-account" will appear; select "JSON" and click Create. This downloads a private key JSON file to your computer. Copy this JSON file into a **secrets** folder under exercise4. Your folder structure after these steps should look like this:

- Open the [Colab Notebook](https://colab.research.google.com/drive/1k7UWPHACfSfZYk2FUdHspGx-TM0qK4KW?usp=sharing)
- In the notebook select File > Save a copy in Drive
- Now you have your own version of the colab notebook
- Enable GPU for your notebook: In your Colab notebook select Runtime > Change runtime type
- Select GPU for Hardware accelerator and click Save 

#### a) Submit a screenshot of the results from the "Verify Setup" section

This should show the TensorFlow and Keras versions and whether a GPU is enabled. Sample submission: 

<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q1-a.png" width="900"/>

##### Submission: 

*your screen shot here*

#### b) Submit a screenshot of the output from the cell "!nvidia-smi" to see what GPU was allocated

Sample submission: 

<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q1-b.png" width="600"/>

##### Submission: 

![img1](img/1.png)



#### c) Copy your output from "Check Access to Bucket"

- Run the cell to "Authenticate" to Google. Make sure to use the same email id that you used to sign up for GCP
- The cell will prompt you to log in to your Google account; then copy the generated key into the verification box and hit enter
- Run the cell "Check Bucket Access" and submit a screenshot

Sample submission: 

<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q1-c.png" width="900"/>

##### Submission: 

![img2](img/2.png)

## Question 2 : Bulk Image Downloader (Optional)

**This is an optional step that you can do if you want to prepare your own dataset or learn how to do it using a container. If you choose to skip this step, we have prepared some data for you that you can use for your transfer learning.**

In this task you will download some images from Google Images. As Data Scientists we are "smart", so we do not want to download images manually. Instead we will use a Docker container, `image-downloader`, which comes with all the image scraping software pre-installed.

**Steps:**  
- Open a terminal prompt at your **exercise4** folder
- Set the environment variable to your secrets path: ```export SECRETS_PATH=$(pwd)/secrets/```
- cd into folder `image-downloader` and run the following commands to build & run the image downloader container:
- ```docker build -t image-downloader -f Dockerfile .```
- ```docker run --rm --name image-downloader -ti -v "$(pwd)/:/app/" -v "$SECRETS_PATH:/secrets/" -e GOOGLE_APPLICATION_CREDENTIALS=/secrets/service-account-key.json image-downloader```
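
Before starting the container, it can help to confirm that the key file the `docker run` command points at exists and looks like a service account key. The following is a minimal sketch, not part of the exercise code. The helper name `check_service_account_key` is hypothetical, and the sketch assumes the key was saved as `service-account-key.json` (the file name used in the command above) inside the folder that `SECRETS_PATH` points to.

```python
import json
import os
from pathlib import Path

# Fields that a Google service account JSON key is expected to contain.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found with the key file (empty list = looks fine)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"key file not found: {key_path}"]
    try:
        data = json.loads(key_path.read_text())
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(data, dict):
        return ["key file does not contain a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if data.get("type") != "service_account":
        problems.append("field 'type' is not 'service_account'")
    return problems


if __name__ == "__main__":
    # Falls back to ./secrets when SECRETS_PATH has not been exported.
    secrets_dir = os.environ.get("SECRETS_PATH", "secrets")
    issues = check_service_account_key(Path(secrets_dir) / "service-account-key.json")
    print("key looks OK" if not issues else "\n".join(issues))
```

The check only inspects the file locally; it does not contact GCP, so bucket access still needs to be verified with the CLI in the next step.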

#### a) Check if the Container has access to write to your GCP Bucket

- If you followed along you should be inside a bash shell inside the `image-downloader` container
- Run the CLI (command line interface) to test access to the bucket with ```python -m cli --opp test_bucket_access --bucket ac295-exercise4-xx --projectid ac295-data-science-xxxxxx```
- Submit a screenshot of your terminal (just from the previous command)

Sample submission: 
<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q2-a.png" width="900"/>

##### Submission: 

*Optional: your screen shot here*

#### b) Download some images using the image-downloader container

- Downloading thousands of images will take time, so we will just test to see if your container is working
- Run the CLI to download some images from Google image search ```python -m cli --opp download_images --bucket ac295-exercise4-xx --projectid ac295-data-science-xxxxx --labels "tomato,bell pepper" --num 10```. You can try different labels or a different number of images to download
- A **datasets** folder will be created inside the `image-downloader` folder. Take a screenshot of the folder displaying a few images
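
To see how a command like the one above could be interpreted, here is a small `argparse` sketch. It is purely illustrative and is not the code of the provided `cli` module. It reuses the flag names that appear in the command (`--opp`, `--bucket`, `--projectid`, `--labels`, `--num`), while the function name `parse_args` and the default values are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse flags shaped like the ones in the commands above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Illustrative image-downloader style CLI")
    parser.add_argument("--opp", required=True, help="operation to run, e.g. download_images")
    parser.add_argument("--bucket", help="GCP bucket name")
    parser.add_argument("--projectid", help="GCP project id")
    parser.add_argument("--labels", default="", help="comma-separated search labels")
    parser.add_argument("--num", type=int, default=10, help="number of images to download")
    args = parser.parse_args(argv)
    # Split "tomato,bell pepper" into ["tomato", "bell pepper"], dropping empty entries.
    args.labels = [label.strip() for label in args.labels.split(",") if label.strip()]
    return args


if __name__ == "__main__":
    demo = parse_args(["--opp", "download_images", "--labels", "tomato,bell pepper", "--num", "10"])
    print(demo.opp, demo.labels, demo.num)
```

Quoting the `--labels` value in the shell matters here: without the quotes, `bell pepper` would be split into two separate arguments before the program ever sees it.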

Sample submission: 
<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q2-b.png" width="400"/>

##### Submission: 

*Optional: your screen shot here*

#### c) Upload your dataset to your GCP Bucket

- In this step you will zip the dataset folder created in the previous step and upload it to your GCP Bucket
- Run the CLI to upload the data to your GCP bucket ```python -m cli --opp upload_to_bucket --bucket ac295-exercise4-xx --projectid ac295-data-science-xxxxx```
- Go to your bucket in GCP and verify that you see your dataset.zip file there. Submit a screenshot of all the files in your bucket currently
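
The zipping half of this step can be sketched with the standard library alone. This is an illustration under stated assumptions, not the implementation behind `upload_to_bucket`: the helper name `zip_folder` is hypothetical, and the upload itself (handled by the provided CLI) is not shown.

```python
import shutil
import zipfile
from pathlib import Path


def zip_folder(folder, archive_stem):
    """Zip `folder` into `<archive_stem>.zip` and return the archive path as a string."""
    folder = Path(folder)
    # root_dir/base_dir keep the top-level folder name inside the archive.
    return shutil.make_archive(
        str(archive_stem), "zip", root_dir=folder.parent, base_dir=folder.name
    )


if __name__ == "__main__":
    # Only runs when a datasets folder exists in the current directory.
    if Path("datasets").is_dir():
        archive = zip_folder("datasets", "dataset")
        with zipfile.ZipFile(archive) as zf:
            print(archive, "contains", len(zf.namelist()), "entries")
```

Writing the archive next to the folder rather than inside it avoids the archive trying to include itself.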

Sample submission: 
<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q2-c.png" width="900"/>

##### Submission: 

*Optional: your screen shot here*

## Question 3 : Model Serving App (1.0 Point)

In this step you will set up a simple model serving app using Flask.

**Steps:**  

- Open a terminal prompt at your **exercise4** folder
- Set the environment variable to your secrets path: ```export SECRETS_PATH=$(pwd)/secrets/```
- cd into folder `model-server` and run the following commands to build & run the model server container:
- ```docker build -t model-server -f Dockerfile .```
- ```docker run --rm --name model-server -ti -v "$(pwd)/:/app/" -v "$SECRETS_PATH:/secrets/" -e GOOGLE_APPLICATION_CREDENTIALS=/secrets/service-account-key.json -p 8081:8081 model-server```

#### a) Check if the Container has access to read from your GCP Bucket

- If you followed along you should be inside a bash shell inside the `model-server` container
- In this container we will set the GCP project id and bucket as environment variables. It is common practice to pass values to different components in an application using environment variables. Run the following commands, remembering to replace the values with your own GCP project id and bucket name:
```
export GCP_PROJECT_ID=ac295-data-science-289118
export GCP_BUCKET=ac295-exercise4-sw
export FLASK_APP=service.py
```
- Run the CLI (command line interface) to test access to the bucket with ```python -m cli --opp test_bucket_access```
- Submit a screenshot of your terminal (just from the previous command)
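
As an illustration of the environment-variable pattern described above, the sketch below shows one way a Python component could read `GCP_PROJECT_ID` and `GCP_BUCKET` and fail early when one is missing. It is an assumption about how such code might look, not an excerpt from the provided `service.py` or `cli` module, and the function name `load_settings` is hypothetical.

```python
import os


def load_settings(environ=os.environ):
    """Collect the settings exported above, failing early if one is missing."""
    required = ("GCP_PROJECT_ID", "GCP_BUCKET")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in required}


if __name__ == "__main__":
    try:
        print(load_settings())
    except RuntimeError as err:
        print(err)
```

Passing the mapping in as a parameter keeps the function easy to test with a plain dictionary instead of the real process environment.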

Sample submission: 
<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q3-a.png" width="900"/>

##### Submission: 

![img3](img/3.png)

#### b) Run Flask Server

- Now you will run Flask as a model server
- Run ```flask run --host=0.0.0.0 --port=8081```
- To ensure Flask is running, go to http://localhost:8081/ and make sure you get a valid response.
- Next go to http://localhost:8081/model_status and check the response.  
- Submit a screenshot from the previous step

Sample submission: 
<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q3-c.png" width="500"/>

##### Submission: 

![img4](img/4.png)

## Question 4 : Transfer Learning (3.0 Points)

**Steps:**  
- If you followed along with Question 1, you should have a copy of your Colab notebook for Transfer Learning
- In this section you can either use your own **dataset.zip** file that you uploaded to your GCP Bucket or the dataset we prepared for you.


### a) Download & Prepare Data

- In the **Download & Prepare Data** section of the Colab notebook, execute the sections for Download Data, Explore Data, and Build Data Pipelines
- How much RAM would the dataset require if we loaded it all into numpy arrays?
- If we were to use numpy arrays to load all this data and feed to your CNN model would this work?
- How does loading data with `tf.data` help us in this problem?
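
One way to reason about the RAM question is to compute the size of a dense array directly from its shape and data type. The sketch below assumes images are resized to 224 x 224 x 3 (the shape mentioned in the submission below). The image count of 1000 is a made-up placeholder, and `array_size_bytes` is a hypothetical helper name.

```python
def array_size_bytes(num_images, height=224, width=224, channels=3, bytes_per_value=4):
    """Bytes needed to hold all images as one dense array (float32 = 4 bytes per value)."""
    return num_images * height * width * channels * bytes_per_value


if __name__ == "__main__":
    num_images = 1000  # hypothetical count; use the number reported in Explore Data
    for dtype_name, nbytes in (("uint8", 1), ("float32", 4)):
        size_gb = array_size_bytes(num_images, bytes_per_value=nbytes) / 1e9
        print(f"{num_images} images as {dtype_name}: {size_gb:.2f} GB")
```

Note that this figure depends only on the number of images, their decoded shape, and the dtype, not on how well the files compress on disk.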

##### Submission: 

- The zipped dataset is 620 MB and expands to 657 MB on disk, so holding the raw files in memory would need at least that much RAM. Once decoded into numpy arrays, the footprint is set by the array shape and dtype rather than by the compressed file size, so it can differ substantially from the on-disk size.
- If we were to use numpy arrays to load this into our CNN, we would create an `N x 224 x 224 x 3` array, where N is the number of images. If the pixel values range from 0 to 255, we would typically divide all values by 255 to get numbers in the 0-1 range that CNNs generally work with.
- `tf.data` allows us to stream the data to the model so that it doesn't all need to be loaded into memory at once. It also allows for easy batching.
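
The streaming and batching idea can be illustrated without TensorFlow. The following is a plain-Python analogy using generators, not the `tf.data` API: items are produced one at a time and grouped into batches only when the consumer asks for them, so the full dataset never has to sit in memory. The names `batched` and `fake_image_paths` are hypothetical.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield lists of up to `batch_size` items, reading from `items` lazily."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_image_paths(count):
    """Stand-in for a file listing; yields one path at a time."""
    for index in range(count):
        yield f"image_{index}.jpg"


if __name__ == "__main__":
    for batch in batched(fake_image_paths(10), batch_size=4):
        # Only the current batch is held in memory at this point.
        print(len(batch), batch[0])
```

In the notebook, `tf.data` plays this role and additionally handles decoding, shuffling, and prefetching.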

### b) Build model without Transfer Learning

- In the **Build model without Transfer Learning** section, we have some instructions on how to stack up some basic Convolution layers to build a CNN from scratch. Build the model and ensure you give it a unique `model_name` since we need to identify your best model later on.
- Feel free to change the model architecture or any model parameters to get better results.
- Run this model for ONLY 5 epochs
- Submit a screenshot of your training history (The output from the method `evaluate_save_model(...)`)

Sample submission: 
<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q4-a.png" width="700"/>

##### Submission: 

![img5](img/5.png)

### c) Transfer Learning using keras.application

- In the **Transfer Learning using keras.application** section, we have some instructions on how to build a model using transfer learning from keras.application. Build the model and ensure you give it a unique `model_name` since we need to identify your best model later on.
- Feel free to pick any model base or any model parameters to get better results.
- Run this model for ONLY 5 epochs
- Submit a screenshot of your training history (The output from the method `evaluate_save_model(...)`)

Sample submission: 
<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q4-b.png" width="700"/>

##### Submission: 
![6](img/6.png)

### d) Transfer Learning using TensorFlow Hub

- In the **Transfer Learning using TensorFlow Hub** section, we already have a model using transfer learning from TensorFlow Hub using the MobileNet architecture and logic to train it.
- Pick any other image classification model as your transfer learning model base.
- Run this model for ONLY 5 epochs
- Submit a screenshot of your training history (The output from the method `evaluate_save_model(...)`)

Sample submission: 
<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q4-c.png" width="700"/>

##### Submission: 

![7](img/7.png)

### e) Compare all Models

- In the **Select best Model** section, execute the cell for **Compare all Models**
- Submit a screenshot of your results

Sample submission: 
<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q4-d.png" width="900"/>

##### Submission: 

![8](img/8.png)

### f) Save your best model

- In the **Select best Model** section, execute the cell for **Save Best Model**
- Go to your bucket in GCP and verify that you see a **best_model** folder in there. 
- Submit a screenshot of all the files in your bucket currently

Sample submission: 
<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q4-f.png" width="900"/>

##### Submission: 

![9](img/9.png)

### g) View predictions using Model Server

- You trained your model using Colab and saved the best model into your GCP Bucket, but your `model-server` container is running locally. To make sure the best model is available from your GCP bucket, do the following steps:
- **If you used your own dataset for training you will need to place some test images inside the `test_images` folder in the `model-server` container.**
- Go to http://localhost:8081/model_status and check the response now; you should see your best model currently being served.

<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q4-g1.png" width="500"/>

- Next make some predictions using test images
- Go to http://localhost:8081/predict?file=test1.jpg and submit a screenshot of your results
- Go to http://localhost:8081/predict?file=test2.jpg and submit a screenshot of your results
- Go to http://localhost:8081/predict?file=test5.jpg and submit a screenshot of your results
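
The same endpoints can also be queried from a script, which is handy for a quick check before taking the browser screenshots. This is a minimal sketch using only the standard library. It assumes the server is reachable at `http://localhost:8081` as in the steps above, and the helper names `predict_url` and `fetch_text` are hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8081"  # port published by the docker run command


def predict_url(file_name, base_url=BASE_URL):
    """Build the /predict URL for one test image, escaping the file name."""
    return f"{base_url}/predict?" + urllib.parse.urlencode({"file": file_name})


def fetch_text(url, timeout=10):
    """Return the response body as text; raises URLError if the server is not running."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    urls = [f"{BASE_URL}/model_status"]
    urls += [predict_url(name) for name in ("test1.jpg", "test2.jpg", "test5.jpg")]
    for url in urls:
        try:
            print(url, "->", fetch_text(url))
        except OSError as err:  # URLError is a subclass of OSError
            print(url, "-> request failed:", err)
```

The script prints whatever the server returns and makes no assumption about the response format.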

Sample submissions: 
<table><tr>
<td><img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q4-g2.png" width="300"/></td>
<td><img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q4-g3.png" width="300"/></td>
<td><img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q4-g4.png" width="300"/></td>
</tr></table>

##### Submission: 

![10](img/10.png)
![11](img/11.png)
![12](img/12.png)
![13](img/13.png)

### h) Re-run your training with more Epochs

Pick a number of epochs of 10 or higher and re-run your model training for:  
- Build model without Transfer Learning
- Transfer Learning using keras.application
- Transfer Learning using TensorFlow Hub
- Compare all Models
- Save your best model
- Go to http://localhost:8081/model_status and check the response now; you should see a different best model being served.

<img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q4-h1.png" width="500"/>

- Next make some predictions using test images
- Go to http://localhost:8081/predict?file=test1.jpg and submit a screenshot of your results
- Go to http://localhost:8081/predict?file=test2.jpg and submit a screenshot of your results
- Go to http://localhost:8081/predict?file=test5.jpg and submit a screenshot of your results

Sample submissions: 
<table><tr>
<td><img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q4-h2.png" width="300"/></td>
<td><img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q4-h3.png" width="300"/></td>
<td><img src="https://storage.googleapis.com/public_colab_images/ac295/excercise4-q4-h4.png" width="300"/></td>
</tr></table>

##### Submission: 

![14](img/14.png)
![15](img/15.png)
![16](img/16.png)

### <font color=red>Delete your GCP Storage Bucket</font>