# Image Classification from scratch with TPUs on Cloud ML Engine using ResNet

This notebook demonstrates how to do image classification from scratch on a flowers dataset using TPUs and the ResNet trainer.

A detailed tutorial can be found at the following link:

https://medium.com/p/c71b6eed78e0/edit

-----------------------------------------------------

### 0) Project Settings

In this step we set the project-wide settings used by all following cells:

In [None]:
import os

GC_PROJECT =''             # REPLACE WITH YOUR GOOGLE PROJECT ID
BUCKET = 'product-classification'         # REPLACE WITH YOUR BUCKET NAME
REGION = 'us-central1'                    # REPLACE WITH YOUR BUCKET REGION e.g. us-central1
PROJECT = 'project_001_freisteller'       # REPLACE WITH YOUR <YOUR_PROJECT_NAME>
MODEL_VERSION = 'ResNet_v04'              # REPLACE WITH A MODEL NAME eg. ResNet_v01


# do not change these
os.environ['GC_PROJECT'] = GC_PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['PROJECT'] = PROJECT
os.environ['MODEL_VERSION'] = MODEL_VERSION
os.environ['TFVERSION'] = '1.14'
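
Since `GC_PROJECT` is an empty string in the template above, a quick sanity check can catch a forgotten value before any `gcloud` call runs. The following is a minimal sketch: `find_empty_settings` is a hypothetical helper that is not part of the original notebook, and the dictionary holds placeholder values.

```python
def find_empty_settings(settings):
    """Return the names of all settings whose value is empty or only whitespace."""
    return sorted(name for name, value in settings.items() if not str(value).strip())


# Placeholder values for illustration; in the notebook you would pass the real variables.
example = {
    'GC_PROJECT': '',                     # still empty, as in the template above
    'BUCKET': 'product-classification',
    'REGION': 'us-central1',
}

missing = find_empty_settings(example)
if missing:
    print('Please fill in: ' + ', '.join(missing))
```

In the notebook itself the dictionary could be built from the actual variables, for example `{'GC_PROJECT': GC_PROJECT, 'BUCKET': BUCKET, ...}`.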


In [None]:
%%bash
gcloud config set project $GC_PROJECT
gcloud config set compute/region $REGION

---------------------------------------------------------

### 1) Mounting Bucket

The notebook needs access to the bucket you created in Google Cloud Storage (GCS).

This is achieved by mounting the bucket at the folder /home/jupyter/<YOUR_BUCKET_NAME> (the top folder of the notebook), which is the path the following commands use.

In [None]:
!fusermount -u /home/jupyter/$BUCKET

In [None]:
!/usr/bin/gcsfuse $BUCKET /home/jupyter/$BUCKET
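
To confirm that the mount worked before relying on it, the mount point can be checked from Python. This is a minimal sketch, assuming the bucket is mounted under `/home/jupyter/<bucket>` as in the cell above; `mount_status` is a hypothetical helper name.

```python
import os


def mount_status(bucket, home='/home/jupyter'):
    """Return (mount_point, is_mounted) for the folder the bucket is mounted at."""
    mount_point = os.path.join(home, bucket)
    # os.path.ismount returns False for missing paths and for plain directories.
    return mount_point, os.path.ismount(mount_point)


path, mounted = mount_status('product-classification')
print(path, 'is mounted' if mounted else 'is NOT mounted')
```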

----------------------------------------

### 2) Enable TPU service account

Allow Cloud ML Engine to access the TPU and bill its usage to your project.

In [None]:
%%bash
SVC_ACCOUNT=$(curl -H "Authorization: Bearer $(gcloud auth print-access-token)"  \
    https://ml.googleapis.com/v1/projects/${GC_PROJECT}:getConfig \
              | grep tpuServiceAccount | tr '"' ' ' | awk '{print $3}' )
              
echo "Enabling TPU service account $SVC_ACCOUNT to act as Cloud ML Service Agent"
gcloud projects add-iam-policy-binding $GC_PROJECT --member serviceAccount:$SVC_ACCOUNT --role roles/ml.serviceAgent
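
The `grep | tr | awk` pipeline above pulls the `tpuServiceAccount` value out of the JSON response by text matching. As an alternative, the response can be parsed as JSON. The sketch below searches for the key recursively so that it does not depend on where exactly the key is nested; `find_key` is a hypothetical helper and the sample payload is made up for illustration.

```python
import json


def find_key(obj, key):
    """Recursively search a decoded JSON value for `key`; return its value or None."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Made-up sample payload; the real response comes from the getConfig call above.
sample = '{"config": {"tpuServiceAccount": "tpu-account@example.iam.gserviceaccount.com"}}'
print(find_key(json.loads(sample), 'tpuServiceAccount'))
```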



----------------------------------------------------------------------

### 3) Preprocessing

#### 3.1) Local Preprocessing

We need an empty folder for our local test. The following command removes any previous folder and recreates it:

In [None]:
!rm -rf ${PWD}/local_test && mkdir -p ${PWD}/local_test

For our local test we copy the first 5 lines of the training CSV to a file in our local test folder.

We also need the labels.txt file from Google Cloud Storage.

In [None]:
%%bash
gsutil cat gs://${BUCKET}/${PROJECT}/train_set.csv | head -5 > ${PWD}/local_test/input.csv
gsutil cat gs://${BUCKET}/${PROJECT}/labels.txt > ${PWD}/local_test/labels.txt
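
The `head -5` step simply keeps the first five lines of the stream. The same idea can be sketched in Python with `itertools.islice`, which stops reading once enough lines were taken. The CSV content below is made up, and `head` is a hypothetical helper name.

```python
import io
from itertools import islice


def head(lines, n=5):
    """Return the first n lines of an iterable of lines, like `head -n`."""
    return list(islice(lines, n))


# Made-up content standing in for train_set.csv.
fake_csv = io.StringIO(''.join('row_%d\n' % i for i in range(20)))
sample = head(fake_csv, 5)
print(''.join(sample), end='')
```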

Now we start the local preprocessing. The code for this task is located in mymodel/trainer/preprocess.py.

In [None]:
%%bash
export PYTHONPATH=${PYTHONPATH}:${PWD}/mymodel

python -m trainer.preprocess \
       --train_csv ${PWD}/local_test/input.csv \
       --validation_csv ${PWD}/local_test/input.csv \
       --labels_file ${PWD}/local_test/labels.txt \
       --project_id $GC_PROJECT \
       --output_dir ${PWD}/local_test/out \
       --runner=DirectRunner

In [None]:
!ls -l ${PWD}/local_test/out

#### 3.2) Online Preprocessing

After successfully creating the small local datasets, we preprocess all of our data.

To speed this step up, we run the preprocessing on Google Cloud Dataflow.

In [None]:
%%bash
export PYTHONPATH=${PYTHONPATH}:${PWD}/mymodel
    
# Remove the output of a previous run (same path as --output_dir below)
gsutil -m rm -rf gs://${BUCKET}/${PROJECT}/resnet/data
python -m trainer.preprocess \
       --train_csv gs://${BUCKET}/${PROJECT}/train_set.csv \
       --validation_csv gs://${BUCKET}/${PROJECT}/eval_set.csv \
       --labels_file gs://${BUCKET}/${PROJECT}/labels.txt \
       --project_id $GC_PROJECT \
       --output_dir gs://${BUCKET}/${PROJECT}/resnet/data

You can follow the progress at the following link:

https://console.cloud.google.com/dataflow

After the preprocessing is done, you can list the created training and validation datasets with the following command:

In [None]:
%%bash
gsutil ls gs://${BUCKET}/${PROJECT}/resnet/data

--------------------------------------------------------------------------------

### 4) Train on the Cloud

In [None]:
%%bash
echo -n "--num_train_images=$(gsutil cat gs://${BUCKET}/${PROJECT}/train_set.csv | wc -l)  "
echo -n "--num_eval_images=$(gsutil cat gs://${BUCKET}/${PROJECT}/eval_set.csv | wc -l)  "
echo "--num_label_classes=$(gsutil cat gs://${BUCKET}/${PROJECT}/labels.txt | wc -l)"
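
Note that `wc -l` counts newline characters rather than lines, so a file whose last line has no trailing newline is undercounted by one. The sketch below reproduces the flag construction in Python and shows the difference; the helper names and file contents are made up for illustration.

```python
def wc_l(text):
    """Mimic `wc -l`: count newline characters, not lines."""
    return text.count('\n')


def build_flags(train_csv, eval_csv, labels_txt):
    """Build the three count flags from the raw text of the three files."""
    return ('--num_train_images=%d --num_eval_images=%d --num_label_classes=%d'
            % (wc_l(train_csv), wc_l(eval_csv), wc_l(labels_txt)))


# Made-up file contents for illustration.
print(build_flags('a.jpg,x\nb.jpg,y\n', 'c.jpg,x\n', 'x\ny\n'))

# The same two labels without a trailing newline: wc -l style vs. real line count.
print(wc_l('x\ny'), len('x\ny'.splitlines()))
```

If the counts look one too low, checking whether the files end with a newline is a reasonable first step.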

Insert the values printed above into the following block, at the place marked "Modify here":

In [None]:
%%bash
TOPDIR=gs://${BUCKET}/${PROJECT}/resnet
OUTDIR=${TOPDIR}/trained
JOBNAME=imgclass_$(date -u +%y%m%d_%H%M%S)


echo $OUTDIR $REGION $JOBNAME
gsutil -m rm -rf $OUTDIR  # Comment out this line to continue training from the last time

# Modify here: set --num_train_images, --num_eval_images and --num_label_classes
# in the command below to the values printed by the previous cell.
# (The note lives up here because a comment after a trailing backslash breaks the line continuation.)
gcloud ai-platform jobs submit training $JOBNAME \
 --region=$REGION \
 --module-name=trainer.resnet_main \
 --package-path=$(pwd)/mymodel/trainer \
 --job-dir=$OUTDIR \
 --staging-bucket=gs://$BUCKET \
 --scale-tier=BASIC_TPU \
 --runtime-version=$TFVERSION --python-version=3.5 \
 -- \
 --data_dir=${TOPDIR}/data \
 --model_dir=${OUTDIR}/ \
 --resnet_depth=18 \
 --train_batch_size=128 --eval_batch_size=64 --skip_host_call=True \
 --steps_per_eval=150 --train_steps=500 \
 --num_train_images=865 --num_eval_images=94 --num_label_classes=6 \
 --export_dir=${OUTDIR}/export
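
The job is configured in steps, not epochs. To get a feeling for how the step-based flags relate to passes over the training set, the arithmetic can be sketched as follows, using the example values from the job above. `training_summary` is a hypothetical helper name.

```python
def training_summary(train_steps, train_batch_size, num_train_images):
    """Relate the step-based training flags to passes over the training set."""
    steps_per_epoch = num_train_images / train_batch_size
    epochs = train_steps * train_batch_size / num_train_images
    return steps_per_epoch, epochs


# Values taken from the example job above.
steps_per_epoch, epochs = training_summary(
    train_steps=500, train_batch_size=128, num_train_images=865)
print('steps per epoch: %.2f, epochs: %.1f' % (steps_per_epoch, epochs))
```

With these example values, 500 steps at batch size 128 process 64,000 images, which is roughly 74 passes over the 865 training images. Adjust `--train_steps` accordingly when your dataset size differs.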

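To monitor the training with TensorBoard, run this on your Cloud Console (adjust the bucket and project name to your own settings):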

tensorboard --logdir=gs://product-classification/project_001_freisteller/resnet/trained/ --port=8000

------------------------------------------------------------------------------------

### 5) Deploying and predicting with model

To allow classification in production, we have to deploy our trained model to AI Platform.

Choose your option:

In [None]:
%%bash
# If you want to create a completely new model (uncomment both lines)

# gcloud ai-platform models create ${PROJECT} --regions $REGION
# gcloud ai-platform versions create ${MODEL_VERSION} --model ${PROJECT} --origin $(gsutil ls gs://${BUCKET}/${PROJECT}/resnet/trained/export/ | tail -1) --runtime-version=${TFVERSION} 

In [None]:
%%bash
# If you want to deploy a new model version

gcloud ai-platform versions create ${MODEL_VERSION} --model=${PROJECT} --origin=$(gsutil ls gs://${BUCKET}/${PROJECT}/resnet/trained/export/ | tail -1) --runtime-version=${TFVERSION} 
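
The `gsutil ls ... | tail -1` expression picks the last entry of the listing as the export to deploy. The sketch below makes that selection explicit in Python, assuming the export subfolders have purely numeric (timestamp) names; that naming is an assumption, and both `latest_export` and the listing are made up for illustration.

```python
def latest_export(listing):
    """Pick the export directory with the highest numeric name from `gsutil ls` output."""
    candidates = []
    for line in listing.splitlines():
        name = line.strip().rstrip('/').rsplit('/', 1)[-1]
        if name.isdigit():  # skip the parent folder and anything non-numeric
            candidates.append((int(name), line.strip()))
    if not candidates:
        raise ValueError('no timestamped export directory found')
    return max(candidates)[1]


# Made-up listing for illustration.
listing = '''gs://my-bucket/my-project/resnet/trained/export/
gs://my-bucket/my-project/resnet/trained/export/1560000000/
gs://my-bucket/my-project/resnet/trained/export/1561111111/
'''
print(latest_export(listing))
```

Raising an error on an empty listing avoids passing an empty `--origin` to the deploy command when training has not exported a model yet.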

In [None]:
%%bash
# If you want to delete a version or a model (uncomment the lines you need)

# gcloud ai-platform versions delete --quiet ${MODEL_VERSION} --model ${PROJECT}
# gcloud ai-platform models delete ${PROJECT}

-------------------------------------------------------------------------

### 6) Test Model

The following functions offer a quick test of whether your model works.

(You can choose between pictures you already uploaded to Google Cloud Storage, local files,
and pictures which are available on the web at a given URL.)

In [None]:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import base64, sys, json
import tensorflow as tf
import ast
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from skimage import io
import requests


def plot_and_predict_from_GS(picture):
    with open('/home/jupyter/' + PROJECT + '/local_test/labels.txt', 'r') as f:
        lines = f.read().splitlines()
        
    
    with tf.gfile.GFile('gs://' + BUCKET + '/' + PROJECT + '/images/'+ picture, 'rb') as ifp:
        credentials = GoogleCredentials.get_application_default()
        api = discovery.build('ml', 'v1', credentials=credentials, discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1_discovery.json')
        
        request_data = {'instances':
                        [
                            {"input": {"b64": base64.b64encode(ifp.read()).decode('utf-8')}}
                        ]}
        parent = 'projects/%s/models/%s/versions/%s' % (GC_PROJECT, PROJECT, MODEL_VERSION)
        response = api.projects().predict(body=request_data, name=parent).execute()


        img=mpimg.imread('/home/jupyter/' + BUCKET + '/' + PROJECT + '/images/' + picture)
        imgplot = plt.imshow(img)
        plt.show()
    
    
        print(lines[response['predictions'][0]['probabilities'].index(max(response['predictions'][0]['probabilities']))])

def plot_and_predict_from_Local(path):
    with open('/home/jupyter/' + PROJECT + '/local_test/labels.txt', 'r') as f:
        lines = f.read().splitlines()
    
    with open(path, 'rb') as ifp:
        credentials = GoogleCredentials.get_application_default()
        api = discovery.build('ml', 'v1', credentials=credentials, discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1_discovery.json')

        request_data = {'instances':
                        [
                            {"input": {"b64": base64.b64encode(ifp.read()).decode('utf-8')}}
                        ]}
        parent = 'projects/%s/models/%s/versions/%s' % (GC_PROJECT, PROJECT, MODEL_VERSION)
        response = api.projects().predict(body=request_data, name=parent).execute()

        
        img=mpimg.imread(path)
        imgplot = plt.imshow(img)
        plt.show()
    
        print(lines[response['predictions'][0]['probabilities'].index(max(response['predictions'][0]['probabilities']))])
        print(response['predictions'][0]['probabilities'].index(max(response['predictions'][0]['probabilities'])))
        
def plot_and_predict_from_URL(url):
    with open('/home/jupyter/' + PROJECT + '/local_test/labels.txt', 'r') as f:
        lines = f.read().splitlines()
        
    # Download the image; a separate name avoids shadowing the prediction response below.
    image_response = requests.get(url)
    image_response.raise_for_status()

    credentials = GoogleCredentials.get_application_default()
    api = discovery.build('ml', 'v1', credentials=credentials, discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1_discovery.json')
    
    request_data = {'instances':
                    [
                        {"input": {"b64": base64.b64encode(image_response.content).decode('utf-8')}}
                    ]}
                    ]}
    parent = 'projects/%s/models/%s/versions/%s' % (GC_PROJECT, PROJECT, MODEL_VERSION)
    response = api.projects().predict(body=request_data, name=parent).execute()


        
    image = io.imread(url)
    plt.imshow(image)
    plt.show()
    #print(response)
    print(lines[response['predictions'][0]['probabilities'].index(max(response['predictions'][0]['probabilities']))])
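
All three functions above share two steps: wrapping the image bytes in a base64 request body and looking up the label with the highest probability. These steps can be factored out as in the following sketch. The helper names, labels and response are made up for illustration, and no request is sent.

```python
import base64


def build_request(image_bytes):
    """Wrap raw image bytes in the request body used by the functions above."""
    return {'instances': [{'input': {'b64': base64.b64encode(image_bytes).decode('utf-8')}}]}


def top_label(response, labels):
    """Return (label, probability) of the most likely class of the first prediction."""
    probabilities = response['predictions'][0]['probabilities']
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best], probabilities[best]


# Made-up labels and response; the real ones come from labels.txt and the predict call.
labels = ['chair', 'sofa', 'table']
fake_response = {'predictions': [{'probabilities': [0.1, 0.7, 0.2]}]}
print(top_label(fake_response, labels))
print(build_request(b'abc'))
```

Returning the probability along with the label also makes it easy to spot low-confidence predictions during the quick test.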

In [None]:
pictures = ['12812786.jpg', '13369544.jpg', '15452981.jpg', '19103367.jpg', '20087011.jpg']

for picture in pictures:
    plot_and_predict_from_GS(picture)

In [None]:
url_tchibo_0 = 'https://www.tchibo.de/newmedia/art_img/MAIN-CENSHARE/82e052bdd86c785b/metall-schubladenturm.jpg'
url_tchibo_1 = 'https://www.tchibo.de/newmedia/art_img/MAIN-CENSHARE/aaec39febb0dc1c7/max-winzer-federkern-eckschlafsofa-mit-stauraumbank.jpg'
url_cnouch = 'https://i.cnouch.de/i/otto/28333726/Jockenhoefer-Gruppe-Schlafsofa-inkl-Bettkasten-28333726.jpg?maxW=998&maxH=562'


plot_and_predict_from_URL(url_tchibo_0)