# Part 1: Train, evaluate, and test

In this part of the lab you will use the Custom Vision Python SDK to train, evaluate, fine-tune, and test the custom image classification model described in the introduction.

## Lab setup

Before proceeding with the lab, you need to install the Custom Vision Service SDK into your notebook's kernel.

### Install Custom Vision Service SDK

In [None]:
# Install the Custom Vision Service SDK in the current Jupyter kernel
import sys
!{sys.executable} -m pip install azure-cognitiveservices-vision-customvision

### Get the training and prediction keys

To invoke the Custom Vision API you will need access keys.

To get the keys, navigate to the resource group you created during the lab setup and retrieve the keys for both the training and the prediction service. The keys are shown on the overview page of each service.

In [None]:
from azure.cognitiveservices.vision.customvision.training import CustomVisionTrainingClient
from azure.cognitiveservices.vision.customvision.training.models import ImageUrlCreateEntry

ENDPOINT = "https://southcentralus.api.cognitive.microsoft.com"

# update keys
training_key = ''
prediction_key = ''

trainer = CustomVisionTrainingClient(training_key, endpoint=ENDPOINT)
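
If you prefer not to paste keys into the notebook, one option is to read them from environment variables. The following is a minimal sketch of that approach. The variable names `CUSTOM_VISION_TRAINING_KEY` and `CUSTOM_VISION_PREDICTION_KEY` and the helper `load_key` are hypothetical; they are not defined by the SDK or by this lab.

```python
import os

def load_key(var_name, default=''):
    """Return the value of an environment variable, stripped of surrounding whitespace."""
    return os.environ.get(var_name, default).strip()

# Hypothetical variable names: set them in the shell before starting Jupyter
env_training_key = load_key('CUSTOM_VISION_TRAINING_KEY')
env_prediction_key = load_key('CUSTOM_VISION_PREDICTION_KEY')

if not env_training_key or not env_prediction_key:
    print("One or both environment variables are not set; paste the keys into the cell above instead.")
```

If both variables are set, you could assign `env_training_key` and `env_prediction_key` to `training_key` and `prediction_key` before creating the client.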

### Create a Custom Vision Service project

A Custom Vision Service project is a container for the artifacts used during model development, including training data and training runs. You have to create a separate project for each model you want to develop.

In [None]:
# Choose the name for your project
project_name = 'AerialClassifier'

# Check if the project with that name exists
project_id = None
for project in trainer.get_projects():
    if project.name == project_name:
        project_id = project.id
        print("Found existing project: {0}".format(project_id))
        break
# Create a new project if no existing project was found
if project_id is None:
    print("Creating a new project")
    project = trainer.create_project(project_name)
    project_id = project.id


## Prepare data
As described in the intro to the lab, you will train the model on ~500 images representing 3 types of land: `Barren`, `Cultivated`, and `Developed`.

The training images can be downloaded from a public Azure blob container.

### Get example images


In [None]:
%%sh
wget -nv https://azureailabs.blob.core.windows.net/aerialsamples/aerial.zip
unzip aerial.zip

### Add tags to your project
You need to add tags to your project before you can label and upload your training images.


In [None]:
# Create tags. Check for existing tags before creating new ones
tags = trainer.get_tags(project_id)
if len(tags) == 0:
    tags = [trainer.create_tag(project_id, tag) for tag in ['Barren', 'Developed', 'Cultivated']]


### Tag and upload images

The `create_images_from_files` API uploads images in batches. The maximum supported batch size is 64. To simplify the upload process, we created a utility function, `upload_images`, that manages batch creation and upload.

The input to the function is a list of Python tuples. Each tuple represents a single image and consists of the tag ID (referring to the tag that describes the image), the path to the image on your local file system, and the file name used as the image name in the project.
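
The batching step can be sketched on its own, without calling the service. The helper `make_batches` and the placeholder tuples below are illustrative only; they mirror the `(tag_id, path, file_name)` layout that `upload_images` unpacks.

```python
def make_batches(items, batch_size=64):
    """Split a list into consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[start:start + batch_size] for start in range(0, len(items), batch_size)]

# Placeholder input: 500 (tag_id, path, file_name) tuples
image_list = [
    ('tag-id', 'aerial/train/Barren/img_{0}.png'.format(i), 'img_{0}.png'.format(i))
    for i in range(500)
]

batches = make_batches(image_list)
print(len(batches), [len(batch) for batch in batches])
# prints: 8 [64, 64, 64, 64, 64, 64, 64, 52]
```

With about 500 images and a batch size of 64, the upload therefore takes eight calls to the service, the last one carrying the remainder.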

In [None]:
import os
from azure.cognitiveservices.vision.customvision.training.models import ImageFileCreateEntry, Region

# Define a utility function to upload a list of images
def upload_images(training_key, project_id, image_list, batch_size=64):
    ENDPOINT = "https://southcentralus.api.cognitive.microsoft.com"
    trainer = CustomVisionTrainingClient(training_key, endpoint=ENDPOINT)
    print("Starting upload ...")
    # Split the list into batches no larger than batch_size
    image_batches = [image_list[start: start+batch_size] for start in range(0, len(image_list), batch_size)]
    summary = None  # stays None if image_list is empty
    for batch in image_batches:
        image_entry_batch = []
        for tag, pathname, file_name in batch:
            # Read each image as bytes and attach its tag
            with open(pathname, mode='rb') as image_contents:
                image_entry_batch.append(ImageFileCreateEntry(name=file_name, contents=image_contents.read(), tag_ids=[tag]))
        summary = trainer.create_images_from_files(project_id, images=image_entry_batch)
        if not summary.is_batch_successful:
            print("Warning: at least one image in this batch was not created")
    print("Done.")
    # Note: this is the summary of the last batch only
    return summary


# Upload images
base_folder = 'aerial/train'
# Create a dictionary mapping tag names to tag ids
tag_map = {tag.name: tag.id for tag in tags}
# Create an input list to upload_images function
image_list = [(tag_map[folder], os.path.join(base_folder, folder, filename), filename)  for folder in ['Barren','Cultivated', 'Developed'] for filename in os.listdir(os.path.join(base_folder, folder))]
# Start the upload
summary = upload_images(training_key, project_id, image_list, batch_size = 64)

### Train the first iteration of the project

You will repeat the training a couple of times during the lab. To simplify the process we created a helper function that encapsulates training steps.

In [None]:
import time

def train(training_key, project_id):
    trainer = CustomVisionTrainingClient(training_key, endpoint=ENDPOINT)
    print("Starting training...")
    try:
        iteration = trainer.train_project(project_id)
    except Exception as e:
        # Typically raised when nothing has changed since the last training run
        print("Training was not started ({0}). Retrieving default iteration".format(e))
        for iteration in trainer.get_iterations(project_id):
            if iteration.is_default:
                return iteration.id
        raise  # no default iteration to fall back to
    # Poll until the run reaches a terminal state
    while iteration.status not in ("Completed", "Failed"):
        time.sleep(5)
        iteration = trainer.get_iteration(project_id, iteration.id)
        print("Training status: " + iteration.status)
    if iteration.status != "Completed":
        raise RuntimeError("Training did not complete: " + iteration.status)
    # The iteration is now trained. Make it the default project endpoint
    print("Training completed")
    trainer.update_iteration(project_id, iteration.id, is_default=True)
    return iteration.id
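
The polling pattern used in `train` can be illustrated without the service. The sketch below adds a timeout so the loop cannot run forever. The function `wait_for_status`, its parameters, and the assumption that `"Failed"` is a terminal status are illustrative and not part of the SDK; the status source is a stand-in iterator rather than a real training run.

```python
import time

def wait_for_status(get_status, done=("Completed", "Failed"),
                    poll_seconds=5, timeout_seconds=600, sleep=time.sleep):
    """Call get_status() repeatedly until it returns a value in `done` or the timeout expires."""
    waited = 0
    status = get_status()
    while status not in done:
        if waited >= timeout_seconds:
            raise TimeoutError("Gave up after {0} seconds; last status: {1}".format(waited, status))
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    return status

# Stand-in for repeated calls to the service: two "Training" replies, then "Completed"
fake_statuses = iter(["Training", "Training", "Completed"])
final_status = wait_for_status(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)
# prints: Completed
```

In the notebook, `get_status` could be a small function that calls `trainer.get_iteration(project_id, iteration.id)` and returns its `status`.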

Every time you invoke training, a new iteration is created. An iteration is a Custom Vision Service object that encapsulates the training data, the trained model, and the performance measures for a training run.

In [None]:
# Start training
iteration_id = train(training_key, project_id)

### Get iteration performance 

After the training run has completed, you can retrieve performance measures for the iteration. We defined a helper function, `display_iteration_performance`, that encapsulates the call to the service and the formatting of the output.

In [None]:
def display_iteration_performance(training_key, project_id, iteration_id):
    trainer = CustomVisionTrainingClient(training_key, endpoint=ENDPOINT)
    performance = trainer.get_iteration_performance(project_id, iteration_id)
    print("Overall Precision: {0:<10}".format(performance.precision))
    print("Overall Recall:    {0:<10}".format(performance.recall))
    for tag_perf in performance.per_tag_performance:
        print("Tag: {0:<15} Precision: {1:<10}   Recall: {2:<10}".format(tag_perf.name, tag_perf.precision, tag_perf.recall))

In [None]:
display_iteration_performance(training_key, project_id, iteration_id)
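
To make the reported numbers concrete, here is a small worked example of per-tag precision and recall computed from made-up labels. It is a simplified sketch: the labels are invented for illustration, the helper `precision_recall` is not part of the SDK, and the sketch ignores the probability threshold that the service applies when it computes these measures.

```python
def precision_recall(true_labels, predicted_labels, tag):
    """Compute precision and recall for one tag from parallel lists of labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == tag and p == tag)  # correctly predicted as tag
    fp = sum(1 for t, p in pairs if t != tag and p == tag)  # wrongly predicted as tag
    fn = sum(1 for t, p in pairs if t == tag and p != tag)  # tag missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented labels for six images
true_labels = ['Barren', 'Barren', 'Developed', 'Developed', 'Cultivated', 'Cultivated']
predicted_labels = ['Barren', 'Developed', 'Developed', 'Developed', 'Cultivated', 'Barren']

for tag in ['Barren', 'Cultivated', 'Developed']:
    precision, recall = precision_recall(true_labels, predicted_labels, tag)
    print("Tag: {0:<15} Precision: {1:<10.2f} Recall: {2:<10.2f}".format(tag, precision, recall))
```

In this example `Developed` has perfect recall (both true `Developed` images were found) but a precision of 2/3, because one `Barren` image was also labeled `Developed`.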

### Improve your classifier

The quality of your classifier depends on the amount, quality, and variety of the labeled data you provide, and on how balanced the dataset is. A good classifier normally has a balanced training dataset that is representative of what will be submitted to it. Building such a classifier is an iterative process: it commonly takes a few rounds of training to reach the expected results. As you track the performance of your model, you may add more images of the underperforming class or augment your existing images by varying lighting, cropping, color saturation, and so on.
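
One quick way to check how balanced a dataset is: count the images in each class folder. The sketch below assumes the one-folder-per-class layout used by `aerial/train`; the helpers `count_images_per_class` and `imbalance_ratio` are illustrative names, not part of the SDK.

```python
import os
from collections import Counter

def count_images_per_class(base_folder):
    """Count the files in each immediate subfolder of base_folder."""
    counts = Counter()
    for folder in sorted(os.listdir(base_folder)):
        folder_path = os.path.join(base_folder, folder)
        if os.path.isdir(folder_path):
            counts[folder] = sum(
                1 for name in os.listdir(folder_path)
                if os.path.isfile(os.path.join(folder_path, name))
            )
    return counts

def imbalance_ratio(counts):
    """Ratio of the largest to the smallest non-empty class; 1.0 means perfectly balanced."""
    nonzero = [count for count in counts.values() if count > 0]
    return max(nonzero) / min(nonzero) if nonzero else 0.0

# Only runs if the training images have been downloaded and unzipped
if os.path.isdir('aerial/train'):
    counts = count_images_per_class('aerial/train')
    for folder, count in counts.items():
        print("{0:<25} {1}".format(folder, count))
    print("Imbalance ratio: {0:.2f}".format(imbalance_ratio(counts)))
```

Note that every subfolder is counted separately, so a folder holding extra images for an existing class shows up as its own entry.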

In the next step you will add more images of `Developed` land plots and retrain the model to create a new iteration.

In [None]:
# Upload images
base_folder = 'aerial/train'
folder = 'Developed-SecondBatch'
image_list = [(tag_map['Developed'], os.path.join(base_folder, folder, filename), filename)  for filename in os.listdir(os.path.join(base_folder, folder))]

summary = upload_images(training_key, project_id, image_list, batch_size = 64)

Re-train the project.


In [None]:
# Start training
iteration_id = train(training_key, project_id)

In [None]:
display_iteration_performance(training_key, project_id, iteration_id)

## Test

Your model is ready. Each time you run training, Custom Vision Service automatically creates a REST API wrapper (the prediction endpoint) around the model created by the training run. You can use it immediately after the run has completed.

### Download test images

In [None]:
%%sh
mkdir -p test_images
cd test_images
wget -nv https://github.com/jakazmie/AIDays/raw/master/DeveloperTrack/01-CustomVisionService/samples/barren-1.png
wget -nv https://github.com/jakazmie/AIDays/raw/master/DeveloperTrack/01-CustomVisionService/samples/cultivated-1.png
wget -nv https://github.com/jakazmie/AIDays/raw/master/DeveloperTrack/01-CustomVisionService/samples/developed-1.png

### Display test images

The images we will use for testing have been downloaded to the `test_images` folder.

In [None]:
import os
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
%matplotlib inline

images_dir = 'test_images'
images = [os.path.join(images_dir, file) for file in os.listdir(images_dir)]

figsize=(10, 8)
fig, axis = plt.subplots(len(images)//3, 3, figsize=figsize)
fig.tight_layout()
for ax, image_path in zip(axis.flat[0:], images):
    image = Image.open(image_path)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    ax.imshow(image)

### Test with `curl`

As noted, the prediction endpoint is a REST API that can be accessed using any tool capable of formatting REST requests, including the command-line tool `curl`.

In [None]:
%env PROJECT_ID=$project_id
%env PREDICTION_KEY=$prediction_key

In [None]:
%%sh

curl -X POST https://southcentralus.api.cognitive.microsoft.com/customvision/v2.0/Prediction/$PROJECT_ID/image -H "Prediction-Key: $PREDICTION_KEY"  -H "Content-Type: application/octet-stream" --data-binary @test_images/developed-1.png
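
For comparison, the same request can be assembled in Python with the standard library's `urllib.request`. This is a minimal sketch that only builds the request and does not send it. The helper `build_prediction_request` and the placeholder project ID, key, and image bytes are hypothetical; the URL pattern and headers are copied from the `curl` command above.

```python
import urllib.request

def build_prediction_request(project_id, prediction_key, image_bytes,
                             endpoint="https://southcentralus.api.cognitive.microsoft.com"):
    """Build (but do not send) a POST request equivalent to the curl command."""
    url = "{0}/customvision/v2.0/Prediction/{1}/image".format(endpoint, project_id)
    headers = {
        "Prediction-Key": prediction_key,
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")

# Placeholder values for illustration only
request = build_prediction_request("my-project-id", "my-prediction-key", b"placeholder image bytes")
print(request.get_method(), request.full_url)

# To send the request with real values, you could call:
# response_body = urllib.request.urlopen(request).read()
```

In the notebook you would pass `project_id`, `prediction_key`, and the bytes read from one of the files in `test_images`.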

### Call the prediction endpoint using Python SDK

From Python, you can invoke the prediction endpoint using `urllib` or another library for working with HTTP. However, it is even easier to use the Custom Vision Service Python SDK.

The Python SDK wraps the prediction endpoint in the `CustomVisionPredictionClient` class. The class exposes the `predict_image` method, which takes the project ID and a Python file object as parameters. The following code snippet defines a utility function, `classify_image`, that invokes the prediction endpoint and parses the results returned from the service.

In [None]:
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient

def classify_image(project_id, prediction_key, image_path):
    ENDPOINT = "https://southcentralus.api.cognitive.microsoft.com"
    predictor = CustomVisionPredictionClient(prediction_key, endpoint=ENDPOINT)
    with open(image_path, mode='rb') as image:
        result = predictor.predict_image(project_id, image)
    probs = [prediction.probability for prediction in result.predictions]
    max_prob = max(probs)
    max_index = probs.index(max_prob)
    tag = result.predictions[max_index].tag_name

    return tag, max_prob
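
The selection step in `classify_image` (pick the tag with the highest probability) can be tried without calling the service. The sketch below uses stand-in objects with the same `tag_name` and `probability` attributes that the function reads from `result.predictions`. The helper `top_prediction` and its optional `min_probability` cutoff are illustrative additions, not part of the SDK.

```python
from types import SimpleNamespace

def top_prediction(predictions, min_probability=0.0):
    """Return (tag_name, probability) of the most probable prediction.

    Returns (None, probability) if the best prediction falls below min_probability,
    and (None, 0.0) if there are no predictions at all.
    """
    if not predictions:
        return None, 0.0
    best = max(predictions, key=lambda prediction: prediction.probability)
    if best.probability < min_probability:
        return None, best.probability
    return best.tag_name, best.probability

# Stand-in predictions with invented probabilities
fake_predictions = [
    SimpleNamespace(tag_name='Barren', probability=0.03),
    SimpleNamespace(tag_name='Developed', probability=0.95),
    SimpleNamespace(tag_name='Cultivated', probability=0.02),
]

print(top_prediction(fake_predictions))
# prints: ('Developed', 0.95)
```

A cutoff such as `min_probability=0.5` lets the caller treat low-confidence results as "unknown" instead of always reporting the best guess.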

You will now invoke the prediction endpoint and display the results returned by the service.

In [None]:
figsize=(10, 8)
fig, axis = plt.subplots(len(images)//3, 3, figsize=figsize)
fig.tight_layout()
for ax, image_path in zip(axis.flat[0:], images):
    tag, prob = classify_image(project_id, prediction_key, image_path)
    ax.set_title(tag + ': ' + str(prob))
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    image = Image.open(image_path)
    ax.imshow(image)

## Summary

In this part of the lab you learned how to train, evaluate and improve your custom image classification model. In the second part of the lab, you will learn how to export and operationalize your trained model.

To proceed to Part II, open the `export.ipynb` notebook.