# Introduction

This sample notebook walks through an end-to-end workflow that demonstrates SageMaker Ground Truth. We'll combine core SageMaker functionality with Ground Truth to train and deploy a basic facial recognition model. In our Ground Truth labeling job, we'll present labelers with a pair of photos and ask whether the faces belong to the same person or to different people. We'll then train a Siamese network model that, given two face photos as inputs, tells us whether they show the same person.

### Datasets Used
Rather than splitting a single set of facial images, we'll use two completely distinct facial image datasets for training and evaluating the model:
- Model Training: [AT&T Database of Faces](https://www.kaggle.com/kasikrit/att-database-of-faces)
- Model Evaluation: [Yale Face Database](https://www.kaggle.com/olgabelitskaya/yale-face-database)



In [None]:
import boto3
from sagemaker import get_execution_role
import sagemaker
import os
from glob import glob
import random
import numpy as np
from PIL import Image
import json
import matplotlib.pyplot as plt
from sagemaker.tensorflow import TensorFlow

%matplotlib inline

In [None]:
# We'll use the default role for data access and job execution
role = get_execution_role()

# Manage interactions with the Amazon SageMaker APIs and any other AWS services needed.
sess = sagemaker.Session()
sm_client = sess.boto_session.client('sagemaker')

# Use the default bucket created by SageMaker
bucket = sess.default_bucket()

# Region of our account
region = sess.boto_region_name

# name of our labeling job
labeling_job_name = "face-labeling"

# path to where we'll copy all of the data
s3_root_path = os.path.join("s3://", bucket, "ground_truth_lab")

# path to data that we'll use in our Ground Truth Labeling Job
job_data_path = os.path.join(s3_root_path, "face_labeling_job_images")

# path to the configuration files we need to set up our labeling job via the SDK
labeling_job_config_path = os.path.join(s3_root_path, "gt_config")

# path to data that's already been labeled
labeled_data_path = os.path.join(s3_root_path, "labeled")

# Ground Truth Lambda ARNs, needed to set up the job through the SDK.
# Only us-east-1 and us-east-2 are listed here; any other region raises a KeyError when the job is created.
pre_annotation_lambdas = {"us-east-1": "arn:aws:lambda:us-east-1:432418664414:function:PRE-ImageMultiClass",
                          "us-east-2": "arn:aws:lambda:us-east-2:266458841044:function:PRE-ImageMultiClass"}

consolidation_lambdas = {"us-east-1": "arn:aws:lambda:us-east-1:432418664414:function:ACS-ImageMultiClass",
                         "us-east-2": "arn:aws:lambda:us-east-2:266458841044:function:ACS-ImageMultiClass"}


In [None]:
# First we create a manifest file that contains the S3 paths to the images we wish to annotate
with open("gt_config/input.manifest", "w") as f:
    images = glob("face_labeling_job_images/*.png")
    for image in images:
        s3_ref = {"source-ref":os.path.join(s3_root_path, image)}
        f.write(f"{json.dumps(s3_ref)}\n")
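
The input manifest written above is a JSON Lines file: one JSON object per line, each with a `source-ref` key pointing at an S3 object. The sketch below shows that format in isolation and adds a small sanity check. The helper names, bucket name, and file names are hypothetical and used only for illustration.

```python
import json


def build_manifest_lines(image_names, s3_prefix):
    """Return one JSON string per image, each holding a single "source-ref" key."""
    prefix = s3_prefix.rstrip("/")
    return [json.dumps({"source-ref": f"{prefix}/{name}"}) for name in image_names]


def validate_manifest_lines(lines):
    """Return the S3 URIs found in the manifest; raise ValueError on a bad line."""
    uris = []
    for number, line in enumerate(lines, start=1):
        record = json.loads(line)
        uri = record.get("source-ref", "")
        if not uri.startswith("s3://"):
            raise ValueError(f"line {number}: 'source-ref' must be an s3:// URI")
        uris.append(uri)
    return uris


# Hypothetical bucket and file names, for illustration only
example_lines = build_manifest_lines(
    ["pair_001.png", "pair_002.png"],
    "s3://example-bucket/ground_truth_lab/face_labeling_job_images/",
)
example_uris = validate_manifest_lines(example_lines)
print("\n".join(example_lines))
```

Running a check like this before uploading can catch a wrong prefix early, since a labeling job whose manifest points at missing objects fails only after it has been created.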

In [None]:
# next we upload the image data to S3
!aws s3 cp labeled {labeled_data_path} --recursive --quiet
!aws s3 cp face_labeling_job_images {job_data_path} --recursive --quiet
!aws s3 cp gt_config {labeling_job_config_path} --recursive --quiet

# Set Up a Ground Truth Labeling Job
#### We'll use a private team to avoid workforce charges and get a better feel for the labeling user experience. First we need to set up a private team. Please follow the steps below:
1. Find the SageMaker Service in the AWS Management Console 
<img src="notebook_images/LJ1.JPG">

2. In the SageMaker console under Ground Truth, click **Labeling workforces**
<img src="notebook_images/LT1.JPG">

3. Click **Private**
4. Click **Create private team**
5. Name your team **test-team** and provide your email address for both worker and contact. Fill out the remaining fields as shown below.
<img src="notebook_images/LT4.PNG">

In [None]:
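# Look up the ARN of the private team created above.
# The [0] index raises an IndexError if no team named "test-team" exists.
workteams = sm_client.list_workteams()["Workteams"]
team_arn = [wt["WorkteamArn"] for wt in workteams if wt["WorkteamName"] == "test-team"][0]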
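
If the team lookup fails, the bare `IndexError` does not say which teams actually exist. As a minimal sketch, the hypothetical helper below (`find_workteam_arn` is not part of the notebook or of boto3) does the same filtering over the `Workteams` list and reports the available names. The example ARN is made up.

```python
def find_workteam_arn(workteams, name):
    """Return the ARN of the work team called `name`, or raise LookupError."""
    matches = [wt["WorkteamArn"] for wt in workteams if wt["WorkteamName"] == name]
    if not matches:
        available = ", ".join(sorted(wt["WorkteamName"] for wt in workteams)) or "none"
        raise LookupError(f"No work team named {name!r}; available teams: {available}")
    return matches[0]


# Stand-in for sm_client.list_workteams()["Workteams"]; the ARN is made up
example_workteams = [
    {
        "WorkteamName": "test-team",
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:111122223333:workteam/private-crowd/test-team",
    }
]
example_arn = find_workteam_arn(example_workteams, "test-team")
print(example_arn)
```

In the notebook this would be called as `find_workteam_arn(workteams, "test-team")`.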

In [None]:
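# Create the labeling job. label_config.json and template.liquid are read from the gt_config prefix uploaded above.
resp = sm_client.create_labeling_job(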
    LabelingJobName=f"{labeling_job_name}",
    LabelAttributeName=f"{labeling_job_name}",
    InputConfig={
        "DataSource": {
            "S3DataSource": {
                "ManifestS3Uri": f"{labeling_job_config_path}/input.manifest"
            }
        },
    },
    OutputConfig={
        "S3OutputPath": f"{labeled_data_path}",
    },
    RoleArn=f"{role}",
    LabelCategoryConfigS3Uri=f"{labeling_job_config_path}/label_config.json",
    StoppingConditions={
        "MaxHumanLabeledObjectCount": 200,
        "MaxPercentageOfInputDatasetLabeled": 100
    },

    HumanTaskConfig={
        "WorkteamArn": f"{team_arn}",
        "UiConfig": {
            "UiTemplateS3Uri": f"{labeling_job_config_path}/template.liquid"
        },
        "PreHumanTaskLambdaArn": f"{pre_annotation_lambdas[region]}",
        "TaskTitle": "ground truth lab",
        "TaskDescription": "facial recognition",
        "NumberOfHumanWorkersPerDataObject": 1,
        "TaskTimeLimitInSeconds": 240,
        "TaskAvailabilityLifetimeInSeconds": 240,
        "MaxConcurrentTaskCount": 200,
        "AnnotationConsolidationConfig": {
            "AnnotationConsolidationLambdaArn": f"{consolidation_lambdas[region]}"
        },

    },
)
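
`create_labeling_job` returns as soon as the job is registered, so the labels are not available yet when the cell finishes. One way to wait for them is to poll `describe_labeling_job`, which reports a `LabelingJobStatus` and a `LabelCounters` summary. The helper below is a sketch, not part of the original notebook; its name, the polling interval, and the poll limit are assumptions.

```python
import time

# Statuses after which the job will not change any further
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_labeling_job(client, job_name, poll_seconds=30, max_polls=120):
    """Poll a labeling job until it reaches a terminal status and return that status.

    `client` is expected to behave like a boto3 SageMaker client.
    """
    for _ in range(max_polls):
        description = client.describe_labeling_job(LabelingJobName=job_name)
        status = description["LabelingJobStatus"]
        counters = description.get("LabelCounters", {})
        print(f"{job_name}: {status}, labeled so far: {counters.get('TotalLabeled', 0)}")
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_name} did not finish after {max_polls} polls")


# In this notebook the call would look like:
# final_status = wait_for_labeling_job(sm_client, labeling_job_name)
```

With a private team, the job only progresses as workers complete tasks in the labeling portal, so a generous poll limit is a reasonable default.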

# Model Training
Now that we have a labeled dataset to work with, we can begin training our model.
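
For reference, a Ground Truth image classification job writes its results as an output manifest: each line repeats the `source-ref` and adds the label under the job's label attribute name, plus a `<label attribute name>-metadata` object that carries the class name and a confidence value. The sketch below reads such lines. The class names (`same`, `different`), the bucket, and the confidence values are made up, since `label_config.json` is not shown here, and how `src/training.py` actually consumes the labels is not covered by this example.

```python
import json


def read_labels(manifest_lines, label_attribute):
    """Return (source_ref, class_name, confidence) tuples from output manifest lines."""
    metadata_key = f"{label_attribute}-metadata"
    labels = []
    for line in manifest_lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if metadata_key not in record:
            continue  # this object was not labeled
        metadata = record[metadata_key]
        labels.append((record["source-ref"], metadata["class-name"], metadata.get("confidence")))
    return labels


# Made-up output manifest lines, for illustration only
example_output = [
    json.dumps({
        "source-ref": "s3://example-bucket/pairs/pair_001.png",
        "face-labeling": 0,
        "face-labeling-metadata": {"class-name": "same", "confidence": 0.95},
    }),
    json.dumps({"source-ref": "s3://example-bucket/pairs/pair_002.png"}),
]
example_labels = read_labels(example_output, "face-labeling")
print(example_labels)
```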

We'll train a model that employs a Siamese network architecture: we pass in two images as inputs, and the model minimizes a distance-based loss function that pulls similar images together and pushes dissimilar images apart.

<img src="notebook_images/siamese_network.jpg" width=600>

[Dimensionality Reduction by Learning an Invariant Mapping](http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf)
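
To make the distance-based loss concrete, here is a minimal pure-Python sketch of the contrastive loss described in the paper linked above: similar pairs are penalized by their squared distance, and dissimilar pairs are penalized only while they sit closer than a margin. This is an illustration under that assumption; the loss actually used in `src/training.py` may differ, and the function names and default margin are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(embedding_a, embedding_b, is_different, margin=1.0):
    """Contrastive loss for a single pair of embeddings.

    Similar pairs (is_different=False) are pulled together: 0.5 * d^2.
    Dissimilar pairs are pushed apart until they are at least `margin`
    away from each other: 0.5 * max(0, margin - d)^2.
    """
    distance = euclidean_distance(embedding_a, embedding_b)
    if is_different:
        return 0.5 * max(0.0, margin - distance) ** 2
    return 0.5 * distance ** 2


# A similar pair that is far apart is penalized heavily
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], is_different=False))
# A dissimilar pair already beyond the margin contributes no loss
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], is_different=True))
```

The margin is what keeps the model from pushing dissimilar pairs apart indefinitely: once a pair is far enough apart, it stops influencing training.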

In [None]:
# We'll build our model with TensorFlow. SageMaker provides a managed container that can be used to run managed training jobs.
# To run a training job with SageMaker, you just need to provide a script that trains the model and saves the model artifact to a specified directory.
# Let's take a look at the script.
!pygmentize -l python src/training.py

In [None]:
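# Set up the training job.
# 'local_gpu' runs the training container on the machine running this notebook (SageMaker local mode).
# Note: SageMaker Python SDK v2 renamed train_instance_count/train_instance_type to instance_count/instance_type.
model = TensorFlow(entry_point='src/training.py',
                   role=role,
                   train_instance_count=1,
                   train_instance_type='local_gpu',
                   framework_version='2.1.0',
                   py_version='py3',
                   hyperparameters={"epochs": 20, "steps_per_epoch": 32})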

In [None]:
# fit model on the labeled data - this will copy all of the data in the labeled data path onto the training instance
model.fit(inputs=labeled_data_path)

In [None]:
# this will deploy the model as a REST endpoint using TF Serving
predictor = model.deploy(initial_instance_count=1, instance_type='local')

In [None]:
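def prep_image(path):
    "Prepare an image for inference"
    im = Image.open(path)
    im = np.array(im)
    # drop mostly white columns: keep only columns with more than 50 non-white pixels
    im = im[:, (im != 255).sum(axis=0) > 50]
    # PIL's resize takes (width, height), so the array becomes 112 rows x 92 columns
    im = np.array(Image.fromarray(im).resize((92,112)))
    # scale pixel values to the [0, 1] range
    im = im / 255
    # add batch and channel dimensions: (1, 112, 92, 1)
    return im[None,:,:,None]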

In [None]:
def show_output(f1_sub, f1_expr, f2_sub, f2_expr, thresh=0.5):
    "Show images along with the dissimilarity score"
    
    path1 = f"test_images/{f1_sub}.{f1_expr}"
    path2 = f"test_images/{f2_sub}.{f2_expr}"
    
    im1 = prep_image(path1)
    im2 = prep_image(path2)
    
    # prepare input for TF Serving Endpoint
    inputs = {
      'instances': [
        {"input_top":im1.tolist(),
        "input_bottom":im2.tolist()},
      ]
    }
    
    prediction = predictor.predict(inputs)['predictions'][0][0]
    
    img1 = Image.open(path1)
    img2 = Image.open(path2)
    
    fig, axes = plt.subplots(1, 2, figsize=(10,7))
    axes[0].imshow(img1, cmap='gray')
    axes[1].imshow(img2, cmap='gray')
    
    for ax in axes:
        ax.grid(False)
        ax.set_xticks([])
        ax.set_yticks([])
    
    same_dif = "same" if prediction < thresh else "different"
    
    fig.suptitle(f"Dissimilarity Score = {prediction:.3f}\n Likely {same_dif} person", y=0.85, size=20)
    plt.tight_layout()

In [None]:
expressions = ['surprised', 'sleepy', 'glasses', 'normal', 
               'sad', 'wink', 'centerlight', 'happy', 'noglasses']

subjects = [f"subject{x:02d}" for x in range(1,16)]

In [None]:
show_output(f1_sub = random.choice(subjects), f1_expr = random.choice(expressions),
           f2_sub = random.choice(subjects), f2_expr = random.choice(expressions))

In [None]:
show_output(f1_sub = subjects[14], f1_expr = expressions[2],
           f2_sub = subjects[8], f2_expr = expressions[4])

In [None]:
# Delete the endpoint
predictor.delete_endpoint()