# Introduction

This sample notebook walks you through an end-to-end workflow for training a model in Amazon Rekognition Custom Labels using a dataset generated by Amazon SageMaker Ground Truth.

In [94]:
import datetime
import tarfile
import boto3
import os
from sagemaker import get_execution_role
import sagemaker
from IPython.display import HTML, display, Image as IImage
from PIL import Image, ImageDraw, ImageFont, ExifTags, ImageColor
import io

## 1. Upload Images to S3

In [95]:
bucket_name = 'aws-workshops-labels-12345678' ## Update the value with the bucket name created earlier in the lab
region = boto3.Session().region_name    
s3_client = boto3.client('s3', region_name=region)

#### Skip the below step if you already have images uploaded to your S3 bucket

In [None]:
## Uploading Licensed Images for raw data
source_dir = '../images/raw-data/LicensedImages-CreativeCommons'
dest_dir = 'raw-data/images'
file_list = os.listdir(source_dir)
for file in file_list :   
    if file != '.ipynb_checkpoints':
        response = s3_client.upload_file(source_dir+'/'+file, bucket_name, dest_dir+"/"+file)
        print (file + ' uploaded')
print('Raw Data Upload Complete to '+bucket_name+'/'+dest_dir)

## Uploading Non-Licensed Images for raw data
source_dir = '../images/raw-data/LicenseNotNeeded_Images'
dest_dir = 'raw-data/images'
file_list = os.listdir(source_dir)
s3_client = boto3.client('s3', region_name=region)
for file in file_list : 
    if file != '.ipynb_checkpoints':
        response = s3_client.upload_file(source_dir+'/'+file, bucket_name, dest_dir+"/"+file)
        print (file + ' uploaded')
print('Raw Data Upload Complete to '+bucket_name+'/'+dest_dir)

## Uploading Test Data
source_dir = '../images/test-data'
dest_dir = 'test-data/images'
file_list = os.listdir(source_dir)
for file in file_list : 
    if file != '.ipynb_checkpoints':  # skip the Jupyter checkpoint folder, as in the loops above
        response = s3_client.upload_file(source_dir+'/'+file, bucket_name, dest_dir+"/"+file)
        print (file + ' uploaded')
print('Test Data Upload Complete to '+bucket_name+'/'+dest_dir)


Note: If you get the following error:

<img src="../lab-images/s3error.png">

make sure **bucket_name** is set to the name of your own bucket.
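
The upload loops above all follow the same pattern, so they can be folded into one function. The following is a minimal sketch, not part of the original lab: the helper name `upload_directory` is hypothetical, and it assumes `s3_client` is a boto3 S3 client like the one created earlier. It skips anything that is not a regular file, which covers the `.ipynb_checkpoints` folder.

```python
import os


def upload_directory(s3_client, source_dir, bucket, dest_prefix):
    """Upload every regular file in source_dir to s3://bucket/dest_prefix/.

    Hypothetical helper for illustration. s3_client is assumed to be a
    boto3 S3 client (any object with upload_file(path, bucket, key) works).
    Returns the list of S3 keys that were uploaded.
    """
    uploaded_keys = []
    for entry in sorted(os.scandir(source_dir), key=lambda e: e.name):
        # Skip sub-directories such as .ipynb_checkpoints
        if not entry.is_file():
            continue
        key = dest_prefix + '/' + entry.name
        s3_client.upload_file(entry.path, bucket, key)
        uploaded_keys.append(key)
    return uploaded_keys
```

With the variables defined earlier, a call would look like `upload_directory(s3_client, '../images/test-data', bucket_name, 'test-data/images')`.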

### Let's look at one of the images

In [None]:
imageName = "raw-data/images/800px-Woodpeckers-Telephone-Cable.jpg"
display(IImage(url=s3_client.generate_presigned_url('get_object', Params={'Bucket': bucket_name, 'Key': imageName})))

## 2. Copy Existing Annotation Data to S3 Bucket

### <span style="color:red">Note: If you have finished labeling all the images in Ground Truth and want to train the Rekognition model on your own labeled dataset, skip Step 2</span>

#### Update S3 bucket name in existing data

In [None]:
old_bucket_name = "aws-workshops-labels-1234567" ## DO NOT MODIFY. This value comes from existing manifest file. 
new_bucket_name = bucket_name
!echo "Occurrences of old_bucket_name i.e. $old_bucket_name in original manifest file"
!grep -ir $old_bucket_name ../images/annotated-data/manifests/output/output.manifest | wc -l 
!sed -i.bak -e "s/$old_bucket_name/$new_bucket_name/g" ../images/annotated-data/manifests/output/output.manifest
!echo "Occurrences of old_bucket_name i.e. $old_bucket_name in updated manifest file"
!grep -ir $old_bucket_name ../images/annotated-data/manifests/output/output.manifest | wc -l 
!echo "Occurrences of new_bucket_name i.e. $new_bucket_name in updated manifest file"
!grep -ir $new_bucket_name ../images/annotated-data/manifests/output/output.manifest | wc -l 
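
One caveat with a plain substring match: if the new bucket name begins with the old one (as the example values in this notebook do), `grep` for the old name still matches after the replacement, and running `sed` a second time would alter the new name. The sketch below shows one way around this in Python. It is an illustration rather than the lab's method: the function name is hypothetical, and it assumes the manifest refers to the bucket only through `s3://<bucket>/...` URIs.

```python
def replace_bucket_in_manifest(manifest_text, old_bucket, new_bucket):
    """Replace s3://old_bucket/ with s3://new_bucket/ in manifest text.

    Hypothetical helper. Matching on the full 's3://<bucket>/' prefix
    (including the trailing slash) avoids touching bucket names that
    merely start with old_bucket.
    Returns (updated_text, number_of_replacements).
    """
    old_uri = 's3://' + old_bucket + '/'
    new_uri = 's3://' + new_bucket + '/'
    count = manifest_text.count(old_uri)
    return manifest_text.replace(old_uri, new_uri), count


# Example with a made-up manifest line
line = '{"source-ref": "s3://aws-workshops-labels-1234567/raw-data/images/a.jpg"}\n'
updated, replaced = replace_bucket_in_manifest(
    line, 'aws-workshops-labels-1234567', 'aws-workshops-labels-12345678')
print(replaced)
print(updated, end='')
```

Because the trailing slash is part of the match, applying the function again to the updated text makes no further changes.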

#### Upload annotation metadata to S3 bucket

In [None]:
OUTPUT = 's3://{}/{}'.format(bucket_name, 'annotated-data')

## The bucket name inside the manifest file was already replaced in the previous cell

## Uploading annotation data to S3 bucket
!aws s3 cp ../images/annotated-data {OUTPUT} --recursive --quiet


## 2. Create Project in Amazon Rekognition Custom Labels

### Create Project

**Note**: *Make sure you are in the same **AWS region** as your S3 bucket when creating the Rekognition Custom Labels project*

- On the home page, click on **Use Custom Labels**
- On the next page, click on the **Get Started** button
- If you are using Custom Labels for the first time in this AWS account, you need to allow the service to create an S3 bucket
- Specify the name of the project as **aws-workshops-rekognition-custom-labels**
- Click on the **Create Project** button

<img src="../lab-images/19.png" width="800">
<img src="../lab-images/20.png" width="800">
<img src="../lab-images/36.png" width="800">
<img src="../lab-images/21.png" width="800">
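
The console steps above can also be done through the API. The sketch below is an optional alternative, not part of the original lab: it assumes a boto3 Rekognition client and uses its `create_project` call. The wrapper function name is hypothetical.

```python
def create_custom_labels_project(rekognition_client, project_name):
    """Create a Rekognition Custom Labels project and return its ARN.

    Hypothetical wrapper for illustration. rekognition_client is assumed
    to be a boto3 Rekognition client created in the same region as the
    S3 bucket.
    """
    response = rekognition_client.create_project(ProjectName=project_name)
    return response['ProjectArn']


# Usage (commented out because it calls AWS):
# import boto3
# rekognition = boto3.client('rekognition', region_name=region)
# project_arn = create_custom_labels_project(
#     rekognition, 'aws-workshops-rekognition-custom-labels')
```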

## 3. Create Data Set
### (Follow Screenshots below)


- Click on the **Create dataset** button
- Specify the dataset name as **aws-workshops-gt-data**
- Select the option **Import images labeled by Amazon SageMaker Ground Truth**
- Specify the location of the output manifest file generated by the SageMaker Ground Truth labeling job: **s3://{bucket-name}/annotated-data/manifests/output/output.manifest**
- Copy the **generated bucket policy**, then **click on the hyperlink** shown in the screenshot (**Paste the policy into the "Bucket Policy" section of ...**) and paste the policy there
- Return to the **Rekognition dataset creation page** and click on the **submit** button

In [None]:
print ('S3 location of manifest file - \n' + 's3://{}/annotated-data/manifests/output/output.manifest'.format(bucket_name))


<img src="../lab-images/22.png" width="800">
<img src="../lab-images/23.png" width="800">
<img src="../lab-images/24.png" width="800">
<img src="../lab-images/25.png" width="800">
<img src="../lab-images/38.png" width="800">
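
For reference, the dataset import can also be sketched with the API. This is an assumption-laden illustration, not the lab's method: it uses the boto3 Rekognition `create_dataset` call with a `GroundTruthManifest` source, the helper names are hypothetical, and it does not cover the bucket policy that the console flow asks you to paste.

```python
def split_s3_uri(s3_uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    prefix = 's3://'
    if not s3_uri.startswith(prefix):
        raise ValueError('Not an S3 URI: ' + s3_uri)
    bucket, _, key = s3_uri[len(prefix):].partition('/')
    return bucket, key


def ground_truth_dataset_source(manifest_uri):
    """Build the DatasetSource argument for a Ground Truth manifest."""
    bucket, key = split_s3_uri(manifest_uri)
    return {'GroundTruthManifest': {'S3Object': {'Bucket': bucket, 'Name': key}}}


def create_training_dataset(rekognition_client, project_arn, manifest_uri):
    """Create a TRAIN dataset from the manifest and return its ARN.

    rekognition_client is assumed to be a boto3 Rekognition client.
    """
    response = rekognition_client.create_dataset(
        ProjectArn=project_arn,
        DatasetType='TRAIN',
        DatasetSource=ground_truth_dataset_source(manifest_uri),
    )
    return response['DatasetArn']
```

The manifest URI is the same one printed by the cell above, for example `'s3://{}/annotated-data/manifests/output/output.manifest'.format(bucket_name)`.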

#### [OPTIONAL STEP - Only needed if you want to do additional labeling or if you did not label the images previously]
#### Add Labels
- Click on the **Start Labeling** button
- Click on the **Add** button on the next page
- Add the labels **hole** and **no_hole**, then save them by clicking on the **Save** button
- Once done, click on the **Exit** button to complete the labeling job

<img src="../lab-images/39.png" width="800">
<img src="../lab-images/startlabeling.png" width="800">
<img src="../lab-images/43.png" width="800">

### Train Model
- Once the dataset is created, click on **Train model** button to start training the automatically identified model

<img src="../lab-images/26.png" width="800">

### Specify dataset for training
- Select the **previously created** training dataset from the drop-down in **Choose training dataset**. Example - **aws-workshops-gt-data**
- Select **Split training dataset** to split the dataset into training and test data for model training and evaluation
- Click on the **Train** button to start training the model.

<img src="../lab-images/29.png" width="800">

### Training Status
**Note: Training takes roughly 1-2 hours, depending on a number of factors**

<img src="../lab-images/30.png" width="800">
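
Instead of refreshing the console, the training status can be polled from the notebook. The following is a minimal sketch, assuming a boto3 Rekognition client and its `describe_project_versions` call. The function name, polling interval and retry limit are illustrative choices, and `project_arn` and `version_name` are the values of your own project and model version.

```python
import time


def wait_for_training(rekognition_client, project_arn, version_name,
                      poll_seconds=60, max_polls=180, sleep=time.sleep):
    """Poll until the model version leaves TRAINING_IN_PROGRESS.

    Hypothetical helper. Returns the last observed status, for example
    'TRAINING_COMPLETED' or 'TRAINING_FAILED'. The sleep function is
    injectable so the loop can be exercised without waiting.
    """
    status = 'TRAINING_IN_PROGRESS'
    for _ in range(max_polls):
        response = rekognition_client.describe_project_versions(
            ProjectArn=project_arn, VersionNames=[version_name])
        descriptions = response['ProjectVersionDescriptions']
        if not descriptions:
            raise ValueError('No model version named ' + version_name)
        status = descriptions[0]['Status']
        if status != 'TRAINING_IN_PROGRESS':
            break
        sleep(poll_seconds)
    return status
```

With the defaults, the loop checks once a minute for up to three hours, which comfortably covers the 1-2 hour estimate above.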

### Evaluate Model and Testing Results
- Once training is complete, click on the trained model link
- On the next page, you will see the model evaluation details. Look at the values for **F1 score, Precision, Recall**. These values depend on the training data.
- Click on the **View test results** button to check the results on the test data

<img src="../lab-images/31.png" width="800">
<img src="../lab-images/32.png" width="800">
<img src="../lab-images/33.png" width="800">

## How to evaluate trained model
Reference Docs - https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/tr-metrics-use.html

**Precision** - Precision is the fraction of correct predictions (true positives) over all model predictions (true and false positives) at the assumed threshold for an individual label. As the threshold is increased, the model might make fewer predictions. In general, however, it will have a higher ratio of true positives over false positives compared to a lower threshold. Possible values for precision range from 0–1, and higher values indicate higher precision.

**Recall** - Recall is the fraction of your test set labels that were predicted correctly above the assumed threshold. It is a measure of how often the model can predict a custom label correctly when it's actually present in the images of your test set. The range for recall is 0–1. Higher values indicate a higher recall.

For the given business problem, you may prefer higher precision even at the cost of lower recall. The precision and recall values will differ depending on the training dataset. You can further improve the model accuracy by following the steps described at https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/tr-improve-model.html
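
To make the definitions concrete, here is a small worked example that computes precision, recall and the F1 score (the harmonic mean of precision and recall) from raw counts. The counts are made up for illustration and do not come from this lab's model.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall and F1 from prediction counts."""
    predicted = true_positives + false_positives   # everything the model predicted
    actual = true_positives + false_negatives      # everything that was really there
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts: 8 holes found correctly, 2 false alarms, 4 holes missed
precision, recall, f1 = precision_recall_f1(8, 2, 4)
print(round(precision, 3), round(recall, 3), round(f1, 3))
# prints: 0.8 0.667 0.727
```

Raising the confidence threshold typically reduces false alarms (higher precision) while missing more holes (lower recall), which is the trade-off described above.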

## Detect Holes using the trained model

To analyze an image with a trained Amazon Rekognition Custom Labels model, you call the DetectCustomLabels API. The result from DetectCustomLabels is a prediction that the image contains specific objects, scenes, or concepts.




Get the values of **project_arn**, **model_arn** and **version_name** as shown in the screenshots below

- Once model training is complete, the project version shows **TRAINING_COMPLETED**. Click on the project version
- Click on the **Use Model** tab
- Expand **API Code**
- Select **Python** and copy the values of the variables from the **Start Model** code

<img src="../lab-images/44.png" width="800">
<img src="../lab-images/45.png" width="800">
<img src="../lab-images/46.png" width="800">
<img src="../lab-images/47.png" width="800">


In [96]:
# S3 key of the test image to analyze
photo="test-data/images/14.jpg" 
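
With the model started (for example through the **Start Model** code shown in the console), the call to DetectCustomLabels can be sketched as follows. This is a minimal illustration, not the lab's original code: it assumes a boto3 Rekognition client and its `detect_custom_labels` call, the helper names and the default `min_confidence` of 50 are hypothetical choices, and `model_arn` is the value copied from the console.

```python
def detect_custom_labels(rekognition_client, model_arn, bucket, photo,
                         min_confidence=50):
    """Run the trained model on an image stored in S3.

    Hypothetical wrapper around the boto3 detect_custom_labels call.
    Returns the list of detected custom labels.
    """
    response = rekognition_client.detect_custom_labels(
        ProjectVersionArn=model_arn,
        Image={'S3Object': {'Bucket': bucket, 'Name': photo}},
        MinConfidence=min_confidence,
    )
    return response['CustomLabels']


def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a ratio-based BoundingBox to (left, top, right, bottom) pixels."""
    left = bounding_box['Left'] * image_width
    top = bounding_box['Top'] * image_height
    right = left + bounding_box['Width'] * image_width
    bottom = top + bounding_box['Height'] * image_height
    return (round(left), round(top), round(right), round(bottom))


def summarize_labels(custom_labels):
    """Return (name, confidence) pairs sorted by descending confidence."""
    pairs = [(label['Name'], label['Confidence']) for label in custom_labels]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

A call would look like `detect_custom_labels(rekognition, model_arn, bucket_name, photo)`, where `rekognition` is a boto3 Rekognition client. A label carries a `Geometry` entry with a `BoundingBox` only when the model localizes objects; image-level labels have none. Remember to stop the model when you are done, since a running model incurs charges.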

# Review

We covered a lot of ground in this notebook! Let's recap what we accomplished. First, we uploaded the labeled dataset generated by a SageMaker Ground Truth labeling job to an S3 bucket. We then trained a model in Amazon Rekognition Custom Labels on that training data and looked at the precision, recall and F1 score of the resulting model.