# Create and run SageMaker Ground Truth Labeling job

This notebook creates a Ground Truth labeling job in SageMaker and lets you track the status of the job. Once the job has completed, you can move on to the Prepare Data and Labels notebook.

## Common Variables
*These will remain consistent through all notebooks in this repo*

In [None]:
BUCKET = '<S3 Bucket Name>' # Valid name for S3 bucket.
EXP_NAME = '<Job S3 Prefix>' # Any valid S3 prefix.
CLASS_NAME = '<Target object label name>' # The single label that will be annotated in the Ground Truth job.

## Notebook Variables

## Import Dependencies

In [None]:
import numpy as np
import random
import os, shutil
import json
import time  # Used later to build a unique labeling job name.
import boto3
import botocore
import sagemaker

## Create asset bucket

In [None]:
# Make sure the bucket is in the same region as this notebook.
role = sagemaker.get_execution_role()
region = boto3.session.Session().region_name
s3 = boto3.client('s3')
bucket_region = s3.head_bucket(Bucket=BUCKET)['ResponseMetadata']['HTTPHeaders']['x-amz-bucket-region']
assert bucket_region == region, "Your S3 bucket {} and this notebook need to be in the same region.".format(BUCKET)

## Upload images to be annotated

<span style="color:red">**IMPORTANT: before running the next cell, upload your images to the bucket you set in BUCKET, under s3://BUCKET/EXP_NAME/images/.**</span>

In [None]:
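# Enumerate the objects under s3://BUCKET/EXP_NAME/images/ to build the manifest.
image_prefix = EXP_NAME + '/images/'
paginator = s3.get_paginator('list_objects_v2')
image_keys = []
for page in paginator.paginate(Bucket=BUCKET, Prefix=image_prefix):
    for obj in page.get('Contents', []):
        # Skip folder markers and anything that is not an image file.
        if obj['Key'].lower().endswith(('.jpg', '.jpeg', '.png')):
            image_keys.append(obj['Key'])

# Create and upload the input manifest (one JSON object per line).
manifest_name = 'input.manifest'
with open(manifest_name, 'w') as f:
    for key in image_keys:
        f.write(json.dumps({'source-ref': 's3://{}/{}'.format(BUCKET, key)}) + '\n')
s3.upload_file(manifest_name, BUCKET, EXP_NAME + '/' + manifest_name)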
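
Each line of the input manifest must be a standalone JSON object whose `source-ref` key holds the S3 URI of one image. As a minimal sketch, the helper below checks that shape before you upload the file. `validate_manifest_lines` and the sample URIs are hypothetical and not part of the notebook or any AWS SDK. It only inspects the text and does not confirm that the objects exist in S3.

```python
import json


def validate_manifest_lines(lines):
    """Return the S3 URIs found in an input manifest, raising on malformed lines.

    Hypothetical helper for illustration only.
    """
    uris = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # Tolerate blank lines such as a trailing newline.
        record = json.loads(line)  # Raises json.JSONDecodeError on invalid JSON.
        uri = record.get('source-ref') if isinstance(record, dict) else None
        if not isinstance(uri, str) or not uri.startswith('s3://'):
            raise ValueError('Line {}: missing or invalid "source-ref"'.format(line_number))
        uris.append(uri)
    return uris


# Made-up sample lines in the same shape the manifest cell writes.
sample = [
    '{"source-ref": "s3://my-bucket/my-exp/images/0001.jpg"}\n',
    '{"source-ref": "s3://my-bucket/my-exp/images/0002.jpg"}\n',
]
print(validate_manifest_lines(sample))
```

To check the real file, pass `open(manifest_name)` in place of `sample`.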

## Specify the categories

To run an object detection labeling job, you must decide on a set of classes the annotators can choose from. At the moment, Ground Truth only supports annotating one object detection class at a time. To work with Ground Truth, this list needs to be converted to a .json file and uploaded to the S3 BUCKET.

In [None]:
CLASS_LIST = [CLASS_NAME]
print("Label space is {}".format(CLASS_LIST))

json_body = {
    'labels': [{'label': label} for label in CLASS_LIST]
}
with open('class_labels.json', 'w') as f:
    json.dump(json_body, f)
    
s3.upload_file('class_labels.json', BUCKET, EXP_NAME + '/class_labels.json')

You should now see class_labels.json in s3://BUCKET/EXP_NAME/.

## Create the instruction template
Part or all of your images will be annotated by human annotators. It is essential to provide good instructions. Good instructions are:

 1. Concise. We recommend limiting verbal/textual instruction to two sentences and focusing on clear visuals.
 2. Visual. In the case of object detection, we recommend providing several labeled examples with different numbers of boxes.
 
When used through the AWS Console, Ground Truth helps you create the instructions using a visual wizard. When using the API, you need to create an HTML template for your instructions. Below, we prepare a very simple but effective template and upload it to your S3 bucket.

NOTE: If you use any images in your template (as we do), they need to be publicly accessible. You can enable public access to files in your S3 bucket through the S3 Console, as described in S3 Documentation.

**Testing your instructions**

**It is very easy to create broken instructions.** This might cause your labeling job to fail. However, it might also cause your job to complete with meaningless results if, for example, the annotators have no idea what to do or the instructions are misleading. At the moment the only way to test the instructions is to run your job in a private workforce. This is a way to run a mock labeling job for free.

It is helpful to show examples of correctly labeled images in the instructions. The following code block builds the instruction template and uploads it to s3://BUCKET/EXP_NAME/.

In [None]:
from IPython.display import HTML, display

def make_template(test_template=False, save_fname='instructions.template'):
    template = r"""<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
    <crowd-form>
      <crowd-bounding-box
        name="boundingBox"
        src="{{{{ task.input.taskObject | grant_read_access }}}}"
        header="Dear Annotator, please draw a tight box around each {class_name} you see. Thank you!"
        labels="{labels_str}"
      >
        <full-instructions header="Please annotate each {class_name}.">

    <ol>
        <li><strong>Inspect</strong> the image</li>
        <li><strong>Determine</strong> whether any instance of the specified label is visible in the picture.</li>
        <li><strong>Outline</strong> each instance of the specified label in the image using the provided “Box” tool.</li>
    </ol>
    <ul>
        <li>Boxes should fit tight around each object</li>
        <li>Do not include parts of the object that are overlapped or cannot be seen, even if you think you can interpolate the whole shape.</li>
        <li>Avoid including shadows.</li>
        <li>If the target is off screen, draw the box up to the edge of the image.</li>
    </ul>

        </full-instructions>
        <short-instructions>
            <p>Short Instructions</p>
        </short-instructions>
      </crowd-bounding-box>
    </crowd-form>
    """.format(class_name=CLASS_NAME,
               labels_str=str(CLASS_LIST) if test_template else '{{ task.input.labels | to_json | escape }}')
    with open(save_fname, 'w') as f:
        f.write(template)

        
make_template(test_template=True, save_fname='instructions.html')
make_template(test_template=False, save_fname='instructions.template')
s3.upload_file('instructions.template', BUCKET, EXP_NAME + '/instructions.template')

You should now be able to find your template in s3://BUCKET/EXP_NAME/instructions.template.
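
The template mixes two placeholder syntaxes: Python `str.format` fields such as `{class_name}`, and the Liquid variables that Ground Truth fills in at render time, such as `{{ task.input.taskObject }}`. Doubling the braces in the source string keeps the Liquid variables intact through `.format`. A minimal sketch of that escaping, using a made-up one-line fragment and an example label:

```python
# A made-up fragment for illustration; the attribute names mirror the template above.
fragment = (
    'src="{{{{ task.input.taskObject | grant_read_access }}}}" '
    'header="Box each {class_name}" '
    'labels="{labels_str}"'
)

rendered = fragment.format(
    class_name='bird',  # Example label only.
    # Values passed to format() are inserted verbatim, so these braces are not doubled.
    labels_str='{{ task.input.labels | to_json | escape }}',
)
print(rendered)
```

The four-brace sequences collapse to two, so the rendered text still contains the Liquid variables while `{class_name}` is replaced.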

## Create a Private Workforce for the labeling job

This step will create the required Amazon Cognito User Pool, SageMaker Private Team, and Workers (users), that will be assigned the task of annotating the images.

In [None]:
# TODO create pool, team, and workers



# need to output team arn
private_workteam_arn = ''

## Submit the Ground Truth job request
You start a Ground Truth job by submitting a request through the API. The request contains the
full configuration of the annotation task and lets you modify the fine details of
the job that are fixed to default values when you use the AWS Console. The parameters that make up the request are described in more detail in the [SageMaker Ground Truth documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateLabelingJob.html).

After you submit the request, you should be able to see the job in your AWS Console, at `Amazon SageMaker > Labeling Jobs`.
You can track the progress of the job there. This job will take several hours to complete. If your job
is larger (say 100,000 images), the speed and cost benefit of auto-labeling should be larger.

In [None]:
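USE_AUTO_LABELING = True

# Region-specific AWS-managed resources for bounding box jobs. The account IDs
# below cover only some regions; check the CreateLabelingJob documentation
# for the full list before running in another region.
ac_arn_map = {'us-west-2': '081040173940',
              'us-east-1': '432418664414',
              'us-east-2': '266458841044',
              'eu-west-1': '568282634449',
              'ap-northeast-1': '477331159723'}
prehuman_arn = 'arn:aws:lambda:{}:{}:function:PRE-BoundingBox'.format(region, ac_arn_map[region])
acs_arn = 'arn:aws:lambda:{}:{}:function:ACS-BoundingBox'.format(region, ac_arn_map[region])
labeling_algorithm_specification_arn = 'arn:aws:sagemaker:{}:027400017018:labeling-job-algorithm-specification/object-detection'.format(region)

task_description = 'Dear Annotator, please draw a box around each {}. Thank you!'.format(CLASS_NAME)
task_keywords = ['image', 'object', 'detection']
task_title = 'Please draw a box around each {}.'.format(CLASS_NAME)
job_name = EXP_NAME + str(int(time.time()))  # Timestamp suffix keeps job names unique.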

human_task_config = {
      "AnnotationConsolidationConfig": {
        "AnnotationConsolidationLambdaArn": acs_arn,
      },
      "PreHumanTaskLambdaArn": prehuman_arn,
      "MaxConcurrentTaskCount": 200, # 200 images will be sent at a time to the workteam.
      "NumberOfHumanWorkersPerDataObject": 5, # We will obtain and consolidate 5 human annotations for each image.
      "TaskAvailabilityLifetimeInSeconds": 21600, # Your workteam has 6 hours to complete all pending tasks.
      "TaskDescription": task_description,
      "TaskKeywords": task_keywords,
      "TaskTimeLimitInSeconds": 300, # Each image must be labeled within 5 minutes.
      "TaskTitle": task_title,
      "UiConfig": {
        "UiTemplateS3Uri": 's3://{}/{}/instructions.template'.format(BUCKET, EXP_NAME),
      }
    }

human_task_config["WorkteamArn"] = private_workteam_arn

ground_truth_request = {
        "InputConfig" : {
          "DataSource": {
            "S3DataSource": {
              "ManifestS3Uri": 's3://{}/{}/{}'.format(BUCKET, EXP_NAME, manifest_name),
            }
          },
          "DataAttributes": {
            "ContentClassifiers": [
              "FreeOfPersonallyIdentifiableInformation",
              "FreeOfAdultContent"
            ]
          },  
        },
        "OutputConfig" : {
          "S3OutputPath": 's3://{}/{}/output/'.format(BUCKET, EXP_NAME),
        },
        "HumanTaskConfig" : human_task_config,
        "LabelingJobName": job_name,
        "RoleArn": role, 
        "LabelAttributeName": "category",
        "LabelCategoryConfigS3Uri": 's3://{}/{}/class_labels.json'.format(BUCKET, EXP_NAME),
    }

if USE_AUTO_LABELING:
    ground_truth_request["LabelingJobAlgorithmsConfig"] = {
        "LabelingJobAlgorithmSpecificationArn": labeling_algorithm_specification_arn
    }
    
sagemaker_client = boto3.client('sagemaker')
sagemaker_client.create_labeling_job(**ground_truth_request)