## Verify & Clean Image Labels with AWS Rekognition

The code below uses AWS Rekognition to verify the labels of training-set images and to remove files containing multiple instances of the target label, extraneous objects, or other noise likely to confuse the model. This step alone increased my model's accuracy by 15.7%.

In [1]:
import boto3

In [None]:
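# Set up boto3 S3 and Rekognition clients.

s3_client = boto3.client('s3')

bucket_name = 'bucket-name'
# The prefix is matched literally against object keys; keys rarely begin with '/',
# so drop the leading slash unless yours really do.
prefix = '/images-directory-path/'

# Placeholder credentials: avoid committing real keys. boto3 can also read them
# from environment variables or the shared AWS credentials file.
rek_client = boto3.client(
    "rekognition",
    aws_access_key_id="access_key_id",
    aws_secret_access_key="your_secret_key",
    region_name="us-east-1"
)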

In the script below, the primary labels for the species of interest are captured by first processing three reference images with Rekognition. These `test_images` are manually selected as exemplary representations of the target label. Every label Rekognition returns for them with confidence above 85% is saved to the set `test_labels`, against which the other photos in the directory are compared.
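The label-capture step described above can be tried without calling AWS, which makes the 85% threshold easy to check by hand. This is a minimal sketch, assuming a response dictionary shaped like the one `detect_labels` returns (a `Labels` list whose items carry `Name` and `Confidence`). The helper name `confident_labels` and the sample values are hypothetical and not part of the original script.

```python
def confident_labels(response, min_confidence=85):
    """Return the set of label names whose confidence exceeds min_confidence."""
    return {
        label['Name']
        for label in response.get('Labels', [])
        if label['Confidence'] > min_confidence
    }


# Hypothetical responses standing in for real Rekognition output.
sample_responses = [
    {'Labels': [{'Name': 'Bird', 'Confidence': 99.1},
                {'Name': 'Animal', 'Confidence': 98.7},
                {'Name': 'Tree', 'Confidence': 61.0}]},
    {'Labels': [{'Name': 'Bird', 'Confidence': 97.4},
                {'Name': 'Finch', 'Confidence': 88.2}]},
]

# Union of the confident labels across all reference images.
sample_test_labels = set()
for sample in sample_responses:
    sample_test_labels |= confident_labels(sample)

print(sorted(sample_test_labels))
```

Labels below the threshold (here `Tree`) never enter the reference set, so a low-confidence background object in a reference image cannot later cause an unrelated photo to be kept.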

In [1]:
# Build the test_labels set by running each reference image through Rekognition.

test_images = ['image-1-path', 'image-2-path', 'image-3-path']
animal_list = []
test_labels = set()
keyString_list = []
bad_pics = 0

for img in test_images:

    response = rek_client.detect_labels(
        Image={
            'S3Object': {
                'Bucket': bucket_name,
                'Name': img,
            }
        },
        MaxLabels = 10,
    )

    for label in response['Labels']:
        if label['Confidence'] > 85:
            animal_list.append(label['Name'])

# create a set of unique image labels from our test images

test_labels = set(animal_list)


# create a paginator (a botocore.paginate.PageIterator) over the objects under the prefix in the S3 bucket:

paginator = s3_client.get_paginator('list_objects_v2')
result = paginator.paginate(Bucket = bucket_name, Prefix = prefix)

# unpack the image file keystrings from the paginator results:

for page in result:
    if "Contents" in page:
        for key in page[ "Contents" ]:
            keyString = key[ "Key" ]
            keyString_list.append(keyString)
            
            # call Rekognition with the file's keyString:
            
            try:
                rek_response = rek_client.detect_labels(
                    Image={
                        'S3Object': {
                            'Bucket': bucket_name,
                            'Name': keyString,
                        }
                    },
                    MaxLabels = 10,
                )
                
                # append response labels with confidence above 85% to labels_list:

                labels_list = []
                for label in rek_response['Labels']:
                    if label['Confidence'] > 85:
                        labels_list.append(label['Name'])
                
                # compare labels_list to test_labels; remove images that lack our desired subject
                # or that contain a person:

                labels_list = set(labels_list)
                if (not labels_list.intersection(test_labels)) or ('Person' in labels_list):
                    s3_client.delete_object(Bucket = bucket_name, Key = keyString)
                    bad_pics += 1
                    
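            except Exception as err:
                # Catch Exception rather than everything, so Ctrl-C still stops the run.
                print('Bad image:', keyString, '-', err)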

print('{} images processed'.format(len(keyString_list)))
print('Deleted {} images.'.format(bad_pics))
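Because the loop above deletes objects permanently, it can help to separate the keep/delete decision from the S3 call and test it first. The sketch below is one way to do that. It also adds the multiple-instance check mentioned in the introduction, which the script above does not perform. It assumes the `Instances` list that `detect_labels` includes for labels it can localize. The function name `should_delete`, the `blocked` and `max_instances` parameters, and the sample responses are all hypothetical.

```python
def should_delete(response, test_labels, blocked=('Person',),
                  max_instances=1, min_confidence=85):
    """Decide whether an image should be removed, given a detect_labels response."""
    confident = [
        label for label in response.get('Labels', [])
        if label['Confidence'] > min_confidence
    ]
    names = {label['Name'] for label in confident}

    # No evidence of the target subject.
    if not names & set(test_labels):
        return True
    # An unwanted object (for example a person) is present.
    if names & set(blocked):
        return True
    # More than max_instances localized instances of a target label.
    for label in confident:
        if label['Name'] in test_labels:
            if len(label.get('Instances', [])) > max_instances:
                return True
    return False


# Hypothetical responses standing in for real Rekognition output.
samples = {
    'one_bird': {'Labels': [
        {'Name': 'Bird', 'Confidence': 99.0, 'Instances': [{}]}]},
    'two_birds': {'Labels': [
        {'Name': 'Bird', 'Confidence': 99.0, 'Instances': [{}, {}]}]},
    'bird_and_person': {'Labels': [
        {'Name': 'Bird', 'Confidence': 97.0, 'Instances': [{}]},
        {'Name': 'Person', 'Confidence': 93.0, 'Instances': [{}]}]},
    'no_bird': {'Labels': [
        {'Name': 'Tree', 'Confidence': 95.0}]},
}

decisions = {name: should_delete(resp, {'Bird'}) for name, resp in samples.items()}
print(decisions)
```

Printing the flagged keys for a full listing before calling `delete_object` gives a dry run of the cleanup. Rekognition returns instances only for some object labels, so an empty `Instances` list does not prove the image holds a single subject.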
