* changed by nov05 on 2024-12-09    
* local conda env `awsmle_py310`    
* Udacity AWS MLE Course 5   

---   
 
* End of support notice: On October 31, 2025, AWS will discontinue support for Amazon Lookout for Vision. After October 31, 2025, you will no longer be able to access the Lookout for Vision console or Lookout for Vision resources. For more information, visit this [blog post](https://aws.amazon.com/blogs/machine-learning/exploring-alternatives-and-seamlessly-migrating-data-from-amazon-lookout-for-vision).   
* You can copy the images from the [aws-samples/amazon-lookout-for-vision](https://github.com/aws-samples/amazon-lookout-for-vision) GitHub repository.   
* [AWS example datasets](https://docs.aws.amazon.com/lookout-for-vision/latest/developer-guide/example-datasets.html) ([backup](https://www.evernote.com/shard/s139/u/0/sh/20f4cb27-b331-4359-80e3-d59fafb398c3/F2ed7arE4Ee3gTridohRMA5brpn1Ofk82amqbQmgv7knQBv5idPDFQWX0A))     
  * Copy the dataset images from your computer to your Amazon S3 bucket:   
    `!aws s3 cp --recursive your-repository-folder/circuitboard s3://your-bucket/circuitboard`   
* For this practice, we'll only generate manifest files without uploading them to S3, so all operations will be local.  

In [3]:
%pwd

'd:\\github\\udacity-aws-mle-nano-course5\\excercise_3.13'

In [None]:
# !notepad C:\Users\guido\.aws\credentials

In [None]:
# ## reset the session after updating credentials
# import boto3 # type: ignore
# boto3.DEFAULT_SESSION = None
# ## Define IAM role
# import sagemaker # type: ignore
# from sagemaker import get_execution_role # type: ignore
# role_arn = get_execution_role()  ## get role ARN
# if 'AmazonSageMaker-ExecutionRole' not in role_arn:
#     print(f"Role ARN (voclabs): {role_arn}")  ## arn:aws:iam::026211625715:role/voclabs
#     ## your own role here
#     role_arn = "arn:aws:iam::026211625715:role/service-role/AmazonSageMaker-ExecutionRole-20241209T041445"
# session = sagemaker.Session()
# region = session.boto_region_name
# bucket = session.default_bucket()
# print("AWS Region: {}".format(region))
# print("Default SageMaker Bucket: {}".format(bucket))
# print("Role Arn (SageMaker): {}".format(role_arn))

## Amazon Lookout for Vision Lab

To help you learn about creating a model, Amazon Lookout for Vision provides example images of circuit boards (circuit_board) that you can use. The images and how to prepare them are described at https://docs.aws.amazon.com/lookout-for-vision/latest/developer-guide/su-prepare-example-images.html.

*P.S.     
[**Exploring alternatives and seamlessly migrating data from Amazon Lookout for Vision**](https://aws.amazon.com/blogs/machine-learning/exploring-alternatives-and-seamlessly-migrating-data-from-amazon-lookout-for-vision/)    
by Tim Westman on 10 OCT 2024, AWS Machine Learning Blog*     

### Environment variables

As a first step, define the two global variables this notebook needs:

- `bucket`: the S3 bucket that you will create and then use as your source for Amazon Lookout for Vision
    - Note: Please read the comments carefully. Depending on your region, you need to uncomment the correct command.
- `project`: the project name you want to use in Amazon Lookout for Vision

In [None]:
import os
# import boto3
bucket = "vision20210905"  ## where you upload your train/test images  
project = "circuitproject"
os.environ["BUCKET"] = bucket
# os.environ["REGION"] = boto3.session.Session().region_name
## client = boto3.client('lookoutvision')  ## deprecated
# client=boto3.Session().client('sagemaker')
# print(client)

Depending on your region, follow the instructions in the next cell:

## Image Preparation and EDA

In Amazon Lookout for Vision, if you already have pre-labeled images, as is the case in this example, you can set up a folder structure that defines the training and test datasets. Images are labeled for Amazon Lookout for Vision by the folder they are in (normal = good, anomaly = bad). See also:
- https://aws.amazon.com/lookout-for-vision/
- 👉 https://aws.amazon.com/blogs/aws/amazon-lookout-for-vision-new-machine-learning-service-that-simplifies-defect-detection-for-manufacturing/
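The folder-to-label convention can be sketched as a small helper that maps a class folder name to the numeric label this notebook's manifest code writes (`anomaly` → 1, `normal` → 0). The name `label_for_folder` is hypothetical, and rejecting unknown folder names is a design choice of this sketch, not a Lookout for Vision requirement.

```python
def label_for_folder(folder: str) -> int:
    """Map a class folder name to the numeric label used in the manifest.

    Assumption (hypothetical helper): folders are named exactly "normal" or
    "anomaly", as in the circuitboard example (train/normal, train/anomaly, ...).
    """
    labels = {"normal": 0, "anomaly": 1}
    if folder not in labels:
        # Fail loudly instead of silently labeling a stray folder
        # (e.g. ".ipynb_checkpoints") as normal.
        raise ValueError(f"unexpected class folder: {folder!r}")
    return labels[folder]


print(label_for_folder("anomaly"))
print(label_for_folder("normal"))
```

Raising on unexpected names is stricter than a plain `1 if folder == "anomaly" else 0`, which would quietly label any extra folder as normal.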

We will import the sample images provided by **Amazon Lookout for Vision**. If you're importing your own images, you would prepare them at this stage.
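A quick local EDA step is to count the images per dataset and class before generating manifests. The sketch below assumes the `<base_dir>/<dataset>/<class>/<image>` layout of the circuitboard example; `count_images` and the set of image extensions are hypothetical choices you may need to adjust.

```python
import os
from collections import Counter

# Assumed image extensions (hypothetical; adjust for your data).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(base_dir: str, datasets=("train", "test")) -> Counter:
    """Count image files per (dataset, class folder) under base_dir.

    Non-directories (e.g. .DS_Store) and non-image files are skipped.
    """
    counts = Counter()
    for dataset in datasets:
        dataset_dir = os.path.join(base_dir, dataset)
        for folder in sorted(os.listdir(dataset_dir)):
            folder_dir = os.path.join(dataset_dir, folder)
            if not os.path.isdir(folder_dir):
                continue  # skip stray files next to the class folders
            for name in os.listdir(folder_dir):
                if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                    counts[(dataset, folder)] += 1
    return counts


# Example, assuming the same local path used by the manifest cell:
# for (dataset, folder), n in sorted(count_images("../data/circuitboard/").items()):
#     print(f"{dataset}/{folder}: {n}")
```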

### **Generate the *manifest* files**

You might be familiar with manifest files if you have used Amazon SageMaker Ground Truth. If not, don't worry about this section too much.

If you are still interested in what's happening, you can continue reading:

Each dataset (train/ and test/) needs its own manifest file, which Amazon Lookout for Vision uses to find the images. The manifest is JSON-formatted and follows a fixed structure. The most important keys are *source-ref*, the S3 location of each image; *auto-label*, the label value (in this notebook's code, 1 = anomaly/bad, 0 = normal/good); *class-name* (inside *auto-label-metadata*), the class folder name (normal or anomaly); and *creation-date*, which records when the entry was created. Whether an image belongs to the training or test set is encoded in its *source-ref* path and in which manifest file it appears. All other fields are preset for you.

Each manifest file contains N JSON objects, one per line, where N is the number of images in that dataset.
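To make the structure concrete, here is a minimal sketch that builds a single manifest line with the same keys this notebook writes. The helper name `make_manifest_line` is hypothetical; the bucket and project values are the ones defined earlier in the notebook, and the label convention (1 = anomaly, 0 = normal) follows the notebook's code.

```python
import json
from datetime import datetime


def make_manifest_line(bucket: str, project: str, dataset: str,
                       folder: str, file: str, creation_date: str) -> str:
    """Build one manifest entry as a single JSON string (no trailing newline)."""
    entry = {
        "source-ref": f"s3://{bucket}/{project}/{dataset}/{folder}/{file}",
        "auto-label": 1 if folder == "anomaly" else 0,  # 1 = anomaly, 0 = normal
        "auto-label-metadata": {
            "confidence": 1,
            "job-name": "labeling-job/auto-label",
            "class-name": folder,
            "human-annotated": "yes",
            "creation-date": creation_date,
            "type": "groundtruth/image-classification",
        },
    }
    return json.dumps(entry)


line = make_manifest_line("vision20210905", "circuitproject", "train", "anomaly",
                          "train-anomaly_1.jpg",
                          datetime.now().strftime("%Y-%m-%dT%H:%M:%S.%f"))
print(line)
```

Writing `line + "\n"` for every image yields one JSON object per line, which is the file shape described above.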

In [16]:
# Datetime for datetime generation and json to dump the JSON object
# to the corresponding files:
from datetime import datetime
import json
from tqdm import tqdm

## Current date and time in manifest file format:
date_time = datetime.now().strftime("%Y-%m-%dT%H:%M:%S.%f")
## The two datasets used: train and test
datasets = ["train", "test"]
## local dataset dir
local_base_dir = "../data/circuitboard/"  ## ⚠️ set the local dir

## For each dataset...
for dataset in datasets:
    # ...list the folders available (normal or anomaly).
    print("------------------------------------------------")
    print(f"👉 Processing the \"{dataset}\" dataset...")
    folders = os.listdir(f"{local_base_dir}{dataset}")  
    # Then open the manifest file for this dataset...
    with open("{}.manifest".format(dataset), "w") as f:
        for folder in tqdm(folders):
            print(f"    👉 Processing the \"{folder}\" folder...")
            # ...and iterate through both folders by first listing
            # the corresponding files and setting the appropriate label
            # (1 = anomaly/bad, 0 = normal/good):
            files = os.listdir(f"{local_base_dir}{dataset}/{folder}") 
            label = 1 if folder=="anomaly" else 0
            # For each file in the folder...
            for file in tqdm(files):
                # ...generate a manifest JSON object and save it to the manifest
                # file. Don't forget to add '\n' to start a new line:
                manifest = {
                  "source-ref": "s3://{}/{}/{}/{}/{}".format(bucket, project, dataset, folder, file),  ## ⚠️ set the s3 dir
                  "auto-label": label,
                  "auto-label-metadata": {
                    "confidence": 1,
                    "job-name": "labeling-job/auto-label",
                    "class-name": folder,
                    "human-annotated": "yes",
                    "creation-date": date_time,
                    "type": "groundtruth/image-classification"
                  }
                }
                f.write(json.dumps(manifest)+"\n")

------------------------------------------------
👉 Processing the "train" dataset...


  0%|          | 0/2 [00:00<?, ?it/s]

    👉 Processing the "anomaly" folder...


100%|██████████| 20/20 [00:00<?, ?it/s]


    👉 Processing the "normal" folder...


100%|██████████| 20/20 [00:00<?, ?it/s]
100%|██████████| 2/2 [00:00<?, ?it/s]


------------------------------------------------
👉 Processing the "test" dataset...


  0%|          | 0/2 [00:00<?, ?it/s]

    👉 Processing the "anomaly" folder...


100%|██████████| 20/20 [00:00<?, ?it/s]


    👉 Processing the "normal" folder...


100%|██████████| 20/20 [00:00<00:00, 1332.60it/s]
100%|██████████| 2/2 [00:00<00:00, 125.49it/s]


In [15]:
## read the first line of a manifest file
from pprint import pprint
with open('train.manifest', 'r') as file:
    first_line = next(file)
pprint(json.loads(first_line))

{'auto-label': 1,
 'auto-label-metadata': {'class-name': 'anomaly',
                         'confidence': 1,
                         'creation-date': '2024-12-10T17:06:39.779867',
                         'human-annotated': 'yes',
                         'job-name': 'labeling-job/auto-label',
                         'type': 'groundtruth/image-classification'},
 'source-ref': 's3://vision20210905/circuitproject/train/anomaly/train-anomaly_1.jpg'}
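Reading only the first line checks the format of a single entry. As a hedged sketch, the hypothetical `check_manifest` helper below parses every line and verifies that `auto-label` agrees with `class-name`, using this notebook's convention (anomaly = 1, normal = 0). It is a cheap local check to run before any upload.

```python
import json
from collections import Counter


def check_manifest(path: str) -> Counter:
    """Parse every manifest line and count entries per class.

    Raises ValueError if a required key is missing or if the numeric label
    disagrees with the class name (assumed: anomaly = 1, normal = 0).
    """
    expected = {"anomaly": 1, "normal": 0}
    required = ("source-ref", "auto-label", "auto-label-metadata")
    counts = Counter()
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate a trailing blank line
            entry = json.loads(line)
            missing = [key for key in required if key not in entry]
            if missing:
                raise ValueError(f"line {lineno}: missing keys {missing}")
            class_name = entry["auto-label-metadata"].get("class-name")
            if expected.get(class_name) != entry["auto-label"]:
                raise ValueError(f"line {lineno}: label/class mismatch")
            counts[class_name] += 1
    return counts


# Example: print(check_manifest("train.manifest"))
```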


---  

⚠️ Skip the following code; we just practice generating manifest files.

---  


### Upload manifest files and images to S3

Now it's time to upload all the images and the manifest files:
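The cells that follow use the AWS CLI (`aws s3 cp`). As an alternative sketch, not needed for this local practice, the hypothetical helpers below build (local path, S3 key) pairs so that each key matches the `source-ref` written into the manifest, and only call boto3's `S3.Client.upload_file` when `dry_run=False` (which requires configured AWS credentials).

```python
import os


def plan_uploads(local_base_dir: str, project: str, datasets=("train", "test")):
    """Return (local_path, s3_key) pairs mirroring the manifest's source-ref layout.

    Assumes <local_base_dir>/<dataset>/<class>/<image>; keys take the form
    <project>/<dataset>/<class>/<image>, so s3://<bucket>/<key> equals source-ref.
    """
    pairs = []
    for dataset in datasets:
        dataset_dir = os.path.join(local_base_dir, dataset)
        for folder in sorted(os.listdir(dataset_dir)):
            folder_dir = os.path.join(dataset_dir, folder)
            if not os.path.isdir(folder_dir):
                continue  # skip stray files next to the class folders
            for name in sorted(os.listdir(folder_dir)):
                pairs.append((os.path.join(folder_dir, name),
                              f"{project}/{dataset}/{folder}/{name}"))
    return pairs


def upload(pairs, bucket: str, dry_run: bool = True):
    """Print each planned copy; with dry_run=False, also upload it via boto3."""
    if not dry_run:
        import boto3  # imported lazily so a dry run needs no AWS setup
        s3 = boto3.client("s3")
    for local_path, key in pairs:
        print(f"{local_path} -> s3://{bucket}/{key}")
        if not dry_run:
            s3.upload_file(local_path, bucket, key)
```

Run it as a dry run first, e.g. `upload(plan_uploads(local_base_dir, project), bucket)`. The manifest files could be appended as pairs such as `("train.manifest", f"{project}/train.manifest")` to mirror the CLI commands.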

In [None]:
# ## Upload manifest files to S3 bucket:
# !aws s3 cp train.manifest s3://{bucket}/{project}/train.manifest
# !aws s3 cp test.manifest s3://{bucket}/{project}/test.manifest

upload: ./train.manifest to s3://vision20210905/circuitproject/train.manifest
upload: ./test.manifest to s3://vision20210905/circuitproject/test.manifest


In [None]:
# ## Upload images to S3 bucket:
# !aws s3 cp circuitboard/train/normal s3://{bucket}/{project}/train/normal --recursive
# !aws s3 cp circuitboard/train/anomaly s3://{bucket}/{project}/train/anomaly --recursive
# !aws s3 cp circuitboard/test/normal s3://{bucket}/{project}/test/normal --recursive
# !aws s3 cp circuitboard/test/anomaly s3://{bucket}/{project}/test/anomaly --recursive

upload: circuitboard/train/normal/train-normal_20.jpg to s3://vision20210905/circuitproject/train/normal/train-normal_20.jpg
upload: circuitboard/train/normal/train-normal_13.jpg to s3://vision20210905/circuitproject/train/normal/train-normal_13.jpg
upload: circuitboard/train/normal/train-normal_14.jpg to s3://vision20210905/circuitproject/train/normal/train-normal_14.jpg
upload: circuitboard/train/normal/train-normal_3.jpg to s3://vision20210905/circuitproject/train/normal/train-normal_3.jpg
upload: circuitboard/train/normal/train-normal_12.jpg to s3://vision20210905/circuitproject/train/normal/train-normal_12.jpg
upload: circuitboard/train/normal/train-normal_10.jpg to s3://vision20210905/circuitproject/train/normal/train-normal_10.jpg
upload: circuitboard/train/normal/train-normal_19.jpg to s3://vision20210905/circuitproject/train/normal/train-normal_19.jpg
upload: circuitboard/train/normal/train-normal_1.jpg to s3://vision20210905/circuitproject/train/normal/train-normal_1.jpg
uplo