# Chapter 4: Advanced AWS AI/ML Services

This chapter covers five more specialized AWS AI/ML services: SageMaker Autopilot, SageMaker Model Monitor, Amazon Augmented AI (A2I), Amazon Textract, and Amazon Kendra. They build on the core services we've already explored and target specific use cases.

## Prerequisites

- An AWS account with appropriate permissions
- AWS CLI configured with your credentials
- Python 3.7 or later
- Required Python packages: boto3, sagemaker, pandas, numpy, matplotlib, scikit-learn

Install the required packages:

In [None]:
%%bash
pip install boto3 sagemaker pandas numpy matplotlib scikit-learn

## 1. Amazon SageMaker Advanced Features

### 1.1 SageMaker Autopilot

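SageMaker Autopilot automatically trains and tunes candidate models for classification or regression and ranks them on your data. The example below runs an Autopilot job on scikit-learn's diabetes dataset, deploys the best candidate, requests predictions, and deletes the endpoint.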

In [None]:
import boto3
import sagemaker
import pandas as pd
from sagemaker import AutoML
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Load sample data
diabetes = load_diabetes()
X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = pd.DataFrame(diabetes.target, columns=['target'])
data = pd.concat([X, y], axis=1)

# Split the data (fixed seed so the split is reproducible)
train, test = train_test_split(data, test_size=0.2, random_state=42)

# Save data to CSV files
train.to_csv('train.csv', index=False)
test.to_csv('test.csv', index=False)

# Set up SageMaker session
sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker-autopilot-diabetes'

# Upload data to S3
train_s3 = sagemaker_session.upload_data('train.csv', bucket=bucket, key_prefix=prefix)
test_s3 = sagemaker_session.upload_data('test.csv', bucket=bucket, key_prefix=prefix)

# Set up Autopilot job
auto_ml = AutoML(
    role='your-sagemaker-role-arn',  # Replace with your SageMaker role ARN
    target_attribute_name='target',
    sagemaker_session=sagemaker_session,
    output_path=f's3://{bucket}/{prefix}/output'
)

# Run Autopilot job
auto_ml.fit(
    inputs=train_s3,
    job_name='diabetes-autopilot-job',
    wait=True
)

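# Get best model (best_candidate() returns a dict describing the candidate)
best_model = auto_ml.best_candidate()
print(f"Best model: {best_model['CandidateName']}")

# Deploy model. Without predictor_cls, deploy() returns None instead of a predictor;
# CSVSerializer sends the input rows to the endpoint as CSV
predictor = auto_ml.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    candidate=best_model,
    predictor_cls=sagemaker.Predictor,
    serializer=sagemaker.serializers.CSVSerializer()
)

# Make predictions (the response is raw bytes, one CSV line per input row)
test_data = test.drop('target', axis=1)
predictions = predictor.predict(test_data.values)
print("Sample predictions:")
print(predictions.decode('utf-8').splitlines()[:5])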

# Clean up
predictor.delete_endpoint()
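
The cell above prints predictions but does not score them. As a rough sketch, assuming the endpoint replies with one numeric prediction per line as CSV text or bytes (an assumption about your deployment, not something the SDK guarantees), the helpers below parse such a payload and compute the root mean squared error against known targets. The names `parse_csv_predictions` and `rmse` are illustrative and not part of the SageMaker SDK, and the sample payload is made up.

```python
import math


def parse_csv_predictions(payload):
    """Parse a CSV payload with one numeric prediction per line into a list of floats."""
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8")
    # Keep the first column of each non-empty line
    return [float(line.split(",")[0]) for line in payload.splitlines() if line.strip()]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length, non-empty sequences."""
    y_true, y_pred = list(y_true), list(y_pred)
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up stand-in for the bytes returned by predictor.predict(...)
sample_payload = b"151.0\n75.5\n141.0\n"
sample_targets = [150.0, 75.5, 143.0]

preds = parse_csv_predictions(sample_payload)
print(f"RMSE: {rmse(sample_targets, preds):.3f}")
```

With the notebook's own variables, the equivalent call would be `rmse(test['target'], parse_csv_predictions(predictions))`, run before the endpoint is deleted.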

### 1.2 SageMaker Model Monitor

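SageMaker Model Monitor continuously monitors the quality of machine learning models in production. The example below configures data capture, computes a baseline from a reference dataset, and schedules hourly checks against that baseline. It reuses `bucket` and `prefix` from the Autopilot example.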

In [None]:
import boto3
import sagemaker
from sagemaker.model_monitor import DataCaptureConfig
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

# Assume you have a deployed model endpoint named 'your-model-endpoint'
endpoint_name = 'your-model-endpoint'

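# Set up data capture. This object has no effect until it is attached to the
# endpoint, for example by passing data_capture_config=... to model.deploy()
# or by calling predictor.update_data_capture_config(...)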
data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=f's3://{bucket}/{prefix}/data-capture'
)

# Create a default model monitor
my_monitor = DefaultModelMonitor(
    role='your-sagemaker-role-arn',  # Replace with your SageMaker role ARN
    instance_count=1,
    instance_type='ml.m5.xlarge',
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

# Set up the baseline
my_monitor.suggest_baseline(
    baseline_dataset='s3://path-to-your-baseline-dataset/baseline.csv',  # Replace with your baseline dataset
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=f's3://{bucket}/{prefix}/baseline-results',
    wait=True
)

# Create a monitoring schedule
from sagemaker.model_monitor import CronExpressionGenerator

my_monitor.create_monitoring_schedule(
    monitor_schedule_name='my-monitoring-schedule',
    endpoint_input=endpoint_name,
    output_s3_uri=f's3://{bucket}/{prefix}/monitoring-output',
    statistics=my_monitor.baseline_statistics(),
    constraints=my_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)

print("Model monitoring schedule created.")
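
To make the idea of checking live traffic against a baseline more concrete, here is a simplified, self-contained sketch. It is not Model Monitor's algorithm: the function names, the z-score rule, the threshold of 3, and the sample rows are all assumptions chosen for illustration. It computes per-feature statistics from baseline rows and flags features whose mean in the captured rows has moved far from the baseline mean.

```python
import statistics


def baseline_stats(rows):
    """Compute per-feature mean and standard deviation from baseline rows (list of dicts)."""
    stats = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        stats[name] = {
            "mean": statistics.mean(values),
            "std": statistics.pstdev(values),
        }
    return stats


def find_drift(stats, captured_rows, threshold=3.0):
    """Return the features whose captured mean lies more than `threshold`
    baseline standard deviations away from the baseline mean."""
    drifted = []
    for name, s in stats.items():
        captured_mean = statistics.mean([row[name] for row in captured_rows])
        if s["std"] == 0:
            # A constant baseline feature drifts as soon as its mean changes
            shifted = captured_mean != s["mean"]
        else:
            shifted = abs(captured_mean - s["mean"]) / s["std"] > threshold
        if shifted:
            drifted.append(name)
    return drifted


# Made-up rows standing in for a baseline dataset and captured endpoint inputs
baseline = [{"age": 40, "bmi": 25.0}, {"age": 50, "bmi": 27.0}, {"age": 60, "bmi": 29.0}]
captured = [{"age": 52, "bmi": 45.0}, {"age": 48, "bmi": 47.0}]

print(find_drift(baseline_stats(baseline), captured))
```

In this toy data the `age` mean is unchanged while the `bmi` mean has shifted by many baseline standard deviations, so only `bmi` is reported.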

## 2. Amazon Augmented AI (A2I)

Amazon A2I provides built-in human review workflows for common machine learning use cases. The example below starts a human loop for image moderation against an existing flow definition and then checks the loop's status:

In [None]:
import boto3
import json
import time

a2i = boto3.client('sagemaker-a2i-runtime')
s3 = boto3.client('s3')

# Assume you have created a human review workflow in the A2I console
# and have the ARN of the flow definition
flow_definition_arn = 'your-flow-definition-arn'  # Replace with your flow definition ARN

# Sample image URL
image_url = "https://example.com/image-to-moderate.jpg"

# Create a human loop
response = a2i.start_human_loop(
    HumanLoopName='image-moderation-loop-' + str(int(time.time())),
    FlowDefinitionArn=flow_definition_arn,
    HumanLoopInput={
        'InputContent': json.dumps({
            "initialValue": "EXPLICIT",
            "imageUrl": image_url
        })
    }
)

print(f"Human loop created: {response['HumanLoopArn']}")

# In a real scenario, you would wait for the human review to complete
# and then process the results. Here's how you might check the status:

# Get human loop status
human_loop_name = response['HumanLoopArn'].split('/')[-1]
status_response = a2i.describe_human_loop(HumanLoopName=human_loop_name)
print(f"Human loop status: {status_response['HumanLoopStatus']}")

# When the loop is completed, you can get the results from S3
# The results location would be specified in your flow definition
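
A common pattern is to send only low-confidence predictions to human review instead of starting a loop for every image. The sketch below shows one way this might look. The function `build_human_loop_request`, the 0.8 threshold, and the extra `confidence` field in the input content are assumptions for illustration. The worker task template behind your flow definition decides which input fields are actually used.

```python
import json
import time


def build_human_loop_request(flow_definition_arn, image_url, label, confidence,
                             threshold=0.8):
    """Return keyword arguments for start_human_loop, or None when the model's
    confidence meets the threshold and no human review is needed."""
    if confidence >= threshold:
        return None
    return {
        "HumanLoopName": f"image-moderation-loop-{int(time.time())}",
        "FlowDefinitionArn": flow_definition_arn,
        "HumanLoopInput": {
            "InputContent": json.dumps({
                "initialValue": label,
                "imageUrl": image_url,
                "confidence": confidence,  # hypothetical extra field
            })
        },
    }


request = build_human_loop_request(
    "your-flow-definition-arn",  # placeholder, as in the cell above
    "https://example.com/image-to-moderate.jpg",
    label="EXPLICIT",
    confidence=0.55,
)
if request is None:
    print("Confident prediction; no human review needed.")
else:
    print(f"Would start human loop: {request['HumanLoopName']}")
    # With a real client: a2i.start_human_loop(**request)
```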

## 3. Amazon Textract

Amazon Textract is a service that automatically extracts text, handwriting, and data from scanned documents. Here's an example of using Textract to extract text and form data from a document:

In [None]:
import boto3
import time

textract = boto3.client('textract')
s3 = boto3.client('s3')

# Assume you have a document in your S3 bucket
bucket_name = 'your-bucket-name'  # Replace with your bucket name
document_name = 'sample-form.pdf'  # Replace with your document name

# Start the Textract job
response = textract.start_document_analysis(
    DocumentLocation={
        'S3Object': {
            'Bucket': bucket_name,
            'Name': document_name
        }
    },
    FeatureTypes=['FORMS', 'TABLES']
)

job_id = response['JobId']

# Wait for the job to complete
while True:
    response = textract.get_document_analysis(JobId=job_id)
    status = response['JobStatus']
    print(f"Job status: {status}")
    if status != 'IN_PROGRESS':  # SUCCEEDED, FAILED, or PARTIAL_SUCCESS
        break
    time.sleep(5)

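# Process the results
def related_ids(block, rel_type):
    """Return the IDs a block links to through relationships of the given type."""
    return [
        block_id
        for rel in block.get('Relationships', [])
        if rel['Type'] == rel_type
        for block_id in rel['Ids']
    ]

def child_text(block, block_map):
    """Join the text of the WORD blocks that are children of a block."""
    children = (block_map.get(i, {}) for i in related_ids(block, 'CHILD'))
    return ' '.join(c['Text'] for c in children if c.get('BlockType') == 'WORD')

if status == 'SUCCEEDED':
    # Note: this is the first page of results; long documents continue via NextToken
    blocks = response['Blocks']
    block_map = {block['Id']: block for block in blocks}

    print("\nExtracted text:")
    for item in blocks:
        if item['BlockType'] == 'LINE':
            print(item['Text'])

    print("\nExtracted form data:")
    for item in blocks:
        if item['BlockType'] == 'KEY_VALUE_SET' and 'KEY' in item.get('EntityTypes', []):
            # A KEY block links to its VALUE block; the text sits in their WORD children
            value_text = ' '.join(
                child_text(block_map[i], block_map)
                for i in related_ids(item, 'VALUE') if i in block_map
            )
            print(f"{child_text(item, block_map)}: {value_text}")
else:
    print(f"Textract job ended with status {status}.")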
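
The cell above reads only the response returned by the last polling call. Results for longer documents are paginated, and later pages are requested by passing `NextToken` back to `get_document_analysis`. The sketch below shows one way to collect every page. The helper `collect_all_blocks` and its `fetch_page` callable are hypothetical names, and the two pages are fake data so the example runs without AWS access.

```python
def collect_all_blocks(fetch_page):
    """Collect Blocks from every page of a paginated Textract result.

    `fetch_page` is a callable that takes a NextToken (or None for the first
    page) and returns one response dict.
    """
    blocks = []
    next_token = None
    while True:
        page = fetch_page(next_token)
        blocks.extend(page.get("Blocks", []))
        next_token = page.get("NextToken")
        if not next_token:
            return blocks


# Fake pages keyed by token, standing in for the service
fake_pages = {
    None: {"Blocks": [{"BlockType": "LINE", "Text": "Name: Jane"}], "NextToken": "t1"},
    "t1": {"Blocks": [{"BlockType": "LINE", "Text": "Date: 2024-01-01"}]},
}
all_blocks = collect_all_blocks(lambda token: fake_pages[token])
print([b["Text"] for b in all_blocks])

# Against the real client, fetch_page might look like this:
# def fetch_page(token):
#     kwargs = {"JobId": job_id}
#     if token:
#         kwargs["NextToken"] = token
#     return textract.get_document_analysis(**kwargs)
```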

## 4. Amazon Kendra

Amazon Kendra is an intelligent search service powered by machine learning. Here's an example of setting up a Kendra index and performing a search:

In [None]:
import boto3
import time

kendra = boto3.client('kendra')

# Create a Kendra index
response = kendra.create_index(
    Name='sample-index',
    Edition='DEVELOPER_EDITION',
    RoleArn='your-kendra-role-arn'  # Replace with your Kendra role ARN
)

index_id = response['Id']

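# Wait for the index to be created
while True:
    response = kendra.describe_index(Id=index_id)
    status = response['Status']
    print(f"Index status: {status}")
    if status == 'ACTIVE':
        break
    if status == 'FAILED':
        # Stop instead of polling forever
        raise RuntimeError(f"Index creation failed: {response.get('ErrorMessage')}")
    time.sleep(60)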

# Add a data source (example: S3)
response = kendra.create_data_source(
    IndexId=index_id,
    Name='sample-data-source',
    Type='S3',
    DataSourceConfiguration={
        'S3Configuration': {
            'BucketName': 'your-bucket-name',  # Replace with your bucket name
            'InclusionPrefixes': ['documents/']
        }
    },
    RoleArn='your-kendra-role-arn'  # Replace with your Kendra role ARN
)

data_source_id = response['Id']

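# Sync the data source (asynchronous: documents become searchable only after the sync finishes)
sync = kendra.start_data_source_sync_job(
    Id=data_source_id,
    IndexId=index_id
)

# Wait for the sync job to finish before querying
while True:
    history = kendra.list_data_source_sync_jobs(Id=data_source_id, IndexId=index_id)['History']
    sync_status = next(
        (job['Status'] for job in history if job['ExecutionId'] == sync['ExecutionId']),
        'SYNCING'
    )
    print(f"Sync status: {sync_status}")
    if sync_status not in ('SYNCING', 'SYNCING_INDEXING', 'STOPPING'):
        break
    time.sleep(60)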

# Perform a search
response = kendra.query(
    IndexId=index_id,
    QueryText='What is machine learning?'
)

print("\nSearch results:")
for result in response['ResultItems']:
    # DocumentTitle and DocumentExcerpt are dicts; the text is under the 'Text' key
    print(f"Document Title: {result.get('DocumentTitle', {}).get('Text')}")
    print(f"Document Excerpt: {result.get('DocumentExcerpt', {}).get('Text')}")
    print("---")

# Clean up (uncomment to delete the index when you're done)
# kendra.delete_index(Id=index_id)
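
Each item in a Kendra query response also carries a result type and a confidence bucket under `ScoreAttributes`. As a sketch, the helper below keeps only results at or above a chosen confidence level. The name `summarize_results`, the default floor of `MEDIUM`, and the sample response are assumptions for illustration, and the response is hand-written, not real service output.

```python
CONFIDENCE_ORDER = ["LOW", "MEDIUM", "HIGH", "VERY_HIGH"]


def summarize_results(response, min_confidence="MEDIUM"):
    """Return title, excerpt, type, and confidence for each result item whose
    confidence bucket is at least `min_confidence`.

    Items without a recognized bucket (e.g. NOT_AVAILABLE) are dropped.
    """
    floor = CONFIDENCE_ORDER.index(min_confidence)
    rows = []
    for item in response.get("ResultItems", []):
        confidence = item.get("ScoreAttributes", {}).get("ScoreConfidence")
        if confidence not in CONFIDENCE_ORDER:
            continue
        if CONFIDENCE_ORDER.index(confidence) < floor:
            continue
        rows.append({
            "type": item.get("Type"),
            "title": item.get("DocumentTitle", {}).get("Text"),
            "excerpt": item.get("DocumentExcerpt", {}).get("Text"),
            "confidence": confidence,
        })
    return rows


# Hand-written sample shaped like a query response
sample_response = {
    "ResultItems": [
        {"Type": "ANSWER",
         "DocumentTitle": {"Text": "ML Overview"},
         "DocumentExcerpt": {"Text": "Machine learning is..."},
         "ScoreAttributes": {"ScoreConfidence": "VERY_HIGH"}},
        {"Type": "DOCUMENT",
         "DocumentTitle": {"Text": "Unrelated Memo"},
         "DocumentExcerpt": {"Text": "Quarterly update..."},
         "ScoreAttributes": {"ScoreConfidence": "LOW"}},
    ]
}
for row in summarize_results(sample_response):
    print(f"[{row['confidence']}] {row['type']}: {row['title']}")
```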

## Conclusion

These advanced AWS AI/ML services provide powerful capabilities for specific use cases:

1. SageMaker Autopilot for automated machine learning
2. SageMaker Model Monitor for continuous model quality monitoring
3. Amazon Augmented AI (A2I) for human review workflows
4. Amazon Textract for document text and data extraction
5. Amazon Kendra for intelligent search

These services can significantly accelerate your AI/ML projects by automating complex tasks and providing specialized functionality. As you work with these services, remember to:

- Manage your AWS resources carefully to control costs: delete endpoints, monitoring schedules, and Kendra indexes you no longer need
- Ensure you have the necessary permissions and IAM roles set up
- Be aware of service limits and quotas
- Consider data privacy and security, especially when dealing with sensitive information

As you become more familiar with these advanced services, explore how they can be combined with core AI services and your existing workflows to create sophisticated, AI-powered applications.