# Deploying Sampl Model from Hugging Face to AWS SageMaker

This notebook demonstrates how to deploy a Sampl model from Hugging Face to AWS SageMaker and perform inference.

## Table of Contents
1. [Setup and Installation](#setup-and-installation)
2. [AWS Configuration](#aws-configuration)
3. [Model Preparation](#model-preparation)
4. [SageMaker Deployment](#sagemaker-deployment)
5. [Inference Testing](#inference-testing)
6. [Monitoring and Cleanup](#monitoring-and-cleanup)

## Setup and Installation

In [None]:
# Install required packages
!pip install sagemaker transformers torch datasets boto3

# Import libraries
import sagemaker
import boto3
import json
import time
import os
import traceback  # used by the error handlers in later cells
from sagemaker import get_execution_role
from sagemaker.huggingface import HuggingFaceModel, HuggingFaceProcessor
from transformers import AutoTokenizer, AutoModel
import torch
from datasets import Dataset
import pandas as pd

print("Libraries imported successfully!")

## AWS Configuration

In [None]:
# Configure AWS and SageMaker
try:
    # Get execution role
    role = get_execution_role()
    print(f"SageMaker execution role: {role}")
except Exception as e:
    print(f"Error getting execution role: {e}")
    print("Please ensure you're running this notebook in a SageMaker instance or have proper AWS credentials configured")

# Initialize SageMaker session
sagemaker_session = sagemaker.Session()
region = sagemaker_session.boto_region_name
print(f"AWS Region: {region}")

# Create S3 bucket for model artifacts (if needed)
bucket = sagemaker_session.default_bucket()
print(f"Default S3 bucket: {bucket}")

## Model Preparation

In [None]:
# Define model configuration
MODEL_ID = "microsoft/DialoGPT-medium"  # You can change this to the specific Sampl model you want
MODEL_VERSION = "1.0.0"

print(f"Model ID: {MODEL_ID}")
print(f"Model Version: {MODEL_VERSION}")

# Test model locally first
try:
    print("Loading model locally for testing...")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, padding_side='left')
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    
    model = AutoModel.from_pretrained(MODEL_ID)
    print("Model loaded successfully!")
    print(f"Model type: {type(model)}")
    print(f"Tokenizer vocab size: {len(tokenizer)}")
except Exception as e:
    print(f"Error loading model: {e}")

In [None]:
# Create inference script
inference_code = '''
import json
import os
import torch
from transformers import AutoTokenizer, AutoModel
import traceback

def model_fn(model_dir):
    """Load the model for inference"""
    try:
        print("Loading model and tokenizer...")
        model = AutoModel.from_pretrained(model_dir)
        tokenizer = AutoTokenizer.from_pretrained(model_dir, padding_side='left')
        
        if tokenizer.pad_token is None:
            tokenizer.pad_token = tokenizer.eos_token
            
        print("Model and tokenizer loaded successfully!")
        return {"model": model, "tokenizer": tokenizer}
    except Exception as e:
        print(f"Error in model_fn: {e}")
        print(traceback.format_exc())
        raise

def input_fn(request_body, request_content_type):
    """Parse input data"""
    try:
        if request_content_type == 'application/json':
            input_data = json.loads(request_body)
            if 'inputs' in input_data:
                return input_data['inputs']
            else:
                return input_data
        else:
            return request_body
    except Exception as e:
        print(f"Error in input_fn: {e}")
        return request_body

def predict_fn(input_data, model_artifacts):
    """Make predictions"""
    try:
        model = model_artifacts["model"]
        tokenizer = model_artifacts["tokenizer"]
        
        if isinstance(input_data, str):
            input_data = [input_data]
        
        # Tokenize input
        inputs = tokenizer(input_data, return_tensors="pt", padding=True, truncation=True, max_length=512)
        
        # Make prediction
        with torch.no_grad():
            outputs = model(**inputs)
            # For this example, return the last hidden states averaged over the
            # sequence dimension (padding positions are included in the average)
            predictions = outputs.last_hidden_state.mean(dim=1).tolist()
        
        return predictions
    except Exception as e:
        print(f"Error in predict_fn: {e}")
        print(traceback.format_exc())
        return {"error": str(e)}

def output_fn(prediction, content_type):
    """Format output"""
    try:
        if content_type == 'application/json':
            return json.dumps({"predictions": prediction}, indent=2)
        else:
            return str(prediction)
    except Exception as e:
        print(f"Error in output_fn: {e}")
        return json.dumps({"error": str(e)})
'''

# Save inference code to file
with open('/tmp/inference.py', 'w') as f:
    f.write(inference_code)

print("Inference script created successfully!")
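
The request and response contract of the handlers above can be checked without loading the model. The following is a minimal sketch, not part of the deployed script: `parse_request` and `format_response` mirror `input_fn` and `output_fn`, while `fake_predict` is a hypothetical stand-in that returns a made-up two-number vector per input instead of real hidden states.

```python
import json

def parse_request(request_body, content_type="application/json"):
    """Mirror of input_fn: unwrap the 'inputs' key from a JSON body."""
    if content_type == "application/json":
        data = json.loads(request_body)
        if isinstance(data, dict) and "inputs" in data:
            return data["inputs"]
        return data
    return request_body

def fake_predict(texts):
    """Hypothetical stand-in for predict_fn: one small vector per input text."""
    if isinstance(texts, str):
        texts = [texts]
    # Character count and word count; placeholders for real model output
    return [[float(len(t)), float(len(t.split()))] for t in texts]

def format_response(prediction, accept="application/json"):
    """Mirror of output_fn: wrap predictions in a JSON object."""
    if accept == "application/json":
        return json.dumps({"predictions": prediction})
    return str(prediction)

body = json.dumps({"inputs": "Hello, how are you today?"})
reply = json.loads(format_response(fake_predict(parse_request(body))))
print(reply)
```

The point of the sketch is the shape of the payload: a single string under `inputs` goes in, and a list with one vector per input comes back under `predictions`.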

## SageMaker Deployment

In [None]:
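# Define the HuggingFace Model object (model.tar.gz is uploaded to S3 in the next cell)
try:
    # Constructing the object does not contact S3; the archive is only read at deploy time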
    huggingface_model = HuggingFaceModel(
        model_data=f"s3://{bucket}/model-artifacts/{MODEL_ID.replace('/', '-')}/model.tar.gz",
        role=role,
        transformers_version="4.26.0",
        pytorch_version="1.13.1",
        py_version="py39",
        entry_point="/tmp/inference.py"  # absolute path; avoids bundling all of /tmp as source_dir
    )
    
    print("HuggingFace Model created successfully!")
    print(f"Model data: {huggingface_model.model_data}")
    print(f"Role: {huggingface_model.role}")
    
except Exception as e:
    print(f"Error creating HuggingFace model: {e}")
    print("Note: a missing model.tar.gz in S3 would only surface at deploy time. The next cell uploads it.")

In [None]:
# Prepare and upload model to S3
print("Preparing model for S3 upload...")

# Create temporary directory for model files
import tempfile
import shutil

temp_dir = tempfile.mkdtemp()
model_dir = os.path.join(temp_dir, "model")
os.makedirs(model_dir, exist_ok=True)

try:
    # Save model and tokenizer
    print("Saving model and tokenizer...")
    tokenizer.save_pretrained(model_dir)
    model.save_pretrained(model_dir)
    
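    # Copy inference script into code/, where the Hugging Face inference toolkit looks for it
    code_dir = os.path.join(model_dir, "code")
    os.makedirs(code_dir, exist_ok=True)
    shutil.copy("/tmp/inference.py", code_dir)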
    
    # Create tar.gz file
    import tarfile
    tar_path = "/tmp/model.tar.gz"
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(model_dir, arcname=".")
    
    print(f"Model packaged successfully at: {tar_path}")
    
    # Upload to S3
    model_s3_path = f"s3://{bucket}/model-artifacts/{MODEL_ID.replace('/', '-')}/model.tar.gz"
    print(f"Uploading model to: {model_s3_path}")
    
    sagemaker_session.upload_data(
        path=tar_path,
        bucket=bucket,
        key_prefix=f"model-artifacts/{MODEL_ID.replace('/', '-')}"
    )
    
    print("Model uploaded to S3 successfully!")
    
finally:
    # Clean up temporary files
    shutil.rmtree(temp_dir, ignore_errors=True)
    if os.path.exists("/tmp/model.tar.gz"):
        os.remove("/tmp/model.tar.gz")
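
Because the archive is built with `arcname="."`, the model files should sit at the archive root rather than under a `model/` folder. A quick way to confirm that before uploading is to list the archive members. The sketch below builds a tiny stand-in archive with placeholder files (the file names and contents are illustrative assumptions, not real model artifacts) and inspects it with a hypothetical helper, `archive_members`.

```python
import os
import posixpath
import tarfile
import tempfile

def archive_members(tar_path):
    """Return normalized member paths of a .tar.gz, without the root entry."""
    with tarfile.open(tar_path, "r:gz") as tar:
        names = {posixpath.normpath(m.name) for m in tar.getmembers()}
    names.discard(".")  # the directory entry created by arcname="."
    return names

# Build a small stand-in archive the same way the packaging cell does
with tempfile.TemporaryDirectory() as workdir:
    demo_dir = os.path.join(workdir, "model")
    os.makedirs(demo_dir)
    for name in ("config.json", "tokenizer_config.json"):
        with open(os.path.join(demo_dir, name), "w") as f:
            f.write("{}")  # placeholder content
    demo_tar = os.path.join(workdir, "model.tar.gz")
    with tarfile.open(demo_tar, "w:gz") as tar:
        tar.add(demo_dir, arcname=".")
    members = archive_members(demo_tar)

print(sorted(members))
```

Calling `archive_members` on the real `model.tar.gz` (before the cleanup step deletes it) should show `config.json` and the tokenizer files without a leading folder name.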

In [None]:
# Now create and deploy the model
try:
    # Create HuggingFace Model with actual S3 path
    model_s3_path = f"s3://{bucket}/model-artifacts/{MODEL_ID.replace('/', '-')}/model.tar.gz"
    
    huggingface_model = HuggingFaceModel(
        model_data=model_s3_path,
        role=role,
        transformers_version="4.26.0",
        pytorch_version="1.13.1",
        py_version="py39",
        entry_point="/tmp/inference.py"  # the script was written to /tmp, not the working directory
    )
    
    print("Model object created successfully!")
    
    # Deploy the model
    print("Deploying model to SageMaker endpoint...")
    predictor = huggingface_model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.large"  # You can change this based on your needs
    )
    
    print("Model deployed successfully!")
    print(f"Endpoint name: {predictor.endpoint_name}")
    
except Exception as e:
    print(f"Error deploying model: {e}")
    print(traceback.format_exc())

## Inference Testing

In [None]:
# Test the deployed model
try:
    # Test data
    test_inputs = [
        "Hello, how are you today?",
        "What is the weather like?",
        "Tell me about machine learning."
    ]
    
    print("Testing model inference...")
    
    for i, test_input in enumerate(test_inputs, 1):
        print(f"\nTest {i}:")
        print(f"Input: {test_input}")
        
        # Make prediction
        response = predictor.predict({
            "inputs": test_input
        })
        
        print(f"Response: {response}")
        print("-" * 50)
    
    print("Inference testing completed successfully!")
    
except Exception as e:
    print(f"Error during inference testing: {e}")
    print(traceback.format_exc())
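
Note that `predict_fn` catches its own exceptions and returns `{"error": ...}`, which `output_fn` then wraps under `predictions`, so a failed prediction still comes back as a normal response. A small check on the response makes such failures visible. This is a sketch with a hypothetical helper name, `extract_vectors`, and it assumes `predictor.predict` returns the already-parsed JSON object.

```python
def extract_vectors(response):
    """Return the list of vectors, or raise if the handler reported an error."""
    predictions = response.get("predictions") if isinstance(response, dict) else None
    # predict_fn signals failure by returning a dict with an "error" key
    if isinstance(predictions, dict) and "error" in predictions:
        raise RuntimeError(f"Endpoint reported an error: {predictions['error']}")
    if not isinstance(predictions, list) or not predictions:
        raise ValueError(f"Unexpected response shape: {response!r}")
    return predictions

# Illustrative response; real vectors have one entry per hidden dimension
ok = extract_vectors({"predictions": [[0.1, -0.2, 0.3]]})
print(len(ok), len(ok[0]))
```

In the test loop above, wrapping each `response` in `extract_vectors(response)` would turn a silently returned error into an exception.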

In [None]:
# Performance testing
try:
    print("Running performance tests...")
    
    import time
    
    # Test with multiple requests
    test_inputs = ["Hello world!", "How are you?", "What's new?", "Tell me something interesting."]
    
    response_times = []
    
    for i, test_input in enumerate(test_inputs, 1):
        start_time = time.time()
        
        response = predictor.predict({
            "inputs": test_input
        })
        
        end_time = time.time()
        response_time = end_time - start_time
        response_times.append(response_time)
        
        print(f"Request {i}: {response_time:.2f} seconds")
    
    # Calculate statistics
    avg_response_time = sum(response_times) / len(response_times)
    min_response_time = min(response_times)
    max_response_time = max(response_times)
    
    print(f"\nPerformance Summary:")
    print(f"Average response time: {avg_response_time:.2f} seconds")
    print(f"Minimum response time: {min_response_time:.2f} seconds")
    print(f"Maximum response time: {max_response_time:.2f} seconds")
    
except Exception as e:
    print(f"Error during performance testing: {e}")
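
With only a handful of requests, the average is easily skewed by the first call, which often includes connection setup. One possible refinement is to drop a warm-up request and report the median and a high percentile alongside the mean. The sketch below uses a hypothetical helper, `summarize_latencies`, and made-up latency values rather than measurements from this endpoint; it expects at least one sample.

```python
import math
import statistics

def summarize_latencies(samples, warmup=1):
    """Summarize latencies in seconds, skipping the first `warmup` samples."""
    # Fall back to all samples if skipping warm-up would leave nothing
    steady = sorted(samples[warmup:]) or sorted(samples)
    # Nearest-rank 95th percentile: smallest value with >= 95% of samples at or below it
    p95_index = max(math.ceil(0.95 * len(steady)) - 1, 0)
    return {
        "count": len(steady),
        "mean": statistics.fmean(steady),
        "median": statistics.median(steady),
        "p95": steady[p95_index],
    }

# Illustrative values only: a slow first request followed by steadier ones
summary = summarize_latencies([1.8, 0.21, 0.19, 0.25, 0.2])
print(summary)
```

Passing the `response_times` list from the cell above gives the same summary for real measurements, though four requests are too few for a meaningful percentile.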

## Monitoring and Cleanup

In [None]:
# Monitor endpoint metrics
try:
    import datetime  # needed for the time window below
    
    # Get CloudWatch client
    cloudwatch = boto3.client('cloudwatch')
    
    # Get endpoint name
    endpoint_name = predictor.endpoint_name
    print(f"Monitoring endpoint: {endpoint_name}")
    
    # Get invocation metrics for the last hour
    end_time = datetime.datetime.utcnow()
    start_time = end_time - datetime.timedelta(hours=1)
    
    metrics = cloudwatch.get_metric_statistics(
        Namespace='AWS/SageMaker',
        MetricName='Invocations',
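        Dimensions=[
            {
                'Name': 'EndpointName',
                'Value': endpoint_name
            },
            # Endpoint invocation metrics are published per variant;
            # 'AllTraffic' is the SageMaker SDK's default variant name
            {
                'Name': 'VariantName',
                'Value': 'AllTraffic'
            }
        ],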
        StartTime=start_time,
        EndTime=end_time,
        Period=300,  # 5-minute periods
        Statistics=['Sum']
    )
    
    if metrics['Datapoints']:
        total_invocations = sum(dp['Sum'] for dp in metrics['Datapoints'])
        print(f"Total invocations in the last hour: {total_invocations}")
    else:
        print("No invocation data available for the last hour")
    
except Exception as e:
    print(f"Error monitoring endpoint: {e}")
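
CloudWatch does not return datapoints in chronological order, so a per-period view needs an explicit sort. The following sketch shows one way to do that with a hypothetical helper, `invocations_by_period`. The sample datapoints are hand-written in the shape `get_metric_statistics` returns, with illustrative values.

```python
import datetime

def invocations_by_period(datapoints):
    """Return (timestamp, sum) pairs in chronological order plus the overall total."""
    ordered = sorted(datapoints, key=lambda dp: dp["Timestamp"])
    series = [(dp["Timestamp"], dp["Sum"]) for dp in ordered]
    total = sum(value for _, value in series)
    return series, total

# Hand-written datapoints, deliberately out of order (values are illustrative)
t0 = datetime.datetime(2024, 1, 1, 12, 0, tzinfo=datetime.timezone.utc)
sample = [
    {"Timestamp": t0 + datetime.timedelta(minutes=5), "Sum": 3.0, "Unit": "None"},
    {"Timestamp": t0, "Sum": 4.0, "Unit": "None"},
]
series, total = invocations_by_period(sample)
for ts, value in series:
    print(ts.isoformat(), value)
print("total:", total)
```

With the real response, `invocations_by_period(metrics['Datapoints'])` gives the same total as the cell above along with a time-ordered breakdown.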

In [None]:
# List all deployed endpoints (optional)
try:
    import boto3
    
    sagemaker_client = boto3.client('sagemaker')
    
    # List endpoints
    response = sagemaker_client.list_endpoints()
    
    print("Deployed SageMaker endpoints:")
    for endpoint in response['Endpoints']:
        print(f"- {endpoint['EndpointName']} (Status: {endpoint['EndpointStatus']})")
        
except Exception as e:
    print(f"Error listing endpoints: {e}")

In [None]:
# Cleanup function
def cleanup_endpoint(endpoint_name=None):
    """Delete SageMaker endpoint"""
    try:
        if endpoint_name is None:
            endpoint_name = predictor.endpoint_name
        
        print(f"Deleting endpoint: {endpoint_name}")
        sagemaker_client = boto3.client('sagemaker')
        
        # Look up the endpoint config name before deleting the endpoint;
        # it is not necessarily "<endpoint_name>-config"
        endpoint_config_name = sagemaker_client.describe_endpoint(
            EndpointName=endpoint_name
        )['EndpointConfigName']
        
        # Delete endpoint
        sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
        print(f"Endpoint {endpoint_name} deletion initiated.")
        
        # Also delete endpoint config
        try:
            sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
            print(f"Endpoint config {endpoint_config_name} deleted.")
        except Exception as e:
            print(f"Note: Could not delete endpoint config: {e}")
        
        return True
        
    except Exception as e:
        print(f"Error during cleanup: {e}")
        return False

# Uncomment the line below if you want to delete the endpoint
# cleanup_endpoint()

## Summary

This notebook demonstrates:

1. **Setup**: Installing and configuring necessary libraries for SageMaker and Hugging Face
2. **Model Loading**: Loading and testing the model locally first
3. **Packaging**: Creating a proper inference script and packaging the model
4. **Deployment**: Deploying the model to SageMaker with proper configuration
5. **Testing**: Sending sample requests, timing responses, and checking CloudWatch invocation metrics
6. **Cleanup**: Deleting the endpoint and its endpoint configuration

### Important Notes:

- Make sure you have proper AWS credentials configured
- The notebook should be run in a SageMaker instance or with proper IAM permissions
- Adjust the instance type and other parameters based on your specific model requirements
- Consider using smaller instances for testing and larger ones for production
- Remember to clean up resources to avoid unnecessary costs

### Customization:

- Replace `MODEL_ID` with your specific Sampl model name
- Modify the `predict_fn` function based on your model's specific inference requirements
- Adjust instance types and configurations as needed
- Add additional preprocessing or postprocessing steps as required