intel/terraform-intel-aws-sagemaker-endpoint

Intel® Optimized Cloud Modules for Terraform

© Copyright 2024, Intel Corporation

Amazon SageMaker Endpoint module

This module creates an Amazon SageMaker Endpoint backed by 4th Gen Intel® Xeon® Scalable processors (code-named Sapphire Rapids), the latest generation available for SageMaker endpoints at the time this module was published.

Performance Data

Performance data for these instance types, and more, is available in the full Intel® Optimized Cloud Modules library.


Usage

See the examples folder for complete code: ./examples/provisioned-realtime-endpoint/main.tf

Example of main.tf

#########################################################
# Local variables, modify for your needs                #
#########################################################

# See policies.md for recommended instances
# Intel recommended instance types for SageMaker endpoint configurations

# Compute Optimized
# ml.c7i.large, ml.c7i.xlarge, ml.c7i.2xlarge, ml.c7i.4xlarge, ml.c7i.8xlarge, ml.c7i.12xlarge, 
# ml.c7i.16xlarge, ml.c7i.24xlarge, ml.c7i.48xlarge, ml.c6i.large, ml.c6i.xlarge, ml.c6i.2xlarge, ml.c6i.4xlarge, ml.c6i.8xlarge, ml.c6i.12xlarge, ml.c6i.16xlarge, ml.c6i.24xlarge, ml.c6i.32xlarge


# General Purpose
# ml.m7i.large, ml.m7i.xlarge, ml.m7i.2xlarge, ml.m7i.4xlarge, ml.m7i.8xlarge, ml.m7i.12xlarge, 
# ml.m7i.16xlarge, ml.m7i.24xlarge, ml.m7i.48xlarge, ml.m5.large, ml.m5.xlarge, ml.m5.2xlarge, ml.m5.4xlarge, ml.m5.12xlarge, ml.m5.24xlarge, ml.m5d.large, ml.m5d.xlarge, ml.m5d.2xlarge, ml.m5d.4xlarge, ml.m5d.12xlarge, ml.m5d.24xlarge

# Memory Optimized
# ml.r7i.large, ml.r7i.xlarge, ml.r7i.2xlarge, ml.r7i.4xlarge, ml.r7i.8xlarge, ml.r7i.12xlarge, 
# ml.r7i.16xlarge, ml.r7i.24xlarge, ml.r7i.48xlarge, ml.r5.large, ml.r5.xlarge, ml.r5.2xlarge, ml.r5.4xlarge, ml.r5.12xlarge, ml.r5.24xlarge, ml.r5d.large, ml.r5d.xlarge, ml.r5d.2xlarge, ml.r5d.4xlarge, ml.r5d.12xlarge, ml.r5d.24xlarge

# Accelerated Computing
# ml.g4dn.xlarge, ml.g4dn.2xlarge, ml.g4dn.4xlarge, ml.g4dn.8xlarge, ml.g4dn.12xlarge, ml.g4dn.16xlarge, ml.inf1.xlarge, 
# ml.inf1.2xlarge, ml.inf1.6xlarge, ml.inf1.24xlarge

locals {
  region                        = "us-east-1"
  sagemaker_container_log_level = "20"
  sagemaker_program             = "inference.py"
  sagemaker_submit_directory    = "/opt/ml/model/code"

  # This is the place where you need to provide the S3 path to the model artifact. In this example, we are using a model
  # artifact that is created from SageMaker jumpstart pre-trained model for Scikit Learn Linear regression.
  # The S3 path for the model artifact will look like the example below.
  aws-jumpstart-inference-model-uri = "s3://sagemaker-us-east-1-<AWS_Account_Id>/sklearn-regression-linear-20240208-220732/model.tar.gz" # change here

  # This is the ECR registry path for the container image that is used for inferencing.
  model_image = "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3"

  enable_network_isolation = true
}

resource "random_id" "rid" {
  byte_length = 5
}

module "sagemaker_scikit_learn_model" {
  source = "../../modules"

  # Specifying SageMaker Model Primary container parameters corresponding to the production variant
  sagemaker_model_primary_container = [{
    image          = local.model_image
    model_data_url = local.aws-jumpstart-inference-model-uri
    environment = {
      "SAGEMAKER_CONTAINER_LOG_LEVEL" = local.sagemaker_container_log_level
      "SAGEMAKER_PROGRAM"             = local.sagemaker_program
      "SAGEMAKER_REGION"              = local.region
      "SAGEMAKER_SUBMIT_DIRECTORY"    = local.sagemaker_submit_directory
    }
  }]
}

module "sagemaker_endpoint" {
  source = "intel/aws-sagemaker-endpoint/intel"

  # Specifying one production variant for the SageMaker endpoint configuration
  endpoint_production_variants = [{
    model_name             = module.sagemaker_scikit_learn_model.sagemaker-model-name
    instance_type          = "ml.c7i.xlarge"
    initial_instance_count = 1
    variant_name           = "my-variant-1-${random_id.rid.dec}"
  }]
}

Run Terraform

terraform init  
terraform plan
terraform apply

Note that this example creates billable AWS resources. Run terraform destroy when you no longer need them.
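Beyond the basic production variant, the module exposes data-capture inputs (documented in the Inputs table below). A minimal sketch of enabling request capture to S3, reusing the model module from the Usage example — the S3 bucket name is a placeholder, not part of this module:

```hcl
module "sagemaker_endpoint_with_capture" {
  source = "intel/aws-sagemaker-endpoint/intel"

  endpoint_production_variants = [{
    model_name             = module.sagemaker_scikit_learn_model.sagemaker-model-name
    instance_type          = "ml.c7i.xlarge"
    initial_instance_count = 1
    variant_name           = "my-variant-1-${random_id.rid.dec}"
  }]

  # Data capture settings: record 100% of request payloads to the given
  # S3 prefix. "my-capture-bucket" is a placeholder bucket name.
  enable_capture              = true
  capture_mode                = "Input"
  initial_sampling_percentage = 100
  destination_s3_uri          = "s3://my-capture-bucket/endpoint-capture"
}
```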

Considerations

  • The SageMaker Endpoint resource created is a provisioned real-time endpoint

AWS References

Using the SageMaker Python SDK https://sagemaker.readthedocs.io/en/stable/overview.html#use-sagemaker-jumpstart-algorithms-with-pretrained-models

Deploy a Pre-Trained Model Directly to a SageMaker Endpoint https://sagemaker.readthedocs.io/en/stable/overview.html#use-built-in-algorithms-with-pre-trained-models-in-sagemaker-python-sdk

Built-in Algorithms with pre-trained Model Table https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.3.0 |
| aws | ~> 5.31 |
| random | ~> 3.4.3 |

Providers

| Name | Version |
|------|---------|
| aws | ~> 5.31 |
| random | ~> 3.4.3 |

Modules

No modules.

Resources

| Name | Type |
|------|------|
| aws_sagemaker_endpoint.endpoint | resource |
| aws_sagemaker_endpoint_configuration.ec | resource |
| random_id.rid | resource |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| accelerator_type | The size of the Elastic Inference (EI) instance to use for the production variant. | string | null | no |
| capture_mode | Specifies the data to be captured. Should be one of Input or Output. | string | "Input" | no |
| create_shadow_variant | A boolean flag that determines whether a shadow production variant will be created. | bool | false | no |
| destination_s3_uri | The URL of the S3 location where captured data is stored. | any | null | no |
| enable_capture | Flag to enable data capture. | bool | false | no |
| enable_intel_tags | If true, adds additional Intel tags to resources. | bool | true | no |
| endpoint_configuration_tags | Tags for the SageMaker Endpoint Configuration resource. | map(string) | null | no |
| endpoint_production_variants | A list of production variant objects, one for each model that you want to host at this endpoint. | list | [] | no |
| endpoint_shadow_variants | A list of production variant objects to host at this endpoint in shadow mode, with production traffic replicated from the model specified on ProductionVariants. If you use this field, you can specify only one variant for ProductionVariants and one variant for ShadowProductionVariants. | list | [] | no |
| endpoint_tags | Tags for the SageMaker Endpoint resource. | map(string) | null | no |
| initial_instance_count | Initial number of instances used for auto-scaling. | number | 1 | no |
| initial_sampling_percentage | Portion of data to capture. Should be between 0 and 100. | number | 100 | no |
| initial_variant_weight | Determines the initial traffic distribution among all of the models that you specify in the endpoint configuration. If unspecified, it defaults to 1.0. | string | null | no |
| instance_type | The type of instance to start. | string | "ml.c7i.large" | no |
| intel_tags | Intel tags. | map(string) | {"intel-module": "terraform-intel-aws-sagemaker-endpoint", "intel-registry": "https://registry.terraform.io/namespaces/intel"} | no |
| json_content_types | The JSON content type headers to capture. | any | null | no |
| kms_key_arn | Amazon Resource Name (ARN) of an AWS Key Management Service key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance that hosts the endpoint. | string | null | no |
| model_name | The name of the model to use. | string | null | no |
| shadow_accelerator_type | The size of the Elastic Inference (EI) instance to use for the shadow variant. | string | null | no |
| shadow_initial_instance_count | Initial number of instances used for auto-scaling of the shadow variant. | number | 1 | no |
| shadow_initial_variant_weight | Determines the initial traffic distribution for the shadow variant. If unspecified, it defaults to 1.0. | string | null | no |
| shadow_instance_type | The type of instance to start for the shadow variant. | string | "ml.c6i.large" | no |
| shadow_model_name | The name of the model to use for the shadow variant. | string | null | no |
| shadow_variant_name | The name of the shadow variant. If omitted, Terraform will assign a random, unique name. | string | null | no |
| variant_name | The name of the variant. If omitted, Terraform will assign a random, unique name. | string | null | no |
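The shadow-variant inputs above can mirror production traffic onto a second instance type, for example to compare a c7i production variant against a c6i shadow. A sketch under the assumption that the model module from the Usage example is reused for both variants (variant names are illustrative):

```hcl
module "sagemaker_endpoint_with_shadow" {
  source = "intel/aws-sagemaker-endpoint/intel"

  # Production variant serving live traffic
  endpoint_production_variants = [{
    model_name             = module.sagemaker_scikit_learn_model.sagemaker-model-name
    instance_type          = "ml.c7i.xlarge"
    initial_instance_count = 1
    variant_name           = "production-variant-${random_id.rid.dec}"
  }]

  # Shadow variant receiving a replica of production traffic; note that only
  # one production variant and one shadow variant may be specified together.
  create_shadow_variant = true
  endpoint_shadow_variants = [{
    model_name             = module.sagemaker_scikit_learn_model.sagemaker-model-name
    instance_type          = "ml.c6i.xlarge"
    initial_instance_count = 1
    variant_name           = "shadow-variant-${random_id.rid.dec}"
  }]
}
```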

Outputs

| Name | Description |
|------|-------------|
| endpoint-arn | The Amazon Resource Name (ARN) assigned by AWS to this endpoint. |
| endpoint-configuration-arn | The Amazon Resource Name (ARN) assigned by AWS to this endpoint configuration. |
| endpoint-configuration-name | The name of the endpoint configuration. |
| endpoint-configuration-tags_all | A map of tags assigned to the endpoint configuration, including those inherited from the provider default_tags configuration block. |
| endpoint-name | The name of the endpoint. |
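These outputs can be surfaced from the calling configuration, for example to hand the endpoint name to client tooling after terraform apply. A sketch, assuming the module instance from the Usage example:

```hcl
output "endpoint_name" {
  description = "Name of the provisioned SageMaker endpoint"
  value       = module.sagemaker_endpoint.endpoint-name
}

output "endpoint_arn" {
  description = "ARN of the provisioned SageMaker endpoint"
  value       = module.sagemaker_endpoint.endpoint-arn
}
```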