# Deploy a Keras or TensorFlow model trained anywhere using Amazon SageMaker


Amazon SageMaker makes it easier for any developer or data scientist to build, train, and deploy machine learning (ML) models. While it is designed to remove the undifferentiated heavy lifting from the full life cycle of ML models, its capabilities can also be used independently of one another: models trained in Amazon SageMaker can be optimized and deployed outside of Amazon SageMaker, including on edge devices (mobile or IoT). Conversely, Amazon SageMaker can deploy and host pre-trained models, such as those from model zoos or models trained locally by your team.

In this notebook, we’ll demonstrate how to deploy a trained Keras (TensorFlow backend) model using Amazon SageMaker, taking advantage of Amazon SageMaker deployment features such as selecting the type and number of instances, model compilation to improve inference latency, and autoscaling.

### Step 1. Set up

In the AWS Management Console, go to the Amazon SageMaker console. Choose Notebook Instances, and create a new notebook instance. Upload the current notebook and set the kernel to ``conda_tensorflow_p36``.

The get_execution_role function retrieves the AWS Identity and Access Management (IAM) role you created at the time of creating your notebook instance.

In [1]:
#!pip install --upgrade "sagemaker>=2"

In [2]:
from sagemaker import get_execution_role
from sagemaker import Session

role = get_execution_role()
sess = Session()
region = sess.boto_region_name
bucket = sess.default_bucket()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


In [3]:
country = 'United_States'

In [4]:
import sagemaker
print(sagemaker.__version__)

2.228.0


In [5]:
!pip install sagemaker==2.242.0 --upgrade

Collecting sagemaker==2.242.0
  Using cached sagemaker-2.242.0-py3-none-any.whl.metadata (16 kB)
Collecting sagemaker-core<2.0.0,>=1.0.17 (from sagemaker==2.242.0)
  Using cached sagemaker_core-1.0.27-py3-none-any.whl.metadata (4.9 kB)
Collecting mock<5.0,>4.0 (from sagemaker-core<2.0.0,>=1.0.17->sagemaker==2.242.0)
  Using cached mock-4.0.3-py3-none-any.whl.metadata (2.8 kB)
Using cached sagemaker-2.242.0-py3-none-any.whl (1.6 MB)
Using cached sagemaker_core-1.0.27-py3-none-any.whl (407 kB)
Using cached mock-4.0.3-py3-none-any.whl (28 kB)
Installing collected packages: mock, sagemaker-core, sagemaker
  Attempting uninstall: sagemaker
    Found existing installation: sagemaker 2.228.0
    Uninstalling sagemaker-2.228.0:
      Successfully uninstalled sagemaker-2.228.0
Successfully installed mock-4.0.3 sagemaker-2.242.0 sagemaker-core-1.0.27


In [6]:
pip install tensorflow==2.18.0

Collecting tensorflow==2.18.0
  Using cached tensorflow-2.18.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)
Collecting libclang>=13.0.0 (from tensorflow==2.18.0)
  Using cached libclang-18.1.1-py2.py3-none-manylinux2010_x86_64.whl.metadata (5.2 kB)
Collecting tensorboard<2.19,>=2.18 (from tensorflow==2.18.0)
  Using cached tensorboard-2.18.0-py3-none-any.whl.metadata (1.6 kB)
Collecting tensorflow-io-gcs-filesystem>=0.23.1 (from tensorflow==2.18.0)
  Using cached tensorflow_io_gcs_filesystem-0.37.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (14 kB)
Using cached tensorflow-2.18.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (615.4 MB)
Using cached libclang-18.1.1-py2.py3-none-manylinux2010_x86_64.whl (24.5 MB)
Using cached tensorboard-2.18.0-py3-none-any.whl (5.5 MB)
Using cached tensorflow_io_gcs_filesystem-0.37.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.1 MB)
Installing collected packages: libc

If you are running this locally, check your version of TensorFlow to prevent downstream framework errors.

In [7]:
import tensorflow as tf
print(tf.__version__)  # This notebook was run with TensorFlow 2.18.0

2025-03-30 21:21:53.544151: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-03-30 21:21:53.547713: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-03-30 21:21:53.558926: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1743369713.579243     210 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743369713.585171     210 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-30 21:21:53.605218: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU ins

2.18.0


In [8]:
tf_framework_version = tf.__version__

Import the necessary Python packages and install a version of h5py that is compatible with your Keras model.

In [9]:
# ref: https://github.com/keras-team/keras/issues/14265
!pip install "h5py==2.10.0"
import h5py
import numpy as np

Collecting h5py==2.10.0
  Using cached h5py-2.10.0.tar.gz (301 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: h5py
  Building wheel for h5py (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [85 lines of output]
      /opt/conda/lib/python3.11/site-packages/setuptools/__init__.py:94: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
      !!
      
              ********************************************************************************
              Requirements should be satisfied by a PEP 517 installer.
              If you are using pip, you can try `pip install --use-pep517`.
              ********************************************************************************
 

### Step 2. Load the Keras model using the json and weights file

If you saved your model in the TensorFlow ProtoBuf format, skip to "Step 4. Convert TensorFlow model to a SageMaker readable format".

Create a directory called ``keras_model``, download the [hosted Keras model](https://s3.amazonaws.com/aws-ml-blog/artifacts/keras-tensorflow-model-deployment/model.zip), and unzip the model.json and model-weights.h5 files into ``keras_model``.

In [10]:
#!mkdir keras_model

In [11]:
#!wget https://s3.amazonaws.com/aws-ml-blog/artifacts/keras-tensorflow-model-deployment/model.zip

In [12]:
#!unzip model.zip -d keras_model

In [13]:
#import os
#import tensorflow as tf
#import tensorflow.keras as keras
#from keras.models import model_from_json

#with open(os.path.join('keras_model', 'model.json'), 'r') as fp:
#    loaded_model_json = fp.read()
#loaded_model = model_from_json(loaded_model_json)

In [14]:
#loaded_model.load_weights('keras_model/model-weights.h5')

### Step 3. Export the Keras model to the TensorFlow ProtoBuf format

In [15]:
#from tensorflow.python.saved_model import builder
#from tensorflow.python.saved_model.signature_def_utils import predict_signature_def
#from tensorflow.python.saved_model import tag_constants

In [16]:
# Note: This directory structure will need to be followed - see notes for the next section
#model_version = '1'
#export_dir = 'export/Servo/' + model_version

In [17]:
# Build the Protocol Buffer SavedModel at 'export_dir'
#builder = builder.SavedModelBuilder(export_dir)

In [18]:
# Create prediction signature to be used by TensorFlow Serving Predict API
#signature = predict_signature_def(
#    inputs={"inputs": loaded_model.input}, outputs={"score": loaded_model.output})

In [19]:
#session = tf.compat.v1.Session()
#init_op = tf.compat.v1.global_variables_initializer()
#session.run(init_op)
# Save the meta graph and variables
#builder.add_meta_graph_and_variables(
#    sess=session, tags=[tag_constants.SERVING], signature_def_map={"serving_default": signature})
#builder.save()

### Step 3-1. Repack the Colab model tarballs to match the multi-model upload format

In [20]:
import os
import tarfile
import shutil

SOURCE_DIR = "models_for_repack"
DEST_DIR = "models_for_deploy"
TMP_EXTRACT_DIR = "tmp_extract"
TMP_REPACK_DIR = "tmp_repack"

def repack_all_models_for_mme(source_dir=SOURCE_DIR, dest_dir=DEST_DIR):
    os.makedirs(dest_dir, exist_ok=True)
    os.makedirs(TMP_EXTRACT_DIR, exist_ok=True)
    os.makedirs(TMP_REPACK_DIR, exist_ok=True)

    model_files = [f for f in os.listdir(source_dir) if f.startswith("saved_model_") and f.endswith(".tar.gz")]

    for model_file in model_files:
        # Extract country name: saved_model_United_States.tar.gz → United_States
        country = model_file.replace("saved_model_", "").replace(".tar.gz", "")
        input_tar_path = os.path.join(source_dir, model_file)
        output_tar_path = os.path.join(dest_dir, f"{country}.model.tar.gz")

        print(f"📦 Repacking: {model_file} → {country}.model.tar.gz")

        # Clean temp dirs
        shutil.rmtree(TMP_EXTRACT_DIR, ignore_errors=True)
        shutil.rmtree(TMP_REPACK_DIR, ignore_errors=True)
        os.makedirs(TMP_EXTRACT_DIR, exist_ok=True)
        os.makedirs(TMP_REPACK_DIR, exist_ok=True)

        # Step 1: Extract original tar
        with tarfile.open(input_tar_path, "r:gz") as tar:
            tar.extractall(TMP_EXTRACT_DIR)

        # Step 2: Find the only subfolder
        subfolders = [f for f in os.listdir(TMP_EXTRACT_DIR) if os.path.isdir(os.path.join(TMP_EXTRACT_DIR, f))]
        if len(subfolders) != 1:
            raise RuntimeError(f"❌ Expected 1 subfolder inside {model_file}, found: {subfolders}")

        original_model_path = os.path.join(TMP_EXTRACT_DIR, subfolders[0])
        repack_target_path = os.path.join(TMP_REPACK_DIR, "1")

        # Step 3: Copy contents into a "1/" folder
        shutil.copytree(original_model_path, repack_target_path)

        # Step 4: Create new MME-ready tar.gz with root "1/"
        with tarfile.open(output_tar_path, "w:gz") as tar:
            tar.add(repack_target_path, arcname="1")

        print(f"✅ Saved: {output_tar_path}")

    # Cleanup temp dirs
    shutil.rmtree(TMP_EXTRACT_DIR, ignore_errors=True)
    shutil.rmtree(TMP_REPACK_DIR, ignore_errors=True)

    print("🎉 All models repacked and saved to:", dest_dir)


In [21]:
repack_all_models_for_mme()

📦 Repacking: saved_model_ThailandV2.tar.gz → ThailandV2.model.tar.gz
✅ Saved: models_for_deploy/ThailandV2.model.tar.gz
📦 Repacking: saved_model_United_StatesV2.tar.gz → United_StatesV2.model.tar.gz
✅ Saved: models_for_deploy/United_StatesV2.model.tar.gz
📦 Repacking: saved_model_United_StatesV3.tar.gz → United_StatesV3.model.tar.gz
✅ Saved: models_for_deploy/United_StatesV3.model.tar.gz
📦 Repacking: saved_model_United_StatesV4.tar.gz → United_StatesV4.model.tar.gz
✅ Saved: models_for_deploy/United_StatesV4.model.tar.gz
📦 Repacking: saved_model_Argentina.tar.gz → Argentina.model.tar.gz
✅ Saved: models_for_deploy/Argentina.model.tar.gz
📦 Repacking: saved_model_Brunei_Darussalam.tar.gz → Brunei_Darussalam.model.tar.gz
✅ Saved: models_for_deploy/Brunei_Darussalam.model.tar.gz
📦 Repacking: saved_model_Brazil.tar.gz → Brazil.model.tar.gz
✅ Saved: models_for_deploy/Brazil.model.tar.gz
📦 Repacking: saved_model_Australia.tar.gz → Australia.model.tar.gz
✅ Saved: models_for_deploy/Australia.model
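Before uploading, it can help to confirm that a repacked archive really has the layout produced above: a single version directory at the root that holds `saved_model.pb` and a `variables/` folder. The following is a minimal sketch, not part of the original workflow. `check_mme_archive` is a hypothetical helper, and the version directory name `1` is assumed from the repack function.

```python
import posixpath
import tarfile


def check_mme_archive(tar_path, version_dir="1"):
    """Return a list of layout findings for a model archive (empty if none).

    Hypothetical helper: the expected layout (version_dir/saved_model.pb and
    version_dir/variables/...) is assumed from the repack step in this notebook.
    """
    findings = []
    with tarfile.open(tar_path, "r:gz") as tar:
        names = []
        for member in tar.getmembers():
            # Normalize "./1/x" and "/1/x" to "1/x"; skip the bare root entry.
            name = posixpath.normpath(member.name).lstrip("/")
            if name and name != ".":
                names.append(name)

    # Anything outside the version directory is reported for review.
    top_level = {name.split("/", 1)[0] for name in names}
    extra = sorted(top_level - {version_dir})
    if extra:
        findings.append(f"unexpected top-level entries: {extra}")

    if f"{version_dir}/saved_model.pb" not in names:
        findings.append(f"missing {version_dir}/saved_model.pb")

    if not any(n.startswith(f"{version_dir}/variables/") for n in names):
        findings.append(f"missing files under {version_dir}/variables/")

    return findings
```

Calling `check_mme_archive("models_for_deploy/Argentina.model.tar.gz")` would return an empty list for an archive with the expected layout. Stray entries such as a top-level `.ipynb_checkpoints/` folder would be listed so you can decide whether to repack.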

In [23]:
import tarfile
import os
import shutil

def extract_and_inspect_model(tar_path, extract_dir="tmp_inspect"):
    # Clean up old temp dir
    shutil.rmtree(extract_dir, ignore_errors=True)
    os.makedirs(extract_dir, exist_ok=True)

    # Extract tar.gz
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(extract_dir)

    # Find model directory
    for root, dirs, files in os.walk(extract_dir):
        if "saved_model.pb" in files:
            model_path = root
            break
    else:
        raise FileNotFoundError("No saved_model.pb found in extracted archive.")

    print(f"✅ Found SavedModel at: {model_path}")
    
    # Run saved_model_cli
    os.system(f"saved_model_cli show --all --dir {model_path}")

In [24]:
extract_and_inspect_model("models_for_deploy/United_StatesV4.model.tar.gz")

✅ Found SavedModel at: tmp_inspect/1


2025-03-30 21:22:54.649452: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-03-30 21:22:54.652780: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-03-30 21:22:54.663143: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1743369774.681229     865 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743369774.686486     865 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-30 21:22:54.704553: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU ins


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serve']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['numeric_input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 20)
        name: serve_numeric_input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output_0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['numeric_input'] tensor_info:
 

In [25]:
extract_and_inspect_model("models_for_deploy/Chile.model.tar.gz")

✅ Found SavedModel at: tmp_inspect/1


2025-03-30 21:23:00.616650: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-03-30 21:23:00.619886: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-03-30 21:23:00.630324: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1743369780.648258     892 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743369780.653553     892 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-30 21:23:00.671470: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU ins


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serve']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['numeric_input'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 20)
        name: serve_numeric_input:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output_0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['numeric_input'] tensor_info:
 

In [26]:
extract_and_inspect_model("models_for_deploy/ThailandV2.model.tar.gz")

✅ Found SavedModel at: tmp_inspect/1


2025-03-30 21:23:06.159780: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-03-30 21:23:06.163595: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-03-30 21:23:06.175357: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1743369786.193501     919 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743369786.198752     919 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-30 21:23:06.216573: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU ins


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serve']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['keras_tensor'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 20)
        name: serve_keras_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output_0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['keras_tensor'] tensor_info:
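The inspections above report the same input shape `(-1, 20)` but different input names: `numeric_input` for United_StatesV4 and Chile, `keras_tensor` for ThailandV2. If a request ever addresses inputs by name, the key has to match the model being called. As a rough sketch, assuming the text layout printed by `saved_model_cli show --all` above, the input names can be pulled out of the captured output. `signature_input_names` is a hypothetical helper, not part of TensorFlow.

```python
import re


def signature_input_names(cli_output, signature="serving_default"):
    """Return the input names listed under one signature_def block.

    Hypothetical helper: parses the plain-text report format shown in this
    notebook; it is not an official TensorFlow API.
    """
    names = []
    in_block = False
    for line in cli_output.splitlines():
        header = re.match(r"\s*signature_def\['(.+)'\]:", line)
        if header:
            # Track whether we are inside the requested signature's block.
            in_block = header.group(1) == signature
            continue
        if in_block:
            match = re.match(r"\s*inputs\['(.+)'\] tensor_info:", line)
            if match:
                names.append(match.group(1))
    return names
```

To feed it real output, the report can be captured with `subprocess.run(["saved_model_cli", "show", "--all", "--dir", model_path], capture_output=True, text=True).stdout` instead of printing it through `os.system`.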
    

### Step 4. Convert TensorFlow model to a SageMaker readable format

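Move the exported TensorFlow model into the directory ``export/Servo/``, which SageMaker recognizes as a loadable TensorFlow model. The archives repacked in Step 3-1 use a flatter layout for the multi-model endpoint: each tarball has a version directory ``1/`` at its root containing ``saved_model.pb`` and the ``variables/`` directory.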

#### Upload All Models to S3

Spain
United_Arab_Emirates
Canada
Kazakhstan
Indonesia

In [27]:
import os
from sagemaker import Session

def upload_all_models_to_s3(source_dir="models_for_deploy", s3_prefix="model"):
    sess = Session()
    bucket = sess.default_bucket()
    s3_paths = {}

    for filename in os.listdir(source_dir):
        if filename.endswith(".model.tar.gz"):
            local_path = os.path.join(source_dir, filename)
            print(f"☁️ Uploading {filename} → s3://{bucket}/{s3_prefix}/{filename}")
            
            s3_uri = sess.upload_data(
                path=local_path,
                key_prefix=s3_prefix
            )
            s3_paths[filename] = s3_uri
            print(f"✅ Uploaded to: {s3_uri}")

    return s3_paths
    
uploaded_model_uris = upload_all_models_to_s3()

☁️ Uploading United_States.model.tar.gz → s3://sagemaker-us-west-2-986030204467/model/United_States.model.tar.gz
✅ Uploaded to: s3://sagemaker-us-west-2-986030204467/model/United_States.model.tar.gz
☁️ Uploading Thailand.model.tar.gz → s3://sagemaker-us-west-2-986030204467/model/Thailand.model.tar.gz
✅ Uploaded to: s3://sagemaker-us-west-2-986030204467/model/Thailand.model.tar.gz
☁️ Uploading ThailandV2.model.tar.gz → s3://sagemaker-us-west-2-986030204467/model/ThailandV2.model.tar.gz
✅ Uploaded to: s3://sagemaker-us-west-2-986030204467/model/ThailandV2.model.tar.gz
☁️ Uploading United_StatesV2.model.tar.gz → s3://sagemaker-us-west-2-986030204467/model/United_StatesV2.model.tar.gz
✅ Uploaded to: s3://sagemaker-us-west-2-986030204467/model/United_StatesV2.model.tar.gz
☁️ Uploading United_StatesV3.model.tar.gz → s3://sagemaker-us-west-2-986030204467/model/United_StatesV3.model.tar.gz
✅ Uploaded to: s3://sagemaker-us-west-2-986030204467/model/United_StatesV3.model.tar.gz
☁️ Uploading Unit

In [28]:
uploaded_model_uris

{'United_States.model.tar.gz': 's3://sagemaker-us-west-2-986030204467/model/United_States.model.tar.gz',
 'Thailand.model.tar.gz': 's3://sagemaker-us-west-2-986030204467/model/Thailand.model.tar.gz',
 'ThailandV2.model.tar.gz': 's3://sagemaker-us-west-2-986030204467/model/ThailandV2.model.tar.gz',
 'United_StatesV2.model.tar.gz': 's3://sagemaker-us-west-2-986030204467/model/United_StatesV2.model.tar.gz',
 'United_StatesV3.model.tar.gz': 's3://sagemaker-us-west-2-986030204467/model/United_StatesV3.model.tar.gz',
 'United_StatesV4.model.tar.gz': 's3://sagemaker-us-west-2-986030204467/model/United_StatesV4.model.tar.gz',
 'Argentina.model.tar.gz': 's3://sagemaker-us-west-2-986030204467/model/Argentina.model.tar.gz',
 'Brunei_Darussalam.model.tar.gz': 's3://sagemaker-us-west-2-986030204467/model/Brunei_Darussalam.model.tar.gz',
 'Brazil.model.tar.gz': 's3://sagemaker-us-west-2-986030204467/model/Brazil.model.tar.gz',
 'Australia.model.tar.gz': 's3://sagemaker-us-west-2-986030204467/model/A

In [29]:
uploaded_model_uris['United_StatesV4.model.tar.gz']

's3://sagemaker-us-west-2-986030204467/model/United_StatesV4.model.tar.gz'

In [30]:
%%time
#sm_model = Model(model_data=model_data, framework_version=tf_framework_version,role=role)
#uncompiled_predictor = sm_model.deploy(initial_instance_count=1, instance_type=instance_type)

from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(model_data=uploaded_model_uris['United_StatesV4.model.tar.gz'], role=role, framework_version='2.18.0',
                        sagemaker_session=sess)
#uncompiled_predictor = model.deploy(initial_instance_count=1, 
#                                    instance_type=instance_type,image_uri=image_uri)

CPU times: user 0 ns, sys: 4.38 ms, total: 4.38 ms
Wall time: 3.7 ms


In [31]:
from sagemaker.multidatamodel import MultiDataModel
from time import gmtime, strftime
DATA_PREFIX = 'pisa2022'
MULTI_MODEL_ARTIFACTS='multi_model_artifacts'
ENDPOINT_INSTANCE_TYPE='ml.t2.medium'
ENDPOINT_NAME = f'mme-pisa-2022-{strftime("%Y-%m-%d-%H-%M-%S", gmtime())}'
MODEL_NAME = ENDPOINT_NAME
# This is where our MME will read models from on S3.
model_data_prefix = f"s3://{bucket}/{DATA_PREFIX}/{MULTI_MODEL_ARTIFACTS}/"
model_staging_prefix = f"s3://{bucket}/model/"

In [32]:
mme = MultiDataModel(
    name=MODEL_NAME,
    model_data_prefix=model_data_prefix,
    model=model,  # passing our model - passes container image needed for the endpoint
    sagemaker_session=sess
)

In [33]:
def register_models_in_mme(mme, model_s3_dict):
    """
    Given an MME object and a dictionary {filename: s3_uri}, register all models.
    """
    for filename, s3_uri in model_s3_dict.items():
        print(f"📦 Registering {filename} in MME...")
        mme.add_model(
            model_data_source=s3_uri,
            model_data_path=filename  # This is the TargetModel name
        )
        print(f"✅ Registered: {filename}")

register_models_in_mme(mme, uploaded_model_uris)

📦 Registering United_States.model.tar.gz in MME...
✅ Registered: United_States.model.tar.gz
📦 Registering Thailand.model.tar.gz in MME...
✅ Registered: Thailand.model.tar.gz
📦 Registering ThailandV2.model.tar.gz in MME...
✅ Registered: ThailandV2.model.tar.gz
📦 Registering United_StatesV2.model.tar.gz in MME...
✅ Registered: United_StatesV2.model.tar.gz
📦 Registering United_StatesV3.model.tar.gz in MME...
✅ Registered: United_StatesV3.model.tar.gz
📦 Registering United_StatesV4.model.tar.gz in MME...
✅ Registered: United_StatesV4.model.tar.gz
📦 Registering Argentina.model.tar.gz in MME...
✅ Registered: Argentina.model.tar.gz
📦 Registering Brunei_Darussalam.model.tar.gz in MME...
✅ Registered: Brunei_Darussalam.model.tar.gz
📦 Registering Brazil.model.tar.gz in MME...
✅ Registered: Brazil.model.tar.gz
📦 Registering Australia.model.tar.gz in MME...
✅ Registered: Australia.model.tar.gz
📦 Registering Cambodia.model.tar.gz in MME...
✅ Registered: Cambodia.model.tar.gz
📦 Registering Austria.mo
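In the registration step, `model_data_path` is the location of the artifact relative to `model_data_prefix`, and that same relative name is what gets passed as `TargetModel` at invocation time. The sketch below illustrates the relationship. `target_model_from_uri` is a hypothetical helper and `my-bucket` is a placeholder bucket name.

```python
def target_model_from_uri(model_data_prefix, artifact_uri):
    """Return the TargetModel name for an artifact stored under the MME prefix.

    Hypothetical helper: it only strips the prefix from the full S3 URI.
    """
    prefix = model_data_prefix if model_data_prefix.endswith("/") else model_data_prefix + "/"
    if not artifact_uri.startswith(prefix):
        raise ValueError(f"{artifact_uri} is not under {prefix}")
    return artifact_uri[len(prefix):]


# Placeholder values for illustration only.
example_prefix = "s3://my-bucket/pisa2022/multi_model_artifacts/"
example_uri = example_prefix + "United_States.model.tar.gz"
print(target_model_from_uri(example_prefix, example_uri))
```

Keeping the relative name flat, as this notebook does, means the file name alone identifies the model when invoking the endpoint.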

In [34]:
# Inference container image for TensorFlow 2.18.0 (CPU) in us-west-2
image_uri = '763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:2.18.0-cpu-py310-ubuntu20.04-ec2'

In [54]:
predictor = mme.deploy(
    initial_instance_count=1, instance_type=ENDPOINT_INSTANCE_TYPE, endpoint_name=ENDPOINT_NAME, image_uri=image_uri
)

-----!

In [35]:
list(mme.list_models())

['',
 'Albania.model.tar.gz',
 'Argentina.model.tar.gz',
 'Australia.model.tar.gz',
 'Austria.model.tar.gz',
 'Belgium.model.tar.gz',
 'Brazil.model.tar.gz',
 'Brunei_Darussalam.model.tar.gz',
 'Bulgaria.model.tar.gz',
 'Cambodia.model.tar.gz',
 'Canada, Mexico, United_States.model.tar.gz',
 'Canada.model.tar.gz',
 'Chile.model.tar.gz',
 'Colombia.model.tar.gz',
 'Costa_Rica, El_Salvador, Guatemala, Panama.model.tar.gz',
 'Costa_Rica.model.tar.gz',
 'Croatia.model.tar.gz',
 'Czech_Republic.model.tar.gz',
 'Denmark.model.tar.gz',
 'Dominican_Republic.model.tar.gz',
 'El_Salvador.model.tar.gz',
 'Indonesia.model.tar.gz',
 'Japan, Korea, Taiwan.model.tar.gz',
 'Kazakhstan.model.tar.gz',
 'Latvia.model.tar.gz',
 'Lithuania.model.tar.gz',
 'Malaysia.model.tar.gz',
 'Malta.model.tar.gz',
 'Mexico.model.tar.gz',
 'Mongolia.model.tar.gz',
 'Montenegro.model.tar.gz',
 'Morocco.model.tar.gz',
 'Netherlands.model.tar.gz',
 'New_Zealand.model.tar.gz',
 'North_Macedonia.model.tar.gz',
 'Norway.mode

In [35]:
from sagemaker.s3 import s3_path_join

# Archive to register; matches the file checked with `aws s3 ls` below
model_archive = 'United_States.model.tar.gz'
full_path = s3_path_join(model_staging_prefix, model_archive)
print("Full model path:", full_path)

In [198]:
!aws s3 ls s3://sagemaker-us-west-2-986030204467/model/United_States.model.tar.gz

2025-03-23 18:05:26     156293 United_States.model.tar.gz


In [199]:
import logging
import boto3

# Only show botocore log messages at ERROR level or above
boto3.set_stream_logger('botocore', level=logging.ERROR)

In [200]:
mme.add_model(model_data_source=model_staging_prefix+model_archive, model_data_path=model_archive)

's3://sagemaker-us-west-2-986030204467/pisa2022/multi_model_artifacts/United_States.model.tar.gz'

In [201]:
list(mme.list_models())

['', 'United_States.model.tar.gz']

### Step 5. Deploy the trained model

In [202]:
#from sagemaker.tensorflow.serving import Model
#instance_type = 'ml.c5.xlarge'
#image_uri = '763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:2.18.0-cpu-py310-ubuntu20.04-ec2'

In [203]:
#pip install -U sagemaker

In [204]:
#%%time
#sm_model = Model(model_data=model_data, framework_version=tf_framework_version,role=role)
#uncompiled_predictor = sm_model.deploy(initial_instance_count=1, instance_type=instance_type)

#from sagemaker.tensorflow import TensorFlowModel

#model = TensorFlowModel(model_data=model_data, role=role, framework_version='2.18.0')
#uncompiled_predictor = model.deploy(initial_instance_count=1, 
#                                    instance_type=instance_type,image_uri=image_uri)

In [205]:
!tar -tzf United_States.model.tar.gz

tar: Removing leading `/' from member names
/
.ipynb_checkpoints/
1/
1/.ipynb_checkpoints/
1/assets/
1/fingerprint.pb
1/saved_model.pb
1/variables/
1/variables/variables.data-00000-of-00001
1/variables/variables.index


### Step 6. Invoke the endpoint

#### Invoke the SageMaker endpoint from the notebook

In [206]:
# The models expect input of shape [batch, 20] (see the signature output in Step 3-1)
data = np.random.randn(1, 20)
data.shape

(1, 20)

In [207]:
predictor.predict(data,  initial_args={'TargetModel': 'United_States.model.tar.gz'})

{'error': 'JSON Value: [[-0.9468745101282406, 0.210982268937443, 0.829047295759814, -0.08553047460908296, 0.35320924351931157, -0.2959219693541786, -1.1321148898031524, -0.4347980390589598, -0.9672111781742968, -0.4698378242986649, 1.7123206100452424, -0.8276450351371874, -0.17478181252612383 Is not object'}
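The error above ("Is not object") suggests the serving container received a bare JSON array where it expected a JSON object. TensorFlow Serving's REST predict API takes an object with an `instances` key (row format) or an `inputs` key (columnar format). A minimal sketch of building such a body follows. `build_tfs_payload` is a hypothetical helper, and whether this wrapping alone resolves the error on this endpoint is an assumption that has not been verified here.

```python
import json


def build_tfs_payload(rows):
    """Wrap a batch of feature rows in TensorFlow Serving's row format.

    Hypothetical helper: converts values to plain floats so the result is
    JSON-serializable, then nests the rows under the "instances" key.
    """
    return {"instances": [[float(value) for value in row] for row in rows]}


# One row of 20 features, matching the (-1, 20) input shape reported earlier.
payload = build_tfs_payload([[0.0] * 20])
body = json.dumps(payload)
print(body[:40])
```

With the notebook's variables, the call would look like `predictor.predict(build_tfs_payload(data.tolist()), initial_args={'TargetModel': 'United_States.model.tar.gz'})`.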

In [None]:
import boto3

# Initialize the SageMaker runtime client
sagemaker_runtime = boto3.client("sagemaker-runtime")

# Define CSV input (must match your model input shape)
csv_input = "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0"  # Example 20-feature row

# Invoke the endpoint
response = sagemaker_runtime.invoke_endpoint(
    EndpointName="tensorflow-inference-2025-03-19-03-19-31-180",
    ContentType="text/csv",  # Important: CSV format
    Body=csv_input
)

# Parse the response
print(response["Body"].read().decode("utf-8"))
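Because the signatures report an input shape of `(-1, 20)`, each CSV row needs exactly 20 values. The sketch below builds the row from a list and fails early on a length mismatch. `to_csv_row` is a hypothetical helper, and the default of 20 features is taken from the signature output earlier in this notebook.

```python
def to_csv_row(features, expected_len=20):
    """Format one feature vector as a CSV line, checking its length first.

    Hypothetical helper: expected_len defaults to the 20 features reported
    by the model signatures in this notebook.
    """
    if len(features) != expected_len:
        raise ValueError(f"expected {expected_len} features, got {len(features)}")
    return ",".join(repr(float(value)) for value in features)


csv_body = to_csv_row([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] * 2)
print(csv_body)
```

When the target is a multi-model endpoint, `invoke_endpoint` also takes a `TargetModel` argument naming the archive to use. Whether the container accepts `text/csv` for these models is not confirmed by the outputs shown here.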

#### Compile model using SageMaker Neo

[SageMaker Neo](https://aws.amazon.com/sagemaker/neo/) makes it easy to compile pre-trained TensorFlow models and build an inference-optimized container without the need for any custom model serving or inference code.

In [None]:
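instance_family = 'ml_c5'
# Hosting instance type for the compiled model (same value as the commented-out cell in Step 5)
instance_type = 'ml.c5.xlarge'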
framework = 'tensorflow'
compilation_job_name = 'keras-compile'
# output path for compiled model artifact
compiled_model_path = 's3://{}/{}/output'.format(bucket, compilation_job_name)
data_shape = {'inputs':[1, data.shape[0], data.shape[1]]}

In [None]:
# `model` is the TensorFlowModel created in Step 4 (the original `sm_model` is never defined)
optimized_estimator = model.compile(target_instance_family=instance_family,
                                    input_shape=data_shape,
                                    job_name=compilation_job_name,
                                    role=role,
                                    framework=framework,
                                    framework_version=tf_framework_version,
                                    output_path=compiled_model_path
                                   )

In [None]:
optimized_predictor = optimized_estimator.deploy(initial_instance_count = 1, instance_type = instance_type)

#### Invoke optimized SageMaker endpoint

In [None]:
optimized_predictor.predict(data)

### Step 7. Clean up

To avoid incurring charges to your AWS account for the resources used in this tutorial, you need to delete the SageMaker Endpoint.

In [None]:
# `predictor` is the multi-model endpoint deployed in Step 4
predictor.delete_endpoint()

In [None]:
optimized_predictor.delete_endpoint()
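As an alternative to the predictor calls above, the same clean-up can be done through the low-level SageMaker client, where deleting an endpoint leaves its endpoint configuration and model resources in place. The sketch below removes all three. `delete_endpoint_resources` is a hypothetical helper. It expects a client such as `boto3.client("sagemaker")` and assumes the endpoint still exists when it runs.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint plus the endpoint config and models it references.

    Hypothetical helper: sm_client is expected to behave like
    boto3.client("sagemaker").
    """
    # Look up the config and model names before anything is deleted.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [
        variant["ModelName"]
        for variant in config["ProductionVariants"]
        if variant.get("ModelName")
    ]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)

    return {"endpoint": endpoint_name, "config": config_name, "models": model_names}
```

For this notebook the call would be `delete_endpoint_resources(boto3.client("sagemaker"), ENDPOINT_NAME)`. The model archives uploaded to S3 are not touched and would need to be removed separately.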

### Conclusion

In this notebook, we demonstrated converting a Keras model to TensorFlow SavedModel format, deploying a trained model to a SageMaker Endpoint, and compiling the same trained model using SageMaker Neo to get better performance. Using Amazon SageMaker, you can take a trained model and in a few lines of code have a scalable, managed inference deployment. This gives you the flexibility to use your existing model training workflows, while easily deploying trained models to production with all the benefits and optimizations offered by a managed platform.