# Deploy a Keras or TensorFlow model trained anywhere using Amazon SageMaker


Amazon SageMaker makes it easier for any developer or data scientist to build, train, and deploy machine learning (ML) models. While it’s designed to remove the undifferentiated heavy lifting from the full life cycle of ML models, Amazon SageMaker’s capabilities can also be used independently of one another; that is, models trained in Amazon SageMaker can be optimized and deployed outside of Amazon SageMaker, including on edge devices (mobile or IoT). Conversely, Amazon SageMaker can deploy and host pre-trained models, such as models from model zoos or models trained locally by your team.

In this notebook, we’ll demonstrate how to deploy a trained Keras (TensorFlow backend) model using Amazon SageMaker, taking advantage of Amazon SageMaker deployment features, such as selecting the type and number of instances, model compilation to improve inference latency, and autoscaling.

### Step 1. Set up

In the AWS Management Console, go to the Amazon SageMaker console. Choose Notebook Instances, and create a new notebook instance. Upload the current notebook and set the kernel to ``conda_tensorflow_p36``.

The ``get_execution_role`` function retrieves the AWS Identity and Access Management (IAM) role you created when you created your notebook instance.

In [1]:
#!pip install --upgrade "sagemaker>=2"

In [2]:
from sagemaker import get_execution_role
from sagemaker import Session

role = get_execution_role()
sess = Session()
region = sess.boto_region_name
bucket = sess.default_bucket()

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


In [3]:
import sagemaker
print(sagemaker.__version__)

2.228.0


In [4]:
!pip install sagemaker==2.242.0 --upgrade

Collecting sagemaker==2.242.0
  Using cached sagemaker-2.242.0-py3-none-any.whl.metadata (16 kB)
Collecting sagemaker-core<2.0.0,>=1.0.17 (from sagemaker==2.242.0)
  Using cached sagemaker_core-1.0.25-py3-none-any.whl.metadata (4.9 kB)
Collecting mock<5.0,>4.0 (from sagemaker-core<2.0.0,>=1.0.17->sagemaker==2.242.0)
  Using cached mock-4.0.3-py3-none-any.whl.metadata (2.8 kB)
Using cached sagemaker-2.242.0-py3-none-any.whl (1.6 MB)
Using cached sagemaker_core-1.0.25-py3-none-any.whl (406 kB)
Using cached mock-4.0.3-py3-none-any.whl (28 kB)
Installing collected packages: mock, sagemaker-core, sagemaker
  Attempting uninstall: sagemaker
    Found existing installation: sagemaker 2.228.0
    Uninstalling sagemaker-2.228.0:
      Successfully uninstalled sagemaker-2.228.0
Successfully installed mock-4.0.3 sagemaker-2.242.0 sagemaker-core-1.0.25


In [5]:
pip install tensorflow==2.18.0

Note: you may need to restart the kernel to use updated packages.


If you are running this locally, check your version of TensorFlow to prevent downstream framework errors.

In [6]:
import tensorflow as tf
print(tf.__version__)  # Should print 2.18.0, the version installed in the previous cell

2025-03-21 16:27:47.685896: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-03-21 16:27:47.689492: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-03-21 16:27:47.700400: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1742574467.719903    1098 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1742574467.725709    1098 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-21 16:27:47.745448: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU ins

2.18.0


In [7]:
tf_framework_version = tf.__version__
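
The version check above can be turned into an explicit comparison. The following is a minimal sketch, not part of the original notebook: the helper names `parse_version` and `check_framework_version` are hypothetical, and it assumes you only care that the major and minor versions match the serving container.

```python
def parse_version(version: str) -> tuple:
    """Turn '2.18.0' into (2, 18, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def check_framework_version(installed: str, expected_major_minor: str) -> bool:
    """Return True when the installed major.minor matches the expected one."""
    return parse_version(installed)[:2] == parse_version(expected_major_minor)[:2]
```

In the notebook, you could call `check_framework_version(tf.__version__, "2.18")` and stop early if it returns `False`.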

Import necessary Python packages and install the version of h5py for compatibility with your Keras model.

In [8]:
# ref: https://github.com/keras-team/keras/issues/14265
!pip install "h5py==2.10.0"
import h5py
import numpy as np

Collecting h5py==2.10.0
  Using cached h5py-2.10.0.tar.gz (301 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: h5py
  Building wheel for h5py (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [85 lines of output]
      /opt/conda/lib/python3.11/site-packages/setuptools/__init__.py:94: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
      !!
      
              ********************************************************************************
              Requirements should be satisfied by a PEP 517 installer.
              If you are using pip, you can try `pip install --use-pep517`.
              ********************************************************************************
 

### Step 2. Load the Keras model using the json and weights file

If you saved your model in the TensorFlow ProtoBuf format, skip to "Step 4. Convert TensorFlow model to a SageMaker readable format".

Create a directory called ``keras_model``, download the [hosted Keras model](https://s3.amazonaws.com/aws-ml-blog/artifacts/keras-tensorflow-model-deployment/model.zip), and unzip the ``model.json`` and ``model-weights.h5`` files into ``keras_model``.

In [9]:
#!mkdir keras_model

In [10]:
#!wget https://s3.amazonaws.com/aws-ml-blog/artifacts/keras-tensorflow-model-deployment/model.zip

In [11]:
#!unzip model.zip -d keras_model

In [12]:
#import os
#import tensorflow as tf
#import tensorflow.keras as keras
#from keras.models import model_from_json

#with open(os.path.join('keras_model', 'model.json'), 'r') as fp:
#    loaded_model_json = fp.read()
#loaded_model = model_from_json(loaded_model_json)

In [13]:
#loaded_model.load_weights('keras_model/model-weights.h5')

### Step 3. Export the Keras model to the TensorFlow ProtoBuf format

In [14]:
#from tensorflow.python.saved_model import builder
#from tensorflow.python.saved_model.signature_def_utils import predict_signature_def
#from tensorflow.python.saved_model import tag_constants

In [15]:
# Note: This directory structure will need to be followed - see notes for the next section
#model_version = '1'
#export_dir = 'export/Servo/' + model_version

In [16]:
# Build the Protocol Buffer SavedModel at 'export_dir'
#builder = builder.SavedModelBuilder(export_dir)

In [17]:
# Create prediction signature to be used by TensorFlow Serving Predict API
#signature = predict_signature_def(
#    inputs={"inputs": loaded_model.input}, outputs={"score": loaded_model.output})

In [18]:
#session = tf.compat.v1.Session()
#init_op = tf.compat.v1.global_variables_initializer()
#session.run(init_op)
# Save the meta graph and variables
#builder.add_meta_graph_and_variables(
#    sess=session, tags=[tag_constants.SERVING], signature_def_map={"serving_default": signature})
#builder.save()

### Step 4. Convert TensorFlow model to a SageMaker readable format

Move the exported TensorFlow model into the directory ``export/Servo/``. SageMaker recognizes this layout as a loadable TensorFlow model. The version directory (``export/Servo/1/`` here) should contain ``saved_model.pb`` and a ``variables`` directory.

In [19]:
model_path = 'export/Servo/1/'
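
Before inspecting the model with `saved_model_cli`, it can help to confirm that the version directory has the standard SavedModel layout. The following is a minimal sketch under that assumption; the helper name `missing_saved_model_parts` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def missing_saved_model_parts(model_dir):
    """Return the names of expected SavedModel entries absent from model_dir."""
    root = Path(model_dir)
    expected = {
        "saved_model.pb": root.joinpath("saved_model.pb").is_file(),
        "variables": root.joinpath("variables").is_dir(),
    }
    return [name for name, present in expected.items() if not present]
```

Calling `missing_saved_model_parts(model_path)` returns an empty list when both entries are present.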

In [20]:
!saved_model_cli show --all --dir {model_path}

2025-03-21 16:28:06.668112: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-03-21 16:28:06.671425: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-03-21 16:28:06.681856: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1742574486.699316    1177 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1742574486.704526    1177 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-21 16:28:06.722401: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU ins

#### Tar the entire directory and upload to S3

In [21]:
import tarfile
model_archive = 'model.tar.gz'
with tarfile.open(model_archive, mode='w:gz') as archive:
    archive.add('export', recursive=True)
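
To check that the archive keeps the `export/Servo/<version>` layout before uploading it, you can list its members. This is a small sketch using only the standard library; `archive_members` is a hypothetical helper name.

```python
import tarfile


def archive_members(archive_path):
    """Return the sorted member names stored in a .tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(archive.getnames())
```

For the archive built above, `archive_members(model_archive)` should list paths that all start with `export`.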

In [22]:
model_data = sess.upload_data(path=model_archive, key_prefix='model')

In [23]:
model_data

's3://sagemaker-us-west-2-986030204467/model/model.tar.gz'

### Step 5. Deploy the trained model

In [24]:
# SageMaker Python SDK v2 renamed sagemaker.tensorflow.serving.Model to TensorFlowModel
from sagemaker.tensorflow import TensorFlowModel
instance_type = 'ml.c5.xlarge'
image_uri = '763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:2.18.0-cpu-py310-ubuntu20.04-ec2'

In [25]:
#pip install -U sagemaker

In [27]:
%%time
#sm_model = Model(model_data=model_data, framework_version=tf_framework_version,role=role)
#uncompiled_predictor = sm_model.deploy(initial_instance_count=1, instance_type=instance_type)

from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(model_data=model_data, role=role, framework_version='2.18.0')
uncompiled_predictor = model.deploy(initial_instance_count=1, 
                                    instance_type=instance_type,image_uri=image_uri)

-------!CPU times: user 246 ms, sys: 38.6 ms, total: 285 ms
Wall time: 4min 2s
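
The introduction mentions autoscaling, which this notebook does not otherwise configure. As a rough sketch, the parameters for Application Auto Scaling can be assembled as shown below. The helper name `autoscaling_requests`, the policy name, the capacity limits, and the target of 100 invocations per instance are all assumptions chosen for illustration; `AllTraffic` is the variant name the SDK uses by default.

```python
def autoscaling_requests(endpoint_name, min_capacity=1, max_capacity=2,
                         target_invocations=100.0, variant_name="AllTraffic"):
    """Build keyword arguments for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    # Illustrative capacity limits (assumed values, adjust to your workload).
    target = dict(common, MinCapacity=min_capacity, MaxCapacity=max_capacity)
    # Target tracking on the predefined invocations-per-instance metric.
    policy = dict(
        common,
        PolicyName=f"{endpoint_name}-invocations",  # hypothetical policy name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    )
    return target, policy
```

With a `boto3.client("application-autoscaling")` client, the two dictionaries would be passed as `client.register_scalable_target(**target)` and `client.put_scaling_policy(**policy)`.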


### Step 6. Invoke the endpoint

#### Invoke the SageMaker endpoint from the notebook

In [None]:
# The model deployed here expects an input of shape [1, 20]
data = np.random.randn(1, 20)
data.shape

In [None]:
uncompiled_predictor.predict(data)
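
The predictor serializes the input for you. To make the request format concrete, here is a minimal sketch of the two payload styles used in this step: a JSON body in the TensorFlow Serving REST `instances` format and a CSV row. It uses only the standard library, and the helper names are hypothetical.

```python
import json
import random


def make_row(n_features, seed=0):
    """Generate one row of Gaussian noise, similar to np.random.randn(1, n)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_features)]


def to_json_body(rows):
    """Encode rows in the TensorFlow Serving REST 'instances' format."""
    return json.dumps({"instances": rows})


def to_csv_body(rows):
    """Encode rows as CSV, one comma-separated line per row."""
    return "\n".join(",".join(repr(v) for v in row) for row in rows)


row = make_row(20)
json_body = to_json_body([row])
csv_body = to_csv_body([row])
```

Either body carries 20 features, matching the shape of `data` above.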

In [None]:
import boto3

# Initialize the SageMaker runtime client
sagemaker_runtime = boto3.client("sagemaker-runtime")

# Define CSV input (must match your model input shape)
csv_input = "0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0"  # Example 20-feature row

# Invoke the endpoint
response = sagemaker_runtime.invoke_endpoint(
    EndpointName=uncompiled_predictor.endpoint_name,  # endpoint created in Step 5
    ContentType="text/csv",  # Important: CSV format
    Body=csv_input
)

# Parse the response
print(response["Body"].read().decode("utf-8"))
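
The response body is JSON text. Assuming the container replies in the TensorFlow Serving REST row format, where results sit under a `predictions` key, a small parser could look like the sketch below. The helper name and the sample response are hypothetical and only illustrate the shape.

```python
import json


def parse_predictions(body_text):
    """Extract the 'predictions' list from a TensorFlow Serving JSON response."""
    payload = json.loads(body_text)
    if "predictions" not in payload:
        raise KeyError("response has no 'predictions' key: %r" % sorted(payload))
    return payload["predictions"]


# Hypothetical response text, used only to exercise the parser.
sample_response = '{"predictions": [[0.25]]}'
scores = parse_predictions(sample_response)
```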

#### Compile model using SageMaker Neo

[SageMaker Neo](https://aws.amazon.com/sagemaker/neo/) makes it easy to compile pre-trained TensorFlow models and build an inference-optimized container without the need for any custom model serving or inference code.

In [None]:
instance_family = 'ml_c5'
framework = 'tensorflow'
compilation_job_name = 'keras-compile'
# output path for compiled model artifact
compiled_model_path = 's3://{}/{}/output'.format(bucket, compilation_job_name)
data_shape = {'inputs':[1, data.shape[0], data.shape[1]]}

In [None]:
# `model` is the TensorFlowModel created in Step 5
optimized_estimator = model.compile(target_instance_family=instance_family,
                                    input_shape=data_shape,
                                    job_name=compilation_job_name,
                                    role=role,
                                    framework=framework,
                                    framework_version=tf_framework_version,
                                    output_path=compiled_model_path)

In [None]:
optimized_predictor = optimized_estimator.deploy(initial_instance_count=1, instance_type=instance_type)

#### Invoke optimized SageMaker endpoint

In [None]:
optimized_predictor.predict(data)

### Step 7. Clean up

To avoid incurring charges to your AWS account for the resources used in this tutorial, delete both SageMaker endpoints: the uncompiled one and the Neo-optimized one.

In [None]:
uncompiled_predictor.delete_endpoint()

In [None]:
optimized_predictor.delete_endpoint()
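
If one of the deletions fails, the other endpoint should still be removed. A minimal sketch of that idea follows; `delete_endpoints` is a hypothetical helper that works with any objects exposing a `delete_endpoint()` method.

```python
def delete_endpoints(predictors):
    """Call delete_endpoint() on each predictor; return (index, error) pairs for failures."""
    failures = []
    for index, predictor in enumerate(predictors):
        try:
            predictor.delete_endpoint()
        except Exception as exc:  # keep going so later endpoints are still deleted
            failures.append((index, exc))
    return failures
```

For example, `delete_endpoints([uncompiled_predictor, optimized_predictor])` returns an empty list when both deletions succeed.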

### Conclusion

In this notebook, we demonstrated converting a Keras model to TensorFlow SavedModel format, deploying a trained model to a SageMaker endpoint, and compiling the same trained model using SageMaker Neo to get better performance. Using Amazon SageMaker, you can take a trained model and, in a few lines of code, have a scalable, managed inference deployment. This gives you the flexibility to use your existing model training workflows, while easily deploying trained models to production with all the benefits and optimizations offered by a managed platform.