# **Step 1 : Custom / Pre-Trained HF Models: Packaging, Compressing, and Uploading to S3 for SageMaker Inference**

	•   Select and download a custom/pre-trained model from Hugging Face.
	•   Organize the model files into the directory structure required by SageMaker.
	•   Package the model directory into a .tar.gz file for SageMaker compatibility.
	•   Upload the tarball to your Amazon S3 bucket for use in SageMaker model deployment.

In [18]:
!pip install transformers
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# For a GPU build, replace "cpu" in the index URL with your CUDA tag, e.g. cu121 (check your CUDA version via !nvidia-smi)

Looking in indexes: https://download.pytorch.org/whl/cpu


In [1]:
import os
import json
import boto3
import sagemaker
import subprocess
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker import get_execution_role
from transformers import AutoModelForSequenceClassification, AutoTokenizer

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml



In [7]:
# -------------------------------
# Create Boto Clients
# -------------------------------

s3_client = boto3.client("s3")
sm = boto3.client("sagemaker")

In [None]:
# ----------------------------
# Config
# ----------------------------

hf_model_name = "<hf_model_repo>"   # Hugging Face Model Repo ID
s3_bucket = "<bucket_name>"  # S3 Bucket
tar_file_path = "model.tar.gz"
s3_key = os.path.basename(tar_file_path)

In [3]:
# ----------------------------
# Step 1: Download Your Custom Hugging Face Model
# ----------------------------

model_folder = "hf_model"
os.makedirs(model_folder, exist_ok=True)

model = AutoModelForSequenceClassification.from_pretrained(hf_model_name)
tokenizer = AutoTokenizer.from_pretrained(hf_model_name)

model.save_pretrained(model_folder)
tokenizer.save_pretrained(model_folder)
print(f"Downloaded Hugging Face Model '{hf_model_name}' to '{model_folder}'")

Downloaded Hugging Face Model 'omkarwazulkar/SentimentModel-V1.0' to 'hf_model'


In [4]:
# ----------------------------
# Step 2: Compress Model Folder To .tar.gz
# ----------------------------

if os.path.exists(tar_file_path):
    os.remove(tar_file_path)

subprocess.run(
    ["tar", "-czvf", tar_file_path, "-C", model_folder, "."],
    check=True
)
print(f"Compressed Model Folder To '{tar_file_path}'")

./
./vocab.txt
./tokenizer_config.json
./config.json
./tokenizer.json
./model.safetensors
./special_tokens_map.json
Compressed Model Folder To 'model.tar.gz'
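
SageMaker extracts `model.tar.gz` into `/opt/ml/model` inside the container, so the model files should sit at the root of the archive rather than under a parent folder. The sketch below is one possible pure-Python alternative to the `tar` call, using the standard `tarfile` module, plus a check of the archive contents. The helper names `make_model_tarball` and `list_tarball` are hypothetical and not part of the notebook.

```python
import os
import tarfile


def make_model_tarball(model_dir, tar_path):
    """Pack the contents of model_dir at the root of a gzip-compressed tarball."""
    with tarfile.open(tar_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            # arcname=name drops the parent folder so files land at the archive root
            tar.add(os.path.join(model_dir, name), arcname=name)
    return tar_path


def list_tarball(tar_path):
    """Return the sorted member names stored in the tarball."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

In this notebook it could be called as `make_model_tarball(model_folder, tar_file_path)`, followed by `list_tarball(tar_file_path)` to confirm that `config.json` and `model.safetensors` appear without a directory prefix.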


In [None]:
# ----------------------------
# Step 3: Upload .tar.gz To S3
# ----------------------------

s3_client.upload_file(tar_file_path, s3_bucket, s3_key)
print(f"Uploaded model to s3://{s3_bucket}/{s3_key}")

In [None]:
# ----------------------------
# Step 4: Output S3 Location
# ----------------------------

model_data = f"s3://{s3_bucket}/{s3_key}"
print(f"Model Data S3 Location: {model_data}")
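
Before moving on, it can help to confirm that the object really exists at the printed location. A minimal sketch, assuming a boto3 S3 client such as `s3_client` is passed in; `split_s3_uri` and `uploaded_size` are hypothetical helper names. `head_object` raises a client error if the object is missing.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"Not a valid S3 object URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def uploaded_size(s3_client, uri):
    """Return the object's size in bytes via HeadObject (raises if it is missing)."""
    bucket, key = split_s3_uri(uri)
    return s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]
```

For example, `uploaded_size(s3_client, model_data)` should roughly match `os.path.getsize(tar_file_path)` after a successful upload.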

# **Step 2 : Building and Publishing the Model Inference Image from GitHub to Amazon ECR**
	
	•   Start from the GitHub repository: https://github.com/omkarwazulkar/hf-sagemaker-deploy
	•   Use the repository contents to build the Docker image through AWS CodeBuild.
	•   The repository includes the Dockerfile, buildspec.yml, requirements.txt, and inference.py.
	•   Create an Amazon ECR repository to store the inference image.
	•   Build the Docker image in CodeBuild and push the final image to the ECR repository.
	•   Ensure the built image includes all required components: the inference handler script, model loading logic, dependencies, and the runtime environment needed for SageMaker inference.
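
The steps above can also be driven from Python rather than the console. This is a rough sketch, assuming a CodeBuild project that already points at the GitHub repository and whose buildspec pushes the image to ECR. The function names and the `poll_seconds` parameter are hypothetical; the clients would come from `boto3.client("ecr")` and `boto3.client("codebuild")`.

```python
import time


def ensure_ecr_repository(ecr_client, repo_name):
    """Create the ECR repository if needed and return its URI."""
    try:
        resp = ecr_client.create_repository(repositoryName=repo_name)
        return resp["repository"]["repositoryUri"]
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        # The repository is already there, so look up its URI instead
        resp = ecr_client.describe_repositories(repositoryNames=[repo_name])
        return resp["repositories"][0]["repositoryUri"]


def run_codebuild(codebuild_client, project_name, poll_seconds=15):
    """Start a build and block until it leaves IN_PROGRESS; return the final status."""
    build_id = codebuild_client.start_build(projectName=project_name)["build"]["id"]
    while True:
        build = codebuild_client.batch_get_builds(ids=[build_id])["builds"][0]
        if build["buildStatus"] != "IN_PROGRESS":
            return build["buildStatus"]
        time.sleep(poll_seconds)
```

A final status of `SUCCEEDED` indicates the build finished; any other status is worth checking in the CodeBuild logs before deploying.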

# **Step 3 : Configuring SageMaker, Deploying the Model, and Running Inference**

	•   Set up the required SageMaker session, execution role, and configuration parameters.
	•   Define the SageMaker Model using the ECR-hosted inference image and the S3 model artifact.
	•   Deploy the model to a SageMaker endpoint with the desired instance type and configuration.
	•   Create a Predictor object and prepare the inference payload.
	•   Send the payload to the endpoint and retrieve the prediction results.

In [8]:
# -------------------------------
# Config
# -------------------------------

AWS_REGION = "<region>"  # e.g. "us-east-1"; should match the region in ECR_IMAGE_URI
ECR_IMAGE_URI = "************.dkr.ecr.us-east-1.amazonaws.com/<your_repo_name>:latest"
MODEL_S3_URI = "s3://<bucket-name>/model.tar.gz"
ENDPOINT_NAME = "sentiment-classification-endpoint"
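
The region embedded in `ECR_IMAGE_URI` needs to match the region where the image was pushed. One way to keep the two in sync is to build the URI from its parts. This is a small sketch for the standard AWS partition; `ecr_image_uri` is a hypothetical helper, and the account ID could be obtained with `boto3.client("sts").get_caller_identity()["Account"]`.

```python
def ecr_image_uri(account_id, region, repo_name, tag="latest"):
    """Build '<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>'."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{tag}"
```

For instance, `ecr_image_uri("123456789012", "us-east-1", "my-repo")` returns `123456789012.dkr.ecr.us-east-1.amazonaws.com/my-repo:latest`.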

In [9]:
# -------------------------------
# SageMaker Session & Role
# -------------------------------

sagemaker_session = sagemaker.Session()
role = get_execution_role()

In [10]:
# -------------------------------
# Define SageMaker Model
# -------------------------------

model = Model(
    image_uri=ECR_IMAGE_URI,
    model_data=MODEL_S3_URI,
    role=role,
    sagemaker_session=sagemaker_session,
    env={
        'SAGEMAKER_CONTAINER_LOG_LEVEL': '30',
        'SAGEMAKER_ENABLE_CLOUDWATCH_LOGGING': 'true'
    }
)

In [11]:
# -------------------------------
# Deploy Endpoint
# -------------------------------

try:
    predictor = model.deploy(
        instance_type="ml.m5.xlarge",
        initial_instance_count=1,
        endpoint_name=ENDPOINT_NAME
    )
    print(f"Endpoint '{ENDPOINT_NAME}' created successfully!")
except Exception as e:
    print(f"Failed to deploy endpoint: {e}")
    raise  # Stop here so later cells do not run against a missing endpoint

-------!Endpoint 'sentiment-classification-endpoint' created successfully!
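
`model.deploy()` waits for the endpoint by default, which is what the row of dashes in the output reflects. If the endpoint is created elsewhere, or the notebook is restarted mid-deployment, its status can be polled directly. A minimal sketch, assuming a boto3 SageMaker client such as `sm`; the function name and the `poll_seconds` and `timeout_seconds` parameters are hypothetical.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, timeout_seconds=1800):
    """Poll DescribeEndpoint until the endpoint is InService or has failed."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        desc = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = desc["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint failed: {desc.get('FailureReason', 'unknown reason')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint still '{status}' after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

boto3 also ships a built-in waiter for the same purpose, `sm.get_waiter("endpoint_in_service")`, which may be preferable when custom error handling is not needed.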


In [12]:
# -------------------------------
# Define Predictor
# -------------------------------

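# Model.deploy() returns None unless predictor_cls is set, so attach a Predictor to the endpoint by name
predictor = Predictor(endpoint_name=ENDPOINT_NAME, sagemaker_session=sagemaker_session)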

In [13]:
# -------------------------------
# Define Request Payload
# -------------------------------

request_body = {
    "text": [
        "I loved this movie!",
        "This was awful.",
        "Amazing storyline and acting."
    ]
}

In [14]:
# -------------------------------
# Send Request To SageMaker Endpoint
# -------------------------------

response = predictor.predict(
    json.dumps(request_body),
    initial_args={'ContentType': 'application/json'}
)

In [15]:
# -------------------------------
# Decode Bytes -> String, Then Parse JSON
# -------------------------------

prediction = json.loads(response.decode("utf-8"))
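
Client applications that do not have the SageMaker Python SDK installed can reach the same endpoint through boto3's `sagemaker-runtime` client. A sketch under that assumption follows; `invoke_json` is a hypothetical helper, and the client would come from `boto3.client("sagemaker-runtime")`.

```python
import json


def invoke_json(runtime_client, endpoint_name, payload):
    """Send a JSON payload with InvokeEndpoint and return the parsed JSON response."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    # Body is a streaming object; read() returns the raw response bytes
    return json.loads(response["Body"].read().decode("utf-8"))
```

Called as `invoke_json(runtime, ENDPOINT_NAME, request_body)`, it should return the same dictionary as `prediction`, provided the container replies with JSON.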

In [16]:
# -------------------------------
# Map Numeric Predictions To Labels
# -------------------------------

label_map = {0: "Negative", 1: "Positive"}
predicted_labels = [label_map[p] for p in prediction['predictions']]
print("Predictions:", predicted_labels)

Predictions: ['Positive', 'Negative', 'Positive']
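
The hard-coded `label_map` works for this binary model. As an alternative, the mapping can often be read from the `id2label` entry that Hugging Face configs commonly carry in `config.json`. The sketch below assumes that entry exists in the saved model folder and falls back to a supplied mapping otherwise; `load_label_map` is a hypothetical helper. Note that some models store only generic names such as `LABEL_0`.

```python
import json
import os


def load_label_map(model_dir, fallback=None):
    """Read id2label from config.json with integer keys; use fallback if absent."""
    with open(os.path.join(model_dir, "config.json"), encoding="utf-8") as f:
        id2label = json.load(f).get("id2label")
    if not id2label:
        return dict(fallback or {})
    # JSON object keys are strings, so convert them back to integer class ids
    return {int(k): v for k, v in id2label.items()}
```

For example, `load_label_map(model_folder, fallback={0: "Negative", 1: "Positive"})` could replace the literal dictionary in the cell above.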


# **Step 4 : Cleaning Up SageMaker Resources**

	•   Delete the SageMaker endpoint to stop ongoing usage and avoid charges.
	•   Remove the associated endpoint configuration from SageMaker.
	•   Delete the SageMaker model to complete the cleanup process.

In [17]:
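# -------------------------------
# Get Latest Endpoint
# -------------------------------

# Note: this targets the most recently created endpoint in the account and region,
# which is not necessarily ENDPOINT_NAME.
endpoints = sm.list_endpoints(SortBy="CreationTime", SortOrder="Descending", MaxResults=1)["Endpoints"]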
if endpoints:
    endpoint_name = endpoints[0]["EndpointName"]
    config_name = sm.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    model_name = sm.describe_endpoint_config(EndpointConfigName=config_name)["ProductionVariants"][0]["ModelName"]
    
    # -------------------------------
    # Delete Endpoint, Config, & Model
    # -------------------------------

    sm.delete_endpoint(EndpointName=endpoint_name)
    sm.delete_endpoint_config(EndpointConfigName=config_name)
    sm.delete_model(ModelName=model_name)
    print(f"Deleted endpoint, config, and model: {endpoint_name}")
else:
    print("No endpoints found to delete.")

Deleted endpoint, config, and model: sentiment-classification-endpoint
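
In a shared account, the most recently created endpoint may belong to someone else, so it can be safer to clean up by explicit name. A minimal sketch, assuming a boto3 SageMaker client such as `sm`; `delete_endpoint_stack` is a hypothetical helper that also handles endpoint configs with more than one production variant.

```python
def delete_endpoint_stack(sm_client, endpoint_name):
    """Delete an endpoint, its endpoint config, and every model the config references."""
    config_name = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    variants = sm_client.describe_endpoint_config(EndpointConfigName=config_name)["ProductionVariants"]
    # Collect the names first, because the config is deleted before the models
    model_names = sorted({v["ModelName"] for v in variants})

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)
    return config_name, model_names
```

Here it would be called as `delete_endpoint_stack(sm, ENDPOINT_NAME)`.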
