# Hands-On Exercise: Recommendation System with SageMaker Factorization Machines

In this exercise, you will **build a basic recommendation system** using **Amazon SageMaker’s built-in Factorization Machines (FM) algorithm**. Recommendation systems are commonly used for **product suggestions, content ranking, and personalization**, where the goal is to predict how likely a user is to interact with an item.

You will:
<ul>
<li>Launch Amazon SageMaker Studio and start a notebook</li>
<li>Create a user–item interaction dataset and generate 1,000 synthetic records</li>
<li>Convert interaction data into a sparse matrix and RecordIO-Protobuf format</li>
<li>Train a <b>binary classification Factorization Machines model</b></li>
<li>Deploy the model to a SageMaker real-time endpoint</li>
<li>Send user–item pairs for inference and interpret predicted interaction scores for ranking</li>
</ul>

## 1. Launch Amazon SageMaker Studio

### 1.1 Open Amazon SageMaker AI and Choose SageMaker Studio in the Left Navigation Pane

If no domain exists:

<ol>
<li>Choose <b>Create SageMaker Domain</b>.</li>
<li>Select <b>Set up for single user</b>.</li>
<li>Accept the default settings for networking and storage.</li>
<li>Wait for domain provisioning to complete.</li>
<li>If needed, edit the user profile so that its IAM execution role has access to the S3 bucket used in this exercise.</li>
</ol>

⚠️ Cost Note: SageMaker Studio resources are not free. Make sure to shut down the endpoint and Studio apps after completing the lab.

### 1.2 Create a New Notebook
Choose a Python 3 notebook.

### 1.3 Install Required Dependencies
Run the following cell once to ensure required libraries are available:

In [2]:
!pip install -q pandas scipy scikit-learn sagemaker

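## 2. Generate 1,000 Synthetic Interaction Records

Generate 1,000 synthetic user–item interactions to train on. Labels are drawn with `p=[0.99, 0.01]`, so only about 1% of the records are positive (`interaction = 1`):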

In [5]:
import numpy as np
import pandas as pd

np.random.seed(42)

users = np.arange(100, 200)
items = np.arange(2000, 2100)

records = []
for _ in range(1000):
    records.append([
        np.random.choice(users),
        np.random.choice(items),
        np.random.choice([0, 1], p=[0.99, 0.01])
    ])

synthetic_df = pd.DataFrame(records, columns=["user_id", "item_id", "interaction"])
synthetic_df.to_csv("user_item_interactions.csv", index=False)
synthetic_df.head()


| | user_id | item_id | interaction |
|---|---|---|---|
| 0 | 151 | 2092 | 0 |
| 1 | 171 | 2060 | 0 |
| 2 | 182 | 2086 | 0 |
| 3 | 187 | 2099 | 0 |
| 4 | 102 | 2021 | 0 |
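
With `p=[0.99, 0.01]`, positives are rare, so it is worth checking the label balance before training. The helper below is a minimal sketch: `label_balance` is a hypothetical name, and the sample list is an illustrative stand-in for `synthetic_df["interaction"]`, not the actual generated data.

```python
from collections import Counter

def label_balance(labels):
    """Return {label: fraction of rows} for an iterable of 0/1 labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}

# Illustrative stand-in for synthetic_df["interaction"].tolist():
# 990 negatives and 10 positives, the split expected on average
# when labels are drawn with p=[0.99, 0.01].
sample_labels = [0] * 990 + [1] * 10
balance = label_balance(sample_labels)
print(balance)
```

In the notebook, `synthetic_df["interaction"].value_counts(normalize=True)` gives the same breakdown. With so few positives, accuracy alone says little about how well the model ranks items.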


## 3. Convert Data to Sparse Matrix Format

Factorization Machines require numeric, sparse feature vectors.

### 3.1 Encode Users and Items

In [6]:
from scipy.sparse import csr_matrix

df = pd.read_csv("user_item_interactions.csv")

user_map = {u: i for i, u in enumerate(df.user_id.unique())}
item_map = {i: j for j, i in enumerate(df.item_id.unique())}

df["user_idx"] = df.user_id.map(user_map)
df["item_idx"] = df.item_id.map(item_map)

num_users = len(user_map)
num_items = len(item_map)


### 3.2 Build the Sparse Feature Matrix

Each row represents a user–item pair with one-hot encoded features:

In [1]:
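import numpy as np
from scipy.sparse import csr_matrix

# Each interaction contributes two nonzeros to its row:
# the user column and the item column (offset past the user block).
rows = np.arange(len(df))
cols = np.concatenate([df.user_idx, df.item_idx + num_users])
data = np.ones(len(cols))

X = csr_matrix(
    (data, (np.concatenate([rows, rows]), cols)),
    shape=(len(df), num_users + num_items)
)

# Store features and labels as float32.
X = X.astype(np.float32)
y = df.interaction.values.astype("float32")
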
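
To make the column layout concrete, the sketch below rebuilds the nonzero (row, column) coordinates in plain Python for three made-up interactions. User columns come first and item columns are offset by the number of users, mirroring `df.item_idx + num_users`. The sample IDs and the helper name `one_hot_coords` are illustrative assumptions.

```python
def one_hot_coords(pairs):
    """Map (user_id, item_id) pairs to the nonzero (row, col) coordinates
    of the one-hot feature matrix: user block first, then item block."""
    user_map, item_map = {}, {}
    for user_id, item_id in pairs:
        # setdefault assigns indices in order of first appearance,
        # like enumerate(df.user_id.unique()) in the notebook.
        user_map.setdefault(user_id, len(user_map))
        item_map.setdefault(item_id, len(item_map))

    num_users = len(user_map)
    coords = []
    for row, (user_id, item_id) in enumerate(pairs):
        coords.append((row, user_map[user_id]))              # user column
        coords.append((row, num_users + item_map[item_id]))  # item column
    return coords, num_users, len(item_map)

pairs = [(151, 2092), (171, 2060), (151, 2060)]
coords, num_users, num_items = one_hot_coords(pairs)
print(coords)
print("feature width:", num_users + num_items)
```

Every row ends up with exactly two nonzero entries, which is why a sparse format is a natural fit for this feature matrix.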

## 4. Convert Data to RecordIO-Protobuf Format

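For training, SageMaker’s FM algorithm expects data in RecordIO-Protobuf format with `float32` tensors. Split the data, serialize both splits, and upload them to S3: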


In [8]:
import sagemaker
from sagemaker.amazon.common import write_spmatrix_to_sparse_tensor
from sklearn.model_selection import train_test_split

sess = sagemaker.Session()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

with open("train.recordio", "wb") as f:
    write_spmatrix_to_sparse_tensor(f, X_train, y_train)

with open("test.recordio", "wb") as f:
    write_spmatrix_to_sparse_tensor(f, X_test, y_test)

# Replace the bucket name with an S3 bucket your execution role can write to.
sess.upload_data(
    "train.recordio",
    bucket="knodax-ml-specialty-lab-exercises",
    key_prefix="recommendation-system/train"
)

sess.upload_data(
    "test.recordio",
    bucket="knodax-ml-specialty-lab-exercises",
    key_prefix="recommendation-system/test"
)



's3://knodax-ml-specialty-lab-exercises/recommendation-system/test/test.recordio'

## 5. Train the Factorization Machines Model

### 5.1 Configure the FM Estimator

In [9]:
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.image_uris import retrieve

role = sagemaker.get_execution_role()
image_uri = retrieve("factorization-machines", sess.boto_region_name)

fm = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    output_path="s3://knodax-ml-specialty-lab-exercises/recommendation-system",
    sagemaker_session=sess
)

fm.set_hyperparameters(
    feature_dim=num_users + num_items,
    predictor_type="binary_classifier",
    epochs=50,
    mini_batch_size=64,
    num_factors=32
)
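
`num_factors` sets the length of the latent vector learned for every feature column. As a rough illustration of what a trained model computes, the sketch below evaluates the standard second-order FM score for a one-hot user–item row, where the pairwise term reduces to the dot product of the user's and the item's latent vectors. The parameter values are made up, `fm_score` is a hypothetical helper rather than part of the SageMaker SDK, and the final sigmoid is an assumption about how a binary classifier maps the raw score to a value between 0 and 1.

```python
import math

def fm_score(active, w0, w, v):
    """Second-order FM score for a binary feature vector.

    active: indices of the features set to 1 (user column, item column)
    w0: global bias; w: per-feature linear weights
    v: per-feature latent vectors, each of length num_factors
    """
    linear = sum(w[i] for i in active)
    pairwise = 0.0
    # Sum the latent dot products over every pair of active features.
    for a in range(len(active)):
        for b in range(a + 1, len(active)):
            vi, vj = v[active[a]], v[active[b]]
            pairwise += sum(p * q for p, q in zip(vi, vj))
    return w0 + linear + pairwise

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy model: 2 users + 2 items = 4 feature columns, num_factors = 2.
w0 = -1.0
w = [0.1, 0.0, 0.2, -0.3]
v = [[0.5, 1.0], [0.0, 0.2], [1.0, 0.5], [-0.4, 0.1]]

raw = fm_score([0, 2], w0, w, v)  # user column 0, item column 2
prob = sigmoid(raw)
print(raw, prob)
```

A larger `num_factors` gives each user and item a longer latent vector, which adds capacity but also adds parameters to fit from only 1,000 records.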


### 5.2 Start the Training Job

In [11]:
train_s3 = "s3://knodax-ml-specialty-lab-exercises/recommendation-system/train/"
test_s3  = "s3://knodax-ml-specialty-lab-exercises/recommendation-system/test/"

fm.fit({
    "train": sagemaker.TrainingInput(train_s3, content_type="application/x-recordio-protobuf"),
    "test":  sagemaker.TrainingInput(test_s3,  content_type="application/x-recordio-protobuf"),
})


INFO:sagemaker:Creating training-job with name: factorization-machines-2025-12-14-19-03-00-366


2025-12-14 19:03:02 Starting - Starting the training job...
2025-12-14 19:03:16 Starting - Preparing the instances for training...
2025-12-14 19:03:38 Downloading - Downloading input data...
2025-12-14 19:04:14 Downloading - Downloading the training image..............
Docker entrypoint called with argument(s): train
Running default environment configuration script
  if num_device is 1 and 'dist' not in kvstore:
[12/14/2025 19:06:46 INFO 140527535621952] Reading default configuration from /opt/amazon/lib/python3.8/site-packages/algorithm/resources/default-conf.json: {'epochs': 1, 'mini_batch_size': '1000', 'use_bias': 'true', 'use_linear': 'true', 'bias_lr': '0.1', 'linear_lr': '0.001', 'factors_lr': '0.0001', 'bias_wd': '0.01', 'linear_wd': '0.001', 'factors_wd': '0.00001', 'bias_init_method': 'normal', 'bias_init_sigma': '0.01', 'linear_init_method': 'normal', 'linear_init_sigma': '0.01', 'factors_init_method': 'normal', 'factors_init_sigma': '0.001', 'batch

## 6. Deploy the Model to a Real-Time Endpoint

In [12]:
predictor = fm.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large"
)

INFO:sagemaker:Creating model with name: factorization-machines-2025-12-14-19-08-54-413
INFO:sagemaker:Creating endpoint-config with name factorization-machines-2025-12-14-19-08-54-413
INFO:sagemaker:Creating endpoint with name factorization-machines-2025-12-14-19-08-54-413


---------!

## 7. Generate Recommendations (Inference)

### 7.1 Send User–Item Pairs for Prediction

import boto3
import json
import datetime

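# -------- Config --------
# Replace these values with your own endpoint name, region, and bucket.
endpoint_name = "factorization-machines-2025-12-14-19-08-54-413"
region = "us-east-1"

bucket = "knodax-ml-specialty-lab-exercises"
prefix = "recommendation-system/output"

# -------- Clients --------
rt = boto3.client("sagemaker-runtime", region_name=region)
s3 = boto3.client("s3", region_name=region)

# -------- Invoke Endpoint --------
# The request body is a RecordIO-Protobuf file whose feature width matches
# the feature_dim used for training; test.recordio can be sent the same way.
with open("train.recordio", "rb") as f:
    payload = f.read()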

resp = rt.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/x-recordio-protobuf",
    Accept="application/json",
    Body=payload
)

raw_response = resp["Body"].read().decode("utf-8")

# -------- Prepare output (safe JSON) --------
try:
    response_json = json.loads(raw_response)
except json.JSONDecodeError:
    response_json = {"raw_response": raw_response}

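# Use one timezone-aware UTC timestamp for both the record and the object key
# (datetime.utcnow() is deprecated as of Python 3.12).
now_utc = datetime.datetime.now(datetime.timezone.utc)

output_object = {
    "endpoint": endpoint_name,
    "timestamp_utc": now_utc.isoformat(),
    "response": response_json
}

# -------- Write to S3 --------
key = f"{prefix}/prediction-{now_utc.strftime('%Y%m%dT%H%M%SZ')}.json"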

s3.put_object(
    Bucket=bucket,
    Key=key,
    Body=json.dumps(output_object, indent=2).encode("utf-8"),
    ContentType="application/json"
)

print("Prediction saved to:")
print(f"s3://{bucket}/{key}")
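
The cell above scores an entire RecordIO file. To rank items for a single user, one option is to score each candidate item and sort by the returned score. The sketch below builds a sparse JSON request body and ranks a response of the shape `{"predictions": [{"score": ..., "predicted_label": ...}]}`. Both layouts are assumed from the JSON formats documented for SageMaker built-in algorithms and should be checked against the current documentation. `build_payload`, `rank_items`, the candidate values, and the sample response are all hypothetical.

```python
import json

def build_payload(user_idx, item_idxs, num_users, num_items):
    """Build a sparse JSON request body: one instance per candidate item."""
    feature_dim = num_users + num_items
    instances = []
    for item_idx in item_idxs:
        instances.append({
            "data": {
                "features": {
                    # Same layout as the training matrix:
                    # user column, then offset item column.
                    "keys": [user_idx, num_users + item_idx],
                    "shape": [feature_dim],
                    "values": [1, 1],
                }
            }
        })
    return json.dumps({"instances": instances})

def rank_items(item_ids, response_json):
    """Pair each candidate item with its score and sort best-first."""
    scores = [p["score"] for p in response_json["predictions"]]
    return sorted(zip(item_ids, scores), key=lambda pair: pair[1], reverse=True)

# Hypothetical candidates (encoded item indices and raw IDs) for the user at index 0.
candidate_idxs = [0, 1, 2]
candidate_ids = [2092, 2060, 2086]
payload = build_payload(0, candidate_idxs, num_users=100, num_items=100)

# Made-up response for illustration; a real one would come from
# rt.invoke_endpoint(..., ContentType="application/json", Body=payload).
sample_response = {"predictions": [
    {"score": 0.02, "predicted_label": 0},
    {"score": 0.31, "predicted_label": 0},
    {"score": 0.07, "predicted_label": 0},
]}
ranking = rank_items(candidate_ids, sample_response)
print(ranking)
```

Because almost all training labels are 0, the relative order of the scores is likely to be more informative for ranking than `predicted_label`.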


## 8. Clean Up Resources
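
A minimal cleanup sketch, assuming the endpoint, endpoint configuration, and model all share the name shown in the deployment logs of section 6. The helper accepts any object that exposes the boto3 SageMaker client methods `delete_endpoint`, `delete_endpoint_config`, and `delete_model`. The `cleanup` name and the `DryRunClient` stand-in used to exercise it without touching AWS are hypothetical.

```python
def cleanup(sm_client, name):
    """Delete the endpoint first, then its configuration, then the model."""
    sm_client.delete_endpoint(EndpointName=name)
    sm_client.delete_endpoint_config(EndpointConfigName=name)
    sm_client.delete_model(ModelName=name)

class DryRunClient:
    """Stand-in that records calls instead of contacting AWS."""
    def __init__(self):
        self.calls = []

    def __getattr__(self, method_name):
        # Only reached for attributes that do not exist, i.e. the API methods.
        def record(**kwargs):
            self.calls.append((method_name, kwargs))
        return record

dry_run = DryRunClient()
cleanup(dry_run, "factorization-machines-2025-12-14-19-08-54-413")
print([call[0] for call in dry_run.calls])

# Against a real account (this deletes billable resources):
# import boto3
# cleanup(boto3.client("sagemaker"), endpoint_name)
```

Inside the notebook, `predictor.delete_endpoint()` is a shorter route for the endpoint itself. Studio apps still need to be shut down separately, as the cost note in section 1 points out.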
