## SageMaker + JFrog Artifactory Demo

Train in SageMaker, store artifacts in JFrog Artifactory with `frogml`, then serve on a SageMaker endpoint.

You will:

- train a model in SageMaker
- publish the ML artifacts to JFrog Artifactory
- test inference locally
- deploy a SageMaker endpoint

> Prerequisites: AWS credentials configured, `frogml` installed, and access to JFrog Artifactory.

### Table of contents

- Setup
- Training
- Publish artifacts to Artifactory
- Inference
- Deploy to a SageMaker endpoint
- Test the endpoint
- Cleanup

### Setup

Configure `frogml` for your JFrog instance:

`frogml config add --interactive`

Use an Artifactory repo for ML artifacts (for example `ml-models-local`).

In [None]:
import os

# JFrog settings used by training and inference
JF_URL = "https://<your-company>.jfrog.io"
JF_REPO = "ml-models-local"
JF_PROJECT = "sagemaker-demo"

# Optional: make available to local steps in this notebook
os.environ["JF_URL"] = JF_URL
os.environ["JF_REPO"] = JF_REPO
os.environ["JF_PROJECT"] = JF_PROJECT

print("JFrog config ready")
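
Because `JF_URL` above ships with an angle-bracket placeholder, a quick check before launching any job can catch a value that was never filled in. The sketch below is a hypothetical helper (`find_placeholders` is not part of `frogml` or the SageMaker SDK) that flags settings that are empty or still contain a `<...>` marker.

```python
import re

# Matches template markers such as "<your-company>".
_PLACEHOLDER = re.compile(r"<[^<>]+>")


def find_placeholders(settings: dict) -> list:
    """Return names of settings that are empty or still contain a <...> marker."""
    return sorted(
        name
        for name, value in settings.items()
        if not value or _PLACEHOLDER.search(value)
    )


example = {
    "JF_URL": "https://<your-company>.jfrog.io",
    "JF_REPO": "ml-models-local",
    "JF_PROJECT": "sagemaker-demo",
}
print(find_placeholders(example))  # ['JF_URL']
```

An empty result means every setting has been given a concrete value.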

## Training

This section launches a SageMaker training job. The training script saves artifacts locally, publishes them to Artifactory with `frogml`, and records the version for inference.

### Training configuration

Set the SageMaker execution role and region. The job reads the Hugging Face and JFrog access tokens from AWS Secrets Manager and receives your JFrog settings as environment variables.

In [None]:
from sagemaker.core.helper.session_helper import Session

sagemaker_session = Session()
role = "arn:aws:iam::<your-aws-account-id>:role/service-role/<your-sagemaker-execution-role>"
region = sagemaker_session.boto_region_name

print(f"Region: {region}")

In [None]:
from sagemaker.train.model_trainer import ModelTrainer, Mode
from sagemaker.train.configs import SourceCode, Compute
from sagemaker.core import image_uris

TRAINING_IMAGE = image_uris.retrieve(
    framework="pytorch",
    region=region,
    version="2.7.1",
    py_version="py312",
    instance_type="ml.m4.xlarge",  # same instance type as Compute below
    image_scope="training"
)

source_code = SourceCode(
    source_dir="training",
    requirements="requirements.txt",
    entry_script="train.py",
)


env = {
    # Secrets are pulled in-container using AWS Secrets Manager
    "HF_TOKEN_SECRET_ID": "jfrog/hf_token",
    "JF_ACCESS_TOKEN_SECRET_ID": "jfrog/jf_token",

    # JFrog Artifactory target for model artifacts
    "JF_URL": JF_URL,
    "JF_REPO": JF_REPO,
    "JF_PROJECT": JF_PROJECT,

    # Optional: use HF remote in Artifactory for model downloads
    "HF_ENDPOINT": "https://<your-company>.jfrog.io/artifactory/api/huggingfaceml/<your-hf-remote>",
    "HF_HUB_DOWNLOAD_TIMEOUT": "86400",
    "HF_HUB_ETAG_TIMEOUT": "86400",
}

devops_assistant = ModelTrainer(
    sagemaker_session=sagemaker_session,
    training_image=TRAINING_IMAGE,
    hyperparameters="training/hyperparameters.json",
    training_mode=Mode.SAGEMAKER_TRAINING_JOB,
    source_code=source_code,
    base_job_name="qwen-05b-devops-finetuning",
    environment=env,
    compute=Compute(instance_type="ml.m4.xlarge", instance_count=1),
    role=role
)

# Launch the SageMaker training job
devops_assistant.train()
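
The `env` dictionary passes only secret IDs, so the code running in the container has to resolve them to token values. How `training/train.py` does this is not shown here; the following is a minimal sketch, assuming the secret is stored either as a plain string or as a JSON object. `read_secret` and `FakeSecretsClient` are hypothetical names, and the fake client only stands in for `boto3.client("secretsmanager")` so the example runs offline.

```python
import json
from typing import Optional


def read_secret(client, secret_id: str, key: Optional[str] = None) -> str:
    """Fetch a secret string; optionally pick one key from a JSON-encoded secret."""
    response = client.get_secret_value(SecretId=secret_id)
    secret = response["SecretString"]
    if key is None:
        return secret
    return json.loads(secret)[key]


class FakeSecretsClient:
    """Offline stand-in that mimics the get_secret_value call of the real client."""

    def __init__(self, secrets: dict):
        self._secrets = secrets

    def get_secret_value(self, SecretId: str) -> dict:
        return {"Name": SecretId, "SecretString": self._secrets[SecretId]}


client = FakeSecretsClient({"jfrog/jf_token": "example-token"})
token = read_secret(client, "jfrog/jf_token")
print(f"Loaded a token of length {len(token)}")
```

With a real client, the execution role also needs `secretsmanager:GetSecretValue` permission on the referenced secrets.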


### Publish artifacts to Artifactory

In `training/train.py`, artifacts are saved to `SM_MODEL_DIR` (or `./output` locally) and uploaded with `frogml.huggingface.log_model`.

1. Save the model and tokenizer to the output directory.
2. Call `frogml.huggingface.log_model` to push them to Artifactory.
3. Use the logged version (a timestamp if you do not set one) as `MODEL_VERSION` for inference.
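
The bookkeeping around these steps can be sketched without the upload call itself. The helpers below are hypothetical: `resolve_output_dir` mirrors the `SM_MODEL_DIR` / `./output` fallback described above, and `choose_version` builds a timestamp in the same shape as the `MODEL_VERSION` value used later in this notebook. That format, and the use of UTC, are assumptions inferred from the example value rather than taken from `frogml`.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def resolve_output_dir(env: Mapping[str, str]) -> str:
    """Use SM_MODEL_DIR inside a SageMaker job, ./output otherwise."""
    return env.get("SM_MODEL_DIR", "./output")


def choose_version(explicit: Optional[str] = None, now: Optional[datetime] = None) -> str:
    """Return the explicit version if given, else a millisecond timestamp."""
    if explicit:
        return explicit
    now = now or datetime.now(timezone.utc)
    # Assumed format: YYYY-MM-DD-HH-MM-SS-mmm
    return now.strftime("%Y-%m-%d-%H-%M-%S-") + f"{now.microsecond // 1000:03d}"


fixed = datetime(2026, 1, 31, 17, 56, 20, 670000, tzinfo=timezone.utc)
print(resolve_output_dir({"SM_MODEL_DIR": "/opt/ml/model"}))  # /opt/ml/model
print(resolve_output_dir({}))                                 # ./output
print(choose_version(now=fixed))                              # 2026-01-31-17-56-20-670
```

Whatever version string ends up attached to the upload is the value to copy into `MODEL_VERSION` for inference.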

## Inference

Load artifacts from Artifactory, test locally, then deploy to a SageMaker endpoint. Set `MODEL_VERSION` to the version from training.

### Define the model schema

The schema validates request/response shapes for serving.

In [None]:
from sagemaker.serve.builder.schema_builder import SchemaBuilder

# Create schema builder
sample_input = {"inputs": "What is Kubernetes?", "parameters": {"max_new_tokens": 100}}
sample_output = [{"generated_text": "Kubernetes is a container orchestration platform that simplifies the deployment, scaling, and management of containerized applications."}]
schema_builder = SchemaBuilder(sample_input, sample_output)

print("Schema builder created successfully!")

### Build the model locally

Run a quick check that the inference spec can download and load the artifacts from Artifactory.

In [None]:
import os
import uuid

from sagemaker.serve.model_builder import ModelBuilder
from sagemaker.serve.utils.types import ModelServer
from sagemaker.train.configs import SourceCode

from deployment.inference import DevopsAssistantInferenceSpec

# Configuration
MODEL_NAME_PREFIX = "devops-assistant"
ENDPOINT_NAME_PREFIX = "devops-assistant-endpoint"

# Generate unique identifiers
unique_id = str(uuid.uuid4())[:8]
model_name = f"{MODEL_NAME_PREFIX}-{unique_id}"
endpoint_name = f"{ENDPOINT_NAME_PREFIX}-{unique_id}"


# Environment variables passed to the inference code
inference_env = {
    # Version or path in Artifactory produced by training; replace with the value from your run
    "MODEL_VERSION": "2026-01-31-17-56-20-670",

    # JFrog Artifactory access
    "JF_ACCESS_TOKEN_SECRET_ID": "jfrog/jf_token",
    "JF_URL": JF_URL,
    "JF_REPO": JF_REPO,
    "JF_PROJECT": JF_PROJECT,
}
# Make env available during local build/pickling
os.environ.update(inference_env)  # for local testing

inference_spec = DevopsAssistantInferenceSpec()


In [None]:
from sagemaker.serve.model_builder import Mode

# Create ModelBuilder in IN_PROCESS mode
local_model_builder = ModelBuilder(
    inference_spec=inference_spec,
    #model_server=ModelServer.MMS,  # TorchServe/MMS for HF
    schema_builder=schema_builder,
    env_vars=inference_env,
    mode=Mode.IN_PROCESS,
)

# Build the model
local_model = local_model_builder.build(model_name=model_name)
print(f"Model Successfully Created: {local_model.model_name}")

### Run local inference

Serve in-process to test requests without building or deploying containers.

In [None]:
# Deploy locally in in-process mode
local_endpoint = local_model_builder.deploy_local(endpoint_name=endpoint_name)
print(f"Local Endpoint Successfully Created: {local_endpoint.endpoint_name}")
print("Note: This runs entirely in your Python process - no containers!")

In [None]:
# Test 1: Single prediction

# Text-generation payload (align with schema_builder parameters)
sample_input = {
    "inputs": "Explain Kubernetes in one paragraph.",
    "parameters": {
        "max_new_tokens": 100,
        "min_new_tokens": 20,
        "do_sample": True,
        "temperature": 0.8,
        "top_p": 0.95,
        "top_k": 50,
    },
}

response_1 = local_endpoint.invoke(
    body=sample_input,
    content_type="application/json"
)
print(f"Test 1 - Single prediction: {response_1.body}")

## Deploy to a SageMaker endpoint

Build a SageMaker model that loads Artifactory artifacts at startup using `inference_env`.

### Build the SageMaker model

Package the inference code and register the SageMaker model.

In [None]:

from sagemaker.core import image_uris

INFERENCE_IMAGE = image_uris.retrieve(
    framework="pytorch",
    region=region,
    version="2.7.1",
    py_version="py312",
    image_scope="inference",
    instance_type="ml.m5.xlarge",
)


sagemaker_model_builder = ModelBuilder(
    source_code=SourceCode(
        source_dir="deployment",
        requirements="requirements.txt",
        entry_script="inference.py",
    ),
    inference_spec=inference_spec,
    model_server=ModelServer.MMS,  # Multi Model Server for HuggingFace
    schema_builder=schema_builder,
    env_vars=inference_env,
    role_arn=role,
    image_uri=INFERENCE_IMAGE
)

# Build the model
core_model = sagemaker_model_builder.build(model_name=model_name)
print(f"Model Successfully Created: {core_model.model_name}")

### Deploy a SageMaker endpoint

Create a managed endpoint. Delete it when you are done to avoid ongoing costs.

In [None]:
# Deploy the model
core_endpoint = sagemaker_model_builder.deploy(endpoint_name=endpoint_name)
print(f"Endpoint Successfully Created: {core_endpoint.endpoint_name}")

### Test the endpoint

Send a sample request to confirm the endpoint is serving responses from the Artifactory-backed model.

In [None]:
import json

# Text-generation payload (align with schema_builder parameters)
sample_input = {
    "inputs": "Explain Kubernetes in one paragraph.",
    "parameters": {
        "max_new_tokens": 100,
        "min_new_tokens": 20,
        "do_sample": True,
        "temperature": 0.8,
        "top_p": 0.95,
        "top_k": 50,
    },
}

response_1 = core_endpoint.invoke(
    body=json.dumps(sample_input).encode("utf-8"),
    content_type="application/json"
)

print(f"Test 1 - Single prediction: {response_1.body}")
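
Printing `response_1.body` shows the raw payload. To pull out just the generated text, something like the following sketch may help. It assumes the response follows the `sample_output` shape given to the schema builder (a list with one `generated_text` entry) and that the body may arrive as a stream, bytes, a JSON string, or an already-parsed list. `extract_generated_text` is a hypothetical helper, not part of the SageMaker SDK.

```python
import json
from typing import Any


def extract_generated_text(body: Any) -> str:
    """Return the first generated_text from a response body.

    Accepts a file-like object, bytes, a JSON string, or an already-parsed list.
    """
    if hasattr(body, "read"):
        body = body.read()
    if isinstance(body, (bytes, bytearray)):
        body = body.decode("utf-8")
    if isinstance(body, str):
        body = json.loads(body)
    # Validate the shape before indexing so failures are easy to diagnose.
    if (
        not isinstance(body, list)
        or not body
        or not isinstance(body[0], dict)
        or "generated_text" not in body[0]
    ):
        raise ValueError(f"Unexpected response shape: {body!r}")
    return body[0]["generated_text"]


raw = b'[{"generated_text": "Kubernetes is a container orchestration platform."}]'
print(extract_generated_text(raw))  # Kubernetes is a container orchestration platform.
```

Raising on an unexpected shape makes it easier to tell a serving error payload apart from a normal generation.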

### Cleanup

Delete the SageMaker endpoint when you are done to avoid ongoing charges.
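
One way to do this is with the boto3 SageMaker client. The sketch below is a hypothetical helper, `delete_endpoint_resources`, and it assumes the `endpoint_name` and `model_name` generated earlier in this notebook match the resources SageMaker actually created. The recording stub only stands in for the real client so the example runs offline.

```python
def delete_endpoint_resources(sm_client, endpoint_name: str, model_name: str) -> None:
    """Delete an endpoint, its endpoint configuration, and the model behind it."""
    # Look up the configuration name before the endpoint is gone.
    description = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = description["EndpointConfigName"]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    sm_client.delete_model(ModelName=model_name)


class RecordingClient:
    """Offline stub that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def describe_endpoint(self, EndpointName):
        self.calls.append(("describe_endpoint", EndpointName))
        return {"EndpointConfigName": f"{EndpointName}-config"}

    def delete_endpoint(self, EndpointName):
        self.calls.append(("delete_endpoint", EndpointName))

    def delete_endpoint_config(self, EndpointConfigName):
        self.calls.append(("delete_endpoint_config", EndpointConfigName))

    def delete_model(self, ModelName):
        self.calls.append(("delete_model", ModelName))


stub = RecordingClient()
delete_endpoint_resources(stub, "devops-assistant-endpoint-demo", "devops-assistant-demo")
for call in stub.calls:
    print(call)
```

Against a real account, pass `boto3.client("sagemaker", region_name=region)` in place of the stub. Only the endpoint accrues charges, but removing the endpoint configuration and model keeps the account tidy.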