# Customizing SageMaker PyTorch Framework Container

SageMaker Framework containers support installing dependencies from a `requirements.txt` file submitted with your scripts. However, this might not be desirable if:

- You re-use the same set of dependencies very often in many jobs (since the pip install happens on billable instance time), or
- You want to disable network access and control which packages are available in the environment.

While you could build [fully custom containers](https://github.com/aws/amazon-sagemaker-examples/tree/master/advanced_functionality/custom-training-containers) and even implement the same "framework" pattern in your own images, there's no reason to re-invent the wheel if the goal is just to add a few libraries on top of the SageMaker-provided base!

In [None]:
# Python Built-Ins:
import os
import shutil
from string import Template
import subprocess

# External Dependencies:
import boto3
import sagemaker

botosess = boto3.Session()
account_id = botosess.client("sts").get_caller_identity()["Account"]
region = botosess.region_name
smsess = sagemaker.Session(boto_session=botosess)

## Get parent AWS images

For simple library customizations, we'll just derive containers `FROM` the SageMaker-maintained framework images for training and inference. Here we request the CPU variants, which is determined by the `instance_type` we pass in:

In [None]:
def get_image_uri(scope="training", instance_type="ml.m5.large"):
    return sagemaker.image_uris.retrieve(
        "pytorch",
        region=botosess.region_name,
        version="1.7",
        py_version="py3",
        instance_type=instance_type,
        accelerator_type=None,
        image_scope=scope,
        container_version=None,
        #distribution=None,
        #base_framework_version=None,
    )

train_parent_uri = get_image_uri(scope="training", instance_type="ml.m5.large")
print(f"Training: {train_parent_uri}")
inf_parent_uri = get_image_uri(scope="inference", instance_type="ml.m5.large")
print(f"Serving: {inf_parent_uri}")
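The returned URIs follow the usual ECR layout `<registry>/<repository>:<tag>`, which is why the login step later on can recover the registry with `partition("/")`. As a rough sketch of that structure, the helper below splits a URI into its parts. `split_image_uri` is a hypothetical name (not part of the SageMaker SDK), and the example URI is made up for illustration rather than taken from real `retrieve()` output.

```python
def split_image_uri(uri):
    """Split an ECR image URI into (registry, repository, tag).

    Assumes the plain `<registry>/<repository>:<tag>` form; digest-style
    references (`@sha256:...`) are not handled in this sketch.
    """
    registry, _, remainder = uri.partition("/")
    repository, _, tag = remainder.rpartition(":")
    if not repository:
        # No ":" present, so the whole remainder is the repository name
        repository, tag = remainder, ""
    return registry, repository, tag


# Illustrative URI only: the account ID, region and tag are placeholders
example_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.7-cpu-py3"
registry, repository, tag = split_image_uri(example_uri)
print(registry, repository, tag)
```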

## Common setup

This Docker template expects a `requirements.txt` in the build folder, so we'll just copy in the one from the `src/` folder:

In [None]:
shutil.copy2("src/requirements.txt", "container/requirements.txt")

Log in to ECR:

In [None]:
# Our account:
print(f"Logging in to our account {account_id} ({region})...")
our_docker_registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
!aws ecr get-login-password | docker login --username AWS --password-stdin $our_docker_registry
# Assumes the training & inference parent images live in the same registry (account):
parent_docker_registry = train_parent_uri.partition("/")[0]
print(f"Logging in to {parent_docker_registry}...")
!aws ecr get-login-password | docker login --username AWS --password-stdin $parent_docker_registry

Create the ECR repository:

In [None]:
docker_repo_name = "sagemaker-custom"
our_docker_repo_uri = f"{our_docker_registry}/{docker_repo_name}"

!aws ecr create-repository --repository-name $docker_repo_name
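The CLI call above reports an error if the repository already exists, which is harmless but noisy when the notebook is re-run. One possible alternative, sketched here with the boto3 ECR client, is to treat "already exists" as success. `ensure_ecr_repository` is a hypothetical helper name; it takes the client as an argument, so it assumes you pass in something like `botosess.client("ecr")`.

```python
def ensure_ecr_repository(ecr_client, repository_name):
    """Create the ECR repository if missing; return True if it was created."""
    try:
        ecr_client.create_repository(repositoryName=repository_name)
        return True
    except ecr_client.exceptions.RepositoryAlreadyExistsException:
        # Re-running the notebook: the repository is already there
        return False


# Usage in this notebook (assumes the `botosess` session created earlier):
# ensure_ecr_repository(botosess.client("ecr"), docker_repo_name)
```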

## Build images

The provided template Dockerfile `Dockerfile.tpl` inherits `FROM` a base image and simply installs the contents of `requirements.txt`, so we'll build separate custom images for training and inference:

In [None]:
def init_dockerfile(parent):
    with open("container/Dockerfile.tpl", "r") as ftmp:
        docker_template = Template(ftmp.read())
        with open("container/Dockerfile", "w") as fdocker:
            fdocker.write(docker_template.substitute({
                "BASE_IMAGE": parent,
            }))
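To make the substitution step concrete, here is a small sketch of how `string.Template` fills in `$BASE_IMAGE`. The template text below is an assumption for illustration only: the actual contents of `container/Dockerfile.tpl` are not reproduced in this notebook and may differ.

```python
from string import Template

# Hypothetical template text: the real container/Dockerfile.tpl may differ
assumed_template = Template(
    "FROM $BASE_IMAGE\n"
    "COPY requirements.txt /tmp/requirements.txt\n"
    "RUN pip install -r /tmp/requirements.txt\n"
)

# Placeholder image name, not a real SageMaker URI
rendered = assumed_template.substitute({"BASE_IMAGE": "example-registry/pytorch-training:1.7"})
print(rendered)

# substitute() is strict: an unknown $NAME raises KeyError, so a literal
# dollar sign in a Dockerfile template has to be written as $$
with_literal_dollar = Template("FROM $BASE_IMAGE\nENV PATH=$$PATH:/opt/bin\n")
escaped = with_literal_dollar.substitute(BASE_IMAGE="example-registry/base")
print(escaped)
```

The strictness is worth keeping in mind if the template ever grows shell variables or `ARG`/`ENV` references, since each unescaped `$` would otherwise be read as a placeholder.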

In [None]:
# Build training:
custom_training_uri = f"{our_docker_repo_uri}:training-latest"
init_dockerfile(train_parent_uri)
!cd container && docker build -t custom-pytorch-training -t $custom_training_uri .

In [None]:
# Build inference:
custom_inference_uri = f"{our_docker_repo_uri}:inference-latest"
init_dockerfile(inf_parent_uri)
!cd container && docker build -t custom-pytorch-inference -t $custom_inference_uri .

In [None]:
!docker images

## Push containers to ECR

In [None]:
print(f"Pushing to {custom_training_uri}")
!docker push $custom_training_uri
%store custom_training_uri

print(f"Pushing to {custom_inference_uri}")
!docker push $custom_inference_uri
%store custom_inference_uri

Our customized training and inference container URIs should now be ready to use!

Use them via, for example, the `image_uri` override parameter in the [Estimator](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html#sagemaker.estimator.Estimator) and [Model](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html#sagemaker.model.Model) constructors (inherited by e.g. `PyTorch` and `PyTorchModel`).
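As a minimal sketch of that override, the helper below gathers the constructor arguments for a training job on the custom image. The helper name, the `train.py` entry point and the `src` source folder are assumptions for illustration; substitute whatever your project actually uses. To my understanding, the SDK does not require `framework_version` and `py_version` once `image_uri` is supplied, but check this against the SDK version you have installed.

```python
def custom_image_estimator_args(image_uri, role, instance_type="ml.m5.large"):
    """Collect constructor arguments for a training job on a custom image."""
    return {
        "image_uri": image_uri,  # replaces the default framework image lookup
        "role": role,
        "instance_count": 1,
        "instance_type": instance_type,
        "entry_point": "train.py",  # hypothetical script name
        "source_dir": "src",  # hypothetical source folder
    }


# Usage where the SageMaker SDK and an execution role are available:
# from sagemaker.pytorch import PyTorch
# estimator = PyTorch(
#     **custom_image_estimator_args(custom_training_uri, sagemaker.get_execution_role())
# )
```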