# Register an LLM Base Model with SageMaker Model Registry

## Overview
Large language models (LLMs) such as [Llama2](https://ai.meta.com/llama/) from Meta come as a collection of pretrained and fine-tuned models ranging in scale from 7 billion to 70 billion parameters.
Depending on the parameter count and the floating-point precision used for the model weights, these LLMs can be very large. For instance, a llama2-70b model stored in fp32 (4 bytes per parameter) is about 280 GB. Downloading such weights from the public internet, for example from the Hugging Face Hub, can therefore be slow and inefficient, and the inefficiency grows when multiple team members work on projects that use the same base model. Another challenge is how to organize and manage these open-source LLMs effectively within an organization.
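The 280 GB figure comes from multiplying the parameter count by the number of bytes per parameter. A minimal sketch of that arithmetic, assuming raw weights only (no file-format overhead, tokenizer files, or optimizer state) and decimal gigabytes:

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_size_gb(num_params, precision):
    """Rough size of the raw weights in decimal gigabytes (1 GB = 10**9 bytes).

    Ignores file-format overhead, tokenizer files, and optimizer state.
    """
    return num_params * BYTES_PER_PARAM[precision] / 10**9


# Print rough estimates for the Llama 2 family at two common precisions
for size in (7, 13, 70):
    for precision in ("fp32", "fp16"):
        estimate = weight_size_gb(size * 10**9, precision)
        print(f"llama2-{size}b @ {precision}: ~{estimate:.0f} GB")
```

The fp16 estimate for a 7B model (about 14 GB) is in line with the size of the safetensors download later in this notebook; the real Llama-2-7b has slightly fewer than 7 billion parameters, so the actual files are a bit smaller.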

## Proposed Approach 
In this notebook, we use SageMaker Model Registry to store the weights of the base LLMs. SageMaker Model Registry is a fully managed model repository used to store and version trained machine learning (ML) models at scale. When we fine-tune a base model, we can then download its weights from the base model group in Amazon S3 within our own account instead of from the public internet. SageMaker Model Registry also gives organizations a better model management tool that helps them organize and manage versions of open-source LLMs. Additionally, with the recent support for Model Registry Collections, you can use Collections to group registered models that are related to each other and organize them in hierarchies to improve model discoverability at scale. The following diagram shows SageMaker Model Registry with Collections support:

![sagemaker model registry](images/model-registry-collection.png)


First, install Git LFS and initialize it so that model weights can be downloaded directly from the Hugging Face Hub.

In [2]:
!sudo apt update && sudo apt install git-lfs -y 
!git lfs install --skip-repo

Hit:1 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:2 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
7 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  git-lfs
0 upgraded, 1 newly installed, 0 to remove and 7 not upgraded.
Need to get 3503 kB of archives.
After this operation, 10.4 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 git-lfs amd64 3.0.2-1ubuntu0.2 [3503 kB]
Fetched 3503 kB in 0s (17.5 MB/s)
debconf: delaying package configuration, since apt-utils is not installed

7[0;23r8[1ASelecting previ

Import all the required packages for this notebook.

In [3]:
import os
import sagemaker
from sagemaker.collection import Collection
import boto3

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


Instantiate a new SageMaker session and define the variables.

In [4]:
sm_session = sagemaker.session.Session()
default_bucket = sm_session.default_bucket()
sm_client = boto3.client("sagemaker")
role = sagemaker.get_execution_role()

In [5]:
model_id = "NousResearch/Llama-2-7b-chat-hf"  # Change this value to any other model on the Hugging Face Hub.
base_model_s3_save_loc = f"s3://{default_bucket}/data/{model_id}/basemodel"

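Clone the repository from the Hugging Face Hub without the model weights. Setting `GIT_LFS_SKIP_SMUDGE=1` tells Git LFS to check out only the small pointer files instead of downloading the large files they reference.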


In [6]:
!GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/{model_id}

Cloning into 'Llama-2-7b-chat-hf'...
remote: Enumerating objects: 32, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 32 (delta 4), reused 0 (delta 0), pack-reused 21 (from 1)
Unpacking objects: 100% (32/32), 496.75 KiB | 5.98 MiB/s, done.


In [7]:
model_name = os.path.basename(model_id)

## Download the Model Weights
To download the model weights, run `git lfs pull` inside the cloned repository; this fetches the large files that the clone skipped.
In our example, we download only the safetensors weights and not the PyTorch `.bin` weights, which saves time.

## SafeTensors
At a high level, safetensors is a safe and fast file format for storing and loading tensors. PyTorch model weights are typically saved, or pickled, into a `.bin` file with Python's `pickle` module. However, pickle is not secure: loading a pickled file can execute arbitrary code embedded in it. A safetensors file stores only a JSON header and raw tensor bytes, which makes it a secure alternative to pickle and well suited for sharing model weights.
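To make the "header plus raw bytes" idea concrete, here is a minimal, standard-library-only sketch of the safetensors on-disk layout: an 8-byte little-endian header length, a JSON header describing each tensor's dtype, shape, and byte offsets, then the tensor data. It handles only 1-D float32 tensors and is an illustration of the layout, not a replacement for the `safetensors` library; the function names are hypothetical.

```python
import json
import struct


def write_f32_safetensors(path, tensors):
    """Write 1-D float32 tensors using the safetensors on-disk layout.

    Layout: 8-byte little-endian header length, a JSON header, then raw bytes.
    """
    header, payload = {}, b""
    for name, values in tensors.items():
        data = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": [len(values)],
            # Offsets are relative to the start of the data section
            "data_offsets": [len(payload), len(payload) + len(data)],
        }
        payload += data
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(payload)


def read_f32_tensor(path, name):
    """Read one float32 tensor by parsing JSON and bytes only; no code is executed."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        start, end = header[name]["data_offsets"]
        f.seek(8 + header_len + start)  # skip the length prefix and JSON header
        raw = f.read(end - start)
    return list(struct.unpack(f"<{(end - start) // 4}f", raw))
```

Unlike unpickling, reading this format never calls back into Python objects chosen by the file's author, which is the property that makes it safe to share.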


In [8]:
!cd {model_name} && git lfs pull --include "*.safetensors"

Downloading LFS objects: 100% (2/2), 14 GB | 88 MB/s                            
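Because the repository was cloned with `GIT_LFS_SKIP_SMUDGE=1`, any LFS-tracked file that was not pulled is still a small text pointer that begins with `version https://git-lfs.github.com/spec/v1`. A minimal sketch for listing such stubs, so you can confirm the safetensors files are real weights before packaging; `find_lfs_pointers` is a hypothetical helper, not part of Git LFS:

```python
import os

# Git LFS pointer files are small text stubs that begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(repo_dir):
    """Return repo-relative paths of files that are still LFS pointer stubs."""
    stubs = []
    for root, dirs, files in os.walk(repo_dir):
        dirs[:] = [d for d in dirs if d != ".git"]  # skip Git's internal object store
        for fname in files:
            path = os.path.join(root, fname)
            if os.path.getsize(path) > 1024:  # pointer files are well under 1 KB
                continue
            with open(path, "rb") as f:
                if f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                    stubs.append(os.path.relpath(path, repo_dir))
    return sorted(stubs)
```

Calling `find_lfs_pointers(model_name)` after the pull should not report any `*.safetensors` file. The `pytorch_model-*.bin` files would be reported, as could other LFS-tracked files that were not pulled, depending on the repository's `.gitattributes`.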

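At the time of this writing, SageMaker Model Registry requires the model weights to be packaged as a `tar.gz` file. The following cell removes the Git metadata and creates the `tar.gz` file containing the model files.

*Note:* Because the model weights are large, creating the `tar.gz` file can take some time. In our experiment, the process took about 40 minutes. The `pytorch_model-*.bin` files that appear in the archive listing are small Git LFS pointer files rather than weights, because only the safetensors files were pulled.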

In [9]:
%%time
!cd {model_name} && rm -rf .git* && tar -cvzf ../model.tar.gz .

./
./LICENSE.txt
./README.md
./USE_POLICY.md
./added_tokens.json
./config.json
./generation_config.json
./model-00001-of-00002.safetensors
./model-00002-of-00002.safetensors
./model.safetensors.index.json
./pytorch_model-00001-of-00003.bin
./pytorch_model-00002-of-00003.bin
./pytorch_model-00003-of-00003.bin
./pytorch_model.bin.index.json
./special_tokens_map.json
./tokenizer.json
./tokenizer.model
./tokenizer_config.json
CPU times: user 36.4 s, sys: 3.52 s, total: 39.9 s
Wall time: 39min 18s
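As an alternative to the shell `tar` command, the archive can be built with Python's `tarfile` module. The sketch below is an assumption-laden variation, not the notebook's method: it skips Git metadata and, via a hypothetical default exclusion list, the PyTorch `.bin` entries (which in this notebook are only LFS pointer stubs). It also uses `compresslevel=1`, trading a somewhat larger archive for speed, since weight files typically compress poorly with gzip anyway.

```python
import fnmatch
import os
import tarfile

# Hypothetical exclusion list: Git metadata plus the PyTorch .bin entries, which in
# this notebook are only LFS pointer stubs because just the safetensors were pulled.
DEFAULT_EXCLUDES = (".git*", "*.bin", "pytorch_model.bin.index.json")


def package_model_dir(model_dir, output_path, excludes=DEFAULT_EXCLUDES, compresslevel=1):
    """Write model_dir's contents to a tar.gz and return the archived names.

    output_path should live outside model_dir so the archive does not include itself.
    """
    added = []
    with tarfile.open(output_path, "w:gz", compresslevel=compresslevel) as tar:
        for root, dirs, files in os.walk(model_dir):
            # Prune excluded directories (e.g. .git) so os.walk never enters them
            dirs[:] = sorted(d for d in dirs if not any(fnmatch.fnmatch(d, p) for p in excludes))
            for fname in sorted(files):
                if any(fnmatch.fnmatch(fname, p) for p in excludes):
                    continue
                full_path = os.path.join(root, fname)
                # Store paths relative to model_dir so files sit at the archive root
                arcname = os.path.relpath(full_path, model_dir)
                tar.add(full_path, arcname=arcname)
                added.append(arcname)
    return added
```

For example, `package_model_dir(model_name, "model.tar.gz")` would produce an archive with the model files at its root, like the `tar -cvzf ../model.tar.gz .` cell above, but without the pointer stubs.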


### Upload the model artifacts to the S3 bucket

In [10]:
%%time
model_data_uri = sagemaker.s3.S3Uploader.upload(
    local_path="./model.tar.gz",
    desired_s3_uri=base_model_s3_save_loc,
)
print(model_data_uri)

s3://sagemaker-us-east-1-866824485776/data/NousResearch/Llama-2-7b-chat-hf/basemodel/model.tar.gz
CPU times: user 59.7 s, sys: 59.8 s, total: 1min 59s
Wall time: 1min 33s


## SageMaker Model Registry Collection
There are two ways to create a Model Registry collection:

1. Use the SageMaker Studio UI
2. Use the SageMaker Python SDK

We'll use the SageMaker Python SDK to create a collection in this notebook.

In [11]:
from sagemaker.collection import Collection
model_collector = Collection(sagemaker_session=sm_session)

# Step 1
We'll first create a model package group for the base LLM model. After the model package group is created, we can add the base model as a version. 

In [12]:
# Model Package Group Vars
base_model_group_name = f"{model_id.replace('/', '-')}"
base_model_group_desc = f"Source: https://huggingface.co/{model_id}"
base_tags = [
    { 
        "Key": "modelType",
        "Value": "BaseModel"
    },
    { 
        "Key": "fineTuned",
        "Value": "False"
    },
    { 
        "Key": "sourceDataset",
        "Value": "None"
    }
]

model_package_group_input_dict = {
    "ModelPackageGroupName" : base_model_group_name,
    "ModelPackageGroupDescription" : base_model_group_desc,
    "Tags": base_tags
}
create_model_package_group_response = sm_client.create_model_package_group(
    **model_package_group_input_dict
)
print(f'Created ModelPackageGroup Arn : {create_model_package_group_response["ModelPackageGroupArn"]}')

# Note: despite its name, this variable holds the model package group ARN
base_model_pkg_group_name = create_model_package_group_response["ModelPackageGroupArn"]

Created ModelPackageGroup Arn : arn:aws:sagemaker:us-east-1:866824485776:model-package-group/NousResearch-Llama-2-7b-chat-hf
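The group name above is derived with `model_id.replace('/', '-')`, which works for this model but not for IDs that contain characters such as `.` or `_` (for example, a `-v0.1` suffix). A minimal sketch of a stricter mapping, assuming the `ModelPackageGroupName` constraints listed in the SageMaker `CreateModelPackageGroup` API reference (1 to 63 characters matching `^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$`); verify the pattern against the current documentation. `to_model_package_group_name` is a hypothetical helper:

```python
import re

# Assumption: ModelPackageGroupName pattern from the CreateModelPackageGroup API reference
GROUP_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def to_model_package_group_name(model_id):
    """Map a Hugging Face model id such as 'org/name-v0.1' to a valid group name."""
    name = re.sub(r"[^a-zA-Z0-9-]", "-", model_id)  # '/', '.', '_' and others -> '-'
    name = name[:63].strip("-")  # at most 63 chars, no leading or trailing hyphen
    if not GROUP_NAME_PATTERN.match(name):
        raise ValueError(f"Cannot derive a valid group name from {model_id!r}")
    return name
```

For this notebook's model ID the result is identical to the `replace` call, so switching helpers would not rename the existing group.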


# Step 2
Register a model version for the base model using the model package group created in the previous step.

In [13]:
from sagemaker.huggingface import HuggingFaceModel

# Create a Hugging Face Model object that points to the uploaded model.tar.gz
huggingface_model = HuggingFaceModel(
    transformers_version='4.28.1',
    pytorch_version='2.0.0',
    py_version='py310',
    model_data=model_data_uri,
    role=role
)

In [14]:
_response = huggingface_model.register(
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=[
        "ml.p2.16xlarge",
        "ml.p3.16xlarge",
        "ml.g4dn.4xlarge",
        "ml.g4dn.8xlarge",
        "ml.g4dn.12xlarge",
        "ml.g4dn.16xlarge",
    ],
    transform_instances=[
        "ml.p2.16xlarge", 
        "ml.p3.16xlarge",
        "ml.g4dn.4xlarge", 
        "ml.g4dn.8xlarge",
        "ml.g4dn.12xlarge",
        "ml.g4dn.16xlarge",
    ],
    model_package_group_name=base_model_pkg_group_name,
    approval_status="Approved"
)

Store the base model package group ARN with the `%store` magic so that it can be reused in other notebooks.

In [15]:
%store base_model_pkg_group_name

Stored 'base_model_pkg_group_name' (str)
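A downstream notebook (for example, a fine-tuning notebook) can use the stored group to locate the newest approved base model version and its `model.tar.gz` in S3. A minimal sketch using the boto3 SageMaker client's `list_model_packages` and `describe_model_package` calls; the helper name is hypothetical, and it assumes the version was registered with a `ModelDataUrl`, as the `register` call above does:

```python
def latest_approved_model_data_url(sm_client, group_name):
    """Return (package ARN, model data S3 URI) of the newest Approved version, or None.

    sm_client is expected to be a boto3 SageMaker client (or any object exposing
    the same list_model_packages / describe_model_package methods).
    """
    resp = sm_client.list_model_packages(
        ModelPackageGroupName=group_name,
        ModelApprovalStatus="Approved",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = resp["ModelPackageSummaryList"]
    if not summaries:
        return None  # no approved versions in this group
    package_arn = summaries[0]["ModelPackageArn"]
    desc = sm_client.describe_model_package(ModelPackageName=package_arn)
    # Assumption: the first inference container carries the S3 location of model.tar.gz
    model_data_url = desc["InferenceSpecification"]["Containers"][0]["ModelDataUrl"]
    return package_arn, model_data_url
```

In a later notebook you might run `%store -r base_model_pkg_group_name` and then call `latest_approved_model_data_url(boto3.client("sagemaker"), base_model_pkg_group_name)`, assuming the API accepts the group ARN as well as its name.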


# Step 3
In this step, we'll first create a new SageMaker Model Registry Collection using the SageMaker Python SDK.
After the collection is created, we'll associate the model package group created in Step 1 with this collection.

In [19]:
# Ensure the execution role has the IAM permissions that Model Registry Collections require:
# https://docs.aws.amazon.com/sagemaker/latest/dg/modelcollections-permissions.html

In [23]:
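# Create the model collection. Collection.create raises a ValueError when a
# collection with the same name already exists (for example, when re-running this cell).
model_group_collection = f"{base_model_group_name}-collection"
try:
    base_collection = model_collector.create(
        collection_name=model_group_collection
    )
    collection_id = base_collection["Arn"]
except ValueError as err:
    print(f"{err}; reusing the existing collection")
    collection_id = model_group_collection  # add_model_groups accepts a name or ARN

In [25]:
# Associate the base model package group with the collection
_response = model_collector.add_model_groups(
    collection_name=collection_id,
    model_groups=[base_model_pkg_group_name]
)

print(f"Model collection creation status: {_response}")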

Model collection creation status: {'added_groups': ['arn:aws:sagemaker:us-east-1:866824485776:model-package-group/NousResearch-Llama-2-7b-chat-hf'], 'failure': []}
