Model Store - in this assignment we will create:
* a Model Group that tracks all experiments and the deployment history for our model,
* a Model Package that records information about a specific model experiment and deployment,
* a Model Card that contains qualitative information for anyone who needs to maintain the model after its initial development.

This assignment should be relatively easy. Feel free to use lab code or code from your final project to complete this assignment. We simply want to take an existing model and add it to our Model Store.

In [6]:
import boto3
import sagemaker
import pandas as pd
# import numpy as np

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


In [7]:
# Initialize the SageMaker session and look up the bucket, role, region and account
sess = sagemaker.Session()
bucket = sess.default_bucket()
role = sagemaker.get_execution_role()
region = boto3.Session().region_name
account_id = boto3.client("sts").get_caller_identity().get("Account")
boto_session = boto3.Session(region_name=region)

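# Low-level boto3 SageMaker client. This is different from the high-level `sagemaker` SDK
# session created above; the Model Group, Model Package and Model Card calls below use this client.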
sagemaker_client = boto_session.client(service_name="sagemaker", region_name=region)

#### Creating Model Group
Give your Model Group a name that says what the group is for, e.g. xgboost-breast-cancer-detection,
and a brief description that explains its purpose in a bit more detail (best practice is under ~250 chars).
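
The naming guidance above can be checked before calling the API. The sketch below is a hypothetical helper, not part of the lab code. The name pattern (1 to 63 characters, alphanumerics separated by hyphens) is my reading of the CreateModelPackageGroup API reference and should be verified against the current docs; the 250-character limit is the guideline from this notebook, not an API limit. The example description is a placeholder.

```python
import re

# Assumption: name constraint as I understand it from the CreateModelPackageGroup
# API reference (1-63 chars, alphanumerics separated by hyphens). Verify before relying on it.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
# Guideline from this notebook, not an API limit.
MAX_DESCRIPTION_CHARS = 250


def check_group_metadata(name, description):
    """Return a list of problems; an empty list means both values look fine."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"invalid group name: {name!r}")
    if not description.strip():
        problems.append("description is empty")
    elif len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(
            f"description is {len(description)} chars; aim for <= {MAX_DESCRIPTION_CHARS}"
        )
    return problems


# Placeholder description, for illustration only.
print(check_group_metadata(
    "xgboost-breast-cancer-detection",
    "Tracks XGBoost experiments and deployments for breast cancer detection.",
))  # → []
```

Running a check like this first gives a readable message instead of a validation error from the service.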


In [None]:
# Creating model package group
model_package_group_name = 'test-lab-model-group'
model_package_group_description = 'This group will be used to test out storage of models'
response = sagemaker_client.create_model_package_group(
    ModelPackageGroupName=model_package_group_name,
    ModelPackageGroupDescription=model_package_group_description,
    Tags=[
        {
            'Key': 'MonthCreated',
            'Value': 'September'
        },
    ]
)

In [14]:
# Verify part 1
response = sagemaker_client.describe_model_package_group(
    ModelPackageGroupName=model_package_group_name
)
print(response)

{'ModelPackageGroupName': 'test-lab-model-group', 'ModelPackageGroupArn': 'arn:aws:sagemaker:us-east-1:936912055594:model-package-group/test-lab-model-group', 'ModelPackageGroupDescription': 'This group will be used to test out storage of models', 'CreationTime': datetime.datetime(2024, 9, 30, 22, 45, 2, 925000, tzinfo=tzlocal()), 'CreatedBy': {'IamIdentity': {'Arn': 'arn:aws:sts::936912055594:assumed-role/LabRole/SageMaker', 'PrincipalId': 'AROA5UJCPNUVM4RORLTYL:SageMaker'}}, 'ModelPackageGroupStatus': 'Completed', 'ResponseMetadata': {'RequestId': 'e3964823-70bd-4b1f-86d8-d759ed678992', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'e3964823-70bd-4b1f-86d8-d759ed678992', 'content-type': 'application/x-amz-json-1.1', 'content-length': '455', 'date': 'Mon, 30 Sep 2024 22:46:42 GMT'}, 'RetryAttempts': 0}}


#### Creating Model Package

The Model Package contains specific details about our current model. It should document model deployment information (container image, model data source, i.e. our binary artifact, data source, any pre-processor or post-processor scripts, etc.). After we learn more about model monitoring, we can also include model quality, data quality, bias and explainability reports here.

ref: https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-version.html

In [26]:
response = sagemaker_client.create_model_package(
    ModelPackageGroupName=model_package_group_name,
    ModelPackageDescription='Initial model package deployed',
    InferenceSpecification={
        'Containers': [
            {
                'ContainerHostname': 'Container-1',
                'Image': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.7-1',
                'ModelDataUrl': 's3://sagemaker-us-east-1-936912055594/DEMO-breast-cancer-prediction-xgboost-highlevel/output/xgb-2024-09-30-20-57-19/xgb-2024-09-30-20-57-19/output/model.tar.gz',
                'ProductId': '10',
                'Environment': {
                    'string': 'test'
                },
                'ModelInput': {
                    'DataInputConfig': 'batch'
                },
                'Framework': 'XGBOOST',
                'FrameworkVersion': '1.7',
                'NearestModelName': 'xgboost'
            },
        ],
        'SupportedTransformInstanceTypes': [
            'ml.m4.2xlarge'
        ],
        'SupportedRealtimeInferenceInstanceTypes': [
            'ml.m5.xlarge'
        ],
        'SupportedContentTypes': [
            'text/csv',
        ],
        'SupportedResponseMIMETypes': [
            'text/csv',
        ]
    }
)

In [34]:
sagemaker_client.list_model_packages(ModelPackageGroupName=model_package_group_name)


{'ModelPackageSummaryList': [{'ModelPackageGroupName': 'test-lab-model-group',
   'ModelPackageVersion': 1,
   'ModelPackageArn': 'arn:aws:sagemaker:us-east-1:936912055594:model-package/test-lab-model-group/1',
   'ModelPackageDescription': 'Initial model package deployed',
   'CreationTime': datetime.datetime(2024, 9, 30, 23, 17, 52, 790000, tzinfo=tzlocal()),
   'ModelPackageStatus': 'Completed'}],
 'ResponseMetadata': {'RequestId': '37006c2a-3f00-4553-9186-2d58ce09768b',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '37006c2a-3f00-4553-9186-2d58ce09768b',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '323',
   'date': 'Mon, 30 Sep 2024 23:29:27 GMT'},
  'RetryAttempts': 0}}
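
Rather than copying the ARN by hand into the next cell, the newest version can be picked out of the `list_model_packages` response. This is a minimal sketch: `latest_model_package_arn` is a hypothetical helper, and it assumes all versions fit in a single response page (the API is paginated, so a larger group would need to follow `NextToken`). The sample dictionary is a trimmed copy of the output printed above.

```python
def latest_model_package_arn(list_response):
    """Return the ARN of the highest-numbered version in a list_model_packages response."""
    summaries = list_response.get("ModelPackageSummaryList", [])
    if not summaries:
        raise ValueError("no model packages found in the response")
    newest = max(summaries, key=lambda s: s["ModelPackageVersion"])
    return newest["ModelPackageArn"]


# Trimmed copy of the response shown above.
sample_response = {
    "ModelPackageSummaryList": [
        {
            "ModelPackageGroupName": "test-lab-model-group",
            "ModelPackageVersion": 1,
            "ModelPackageArn": "arn:aws:sagemaker:us-east-1:936912055594:model-package/test-lab-model-group/1",
            "ModelPackageStatus": "Completed",
        }
    ]
}

print(latest_model_package_arn(sample_response))
```

In the notebook, the returned value could be passed as `ModelPackageName` to `describe_model_package`.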

In [36]:
# verify model package
response = sagemaker_client.describe_model_package(
    ModelPackageName='arn:aws:sagemaker:us-east-1:936912055594:model-package/test-lab-model-group/1'
)
print(response)

{'ModelPackageGroupName': 'test-lab-model-group', 'ModelPackageVersion': 1, 'ModelPackageArn': 'arn:aws:sagemaker:us-east-1:936912055594:model-package/test-lab-model-group/1', 'ModelPackageDescription': 'Initial model package deployed', 'CreationTime': datetime.datetime(2024, 9, 30, 23, 17, 52, 790000, tzinfo=tzlocal()), 'InferenceSpecification': {'Containers': [{'ContainerHostname': 'Container-1', 'Image': '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.7-1', 'ImageDigest': 'sha256:f037aa7389a000dc611e723fb227d21a07cf495fd5e7bd8292a260ae101b5546', 'ModelDataUrl': 's3://sagemaker-us-east-1-936912055594/DEMO-breast-cancer-prediction-xgboost-highlevel/output/xgb-2024-09-30-20-57-19/xgb-2024-09-30-20-57-19/output/model.tar.gz', 'Environment': {'string': 'test'}, 'ModelInput': {'DataInputConfig': 'batch'}, 'Framework': 'XGBOOST', 'FrameworkVersion': '1.7', 'NearestModelName': 'xgboost'}], 'SupportedTransformInstanceTypes': ['ml.m4.2xlarge'], 'SupportedRealtimeInferenceIns

#### Creating Model Card
The Model Card will contain qualitative details about our current model. The Model Card can contain a lot of information. At a minimum, it should contain details of what the model algorithm is, how the model was trained, what hyperparameters were used to train the model, what the input features for the model are, who the model owner is (you), what problem the model is trying to solve, intended uses of the model, evaluation details of the model, and so on.

ref: https://docs.aws.amazon.com/sagemaker/latest/dg/model-cards-create.html
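
As a rough illustration of the items listed above, the sketch below assembles card content as a Python dictionary and serializes it to the JSON string that `create_model_card` expects in `Content`. The field names come from the model card JSON schema reproduced in this notebook. `build_model_card_content` is a hypothetical helper, and every value in the example call is a placeholder, not a detail of the actual model.

```python
import json


def build_model_card_content(description, owner, algorithm, problem, purpose, hyper_parameters):
    """Assemble model card content with schema field names and return it as a JSON string."""
    card = {
        "model_overview": {
            "model_description": description,
            "model_owner": owner,
            "algorithm_type": algorithm,
            "problem_type": problem,
        },
        "intended_uses": {
            "purpose_of_model": purpose,
            # The schema allows only High, Medium, Low or Unknown here.
            "risk_rating": "Unknown",
        },
        "training_details": {
            "training_job_details": {
                # The schema requires hyperparameter values to be strings.
                "user_provided_hyper_parameters": [
                    {"name": name, "value": str(value)}
                    for name, value in hyper_parameters.items()
                ],
            },
        },
    }
    return json.dumps(card)


# Placeholder values for illustration only; replace them with your model's details.
card_json = build_model_card_content(
    description="XGBoost classifier for breast cancer prediction.",
    owner="your-name",
    algorithm="XGBoost",
    problem="Binary Classification",
    purpose="Course assignment on model stores.",
    hyper_parameters={"max_depth": 5, "eta": 0.2},
)
print(card_json)
```

Sections such as `evaluation_details` and `business_details` can be added in the same way once that information is available.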

In [3]:
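# JSON schema for model card content, kept here for reference.
# A model card's Content must conform to this schema; the schema itself is not a card.
content = {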
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "http://json-schema.org/draft-07/schema#",
  "title": "SageMakerModelCardSchema",
  "description": "Internal model card schema for SageMakerRepositoryService without model_package_details",
  "version": "0.1.0",
  "type": "object",
  "additionalProperties": False,
  "properties": {
    "model_overview": {
      "description": "Overview about the model",
      "type": "object",
      "additionalProperties": False,
      "properties": {
        "model_description": {
          "description": "description of model",
          "type": "string",
          "maxLength": 1024
        },
        "model_creator": {
          "description": "Creator of model",
          "type": "string",
          "maxLength": 1024
        },
        "model_artifact": {
          "description": "Location of the model artifact",
          "type": "array",
          "maxContains": 15,
          "items": {
            "type": "string",
            "maxLength": 1024
          }
        },
        "algorithm_type": {
          "description": "Algorithm used to solve the problem",
          "type": "string",
          "maxLength": 1024
        },
        "problem_type": {
          "description": "Problem being solved with the model",
          "type": "string"
        },
        "model_owner": {
          "description": "Owner of model",
          "type": "string",
          "maxLength": 1024
        }
      }
    },
    "intended_uses": {
      "description": "Intended usage of model",
      "type": "object",
      "additionalProperties": False,
      "properties": {
        "purpose_of_model": {
          "description": "Why the model was developed?",
          "type": "string",
          "maxLength": 2048
        },
        "intended_uses": {
          "description": "intended use cases",
          "type": "string",
          "maxLength": 2048
        },
        "factors_affecting_model_efficiency": {
          "type": "string",
          "maxLength": 2048
        },
        "risk_rating": {
          "description": "Risk rating for model card",
          "$ref": "#/definitions/risk_rating"
        },
        "explanations_for_risk_rating": {
          "type": "string",
          "maxLength": 2048
        }
      }
    },
    "business_details": {
      "description": "Business details of model",
      "type": "object",
      "additionalProperties": False,
      "properties": {
        "business_problem": {
          "description": "What business problem does the model solve?",
          "type": "string",
          "maxLength": 2048
        },
        "business_stakeholders": {
          "description": "Business stakeholders",
          "type": "string",
          "maxLength": 2048
        },
        "line_of_business": {
          "type": "string",
          "maxLength": 2048
        }
      }
    },
    "training_details": {
      "description": "Overview about the training",
      "type": "object",
      "additionalProperties": False,
      "properties": {
        "objective_function": {
          "description": "the objective function the model will optimize for",
          "function": {
            "$ref": "#/definitions/objective_function"
          },
          "notes": {
            "type": "string",
            "maxLength": 1024
          }
        },
        "training_observations": {
          "type": "string",
          "maxLength": 1024
        },
        "training_job_details": {
          "type": "object",
          "additionalProperties": False,
          "properties": {
            "training_arn": {
              "description": "SageMaker Training job arn",
              "type": "string",
              "maxLength": 1024
            },
            "training_datasets": {
              "description": "Location of the model datasets",
              "type": "array",
              "maxContains": 15,
              "items": {
                "type": "string",
                "maxLength": 1024
              }
            },
            "training_environment": {
              "type": "object",
              "additionalProperties": False,
              "properties": {
                "container_image": {
                  "description": "SageMaker training image uri",
                  "type": "array",
                  "maxContains": 15,
                  "items": {
                    "type": "string",
                    "maxLength": 1024
                  }
                }
              }
            },
            "training_metrics": {
              "type": "array",
              "items": {
                "maxItems": 50,
                "$ref": "#/definitions/training_metric"
              }
            },
            "user_provided_training_metrics": {
              "type": "array",
              "items": {
                "maxItems": 50,
                "$ref": "#/definitions/training_metric"
              }
            },
            "hyper_parameters": {
              "type": "array",
              "items": {
                "maxItems": 100,
                "$ref": "#/definitions/training_hyper_parameter"
              }
            },
            "user_provided_hyper_parameters": {
              "type": "array",
              "items": {
                "maxItems": 100,
                "$ref": "#/definitions/training_hyper_parameter"
              }
            }
          }
        }
      }
    },
    "evaluation_details": {
      "type": "array",
      "default": [],
      "items": {
        "type": "object",
        "required": [
          "name"
        ],
        "additionalProperties": False,
        "properties": {
          "name": {
            "type": "string",
            "pattern": ".{1,63}"
          },
          "evaluation_observation": {
            "type": "string",
            "maxLength": 2096
          },
          "evaluation_job_arn": {
            "type": "string",
            "maxLength": 256
          },
          "datasets": {
            "type": "array",
            "items": {
              "type": "string",
              "maxLength": 1024
            },
            "maxItems": 10
          },
          "metadata": {
            "description": "additional attributes associated with the evaluation results",
            "type": "object",
            "additionalProperties": {
              "type": "string",
              "maxLength": 1024
            }
          },
          "metric_groups": {
            "type": "array",
            "default": [],
            "items": {
              "type": "object",
              "required": [
                "name",
                "metric_data"
              ],
              "properties": {
                "name": {
                  "type": "string",
                  "pattern": ".{1,63}"
                },
                "metric_data": {
                  "type": "array",
                  "items": {
                    "anyOf": [
                      {
                        "$ref": "#/definitions/simple_metric"
                      },
                      {
                        "$ref": "#/definitions/linear_graph_metric"
                      },
                      {
                        "$ref": "#/definitions/bar_chart_metric"
                      },
                      {
                        "$ref": "#/definitions/matrix_metric"
                      }
                    ]

                  }
                }
              }
            }
          }
        }
      }
    },
    "additional_information": {
      "additionalProperties": False,
      "type": "object",
      "properties": {
        "ethical_considerations": {
          "description": "Any ethical considerations that the author wants to provide",
          "type": "string",
          "maxLength": 2048
        },
        "caveats_and_recommendations": {
          "description": "Caveats and recommendations for people who might use this model in their applications.",
          "type": "string",
          "maxLength": 2048
        },
        "custom_details": {
          "type": "object",
          "additionalProperties": {
            "$ref": "#/definitions/custom_property"
          }
        }
      }
    }
  },
  "definitions": {
    "source_algorithms": {
      "type": "array",
      "minContains": 1,
      "maxContains": 1,
      "items": {
        "type": "object",
        "additionalProperties": False,
        "required": [
          "algorithm_name"
        ],
        "properties": {
          "algorithm_name": {
            "description": "The name of an algorithm that was used to create the model package. The algorithm must be either an algorithm resource in your SageMaker account or an algorithm in AWS Marketplace that you are subscribed to.",
            "type": "string",
            "maxLength": 170
          },
          "model_data_url": {
            "description": "The Amazon S3 path where the model artifacts, which result from model training, are stored.",
            "type": "string",
            "maxLength": 1024
          }
        }
      }
    },
    "inference_specification": {
      "type": "object",
      "additionalProperties": False,
      "required": [
        "containers"
      ],
      "properties": {
        "containers": {
          "description": "Contains inference related information which were used to create model package.",
          "type": "array",
          "minContains": 1,
          "maxContains": 15,
          "items": {
            "type": "object",
            "additionalProperties": False,
            "required": [
              "image"
            ],
            "properties": {
              "model_data_url": {
                "description": "The Amazon S3 path where the model artifacts, which result from model training, are stored.",
                "type": "string",
                "maxLength": 1024
              },
              "image": {
                "description": "Inference environment path. The Amazon EC2 Container Registry (Amazon ECR) path where inference code is stored.",
                "type": "string",
                "maxLength": 255
              },
              "nearest_model_name": {
                "description": "The name of a pre-trained machine learning benchmarked by Amazon SageMaker Inference Recommender model that matches your model.",
                "type": "string"
              }
            }
          }
        }
      }
    },
    "risk_rating": {
      "description": "Risk rating of model",
      "type": "string",
      "enum": [
        "High",
        "Medium",
        "Low",
        "Unknown"
      ]
    },
    "custom_property": {
      "description": "Additional property in section",
      "type": "string",
      "maxLength": 1024
    },
    "objective_function": {
      "description": "objective function that training job is optimized for",
      "additionalProperties": False,
      "properties": {
        "function": {
          "type": "string",
          "enum": [
            "Maximize",
            "Minimize"
          ]
        },
        "facet": {
          "type": "string",
          "maxLength": 63
        },
        "condition": {
          "type": "string",
          "maxLength": 63
        }
      }
    },
    "training_metric": {
      "description": "training metric data",
      "type": "object",
      "required": [
        "name",
        "value"
      ],
      "additionalProperties": False,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "value": {
          "type": "number"
        }
      }
    },
    "training_hyper_parameter": {
      "description": "training hyper parameter",
      "type": "object",
      "required": [
        "name"
      ],
      "additionalProperties": False,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "value": {
          "type": "string",
          "pattern": ".{0,255}"
        }
      }
    },
    "linear_graph_metric": {
      "type": "object",
      "required": [
        "name",
        "type",
        "value"
      ],
      "additionalProperties": False,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "type": {
          "type": "string",
          "enum": [
            "linear_graph"
          ]
        },
        "value": {
          "anyOf": [
            {
              "type": "array",
              "items": {
                "type": "array",
                "items": {
                  "type": "number"
                },
                "minItems": 2,
                "maxItems": 2
              },
              "minItems": 1
            }
          ]
        },
        "x_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        },
        "y_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        }
      }
    },
    "bar_chart_metric": {
      "type": "object",
      "required": [
        "name",
        "type",
        "value"
      ],
      "additionalProperties": False,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "type": {
          "type": "string",
          "enum": [
            "bar_chart"
          ]
        },
        "value": {
          "anyOf": [
            {
              "type": "array",
              "items": {
                "type": "number"
              },
              "minItems": 1
            }
          ]
        },
        "x_axis_name": {
          "$ref": "#/definitions/axis_name_array"
        },
        "y_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        }
      }
    },
    "matrix_metric": {
      "type": "object",
      "required": [
        "name",
        "type",
        "value"
      ],
      "additionalProperties": False,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "type": {
          "type": "string",
          "enum": [
            "matrix"
          ]
        },
        "value": {
          "anyOf": [
            {
              "type": "array",
              "items": {
                "type": "array",
                "items": {
                  "type": "number"
                },
                "minItems": 1,
                "maxItems": 20
              },
              "minItems": 1,
              "maxItems": 20
            }
          ]
        },
        "x_axis_name": {
          "$ref": "#/definitions/axis_name_array"
        },
        "y_axis_name": {
          "$ref": "#/definitions/axis_name_array"
        }
      }
    },
    "simple_metric": {
      "description": "metric data",
      "type": "object",
      "required": [
        "name",
        "type",
        "value"
      ],
      "additionalProperties": False,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "type": {
          "type": "string",
          "enum": [
            "number",
            "string",
            "boolean"
          ]
        },
        "value": {
          "anyOf": [
            {
              "type": "number"
            },
            {
              "type": "string",
              "maxLength": 63
            },
            {
              "type": "boolean"
            }
          ]
        },
        "x_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        },
        "y_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        }
      }
    },
    "axis_name_array": {
      "type": "array",
      "items": {
        "type": "string",
        "maxLength": 63
      }
    },
    "axis_name_string": {
      "type": "string",
      "maxLength": 63
    }
  }
}               

In [None]:
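# Model card
import json

model_card_name = 'MC-test-name'

# Content must be a JSON string that conforms to the schema in `content` above.
# This minimal card reuses details already recorded in the model package.
model_card_content = {
    'model_overview': {
        'algorithm_type': 'XGBoost',
        'model_artifact': [
            's3://sagemaker-us-east-1-936912055594/DEMO-breast-cancer-prediction-xgboost-highlevel/output/xgb-2024-09-30-20-57-19/xgb-2024-09-30-20-57-19/output/model.tar.gz'
        ]
    }
}
# SecurityConfig is optional; pass a real KMS key ID only if the card needs your own key.
response = sagemaker_client.create_model_card(
    ModelCardName=model_card_name,
    Content=json.dumps(model_card_content),
    ModelCardStatus='Draft',
    Tags=[
        {
            'Key': 'Class',
            'Value': 'mlOps'
        },
    ]
)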

In [None]:
# Verify model card
response = sagemaker_client.describe_model_card(
    ModelCardName=model_card_name
)
print(response)
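
The `Content` field in a `describe_model_card` response is a JSON string, so it has to be parsed before the card's sections can be inspected. The sketch below shows one way to do that. `summarize_model_card` is a hypothetical helper, and `sample_response` is a hand-written stand-in containing only the keys the helper reads, not real output from this notebook.

```python
import json


def summarize_model_card(describe_response):
    """Parse the JSON Content string and report the card's name, status and sections."""
    card = json.loads(describe_response["Content"])
    return {
        "name": describe_response["ModelCardName"],
        "status": describe_response["ModelCardStatus"],
        "sections": sorted(card.keys()),
    }


# Hand-written stand-in for a describe_model_card response (illustration only).
sample_response = {
    "ModelCardName": "MC-test-name",
    "ModelCardStatus": "Draft",
    "Content": json.dumps({"model_overview": {"algorithm_type": "XGBoost"}}),
}

print(summarize_model_card(sample_response))
```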