# Download and Save the Model

To use this model from other locations, such as other notebooks or the model server, upload it to S3-compatible storage.
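
The upload cells later in this notebook read their connection settings from environment variables. As a minimal sketch, assuming those five variable names are the only settings you need, a check like the following can report which ones are missing before any upload starts. The `missing_settings` helper and the example environment are hypothetical and not part of the original notebook.

```python
import os

# Variable names read by the helper cell in this notebook.
REQUIRED_SETTINGS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_ENDPOINT",
    "AWS_DEFAULT_REGION",
    "AWS_S3_BUCKET",
)


def missing_settings(env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# Example with a hypothetical, partially filled environment.
example_env = {"AWS_ACCESS_KEY_ID": "demo", "AWS_S3_BUCKET": "demo-bucket"}
print(missing_settings(example_env))
```

Calling `missing_settings()` with no argument checks the real process environment instead of the example dictionary.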

## Install the required packages

In [1]:
!pip install boto3 botocore

[notice] A new release of pip available: 22.2.2 -> 24.3.1
[notice] To update, run: pip install --upgrade pip

## Download from Hugging Face

In [2]:
# !git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
!hfd mistralai/Mistral-7B-Instruct-v0.2 --exclude .bin --hf_username $HF_USERNAME --hf_token $HF_TOKEN --tool aria2c -x 8

Downloading to ./Mistral-7B-Instruct-v0.2
Test GIT_REFS_URL: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/info/refs?service=git-upload-pack
git clone https://<your id>:<your token>@huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
Cloning into 'Mistral-7B-Instruct-v0.2'...
remote: Enumerating objects: 74, done.
remote: Counting objects: 100% (74/74), done.
remote: Compressing objects: 100% (52/52), done.
remote: Total 74 (delta 35), reused 48 (delta 21), pack-reused 0 (from 0)
Unpacking objects: 100% (74/74), 481.51 KiB | 5.80 MiB/s, done.

Start Downloading lfs files, bash script:
aria2c --header="Authorization: Bearer <your token>" -x 8 -s 8 -k 1M -c "https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/main/model-00001-of-00003.safetensors" -d "." -o "model-00001-of-00003.safetensors"
aria2c --header="Authorization: Bearer <your token>" -x 8 -s 8 -k 1M -c "https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/main/model-00002-of-0000

## Helper functions for upload

In [3]:
import os
import boto3
import botocore

aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID')
aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
endpoint_url = os.environ.get('AWS_S3_ENDPOINT')
region_name = os.environ.get('AWS_DEFAULT_REGION')
bucket_name = os.environ.get('AWS_S3_BUCKET')

session = boto3.session.Session(aws_access_key_id=aws_access_key_id,
                                aws_secret_access_key=aws_secret_access_key)

s3_resource = session.resource(
    's3',
    config=botocore.client.Config(signature_version='s3v4'),
    endpoint_url=endpoint_url,
    region_name=region_name)

bucket = s3_resource.Bucket(bucket_name)

# Upload the model directory, skipping any path that contains ".git"
def upload_directory_to_s3(local_directory, s3_prefix):
    for root, dirs, files in os.walk(local_directory):
        for filename in files:
            file_path = os.path.join(root, filename)
            relative_path = os.path.relpath(file_path, local_directory)
            if ".git" in relative_path:
                continue
            s3_key = os.path.join(s3_prefix, relative_path)
            print(f"{file_path} -> {s3_key}")
            bucket.upload_file(file_path, s3_key)


def list_objects(prefix):
    # Print the key of every object stored under the given prefix
    for obj in bucket.objects.filter(Prefix=prefix):
        print(obj.key)
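
To see how local paths turn into object keys before anything is sent to the bucket, a dry-run variant can help. The following is a minimal sketch, not the notebook's own helper: `plan_upload` and `is_git_path` are hypothetical names, and the sketch skips only files inside a `.git` directory and joins keys with forward slashes on every platform.

```python
import os
import posixpath
import tempfile


def is_git_path(relative_path):
    """Return True if any path component is the .git directory."""
    return ".git" in relative_path.split(os.sep)


def plan_upload(local_directory, s3_prefix):
    """Return sorted (local_path, s3_key) pairs without contacting S3."""
    plan = []
    for root, _dirs, files in os.walk(local_directory):
        for filename in files:
            file_path = os.path.join(root, filename)
            relative_path = os.path.relpath(file_path, local_directory)
            if is_git_path(relative_path):
                continue
            # Object keys always use "/" regardless of the local OS.
            key = posixpath.join(s3_prefix, *relative_path.split(os.sep))
            plan.append((file_path, key))
    return sorted(plan)


# Build a tiny throwaway directory to show the mapping.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "demo-model")
    os.makedirs(os.path.join(model_dir, ".git"))
    for name in ("config.json", os.path.join(".git", "HEAD"), ".gitattributes"):
        with open(os.path.join(model_dir, name), "w") as handle:
            handle.write("placeholder")
    planned_keys = [key for _path, key in plan_upload(model_dir, "models/demo-model")]

print(planned_keys)
```

Printing the plan first makes it easy to spot an unexpected prefix or a stray file before a multi-gigabyte upload begins.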

## Check the Storage Bucket

Call the `list_objects` function to list what is already stored under the `models` prefix in your S3 bucket. As a best practice, keep only one model and its required files in a given prefix or directory so that model files do not get mixed up. You can then download and serve a single directory that contains every file the model requires.

If this is the first time you run the code, this cell prints either nothing or only the fraud model files from the predictive AI/ML exercise.
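
The one-model-per-prefix practice can be checked from a key listing. As a rough sketch, assuming keys follow a `models/<model name>/<file>` layout, the hypothetical `group_keys_by_model` helper below groups keys by model directory. The example keys are illustrative only, and the `fraud` entry is an invented path.

```python
from collections import defaultdict


def group_keys_by_model(keys, root="models"):
    """Group object keys by the first path component under `root`."""
    groups = defaultdict(list)
    prefix = root.rstrip("/") + "/"
    for key in keys:
        if not key.startswith(prefix):
            continue
        model, _, rest = key[len(prefix):].partition("/")
        if rest:  # ignore objects stored directly under the root
            groups[model].append(rest)
    return dict(groups)


example_keys = [
    "models/Mistral-7B-Instruct-v0.2/config.json",
    "models/Mistral-7B-Instruct-v0.2/tokenizer.json",
    "models/fraud/model.onnx",  # hypothetical second model
]
grouped = group_keys_by_model(example_keys)
for model, files in grouped.items():
    print(model, len(files))
```

Each dictionary entry corresponds to one directory that a model server could download on its own.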


In [4]:
list_objects("models")

models/Mistral-7B-Instruct-v0.2/README.md
models/Mistral-7B-Instruct-v0.2/config.json
models/Mistral-7B-Instruct-v0.2/generation_config.json
models/Mistral-7B-Instruct-v0.2/model-00001-of-00003.safetensors
models/Mistral-7B-Instruct-v0.2/model-00002-of-00003.safetensors
models/Mistral-7B-Instruct-v0.2/model-00003-of-00003.safetensors
models/Mistral-7B-Instruct-v0.2/model.safetensors.index.json
models/Mistral-7B-Instruct-v0.2/pytorch_model-00001-of-00003.bin
models/Mistral-7B-Instruct-v0.2/pytorch_model-00002-of-00003.bin
models/Mistral-7B-Instruct-v0.2/pytorch_model-00003-of-00003.bin
models/Mistral-7B-Instruct-v0.2/pytorch_model.bin.index.json
models/Mistral-7B-Instruct-v0.2/special_tokens_map.json
models/Mistral-7B-Instruct-v0.2/tokenizer.json
models/Mistral-7B-Instruct-v0.2/tokenizer.model
models/Mistral-7B-Instruct-v0.2/tokenizer_config.json


## Upload and check again

In [5]:
upload_directory_to_s3("Mistral-7B-Instruct-v0.2", "models/Mistral-7B-Instruct-v0.2")

Mistral-7B-Instruct-v0.2/README.md -> models/Mistral-7B-Instruct-v0.2/README.md
Mistral-7B-Instruct-v0.2/model-00003-of-00003.safetensors -> models/Mistral-7B-Instruct-v0.2/model-00003-of-00003.safetensors
Mistral-7B-Instruct-v0.2/model.safetensors.index.json -> models/Mistral-7B-Instruct-v0.2/model.safetensors.index.json
Mistral-7B-Instruct-v0.2/model-00001-of-00003.safetensors -> models/Mistral-7B-Instruct-v0.2/model-00001-of-00003.safetensors
Mistral-7B-Instruct-v0.2/pytorch_model-00002-of-00003.bin -> models/Mistral-7B-Instruct-v0.2/pytorch_model-00002-of-00003.bin
Mistral-7B-Instruct-v0.2/special_tokens_map.json -> models/Mistral-7B-Instruct-v0.2/special_tokens_map.json
Mistral-7B-Instruct-v0.2/tokenizer.model -> models/Mistral-7B-Instruct-v0.2/tokenizer.model
Mistral-7B-Instruct-v0.2/pytorch_model.bin.index.json -> models/Mistral-7B-Instruct-v0.2/pytorch_model.bin.index.json
Mistral-7B-Instruct-v0.2/tokenizer.json -> models/Mistral-7B-Instruct-v0.2/tokenizer.json
Mistral-7B-Instr

The call above uses `upload_directory_to_s3` to upload the model folder recursively. To confirm that it worked, run the `list_objects` function again:

In [6]:
list_objects("models")

models/Mistral-7B-Instruct-v0.2/README.md
models/Mistral-7B-Instruct-v0.2/config.json
models/Mistral-7B-Instruct-v0.2/generation_config.json
models/Mistral-7B-Instruct-v0.2/model-00001-of-00003.safetensors
models/Mistral-7B-Instruct-v0.2/model-00002-of-00003.safetensors
models/Mistral-7B-Instruct-v0.2/model-00003-of-00003.safetensors
models/Mistral-7B-Instruct-v0.2/model.safetensors.index.json
models/Mistral-7B-Instruct-v0.2/pytorch_model-00001-of-00003.bin
models/Mistral-7B-Instruct-v0.2/pytorch_model-00002-of-00003.bin
models/Mistral-7B-Instruct-v0.2/pytorch_model-00003-of-00003.bin
models/Mistral-7B-Instruct-v0.2/pytorch_model.bin.index.json
models/Mistral-7B-Instruct-v0.2/special_tokens_map.json
models/Mistral-7B-Instruct-v0.2/tokenizer.json
models/Mistral-7B-Instruct-v0.2/tokenizer.model
models/Mistral-7B-Instruct-v0.2/tokenizer_config.json
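
Rather than comparing the listing by eye, you could diff the keys you expected to upload against the keys the bucket reports. This is a minimal sketch with a hypothetical `find_missing_keys` helper and hand-written example lists. In the notebook, the remote list could be collected with `[obj.key for obj in bucket.objects.filter(Prefix="models")]`.

```python
def find_missing_keys(expected_keys, remote_keys):
    """Return expected keys that do not appear in the remote listing, sorted."""
    return sorted(set(expected_keys) - set(remote_keys))


# Hypothetical inputs that stand in for a real plan and a real bucket listing.
expected = [
    "models/Mistral-7B-Instruct-v0.2/config.json",
    "models/Mistral-7B-Instruct-v0.2/tokenizer.json",
]
remote = ["models/Mistral-7B-Instruct-v0.2/config.json"]

missing = find_missing_keys(expected, remote)
print(missing)
```

An empty result means every expected key is present. The check does not compare file sizes or contents.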


## Push to Model Registry

In [None]:
!pip install model-registry=="0.2.14"
#!pip install kserve=="0.13"

from model_registry import ModelRegistry

In [None]:
# Get the registry URL from the annotations on the model registry service

registry = ModelRegistry(
    server_address="https://test-rest.apps.cluster-rr2bp.rr2bp.sandbox199.opentlc.com",
    author="test"
)

In [None]:
rm = registry.register_model(
    "Mistral",
    "minio-api-minio.apps.cluster-rr2bp.rr2bp.sandbox199.opentlc.com/models/models/Mistral-7B-Instruct-v0.2",
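    model_format_name="sklearn",  # example value; Mistral is not a scikit-learn model, so set the format you actually serve
    model_format_version="1",
    version="v1",
    description="Model for code assistance",
    metadata={
        "accuracy": 3.14,  # example value, not a measured metric
        "license": "Apache-2.0",  # Mistral-7B-Instruct-v0.2 is released under Apache 2.0
    }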
)