# Download and Save the Model

To save this model so that you can use it from various locations, including other notebooks or the model server, upload it to S3-compatible storage.

## Install the required packages

In [1]:
import warnings
from urllib3.exceptions import InsecureRequestWarning

# Suppress only the InsecureRequestWarning from urllib3
warnings.filterwarnings("ignore", category=InsecureRequestWarning)

In [2]:
!pip install boto3 botocore


## Download from Hugging Face

In [3]:
import os

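# Read the Hugging Face credentials from the environment instead of hard-coding a token
HUGGINGFACE_USER = os.getenv("HUGGINGFACE_USER")
HUGGINGFACE_TOKEN = os.getenv("HUGGINGFACE_TOKEN")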

#git_repo = "https://huggingface.co/ibm/merlinite-7b"
#git_repo = "https://huggingface.co/ibm/granite-7b-base"

# git_repo = "https://huggingface.co/instructlab/merlinite-7b-lab"
# git_repo = "https://huggingface.co/instructlab/granite-7b-lab"

# git_repo = "https://huggingface.co/google/flan-t5-small"

# git_repo = "https://huggingface.co/codellama/CodeLlama-7b-hf"

# git_repo = "https://huggingface.co/mosaicml/mpt-7b-chat"
# git_repo = "https://huggingface.co/mosaicml/mpt-7b-instruct"

#HUGGINGFACE_USER = os.getenv("HUGGINGFACE_USER")
#HUGGINGFACE_TOKEN = os.getenv("HUGGINGFACE_TOKEN")

#git_repo = f"https://{HUGGINGFACE_USER}:{HUGGINGFACE_TOKEN}@huggingface.co/mistralai/Mistral-7B-Instruct-v0.2"

git_repo = f"https://{HUGGINGFACE_USER}:{HUGGINGFACE_TOKEN}@huggingface.co/meta-llama/Llama-2-7b-chat-hf"
# git_repo = f"https://{HUGGINGFACE_USER}:{HUGGINGFACE_TOKEN}@huggingface.co/meta-llama/Llama-2-13b-chat-hf"
!git config pull.rebase false
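Embedding a Hugging Face token directly in a notebook cell exposes it to anyone who can read the notebook, and `git` typically records the remote URL, credentials included, in the clone's `.git/config`. A minimal sketch, assuming the credentials are exported as the `HUGGINGFACE_USER` and `HUGGINGFACE_TOKEN` environment variables (the names used in the commented-out `os.getenv` lines above): it builds the authenticated clone URL and a redacted copy that is safe to print or log. `build_clone_url` and `redact_url` are hypothetical helper names, not part of any library.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def build_clone_url(repo_id, user, token, host="huggingface.co"):
    """Return an HTTPS clone URL with user:token embedded in the userinfo part."""
    # Percent-encode the credentials so characters such as '@' or ':' cannot break the URL.
    return f"https://{quote(user, safe='')}:{quote(token, safe='')}@{host}/{repo_id}"


def redact_url(url):
    """Return the URL with any 'user:token@' userinfo replaced by '***@'."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return url  # nothing to hide
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment))
```

For example, set `git_repo = build_clone_url("meta-llama/Llama-2-7b-chat-hf", os.environ["HUGGINGFACE_USER"], os.environ["HUGGINGFACE_TOKEN"])` and print only `redact_url(git_repo)`. Percent-encoding is a defensive choice in case a username contains reserved characters such as `@`.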


In [4]:
!git clone $git_repo

fatal: destination path 'Llama-2-7b-chat-hf' already exists and is not an empty directory.
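Re-running the clone cell fails, as the output above shows, because `git clone` refuses to write into an existing non-empty directory. A minimal sketch of a guard that only clones when the destination is missing or empty; `clone_command` and `clone_if_missing` are hypothetical helpers, and running the command through `subprocess.run` is one assumed alternative to the `!git clone` shell escape.

```python
import os
import subprocess


def clone_command(git_repo, dest):
    """Return the argv for `git clone git_repo dest`, or None if dest is non-empty."""
    # git clone refuses to write into an existing, non-empty directory.
    if os.path.isdir(dest) and os.listdir(dest):
        return None
    return ["git", "clone", git_repo, dest]


def clone_if_missing(git_repo, dest):
    """Clone git_repo into dest unless dest already has content; return True if cloned."""
    cmd = clone_command(git_repo, dest)
    if cmd is None:
        # Print only dest, never the URL, which may carry a token.
        print(f"{dest} already exists, skipping clone")
        return False
    subprocess.run(cmd, check=True)  # raises CalledProcessError if git fails
    return True
```

For example, `clone_if_missing(git_repo, "Llama-2-7b-chat-hf")` skips the clone on later runs instead of printing the `fatal:` error. Note that an existing directory is not checked for completeness; delete it first if a previous clone was interrupted.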


In [5]:
import os

model_name = os.path.basename(git_repo)
model_name

'Llama-2-7b-chat-hf'
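`os.path.basename(git_repo)` works for the URL above, but it returns an empty string when the URL ends with `/` and keeps a `.git` suffix if one is present. A minimal sketch, using a hypothetical `repo_dir_name` helper, that looks only at the URL path (so embedded credentials never matter) and strips both, roughly matching the directory name `git clone` picks by default for these URLs.

```python
import posixpath
from urllib.parse import urlsplit


def repo_dir_name(git_repo):
    """Return the last path component of git_repo, without a trailing '/' or '.git'."""
    # urlsplit separates any user:token@ prefix from the path.
    path = urlsplit(git_repo).path.rstrip("/")
    name = posixpath.basename(path)
    return name[:-4] if name.endswith(".git") else name
```

With this helper, `model_name = repo_dir_name(git_repo)` gives `'Llama-2-7b-chat-hf'` for the current URL and stays correct for URLs with a trailing slash.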

## Helper functions for upload
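The next cell reads its S3 settings with `os.environ.get`, which silently returns `None` for unset variables. boto3 then falls back to its own defaults (for example, the public AWS endpoint instead of your S3-compatible storage), which can produce confusing errors later. A minimal sketch of an up-front check, assuming the five variable names that the cell reads; `missing_env_vars` and `require_env_vars` are hypothetical helpers.

```python
import os

# The variable names read by the helper cell; adjust if your storage needs fewer.
REQUIRED_S3_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_ENDPOINT",
    "AWS_DEFAULT_REGION",
    "AWS_S3_BUCKET",
)


def missing_env_vars(names=REQUIRED_S3_VARS, environ=None):
    """Return the names that are unset or empty, in the order given."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


def require_env_vars(names=REQUIRED_S3_VARS, environ=None):
    """Raise RuntimeError listing every missing variable, instead of failing later in boto3."""
    missing = missing_env_vars(names, environ)
    if missing:
        raise RuntimeError("Missing S3 settings: " + ", ".join(missing))
```

Calling `require_env_vars()` before creating the session reports all missing settings at once. Treating an empty string as missing is a deliberate choice, since an empty endpoint or bucket name is never useful.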

In [6]:
import os
import boto3
import botocore

aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID')
aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
endpoint_url = os.environ.get('AWS_S3_ENDPOINT')
region_name = os.environ.get('AWS_DEFAULT_REGION')
bucket_name = os.environ.get('AWS_S3_BUCKET')

session = boto3.session.Session(aws_access_key_id=aws_access_key_id,
                                aws_secret_access_key=aws_secret_access_key)

s3_resource = session.resource(
    's3',
    config=botocore.client.Config(signature_version='s3v4'),
    endpoint_url=endpoint_url,
    region_name=region_name)

bucket = s3_resource.Bucket(bucket_name)

# Upload the model directory, skipping git metadata (.git/, .gitattributes)
def upload_directory_to_s3(local_directory, s3_prefix, remove_safetensors=False):
    for root, dirs, files in os.walk(local_directory):
        for filename in files:
            file_path = os.path.join(root, filename)
            relative_path = os.path.relpath(file_path, local_directory)
            if ".git" in relative_path:
                print(f"skipping {relative_path}")
                continue
            # Optionally skip the .safetensors files, e.g. to avoid storing the weights
            # twice when the repository also ships .bin shards
            if remove_safetensors and ".safetensors" in relative_path:
                print(f"skipping {relative_path}")
                continue
            s3_key = os.path.join(s3_prefix, relative_path)
            print(f"{file_path} -> {s3_key}")
            bucket.upload_file(file_path, s3_key)


def list_objects(prefix):
    """Print the key of every object under the given prefix."""
    for obj in bucket.objects.filter(Prefix=prefix):
        print(obj.key)

def delete_subdirectory(s3_prefix):
    """
    Deletes all objects under a specified S3 prefix, effectively removing the subdirectory.
    """
    objects_to_delete = bucket.objects.filter(Prefix=s3_prefix)
    delete_keys = {'Objects': [{'Key': obj.key} for obj in objects_to_delete]}
    if delete_keys['Objects']:
        bucket.delete_objects(Delete=delete_keys)
        print(f"Deleted all objects in directory {s3_prefix}")
    else:
        print(f"No objects found in directory {s3_prefix}")


# Optional cleanup: remove a model uploaded earlier under this prefix
delete_subdirectory('models/granite-7b-base/')

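`upload_directory_to_s3` decides what to upload and uploads it in the same loop, so the file-to-key mapping is only visible once gigabytes are already in flight. A minimal sketch of a separate planning step, using a hypothetical `plan_uploads` helper and only the standard library. It differs from the original in two assumed ways: it skips entries whose *name* starts with `.git` (the original `".git" in relative_path` test also skips any file whose path merely contains `.git`), and it builds keys with `/`, since `os.path.join` would insert `\` on Windows.

```python
import os
import posixpath


def plan_uploads(local_directory, s3_prefix, skip_safetensors=False):
    """Return sorted (local_path, s3_key) pairs for every file to upload."""
    plan = []
    for root, dirs, files in os.walk(local_directory):
        # Prune .git* directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if not d.startswith(".git")]
        for filename in files:
            if filename.startswith(".git"):
                continue  # e.g. .gitattributes
            if skip_safetensors and filename.endswith(".safetensors"):
                continue
            local_path = os.path.join(root, filename)
            relative = os.path.relpath(local_path, local_directory)
            # S3 keys always use '/', whatever the local OS separator is.
            key = posixpath.join(s3_prefix, *relative.split(os.sep))
            plan.append((local_path, key))
    return sorted(plan, key=lambda item: item[1])
```

You could review the plan first and then run `for path, key in plan_uploads(model_name, f"models/{model_name}"): bucket.upload_file(path, key)`.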
## Check the Storage Bucket

Run the `list_objects` function to see what is stored under the `models` prefix of your S3 bucket. As a best practice, keep only one model and its required files in a given prefix or directory so that model files do not get mixed up. This lets you download and serve a single directory that contains every file the model requires.

If this is the first time you run the code, this cell produces no output, or lists only the fraud model from the predictive AI/ML exercise.
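To check the one-model-per-prefix practice at a glance, a minimal sketch that groups listed keys by the directory directly under `models/`. `group_by_model` is a hypothetical helper; it assumes keys of the form `models/<model>/<file>` and ignores keys that sit directly under `models/` or outside it.

```python
from collections import defaultdict


def group_by_model(keys, root="models"):
    """Group keys shaped like '<root>/<model>/<file...>' into {model: [file, ...]}."""
    prefix = root.rstrip("/") + "/"
    groups = defaultdict(list)
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the root prefix
        model, sep, rest = key[len(prefix):].partition("/")
        if sep and rest:
            groups[model].append(rest)  # keep listing order within each model
    return dict(groups)
```

With the bucket from the helper cell, `group_by_model(obj.key for obj in bucket.objects.filter(Prefix="models/"))` returns one entry per model directory, which makes stray files or unexpected directories easy to spot.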


In [7]:
list_objects("models")

models/Mistral-7B-Instruct-v0.2/README.md
models/Mistral-7B-Instruct-v0.2/config.json
models/Mistral-7B-Instruct-v0.2/generation_config.json
models/Mistral-7B-Instruct-v0.2/model-00001-of-00003.safetensors
models/Mistral-7B-Instruct-v0.2/model-00002-of-00003.safetensors
models/Mistral-7B-Instruct-v0.2/model-00003-of-00003.safetensors
models/Mistral-7B-Instruct-v0.2/model.safetensors.index.json
models/Mistral-7B-Instruct-v0.2/pytorch_model-00001-of-00003.bin
models/Mistral-7B-Instruct-v0.2/pytorch_model-00002-of-00003.bin
models/Mistral-7B-Instruct-v0.2/pytorch_model-00003-of-00003.bin
models/Mistral-7B-Instruct-v0.2/pytorch_model.bin.index.json
models/Mistral-7B-Instruct-v0.2/special_tokens_map.json
models/Mistral-7B-Instruct-v0.2/tokenizer.json
models/Mistral-7B-Instruct-v0.2/tokenizer.model
models/Mistral-7B-Instruct-v0.2/tokenizer_config.json
models/fraud/1/model.onnx
models/granite-7b-base/README.md
models/granite-7b-base/config.json
models/granite-7b-base/generation_config.json
mo

## Upload and check again

Use the function to recursively upload the local model directory to the `models/` prefix, under a subdirectory named after the model:

In [8]:
upload_directory_to_s3(model_name, f"models/{model_name}")

Llama-2-7b-chat-hf/README.md -> models/Llama-2-7b-chat-hf/README.md
skipping .gitattributes
Llama-2-7b-chat-hf/tokenizer.json -> models/Llama-2-7b-chat-hf/tokenizer.json
Llama-2-7b-chat-hf/config.json -> models/Llama-2-7b-chat-hf/config.json
Llama-2-7b-chat-hf/generation_config.json -> models/Llama-2-7b-chat-hf/generation_config.json
Llama-2-7b-chat-hf/pytorch_model-00002-of-00002.bin -> models/Llama-2-7b-chat-hf/pytorch_model-00002-of-00002.bin
Llama-2-7b-chat-hf/tokenizer.model -> models/Llama-2-7b-chat-hf/tokenizer.model
Llama-2-7b-chat-hf/model.safetensors.index.json -> models/Llama-2-7b-chat-hf/model.safetensors.index.json
Llama-2-7b-chat-hf/pytorch_model.bin.index.json -> models/Llama-2-7b-chat-hf/pytorch_model.bin.index.json
Llama-2-7b-chat-hf/model-00002-of-00002.safetensors -> models/Llama-2-7b-chat-hf/model-00002-of-00002.safetensors
Llama-2-7b-chat-hf/LICENSE.txt -> models/Llama-2-7b-chat-hf/LICENSE.txt
Llama-2-7b-chat-hf/tokenizer_config.json -> models/Llama-2-7b-chat-hf/to
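Running the upload cell again re-sends every file, including the multi-gigabyte weight shards. A minimal sketch of a cheap skip check, using a hypothetical `files_to_upload` helper: it compares local and remote sizes per key and returns only keys that are missing or differ. Matching sizes do not prove matching contents, so treat this as a heuristic for resuming an interrupted upload, not as verification.

```python
def files_to_upload(local_sizes, remote_sizes):
    """Return sorted keys that are absent remotely or whose size differs.

    Both arguments map S3 key -> size in bytes.
    """
    return sorted(
        key for key, size in local_sizes.items()
        if remote_sizes.get(key) != size
    )
```

The remote sizes could come from `{obj.key: obj.size for obj in bucket.objects.filter(Prefix=f"models/{model_name}/")}` and the local ones from `os.path.getsize` on each file you plan to upload.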

To confirm that the upload worked, run the `list_objects` function again. This time, you should see the model's files listed under its own prefix:


In [9]:
list_objects("models")

models/Llama-2-7b-chat-hf/LICENSE.txt
models/Llama-2-7b-chat-hf/README.md
models/Llama-2-7b-chat-hf/USE_POLICY.md
models/Llama-2-7b-chat-hf/config.json
models/Llama-2-7b-chat-hf/generation_config.json
models/Llama-2-7b-chat-hf/model-00002-of-00002.safetensors
models/Llama-2-7b-chat-hf/model.safetensors.index.json
models/Llama-2-7b-chat-hf/pytorch_model-00002-of-00002.bin
models/Llama-2-7b-chat-hf/pytorch_model.bin.index.json
models/Llama-2-7b-chat-hf/special_tokens_map.json
models/Llama-2-7b-chat-hf/tokenizer.json
models/Llama-2-7b-chat-hf/tokenizer.model
models/Llama-2-7b-chat-hf/tokenizer_config.json
models/Mistral-7B-Instruct-v0.2/README.md
models/Mistral-7B-Instruct-v0.2/config.json
models/Mistral-7B-Instruct-v0.2/generation_config.json
models/Mistral-7B-Instruct-v0.2/model-00001-of-00003.safetensors
models/Mistral-7B-Instruct-v0.2/model-00002-of-00003.safetensors
models/Mistral-7B-Instruct-v0.2/model-00003-of-00003.safetensors
models/Mistral-7B-Instruct-v0.2/model.safetensors.inde
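Scanning a long listing by eye makes it easy to miss a single weight shard. A minimal sketch, using a hypothetical `compare_listing` helper, that compares the keys you meant to upload (for example, the `s3_key` values printed by `upload_directory_to_s3`) with the keys actually listed under the model's prefix.

```python
def compare_listing(expected_keys, listed_keys, prefix):
    """Return (missing, extra) as sorted lists.

    missing: expected keys that are not in the listing.
    extra:   listed keys under `prefix` that were not expected.
    """
    expected = set(expected_keys)
    listed = {key for key in listed_keys if key.startswith(prefix)}
    return sorted(expected - listed), sorted(listed - expected)
```

End `prefix` with `/` (for example `f"models/{model_name}/"`) so that a sibling directory such as `models/Llama-2-7b-chat-hf-v2/` is not counted. The listed keys can come from `[obj.key for obj in bucket.objects.filter(Prefix=prefix)]`.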

### Next Step

Now that you have saved the model to S3 storage, you can use the same data connection to refer to the model when you serve it as an API.
