# Save the Model

To save this model so that you can use it from other locations, such as other notebooks or the model server, upload it to S3-compatible storage. In this notebook, that storage is a lakeFS repository, which you access through the lakeFS S3 gateway.

## Install the required packages and define a function for the upload

In [None]:
!pip install boto3 botocore lakefs==0.7.1

### Define lakeFS Repository

In [None]:
import os
import lakefs
repo_name = os.environ.get('LAKEFS_REPO_NAME')

mainBranch = "main"
trainingBranch = "train01"

# -----------------------------------------------------------------------------
# lakeFS CONTROL PLANE: Reference the repository (versioned AI namespace)
# What this does: Creates a client-side handle to the lakeFS repository that
# versions all datasets and model artifacts for this workflow.
# Why it matters: The repository is the control-plane boundary where lineage,
# commits, and promotions are tracked across training and serving.
# AI data control plane value: Establishes a single governed namespace tying
# model artifacts back to the exact data versions used to produce them.
# -----------------------------------------------------------------------------
repo = lakefs.Repository(repo_name)
print(repo)

In [None]:
import boto3
import botocore

aws_access_key_id = os.environ.get('LAKECTL_CREDENTIALS_ACCESS_KEY_ID')
aws_secret_access_key = os.environ.get('LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY')
endpoint_url = os.environ.get('LAKECTL_SERVER_ENDPOINT_URL')
region_name = os.environ.get('LAKEFS_DEFAULT_REGION')
bucket_name = os.environ.get('LAKEFS_REPO_NAME')

if not all([aws_access_key_id, aws_secret_access_key, endpoint_url, region_name, bucket_name]):
    raise ValueError("One or more data connection variables are empty.  "
                     "Please check your data connection to an S3 bucket.")

# -----------------------------------------------------------------------------
# lakeFS DATA PLANE (S3 gateway configuration)
# What this does: Configures boto3 to send standard S3 API calls to the lakeFS
# endpoint rather than directly to object storage.
# Why it matters: All object uploads still use native S3 semantics, but lakeFS
# transparently versions these writes and associates them with branches/commits.
# AI data control plane value: Preserves compatibility with existing ML tooling
# while enabling versioned, auditable model storage.
# -----------------------------------------------------------------------------
session = boto3.session.Session(aws_access_key_id=aws_access_key_id,
                                aws_secret_access_key=aws_secret_access_key)

s3_resource = session.resource(
    's3',
    config=botocore.client.Config(signature_version='s3v4'),
    endpoint_url=endpoint_url,
    region_name=region_name)

bucket = s3_resource.Bucket(bucket_name)


def upload_directory_to_s3(local_directory, s3_prefix):
    """Upload every file under local_directory, keeping its relative layout."""
    num_files = 0
    for root, _dirs, files in os.walk(local_directory):
        for filename in files:
            file_path = os.path.join(root, filename)
            relative_path = os.path.relpath(file_path, local_directory)
            # S3 keys always use "/", whatever the local OS path separator is.
            s3_key = "/".join([s3_prefix] + relative_path.split(os.sep))
            print(f"{file_path} -> {s3_key}")
            bucket.upload_file(file_path, s3_key)
            num_files += 1
    return num_files


def list_objects(prefix):
    """Print the key of every object whose key starts with prefix."""
    for obj in bucket.objects.filter(Prefix=prefix):
        print(obj.key)
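The cell above only reports that *some* connection variable is empty. A minimal sketch that names the missing ones instead could look like the following; `missing_settings` and `REQUIRED_SETTINGS` are hypothetical helpers that simply mirror the environment variables the cell reads.

```python
import os

# Hypothetical list mirroring the variables read in the cell above.
REQUIRED_SETTINGS = (
    "LAKECTL_CREDENTIALS_ACCESS_KEY_ID",
    "LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY",
    "LAKECTL_SERVER_ENDPOINT_URL",
    "LAKEFS_DEFAULT_REGION",
    "LAKEFS_REPO_NAME",
)


def missing_settings(env=None, names=REQUIRED_SETTINGS):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]
```

For example, `missing = missing_settings()` followed by `if missing: raise ValueError("Missing: " + ", ".join(missing))` tells you exactly which part of the data connection to fix.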

## Verify the upload

Run the `list_objects` function to list the objects under the `models` prefix of the training branch in your S3 bucket. As a best practice, keep only one model and its required files in a given prefix or directory, so that you don't mix up model files. This practice allows you to download and serve a directory with all the files that a model requires.

If this is the first time running the code, this cell will have no output.

If you've already uploaded your model, you should see this output (keys returned through the lakeFS S3 gateway start with the branch name): `train01/models/fraud/1/model.onnx`


In [None]:
list_objects(f"{trainingBranch}/models")
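To check the one-model-per-prefix practice described above, you could group the listed keys by the directory that holds a model file. This is a minimal sketch that assumes models are stored as `.onnx` files; `model_directories` is a hypothetical helper, and the example keys follow the layout shown above.

```python
import posixpath


def model_directories(keys, model_suffix=".onnx"):
    """Return the sorted directories (S3 key prefixes) that contain a model file."""
    # S3 keys use "/" separators, so posixpath splits them correctly on any OS.
    return sorted({posixpath.dirname(key) for key in keys if key.endswith(model_suffix)})


keys = [
    "train01/models/fraud/1/model.onnx",
    "train01/models/fraud/1/config.json",
]
print(model_directories(keys))
```

With a real bucket, you could pass `[obj.key for obj in bucket.objects.filter(Prefix=f"{trainingBranch}/models")]`; more than one directory in the result means the prefix holds several models.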

## Upload the model to the training branch in lakeFS and check again

Use the function to upload the `models` folder recursively:

In [None]:
local_models_directory = "models"

if not os.path.isdir(local_models_directory):
    raise ValueError(f"The directory '{local_models_directory}' does not exist.  "
                     "Did you finish training the model in the previous notebook?")

# -----------------------------------------------------------------------------
# lakeFS: Upload model artifacts into a branch namespace
# What this does: Uploads the trained model files into the trainingBranch path
# (s3://<repo>/<branch>/models/...).
# Why it matters: Writing into a branch path stages model changes without
# impacting the main branch used for serving or other experiments.
# AI data control plane value: Enables safe, isolated model experimentation and
# validation before promotion, using standard S3 tooling.
# -----------------------------------------------------------------------------
num_files = upload_directory_to_s3(local_models_directory, f"{trainingBranch}/models")

if num_files == 0:
    raise ValueError("No files uploaded.  Did you finish training and "
                     "saving the model to the \"models\" directory?  "
                     "Check for \"models/fraud/1/model.onnx\"")


To confirm this worked, run the `list_objects` function again:

In [None]:
# -----------------------------------------------------------------------------
# lakeFS: Inspect objects staged on a branch
# What this does: Lists objects currently present under the trainingBranch path.
# Why it matters: Provides immediate visibility into which model artifacts are
# part of this branch before committing or promoting them.
# AI data control plane value: Makes versioned model state explicit and reviewable
# prior to creating an immutable snapshot.
# -----------------------------------------------------------------------------
list_objects(f"{trainingBranch}/models")
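To compare the listing with what you expected to upload, you can compute the keys a recursive upload would create without touching the bucket. This is a minimal dry-run sketch; `plan_uploads` is a hypothetical helper that walks the directory the same way `upload_directory_to_s3` does but only returns `(local_path, s3_key)` pairs.

```python
import os


def plan_uploads(local_directory, s3_prefix):
    """Return (local_path, s3_key) pairs for a recursive upload, sorted by key."""
    plan = []
    for root, _dirs, files in os.walk(local_directory):
        for filename in files:
            file_path = os.path.join(root, filename)
            relative_path = os.path.relpath(file_path, local_directory)
            # S3 keys always use "/", regardless of the local OS separator.
            s3_key = "/".join([s3_prefix.rstrip("/")] + relative_path.split(os.sep))
            plan.append((file_path, s3_key))
    return sorted(plan, key=lambda pair: pair[1])
```

For example, `[key for _, key in plan_uploads("models", f"{trainingBranch}/models")]` should match the keys printed by `list_objects` for the same prefix.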

## Commit the changes to the lakeFS repository

In [None]:
# -----------------------------------------------------------------------------
# lakeFS CONTROL PLANE: Commit model artifacts (immutable snapshot)
# What this does: Creates a commit capturing the current contents of the
# trainingBranch, including the uploaded model files.
# Why it matters: Commits are immutable references. This guarantees that the
# model artifact can always be traced back to a specific, reproducible state.
# AI data control plane value: Enables precise lineage between training data,
# preprocessing artifacts, and the model used for serving or evaluation.
# -----------------------------------------------------------------------------
branchTraining = repo.branch(trainingBranch)
ref = branchTraining.commit(message='Uploaded data, artifacts and model')
print(ref.get_commit())
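lakeFS commits can also carry key-value metadata, which lakeFS stores as a string-to-string map. The following sketch builds such a map from arbitrary values; `build_commit_metadata` and the keys used here are hypothetical examples, not part of this workflow.

```python
def build_commit_metadata(**fields):
    """Convert values to strings for a lakeFS commit metadata map, skipping None."""
    return {str(key): str(value) for key, value in fields.items() if value is not None}


# Hypothetical metadata describing the uploaded model.
metadata = build_commit_metadata(
    model_path="models/fraud/1/model.onnx",
    model_format="onnx",
    version=1,
)
```

Assuming your installed `lakefs` SDK version's `Branch.commit` accepts a `metadata` argument (check its reference documentation), you could pass it as `branchTraining.commit(message='Uploaded data, artifacts and model', metadata=metadata)`.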

### Next Step

Now that you've saved the model to S3-compatible storage, you can use the same data connection to refer to the model and serve it as an API.
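Through the lakeFS S3 gateway, an object is addressed as `s3://<repository>/<ref>/<path>`, where `<ref>` is a branch name (as in the `train01/models` prefix used above) or a commit ID, which gives read-only access to that exact snapshot. A minimal sketch of building such a URI follows; `lakefs_s3_uri` is a hypothetical helper and `fraud-repo` is a placeholder repository name.

```python
def lakefs_s3_uri(repository, ref, path=""):
    """Build a lakeFS S3 gateway URI of the form s3://<repository>/<ref>/<path>."""
    parts = [repository, ref]
    if path:
        parts.append(path.strip("/"))
    return "s3://" + "/".join(parts)


# Branch reference: follows new commits on the training branch.
print(lakefs_s3_uri("fraud-repo", "train01", "models/fraud"))
```

Using the commit ID printed in the previous cell as `ref` instead of the branch name pins the served model to the exact committed artifacts, even if the branch later moves.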
