# Deploy Multimodal Embedding Model with OCI Data Science BYOC
This notebook supplements the [Deploy a Multimodal RAG Pipeline On OCI Data Science and Generative AI]() LiveLab.

In [None]:
# Import packages and set up authentication
import ads
import os
ads.set_auth("resource_principal")

### Setting up infrastructure variables needed for deployment

Before you can run this notebook, you need to set the following variables:

**region**: Region in which the model and its deployment are created. Extracted by the ads package.

**container_image**: The path to the container image that you pushed to OCIR in Lab 3.

**compartment_id**: Compartment in which the project was created. Set automatically as an environment variable in OCI Data Science Notebooks.

**project_id**: OCID of the project. Set automatically as an environment variable in OCI Data Science Notebooks.

**log_group_id**: Optional. Log group OCID obtained at the end of Lab 4, Task 1.

**log_id**: Optional. Log OCID obtained at the end of Lab 4, Task 2.

**instance_shape**: The instance shape that the multimodal embedding model will be deployed on. GPU shapes are recommended for larger models or highly concurrent requests.


In [None]:

# Extract region information from the Notebook environment variables and signer
region = ads.common.utils.extract_region()
# Replace with your container image path
container_image = "<your-container-image-path>"

# Set environment variables
compartment_id = os.environ["PROJECT_COMPARTMENT_OCID"]
project_id = os.environ["PROJECT_OCID"]

# Optional logging resources
log_group_id = "<your-log-group-ocid>"
log_id = "<your-log-ocid>"

# Specify instance shape
instance_shape = "VM.GPU.A10.1"
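
Several of the values above are placeholders that must be edited by hand, so a quick check before continuing can catch one that was forgotten. The cell below is a minimal sketch and not part of the original lab: the helper name `find_unset_placeholders` is hypothetical, and it assumes that unset values follow the `<...>` pattern used in this notebook.

```python
import re

# Assumption: unset values look like "<your-...>", as in the cells of this notebook.
_PLACEHOLDER = re.compile(r"^<[^<>]+>$")


def find_unset_placeholders(settings):
    """Return the names of settings that are empty or still a <...> placeholder."""
    unset = []
    for name, value in settings.items():
        text = "" if value is None else str(value).strip()
        if not text or _PLACEHOLDER.match(text):
            unset.append(name)
    return unset


# Example values only; in the notebook, pass the real variables instead.
example_settings = {
    "container_image": "<your-container-image-path>",
    "log_group_id": "<your-log-group-ocid>",
    "instance_shape": "VM.GPU.A10.1",
}
print(find_unset_placeholders(example_settings))
```

In the notebook, you could pass a dictionary of the real variables and stop if the returned list contains anything other than the optional logging values.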



Multimodal embedding models can embed both text and images into the same vector space. There are many open source examples on HuggingFace that we can use.

The following code cells download the embedding model to local storage and upload it to an Object Storage bucket. The final cell then creates an OCI Data Science model by reference to the specified Object Storage bucket.

You will need to set the following variables:

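**bucket**: Name of the bucket that we created earlier in this lab.

**namespace**: Tenancy Object Storage namespace that we obtained in Lab 3, Task 3.

**model_name**: HuggingFace model name, usually in the form <model-provider>/<model-name>.

**model_prefix**: Prefix under which the model files are stored, both locally and in the bucket.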

**Note**: You may need to authenticate with HuggingFace for some models that are gated by repository owners. 

In [None]:

bucket = "<your-bucket-name>"  # this should be a versioned bucket
namespace = "<your-tenant-namespace-id>"
model_name = "<your-model-name>"  # HuggingFace model name, usually: <model-provider>/<model-name>
model_prefix = "<model-prefix>"  # e.g. VLM2Vec

In [None]:
!huggingface-cli download $model_name --local-dir $model_prefix

In [None]:
# Upload the local directory created by the download step above
!oci os object bulk-upload --src-dir $model_prefix --prefix $model_prefix -bn $bucket -ns $namespace --auth "resource_principal"

In [None]:
from ads.model.datascience_model import DataScienceModel
# Reuses bucket, namespace, and model_prefix from the cell above
artifact_path = f"oci://{bucket}@{namespace}/{model_prefix}"

model = (DataScienceModel()
  .with_compartment_id(compartment_id)
  .with_project_id(project_id)
  .with_display_name(model_prefix)
  .with_artifact(artifact_path)
)

model.create(model_by_reference=True)
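
The artifact path above follows the `oci://<bucket>@<namespace>/<prefix>` pattern. As a hedged illustration of how that string is assembled, the sketch below wraps the same f-string in a small helper with basic validation. The function name `build_artifact_path` and the sample values are hypothetical and do not come from the ads package.

```python
def build_artifact_path(bucket, namespace, prefix):
    """Build an oci://<bucket>@<namespace>/<prefix> URI for Object Storage."""
    for label, value in (("bucket", bucket), ("namespace", namespace)):
        # Bucket and namespace are single path components in this URI form.
        if not value or "/" in value or "@" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    # Strip a leading slash so the URI does not contain '//' after the namespace.
    return f"oci://{bucket}@{namespace}/{prefix.lstrip('/')}"


# Sample values for illustration only.
print(build_artifact_path("my-bucket", "my-namespace", "VLM2Vec"))
```

The prefix passed here should match the prefix used when the model files were uploaded, so that the model reference points at the uploaded objects.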

### Configure infrastructure details
The cell below configures the infrastructure details for the Data Science BYOC deployment. This deployment will be placed behind a load balancer. Some definitions:

**with_bandwidth_mbps**: Load balancer bandwidth, set to 10 here. If you have higher bandwidth requirements, you can scale this up as needed.

**with_replica**: Number of instances to create and load the model on. If you have high concurrency requirements, you can scale this up accordingly.

**with_access_log** and **with_predict_log**: Access logs record the requests made to your model deployment, and predict logs capture the output emitted by the deployed container. Logging is optional but highly recommended. If you are not using logging, remove these two calls from the cell.

In [46]:
from ads.model.deployment import (
    ModelDeployment,
    ModelDeploymentContainerRuntime,
    ModelDeploymentInfrastructure,
    ModelDeploymentMode,
)
infrastructure = (
    ModelDeploymentInfrastructure()
    .with_project_id(project_id)
    .with_compartment_id(compartment_id)
    .with_shape_name(instance_shape)
    .with_bandwidth_mbps(10)
    .with_replica(1)
    .with_web_concurrency(1)
    .with_access_log(
        log_group_id=log_group_id,
        log_id=log_id,
    )
    .with_predict_log(
        log_group_id=log_group_id,
        log_id=log_id,
    )
)

### Configure container runtime details

The code cell below configures runtime details for our container. Some definitions:

**with_env**: Environment variables to set on the container. Since we are deploying an embedding model, we use the /v1/embeddings endpoint, which is mapped to the predict endpoint of the model deployment.

**with_cmd**: Container startup command. These are the arguments passed to vLLM.

Run the cell below.

In [49]:

env_var = {
    'MODEL_DEPLOY_PREDICT_ENDPOINT': '/v1/embeddings',
}

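# Arguments passed to the vLLM server inside the container.
cmd_var = [
    "--model", f"/opt/ds/model/deployed_model/{model_prefix}",
    # Must not exceed the number of GPUs on the shape; VM.GPU.A10.1 has one GPU.
    "--tensor-parallel-size", "1",
    "--port", "8080",
    "--served-model-name", "odsc-llm",
    "--host", "0.0.0.0",
    "--trust-remote-code",
]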

container_runtime = (
    ModelDeploymentContainerRuntime()
    .with_image(container_image)
    .with_server_port(8080)
    .with_health_check_port(8080)
    .with_env(env_var)
    .with_cmd(cmd_var)
    .with_deployment_mode(ModelDeploymentMode.HTTPS)
    .with_model_uri(model.id)
    .with_region(region)
)
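
vLLM needs one GPU per tensor-parallel rank, so the `--tensor-parallel-size` argument should not exceed the number of GPUs on the chosen shape. The sketch below shows one way to derive the startup arguments from the shape name. Both helper names are hypothetical, and `gpu_count_from_shape` relies on the assumption that the GPU shape name ends with its GPU count (as in `VM.GPU.A10.1`), which should be verified for the shape you use.

```python
def gpu_count_from_shape(shape_name):
    """Assumption: GPU shape names end with the GPU count, e.g. 'VM.GPU.A10.1'."""
    last = shape_name.rsplit(".", 1)[-1]
    if not last.isdigit():
        raise ValueError(f"cannot infer GPU count from shape {shape_name!r}")
    return int(last)


def build_vllm_args(model_prefix, tensor_parallel_size, port=8080,
                    served_model_name="odsc-llm"):
    """Assemble the same vLLM arguments used in the cell above."""
    return [
        "--model", f"/opt/ds/model/deployed_model/{model_prefix}",
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--port", str(port),
        "--served-model-name", served_model_name,
        "--host", "0.0.0.0",
        "--trust-remote-code",
    ]


# Sample values for illustration only.
shape = "VM.GPU.A10.1"
args = build_vllm_args("VLM2Vec", tensor_parallel_size=gpu_count_from_shape(shape))
print(args)
```

Deriving the value from the shape keeps the two settings consistent if you later switch to a shape with more GPUs.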


### Deploy Model
The code cell below deploys the model with the infrastructure and container runtime details we set previously. It starts the deployment without waiting for it to finish, and we can watch the deployment state by running the code cell after it.

In [None]:
deployment = (
    ModelDeployment()
    .with_display_name(f"{model_prefix} MD with BYOC")
    .with_description(f"Deployment of {model_prefix} MD with vLLM BYOC container")
    .with_infrastructure(infrastructure)
    .with_runtime(container_runtime)
).deploy(wait_for_completion=False)

In [None]:
deployment.watch()

If everything worked correctly, your model should now be deployed. Move on to inference-with-byoc-model to test and experiment with the multimodal embedding model.