# Deploy Embedding Models with TEI on Vertex AI

This tutorial demonstrates how to deploy an embedding model from Hugging Face to Vertex AI using Text Embeddings Inference (TEI).

## Installations

Before installing the packages, make sure the gcloud CLI is installed: https://cloud.google.com/sdk/docs/install

Install the packages required for executing this notebook.

In [1]:
! pip install --upgrade --quiet google-cloud-aiplatform google-cloud-storage "google-auth>=2.23.3"


## Set up Vertex AI and the SDK

Log in to gcloud. The project ID is set in the next step.

```bash
gcloud auth login 
gcloud auth application-default login
```

### Set up the SDK with your project ID

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)

In [1]:
PROJECT_ID = "huggingface-ml"  # @param {type:"string"}
REGION = "us-central1"  # @param {type: "string"}
BUCKET_URI = f"gs://vertexai-{PROJECT_ID}-tgi"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID} --quiet
# Set the region
! gcloud config set ai/region {REGION} --quiet
# create the bucket if it doesn't exist
! gsutil mb -l {REGION} -p {PROJECT_ID} {BUCKET_URI}

The following set of constants will be used to create names and display names of Vertex AI Prediction resources like models, endpoints, and model deployments.

In [2]:
# set model names and version
MODEL_NAME = "BGE-Large" # @param {type:"string"}
MODEL_VERSION = "v01" # @param {type: "string"}
MODEL_DISPLAY_NAME = f"TEI-{MODEL_NAME}-{MODEL_VERSION}" # @param {type:"string"}
ENDPOINT_DISPLAY_NAME = f"endpoint-{MODEL_NAME}-{MODEL_VERSION}" # @param {type:"string"}

# https://github.com/huggingface/text-embeddings-inference/pkgs/container/text-embeddings-inference
DOCKER_ARTIFACT_REPO = "custom-tei-example" # @param {type:"string"}
BASE_TEI_IMAGE = "ghcr.io/huggingface/text-embeddings-inference:latest" # @param {type:"string"}
SERVING_CONTAINER_IMAGE_URI = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{DOCKER_ARTIFACT_REPO}/base-tei-image:latest"
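
`SERVING_CONTAINER_IMAGE_URI` points at Artifact Registry, so the TEI image needs to exist there before the model is uploaded. The notebook does not show that step. The sketch below only builds the command strings that would mirror a public image into a Docker repository, assuming Docker and the gcloud CLI are available locally. The helper name `mirror_image_commands` and the placeholder project `my-project` are hypothetical.

```python
def mirror_image_commands(project_id, region, repo, source_image, target_image):
    """Return the shell commands that copy a public image into Artifact Registry.

    Nothing is executed here; run the commands in a terminal after reviewing them.
    """
    registry_host = f"{region}-docker.pkg.dev"
    return [
        # Create the Docker repository (fails harmlessly if it already exists).
        f"gcloud artifacts repositories create {repo} "
        f"--repository-format=docker --location={region} --project={project_id}",
        # Let Docker authenticate against the regional registry host.
        f"gcloud auth configure-docker {registry_host} --quiet",
        f"docker pull {source_image}",
        f"docker tag {source_image} {target_image}",
        f"docker push {target_image}",
    ]


# Placeholder values for illustration; substitute the constants defined above.
commands = mirror_image_commands(
    project_id="my-project",
    region="us-central1",
    repo="custom-tei-example",
    source_image="ghcr.io/huggingface/text-embeddings-inference:latest",
    target_image="us-central1-docker.pkg.dev/my-project/custom-tei-example/base-tei-image:latest",
)
for command in commands:
    print(command)
```

Returning strings instead of calling `subprocess` keeps the sketch safe to run anywhere and makes each step easy to inspect first.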

Initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [3]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

## Deploy model to Vertex AI

Create a new model. The `MODEL_ID` environment variable tells the TEI container which Hugging Face model to load.

In [4]:
model = aiplatform.Model.upload(
    display_name=MODEL_DISPLAY_NAME,
    serving_container_image_uri=SERVING_CONTAINER_IMAGE_URI,
    serving_container_environment_variables={
        "MODEL_ID": "BAAI/bge-large-en-v1.5",
        },
    serving_container_ports=[80],
)


model.wait()

print(model.display_name)
print(model.resource_name)

Creating Model
Create Model backing LRO: projects/755607090520/locations/us-central1/models/3427166748661514240/operations/8775985187918970880
Model created. Resource name: projects/755607090520/locations/us-central1/models/3427166748661514240@1
To use this Model in another session:
model = aiplatform.Model('projects/755607090520/locations/us-central1/models/3427166748661514240@1')
TEI-BGE-Large-v01
projects/755607090520/locations/us-central1/models/3427166748661514240


Deploying the model to an endpoint takes roughly 20-25 minutes. You can check the status of the deployment in the Google Cloud console.

In [5]:
machine_type = 'g2-standard-4' # L4 GPUs
endpoint = aiplatform.Endpoint.create(display_name=ENDPOINT_DISPLAY_NAME)

deployed_model = model.deploy(
    endpoint=endpoint,
    deployed_model_display_name=MODEL_NAME,
    machine_type=machine_type,
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    traffic_percentage=100,
    min_replica_count=1,
    sync=True,
)

Creating Endpoint
Create Endpoint backing LRO: projects/755607090520/locations/us-central1/endpoints/3306444769978220544/operations/4289274059151114240
Endpoint created. Resource name: projects/755607090520/locations/us-central1/endpoints/3306444769978220544
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/755607090520/locations/us-central1/endpoints/3306444769978220544')
Deploying model to Endpoint : projects/755607090520/locations/us-central1/endpoints/3306444769978220544
Deploy Endpoint model backing LRO: projects/755607090520/locations/us-central1/endpoints/3306444769978220544/operations/4928785206237724672
Endpoint model deployed. Resource name: projects/755607090520/locations/us-central1/endpoints/3306444769978220544
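
The log above prints the endpoint's resource name and notes that it can be reused in another session. A minimal sketch of that, assuming the name follows the `projects/.../locations/.../endpoints/...` format shown in the log. Both helper names are hypothetical, and the identifiers are placeholders.

```python
def endpoint_resource_name(project, region, endpoint_id):
    """Build the fully qualified Vertex AI endpoint resource name."""
    return f"projects/{project}/locations/{region}/endpoints/{endpoint_id}"


def load_endpoint(resource_name):
    """Reconnect to an existing endpoint.

    Needs google-cloud-aiplatform and valid credentials, so the import is
    done lazily and the function is not called in this sketch.
    """
    from google.cloud import aiplatform

    return aiplatform.Endpoint(resource_name)


# Placeholder identifiers; substitute the values printed in your own deploy log.
name = endpoint_resource_name("123456789", "us-central1", "987654321")
print(name)

# In a later session, with credentials configured:
# endpoint = load_endpoint(name)
```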


Send a test request to the endpoint. An embedding model returns a vector of floats for each input rather than generated text.

In [6]:
text = "Deep Learning is"

res = deployed_model.predict(instances=[{"inputs": text}])

# Each prediction holds the embedding for the matching instance;
# the exact nesting depends on the serving container's response format.
embedding = res.predictions[0]
print(type(embedding), len(embedding))
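
Embedding vectors are usually compared with cosine similarity. The sketch below is a minimal pure-Python version. It uses short made-up vectors in place of real endpoint responses, and the function name `cosine_similarity` is an illustrative helper, not part of the Vertex AI SDK.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings returned by the endpoint.
query = [1.0, 0.0, 1.0]
doc_a = [1.0, 0.0, 1.0]  # same direction as the query
doc_b = [0.0, 1.0, 0.0]  # orthogonal to the query

print(cosine_similarity(query, doc_a))
print(cosine_similarity(query, doc_b))
```

With real embeddings, a higher score means the two texts are closer in the model's embedding space.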


## Delete resources

Undeploy the model, then delete the endpoint and the model so they stop incurring charges.

In [7]:
deployed_model.undeploy_all()
deployed_model.delete()
model.delete()

Undeploying Endpoint model: projects/755607090520/locations/us-central1/endpoints/3306444769978220544
Undeploy Endpoint model backing LRO: projects/755607090520/locations/us-central1/endpoints/3306444769978220544/operations/2260402427020705792
Endpoint model undeployed. Resource name: projects/755607090520/locations/us-central1/endpoints/3306444769978220544
Deleting Endpoint : projects/755607090520/locations/us-central1/endpoints/3306444769978220544
Delete Endpoint  backing LRO: projects/755607090520/locations/us-central1/operations/4050583278900477952
Endpoint deleted. . Resource name: projects/755607090520/locations/us-central1/endpoints/3306444769978220544
Deleting Model : projects/755607090520/locations/us-central1/models/3427166748661514240
Delete Model  backing LRO: projects/755607090520/locations/us-central1/models/3427166748661514240/operations/6872088445448093696
Model deleted. . Resource name: projects/755607090520/locations/us-central1/models/3427166748661514240
