# Deploy Golden Gate 7B on Vertex AI 

This tutorial demonstrates how to deploy Golden Gate to Vertex AI using Hugging Face Text Generation Inference.

_Note: Make sure you build the container with the `patch` for the Golden Gate models._

## Installation

Before installing the packages, make sure you have the gcloud CLI installed: https://cloud.google.com/sdk/docs/install

Install the packages required for executing this notebook.

In [None]:
! pip install --upgrade --quiet google-cloud-aiplatform google-cloud-storage "google-auth>=2.23.3"

## Set up Vertex AI and the SDK

Log in to gcloud and set up Application Default Credentials. The project ID is set in the next step.

```bash
gcloud auth login 
gcloud auth application-default login
```

### Set up the SDK with your project ID

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)

In [None]:
# PROJECT_ID = "[your-project-id]"  # @param {type:"string"}
PROJECT_ID = "huggingface-ml"  # @param {type:"string"}
REGION = "us-central1"  # @param {type: "string"}
BUCKET_URI = f"gs://vertexai-{PROJECT_ID}-tgi"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID} --quiet
# Set the region
! gcloud config set ai/region {REGION} --quiet
# Create the bucket (gsutil reports an error if it already exists)
! gsutil mb -l {REGION} -p {PROJECT_ID} {BUCKET_URI}

The following set of constants will be used to create names and display names of Vertex AI Prediction resources like models, endpoints, and model deployments.

In [3]:
# set model names and version
MODEL_NAME = "Golden-Gate-7b" # @param {type:"string"}
MODEL_VERSION = "v01" # @param {type: "string"}
MODEL_DISPLAY_NAME = f"TGI-{MODEL_NAME}-{MODEL_VERSION}" # @param {type:"string"}
ENDPOINT_DISPLAY_NAME = f"endpoint-{MODEL_NAME}-{MODEL_VERSION}" # @param {type:"string"}

# https://github.com/huggingface/text-generation-inference/pkgs/container/text-generation-inference
DOCKER_ARTIFACT_REPO = "custom-tgi-example" # @param {type:"string"}
BASE_TGI_IMAGE = "ghcr.io/huggingface/text-generation-inference:latest" # @param {type:"string"}
SERVING_CONTAINER_IMAGE_URI = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{DOCKER_ARTIFACT_REPO}/base-tgi-image:latest"
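
To make the naming scheme concrete, the following sketch reproduces the f-strings above and prints the values they resolve to. The helper name `artifact_registry_image_uri` is hypothetical and not part of any SDK; the repository name, the `base-tgi-image` image name, and the `latest` tag are copied from the cell above.

```python
def artifact_registry_image_uri(region, project_id, repo, image, tag="latest"):
    """Compose an Artifact Registry Docker image URI (hypothetical helper)."""
    return f"{region}-docker.pkg.dev/{project_id}/{repo}/{image}:{tag}"


# Values copied from the cells above.
project_id = "huggingface-ml"
region = "us-central1"
model_name = "Golden-Gate-7b"
model_version = "v01"

model_display_name = f"TGI-{model_name}-{model_version}"
endpoint_display_name = f"endpoint-{model_name}-{model_version}"
image_uri = artifact_registry_image_uri(
    region, project_id, "custom-tgi-example", "base-tgi-image"
)

print(model_display_name)
print(endpoint_display_name)
print(image_uri)
```

The image URI must point at an image that has already been pushed to that Artifact Registry repository before the model is uploaded.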

Initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [4]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

## Deploy the model to Vertex AI

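Create a new model resource that uses the custom TGI serving container. Replace the `HUGGING_FACE_HUB_TOKEN` placeholder with a token that has access to the private model repository.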

In [15]:
model = aiplatform.Model.upload(
    display_name=MODEL_DISPLAY_NAME,
    serving_container_image_uri=SERVING_CONTAINER_IMAGE_URI,
    serving_container_environment_variables={
        "MODEL_ID": "gg-hf/golden-gate-7b",
        "NUM_SHARD": "1",
        "MAX_INPUT_LENGTH": "512",
        "MAX_TOTAL_TOKENS": "1024",
        "MAX_BATCH_PREFILL_TOKENS": "1512",
        "HUGGING_FACE_HUB_TOKEN": "TOKEN WITH ACCESS TO THE PRIVATE REPO",
    },
    serving_container_ports=[80],
)


model.wait()

print(model.display_name)
print(model.resource_name)

Creating Model
Create Model backing LRO: projects/1049843053967/locations/us-central1/models/4312862947253682176/operations/393893374861508608
Model created. Resource name: projects/1049843053967/locations/us-central1/models/4312862947253682176@1
To use this Model in another session:
model = aiplatform.Model('projects/1049843053967/locations/us-central1/models/4312862947253682176@1')
TGI-Golden-Gate-7b-v01
projects/1049843053967/locations/us-central1/models/4312862947253682176
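
The three token limits passed to the container depend on each other. As a rough sketch, and assuming the rules that the TGI launcher applies (`MAX_INPUT_LENGTH` strictly less than `MAX_TOTAL_TOKENS`, and `MAX_BATCH_PREFILL_TOKENS` at least `MAX_INPUT_LENGTH`), the hypothetical helper below checks a configuration before you upload the model. Verify the exact rules against the TGI version in your container.

```python
def check_tgi_token_limits(env):
    """Return a list of problems found in the TGI token-limit settings.

    Hypothetical helper; the rules encoded here are assumptions about the
    TGI launcher and should be checked against your TGI version.
    """
    max_input = int(env["MAX_INPUT_LENGTH"])
    max_total = int(env["MAX_TOTAL_TOKENS"])
    max_prefill = int(env["MAX_BATCH_PREFILL_TOKENS"])

    problems = []
    if max_input >= max_total:
        problems.append("MAX_INPUT_LENGTH must be smaller than MAX_TOTAL_TOKENS")
    if max_prefill < max_input:
        problems.append("MAX_BATCH_PREFILL_TOKENS must be at least MAX_INPUT_LENGTH")
    return problems


# Same values as in the upload cell above (environment variables are strings).
env = {
    "MAX_INPUT_LENGTH": "512",
    "MAX_TOTAL_TOKENS": "1024",
    "MAX_BATCH_PREFILL_TOKENS": "1512",
}

problems = check_tgi_token_limits(env)
# Tokens left for generation when the prompt uses the full input length.
generation_budget = int(env["MAX_TOTAL_TOKENS"]) - int(env["MAX_INPUT_LENGTH"])

print(problems)
print(generation_budget)
```

With these values, a prompt that fills all 512 input tokens still leaves 512 tokens for generation.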


Create an endpoint and deploy the model to it. The deployment takes roughly 20-25 minutes; you can check its status in the Google Cloud console.

In [16]:
machine_type = 'g2-standard-4'  # G2 machine type with one NVIDIA L4 GPU
endpoint = aiplatform.Endpoint.create(display_name=ENDPOINT_DISPLAY_NAME)

deployed_model = model.deploy(
    endpoint=endpoint,
    deployed_model_display_name=MODEL_NAME,
    machine_type=machine_type,
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    traffic_percentage=100,
    min_replica_count=1,
    sync=True,
)

Creating Endpoint
Create Endpoint backing LRO: projects/1049843053967/locations/us-central1/endpoints/60211455760269312/operations/1744973263072657408
Endpoint created. Resource name: projects/1049843053967/locations/us-central1/endpoints/60211455760269312
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/1049843053967/locations/us-central1/endpoints/60211455760269312')
Deploying model to Endpoint : projects/1049843053967/locations/us-central1/endpoints/60211455760269312
Deploy Endpoint model backing LRO: projects/1049843053967/locations/us-central1/endpoints/60211455760269312/operations/7087368321040908288
Endpoint model deployed. Resource name: projects/1049843053967/locations/us-central1/endpoints/60211455760269312


Send a test prompt to the deployed endpoint.

In [18]:
prompt = "Deep Learning is"

res = deployed_model.predict(
    instances=[
        {
            "inputs": prompt,
            "parameters": {"max_new_tokens": 256, "do_sample": True, "top_p": 0.95, "temperature": 1.0},
        }
    ]
)
print(prompt + res.predictions[0])


Deep Learning is the upcoming technology trend that is widening its stakes in the mobile app industry and which might transform the way things are done. With significant advancements in the deep learning techniques, mobile apps have also started using deep learning techniques in order to make themselves better and more efficient than ever. The use of deep learning is a hot topic now and after witnessing the way people are passionate enough to make deep learning techniques for the smartphones, these techniques have become the most anticipated technological breakthrough today.

The novel research have marked the excellence of deep learning in the field of mobile analytics where the insights provided by the trained networks has boosted the productivity of the apps and also helped the app developers in providing a better insight about the app usage .  It helps the app developers and marketers to understand the various application enhancements, issues, and users needs better. Nowadays, deep
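
Because the request body is a plain dictionary, a misspelled parameter name is easy to miss and may be ignored or rejected by the server, depending on its version. The sketch below builds one prediction instance and rejects unknown parameter names up front. The `build_instance` helper is hypothetical, and `KNOWN_PARAMETERS` is an assumed subset of the TGI generation parameters that you should extend to match your container.

```python
# Assumed subset of TGI generation parameters; extend it for your TGI version.
KNOWN_PARAMETERS = {
    "max_new_tokens",
    "do_sample",
    "temperature",
    "top_k",
    "top_p",
    "repetition_penalty",
    "stop",
    "seed",
    "return_full_text",
}


def build_instance(prompt, **parameters):
    """Build one prediction instance, failing fast on unknown parameter names."""
    unknown = sorted(set(parameters) - KNOWN_PARAMETERS)
    if unknown:
        raise ValueError(f"Unknown generation parameter(s): {unknown}")
    return {"inputs": prompt, "parameters": parameters}


instance = build_instance(
    "Deep Learning is", max_new_tokens=256, do_sample=True, top_p=0.95, temperature=1.0
)
print(instance)

# A misspelled key is caught locally instead of being sent to the endpoint.
try:
    build_instance("Deep Learning is", temprature=1.0)
except ValueError as err:
    print(err)
```

The returned dictionary can be passed as an element of the `instances` list in the `predict` call.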

## Delete resources

Undeploy the model, then delete the endpoint and the model so the GPU replica stops running.

In [19]:
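# `deployed_model` is the Endpoint object returned by `model.deploy()`
deployed_model.undeploy_all()
deployed_model.delete()
# Delete the uploaded model resource
model.delete()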

Undeploying Endpoint model: projects/1049843053967/locations/us-central1/endpoints/60211455760269312
Undeploy Endpoint model backing LRO: projects/1049843053967/locations/us-central1/endpoints/60211455760269312/operations/3646618205729849344
Endpoint model undeployed. Resource name: projects/1049843053967/locations/us-central1/endpoints/60211455760269312
Deleting Endpoint : projects/1049843053967/locations/us-central1/endpoints/60211455760269312
Delete Endpoint  backing LRO: projects/1049843053967/locations/us-central1/operations/5055118989189971968
Endpoint deleted. . Resource name: projects/1049843053967/locations/us-central1/endpoints/60211455760269312
Deleting Model : projects/1049843053967/locations/us-central1/models/4312862947253682176
Delete Model  backing LRO: projects/1049843053967/locations/us-central1/models/4312862947253682176/operations/8450833108227325952
Model deleted. . Resource name: projects/1049843053967/locations/us-central1/models/4312862947253682176
