In [None]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

In [None]:
# This file has been modified by ROI Training, Inc. on 05/10/2024

# Getting Started with Text Embeddings + Vector Search


## Overview and context

### 1. Introduction

In this tutorial, you learn how to use Google Cloud AI tools to quickly bring the power of Large Language Models to enterprise systems.  

This tutorial covers the following:

*   What are embeddings, and what business challenges do they help solve?
*   Understanding Text with Vertex AI Text Embeddings
*   Find Embeddings fast with Vertex AI Vector Search
*   Grounding LLM outputs with Vector Search

This tutorial is based on [the blog post](https://cloud.google.com/blog/products/ai-machine-learning/how-to-use-grounding-for-your-llms-with-text-embeddings), combined with sample code.

### 2. Bringing Gen AI and LLMs to production services

Many people are now starting to think about how to bring Gen AI and LLMs to production services, and they face several challenges:

- "How to integrate LLMs or AI chatbots with existing IT systems, databases and business data?"
- "We have thousands of products. How can we make the LLM memorize them all precisely?"
- "How to handle the hallucination issues in AI chatbots to build a reliable service?"

Here is a quick solution: **grounding** with **embeddings** and **vector search**.

What is grounding? What are embeddings and vector search? In this tutorial, we will learn these crucial concepts for building reliable Gen AI services for enterprise use.

### 3. What are Embeddings?

With the rise of LLMs, why is it becoming important for IT engineers and ITDMs to understand how they work?

In traditional IT systems, most data is organized as structured or tabular data, using simple keywords, labels, and categories in databases and search engines.

<div style="text-align: center;">
    <img style="border: 1px solid gray" 
         src="https://storage.googleapis.com/github-repo/img/embeddings/textemb-vs-notebook/1.png" 
         width="650">
</div>
<br>

In contrast, AI-powered services arrange data into a simple data structure known as "embeddings."
<div style="text-align: center;">
    <img 
         style="border: 1px solid gray" 
         src="https://storage.googleapis.com/github-repo/img/embeddings/textemb-vs-notebook/2.png"
         width="650"
    >
</div>

Once trained on specific content such as text or images, AI creates a space called an "embedding space", which is essentially a map of the content's meaning.


<div style="text-align: center; margin-bottom: 10px;">
    <img style="border: 1px solid gray" 
         src="https://storage.googleapis.com/github-repo/img/embeddings/textemb-vs-notebook/3.png"
         width="650">
</div>

AI can identify the location of each piece of content on the map; that location is its embedding.
                   
<div style="text-align: center; margin-bottom: 10px;">
    <img style="border: 1px solid gray" 
         src="https://storage.googleapis.com/github-repo/img/embeddings/textemb-vs-notebook/4.png"
         width="650">
</div>

Let's take an example where a text discusses movies, music, and actors, with a distribution of 10%, 2%, and 30%, respectively. In this case, the AI can create an embedding with three values, 0.1, 0.02, and 0.3, in a three-dimensional space.
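As a minimal sketch of this idea, the code below treats (0.1, 0.02, 0.3) from the example above as one embedding and compares it with two more vectors whose values are made up for hypothetical texts. A larger dot product indicates that two texts have a more similar mix of topics.

```python
import numpy as np

# Dimensions: [movies, music, actors]
text_a = np.array([0.10, 0.02, 0.30])  # the example text from above
text_b = np.array([0.12, 0.01, 0.25])  # hypothetical text, mostly about movies and actors
text_c = np.array([0.01, 0.40, 0.02])  # hypothetical text, mostly about music

# A larger dot product means the two topic mixes point in a more similar direction.
sim_ab = float(np.dot(text_a, text_b))
sim_ac = float(np.dot(text_a, text_c))
print(f"a vs b: {sim_ab:.4f}")  # a vs b: 0.0872
print(f"a vs c: {sim_ac:.4f}")  # a vs c: 0.0150
```

Real embeddings work the same way but with many more dimensions (768 for the model used later in this notebook), and the dimensions are learned rather than named topics.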

<div style="text-align: center; margin-bottom: 10px;">
    <img style="border: 1px solid gray" 
         src="https://storage.googleapis.com/github-repo/img/embeddings/textemb-vs-notebook/5.png"
         width="650">
</div>

AI can put content with similar meanings closely together in the space.
                   
This is how Google organizes data across various services like Google Search, YouTube, Play, and many others, to provide search results and recommendations with relevant content.

Embeddings can also be used to represent different types of things in businesses, such as products, users, user activities, conversations, music & videos, signals from IoT sensors, and so on.

AI and Embeddings are now playing a crucial role in creating a new way of human-computer interaction.

<div style="text-align: center; margin-bottom: 10px;">
    <img style="border: 1px solid gray" 
         src="https://storage.googleapis.com/github-repo/img/embeddings/textemb-vs-notebook/6.png"
         width="650">
</div>


AI organizes data into embeddings, which represent what the user is looking for, the meaning of contents, or many other things you have in your business. This creates a new level of user experience that is becoming the new standard.

To learn more about embeddings, see [Embeddings in the Google Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/embeddings/video-lecture) and [Meet AI’s multitool: Vector embeddings by Dale Markowitz](https://cloud.google.com/blog/topics/developers-practitioners/meet-ais-multitool-vector-embeddings).



### 4. Vertex AI Embeddings for Text

With [Vertex AI Embeddings for Text](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings), you can easily create text embeddings with an LLM. The product is also available in [Vertex AI Model Garden](https://cloud.google.com/model-garden).

<div style="text-align: center; margin-bottom: 10px;">
    <img style="border: 1px solid gray" 
         src="https://storage.googleapis.com/github-repo/img/embeddings/textemb-vs-notebook/7.png"
         width="650">
</div>


This API is designed to extract embeddings from text. It accepts input of up to 3,072 tokens and outputs 768-dimensional text embeddings.

### 5. LLM text embedding business use cases

With the embedding API, you can apply the innovation of embeddings, combined with the LLM capability, to various text processing tasks, such as:

**LLM-enabled Semantic Search**: text embeddings can represent both the meaning and intent of a user's query and of documents in the embedding space. Documents whose meaning is similar to the user's query intent can be found quickly with vector search technology. The model can generate text embeddings that capture the subtle nuances of each sentence and paragraph in a document.

**LLM-enabled Text Classification**: LLM text embeddings can be used for text classification with a deep understanding of different contexts, without any training or fine-tuning (so-called zero-shot learning). This wasn't possible with earlier language models without task-specific training.
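One simple way to do zero-shot classification with embeddings is to embed short label descriptions alongside the texts and give each text the label it is most similar to. The sketch below assumes an `embed_fn` that maps a list of strings to a list of vectors; in this notebook that could be the `get_embeddings_wrapper` function from section 10. So that the sketch runs offline, `toy_embed` is a hypothetical keyword-counting stand-in, not a real embedding model.

```python
import numpy as np

def classify_zero_shot(texts, labels, embed_fn):
    """Give each text the label whose embedding is most similar (cosine similarity).

    embed_fn: any callable that maps a list of strings to a list of vectors.
    """
    text_vecs = np.asarray(embed_fn(texts), dtype=float)
    label_vecs = np.asarray(embed_fn(labels), dtype=float)
    # Normalize rows so that a dot product equals cosine similarity.
    text_vecs /= np.linalg.norm(text_vecs, axis=1, keepdims=True)
    label_vecs /= np.linalg.norm(label_vecs, axis=1, keepdims=True)
    scores = text_vecs @ label_vecs.T  # shape: (num_texts, num_labels)
    return [labels[i] for i in scores.argmax(axis=1)]

# Hypothetical stand-in for a real embedding model so the sketch runs offline:
# it only counts a few keywords. A real model captures meaning, not keywords.
VOCAB = ["python", "pandas", "css", "html"]

def toy_embed(texts):
    return [[t.lower().count(w) + 1e-6 for w in VOCAB] for t in texts]

labels = ["python pandas", "html css"]
texts = ["How to merge pandas DataFrames in Python", "Center a div with CSS in HTML"]
print(classify_zero_shot(texts, labels, toy_embed))  # ['python pandas', 'html css']
```

With a real model, the label strings can be natural-language descriptions of each class, and no classifier is trained at any point.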

**LLM-enabled Recommendation**: text embeddings can be used in recommendation systems as a strong feature for training recommendation models such as a Two-Tower model. The model learns the relationship between query and candidate embeddings, enabling a next-generation user experience with semantic product recommendations.

LLM-enabled clustering, anomaly detection, sentiment analysis, and more can also be handled with LLM-level semantic understanding.




## Lab Tasks - Creating Embeddings

### 6. Install Python SDK

Vertex AI, Cloud Storage, and BigQuery APIs can be accessed in multiple ways, including the REST API and the Python SDK. In this tutorial, we will use the SDK.

In [None]:
!pip install --upgrade --user 'google-cloud-aiplatform>=1.29.0' google-cloud-storage 'google-cloud-bigquery[pandas]'

To use the newly installed packages in this Jupyter runtime, we need to restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [None]:
# restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

### 7. Environment variables

Set the environment variables. If prompted, replace the following `[your-project-id]` with your project ID and run the cell.

In [None]:
# get project ID
PROJECT_ID = ! gcloud config get project
PROJECT_ID = PROJECT_ID[0]
LOCATION = "us-central1"
if PROJECT_ID == "(unset)":
  print("Please set the project ID manually below")

In [None]:
# define project information
if PROJECT_ID == "(unset)":
  PROJECT_ID = "[your-project-id]" # @param {type:"string"}

# generate a unique ID for this session
from datetime import datetime
UID = datetime.now().strftime("%m%d%H%M")

### 8. Enable APIs

Run the following to enable APIs for Compute Engine, Vertex AI, Cloud Storage and BigQuery with this Google Cloud project.

In [None]:
! gcloud services enable compute.googleapis.com aiplatform.googleapis.com storage.googleapis.com bigquery.googleapis.com --project {PROJECT_ID}

### 9. Data Preparation

We will be using [the Stack Overflow public dataset](https://console.cloud.google.com/marketplace/product/stack-exchange/stack-overflow) hosted in the BigQuery table `bigquery-public-data.stackoverflow.posts_questions`. This is a very large dataset with 23 million rows that doesn't fit into memory, so we will limit it to 1,000 rows for this tutorial.

In [None]:
# load the BQ Table into a Pandas Dataframe
import pandas as pd
from google.cloud import bigquery

QUESTIONS_SIZE = 1000

bq_client = bigquery.Client(project = PROJECT_ID)
# ORDER BY is in the outer query: BigQuery does not guarantee that a subquery's ordering is kept
QUERY_TEMPLATE = """
        SELECT q.id, q.title
        FROM `bigquery-public-data.stackoverflow.posts_questions` AS q
        WHERE q.score > 0
        ORDER BY q.view_count DESC
        LIMIT {limit};
        """
query = QUERY_TEMPLATE.format(limit = QUESTIONS_SIZE)
query_job = bq_client.query(query)
rows = query_job.result()
df = rows.to_dataframe()

# examine the data (only the last expression in a cell is displayed)
print(df.shape)
df.head()

### 10. Call the API to generate embeddings

With the Stack Overflow dataset, we will take the `title` column (the question title) and generate an embedding for each title with the Embeddings for Text API. The API is available in the [vertexai](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai) package of the SDK.

You may see some warning messages from the TensorFlow library but you can ignore them.

In [None]:
# init the vertexai package
import vertexai
vertexai.init(project=PROJECT_ID, location=LOCATION)

From the package, import [TextEmbeddingModel](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.language_models.TextEmbeddingModel) and get a model.

In [None]:
# Load the text embeddings model
from vertexai.preview.language_models import TextEmbeddingModel
model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")

In this tutorial, we will use the `textembedding-gecko@001` model to get text embeddings. See [Supported models](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings#supported_models) in the documentation for the list of supported models.

Once you have the model, you can call its [get_embeddings](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.language_models.TextEmbeddingModel#vertexai_language_models_TextEmbeddingModel_get_embeddings) function to get embeddings. You can pass up to 5 texts in a single call, but there is a caveat. By default, the text embeddings API has a "requests per minute" quota of 60 for new Cloud projects and 600 for projects with usage history (see [Quotas and limits](https://cloud.google.com/vertex-ai/docs/quotas#request_quotas) for the latest quota value for `base_model:textembedding-gecko`). So, rather than calling the function directly, you may want to define a wrapper like the one below, which makes about one call per second (staying within 60 calls per minute) and passes 5 texts each time.

In [None]:
import time
import tqdm # to show a progress bar

# get embeddings for a list of texts
BATCH_SIZE = 5
def get_embeddings_wrapper(texts):
  embs = []
  for i in tqdm.tqdm(range(0, len(texts), BATCH_SIZE)):
    time.sleep(1) # wait about 1 second between calls to stay within the per-minute quota
    result = model.get_embeddings(texts[i:i + BATCH_SIZE])
    embs.extend(e.values for e in result)
  return embs

The following code gets embeddings for the question titles and adds them to the DataFrame as a new column, `embedding`. This will take a few minutes.

In [None]:
# get embeddings for the question titles and add them as "embedding" column
df = df.assign(embedding=get_embeddings_wrapper(df.title.tolist()))
print(df.shape)
df.head()

### 11. Look at the embedding similarities

Let's see how these embeddings are organized by meaning in the embedding space by quickly calculating the similarities between them and sorting them.

As embeddings are vectors, you can calculate the similarity between two embeddings with a popular metric such as one of the following:

<div style="text-align: center; margin-bottom: 10px;">
    <img style="border: 1px solid gray" 
         src="https://storage.googleapis.com/github-repo/img/embeddings/textemb-vs-notebook/8.png"
         width="650">
</div>


Which metric should we use? Usually it depends on how each model was trained. In the case of the `textembedding-gecko@001` model, we need to use the inner product (dot product).
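Three metrics commonly used to compare embeddings can each be written in one line of NumPy. This sketch uses two made-up 2-D vectors; it also shows that once vectors are scaled to unit length, the dot product and the cosine similarity give the same value, so ranking by either produces the same order.

```python
import numpy as np

def euclidean_distance(a, b):
    return float(np.linalg.norm(a - b))  # smaller = more similar

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # larger = more similar

def dot_product(a, b):
    return float(np.dot(a, b))  # larger = more similar

a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])
print(euclidean_distance(a, b))  # approx. 1.414 (the square root of 2)
print(cosine_similarity(a, b))   # 0.96 (= 24 / (5 * 5))
print(dot_product(a, b))         # 24.0

# Scale both vectors to unit length: the dot product now equals the cosine similarity.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
print(dot_product(a_unit, b_unit))  # approx. 0.96, up to floating-point rounding
```

To see whether your own embeddings are close to unit length, you could inspect `np.linalg.norm(embs, axis=1)` once the `embs` array is created in the next cell.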

The following code picks one question at random and uses NumPy's `np.dot` function to calculate the similarities between that question and every question in the DataFrame (including itself).

In [None]:
import random
import numpy as np

# pick one of them as a key question
key = random.randrange(len(df))  # randrange excludes len(df), so key is always a valid row index

# calc dot product between the key and other questions
embs = np.array(df.embedding.to_list())
similarities = np.dot(embs[key], embs.T)

# print similarities for the first 5 questions
similarities[:5]

Finally, sort the questions with the similarities and print the list.

In [None]:
# print the question
print(f"Key question: {df.title[key]}\n")

# sort and print the questions by similarities
sorted_questions = sorted(zip(df.title, similarities), key=lambda x: x[1], reverse=True)[:20]
for i, (question, similarity) in enumerate(sorted_questions):
  print(f"{similarity:.4f} {question}")

## Lab Tasks - Vector Search

As we have explained above, you can find similar embeddings by calculating the distance or similarity between the embeddings.

But this isn't easy when you have millions or billions of embeddings. For example, with 1 million embeddings of 768 dimensions, every query needs 1 million × 768 multiply-add operations. That is too slow for a real-time service, especially when many queries arrive at once.
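A minimal sketch of this exact (brute-force) search, using random vectors instead of real embeddings: each query computes one dot product per stored vector, so the work grows linearly with the number of embeddings. The sizes are kept small so the sketch runs quickly, and the printed time depends on your hardware.

```python
import time
import numpy as np

def brute_force_top_k(query, embeddings, k=10):
    """Exact nearest-neighbor search: score every stored vector by dot product."""
    scores = embeddings @ query              # n dot products, each of length dim
    top = np.argpartition(-scores, k)[:k]    # indices of the k highest scores, unordered
    return top[np.argsort(-scores[top])]     # order those k from best to worst

rng = np.random.default_rng(0)
n, dim = 10_000, 768                         # random stand-ins for real embeddings
embeddings = rng.standard_normal((n, dim), dtype=np.float32)
query = rng.standard_normal(dim, dtype=np.float32)

start = time.perf_counter()
top_ids = brute_force_top_k(query, embeddings, k=10)
elapsed = time.perf_counter() - start
print(f"scored {n:,} vectors ({n * dim:,} multiply-adds) in {elapsed * 1000:.1f} ms")
```

Multiplying `n` by 100 multiplies the work per query by 100; that linear growth is the scaling problem ANN indexes are designed to avoid.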

Researchers have therefore studied a technique called [Approximate Nearest Neighbor (ANN)](https://en.wikipedia.org/wiki/Nearest_neighbor_search) search for faster lookups. Many ANN methods use "vector quantization" to partition the space into multiple regions organized in a tree structure. This is similar to an index in a relational database that improves query performance, and it enables very fast, scalable search over billions of embeddings.

With the rise of LLMs, ANN, also known as vector search technology, is rapidly gaining popularity.
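To illustrate the partition-then-search idea, here is a simplified inverted-file (IVF) sketch: k-means groups the vectors into partitions, and a query scores only the vectors in the few partitions whose centroids are most similar to it. This is a substitute technique chosen for brevity, not ScaNN itself, which combines partitioning with a more sophisticated quantization scheme. The data is random, and `build_ivf_index` and `ivf_search` are hypothetical helper names.

```python
import numpy as np

def build_ivf_index(embeddings, n_partitions, n_iters=10, seed=0):
    """Toy inverted-file (IVF) index: spherical k-means over unit-length vectors."""
    rng = np.random.default_rng(seed)
    # Start from randomly chosen data points (fancy indexing makes a copy).
    centroids = embeddings[rng.choice(len(embeddings), n_partitions, replace=False)]
    for _ in range(n_iters):
        assignments = (embeddings @ centroids.T).argmax(axis=1)  # most similar centroid
        for c in range(n_partitions):
            members = embeddings[assignments == c]
            if len(members):
                mean = members.mean(axis=0)
                centroids[c] = mean / np.linalg.norm(mean)
    assignments = (embeddings @ centroids.T).argmax(axis=1)
    partitions = [np.flatnonzero(assignments == c) for c in range(n_partitions)]
    return centroids, partitions

def ivf_search(query, embeddings, centroids, partitions, k=10, n_probe=5):
    """Approximate search: score only the vectors in the n_probe most similar partitions."""
    probe = np.argsort(-(centroids @ query))[:n_probe]
    candidates = np.concatenate([partitions[c] for c in probe])
    scores = embeddings[candidates] @ query
    return candidates[np.argsort(-scores)[:k]]

rng = np.random.default_rng(1)
data = rng.standard_normal((5_000, 64))
data /= np.linalg.norm(data, axis=1, keepdims=True)  # unit length, like normalized embeddings
centroids, partitions = build_ivf_index(data, n_partitions=50)

query = data[0]
approx = ivf_search(query, data, centroids, partitions, k=10, n_probe=5)
exact = np.argsort(-(data @ query))[:10]
n_scored = sum(len(partitions[c]) for c in np.argsort(-(centroids @ query))[:5])
recall = len(set(approx.tolist()) & set(exact.tolist())) / len(exact)
print(f"scored {n_scored} of {len(data)} vectors, recall@10 = {recall:.2f}")
```

Raising `n_probe` scores more vectors and typically raises recall. The `leafNodesToSearchPercent` setting in the REST example at the end of this notebook appears to expose a similar speed/accuracy trade-off for Vector Search's tree-AH index.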

<div style="text-align: center; margin-bottom: 10px;">
    <img style="border: 1px solid gray" 
         src="https://storage.googleapis.com/gweb-cloudblog-publish/images/7._ANN.1143068821171228.max-2200x2200.png"
         width="650">
</div>

In 2020, Google Research published a new ANN algorithm called [ScaNN](https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html). It is considered one of the best ANN algorithms in the industry and is also an important foundation for search and recommendation in major Google services such as Google Search, YouTube, and many others.

Google Cloud developers can take full advantage of Google's vector search technology with [Vertex AI Vector Search](https://cloud.google.com/vertex-ai/docs/vector-search/overview) (previously called Matching Engine). With this fully managed service, developers can simply add embeddings to an index and issue a search query with a key embedding for blazingly fast vector search. In the case of the Stack Overflow demo, Vector Search can find relevant questions from 8 million embeddings in tens of milliseconds.


<div style="text-align: center; margin-bottom: 10px;">
    <img style="border: 1px solid gray" 
         src="https://storage.googleapis.com/github-repo/img/embeddings/textemb-vs-notebook/9.png"
         width="650">

With Vector Search, you don't need to spend much time and money building your own vector search service from scratch or using open source tools if your goal is high scalability, availability and maintainability for production systems.

### 12. Save the embeddings in a JSON file
To load the embeddings into Vector Search, we need to save them in JSON Lines (JSONL) format. See [Input data format and structure](https://cloud.google.com/vertex-ai/docs/matching-engine/match-eng-setup/format-structure#data-file-formats) in the documentation for more information.
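Each line of the file is a standalone JSON object with an `id` and an `embedding` list. The sketch below (the function name is hypothetical) writes that format with Python's `json` module and stores each `id` as a string, which, as far as we know, matches the examples in the input-format documentation; the `to_json` call in the next cell writes `id` as a number instead.

```python
import json
import os
import tempfile

def write_vector_search_jsonl(ids, embeddings, path):
    """Write one {"id": ..., "embedding": [...]} JSON object per line."""
    with open(path, "w") as f:
        for item_id, emb in zip(ids, embeddings):
            record = {"id": str(item_id), "embedding": [float(x) for x in emb]}
            f.write(json.dumps(record) + "\n")

# Small demo with made-up values, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample.json")
write_vector_search_jsonl([42, 43], [[0.5, 1.0], [0.25, -1.0]], path)
with open(path) as f:
    print(f.read(), end="")
# {"id": "42", "embedding": [0.5, 1.0]}
# {"id": "43", "embedding": [0.25, -1.0]}
```

With the DataFrame in this notebook, the equivalent call would be `write_vector_search_jsonl(df.id, df.embedding, "questions.json")`.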

First, export the `id` and `embedding` columns from the DataFrame in JSONL format, and save it.

In [None]:
# save id and embedding as a json file
jsonl_string = df[['id', 'embedding']].to_json(orient = 'records', lines = True)
with open('questions.json', 'w') as f:
  f.write(jsonl_string)

# show the first few lines of the json file
! head -n 3 questions.json

Then, create a new Cloud Storage bucket and copy the file to it.

In [None]:
UID = datetime.now().strftime("%m%d%H%M")

BUCKET_URI = f"gs://{PROJECT_ID}-{UID}"
! gsutil mb -l $LOCATION -p {PROJECT_ID} {BUCKET_URI}
! gsutil cp questions.json {BUCKET_URI}

### 13. Create an Index

Our journey takes a bit of a detour here:

* Creating the index will take more than an hour, so you won't see the index creation complete
* You will review the Python code (which uses the SDK) for creating an Index Endpoint and deploying the Index, but since your index isn't ready, you will only run the first few lines (the code that actually builds the index is commented out)
* Ultimately, you will do queries using an index endpoint provided by the instructor

Go ahead and execute the cell below.

In [None]:
from google.cloud import aiplatform
# switch to the instructor's project, which hosts the shared index and endpoint used for queries
PROJECT_ID = "roi-cisco-genai"
aiplatform.init(project=PROJECT_ID, location=LOCATION)

# # create index
# my_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
#   display_name = f"vertex_search_lab",
#   contents_delta_uri = BUCKET_URI,
#   dimensions = 768,
#   approximate_neighbors_count = 20,
#   distance_measure_type = "DOT_PRODUCT_DISTANCE",
#   sync = False
# )
# print("See your index at https://console.cloud.google.com/vertex-ai/matching-engine/indexes")

### 14. Create Index Endpoint and deploy the Index

Again, this code is for review only, so it has been commented out.

In [None]:
# # create IndexEndpoint
# my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
#   display_name = f"embvs-tutorial-index-endpoint-{UID}",
#   public_endpoint_enabled = True,
# )

# DEPLOYED_INDEX_ID = f"embvs_tutorial_deployed_{UID}"

# # deploy the Index to the Index Endpoint
# my_index_endpoint.deploy_index(
#   index = my_index, deployed_index_id = DEPLOYED_INDEX_ID
# )

### 15. Set up for queries

Since you will be using an index and endpoint hosted in another project, and based on a different sampling of articles, you need to do a little setup first.

In [None]:
# Copy the instructor's dataframe to your workbook instance
! gcloud storage cp gs://c-genai/my_data.csv .
df = pd.read_csv('my_data.csv', engine="python" )
df.head(5)

# register your workbook instance so you can call the endpoint
# wait until you see a message object from the curl command before going to next cell
! curl \
-X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://assign-a-role-nbmajpyjya-uc.a.run.app/iam?account_type=serviceAccount&role=projects/roi-cisco-genai/roles/cisco_class"


### 16. Run a query

Now embed a test question and query the instructor's index endpoint for the 20 most similar questions. Replace the placeholders with the values provided by your instructor.

In [None]:
test_embeddings = get_embeddings_wrapper(["How to read JSON with Python?"])

my_index_endpoint_id = "[get-from-instructor]" # replace with the string provided by the instructor
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint(my_index_endpoint_id)

DEPLOYED_INDEX_ID = "[get-from-instructor]" # replace with the string provided by the instructor

# Test query
response = my_index_endpoint.find_neighbors(
  deployed_index_id = DEPLOYED_INDEX_ID,
  queries = test_embeddings,
  num_neighbors = 20,
)

# look up each neighbor's title in the instructor's DataFrame by its ID
for neighbor in response[0]:
  similar = df[df.id == int(neighbor.id)]
  if len(similar) > 0:
    print(f"{neighbor.distance:.4f} {similar.title.values[0]}")

The `find_neighbors` function only takes milliseconds to fetch the similar items even when you have billions of items on the Index, thanks to the ScaNN algorithm. Vector Search also supports [autoscaling](https://cloud.google.com/vertex-ai/docs/vector-search/deploy-index-public#autoscaling) which can automatically resize the number of nodes based on the demands of your workloads.

# Summary

## Grounding LLM outputs with Vertex AI Vector Search

As we have seen, by combining the Embeddings API and Vector Search, you can use the embeddings to "ground" LLM outputs to real business data with low latency.

For example, if a user asks a question, the Embeddings API can convert it to an embedding and issue a query on Vector Search to find similar embeddings in its index. Those embeddings represent the actual business data in the databases. Because we are only retrieving existing business data and not generating any text, there is no risk of hallucinations in the results.

![](https://storage.googleapis.com/gweb-cloudblog-publish/original_images/10._grounding.png)

### The difference between the questions and answers

In this tutorial, we have used the Stack Overflow dataset for a reason: the dataset has many pairs of **questions and answers**, so you can find answers to your question simply by finding questions similar to it.

In many business use cases, the semantics (meaning) of questions and answers are different. There could also be cases where you want to add a variety of recommended or personalized items to the results, as in product search on e-commerce sites.

In these cases, simple semantic search doesn't work well. It's more of a recommendation system problem, where you may want to train a model (e.g. a Two-Tower model) to learn the relationship between the question embedding space and the answer embedding space. Also, many production systems add a reranking phase after the semantic search to achieve higher search quality. See [Scaling deep retrieval with TensorFlow Recommenders and Vertex AI Matching Engine](https://cloud.google.com/blog/products/ai-machine-learning/scaling-deep-retrieval-tensorflow-two-towers-architecture) to learn more.
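The two-stage pattern (cheap vector retrieval followed by a more careful reranking of a short candidate list) can be sketched as follows. The data is random, and `popularity_boost` is a made-up reranking rule standing in for whatever a production system would use, such as a trained reranking model or business rules.

```python
import numpy as np

def retrieve_then_rerank(query_vec, item_vecs, rerank_fn, n_candidates=100, k=10):
    """Stage 1: dot-product retrieval of n_candidates. Stage 2: rerank them, keep k."""
    scores = item_vecs @ query_vec
    candidates = np.argsort(-scores)[:n_candidates]
    # rerank_fn(item_id, semantic_score) -> float; it only runs on the short list.
    reranked = sorted(candidates.tolist(), key=lambda i: rerank_fn(i, float(scores[i])), reverse=True)
    return reranked[:k]

rng = np.random.default_rng(0)
items = rng.standard_normal((1_000, 32))
popularity = rng.random(1_000)  # hypothetical per-item signal, e.g. sales volume
query = rng.standard_normal(32)

def popularity_boost(item_id, semantic_score):
    # Hypothetical rule: semantic similarity plus a popularity bonus.
    return semantic_score + 2.0 * popularity[item_id]

print(retrieve_then_rerank(query, items, popularity_boost, n_candidates=50, k=5))
```

Because the expensive scorer only sees `n_candidates` items, it can be far more costly per item than a dot product without slowing down the overall query much.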

### Hybrid of semantic + keyword search

Another typical challenge you will face in production systems is supporting keyword search combined with semantic search. For example, in e-commerce product search, you may want to let users find a product by entering its product name or model number. As LLMs don't memorize those product names or model numbers, semantic search can't handle those "usual" search functions.

[Vertex AI Search](https://cloud.google.com/blog/products/ai-machine-learning/vertex-ai-search-and-conversation-is-now-generally-available) is another product you may consider for those requirements. While Vector Search provides only semantic search, Vertex AI Search provides an integrated search solution that combines semantic search, keyword search, reranking, and filtering, available as an out-of-the-box tool.
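One common, simple way to merge keyword and semantic results is reciprocal rank fusion (RRF): each item scores `1 / (k + rank)` in every list it appears in, and the scores are summed. This is a generic technique shown for illustration, not a description of how Vertex AI Search combines results, and the result lists below are made up.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of IDs; k=60 is a commonly used default constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["SKU-123", "doc-7", "doc-2"]   # e.g. exact model-number match ranked first
semantic_results = ["doc-2", "SKU-123", "doc-9"]  # e.g. top hits from vector search
print(reciprocal_rank_fusion([keyword_results, semantic_results]))
# ['SKU-123', 'doc-2', 'doc-7', 'doc-9']
```

Items found by both methods rise to the top, while an exact keyword match that semantic search ranks low, or misses entirely, can still appear in the merged list.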

### What about Retrieval Augmented Generation (RAG)?

In this tutorial, we have looked at the simple combination of LLM embeddings and vector search. From this starting point, you may also extend the design to [Retrieval Augmented Generation (RAG)](https://www.google.com/search?q=Retrieval+Augmented+Generation+(RAG)&oq=Retrieval+Augmented+Generation+(RAG)).

RAG is a popular architecture pattern for implementing grounding with an LLM behind a text chat UI. The idea is to use the LLM text chat UI as a frontend for document retrieval with vector search and for summarizing the results.
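The generation side of RAG starts by packing the retrieved results into the prompt. A minimal sketch of that step follows: the passages would come from a vector search such as the `find_neighbors` query earlier in this notebook, and the returned string would then be sent to an LLM (that call is not shown). The function name, the prompt wording, and the example passages are hypothetical.

```python
def build_grounded_prompt(question, retrieved_passages, max_passages=3):
    """Assemble an LLM prompt that asks the model to answer only from retrieved text."""
    context = "\n".join(
        f"[{i}] {passage}"
        for i, passage in enumerate(retrieved_passages[:max_passages], start=1)
    )
    return (
        "Answer the question using only the numbered passages below. "
        "If the passages do not contain the answer, say you don't know.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

passages = [
    "Use json.load(f) to parse a JSON file opened with open().",
    "json.loads parses a JSON string into Python objects.",
]
print(build_grounded_prompt("How to read JSON with Python?", passages))
```

Instructing the model to answer only from the passages reduces, but does not eliminate, the hallucination risk noted in the comparison table.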

![](https://storage.googleapis.com/gweb-cloudblog-publish/images/Figure-7-Ask_Your_Documents_Flow.max-529x434.png)

Each of the two solutions has pros and cons:

| | Emb + vector search | RAG |
|---|---|---|
| Design | simple | complex |
| UI | Text search UI | Text chat UI |
| Summarization of result | No | Yes |
| Multi-turn (Context aware) | No | Yes |
| Latency | milliseconds | seconds |
| Cost | lower | higher |
| Hallucinations | No risk | Some risk |

The embedding + vector search pattern we have looked at in this tutorial provides simple, fast, and low-cost semantic search functionality with LLM intelligence. RAG adds a context-aware text chat experience and result summarization on top of it. While RAG provides a more "Gen AI-ish" experience, it also adds a risk of hallucination and higher cost and latency for text generation.

To learn more about how to build a RAG solution, you may look at [Building Generative AI applications made easy with Vertex AI PaLM API and LangChain](https://cloud.google.com/blog/products/ai-machine-learning/generative-ai-applications-with-vertex-ai-palm-2-models-and-langchain).

## Resources

To learn more, please check out the following resources:

### Documentation

[Vertex AI Embeddings for Text API documentation](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings)

[Vector Search documentation](https://cloud.google.com/vertex-ai/docs/matching-engine/overview)

### Vector Search blog posts

[Vertex Matching Engine: Blazing fast and massively scalable nearest neighbor search](https://cloud.google.com/blog/products/ai-machine-learning/vertex-matching-engine-blazing-fast-and-massively-scalable-nearest-neighbor-search)

[Find anything blazingly fast with Google's vector search technology](https://cloud.google.com/blog/topics/developers-practitioners/find-anything-blazingly-fast-googles-vector-search-technology)

[Enabling real-time AI with Streaming Ingestion in Vertex AI](https://cloud.google.com/blog/products/ai-machine-learning/real-time-ai-with-google-cloud-vertex-ai)

[Mercari leverages Google's vector search technology to create a new marketplace](https://cloud.google.com/blog/topics/developers-practitioners/mercari-leverages-googles-vector-search-technology-create-new-marketplace)

[Recommending news articles using Vertex AI Matching Engine](https://cloud.google.com/blog/products/ai-machine-learning/recommending-articles-using-vertex-ai-matching-engine)

[What is Multimodal Search: "LLMs with vision" change businesses](https://cloud.google.com/blog/products/ai-machine-learning/multimodal-generative-ai-search)

# Utilities

Sometimes it takes tens of minutes to create or deploy Indexes, and you might lose the connection to the notebook runtime. In that case, instead of creating or deploying a new Index again, you can check [the Vector Search Console](https://console.cloud.google.com/vertex-ai/matching-engine/index-endpoints) and get the existing ones to continue.

## Get an existing Index Endpoint

To get an Index Endpoint object that already exists, replace the following `[your-index-endpoint-id]` with the Index Endpoint ID and run the cell. You can check the ID on [the Vector Search Console > INDEX ENDPOINTS tab](https://console.cloud.google.com/vertex-ai/matching-engine/index-endpoints).

In [None]:
my_index_endpoint_id = "[your-index-endpoint-id]" # @param {type:"string"}
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint(my_index_endpoint_id)

## Get an existing Index

To get an Index object that already exists, replace the following `[your-index-id]` with the index ID and run the cell. You can check the ID on [the Vector Search Console > INDEXES tab](https://console.cloud.google.com/vertex-ai/matching-engine/indexes).

In [None]:
my_index_id = "[your-index-id]" # @param {type:"string"}
my_index = aiplatform.MatchingEngineIndex(my_index_id)


## Create an index using the REST API directly (no library)

During lab development, there were problems with the Python SDK, so this code creates an index by calling the API endpoint directly.

In [None]:
import requests
import subprocess
import json

# request body for the indexes.create REST method (camelCase keys, as in the API docs)
data = {
  "displayName": "vertex_search_lab",
  "metadata": {
    "contentsDeltaUri": BUCKET_URI,
    "config": {
      "dimensions": 768,
      "approximateNeighborsCount": 10,
      "distanceMeasureType": "DOT_PRODUCT_DISTANCE",
      "algorithmConfig": {
        "treeAhConfig": {
          "leafNodeEmbeddingCount": 1000,
          "leafNodesToSearchPercent": 10
        }
      }
    }
  }
}

access_token = subprocess.getoutput('gcloud auth print-access-token')
headers = {
    'Authorization': f'Bearer {access_token}',
    'Content-Type': 'application/json; charset=utf-8',
}

# Set the URL (the regional hostname must match LOCATION)
url = f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{LOCATION}/indexes"

# Make the POST request; on success the API returns a long-running operation with a "name"
response = requests.post(url, headers=headers, data=json.dumps(data))
content = response.json()
if "name" in content:
    print("Index creation operation started. You can see the index by clicking on the link below")
    print('https://console.cloud.google.com/vertex-ai/matching-engine/indexes')
else:
    # show the error returned by the API
    print(content)