# Lab 3.3 - Azure AI Search



# Step 1: Upload data into Azure Blob Storage

1. If you don't have one, create a new Azure Storage account
2. Inside that account, create a new container called `hotels`
3. Inside that account, create a new container called `unsplash-images`
   
![Storage Account](../img/search02.png)

4. Clone the https://github.com/Azure-Samples/azure-search-sample-data repo
5. Upload the `./hotels/HotelsData_toAzureSearch.JSON` file to the `hotels` container
6. Upload the contents of the `./unsplash-images/landmarks` folder to the `unsplash-images` container
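
If you would rather script the upload than use the portal, the sketch below shows one possible approach with the `azure-storage-blob` package. It is an illustration, not part of the lab: the helper names `plan_uploads` and `upload_folder` are made up for this example, and it assumes you have a connection string for your storage account.

```python
from pathlib import Path


def plan_uploads(local_root: Path) -> list[tuple[Path, str]]:
    """Pair every file under local_root with the blob name it would get."""
    files = sorted(p for p in local_root.rglob("*") if p.is_file())
    # Blob names use forward slashes, regardless of the local OS.
    return [(p, p.relative_to(local_root).as_posix()) for p in files]


def upload_folder(local_root: Path, container_name: str, connection_string: str) -> int:
    """Upload every file under local_root to the container; return the file count."""
    # Imported lazily so plan_uploads works even without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    container = service.get_container_client(container_name)
    uploads = plan_uploads(local_root)
    for path, blob_name in uploads:
        with path.open("rb") as data:
            # overwrite=True makes the script safe to re-run.
            container.upload_blob(name=blob_name, data=data, overwrite=True)
    return len(uploads)
```

Calling `plan_uploads` on the cloned `landmarks` folder first lets you review the blob names before anything is sent. `upload_folder` then takes the same folder, the container name (for example `unsplash-images`), and your connection string.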

# Step 2: Create a new Azure AI Search service

1. In the Azure Portal, create a new Azure AI Search service.
2. Choose France Central (or the region you used before)
3. Choose Basic or higher tier
4. You can leave everything else as default
5. Click on Review + Create

![AI Search service](../img/search01.png)

# Step 3: Import hotel data into Azure AI Search

1. In Azure AI Search, start the "Import and vectorize data" wizard

![Import and Vectorize](../img/search03.png)

2. Select the Azure Blob Storage account and the hotels container
3. Check `Enable deletion tracking` and click on Next
4. Choose the Azure OpenAI Service created before
5. Choose an embedding model which will be used to vectorize the data. If you don't have one yet, just go with `text-embedding-ada-002`
6. Click Next
7. Leave `Vectorize images` and `Extract text from images` unchecked and click Next
8. Leave `Enable semantic ranker` checked and click Next
9. Give it a friendly name in `Objects name prefix` (e.g. `hotels`) and click Create

# Step 4: Experiment with queries

1. Make sure the indexing job is complete (it may take a few minutes). You should see ~45 documents in the Index.

![Index](../img/search04.png)


In [None]:
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Load the environment variables from a local .env file with python-dotenv
import os
from dotenv import load_dotenv
load_dotenv()

service_endpoint = os.getenv("AZURE_AI_SEARCH_ENDPOINT")
key = os.getenv("AZURE_AI_SEARCH_API_KEY")
index_name = "hotels" # replace this if you named your index differently

# Create a client
search_client = SearchClient(service_endpoint, index_name, AzureKeyCredential(key))

print('Using endpoint', service_endpoint)
print('Using index', index_name)
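
If the endpoint prints as `None`, the `.env` file was probably not found or is missing an entry. The following sketch is one way to check for that up front. The helper `missing_settings` is hypothetical, not part of any SDK; the variable names are the ones read by the cell above.

```python
import os
from typing import Mapping, Sequence


def missing_settings(names: Sequence[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names that are unset or blank in env."""
    return [name for name in names if not env.get(name, "").strip()]


REQUIRED = ["AZURE_AI_SEARCH_ENDPOINT", "AZURE_AI_SEARCH_API_KEY"]
missing = missing_settings(REQUIRED)
if missing:
    print("Missing settings, check your .env file:", ", ".join(missing))
else:
    print("All search settings are present.")
```

Run it after `load_dotenv()` so that values from the `.env` file are already in the environment.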

In [None]:
# The deployment name of the embedding model used in Azure AI Search
embedding_deployment_name = "ada-embedding" # Change this if you used another one

from openai import AzureOpenAI
import json 

# Endpoint and API key can be found in Azure AI Studio -> Project Settings -> Project Properties -> Get API endpoints
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
# The name of the deployment to be used. Found in Azure AI Studio -> Deployments
AZURE_OPENAI_DEPLOYMENT_ID = os.getenv("AZURE_OPENAI_DEPLOYMENT_ID")

# Print the endpoint to verify it was loaded correctly
print('If you see some text below, the endpoint was loaded successfully.')
print(AZURE_OPENAI_ENDPOINT)

client = AzureOpenAI(
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_key=AZURE_OPENAI_API_KEY,
    api_version="2024-02-01",
    azure_deployment=embedding_deployment_name
)


In [None]:
# Let's see vectors in action

query = "Which ones are city center hotels?"
response = client.embeddings.create(input=[query], model=embedding_deployment_name)
embeddings = response.data[0].embedding

print("Vector length is", len(embeddings))
print("Vector is", embeddings)
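
To build some intuition for what a search engine does with such a vector, here is a toy sketch in plain Python. It ranks three made-up 3-dimensional "embeddings" by cosine similarity to a made-up query vector. The vectors, the document names, and the choice of cosine similarity are assumptions for illustration only; a real service works on much longer vectors and typically uses an approximate nearest-neighbor index rather than comparing against every document.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, docs, k):
    """Return the names of the k documents most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in docs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Made-up vectors; real embeddings have hundreds or thousands of dimensions.
toy_docs = {
    "downtown hotel": [0.9, 0.1, 0.0],
    "ski lodge": [0.1, 0.9, 0.2],
    "beach resort": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]

print(nearest(toy_query, toy_docs, k=2))
```

The query vector points mostly in the same direction as the "downtown hotel" vector, so that document ranks first. The `k` argument plays the same role as a "top k neighbors" setting in a vector query.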

In [None]:
# Now let's do an actual search using vectors

vector_query = VectorizedQuery(vector=embeddings, k_nearest_neighbors=3, fields="text_vector")
results = search_client.search(
    search_text=query, 
    query_type="semantic", 
    vector_queries=[vector_query])

# view results
for result in results:
    print(json.dumps(result, indent=2))
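
Depending on how the index was created, each result may include long text fields or even the stored vector, which makes the output hard to read. The sketch below shows one way to trim a result before printing it. `summarize_result` is a hypothetical helper, and the `title`, `chunk`, and `text_vector` names in the sample are assumptions about the index schema; check the Fields tab of your index for the actual names.

```python
def summarize_result(result, drop_fields=("text_vector",), max_chars=200):
    """Copy a result dict, dropping bulky fields and truncating long strings."""
    summary = {}
    for key, value in result.items():
        if key in drop_fields:
            continue  # skip the raw embedding
        if isinstance(value, str) and len(value) > max_chars:
            value = value[:max_chars] + "..."
        summary[key] = value
    return summary


# A made-up result shaped like a search hit, for demonstration only.
sample = {
    "@search.score": 0.03,
    "title": "example.json",
    "chunk": "x" * 500,
    "text_vector": [0.1] * 1536,
}
print(summarize_result(sample))
```

Inside the results loop, `print(json.dumps(summarize_result(result), indent=2))` would then print the shortened version of each hit.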

In [None]:
# And another one
query = "Which hotels are in the mountains?"
response = client.embeddings.create(input=[query], model=embedding_deployment_name)
embeddings = response.data[0].embedding

vector_query = VectorizedQuery(vector=embeddings, k_nearest_neighbors=3, fields="text_vector")
results = search_client.search(
    search_text=query, 
    query_type="semantic", 
    vector_queries=[vector_query])

# view results
for result in results:
    print(json.dumps(result, indent=2))

## Optional next steps

There are many labs and end-to-end examples available in the Azure AI Search documentation and on GitHub. Here are some you could try:

* https://github.com/Azure/azure-search-vector-samples/tree/main
* https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/basic-vector-workflow/azure-search-vector-python-sample.ipynb