### For this lab, you need a Cosmos DB container with vector indexing and search features enabled.
(read more: https://learn.microsoft.com/en-us/python/api/overview/azure/cosmos-readme?view=azure-python)




### Set up steps

1- Enable vector indexing and search (Vector Search for NoSQL API) and full-text search (Preview Features for Full Text Search) in Azure Cosmos DB for NoSQL from the Features page of your Azure Cosmos DB account:
<img src=".\imgs\cosmos_policy.png" alt="description" width="900" height="300"/>

2- Define vector embedding policy, indexing policy and full text policy which will then be used for creating the container:

In [28]:
vector_embedding_policy = {
    "vectorEmbeddings": [
        {
            "path":"/vector1",
            "dataType":"float32",
            "distanceFunction":"cosine",
            "dimensions":1536
        },
        {
            "path":"/vector2",
            "dataType":"float32",
            "distanceFunction":"cosine",
            "dimensions":1536
        }
    ]
}
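
Both embedding paths use the `cosine` distance function, which compares vectors by direction rather than magnitude. As a rough illustration only, the measure can be sketched in plain Python. The helper name `cosine_similarity` is hypothetical and is not part of the Cosmos DB SDK; the service computes this internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal -> 0.0
```

Vectors pointing the same way score 1.0 regardless of length, which is why this measure suits text embeddings whose magnitude carries little meaning.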

In [29]:
full_text_policy = {
    "defaultLanguage": "en-US",
    "fullTextPaths": [
        {
            "path": "/description",
            "language": "en-US"
        },
        {
            "path": "/name",
            "language": "en-US"
        }
    ]
}

In [30]:
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [
        {
            "path": "/*"
        }
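    ],
    "excludedPaths": [
        {
            "path": "/_etag/?"
        },
        {
            # Keep the raw vector values out of the regular index
            "path": "/vector1/*"
        }
    ],
    # Only /vector1 gets a vector index; /vector2 is left without one
    # so it can be used for the brute-force comparison later
    "vectorIndexes": [
        {"path": "/vector1", "type": "diskANN"}
    ],
    "fullTextIndexes": [
        {"path": "/description"},
        {"path": "/name"}
    ]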
}
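
The three policies refer to each other by path, so a typo in one of them is easy to miss until container creation fails. The sketch below is a local sanity check under the assumption that every vector index path should appear in the vector embedding policy and every full-text index path in the full-text policy. The function `check_policies` and the trimmed sample policies are hypothetical helpers for illustration, not part of the SDK.

```python
def check_policies(vector_embedding_policy, full_text_policy, indexing_policy):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    embedding_paths = {e["path"] for e in vector_embedding_policy.get("vectorEmbeddings", [])}
    text_paths = {p["path"] for p in full_text_policy.get("fullTextPaths", [])}

    # Each vector index should point at a declared embedding path
    for index in indexing_policy.get("vectorIndexes", []):
        if index["path"] not in embedding_paths:
            problems.append(f"vector index path {index['path']} has no embedding definition")

    # Each full-text index should point at a declared full-text path
    for index in indexing_policy.get("fullTextIndexes", []):
        if index["path"] not in text_paths:
            problems.append(f"full-text index path {index['path']} has no full-text definition")
    return problems

# Trimmed-down copies of the policies, for illustration only
sample_embedding_policy = {"vectorEmbeddings": [{"path": "/vector1"}, {"path": "/vector2"}]}
sample_text_policy = {"fullTextPaths": [{"path": "/description"}, {"path": "/name"}]}
sample_indexing_policy = {
    "vectorIndexes": [{"path": "/vector1", "type": "diskANN"}],
    "fullTextIndexes": [{"path": "/description"}, {"path": "/name"}],
}

print(check_policies(sample_embedding_policy, sample_text_policy, sample_indexing_policy))
```

Running the same function on the full `vector_embedding_policy`, `full_text_policy`, and `indexing_policy` dictionaries before creating the container should work unchanged, since it only reads the `path` keys.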

3- Create the container "reviews"

In [31]:
from azure.cosmos import CosmosClient, PartitionKey
import os
from dotenv import load_dotenv
load_dotenv()


databaseName = os.getenv("COSMOS_DATABASE_NAME")
containerName = "reviews"

client = CosmosClient.from_connection_string(os.getenv("COSMOS_CONNECTION_STRING"))
database = client.get_database_client(databaseName)

# Create the container with all three policies defined in step 2
database.create_container(id=containerName, partition_key=PartitionKey(path="/category"),
                          indexing_policy=indexing_policy,
                          vector_embedding_policy=vector_embedding_policy,
                          full_text_policy=full_text_policy)

<ContainerProxy [dbs/Contoso/colls/reviews]>

4- Populate with data + vectors

In [34]:
from azure.cosmos.aio import CosmosClient  # async client; replaces the sync CosmosClient imported earlier
import os
import json

# Connect to the embedding service
from semantic_kernel.connectors.ai.open_ai import AzureTextEmbedding

embedding_service = AzureTextEmbedding(
    deployment_name="text-embedding-ada-002",
    api_key=os.getenv('AZURE_OPENAI_KEY'),
    endpoint=os.getenv('AZURE_OPENAI_EMBED_ENDPOINT'),
    base_url=os.getenv('AZURE_OPENAI_BASE_EMBED_URL'))

# Load product reviews from JSON file
with open('./src/sample_products.json', 'r', encoding="utf-8") as f:
    reviews = json.load(f)

databaseName = os.getenv("COSMOS_DATABASE_NAME")
containerName = "reviews"

async def create_products(reviews):
    # The async with statement initializes the async client and closes it on exit
    async with CosmosClient.from_connection_string(os.getenv("COSMOS_CONNECTION_STRING")) as client:
        database = client.get_database_client(databaseName)
        container = database.get_container_client(containerName)
        for i, review in enumerate(reviews, start=1):
            review['id'] = str(i)
            # Embed the description once and store it under both vector paths
            embedding = (await embedding_service.generate_embeddings([review["description"]]))[0]
            embedding_list = embedding.tolist()
            review['vector1'] = embedding_list  # indexed with DiskANN
            review['vector2'] = embedding_list  # no vector index (used for brute force)
            await container.upsert_item(review)

    print(f"Inserted {len(reviews)} records into the container '{containerName}' in database '{databaseName}'.")

In [35]:
await create_products(reviews)

Inserted 1956 records into the container 'reviews' in database 'Contoso'.
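
The loop above waits for each embedding before requesting the next one, which can be slow for roughly two thousand records. One possible alternative, shown here only as a sketch, is to request embeddings concurrently with a cap on in-flight calls. The names `embed_all` and `fake_embed` are hypothetical; `fake_embed` is a stand-in for the real embedding call so the example runs without any Azure resources, and the concurrency limit of 4 is an arbitrary assumption that should be tuned to your service quota.

```python
import asyncio

async def fake_embed(text):
    """Stand-in for an embedding call; returns a tiny vector based on text length."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [float(len(text)), 0.0, 0.0]

async def embed_all(texts, embed, max_concurrency=4):
    """Embed all texts with at most max_concurrency calls in flight; order is preserved."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def embed_one(text):
        async with semaphore:
            return await embed(text)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(embed_one(t) for t in texts))

vectors = asyncio.run(embed_all(["watch", "phone", "tv"], fake_embed))
print(vectors)
```

Inside a notebook, where an event loop is already running, `await embed_all(...)` would be used in place of `asyncio.run(...)`.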


### Examples

#### Full text search

In [36]:
from azure.cosmos import CosmosClient
import os
from dotenv import load_dotenv
load_dotenv()

client = CosmosClient.from_connection_string(os.getenv("COSMOS_CONNECTION_STRING"))
containerName = "reviews"
databaseName = os.getenv("COSMOS_DATABASE_NAME")
database = client.get_database_client(databaseName)
container = database.get_container_client(containerName)

In [37]:
ex1 = container.query_items(
    query="SELECT TOP 5 * FROM c WHERE FullTextContainsAll(c.description, 'watch', 'health')",
    enable_cross_partition_query=True)

In [38]:
for item in ex1:
    print("ID: ", item.get('id'), "Name: ", item.get('name'), "Description: ", item.get('description'))

ID:  10 Name:  Apple Watch Series 8 Description:  The Apple Watch Series 8 integrates advanced health and fitness tracking with a bright, always-on Retina display. It offers ECG, blood oxygen monitoring, and temperature sensing, making it a smart companion for holistic wellness. With improved crash detection, fast charging, and seamless connectivity with the iPhone ecosystem, the Watch Series 8 is a comprehensive wearable for everyday life.
ID:  20 Name:  Samsung Galaxy Watch 5 Pro Description:  Samsung Galaxy Watch 5 Pro offers robust fitness and health monitoring in a durable titanium build. Its 1.4-inch AMOLED display is protected by sapphire crystal, and advanced GPS features make it ideal for explorers. With body composition measurement and up to 80 hours of battery life, it's one of the most feature-packed smartwatches for Android users.
ID:  125 Name:  Apple Watch Series 9 GPS Description:  The Apple Watch Series 9 GPS model offers advanced health tracking, crash detection, and 

#### Vector Search

Generate embedding for the question:

In [85]:
question = "I want a smart watch that is advanced, can track my health, has a long battery life and is waterproof."
test_embedding = (await embedding_service.generate_embeddings([question]))[0]
embedding_list = test_embedding.tolist() 

Search with index (DiskANN):

In [86]:
indexed_query = f"""
SELECT TOP 5 c.id, c.name, c.description, VectorDistance(c.vector1, {embedding_list}) AS score
FROM c
ORDER BY VectorDistance(c.vector1, {embedding_list})
"""


In [87]:
indexed_query_result = container.query_items(
    query=indexed_query,
    enable_cross_partition_query=True)

In [88]:
for item in indexed_query_result:
    print("ID: ", item.get('id'), "Name: ", item.get('name'), "Description: ", item.get('description'))

ID:  1806 Name:  Samsung Galaxy Watch 5 Description:  The Samsung Galaxy Watch 5 redefines smartwatch versatility with a robust suite of health and fitness tracking, including sleep analysis, body composition, and heart health tools. Its AMOLED display is vibrant and customizable, while the durable sapphire crystal glass resists scratches. Long battery life and compatibility with Android and iOS ensure everyday utility.
ID:  237 Name:  Samsung Galaxy Watch5 Pro Description:  The Samsung Galaxy Watch5 Pro is built for adventure with advanced GPS, turn-by-turn navigation, a titanium case, and Sapphire Crystal display. Its robust set of health features—like body composition, sleep tracking, and heart rate sensors—combined with a multi-day battery, makes it the definitive smartwatch for outdoor enthusiasts and fitness-minded users.
ID:  20 Name:  Samsung Galaxy Watch 5 Pro Description:  Samsung Galaxy Watch 5 Pro offers robust fitness and health monitoring in a durable titanium build. Its 

Max of total request units: 99
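
The query above interpolates the full embedding into the SQL text with an f-string. A possible alternative, sketched here, is to pass the vector as a query parameter through the `parameters` argument of `query_items`. The helper `build_vector_query` is hypothetical, and whether a parameterized vector performs the same as an inlined one is an assumption to verify against your own account.

```python
import re

# Only allow simple property names so the field can be safely placed in the SQL text
_FIELD_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def build_vector_query(vector_field, embedding, top=5):
    """Return (query, parameters) for a TOP-k vector search on c.<vector_field>."""
    if not _FIELD_RE.match(vector_field):
        raise ValueError(f"invalid field name: {vector_field!r}")
    if isinstance(top, bool) or not isinstance(top, int) or top < 1:
        raise ValueError("top must be a positive integer")
    query = (
        f"SELECT TOP {top} c.id, c.name, c.description, "
        f"VectorDistance(c.{vector_field}, @embedding) AS score "
        f"FROM c ORDER BY VectorDistance(c.{vector_field}, @embedding)"
    )
    parameters = [{"name": "@embedding", "value": list(embedding)}]
    return query, parameters

query, parameters = build_vector_query("vector1", [0.1, 0.2, 0.3])
print(query)

# Assumed usage with the notebook's `container` object:
# results = container.query_items(query=query, parameters=parameters,
#                                 enable_cross_partition_query=True)
```

Keeping the vector out of the query string also makes the query text short enough to log and compare between the indexed and brute-force runs.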

Search with no index (brute force):

In [89]:
brute_force_query = f"""
SELECT TOP 5 c.id, c.name, c.description, VectorDistance(c.vector2, {embedding_list}) AS score
FROM c
ORDER BY VectorDistance(c.vector2, {embedding_list})
"""

In [90]:
brute_force_query_result = container.query_items(
    query=brute_force_query,
    enable_cross_partition_query=True)

In [91]:
for item in brute_force_query_result:
    print("ID: ", item.get('id'), "Name: ", item.get('name'), "Description: ", item.get('description'))

ID:  1806 Name:  Samsung Galaxy Watch 5 Description:  The Samsung Galaxy Watch 5 redefines smartwatch versatility with a robust suite of health and fitness tracking, including sleep analysis, body composition, and heart health tools. Its AMOLED display is vibrant and customizable, while the durable sapphire crystal glass resists scratches. Long battery life and compatibility with Android and iOS ensure everyday utility.
ID:  237 Name:  Samsung Galaxy Watch5 Pro Description:  The Samsung Galaxy Watch5 Pro is built for adventure with advanced GPS, turn-by-turn navigation, a titanium case, and Sapphire Crystal display. Its robust set of health features—like body composition, sleep tracking, and heart rate sensors—combined with a multi-day battery, makes it the definitive smartwatch for outdoor enthusiasts and fitness-minded users.
ID:  20 Name:  Samsung Galaxy Watch 5 Pro Description:  Samsung Galaxy Watch 5 Pro offers robust fitness and health monitoring in a durable titanium build. Its 

Max of total request units: 206

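Both queries returned the same results, and the indexed (DiskANN) query reported fewer request units than the brute-force one (99 vs. 206). Both also missed the Fitbit Charge 5, arguably the most compatible product because its description also mentions water resistance.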

#### Hybrid search
Learn more: https://learn.microsoft.com/en-us/azure/cosmos-db/gen-ai/hybrid-search?context=/azure/cosmos-db/nosql/context/context


In [141]:
hybrid_query = f"""
SELECT TOP 5 *
FROM c
ORDER BY RANK RRF(VectorDistance(c.vector1, {embedding_list}), FullTextScore(c.description, 'water', 'battery', 'health'))
"""

In [142]:

hybrid_query_result = container.query_items(
    query=hybrid_query,
    enable_cross_partition_query=True)


In [143]:
for item in hybrid_query_result:
    print("ID: ", item.get('id'), "Name: ", item.get('name'), "Description: ", item.get('description'))

ID:  677 Name:  Fitbit Charge 5 Advanced Fitness Tracker Description:  The Fitbit Charge 5 Advanced Fitness Tracker helps users stay on top of their health and wellness goals, offering features like built-in GPS, ECG monitoring, and a bright AMOLED display. Its health metrics dashboard tracks heart rate, sleep, stress, and SpO2 levels. With up to seven days of battery life, 24/7 activity tracking, and water resistance, it’s ideal for active lifestyles.
ID:  560 Name:  Apple Watch Ultra 2 Description:  Apple Watch Ultra 2 is engineered for adventure, durability, and exploration. It offers the brightest Apple display yet, customizable action button, advanced GPS, and water resistance up to 100 meters. With new trail running and diving features, precision health sensors, and long battery life, it’s the ultimate smartwatch for extreme sports and daily use.
ID:  1806 Name:  Samsung Galaxy Watch 5 Description:  The Samsung Galaxy Watch 5 redefines smartwatch versatility with a robust suite o
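
The hybrid query above combines the vector ranking and the full-text ranking with `RRF` (Reciprocal Rank Fusion). To show the general idea, here is a minimal plain-Python sketch of the textbook formula, where each document scores the sum of `1 / (k + rank)` over the rankings it appears in. The function `rrf_fuse`, the document IDs, both input rankings, and the constant `k=60` are illustrative assumptions; this is not a description of how Cosmos DB implements the function internally.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids using Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any list contribute a larger share
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Made-up rankings: one from a vector search, one from a full-text search
vector_ranking = ["a", "b", "c", "d"]
text_ranking = ["d", "e", "a"]

fused = rrf_fuse([vector_ranking, text_ranking])
print(fused)
```

In this toy input, document `d` sits last in the vector ranking but first in the text ranking, so fusion lifts it to second place. That is the same effect that brought the Fitbit Charge 5 to the top of the hybrid results.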