### For this lab, you will need:
1. In the Azure portal, open your Azure PostgreSQL server, go to Settings -> Server parameters, and search for `azure.extensions`.
2. Turn on the following extensions:
- AZURE_AI
- PG_DISKANN
- VECTOR
3. Save the changes so that they are deployed.
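Allow-listing an extension under `azure.extensions` only permits it; each database still needs a `CREATE EXTENSION` statement before the extension can be used. The sketch below builds those statements. It is an assumption that the lab's setup script does not already run them, and `enable_extensions` is a hypothetical helper that expects an open DB-API connection such as one from `psycopg2.connect`.

```python
# Extension names as they are spelled in SQL (lower case).
# pg_diskann builds on pgvector, so "vector" is created first.
REQUIRED_EXTENSIONS = ["vector", "pg_diskann", "azure_ai"]


def create_extension_statements(extensions=REQUIRED_EXTENSIONS):
    """Return one idempotent CREATE EXTENSION statement per extension."""
    return [f"CREATE EXTENSION IF NOT EXISTS {name};" for name in extensions]


def enable_extensions(conn, extensions=REQUIRED_EXTENSIONS):
    """Run the statements on an open DB-API connection (hypothetical helper)."""
    with conn.cursor() as cur:
        for statement in create_extension_statements(extensions):
            cur.execute(statement)
    conn.commit()


# Print the statements so they can also be pasted into psql.
for statement in create_extension_statements():
    print(statement)
```

Because of `IF NOT EXISTS`, running the statements again on a database that already has the extensions is harmless.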



### Populate an example table of products

In a terminal, go to the `src` folder and run the following command to set up the `product_catalogue` table:

- **python product_catalog_init.py**

### Import main requirements

In [None]:
from semantic_kernel.connectors.ai.open_ai import AzureTextEmbedding
import os
from dotenv import load_dotenv
load_dotenv(override=True)
import psycopg2
from pgvector.psycopg2 import register_vector
from src.get_conn import get_connection_uri

### Set up and populate vector tables
As we saw in session 4, the first step is to generate embeddings for the product descriptions and populate a table with the resulting vectors (batch vector generation and insertion):

In [2]:
def get_embed_service():
    # Build the Azure OpenAI embedding client from environment settings
    embedding_service = AzureTextEmbedding(
        deployment_name="text-embedding-ada-002",
        api_key=os.getenv('AZURE_OPENAI_KEY'),
        endpoint=os.getenv('AZURE_OPENAI_EMBED_ENDPOINT'),
        base_url=os.getenv('AZURE_OPENAI_BASE_EMBED_URL'))
    return embedding_service

#### Create the table

In [None]:
conn_uri = get_connection_uri()
with psycopg2.connect(conn_uri) as conn:
    with conn.cursor() as cur:
        # create/replace product_catalogue_vectors table 
        cur.execute("""
            DROP TABLE IF EXISTS product_catalogue_vectors CASCADE;
            CREATE TABLE product_catalogue_vectors (
                vector_id SERIAL PRIMARY KEY,
                id INTEGER REFERENCES product_catalogue(id),
                description TEXT NOT NULL,
                embedding vector(1536) NOT NULL
            );
        """)
        conn.commit()



Connection uri was rertieved successfully.


#### Populate with embedding vectors

In [None]:
# Generate an embedding for each product description and store it
embedding_service = get_embed_service()
conn_uri = get_connection_uri()
with psycopg2.connect(conn_uri) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT id, description FROM product_catalogue")
        for prod_id, desc in cur.fetchall():
            # Top-level await works here because the notebook already runs an event loop
            response = await embedding_service.generate_embeddings([desc])
            embedding = response[0]
            cur.execute(
                "INSERT INTO product_catalogue_vectors (embedding, id, description) "
                "VALUES (%s, %s, %s)",
                (embedding.tolist(), prod_id, desc),
            )
        conn.commit()

print("All embeddings inserted into product_catalogue_vectors table.")


All embeddings inserted into product_catalogue_vectors table.
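The cell above sends one description per embedding call. Since `generate_embeddings` takes a list of texts, several descriptions could be grouped into each request. The following is a minimal sketch of the grouping step only; `chunked`, the batch size of 2, and the sample rows are illustrative assumptions, not part of the lab code.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical rows in the same (id, description) shape the SELECT returns.
rows = [(1, "watch A"), (2, "watch B"), (3, "watch C"), (4, "watch D"), (5, "watch E")]

for batch in chunked(rows, 2):
    ids = [row[0] for row in batch]
    descriptions = [row[1] for row in batch]
    # In the notebook, `descriptions` would be passed to a single
    # embedding_service.generate_embeddings(descriptions) call, and the
    # returned vectors paired back with `ids` by position.
    print(ids, descriptions)
```

A suitable batch size depends on the request limits of the embedding deployment, which this sketch does not know.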


#### Build the DiskANN index:

In [3]:
conn_uri = get_connection_uri()
with psycopg2.connect(conn_uri) as conn:
    with conn.cursor() as cur:
        query = """ CREATE INDEX product_catalogue_index ON product_catalogue_vectors
        USING diskann (embedding vector_cosine_ops)"""
        cur.execute(query)
        conn.commit()


Connection uri was rertieved successfully.
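The index above is declared with the `vector_cosine_ops` operator class. In pgvector, each operator class is paired with one distance operator, and a query generally has to order by the matching operator for the planner to consider the index. The mapping below reflects pgvector's documented operators; `distance_operator` and `nearest_neighbour_sql` are hypothetical helpers written for this sketch.

```python
# pgvector operator classes and the distance operator each one serves.
OPCLASS_TO_OPERATOR = {
    "vector_l2_ops": "<->",      # Euclidean (L2) distance
    "vector_cosine_ops": "<=>",  # cosine distance
    "vector_ip_ops": "<#>",      # negative inner product
}


def distance_operator(opclass):
    """Return the operator that matches an index operator class."""
    try:
        return OPCLASS_TO_OPERATOR[opclass]
    except KeyError:
        raise ValueError(f"unknown operator class: {opclass!r}") from None


def nearest_neighbour_sql(table, column, opclass):
    """Build a top-k query; table and column must be trusted identifiers."""
    op = distance_operator(opclass)
    return f"SELECT id, description FROM {table} ORDER BY {column} {op} %s LIMIT %s;"


print(nearest_neighbour_sql("product_catalogue_vectors", "embedding", "vector_cosine_ops"))
```

Running `EXPLAIN` on the generated query is one way to check whether the index is actually chosen.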


### Examples

#### Generate embedding for the sample question:

In [3]:
embedding_service = get_embed_service()
question = "I want a smart watch that can track my health, has a long battery life and is water resistant."
test_embedding = (await embedding_service.generate_embeddings([question]))[0]
embedding_list = test_embedding.tolist() 

#### Search for top results:

In [4]:
conn_uri = get_connection_uri()
with psycopg2.connect(conn_uri) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT id, embedding, description
            FROM product_catalogue_vectors
            -- cosine distance, matching the index's vector_cosine_ops
            ORDER BY embedding <=> %s
            LIMIT %s;
            """,
            (str(embedding_list), 10)
        )

        rows = cur.fetchall()

tuple_list = []
for row in rows:
    product_id, embedding, desc = row
    tuple_list.append((product_id, desc))

    print(f"Product ID: {product_id} - Description: {desc}")

Connection uri was rertieved successfully.
Product ID: 1806 - Description: The Samsung Galaxy Watch 5 redefines smartwatch versatility with a robust suite of health and fitness tracking, including sleep analysis, body composition, and heart health tools. Its AMOLED display is vibrant and customizable, while the durable sapphire crystal glass resists scratches. Long battery life and compatibility with Android and iOS ensure everyday utility.
Product ID: 20 - Description: Samsung Galaxy Watch 5 Pro offers robust fitness and health monitoring in a durable titanium build. Its 1.4-inch AMOLED display is protected by sapphire crystal, and advanced GPS features make it ideal for explorers. With body composition measurement and up to 80 hours of battery life, it's one of the most feature-packed smartwatches for Android users.
Product ID: 507 - Description: Huawei Watch GT 3 blends classic aesthetics with advanced health tracking, including SpO2, sleep, heart rate, and over 100 fitness modes. I
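The search orders rows by a vector distance. For unit-length vectors, Euclidean and cosine distance give the same ranking, because the squared Euclidean distance equals twice the cosine distance. The sketch below checks this on small made-up vectors; it assumes the embeddings are normalized to length 1, which OpenAI documents for its embedding models but which is not verified here.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors standing in for 1536-dimensional embeddings.
query = normalize([1.0, 2.0, 2.0])
candidates = {
    "a": normalize([1.0, 2.0, 1.0]),
    "b": normalize([2.0, 0.0, 1.0]),
    "c": normalize([0.0, 1.0, 3.0]),
}

by_l2 = sorted(candidates, key=lambda k: l2_distance(query, candidates[k]))
by_cosine = sorted(candidates, key=lambda k: cosine_distance(query, candidates[k]))
print(by_l2, by_cosine)
```

For vectors that are not normalized, the two rankings can differ, so the choice of operator then matters for the results as well as for index usage.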

#### How can we improve accuracy in the results? 

We expected a product that covers all of the key features to rank first, but it does not. Let's try semantic reranking to see whether we can move it up the list.

First, set up the Azure OpenAI connection so that we can use our deployed gpt-4.1 model for reranking:

In [5]:
conn_uri = get_connection_uri()
endpoint = os.getenv('AZURE_OPENAI_ENDPOINT')
# Keep only the base URL: everything up to and including '.com'
com_index = endpoint.find('.com')
truncated_endpoint = endpoint[:com_index + 4]
with psycopg2.connect(conn_uri) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        # set Azure OpenAI settings
        setting_query = """ 
                SELECT azure_ai.set_setting('azure_openai.endpoint', %s);
                SELECT azure_ai.set_setting('azure_openai.subscription_key', %s);
                """

        cur.execute(setting_query, (truncated_endpoint, os.getenv('AZURE_OPENAI_API_KEY')))
        conn.commit()


Connection uri was rertieved successfully.
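The cell above trims the endpoint by searching for `.com`, which silently produces a wrong value if the URL contains no `.com`. A possible alternative, sketched here with the standard library's `urllib.parse`, keeps only the scheme and host. `base_endpoint` is a hypothetical helper and the example URL is made up.

```python
from urllib.parse import urlsplit


def base_endpoint(url):
    """Return 'scheme://host' for an absolute URL, dropping path and query."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {url!r}")
    return f"{parts.scheme}://{parts.netloc}"


# Hypothetical endpoint value; a real one comes from AZURE_OPENAI_ENDPOINT.
example = (
    "https://my-resource.openai.azure.com/openai/deployments/gpt-4.1"
    "/chat/completions?api-version=2024-02-01"
)
print(base_endpoint(example))  # https://my-resource.openai.azure.com
```

For endpoints on a `.com` host this yields the same string as the slicing approach, and it raises an error instead of returning a truncated value when the input is not a full URL.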


Let's first use the extract operator to pull out the first three features requested by the user:

In [6]:
conn_uri = get_connection_uri()
with psycopg2.connect(conn_uri) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        extract_query = """
        SELECT azure_ai.extract(
                'I want a smart watch that can track my health, has a long battery life and is water resistant.',
                ARRAY['feature1', 'feature2', 'feature3'],
                'gpt-4.1'
                )
        """
        cur.execute(extract_query)
        extracted = cur.fetchone()[0]
# Join the three extracted features into one comma-separated string
features = ', '.join(extracted[key] for key in ('feature1', 'feature2', 'feature3'))

Connection uri was rertieved successfully.


In [24]:
features

'health tracking, long battery life, water resistance'
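The extraction result is indexed like a dictionary with the keys `feature1` to `feature3`. If the model returns fewer features than requested, a plain concatenation would fail on the missing key. The following sketch joins whatever is present; `join_features` is a hypothetical helper, and the sample dictionary mirrors the output shown above.

```python
def join_features(extracted, keys=("feature1", "feature2", "feature3")):
    """Join the non-empty extracted values into one comma-separated string."""
    values = [str(extracted[key]).strip() for key in keys if extracted.get(key)]
    return ", ".join(values)


# Sample shaped like the extraction result shown above.
sample = {
    "feature1": "health tracking",
    "feature2": "long battery life",
    "feature3": "water resistance",
}
print(join_features(sample))  # health tracking, long battery life, water resistance
```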

Now let's use these features to rerank the results:

In [25]:
conn_uri = get_connection_uri()
with psycopg2.connect(conn_uri) as conn:
    register_vector(conn)
    with conn.cursor() as cur:
        rerank_query = """
        WITH similar_products AS (
            SELECT id, embedding, description
            FROM product_catalogue_vectors
            ORDER BY embedding <=> %s
            LIMIT %s
        )
        SELECT rank, description, id
        FROM azure_ai.rank(
                %s,
                ARRAY(SELECT description FROM similar_products ORDER BY id ASC),
                ARRAY(SELECT id FROM similar_products ORDER BY id ASC),
                'gpt-4.1'
                )
        LEFT JOIN
            similar_products USING (id)
        ORDER BY
            rank ASC;
        """
        cur.execute(rerank_query, (str(embedding_list),10, features))
        ranked_products = cur.fetchall()
        

Connection uri was rertieved successfully.


Now the Xiaomi Mi Watch S1 Active, which meets all three requirements, is ranked first:

In [26]:
for rank, description, product_id in ranked_products:
    print(f"Rank: {rank}, ID: {product_id}, Description: {description}")

Rank: 1, ID: 1144, Description: The Xiaomi Mi Watch S1 Active is a stylish, fitness-focused smartwatch featuring an Always-On 1.43-inch AMOLED display, over 117 sport modes, GPS, heart rate, and SpO2 monitoring. With up to 12 days of battery life and 5ATM water resistance, it’s suitable for sports enthusiasts, swimmers, and everyday use. Notifications, Bluetooth calling, and app integrations make it a comprehensive lifestyle companion.
Rank: 2, ID: 507, Description: Huawei Watch GT 3 blends classic aesthetics with advanced health tracking, including SpO2, sleep, heart rate, and over 100 fitness modes. Its AMOLED display and two-week battery life, plus built-in GPS and Bluetooth calling, make it a top choice for tech-savvy users and athletes.
Rank: 3, ID: 1017, Description: Casio GBD-H1000 is a rugged fitness smartwatch with built-in GPS, heart rate monitor, and solar-assisted charging. Military-grade shock resistance and 200m water resistance make it perfect for athletes and adventurer
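To see what reranking changed, it can help to compare each product's position in the vector search with its position after reranking. The sketch below does that for two ordered lists of product IDs; `rank_changes` is a hypothetical helper and the IDs are made up, not taken from the catalogue.

```python
def rank_changes(vector_order, reranked_order):
    """Return (product_id, position_before, position_after) per reranked item.

    Positions start at 1. position_before is None when the product did not
    appear in the vector search results.
    """
    before = {pid: pos for pos, pid in enumerate(vector_order, start=1)}
    return [
        (pid, before.get(pid), after)
        for after, pid in enumerate(reranked_order, start=1)
    ]


# Made-up IDs: order from the vector search, then order after reranking.
vector_order = [101, 102, 103, 104]
reranked_order = [103, 101, 104, 102]

for pid, before, after in rank_changes(vector_order, reranked_order):
    print(f"ID {pid}: position {before} -> {after}")
```

In the notebook, `vector_order` would come from `tuple_list` and `reranked_order` from the `id` column of `ranked_products`.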