# SK: Q&A over Documents

This notebook builds a question-answering tool over documents; the example queries a product catalog for items of interest.
Prerequisite: you have already run the L4-SK-CreateDB notebook, which imports the product catalog CSV file into a Chroma vector DB.

In [1]:
#pip install --upgrade semantic-kernel

In [2]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
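
The later cells read `OPENAI_API_KEY` straight from the environment, so a missing key only surfaces as a `KeyError` at that point. A minimal sketch of an early check follows; `require_env` is a hypothetical helper name, not part of python-dotenv or Semantic Kernel, and the value used in the demo is a placeholder rather than a real key.

```python
import os


def require_env(name, env=None):
    """Return a non-empty environment value or raise with a readable message.

    `require_env` is a hypothetical helper for this notebook, not a library call.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell"
        )
    return value


# Demo with an explicit mapping so the example runs without a real key.
print(require_env("OPENAI_API_KEY", env={"OPENAI_API_KEY": "sk-placeholder"}))
```

In the notebook itself, the call would be `require_env("OPENAI_API_KEY")` right after `load_dotenv`.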

In [21]:
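import semantic_kernel as sk
import logging
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion

logging.basicConfig(level=logging.DEBUG)
# The logger name is the literal string '__name__' (note the quotes), which is why
# the SK log lines further down are prefixed with "DEBUG:__name__:".
logger = logging.getLogger('__name__')
kernel = sk.Kernel(log=logger)

# `os` was imported in the earlier cell; the key comes from the .env file loaded there
api_key = os.environ['OPENAI_API_KEY']
kernel.add_chat_service(
    "chat-gpt", OpenAIChatCompletion("gpt-3.5-turbo-0301", api_key)
)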

<semantic_kernel.kernel.Kernel at 0x7efbf1d208e0>

In [22]:
from semantic_kernel.connectors.ai.open_ai import OpenAITextEmbedding
kernel.add_text_embedding_generation_service(
    "ada", OpenAITextEmbedding("text-embedding-ada-002", api_key)
)

<semantic_kernel.kernel.Kernel at 0x7efbf1d208e0>

In [23]:
# Use the Chroma vector DB created in the previous notebook (L4-SK-CreateDB.ipynb)
from semantic_kernel.connectors.memory.chroma import ChromaMemoryStore
kernel.register_memory_store(memory_store=ChromaMemoryStore(persist_directory="catalog"))

INFO:chromadb.telemetry.posthog:Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
INFO:chromadb.db.duckdb:loaded in 1000 embeddings
INFO:chromadb.db.duckdb:loaded in 1 collections


In [6]:
query = "Please suggest a shirt with sunblocking"

In [7]:
# Query the local vector DB: the query text is embedded via the OpenAI API, then matched against the stored embeddings
docs = await kernel.memory.search_async(collection="outdoordb", limit=5, min_relevance_score=0.3, query=query)

DEBUG:openai:message='Request to OpenAI API' method=post path=https://api.openai.com/v1/embeddings
DEBUG:openai:api_version=None data='{"model": "text-embedding-ada-002", "input": ["Please suggest a shirt with sunblocking"], "encoding_format": "base64"}' message='Post details'
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=26 request_id=e979e4e21026ba4829005e161306a8c9 response_code=200
DEBUG:chromadb.db.index.hnswlib:time to pre process our knn query: 4.0531158447265625e-06
DEBUG:chromadb.db.index.hnswlib:time to run knn query: 0.00043582916259765625
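
The search above embeds the query, finds the nearest stored embeddings, and keeps at most `limit` hits that score at least `min_relevance_score`. The sketch below illustrates that idea with brute-force cosine similarity over made-up three-dimensional vectors. It rests on assumptions: the logs show Chroma using an hnswlib index, and the notebook does not show how the store turns a distance into a relevance score, so `cosine_similarity`, `search`, and `toy_store` are illustrative names only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, store, limit, min_relevance_score):
    """Brute-force stand-in for a vector search: score, filter, sort, truncate."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store.items()]
    hits = [(score, text) for score, text in scored if score >= min_relevance_score]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:limit]


# Toy embeddings, invented for illustration (real ada-002 vectors have 1536 dimensions).
toy_store = {
    "sun shirt": [0.9, 0.1, 0.0],
    "rain jacket": [0.3, 0.9, 0.0],
    "hiking boots": [0.0, 0.1, 0.9],
}

results = search([1.0, 0.2, 0.0], toy_store, limit=2, min_relevance_score=0.3)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Raising `min_relevance_score` trims weak matches before the `limit` cut is applied, which is why a strict threshold can return fewer than `limit` documents.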


In [8]:
docs[0].text

'Sun Shield Shirt by  :  "Block the sun, not the fun – our high-performance sun shirt is guaranteed to protect from harmful UV rays. \n\nSize & Fit: Slightly Fitted: Softly shapes the body. Falls at hip.\n\nFabric & Care: 78% nylon, 22% Lycra Xtra Life fiber. UPF 50+ rated – the highest rated sun protection possible. Handwash, line dry.\n\nAdditional Features: Wicks moisture for quick-drying comfort. Fits comfortably over your favorite swimsuit. Abrasion resistant for season after season of wear. Imported.\n\nSun Protection That Won\'t Wear Off\nOur high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun\'s harmful rays. This fabric is recommended by The Skin Cancer Foundation as an effective UV protectant.'
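
`docs[0].text` shows only the top hit. To skim all hits at once, a small formatter can help. The sketch below assumes each result exposes `text` and, optionally, a numeric `relevance` attribute; the source only confirms `text`, so `relevance` is handled defensively. `summarize_hits` is a hypothetical helper, and the stand-in objects and their scores are invented for the demo.

```python
from types import SimpleNamespace


def summarize_hits(docs, width=60):
    """Return one short line per hit: rank, score (if present), and a text snippet."""
    lines = []
    for rank, doc in enumerate(docs, start=1):
        # `relevance` is an assumed attribute; fall back to "n/a" when it is missing
        relevance = getattr(doc, "relevance", None)
        score = f"{relevance:.2f}" if relevance is not None else "n/a"
        # Collapse runs of whitespace and newlines before truncating
        snippet = " ".join(doc.text.split())[:width]
        lines.append(f"{rank}. [{score}] {snippet}")
    return lines


# Stand-in objects with invented scores; in the notebook you would pass `docs`.
fake_docs = [
    SimpleNamespace(text="Sun Shield Shirt by  :  Block the sun", relevance=0.87),
    SimpleNamespace(text="Men's Plaid Tropic Shirt"),
]
for line in summarize_hits(fake_docs):
    print(line)
```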

In [24]:
# Now let's augment the LLM query with documents retrieved from the local vector DB,
# following the RAG (Retrieval-Augmented Generation) pattern.
# The prompt below spells out what we ask the LLM to do with the retrieved documents.
# The function returns the SK result object; call str() on it to get the generated text.
async def ragqna(kernel, query, limit):
    # Step 1: Retrieval: get the documents from the local DB that best match the query
    docs = await kernel.memory.search_async(collection="outdoordb", limit=limit, min_relevance_score=0.3, query=query)
    # Step 2: Augment: build the augmented prompt from the retrieved documents.
    # A line of triple backticks separates consecutive documents so the LLM can tell them apart.
    qdocs = "\n```\n".join(doc.text for doc in docs)
    
    prompt = """{{ $qdocs}} 
    
    Question: Please query above documents delimited by triple backticks for {{ $query }} 
    and return results in a table in markdown and summarize each one.
    """
    
    # Step 3: Generation: ask the LLM for the summary and the markdown table requested in the prompt
    summarize = kernel.create_semantic_function(prompt, temperature=0.0)
    context_variables = sk.ContextVariables(variables={
        "qdocs": qdocs,
        "query": query
    })
    response = summarize(variables=context_variables)
    return response
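
To see what the Augment step produces without calling any API, the prompt can be assembled in plain Python. This is a sketch under stated assumptions: `build_prompt` is a hypothetical helper, ordinary string formatting stands in for SK's `{{ $variable }}` template rendering, and, unlike the `join` in the cell above, each document is wrapped in its own opening and closing fence. The sample documents are shortened stand-ins.

```python
FENCE = "`" * 3  # three backticks, built this way so the example can sit inside a fenced block


def build_prompt(docs, query):
    """Wrap each document in its own pair of fences, then append the question."""
    fenced = "\n".join(f"{FENCE}\n{text.strip()}\n{FENCE}" for text in docs)
    return (
        f"{fenced}\n\n"
        f"Question: Please query above documents delimited by triple backticks for {query} "
        "and return results in a table in markdown and summarize each one.\n"
    )


prompt = build_prompt(
    ["Sun Shield Shirt: UPF 50+", "Tropical Tee: UPF 50+"],
    "shirts with sunblocking",
)
print(prompt)
```

Printing the assembled prompt before sending it is a cheap way to check that the delimiters and the question end up where you expect.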

In [25]:
result = await ragqna(kernel, "shirts with sunblocking",3)

DEBUG:openai:message='Request to OpenAI API' method=post path=https://api.openai.com/v1/embeddings
DEBUG:openai:api_version=None data='{"model": "text-embedding-ada-002", "input": ["shirts with sunblocking"], "encoding_format": "base64"}' message='Post details'
INFO:openai:message='OpenAI API response' path=https://api.openai.com/v1/embeddings processing_ms=44 request_id=842f8c35f29c37e258f9ee5e312e5b77 response_code=200
DEBUG:chromadb.db.index.hnswlib:time to pre process our knn query: 3.814697265625e-06
DEBUG:chromadb.db.index.hnswlib:time to run knn query: 0.0004448890686035156
DEBUG:__name__:Extracting blocks from template: {{ $qdocs}} 
    
    Question: Please query above documents delimited by triple backticks for {{ $query }} 
    and return results in a table in markdown and summarize each one.
    
DEBUG:asyncio:Using selector: EpollSelector
DEBUG:__name__:Rendering string template: {{ $qdocs}} 
    
    Question: Please query above documents delimited by triple backticks for

In [26]:
str(result)


"| Shirt Name | Sun Protection Rating | Fabric Composition | Additional Features |\n| --- | --- | --- | --- |\n| Sun Shield Shirt | UPF 50+ | 78% nylon, 22% Lycra Xtra Life fiber | Moisture-wicking, abrasion-resistant, fits over swimsuit |\n| Men's Plaid Tropic Shirt | UPF 50+ | 52% polyester, 48% nylon | Wrinkle-free, front and back cape venting, two front bellows pockets |\n| Women's Tropical Tee | UPF 50+ | Shell: 71% nylon, 29% polyester. Cape lining: 100% polyester | Wrinkle-resistant, low-profile pockets, front and back cape venting, two front pockets, tool tabs, eyewear loop |\n\nThe Sun Shield Shirt, Men's Plaid Tropic Shirt, and Women's Tropical Tee all offer UPF 50+ sun protection, blocking 98% of the sun's harmful rays. The Sun Shield Shirt is made of 78% nylon and 22% Lycra Xtra Life fiber, and is moisture-wicking and abrasion-resistant. The Men's Plaid Tropic Shirt is made of 52% polyester and 48% nylon"

In [27]:
from IPython.display import display, Markdown
display(Markdown(str(result)))

| Shirt Name | Sun Protection Rating | Fabric Composition | Additional Features |
| --- | --- | --- | --- |
| Sun Shield Shirt | UPF 50+ | 78% nylon, 22% Lycra Xtra Life fiber | Moisture-wicking, abrasion-resistant, fits over swimsuit |
| Men's Plaid Tropic Shirt | UPF 50+ | 52% polyester, 48% nylon | Wrinkle-free, front and back cape venting, two front bellows pockets |
| Women's Tropical Tee | UPF 50+ | Shell: 71% nylon, 29% polyester. Cape lining: 100% polyester | Wrinkle-resistant, low-profile pockets, front and back cape venting, two front pockets, tool tabs, eyewear loop |

The Sun Shield Shirt, Men's Plaid Tropic Shirt, and Women's Tropical Tee all offer UPF 50+ sun protection, blocking 98% of the sun's harmful rays. The Sun Shield Shirt is made of 78% nylon and 22% Lycra Xtra Life fiber, and is moisture-wicking and abrasion-resistant. The Men's Plaid Tropic Shirt is made of 52% polyester and 48% nylon