## RAG Custom Knowledge AI Agent PoC with Semantic Kernel

In this notebook, I demonstrate a RAG use case built with the Semantic Kernel framework: an AI knowledge base agent that retrieves relevant content from Azure AI Search, uses the results to augment the query passed to an LLM, and generates a response for the employee that is grounded in the organization's knowledge base repository.

The agent uses a native function (a plugin, in Semantic Kernel lingo) to call Azure AI Search, so that the LLM is grounded in relevant, contextual information.
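
The retrieve, augment, generate flow described above can be sketched without any Azure dependency. The snippet below is only an illustration: `retrieve`, `build_prompt`, `answer`, and `fake_llm` are hypothetical stand-ins (a naive word-overlap lookup over a made-up in-memory list, and a stub model), not the Semantic Kernel or Azure AI Search APIs used later in this notebook.

```python
from typing import Callable, List

# Made-up in-memory "knowledge base" standing in for the search index.
KNOWLEDGE_BASE: List[str] = [
    "Employees accrue 1.5 vacation days per month.",
    "The health plan deductible is listed in the benefits handbook.",
]


def retrieve(query: str, documents: List[str], top: int = 1) -> List[str]:
    """Rank documents by naive word overlap with the query (stand-in for vector search)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top]


def build_prompt(query: str, context: List[str]) -> str:
    """Augment the user query with the retrieved context."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


def answer(query: str, llm: Callable[[str], str]) -> str:
    """Retrieve, augment, then generate."""
    context = retrieve(query, KNOWLEDGE_BASE)
    return llm(build_prompt(query, context))


def fake_llm(prompt: str) -> str:
    """Stub model that only reports the size of the prompt it received."""
    return f"(model saw {len(prompt)} characters of grounded prompt)"


print(answer("How many vacation days do employees accrue?", fake_llm))
```

In the notebook itself, the retrieval step is a call to Azure AI Search and the generation step is an Azure OpenAI chat completion; the shape of the flow is the same.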

Dependencies include the Semantic Kernel Python library, the Azure AI Search client library (`SearchClient`), and the Azure OpenAI Python package. Azure Blob Storage is the data store for the PDF documents, and Azure AI Document Intelligence is used for document cracking and semantic chunking.

### Prerequisites

+ An Azure subscription, with [access to Azure OpenAI](https://aka.ms/oai/access).
 
+ Azure AI Search (Basic tier) and Azure OpenAI resources.

+ Deployments of the `text-embedding-ada-002` and `gpt-4o` models on Azure OpenAI.

+ Azure Blob Storage.


![Semantic chunking in RAG](https://github.com/jbernec/rag-orchestrations/blob/main/images/semantic-chunking.png?raw=true)
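
The chunking step itself is not shown in this notebook. As a rough illustration of the idea, and assuming the layout output has already been converted to Markdown, the sketch below starts a new chunk at every heading so that each chunk holds one section and any table stays with the section it belongs to. `chunk_by_heading` and the sample text are hypothetical; this is not the Azure AI Document Intelligence API.

```python
import re
from typing import List

# A heading is a line that starts with one to six '#' characters followed by a space.
HEADING = re.compile(r"^#{1,6} ")


def chunk_by_heading(markdown_text: str) -> List[str]:
    """Split Markdown into chunks, starting a new chunk at every heading line."""
    chunks: List[str] = []
    current: List[str] = []
    for line in markdown_text.splitlines():
        if HEADING.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that are empty after stripping whitespace.
    return [chunk for chunk in chunks if chunk]


# Placeholder document: a short section followed by a section containing a table.
sample = """# Benefits
Overview text.

## Costs
| Plan | Monthly cost |
| --- | --- |
| Basic | (placeholder) |
"""

for chunk in chunk_by_heading(sample):
    print(repr(chunk))
```

Keeping the table inside the "Costs" chunk is the property that matters for questions whose answer lives in a table.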

In [0]:
import logging
import sys
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion, OpenAIChatPromptExecutionSettings
from semantic_kernel.utils.logging import setup_logging
from semantic_kernel.prompt_template import PromptTemplateConfig, InputVariable
from semantic_kernel.contents.chat_history import ChatHistory
from semantic_kernel.functions import KernelArguments
from semantic_kernel.connectors.ai.open_ai.services.azure_text_embedding import AzureTextEmbedding

In [0]:
async def config_kernel() -> Kernel:
    # Set the logging level for the kernel to debug
    logging.getLogger(name="kernel").setLevel(level=logging.DEBUG)

    # Instantiate the kernel object; the deployment name doubles as the service_id
    kernel = Kernel()
    service_id = dbutils.secrets.get(scope="myscope", key="aoai-deploymentname")

    # Register the Azure OpenAI chat completion service with the kernel
    kernel.add_service(
        AzureChatCompletion(
            service_id=service_id,
            endpoint=dbutils.secrets.get(scope="myscope", key="aoai-endpoint"),
            api_key=dbutils.secrets.get(scope="myscope", key="aoai-api-key"),
            deployment_name=service_id,
        )
    )

    # Register the azure text embedding service
    kernel.add_service(
        AzureTextEmbedding(
            service_id="embedding",
            endpoint=dbutils.secrets.get(scope="myscope", key="aoai-endpoint"),
            deployment_name="embedding",
            api_key=dbutils.secrets.get(scope="myscope", key="aoai-api-key"),
        )
    )

    return kernel

In [0]:
# Run the kernel instantiation function
kernel = await config_kernel()

In [0]:
# import memory store packages
from semantic_kernel.connectors.memory.azure_cognitive_search import AzureCognitiveSearchMemoryStore
from semantic_kernel.memory.semantic_text_memory import SemanticTextMemory
from semantic_kernel.core_plugins.text_memory_plugin import TextMemoryPlugin

In [0]:
async def config_skmemory():
    # instantiate the azure ai search store
    ai_search_store = AzureCognitiveSearchMemoryStore(
        vector_size=1536,
        search_endpoint=dbutils.secrets.get(scope="myscope", key="aisearch-endpoint"),
        admin_key=dbutils.secrets.get(scope="myscope", key="aisearch-adminkey")
    )

    # instantiate the embedding generator used by the semantic memory
    embedding_gen = AzureTextEmbedding(
        service_id="embedding",
        endpoint="https://aoai-srv.openai.azure.com/",
        deployment_name="embedding",
        api_key=dbutils.secrets.get(scope="myscope", key="aoai-api-key"),
    )

    # instantiate the semantic memory abstraction with the ai search store
    memory = SemanticTextMemory(
        storage=ai_search_store,
        embeddings_generator=embedding_gen
    )

    return memory

In [0]:
# Execute the memory function
memory = await config_skmemory()

#### Add Semantic search to the Chatbot to enable a RAG workflow based on internal company documents.
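
The prompt defined in the next cell relies on the `{{$user_input}}`, `{{$chat_history}}`, and `{{$retrieval_context}}` placeholders. To make their role concrete, here is a simplified stand-in for variable substitution. It is not Semantic Kernel's template engine: `render_template` is a hypothetical helper that only handles the `{{$name}}` form, and the demo values are placeholders.

```python
import re
from typing import Mapping

# Matches placeholders of the form {{$name}}.
PLACEHOLDER = re.compile(r"\{\{\$(\w+)\}\}")


def render_template(template: str, variables: Mapping[str, str]) -> str:
    """Replace each {{$name}} with its value; raise KeyError if a variable is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"Missing template variable: {name}")
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


demo_template = "Question: {{$user_input}}\nContext: {{$retrieval_context}}"
rendered = render_template(
    demo_template,
    {"user_input": "What is covered?", "retrieval_context": "(retrieved text)"},
)
print(rendered)
```

The sketch fails loudly on a missing variable, which loosely mirrors the intent of marking each input variable as required in the template configuration.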

In [0]:
prompt = """
You are an AI assistant powered by the ChatGPT-4 model. Your task is to respond to user queries based on the provided input text, the history of the conversation, and the context derived from a retrieval search system. While your responses should be primarily grounded in the context, you can offer limited suggestions outside of the context if they are relevant and beneficial to the user.

**Input Variables:**
1. **User Input Text:** {{$user_input}}
2. **Chat History:** {{$chat_history}}
3. **Context from Retrieval System:** {{$retrieval_context}}

**Instructions:**
1. Review the **User Input Text** to understand the current query or request.
2. Refer to the **Chat History** to maintain coherence and continuity in the conversation.
3. Utilize the **Context from Retrieval System** to provide accurate and contextually relevant responses.
4. If the context does not fully address the user's needs, provide limited and relevant suggestions based on general knowledge or logical inference.
5. Ensure responses are clear, concise, and helpful.

**Example:**

**User Input Text:** "What is the capital of France?"
**Chat History:** "User previously asked about European countries and their capitals."
**Context from Retrieval System:** "France is a country in Europe with Paris as its capital."

**Response:** "The capital of France is Paris. If you have any other questions about European countries or need more information, feel free to ask!"

**Template:**

**User Input Text:** "{{$user_input}}"
**Chat History:** "{{$chat_history}}"
**Context from Retrieval System:** "{{$retrieval_context}}"

**Response:** [Your response here]
"""

In [0]:
# import search related packages
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery
from azure.search.documents.models import (
    QueryType,
    QueryCaptionType,
    QueryAnswerType
)
from azure.identity import DefaultAzureCredential
from azure.core.credentials import AzureKeyCredential

# Assign values to these variables using the corresponding azure key vault secrets
endpoint = dbutils.secrets.get(scope="myscope", key="aisearch-endpoint")
search_admin_key = dbutils.secrets.get(scope="myscope", key="aisearch-adminkey")
# Use key-based auth when an admin key is available; otherwise fall back to DefaultAzureCredential
credential = AzureKeyCredential(search_admin_key) if search_admin_key else DefaultAzureCredential()
index_name = "manual-aisearch-index"

In [0]:
# Define a plugin class with two native functions:
# - search_retrieval: searches the Azure AI Search vector index for content relevant to the user prompt
# - generate_number: returns a random integer between min and max, intended as the number of paragraphs
#   for a semantic function that generates a short story; it is not called in the chat flow below

from typing_extensions import Annotated
from semantic_kernel.functions import kernel_function
import random

class SearchRetrievalPlugin:
    """
    Description: Query the Azure AI Search Vector DB for a context specific answer.
    """
    @kernel_function(
        description="Search retrieval function", name="search_retrieval"
    )
    def search_retrieval(self, user_input:str) -> str:
        """
        Search and retrieve answers from Azure AI Search.
        Returns:
            str
        """
        query = user_input
        search_client = SearchClient(endpoint, index_name, credential)
        # Vector query over the "vector" field; the search service vectorizes the query text
        vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=1, fields="vector", exhaustive=True)

        # Hybrid (keyword + vector) search with semantic ranking; only the top hit is returned
        r = search_client.search(
            search_text=query,
            vector_queries=[vector_query],
            select=["parent_id", "content"],
            query_type=QueryType.SEMANTIC,
            semantic_configuration_name='my-semantic-config',
            query_caption=QueryCaptionType.EXTRACTIVE,
            query_answer=QueryAnswerType.EXTRACTIVE,
            top=1,
        )
        # Strip line breaks from each hit and join the hits into a single context string
        results = [doc["content"].replace("\n", "").replace("\r", "") for doc in r]
        content = "\n".join(results)
        return content
    

    @kernel_function(
        description="Generate a random number between min and max", name="generate_number"
    )
    def generate_number(self, min: Annotated[int, "minimum number of paragraphs"], max: Annotated[int, "maximum number of paragraphs"] = 10) -> Annotated[int, "output is a number"]:
        """
        Generate a number between min-max
        Example:
            min='4' max='10' => randint(4,10)
        Args:
            min - The lower limit for the random number generation
            max - The upper limit for the random number generation
        Returns:
            int - value
        """
        try:
            # Return an int so the value matches the declared return annotation
            return random.randint(min, max)
        except ValueError as e:
            print(f"Invalid input {min} and {max}")
            raise e
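
The list comprehension in `search_retrieval` strips line breaks from each hit and joins the hits into one context string. The sketch below isolates that step, with plain dictionaries in place of real search results; `flatten_results` and the sample documents are hypothetical.

```python
from typing import Dict, Iterable


def flatten_results(docs: Iterable[Dict[str, str]]) -> str:
    """Remove line breaks inside each document's content, then join documents with newlines."""
    cleaned = [doc["content"].replace("\n", "").replace("\r", "") for doc in docs]
    return "\n".join(cleaned)


# Plain dictionaries standing in for Azure AI Search results.
fake_hits = [
    {"parent_id": "doc-1", "content": "First line\r\nsecond line"},
    {"parent_id": "doc-2", "content": "Another chunk"},
]

print(flatten_results(fake_hits))
```

Because line breaks are removed rather than replaced with a space, the words on either side of a break are joined (`First linesecond line`). Replacing line breaks with a space instead may be worth considering if that affects answer quality.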


In [0]:
async def chat_func_acs_memory(kernel: Kernel, memory: SemanticTextMemory):

    kernel.add_plugin(plugin=TextMemoryPlugin(memory=memory), plugin_name="TextMemoryAISearchPlugin")

    # define the execution settings
    execution_settings = OpenAIChatPromptExecutionSettings(
        service_id=dbutils.secrets.get(scope="myscope", key="aoai-deploymentname"),
        ai_model_id=dbutils.secrets.get(scope="myscope", key="aoai-deploymentname"),
        max_tokens=2000,
        temperature=0.7,
    )

    chat_template_config = PromptTemplateConfig(
        template=prompt,
        description="Chat with the assistant",
        input_variables=[
            InputVariable(
                name="user_input", description="The user input", is_required=True
            ),
            InputVariable(
                name="chat_history",
                description="The history of the conversation",
                is_required=True,
            ),
            InputVariable(
                name="retrieval_context",
                description="The memory search query",
                is_required=True,
            ),
        ],
        execution_settings=execution_settings,
    )

    chat_func = kernel.add_function(
        prompt=prompt,
        function_name="chatFunction",
        plugin_name="chatPlugin",
        description="chat with assistant",
        prompt_template_config=chat_template_config,
    )

    # Register the plugin class with the kernel
    search_retrieval_plugin = kernel.add_plugin(SearchRetrievalPlugin(), plugin_name="SearchRetrievalPlugin")

    # extract and hold the native function in its own variable for further use
    search_retrieval = search_retrieval_plugin.get("search_retrieval")

    # Create the chat history once so that it persists across turns
    chat_history = ChatHistory()

    while True:
        try:
            user_input = input("User:>")
        except (KeyboardInterrupt, EOFError):
            print("\n\nExiting chat..")
            return False
        if user_input in ("exit", "quit"):
            print("\n\nExiting chat..")
            return "Good bye, please let me know if you need further help."

        # add the user's chat message to the chat history
        chat_history.add_user_message(user_input)
        # invoke the search plugin function to retrieve the grounding context
        content = await search_retrieval.invoke(kernel=kernel, user_input=user_input)
        arguments = KernelArguments(chat_history=chat_history, user_input=user_input, retrieval_context=content)
        response = await kernel.invoke(function=chat_func, arguments=arguments)
        # add the AI's reply to the chat history
        chat_history.add_assistant_message(str(response))
        print(f"Assistant:> {response}")
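
A conversation loop of this shape can be exercised without a console, a search index, or a model by feeding it scripted inputs. The sketch below is a simplified stand-in, not the Semantic Kernel code above: `run_chat`, `stub_retrieve`, and `stub_generate` are hypothetical, and the history is a plain list of `(role, message)` tuples. It shows the history being created once and growing across turns, and the exit commands ending the session.

```python
from typing import Callable, Iterable, List, Tuple

EXIT_COMMANDS = {"exit", "quit"}
History = List[Tuple[str, str]]


def run_chat(
    inputs: Iterable[str],
    retrieve: Callable[[str], str],
    generate: Callable[[str, History, str], str],
) -> History:
    """Run a scripted chat and return the accumulated (role, message) history."""
    history: History = []  # created once so that it persists across turns
    for user_input in inputs:
        if user_input in EXIT_COMMANDS:
            break
        history.append(("user", user_input))
        context = retrieve(user_input)
        reply = generate(user_input, history, context)
        history.append(("assistant", reply))
    return history


# Hypothetical stubs standing in for the search plugin and the chat function.
def stub_retrieve(query: str) -> str:
    return f"context for: {query}"


def stub_generate(query: str, history: History, context: str) -> str:
    return f"reply #{len(history)} using [{context}]"


transcript = run_chat(["hello", "and costs?", "exit", "ignored"], stub_retrieve, stub_generate)
for role, message in transcript:
    print(f"{role}: {message}")
```

Passing the retrieval and generation steps in as callables keeps the loop testable in isolation; the real functions can be swapped in once the loop logic is verified.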

In [0]:
# run the chat function
await chat_func_acs_memory(kernel=kernel, memory=memory)

##### The combination of semantic chunking and the prebuilt layout model of Azure AI Document Intelligence produced more relevant and accurate responses to all 5 questions asked, especially the question about the cost of healthcare.

##### The previous solution, developed with Azure AI Search integrated vectorization, failed to produce accurate and relevant answers about cost. The cost information is contained in a table embedded in the PDF document, and the native document extraction model defined as part of integrated vectorization was unable to successfully crack and extract all relevant details from the document.

##### In addition, semantic chunking ensured that all answers remained relevant to the user prompt.