## Imports


In [1]:
import yaml
from dotenv import load_dotenv

## Load Configs


In [2]:
with open("../configs/user_config.yaml") as f:
    model_config = yaml.safe_load(f)

load_dotenv("../configs/environment_variables.env")

True
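
The rest of this notebook reads several keys from `model_config`. As a rough guide, the sketch below checks that those keys are present before any index is built. The key names are taken from how `model_config` is used later in this notebook; `REQUIRED_KEYS`, `missing_config_keys`, and the values in `example_config` are hypothetical and only illustrate the expected shape.

```python
from typing import Dict, List

# Sections and keys that later cells read from the loaded config.
REQUIRED_KEYS: Dict[str, List[str]] = {
    "rag": ["chunk_size", "chunk_overlap", "num_docs"],
    "prompt": ["system_role", "user_prompt"],
    "model": ["temperature", "max_tokens"],
}


def missing_config_keys(config: dict) -> List[str]:
    """Return dotted names of required keys that are absent from config."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        section_values = config.get(section) or {}
        for key in keys:
            if key not in section_values:
                missing.append(f"{section}.{key}")
    return missing


# Placeholder values for illustration only, not recommended settings.
example_config = {
    "rag": {"chunk_size": 512, "chunk_overlap": 64, "num_docs": 5},
    "prompt": {"system_role": "You are a helpful assistant.", "user_prompt": "Answer from the context."},
    "model": {"temperature": 0.7, "max_tokens": 800},
}

print(missing_config_keys(example_config))
```

Calling `missing_config_keys(model_config)` right after loading the YAML file would surface a misspelled or missing key early, rather than as a `KeyError` in a later cell.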

## Knowledge Base Creation


### Defaults


In [3]:
INDEX_NAME = "ch3_data"
DATA_PATH = "../../data/ch3_data"

### Imports


In [4]:
import os

#import azure.ai.resources.client
from azure.ai.generative.index import build_index
from azure.ai.resources.client import AIClient
from azure.ai.resources.operations._index_data_source import (
    ACSOutputConfig,
    LocalSource,
)
from azure.identity import DefaultAzureCredential

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.aio import SearchClient
from azure.search.documents.models import RawVectorQuery
from openai import AsyncAzureOpenAI

### Functions


- `build_cogsearch_index` creates the index `INDEX_NAME` from the data in `DATA_PATH`.

- The text extracted from the PDF is split into smaller, manageable segments called chunks.

- `chunk_size` is the number of tokens in each chunk, and `chunk_overlap` is the number of tokens shared between adjacent chunks.

- After splitting, each chunk is converted to a numerical representation of its text using an Azure OpenAI embedding model. These representations are called vector embeddings.

- The embeddings are stored in the `vector_store`.
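
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window sketch. It is an illustration only: the splitter used by `build_index` works on model tokens and is more involved, whereas this example treats each list element as one token. The name `chunk_tokens` is hypothetical.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence, chunk_size: int, chunk_overlap: int) -> List[list]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once a window reaches the end, to avoid a trailing chunk
        # made up only of tokens that were already covered.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Ten "tokens", windows of four, one shared token between neighbours.
print(chunk_tokens(list(range(10)), chunk_size=4, chunk_overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

A larger overlap keeps more shared text between neighbouring chunks, at the cost of producing more chunks to embed and store.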

In [5]:
def build_cogsearch_index(
    index_name: str,
    path_to_data: str,
    chunk_size: int,
    chunk_overlap: int,
    data_source_url: str = None,
):
    # Set up environment variables for cog search SDK
    os.environ["AZURE_COGNITIVE_SEARCH_TARGET"] = os.environ.get(
        "AZURE_AI_SEARCH_ENDPOINT", ""
    )
    os.environ["AZURE_COGNITIVE_SEARCH_KEY"] = os.environ.get("AZURE_AI_SEARCH_KEY", "")

    client = AIClient.from_config(DefaultAzureCredential())

    #default_aoai_connection = client.get_default_aoai_connection()
    default_aoai_connection = client._connections.get(os.environ.get("AZURE_OPENAI_CONNECTION", ""))
    default_aoai_connection.set_current_environment()

    default_acs_connection = client.connections.get(
        os.environ.get("AZURE_COGNITIVE_SEARCH_CONNECTION_NAME", "")
    )
    default_acs_connection.set_current_environment()

    # Use the same index name when registering the index in AI Studio
    index = build_index(
        output_index_name=index_name,
        vector_store="azure_cognitive_search",
        embeddings_model=f"azure_open_ai://deployment/{os.environ.get('AZURE_OPENAI_EMBEDDING_DEPLOYMENT')}/model/{os.environ.get('AZURE_OPENAI_EMBEDDING_MODEL')}",
        data_source_url=data_source_url,
        index_input_config=LocalSource(input_data=path_to_data),
        acs_config=ACSOutputConfig(
            acs_index_name=index_name,
        ),
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )

    # register the index so that it shows up in the project
    cloud_index = client.indexes.create_or_update(index)

    # update user-config to add index name
    with open("../configs/user_config.yaml", 'w') as f:
        model_config["rag"]["index_name"] = cloud_index.name
        yaml.safe_dump(model_config, f)

    print(f"Created index '{cloud_index.name}'")
    print(f"Local Path: {index.path}")
    print(f"Cloud Path: {cloud_index.path}")

### Ingest Documents


In [6]:
build_cogsearch_index(
    index_name=INDEX_NAME,
    path_to_data=DATA_PATH,
    chunk_size=model_config["rag"]["chunk_size"],
    chunk_overlap=model_config["rag"]["chunk_overlap"],
)

Class AIClient: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
[DocumentChunksIterator::filter_extensions] Filtered 0 files out of 1
[DocumentChunksIterator::crack_documents] Total time to load files: 6.651878356933594e-05
{
  ".txt": 0.0,
  ".md": 0.0,
  ".html": 0.0,
  ".htm": 0.0,
  ".py": 0.0,
  ".pdf": 1.0,
  ".ppt": 0.0,
  ".pptx": 0.0,
  ".doc": 0.0,
  ".docx": 0.0,
  ".xls": 0.0,
  ".xlsx": 0.0
}
[DocumentChunksIterator::split_documents] Total time to split 62 documents into 161 chunks: 0.9729752540588379
Processing document: CH3-data.pdf0
Processing document: CH3-data.pdf1
Processing document: CH3-data.pdf2
Processing document: CH3-data.pdf3
Processing document: CH3-data.pdf4
Processing document: CH3-data.pdf5
Processing document: CH3-data.pdf6
Processing document: CH3-data.pdf7
Processing document: CH3-data.pdf8
Processing document: CH3-data.pdf9
Processing document: CH3-data.pdf10
Processing docu

Created index 'ch3_data'
Local Path: /home/shailesh_sharma/tiger_hackathon/hacks-main/codes/notebooks/ch3_data-mlindex
Cloud Path: azureml://subscriptions/57a36344-3906-4293-9991-5010c5255d5e/resourcegroups/rg-shailesh.sharma_ai/workspaces/ai-build-shaileshsharma-v1/datastores/workspaceblobstore/paths/LocalUpload/a221550659575a0681c6758820ede2f6/ch3_data-mlindex/


## Chat with Documents


### Imports


In [7]:
import asyncio
from typing import List

import nest_asyncio
from openai import AzureOpenAI

nest_asyncio.apply()

### Functions


- `get_documents` takes the user question and identifies the `num_docs` chunks (5 by default) that are most relevant to it.

- The `question` is converted to a vector embedding using the same embedding model that was used to create the index. This vector is compared with the embeddings stored in the vector store to retrieve the top chunks by similarity score.

- The retrieved chunks are passed as context to the Azure OpenAI LLM.
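
The ranking step described above can be sketched as a brute-force similarity search. This is a simplified illustration, assuming cosine similarity as the metric; the search service performs the ranking internally and its exact metric and index structure depend on how the index is configured. The names `cosine_similarity` and `top_k`, and the toy two-dimensional vectors, are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], store: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k chunk texts whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "vector store": chunk text mapped to a made-up 2-dimensional embedding.
toy_store = {
    "chunk about engines": [1.0, 0.0],
    "chunk about tyres": [0.0, 1.0],
    "chunk about both": [1.0, 1.0],
}
print(top_k([1.0, 0.1], toy_store, k=2))
# ['chunk about engines', 'chunk about both']
```

Real embeddings have hundreds or thousands of dimensions, but the comparison works the same way.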

In [8]:
async def get_documents(
    question: str,
    index_name: str,
    num_docs: int = 5,
) -> tuple[str, list[str]]:
    #  retrieve documents relevant to the user's question from Cognitive Search
    search_client = SearchClient(
        endpoint=os.environ.get("AZURE_AI_SEARCH_ENDPOINT", ""),
        credential=AzureKeyCredential(os.environ.get("AZURE_AI_SEARCH_KEY", "")),
        index_name=index_name,
    )

    async with AsyncAzureOpenAI(
        azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT", ""),
        api_key=os.environ.get("AZURE_OPENAI_KEY", ""),
        api_version=os.environ.get("AZURE_OPENAI_API_VERSION", ""),
    ) as aclient:

        # generate a vector embedding of the user's question
        embedding = await aclient.embeddings.create(
            input=question, model=os.environ.get("AZURE_OPENAI_EMBEDDING_DEPLOYMENT")
        )
        embedding_to_query = embedding.data[0].embedding

    context = ""
    contexts = []
    async with search_client:
        # use the vector embedding to do a vector search on the index
        vector_query = RawVectorQuery(
            vector=embedding_to_query, k=num_docs, fields="contentVector"
        )
        results = await search_client.search(
            search_text="", vector_queries=[vector_query], select=["id", "content"]
        )

        async for result in results:
            context += f"\n>>> {result['content']}"
            contexts.append(result["content"])

    return context, contexts

- The prompt template for the Azure OpenAI GPT-3.5 Turbo model is as follows:
    ```python
        [
            {"role": "system", "content": system_role},
            {"role": "user", "content": user_prompt},
        ]
    ```

- The `system_role` and `user_prompt` are defined in the user config. Together they form the input to the LLM; the user prompt should contain placeholders for the `question` and for the `context` retrieved in the previous step.

- Passing these messages to the LLM returns a response to the question, based on the context retrieved from the input documents.
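
As a small illustration of how the placeholders are filled, the sketch below builds the two messages from a question and a retrieved context using `str.format`. The placeholder suffix mirrors the one appended in `chat_with_documents`; the function name `fill_prompt` and the sample strings are hypothetical.

```python
from typing import Dict, List

# Same placeholder layout that chat_with_documents appends to the user prompt.
PLACEHOLDER_SUFFIX = "\n\nQuestion:'{question}' \n\nContext: '{context}'"


def fill_prompt(
    system_role: str, user_prompt: str, question: str, context: str
) -> List[Dict[str, str]]:
    """Fill the question and context into the user prompt and build the messages."""
    template = user_prompt + PLACEHOLDER_SUFFIX
    return [
        {"role": "system", "content": system_role},
        {"role": "user", "content": template.format(question=question, context=context)},
    ]


messages = fill_prompt(
    system_role="Answer only from the provided context.",  # made-up example text
    user_prompt="Answer briefly.",  # made-up example text
    question="What is a chunk?",
    context="\n>>> A chunk is a small segment of the source text.",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

Because `str.format` treats braces as placeholders, any literal `{` or `}` in the configured user prompt would need to be doubled (`{{`, `}}`) for this approach to work.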



In [9]:
def build_message(user_prompt: str, system_role: str) -> List[dict]:
    return [
        {"role": "system", "content": system_role},
        {"role": "user", "content": user_prompt},
    ]


def chat_completion(
    question: str,
    system_role: str,
    user_prompt: str,
    index_name: str,
    num_docs: int = 5,
    temperature: float = 0.7,
    max_tokens: int = 800,
):
    # get search documents for the last user message in the conversation
    context, contexts = asyncio.run(
        get_documents(
            question=question,
            index_name=index_name,
            num_docs=num_docs,
        )
    )

    # fill the question and the retrieved context into the user prompt
    user_prompt = user_prompt.format(question=question, context=context)
    message = build_message(user_prompt=user_prompt, system_role=system_role)

    with AzureOpenAI(
        azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT", ""),
        api_key=os.environ.get("AZURE_OPENAI_KEY", ""),
        api_version=os.environ.get("AZURE_OPENAI_API_VERSION", ""),
    ) as client:

        # call Azure OpenAI with the system prompt and user's question
        completion = client.chat.completions.create(
            model=os.environ.get("AZURE_OPENAI_CHAT_DEPLOYMENT"),
            messages=message,
            temperature=temperature,
            max_tokens=max_tokens,
        )

    response = {
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": completion.choices[0].message.content,
                },
            }
        ]
    }

    # add context in the returned response
    context_dict = {
        "context": context,
        "contexts": contexts,
        "num_docs": num_docs,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    response["choices"][0]["context"] = context_dict
    return response
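
The dictionary returned by `chat_completion` carries both the answer and the retrieved chunks. The sketch below shows one way to pull them apart, for example to inspect which chunks the answer was grounded on. The helper name `extract_answer_and_sources` and the sample data are hypothetical; the dictionary layout follows the `response` built above.

```python
from typing import List, Tuple


def extract_answer_and_sources(response: dict) -> Tuple[str, List[str]]:
    """Return the assistant's answer and the list of retrieved chunk texts."""
    choice = response["choices"][0]
    answer = choice["message"]["content"]
    # "context" is attached by chat_completion; fall back to no sources if absent.
    sources = choice.get("context", {}).get("contexts", [])
    return answer, sources


# Hand-written response in the same layout, for illustration only.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "An example answer."},
            "context": {
                "context": "\n>>> first chunk\n>>> second chunk",
                "contexts": ["first chunk", "second chunk"],
                "num_docs": 2,
                "temperature": 0.7,
                "max_tokens": 800,
            },
        }
    ]
}

answer, sources = extract_answer_and_sources(sample_response)
print(answer)
for number, source in enumerate(sources, start=1):
    print(f"[{number}] {source}")
```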

In [10]:
def chat_with_documents(question: str):
    result = chat_completion(
        question=question,
        system_role=model_config["prompt"]["system_role"],
        user_prompt=model_config["prompt"]["user_prompt"]
        + "\n\nQuestion:'{question}' \n\nContext: '{context}'",
        index_name=INDEX_NAME,
        num_docs=model_config["rag"]["num_docs"],
        temperature=model_config["model"]["temperature"],
        max_tokens=model_config["model"]["max_tokens"],
    )
    print(result["choices"][0]["message"]["content"])

### Question Answering on the Data


In [11]:
chat_with_documents(
    question="Why did Formula 1 introduce limitations on the number of upgrades teams could make to their power units during the season?",
)

Based on the given context, Formula 1 introduced limitations on the number of upgrades teams could make to their power units during the season to achieve the long-term competitive balance, sporting fairness, and financial stability of the Championship in respect of power units. These limitations aim to preserve the unique technology and engineering challenge of Formula 1 while also ensuring that power unit manufacturers allocate their resources efficiently within the power unit cost cap.
