# Azure OpenAI Own Data - Vector database

This notebook shows how to use an existing vector database as your own data source with Azure OpenAI Service. In this example, the vector database is Azure AI Search, which indexes all of our data. A vector database is a common way to get better search results from your own data. The vector database must support the data types that you index into it.

The example data includes all published results (up to 3 March 2024) from the Finnish Figure Skating Association's synchronized skating competitions in the 2023-2024 season. More information about the results is available at [https://www.figureskatingresults.fi/](https://www.figureskatingresults.fi/).

## Prerequisites

- An Azure OpenAI service with at least one deployed model (the vector search section also uses an embedding model deployment). Fill in your own *config.jsonc* file; *example-config.jsonc* provides a template.
- An Azure Blob container that contains all necessary files in a single folder.
- An Azure AI Search service that has already indexed those files using an embedding model.
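
Before running the rest of the notebook, it can help to confirm that the configuration is complete. The following is a minimal sketch, not part of the original notebook: the key names are the ones read by the initialization cell, while `missing_config_keys` and `sample_config` are hypothetical names introduced only for illustration.

```python
# Keys read by the initialization cell of this notebook.
REQUIRED_KEYS = [
    "azure_oai_api_version",
    "azure_oai_endpoint",
    "azure_oai_key",
    "azure_oai_gpt_model_name",
    "azure_oai_embedding_model_name",
    "azure_ai_search_endpoint",
    "azure_ai_search_key",
    "azure_ai_index_name",
]


def missing_config_keys(config):
    """Return the required keys that are absent or empty in a parsed config dict."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Hypothetical, deliberately incomplete config used only to demonstrate the check.
sample_config = {
    "azure_oai_endpoint": "https://example.openai.azure.com/",
    "azure_oai_key": "",
}

missing = missing_config_keys(sample_config)
print("Missing or empty keys:", missing)
```

In the notebook itself, the same function could be called on the dictionary returned by `JsoncParser.parse_file('config.jsonc')`.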

## Initialize OpenAI service

In [None]:
%pip install --upgrade --quiet openai jsonc-parser

In [None]:
# Import configuration and initialize client
from openai import AzureOpenAI
from jsonc_parser.parser import JsoncParser

config = JsoncParser.parse_file('config.jsonc')

client = AzureOpenAI(
    api_version=config['azure_oai_api_version'],
    azure_endpoint=config['azure_oai_endpoint'],
    api_key=config['azure_oai_key']
)

# Deployment names and Azure AI Search settings used in the cells below
gpt_model_name = config['azure_oai_gpt_model_name']
embedding_model_name = config['azure_oai_embedding_model_name']
ai_search_endpoint = config['azure_ai_search_endpoint']
ai_search_key = config['azure_ai_search_key']
ai_index_name = config['azure_ai_index_name']

## Use vector database with keyword-based search

Connect to the source data with keyword-based search (does not use embeddings).

Because the source data is mostly tabular and contains little descriptive information about the teams, keyword-based search is not the best option here.
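
To illustrate why keyword matching struggles with tabular data, here is a toy sketch. It uses a naive word-overlap count, which is a simplification and not the ranking that Azure AI Search actually applies; the two rows are hypothetical and do not come from the real results.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_overlap(question, document):
    """Count how many distinct tokens the question and the document share."""
    return len(tokenize(question) & tokenize(document))


question = "Which team has got most of points in the category of Seniors?"

# Hypothetical rows: a table-like line and the same fact written as prose.
table_row = "1 Team Alpha SEN 70.12 140.30 210.42"
prose_row = "Team Alpha won the Seniors category with 210.42 points in total."

table_score = keyword_overlap(question, table_row)
prose_score = keyword_overlap(question, prose_row)
print("Table row overlap:", table_score)
print("Prose row overlap:", prose_score)
```

The table-like row shares far fewer words with the question than the prose row, because abbreviations and bare numbers carry the meaning instead of words.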

In [None]:
# Make prompt
message_text = [
    {
        "role": "system",
        "content": "You are an AI assistant that helps people find information about the Finnish Figure Skating Association's synchronized skating results for the 2023-2024 season. An Azure AI Search based data source is provided with that data."
    },
    {
        "role": "user",
        "content": "Which team has got most of points in the category of Seniors?"
    }
]

# Send the request to the model, using Azure AI Search as the data source
response = client.chat.completions.create(
    model=gpt_model_name,
    messages=message_text,
    max_tokens=1000,
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": ai_search_endpoint,
                    "index_name": ai_index_name,
                    "authentication": {
                        "type": "api_key",
                        "key": ai_search_key
                    },
                    "query_type": "simple"
                }
            }
        ]
    }
)

# Print content
print("System message: ", message_text[0]['content'])
print("User message: ", message_text[1]['content'])
print("Response: ", response.choices[0].message.content)
print("Input tokens: ", response.usage.prompt_tokens)
print("Output tokens: ", response.usage.completion_tokens)

## Use vector database with vector-based search

Connect to the vector database with vector-based search.

With vector-based search, we find the correct answers in our data. Unlike the keyword-based search, which did not find anything, this approach converts the question into an embedding, compares it against the vectors stored in the database, and uses the k-nearest neighbor (KNN) algorithm to find the closest results, which the model then uses to answer.
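
The nearest-neighbor idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the three-dimensional vectors and document names are hypothetical, real embeddings have far more dimensions, and a search service may use an approximate index instead of comparing against every vector as this sketch does.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def k_nearest(query_vector, indexed_vectors, k=2):
    """Return the k (document id, similarity) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in indexed_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "index": document id -> embedding vector.
toy_index = {
    "seniors_results": [0.9, 0.1, 0.0],
    "juniors_results": [0.6, 0.7, 0.1],
    "venue_info": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the user's question.
toy_query = [1.0, 0.1, 0.0]

nearest = k_nearest(toy_query, toy_index, k=2)
for doc_id, score in nearest:
    print(doc_id, round(score, 3))
```

The documents whose vectors point in nearly the same direction as the question's vector rank first, even when they share no exact words with the question.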

In [None]:
# Make prompt
message_text = [
    {
        "role": "system",
        "content": "You are an AI assistant that helps people find information about the Finnish Figure Skating Association's synchronized skating results for the 2023-2024 season. An Azure AI Search based data source is provided with that data."
    },
    {
        "role": "user",
        "content": "Which team has got most of points in the category of Seniors?"
    }
]

# Send the request to the model, using Azure AI Search as the data source
response = client.chat.completions.create(
    model=gpt_model_name,
    messages=message_text,
    max_tokens=1000,
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": ai_search_endpoint,
                    "index_name": ai_index_name,
                    "authentication": {
                        "type": "api_key",
                        "key": ai_search_key
                    },
                    "embedding_dependency": {
                        "type": "deployment_name",
                        "deployment_name": embedding_model_name
                    },
                    "query_type": "vector"
                }
            }
        ]
    }
)

# Print content
print("System message: ", message_text[0]['content'])
print("User message: ", message_text[1]['content'])
print("Response: ", response.choices[0].message.content)
print("Input tokens: ", response.usage.prompt_tokens)
print("Output tokens: ", response.usage.completion_tokens)

## Combine both or add semantic ranking support

You can combine these two approaches by choosing *vector_simple_hybrid* as the query type. In my case the source data consists mostly of table-structured PDF files and the keyword search did not find anything, so the keyword part of a hybrid query does not add anything here either. If your source data is prose, you can turn on semantic ranking by setting the query type to *vector_semantic_hybrid* or *semantic*. Semantic ranking takes the context of the question into account and can produce better answers when the right context is available. Keep in mind that semantic ranking is not always necessary.
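
A small helper can make it easier to switch between query types. The following is a sketch, assuming the same `azure_search` data source shape used in the cells above; `build_data_source` is a hypothetical helper, the endpoint, key, and names in the example call are placeholders, and the semantic configuration name must match one defined on your own index.

```python
# Query types that need an embedding deployment or a semantic configuration.
VECTOR_QUERY_TYPES = {"vector", "vector_simple_hybrid", "vector_semantic_hybrid"}
SEMANTIC_QUERY_TYPES = {"semantic", "vector_semantic_hybrid"}


def build_data_source(endpoint, index_name, key, query_type,
                      embedding_deployment=None, semantic_configuration=None):
    """Build one entry for the "data_sources" list passed in extra_body."""
    parameters = {
        "endpoint": endpoint,
        "index_name": index_name,
        "authentication": {"type": "api_key", "key": key},
        "query_type": query_type,
    }
    if query_type in VECTOR_QUERY_TYPES:
        if not embedding_deployment:
            raise ValueError(f"{query_type} requires an embedding deployment name")
        parameters["embedding_dependency"] = {
            "type": "deployment_name",
            "deployment_name": embedding_deployment,
        }
    if query_type in SEMANTIC_QUERY_TYPES:
        if not semantic_configuration:
            raise ValueError(f"{query_type} requires a semantic configuration name")
        parameters["semantic_configuration"] = semantic_configuration
    return {"type": "azure_search", "parameters": parameters}


# Placeholder values for illustration only.
hybrid_source = build_data_source(
    endpoint="https://example.search.windows.net",
    index_name="example-index",
    key="<search-key>",
    query_type="vector_semantic_hybrid",
    embedding_deployment="example-embedding-deployment",
    semantic_configuration="example-semantic-config",
)
print(hybrid_source["parameters"]["query_type"])
```

The returned dictionary can be passed as `extra_body={"data_sources": [hybrid_source]}` in the same `client.chat.completions.create` call shown earlier, with the notebook's own `ai_search_endpoint`, `ai_index_name`, `ai_search_key`, and `embedding_model_name` values.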

### Pricing considerations

Enabling a hybrid (keyword + vector) query or a semantic query type increases your cost.