Copyright (c) Microsoft Corporation.

Licensed under the MIT License.

# Text2SQL with Semantic Kernel & Azure OpenAI

This notebook demonstrates how to integrate the SQL plugin with Semantic Kernel and Azure OpenAI to answer natural-language questions about a database, based on the schemas provided.

SQL generation uses a multi-shot approach, which gives more reliable results and reduces token usage. See README.md for details.

In [1]:
import logging
import os
import yaml
import dotenv
import json
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
)
from semantic_kernel.contents.chat_history import ChatHistory
from semantic_kernel.kernel import Kernel
from plugins.vector_based_sql_plugin.vector_based_sql_plugin import VectorBasedSQLPlugin
from semantic_kernel.functions.kernel_arguments import KernelArguments
from semantic_kernel.prompt_template.prompt_template_config import PromptTemplateConfig
from IPython.display import display, Markdown
logging.basicConfig(level=logging.INFO)

## Kernel Setup

In [2]:
dotenv.load_dotenv()
kernel = Kernel()

## Set up GPT connections

In [3]:
service_id = "chat"

In [4]:
chat_service = AzureChatCompletion(
    service_id=service_id,
    deployment_name=os.environ["OpenAI__CompletionDeployment"],
    endpoint=os.environ["OpenAI__Endpoint"],
    api_key=os.environ["OpenAI__ApiKey"],
)
kernel.add_service(chat_service)
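
The service above reads three settings from the environment, and a missing one surfaces only as a bare `KeyError`. A minimal sketch of an up-front check, assuming the same three variable names; the helper `missing_settings` is hypothetical and is not part of the notebook or of Semantic Kernel:

```python
import os
from collections.abc import Mapping

# Variable names taken from the AzureChatCompletion cell above.
REQUIRED_SETTINGS = (
    "OpenAI__CompletionDeployment",
    "OpenAI__Endpoint",
    "OpenAI__ApiKey",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required settings that are absent or empty in `env`."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]
```

Calling `missing_settings()` right after `dotenv.load_dotenv()` and raising an error when the list is non-empty would report every missing setting at once, rather than one `KeyError` at a time.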

In [5]:
# Register the SQL plugin with the kernel under the plugin name "SQL".
sql_plugin = VectorBasedSQLPlugin()
kernel.add_plugin(sql_plugin, "SQL")

KernelPlugin(name='SQL', description=None, functions={'GetEntitySchema': KernelFunctionFromMethod(metadata=KernelFunctionMetadata(name='GetEntitySchema', plugin_name='SQL', description='Gets the schema of a view or table in the SQL Database by selecting the most relevant entity based on the search term. Extract key terms from the user question and use these as the search term. Several entities may be returned. Only use when the provided schemas in the system prompt are not sufficient to answer the question.', parameters=[KernelParameterMetadata(name='text', description='The text to run a semantic search against. Relevant entities will be returned.', default_value=None, type_='str', is_required=True, type_object=<class 'str'>, schema_data={'type': 'string', 'description': 'The text to run a semantic search against. Relevant entities will be returned.'}, function_schema_include=True)], is_prompt=False, is_asynchronous=True, return_parameter=KernelParameterMetadata(name='return', descript

## Prompt Setup

In [6]:
# Load prompt and execution settings from the file
with open("./prompt.yaml", "r") as file:
    data = yaml.safe_load(file.read())
    prompt_template_config = PromptTemplateConfig(**data)
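
The `ask_question` function in this notebook passes three arguments to the prompt: `chat_history`, `important_information`, and `user_input`. The contents of `prompt.yaml` are not shown here, so the following is only a sketch of a sanity check, assuming the file declares its variables as an `input_variables` list of entries with a `name` key. The helper `undeclared_variables`, the sample dictionary, and its template string are all hypothetical.

```python
# Argument names that the notebook sets on KernelArguments before invoking the prompt.
EXPECTED_VARIABLES = {"chat_history", "important_information", "user_input"}


def undeclared_variables(config: dict) -> set[str]:
    """Return expected argument names that the loaded config does not declare."""
    declared = {var.get("name") for var in config.get("input_variables") or []}
    return EXPECTED_VARIABLES - declared


# Hypothetical stand-in for the dictionary returned by yaml.safe_load.
sample = {
    "name": "Chat",
    "template": "{{$important_information}} {{$chat_history}} {{$user_input}}",
    "input_variables": [
        {"name": "chat_history", "is_required": True},
        {"name": "user_input", "is_required": True},
    ],
}

# Reports the variables the sample config forgot to declare.
print(undeclared_variables(sample))
```

Running a check like this on `data` before building `PromptTemplateConfig` could catch a mismatch between the YAML file and the arguments supplied at invocation time.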

In [7]:
chat_function = kernel.add_function(
    prompt_template_config=prompt_template_config,
    plugin_name="ChatBot",
    function_name="Chat",
)

## ChatBot setup

In [8]:
history = ChatHistory()

In [9]:
async def ask_question(question: str, chat_history: ChatHistory) -> None:
    """Asks the chatbot a question and displays the answer as Markdown.

    Args:
        question (str): The question to ask the chatbot.
        chat_history (ChatHistory): The chat history object, updated in place.
    """

    # Create important information prompt that contains the SQL database information.
    engine_specific_rules = "Use TOP X to limit the number of rows returned instead of LIMIT X. NEVER USE LIMIT X as it produces a syntax error."
    important_information_prompt = f"""
    [SQL DATABASE INFORMATION]
    {await sql_plugin.system_prompt(engine_specific_rules=engine_specific_rules, question=question)}
    [END SQL DATABASE INFORMATION]
    """

    arguments = KernelArguments()
    arguments["chat_history"] = chat_history
    arguments["important_information"] = important_information_prompt
    arguments["user_input"] = question

    logging.info("Question: %s", question)

    answer = await kernel.invoke(
        function_name="Chat",
        plugin_name="ChatBot",
        arguments=arguments,
        chat_history=chat_history,
    )

    logging.info("Answer: %s", answer)

    # Log the question and answer to the chat history.
    chat_history.add_user_message(question)
    chat_history.add_assistant_message(str(answer))

    json_answer = json.loads(str(answer))

    display(Markdown(json_answer["answer"]))
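
The call `json.loads(str(answer))` assumes the model replies with a bare JSON object that contains an `answer` key. Chat models sometimes wrap JSON in a Markdown code fence or reply in plain text, which would raise `json.JSONDecodeError` or `KeyError` here. A minimal sketch of a more forgiving parser follows; `extract_answer` is a hypothetical helper, and falling back to the raw text is an assumption about the desired behaviour.

```python
import json
import re

# Three backticks, built programmatically so this snippet nests cleanly in Markdown.
FENCE = "`" * 3

# Matches an optional code fence (with or without a "json" tag) around the payload.
_FENCED = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


def extract_answer(raw: str) -> str:
    """Return the "answer" field of a JSON reply, or the raw text if parsing fails."""
    text = raw.strip()
    match = _FENCED.match(text)
    if match:
        # Drop the surrounding fence and keep only the payload.
        text = match.group(1)
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return raw.strip()
    if isinstance(payload, dict) and isinstance(payload.get("answer"), str):
        return payload["answer"]
    # Valid JSON without a string "answer" field: fall back to the raw reply.
    return raw.strip()
```

Inside `ask_question`, `display(Markdown(extract_answer(str(answer))))` could then stand in for the final two statements.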

In [10]:
await ask_question("What are the different product categories we have?", history)

INFO:httpx:HTTP Request: POST https://aoai-text2sql-adi.openai.azure.com/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-03-15-preview "HTTP/1.1 200 OK"


INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'https://open-ai-vector-db.search.windows.net/indexes('text-2-sql-query-cache-index')/docs/search.post.search?api-version=REDACTED'
Request method: 'POST'
Request headers:
    'Content-Type': 'application/json'
    'Content-Length': '34853'
    'api-key': 'REDACTED'
    'Accept': 'application/json;odata.metadata=none'
    'x-ms-client-request-id': '926fce74-7784-11ef-822e-0242ac110002'
    'User-Agent': 'azsdk-python-search-documents/11.6.0b4 Python/3.12.6 (Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.36)'
A body is sent with the request
INFO:azure.core.pipeline.policies.http_logging_policy:Response status: 200
Response headers:
    'Transfer-Encoding': 'chunked'
    'Content-Type': 'application/json; odata.metadata=none; odata.streaming=true; charset=utf-8'
    'Content-Encoding': 'REDACTED'
    'Vary': 'REDACTED'
    'Server': 'Microsoft-IIS/10.0'
    'Strict-Transport-Security': 'REDACTED'
    'Prefere

The product categories available are organised under four main parent categories: Accessories, Clothing, Components, and Bikes. Here are the details:

### Accessories
- Bike Racks
- Bike Stands
- Bottles and Cages
- Cleaners
- Fenders
- Helmets
- Hydration Packs
- Lights
- Locks
- Panniers
- Pumps
- Tires and Tubes

### Clothing
- Bib-Shorts
- Caps
- Gloves
- Jerseys
- Shorts
- Socks
- Tights
- Vests

### Components
- Handlebars
- Bottom Brackets
- Brakes
- Chains
- Cranksets
- Derailleurs
- Forks
- Headsets
- Mountain Frames
- Pedals
- Road Frames
- Saddles
- Touring Frames
- Wheels

### Bikes
- Mountain Bikes
- Road Bikes
- Touring Bikes

In [11]:
await ask_question("What is the top performing product by quantity of units sold?", history)

INFO:httpx:HTTP Request: POST https://aoai-text2sql-adi.openai.azure.com/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-03-15-preview "HTTP/1.1 200 OK"
INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'https://open-ai-vector-db.search.windows.net/indexes('text-2-sql-query-cache-index')/docs/search.post.search?api-version=REDACTED'
Request method: 'POST'
Request headers:
    'Content-Type': 'application/json'
    'Content-Length': '34715'
    'api-key': 'REDACTED'
    'Accept': 'application/json;odata.metadata=none'
    'x-ms-client-request-id': '97849cdc-7784-11ef-822e-0242ac110002'
    'User-Agent': 'azsdk-python-search-documents/11.6.0b4 Python/3.12.6 (Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.36)'
A body is sent with the request
INFO:azure.core.pipeline.policies.http_logging_policy:Response status: 200
Response headers:
    'Transfer-Encoding': 'chunked'
    'Content-Type': 'application/json; odata.metadata=none; odata.strea

In [None]:
await ask_question("Which country did we sell the most to in June 2008?", history)

INFO:httpx:HTTP Request: POST https://aoai-text2sql-adi.openai.azure.com/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-03-15-preview "HTTP/1.1 200 OK"
INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'https://open-ai-vector-db.search.windows.net/indexes('text-2-sql-query-cache-index')/docs/search.post.search?api-version=REDACTED'
Request method: 'POST'
Request headers:
    'Content-Type': 'application/json'
    'Content-Length': '34774'
    'api-key': 'REDACTED'
    'Accept': 'application/json;odata.metadata=none'
    'x-ms-client-request-id': '18181f5a-7784-11ef-9963-0242ac110002'
    'User-Agent': 'azsdk-python-search-documents/11.6.0b4 Python/3.12.6 (Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.36)'
A body is sent with the request
INFO:azure.core.pipeline.policies.http_logging_policy:Response status: 200
Response headers:
    'Transfer-Encoding': 'chunked'
    'Content-Type': 'application/json; odata.metadata=none; odata.strea

In June 2008, the country with the highest sales was the United Kingdom, with total sales amounting to £572,496.56.