Copyright (c) Microsoft Corporation.

Licensed under the MIT License.

# Text2SQL with Semantic Kernel & Azure OpenAI

This notebook demonstrates how to integrate the SQL plugin with Semantic Kernel and Azure OpenAI to answer questions about a database, using the schemas provided.

SQL generation uses a multi-shot approach, which gives more reliable results and reduces token usage. See README.md for more details.

### Dependencies

To install the dependencies, run:

`uv sync --package semantic_kernel_text_2_sql`

`uv add --editable ../text_2_sql_core/`

In [None]:
# This is only needed for this notebook to work
import sys
from pathlib import Path

# Add the `src` directory (relative to the current working directory) to the module search path
sys.path.append(str(Path.cwd() / "src"))

In [1]:
import logging
import os
import yaml
import dotenv
import json
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
)
from semantic_kernel.contents.chat_history import ChatHistory
from semantic_kernel.kernel import Kernel
from semantic_kernel_text_2_sql.plugins.prompt_based_sql_plugin.prompt_based_sql_plugin import PromptBasedSQLPlugin
from semantic_kernel.functions.kernel_arguments import KernelArguments
from semantic_kernel.prompt_template.prompt_template_config import PromptTemplateConfig
from IPython.display import display, Markdown

logging.basicConfig(level=logging.INFO)

## Kernel Setup

In [2]:
dotenv.load_dotenv()
kernel = Kernel()

## GPT Connection Setup

In [3]:
service_id = "chat"

In [4]:
chat_service = AzureChatCompletion(
    service_id=service_id,
    deployment_name=os.environ["OpenAI__CompletionDeployment"],
    endpoint=os.environ["OpenAI__Endpoint"],
    api_key=os.environ["OpenAI__ApiKey"],
)
kernel.add_service(chat_service)
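The connection cell reads its settings with `os.environ[...]`, which raises a `KeyError` at the first missing variable. As a minimal sketch, you could check every setting this notebook uses up front and report all missing ones at once. The helper `missing_settings` and the constant `REQUIRED_SETTINGS` are hypothetical, not part of the project; the variable names are the ones the notebook reads.

```python
import os
from collections.abc import Mapping

# Settings this notebook reads from the environment (names taken from the notebook's cells).
REQUIRED_SETTINGS = (
    "OpenAI__CompletionDeployment",
    "OpenAI__Endpoint",
    "OpenAI__ApiKey",
    "Text2Sql__DatabaseName",
)


def missing_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Hypothetical helper: return the required settings that are unset or empty."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# Run after dotenv.load_dotenv() so values from a .env file are included.
missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```

Passing the mapping as a parameter keeps the check easy to exercise without touching the real environment.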

In [5]:
# Create the SQL plugin for the target database and register it with the kernel under the name "SQL".
sql_plugin = PromptBasedSQLPlugin(database=os.environ["Text2Sql__DatabaseName"])
kernel.add_plugin(sql_plugin, "SQL")

KernelPlugin(name='SQL', description=None, functions={'GetEntitySchema': KernelFunctionFromMethod(metadata=KernelFunctionMetadata(name='GetEntitySchema', plugin_name='SQL', description='Get the detailed schema of an entity in the Database. Use the entity and the column returned to formulate a SQL query. The view name or table name must be one of the ENTITY NAMES defined in the [ENTITIES LIST]. Only use the column names obtained from GetEntitySchema() when constructing a SQL query, do not make up column names.', parameters=[KernelParameterMetadata(name='entity_name', description='The view or table name to get the schema for. It must be one of the ENTITY NAMES defined in the [ENTITIES LIST] function.', default_value=None, type_='str', is_required=True, type_object=<class 'str'>, schema_data={'type': 'string', 'description': 'The view or table name to get the schema for. It must be one of the ENTITY NAMES defined in the [ENTITIES LIST] function.'}, function_schema_include=True)], is_promp
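The `GetEntitySchema` description in the output above suggests how token usage is kept down: the prompt appears to carry only an [ENTITIES LIST] of names, and the model requests the detailed schema of an entity only when it needs one. The toy sketch below illustrates that idea with made-up entities and columns. It is not the plugin's implementation, and `ENTITY_SCHEMAS`, `entities_list_prompt`, and `get_entity_schema` are hypothetical names.

```python
import json

# Made-up entities standing in for real database metadata (illustration only).
ENTITY_SCHEMAS = {
    "Product": {
        "Description": "Products that the company sells.",
        "Columns": {"ProductID": "int", "Name": "nvarchar", "ProductCategoryID": "int", "ListPrice": "money"},
    },
    "ProductCategory": {
        "Description": "Categories and subcategories of products.",
        "Columns": {"ProductCategoryID": "int", "ParentProductCategoryID": "int", "Name": "nvarchar"},
    },
    "SalesOrderDetail": {
        "Description": "Line items of sales orders.",
        "Columns": {"SalesOrderID": "int", "ProductID": "int", "OrderQty": "smallint", "UnitPrice": "money"},
    },
}


def entities_list_prompt(schemas: dict[str, dict]) -> str:
    """Render only entity names and descriptions: the part sent with every prompt."""
    lines = ["[ENTITIES LIST]"]
    lines += [f"- {name}: {info['Description']}" for name, info in schemas.items()]
    lines.append("[END ENTITIES LIST]")
    return "\n".join(lines)


def get_entity_schema(entity_name: str) -> str:
    """Return the full schema of one entity, as an on-demand function call would."""
    if entity_name not in ENTITY_SCHEMAS:
        raise ValueError(f"Unknown entity: {entity_name}")
    return json.dumps({"Entity": entity_name, **ENTITY_SCHEMAS[entity_name]})


full_dump = json.dumps(ENTITY_SCHEMAS)
names_only = entities_list_prompt(ENTITY_SCHEMAS)
print(f"All schemas: {len(full_dump)} chars; names only: {len(names_only)} chars")
```

With a real database of dozens of tables and wide views, the gap between the two sizes grows accordingly.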

## Prompt Setup

In [6]:
# Load the prompt template and its execution settings from prompt.yaml
with open("./semantic_kernel_text_2_sql/src/prompt.yaml", "r") as file:
    data = yaml.safe_load(file)

prompt_template_config = PromptTemplateConfig(**data)
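It can be useful to confirm that the variables the prompt declares line up with the `KernelArguments` that `ask_question` sets. Assuming `prompt.yaml` follows Semantic Kernel's prompt configuration layout, with an `input_variables` list of entries that each have a `name` (the file's contents are not shown in this notebook), a quick check could look like this. `unsupplied_variables` and `SUPPLIED_VARIABLES` are hypothetical names.

```python
# Variables that ask_question passes when invoking ChatBot-Chat (names from this notebook).
SUPPLIED_VARIABLES = {"sql_database_information", "user_input", "chat_history"}


def unsupplied_variables(config_data: dict, supplied: set[str] = SUPPLIED_VARIABLES) -> list[str]:
    """Hypothetical check: declared input variables without a default that are never supplied."""
    declared = {
        var["name"]
        for var in (config_data.get("input_variables") or [])
        if "default" not in var  # variables with a default can safely be omitted
    }
    return sorted(declared - supplied)
```

For example, `print(unsupplied_variables(data))` on the `data` loaded above; an empty list means every required variable is provided.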

In [7]:
chat_function = kernel.add_function(
    prompt_template_config=prompt_template_config,
    plugin_name="ChatBot",
    function_name="Chat",
)

## ChatBot Setup

In [8]:
history = ChatHistory()

In [9]:
async def ask_question(question: str, chat_history: ChatHistory) -> None:
    """Asks a question to the chatbot, displays the answer, and records the turn in the chat history.

    Args:
        question (str): The question to ask the chatbot.
        chat_history (ChatHistory): The chat history object.
    """

    # Create the prompt section that contains the SQL database information.
    # TOP is T-SQL (SQL Server) syntax; many other engines use LIMIT instead.
    engine_specific_rules = "Use TOP X to limit the number of rows returned instead of LIMIT X. NEVER USE LIMIT X as it produces a syntax error."
    sql_database_information_prompt = f"""
    [SQL DATABASE INFORMATION]
    {sql_plugin.sql_prompt_injection(engine_specific_rules=engine_specific_rules)}
    [END SQL DATABASE INFORMATION]
    """

    arguments = KernelArguments()
    arguments["sql_database_information"] = sql_database_information_prompt
    arguments["user_input"] = question

    logging.info("Question: %s", question)

    answer = await kernel.invoke(
        function_name="Chat",
        plugin_name="ChatBot",
        arguments=arguments,
        chat_history=chat_history,
    )

    logging.info("Answer: %s", answer)

    # Record the question and the raw (JSON) answer in the chat history.
    chat_history.add_user_message(question)
    chat_history.add_assistant_message(str(answer))

    # The prompt asks the model to reply in JSON; display only the "answer" field.
    json_answer = json.loads(str(answer))

    display(Markdown(json_answer["answer"]))
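`ask_question` passes the model output straight to `json.loads`, which would fail if the model wrapped its JSON in a Markdown code fence or returned plain text. A more forgiving parser might look like the sketch below. `parse_chat_answer` is a hypothetical helper; the `answer` and `sources` keys follow the [RESPONSE OUTPUT REQUIREMENTS] visible in the logged prompt below.

```python
import json
import re

# Matches an optional ```json ... ``` (or plain ```) fence around the whole payload.
_FENCE = re.compile(r"^\s*```(?:json)?\s*(.*?)\s*```\s*$", re.DOTALL)


def parse_chat_answer(raw: str) -> tuple[str, list[dict]]:
    """Hypothetical helper: extract the answer text and sources from the model's JSON reply."""
    match = _FENCE.match(raw)
    payload = match.group(1) if match else raw.strip()
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        # Fall back to the raw text rather than failing the whole cell.
        return raw.strip(), []
    if not isinstance(data, dict):
        return raw.strip(), []
    return str(data.get("answer", "")), list(data.get("sources", []))
```

Inside `ask_question`, the last two lines could then become `answer_text, _ = parse_chat_answer(str(answer))` followed by `display(Markdown(answer_text))`.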

In [10]:
await ask_question("What are the different product categories we have?", history)

INFO:root:Question: What are the different product categories we have?
INFO:semantic_kernel.functions.kernel_function:Function ChatBot-Chat invoking.
INFO:semantic_kernel.contents.chat_history:Could not parse prompt <message role="system">
As a senior analyst, your primary responsibility is to provide accurate, thorough answers to user queries. Use all available functions to craft detailed final responses with clear explanations and actionable insights.

- Always use the provided functions to obtain key information.
- If a function is required, you must use it to complement the answer.
- Use multiple functions in parallel to enhance the results.
- Always provide an answer; never leave it blank.

The response must meet the following requirements:

[RESPONSE OUTPUT REQUIREMENTS]

  The answer MUST be in JSON format:
  {
    "answer": "<GENERATED ANSWER>",
    "sources": [
      {"title": <SOURCE 1 TITLE>, "chunk": <SOURCE 1 CONTEXT CHUNK>, "reference": "<SOURCE 1 REFERENCE>"},
      {"ti

The different product categories available are as follows:

### Accessories
- Bike Racks
- Bike Stands
- Bottles and Cages
- Cleaners
- Fenders
- Helmets
- Hydration Packs
- Lights
- Locks
- Panniers
- Pumps
- Tires and Tubes

### Clothing
- Bib-Shorts
- Caps
- Gloves
- Jerseys
- Shorts
- Socks
- Tights
- Vests

### Components
- Handlebars
- Bottom Brackets
- Brakes
- Chains
- Cranksets
- Derailleurs
- Forks
- Headsets
- Mountain Frames
- Pedals
- Road Frames
- Saddles
- Touring Frames
- Wheels

### Bikes
- Mountain Bikes
- Road Bikes
- Touring Bikes

In [11]:
await ask_question("What is the top performing product by quantity of units sold?", history)

INFO:root:Question: What is the top performing product by quantity of units sold?
INFO:semantic_kernel.functions.kernel_function:Function ChatBot-Chat invoking.
INFO:semantic_kernel.contents.chat_history:Could not parse prompt <message role="system">
As a senior analyst, your primary responsibility is to provide accurate, thorough answers to user queries. Use all available functions to craft detailed final responses with clear explanations and actionable insights.

- Always use the provided functions to obtain key information.
- If a function is required, you must use it to complement the answer.
- Use multiple functions in parallel to enhance the results.
- Always provide an answer; never leave it blank.

The response must meet the following requirements:

[RESPONSE OUTPUT REQUIREMENTS]

  The answer MUST be in JSON format:
  {
    "answer": "<GENERATED ANSWER>",
    "sources": [
      {"title": <SOURCE 1 TITLE>, "chunk": <SOURCE 1 CONTEXT CHUNK>, "reference": "<SOURCE 1 REFERENCE>"},

The top performing product by quantity of units sold is the 'Classic Vest, S' with a total of 522 units sold.

In [12]:
await ask_question("Which country did we sell the most to in June 2008?", history)

INFO:root:Question: Which country did we sell the most to in June 2008?
INFO:semantic_kernel.functions.kernel_function:Function ChatBot-Chat invoking.
INFO:semantic_kernel.contents.chat_history:Could not parse prompt <message role="system">
As a senior analyst, your primary responsibility is to provide accurate, thorough answers to user queries. Use all available functions to craft detailed final responses with clear explanations and actionable insights.

- Always use the provided functions to obtain key information.
- If a function is required, you must use it to complement the answer.
- Use multiple functions in parallel to enhance the results.
- Always provide an answer; never leave it blank.

The response must meet the following requirements:

[RESPONSE OUTPUT REQUIREMENTS]

  The answer MUST be in JSON format:
  {
    "answer": "<GENERATED ANSWER>",
    "sources": [
      {"title": <SOURCE 1 TITLE>, "chunk": <SOURCE 1 CONTEXT CHUNK>, "reference": "<SOURCE 1 REFERENCE>"},
      {"t

In June 2008, the country with the highest number of sales orders was the United States, with a total of 18 orders.