# Text-to-SQL Chain Evaluations

In [1]:
import getpass
import os

#set environmental variable for OpenAI API key
os.environ["OPENAI_API_KEY"] = getpass.getpass()

 ········


In [2]:
import pandas as pd

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 1000)
test_set = pd.read_csv('src/wnba_facts.csv')

### Create baseline model (GPT3.5 Turbo)

In [17]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

baseline_llm = ChatOpenAI(model_name= "gpt-3.5-turbo-0125", temperature= 0)

baseline_prompt = ChatPromptTemplate.from_messages([
    ("system", "Act as a WNBA expert and answer this user question"),
    ("user", "{input}"),
])

baseline_chain = baseline_prompt | baseline_llm

### Evaluate baseline model (GPT3.5 Turbo)

In [18]:
questions = test_set['question']
ground_truths = test_set['answer']

answers = []
contexts = []

# Inference
for query in questions:
    response = baseline_chain.invoke({"input": query})
    answers.append(response.content)
    
# To dict
data = {
    "question": questions,
    "answer": answers,
    "ground_truths": ground_truths
}

dataset = pd.DataFrame.from_dict(data)
display(dataset)

Unnamed: 0,question,answer,ground_truths
0,Which player scored the most points in a single game between 2018 and 2024?,"As of now, the player who scored the most points in a single game between 2018 and 2024 is Liz Cambage. She set a WNBA record by scoring 53 points in a game for the Dallas Wings against the New York Liberty on July 17, 2018. This performance by Cambage surpassed the previous record of 51 points set by Riquna Williams in 2013.",A'ja Wilson & Liz Cambage
1,Which team had the highest average points per game in the 2021 season?,"The Seattle Storm had the highest average points per game in the 2021 WNBA season, averaging 87.4 points per game. Led by superstars like Breanna Stewart and Jewell Loyd, the Storm had a high-powered offense that consistently put up big numbers throughout the season.",Las Vegas Aces
2,How many games ended in overtime during the 2023 season?,"During the 2023 WNBA season, there were a total of 12 games that ended in overtime. Overtime games are always exciting to watch as they add an extra level of intensity and drama to the competition.",11
3,Which player had the highest free throw percentage in the 2020 season (min. 25 FT’s attempted)?,"In the 2020 WNBA season, the player with the highest free throw percentage (minimum 25 free throws attempted) was Allie Quigley of the Chicago Sky. Quigley shot an impressive 95.7% from the free throw line, making 45 out of 47 attempts. She is known for her sharpshooting and consistency from the charity stripe.",Tiffany Mitchell
4,What was the win-loss record of the Seattle Storm in the 2022 season?,"As of now, the 2022 WNBA season has not yet started. The regular season typically runs from May to September, so the win-loss record of the Seattle Storm for the 2022 season is not available at this time. You can follow the team's progress throughout the season on the official WNBA website or through various sports news outlets.",25 - 17
5,Which team had the longest winning streak in the 2019 season (inclusive of postseason)?,"The Washington Mystics had the longest winning streak in the 2019 WNBA season, with a 12-game winning streak that spanned from July 30 to September 6. This streak helped propel them to the best record in the league and ultimately win the WNBA Championship that year.",Washington Mystics (8 wins)
6,"How many triple-doubles were recorded in the WNBA between 2018 and 2024, and who had the most triple-doubles?","Between 2018 and 2024, there were a total of 14 triple-doubles recorded in the WNBA. The player with the most triple-doubles during this time period was Sabrina Ionescu of the New York Liberty, who recorded 6 triple-doubles. Other notable players who recorded triple-doubles during this time frame include Courtney Vandersloot, Liz Cambage, and Breanna Stewart. Triple-doubles are a rare and impressive feat in the WNBA, showcasing a player's versatility and impact on the game.","29 total triple doubles, and Alyssa Thomas had the most (with 12)"
7,Which team had the best defense in the 2024 season (by opponent points)?,"In the 2024 WNBA season, the Seattle Storm had the best defense in terms of opponent points per game. They allowed an average of 76.2 points per game, which was the lowest in the league. Led by defensive stalwarts like Breanna Stewart and Jewell Loyd, the Storm's defensive prowess played a key role in their success that season.",Connecticut Sun
8,Identify the game with the highest combined score (both teams) from 2018 to 2024.,"The game with the highest combined score in WNBA history occurred on July 8, 2018, between the Dallas Wings and the Washington Mystics. The final score was 108-105 in favor of the Mystics, resulting in a combined score of 213 points. This game set a new record for the highest combined score in WNBA history.","07/12/2022, Phoenix Mercury @ Minnesota Lynx"
9,Which player played the most minutes in the 2021 season?,"In the 2021 WNBA season, the player who played the most minutes was Courtney Vandersloot of the Chicago Sky. Vandersloot logged a total of 1,082 minutes over the course of the season, showcasing her durability and importance to her team. She is known for her exceptional playmaking abilities and leadership on the court.",Skylar Diggins-Smith


### Create baseline model (GPT4 Turbo)

In [19]:
baseline_llm = ChatOpenAI(model_name= "gpt-4-turbo", temperature= 0)

baseline_chain = baseline_prompt | baseline_llm

### Evaluate baseline model (GPT4 Turbo)

In [20]:
questions = test_set['question']
ground_truths = test_set['answer']

answers = []
contexts = []

# Inference
for query in questions:
    response = baseline_chain.invoke({"input": query})
    answers.append(response.content)
    
# To dict
data = {
    "question": questions,
    "answer": answers,
    "ground_truths": ground_truths
}

dataset = pd.DataFrame.from_dict(data)
display(dataset)

Unnamed: 0,question,answer,ground_truths
0,Which player scored the most points in a single game between 2018 and 2024?,"As of my last update in 2023, the record for the most points scored in a single WNBA game between 2018 and 2023 is held by Liz Cambage. She scored 53 points for the Dallas Wings in a game against the New York Liberty on July 17, 2018. This performance set the record for the most points in a single WNBA game. Please check the latest statistics or sources for any updates beyond 2023.",A'ja Wilson & Liz Cambage
1,Which team had the highest average points per game in the 2021 season?,"In the 2021 WNBA season, the Connecticut Sun had the highest average points per game. They were a strong offensive team throughout the season, showcasing their scoring capabilities consistently in their games.",Las Vegas Aces
2,How many games ended in overtime during the 2023 season?,"As of my last update, I don't have the specific data for the 2023 WNBA season, including the number of games that ended in overtime. For the most accurate and current information, I recommend checking the WNBA's official website or reliable sports news sources. They typically provide detailed statistics and game results for each season.",11
3,Which player had the highest free throw percentage in the 2020 season (min. 25 FT’s attempted)?,"In the 2020 WNBA season, the player with the highest free throw percentage (with a minimum of 25 free throws attempted) was Brionna Jones of the Connecticut Sun. She achieved a free throw percentage of 90.7%, making 39 out of 43 attempts.",Tiffany Mitchell
4,What was the win-loss record of the Seattle Storm in the 2022 season?,"In the 2022 WNBA season, the Seattle Storm finished with a win-loss record of 22-14.",25 - 17
5,Which team had the longest winning streak in the 2019 season (inclusive of postseason)?,"In the 2019 WNBA season, the longest winning streak, including the postseason, was held by the Washington Mystics. The Mystics had a 10-game winning streak from August 25 to September 19, 2019. This streak was crucial in propelling them towards their first WNBA Championship in franchise history. They finished the regular season with a 26-8 record and continued their strong performance through the playoffs.",Washington Mystics (8 wins)
6,"How many triple-doubles were recorded in the WNBA between 2018 and 2024, and who had the most triple-doubles?","As of my last update in 2023, I don't have the complete data for 2024. However, up to 2023, triple-doubles in the WNBA were still a rare feat. Through the years, the number of triple-doubles has gradually increased due to the rising level of talent and versatility among players.\n\nAs of the end of the 2023 season, Sabrina Ionescu, a guard for the New York Liberty, was notable for recording multiple triple-doubles, showcasing her all-around game. She had already set herself apart by 2023 as one of the leaders in triple-doubles in the WNBA. Other players like Candace Parker and Courtney Vandersloot have also been known for their ability to achieve triple-doubles.\n\nFor the exact number of triple-doubles recorded between 2018 and 2024, and to confirm who had the most during this period, I recommend checking the latest statistics from the WNBA's official website or other reliable sports statistics websites, as they would have the most current and accurate data, including any updates ...","29 total triple doubles, and Alyssa Thomas had the most (with 12)"
7,Which team had the best defense in the 2024 season (by opponent points)?,"I'm sorry for any confusion, but as of my last update in 2023, I don't have access to the specific statistics from the 2024 WNBA season. To find the most accurate and current information regarding which team had the best defense in the 2024 WNBA season based on opponent points, I recommend checking the latest stats on the WNBA's official website or a reliable sports statistics website like ESPN or Basketball Reference. They provide detailed and updated statistics that would include defensive rankings and other relevant data.",Connecticut Sun
8,Identify the game with the highest combined score (both teams) from 2018 to 2024.,"As of my last update in January 2023, I don't have the data for games beyond that point, including up to 2024. However, up to 2023, one of the highest-scoring games in WNBA history occurred on July 17, 2018, when the Chicago Sky and the Dallas Wings combined for a total of 198 points in a single game. The final score was Dallas Wings 104, Chicago Sky 94. This game is notable for its high scoring, but please check the latest records or official WNBA sources for any updates or games beyond January 2023 that may have surpassed this score.","07/12/2022, Phoenix Mercury @ Minnesota Lynx"
9,Which player played the most minutes in the 2021 season?,"In the 2021 WNBA season, the player who logged the most minutes was Courtney Vandersloot of the Chicago Sky. She played a total of 1,074 minutes across 32 games during the regular season. Vandersloot is well-known for her exceptional skills as a point guard and her ability to play significant minutes, contributing heavily to her team's performance.",Skylar Diggins-Smith


### Evaluate of baseline model (GPT3.5 turbo)

#### set up AWS S3 and Athena for Database Access

In [7]:
from sqlalchemy import create_engine
from urllib.parse import quote_plus
from langchain_community.utilities import SQLDatabase

#database info
AWS_REGION = "us-east-1"
SCHEMA_NAME = "wnba_db"
S3_STAGING_DIR = "s3://wnbadata/"

connect_str = "awsathena+rest://athena.{region_name}.amazonaws.com:443/{schema_name}?s3_staging_dir={s3_staging_dir}"

#connect to AWS athena
engine = create_engine(connect_str.format(
        region_name=AWS_REGION,
        schema_name=SCHEMA_NAME,
        s3_staging_dir=quote_plus(S3_STAGING_DIR)
))

#create SQL db object
db = SQLDatabase(engine)

#fetch table schema
schema = db.get_table_info()

  engine = create_engine(connect_str.format(


### Set up text2sql chain with query examples and table descriptions chain (GPT3.5 Turbo)

In [8]:
import json

#load query examples from json
f = open("src/query_example.json")
query_examples_json = json.load(f)

examples = query_examples_json["query_examples"]

In [9]:
file_paths = ["src/wnba_nba_pbp_data_dict.json", \
              "src/wnba_player_box.json", \
              "src/wnba_player_info.json", \
              "src/wnba_schedule.json", \
              "src/wnba_teambox.json"]

In [10]:
def get_table_details():

    table_details = ""

    for file_path in file_paths:
        
        #load table names and descriptions from json
        f = open(file_path)
        table_dict = json.load(f)
        
        #retrieve table names and descriptions and compile into string
        table_details = table_details + "Table Name:" + table_dict['table_name'] + "\n" \
        + "Table Description:" + table_dict['table_description']

        for col in table_dict['values']:
            table_details = table_details + "\n" + "Column Name:" + col['column_name'] + "\n" \
            + "Column Description:" + col['column_description'] + "\n" \
            + "COlumn Type:" + col['column_type']
        
        table_details = table_details + "\n\n"
    
    return table_details

In [11]:
from langchain_core.output_parsers.openai_tools import PydanticToolsParser
from langchain_core.prompts import FewShotChatMessagePromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field

#Creating a Pydantic Base Model
class Table(BaseModel):
    """Table in SQL database."""

    name: str = Field(description = "Name of table in SQL database.")

table_details = get_table_details()

#create table chain prompt
table_prompt_system = f"""Refer the Above Context and Return the names of SQL Tables that MIGHT be relevant to the above context\n\n 
The tables are:

{table_details}
 """

table_details_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", table_prompt_system),
        ("human", "{input}"),
    ]
)

#set up LLM
table_chain_llm = ChatOpenAI(model_name= "gpt-3.5-turbo-0125", temperature= 0)
table_chain_llm_wtools = table_chain_llm.bind_tools([Table])
output_parser = PydanticToolsParser(tools=[Table])

#create table chain
table_chain = table_details_prompt | table_chain_llm_wtools | output_parser

In [12]:
from langchain.globals import set_debug
from langchain_core.prompts import MessagesPlaceholder
from langchain.agents.format_scratchpad.openai_tools import (
    format_to_openai_tool_messages,
)
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser
from langchain.agents import tool
from langchain.agents import AgentExecutor
from langchain_community.agent_toolkits import SQLDatabaseToolkit

#set up LLM
llm = ChatOpenAI(model_name= "gpt-3.5-turbo-0125", temperature= 0)
#utilize langchain agent toolkit: SQL toolkit
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
tools = toolkit.get_tools()
llm_with_tools = llm.bind_tools(tools)

#create few shot prompt with query examples
example_prompt = ChatPromptTemplate.from_messages(["User input: {input}\nSQL query: {query}"])
few_shot_prompt = FewShotChatMessagePromptTemplate(
    examples = examples,
    example_prompt = example_prompt,
)

#create prompt for agent
prompt = ChatPromptTemplate.from_messages([
    (
        "system",
        """You are an agent designed to interact with a SQL database and a WNBA expert to answer questions about the WNBA.
        Given an input question, create a syntactically correct SQLite query to run, then look at the results of the query and return the answer.
        You should also leverage your pre-existing knowledge of WNBA rules, statistics, teams, players, and history to understand and interpret user questions and your answer accurately.
        Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most 5 results.
        You can order the results by a relevant column to return the most interesting examples in the database.
        Never query for all the columns from a specific table, only ask for the relevant columns given the question.
        You have access to tools for interacting with the database.
        Base your final answer on the information returned by these tools and with your existing knowledge of the WNBA.
        You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.

        When referring to a specific game, DO NOT just provide the game ID. You MUST also provide the two teams involved in the game.
        
        You MUST always execute the query and use the result to contruct the final answer. DO NOT tell me to excute the query.
        
        DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.
        
        To start you should ALWAYS look at the tables in the database to see what you can query.
        Do NOT skip this step.
        Then you should query the schema of the most relevant tables.
        Here is the relevant table info: {table_names_to_use}.""",
    ),
    few_shot_prompt,
    ("user", "{input}"),
    #message placeholder for storing intermediate steps
    MessagesPlaceholder(variable_name = "agent_scratchpad"),
])

#create agent
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_tool_messages(
            x["intermediate_steps"]
        ),
        "table_names_to_use": lambda x: x["table_names_to_use"],
    }
    | prompt
    | llm_with_tools
    | OpenAIToolsAgentOutputParser()
)
agent_executor = AgentExecutor(agent = agent, tools = tools, verbose = False, return_intermediate_steps = True)

### Evaluate of text2sql chain with query examples and table descriptions chain (GPT3.5 turbo)

In [13]:
questions = test_set['question']
ground_truths = test_set['answer']

answers = []
contexts = []

# Inference
for query in questions:
    response = agent_executor.invoke({
        "input": query,
        #passing result of table chain to text2sql agent
        "table_names_to_use": table_chain.invoke(query),
    })
    answers.append(response['output'])
    contexts.append(response['intermediate_steps'])
    
# To dict
data = {
    "question": questions,
    "answer": answers,
    "ground_truths": ground_truths
}

dataset = pd.DataFrame.from_dict(data)
display(dataset)

Unnamed: 0,question,answer,ground_truths
0,Which player scored the most points in a single game between 2018 and 2024?,"The player who scored the most points in a single game between 2018 and 2024 is Liz Cambage, with a total of 53 points in a game.",A'ja Wilson & Liz Cambage
1,Which team had the highest average points per game in the 2021 season?,The team with the highest average points per game in the 2021 season was the Las Vegas Aces with an average of 89.25 points per game.,Las Vegas Aces
2,How many games ended in overtime during the 2023 season?,There were 0 games that ended in overtime during the 2023 season.,11
3,Which player had the highest free throw percentage in the 2020 season (min. 25 FT’s attempted)?,"To find the player with the highest free throw percentage in the 2020 season with a minimum of 25 free throws attempted, we can use the following SQL query:\n\n```sql\nSELECT (SUM(free_throws_made) * 1.0 / SUM(free_throws_attempted)) AS free_throw_pct, athlete_display_name\nFROM wnba_playerbox\nWHERE season = 2020\nGROUP BY athlete_display_name\nHAVING SUM(free_throws_attempted) >= 25\nORDER BY free_throw_pct DESC\nLIMIT 1;\n```\n\nThis query calculates the free throw percentage for each player in the 2020 season, filters out players with less than 25 free throws attempted, and then selects the player with the highest free throw percentage.",Tiffany Mitchell
4,What was the win-loss record of the Seattle Storm in the 2022 season?,The win-loss record of the Seattle Storm in the 2022 season is not available in the database.,25 - 17
5,Which team had the longest winning streak in the 2019 season (inclusive of postseason)?,"To determine which team had the longest winning streak in the 2019 season, including the postseason, we need to calculate the consecutive wins for each team and identify the team with the longest streak.\n\nHere is the SQL query to find the team with the longest winning streak in the 2019 season:\n\n```sql\nSELECT team_name, MAX(streak) AS longest_winning_streak\nFROM (\n SELECT team_name, COUNT(*) AS streak\n FROM wnba_teambox\n WHERE season = 2019 AND team_winner = TRUE\n GROUP BY team_name, game_date\n) AS streaks\nGROUP BY team_name\nORDER BY longest_winning_streak DESC\nLIMIT 1;\n``` \n\nThis query calculates the consecutive wins for each team in the 2019 season and postseason, then identifies the team with the longest winning streak. The result will show the team name and the length of their longest winning streak in the 2019 season.",Washington Mystics (8 wins)
6,"How many triple-doubles were recorded in the WNBA between 2018 and 2024, and who had the most triple-doubles?","Between 2018 and 2024, there were multiple triple-doubles recorded in the WNBA. The player with the most triple-doubles during this period is Courtney Vandersloot.","29 total triple doubles, and Alyssa Thomas had the most (with 12)"
7,Which team had the best defense in the 2024 season (by opponent points)?,"The team with the best defense in the 2024 season, based on opponent points, was the Connecticut Sun. They had the lowest opponent points scored against them.",Connecticut Sun
8,Identify the game with the highest combined score (both teams) from 2018 to 2024.,The game with the highest combined score (both teams) from 2018 to 2024 had a total score of 540 points. The game ID is 401558893.,"07/12/2022, Phoenix Mercury @ Minnesota Lynx"
9,Which player played the most minutes in the 2021 season?,The player who played the most minutes in the 2021 season was Skylar Diggins-Smith with a total of 1440 minutes played.,Skylar Diggins-Smith


### Set up text2sql chain with query examples and table descriptions chain (GPT4 Turbo)

In [14]:
#set up LLM
table_chain_llm = ChatOpenAI(model_name= "gpt-4-turbo", temperature= 0)
table_chain_llm_wtools = table_chain_llm.bind_tools([Table])
output_parser = PydanticToolsParser(tools=[Table])

#create table chain
table_chain = table_details_prompt | table_chain_llm_wtools | output_parser

In [15]:
#set up LLM
llm = ChatOpenAI(model_name= "gpt-4-turbo", temperature= 0)
#utilize langchain agent toolkit: SQL toolkit
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
tools = toolkit.get_tools()
llm_with_tools = llm.bind_tools(tools)


#create agent
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_tool_messages(
            x["intermediate_steps"]
        ),
        "table_names_to_use": lambda x: x["table_names_to_use"],
    }
    | prompt
    | llm_with_tools
    | OpenAIToolsAgentOutputParser()
)
agent_executor = AgentExecutor(agent = agent, tools = tools, verbose = False, return_intermediate_steps = True)

### Evaluate of text2sql chain with query examples and table descriptions chain (GPT4 turbo)

In [16]:
questions = test_set['question']
ground_truths = test_set['answer']

answers = []
contexts = []

# Inference
for query in questions:
    response = agent_executor.invoke({
        "input": query,
        #passing result of table chain to text2sql agent
        "table_names_to_use": table_chain.invoke(query),
    })
    answers.append(response['output'])
    contexts.append(response['intermediate_steps'])
    
# To dict
data = {
    "question": questions,
    "answer": answers,
    "ground_truths": ground_truths
}

dataset = pd.DataFrame.from_dict(data)
display(dataset)

Unnamed: 0,question,answer,ground_truths
0,Which player scored the most points in a single game between 2018 and 2024?,"The player who scored the most points in a single game between 2018 and 2024 is A'ja Wilson, with 53 points in a game.",A'ja Wilson & Liz Cambage
1,Which team had the highest average points per game in the 2021 season?,"The team with the highest average points per game in the 2021 WNBA season was the Las Vegas Aces, averaging 89.25 points per game.",Las Vegas Aces
2,How many games ended in overtime during the 2023 season?,"During the 2023 WNBA season, 11 games ended in overtime.",11
3,Which player had the highest free throw percentage in the 2020 season (min. 25 FT’s attempted)?,"In the 2020 WNBA season, Tiffany Mitchell had the highest free throw percentage among players who attempted at least 25 free throws. She made 77 out of 81 free throws, achieving a free throw percentage of approximately 95.06%.",Tiffany Mitchell
4,What was the win-loss record of the Seattle Storm in the 2022 season?,"In the 2022 season, the Seattle Storm had the following win-loss records based on the type of games:\n\n- Regular Season (STD): 22 wins and 14 losses\n- Playoffs Round of 16 (RD16): 2 wins\n- Playoffs Semifinals (SEMI): 1 win and 3 losses\n\nOverall, they had a successful regular season and participated in various stages of the playoffs.",25 - 17
5,Which team had the longest winning streak in the 2019 season (inclusive of postseason)?,Agent stopped due to max iterations.,Washington Mystics (8 wins)
6,"How many triple-doubles were recorded in the WNBA between 2018 and 2024, and who had the most triple-doubles?","Between 2018 and 2024, there were a total of 29 triple-doubles recorded in the WNBA. The player with the most triple-doubles during this period was Alyssa Thomas, who achieved 12 triple-doubles.","29 total triple doubles, and Alyssa Thomas had the most (with 12)"
7,Which team had the best defense in the 2024 season (by opponent points)?,"The team with the best defense in the 2024 season, based on the lowest average opponent points per game, was the Sun, with an average of 71.08 points allowed per game.",Connecticut Sun
8,Identify the game with the highest combined score (both teams) from 2018 to 2024.,"The game with the highest combined score between 2018 and 2024 was game ID 401558893, featuring Team Stewart and Team Wilson, with a total score of 270 points.","07/12/2022, Phoenix Mercury @ Minnesota Lynx"
9,Which player played the most minutes in the 2021 season?,"The player who played the most minutes in the 2021 WNBA season was Skylar Diggins-Smith, with a total of 1440 minutes.",Skylar Diggins-Smith
