# **Experiment 2: Inference and DB Code Execution Notebook**


This notebook is for the 2nd experiment in my final project, where I test Execution Accuracy (EA) via direct code execution of generated SQL vs. a toy SQLITE database.

In [1]:
%%capture
# using unsloth, which is faster and less memory than hugging face
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes
!pip install evaluate
!pip install tbparse


In [2]:
from datasets import load_dataset, Dataset, concatenate_datasets, list_metrics,load_metric
import pandas as pd
from unsloth import FastLanguageModel,is_bfloat16_supported
import torch
import tensorboard as tb
from tbparse import SummaryReader
import tbparse
import re
from transformers import AutoTokenizer, TrainingArguments, pipeline, TrainerCallback
from random import randint
import numpy as np
import evaluate
from trl import SFTTrainer
from tqdm import tqdm
import matplotlib.pyplot as plt
import glob
import os
import sqlite3
import csv
from tqdm import tqdm

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


In [3]:
from google.colab import files

In [4]:
# connect to gdrive to save and retrieve trained models locally
from google.colab import drive
drive_path="/content/gdrive"
drive.mount(drive_path)

Mounted at /content/gdrive


In [5]:
max_seq_length = 2048
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # 4-bit quantization

model_dir=f"{drive_path}/MyDrive/models"
checkpoint_dir=f"{drive_path}/MyDrive/checkpoints"
sqlite_dir=f"{drive_path}/MyDrive/sqlite"


## **Create SQLite toy database**

Create database via SQL statements below and sample data loaded from CSV files. The database is composed of tables related to stock market portfolios, with no additional context given to our Llama model other than create table statements

In [6]:
conn = sqlite3.connect('toy.db')
c = conn.cursor() #

# Create table - CLIENTS
# c.execute('''CREATE TABLE test_table
#              ([id] INTEGER PRIMARY KEY, [name] text, [email] text, [joining_date] date, [salary] integer)''')

c.execute('''DROP TABLE IF EXISTS portfolios''')
conn.commit()

c.execute('''DROP TABLE IF EXISTS portfolio_holdings''')
conn.commit()

c.execute('''DROP TABLE IF EXISTS stock_prices''')
conn.commit()

# Create table - PORTFOLIOS
c.execute('''CREATE TABLE IF NOT EXISTS portfolios (portfolio_id INT, portfolio VARCHAR(50), portfolio_owner VARCHAR(50),age INT)''')
conn.commit()

file = open(f"{sqlite_dir}/portfolios.csv",encoding='utf-8-sig')
contents = csv.reader(file)
insert_records = "INSERT INTO portfolios (portfolio_id,portfolio, portfolio_owner,age) VALUES(?, ?, ?,?)"
c.executemany(insert_records, contents)

# Create table - PORTFOLIO HOLDINGS
c.execute('''CREATE TABLE IF NOT EXISTS portfolio_holdings (portfolio_id INT, date DATE,ticker VARCHAR(10),shares INT)''')
conn.commit()

file = open(f"{sqlite_dir}/portfolio_holdings.csv",encoding='utf-8-sig')
contents = csv.reader(file)
insert_records = "INSERT INTO portfolio_holdings (portfolio_id,date,ticker,shares) VALUES(?, ?,?,?)"
c.executemany(insert_records, contents)

# Create table - STOCK PRICES
c.execute('''CREATE TABLE IF NOT EXISTS stock_prices (date DATE,ticker VARCHAR(10),price NUMERIC)''')
conn.commit()

file = open(f"{sqlite_dir}/prices_melted.csv",encoding='utf-8-sig')
contents = csv.reader(file)
insert_records = "INSERT INTO stock_prices (date,ticker,price) VALUES(?, ?,?)"
c.executemany(insert_records, contents)

conn.commit()
conn.close()



SQL context that will be added to every question for this experiment.

In [7]:
sql_context="""CREATE TABLE IF NOT EXISTS portfolios (portfolio_id INT, portfolio VARCHAR(50),portfolio_owner (VARCHAR(50),age INT));
CREATE TABLE portfolio_holdings (portfolio_id INT, date DATE,ticker VARCHAR(10),shares INT);
CREATE TABLE stock_prices (date DATE,ticker VARCHAR(10),price NUMERIC);"""

Get manually prepared Q&A for code execution

In [8]:
questions_df=pd.read_csv(f"{sqlite_dir}/sql_questions_for_code_execution.csv")

In [9]:
!ls

gdrive	sample_data  toy.db


Test database connection and query portfolios table (should return 5 records):

In [23]:
conn = sqlite3.connect("toy.db")

cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM portfolio_holdings JOIN portfolios ON portfolio_holdings.portfolio_id = portfolios.portfolio_id WHERE portfolios.portfolio_owner = 'portfolioB' AND date = '2024-08-01';")
rows = cur.fetchall()

for row in rows:
  print(row)

conn.close()

(0,)


Use the same system prompt format as Experiment 1:

In [11]:
system_message = """You are a text to SQL query translator. Use SQLite dialect only. Users will ask you questions in English and you will generate a SQL query based on the provided SCHEMA only.
SCHEMA:
{schema}"""

In [12]:
def create_conversation_qa(examples):
  return {
    "messages": [
      {"role": "system", "content": system_message.format(schema=sql_context)},
      {"role": "user", "content": examples["question"]},
      {"role": "assistant", "content": "SELECT * from dont matter"}
    ]
  }

In [13]:
qa_dataset = Dataset.from_pandas(questions_df)
qa_dataset=qa_dataset.map(create_conversation_qa, remove_columns=qa_dataset.features,batched = False,)

Map:   0%|          | 0/100 [00:00<?, ? examples/s]

Check memory:

## **Inference and SQL Query Execution**

In [14]:
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.564 GB.
0.0 GB of memory reserved.


Load our saved model (Lora adapter config file):

In [15]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = f"{model_dir}/lora_model", # model named saved locally during training
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

==((====))==  Unsloth 2024.8: Fast Llama patching. Transformers = 4.43.3.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.1+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.26.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/184 [00:00<?, ?B/s]

Unsloth 2024.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


Set up the model for inference:

In [16]:
FastLanguageModel.for_inference(model)

In [17]:
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'DbrxForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'JambaForCausalLM', 'JetMoeForCausalLM', 'LlamaForCausalLM', 'MambaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'MptForCausalLM'

Functions for retrieving SQL to execute, as well as executing SQL and returning result

In [18]:
def get_sql(sample):
    prompt = pipe.tokenizer.apply_chat_template(sample["messages"][:2], tokenize=False, add_generation_prompt=True)
    outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.01, top_k=50, top_p=0.95, eos_token_id=pipe.tokenizer.eos_token_id, pad_token_id=pipe.tokenizer.pad_token_id)
    question=sample['messages'][1]['content']
    predicted_answer = outputs[0]['generated_text'][len(prompt):].strip()

    return question,predicted_answer

In [19]:
def get_sqlite_answer(sql_string):

  conn = sqlite3.connect("toy.db")
  cur = conn.cursor()
  try:
    cur.execute(sql_string)
    print("hello")
    rows = cur.fetchall()
    answer=rows[0][0]
    conn.close()
    if answer is None:
      return 0
    else:
      return answer
  except sqlite3.OperationalError as e:
      # Handle specific SQL execution errors
      print(f"SQL execution error: {e}")
      return "error"
  except sqlite3.DatabaseError as e:
      # Handle general database errors
      print(f"Database error: {e}")
      return "error"
  except Exception as e:
      # Handle other exceptions
      print(f"An unexpected error occurred: {e}")
      return "error"
  finally:
    # Ensure the connection is closed
    if conn:
      conn.close()
  conn.close()


In [20]:
def convert_to_int_or_return(value):
    try:
        float_value = float(value)
        # Attempt to convert the value to an integer
        return int(float_value)
    except ValueError:
        # If a ValueError occurs, return the original value
        return value

Run the inference, collect results, and display accuracy for both the easy and medium/hard questions separately:

In [21]:
qa_easy_success_rate=[]
qa_med_hard_success_rate=[]
easy_error_count=0
med_hard_error_count=0

for s in tqdm(qa_dataset):
    orig_question,generated_sql=get_sql(s)
    print("original question was: ",orig_question)
    print("generated sql was: ",generated_sql)
    actual_answer=convert_to_int_or_return(questions_df.loc[questions_df['question']==orig_question]['answer'].values[0])
    difficulty=questions_df.loc[questions_df['question']==orig_question]['difficulty'].values[0]
    print("actual answer was: ",actual_answer)
    result=convert_to_int_or_return(get_sqlite_answer(generated_sql))
    print("sqlite answer was: ",result)
    if actual_answer==result:
      if difficulty=='easy':
        qa_easy_success_rate.append(1)
      else:
        qa_med_hard_success_rate.append(1)
    elif result=="error":
      if difficulty=='easy':
        qa_easy_success_rate.append(0)
        easy_error_count+=1
      else:
        qa_med_hard_success_rate.append(0)
        med_hard_error_count+=1
    else:
      if difficulty=='easy':
        qa_easy_success_rate.append(0)
      else:
        qa_med_hard_success_rate.append(0)
    if len(qa_easy_success_rate)>0:
      qa_easy_accuracy = sum(qa_easy_success_rate)/len(qa_easy_success_rate)
      print(f"Easy Q&A Accuracy: {qa_easy_accuracy*100:.2f}%")
      print("Easy SQL Execution Error Rate: ",easy_error_count/len(qa_easy_success_rate))
    if len(qa_med_hard_success_rate)>0:
      qa_med_hard_accuracy = sum(qa_med_hard_success_rate)/len(qa_med_hard_success_rate)
      print(f"Med/Hard Q&A Accuracy: {qa_med_hard_accuracy*100:.2f}%")
      print("Med/Hard SQL Execution Error Rate: ",med_hard_error_count/len(qa_med_hard_success_rate))

qa_easy_accuracy = sum(qa_easy_success_rate)/len(qa_easy_success_rate)
qa_med_hard_accuracy = sum(qa_med_hard_success_rate)/len(qa_med_hard_success_rate)

print(f"Easy Q&A Accuracy: {qa_easy_accuracy*100:.2f}%")
print(f"Easy SQL Execution Error Rate: {(easy_error_count/len(qa_easy_success_rate))*100:.2f}%")

print(f"Med/Hard Q&A Accuracy: {qa_med_hard_accuracy*100:.2f}%")
print(f"Med/Hard SQL Execution Error Rate: {(med_hard_error_count/len(qa_med_hard_success_rate))*100:.2f}%")


  1%|          | 1/100 [00:05<08:54,  5.40s/it]

original question was:  How many different portfolios are there?
generated sql was:  SELECT COUNT(DISTINCT portfolio) FROM portfolios;
actual answer was:  5
hello
sqlite answer was:  5
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0


  2%|▏         | 2/100 [00:06<04:35,  2.82s/it]

original question was:  What is the id of portfolioA?
generated sql was:  SELECT portfolio_id FROM portfolios WHERE portfolio = 'portfolioA';
actual answer was:  1
hello
sqlite answer was:  1
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0


  3%|▎         | 3/100 [00:07<03:08,  1.95s/it]

original question was:  What is the id of portfolioB?
generated sql was:  SELECT portfolio_id FROM portfolios WHERE portfolio = 'portfolioB';
actual answer was:  2
hello
sqlite answer was:  2
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0


  4%|▍         | 4/100 [00:08<02:27,  1.54s/it]

original question was:  What is the id of portfolioC?
generated sql was:  SELECT portfolio_id FROM portfolios WHERE portfolio = 'portfolioC';
actual answer was:  3
hello
sqlite answer was:  3
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0


  5%|▌         | 5/100 [00:09<02:06,  1.33s/it]

original question was:  What is the id of portfolioD?
generated sql was:  SELECT portfolio_id FROM portfolios WHERE portfolio = 'portfolioD';
actual answer was:  4
hello
sqlite answer was:  4
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0


  6%|▌         | 6/100 [00:10<01:51,  1.19s/it]

original question was:  What is the id of portfolioE?
generated sql was:  SELECT portfolio_id FROM portfolios WHERE portfolio = 'portfolioE';
actual answer was:  5
hello
sqlite answer was:  5
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0


  7%|▋         | 7/100 [00:11<01:41,  1.10s/it]

original question was:  how many unique tickers are there in the holdings?
generated sql was:  SELECT COUNT(DISTINCT ticker) FROM portfolio_holdings;
actual answer was:  8
hello
sqlite answer was:  8
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0


  8%|▊         | 8/100 [00:12<01:39,  1.08s/it]

original question was:  what is the count of holdings in AAPL historically?
generated sql was:  SELECT COUNT(*) FROM portfolio_holdings WHERE ticker = 'AAPL';
actual answer was:  2
hello
sqlite answer was:  2
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0


  9%|▉         | 9/100 [00:12<01:31,  1.01s/it]

original question was:  how many different portfolio owners are there?
generated sql was:  SELECT COUNT(DISTINCT portfolio_owner) FROM portfolios;
actual answer was:  5
hello
sqlite answer was:  5
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0


 10%|█         | 10/100 [00:17<02:58,  1.98s/it]You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


original question was:  what stock did portfolioB  have 200 shares of on Aug 1, 2024?
generated sql was:  SELECT ph.ticker FROM portfolio_holdings ph JOIN portfolios p ON ph.portfolio_id = p.portfolio_id JOIN stock_prices sp ON ph.ticker = sp.ticker WHERE p.portfolio_owner = 'portfolioB' AND ph.shares = 200 AND ph.date = '2024-08-01';
actual answer was:  BAC
hello
An unexpected error occurred: list index out of range
sqlite answer was:  error
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0
Med/Hard Q&A Accuracy: 0.00%
Med/Hard SQL Execution Error Rate:  1.0


 11%|█         | 11/100 [00:20<03:28,  2.34s/it]

original question was:  on how many different days was AAPL held?
generated sql was:  SELECT COUNT(DISTINCT ph.date) FROM portfolio_holdings ph JOIN portfolios p ON ph.portfolio_id = p.portfolio_id JOIN stock_prices sp ON ph.ticker = sp.ticker WHERE ph.ticker = 'AAPL';
actual answer was:  2
hello
sqlite answer was:  2
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0
Med/Hard Q&A Accuracy: 50.00%
Med/Hard SQL Execution Error Rate:  0.5


 12%|█▏        | 12/100 [00:24<04:08,  2.83s/it]

original question was:  How many stocks did portfolioA hold on August 1, 2024?
generated sql was:  SELECT COUNT(*) FROM portfolio_holdings JOIN portfolios ON portfolio_holdings.portfolio_id = portfolios.portfolio_id JOIN stock_prices ON portfolio_holdings.ticker = stock_prices.ticker WHERE portfolios.portfolio_owner = 'portfolioA' AND portfolio_holdings.date = '2024-08-01';
actual answer was:  2
hello
sqlite answer was:  0
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.3333333333333333


 13%|█▎        | 13/100 [00:28<04:55,  3.40s/it]

original question was:  How many shares of BAC did portfolioB hold on August 1, 2024?
generated sql was:  SELECT SUM(php.shares * sp.price) FROM portfolios p JOIN portfolio_holdings php ON p.portfolio_id = php.portfolio_id JOIN stock_prices sp ON php.ticker = sp.ticker WHERE p.portfolio_owner = 'portfolioB' AND php.date = '2024-08-01' AND php.ticker = 'BAC';
actual answer was:  200
hello
sqlite answer was:  0
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0
Med/Hard Q&A Accuracy: 25.00%
Med/Hard SQL Execution Error Rate:  0.25


 14%|█▍        | 14/100 [00:32<05:02,  3.51s/it]

original question was:  what is the portfolio name of the portfolio that held TSLA on August 1, 2024?
generated sql was:  SELECT p.portfolio FROM portfolios p JOIN portfolio_holdings ph ON p.portfolio_id = ph.portfolio_id JOIN stock_prices sp ON ph.ticker = sp.ticker WHERE ph.date = '2024-08-01' AND sp.ticker = 'TSLA';
actual answer was:  portfolioC
hello
An unexpected error occurred: list index out of range
sqlite answer was:  error
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0
Med/Hard Q&A Accuracy: 20.00%
Med/Hard SQL Execution Error Rate:  0.4


 15%|█▌        | 15/100 [00:37<05:24,  3.82s/it]

original question was:  How many shares of MSFT did Xiao hold on Jan 1, 2024?
generated sql was:  SELECT SUM(php.shares) FROM portfolios p JOIN portfolio_holdings php ON p.portfolio_id = php.portfolio_id JOIN stock_prices sp ON php.ticker = sp.ticker WHERE p.portfolio_owner = 'Xiao' AND php.date = '2024-01-01' AND sp.ticker = 'MSFT';
actual answer was:  123
hello
sqlite answer was:  0
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0
Med/Hard Q&A Accuracy: 16.67%
Med/Hard SQL Execution Error Rate:  0.3333333333333333


 16%|█▌        | 16/100 [00:41<05:42,  4.08s/it]

original question was:  How many shares of XOM did Chun have on Jan 1, 2024?
generated sql was:  SELECT SUM(php.shares * sp.price) FROM portfolios p JOIN portfolio_holdings php ON p.portfolio_id = php.portfolio_id JOIN stock_prices sp ON php.ticker = sp.ticker WHERE p.portfolio_owner = 'Chun' AND php.date = '2024-01-01' AND php.ticker = 'XOM';
actual answer was:  222
hello
sqlite answer was:  0
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0
Med/Hard Q&A Accuracy: 14.29%
Med/Hard SQL Execution Error Rate:  0.2857142857142857


 17%|█▋        | 17/100 [00:43<04:37,  3.34s/it]

original question was:  What was the price of AAPL on Jan 2, 2024
generated sql was:  SELECT price FROM stock_prices WHERE date = '2024-01-02' AND ticker = 'AAPL';
actual answer was:  185
hello
sqlite answer was:  185
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0
Med/Hard Q&A Accuracy: 14.29%
Med/Hard SQL Execution Error Rate:  0.2857142857142857


 18%|█▊        | 18/100 [00:45<03:52,  2.83s/it]

original question was:  what was the price of GOOG on Jan 31, 2024?
generated sql was:  SELECT price FROM stock_prices WHERE date = '2024-01-31' AND ticker = 'GOOG';
actual answer was:  141
hello
sqlite answer was:  141
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0
Med/Hard Q&A Accuracy: 14.29%
Med/Hard SQL Execution Error Rate:  0.2857142857142857


 19%|█▉        | 19/100 [00:46<03:19,  2.47s/it]

original question was:  what was the price of MSFT on May 3, 2024
generated sql was:  SELECT price FROM stock_prices WHERE date = '2024-05-03' AND ticker = 'MSFT';
actual answer was:  405
hello
sqlite answer was:  405
Easy Q&A Accuracy: 100.00%
Easy SQL Execution Error Rate:  0.0
Med/Hard Q&A Accuracy: 14.29%
Med/Hard SQL Execution Error Rate:  0.2857142857142857


 20%|██        | 20/100 [00:48<02:57,  2.22s/it]

original question was:  what was the price of MSFT on April 3rd
generated sql was:  SELECT price FROM stock_prices WHERE date = '2022-04-03' AND ticker = 'MSFT';
actual answer was:  419
hello
An unexpected error occurred: list index out of range
sqlite answer was:  error
Easy Q&A Accuracy: 92.31%
Easy SQL Execution Error Rate:  0.07692307692307693
Med/Hard Q&A Accuracy: 14.29%
Med/Hard SQL Execution Error Rate:  0.2857142857142857


 21%|██        | 21/100 [00:50<02:40,  2.04s/it]

original question was:  what was the stock price of microsoft on April 3rd
generated sql was:  SELECT price FROM stock_prices WHERE date = '2022-04-03' AND ticker = 'MSFT';
actual answer was:  419
hello
An unexpected error occurred: list index out of range
sqlite answer was:  error
Easy Q&A Accuracy: 85.71%
Easy SQL Execution Error Rate:  0.14285714285714285
Med/Hard Q&A Accuracy: 14.29%
Med/Hard SQL Execution Error Rate:  0.2857142857142857


 22%|██▏       | 22/100 [00:51<02:29,  1.92s/it]

original question was:  what was the price of Apple stock on jan 2, 2024
generated sql was:  SELECT price FROM stock_prices WHERE date = '2024-01-02' AND ticker = 'AAPL';
actual answer was:  185
hello
sqlite answer was:  185
Easy Q&A Accuracy: 86.67%
Easy SQL Execution Error Rate:  0.13333333333333333
Med/Hard Q&A Accuracy: 14.29%
Med/Hard SQL Execution Error Rate:  0.2857142857142857


 23%|██▎       | 23/100 [00:53<02:20,  1.82s/it]

original question was:  what was the price of google on jan 31, 2024
generated sql was:  SELECT price FROM stock_prices WHERE date = '2024-01-31' AND ticker = 'GOOG';
actual answer was:  141
hello
sqlite answer was:  141
Easy Q&A Accuracy: 87.50%
Easy SQL Execution Error Rate:  0.125
Med/Hard Q&A Accuracy: 14.29%
Med/Hard SQL Execution Error Rate:  0.2857142857142857


 24%|██▍       | 24/100 [01:01<04:48,  3.80s/it]

original question was:  was the stock price of AAPL higher or lower the day after jan 31, 2024?
generated sql was:  SELECT 'higher' as price_trend FROM stock_prices sp1 INNER JOIN stock_prices sp2 ON sp1.ticker = sp2.ticker WHERE sp1.date = '2024-01-31' AND sp2.date = '2024-02-01' AND sp1.price < sp2.price UNION SELECT 'lower' as price_trend FROM stock_prices sp1 INNER JOIN stock_prices sp2 ON sp1.ticker = sp2.ticker WHERE sp1.date = '2024-01-31' AND sp2.date = '2024-02-01' AND sp1.price > sp2.price;
actual answer was:  higher
hello
sqlite answer was:  higher
Easy Q&A Accuracy: 87.50%
Easy SQL Execution Error Rate:  0.125
Med/Hard Q&A Accuracy: 25.00%
Med/Hard SQL Execution Error Rate:  0.25


 25%|██▌       | 25/100 [01:02<03:42,  2.97s/it]

original question was:  How many portfolio owners are named Chun Li in this firm?
generated sql was:  SELECT COUNT(*) FROM portfolios WHERE portfolio_owner = 'Chun Li';
actual answer was:  1
hello
sqlite answer was:  1
Easy Q&A Accuracy: 88.24%
Easy SQL Execution Error Rate:  0.11764705882352941
Med/Hard Q&A Accuracy: 25.00%
Med/Hard SQL Execution Error Rate:  0.25


 26%|██▌       | 26/100 [01:03<02:56,  2.39s/it]

original question was:  How many people are named Chun Li in this firm?
generated sql was:  SELECT COUNT(*) FROM portfolios WHERE portfolio_owner = 'Chun Li';
actual answer was:  1
hello
sqlite answer was:  1
Easy Q&A Accuracy: 88.24%
Easy SQL Execution Error Rate:  0.11764705882352941
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 27%|██▋       | 27/100 [01:04<02:23,  1.96s/it]

original question was:  How many owner's names start with the letter C?
generated sql was:  SELECT COUNT(*) FROM portfolios WHERE portfolio_owner LIKE 'C%';
actual answer was:  1
hello
sqlite answer was:  1
Easy Q&A Accuracy: 88.89%
Easy SQL Execution Error Rate:  0.1111111111111111
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 28%|██▊       | 28/100 [01:05<02:00,  1.67s/it]

original question was:  How many owner's names start with the letter J?
generated sql was:  SELECT COUNT(*) FROM portfolios WHERE portfolio_owner LIKE 'J%';
actual answer was:  3
hello
sqlite answer was:  3
Easy Q&A Accuracy: 89.47%
Easy SQL Execution Error Rate:  0.10526315789473684
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 29%|██▉       | 29/100 [01:06<01:42,  1.44s/it]

original question was:  who is the owner of portfolioC?
generated sql was:  SELECT portfolio_owner FROM portfolios WHERE portfolio = 'portfolioC';
actual answer was:  Jen Stevens
hello
sqlite answer was:  Jen Stevens
Easy Q&A Accuracy: 90.00%
Easy SQL Execution Error Rate:  0.1
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 30%|███       | 30/100 [01:07<01:30,  1.29s/it]

original question was:  who is the manager of portfolioC?
generated sql was:  SELECT portfolio_owner FROM portfolios WHERE portfolio = 'portfolioC';
actual answer was:  Jen Stevens
hello
sqlite answer was:  Jen Stevens
Easy Q&A Accuracy: 90.48%
Easy SQL Execution Error Rate:  0.09523809523809523
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 31%|███       | 31/100 [01:08<01:20,  1.17s/it]

original question was:  who is the owner of portfolioB?
generated sql was:  SELECT portfolio_owner FROM portfolios WHERE portfolio = 'portfolioB';
actual answer was:  Jack Brody
hello
sqlite answer was:  Jack Brody
Easy Q&A Accuracy: 90.91%
Easy SQL Execution Error Rate:  0.09090909090909091
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 32%|███▏      | 32/100 [01:09<01:14,  1.10s/it]

original question was:  who is the manager of portfolioB?
generated sql was:  SELECT portfolio_owner FROM portfolios WHERE portfolio = 'portfolioB';
actual answer was:  Jack Brody
hello
sqlite answer was:  Jack Brody
Easy Q&A Accuracy: 91.30%
Easy SQL Execution Error Rate:  0.08695652173913043
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 33%|███▎      | 33/100 [01:10<01:09,  1.04s/it]

original question was:  who owns portfolio C?
generated sql was:  SELECT portfolio_owner FROM portfolios WHERE portfolio = 'C';
actual answer was:  Jen Stevens
hello
An unexpected error occurred: list index out of range
sqlite answer was:  error
Easy Q&A Accuracy: 87.50%
Easy SQL Execution Error Rate:  0.125
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 34%|███▍      | 34/100 [01:11<01:04,  1.02it/s]

original question was:  who manages portfolio C?
generated sql was:  SELECT portfolio_owner FROM portfolios WHERE portfolio = 'C';
actual answer was:  Jen Stevens
hello
An unexpected error occurred: list index out of range
sqlite answer was:  error
Easy Q&A Accuracy: 84.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 35%|███▌      | 35/100 [01:12<01:20,  1.24s/it]

original question was:  does anyone with the partial name 'Doe' own a portfolio?
generated sql was:  SELECT 'Yes' FROM portfolios WHERE portfolio_owner LIKE '%Doe%' GROUP BY (1) HAVING COUNT(*) > 0;
actual answer was:  John Doe
hello
sqlite answer was:  Yes
Easy Q&A Accuracy: 80.77%
Easy SQL Execution Error Rate:  0.15384615384615385
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 36%|███▌      | 36/100 [01:13<01:11,  1.12s/it]

original question was:  what is the age of portfolioC?
generated sql was:  SELECT age FROM portfolios WHERE portfolio = 'portfolioC';
actual answer was:  38
hello
sqlite answer was:  38
Easy Q&A Accuracy: 81.48%
Easy SQL Execution Error Rate:  0.14814814814814814
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 37%|███▋      | 37/100 [01:14<01:07,  1.08s/it]

original question was:  what is the age of Jen Stevens?
generated sql was:  SELECT age FROM portfolios WHERE portfolio_owner = 'Jen Stevens';
actual answer was:  38
hello
sqlite answer was:  38
Easy Q&A Accuracy: 82.14%
Easy SQL Execution Error Rate:  0.14285714285714285
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 38%|███▊      | 38/100 [01:15<01:02,  1.01s/it]

original question was:  how old is portfolioB?
generated sql was:  SELECT age FROM portfolios WHERE portfolio = 'portfolioB';
actual answer was:  43
hello
sqlite answer was:  43
Easy Q&A Accuracy: 82.76%
Easy SQL Execution Error Rate:  0.13793103448275862
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 39%|███▉      | 39/100 [01:16<01:00,  1.01it/s]

original question was:  how old is Chun Li?
generated sql was:  SELECT age FROM portfolios WHERE portfolio_owner = 'Chun Li';
actual answer was:  59
hello
sqlite answer was:  59
Easy Q&A Accuracy: 83.33%
Easy SQL Execution Error Rate:  0.13333333333333333
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 40%|████      | 40/100 [01:17<00:52,  1.15it/s]

original question was:  what is the sum of ages of all portfolios?
generated sql was:  SELECT SUM(age) FROM portfolios;
actual answer was:  210
hello
sqlite answer was:  210
Easy Q&A Accuracy: 83.87%
Easy SQL Execution Error Rate:  0.12903225806451613
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 41%|████      | 41/100 [01:18<00:58,  1.02it/s]

original question was:  what is the minimum portfolio age?
generated sql was:  SELECT MIN(DATEDIFF('now',portfolios.date_of_birth)) FROM portfolios;
actual answer was:  25
SQL execution error: no such function: DATEDIFF
sqlite answer was:  error
Easy Q&A Accuracy: 81.25%
Easy SQL Execution Error Rate:  0.15625
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 42%|████▏     | 42/100 [01:19<01:01,  1.06s/it]

original question was:  what is the maximum portfolio age?
generated sql was:  SELECT MAX(DATEDIFF('now',portfolios.date_of_birth)) FROM portfolios;
actual answer was:  59
SQL execution error: no such function: DATEDIFF
sqlite answer was:  error
Easy Q&A Accuracy: 78.79%
Easy SQL Execution Error Rate:  0.18181818181818182
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 43%|████▎     | 43/100 [01:22<01:35,  1.68s/it]

original question was:  what is the second highest portfolio age?
generated sql was:  SELECT MAX(age) FROM (SELECT YEAR(CURRENT_DATE) - YEAR(portfolio_owner.birthday) AS age FROM portfolios WHERE portfolio = 'Growth' GROUP BY age ORDER BY age DESC LIMIT 2) AS subquery;
actual answer was:  45
SQL execution error: no such column: portfolio_owner.birthday
sqlite answer was:  error
Easy Q&A Accuracy: 78.79%
Easy SQL Execution Error Rate:  0.18181818181818182
Med/Hard Q&A Accuracy: 30.00%
Med/Hard SQL Execution Error Rate:  0.3


 44%|████▍     | 44/100 [01:24<01:36,  1.71s/it]

original question was:  what is the second lowest portfolio age?
generated sql was:  SELECT MIN(age) FROM (SELECT age FROM portfolios WHERE age > (SELECT MIN(age) FROM portfolios) ) AS subquery;
actual answer was:  38
hello
sqlite answer was:  38
Easy Q&A Accuracy: 78.79%
Easy SQL Execution Error Rate:  0.18181818181818182
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 45%|████▌     | 45/100 [01:25<01:26,  1.57s/it]

original question was:  what is the average portfolio age?
generated sql was:  SELECT AVG(DATEDIFF('now',portfolios.date_of_birth)) FROM portfolios;
actual answer was:  42
SQL execution error: no such function: DATEDIFF
sqlite answer was:  error
Easy Q&A Accuracy: 76.47%
Easy SQL Execution Error Rate:  0.20588235294117646
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 46%|████▌     | 46/100 [01:26<01:16,  1.41s/it]

original question was:  Does anyone with the partial name 'Chun' run a portfolio?
generated sql was:  SELECT portfolio_owner FROM portfolios WHERE portfolio_owner LIKE '%Chun%';
actual answer was:  Chun Li
hello
sqlite answer was:  Chun Li
Easy Q&A Accuracy: 77.14%
Easy SQL Execution Error Rate:  0.2
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 47%|████▋     | 47/100 [01:27<01:07,  1.28s/it]

original question was:  Does anyone with the partial name Steve run a portfolio?
generated sql was:  SELECT portfolio_owner FROM portfolios WHERE portfolio_owner LIKE '%Steve%';
actual answer was:  Jen Stevens
hello
sqlite answer was:  Jen Stevens
Easy Q&A Accuracy: 77.78%
Easy SQL Execution Error Rate:  0.19444444444444445
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 48%|████▊     | 48/100 [01:29<01:11,  1.38s/it]

original question was:  What is the price of AAPL on Jan 30 2024
generated sql was:  SELECT price FROM stock_prices WHERE date = '2024-01-30' AND ticker = 'AAPL';
actual answer was:  187
hello
sqlite answer was:  187
Easy Q&A Accuracy: 78.38%
Easy SQL Execution Error Rate:  0.1891891891891892
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 49%|████▉     | 49/100 [01:30<01:14,  1.45s/it]

original question was:  What was AAPL's price on the last day of April in 2024
generated sql was:  SELECT price FROM stock_prices WHERE date = '2024-04-30' AND ticker = 'AAPL';
actual answer was:  170
hello
sqlite answer was:  170
Easy Q&A Accuracy: 78.95%
Easy SQL Execution Error Rate:  0.18421052631578946
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 50%|█████     | 50/100 [01:32<01:16,  1.52s/it]

original question was:  What was AAPL's price on the first day of May 2024
generated sql was:  SELECT price FROM stock_prices WHERE date = '2024-05-01' AND ticker = 'AAPL';
actual answer was:  169
hello
sqlite answer was:  169
Easy Q&A Accuracy: 79.49%
Easy SQL Execution Error Rate:  0.1794871794871795
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 51%|█████     | 51/100 [01:34<01:15,  1.55s/it]

original question was:  What was GOOG's price on May 1, 2024
generated sql was:  SELECT price FROM stock_prices WHERE ticker = 'GOOG' AND date = '2024-05-01';
actual answer was:  165
hello
sqlite answer was:  165
Easy Q&A Accuracy: 80.00%
Easy SQL Execution Error Rate:  0.175
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 52%|█████▏    | 52/100 [01:38<01:55,  2.41s/it]

original question was:  What was MSFT's price on Jan 29, 2024 minus its price on Jan 26, 2024?
generated sql was:  SELECT (SP1.price - SP0.price) FROM stock_prices SP0 INNER JOIN stock_prices SP1 ON SP0.ticker = SP1.ticker WHERE SP0.date = '2024-01-26' AND SP1.date = '2024-01-29' AND SP0.ticker = 'MSFT';
actual answer was:  5
hello
sqlite answer was:  5
Easy Q&A Accuracy: 80.49%
Easy SQL Execution Error Rate:  0.17073170731707318
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 53%|█████▎    | 53/100 [01:40<01:42,  2.17s/it]

original question was:  What was MSFT's price on Feb 1 2024
generated sql was:  SELECT price FROM stock_prices WHERE date = '2024-02-01' AND ticker = 'MSFT';
actual answer was:  402
hello
sqlite answer was:  402
Easy Q&A Accuracy: 80.95%
Easy SQL Execution Error Rate:  0.16666666666666666
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 54%|█████▍    | 54/100 [01:41<01:23,  1.81s/it]

original question was:  How many prices for MSFT are in the prices table?
generated sql was:  SELECT COUNT(*) FROM stock_prices WHERE ticker = 'MSFT';
actual answer was:  148
hello
sqlite answer was:  148
Easy Q&A Accuracy: 81.40%
Easy SQL Execution Error Rate:  0.16279069767441862
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 55%|█████▌    | 55/100 [01:42<01:10,  1.56s/it]

original question was:  How many prices for AAPL are in the database?
generated sql was:  SELECT COUNT(*) FROM stock_prices WHERE ticker = 'AAPL';
actual answer was:  148
hello
sqlite answer was:  148
Easy Q&A Accuracy: 81.82%
Easy SQL Execution Error Rate:  0.1590909090909091
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 56%|█████▌    | 56/100 [01:42<00:55,  1.27s/it]

original question was:  What is the total count of prices in the prices table?
generated sql was:  SELECT COUNT(*) FROM stock_prices;
actual answer was:  445
hello
sqlite answer was:  445
Easy Q&A Accuracy: 82.22%
Easy SQL Execution Error Rate:  0.15555555555555556
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 57%|█████▋    | 57/100 [01:43<00:49,  1.14s/it]

original question was:  How many unique days are there in the prices table?
generated sql was:  SELECT COUNT(DISTINCT date) FROM stock_prices;
actual answer was:  148
hello
sqlite answer was:  149
Easy Q&A Accuracy: 80.43%
Easy SQL Execution Error Rate:  0.15217391304347827
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 58%|█████▊    | 58/100 [01:45<00:59,  1.42s/it]

original question was:  What is the price of MSFT on the most recent day?
generated sql was:  SELECT price FROM stock_prices WHERE ticker = 'MSFT' AND date = (SELECT MAX(date) FROM stock_prices WHERE ticker = 'MSFT');
actual answer was:  408
hello
sqlite answer was:  408
Easy Q&A Accuracy: 80.85%
Easy SQL Execution Error Rate:  0.14893617021276595
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 59%|█████▉    | 59/100 [01:46<00:48,  1.19s/it]

original question was:  What is the most recent date in the prices table?
generated sql was:  SELECT MAX(date) FROM stock_prices;
actual answer was:  8/2/2024
hello
sqlite answer was:  date
Easy Q&A Accuracy: 79.17%
Easy SQL Execution Error Rate:  0.14583333333333334
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 60%|██████    | 60/100 [01:47<00:43,  1.09s/it]

original question was:  How many unique tickers are there in the prices table?
generated sql was:  SELECT COUNT(DISTINCT ticker) FROM stock_prices;
actual answer was:  3
hello
sqlite answer was:  4
Easy Q&A Accuracy: 77.55%
Easy SQL Execution Error Rate:  0.14285714285714285
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 61%|██████    | 61/100 [01:50<01:03,  1.63s/it]

original question was:  How many duplicate tickers are there in the holdings table?
generated sql was:  SELECT COUNT(*) FROM (SELECT holdings.ticker FROM portfolios JOIN portfolio_holdings ON portfolios.portfolio_id = portfolio_holdings.portfolio_id GROUP BY holdings.ticker HAVING COUNT(*) > 1) AS duplicates;
actual answer was:  2
SQL execution error: no such column: holdings.ticker
sqlite answer was:  error
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 36.36%
Med/Hard SQL Execution Error Rate:  0.2727272727272727


 62%|██████▏   | 62/100 [01:55<01:46,  2.80s/it]

original question was:  What is the market value of portfolioA's AAPL holding on Aug 1, 2024
generated sql was:  SELECT SUM(shares * price) FROM portfolio_holdings JOIN stock_prices ON portfolio_holdings.ticker = stock_prices.ticker WHERE portfolio_holdings.portfolio_id IN (SELECT portfolio_id FROM portfolios WHERE portfolio = 'portfolioA' AND portfolio_owner = 'John Doe' AND age = 35) AND stock_prices.date = '2024-08-01' AND portfolio_holdings.ticker = 'AAPL';
actual answer was:  21836
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.25


 63%|██████▎   | 63/100 [02:00<02:10,  3.54s/it]

original question was:  What is the market value of portfolioA's AAPL holding on Aug 2, 2024
generated sql was:  SELECT SUM(shares * price) FROM portfolio_holdings JOIN stock_prices ON portfolio_holdings.ticker = stock_prices.ticker WHERE portfolio_holdings.portfolio_id IN (SELECT portfolio_id FROM portfolios WHERE portfolio = 'portfolioA' AND portfolio_owner = 'John Doe' AND age = 35) AND stock_prices.date = '2024-08-02' AND ticker = 'AAPL';
actual answer was:  43972
SQL execution error: ambiguous column name: ticker
sqlite answer was:  error
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 30.77%
Med/Hard SQL Execution Error Rate:  0.3076923076923077


 64%|██████▍   | 64/100 [02:04<02:11,  3.65s/it]

original question was:  Which portfolio held the most positions on Aug 2, 2024
generated sql was:  SELECT portfolio, MAX( COUNT(DISTINCT ticker) ) FROM portfolio_holdings JOIN portfolios ON portfolio_holdings.portfolio_id = portfolios.portfolio_id JOIN stock_prices ON portfolio_holdings.ticker = stock_prices.ticker WHERE date = '2024-08-02' GROUP BY portfolio;
actual answer was:  portfolioA
SQL execution error: ambiguous column name: ticker
sqlite answer was:  error
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 28.57%
Med/Hard SQL Execution Error Rate:  0.35714285714285715


 65%|██████▌   | 65/100 [02:08<02:12,  3.78s/it]

original question was:  What is the sum of shares held by portfolioB on Aug 1, 2024
generated sql was:  SELECT SUM( holdings.shares ) FROM portfolios portfolio_owner JOIN portfolio_holdings holdings ON portfolio_owner.portfolio_id = holdings.portfolio_id JOIN stock_prices prices ON holdings.ticker = prices.ticker WHERE portfolio_owner.portfolio = 'portfolioB' AND holdings.date = '2024-08-01';
actual answer was:  0
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 33.33%
Med/Hard SQL Execution Error Rate:  0.3333333333333333


 66%|██████▌   | 66/100 [02:11<01:59,  3.51s/it]

original question was:  What is the count of positions held by portfolioB on Aug 1, 2024
generated sql was:  SELECT COUNT(*) FROM portfolio_holdings JOIN portfolios ON portfolio_holdings.portfolio_id = portfolios.portfolio_id WHERE portfolios.portfolio_owner = 'portfolioB' AND date = '2024-08-01';
actual answer was:  2
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 31.25%
Med/Hard SQL Execution Error Rate:  0.3125


 67%|██████▋   | 67/100 [02:16<02:11,  3.98s/it]

original question was:  Which holding was largest for portfolioB on Aug 1, 2024
generated sql was:  SELECT ph.ticker, ph.shares FROM portfolio_holdings ph INNER JOIN (SELECT portfolio_id, MAX(shares) AS max_shares FROM portfolio_holdings WHERE portfolio_owner = 'portfolioB' AND date = '2024-08-01' GROUP BY portfolio_id) m ON ph.portfolio_id = m.portfolio_id AND ph.shares = m.max_shares;
actual answer was:  BAC
SQL execution error: no such column: portfolio_owner
sqlite answer was:  error
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 29.41%
Med/Hard SQL Execution Error Rate:  0.35294117647058826


 68%|██████▊   | 68/100 [02:21<02:12,  4.15s/it]

original question was:  Which holding was smallest for portfolioB on Aug 1, 2024
generated sql was:  SELECT ph.ticker, ph.shares FROM portfolio_holdings ph JOIN portfolios p ON ph.portfolio_id = p.portfolio_id JOIN stock_prices sp ON ph.ticker = sp.ticker WHERE p.portfolio = 'portfolioB' AND ph.date = '2024-08-01' ORDER BY ph.shares LIMIT 1;
actual answer was:  JPM
hello
An unexpected error occurred: list index out of range
sqlite answer was:  error
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 27.78%
Med/Hard SQL Execution Error Rate:  0.3888888888888889


 69%|██████▉   | 69/100 [02:25<02:08,  4.14s/it]

original question was:  What is the sum of the negative positions across all holdings?
generated sql was:  SELECT SUM( holdings.shares * stock_prices.price ) AS negative_positions_sum FROM portfolios JOIN portfolio_holdings ON portfolios.portfolio_id = portfolio_holdings.portfolio_id JOIN stock_prices ON portfolio_holdings.ticker = stock_prices.ticker WHERE (holdings.shares * stock_prices.price) < 0;
actual answer was:  -800
SQL execution error: no such column: holdings.shares
sqlite answer was:  error
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 26.32%
Med/Hard SQL Execution Error Rate:  0.42105263157894735


 70%|███████   | 70/100 [02:30<02:09,  4.32s/it]

original question was:  What is the sum of all the positive positions for portfolio A as of the latest holdings date?
generated sql was:  SELECT SUM( holdings.shares * stock_prices.price ) FROM portfolios JOIN portfolio_holdings AS holdings ON portfolios.portfolio_id = holdings.portfolio_id JOIN stock_prices ON holdings.ticker = stock_prices.ticker WHERE portfolios.portfolio = 'portfolio A' AND holdings.date = ( SELECT MAX(date) FROM portfolio_holdings WHERE portfolio_id = portfolios.portfolio_id );
actual answer was:  500
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 25.00%
Med/Hard SQL Execution Error Rate:  0.4


 71%|███████   | 71/100 [02:31<01:39,  3.44s/it]

original question was:  What is the sum of prices for all stocks on the first day in the prices table?
generated sql was:  SELECT SUM(price) FROM stock_prices WHERE date = (SELECT MIN(date) FROM stock_prices);
actual answer was:  694
hello
sqlite answer was:  694
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 28.57%
Med/Hard SQL Execution Error Rate:  0.38095238095238093


 72%|███████▏  | 72/100 [02:35<01:35,  3.41s/it]

original question was:  What is the sum of AAPL and GOOG on the first day in the prices table?
generated sql was:  SELECT SUM(sp.price * ph.shares) FROM stock_prices sp JOIN portfolio_holdings ph ON sp.date = ph.date AND sp.ticker = ph.ticker WHERE sp.ticker IN ('AAPL', 'GOOG') LIMIT 1;
actual answer was:  324
hello
sqlite answer was:  99083
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 27.27%
Med/Hard SQL Execution Error Rate:  0.36363636363636365


 73%|███████▎  | 73/100 [02:36<01:17,  2.89s/it]

original question was:  Which stock had the highest price on Jan 2, 2024?
generated sql was:  SELECT ticker, MAX(price) FROM stock_prices WHERE date = '2024-01-02' GROUP BY ticker;
actual answer was:  MSFT
hello
sqlite answer was:  AAPL
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 26.09%
Med/Hard SQL Execution Error Rate:  0.34782608695652173


 74%|███████▍  | 74/100 [02:38<01:06,  2.54s/it]

original question was:  Which stock had the lowest price on the first day in the database?
generated sql was:  SELECT ticker, MIN(price) FROM stock_prices WHERE date = (SELECT MIN(date) FROM stock_prices) GROUP BY ticker;
actual answer was:  GOOG
hello
sqlite answer was:  AAPL
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 25.00%
Med/Hard SQL Execution Error Rate:  0.3333333333333333


 75%|███████▌  | 75/100 [02:40<01:02,  2.49s/it]

original question was:  What is the max stock price of all stocks for January 2024
generated sql was:  SELECT MAX(sp.price) FROM stock_prices sp INNER JOIN portfolio_holdings ph ON sp.ticker = ph.ticker WHERE sp.date = '2024-01-01';
actual answer was:  408
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 24.00%
Med/Hard SQL Execution Error Rate:  0.32


 76%|███████▌  | 76/100 [02:42<00:51,  2.13s/it]

original question was:  What is the minimum stock price for January 2024?
generated sql was:  SELECT MIN(price) FROM stock_prices WHERE date = '2024-01-01';
actual answer was:  137
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 23.08%
Med/Hard SQL Execution Error Rate:  0.3076923076923077


 77%|███████▋  | 77/100 [02:44<00:50,  2.21s/it]

original question was:  What is the max stock price of all stocks for February 2024
generated sql was:  SELECT MAX(sp.price) FROM stock_prices sp INNER JOIN portfolio_holdings ph ON sp.ticker = ph.ticker WHERE sp.date = '2024-02-01';
actual answer was:  419
hello
sqlite answer was:  402
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 22.22%
Med/Hard SQL Execution Error Rate:  0.2962962962962963


 78%|███████▊  | 78/100 [02:46<00:49,  2.26s/it]

original question was:  What is the min stock price of all stocks for February 2024
generated sql was:  SELECT MIN(sp.price) FROM stock_prices sp INNER JOIN portfolio_holdings ph ON sp.ticker = ph.ticker WHERE sp.date = '2024-02-01';
actual answer was:  137
hello
sqlite answer was:  142
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 21.43%
Med/Hard SQL Execution Error Rate:  0.2857142857142857


 79%|███████▉  | 79/100 [02:49<00:48,  2.29s/it]

original question was:  What is the max stock price of all stocks for March 2024
generated sql was:  SELECT MAX(sp.price) FROM stock_prices sp INNER JOIN portfolio_holdings ph ON sp.ticker = ph.ticker WHERE sp.date = '2024-03-01';
actual answer was:  428
hello
sqlite answer was:  414
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 20.69%
Med/Hard SQL Execution Error Rate:  0.27586206896551724


 80%|████████  | 80/100 [02:51<00:45,  2.29s/it]

original question was:  What is the min stock price of all stocks for March 2024
generated sql was:  SELECT MIN(sp.price) FROM stock_prices sp JOIN portfolio_holdings ph ON sp.ticker = ph.ticker WHERE sp.date = '2024-03-01';
actual answer was:  132
hello
sqlite answer was:  137
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 20.00%
Med/Hard SQL Execution Error Rate:  0.26666666666666666


 81%|████████  | 81/100 [02:53<00:44,  2.32s/it]

original question was:  What is the max stock price of all stocks for April 2024
generated sql was:  SELECT MAX(sp.price) FROM stock_prices sp INNER JOIN portfolio_holdings ph ON sp.ticker = ph.ticker WHERE sp.date = '2024-04-01';
actual answer was:  427
hello
sqlite answer was:  423
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 19.35%
Med/Hard SQL Execution Error Rate:  0.25806451612903225


 82%|████████▏ | 82/100 [02:56<00:42,  2.34s/it]

original question was:  What is the min stock price of all stocks for April 2024
generated sql was:  SELECT MIN(sp.price) FROM stock_prices sp INNER JOIN portfolio_holdings ph ON sp.ticker = ph.ticker WHERE sp.date = '2024-04-01';
actual answer was:  151
hello
sqlite answer was:  156
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 18.75%
Med/Hard SQL Execution Error Rate:  0.25


 83%|████████▎ | 83/100 [02:58<00:40,  2.36s/it]

original question was:  What is the max stock price of all stocks for May 2024
generated sql was:  SELECT MAX(sp.price) FROM stock_prices sp INNER JOIN portfolio_holdings ph ON sp.ticker = ph.ticker WHERE sp.date = '2024-05-01';
actual answer was:  430
hello
sqlite answer was:  394
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 18.18%
Med/Hard SQL Execution Error Rate:  0.24242424242424243


 84%|████████▍ | 84/100 [03:01<00:37,  2.35s/it]

original question was:  What is the min stock price of all stocks for May 2024
generated sql was:  SELECT MIN(sp.price) FROM stock_prices sp JOIN portfolio_holdings ph ON sp.ticker = ph.ticker WHERE sp.date = '2024-05-01';
actual answer was:  165
hello
sqlite answer was:  165
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 20.59%
Med/Hard SQL Execution Error Rate:  0.23529411764705882


 85%|████████▌ | 85/100 [03:03<00:35,  2.35s/it]

original question was:  What is the max stock price of all stocks for June 2024
generated sql was:  SELECT MAX(sp.price) FROM stock_prices sp INNER JOIN portfolio_holdings ph ON sp.ticker = ph.ticker WHERE sp.date = '2024-06-01';
actual answer was:  452
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 20.00%
Med/Hard SQL Execution Error Rate:  0.22857142857142856


 86%|████████▌ | 86/100 [03:05<00:32,  2.34s/it]

original question was:  What is the min stock price of all stocks for June 2024
generated sql was:  SELECT MIN(sp.price) FROM stock_prices sp JOIN portfolio_holdings ph ON sp.ticker = ph.ticker WHERE sp.date = '2024-06-01';
actual answer was:  174
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 19.44%
Med/Hard SQL Execution Error Rate:  0.2222222222222222


 87%|████████▋ | 87/100 [03:08<00:30,  2.35s/it]

original question was:  What is the max stock price of all stocks for July 2024
generated sql was:  SELECT MAX(sp.price) FROM stock_prices sp INNER JOIN portfolio_holdings ph ON sp.ticker = ph.ticker WHERE sp.date = '2024-07-01';
actual answer was:  467
hello
sqlite answer was:  456
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 18.92%
Med/Hard SQL Execution Error Rate:  0.21621621621621623


 88%|████████▊ | 88/100 [03:10<00:28,  2.36s/it]

original question was:  What is the min stock price of all stocks for July 2024
generated sql was:  SELECT MIN(sp.price) FROM stock_prices sp INNER JOIN portfolio_holdings ph ON sp.ticker = ph.ticker WHERE sp.date = '2024-07-01';
actual answer was:  168
hello
sqlite answer was:  184
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 18.42%
Med/Hard SQL Execution Error Rate:  0.21052631578947367


 89%|████████▉ | 89/100 [03:11<00:23,  2.10s/it]

original question was:  What is the max stock price of all stocks for Aug 2024
generated sql was:  SELECT MAX(sp.price) FROM stock_prices sp WHERE sp.date = '2024-08-01';
actual answer was:  417
hello
sqlite answer was:  417
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 20.51%
Med/Hard SQL Execution Error Rate:  0.20512820512820512


 90%|█████████ | 90/100 [03:14<00:22,  2.29s/it]

original question was:  What is the min stock price of all stocks for Aug 2024
generated sql was:  SELECT MIN(sp.price) FROM stock_prices sp JOIN portfolio_holdings ph ON sp.date = ph.date AND sp.ticker = ph.ticker WHERE sp.date = '2024-08-01';
actual answer was:  168
hello
sqlite answer was:  172
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 20.00%
Med/Hard SQL Execution Error Rate:  0.2


 91%|█████████ | 91/100 [03:16<00:18,  2.07s/it]

original question was:  What is the average stock price for Jan 2024
generated sql was:  SELECT AVG(stock_prices.price) FROM stock_prices WHERE stock_prices.date = '2024-01-01';
actual answer was:  240
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 19.51%
Med/Hard SQL Execution Error Rate:  0.1951219512195122


 92%|█████████▏| 92/100 [03:17<00:15,  1.91s/it]

original question was:  What is the average stock price for Jan 2024
generated sql was:  SELECT AVG(stock_prices.price) FROM stock_prices WHERE stock_prices.date = '2024-01-01';
actual answer was:  240
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 19.05%
Med/Hard SQL Execution Error Rate:  0.19047619047619047


 93%|█████████▎| 93/100 [03:20<00:14,  2.10s/it]

original question was:  What is the average stock price for Jan 2024
generated sql was:  SELECT AVG(stock_prices.price) FROM stock_prices INNER JOIN portfolio_holdings ON stock_prices.ticker = portfolio_holdings.ticker WHERE stock_prices.date = '2024-01-01';
actual answer was:  240
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 18.60%
Med/Hard SQL Execution Error Rate:  0.18604651162790697


 94%|█████████▍| 94/100 [03:21<00:11,  1.94s/it]

original question was:  What is the average stock price for Jan 2024
generated sql was:  SELECT AVG(stock_prices.price) FROM stock_prices WHERE stock_prices.date = '2024-01-01';
actual answer was:  240
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 18.18%
Med/Hard SQL Execution Error Rate:  0.18181818181818182


 95%|█████████▌| 95/100 [03:23<00:09,  1.83s/it]

original question was:  What is the average stock price for Jan 2024
generated sql was:  SELECT AVG(stock_prices.price) FROM stock_prices WHERE stock_prices.date = '2024-01-01';
actual answer was:  240
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 17.78%
Med/Hard SQL Execution Error Rate:  0.17777777777777778


 96%|█████████▌| 96/100 [03:25<00:06,  1.74s/it]

original question was:  What is the average stock price for Jan 2024
generated sql was:  SELECT AVG(stock_prices.price) FROM stock_prices WHERE stock_prices.date = '2024-01-01';
actual answer was:  240
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 17.39%
Med/Hard SQL Execution Error Rate:  0.17391304347826086


 97%|█████████▋| 97/100 [03:26<00:05,  1.68s/it]

original question was:  What is the average stock price for Jan 2024
generated sql was:  SELECT AVG(stock_prices.price) FROM stock_prices WHERE stock_prices.date = '2024-01-01';
actual answer was:  240
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 17.02%
Med/Hard SQL Execution Error Rate:  0.1702127659574468


 98%|█████████▊| 98/100 [03:29<00:03,  1.95s/it]

original question was:  What is the average stock price for Jan 2024
generated sql was:  SELECT AVG(stock_prices.price) FROM stock_prices INNER JOIN portfolio_holdings ON stock_prices.ticker = portfolio_holdings.ticker WHERE stock_prices.date = '2024-01-01';
actual answer was:  240
hello
sqlite answer was:  0
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 16.67%
Med/Hard SQL Execution Error Rate:  0.16666666666666666


 99%|█████████▉| 99/100 [03:29<00:01,  1.60s/it]

original question was:  What is the max stock price for the whole year
generated sql was:  SELECT MAX(sp.price) FROM stock_prices sp;
actual answer was:  132
hello
sqlite answer was:  price
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 16.33%
Med/Hard SQL Execution Error Rate:  0.16326530612244897


100%|██████████| 100/100 [03:30<00:00,  2.11s/it]

original question was:  What is the min stock price for the whole year
generated sql was:  SELECT MIN(sp.price) FROM stock_prices sp;
actual answer was:  467
hello
sqlite answer was:  132
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate:  0.16
Med/Hard Q&A Accuracy: 16.00%
Med/Hard SQL Execution Error Rate:  0.16
Easy Q&A Accuracy: 76.00%
Easy SQL Execution Error Rate: 16.00%
Med/Hard Q&A Accuracy: 16.00%
Med/Hard SQL Execution Error Rate: 16.00%



