### Demo notebook querying the Eunomia database
- Eunomia contains synthetic OMOP data (no real clinical data).
- Results are very poor when querying an MSSQL database: the agent frequently fails basic intermediate tasks such as table and column retrieval. The root cause of the MSSQL problem is unclear.
- The agent performs reasonably well with a SQLite database. The results shown below are for SQLite.
- The SQL agent using zero-shot ReAct is sensitive to changes to format_instructions; improper changes can cause runtime errors.
- Uses the LangChain SQL Database agent with ReAct prompting to query the database.
- The SQLDatabase agent must use a completion endpoint, NOT a chat completion endpoint.
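
Because the agent is sensitive to changes in format_instructions, it may help to sanity-check an edited template before building the agent. The sketch below is an illustration only: `check_format_instructions` is a hypothetical helper, and the list of required markers is an assumption based on the template used in this notebook, not on the LangChain source.

```python
from string import Formatter

# Markers assumed to be needed by the zero-shot ReAct format (taken from the
# format_instructions used in this notebook, not from the LangChain source).
REQUIRED_MARKERS = ("Action:", "Action Input:", "Final Answer:")
# The only placeholder this notebook's template uses.
ALLOWED_PLACEHOLDERS = {"tool_names"}


def check_format_instructions(text):
    """Return a list of problems found in a template; an empty list means OK."""
    problems = [f"missing marker: {m!r}" for m in REQUIRED_MARKERS if m not in text]
    try:
        # Collect named placeholders such as {tool_names}.
        fields = {name for _, name, _, _ in Formatter().parse(text) if name}
    except ValueError as exc:
        # Unbalanced braces make str.format fail before the agent even runs.
        return problems + [f"malformed braces: {exc}"]
    # str.format raises KeyError for any placeholder it is not given a value for.
    for extra in sorted(fields - ALLOWED_PLACEHOLDERS):
        problems.append(f"unexpected placeholder: {{{extra}}}")
    return problems
```

Calling `check_format_instructions(format_instructions)` before `create_sql_agent` and stopping on a non-empty result is one way to catch a broken template early.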


In [1]:
from dotenv import load_dotenv
from langchain_openai import OpenAI
from langchain_openai import AzureOpenAI
from langchain_community.utilities import SQLDatabase
from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit
from langchain_community.agent_toolkits.sql.base import create_sql_agent

from langchain.agents.agent_types import AgentType
from langchain_community.chat_models import ChatOpenAI
from IPython.display import Markdown

import langchain
import openai
import os
#import pyodbc
import urllib

use_sqlserver = False

_ = load_dotenv('.dbenv')


#	The only thing in my .dbenv is ...
#	Not free from https://platform.openai.com/
#	OPENAI_API_KEY=...
#	Free from https://smith.langchain.com/
#	LANGCHAIN_API_KEY=...



# Retrieve env vars from .dbenv
#SERVER = os.environ.get('SERVER')
#DATABASE = os.environ.get('DATABASE')
#DRIVER = os.environ.get('DRIVER')



if use_sqlserver:
    conn_str = f"mssql+pyodbc://{SERVER}/{DATABASE}?Trusted_Connection=Yes&driver={DRIVER}&TrustServerCertificate=yes"
else:
    conn_str = "sqlite:///eunomia.sqlite"


try:
    # Establish a connection to the database
    conn = SQLDatabase.from_uri(conn_str)
    db = conn

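except Exception as e:
    # pyodbc is not imported above, so referencing pyodbc.Error here would itself raise a NameError.
    # Report the failure, then re-raise so later cells do not run with an undefined connection.
    print(f"Error connecting to SQL Database: {str(e)}")
    raise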

#API_KEY = os.environ.get('STAGE_API_KEY')
#API_VERSION = os.environ.get('API_VERSION')
#RESOURCE_ENDPOINT = os.environ.get('RESOURCE_ENDPOINT')

#openai.api_type = "azure"
#openai.api_key = API_KEY
#openai.api_base = RESOURCE_ENDPOINT
#openai.api_version = API_VERSION  # This can be overwritten with an incorrect default if not specified with some langchain objects

#openai.log = ""  # This is an important option to debug errors. Accepts debug or info


os.environ["LANGCHAIN_TRACING"] = "true"


# GPT-4 is exclusively for chat completion and will throw an error with the final answer. The agent will send the query to a completions endpoint:
# InvalidRequestError: The completion operation does not work with the specified model, gpt-4. Please choose different model and try again. You can learn more about which models can be used with each operation here: https://go.microsoft.com/fwlink/?linkid=2197993.
# Use a completion endpoint instead
# chat_deployment_id = 'gpt-4'
# chat_model_id = 'gpt4'
#completions_deployment_id = 'text-davinci-003'
#completions_model_id = 'text-davinci-003'

# GPT 35 Turbo when used as a completions endpoint can find a final answer but when used with this SQL agent will throw an error at the end, preventing assignment of the response.
# OutputParserException(\"Parsing LLM output produced both a final answer and a parse-able action:
# Currently, it does not appear that 3.5 Turbo can be used with this agent and perhaps others.
# completions_deployment_id = 'gpt-35-turbo'
# completions_model_id = 'gpt-35-turbo'


mssql_prefix = '''You are an agent designed to interact with a MSSQL database.
Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Always limit your query to at most {top_k} results using TSQL TOP N syntax.
You can order the results by a relevant column to return the most interesting examples in the database.
Only query for all the columns from a specific table if asked--otherwise, only ask for the relevant columns given the question.
You have access to tools for interacting with the database

If a tool fails to retrieve table names, use the following query template to list tables in the database:
USE {database};
SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '{schema}';

If a tool fails to retrieve column names, use the following query template:
USE {database};
SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '<table_name>';

Only use the information returned by the below tools to construct your final answer, except when checking existence of table or column names.
You MUST double check your query before executing it. 
If you get an error while executing a query, rewrite the query and try again.
DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.
If the question does not seem related to the database, just return "I don\'t know" as the answer.
Table and schema names are not case sensitive, so do not repeat queries unnecessarily using merely a different case.

All queries should be directed at the database called {database} which is standardized to OMOP 5.3.
All {database} tables are in the {schema} schema. Ignore other possible schema names.
If a query contains joins, assign a table alias to each table and ensure each column name references the alias (alias.column).
All tables should be preceded with schema name (schema.table).
DO NOT USE SQLite dialect for queries or tools.

Precede every query with the following:
USE {database};
<SQL query goes here> 

Table names do not contain periods. If you see a period in what you think is a table, it is separating schema name from the table.

'''


# This is for SQLite. 3.5 Turbo does not handle lower case table names gracefully. It will add quotes around table names.
sqlite_prefix = '''You are an agent designed to interact with a SQLite database.
Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Always limit your query to at most {top_k} results.
You can order the results by a relevant column to return the most interesting examples in the database.
Only query for all the columns from a specific table if asked--otherwise, only ask for the relevant columns given the question.
You have access to tools for interacting with the database. 

If you get an error while executing a query, rewrite the query and try again.
DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.
If the question does not seem related to the database, just return "I don\'t know" as the answer.
Do not surround the sql query with quotes.
The data is standardized to OMOP 5.3.
All table names should be converted to upper case in this database
Do NOT quote table names when passing as action input

'''

# The innovation to the default instructions is the addition of the SQL statement request
format_instructions = '''Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action. Do NOT quote table names when passing as action input
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question, followed by the SQL statement which generated the final answer

'''


llm_to_use = OpenAI(
    temperature=0,
    # There is a bug which will generate a runtime error using deployment. Use engine instead for deployment.
    # deployment=chat_deployment_id,

#    engine=completions_deployment_id,
#    model=completions_model_id,    
    model='gpt-3.5-turbo-instruct',    
#    openai_api_key=API_KEY,
#    openai_api_base=RESOURCE_ENDPOINT,
    max_tokens=1000,

    # The next two lines will generate a runtime error because they will be included as
    # parameters in the request, which gets rejected. Instead, set them using openai.x =
    # openai_api_type="azure",
    # openai_api_version=API_VERSION,
)


toolkit = SQLDatabaseToolkit(db=conn, llm=llm_to_use)  # SQLDatabaseToolkit uses a completions endpoint, not a chat completions endpoint.

if use_sqlserver:
    prefix = mssql_prefix.format(dialect='MSSQL', top_k=10, database='EUNOMIA', schema='CDM')
else:
    prefix = sqlite_prefix.format(dialect='SQLite', top_k=10,)
    
agent_executor = create_sql_agent(
    llm=llm_to_use,
    toolkit=toolkit,
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    prefix=prefix,
    top_k=10,  # Top N results returned
    max_iterations=50,
    format_instructions=format_instructions,
)

In [2]:
%%time

question = "Display the columns and field types for the condition_occurrence table."
langchain.debug=False
#response = agent_executor.run(question)
response = agent_executor.invoke(question)
response
# db_chain.run(PROMPT.format(question=question))



> Entering new SQL Agent Executor chain...
Action: sql_db_schema
Action Input: condition_occurrence
Error: table_names {'condition_occurrence'} not found in database
I should check the list of tables in the database to see if the table exists.
Action: sql_db_list_tables
Action Input:
CARE_SITE, CDM_SOURCE, COHORT, COHORT_ATTRIBUTE, CONCEPT, CONCEPT_ANCESTOR, CONCEPT_CLASS, CONCEPT_RELATIONSHIP, CONCEPT_SYNONYM, CONDITION_ERA, CONDITION_OCCURRENCE, COST, DEATH, DEVICE_EXPOSURE, DOMAIN, DOSE_ERA, DRUG_ERA, DRUG_EXPOSURE, DRUG_STRENGTH, FACT_RELATIONSHIP, LOCATION, MEASUREMENT, METADATA, NOTE, NOTE_NLP, OBSERVATION, OBSERVATION_PERIOD, PAYER_PLAN_PERIOD, PERSON, PROCEDURE_OCCURRENCE, PROVIDER, RELATIONSHIP, SOURCE_TO_CONCEPT_MAP, SPECIMEN, VISIT_DETAIL, VISIT_OCCURRENCE, VOCABULARY
I should check the schema of the condition_occurrence table.
Action: sql_db_schema
Action Input: CONDITION_OCCURRENC

{'input': 'Display the columns and field types for the condition_occurrence table.',
 'output': "[(4483.0, 263.0, 4112343.0, 1443744000.0, 1443744000.0, 1444780800.0, 1444780800.0, 32020.0, None, None, 17479.0, 0.0, '195662009', 4112343.0, None, 0.0), (4657.0, 273.0, 192671.0, 1318204800.0, 1318204800.0, None, None, 32020.0, None, None, 18192.0, 0.0, 'K92.2', 35208414.0, None, 0.0), (4815.0, 283.0, 28060.0, 445651200.0, 445651200.0, 446515200.0, 446515200.0, 32020.0, None, None, 18859.0, 0.0, '43878008', 28060.0, None, 0.0), (4981.0, 293.0, 378001.0, 1131321600.0, 1131321600.0, 1133913600.0, 1133913600.0, 32020.0, None, None, 19515.0, 0.0, '62106007', 378001.0, None, 0.0), (5153.0, 304.0, 257012.0, 144374400.0, 144374400.0, 152841600.0, 152841600.0, 32020.0, None, None, 20239.0, 0.0, '40055000', 257012.0, None, 0.0), (5313.0, 312.0, 4134304.0, 674179200.0, 674179200.0, 676771200.0, 676771200.0, 32020.0, None, None, 20658.0, 0.0, '263102004', 4134304.0, None, 0.0), (5513.0, 326.0, 28060
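
The first step in the trace above fails because the agent asks for `condition_occurrence` while the SQLite tables are stored in upper case. One possible workaround, sketched here under assumptions, is to resolve requested names case-insensitively against the list of real table names (for example the list `db.get_usable_table_names()` returns) before calling the schema tool. `resolve_table_names` is a hypothetical helper, not part of LangChain.

```python
def resolve_table_names(requested, usable):
    """Map a comma-separated list of table names to their stored spelling.

    Returns (resolved, unknown): names matched case-insensitively against
    `usable`, and names that match nothing.
    """
    by_lower = {name.lower(): name for name in usable}
    resolved, unknown = [], []
    for raw in requested.split(","):
        # Drop surrounding whitespace and any quotes the model may have added.
        cleaned = raw.strip().strip("\"'")
        if not cleaned:
            continue
        match = by_lower.get(cleaned.lower())
        if match is None:
            unknown.append(cleaned)
        else:
            resolved.append(match)
    return resolved, unknown
```

For example, `resolve_table_names('condition_occurrence, "person"', ["CONDITION_OCCURRENCE", "PERSON"])` yields the upper-case names and an empty list of unknowns.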

In [4]:
question = "List all table names in the CDM schema"
#agent_executor.run(question)
response = agent_executor.invoke(question)
response

# BadRequestError: Error code: 400 - {'error': {'message': 
#"This model's maximum context length is 4097 tokens, however you requested 5360 tokens (4360 in your prompt; 1000 for the completion). 
#Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}




> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
CARE_SITE, CDM_SOURCE, COHORT, COHORT_ATTRIBUTE, CONCEPT, CONCEPT_ANCESTOR, CONCEPT_CLASS, CONCEPT_RELATIONSHIP, CONCEPT_SYNONYM, CONDITION_ERA, CONDITION_OCCURRENCE, COST, DEATH, DEVICE_EXPOSURE, DOMAIN, DOSE_ERA, DRUG_ERA, DRUG_EXPOSURE, DRUG_STRENGTH, FACT_RELATIONSHIP, LOCATION, MEASUREMENT, METADATA, NOTE, NOTE_NLP, OBSERVATION, OBSERVATION_PERIOD, PAYER_PLAN_PERIOD, PERSON, PROCEDURE_OCCURRENCE, PROVIDER, RELATIONSHIP, SOURCE_TO_CONCEPT_MAP, SPECIMEN, VISIT_DETAIL, VISIT_OCCURRENCE, VOCABULARY
I should query the schema of the most relevant tables.
Action: sql_db_schema
Action Input: PERSON, OBSERVATION, DRUG_EXPOSURE
CREATE TABLE "DRUG_EXPOSURE" (
	"DRUG_EXPOSURE_ID" REAL, 
	"PERSON_ID" REAL, 
	"DRUG_CONCEPT_ID" REAL, 
	"DRUG_EXPOSURE_START_DATE" REAL, 
	"DRUG_EXPOSURE_START_DATETIME" REAL, 
	"DRUG_EXPOSURE_END_DATE" REA

BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 4097 tokens, however you requested 5360 tokens (4360 in your prompt; 1000 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
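
The error above is simple arithmetic: the prompt tokens plus the requested completion tokens must fit inside the model's context window. The sketch below only restates that check with the numbers from the message; `fits` and `completion_budget` are hypothetical helper names, and counting the prompt tokens themselves is left out.

```python
CONTEXT_WINDOW = 4097  # maximum context length reported in the error message


def completion_budget(prompt_tokens, context_window=CONTEXT_WINDOW):
    """Tokens left for the completion once the prompt is accounted for."""
    return max(context_window - prompt_tokens, 0)


def fits(prompt_tokens, max_tokens, context_window=CONTEXT_WINDOW):
    """True when prompt plus requested completion fit in the context window."""
    return prompt_tokens + max_tokens <= context_window


# Numbers from the failing request: 4360 prompt tokens, max_tokens=1000.
print(fits(4360, 1000))          # the request is rejected
print(completion_budget(4360))   # no room is left at all
```

Here the prompt alone exceeds the window, so lowering max_tokens would not be enough; the prompt (for instance, the schema text the agent accumulates) has to shrink.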

In [5]:
question = "list the top distinct conditions and their count in descending count order. Do not list concept_ids but rather list by concept name"

#agent_executor.run(question)
response = agent_executor.invoke(question)
response




> Entering new SQL Agent Executor chain...
Action: sql_db_schema
Action Input: condition_occurrence
Error: table_names {'condition_occurrence'} not found in database
I should check the list of tables in the database to see if the table name is correct.
Action: sql_db_list_tables
Action Input:
CARE_SITE, CDM_SOURCE, COHORT, COHORT_ATTRIBUTE, CONCEPT, CONCEPT_ANCESTOR, CONCEPT_CLASS, CONCEPT_RELATIONSHIP, CONCEPT_SYNONYM, CONDITION_ERA, CONDITION_OCCURRENCE, COST, DEATH, DEVICE_EXPOSURE, DOMAIN, DOSE_ERA, DRUG_ERA, DRUG_EXPOSURE, DRUG_STRENGTH, FACT_RELATIONSHIP, LOCATION, MEASUREMENT, METADATA, NOTE, NOTE_NLP, OBSERVATION, OBSERVATION_PERIOD, PAYER_PLAN_PERIOD, PERSON, PROCEDURE_OCCURRENCE, PROVIDER, RELATIONSHIP, SOURCE_TO_CONCEPT_MAP, SPECIMEN, VISIT_DETAIL, VISIT_OCCURRENCE, VOCABULARY
I should query the schema of the CONDITION_OCCURRENCE table to see what columns are available.
Action: sql_

{'input': 'list the top distinct conditions and their count in descending count order. Do not list concept_ids but rather list by concept name',
 'output': 'The top distinct conditions and their count in descending count order are: 444814009 with a count of 17268, 195662009 with a count of 10217, 10509002 with a count of 8184, 65363002 with a count of 3605, 396275006 with a count of 2694, 43878008 with a count of 2656, 44465007 with a count of 1915, 62106007 with a count of 1013, 36971009 with a count of 1001, and 75498004 with a count of 939. SQL statement: SELECT COUNT(*) AS count, CONDITION_SOURCE_VALUE FROM CONDITION_OCCURRENCE GROUP BY CONDITION_SOURCE_VALUE ORDER BY count DESC LIMIT 10'}
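
The answer above groups by CONDITION_SOURCE_VALUE, so it returns source codes rather than the concept names the question asked for. A query that joins CONDITION_OCCURRENCE to CONCEPT on the concept id would return names instead. The sketch below runs such a query against a tiny in-memory SQLite database; the rows are made up for illustration (only the two concept names come from this notebook's output), and the counts are not Eunomia results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy tables that mirror only the columns needed for the join.
conn.executescript("""
CREATE TABLE CONCEPT (CONCEPT_ID REAL, CONCEPT_NAME TEXT);
CREATE TABLE CONDITION_OCCURRENCE (
    CONDITION_OCCURRENCE_ID REAL, PERSON_ID REAL, CONDITION_CONCEPT_ID REAL
);
INSERT INTO CONCEPT VALUES
    (40481087, 'Viral sinusitis'), (257012, 'Chronic sinusitis');
-- Illustrative rows only, not real Eunomia data.
INSERT INTO CONDITION_OCCURRENCE VALUES
    (1, 1, 40481087), (2, 2, 40481087), (3, 2, 257012);
""")

query = """
SELECT c.CONCEPT_NAME, COUNT(*) AS condition_count
FROM CONDITION_OCCURRENCE co
JOIN CONCEPT c ON c.CONCEPT_ID = co.CONDITION_CONCEPT_ID
GROUP BY c.CONCEPT_NAME
ORDER BY condition_count DESC
LIMIT 10
"""
rows = conn.execute(query).fetchall()
for name, count in rows:
    print(name, count)
```

The same query text could be run against eunomia.sqlite to compare with the agent's answer.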

In [6]:
question = "list people who have any variation of sinusitis"

#response = agent_executor.run(question)
response = agent_executor.invoke(question)
response




> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
CARE_SITE, CDM_SOURCE, COHORT, COHORT_ATTRIBUTE, CONCEPT, CONCEPT_ANCESTOR, CONCEPT_CLASS, CONCEPT_RELATIONSHIP, CONCEPT_SYNONYM, CONDITION_ERA, CONDITION_OCCURRENCE, COST, DEATH, DEVICE_EXPOSURE, DOMAIN, DOSE_ERA, DRUG_ERA, DRUG_EXPOSURE, DRUG_STRENGTH, FACT_RELATIONSHIP, LOCATION, MEASUREMENT, METADATA, NOTE, NOTE_NLP, OBSERVATION, OBSERVATION_PERIOD, PAYER_PLAN_PERIOD, PERSON, PROCEDURE_OCCURRENCE, PROVIDER, RELATIONSHIP, SOURCE_TO_CONCEPT_MAP, SPECIMEN, VISIT_DETAIL, VISIT_OCCURRENCE, VOCABULARY
I should query the schema of the CONDITION_OCCURRENCE table to see what columns I can use to find people with sinusitis.
Action: sql_db_schema
Action Input: CONDITION_OCCURRENCE
CREATE TABLE "CONDITION_OCCURRENCE" (
	"CONDITION_OCCURRENCE_ID" REAL, 
	"PERSON_ID" REAL, 
	"CONDITION_CONCEPT_ID" REAL, 
	"CONDITION_START_DATE" REAL, 
	

{'input': 'list people who have any variation of sinusitis',
 'output': "3 rows from CONDITION_OCCURRENCE table:\nCONDITION_OCCURRENCE_ID\tPERSON_ID\tCONDITION_CONCEPT_ID\tCONDITION_START_DATE\tCONDITION_START_DATETIME\tCONDITION_END_DATE\tCONDITION_END_DATETIME\tCONDITION_TYPE_CONCEPT_ID\tSTOP_REASON\tPROVIDER_ID\tVISIT_OCCURRENCE_ID\tVISIT_DETAIL_ID\tCONDITION_SOURCE_VALUE\tCONDITION_SOURCE_CONCEPT_ID\tCONDITION_STATUS_SOURCE_VALUE\tCONDITION_STATUS_CONCEPT_ID\n4483.0\t263.0\t4112343.0\t1443744000.0\t1443744000.0\t1444780800.0\t1444780800.0\t32020.0\tNone\tNone\t17479.0\t0.0\t195662009\t4112343.0\tNone\t0.0\n4657.0\t273.0\t192671.0\t1318204800.0\t1318204800.0\tNone\tNone\t32020.0\tNone\tNone\t18192.0\t0.0\tK92.2\t35208414.0\tNone\t0.0\n4815.0\t283.0\t28060.0\t445651200.0\t445651200.0\t446515200.0\t446515200.0\t32020.0\tNone\tNone\t18859.0\t0.0\t43878008\t28060.0\tNone\t0.0\nSELECT * FROM CONDITION_OCCURRENCE WHERE CONDITION_SOURCE_VALUE LIKE '%sinusitis%' OR CONDITION_SOURCE_VALUE LI

In [7]:
question = "List people with sinusitis or its descendant concepts. Results should also include the concept names and ids."

#response = agent_executor.run(question)
response = agent_executor.invoke(question)
response




> Entering new SQL Agent Executor chain...
Action: sql_db_schema
Action Input: CONCEPT
CREATE TABLE "CONCEPT" (
	"CONCEPT_ID" REAL, 
	"CONCEPT_NAME" TEXT, 
	"DOMAIN_ID" TEXT, 
	"VOCABULARY_ID" TEXT, 
	"CONCEPT_CLASS_ID" TEXT, 
	"STANDARD_CONCEPT" TEXT, 
	"CONCEPT_CODE" TEXT, 
	"VALID_START_DATE" REAL, 
	"VALID_END_DATE" REAL, 
	"INVALID_REASON" TEXT
)

/*
3 rows from CONCEPT table:
CONCEPT_ID	CONCEPT_NAME	DOMAIN_ID	VOCABULARY_ID	CONCEPT_CLASS_ID	STANDARD_CONCEPT	CONCEPT_CODE	VALID_START_DATE	VALID_END_DATE	INVALID_REASON
35208414.0	Gastrointestinal hemorrhage, unspecified	Condition	ICD10CM	4-char billing code	None	K92.2	1167609600.0	4102358400.0	None
1118088.0	celecoxib 200 MG Oral Capsule [Celebrex]	Drug	RxNorm	Branded Drug	S	213469	0.0	4102358400.0	None
40213201.0	pneumococcal polysaccharide vaccine, 23 valent	Drug	CVX	CVX	S	33	1228089600.0	4102358400.0	None
*/
I should use the CONCEPT table to get the concept names and ids for

{'input': 'List people with sinusitis or its descendant concepts. Results should also include the concept names and ids.',
 'output': "[(4294548.0, 'Acute bacterial sinusitis'), (4283893.0, 'Sinusitis'), (40481087.0, 'Viral sinusitis'), (257012.0, 'Chronic sinusitis')]\nSELECT CONCEPT_ID, CONCEPT_NAME FROM CONCEPT WHERE CONCEPT_NAME LIKE '%sinusitis%'"}
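
The answer above matches concept names with LIKE and never reaches the people, so it does not really use descendant concepts. In the OMOP CDM the hierarchy lives in CONCEPT_ANCESTOR (columns ANCESTOR_CONCEPT_ID and DESCENDANT_CONCEPT_ID), where each concept is also listed as its own ancestor. The sketch below shows the shape of such a query on a toy in-memory database; the hierarchy and person rows are invented for illustration, and the upper-case column names are an assumption about how Eunomia stores them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CONCEPT (CONCEPT_ID REAL, CONCEPT_NAME TEXT);
CREATE TABLE CONCEPT_ANCESTOR (ANCESTOR_CONCEPT_ID REAL, DESCENDANT_CONCEPT_ID REAL);
CREATE TABLE CONDITION_OCCURRENCE (PERSON_ID REAL, CONDITION_CONCEPT_ID REAL);
INSERT INTO CONCEPT VALUES
    (4283893, 'Sinusitis'), (40481087, 'Viral sinusitis'), (257012, 'Chronic sinusitis');
-- Toy hierarchy: Sinusitis is its own ancestor and the ancestor of both variants.
INSERT INTO CONCEPT_ANCESTOR VALUES
    (4283893, 4283893), (4283893, 40481087), (4283893, 257012);
-- Illustrative people; person 4 has an unrelated condition.
INSERT INTO CONDITION_OCCURRENCE VALUES
    (1, 40481087), (2, 257012), (3, 4283893), (4, 192671);
""")

query = """
SELECT DISTINCT co.PERSON_ID, c.CONCEPT_ID, c.CONCEPT_NAME
FROM CONCEPT_ANCESTOR ca
JOIN CONDITION_OCCURRENCE co ON co.CONDITION_CONCEPT_ID = ca.DESCENDANT_CONCEPT_ID
JOIN CONCEPT c ON c.CONCEPT_ID = co.CONDITION_CONCEPT_ID
WHERE ca.ANCESTOR_CONCEPT_ID = 4283893
ORDER BY co.PERSON_ID, c.CONCEPT_ID
"""
rows = conn.execute(query).fetchall()
for person_id, concept_id, concept_name in rows:
    print(person_id, concept_id, concept_name)
```

Person 4 is excluded because concept 192671 is not a descendant of Sinusitis in the toy hierarchy.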

In [8]:
question = "For the first 10 measurements in the measurement table, list the descriptions of each measurement."

#response = agent_executor.run(question)
response = agent_executor.invoke(question)
response




> Entering new SQL Agent Executor chain...
Action: sql_db_schema
Action Input: measurement
Error: table_names {'measurement'} not found in database
I should check the list of tables in the database to see if the table name is correct.
Action: sql_db_list_tables
Action Input:
CARE_SITE, CDM_SOURCE, COHORT, COHORT_ATTRIBUTE, CONCEPT, CONCEPT_ANCESTOR, CONCEPT_CLASS, CONCEPT_RELATIONSHIP, CONCEPT_SYNONYM, CONDITION_ERA, CONDITION_OCCURRENCE, COST, DEATH, DEVICE_EXPOSURE, DOMAIN, DOSE_ERA, DRUG_ERA, DRUG_EXPOSURE, DRUG_STRENGTH, FACT_RELATIONSHIP, LOCATION, MEASUREMENT, METADATA, NOTE, NOTE_NLP, OBSERVATION, OBSERVATION_PERIOD, PAYER_PLAN_PERIOD, PERSON, PROCEDURE_OCCURRENCE, PROVIDER, RELATIONSHIP, SOURCE_TO_CONCEPT_MAP, SPECIMEN, VISIT_DETAIL, VISIT_OCCURRENCE, VOCABULARY
I should query the schema of the measurement table to see what columns are available.
Action: sql_db_schema
Action Input: mea

{'input': 'For the first 10 measurements in the measurement table, list the descriptions of each measurement.',
 'output': 'The first 10 measurements in the measurement table are: 8331-1, 117015009, 117015009, 8331-1, 117015009, 117015009, 23426006, 117015009, 117015009, 23426006. SQL statement: SELECT measurement_source_value FROM measurement LIMIT 10'}

In [9]:
question = "what is the person_id of the person and the visit date and time for the most recent recorded visit. Format the date and time in an understandable format"

#response = agent_executor.run(question)
response = agent_executor.invoke(question)
response




> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
CARE_SITE, CDM_SOURCE, COHORT, COHORT_ATTRIBUTE, CONCEPT, CONCEPT_ANCESTOR, CONCEPT_CLASS, CONCEPT_RELATIONSHIP, CONCEPT_SYNONYM, CONDITION_ERA, CONDITION_OCCURRENCE, COST, DEATH, DEVICE_EXPOSURE, DOMAIN, DOSE_ERA, DRUG_ERA, DRUG_EXPOSURE, DRUG_STRENGTH, FACT_RELATIONSHIP, LOCATION, MEASUREMENT, METADATA, NOTE, NOTE_NLP, OBSERVATION, OBSERVATION_PERIOD, PAYER_PLAN_PERIOD, PERSON, PROCEDURE_OCCURRENCE, PROVIDER, RELATIONSHIP, SOURCE_TO_CONCEPT_MAP, SPECIMEN, VISIT_DETAIL, VISIT_OCCURRENCE, VOCABULARY
I should query the schema of the PERSON and VISIT_OCCURRENCE tables to see what columns I can use.
Action: sql_db_schema
Action Input: PERSON, VISIT_OCCURRENCE
CREATE TABLE "PERSON" (
	"PERSON_ID" REAL, 
	"GENDER_CONCEPT_ID" REAL, 
	"YEAR_OF_BIRTH" REAL, 
	"MONTH_OF_BIRTH" REAL, 
	"DAY_OF_BIRTH" REAL, 
	"BIRTH_DATETIME" REAL, 
	"RA

{'input': 'what is the person_id of the person and the visit date and time for the most recent recorded visit. Format the date and time in an understandable format',
 'output': "The person_id of the person and the visit date and time for the most recent recorded visit are 3026, 2019-06-07 00:00:00, and 2019-06-08 00:00:00, respectively. The SQL statement which generated the final answer is SELECT p.PERSON_ID, DATETIME(VISIT_START_DATETIME, 'unixepoch'), DATETIME(VISIT_END_DATETIME, 'unixepoch') FROM PERSON p, VISIT_OCCURRENCE v WHERE p.PERSON_ID = v.PERSON_ID ORDER BY VISIT_START_DATETIME DESC LIMIT 10."}
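
The date columns in this SQLite file are stored as REAL Unix epoch seconds, which is why the agent's query wraps them in DATETIME(..., 'unixepoch'). As a small check of that conversion, the sketch below converts one CONDITION_START_DATE value that appears in the earlier output, once with SQLite and once with Python's datetime. It assumes the stored values are UTC seconds.

```python
import sqlite3
from datetime import datetime, timezone

# A CONDITION_START_DATE value seen in the earlier query output.
epoch_seconds = 1443744000.0

# SQLite: the 'unixepoch' modifier treats the number as seconds since 1970-01-01.
conn = sqlite3.connect(":memory:")
(sqlite_text,) = conn.execute(
    "SELECT DATETIME(?, 'unixepoch')", (epoch_seconds,)
).fetchone()

# Python: the same conversion, pinned to UTC so it does not depend on local time.
python_text = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(
    "%Y-%m-%d %H:%M:%S"
)

print(sqlite_text)
print(python_text)
```

Both conversions should agree; if they differ on a given value, the column is probably not plain epoch seconds.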

In [10]:
question = "Of the conditions in the database, list the most prescribed drugs and their count and condition treated"

#response = agent_executor.run(question)
response = agent_executor.invoke(question)
response

# BadRequestError: Error code: 400 - {'error': {'message': 
#"This model's maximum context length is 4097 tokens, however you requested 4278 tokens (3278 in your prompt; 1000 for the completion). 
#Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}




> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
CARE_SITE, CDM_SOURCE, COHORT, COHORT_ATTRIBUTE, CONCEPT, CONCEPT_ANCESTOR, CONCEPT_CLASS, CONCEPT_RELATIONSHIP, CONCEPT_SYNONYM, CONDITION_ERA, CONDITION_OCCURRENCE, COST, DEATH, DEVICE_EXPOSURE, DOMAIN, DOSE_ERA, DRUG_ERA, DRUG_EXPOSURE, DRUG_STRENGTH, FACT_RELATIONSHIP, LOCATION, MEASUREMENT, METADATA, NOTE, NOTE_NLP, OBSERVATION, OBSERVATION_PERIOD, PAYER_PLAN_PERIOD, PERSON, PROCEDURE_OCCURRENCE, PROVIDER, RELATIONSHIP, SOURCE_TO_CONCEPT_MAP, SPECIMEN, VISIT_DETAIL, VISIT_OCCURRENCE, VOCABULARY
I should query the schema of the DRUG_EXPOSURE table to see what columns I can use to answer the question.
Action: sql_db_schema
Action Input: DRUG_EXPOSURE
CREATE TABLE "DRUG_EXPOSURE" (
	"DRUG_EXPOSURE_ID" REAL, 
	"PERSON_ID" REAL, 
	"DRUG_CONCEPT_ID" REAL, 
	"DRUG_EXPOSURE_START_DATE" REAL, 
	"DRUG_EXPOSURE_START_DATETIME" REAL,

{'input': 'Of the conditions in the database, list the most prescribed drugs and their count and condition treated',
 'output': 'The most prescribed drugs and their count are: 313782 with 9365 prescriptions, 10 with 7977 prescriptions, 113 with 7430 prescriptions, 243670 with 4380 prescriptions, 562251 with 3851 prescriptions, 52 with 3211 prescriptions, 282464 with 2158 prescriptions, 121 with 2125 prescriptions, 1043400 with 1993 prescriptions, and 43 with 1916 prescriptions. SQL statement: SELECT DRUG_SOURCE_VALUE, COUNT(DRUG_SOURCE_VALUE) AS "PRESCRIPTION_COUNT" FROM DRUG_EXPOSURE WHERE DRUG_SOURCE_VALUE IS NOT NULL GROUP BY DRUG_SOURCE_VALUE ORDER BY PRESCRIPTION_COUNT DESC LIMIT 10'}

In [11]:
question = "Of the conditions in the database, list the most prescribed drugs using the drug names, corresponding concept_ids, and counts"

#response = agent_executor.run(question)
response = agent_executor.invoke(question)
response




> Entering new SQL Agent Executor chain...
Action: sql_db_schema
Action Input: condition_occurrence, drug_exposure
Error: table_names {'drug_exposure', 'condition_occurrence'} not found in database
I should check the list of tables in the database to see if I can find the correct table names.
Action: sql_db_list_tables
Action Input:
CARE_SITE, CDM_SOURCE, COHORT, COHORT_ATTRIBUTE, CONCEPT, CONCEPT_ANCESTOR, CONCEPT_CLASS, CONCEPT_RELATIONSHIP, CONCEPT_SYNONYM, CONDITION_ERA, CONDITION_OCCURRENCE, COST, DEATH, DEVICE_EXPOSURE, DOMAIN, DOSE_ERA, DRUG_ERA, DRUG_EXPOSURE, DRUG_STRENGTH, FACT_RELATIONSHIP, LOCATION, MEASUREMENT, METADATA, NOTE, NOTE_NLP, OBSERVATION, OBSERVATION_PERIOD, PAYER_PLAN_PERIOD, PERSON, PROCEDURE_OCCURRENCE, PROVIDER, RELATIONSHIP, SOURCE_TO_CONCEPT_MAP, SPECIMEN, VISIT_DETAIL, VISIT_OCCURRENCE, VOCABULARY
I should check the schema of the drug_exposure table to see if I c

BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 4097 tokens, however you requested 4149 tokens (3149 in your prompt; 1000 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}