# DuckDB SQL Query Engine Using an LLM
In this project, we use DuckDB as the SQL backend and connect it to the GPT-4o model through LlamaIndex. Questions asked in plain English are translated into SQL, executed against the database, and answered in natural language.

In [None]:
# %pip install duckdb-engine -q

In [None]:
import os  # needed below to read OPENAI_API_KEY from the environment

from sqlalchemy import create_engine
from llama_index.core import Settings
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

In [None]:
llm = OpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
embed_model = OpenAIEmbedding(model="text-embedding-3-small")

Settings.llm = llm
Settings.embed_model = embed_model

## Loading the DuckDB database
We create a SQLAlchemy engine for the DuckDB file with the create_engine function, then run a simple SQL query to confirm that the database loaded successfully.

In [None]:
engine = create_engine("duckdb:///datacamp.duckdb")

with engine.connect() as connection:
    cursor = connection.exec_driver_sql("SELECT * FROM bank LIMIT 3")
    print(cursor.fetchall())

[(56, 'housemaid', 'married', 'basic.4y', 'no', 'no', 'no', 'telephone', 'may', 'mon', 261, 1, 999, 0, 'nonexistent', 1.1, 93.994, -36.4, 4.857, 5191.0, False), (57, 'services', 'married', 'high.school', 'unknown', 'no', 'no', 'telephone', 'may', 'mon', 149, 1, 999, 0, 'nonexistent', 1.1, 93.994, -36.4, 4.857, 5191.0, False), (37, 'services', 'married', 'high.school', 'no', 'yes', 'no', 'telephone', 'may', 'mon', 226, 1, 999, 0, 'nonexistent', 1.1, 93.994, -36.4, 4.857, 5191.0, False)]


Next, we wrap the engine in a LlamaIndex SQLDatabase object. It takes the engine and the names of the tables that the LLM is allowed to query.

In [None]:
sql_database = SQLDatabase(engine, include_tables=["bank"])
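
To illustrate why the table name matters: a text-to-SQL engine has to show the model a description of the table's columns before it can write a query. The sketch below is not LlamaIndex's implementation. It assumes Python's built-in sqlite3 module in place of DuckDB, a made-up three-column stand-in for the bank table, and a hypothetical helper named describe_table.

```python
import sqlite3


def describe_table(conn: sqlite3.Connection, table: str) -> str:
    """Return a one-line schema summary that could be pasted into a prompt.

    Hypothetical helper for illustration only; the table name is
    interpolated directly, so pass trusted names only.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    columns = ", ".join(f"{name} ({col_type})" for _, name, col_type, *_ in rows)
    return f"Table '{table}' has columns: {columns}"


# Toy stand-in for the bank table (assumption: only three of its columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bank (age INTEGER, job TEXT, housing TEXT)")

schema_text = describe_table(conn, "bank")
print(schema_text)
conn.close()
```

Restricting the wrapper to the tables you list keeps this kind of schema description short, which matters when the database holds many tables.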



Create the SQL query engine by passing the LlamaIndex SQLDatabase object to the NLSQLTableQueryEngine class.

In [None]:
query_engine = NLSQLTableQueryEngine(sql_database)

GPT-4o first generates the SQL query, the query engine runs it against the database, and GPT-4o then uses the result to generate a natural language response. This multi-step process takes only two lines of code.
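
The three stages can be sketched in isolation. This is a rough illustration under stated assumptions, not what NLSQLTableQueryEngine does internally: sqlite3 stands in for DuckDB, the rows are made up, and the two fake_llm functions are hypothetical placeholders that return hard-coded text instead of calling GPT-4o.

```python
import sqlite3


def fake_llm_to_sql(question: str) -> str:
    """Stand-in for the text-to-SQL step; a real LLM would write this query."""
    return (
        "SELECT job, COUNT(*) AS n FROM bank "
        "WHERE housing = 'yes' GROUP BY job ORDER BY n DESC LIMIT 1"
    )


def fake_llm_to_answer(question: str, rows: list) -> str:
    """Stand-in for response synthesis; formats the top row as a sentence."""
    job, n = rows[0]
    return f"The job with the most housing loans is '{job}' ({n} clients)."


# Toy data for illustration only (not taken from the real bank table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bank (job TEXT, housing TEXT)")
conn.executemany(
    "INSERT INTO bank VALUES (?, ?)",
    [("services", "yes"), ("services", "yes"), ("housemaid", "no"), ("admin.", "yes")],
)

question = "Which type of job has the most housing loans?"
sql = fake_llm_to_sql(question)               # 1. question -> SQL
rows = conn.execute(sql).fetchall()           # 2. SQL -> rows
answer = fake_llm_to_answer(question, rows)   # 3. rows -> natural language
print(answer)
conn.close()
```

Keeping the stages separate like this makes it easier to see where a wrong answer comes from: a bad query in step 1 or a bad summary in step 3.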

In [None]:
response = query_engine.query("Which is the longest running campaign?")
print(response.response)

In [None]:
response = query_engine.query("Which type of job has the most housing loans?")
print(response.response)
print(response.metadata)
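
Printing the whole metadata dictionary can be noisy when only the generated SQL is of interest. The helper below is a small sketch that assumes the metadata carries a sql_query key, which recent LlamaIndex versions appear to populate; check the printed metadata from your own run before relying on it. The name extract_sql and the example dictionary are hypothetical.

```python
from typing import Optional


def extract_sql(metadata: Optional[dict]) -> Optional[str]:
    """Return the generated SQL from a response's metadata, or None if absent.

    Assumes the SQL is stored under the "sql_query" key (an assumption,
    not something this notebook's output confirms).
    """
    if not metadata:
        return None
    return metadata.get("sql_query")


# Hypothetical metadata shape, used here only to exercise the helper.
example_metadata = {"sql_query": "SELECT job FROM bank LIMIT 1"}
print(extract_sql(example_metadata))
print(extract_sql(None))
```

In the notebook, the call would be extract_sql(response.metadata), which makes it easy to review the query the model wrote before trusting the answer.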

In [None]:
# A SQLAlchemy Engine has no close() method; dispose() releases its pooled connections.
engine.dispose()