## chatBot_with_KB_As_SQLDB

Source: https://python.langchain.com/en/latest/modules/chains/examples/sqlite.html

This example demonstrates the use of the `SQLDatabaseChain` for answering questions over a database. Under the hood, LangChain uses SQLAlchemy to connect to SQL databases. The `SQLDatabaseChain` can therefore be used with any SQL dialect supported by SQLAlchemy, such as MS SQL, MySQL, MariaDB, PostgreSQL, Oracle SQL, Databricks and SQLite. Please refer to the SQLAlchemy documentation for more information about requirements for connecting to your database. For example, a connection to MySQL requires an appropriate connector such as PyMySQL. A URI for a MySQL connection might look like: `mysql+pymysql://user:pass@some_mysql_db_address/db_name`.
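
The layout of such a URI can be seen by splitting it into its parts. The sketch below uses only Python's standard `urllib.parse` for illustration; SQLAlchemy does its own URL parsing, and the `describe_db_uri` helper name is hypothetical.

```python
from urllib.parse import urlsplit

def describe_db_uri(uri):
    """Split a SQLAlchemy-style URI into labeled parts (illustrative helper)."""
    parts = urlsplit(uri)
    # The scheme is "dialect" or "dialect+driver", e.g. "mysql+pymysql".
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,  # None: no driver was named in the URI
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "database": parts.path.lstrip("/"),
    }

info = describe_db_uri("mysql+pymysql://user:pass@some_mysql_db_address/db_name")
print(info)
```

Applied to `sqlite:///../sql_data/Chinook.db`, the same helper reports no host and a relative file path as the database, which is why SQLite URIs carry three slashes.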

This demonstration uses SQLite and the example Chinook database. To set it up, follow the instructions on https://database.guide/2-sample-databases-sqlite/, placing the .db file in a notebooks folder at the root of this repository.

Procedure:

1. Installs
2. Prerequisites: Imports and Global Variables
3. Initialize Database and LLM
4. Query: ask quick test questions to validate that the chain returns sensible answers

### 1. Installs

#### SQLite
```
brew install sqlite
```

#### Create the Chinook DB from a SQL Script

Create a database called Chinook. You can do this with the following sqlite command:
```
mkdir evuniverse/sql_data
cd evuniverse/sql_data
sqlite3 Chinook.db
```

At the `sqlite>` prompt, run the following command to execute the script that creates the database tables and populates them with data.
```
.read ../../misc/Chinook_Sqlite.sql
```

Once the script has finished running, you can verify that it created the database by selecting some data from a table. 
```
SELECT * FROM Artist LIMIT 10;
```
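
The same load-and-verify steps can also be run from Python with the standard `sqlite3` module, where `executescript` plays roughly the role of `.read`. The sketch below is a stand-in under stated assumptions: it uses an in-memory database and a two-row placeholder script instead of the real `Chinook_Sqlite.sql`.

```python
import sqlite3

# Placeholder for the contents of Chinook_Sqlite.sql (NOT the real script);
# with the real file you would read its text from disk instead.
SAMPLE_SCRIPT = """
CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Artist (ArtistId, Name) VALUES (1, 'Placeholder Artist A');
INSERT INTO Artist (ArtistId, Name) VALUES (2, 'Placeholder Artist B');
"""

conn = sqlite3.connect(":memory:")  # use a path such as "Chinook.db" for a file on disk
conn.executescript(SAMPLE_SCRIPT)   # rough equivalent of `.read <script>` in the sqlite3 shell

# Same check as in the shell: select a few rows from the table.
rows = conn.execute("SELECT * FROM Artist LIMIT 10").fetchall()
for row in rows:
    print(row)
```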

#### Create a SampleListings DB from CSV File

When the target table does not already exist, the sqlite3 tool uses the first row of the CSV file as the column names of the new table.

```
% cd sql_data main
% sqlite3 SampleListings.db
SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
```

Set the mode to CSV to instruct the command-line shell program to interpret the input file as a CSV file.
```
sqlite> .mode csv
```

Import the data from the csv file into the listings table.

```
sqlite> .import ../../misc/SQL.csv listings
```
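
For comparison, the header-row behaviour of `.import` can be approximated in Python with the `csv` and `sqlite3` modules. This is a minimal sketch, not the shell's implementation: the `import_csv` helper and the three-column sample data are made up for illustration, and every column is created as TEXT to mirror the schema that `.import` produces for a new table.

```python
import csv
import io
import sqlite3

def import_csv(conn, table, csv_file):
    """Create `table` from a CSV stream, using the first row as column names."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote column names so names with spaces or keywords still work.
    columns = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)
    # `table` is assumed to be a trusted identifier chosen by the caller.
    conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, columns))
    conn.executemany('INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader)
    conn.commit()
    return header

# Made-up sample standing in for misc/SQL.csv.
SAMPLE_CSV = (
    "Dealership,City,YEAR\n"
    '"Example Motors",Springfield,2017\n'
    '"Sample Autos",Riverton,2019\n'
)

conn = sqlite3.connect(":memory:")
columns = import_csv(conn, "listings", io.StringIO(SAMPLE_CSV))
count = conn.execute("SELECT count(*) FROM listings").fetchone()[0]
print(columns, count)
```

Because every column is TEXT, values such as the year are stored as strings, just as in the schema shown for the `listings` table.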

Validate the import by listing the tables:
```
sqlite> .tables
listings
```

Inspect the table schema:

```
sqlite> .schema listings
CREATE TABLE IF NOT EXISTS "listings"(
"Dealership" TEXT, "Phone" TEXT, "Street" TEXT, "City" TEXT,
 "State" TEXT, "ZipCode" TEXT, "Code1" TEXT, "Code2" TEXT,
 "VIN" TEXT, "YEAR" TEXT, "MAKE" TEXT, "MODEL" TEXT,
 "Type" TEXT, "SPEC" TEXT, "Color_Ext" TEXT, "Color_Int" TEXT,
 "Transmission" TEXT, "Code3" TEXT, "Code4" TEXT, "YEAR2" TEXT,
 "WebSite" TEXT, "Description" TEXT, "Photos" TEXT, "Code5" TEXT,
 "Last_Updated" TEXT, "Code7" TEXT, "Code8" TEXT, "Code9" TEXT);
sqlite>
```

Preview the table data:
```
sqlite> select * from listings limit 10;
"Kane's Auto Sales & Service","","2150 Candia Road",Manchester,NH,3109,U,22-014,5YJSA1E22HF204432,2017,Tesla,"Model S","",90D,WHITE,"","",45639,50995,1/19/22,https://www.kanesautonh.com/vdp/18417133/Used-2017-Tesla-Model-S-100D-AWD-for-sale-in-Manchester-NH-03109,"Active Seatbelts,Air conditioning,Alarm,All Wheel ABS,AM/FM CD/MP3,CVT,Daytime Running Lights,D.....
```

Count the number of rows in the table
```
sqlite> select count (*) from listings;
8
```

Exit
```
sqlite> .exit
sql_data main % ls
  SampleListings.db
```

### 2. Prerequisites: Imports and Global Variables

In [216]:
from langchain import OpenAI as lcOpenAI 
from langchain import SQLDatabase, SQLDatabaseChain
from langchain.prompts.prompt import PromptTemplate

DEBUG = False
SHOW_SQL = False
NUM_OF_ROWS = 10

CHINOOK_DB_URI = "sqlite:///../sql_data/Chinook.db"
LISTINGS_DB_URI = "sqlite:///../sql_data/SampleListings.db"

### 3. Initialize Database and LLM

- `use_query_checker=True`: The language model sometimes generates invalid SQL with small mistakes. With this option set, the chain asks the LLM to check and fix the query before running it, the same technique used by the SQL Database Agent.
- `prompt=PROMPT`: Replaces the default prompt with a custom one.
- `return_intermediate_steps=True`: Returns the intermediate steps of the `SQLDatabaseChain`, which gives access to the generated SQL statement and to the result of running it against the database.
- `top_k=NUM_OF_ROWS`: Sets the maximum number of results to return when a query matches several rows of a table (default is 10). This is useful for avoiding query results that exceed the prompt's maximum length or consume tokens unnecessarily.
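
The effect of a row cap like `top_k` can be seen without an LLM. The sketch below assumes a hypothetical `preview_rows` helper and a throwaway in-memory table; it only illustrates how a `LIMIT` keeps the result (and therefore the text fed back into a prompt) small, and says nothing about how LangChain applies `top_k` internally.

```python
import sqlite3

NUM_OF_ROWS = 10  # same cap the notebook passes as top_k

def preview_rows(conn, table, top_k=NUM_OF_ROWS):
    """Return at most `top_k` rows from `table` (illustrative helper)."""
    # Table names cannot be bound as parameters, so check against the catalog first.
    known = {name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise ValueError("unknown table: {!r}".format(table))
    return conn.execute('SELECT * FROM "{}" LIMIT ?'.format(table), (top_k,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany("INSERT INTO demo (note) VALUES (?)",
                 [("row {}".format(i),) for i in range(25)])

capped = preview_rows(conn, "demo")
everything = preview_rows(conn, "demo", top_k=-1)  # in SQLite a negative LIMIT means no limit
print(len(capped), len(str(capped)), "vs", len(everything), len(str(everything)))
```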

In [212]:
def answerMe(query, db_uri = LISTINGS_DB_URI, showSQL = SHOW_SQL, debug = DEBUG):
    # Load Database and Large Language Model

    db = SQLDatabase.from_uri(db_uri)
    llm = lcOpenAI(temperature=0, verbose=debug)

    _DEFAULT_TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
    Use the following format:

    Question: "Question here"
    SQLQuery: "SQL Query to run"
    SQLResult: "Result of the SQLQuery"
    Answer: "Final answer here"

    Only use the following tables:

    {table_info}

    If someone asks for the table foobar, they really mean the employee table.

    Question: {input}"""
    PROMPT = PromptTemplate(
        input_variables=["input", "table_info", "dialect"], template=_DEFAULT_TEMPLATE
    )

    db_chain = SQLDatabaseChain.from_llm(
        llm, 
        db, 
        prompt=PROMPT, 
        top_k=NUM_OF_ROWS, 
        use_query_checker=True, 
        verbose=debug, 
        return_intermediate_steps=showSQL
    )

    result = db_chain(query)
    
    if debug:
        print("\n\n####DEBUG####")
        print(result)
    else:
        if showSQL:     
            print(result['intermediate_steps'][0]['input'] + result['result'])
        else: 
            print ("query: " + result['query'])
            print ("result: " + result['result'])
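
To see what the model receives, the placeholders in the template can be filled by hand. The sketch below uses plain `str.format` on a shortened copy of the template rather than LangChain's `PromptTemplate`; the `table_info` text and the sample question are stand-ins made up for illustration (LangChain derives the real `table_info` from the database).

```python
# Shortened copy of the notebook's template, with the same three placeholders.
TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.

Only use the following tables:

{table_info}

Question: {input}"""

# Stand-in values; in the notebook these come from the chain and the database.
prompt_text = TEMPLATE.format(
    dialect="sqlite",
    table_info='CREATE TABLE "listings" ("Dealership" TEXT, "City" TEXT, "MAKE" TEXT)',
    input="How many listings are there?",
)
print(prompt_text)
```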

        

    

In [218]:
from langchain.chains import SQLDatabaseSequentialChain

def answerMeSQL(query, db_uri = LISTINGS_DB_URI, showSQL = SHOW_SQL, debug = DEBUG):
    # Load Database and Large Language Model

    db = SQLDatabase.from_uri(db_uri)
    llm = lcOpenAI(temperature=0, verbose=debug)

    _DEFAULT_TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
    Use the following format:

    Question: "Question here"
    SQLQuery: "SQL Query to run"
    SQLResult: "Result of the SQLQuery"
    Answer: "Final answer here"

    Only use the following tables:

    {table_info}

    If someone asks for the table foobar, they really mean the employee table.

    Question: {input}"""
    # The SQL chain supplies the question to the prompt under the key "input",
    # as in answerMe above.
    PROMPT = PromptTemplate(
        input_variables=["input", "table_info", "dialect"], template=_DEFAULT_TEMPLATE
    )

    db_chain = SQLDatabaseSequentialChain.from_llm(
        llm, 
        db, 
        query_prompt=PROMPT, 
        top_k=NUM_OF_ROWS, 
        use_query_checker=True, 
        verbose=debug, 
        return_intermediate_steps=True
    )
        
    
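    # The chain object is named `db_chain` (`chain` was never defined). Calling it
    # directly returns a dict, which also works when return_intermediate_steps=True
    # adds a second output key.
    result = db_chain(query)
    
    if not debug:
        print("query: " + query)
        print("result: " + result["result"])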

        

    

### 4. Query

In [221]:
answerMeSQL("What do you have", debug=True)

> Entering new SQLDatabaseSequentialChain chain...
Table names to use:
['MediaType', 'Employee', 'Invoice', 'Customer', 'Track', 'Album', 'PlaylistTrack', 'Playlist', 'InvoiceLine']

> Entering new SQLDatabaseChain chain...
What do you have
SQLQuery: SELECT Name FROM MediaType LIMIT 5;
SQLResult: [('MPEG audio file',), ('Protected AAC audio file',), ('Protected MPEG-4 video file',), ('Purchased AAC audio file',), ('AAC audio file',)]
Answer: We have MPEG audio file, Protected AAC audio file, Protected MPEG-4 video file, Purchased AAC audio file, and AAC audio file.
> Finished chain.

> Finished chain.


In [220]:
answerMeSQL("What are some example tracks by Bach?", db_uri = CHINOOK_DB_URI)

> Entering new SQLDatabaseSequentialChain chain...
Table names to use:
['Track', 'Artist', 'Album']

> Entering new SQLDatabaseChain chain...
What are some example tracks by Bach?
SQLQuery: SELECT Name, AlbumId, Composer FROM Track WHERE Composer LIKE '%Bach%' LIMIT 5;
SQLResult: [('American Woman', 141, 'B. Cummings/G. Peterson/M.J. Kale/R. Bachman'), ('Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 276, 'Johann Sebastian Bach'), ('Aria Mit 30 Veränderungen, BWV 988 "Goldberg Variations": Aria', 277, 'Johann Sebastian Bach'), ('Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude', 278, 'Johann Sebastian Bach'), ('Toccata and Fugue in D Minor, BWV 565: I. Toccata', 297, 'Johann Sebastian Bach')]
Answer: Some example tracks by Bach are 'American Woman', 'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Aria Mit 30 Veränderungen, BWV 988 "Goldberg Variations": Aria', 'Suite for Solo

In [195]:
answerMeSQL("How many employees are there in the employee table?")

> Entering new SQLDatabaseSequentialChain chain...
Table names to use:
['Employee']

> Entering new SQLDatabaseChain chain...
How many employees are there in the employee table?
SQLQuery: SELECT COUNT(*) FROM Employee;
SQLResult: [(8,)]
Answer: There are 8 employees in the employee table.
> Finished chain.

> Finished chain.
query: How many employees are there in the employee table?
result: There are 8 employees in the employee table.


In [196]:
answerMeSQL("How many employees are there?")

> Entering new SQLDatabaseSequentialChain chain...
Table names to use:
['Employee']

> Entering new SQLDatabaseChain chain...
How many employees are there?
SQLQuery: SELECT COUNT(*) FROM Employee;
SQLResult: [(8,)]
Answer: There are 8 employees.
> Finished chain.

> Finished chain.
query: How many employees are there?
result: There are 8 employees.


In [197]:
answerMeSQL("How many customers are from Brazil?")

> Entering new SQLDatabaseSequentialChain chain...
Table names to use:
['Customer', 'Invoice']

> Entering new SQLDatabaseChain chain...
How many customers are from Brazil?
SQLQuery: SELECT COUNT(*) FROM Customer WHERE Country = 'Brazil';
SQLResult: [(5,)]
Answer: There are 5 customers from Brazil.
> Finished chain.

> Finished chain.
query: How many customers are from Brazil?
result: There are 5 customers from Brazil.


In [198]:
answerMeSQL("How many albums by Aerosmith?")

> Entering new SQLDatabaseSequentialChain chain...
Table names to use:
['Album', 'Artist']

> Entering new SQLDatabaseChain chain...
How many albums by Aerosmith?
SQLQuery: SELECT COUNT(*) FROM Album WHERE ArtistId = 3;
SQLResult: [(1,)]
Answer: Aerosmith has 1 album.
> Finished chain.

> Finished chain.
query: How many albums by Aerosmith?
result: Aerosmith has 1 album.


In [199]:
answerMeSQL("What are some example tracks by composer Johann Sebastian Bach?")

> Entering new SQLDatabaseSequentialChain chain...
Table names to use:
['Track', 'Artist', 'PlaylistTrack']

> Entering new SQLDatabaseChain chain...
What are some example tracks by composer Johann Sebastian Bach?
SQLQuery: SELECT Name, Composer FROM Track WHERE Composer LIKE '%Johann Sebastian Bach%' LIMIT 5;
SQLResult: [('Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Johann Sebastian Bach'), ('Aria Mit 30 Veränderungen, BWV 988 "Goldberg Variations": Aria', 'Johann Sebastian Bach'), ('Suite for Solo Cello No. 1 in G Major, BWV 1007: I. Prélude', 'Johann Sebastian Bach'), ('Toccata and Fugue in D Minor, BWV 565: I. Toccata', 'Johann Sebastian Bach'), ('Concerto No.2 in F Major, BWV1047, I. Allegro', 'Johann Sebastian Bach')]
Answer: Some example tracks by composer Johann Sebastian Bach are 'Concerto for 2 Violins in D Minor, BWV 1043: I. Vivace', 'Aria Mit 30 Veränderungen, BWV 988 "Goldberg Va

In [200]:
answerMeSQL("How many customers are from Brazil?", showSQL=True)

> Entering new SQLDatabaseSequentialChain chain...
Table names to use:
['Customer', 'Invoice']

> Entering new SQLDatabaseChain chain...
How many customers are from Brazil?
SQLQuery: SELECT COUNT(*) FROM Customer WHERE Country = 'Brazil';
SQLResult: [(5,)]
Answer: There are 5 customers from Brazil.
> Finished chain.

> Finished chain.
query: How many customers are from Brazil?
result: There are 5 customers from Brazil.


In [201]:
answerMeSQL("How many customers are from Brazil?", debug=True)

> Entering new SQLDatabaseSequentialChain chain...
Table names to use:
['Customer', 'Invoice']

> Entering new SQLDatabaseChain chain...
How many customers are from Brazil?
SQLQuery: SELECT COUNT(*) FROM Customer WHERE Country = 'Brazil';
SQLResult: [(5,)]
Answer: There are 5 customers from Brazil.
> Finished chain.

> Finished chain.


In [202]:
answerMeSQL("How many customers are not from Brazil?")

> Entering new SQLDatabaseSequentialChain chain...
Table names to use:
['Customer', 'Invoice']

> Entering new SQLDatabaseChain chain...
How many customers are not from Brazil?
SQLQuery: SELECT COUNT(*) FROM Customer WHERE Country != 'Brazil';
SQLResult: [(54,)]
Answer: 54 customers are not from Brazil.
> Finished chain.

> Finished chain.
query: How many customers are not from Brazil?
result: 54 customers are not from Brazil.


In [203]:
answerMeSQL("How many customers are there in total?")

> Entering new SQLDatabaseSequentialChain chain...
Table names to use:
['Customer']

> Entering new SQLDatabaseChain chain...
How many customers are there in total?
SQLQuery: SELECT COUNT(*) FROM Customer;
SQLResult: [(59,)]
Answer: There are 59 customers in total.
> Finished chain.

> Finished chain.
query: How many customers are there in total?
result: There are 59 customers in total.
