# text2sql based on llama2
## Task: Generate an SQL query for a given database from a natural-language question

Build a chain with:<br/> Question => LLM => SQL => DB => LLM => Answer
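
As a rough illustration of the chain above, here is a minimal sketch of the same flow with every external piece replaced by a stand-in. `fake_llm`, `answer_question` and the in-memory SQLite table are assumptions made for this example only; the notebook itself uses `llama2` and PostgreSQL.

```python
import sqlite3
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the model: returns canned text."""
    if prompt.startswith("Write a SQL query"):
        return "SELECT name FROM passenger ORDER BY id"
    # Second stage: echo the rows that were placed in the prompt.
    rows_line = next(
        line for line in prompt.splitlines() if line.startswith("Rows: ")
    )
    return "Passengers found: " + rows_line[len("Rows: "):]


def answer_question(
    question: str, llm: Callable[[str], str], conn: sqlite3.Connection
) -> str:
    # Question => LLM => SQL
    sql = llm(f"Write a SQL query for: {question}")
    # SQL => DB
    rows = conn.execute(sql).fetchall()
    # DB => LLM => Answer
    return llm(f"Question: {question}\nSQL: {sql}\nRows: {rows}\nAnswer:")


# SQLite is used here only so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passenger (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO passenger (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)

answer = answer_question("Who are the passengers?", fake_llm, conn)
print(answer)
```

The model is called twice: once to produce the SQL and once to phrase the answer from the returned rows.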

(This file is for experimentation and model creation)

### Imports

In [1]:
from langchain_community.llms.ollama import Ollama

from langchain_community.vectorstores.pgvector import PGVector
from langchain_community.utilities.sql_database import SQLDatabase

from langchain_community.embeddings.ollama import OllamaEmbeddings

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

from langchain.chains.sql_database import query

from pswrd import PASSWORD_OF_DB
from pswrd import PASSWORD_FOR_VC_CREATOR

### Set up llama2 (served by Ollama)

In [2]:
model = Ollama(model="llama2", temperature=0)

## The first idea:
Add a **prompt template** that gives `llama2` the database structure as context, together with the natural-language query to translate.

#### First, add the DB structure

In [3]:
DB_STRUCTURE = \
    ["""
    CREATE TABLE trip(
        id BIGSERIAL PRIMARY KEY,
        company BIGINT, 
        plane CHARACTER VARYING(60),
        town_from CHARACTER VARYING(60),
        town_to CHARACTER VARYING(60),
        time_out TIMESTAMP,
        time_in TIMESTAMP,
        CONSTRAINT FK_company FOREIGN KEY (\"company\") REFERENCES public.company (id)
    ); 
    """,
     """
    CREATE TABLE company(
        id BIGSERIAL PRIMARY KEY,
        name CHARACTER VARYING(60) 
    );
    """,
     """
    CREATE TABLE pass_in_trip(
        id BIGSERIAL PRIMARY KEY,
        trip BIGINT,
        passenger BIGINT,
        place CHARACTER VARYING(60),
        CONSTRAINT FK_trip FOREIGN KEY (\"trip\") REFERENCES public.trip (id),
        CONSTRAINT FK_passanger FOREIGN KEY (\"passenger\") REFERENCES public.passenger (id)
    );
    """,
     """
    CREATE TABLE passenger(
        id BIGSERIAL PRIMARY KEY,
        name CHARACTER VARYING(60)
    );
    """
    ]

In [4]:
CONNECTION_STRING = PGVector.connection_string_from_db_params(
    driver="psycopg2",
    host="localhost",
    port=5433,
    database="llama-test",
    user="pgvc_embeddings_creator",
    password=PASSWORD_FOR_VC_CREATOR,
)
COLLECTION_NAME = "text2sql_vc"

In [5]:
embeddings = OllamaEmbeddings(model="llama2")

In [6]:
structure_retriver = PGVector.from_texts(
    embedding=embeddings,
    texts=DB_STRUCTURE,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
    pre_delete_collection=True,
    use_jsonb=True
).as_retriever()

#### Now create a template

In [7]:
template = \
"""
Translate the following query to sql using the following database structure. 
{structure}
As an answer, provide an sql query for postgresql.
Query to translate: {query}
"""

prompt_with_db_structure = ChatPromptTemplate.from_template(template)

In [8]:
model_with_structure_context = (
    {"structure": structure_retriver, "query": RunnablePassthrough()}
    | prompt_with_db_structure
    | model
    | StrOutputParser()
)
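
In the chain above, the retriever hands the prompt a list of document objects rather than plain text, so the `{structure}` slot is presumably filled with their string representation. One possible refinement is to join the retrieved texts first. The sketch below assumes only that each document exposes a `page_content` attribute; `Doc` is a hypothetical stand-in used to keep the example self-contained, and `format_docs` is a helper name introduced here.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str


def format_docs(docs) -> str:
    """Join the retrieved DDL snippets into one clean text block."""
    return "\n\n".join(
        textwrap.dedent(doc.page_content).strip() for doc in docs
    )


docs = [
    Doc("\n    CREATE TABLE company(id BIGSERIAL PRIMARY KEY);\n    "),
    Doc("CREATE TABLE passenger(id BIGSERIAL PRIMARY KEY);"),
]
structure_text = format_docs(docs)
print(structure_text)
```

In the chain, this could be wired in as `structure_retriver | format_docs`, assuming LCEL's usual coercion of plain functions into runnables.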

In [9]:
res_query = model_with_structure_context.invoke("Select the names of all the people who are in the airline database")
res_query

" Sure! Here's an SQL query that translates the given query for PostgreSQL:\n```sql\nSELECT name\nFROM passenger\nJOIN trip ON passenger.id = trip.passenger\nJOIN company ON trip.company = company.id;\n```\nExplanation:\n\n* The `SELECT` clause selects the `name` column from the `passenger`, `trip`, and `company` tables.\n* The `JOIN` clause performs a left outer join between the `passenger` and `trip` tables on the `id` column, and then performs a left outer join between the `trip` and `company` tables on the `company` column. This allows us to retrieve the names of all the people who are in the airline database.\n\nNote: In PostgreSQL, the `JOIN` clause can be written using the `USING` keyword instead of `ON`. For example:\n```sql\nSELECT name\nFROM passenger\nJOIN trip USING (id)\nJOIN company USING (id);\n```"

In [10]:
print(res_query)

 Sure! Here's an SQL query that translates the given query for PostgreSQL:
```sql
SELECT name
FROM passenger
JOIN trip ON passenger.id = trip.passenger
JOIN company ON trip.company = company.id;
```
Explanation:

* The `SELECT` clause selects the `name` column from the `passenger`, `trip`, and `company` tables.
* The `JOIN` clause performs a left outer join between the `passenger` and `trip` tables on the `id` column, and then performs a left outer join between the `trip` and `company` tables on the `company` column. This allows us to retrieve the names of all the people who are in the airline database.

Note: In PostgreSQL, the `JOIN` clause can be written using the `USING` keyword instead of `ON`. For example:
```sql
SELECT name
FROM passenger
JOIN trip USING (id)
JOIN company USING (id);
```


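The answer looks plausible, but it is wrong: the model is hallucinating. The query joins on `trip.passenger`, a column that does not exist in the `trip` table (passengers are linked to trips through `pass_in_trip`), and the unqualified `name` is ambiguous because both `passenger` and `company` have a `name` column.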
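
The reply also wraps the SQL in prose and markdown fences, so it cannot be sent to the database as-is. A minimal sketch of pulling out the fenced SQL follows; `extract_sql_blocks` is a helper name introduced here, and it assumes the model keeps using triple-backtick fences as in the output above.

```python
import re

# Built from single backticks so this example does not contain a literal fence.
FENCE = "`" * 3
FENCE_RE = re.compile(
    FENCE + r"(?:sql)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_sql_blocks(reply: str) -> list[str]:
    """Return the contents of every fenced (optionally sql-tagged) block."""
    return [block.strip() for block in FENCE_RE.findall(reply)]


# Shortened, made-up reply in the same shape as the model output.
reply = (
    " Sure! Here's an SQL query:\n"
    + FENCE + "sql\nSELECT name\nFROM passenger;\n" + FENCE
    + "\nExplanation:\n\n* The `SELECT` clause selects the `name` column.\n"
)

blocks = extract_sql_blocks(reply)
print(blocks[0])
```

Extraction only isolates the statement; it does not make a hallucinated query correct.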

## The second idea: Use the LangChain SQL query template

### Create a new connection with read-only privileges

In [11]:
CONNECTION_STRING = PGVector.connection_string_from_db_params(
    driver="psycopg2",
    host="localhost",
    port=5433,
    database="llama-test-2",
    user="seq2sql_llama2_rag",
    password=PASSWORD_OF_DB,
)
COLLECTION_NAME = "table_with_data_to_read"
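
The resulting string is a SQLAlchemy-style URL of the form `postgresql+driver://user:password@host:port/database`. As far as I know, the helper interpolates the values without escaping them, so a password containing characters such as `@` or `/` could break the URL. The sketch below builds the same shape of URL with the credentials percent-encoded; `build_pg_url` and the sample password are made up for illustration.

```python
from urllib.parse import quote_plus


def build_pg_url(
    driver: str, host: str, port: int, database: str, user: str, password: str
) -> str:
    """Build a SQLAlchemy-style PostgreSQL URL with escaped credentials."""
    return (
        f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# "p@ss/word" is a placeholder, not the real password.
url = build_pg_url(
    driver="psycopg2",
    host="localhost",
    port=5433,
    database="llama-test-2",
    user="seq2sql_llama2_rag",
    password="p@ss/word",
)
print(url)
```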

In [12]:
db = SQLDatabase.from_uri(CONNECTION_STRING)

In [13]:
print(db.table_info)


CREATE TABLE company (
	id BIGSERIAL NOT NULL, 
	name VARCHAR(60), 
	CONSTRAINT company_pkey PRIMARY KEY (id)
)

/*
3 rows from company table:
id	name

*/


CREATE TABLE pass_in_trip (
	id BIGSERIAL NOT NULL, 
	trip BIGINT, 
	passenger BIGINT, 
	place VARCHAR(60), 
	CONSTRAINT pass_in_trip_pkey PRIMARY KEY (id), 
	CONSTRAINT fk_passanger FOREIGN KEY(passenger) REFERENCES passenger (id), 
	CONSTRAINT fk_trip FOREIGN KEY(trip) REFERENCES trip (id)
)

/*
3 rows from pass_in_trip table:
id	trip	passenger	place

*/


CREATE TABLE passenger (
	id BIGSERIAL NOT NULL, 
	name VARCHAR(60), 
	CONSTRAINT passenger_pkey PRIMARY KEY (id)
)

/*
3 rows from passenger table:
id	name

*/


CREATE TABLE trip (
	id BIGSERIAL NOT NULL, 
	company BIGINT, 
	plane VARCHAR(60), 
	town_from VARCHAR(60), 
	town_to VARCHAR(60), 
	time_out TIMESTAMP WITHOUT TIME ZONE, 
	time_in TIMESTAMP WITHOUT TIME ZONE, 
	CONSTRAINT trip_pkey PRIMARY KEY (id), 
	CONSTRAINT fk_company FOREIGN KEY(company) REFERENCES company (id

### Create seq2sql chain

In [14]:
sql_query_chain = query.create_sql_query_chain(model, db)

Check the prompt template used by the SQL chain

In [15]:
print(sql_query_chain.get_prompts()[0].template)

You are a PostgreSQL expert. Given an input question, first create a syntactically correct PostgreSQL query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per PostgreSQL. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (") to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Pay attention to use CURRENT_DATE function to get the current date, if the question involves "today".

Use the following format:

Question: Question here
SQLQuery: SQL Query to ru
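
Some of the rules in this prompt can be spot-checked mechanically on whatever the model returns. The sketch below is a set of rough heuristics, not a SQL parser; `check_template_rules` is a name introduced here, and the three checks (no `SELECT *`, a `LIMIT` clause, at least one double-quoted identifier) are my reading of the prompt text above.

```python
import re


def check_template_rules(sql: str) -> list[str]:
    """Return a list of prompt rules the query appears to violate."""
    problems = []
    if re.search(r"\bselect\s+\*", sql, re.IGNORECASE):
        problems.append("selects all columns")
    if not re.search(r"\blimit\s+\d+", sql, re.IGNORECASE):
        problems.append("no LIMIT clause")
    if '"' not in sql:
        problems.append("no double-quoted identifiers")
    return problems


# Made-up queries used only to exercise the checks.
good = check_template_rules('SELECT "name" FROM passenger LIMIT 5')
bad = check_template_rules("SELECT * FROM passenger")
print(good)
print(bad)
```

A missing `LIMIT` is not always a violation, since the prompt allows the user to ask for a specific number of results, so a flagged query still needs a human look.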

In [16]:
res = sql_query_chain.invoke({"question": "Select the names of ALL the people who are in the airline database"})
print(res)

Question: Select the names of ALL the people who are in the airline database

SQLQuery: SELECT name FROM passenger WHERE EXISTS (SELECT 1 FROM pass_in_trip WHERE passenger = id);


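The model is hallucinating again and gives an incorrect answer: the question asks for all names, yet the query filters `passenger` through an `EXISTS` subquery on `pass_in_trip`. It also ignores some of the template's requirements: the column names are not wrapped in double quotes and there is no `LIMIT` clause.
<br/>Conclusion: **LLaMa2 is not up to the task because of its size**.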
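
To see how far the generated query is from the presumably intended `SELECT name FROM passenger`, the sketch below runs both against a tiny in-memory copy of the two tables. SQLite is used only so the example runs without a server, and the rows are made up. Inside the subquery, the unqualified `id` binds to `pass_in_trip.id` rather than to the outer `passenger.id`, so the condition compares two columns of the same row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE passenger (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pass_in_trip (
        id INTEGER PRIMARY KEY, trip INTEGER, passenger INTEGER, place TEXT
    );
    INSERT INTO passenger (id, name)
        VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO pass_in_trip (id, trip, passenger, place)
        VALUES (10, 1, 1, '1a'), (11, 1, 2, '1b');
    """
)

# The query produced by the model (trailing semicolon dropped).
generated = (
    "SELECT name FROM passenger "
    "WHERE EXISTS (SELECT 1 FROM pass_in_trip WHERE passenger = id)"
)
# The query the question presumably calls for.
expected = "SELECT name FROM passenger ORDER BY id"

generated_rows = [row[0] for row in conn.execute(generated)]
expected_rows = [row[0] for row in conn.execute(expected)]

print("generated:", generated_rows)
print("expected: ", expected_rows)
```

With this sample data the generated query returns no rows at all, because no `pass_in_trip` row has `passenger` equal to its own `id`.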