## Steps
1. In `utils.py`, instantiate `db` with:
```python
db_name = "dvdrental"
include_tables = ["actor", "film", "film_actor"]
```
2. Initialize the LLM.
3. Create a first chain that takes the user's question and returns a SQL query, using the database table schema.
4. Create a second chain that takes the SQL query from the first chain as input, executes it, and returns the answer in natural language.
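
The steps above can be sketched without LangChain or a running model. The snippet below only illustrates the data flow: it uses an in-memory SQLite table in place of the `dvdrental` PostgreSQL database, and `fake_sql_llm` and `fake_answer_llm` are hypothetical stand-ins for the two LLM calls.

```python
import sqlite3

# Tiny in-memory stand-in for the database (assumption: SQLite, not PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
INSERT INTO actor (first_name, last_name)
VALUES ('Penelope', 'Guiness'), ('Nick', 'Wahlberg');
""")


def get_schema_text() -> str:
    """Return the CREATE TABLE statements, which the first prompt would receive."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall()
    return "\n".join(row[0] for row in rows)


def fake_sql_llm(question: str, schema: str) -> str:
    """Hypothetical stand-in for the first chain; a real LLM would write this SQL."""
    return "SELECT COUNT(*) FROM actor"


def fake_answer_llm(question: str, query: str, response: str) -> str:
    """Hypothetical stand-in for the second chain's LLM call."""
    return f"{question} -> {response}"


def answer(question: str) -> str:
    schema = get_schema_text()
    query = fake_sql_llm(question, schema)             # step 3: question -> SQL
    response = str(conn.execute(query).fetchall())     # step 4a: run the SQL
    return fake_answer_llm(question, query, response)  # step 4b: SQL result -> answer


result = answer("Give me count of actors")
print(result)
```

In the notebook itself, both stand-ins are replaced by prompts piped into `llm`.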

In [1]:
# langchain
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# local utils
from utils import db, get_schema, custom_str_parser, handle_error_query

## Load LLM and define chains

In [2]:
# Name of a model that has already been pulled with Ollama
ollama_llm = "gemma:7b"
llm = ChatOllama(model=ollama_llm)

In [3]:
# ------------------------------------------------------------------------------------------------------------------------

# first chain
sql_chain_template = """Based on the table schema below, write a SQL query that would answer the user's question:
{schema}

Question: {question}
SQL Query:"""

sql_chain_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Given an input question, convert it to a SQL query. ",
        ),  # Brief context
        (
            "system",
            "Respond with only the SQL query, nothing else.",
        ),  # Emphasize desired output
        ("human", sql_chain_template),
    ]
)

sql_chain = (
    RunnablePassthrough.assign(schema=get_schema)
    | sql_chain_prompt
    | llm.bind(stop=["\nSQLResult:"])
    | StrOutputParser()
    | RunnableLambda(custom_str_parser)
)

# ------------------------------------------------------------------------------------------------------------------------

# second chain
template = """Based on the table schema below, question, sql query, and sql response, write a natural language response:
{schema}

Question: {question}
SQL Query: {query}
SQL Response: {response}"""  # noqa: E501

prompt_response = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Given an input question and SQL response, convert it to a natural "
            "language answer. No pre-amble.",
        ),
        ("human", template),
    ]
)

sql_run_chain = (
    RunnablePassthrough.assign(query=sql_chain)
    | RunnablePassthrough.assign(
        schema=get_schema, response=lambda x: handle_error_query(x)
    )
    | prompt_response
    | llm
    | StrOutputParser()
)
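
`handle_error_query` is imported from `utils.py` and its body is not shown in this notebook. As a rough idea of what such a helper might do, the sketch below runs the generated query and, if the database rejects it, returns the error text so that the second prompt still receives a string for `{response}`. It uses the standard-library `sqlite3` module in place of the notebook's `db` object; the name `handle_error_query_sketch` and its behaviour are assumptions, not the author's implementation.

```python
import sqlite3

# Toy database (assumption: stands in for the notebook's `db` object).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT)")
conn.execute("INSERT INTO actor (first_name) VALUES ('Penelope')")


def handle_error_query_sketch(inputs: dict) -> str:
    """Run inputs["query"]; return the rows as text, or the error message on failure."""
    try:
        rows = conn.execute(inputs["query"]).fetchall()
        return str(rows)
    except sqlite3.Error as exc:
        # Hand the error to the LLM as text instead of raising inside the chain.
        return f"Query failed: {exc}"


ok = handle_error_query_sketch({"query": "SELECT COUNT(*) FROM actor"})
bad = handle_error_query_sketch({"query": "SELECT * FROM no_such_table"})
print(ok)
print(bad)
```

Returning the error as text keeps the chain running, at the cost of letting the model phrase an answer around a failed query.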

## Run Queries

#### Test whether LangChain can access the database

In [4]:
print(db.get_table_info())


CREATE TABLE actor (
	actor_id SERIAL NOT NULL, 
	first_name VARCHAR(45) NOT NULL, 
	last_name VARCHAR(45) NOT NULL, 
	last_update TIMESTAMP WITHOUT TIME ZONE DEFAULT now() NOT NULL, 
	CONSTRAINT actor_pkey PRIMARY KEY (actor_id)
)


CREATE TABLE film (
	film_id SERIAL NOT NULL, 
	title VARCHAR(255) NOT NULL, 
	description TEXT, 
	release_year INTEGER, 
	language_id SMALLINT NOT NULL, 
	rental_duration SMALLINT DEFAULT 3 NOT NULL, 
	rental_rate NUMERIC(4, 2) DEFAULT 4.99 NOT NULL, 
	length SMALLINT, 
	replacement_cost NUMERIC(5, 2) DEFAULT 19.99 NOT NULL, 
	rating mpaa_rating DEFAULT 'G'::mpaa_rating, 
	last_update TIMESTAMP WITHOUT TIME ZONE DEFAULT now() NOT NULL, 
	special_features TEXT[], 
	fulltext TSVECTOR NOT NULL, 
	CONSTRAINT film_pkey PRIMARY KEY (film_id), 
	CONSTRAINT film_language_id_fkey FOREIGN KEY(language_id) REFERENCES language (language_id) ON DELETE RESTRICT ON UPDATE CASCADE
)


CREATE TABLE film_actor (
	actor_id SMALLINT NOT NULL, 
	film_id SMALLINT NOT NULL, 
	

In [5]:
db.run("""SELECT * FROM actor LIMIT 5""")

"[(1, 'Penelope', 'Guiness', datetime.datetime(2013, 5, 26, 14, 47, 57, 620000)), (2, 'Nick', 'Wahlberg', datetime.datetime(2013, 5, 26, 14, 47, 57, 620000)), (3, 'Ed', 'Chase', datetime.datetime(2013, 5, 26, 14, 47, 57, 620000)), (4, 'Jennifer', 'Davis', datetime.datetime(2013, 5, 26, 14, 47, 57, 620000)), (5, 'Johnny', 'Lollobrigida', datetime.datetime(2013, 5, 26, 14, 47, 57, 620000))]"

#### Run the first chain on its own to check that (1) it produces valid SQL queries and (2) the response contains only the query

In [6]:
sql_chain.invoke({"question": "Give me count of actors"})

'SELECT COUNT(*) AS actor_count\nFROM actor;'
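
The output above is the bare query with no surrounding text. In the chain, that cleanup is delegated to `custom_str_parser`, which is imported from `utils.py` and not shown here. A minimal sketch of one way such a parser could work, assuming the model sometimes wraps its SQL in a Markdown code fence (the function name and logic below are hypothetical):

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built here to avoid nesting fences

# Capture whatever sits between two fences, skipping an optional "sql" label.
FENCED_BLOCK = re.compile(
    FENCE + r"(?:sql)?\s*(.*?)" + FENCE, flags=re.DOTALL | re.IGNORECASE
)


def custom_str_parser_sketch(text: str) -> str:
    """Return only the SQL from a model reply (hypothetical helper)."""
    match = FENCED_BLOCK.search(text)
    if match:
        # Keep the fenced content and drop any chatter around it.
        text = match.group(1)
    return text.strip()


fenced_reply = "Sure, here is the query:\n" + FENCE + "sql\nSELECT COUNT(*) FROM actor;\n" + FENCE
plain_reply = "  SELECT COUNT(*) FROM actor;  "

print(custom_str_parser_sketch(fenced_reply))
print(custom_str_parser_sketch(plain_reply))
```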

In [7]:
sql_chain.invoke({"question": "How many films has Penelope starred in?"})

"SELECT COUNT(*) AS num_films\nFROM film_actor\nWHERE actor_id = (SELECT actor_id FROM actor WHERE first_name = 'Penelope' AND last_name = 'Cruise')\nGROUP BY actor_id"

In [8]:
sql_chain.invoke({"question": "How many films has Penelope Guiness starred in?"})

"SELECT COUNT(*) AS num_films\nFROM film_actor\nINNER JOIN actor ON film_actor.actor_id = actor.actor_id\nWHERE actor.first_name = 'Penelope' AND actor.last_name = 'Guiness'"

In [9]:
sql_chain.invoke(
    {"question": "Can you give me some movie names where Penelope Guiness starred?"}
)

"SELECT f.title\nFROM film_actor fa\nINNER JOIN film f ON fa.film_id = f.film_id\nINNER JOIN actor a ON fa.actor_id = a.actor_id\nWHERE a.first_name = 'Penelope' AND a.last_name = 'Guiness'"

In [10]:
sql_chain.invoke({"question": "Give me breakdown of films by release year"})

'SELECT release_year, COUNT(*) AS num_films\nFROM film\nGROUP BY release_year'

In [11]:
sql_chain.invoke({"question": "How many films were released in 2006?"})

'SELECT COUNT(*) AS num_films\nFROM film\nWHERE release_year = 2006;'

In [12]:
sql_chain.invoke(
    {"question": "Going by the description, can you give me names of movies on robots?"}
)

"SELECT f.title\nFROM film f\nINNER JOIN film_actor fa ON f.film_id = fa.film_id\nINNER JOIN actor a ON fa.actor_id = a.actor_id\nWHERE a.first_name = 'Will' OR a.last_name = 'Smith'"

In [13]:
sql_chain.invoke(
    {"question": "Give me names of movies where description contains Robot"}
)

"SELECT f.title\nFROM film f\nINNER JOIN film_actor fa ON f.film_id = fa.film_id\nINNER JOIN actor a ON fa.actor_id = a.actor_id\nWHERE f.description LIKE '%Robot%'"

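The last query answers the question, but it also joins `film_actor` and `actor`, which the question does not require. An inner join like this yields one row per film and actor pairing, so a title can appear several times, and a film with no `film_actor` rows would be dropped. The sketch below reproduces the effect on toy data with the standard-library `sqlite3` module; the table contents are made up for illustration and are not taken from `dvdrental`.

```python
import sqlite3

# Toy data in SQLite (assumption: stands in for the film, actor and film_actor tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, description TEXT);
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
INSERT INTO film VALUES (1, 'Example One', 'A story of a Robot'),
                        (2, 'Example Two', 'A story of a Dog');
INSERT INTO actor VALUES (1, 'A', 'One'), (2, 'B', 'Two'), (3, 'C', 'Three');
INSERT INTO film_actor VALUES (1, 1), (2, 1), (3, 1), (1, 2);
""")

# Same shape as the generated query: two joins that the question does not need.
joined = conn.execute("""
SELECT f.title
FROM film f
INNER JOIN film_actor fa ON f.film_id = fa.film_id
INNER JOIN actor a ON fa.actor_id = a.actor_id
WHERE f.description LIKE '%Robot%'
""").fetchall()

# Querying film alone returns each matching title once.
direct = conn.execute(
    "SELECT title FROM film WHERE description LIKE '%Robot%'"
).fetchall()

print(len(joined), len(direct))  # 3 1
```

Adding `DISTINCT` to the joined query would also remove the repeats, but the single-table query is simpler and keeps films that have no actors recorded.
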
### Invoking the second (main) chain on the same questions as above

In [16]:
print(sql_run_chain.invoke({"question": "Give me count of actors"}))

Sure, here is the natural language answer:

The query "SELECT COUNT(*) AS actor_count FROM actor" returns a single result, which is the count of actors in the database. The result is returned in a single row with a single column named "actor_count" and the value is 200.


In [17]:
print(sql_run_chain.invoke({"question": "How many films has Penelope starred in?"}))

Sure, here is the natural language answer:

Penelope Cruz has starred in a total of [num_films] films. This information is retrieved by querying the film_actor table, which associates actors with their respective films. The query selects the count of films for each actor, where the actor's first name and last name match Penelope Cruz. The results of the query are grouped by actor ID, and the total number of films for each actor is returned as the num_films value.
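
The unfilled `[num_films]` placeholder suggests the model had no number to report. A plausible cause, not confirmed by the notebook, is that the SQL generated for this question returned no rows. The first-chain test for the same question produced a query that filters on a last name the question never gave ('Cruise') and ends with `GROUP BY`; if the second run produced something similar, the result would be empty, because a grouped aggregate over zero matching rows yields no rows at all rather than a row containing 0. The sketch below shows that behaviour with `sqlite3` and made-up rows; PostgreSQL treats this case the same way.

```python
import sqlite3

# Toy data (assumption: only the columns needed for the demonstration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
INSERT INTO actor VALUES (1, 'Penelope', 'Guiness');
INSERT INTO film_actor VALUES (1, 1), (1, 2);
""")

# No actor has this last name, so the subquery evaluates to NULL.
subquery = (
    "(SELECT actor_id FROM actor "
    "WHERE first_name = 'Penelope' AND last_name = 'Cruise')"
)

with_group_by = conn.execute(
    f"SELECT COUNT(*) FROM film_actor WHERE actor_id = {subquery} GROUP BY actor_id"
).fetchall()
without_group_by = conn.execute(
    f"SELECT COUNT(*) FROM film_actor WHERE actor_id = {subquery}"
).fetchall()

print(with_group_by)     # []
print(without_group_by)  # [(0,)]
```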


In [18]:
print(
    sql_run_chain.invoke(
        {"question": "How many films has Penelope Guiness starred in?"}
    )
)

Sure, here is the natural language answer:

Penelope Guiness has starred in a total of nineteen films. This information is available in the film_actor table, which joins the actor and film tables to link actors with their respective films. The query selected the count of films for each actor, and the result showed that Penelope Guiness has starred in a total of nineteen films.


In [19]:
print(
    sql_run_chain.invoke(
        {"question": "Can you give me some movie names where Penelope Guiness starred?"}
    )
)

Sure, here is the natural language answer based on the provided information:

Penelope Guinness has starred in several movies. Based on the data available, the SQL query retrieved a list of movie names where she starred. The results show that Penelope Guinness starred in movies such as "Untitled."


In [20]:
print(sql_run_chain.invoke({"question": "Give me breakdown of films by release year"}))

Sure, here is the natural language answer:

The provided text describes a relational database schema with three tables: `actor`, `film`, and `film_actor`. This schema defines relationships between actors, films, and their involvement in various films.

The question asked is to provide a breakdown of films by release year. To answer this, an SQL query is written to group films by their release year and count the number of films for each year. The query results in a single row, with the release year and the number of films released in that year.

The SQL response shows that there is only one release year in the database: 2006, with a total of 1000 films released in that year.


In [21]:
print(
    sql_run_chain.invoke(
        {
            "question": "Going by the description, can you give me names of movies on robots?"
        }
    )
)

Sure, here is the natural language answer:

Based on the provided table schema and the question, the SQL query aims to find movies in which Will Smith has starred. The query joins the `film`, `film_actor`, and `actor` tables to associate movies with actors. The query filters actors by first name and last name, ensuring that only movies associated with Will Smith are retrieved. The result of the query is a list of movie titles.


In [22]:
print(
    sql_run_chain.invoke(
        {"question": "Give me names of movies where description contains Robot"}
    )
)

Sure, here is the natural language answer:

The SQL query you provided is to find movies where the description contains the word "Robot." The result of the query is a list of movie titles, which are:

- Agent Truman
- Alley Evolution
- Angels Life
- Artist Coldblooded
- Beauty Grease
- Bedazzled Married
- Blindness Gun
- Bonnie Holocaust
- Bride Intrigue
- Caddyshack Jedi
- California Birds
- Campus Remember
- Casper Dragonfly
- Chainsaw Uptown
- Chariots Conspiracy
- Citizen Shrek
- Clones Pinocchio
- Control Anthem
- Crossing Divorce
- Dalmations Sweden
- Dares Pluto
- Dinosaur Secretary
- Fiction Christmas
- Flatliners Killer
- Floats Garden
- Grapes Fury
- Greatest North
- Gunfighter Mussolini
- Hamlet Wisdom
- Hollow Jeopardy
- Holocaust Highball
- Hustler Party
- Illusion Amelie
- Insider Arizona
- Kane Exorcist
- Kick Savannah
- Kiss Glory
- Kramer Chocolate
- Lolita World
- Loser Hustler
- Lost Bird
- Lover Truman
- Luke Mummy
- Mask Peach
- Mine Titans
- Money Harold
- Nightma