# Tabular RAG with LM Studio
This notebook demonstrates how to use LM Studio for tabular Retrieval-Augmented Generation (TRAG): answering questions about a relational database by having the model generate SQL against it.

In [1]:
from utils.llm_client import lm_studio_client, LM_STUDIO_MODEL
from sqlalchemy import create_engine
import pandas as pd

Can the LLM respond without any context?
Let's ask a question without providing any context.

In [2]:
user_query = '''What's the release year of 'X'?'''

prompt = f"""
User Query: {user_query}
"""

response = lm_studio_client.chat.completions.create(
    model=LM_STUDIO_MODEL,
    messages=[
        {"role": "user", "content": prompt}
    ]
)

response_text = response.choices[0].message.content

print(response_text)

I’m happy to help! Could you let me know the full title of the work (movie, album, game, etc.) called “X” that you’re asking about?


Without context, the model cannot answer and asks which work 'X' refers to.
Next, we describe the movie database in the prompt and ask LM Studio to generate a SQL query for the user query instead.

In [3]:
user_query = '''What's the release year of 'X'?'''

prompt = f"""
You are an AI assistant that can answer questions about a movie database.
The database contains information about movies, directors, actors, studios, genres, cast members, and reviews.
Your task is to answer the user's query based on the provided database information.

User Query: {user_query}

Generate a SQL query to retrieve the requested information from the database.
"""

response = lm_studio_client.chat.completions.create(
    model=LM_STUDIO_MODEL,
    messages=[
        {"role": "user", "content": prompt}
    ]
)

response_text = response.choices[0].message.content

print(response_text)

```sql
-- Retrieve the release year for the movie titled 'X'
SELECT 
    EXTRACT(YEAR FROM release_date) AS release_year
FROM 
    movies
WHERE 
    title = 'X';
```

*Explanation:*  
- `release_date` is assumed to be a DATE/TIMESTAMP column that stores the full release date.  
- `EXTRACT(YEAR FROM ...)` pulls out just the year portion.  
- Adjust column names (`release_date`, `title`) if your schema uses different names.
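The model's query uses `EXTRACT(YEAR FROM ...)`, which is standard SQL but is not available in SQLite, the engine this notebook queries below; SQLite's date functions such as `strftime` fill that role. A minimal sketch against an in-memory table, where the `release_date` column is hypothetical and mirrors the model's guess rather than the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the model's guessed schema (release_date as ISO text).
conn.execute("CREATE TABLE movies (title TEXT, release_date TEXT)")
conn.execute("INSERT INTO movies VALUES ('X', '1989-07-14')")

generic_sql = (
    "SELECT EXTRACT(YEAR FROM release_date) AS release_year "
    "FROM movies WHERE title = 'X'"
)
try:
    conn.execute(generic_sql)
    extract_supported = True
except sqlite3.OperationalError as exc:
    # SQLite has no EXTRACT function, so the statement fails to parse.
    extract_supported = False
    print("SQLite error:", exc)

# SQLite equivalent: strftime('%Y', ...) returns the year as text, so cast it.
sqlite_sql = (
    "SELECT CAST(strftime('%Y', release_date) AS INTEGER) AS release_year "
    "FROM movies WHERE title = ?"
)
release_year = conn.execute(sqlite_sql, ("X",)).fetchone()[0]
print(release_year)  # 1989
```

Telling the model which SQL dialect to target (for example, stating that the database is SQLite) is one way to reduce this kind of mismatch.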


The model generated a SQL query, but it had to guess the schema (the `release_date` and `title` columns), and the query still has to be executed against the database to get an answer.
Let's connect to the SQLite database and pick a real movie title to ask about.

In [4]:
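# Connect to the local SQLite movie database
engine = create_engine('sqlite:///datasets/movies.db')

# Pick a random movie title to use in the sample questions
movie_name = pd.read_sql('select title from movies', engine).sample(1).title.values[0]

print('Working with Movie Name: ', movie_name)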

Working with Movie Name:  Food Dogs Of The Age Music Theory
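The sampled title is spliced into each question, and the model echoes it back into its SQL as a single-quoted string literal. That works for this title, but a title containing an apostrophe would end the literal early and break the query. A minimal sketch with the standard `sqlite3` module (the title and one-table schema are hypothetical) comparing a naive literal, an escaped literal, and parameter binding:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, release_year INTEGER)")
title = "Schindler's List"  # hypothetical title containing an apostrophe
conn.execute("INSERT INTO movies VALUES (?, ?)", (title, 1993))

# Naive splicing, as a generated query might look: the apostrophe closes the literal early.
naive_sql = f"SELECT release_year FROM movies WHERE title = '{title}'"
try:
    conn.execute(naive_sql)
    naive_ok = True
except sqlite3.OperationalError:
    naive_ok = False

# Escaping: inside an SQL string literal, a single quote is written as two single quotes.
escaped = title.replace("'", "''")
escaped_year = conn.execute(
    f"SELECT release_year FROM movies WHERE title = '{escaped}'"
).fetchone()[0]

# Parameter binding sidesteps quoting entirely.
bound_year = conn.execute(
    "SELECT release_year FROM movies WHERE title = ?", (title,)
).fetchone()[0]

print(naive_ok, escaped_year, bound_year)
```

Binding is the most robust option, but it would require prompting the model to emit `?` placeholders and passing the title separately, which this notebook does not do.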


# Now we ask the LLM to generate SQL queries for several questions about this movie.

In [5]:
# Sample Queries
prompt = """
You are an AI assistant that can answer questions about a movie database.
The database contains information about movies, directors, actors, studios, genres, cast members, and reviews.
Your task is to answer the user's query based on the provided database information.

User Query: {user_query}

Generate a SQL query to retrieve the requested information from the database.
database: sqlite:///datasets/movies.db
"""

user_queries = [
    f"What's the release year of '{movie_name}'?",
    f"Who directed the movie '{movie_name}'?",
    f"List all actors in the movie '{movie_name}'.",
    f"What are the genres of the movie '{movie_name}'?",
]
sql_queries = []
for user_query in user_queries:
    response = lm_studio_client.chat.completions.create(
        model=LM_STUDIO_MODEL,
        messages=[
            {"role": "user", "content": prompt.format(user_query=user_query)}
        ]
    )

    resp = response.choices[0].message.content
    sql = resp.strip().split('```sql')[1].split(
        '```')[0].strip()  # Extract SQL query from response
    sql_queries.append(sql)
    print(user_query, ':', sql)
    print('-' * 80)
    print()

What's the release year of 'Food Dogs Of The Age Music Theory'? : SELECT release_year
FROM movies
WHERE title = 'Food Dogs Of The Age Music Theory';
--------------------------------------------------------------------------------

Who directed the movie 'Food Dogs Of The Age Music Theory'? : -- Retrieve the director of a specific movie
SELECT d.name AS director_name
FROM movies m
JOIN directors d ON m.director_id = d.id
WHERE m.title = 'Food Dogs Of The Age Music Theory';
--------------------------------------------------------------------------------

List all actors in the movie 'Food Dogs Of The Age Music Theory'. : SELECT DISTINCT a.name
FROM actors AS a
JOIN cast_members AS cm   ON a.id = cm.actor_id
JOIN movies      AS m     ON cm.movie_id = m.id
WHERE m.title = 'Food Dogs Of The Age Music Theory';
--------------------------------------------------------------------------------

What are the genres of the movie 'Food Dogs Of The Age Music Theory'? : SELECT g.name AS genre
FROM mo
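The extraction in the loop above assumes every reply contains a fenced block tagged `sql`; if the model answers with an untagged fence or plain SQL, `split(...)[1]` raises `IndexError` and the loop stops. A more forgiving extractor, sketched with a regular expression (the `extract_sql` helper is hypothetical, not part of `utils.llm_client`):

```python
import re

# A fenced block, optionally tagged "sql" (any case), followed by a newline;
# the body is captured lazily up to the next closing fence.
FENCE_RE = re.compile(r"```(?:sql)?[ \t]*\n(.*?)```", re.DOTALL | re.IGNORECASE)


def extract_sql(response_text: str) -> str:
    """Return the SQL from an LLM reply.

    Prefers the first sql-tagged or untagged fenced block; if there is no
    fence, falls back to the whole reply. Raises ValueError if nothing is left.
    """
    match = FENCE_RE.search(response_text)
    sql = (match.group(1) if match else response_text).strip()
    if not sql:
        raise ValueError("no SQL found in model response")
    return sql


print(extract_sql("Here you go:\n```sql\nSELECT title FROM movies;\n```\nNotes..."))
```

In the loop, `sql = extract_sql(resp)` would replace the two-line `split` expression. A fence tagged with another language (for example `python`) is not matched, so such a reply falls back to the full text and will fail later at execution time rather than at extraction.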

Let's execute the SQL queries against the database and get the results.

In [6]:
for sql in sql_queries:
    try:
        print(pd.read_sql(sql, engine).to_markdown(index=False))
    except Exception as e:
        print(f"Error executing query: {e}")
    print('-' * 80)
    print()

|   release_year |
|---------------:|
|           1989 |
--------------------------------------------------------------------------------

Error executing query: (sqlite3.OperationalError) no such column: d.name
[SQL: -- Retrieve the director of a specific movie
SELECT d.name AS director_name
FROM movies m
JOIN directors d ON m.director_id = d.id
WHERE m.title = 'Food Dogs Of The Age Music Theory';]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
--------------------------------------------------------------------------------

Error executing query: (sqlite3.OperationalError) no such column: a.name
[SQL: SELECT DISTINCT a.name
FROM actors AS a
JOIN cast_members AS cm   ON a.id = cm.actor_id
JOIN movies      AS m     ON cm.movie_id = m.id
WHERE m.title = 'Food Dogs Of The Age Music Theory';]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
--------------------------------------------------------------------------------

Error executing query: (sqlite3.Operationa

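The director and actor queries failed because the model guessed column names (`d.name`, `a.name`) that do not exist. Let's retry, this time giving the model the schema details (a data dictionary).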
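The hand-written data dictionary in the next cell carries descriptions that the database itself cannot supply, but the bare table and column names can be read straight from SQLite metadata, which keeps the prompt in sync with the real schema. A minimal sketch, assuming a SQLite connection; `describe_schema` is a hypothetical helper, and the two tables are a trimmed-down subset of the schema described in the next cell:

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> str:
    """Return one line per table, 'name(col TYPE, ...)', read from SQLite metadata."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        cols = ", ".join(f"{name} {ctype}" for _, name, ctype, *_ in columns)
        lines.append(f"{table}({cols})")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE directors (director_id INTEGER PRIMARY KEY, director_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT NOT NULL, "
    "release_year INTEGER, director_id INTEGER REFERENCES directors(director_id))"
)
schema_text = describe_schema(conn)
print(schema_text)
```

The generated summary could be combined with the hand-written descriptions, so the model sees both the exact column names and their meaning.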

In [8]:
database_schema = '''

#### 1\. `movies` Table

  * **Description:** Stores core information about each movie.
  * **Columns:**
      * `movie_id` (INTEGER, Primary Key): Unique identifier for the movie.
      * `title` (TEXT, NOT NULL): The title of the movie.
      * `release_year` (INTEGER): The year the movie was released.
      * `duration_minutes` (INTEGER): The length of the movie in minutes.
      * `synopsis` (TEXT): A brief summary of the movie's plot.
      * `budget_usd` (REAL): The production budget of the movie in USD.
      * `box_office_gross_usd` (REAL): The worldwide box office revenue in USD.
      * `director_id` (INTEGER, Foreign Key to `directors.director_id`): The director of the movie.
      * `studio_id` (INTEGER, Foreign Key to `studios.studio_id`): The studio that produced the movie.

#### 2\. `actors` Table

  * **Description:** Stores information about individual actors.
  * **Columns:**
      * `actor_id` (INTEGER, Primary Key): Unique identifier for the actor.
      * `actor_name` (TEXT, NOT NULL): The full name of the actor.
      * `birth_date` (TEXT): The actor's birth date (YYYY-MM-DD format).
      * `nationality` (TEXT): The actor's nationality.

#### 3\. `directors` Table

  * **Description:** Stores information about movie directors.
  * **Columns:**
      * `director_id` (INTEGER, Primary Key): Unique identifier for the director.
      * `director_name` (TEXT, NOT NULL): The full name of the director.
      * `birth_date` (TEXT): The director's birth date (YYYY-MM-DD format).
      * `nationality` (TEXT): The director's nationality.

#### 4\. `studios` Table

  * **Description:** Stores information about movie production studios.
  * **Columns:**
      * `studio_id` (INTEGER, Primary Key): Unique identifier for the studio.
      * `studio_name` (TEXT, NOT NULL): The name of the studio.
      * `founded_year` (INTEGER): The year the studio was founded.
      * `country` (TEXT): The country where the studio is primarily based.

#### 5\. `genres` Table

  * **Description:** Stores a list of movie genres.
  * **Columns:**
      * `genre_id` (INTEGER, Primary Key): Unique identifier for the genre.
      * `genre_name` (TEXT, NOT NULL, UNIQUE): The name of the genre (e.g., "Action", "Drama", "Sci-Fi").

#### 6\. `cast_members` Table (Junction Table)

  * **Description:** Links movies to actors, specifying the role an actor played in a particular movie. This handles the many-to-many relationship between `movies` and `actors`.
  * **Columns:**
      * `cast_id` (INTEGER, Primary Key): Unique identifier for a cast entry.
      * `movie_id` (INTEGER, Foreign Key to `movies.movie_id`): The movie ID.
      * `actor_id` (INTEGER, Foreign Key to `actors.actor_id`): The actor ID.
      * `role_name` (TEXT): The name of the character played by the actor in that movie.

#### 7\. `movie_genres` Table (Junction Table)

  * **Description:** Links movies to genres, allowing a movie to have multiple genres. This handles the many-to-many relationship between `movies` and `genres`.
  * **Columns:**
      * `movie_id` (INTEGER, Foreign Key to `movies.movie_id`): The movie ID.
      * `genre_id` (INTEGER, Foreign Key to `genres.genre_id`): The genre ID.
      * (Composite Primary Key: `movie_id`, `genre_id` to ensure unique genre assignments per movie).

#### 8\. `reviews` Table

  * **Description:** Stores user reviews and ratings for movies.
  * **Columns:**
      * `review_id` (INTEGER, Primary Key): Unique identifier for the review.
      * `movie_id` (INTEGER, Foreign Key to `movies.movie_id`): The movie being reviewed.
      * `reviewer_name` (TEXT): The name or alias of the reviewer.
      * `rating` (INTEGER): The numerical rating given (e.g., 1-10).
      * `review_text` (TEXT): The full text of the review.
      * `review_date` (TEXT): The date the review was posted (YYYY-MM-DD format).

'''

# Sample Queries
prompt = """
You are an AI assistant that can answer questions about a movie database.
The database contains information about movies, directors, actors, studios, genres, cast members, and reviews.
Your task is to answer the user's query based on the provided database information.

User Query: {user_query}

Generate a SQL query to retrieve the requested information from the database.
database: sqlite:///datasets/movies.db
schema details:
{database_schema}
"""

sql_queries = []
for user_query in user_queries:
    response = lm_studio_client.chat.completions.create(
        model=LM_STUDIO_MODEL,
        messages=[
            {"role": "user", "content": prompt.format(user_query=user_query, database_schema=database_schema)}
        ]
    )

    resp = response.choices[0].message.content
    sql = resp.strip().split('```sql')[1].split(
        '```')[0].strip()  # Extract SQL query from response
    sql_queries.append(sql)
    print(user_query, ':', sql)
    print('-' * 80)
    print()

    # Execute only the query generated in this iteration
    try:
        print(pd.read_sql(sql, engine).to_markdown(index=False))
    except Exception as e:
        print(f"Error executing query: {e}")
    print('-' * 80)
    print()

What's the release year of 'Food Dogs Of The Age Music Theory'? : SELECT release_year
FROM movies
WHERE title = 'Food Dogs Of The Age Music Theory';
--------------------------------------------------------------------------------

|   release_year |
|---------------:|
|           1989 |
--------------------------------------------------------------------------------

Who directed the movie 'Food Dogs Of The Age Music Theory'? : SELECT d.director_name
FROM directors AS d
JOIN movies      AS m ON d.director_id = m.director_id
WHERE m.title = 'Food Dogs Of The Age Music Theory';
--------------------------------------------------------------------------------

| director_name   |
|:----------------|
| saucy-segno     |
--------------------------------------------------------------------------------

List all actors in the movie 'Food Dogs Of The Age Mus