# Use Case

A generative AI model (Gemini 2.0 Flash) is used as part of a service that aggregates theatre playbills.

## Problem

Moscow has a great many theatres, and keeping track of premieres across all of them is tedious. Aggregation services that collect this information do exist, but most are inconvenient to use because they lack useful filters such as location, a specific play, or a stage director. Moreover, clicking through many filters is itself time-consuming and discouraging for users.

## Solution

The solution transforms arbitrary user input into an appropriate SQL query and then builds the response from the data in the database. Currently only text input is supported, but it could be extended to voice input as well.
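The flow described above (question → SQL → rows) can be sketched as follows. This is a minimal sketch, not the project's code: the model call is replaced by a hypothetical `fake_generate_sql` stub, `answer_question` is an illustrative name, and the SELECT-only check is a deliberately simple safeguard.

```python
import sqlite3
from typing import Callable


def answer_question(question: str,
                    generate_sql: Callable[[str], str],
                    conn: sqlite3.Connection) -> list[tuple]:
    """Translate a free-form question into SQL, run it, and return the rows."""
    sql = generate_sql(question)  # in the real service this would be a model call
    # Simple guard: only read queries are allowed to reach the database.
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError(f"Refusing to run a non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


def fake_generate_sql(question: str) -> str:
    """Hypothetical stand-in for the model: always returns the same query."""
    return "SELECT title FROM performances WHERE author = 'Антон Чехов' ORDER BY title"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE performances (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO performances VALUES (?, ?)",
    [("Чайка", "Антон Чехов"),
     ("Гроза", "Александр Островский"),
     ("Дама с собачкой", "Антон Чехов")],
)

rows = answer_question("Plays by Chekhov", fake_generate_sql, conn)
print(rows)  # [('Дама с собачкой',), ('Чайка',)]
```

Passing the generator in as a callable keeps the database side testable without an API key. A prefix check like this is only a first line of defence; running model-generated SQL on a read-only connection is more robust.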

## Example

For example, a user asks "Plays for the coming month by Shakespeare, Williams or Chekhov", and the request is transformed into "SELECT"…

# Gen AI Capabilities

The project combines the following Gen AI capabilities:
- Structured output (SQL query)
- Few-shot prompting
- Grounding (for showing reviews) …
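Few-shot prompting for text-to-SQL means showing the model a handful of question/SQL pairs before the real question. Below is a minimal sketch of such a prompt builder, assuming the schema defined later in this notebook. The example pairs, the instruction wording and `build_prompt` are all hypothetical; the project's actual prompt is not shown here.

```python
# Hypothetical question/SQL pairs; the project's real examples are not shown here.
FEW_SHOT_EXAMPLES = [
    ("Plays by Chekhov",
     "SELECT title, datetime FROM performances WHERE author = 'Антон Чехов';"),
    ("What is on at RAMT?",
     "SELECT p.title, p.datetime FROM performances AS p "
     "JOIN stages AS s ON p.stage_id = s.stage_id "
     "JOIN theaters AS t ON s.theater_id = t.theater_id "
     "WHERE t.acronym = 'РАМТ';"),
]

SCHEMA = """theaters(theater_id, name, acronym)
stages(stage_id, name, address, theater_id)
performances(performance_id, title, stage_id, datetime, director, author)"""


def build_prompt(question: str, schema: str = SCHEMA,
                 examples: list[tuple[str, str]] = FEW_SHOT_EXAMPLES) -> str:
    """Assemble a few-shot text-to-SQL prompt: instruction, schema, examples, new question."""
    parts = [
        "Translate the user's question about theatre playbills into one SQLite SELECT query.",
        "Database schema:\n" + schema,
    ]
    # Each worked example shows the model the expected question -> SQL mapping.
    for example_question, example_sql in examples:
        parts.append(f"Question: {example_question}\nSQL: {example_sql}")
    # The real question goes last, with an open "SQL:" slot for the model to fill.
    parts.append(f"Question: {question}\nSQL:")
    return "\n\n".join(parts)


prompt = build_prompt("Plays for the coming month by Shakespeare, Williams or Chekhov")
print(prompt)
```

The examples use the Russian spelling of names and acronyms because that is how the sample data is stored; the model has to learn that mapping from the examples.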

# Future

The solution will be used together with the [API for theatres](link), which is currently under development, as part of an [aggregation service](link) deployed both as a web application and as a Telegram bot.

# Implementation
## Setup

Start by installing and importing the Python SDK.

```python
!pip uninstall -qqy jupyterlab  # Remove unused conflicting packages
!pip install -U -q "google-genai==1.7.0"
```

```python
from google import genai
from google.genai import types

genai.__version__
```

```
'1.7.0'
```

### Set up your API key

To run the following cell, your API key must be stored in a [Kaggle secret](https://www.kaggle.com/discussions/product-feedback/114053) named `GOOGLE_API_KEY`.

If you don't already have an API key, you can grab one from [AI Studio](https://aistudio.google.com/app/apikey). You can find [detailed instructions in the docs](https://ai.google.dev/gemini-api/docs/api-key).

To make the key available through Kaggle secrets, choose `Secrets` from the `Add-ons` menu and follow the instructions to add your key or enable it for this notebook.

```python
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
```

If you received an error response along the lines of `No user secrets exist for kernel id ...`, then you need to add your API key via `Add-ons`, `Secrets` **and** enable it.

![Screenshot of the checkbox to enable GOOGLE_API_KEY secret](https://storage.googleapis.com/kaggle-media/Images/5gdai_sc_3.png)

### Automated retry

```python
# Define a retry policy. For a complex query the model may make several
# consecutive calls automatically, so retry whenever the API reports a quota
# limit (429) or a temporarily unavailable service (503).
from google.api_core import retry

is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

# Wrap generate_content only once, even if this cell is re-run.
if not hasattr(genai.models.Models.generate_content, '__wrapped__'):
  genai.models.Models.generate_content = retry.Retry(
      predicate=is_retriable)(genai.models.Models.generate_content)
```
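`retry.Retry` re-invokes the wrapped call with exponential backoff whenever `is_retriable` returns `True`. To make that behaviour concrete without calling the API, here is a hand-rolled sketch of the same idea. `FakeAPIError` and `call_with_retry` are hypothetical stand-ins, not part of `google-genai` or `google-api-core`, and the backoff parameters are illustrative rather than the library's defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

RETRIABLE_CODES = {429, 503}  # quota exceeded, service temporarily unavailable


class FakeAPIError(Exception):
    """Hypothetical stand-in for genai.errors.APIError: carries an HTTP status code."""

    def __init__(self, code: int):
        super().__init__(f"HTTP {code}")
        self.code = code


def call_with_retry(fn: Callable[[], T], max_attempts: int = 5,
                    base_delay: float = 1.0) -> T:
    """Call fn(), retrying retriable errors with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except FakeAPIError as e:
            # Give up on non-retriable codes or once the attempts are used up.
            if e.code not in RETRIABLE_CODES or attempt == max_attempts - 1:
                raise
            # Sleep base_delay * 2**attempt seconds plus up to 10% random jitter.
            delay = base_delay * 2 ** attempt
            time.sleep(delay + random.uniform(0, 0.1 * delay))
    raise AssertionError("unreachable")


# Usage: a call that hits the quota twice and then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeAPIError(429)
    return "ok"


result = call_with_retry(flaky, base_delay=0.01)
print(result, calls["n"])  # ok 3
```

A client error such as 400 is re-raised immediately, which mirrors the notebook's predicate: only quota and availability errors are worth waiting for.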

## Create a local database

For this minimal example, you'll create a local SQLite database and add some synthetic data so you have something to query.

Load the `sql` IPython extension, which lets you interact with the database through magic commands (the `%` instructions), and use it to create a new, empty SQLite database.

```python
TEST_MODE = False  # True: throwaway in-memory database; False: persist to sample1.db

%load_ext sql
if TEST_MODE:
    %sql sqlite:///:memory:
else:
    %sql sqlite:///sample1.db
```

Create the tables and insert some synthetic data. Feel free to tweak this structure and data.

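```python
import sqlite3

if TEST_MODE:
    # Note: this is a separate in-memory database from the one %sql opened,
    # so tables created with %%sql are not visible through db_conn.
    db_conn = sqlite3.connect(':memory:')
else:
    db_file = "sample1.db"
    db_conn = sqlite3.connect(db_file)     # Same file as the %sql connection
```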
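In `TEST_MODE`, `%sql` and `sqlite3` each open their own private `:memory:` database, so tables created with `%%sql` would not be visible through `db_conn`. One way around this, sketched here with plain `sqlite3` only, is a named in-memory database opened with a shared cache through a URI filename; the name `playbills` is arbitrary. Pointing `%sql` at the same database would additionally need an equivalent SQLAlchemy connection URI, which is an assumption not shown or tested here.

```python
import sqlite3

# A named in-memory database with a shared cache: every connection in this
# process that opens the same URI sees the same data while one stays open.
SHARED_MEMORY_URI = "file:playbills?mode=memory&cache=shared"  # "playbills" is an arbitrary name

conn_a = sqlite3.connect(SHARED_MEMORY_URI, uri=True)
conn_b = sqlite3.connect(SHARED_MEMORY_URI, uri=True)

conn_a.execute("CREATE TABLE theaters (theater_id INTEGER PRIMARY KEY, name TEXT)")
conn_a.execute("INSERT INTO theaters (name) VALUES ('РАМТ')")
conn_a.commit()  # release the write lock so other connections can read

# conn_b sees the table and row created through conn_a.
print(conn_b.execute("SELECT name FROM theaters").fetchall())  # [('РАМТ',)]

# A plain ':memory:' connection, by contrast, gets its own empty database.
private = sqlite3.connect(":memory:")
print(private.execute("SELECT name FROM sqlite_master").fetchall())  # []
```

The simplest alternative is to keep `TEST_MODE = False`, where both connections already point at the same `sample1.db` file.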

```sql
%%sql
CREATE TABLE theaters (
    theater_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    acronym TEXT
);

CREATE TABLE stages (
    stage_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    address TEXT,
    theater_id INTEGER,
    FOREIGN KEY (theater_id) REFERENCES theaters(theater_id)
);

CREATE TABLE performances (
    performance_id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    stage_id INTEGER,
    datetime TEXT NOT NULL,
    director TEXT,
    author TEXT,
    FOREIGN KEY (stage_id) REFERENCES stages(stage_id)
);
```

```
 * sqlite:///sample1.db
Done.
Done.
Done.
```
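The FOREIGN KEY clauses above document how the tables relate, but SQLite only enforces them after `PRAGMA foreign_keys = ON` has been issued on the connection; enforcement is off by default and the setting is per connection, so in this notebook both the `%sql` connection and `db_conn` would need it. A self-contained sketch in a fresh in-memory database; the 'Phantom stage' row is made up to trigger the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign key enforcement is off by default and must be enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE theaters (
    theater_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    acronym TEXT
);
CREATE TABLE stages (
    stage_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    address TEXT,
    theater_id INTEGER,
    FOREIGN KEY (theater_id) REFERENCES theaters(theater_id)
);
""")

try:
    # There is no theater with id 99, so this row violates the foreign key.
    conn.execute("INSERT INTO stages (name, theater_id) VALUES ('Phantom stage', 99)")
    fk_error = None
except sqlite3.IntegrityError as e:
    fk_error = str(e)

print(fk_error)  # FOREIGN KEY constraint failed
```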

```sql
%%sql
-- Russian Academic Youth Theatre (RAMT), Pyotr Fomenko Workshop, Arkady Raikin Satirikon Theatre
INSERT INTO theaters (name, acronym) VALUES
  ('Российский академический молодежный театр', 'РАМТ'),
  ('Мастерская Петра Фоменко', 'Фоменко'),
  ('Театр Сатирикон имени Аркадия Райкина', 'Сатирикон');

-- Stages (in order) Main Stage / Small Stage / New Stage / Old Stage / Palace on the Yauza / Central House of the Actor
INSERT INTO stages (name, address, theater_id) VALUES
  ('Большая сцена', 'Театральная пл., д. 2, Москва', 1),
  ('Маленькая сцена', 'Театральная пл., д. 2, Москва', 1),
  ('Новая сцена', 'Кутузовский пр-т, д. 30, Москва', 2),
  ('Старая сцена', 'Кутузовский пр-т, д. 30, Москва', 2),
  ('Дворец на Яузе', 'ул. Журавлёва, д. 1, Москва', 3),
  ('Центральный дом актера', 'Арбат, д. 35, Москва', 3);

-- Plays (in order) The Women of Lazarus / Empty Trains / The Seagull / Bright Souls, or How to Write a Story / The Storm / The Lady with the Dog
INSERT INTO performances (title, stage_id, datetime, director, author) VALUES
  ('Женщины Лазаря', 1, '2025-04-16T19:00:00', 'Алексей Бородин', 'Марина Степнова'),
  ('Пустые поезда', 2, '2025-04-30T16:00:00', 'Дмитрий Данилов', 'Дмитрий Данилов'),
  ('Чайка', 3, '2025-04-27T19:00:00', 'Петр Фоменко', 'Антон Чехов'),
  ('Светлые души, или О том, как написать рассказ', 4, '2025-04-28T19:00:00', 'Евгений Каменькович', 'Василий Шукшин'),
  ('Гроза', 5, '2025-04-20T19:00:00', 'Константин Райкин', 'Александр Островский'),
  ('Дама с собачкой', 6, '2025-04-19T19:00:00', 'Константин Райкин', 'Антон Чехов');
```

```
 * sqlite:///sample1.db
3 rows affected.
6 rows affected.
6 rows affected.
```
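With three related tables, most user questions map onto a join from `performances` through `stages` to `theaters`. Below is a self-contained sketch of the kind of query a model could generate for a request like the one in the Example section, run against a subset of the sample data in a fresh in-memory database. The Russian spellings of Shakespeare and Williams and the fixed date window are hypothetical choices for illustration; authors must be matched in the language the data is stored in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same schema as above, with a subset of the sample rows.
conn.executescript("""
CREATE TABLE theaters (theater_id INTEGER PRIMARY KEY AUTOINCREMENT,
                       name TEXT NOT NULL, acronym TEXT);
CREATE TABLE stages (stage_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL,
                     address TEXT, theater_id INTEGER REFERENCES theaters(theater_id));
CREATE TABLE performances (performance_id INTEGER PRIMARY KEY AUTOINCREMENT,
                           title TEXT NOT NULL, stage_id INTEGER REFERENCES stages(stage_id),
                           datetime TEXT NOT NULL, director TEXT, author TEXT);
INSERT INTO theaters (name, acronym) VALUES
  ('Мастерская Петра Фоменко', 'Фоменко'),
  ('Театр Сатирикон имени Аркадия Райкина', 'Сатирикон');
INSERT INTO stages (name, address, theater_id) VALUES
  ('Новая сцена', 'Кутузовский пр-т, д. 30, Москва', 1),
  ('Дворец на Яузе', 'ул. Журавлёва, д. 1, Москва', 2),
  ('Центральный дом актера', 'Арбат, д. 35, Москва', 2);
INSERT INTO performances (title, stage_id, datetime, director, author) VALUES
  ('Чайка', 1, '2025-04-27T19:00:00', 'Петр Фоменко', 'Антон Чехов'),
  ('Гроза', 2, '2025-04-20T19:00:00', 'Константин Райкин', 'Александр Островский'),
  ('Дама с собачкой', 3, '2025-04-19T19:00:00', 'Константин Райкин', 'Антон Чехов');
""")

# Hypothetical query: plays in a date window by Shakespeare, Williams or Chekhov.
# Author names are assumed to be stored in Russian, as in the sample data.
# ISO-8601 strings sort chronologically, so BETWEEN works on the TEXT datetime column.
QUERY = """
SELECT p.title, p.datetime, t.acronym, s.name, s.address
FROM performances AS p
JOIN stages   AS s ON p.stage_id = s.stage_id
JOIN theaters AS t ON s.theater_id = t.theater_id
WHERE p.author IN ('Уильям Шекспир', 'Теннесси Уильямс', 'Антон Чехов')
  AND p.datetime BETWEEN :window_start AND :window_end
ORDER BY p.datetime
"""

rows = conn.execute(QUERY, {"window_start": "2025-04-15T00:00:00",
                            "window_end": "2025-05-15T00:00:00"}).fetchall()
for row in rows:
    print(row)
```

A real "coming month" filter would derive the window from the current date. Because the data stores timestamps as `YYYY-MM-DDTHH:MM:SS`, the bound should use the same format, e.g. `strftime('%Y-%m-%dT%H:%M:%S', 'now')`, rather than `datetime('now')`, which separates date and time with a space and therefore compares differently as text.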

## Define database functions


```python
def list_tables() -> list[str]:
    """Retrieve the names of all tables in the database."""
    # Include print logging statements so you can see when functions are being called.
    print(' - DB CALL: list_tables()')

    cursor = db_conn.cursor()

    # Fetch the table names.
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")

    tables = cursor.fetchall()
    return [t[0] for t in tables]


list_tables()
```

```
 - DB CALL: list_tables()

['theaters', 'sqlite_sequence', 'stages', 'performances']
```
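The output above includes `sqlite_sequence`, an internal table that SQLite creates automatically to track `AUTOINCREMENT` counters; it is of no use for answering playbill questions. A minimal sketch that filters out SQLite's internal `sqlite_*` tables. `list_user_tables` is a hypothetical name, and it takes the connection as a parameter for testability; a version exposed to the model would keep using the global `db_conn`, as the original does.

```python
import sqlite3


def list_user_tables(conn: sqlite3.Connection) -> list[str]:
    """Return user-defined table names, skipping SQLite's internal sqlite_* tables."""
    # '_' is a LIKE wildcard, so it is escaped to match a literal underscore.
    cursor = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite\\_%' ESCAPE '\\' "
        "ORDER BY name"
    )
    return [name for (name,) in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
# Creating a table with an AUTOINCREMENT column makes SQLite add sqlite_sequence.
conn.execute("CREATE TABLE theaters (theater_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.execute("CREATE TABLE stages (stage_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")

all_tables = [n for (n,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(all_tables)               # includes 'sqlite_sequence'
print(list_user_tables(conn))   # ['stages', 'theaters']
```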

```python
def describe_table(table_name: str) -> list[tuple[str, str]]:
    """Look up the table schema.

    Returns:
      List of columns, where each entry is a tuple of (column, type).
    """
    print(f' - DB CALL: describe_table({table_name})')

    cursor = db_conn.cursor()

    cursor.execute(f"PRAGMA table_info({table_name});")

    schema = cursor.fetchall()
    # [column index, column name, column type, ...]
    return [(col[1], col[2]) for col in schema]


[describe_table(table) for table in list_tables()]
```

```
 - DB CALL: list_tables()
 - DB CALL: describe_table(theaters)
 - DB CALL: describe_table(sqlite_sequence)
 - DB CALL: describe_table(stages)
 - DB CALL: describe_table(performances)

[[('theater_id', 'INTEGER'), ('name', 'TEXT'), ('acronym', 'TEXT')],
 [('name', ''), ('seq', '')],
 [('stage_id', 'INTEGER'),
  ('name', 'TEXT'),
  ('address', 'TEXT'),
  ('theater_id', 'INTEGER')],
 [('performance_id', 'INTEGER'),
  ('title', 'TEXT'),
  ('stage_id', 'INTEGER'),
  ('datetime', 'TEXT'),
  ('director', 'TEXT'),
  ('author', 'TEXT')]]
```
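`describe_table` formats `table_name` straight into the `PRAGMA` text, and a plain `PRAGMA` statement cannot take a bound parameter. SQLite (3.16 and later) also exposes the same information through the table-valued function `pragma_table_info`, whose argument can be a placeholder, so any string, including one with quotes or parentheses, is treated purely as a table name. A sketch under the hypothetical name `describe_table_safe`, taking the connection as a parameter for testability.

```python
import sqlite3


def describe_table_safe(conn: sqlite3.Connection, table_name: str) -> list[tuple[str, str]]:
    """Return (column, type) pairs for table_name, passing the name as a bound parameter."""
    # pragma_table_info() is the table-valued form of PRAGMA table_info; unlike the
    # PRAGMA statement, its argument can be a placeholder, so no string formatting.
    cursor = conn.execute(
        "SELECT name, type FROM pragma_table_info(?) ORDER BY cid", (table_name,)
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theaters (theater_id INTEGER PRIMARY KEY AUTOINCREMENT, "
             "name TEXT NOT NULL, acronym TEXT)")

print(describe_table_safe(conn, "theaters"))
# [('theater_id', 'INTEGER'), ('name', 'TEXT'), ('acronym', 'TEXT')]

# A hostile or mistyped name is treated as a plain (unknown) table name: no rows.
print(describe_table_safe(conn, "theaters); DROP TABLE theaters; --"))  # []
```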

```python
def execute_query(sql: str) -> list[list[str]]:
    """Execute an SQL statement, returning the results."""
    print(f' - DB CALL: execute_query({sql})')

    cursor = db_conn.cursor()

    cursor.execute(sql)
    return cursor.fetchall()


execute_query("select * from performances limit 3")
```

```
 - DB CALL: execute_query(select * from performances limit 3)

[(1,
  'Женщины Лазаря',
  1,
  '2025-04-16T19:00:00',
  'Алексей Бородин',
  'Марина Степнова'),
 (2,
  'Пустые поезда',
  2,
  '2025-04-30T16:00:00',
  'Дмитрий Данилов',
  'Дмитрий Данилов'),
 (3, 'Чайка', 3, '2025-04-27T19:00:00', 'Петр Фоменко', 'Антон Чехов')]
```
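`execute_query` runs whatever statement it receives, and once the model writes the SQL that could include `DELETE` or `DROP TABLE`. A minimal hardening sketch, assuming read-only access is enough for answering questions: SQLite's `PRAGMA query_only` makes a connection reject `CREATE`, `DELETE`, `DROP`, `INSERT` and `UPDATE`. `execute_query_readonly` is a hypothetical name, and the connection is passed in for testability.

```python
import sqlite3


def execute_query_readonly(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Run a (possibly model-generated) statement with writes disabled on conn."""
    # query_only makes SQLite reject CREATE, DELETE, DROP, INSERT and UPDATE
    # on this connection; it stays in effect until set back to OFF.
    conn.execute("PRAGMA query_only = ON")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE performances (title TEXT)")
conn.execute("INSERT INTO performances VALUES ('Чайка')")
conn.commit()

print(execute_query_readonly(conn, "SELECT title FROM performances"))  # [('Чайка',)]

try:
    execute_query_readonly(conn, "DELETE FROM performances")
    blocked = False
except sqlite3.OperationalError as e:
    blocked = True
    print("Blocked:", e)

print(blocked)
```

Because the pragma persists on the connection, it is cleaner to give model queries their own connection than to flip it on `db_conn`, which the notebook also uses for loading data. For a file database, `sqlite3.connect("file:sample1.db?mode=ro", uri=True)` opens it read-only from the start.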