# example: generating a database schema from a description of business needs

## setup

- for best results, set up a new virtual environment (using `virtualenv`, `conda`, etc.)
- install the requirements from `requirements.txt` (using `pip`) or from `requirements.yaml` (if using `conda`)
- in this "notebooks" directory, create a new `.env` file and add the Microsoft Azure and Google Cloud (GCP) key info, as shown below
- now you can run this notebook

## .env file contents

the `.env` file should have the following information:

(replace the key, endpoint and model! api version can be "2024-10-21")

```
AZURE_OPENAI_API_KEY="abcd1234wxyz9999abcdefghijklmnop123456789"
AZURE_OPENAI_API_ENDPOINT="https://my-great-organization.openai.azure.com/"
AZURE_OPENAI_API_VERSION="2024-10-21"
AZURE_OPENAI_MODEL="name-of-my-gpt"

GCP_KEY="abcd1234thisIsMyGoogleAPIKey"
```
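
before running the notebook, it can help to confirm that the variables for the chosen engine are actually set. the following is a minimal sketch only: `missing_env_vars` and the two `REQUIRED_*` tuples are hypothetical helpers written for this example and are not part of `text2sql`. the variable names are the ones listed in the `.env` example above.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names taken from the .env example above.
# The grouping into two tuples is an assumption made for this sketch.
REQUIRED_AZURE = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_MODEL",
)
REQUIRED_GCP = ("GCP_KEY",)


def missing_env_vars(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Demo with an explicit mapping so the example does not depend on a real .env file.
demo_env = {"GCP_KEY": "abcd1234thisIsMyGoogleAPIKey", "AZURE_OPENAI_API_KEY": ""}
print(missing_env_vars(REQUIRED_GCP, demo_env))
print(missing_env_vars(REQUIRED_AZURE, demo_env))
```

in the notebook, calling `missing_env_vars(REQUIRED_GCP)` after `load_dotenv()` would check the real environment; an empty list means nothing is missing.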

In [1]:
# import the 'text2sql' module from the src directory
# it should print "hello, world!"
import sys
sys.path.append("../src")
from text2sql import hello
print(hello.message)

hello, world!


In [2]:
import os

from dotenv import load_dotenv

from text2sql.engine.generation import AzureGenerator, GCPGenerator
from text2sql.engine.generation.postprocessing import extract_first_code_block


In [3]:
# load the env variables
load_dotenv()

True

In [11]:
# create a generator
engine = "gcp"

if engine == "azure":
    generator = AzureGenerator(
        api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
        azure_endpoint=os.environ.get("AZURE_OPENAI_API_ENDPOINT"),
        api_version=os.environ.get("AZURE_OPENAI_API_VERSION"),
        model=os.environ.get("AZURE_OPENAI_MODEL"),
    )

elif engine == "gcp":
    generator = GCPGenerator(
        api_key=os.environ.get("GCP_KEY"),
        model="gemini-1.5-flash",
    )

else:
    raise ValueError(f"Unknown engine: {engine}")

In [12]:
# create the inputs
system_prompt: str = "you are a data scientist specializing in SQL."

user_message: str = """
hello, please write me an sql file to create an empty database for an online clothing retailer.

here is a list of business needs that should be reflected in the database:
- each clothing item can have different sizes and colors, and each unique combination is its own SKU.
- clothing items need to be grouped by department (mens, womens, kids, etc.) and category (tops, bottoms, outerwear, etc.)
- the website is an online store, so customers can add items to cart, and then checkout and pay online.
- our shop and customers are international and we support multiple languages on our storefront.
- customers can also leave reviews on items, which are 1 to 5 stars with an optional text comment.
- customers can favorite items for later to multiple lists, see and edit their lists, and make them public to share.

first, generate a list of all tables we need for this database, then write the SQL to create all tables and columns.
write the SQL inside a markdown code block (```sql) so I can copy it into a file.
"""

In [13]:
# the input is a list of messages, each with a role (system, user or assistant)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_message},
]

In [14]:
# generate and print the response head
response = generator.generate(messages)
print(response[:1024])

Here's a proposed schema for your online clothing retailer database, along with the SQL to create the tables.  I've included considerations for data integrity and efficiency.

**Tables:**

* **Departments:** Stores department information (e.g., Men's, Women's).
* **Categories:** Stores clothing categories (e.g., Tops, Bottoms).
* **Items:** Stores general information about clothing items.
* **SKUs:** Stores information on specific item variations (size, color, SKU).
* **Languages:** Stores supported languages.
* **Customers:** Stores customer information.
* **Addresses:** Stores customer addresses (can be multiple per customer).
* **Carts:** Stores shopping carts for each customer session.
* **CartItems:**  Links cart to specific SKUs and quantities.
* **Orders:** Stores order information after checkout.
* **OrderItems:** Links orders to specific SKUs and quantities.
* **Reviews:** Stores customer reviews.
* **FavoritesLists:** Stores customer's favorite lists.
* **Favorites:** Links c

In [15]:
sql = extract_first_code_block(response)
with open("output.sql", "w") as f:
    f.write(sql)
print("SQL written to output.sql")

SQL written to output.sql
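
the implementation of `extract_first_code_block` is not shown in this notebook. as a rough illustration of the idea, a helper like it could be sketched with a regular expression. this is an assumption about the approach, not the library's actual code: `first_code_block_sketch` is a hypothetical name, and returning `None` when no block is found is a choice made for this sketch.

```python
import re
from typing import Optional

# Build the fence marker programmatically so this example can itself sit inside a fenced block.
FENCE = "`" * 3

# Opening fence, optional language tag, newline, then the body up to the next fence (non-greedy).
CODE_BLOCK_RE = re.compile(
    FENCE + r"[ \t]*(\w+)?[ \t]*\r?\n(.*?)" + FENCE,
    re.DOTALL,
)


def first_code_block_sketch(text: str) -> Optional[str]:
    """Return the body of the first fenced code block in text, or None if there is none."""
    match = CODE_BLOCK_RE.search(text)
    if match is None:
        return None
    return match.group(2).strip()


# Small stand-in for a model response containing one fenced sql block.
example = f"Here is the schema:\n{FENCE}sql\nCREATE TABLE t (id INT);\n{FENCE}\nDone."
print(first_code_block_sketch(example))
```

with a helper that can return `None`, the cell above would need a check before `f.write(sql)`, since a response without a fenced block would otherwise fail at write time.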


In [16]:
print(sql)

-- Create the Departments table
CREATE TABLE Departments (
    department_id INT PRIMARY KEY AUTO_INCREMENT,
    department_name VARCHAR(255) UNIQUE NOT NULL
);

-- Create the Categories table
CREATE TABLE Categories (
    category_id INT PRIMARY KEY AUTO_INCREMENT,
    category_name VARCHAR(255) UNIQUE NOT NULL
);

-- Create the Items table
CREATE TABLE Items (
    item_id INT PRIMARY KEY AUTO_INCREMENT,
    department_id INT NOT NULL,
    category_id INT NOT NULL,
    item_name VARCHAR(255) NOT NULL,
    description TEXT,
    image_url VARCHAR(255),
    FOREIGN KEY (department_id) REFERENCES Departments(department_id),
    FOREIGN KEY (category_id) REFERENCES Categories(category_id)
);

-- Create the SKUs table
CREATE TABLE SKUs (
    sku_id INT PRIMARY KEY AUTO_INCREMENT,
    item_id INT NOT NULL,
    size VARCHAR(20) NOT NULL,
    color VARCHAR(50) NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    stock_quantity INT NOT NULL,
    FOREIGN KEY (item_id) REFERENCES Items(item_id)
);

-