# example: generating a database schema from a description of business needs

## setup

- for best results, set up a new virtual environment (using `virtualenv`, `conda`, etc.)
- install the requirements from `requirements.txt` (using `pip`) or from `requirements.yaml` (if using `conda`)
- in this "notebooks" directory, create a new `.env` file and add the Microsoft Azure key info, as shown below
- now you can run this notebook

## .env file contents

the `.env` file should have the following information:

(replace the key, endpoint, and model with your own values; the API version can stay "2024-10-21")

```
AZURE_OPENAI_API_KEY="abcd1234wxyz9999abcdefghijklmnop123456789"
AZURE_OPENAI_API_ENDPOINT="https://my-great-organization.openai.azure.com/"
AZURE_OPENAI_API_VERSION="2024-10-21"
AZURE_OPENAI_MODEL="name-of-my-gpt"
```

In [8]:
# import the 'text2sql' module from the src directory
# it should print "hello, world!"
import sys
sys.path.append("../src")
from text2sql import hello
print(hello.message)

hello, world!


In [19]:
import os

from dotenv import load_dotenv

from text2sql.engine.generation import AzureGenerator
from text2sql.engine.generation.postprocessing import extract_first_code_block

In [14]:
# load the env variables
load_dotenv()

True
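
A `True` result from `load_dotenv()` does not by itself guarantee that every variable listed in the `.env` section is present, so it may be worth checking before creating the generator. The following is a minimal sketch; `find_missing_vars` and `REQUIRED_VARS` are hypothetical names introduced here for illustration and are not part of `text2sql`.

```python
import os

# Variable names taken from the .env example in this notebook.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_MODEL",
)


def find_missing_vars(env, required=REQUIRED_VARS):
    """Return the required names that are absent or empty in `env` (hypothetical helper)."""
    return [name for name in required if not env.get(name)]


# Check the real process environment after load_dotenv() has run.
missing = find_missing_vars(os.environ)
if missing:
    print("missing settings:", ", ".join(missing))
else:
    print("all Azure OpenAI settings are present")
```

Passing the environment in as an argument keeps the check easy to try against a plain dictionary without touching real credentials.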

In [15]:
# create a generator
generator = AzureGenerator(
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_API_ENDPOINT"),
    api_version=os.environ.get("AZURE_OPENAI_API_VERSION"),
    model=os.environ.get("AZURE_OPENAI_MODEL"),
)

In [None]:
# create the inputs
system_prompt: str = "you are a data scientist specializing in SQL."

user_message: str = """
hello, please write me an sql file to create an empty database for an online clothing retailer.

here is a list of business needs that should be reflected in the database:
- each clothing item can have different sizes and colors, and each unique combination is its own SKU.
- clothing items need to be grouped by department (mens, womens, kids, etc.) and category (tops, bottoms, outerwear, etc.)
- the website is an online store, so customers can add items to cart, and then checkout and pay online.
- our shop and customers are international, and we support multiple languages on our storefront.
- customers can also leave reviews on items, which are 1 to 5 stars with an optional text comment.
- customers can favorite items for later to multiple lists, see and edit their lists, and make them public to share.

first, generate a list of all tables we need for this database, then write the SQL to create all tables and columns.
write the SQL inside a markdown code block (```sql) so I can copy it into a file. 
""" 

In [18]:
# the input is a list of messages, each with a role (system, user, or assistant)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_message},
]
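
This notebook only sends a system and a user message. An assistant turn would typically appear when continuing a conversation, for example to ask for a revision of an earlier answer. The sketch below shows the shape of such a list; `with_followup` is a hypothetical helper, the strings are placeholders, and it is an assumption that the generator accepts multi-turn input in this form.

```python
def with_followup(messages, assistant_reply, followup):
    """Return a new list extended with one assistant turn and one user turn."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": followup},
    ]


# Placeholder conversation for illustration only.
history = [
    {"role": "system", "content": "you are a data scientist specializing in SQL."},
    {"role": "user", "content": "please write the SQL to create the tables."},
]
extended = with_followup(
    history,
    "<previous model response>",
    "please add indexes on the foreign keys.",
)
print([m["role"] for m in extended])
```

The helper builds a new list rather than mutating its input, so the original `history` can be reused for a different follow-up.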

In [None]:
# generate the response and print its first 1024 characters
response = generator.generate(messages)
print(response[:1024])

Sure! Based on the business needs, here is the list of tables required for the online clothing retailer database:

1. **Departments** - To store different departments such as mens, womens, kids, etc.
2. **Categories** - To store different clothing categories like tops, bottoms, outerwear, etc.
3. **ClothingItems** - To store information about each clothing item.
4. **Sizes** - To store different available sizes for clothing items.
5. **Colors** - To store different available colors for clothing items.
6. **SKUs** - To store each unique SKU which is a combination of clothing item, size, and color.
7. **Customers** - To store customer information.
8. **ShoppingCart** - To store customer's shopping cart items.
9. **Orders** - To store customer orders after checkout.
10. **OrderItems** - To store items belonging to a specific order.
11. **Languages** - To store supported languages.
12. **Reviews** - To store customer reviews for clothing items.
13. **Favorites** - To store customer's favor

In [23]:
sql = extract_first_code_block(response)
with open("output.sql", "w") as f:
    f.write(sql)
print("SQL written to output.sql")

SQL written to output.sql
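
To make the extraction step less opaque, here is a rough sketch of how pulling the first fenced block out of a response could work with a regular expression. This is not the actual implementation of `extract_first_code_block`, whose behavior is not shown in this notebook; `first_fenced_block` is a hypothetical stand-in written for illustration.

```python
import re

# Build the fence marker programmatically so this cell can itself be fenced.
FENCE = "`" * 3

# Opening fence with an optional language tag, then a non-greedy capture
# of everything up to the next fence marker.
FENCE_PATTERN = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def first_fenced_block(text):
    """Return the body of the first fenced code block in `text`, or None if there is none."""
    match = FENCE_PATTERN.search(text)
    return match.group(1).strip() if match else None


sample = (
    "Here is the schema:\n\n"
    f"{FENCE}sql\nCREATE TABLE Sizes (SizeID INT);\n{FENCE}\n\n"
    "Let me know!"
)
print(first_fenced_block(sample))
```

This sketch returns `None` when no fenced block is found, so a caller would need to check the result before writing it to a file.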


In [24]:
print(sql)

-- Departments Table
CREATE TABLE Departments (
    DepartmentID INT PRIMARY KEY AUTO_INCREMENT,
    Name VARCHAR(100) NOT NULL
);

-- Categories Table
CREATE TABLE Categories (
    CategoryID INT PRIMARY KEY AUTO_INCREMENT,
    Name VARCHAR(100) NOT NULL
);

-- ClothingItems Table
CREATE TABLE ClothingItems (
    ClothingItemID INT PRIMARY KEY AUTO_INCREMENT,
    Name VARCHAR(255) NOT NULL,
    Description TEXT,
    Price DECIMAL(10, 2) NOT NULL,
    DepartmentID INT,
    CategoryID INT,
    FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID),
    FOREIGN KEY (CategoryID) REFERENCES Categories(CategoryID)
);

-- Sizes Table
CREATE TABLE Sizes (
    SizeID INT PRIMARY KEY AUTO_INCREMENT,
    Name VARCHAR(50) NOT NULL
);

-- Colors Table
CREATE TABLE Colors (
    ColorID INT PRIMARY KEY AUTO_INCREMENT,
    Name VARCHAR(50) NOT NULL
);

-- SKUs Table
CREATE TABLE SKUs (
    SKU INT PRIMARY KEY AUTO_INCREMENT,
    ClothingItemID INT,
    SizeID INT,
    ColorID INT,
    Quanti