In [0]:
%pip install faker

In [0]:
from helpers import (
    setup_schemas,
    setup_volumes,
    users_generator,
    products_generator,
    orders_generator,
    order_items_generator,
)

### 🏗️ Creating user schemas across environments

This command initializes the **database schemas** (namespaces) that will structure our data in **Unity Catalog**.

The function `setup_schemas.create_user_schemas()`:

- Loops through the environments: `dev` and `prd`.  
- Creates the schemas (if they don't already exist) for each data layer:
  - `bronze` – minimally processed data.
  - `silver` – cleaned and standardized datasets.
  - `gold` – curated datasets ready for analytics.

By creating these schemas up front, we establish a **consistent data architecture** for the entire project.
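
The loop described above can be sketched as follows. This is a minimal illustration, not the actual body of `setup_schemas.create_user_schemas()`: the helper `schema_statements` is hypothetical, and the catalog names `sales_dev` and `sales_prd` are assumed from the `sales_dev.bronze` example used elsewhere in this notebook.

```python
ENVIRONMENTS = ("dev", "prd")
LAYERS = ("bronze", "silver", "gold")


def schema_statements(catalog_prefix: str = "sales") -> list[str]:
    """Build one idempotent CREATE SCHEMA statement per environment and layer.

    The catalog naming pattern (<prefix>_<env>) is an assumption.
    """
    return [
        f"CREATE SCHEMA IF NOT EXISTS {catalog_prefix}_{env}.{layer}"
        for env in ENVIRONMENTS
        for layer in LAYERS
    ]


statements = schema_statements()
for statement in statements:
    # In a Databricks notebook each statement could be passed to spark.sql().
    print(statement)
```

Because every statement uses `IF NOT EXISTS`, re-running the cell is harmless when the schemas are already in place.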


In [0]:
setup_schemas.create_user_schemas()

### 🗂️ Creating Unity Catalog Volumes

Once our schemas are set up, we need **volumes** to store the raw data files for each entity in the project.

The function `setup_volumes.create_volumes()`:

- Iterates through the environments: `dev` and `prd`.  
- Targets the `bronze` schema in each environment (e.g., `sales_dev.bronze`).  
- Creates volumes for each purpose:
  - `raw_files` – stores generated and raw files.
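
As a rough sketch of the steps above, the snippet below builds the SQL for each volume and the path under which its files would appear. The helpers `volume_statement` and `volume_path` are hypothetical, and the `sales_prd` catalog name is assumed by analogy with `sales_dev`; the real `setup_volumes.create_volumes()` may differ.

```python
ENVIRONMENTS = ("dev", "prd")
VOLUMES = ("raw_files",)


def volume_statement(env: str, volume: str, catalog_prefix: str = "sales") -> str:
    """Return an idempotent CREATE VOLUME statement for the bronze schema."""
    return f"CREATE VOLUME IF NOT EXISTS {catalog_prefix}_{env}.bronze.{volume}"


def volume_path(env: str, volume: str, catalog_prefix: str = "sales") -> str:
    """Return the /Volumes/<catalog>/<schema>/<volume> path for the volume."""
    return f"/Volumes/{catalog_prefix}_{env}/bronze/{volume}"


volume_sql = [volume_statement(env, vol) for env in ENVIRONMENTS for vol in VOLUMES]
volume_paths = [volume_path(env, vol) for env in ENVIRONMENTS for vol in VOLUMES]

for sql, path in zip(volume_sql, volume_paths):
    # In a Databricks notebook the statement could be passed to spark.sql().
    print(sql, "->", path)
```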


In [0]:
setup_volumes.create_volumes()

### 👤 Generating synthetic user data

Now that the schemas and volumes are set up, we can generate the **user dataset** for the project.

The `users_generator.UsersGenerator` class, through its `generate_new_file()` method:

- Creates **synthetic user records** with realistic attributes (name, email, phone, etc.).
- Writes the generated data to the **`user` volume** in the specified environment.
- Runs once per environment:
  - `env="dev"` – for development and testing.
  - `env="prd"` – for production-like datasets.

This step populates the **raw layer** of the Lakehouse with user data, ready for downstream processing and analytics.
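
To make the shape of a synthetic user record concrete, here is a small sketch that uses only the standard library so it runs without extra packages. The field names, the name lists, and the helpers `make_user` and `make_users` are illustrative assumptions, not the output schema of `UsersGenerator`. With Faker installed, calls such as `fake.name()`, `fake.email()`, and `fake.phone_number()` would likely replace the hand-rolled values.

```python
import random
import uuid

# Tiny illustrative name pools; a real generator would draw from Faker.
FIRST_NAMES = ("Ana", "Bruno", "Carla", "Diego")
LAST_NAMES = ("Silva", "Souza", "Lima", "Costa")


def make_user(rng: random.Random) -> dict:
    """Build one synthetic user record from a seeded random generator."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return {
        "user_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "name": f"{first} {last}",
        "email": f"{first}.{last}@example.com".lower(),
        "phone": f"+1-555-{rng.randint(0, 9999):04d}",
    }


def make_users(count: int, seed: int = 42) -> list[dict]:
    """Generate `count` users; the same seed always yields the same records."""
    rng = random.Random(seed)
    return [make_user(rng) for _ in range(count)]


users = make_users(3)
for user in users:
    print(user)
```

Seeding the generator keeps the `dev` and `prd` datasets reproducible between runs, which helps when debugging downstream transformations.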


In [0]:
user_dev = users_generator.UsersGenerator("dev")
user_prd = users_generator.UsersGenerator("prd")

user_dev.generate_new_file()
user_prd.generate_new_file()

### 🛍️ Generating product catalog data

After generating user data, we create the **product dataset** for the project.

The `products_generator.ProductsGenerator` class, through its `generate_new_file()` method:

- Generates a **synthetic product catalog** with attributes such as:
  - Product ID
  - Name
  - Category
  - Price
  - Daily updates to simulate realistic changes  
- Saves the generated data to the **`product` volume** in the specified environment.  
- Executed for both environments:
  - `env="dev"` – for development and testing workflows.
  - `env="prd"` – for production-like datasets.

This step populates the **raw layer** of the Lakehouse with product data, providing a foundation for sales and event generation downstream.
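
One way to picture the "daily updates" idea is a base catalog whose prices drift slightly each day. The sketch below is an assumption about how such a simulation could work, not the logic inside `ProductsGenerator`; the helpers `base_catalog` and `apply_daily_update`, the category list, and the 5% drift limit are all hypothetical.

```python
import random
from datetime import date

CATEGORIES = ("electronics", "home", "toys")


def base_catalog(size: int, seed: int = 7) -> list[dict]:
    """Create the initial catalog with an ID, name, category, and price."""
    rng = random.Random(seed)
    return [
        {
            "product_id": i,
            "name": f"product_{i:04d}",
            "category": rng.choice(CATEGORIES),
            "price": round(rng.uniform(5.0, 500.0), 2),
        }
        for i in range(1, size + 1)
    ]


def apply_daily_update(catalog: list[dict], day: date, max_change: float = 0.05) -> list[dict]:
    """Return a new snapshot with each price moved by at most max_change (as a fraction)."""
    rng = random.Random(day.toordinal())  # same day always gives the same snapshot
    return [
        {
            **product,
            "price": round(product["price"] * (1 + rng.uniform(-max_change, max_change)), 2),
            "updated_at": day.isoformat(),
        }
        for product in catalog
    ]


catalog = base_catalog(5)
snapshot = apply_daily_update(catalog, date(2024, 1, 2))
for before, after in zip(catalog, snapshot):
    print(before["product_id"], before["price"], "->", after["price"])
```

The update returns new dictionaries rather than mutating the catalog, so each day's file can be derived from the same base without side effects.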


In [0]:
product_dev = products_generator.ProductsGenerator("dev")
product_prd = products_generator.ProductsGenerator("prd")

product_dev.generate_new_file()
product_prd.generate_new_file()

### 💰 Generating orders data

This step creates synthetic **order transactions** for the project.

- It generates a **historical snapshot** of sales for the base date.
- All data is written directly to the **`orders` volume** in the specified environment (`dev` or `prd`) through an **in-memory buffer**.
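
The in-memory buffer approach can be sketched with the standard library as shown below. The CSV format, the column names, and the helper `rows_to_csv_bytes` are assumptions made for illustration; the source does not state which file format `OrdersGenerator` writes.

```python
import csv
import io


def rows_to_csv_bytes(rows: list[dict], fieldnames: list[str]) -> bytes:
    """Serialize rows to CSV in memory and return the encoded payload."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


# Hypothetical order rows for a single base date.
orders = [
    {"order_id": 1, "user_id": 10, "order_date": "2024-01-01", "total": "59.90"},
    {"order_id": 2, "user_id": 11, "order_date": "2024-01-01", "total": "120.00"},
]
payload = rows_to_csv_bytes(orders, ["order_id", "user_id", "order_date", "total"])
print(payload.decode("utf-8"))
```

The resulting bytes could then be written in a single call to a file under the volume path, which avoids staging a temporary file on local disk first.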


In [0]:
order_dev = orders_generator.OrdersGenerator("dev")
order_prd = orders_generator.OrdersGenerator("prd")

order_dev.generate_new_file()
order_prd.generate_new_file()

### 💰 Generating order items data

This step creates synthetic **order item records** for the project.

- It generates a **historical snapshot** of order items for the base date.
- All data is written directly to the **`order_items` volume** in the specified environment (`dev` or `prd`) through an **in-memory buffer**.
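
Order items are only useful downstream if they point at existing orders and products. The sketch below shows one way to keep those references consistent. The helper `make_order_items`, the column names, and the quantity range are hypothetical and are not taken from `OrderItemsGenerator`.

```python
import random


def make_order_items(
    order_ids: list[int],
    product_prices: dict[int, float],
    seed: int = 11,
    max_items: int = 3,
) -> list[dict]:
    """Create between 1 and max_items line items per order, each for a distinct product."""
    rng = random.Random(seed)
    product_ids = sorted(product_prices)
    items = []
    for order_id in order_ids:
        item_count = rng.randint(1, min(max_items, len(product_ids)))
        # sample() picks without replacement, so a product appears once per order.
        chosen = rng.sample(product_ids, k=item_count)
        for line_number, product_id in enumerate(chosen, start=1):
            items.append(
                {
                    "order_id": order_id,
                    "line_number": line_number,
                    "product_id": product_id,
                    "quantity": rng.randint(1, 5),
                    "unit_price": product_prices[product_id],
                }
            )
    return items


prices = {101: 9.99, 102: 24.50, 103: 3.00}
items = make_order_items([1, 2, 3], prices)
for item in items:
    print(item)
```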


In [0]:
order_items_dev = order_items_generator.OrderItemsGenerator("dev")
order_items_prd = order_items_generator.OrderItemsGenerator("prd")

order_items_dev.generate_new_file()
order_items_prd.generate_new_file()