# E-Commerce dataset to partitioned Parquet

This notebook materializes a compact e-commerce dataset into partitioned Parquet files on the local filesystem. Use it as a quick sanity check for the generator and writer APIs.

In [None]:
import datetime
from pathlib import Path
import polars as pl
from dataset_generator import create_generator, create_writer, WriterOptions, write_dataset

## Configure generator

We create the built-in `ecommerce` generator with small volumes so the notebook runs quickly. Adjust the parameters to try other volumes or date ranges.

In [None]:
generator = create_generator(
    "ecommerce",
    seed=7,
    n_customers=500,
    n_products=200,
    orders_per_day=120,
    order_items_mean=2.4,
    file_rows_target=200,
    start_date=datetime.date(2023, 1, 1),
    end_date=datetime.date(2023, 1, 3),
)
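
As a rough cross-check of the configuration above, the expected row counts can be estimated by hand. This is only a back-of-the-envelope sketch: it assumes `end_date` is inclusive and that `orders_per_day` and `order_items_mean` are averages, none of which this notebook confirms.

```python
import datetime

# Values copied from the generator configuration above.
start_date = datetime.date(2023, 1, 1)
end_date = datetime.date(2023, 1, 3)
orders_per_day = 120
order_items_mean = 2.4

# Assumption: end_date is inclusive, so the range covers three days.
n_days = (end_date - start_date).days + 1

# Assumption: both parameters are averages, so these are estimates only.
expected_orders = n_days * orders_per_day
expected_order_items = round(expected_orders * order_items_mean)

print(n_days, expected_orders, expected_order_items)
```

Under these assumptions the estimate is about 360 orders and 864 order items. Actual counts may differ if the generator samples volumes randomly.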

## Write partitioned Parquet

The writer stores each dimension table as a single Parquet file and streams `orders` and `order_items` into `year=YYYY/month=MM/day=DD` folders. The output lives under `examples/demo_output/parquet`.
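
To make the folder layout concrete, the sketch below maps a date to its partition folder. `partition_dir` is a hypothetical helper written for illustration and is not part of `dataset_generator`. The zero padding is an assumption read off the `MM` and `DD` placeholders.

```python
import datetime
from pathlib import PurePosixPath


def partition_dir(table: str, day: datetime.date) -> PurePosixPath:
    """Return the Hive-style partition folder for one table and one day.

    Hypothetical helper: the real writer may format its paths differently.
    """
    return (
        PurePosixPath(table)
        / f"year={day.year:04d}"
        / f"month={day.month:02d}"  # assumed zero-padded, per the MM placeholder
        / f"day={day.day:02d}"  # assumed zero-padded, per the DD placeholder
    )


print(partition_dir("orders", datetime.date(2023, 1, 3)))
```

For the last day of the configured range this prints `orders/year=2023/month=01/day=03`.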

In [None]:
output_dir = Path("examples/demo_output/parquet").resolve()
options = WriterOptions(compression="snappy", file_rows_target=200)
writer = create_writer("parquet", str(output_dir), s3=None, catalog=None, options=options)
write_dataset(generator, writer)
# List the written files, relative to examples/demo_output
sorted(path.relative_to(output_dir.parent) for path in output_dir.rglob("*.parquet"))

## Inspect a sample

Load one of the generated order partitions with Polars to verify schema and values.

In [None]:
orders_file = next(output_dir.joinpath("orders").rglob("*.parquet"))
pl.read_parquet(orders_file).head(5)
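
The partition date is encoded in the file path, so it can also be recovered without opening the file. The sketch below parses the `year=`, `month=`, and `day=` path components back into a date. `partition_date` is a hypothetical helper, and the file name `part-0.parquet` in the example is made up for illustration.

```python
import datetime
from pathlib import PurePath


def partition_date(path) -> datetime.date:
    """Parse year=/month=/day= path components into a date (hypothetical helper)."""
    # Collect every "key=value" path component into a dict.
    keys = dict(
        part.split("=", 1) for part in PurePath(path).parts if "=" in part
    )
    return datetime.date(int(keys["year"]), int(keys["month"]), int(keys["day"]))


# The file name here is invented; only the partition folders matter.
example = "orders/year=2023/month=01/day=02/part-0.parquet"
print(partition_date(example))
```

Calling `partition_date(orders_file)` on the file loaded above should return a date inside the configured range, assuming the writer uses exactly this folder naming.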