This project is an end-to-end platform for generating realistic logistics and commerce data, landing it in Azure storage, processing it in Databricks, publishing it to Azure SQL, and serving it through an Azure Functions API and a TypeScript frontend.
The project models a synthetic commerce system with customers, products, warehouses, and orders. The pipeline uses NVIDIA NeMo Data Designer to generate realistic records, Azure Data Lake Storage for landing data, Databricks Auto Loader for incremental ingestion, Azure SQL for serving, and a lightweight web application for exploration.
At a glance, the platform covers:
- Synthetic data generation guided by explicit schemas and shipping logic
- Lake ingestion into Bronze Delta tables with Databricks Auto Loader
- Incremental publish into Azure SQL using watermark-based processing
- API access through Azure Functions
- A React and TypeScript frontend for browsing operational data
The end-to-end workflow:
- Define the synthetic entities and generation rules in src/schemas.py and src/shipping_geo.py.
- Generate realistic datasets in src/generate_realistic_data.py.
- Write scheduled JSON outputs to Azure Data Lake Storage in src/daily_synthetic_pipeline.py (a minimal landing sketch follows this list).
- Ingest landed files into Bronze Delta tables with src/autoloader_bronze.py.
- Incrementally publish Bronze data into Azure SQL with src/sqlserver_publish.py.
- Orchestrate the downstream Databricks job with src/autoloader_to_sql_pipeline.py.
- Expose the data through the Azure Functions API in web/api/function_app.py.
- Visualize and interact with the data in the frontend under web/frontend.
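To make the landing step concrete, here is a minimal sketch of writing date-partitioned JSON lines to ADLS with the azure-storage-file-datalake SDK. The container name, path layout, and the ADLS_CONNECTION_STRING environment variable are illustrative assumptions, not the project's actual settings; those live in src/daily_synthetic_pipeline.py.

```python
# Minimal landing sketch; container, paths, and env var name are assumptions.
import json
import os
from datetime import date

from azure.storage.filedatalake import DataLakeServiceClient

def land_records(records: list[dict], entity: str) -> None:
    """Write one JSON-lines file per entity under a date-partitioned path."""
    service = DataLakeServiceClient.from_connection_string(
        os.environ["ADLS_CONNECTION_STRING"]  # hypothetical env var name
    )
    fs = service.get_file_system_client("landing")  # hypothetical container
    path = f"{entity}/ingest_date={date.today():%Y-%m-%d}/{entity}.json"
    payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    fs.get_file_client(path).upload_data(payload, overwrite=True)

land_records([{"customer_id": 1, "name": "Ada"}], "customers")
```

The date partition in the path keeps each day's drop separate, which is what lets Auto Loader pick up only new files downstream.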
The repository is organized as follows:
- src contains the Python scripts that drive synthetic data generation, ADLS landing, Databricks ingestion, and SQL publishing.
- web contains the application layer: the Azure Functions API and the TypeScript frontend.
- sql contains the database schema, table creation scripts, and schema evolution scripts for Azure SQL.
- config contains environment and configuration helpers used across the Python pipeline.
- init-scripts contains Databricks cluster setup scripts, including NeMo and ODBC installation.
- data contains local sample CSVs that support development and testing flows.
- docs is the natural home for screenshots and additional documentation as the project evolves.
The src folder is the backbone of the platform. These are the main scripts in workflow order.
- src/client.py configures the NVIDIA NeMo client used during synthetic generation.
- src/schemas.py defines the core entities and fields that shape the generated datasets.
- src/shipping_geo.py adds warehouse, geography, and shipping-estimate realism.
- src/generate_realistic_data.py produces realistic records for the synthetic commerce domain.
- src/daily_synthetic_pipeline.py writes generated outputs to Azure Data Lake Storage in a scheduled, partition-friendly format.
- src/autoloader_bronze.py incrementally ingests landed JSON into Bronze Delta tables (see the Auto Loader sketch after this list).
- src/sqlserver_publish.py publishes Bronze data into Azure SQL using watermark-based processing (a publish sketch follows below).
- src/autoloader_to_sql_pipeline.py runs the downstream ingestion and publish sequence together.
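The Auto Loader step can be pictured with a short sketch. Auto Loader tracks which files it has already seen, so each run ingests only newly landed JSON. The storage paths, checkpoint locations, and table names below are placeholders; src/autoloader_bronze.py remains the source of truth.

```python
# Auto Loader sketch; paths and table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

def ingest_bronze(entity: str) -> None:
    """Incrementally load newly landed JSON for one entity into a Bronze Delta table."""
    query = (
        spark.readStream.format("cloudFiles")                 # Auto Loader source
        .option("cloudFiles.format", "json")                  # landed files are JSON
        .option("cloudFiles.schemaLocation", f"/mnt/bronze/_schemas/{entity}")
        .load(f"abfss://landing@<storage-account>.dfs.core.windows.net/{entity}/")
        .writeStream
        .option("checkpointLocation", f"/mnt/bronze/_checkpoints/{entity}")
        .trigger(availableNow=True)                           # drain the backlog, then stop
        .toTable(f"bronze.{entity}")                          # Delta is the default format
    )
    query.awaitTermination()

ingest_bronze("orders")
```

The availableNow trigger suits a scheduled job: it processes everything outstanding and then exits, rather than running as a long-lived stream.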
Supporting scripts include src/generate_data.py for earlier generation flows.
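Watermark-based publishing roughly means: read the last published timestamp, select only newer Bronze rows, append them to Azure SQL, then advance the watermark. A hedged sketch of that loop follows; the table and column names (syn_data.ingestion_watermark, _ingested_at) are assumptions for illustration, and credentials would come from a secret scope rather than a literal URL.

```python
# Watermark publish sketch; table/column names and the JDBC URL are assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

JDBC_URL = "jdbc:sqlserver://<server>.database.windows.net;database=<db>;user=<user>;password=<pwd>"

def publish_incremental(entity: str) -> None:
    # 1. Read this entity's last published watermark from Azure SQL.
    row = (
        spark.read.format("jdbc")
        .option("url", JDBC_URL)
        .option("query",
                f"SELECT last_ingested_at FROM syn_data.ingestion_watermark WHERE entity = '{entity}'")
        .load()
        .first()
    )
    last = row["last_ingested_at"] if row else None

    # 2. Keep only Bronze rows newer than the watermark.
    df = spark.table(f"bronze.{entity}")
    if last is not None:
        df = df.where(F.col("_ingested_at") > F.lit(last))

    # 3. Append the new rows to the serving table.
    (
        df.write.format("jdbc")
        .option("url", JDBC_URL)
        .option("dbtable", f"syn_data.{entity}")
        .mode("append")
        .save()
    )
    # 4. On success, advance syn_data.ingestion_watermark to max(_ingested_at)
    #    (e.g., with a small UPDATE via pyodbc) so the next run starts from here.
```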
The web folder holds the application layer.
- web/api contains the Azure Functions backend, including SQL connectivity in web/api/shared/db.py (a minimal endpoint sketch follows this list).
- web/frontend contains the React and TypeScript user interface built with Vite.
- .github/workflows/azure-static-web-apps-gentle-plant-05f735b1e.yml contains the deployment workflow for the web application.
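For orientation, a minimal read endpoint in the Azure Functions Python v2 programming model might look like the following. The route, query, and get_connection helper are illustrative assumptions, not the project's actual API surface.

```python
# Endpoint sketch; route, query, and the get_connection helper are assumptions.
import json

import azure.functions as func

from shared.db import get_connection  # assumed helper in web/api/shared/db.py

app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)

@app.route(route="orders", methods=["GET"])
def list_orders(req: func.HttpRequest) -> func.HttpResponse:
    """Return the most recent orders as JSON."""
    with get_connection() as conn:
        rows = conn.execute(
            "SELECT TOP 50 order_id, customer_id, status "
            "FROM syn_data.orders ORDER BY order_id DESC"
        ).fetchall()
    body = json.dumps(
        [{"order_id": r[0], "customer_id": r[1], "status": r[2]} for r in rows]
    )
    return func.HttpResponse(body, mimetype="application/json")
```

The frontend under web/frontend would call such routes over HTTP and render the results.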
The sql folder contains the scripts used to bootstrap and evolve the Azure SQL schema; a small runner sketch follows the list.
- sql/000_create_schema_syn_data.sql creates the syn_data schema.
- sql/001_create_customers.sql, sql/002_create_products.sql, sql/003_create_orders.sql, and sql/005_create_warehouses.sql create the core tables.
- sql/004_create_ingestion_watermark.sql supports incremental publish tracking.
- sql/008_alter_add_shipping.sql extends older environments with shipping-related columns.
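One hedged way to apply these scripts in filename order is a small pyodbc runner. The connection string below is a placeholder, and this assumes each script is a single batch without GO separators; anything multi-batch would need splitting first.

```python
# Bootstrap runner sketch; connection string is a placeholder, and scripts
# are assumed to be single-batch (no GO separators).
import pathlib

import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<server>.database.windows.net;DATABASE=<db>;"
    "UID=<user>;PWD=<password>"
)

def apply_migrations(sql_dir: str = "sql") -> None:
    """Run each numbered script once, in filename order."""
    with pyodbc.connect(CONN_STR, autocommit=True) as conn:
        for script in sorted(pathlib.Path(sql_dir).glob("*.sql")):
            conn.execute(script.read_text())

apply_migrations()
```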