
Synthetic Web App

Synthetic Web App is an end-to-end project for generating realistic logistics and commerce data, landing it in Azure Data Lake Storage, processing it in Databricks, publishing it to Azure SQL, and serving it through an Azure Functions API and a TypeScript frontend.

(Screenshot: Synthetic Web App Overview)

Overview

The project models a synthetic commerce system with customers, products, warehouses, and orders. The pipeline uses NVIDIA NeMo Data Designer to generate realistic records, Azure Data Lake Storage for landing data, Databricks Auto Loader for incremental ingestion, Azure SQL for serving, and a lightweight web application for exploration.

At a glance, the platform covers:

  • Synthetic data generation guided by explicit schemas and shipping logic
  • Lake ingestion into Bronze Delta tables with Databricks Auto Loader
  • Incremental publish into Azure SQL using watermark-based processing
  • API access through Azure Functions
  • A React and TypeScript frontend for browsing operational data
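To make the first bullet concrete, here is a loose sketch of schema-guided generation. The real entity definitions live in src/schemas.py and the real generator uses NVIDIA NeMo Data Designer; the field names and bounds below are purely hypothetical.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical order entity; the real fields are defined in src/schemas.py.
@dataclass
class Order:
    order_id: int
    customer_id: int
    product_sku: str
    quantity: int
    order_date: date

def generate_orders(n: int, seed: int = 0) -> list[Order]:
    """Generate n synthetic orders with plausible, bounded field values."""
    rng = random.Random(seed)  # seeded for reproducible runs
    start = date(2024, 1, 1)
    return [
        Order(
            order_id=i,
            customer_id=rng.randint(1, 500),
            product_sku=f"SKU-{rng.randint(100, 999)}",
            quantity=rng.randint(1, 10),
            order_date=start + timedelta(days=rng.randint(0, 364)),
        )
        for i in range(1, n + 1)
    ]

orders = generate_orders(3)
```

The point is the shape of the workflow, not the values: an explicit schema constrains every generated field, which is what keeps downstream tables and shipping logic consistent.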

Pipeline At A Glance

(Diagram: Pipeline Overview)

  1. Define the synthetic entities and generation rules in src/schemas.py and src/shipping_geo.py.
  2. Generate realistic datasets in src/generate_realistic_data.py.
  3. Write scheduled JSON outputs to Azure Data Lake Storage in src/daily_synthetic_pipeline.py.
  4. Ingest landed files into Bronze Delta tables with src/autoloader_bronze.py.
  5. Incrementally publish Bronze data into Azure SQL with src/sqlserver_publish.py.
  6. Orchestrate the downstream Databricks job with src/autoloader_to_sql_pipeline.py.
  7. Expose the data through the Azure Functions API in web/api/function_app.py.
  8. Visualize and interact with the data in the frontend under web/frontend.
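Step 3's "partition-friendly" landing layout can be sketched as a small path builder. The entity/ingest_date=YYYY-MM-DD layout, the container name, and the storage-account placeholder below are assumptions for illustration; the actual layout is defined in src/daily_synthetic_pipeline.py.

```python
from datetime import date

def landing_path(entity: str, run_date: date, container: str = "landing") -> str:
    """Build a date-partitioned ADLS Gen2 path for one entity's daily JSON drop.

    Date partitions let Auto Loader and downstream readers prune by ingest
    date instead of scanning the whole container.
    """
    return (
        f"abfss://{container}@<storage-account>.dfs.core.windows.net/"
        f"{entity}/ingest_date={run_date.isoformat()}/{entity}.json"
    )

path = landing_path("orders", date(2024, 1, 15))
```

A layout like this is also what makes step 4 incremental: Auto Loader tracks which files it has already seen, so each daily drop lands in a fresh partition and only new files are ingested.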

Repository Layout

  • src contains the Python scripts that drive synthetic data generation, ADLS landing, Databricks ingestion, and SQL publishing.
  • web contains the application layer: the Azure Functions API and the TypeScript frontend.
  • sql contains the database schema, table creation scripts, and schema evolution scripts for Azure SQL.
  • config contains environment and configuration helpers used across the Python pipeline.
  • init-scripts contains Databricks cluster setup scripts, including NeMo and ODBC installation.
  • data contains local sample CSVs that support development and testing flows.
  • docs is the natural home for screenshots and additional documentation as the project evolves.

Source Workflow

The src folder is the backbone of the platform. These are the main scripts in workflow order.

  1. src/client.py configures the NVIDIA NeMo client used during synthetic generation.
  2. src/schemas.py defines the core entities and fields that shape the generated datasets.
  3. src/shipping_geo.py adds warehouse, geography, and shipping-estimate realism.
  4. src/generate_realistic_data.py produces realistic records for the synthetic commerce domain.
  5. src/daily_synthetic_pipeline.py writes generated outputs to Azure Data Lake Storage in a scheduled, partition-friendly format.
  6. src/autoloader_bronze.py incrementally ingests landed JSON into Bronze Delta tables.
  7. src/sqlserver_publish.py publishes Bronze data into Azure SQL using watermark-based processing.
  8. src/autoloader_to_sql_pipeline.py runs the downstream ingestion and publish sequence together.

Supporting scripts include src/generate_data.py for earlier generation flows.
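The watermark pattern behind src/sqlserver_publish.py can be sketched in plain Python. The real job runs on Spark against Bronze Delta tables and writes to Azure SQL; here a list of dicts stands in for the table, and the _ingest_ts column name is an assumption.

```python
from datetime import datetime

def publish_increment(
    rows: list[dict], watermark: datetime
) -> tuple[list[dict], datetime]:
    """Select only rows newer than the stored watermark, then advance it.

    Each run publishes just the delta since the last run; re-running with
    the returned watermark yields an empty batch, making the job idempotent.
    """
    new_rows = [r for r in rows if r["_ingest_ts"] > watermark]
    if new_rows:
        watermark = max(r["_ingest_ts"] for r in new_rows)
    return new_rows, watermark

bronze = [
    {"order_id": 1, "_ingest_ts": datetime(2024, 1, 1)},
    {"order_id": 2, "_ingest_ts": datetime(2024, 1, 2)},
    {"order_id": 3, "_ingest_ts": datetime(2024, 1, 3)},
]
batch, wm = publish_increment(bronze, watermark=datetime(2024, 1, 1))
```

In the real pipeline the watermark would be persisted (for example in a control table) between runs, so each scheduled execution picks up exactly where the previous one stopped.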

Application Layer

The web folder holds the application layer: the Azure Functions API in web/api/function_app.py exposes the data published to Azure SQL, and the React and TypeScript frontend under web/frontend consumes that API.

(Diagram: Pipeline Overview)

Database Layer

The sql folder contains the scripts used to bootstrap and evolve the Azure SQL schema.

Next Steps

Read more about the web app in the web directory.
