SaaS teams collect usage, support, and billing data but often struggle to spot early signals of churn risk.
This project builds an end-to-end data pipeline that:
- generates synthetic SaaS data
- loads data into Google Cloud Storage (GCS)
- ingests data into BigQuery
- transforms raw data into risk signals
- visualizes results in a Streamlit dashboard
- Data Generation: Python (subside-gen)
- Data Lake: Google Cloud Storage (GCS)
- Data Warehouse: BigQuery
- Transformations: SQL
- Visualization: Streamlit
The Streamlit app displays:
- Accounts by Risk Level (categorical distribution)
- Usage vs Support Activity Over Time (time series)
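The two charts need two small datasets: a count of accounts per risk level, and a per-day series of usage next to support activity. The sketch below shows one way to shape them in plain Python. It assumes hypothetical row fields (`risk_level`, `date`, `events`) and risk labels `High`/`Medium`/`Low`; the project's derived BigQuery tables may use different names.

```python
from collections import Counter, defaultdict

# Hypothetical display order; the real derived tables may use other labels.
RISK_ORDER = ["High", "Medium", "Low"]


def risk_level_counts(accounts):
    """Count accounts per risk level in a fixed order (labels outside RISK_ORDER are ignored)."""
    counts = Counter(row["risk_level"] for row in accounts)
    return {level: counts.get(level, 0) for level in RISK_ORDER}


def usage_vs_support(usage_events, tickets):
    """Merge daily usage events and support tickets into one date-sorted series."""
    series = defaultdict(lambda: {"usage_events": 0, "support_tickets": 0})
    for row in usage_events:
        series[row["date"]]["usage_events"] += row["events"]
    for row in tickets:
        series[row["date"]]["support_tickets"] += 1  # one row per ticket
    return [dict(date=day, **series[day]) for day in sorted(series)]
```

In the Streamlit app, each result could be turned into a pandas DataFrame and passed to `st.bar_chart` and `st.line_chart` respectively.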
subside-gen --out data_out --csv
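To make the shape of the raw data concrete, here is a minimal stand-in generator that writes three CSVs into an output directory. It is not subside-gen: the file names, columns, and the `generate` function are hypothetical, and the real tool's schema and options may differ.

```python
import csv
import random
from datetime import date, timedelta
from pathlib import Path

# Hypothetical column layout for each CSV; subside-gen's real schema may differ.
SCHEMAS = {
    "accounts": ["account_id", "plan", "mrr"],
    "usage": ["account_id", "date", "events"],
    "tickets": ["account_id", "date"],
}


def generate(out_dir, n_accounts=5, n_days=3, seed=42):
    """Write accounts.csv, usage.csv and tickets.csv to out_dir; return row counts."""
    rng = random.Random(seed)  # seeded so repeated runs produce identical files
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    start = date(2024, 1, 1)

    accounts = [
        {"account_id": f"A{i:03d}", "plan": rng.choice(["basic", "pro"]), "mrr": rng.randint(50, 500)}
        for i in range(n_accounts)
    ]
    usage, tickets = [], []
    for acct in accounts:
        for offset in range(n_days):
            day = (start + timedelta(days=offset)).isoformat()
            usage.append({"account_id": acct["account_id"], "date": day, "events": rng.randint(0, 100)})
            if rng.random() < 0.2:  # roughly one ticket per five account-days
                tickets.append({"account_id": acct["account_id"], "date": day})

    tables = {"accounts": accounts, "usage": usage, "tickets": tickets}
    for name, rows in tables.items():
        with open(out / f"{name}.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=SCHEMAS[name])
            writer.writeheader()
            writer.writerows(rows)
    return {name: len(rows) for name, rows in tables.items()}
```

Calling `generate("data_out")` produces files that the `data_out/*.csv` pattern in the upload step would match.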
gsutil cp data_out/*.csv gs://YOUR_BUCKET/
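If you prefer to drive the upload from Python instead of `gsutil`, the `google-cloud-storage` client can do the same copy. This sketch assumes that package is installed and that Application Default Credentials are configured; `plan_uploads`, `upload_csvs`, and the optional object-name `prefix` are hypothetical helpers, not part of this project.

```python
from pathlib import Path


def plan_uploads(local_dir, prefix=""):
    """Pair each local CSV with its destination object name, like `gsutil cp data_out/*.csv`."""
    return [(path, f"{prefix}{path.name}") for path in sorted(Path(local_dir).glob("*.csv"))]


def upload_csvs(local_dir, bucket_name, project_id=None, prefix=""):
    """Upload every CSV in local_dir to gs://bucket_name/<prefix><file>; return the URIs."""
    # Imported here so plan_uploads() works without google-cloud-storage installed.
    from google.cloud import storage

    client = storage.Client(project=project_id)  # uses Application Default Credentials
    bucket = client.bucket(bucket_name)
    uris = []
    for path, object_name in plan_uploads(local_dir, prefix):
        bucket.blob(object_name).upload_from_filename(str(path))
        uris.append(f"gs://{bucket_name}/{object_name}")
    return uris
```

For example, `upload_csvs("data_out", "YOUR_BUCKET")` mirrors the `gsutil` command, with `YOUR_BUCKET` replaced by your bucket name.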
bq load --autodetect --source_format=CSV ...
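The same load can be expressed with the `google-cloud-bigquery` client, where `LoadJobConfig(autodetect=True, source_format=CSV)` corresponds to the `--autodetect` and `--source_format=CSV` flags. This is a sketch, assuming one table per CSV named after the file; `table_id_for`, `load_csv`, and the dataset argument are hypothetical, and the elided arguments of the original `bq load` command are not reproduced here.

```python
from pathlib import PurePosixPath


def table_id_for(gcs_uri, project_id, dataset):
    """Derive a table ID from the object name: gs://b/usage.csv -> project.dataset.usage."""
    return f"{project_id}.{dataset}.{PurePosixPath(gcs_uri).stem}"


def load_csv(gcs_uri, project_id, dataset):
    """Load one CSV from GCS into BigQuery with schema autodetection; return the table ID."""
    # Imported here so table_id_for() works without google-cloud-bigquery installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    table_id = table_id_for(gcs_uri, project_id, dataset)
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,  # same as --source_format=CSV
        autodetect=True,                          # same as --autodetect
        skip_leading_rows=1,                      # treat the first row as a header
    )
    client.load_table_from_uri(gcs_uri, table_id, job_config=job_config).result()  # wait for the job
    return table_id
```

Like `bq load` without a write-disposition flag, this appends to an existing table, so rerunning it can duplicate rows.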
Execute the SQL scripts to create derived tables.
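The SQL scripts themselves are not shown here, so the following is only an illustration of the kind of rule-based scoring a "risk signals" table might encode. The three signals, their thresholds, and the point scheme are all hypothetical; in the project this logic would live in SQL (for example, as a `CASE` expression), and Python is used here only to make the rules easy to test.

```python
from dataclasses import dataclass


@dataclass
class AccountSignals:
    account_id: str
    usage_change_pct: float   # hypothetical: usage vs. previous period, -0.4 means a 40% drop
    tickets_30d: int          # hypothetical: support tickets opened in the last 30 days
    failed_payments_90d: int  # hypothetical: failed billing attempts in the last 90 days


def risk_points(s):
    """Add points for each warning sign; larger drops or more tickets score higher."""
    points = 0
    if s.usage_change_pct <= -0.30:
        points += 2
    elif s.usage_change_pct <= -0.10:
        points += 1
    if s.tickets_30d >= 5:
        points += 2
    elif s.tickets_30d >= 2:
        points += 1
    if s.failed_payments_90d >= 1:
        points += 2
    return points


def risk_level(s):
    """Bucket the point total into the labels shown on the dashboard."""
    points = risk_points(s)
    if points >= 4:
        return "High"
    if points >= 2:
        return "Medium"
    return "Low"
```

A point scheme keeps each rule independent, so a single strong signal (a failed payment) and several weak ones (a modest usage dip plus a few tickets) can both push an account into the same bucket.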
streamlit run app/app.py
- Replace YOUR_BUCKET with your GCS bucket name
- Set your GCP project ID (e.g., de2026-485318)
- Run SQL transformations in BigQuery to create derived tables
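One way to run the transformation scripts in a repeatable order is to submit each file as a BigQuery query job. This sketch assumes the scripts sit in a single directory (for example, a hypothetical `sql/` folder) with numeric prefixes such as `01_`, `02_` that define their order; `sql_scripts_in_order` and `run_sql_scripts` are hypothetical helpers, and `google-cloud-bigquery` must be installed.

```python
from pathlib import Path


def sql_scripts_in_order(sql_dir):
    """Return the .sql files in sql_dir sorted by name, so prefixes like 01_, 02_ set the order."""
    return sorted(Path(sql_dir).glob("*.sql"))


def run_sql_scripts(sql_dir, project_id):
    """Run each script as a BigQuery query job, one after another."""
    # Imported here so the ordering helper works without google-cloud-bigquery installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    for path in sql_scripts_in_order(sql_dir):
        client.query(path.read_text()).result()  # block until done so later scripts see earlier tables
        print(f"finished {path.name}")
```

Waiting on each job before starting the next matters when a later script reads a derived table that an earlier script creates.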
