Dataform is an application for managing data in BigQuery, Snowflake, Redshift, and other data warehouses. It enables data teams to build scalable, tested, SQL-based data transformation pipelines using version control and engineering-inspired best practices.
Compile hundreds of data models in under a second using SQLX. SQLX extends your existing SQL warehouse dialect to add features that support dependency management, testing, documentation and more.
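To make the dependency-management idea concrete, here is a minimal sketch of how a `ref()`-style helper could resolve dataset names while recording which datasets a model reads from. This is not Dataform's implementation: `compileModel`, the `analytics` schema, and the dataset names are hypothetical, chosen only for illustration.

```javascript
// Hypothetical mini-compiler, NOT Dataform's implementation.
// A model is a name plus a function that builds SQL using a ref() helper.
function compileModel(name, buildSql, schema = "analytics") {
  const dependencies = [];

  // ref() returns a fully qualified table name and records the dependency.
  const ref = (target) => {
    if (!dependencies.includes(target)) dependencies.push(target);
    return `${schema}.${target}`;
  };

  const sql = buildSql({ ref }).trim();
  return { name, sql, dependencies };
}

// Assumed example model; the table and column names are made up.
const compiled = compileModel(
  "daily_orders",
  ({ ref }) => `
    select order_date, count(*) as orders
    from ${ref("orders")}
    join ${ref("customers")} using (customer_id)
    group by order_date`
);

console.log(compiled.dependencies); // [ 'orders', 'customers' ]
console.log(compiled.sql);
```

Because the dependencies fall out of the same call that produces the table name, a model cannot reference a dataset without also declaring that it depends on it.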
Supported warehouses:
- BigQuery
- Snowflake
- Redshift
- Postgres
- Azure SQL Data Warehouse
- Presto (under development)
Dataform lets you:
- Turn any SQL query into a dataset published back to your warehouse
- Write data quality checks for your datasets
- Simplify generation of incremental tables using merge/insert to save costs
- Generate a DAG automatically from dataset dependencies
- Document datasets in code alongside your SQL
- Enable scripting and code re-use with a JavaScript API
Guides and examples:
- Reading and writing data from S3
- Writing unit tests
- Create slowly-changing dimension tables
- Manage development, staging and production environments
- Model Segment data in minutes
- Analyse BigQuery usage logs
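Generating a DAG from dataset dependencies, as listed among the features above, amounts to ordering datasets so that each one runs after everything it reads from. The following is a rough sketch of that idea using a depth-first topological sort; `executionOrder` and the dataset names are hypothetical, and Dataform's own scheduler may work differently.

```javascript
// Hypothetical sketch of deriving a run order from dataset dependencies.
// Each key is a dataset; each value lists the datasets it reads from.
function executionOrder(graph) {
  const order = [];
  const state = new Map(); // dataset -> "visiting" | "done"

  const visit = (node, path) => {
    if (state.get(node) === "done") return;
    if (state.get(node) === "visiting") {
      throw new Error(`Circular dependency: ${[...path, node].join(" -> ")}`);
    }
    state.set(node, "visiting");
    // Datasets that are referenced but not declared are treated as sources.
    for (const dep of graph[node] ?? []) visit(dep, [...path, node]);
    state.set(node, "done");
    order.push(node); // every dependency is already in `order`
  };

  Object.keys(graph).forEach((node) => visit(node, []));
  return order;
}

// Assumed example graph; the dataset names are made up.
const graph = {
  daily_orders: ["orders", "customers"],
  orders: ["raw_orders"],
  customers: [],
  raw_orders: [],
};

console.log(executionOrder(graph));
// [ 'raw_orders', 'orders', 'customers', 'daily_orders' ]
```

Tracking a "visiting" state is what lets the sketch report circular dependencies instead of recursing forever.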
Follow the docs to get started. You can install the Dataform SDK with the following command:
npm i -g @dataform/cli
Dataform web is a development environment and production-ready application for the Dataform SDK. You can learn more at dataform.co.
- Read the docs here
- Watch the 5-minute overview video
- Read about how we think you should approach building a modern analytics stack
- Join us on Slack
- Read our blog
- Check out what our users say about us
Check out our contributors' guide to get started with setting up the repo.