Dagster is a system for building modern data applications.
Combining an elegant programming model with beautiful tools, Dagster lets infrastructure engineers, data engineers, and data scientists collaborate seamlessly to process and produce trusted, reliable data.
To get started:

```shell
pip install dagster dagit
```
This installs two packages:
- dagster | The core programming model and abstraction stack; stateless, single-node, single-process and multi-process execution engines; and a CLI tool for driving those engines.
- dagit | A UI and rich development environment for Dagster, including a DAG browser, a type-aware config editor, and a streaming execution interface.
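At its core, an execution engine like the ones in the `dagster` package runs a pipeline's steps in dependency order, feeding each step the outputs of its upstream steps. The following is a minimal, dependency-free sketch of that idea using Python's standard `graphlib` module (Python 3.9+). It is not Dagster's API: the step-registry format and the `run_in_process` function are hypothetical names chosen for illustration.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical step registry: name -> (function, names of upstream steps).
# This is NOT Dagster's API; it only models the idea of a DAG of steps.
Step = Tuple[Callable[..., Any], List[str]]


def run_in_process(steps: Dict[str, Step]) -> Dict[str, Any]:
    """Run every step once, in dependency order, in the current process."""
    # TopologicalSorter expects a mapping of node -> its predecessors.
    graph = {name: deps for name, (_, deps) in steps.items()}
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        fn, deps = steps[name]
        # Upstream outputs are passed positionally, in the order deps were declared.
        results[name] = fn(*(results[d] for d in deps))
    return results


# Example DAG: load -> clean -> summarize.
steps: Dict[str, Step] = {
    "load": (lambda: [3, None, 1, 2], []),
    "clean": (lambda rows: [r for r in rows if r is not None], ["load"]),
    "summarize": (lambda rows: {"count": len(rows), "total": sum(rows)}, ["clean"]),
}

if __name__ == "__main__":
    print(run_in_process(steps)["summarize"])  # {'count': 3, 'total': 6}
```

A single-process engine in this model is just the serial loop above. A multi-process engine would presumably differ mainly in dispatching steps whose dependencies are already satisfied to separate worker processes, while keeping the same dependency graph.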
For details on contributing or running the project for development, check out our contributing guide.
Dagster works with the tools and systems that you're already using with your data, including:
- Apache Airflow | Allows Dagster pipelines to be scheduled and executed, either containerized or uncontainerized, as Apache Airflow DAGs.
- Apache Spark (dagster-spark · dagster-pyspark) | Libraries for interacting with Apache Spark and PySpark.
- Dask | Provides a Dagster integration with Dask / Dask.Distributed.
- Datadog | Provides a Dagster resource for publishing metrics to Datadog.
- Jupyter / Papermill (dagstermill) | Built on the papermill library, dagstermill integrates productionized Jupyter notebooks into Dagster pipelines.
- PagerDuty | A library for creating PagerDuty alerts from Dagster workflows.
- Snowflake | A library for interacting with the Snowflake Data Warehouse.
- Amazon Web Services | A library for interacting with AWS, providing integrations with S3, EMR, and (coming soon!) Redshift.
- Google Cloud Platform | A library for interacting with Google Cloud Platform, providing integrations with BigQuery and Cloud Dataproc.
This list is growing as we are actively building more integrations, and we welcome contributions!
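The Datadog integration is described as a Dagster resource: an external client that pipeline steps receive at run time rather than construct themselves, which is what allows a stub to stand in for the real service during tests. The sketch below illustrates that dependency-injection pattern with the standard library only. `MetricsClient`, `InMemoryMetrics`, and `clean_rows` are hypothetical names, and the `increment` method is an assumed interface, not the API of the Datadog integration.

```python
from typing import Dict, Iterable, List, Optional, Protocol


class MetricsClient(Protocol):
    """Hypothetical interface a metrics resource might expose (assumed, not a real API)."""

    def increment(self, metric: str, value: int = 1) -> None: ...


class InMemoryMetrics:
    """Test stub: records counters in a dict instead of sending them anywhere."""

    def __init__(self) -> None:
        self.counters: Dict[str, int] = {}

    def increment(self, metric: str, value: int = 1) -> None:
        self.counters[metric] = self.counters.get(metric, 0) + value


def clean_rows(rows: Iterable[Optional[int]], metrics: MetricsClient) -> List[int]:
    """A pipeline step that is handed its metrics client instead of creating one."""
    all_rows = list(rows)
    kept = [r for r in all_rows if r is not None]
    # Report how many rows survived and how many were dropped.
    metrics.increment("rows.kept", len(kept))
    metrics.increment("rows.dropped", len(all_rows) - len(kept))
    return kept


if __name__ == "__main__":
    stub = InMemoryMetrics()
    print(clean_rows([3, None, 1, 2], stub))  # [3, 1, 2]
    print(stub.counters)  # {'rows.kept': 3, 'rows.dropped': 1}
```

Because `clean_rows` depends only on the `MetricsClient` shape, a production run could pass a client backed by a real metrics service while tests pass the in-memory stub; how such a client is configured is outside the scope of this sketch.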
The examples folder contains several example projects that demonstrate how to use Dagster, including: