Python for Data Science - Spark Data Platform

Getting started

First, install your Dagster code location as a Python package. The -e (--editable) flag tells pip to install the package in "editable mode", so local code changes take effect as you develop, without reinstalling.

pip install -e ".[dev]"

Copy the .env.example file to .env, then fill in the values of the environment variables in that file.
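It is easy to leave a value blank when filling in .env. The following is a minimal sketch of a check that compares the two files, assuming both use plain KEY=VALUE lines; the helper names parse_env and unfilled_keys are hypothetical and not part of this project or of Dagster.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unfilled_keys(example_text: str, env_text: str) -> list[str]:
    """Return keys listed in the example that are missing or empty in .env."""
    env = parse_env(env_text)
    return [key for key in parse_env(example_text) if not env.get(key)]


if __name__ == "__main__":
    example, env = Path(".env.example"), Path(".env")
    if example.exists() and env.exists():
        # Prints the names of variables that still need a value.
        print(unfilled_keys(example.read_text(), env.read_text()))
```

Run from the project root, the script prints an empty list once every variable named in .env.example has a value in .env.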

Start the Dagster UI web server:

dagster dev -h 0.0.0.0

Open http://localhost:3000 in your browser to see the project.

Development

Adding new Python dependencies:

Specify new Python dependencies in setup.py, then re-run pip install -e ".[dev]" to install them.
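For orientation, a setup.py that supports the ".[dev]" extra used above might look roughly like this. It is an illustrative sketch, not this project's actual file: the package name data_platform is inferred from the data_platform_tests directory, and the dependency lists are assumptions.

```python
# setup.py: illustrative sketch; the name and dependency lists are assumptions
from setuptools import find_packages, setup

setup(
    name="data_platform",  # assumed package name
    packages=find_packages(exclude=["data_platform_tests"]),
    install_requires=[
        # Runtime dependencies go here.
        "dagster",
    ],
    extras_require={
        # Installed only when the "dev" extra is requested:
        #   pip install -e ".[dev]"
        "dev": ["dagster-webserver", "pytest"],
    },
)
```

Packages needed at run time belong in install_requires; tools needed only for local development or testing belong under the "dev" extra.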

Unit testing

Unit tests live in the data_platform_tests directory. Run them with pytest:

pytest data_platform_tests
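As a rough illustration of what pytest picks up, here is a minimal test module. The file name and the normalize_columns helper are hypothetical and exist only for this example; a real test would import code from the project instead.

```python
# data_platform_tests/test_example.py (hypothetical file name)


def normalize_columns(columns):
    """Hypothetical helper: lower-case names and replace spaces with underscores."""
    return [name.strip().lower().replace(" ", "_") for name in columns]


def test_normalize_columns():
    # pytest collects functions whose names start with "test_"
    # from files named test_*.py or *_test.py.
    assert normalize_columns([" Order ID", "Total Price "]) == [
        "order_id",
        "total_price",
    ]
```

A single test can be selected by name with pytest's -k option, for example pytest data_platform_tests -k normalize_columns.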

Deployment with a Spark cluster

Build Docker images

You need to build two images: one shared by dagster-webserver and dagster-daemon, and one for the pipeline.

docker build -t dagster .
docker build -t pipeline pipeline_data_platform
