
Python for Data Science - Spark Data Platform

Getting started

First, install your Dagster code location as a Python package. The -e (--editable) flag tells pip to install the package in editable mode, so local code changes take effect automatically as you develop.

pip install -e ".[dev]"

Duplicate the .env.example file, rename the copy to .env, and fill in the environment variable values in that file.
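
On macOS or Linux, for example:

cp .env.example .env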

Start the Dagster UI web server:

dagster dev -h 0.0.0.0

Open http://localhost:3000 in your browser to see the project.

Development

Adding new Python dependencies:

You can specify new Python dependencies in setup.py, as sketched below.
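
The sketch below shows the general shape of such a setup.py; the package name and dependency list here are illustrative and may not match the project's actual file:

from setuptools import find_packages, setup

setup(
    name="data_platform",  # illustrative name, inferred from the data_platform_tests directory
    packages=find_packages(exclude=["data_platform_tests"]),
    install_requires=[
        # add new runtime dependencies here
        "dagster",
        "dagster-webserver",
        "pyspark",
    ],
    extras_require={
        # dev-only dependencies installed via pip install -e ".[dev]"
        "dev": ["pytest"],
    },
)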

Unit testing

Unit tests live in the data_platform_tests directory and can be run with pytest:

pytest data_platform_tests
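
A minimal test of a Dagster asset might look like the sketch below; the sample_numbers asset is hypothetical and only illustrates the pattern, it is not an asset from this repository:

from dagster import asset, materialize

@asset
def sample_numbers():
    # Hypothetical asset used only to illustrate the test pattern
    return [1, 2, 3]

def test_sample_numbers():
    # materialize() runs the asset in-process and returns an execution result
    result = materialize([sample_numbers])
    assert result.success
    assert result.output_for_node("sample_numbers") == [1, 2, 3]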

Deployment with a Spark cluster

Build Docker images

You need to build two images: one shared by dagster-webserver and dagster-daemon, and one for the pipeline code (built from the pipeline_data_platform directory).

docker build -t dagster .
docker build -t pipeline pipeline_data_platform
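
As a quick local check of the dagster image, something along these lines should start the webserver, assuming the image's working directory contains the project's workspace configuration (the exact entrypoint depends on the Dockerfile):

docker run --rm -p 3000:3000 --env-file .env dagster dagster-webserver -h 0.0.0.0 -p 3000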
