A small boilerplate project with Docker for data science.

To get started, copy the example environment files:
```shell
cp config/.env.jupyter.example .env.jupyter
cp config/.env.minio.example .env.minio
cp config/.env.postgres.example .env.postgres
cp config/.env.airflow.example .env.airflow
cp config/.env.database.example .env.database
```
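The copies above all follow the same rule, so they can be sketched as one loop (a convenience, assuming the `config/` layout shown above):

```shell
# Copy every example env file into the project root, stripping the
# .example suffix (e.g. config/.env.minio.example -> .env.minio).
for f in config/.env.*.example; do
  [ -e "$f" ] || continue   # skip if the glob matched nothing
  cp "$f" "$(basename "${f%.example}")"
done
```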
Provide appropriate values in the `.env` files, then run:

```shell
docker-compose up -d
```
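If a service fails to start, a quick sanity check (a sketch; the file names come from the `cp` commands above) can flag any env file that wasn't copied:

```shell
# Warn about any env file that is still missing from the project root.
for f in .env.jupyter .env.minio .env.postgres .env.airflow .env.database; do
  [ -f "$f" ] || echo "missing $f - copy it from config/${f}.example" >&2
done
```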
If you want to persist Jupyter settings, make sure to commit the container before taking it down. So:

```shell
# tag the commit with the image name your compose file uses for the
# jupyter service (placeholder below), so the next `up` starts from it
docker commit jupyter <image>:<tag>
# and then
docker-compose down
```
Otherwise you can simply stick to `docker-compose stop` and `docker-compose start`, which keep the containers (and their state) around.
Services:

- `jupyterlab`: Jupyter notebooks and JupyterLab, where you can do fancy stuff (localhost:8888)
- `minio`: an S3-compatible object store, similar to AWS S3 (localhost:9001)
- `postgres`: database store
- `metabase`: cool data science stuff
- `superset`: another cool visualisation service
- `airflow`: scheduler and task runner (localhost:8080)
Structure:

- `config`: contains environment variables, keys, secrets, etc.
- `jupyter`: contains notebooks
- `dags`: Airflow task definitions (DAGs)
Data folders:

- `data`: contains data which is mounted onto minio
- `db_data`: Postgres database persistent volume
- `metabase_data`: Metabase data persistent volume
Inspired by data-science-stack