Replace secrets.py python modules with environment variables #159

Closed
ricardogsilva opened this issue Sep 1, 2017 · 1 comment

@ricardogsilva
Contributor

Currently, DAGs retrieve sensitive information from several secrets.py files. This pattern is meant to keep sensitive data out of the git repository (but beware: some secrets are still lying around in this repo's git history), but it is not very friendly for setting up new environments (dev or otherwise). For example, I'm creating a new dev environment locally and have to create several of these secrets files with dummy values just to be able to get airflow up and running.

Also, this pattern of retrieving the secrets is not very dev friendly:

from myfile import secrets

var1 = secrets["secret1"]

This forces me to create the secrets.py file AND also create a secrets dictionary AND also create the secret1 key in the dict with some value. Otherwise I cannot get airflow to run. And I have to repeat this for all DAGs even if I'm only interested in working on a single DAG.

A more flexible strategy is proposed in the 12factor app's section on configuration. Basically it is recommended that this information be kept in the environment and not in custom python files.

In this case it would mean that, instead of several secrets.py files each holding a bunch of dictionaries with keys and strings as values, there would be several environment variables, one for each secret variable. Airflow even facilitates using this pattern for things like database connections via its connections feature.
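As a rough illustration of the connections feature: Airflow can pick up a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>` whose value is a connection URI. The sketch below only builds that name/value pair; the helper `connection_env_var`, the connection id `my_local_db` and the credentials are made up for the example and are not taken from this repo.

```python
from urllib.parse import quote


def connection_env_var(conn_id, scheme, user, password, host, port, schema):
    """Build the (name, value) pair for an Airflow connection env variable.

    Hypothetical helper: it only formats strings, it does not talk to Airflow.
    """
    name = "AIRFLOW_CONN_" + conn_id.upper()
    # Percent-encode user and password so characters like "@" or "/" do not
    # break the URI.
    uri = "{}://{}:{}@{}:{}/{}".format(
        scheme, quote(user, safe=""), quote(password, safe=""), host, port, schema
    )
    return name, uri


# Illustrative values only (local dev credentials).
name, uri = connection_env_var(
    "my_local_db", "postgres", "dev", "p@ss/word", "localhost", 5432, "airflow_dev"
)
print("{}={}".format(name, uri))
```

The printed line could go into an env file; the variable has to be present in the environment of the Airflow processes before they start.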

The pattern of retrieving the secrets can also be made more flexible:

import os

var1 = os.getenv("SECRET1")  # defaults to None if SECRET1 does not exist in environment

# optionally you can specify some sensible default too
# var1 = os.getenv("SECRET1", "some_default_value")

The snippet above allows me to define only the secrets I want to use, and the code will not blow up (immediately, at least) if the other secrets are not defined.
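To make the "does not blow up immediately" part explicit, the lookup could be wrapped in a small helper that fails only when a DAG actually asks for a secret it cannot work without. This is a minimal sketch: `get_secret`, `MissingSecretError` and the `required` flag are hypothetical names, not something Airflow or this repo provides.

```python
import os


class MissingSecretError(KeyError):
    """Raised when a required environment variable is not set."""


def get_secret(name, default=None, required=False, environ=None):
    """Look up a secret in the environment (hypothetical helper).

    `environ` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if environ is None else environ
    value = env.get(name, default)
    if required and value is None:
        raise MissingSecretError(
            "environment variable {!r} is not set".format(name)
        )
    return value
```

A DAG would then call `get_secret("SECRET1", required=True)` inside the task that needs it, so importing the other DAGs keeps working when their secrets are missing.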

As for the definition of the environment variables, they can be kept in a single file, which can be specific to each env, for example dev.env, staging.env, production.env. This file can be something like:

# dev.env
SECRET1=my_secret
SECRET2=other_secret

The contents of the file can then be exported to the environment using:

set -o allexport
source dev.env
set +o allexport

Or, if using docker, the docker run command supports an --env-file argument where we can specify the file.
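If neither the shell nor docker is involved (for example when running a script from an IDE), the same file could be loaded from Python. The sketch below assumes the simple `KEY=VALUE` format shown above, with `#` comments; it does not handle quoting, `export` prefixes or variable expansion the way `source` would. `parse_env_file` and `load_env_file` are hypothetical helpers.

```python
import os


def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line; ignored in this sketch
        values[key.strip()] = value.strip()
    return values


def load_env_file(path, environ=None):
    """Copy the file's variables into the environment and return them."""
    env = os.environ if environ is None else environ
    with open(path) as handle:
        parsed = parse_env_file(handle.read())
    for key, value in parsed.items():
        # Variables that are already set win over the file.
        env.setdefault(key, value)
    return parsed
```

Calling `load_env_file("dev.env")` before the secrets are read would have roughly the same effect as the shell snippet, limited to the current Python process.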

These env files would not usually be kept in the code repository. The possible exception is the dev file, which might make sense to keep in the repo if it eases dev setup and does not contain any truly sensitive information (for example, if it uses only local database credentials).

@randomorder
Member

See #118
We chose to use a different approach for configuration management and implemented it for S1 and S2.
Closing this for now
