Environment variables in config #7436

Open
dberenbaum opened this issue Mar 4, 2022 · 3 comments
Labels
feature request (Requesting a new feature), p2-medium (Medium priority, should be done, but less important)

@dberenbaum
Contributor

dberenbaum commented Mar 4, 2022

Originally posted by @drjasonharrison in #1416 (comment)

I'm just starting with DVC, and there may be better ways to do what I initially came up with, but since I couldn't find anything in the documentation or forums, this is what I did. Context: a Git repo (hosted by Bitbucket) with DVC (tracked files in an S3 bucket under a project-specific directory).

  • we already have .env files for staging and production
  • our code uses variables defined in the .env files to run, build, deploy, log, etc
  • we use Bitbucket pipelines, but most of the work is done by our bash scripts
  • because we're using AWS we have on our developer machines $HOME/.aws/config and $HOME/.aws/credentials
  • these credentials are also in the .env files, but because they are for deployments they have names like DEPLOYMENT_AWS_ACCESS_KEY_ID, while AWS_ACCESS_KEY_ID is for run time on EC2.
  • we have multiple developers: some work on machine learning models (PyTorch), others do processing code and devops (myself), and some do both machine learning and processing.

Given the above, I have written a script for deployment that assumes that the .env file has been parsed and all of the definitions are available in the current script environment. It takes the "developer centric" DVC configuration for our remote storage and converts it to use the script environment variables. I didn't see anything that explained a better way to do this and came up with this workaround to provide the credentials through environment variables:

set +u
if [[ -n "${DEPLOYMENT_AWS_ACCESS_KEY_ID}" ]]; then
    export AWS_ACCESS_KEY_ID="${DEPLOYMENT_AWS_ACCESS_KEY_ID}"
else
    echo "Warning: DEPLOYMENT_AWS_ACCESS_KEY_ID is not defined. Using AWS_ACCESS_KEY_ID" >&2
fi

if [[ -n "${DEPLOYMENT_AWS_SECRET_ACCESS_KEY}" ]]; then
    export AWS_SECRET_ACCESS_KEY="${DEPLOYMENT_AWS_SECRET_ACCESS_KEY}"
else
    echo "Warning: DEPLOYMENT_AWS_SECRET_ACCESS_KEY is not defined. Using AWS_SECRET_ACCESS_KEY" >&2
fi

if [[ -n "${DEPLOYMENT_AWS_DEFAULT_REGION}" ]]; then
    export AWS_DEFAULT_REGION="${DEPLOYMENT_AWS_DEFAULT_REGION}"
else
    echo "Warning: DEPLOYMENT_AWS_DEFAULT_REGION is not defined. Using AWS_DEFAULT_REGION" >&2
fi
set -u

REMOTE_STORAGE_PROFILE=""
REMOTE_STORAGE_CREDENTIALPATH=""

# Remove the local version of remote.storage.credentialpath and use
# the environment variables instead. The setting is likely to exist
# only on a development machine.
set +e
REMOTE_STORAGE_PROFILE="$(dvc config --project remote.storage.profile)"
REMOTE_STORAGE_CREDENTIALPATH="$(dvc config --local remote.storage.credentialpath)"
dvc config --project --unset remote.storage.profile
dvc config --local --unset remote.storage.credentialpath

echo "REMOTE_STORAGE_PROFILE = ${REMOTE_STORAGE_PROFILE}"
echo "REMOTE_STORAGE_CREDENTIALPATH = ${REMOTE_STORAGE_CREDENTIALPATH}"

set -e

dvc pull --verbose

if [[ -n "${REMOTE_STORAGE_PROFILE}" ]]; then
    # restore the value for remote.storage.profile if it was set before
    dvc config --project remote.storage.profile "${REMOTE_STORAGE_PROFILE}"
fi

if [[ -n "${REMOTE_STORAGE_CREDENTIALPATH}" ]]; then
    # restore the value for remote.storage.credentialpath if it was set before
    dvc config --local remote.storage.credentialpath "${REMOTE_STORAGE_CREDENTIALPATH}"
fi
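
The three near-identical if/else blocks at the top of the script could be folded into one helper. This is a minimal sketch assuming bash; `export_with_fallback` is a hypothetical name, not something DVC or the AWS CLI provides. It relies on bash indirect expansion (`${!name}`), and the `:-` default keeps it safe under `set -u`, so the `set +u` / `set -u` toggling would not be needed around it.

```shell
#!/usr/bin/env bash
# export_with_fallback TARGET SOURCE  (hypothetical helper)
# If the variable named SOURCE is non-empty, export its value as TARGET.
# Otherwise leave TARGET untouched and print a warning to stderr.
export_with_fallback() {
    local target="$1" source="$2"
    # ${!source:-} reads the variable whose name is stored in $source,
    # expanding to an empty string if it is unset.
    if [[ -n "${!source:-}" ]]; then
        export "${target}=${!source}"
    else
        echo "Warning: ${source} is not defined. Using ${target}" >&2
    fi
}

# Same three mappings as in the script above.
for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION; do
    export_with_fallback "${name}" "DEPLOYMENT_${name}"
done
```

The behaviour is meant to match the original blocks: an existing AWS_* value is only overwritten when the corresponding DEPLOYMENT_* value is set.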

If I could have used $HOME in my .dvc/config, I could have used --project configuration everywhere. As it is, each developer will need to run dvc config --local remote.storage.credentialpath "$HOME/.aws/credentials" in their working copy of the repository. I could also have created a $HOME/.aws/credentials file with the correct content in the Bitbucket environment.

Instead, I aimed for the middle of the road: I thought I could define the DVC remote.storage.url, remote.storage.profile, and remote.storage.credentialpath in a way that works for every developer, and started down that path. I then had to remove remote.storage.profile and remote.storage.credentialpath from the DVC configuration when building on Bitbucket.
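
One weakness of removing and later restoring the two options by hand is that a failed dvc pull under set -e exits before the restore runs. The sketch below is one possible way around that, assuming bash: it runs the unset/pull sequence in a subshell with an EXIT trap so the saved values are written back either way. `DVC_BIN`, `SAVED_PROFILE`, `SAVED_CREDENTIALPATH`, `restore_dvc_config`, and `pull_with_env_credentials` are hypothetical names introduced for illustration; only the `dvc config` and `dvc pull` invocations come from the script above.

```shell
#!/usr/bin/env bash
# DVC_BIN is a hypothetical override so the dvc executable can be swapped out.
DVC_BIN="${DVC_BIN:-dvc}"

# Write back whichever values were saved before they were unset.
restore_dvc_config() {
    if [[ -n "${SAVED_PROFILE:-}" ]]; then
        "$DVC_BIN" config --project remote.storage.profile "$SAVED_PROFILE"
    fi
    if [[ -n "${SAVED_CREDENTIALPATH:-}" ]]; then
        "$DVC_BIN" config --local remote.storage.credentialpath "$SAVED_CREDENTIALPATH"
    fi
}

# The body is a subshell "( ... )", so the EXIT trap fires when the function
# finishes, whether the pull succeeded or failed.
pull_with_env_credentials() (
    # Reading an option that is not set fails, hence "|| true".
    SAVED_PROFILE="$("$DVC_BIN" config --project remote.storage.profile 2>/dev/null || true)"
    SAVED_CREDENTIALPATH="$("$DVC_BIN" config --local remote.storage.credentialpath 2>/dev/null || true)"
    trap restore_dvc_config EXIT

    "$DVC_BIN" config --project --unset remote.storage.profile 2>/dev/null || true
    "$DVC_BIN" config --local --unset remote.storage.credentialpath 2>/dev/null || true

    "$DVC_BIN" pull --verbose
)

# Usage (not run here): pull_with_env_credentials
```

The function's exit status is that of the pull, so a caller running under set -e would still stop on a failed pull, but with the configuration already restored.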

@dberenbaum
Contributor Author

@drjasonharrison I moved your request to a new issue since it's a bit different from environment variables in pipelines. Supporting environment variable expansion in config files is a potentially easier problem to solve.

@dberenbaum added the feature request and p2-medium labels on Mar 4, 2022
@d-walkama

I'd like to request this feature as well.

We are using an NFS file system and, as a workaround for slow MD5 checksum calculations, have set the index and state dirs to be in /tmp/dvc/. The issue is that DVC creates this directory and makes it read-only for other users, so we cannot share repos.

A solution would be to have something like:


[index]
dir = /tmp/dvc_${USER}/index
[state]
dir = /tmp/dvc_${USER}/state

work in the config file. I have tried this and it creates two directories, one with the variable expanded and one without.
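
Until expansion is supported in the config file itself, a possible interim workaround is to let the shell expand the variable and write the result into each user's own global DVC config. This is a sketch under assumptions: `set_per_user_dvc_dirs` and `DVC_BIN` are hypothetical names, and it only helps if the project-level config does not also set index.dir or state.dir, since project settings take precedence over global ones.

```shell
#!/usr/bin/env bash
# DVC_BIN is a hypothetical override so the dvc executable can be swapped out.
DVC_BIN="${DVC_BIN:-dvc}"

# set_per_user_dvc_dirs [BASE_DIR]  (hypothetical helper)
# Points index.dir and state.dir at a per-user base directory.
# BASE_DIR defaults to /tmp/dvc_<user>, mirroring the layout in the comment above.
set_per_user_dvc_dirs() {
    local base="${1:-/tmp/dvc_${USER:-$(id -un)}}"
    # --global writes to the calling user's own config, so the shell-expanded
    # path is stored per user instead of in the shared repository config.
    "$DVC_BIN" config --global index.dir "${base}/index"
    "$DVC_BIN" config --global state.dir "${base}/state"
}

# Usage (not run here): each user runs this once on the shared machine.
#   set_per_user_dvc_dirs
```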

@mvonpohle

I'd also benefit from this. We're using a mapped OneDrive folder for our DVC remote, but the local path to that mapped folder contains $USER or $HOME, which means you have to have a different "version" of the remote for each user. DVC being able to resolve an environment variable would be really helpful.
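
In the meantime, one way to avoid committing a per-user path might be to keep the remote's name in the shared config and have each user override its url in their untracked local config, letting the shell expand $HOME. This is only a sketch: `set_local_remote_url` and `DVC_BIN` are hypothetical names, and the remote name and folder in the usage line are placeholders, not values from this thread.

```shell
#!/usr/bin/env bash
# DVC_BIN is a hypothetical override so the dvc executable can be swapped out.
DVC_BIN="${DVC_BIN:-dvc}"

# set_local_remote_url REMOTE_NAME PATH  (hypothetical helper)
# Stores a user-specific url for an existing remote in .dvc/config.local,
# which is not tracked by Git.
set_local_remote_url() {
    local remote_name="$1" path="$2"
    "$DVC_BIN" remote modify --local "${remote_name}" url "${path}"
}

# Usage (not run here; "onedrive" and the folder are placeholders):
#   set_local_remote_url onedrive "${HOME}/OneDrive/dvc-storage"
```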
