Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formalize rules that govern environment variable inheritance and precedence #3100

Closed
MeltyBot opened this issue Jan 20, 2022 · 1 comment · Fixed by #5938
Closed

Formalize rules that govern environment variable inheritance and precedence #3100

MeltyBot opened this issue Jan 20, 2022 · 1 comment · Fixed by #5938

Comments

@MeltyBot
Copy link
Contributor

MeltyBot commented Jan 20, 2022

Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/3173

Originally created by @pnadolny13 on 2022-01-20 19:40:37


I've had a couple use cases where I'd like to set different environment variables at different levels. For sensitive credentials I set them in the .env while non-sensitive environment specific settings (like schema_name) get set at the meltano.yml environment level.

The functionality I want is to be able to combine both sensitive and non-sensitive variables for use in a plugin as an environment variable. A use case is in the squared repo where we have an Athena connection string used in Great Expectations and Superset:

awsathena+rest://[YOUR_AWS_KEY]:[YOUR_AWS_SECRET]@athena.us-east-2.amazonaws.com/[YOUR_SCHEMA_NAME]?s3_staging_dir=s3://[YOUR_S3_PATH]

I want to be able to set the secret keys in my .env and the path and schema in my meltano.yml environment. Right now templating is supported in meltano.yml for config values using environment variables but for some plugins, especially utilities, configs arent useful and we need to be building environment variables that the utility can access. Templating doesnt seem to be supported at the env level underneath the environment key. So right now I can set secrets in the .env and non-sensitive variables at the environment.env level in meltano.yml but I have no way of blending the two to pass along to a plugin, so I end up building the connection string multiple times in the plugins vs defining it once.

Ideas: Add an env key to plugin configs that allows you to use templating to build a new environment variable at the plugin/environment level. Or...allow templating at the current env key level based on whats in your .env file.

Update (2022-05-05)

Scope of this issue is now to:

  • Write tests based on the spec in the Miro Board
  • Fix anything that fails tests
  • Update documentation to clarify inheritance

Update (2022-05-03)

Relevant Miro Board where we're whiteboarding all of this

Spec Update (2022-04-08)

Note: this updated spec follows from my comment here: https://gitlab.com/meltano/meltano/-/issues/3173#note_817409586

Problem statement:

Currently, env variables at different levels of the code cannot necessarily rely on each other or build upon each other.

Proposed solution:

We would define environment variable inheritance to be valid across these layers, in this strict order:

  • Level 1: terminal context (inherited either via .env or via shell env vars)
  • Level 2: top-level env: declarations within meltano.yml
  • Level 3: environment-level env: declarations.
  • Level 4: env: declarations within top-level plugin definitions.
  • Level 5: env: declarations within environment-scoped plugin definitions.

Example

This shows how inheritance would be expected to function across the 5 named levels. (Could be a unit test during development.)

.env

# Level 1: terminal context
# Inherits from none
LEVEL_NUM="1"
STACKED="1"
env:
  # Level 2: top-level `env:`
  # Inherits from terminal context
  LEVEL_NUM: "2"                  #  '2'
  STACKED: "${STACKED}2"          # '12'
plugins:
  extractors:
    tap-foobar:
      env:
        # Level 4: plugin-level `env:`
        # Inherits from a environment-level `env:` if an environment is active
        # Inherits directly from top-level `env:` if no environment is active
        LEVEL_NUM: "4"            #    '4'
        STACKED: "${STACKED}4"    # '1234'
environments:
  prod:
    env:
      # Level 3: environment-level `env:`
      # Inherits from top-level `env:`
      LEVEL_NUM: "3"              #   '3'
      STACKED: "${STACKED}3"      # '123'
  plugins:
    extractors:
      tap-foobar:
        env:
          # Level 5: environment-level plugin `env:`
          # Inherits from (global) plugin-level `env:`
          LEVEL_NUM: "5"          #     '5'
          STACKED: "${STACKED}5"  # '12345'

FAQ

Q. Why does the global plugin level config (level 4) inherit from the environment-level config?

A: This is counterintuitive at first but actually quite necessary.

Imagining that the plugin settings globally include a call to $SNOWFLAKE_USERNAME and imagine that each environment has its own value for SNOWFLAKE_USERNAME: ___ in the respective env:. We don't want each environment to have to redundantly declare the plugin properties but we do want to use the $SNOWFLAKE_USERNAME value that is most appropriate from the given execution context. Hence, the global-level plugin can be defined once to leverage whatever value is in $SNOWFLAKE_USERNAME and the plugin's configuration will still be seeded properly according to which environment is active.

Providing level info for research/debugging

Perhaps in the dry-run proposal in #3146, it would be good to print where each environment variables are sourced. We will need to do this without exposing sensitive values.

So, if the example above were real, we might get debug info such as:

meltano run --dry tap-foobar target-snowflake
Printing debug info in response to `--dry` CLI argument. No actions will be executed.

The plugin `tap-foobar` was declared with the following context:

- env:LEVEL_NUM - derived from the following, in order of precedence:
    - environments[prod].plugins[tap-foobar].env[LEVEL_NUM]
    - plugins[tap-foobar].env[LEVEL_NUM]
    - environments[prod].env[LEVEL_NUM]
    - env[LEVEL_NUM]
    - shell.env[LEVEL_NUM]
...

Or perhaps a more realistic example:

meltano run --dry tap-foobar target-snowflake
Printing debug info in response to `--dry` CLI argument. No actions will be executed.

The plugin `tap-foobar` was declared with the following context:

- 'tap-foobar.config[username]' was derived from the following, in order of precedence:
    - environments[prod].env[FOOBAR_USERNAME]
    - env[FOOBAR_USERNAME]
...

In the above example, the 'prod' declaration of FOOBAR_USERNAME clobbers the global one. The user can infer from this printout that their prod-level config is working correctly, and if they were to remove the prod-level config, they'd get the value of the top-level env: declaration.

Related Topics

More discussion on this can and probably should be moved to a follow on issue, but for completeness:

Further detail and settings-level env: declarations (level 6)

Not listed above, but a plugin setting may have env: declared, which operates differently from the other layers. This plugin.settings[].env: declaration simply accepts a string as env variable name, and the specified setting value will get injected and sent into the plugin's subprocess.

  • This functionality is required in order to have plugins behave as expected. Non-singer plugins don't receive a config.json so the env variable injection is the substitute.
  • I think the env: function at the setting layer should always be a pure child process injection (clobbering any existing env var but not reading from it).
  • We should probably stop this layer 6 from also sourcing data from layers 1-5, and instead make it a pure one-way injection into the plugin context. As described in #3447
  • I do not know if the current status quo for an existing env variable value will clobber an explicitly defined config value, or if the config value would clobber the env variable value, essentially replacing the env var value within the plugin's child process.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants