Formalize rules that govern environment variable inheritance and precedence #3100

MeltyBot · 2022-01-20T19:40:37Z

Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/3173

Originally created by @pnadolny13 on 2022-01-20 19:40:37

I've had a couple use cases where I'd like to set different environment variables at different levels. For sensitive credentials I set them in the .env while non-sensitive environment specific settings (like schema_name) get set at the meltano.yml environment level.

The functionality I want is to be able to combine both sensitive and non-sensitive variables for use in a plugin as an environment variable. A use case is in the squared repo where we have an Athena connection string used in Great Expectations and Superset:

awsathena+rest://[YOUR_AWS_KEY]:[YOUR_AWS_SECRET]@athena.us-east-2.amazonaws.com/[YOUR_SCHEMA_NAME]?s3_staging_dir=s3://[YOUR_S3_PATH]

I want to be able to set the secret keys in my .env and the path and schema in my meltano.yml environment. Right now templating is supported in meltano.yml for config values using environment variables but for some plugins, especially utilities, configs arent useful and we need to be building environment variables that the utility can access. Templating doesnt seem to be supported at the env level underneath the environment key. So right now I can set secrets in the .env and non-sensitive variables at the environment.env level in meltano.yml but I have no way of blending the two to pass along to a plugin, so I end up building the connection string multiple times in the plugins vs defining it once.

Ideas: Add an env key to plugin configs that allows you to use templating to build a new environment variable at the plugin/environment level. Or...allow templating at the current env key level based on whats in your .env file.

Update (2022-05-05)

Scope of this issue is now to:

Write tests based on the spec in the Miro Board
Fix anything that fails tests
Update documentation to clarify inheritance

Update (2022-05-03)

Relevant Miro Board where we're whiteboarding all of this

Spec Update (2022-04-08)

Note: this updated spec follows from my comment here: https://gitlab.com/meltano/meltano/-/issues/3173#note_817409586

Problem statement:

Currently, env variables at different levels of the code cannot necessarily rely on each other or build upon each other.

Proposed solution:

We would define environment variable inheritance to be valid across these layers, in this strict order:

Level 1: terminal context (inherited either via .env or via shell env vars)
Level 2: top-level env: declarations within meltano.yml
Level 3: environment-level env: declarations.
Level 4: env: declarations within top-level plugin definitions.
Level 5: env: declarations within environment-scoped plugin definitions.

Example

This shows how inheritance would be expected to function across the 5 named levels. (Could be a unit test during development.)

.env

# Level 1: terminal context
# Inherits from none
LEVEL_NUM="1"
STACKED="1"

env:
  # Level 2: top-level `env:`
  # Inherits from terminal context
  LEVEL_NUM: "2"                  #  '2'
  STACKED: "${STACKED}2"          # '12'
plugins:
  extractors:
    tap-foobar:
      env:
        # Level 4: plugin-level `env:`
        # Inherits from a environment-level `env:` if an environment is active
        # Inherits directly from top-level `env:` if no environment is active
        LEVEL_NUM: "4"            #    '4'
        STACKED: "${STACKED}4"    # '1234'
environments:
  prod:
    env:
      # Level 3: environment-level `env:`
      # Inherits from top-level `env:`
      LEVEL_NUM: "3"              #   '3'
      STACKED: "${STACKED}3"      # '123'
  plugins:
    extractors:
      tap-foobar:
        env:
          # Level 5: environment-level plugin `env:`
          # Inherits from (global) plugin-level `env:`
          LEVEL_NUM: "5"          #     '5'
          STACKED: "${STACKED}5"  # '12345'

FAQ

Q. Why does the global plugin level config (level 4) inherit from the environment-level config?

A: This is counterintuitive at first but actually quite necessary.

Imagining that the plugin settings globally include a call to $SNOWFLAKE_USERNAME and imagine that each environment has its own value for SNOWFLAKE_USERNAME: ___ in the respective env:. We don't want each environment to have to redundantly declare the plugin properties but we do want to use the $SNOWFLAKE_USERNAME value that is most appropriate from the given execution context. Hence, the global-level plugin can be defined once to leverage whatever value is in $SNOWFLAKE_USERNAME and the plugin's configuration will still be seeded properly according to which environment is active.

Providing level info for research/debugging

Perhaps in the dry-run proposal in #3146, it would be good to print where each environment variables are sourced. We will need to do this without exposing sensitive values.

So, if the example above were real, we might get debug info such as:

meltano run --dry tap-foobar target-snowflake
Printing debug info in response to `--dry` CLI argument. No actions will be executed.

The plugin `tap-foobar` was declared with the following context:

- env:LEVEL_NUM - derived from the following, in order of precedence:
    - environments[prod].plugins[tap-foobar].env[LEVEL_NUM]
    - plugins[tap-foobar].env[LEVEL_NUM]
    - environments[prod].env[LEVEL_NUM]
    - env[LEVEL_NUM]
    - shell.env[LEVEL_NUM]
...

Or perhaps a more realistic example:

meltano run --dry tap-foobar target-snowflake
Printing debug info in response to `--dry` CLI argument. No actions will be executed.

The plugin `tap-foobar` was declared with the following context:

- 'tap-foobar.config[username]' was derived from the following, in order of precedence:
    - environments[prod].env[FOOBAR_USERNAME]
    - env[FOOBAR_USERNAME]
...

In the above example, the 'prod' declaration of FOOBAR_USERNAME clobbers the global one. The user can infer from this printout that their prod-level config is working correctly, and if they were to remove the prod-level config, they'd get the value of the top-level env: declaration.

Related Topics

More discussion on this can and probably should be moved to a follow on issue, but for completeness:

Further detail and settings-level `env:` declarations (`level 6`)

Not listed above, but a plugin setting may have env: declared, which operates differently from the other layers. This plugin.settings[].env: declaration simply accepts a string as env variable name, and the specified setting value will get injected and sent into the plugin's subprocess.

This functionality is required in order to have plugins behave as expected. Non-singer plugins don't receive a config.json so the env variable injection is the substitute.
I think the env: function at the setting layer should always be a pure child process injection (clobbering any existing env var but not reading from it).
We should probably stop this layer 6 from also sourcing data from layers 1-5, and instead make it a pure one-way injection into the plugin context. As described in #3447
I do not know if the current status quo for an existing env variable value will clobber an explicitly defined config value, or if the config value would clobber the env variable value, essentially replacing the env var value within the plugin's child process.

The text was updated successfully, but these errors were encountered:

MeltyBot · 2022-05-30T10:08:46Z

View 37 previous comments from the original issue on GitLab

MeltyBot assigned cjohnhanson May 30, 2022

cjohnhanson mentioned this issue May 31, 2022

Rename of internal (OLTP) "Job" tables and columns #2035

Closed

2 tasks

aaronsteers linked a pull request Jun 1, 2022 that will close this issue

Add tests and docs for environment variable resolution. #5938

Merged

6 tasks

cjohnhanson closed this as completed in #5938 Jun 2, 2022

This was referenced Jun 2, 2022

Update terminal environment variable precedence to be in line with currently xfail-ing tests #5982

Closed

Update environment variable inheritance to be in line with currently xfail-ing tests #5983

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Formalize rules that govern environment variable inheritance and precedence #3100

Formalize rules that govern environment variable inheritance and precedence #3100

MeltyBot commented Jan 20, 2022 •

edited

MeltyBot commented May 30, 2022

Formalize rules that govern environment variable inheritance and precedence #3100

Formalize rules that govern environment variable inheritance and precedence #3100

Comments

MeltyBot commented Jan 20, 2022 • edited

Update (2022-05-05)

Update (2022-05-03)

Spec Update (2022-04-08)

Problem statement:

Proposed solution:

Example

FAQ

Providing level info for research/debugging

Related Topics

Further detail and settings-level env: declarations (level 6)

MeltyBot commented May 30, 2022

MeltyBot commented Jan 20, 2022 •

edited

Further detail and settings-level `env:` declarations (`level 6`)