Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify "setting.aliases" behavior - in relation to previous "env_alias" #3135

Closed
MeltyBot opened this issue Feb 2, 2022 · 10 comments · Fixed by #6278
Closed

Clarify "setting.aliases" behavior - in relation to previous "env_alias" #3135

MeltyBot opened this issue Feb 2, 2022 · 10 comments · Fixed by #6278

Comments

@MeltyBot
Copy link
Contributor

MeltyBot commented Feb 2, 2022

Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/3209

Originally created by @aaronsteers on 2022-02-02 21:42:27


As raised today (2022-02-02) in office hours, there is a benefit to env_aliases in that they can be used to ease migration across forks of the same tap or target. I wanted to propose, along the same vein, a setting_alias which would work as follows: We have settings.aliases capability which in theory could work at a higher abstraction level:

  • The alias would be referenceable in meltano.yml config: blocks instead of the original name.
  • The alias could be used in env variable injection so that TARGET_SNOWFLAKE_ACCOUNT could be used in place of TARGET_SNOWFLAKE_SNOWFLAKE_ACCOUNT.
  • If both alias and the original name are set, this would be interpreted as a conflict. A hard failure would be raised: e.g.: The setting 'SNOWFLAKE_ACCOUNT' was provided along with its alias name 'ACCOUNT', which is not permitted. Please specify 'SNOWFLAKE_ACCOUNT' or 'ACCOUNT' but not both.

Caveats

  • As we're trying to move towards getting most/all config info from the hub (reducing dependency), this could be in conflict with that direction.
  • Ideally, this would live in the hub definitions, rather than in discovery.yml, for the above reason.

Potential added benefit for user onboarding and switching between variants

There could be added benefit to aliases, in that certain cross-implementation representations could be implemented by Meltano (and other orchestrators) generically. Many taps have common config settings but are each subject to variations in naming.

Example config options which would make good candidates for generic and cross-implementation aliases:

  • account
  • username
  • password
  • port
  • auth_token
  • refresh_token
  • client_secret
  • host
  • s3_bucket
  • hard_delete
  • target_schema_name
  • target_db_name

If we allowed the Hub entry to declare any of these aliases/forms of settings in the hub metadata, then orchestrators could leverage this info, and also (back to one of the original benefits of env_aliases), users could swap between variants without having to fully rewrite their config - at least not for those core elements that variants would have in common.

Benefits for dbt abstraction

Downstream tools like dbt generally need a small number of variables related to where the data will land. Having common aliases for these functionally-significant parameters would allow Meltano to inject values and/or read values from those important settings. So, for instance, Meltano could know which setting names have the aliases target_schema_name and target_db_name, then we can use that to pass the values along to dbt for identifying the source of a transformation.


Update 2022-06-03

The scope of this is to clarify the behavior with setting aliases. Discussed in the GitLab issue we decided on:

  • users can set config via the setting name and any of the values in the setting aliases array
  • we would hard fail if multiple are set, but accept any of them
  • environment variable - accept any of the options, fail if mutliple are present
    • PLUGIN_NAME_SETTING_NAME
    • PLUGIN_NAME_SETTING_ALIAS_1
    • PLUGIN_NAME_SETTING_ALIAS_2
@MeltyBot
Copy link
Contributor Author

@aaronsteers
Copy link
Contributor

@tayloramurphy - This doesn't appear to be critical for the launch. Can we push this back to Iteration 3?

@tayloramurphy
Copy link
Collaborator

@aaronsteers I've updated the issue with the scope we decided in the GitLab issue. If those changes don't require a breaking change then I think we could release it in a 2.1. What say you?

@tayloramurphy tayloramurphy changed the title Use "setting.aliases" as alternative to "env_alias" Clarify "setting.aliases" behavior - in relation to previous "env_alias" Jun 3, 2022
@DouweM
Copy link
Contributor

DouweM commented Jun 7, 2022

The scope of this is to clarify the behavior with setting aliases. Discussed in the GitLab issue we decided on:

@tayloramurphy To confirm, we will only read from alias env vars, not write them into the execution environment as was suggested in #3433?

That would mean we still have this issue when migrating from one variant to another: https://gitlab.com/meltano/meltano/-/issues/3518#note_959538073, that I also ran into in meltano/files-dbt#16 (comment). But perhaps that's not common enough to support.

@DouweM
Copy link
Contributor

DouweM commented Jun 7, 2022

@tayloramurphy I ran into what I think is a bug with alias: the target-postgres variant transferwise has a default_target_schema setting, with alias: [schema]. meltano config target-postgres set schema foo adds schema: foo to meltano.yml instead of default_target_schema: foo, which is not what i expected, but probably fine.

Then when I run meltano config target-postgres it has both schema: "foo" and default_target_schema: "foo", which is definitely NOT expected: it should only pass default_target_schema to target-postgres, as it doesn't actually know schema.

Similarly, meltano config target-postgres list lists schema under "Custom, possibly unsupported", even though it's an official alias that we shouldn't confuse the user about.

@tayloramurphy
Copy link
Collaborator

To confirm, we will only read from alias env vars,

That would be the implementation plan, yes. But the goal with aliases would be to provide that layer so that when you have

 - name: default_target_schema
   label: Default Target Schema
   aliases: [schema]

you could set $TARGET_SNOWFLAKE_SCHEMA and it should work. If you set both $TARGET_SNOWFLAKE_SCHEMA and $TARGET_SNOWFLAKE_DEFAULT_TARGET_SCHEMA it would hard fail because both are detected. So the aliases are doing what folks would want. Adding this to Meltano fell out of the 2.0 scope unfortunately.

I ran into what I think is a bug with alias

@DouweM that's definitely a bug!

@cjohnhanson
Copy link
Contributor

cjohnhanson commented Jun 16, 2022

@tayloramurphy and @aaronsteers -- After spending some time diving into this and testing out the behavior this afternoon, I think I may have lost track of this issue in the GitHub migration and it actually is resolved already.

Everything seems to work basically as outlined in the scope that's in the issue description right now, I think this was for the most part resolved in this PR, which was migrated over from GitLab and which also had some changes for another issue.

So should I consider the scope of this issue just to resolve the bug that @DouweM pointed out and ensure that aliases don't get passed through as environment variables to the execution environment for plugins?

@DouweM
Copy link
Contributor

DouweM commented Jun 17, 2022

@cjohnhanson To be clear, I think the behaviors of meltano config, meltano config set and meltano config list in #3135 (comment) are bugs, but I don't actually care that much about exposing the aliases in the environment as I referenced in #3135 (comment). It's more clear to the user if the aliases are only respected on the input side (when the user provides config to Meltano), not the output (when Meltano passes config to the plugin).

Defer to @tayloramurphy for the actual spec, though.

@tayloramurphy
Copy link
Collaborator

@cjohnhanson I tested this a bit locally on 2.0.3.

plugins:
  extractors:
  - name: tap-github
    variant: singer-io
    pip_url: tap-github
    settings:
      - name: default_target_schema
        label: Default Target Schema
        aliases: [schema, other_env]

meltano config tap-github list gives default_target_schema [env: TAP_GITHUB_DEFAULT_TARGET_SCHEMA, TAP_GITHUB_SCHEMA, TAP_GITHUB_OTHER_ENV] current value: None (default) which is what I would expect. Setting any of those env_vars individually works as you would expect:

❯ export TAP_GITHUB_DEFAULT_TARGET_SCHEMA="1"

then running config again shows default_target_schema [env: TAP_GITHUB_DEFAULT_TARGET_SCHEMA, TAP_GITHUB_SCHEMA, TAP_GITHUB_OTHER_ENV] current value: '1' (from the environment) which is right but it would be nice if it told you which env_var was actually set (probably worth a separate isssue).

If I then set ❯ export TAP_GITHUB_SCHEMA="2" and run config again I get Conflicting values for setting found in: (<generator object BaseEnvStoreManager.get.<locals>.<genexpr> at 0x104715e40>,) which is technically correct behavior in that I now have 2 env_Vars set and they're in conflict, but if I set ❯ export TAP_GITHUB_SCHEMA="1" then config shows

default_target_schema [env: TAP_GITHUB_DEFAULT_TARGET_SCHEMA, TAP_GITHUB_SCHEMA, TAP_GITHUB_OTHER_ENV] current value: '1' (from the environment)

Which is not ideal behavior. In reality we have 2 environment variables set and the behavior should be to compare if multiple of the environment variables are set regardless of their actual values.

In testing Douwe's comment in #3135 (comment) I think that is actually sorted out and the behavior around where a setting actually gets placed will be clarified in:

The scope of this issue then is the following:

  • Update the behavior to compare the environment variables and not the actual values themselves.
  • Return a better error if multiple are set
  • Update documentation to better describe aliases behavior and use cases
  • Possible if it makes sense with the code: Update the results of meltano config <plugin> list to tell you the enviornment variable when it says default_target_schema [env: TAP_GITHUB_DEFAULT_TARGET_SCHEMA, TAP_GITHUB_SCHEMA, TAP_GITHUB_OTHER_ENV] current value: '1' (from the environment) (we can spin this off into a separate issue if needed).

@cjohnhanson am I missing anything there? @aaronsteers I'd like your eyes on this as well.

@cjohnhanson
Copy link
Contributor

@tayloramurphy thanks, that totally clears things up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment