Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for env key for jobs #6386

Open
rabidaudio opened this issue Jul 8, 2022 · 10 comments
Open

feat: add support for env key for jobs #6386

rabidaudio opened this issue Jul 8, 2022 · 10 comments

Comments

@rabidaudio
Copy link
Contributor

See discussion in slack: https://meltano.slack.com/archives/C01TCRBBJD7/p1657233220239139

Consider example:

jobs:
- name: gitlab
  tasks:
  - dbt-postgres:seed tap-gitlab target-postgres dbt-postgres:snapshot dbt-postgres:run dbt-postgres:test

meltano run gitlab will run all the dbt commands for the whole project rather than allowing me to specify only the specific models that have changed (e.g. DBT_MODELS=+gitlab+).

I realize this can be solved by schedules:

schedules:
- name: gitlab
  interval: '@daily'
  job: gitlab
  env:
    DBT_MODELS: +gitlab+

However it seems to me that this information belongs to the job itself, not the schedule; that is even if I were to manually run meltano run gitlab outside of the @daily schedule, I’d still expect it to apply my configuration.

Proposal: Allow envs to be set on a job basis:

jobs:
- name: gitlab
  tasks:
  - dbt-postgres:seed tap-gitlab target-postgres dbt-postgres:snapshot dbt-postgres:run dbt-postgres:test
  env:
    DBT_MODELS: +gitlab+
    TARGET_POSTGRES_BATCH_SIZE_ROWS: 100 # could also be useful to override plugin configs

Precedence can get tricky here, but I would expect:

meltano environment < job < schedule < .env < system envrionment

@tayloramurphy
Copy link
Collaborator

https://docs.meltano.com/guide/configuration#specifying-environment-variables is the current order

- environment level plugin env # highest
- environment level env
- root level plugin env
- root level env
- .env file
- terminal env # lowest

Assuming both jobs and schedules could specify env I would expect the following:

- environment level plugin env # highest
- environment level env
- schedule level env
- job level env
- root level plugin env
- root level env
- .env file
- terminal env # lowest

my thinking is that schedules could run multiple instances of a job and one would be more inclined to override a setting for a given job on a per schedule basis rather than the other way around. Does that track?

All that said I'm in favor of adding env for both jobs and schedules (I'll open a separate issue for schedules) - the last precedence question will take a bit more debate. cc @aaronsteers

@tayloramurphy tayloramurphy changed the title Allow env vars to be set for jobs feat: add support for env key for jobs Jul 8, 2022
@rabidaudio
Copy link
Contributor Author

rabidaudio commented Jul 9, 2022

By a < b I meant b has higher precedence than a but I can see how that was ambiguous. Yes, I agree that schedules should have higher precedence than jobs.

I'm a little surprised that system env vars are overwritten by environments (for example, I might do FOO=override meltano run), but I don't think it's a problem and I'm sure y'all have considered it more than I have, and anyway that's a separate topic

@tayloramurphy
Copy link
Collaborator

I'm a little surprised that system env vars are overwritten by environments (for example, I might do FOO=override meltano run), but I don't think it's a problem and I'm sure y'all have considered it more than I have, and anyway that's a separate topic

@rabidaudio feel free to open a new issue or slack thread if you want to discuss - but what I'll quickly say is that you can still get this behavior I think if you had

environments:
  prod:
    env:
      FOO: $MY_OVERRIDE

and then set MY_OVERRIDE or not in the system env. We did try to put a lot of thought into it and it may be worth adding some of that context in https://docs.meltano.com/guide/configuration#specifying-environment-variables 🤔

@pnadolny13
Copy link
Contributor

@tayloramurphy @rabidaudio I landed on this while working on meltano/squared#289 where I was trying to get away from exporting env vars before running a command like in this case https://github.com/meltano/squared/blob/64cffc745eba6721fb97490d5aa14f6f7a970afc/data/orchestrate/dag_definition.yml#L146 when using schedules/jobs. I was a able to solve it without schedule/job env vars but was about to request the same feature before I found this issue so I was giving it some thought also:

@rabidaudio I know this is just a solution for this particular problem but what do you think about putting your dbt model selection criteria in a custom command (Squared example)? I've been exploring this pattern so that every set of dbt selection criteria is checked in as code. I know its bit me in the past when selection criteria gets really precise and I either include or exclude something I shouldnt. Did you consider this for youre use case already? Is there any cons that I might be missing other than verboseness?

@aaronsteers
Copy link
Contributor

@cjohnhanson - Assigning to you for scoping/analysis as related to #6387. Please confirm the spec - especially resolution order - before starting dev.

@cjohnhanson
Copy link
Contributor

@tayloramurphy -- there's work in progress (see #5982) to update the precedence order to be in line with this spec in Miro. New precedence order would be:

- environment level plugin env # highest
- root level plugin env 
- environment level env 
- root level env
- .env file
- terminal env # lowest

Supporting env in schedules and jobs will happen after that work is completed. So in the new precedence order, where would job and schedule envs fit? Would it be:
A)

- environment level plugin env # highest
- root level plugin env 
- schedule level env
- job level env
- environment level env # job and schedule take precedence over active environment
- root level env
- .env file
- terminal env # lowest

or B)

- environment level plugin env # highest
- root level plugin env 
- environment level env # active environment takes precedence over job and schedule
- schedule level env
- job level env
- root level env
- .env file
- terminal env # lowest

Or something else?

@tayloramurphy
Copy link
Collaborator

@cjohnhanson I think this would be the best order:

- environment level plugin env # highest
- schedule level env
- job level env
- root level plugin env 
- environment level env 
- root level env
- .env file
- terminal env # lowest

My logic here is similar to what I said in #6386 (comment) But basically, a job is a collection of plugins run at the same time and users would likely want to override plugin settings at the job level. Similarly, a schedule is a particular instance of a job and there can be multiple schedules per job, so overrides from there make sense above jobs. Then having the environment level plugin env as the highest priority is the last place you could override anything set lower in the hierarchy.

@aaronsteers would you agree on this?

@tayloramurphy
Copy link
Collaborator

@aaronsteers @cjohnhanson I'm pulling this out of the next iteration. While I think this would be a useful thing to have, I don't think we need to currently build it ourselves. Thinking through the spec discussion has been useful and clarifying in relation to #6387 but adding support for env at the jobs level isn't critical right now.

I'll mark it as accepting PR's though in case the community is interested in having this.

@stale
Copy link

stale bot commented Apr 26, 2023

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

@stale stale bot added the stale label Apr 26, 2023
@tayloramurphy tayloramurphy added the evergreen Never stale label Apr 26, 2023
@stale stale bot removed the stale label Apr 26, 2023
@edgarrmondragon
Copy link
Collaborator

More users interested in this in Slack: https://meltano.slack.com/archives/C01TCRBBJD7/p1690894749896989?thread_ts=1690894749.896989&cid=C01TCRBBJD7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants