Ability to set environment variables for a pipeline #181

Closed
roll opened this issue Mar 16, 2020 · 10 comments · Fixed by #182

roll commented Mar 16, 2020

Overview

Recently, I've added support for the TABLESCHEMA_PRESERVE_MISSING_VALUES env var to tableschema-py (https://github.com/frictionlessdata/tableschema-py#experimental), which can be useful for some use cases.

I propose we have a standard way to declare environment variables for a pipeline, as is done in many other declarative formats like docker-compose.yml, .travis.yml, etc.

So we can have something like this:

temporal:
  title: temporal
  description: "temporal format"
  environment:
    DEBUG: True
  pipeline:

  - run: load
    parameters:
      from: 'temporal.csv'
      override_fields:
        date:
          outputFormat: '%m/%d/%Y'

  - run: dump_to_path
    parameters:
      out-path: 'output'
      pretty-descriptor: true
      temporal_format_property: outputFormat
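
For illustration, here is a minimal sketch of how a runner could apply such an environment block before executing the steps. It is not the datapackage-pipelines implementation: `apply_environment` is a hypothetical helper and `spec` is a hand-written stand-in for the parsed YAML above. The one firm constraint it reflects is that environment variable values are strings, so a YAML boolean like `True` has to be converted.

```python
import os


def apply_environment(spec, environ=None):
    """Copy the spec's `environment` mapping into an environment mapping.

    Hypothetical helper: `spec` stands in for one parsed pipeline entry.
    Values are stringified because environment variables are always strings.
    """
    environ = os.environ if environ is None else environ
    for name, value in (spec.get("environment") or {}).items():
        environ[str(name)] = str(value)
    return environ


# Stand-in for the parsed `temporal` pipeline from the YAML above.
spec = {
    "title": "temporal",
    "environment": {"DEBUG": True},
    "pipeline": [{"run": "load"}, {"run": "dump_to_path"}],
}

# A plain dict is passed here so the example does not touch os.environ.
env = apply_environment(spec, environ={})
print(env)  # {'DEBUG': 'True'}
```

Calling `apply_environment(spec)` without the second argument would write into `os.environ`, which child processes started afterwards inherit by default.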

roll commented Mar 16, 2020

@akariv
WDYT? Does it make sense?

roll added this to "Backlog (ordered by priority)" in Pilot with BCO-DMO on Mar 16, 2020
roll moved this from "Backlog (ordered by priority)" to "Waiting a review from Data Flows" in Pilot with BCO-DMO on Mar 18, 2020

akariv commented Apr 6, 2020

I have to admit I don't really see the use case here, which sits somewhere in between passing parameters to a processor and using actual environment variables.

Passing common parameters to all processors might be achieved more elegantly by creating 'global parameters', which are passed to all processors and update the per-processor parameters (elegance is debatable, of course 😄).

e.g.

temporal:
  title: temporal
  description: "temporal format"
  parameters:
    debug: True
  pipeline:
  - run: load
    parameters:
      from: 'temporal.csv'
      override_fields:
        date:
          outputFormat: '%m/%d/%Y'

  - run: dump_to_path
    parameters:
      out-path: 'output'
      pretty-descriptor: true
      temporal_format_property: outputFormat

Is there any other use case here other than controlling the FD libraries behavior (I'm honestly asking here)?
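
The 'global parameters' alternative described above could be sketched roughly as follows. This is only an illustration: `merge_parameters` is a hypothetical function, and the precedence rule (a per-step value wins over a global one with the same key) is an assumption, since the comment does not say which side should override the other.

```python
def merge_parameters(global_params, steps):
    """Return copies of `steps` with the global parameters filled in.

    Assumption: per-step parameters take precedence over global ones.
    The input steps are not modified.
    """
    merged = []
    for step in steps:
        params = {**global_params, **step.get("parameters", {})}
        merged.append({**step, "parameters": params})
    return merged


# Hand-written stand-in for the parsed `pipeline` list.
steps = [
    {"run": "load", "parameters": {"from": "temporal.csv"}},
    {"run": "dump_to_path", "parameters": {"out-path": "output", "debug": False}},
]

for step in merge_parameters({"debug": True}, steps):
    print(step["run"], step["parameters"])
```

Unlike environment variables, this only reaches code that reads processor parameters; a library that looks at `os.environ` would not see these values.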

roll commented Apr 7, 2020

I don't know of other use cases, but I think this feature can still be general if we think of something like providing env vars for an underlying AWS library, requests, etc.

If there are other ways to make it work, that should be fine for BCO-DMO. A custom processor that sets env vars would be enough, but I guess that's not possible from within a processor.

The main goal of this proposal is to make the output of the DPP UI (which BCO-DMO are working on) reproducible on the CLI. Inside their service they can set env vars themselves, but the DPP specs it outputs are going to be run in uncontrolled environments.
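
The point about a processor not being able to set env vars can be shown with a small experiment, assuming each processor runs as its own OS process: a variable set inside a child process never reaches the parent or sibling processes, whereas a variable supplied by the parent is visible to the child. `DPP_DEMO_FLAG` is a made-up name used only for this sketch.

```python
import os
import subprocess
import sys

# Make sure the made-up variable is not already set in this process.
os.environ.pop("DPP_DEMO_FLAG", None)

# A child process that sets a variable only changes its own environment.
subprocess.run(
    [sys.executable, "-c", "import os; os.environ['DPP_DEMO_FLAG'] = '1'"],
    check=True,
)
leaked = "DPP_DEMO_FLAG" in os.environ  # the parent is unaffected

# Passing `env` from the parent is what actually reaches the child.
child_env = {**os.environ, "DPP_DEMO_FLAG": "1"}
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('DPP_DEMO_FLAG'))"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = result.stdout.strip()

print(leaked, seen_by_child)  # False 1
```

This is why the variables have to be declared somewhere the runner itself can read them, such as the pipeline spec, if the spec is meant to be reproducible outside a controlled service.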

roll added and removed the wip label on Apr 20, 2020

roll commented May 19, 2020

Hi @akariv,

Sorry, I didn't understand it completely. Are you against this change? Could you please elaborate?

In general, I see this as kind of logical because env variable management is available in many similar specs like Travis, Docker Compose, etc.

akariv commented May 20, 2020

Hey @roll - on second thought, I'm okay with this proposal.

roll commented Jun 21, 2020

Cool @akariv. Are you happy with this PR - #182?

roll commented Jun 21, 2020

DONE (ready to merge in #182)

roll closed this as completed on Jun 21, 2020
Pilot with BCO-DMO automation moved this from "Waiting a review from Data Flows" to Done on Jun 21, 2020

cschloer commented

Can a release be created for this update? Thanks.

roll commented Jul 28, 2020

Hi @akariv could you please release?

roll commented Jul 30, 2020
