Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring main for local dev #771

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

milo-hyben
Copy link
Collaborator

@milo-hyben milo-hyben commented May 28, 2024

This PR is an idea how to refactor main.py so we can run prod-pipelines locally without any changes to the code.
This would make the development and testing of production-pipelines locally much easier.
Atm developer has 2 options, comment out the prod code or create a new temp local copy of main.pay file. None of the options is ideal.

This core idea is to have 'PL_ENVIRONMENT' set to e.g. dev and then main picks development version of WORKFLOWS.
If PL_ENVIRONMENT is not set it assumes production version and it picks production version of WORKFLOWS.

There are two new files:

  1. workflows_prd.py
  2. workglofs_dev.py

@MattWellie
Copy link
Contributor

MattWellie commented May 30, 2024

I feel weird about modifying the entrypoint of a pipeline which is designed to run in a container, in order that we can run it outside a container.

This does feel like a clean solution, but equally wouldn't building a docker container and using that for local testing be more appropriate, so that we know that "it works on my machine" == "it works in production". Most modern IDEs can run tests while using a python environment inside a container AFAIK (though I can't remember if that's a paid/premium feature, it's been a while since I used it).

Config would also have to be altered so that the pipeline runs in a local mode...

I'm also not sure exactly what changes are being tested in the situation this PR addresses, so perhaps that wouldn't be a good solution.

@jmarshall
Copy link
Contributor

I don't especially like the proposed solution — as it seems to lead to duplication, for one thing — and I don't really see what it buys you: if you were working on one particular workflow, you're still going to have to edit workflows_dev.py to contain (only?) that workflow. That's basically equivalent to commenting things out in main.py, so it doesn't seem like a big win.

But more importantly I still don't understand what the problem this is trying to solve is.

The example given in the meeting was ”it'd be nice not to have to have hail installed locally to do prod-pipes development”. But prod-pipes is very very bound into hail, so I'm not sure what workflows you could be working on locally that wouldn't need to be able to import hailtop.batch. (Moreover main.py imports cpg_workflows which imports cpg_utils.hail_batch which imports hail — so it seems to me that you can't do anything at all in prod-pipes without the hail modules installed…)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants