Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicated utils code #55

Closed
michelole opened this issue Dec 2, 2021 · 5 comments
Closed

Duplicated utils code #55

michelole opened this issue Dec 2, 2021 · 5 comments

Comments

@michelole
Copy link

Some of the src/utils.py code seems to be duplicated inside the operations folder:

This is a bit confusing when deploying the template.

@FlorianPydde
Copy link
Collaborator

Michel you are absolutely right. There are nevertheless slight differences in the code logic for retrieve_workspace for instance.
When we run code from the operation folder, we assume the code will be run in a devops pipeline, which means we will use Service Principals to connect to the AML workspace. Whereas, when running the code from src, the scripts will either be run locally or on a experiment run.

There was a similar discussion here. The problem is not so straightforward to solve but we are eager to get your inputs 😃 Do you have any suggestions ?

@mariamedp
Copy link
Member

I agree as well. The main reason that's preventing us from putting everything together is that only the src/ folder is uploaded to the compute cluster for execution during training/batch inference, and hence we need some AML utilities there. Maybe we can look at simplifying src/utils.py at maximum? And/or change the function names to make them more descriptive of the differences?

@michelole michelole changed the title Duplicate utils code Duplicated utils code Dec 2, 2021
@michelole
Copy link
Author

Since this also seems to be duplicated among templates (and potentially among their implementations), I'd favor moving this upstream.

Maybe we should then keep the discussion on the linked thread? Tentatively closing then.

@michelole
Copy link
Author

If there's a decision not to upstream it, we could explore removing the duplicate at least in this repo by:

  1. declaring and building a package with all "utils" code
  2. publishing it via a private Python package
  3. installing it via pip/conda in the remote compute instance.

@FlorianPydde
Copy link
Collaborator

The private package option would be a good alternative to reduce code duplication. Nevertherless, there is work in progress for the next release of the aml sdk (here an example regarding pipelines: azureml-previewss) which will probably require some adjustments to the solution. Hence, I would suggest keeping it simple and accessible (vs a package) even though it means having some duplicated code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants