Can you evaluate python code within dvc.yaml (any plans for dvc.py)? #6512

Closed
asmith26 opened this issue Aug 31, 2021 · 3 comments

Comments

@asmith26

Hi there,

I have a Python config/templating file that dvc.yaml uses for a number of variables. As far as I can tell, dvc.yaml cannot evaluate the Python code in config.py. I was wondering whether it could, so that I could use something like:

config.py

from pathlib import Path

raw_data_dir = Path("raw")
processed_data_dir = Path("processed")
processed_data_filename = "data.csv"
processed_data_path = processed_data_dir / processed_data_filename
model_name = "model.pkl"

dvc.yaml

vars:
  - config.py

stages:
  preprocess:
    cmd: python preprocess.py
    deps:
      - ${raw_data_dir}
    outs:
      - ${processed_data_path}

  train:
    cmd: python train.py
    deps:
      - ${processed_data_path}
    outs:
      - ${model_name}

Alternatively, since params can now be a Python file, I'm curious whether you have any plans to support something like dvc.py (i.e. a Python file replacement for dvc.yaml that could evaluate Python code)?

Just some thoughts. I welcome any feedback on this, and many thanks for these amazing libraries! :)

@skshetry
Member

A workaround is to wrap dvc repro / dvc exp run in a script that first dumps the params from the Python file to the params.yaml file and then runs dvc repro.
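A minimal sketch of such a wrapper, assuming config.py only defines simple constants (strings, numbers, and pathlib paths) as in the example above. The function names `load_config`, `dump_params`, and `main` are hypothetical, not part of DVC. To stay standard-library only, values are written as JSON scalars, which for the plain strings and numbers in this example are also valid YAML; a real script might use a YAML library instead.

```python
import importlib.util
import json
import subprocess
from pathlib import Path, PurePath


def load_config(path="config.py"):
    """Execute the Python config file and collect its simple constants."""
    spec = importlib.util.spec_from_file_location("config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # this is where the Python code is evaluated

    params = {}
    for name, value in vars(module).items():
        if name.startswith("_"):
            continue  # skip module internals such as __name__
        if isinstance(value, PurePath):
            params[name] = value.as_posix()  # store paths as plain strings
        elif isinstance(value, (str, int, float, bool)):
            params[name] = value
        # anything else (imported classes, functions, ...) is ignored
    return params


def dump_params(params, path="params.yaml"):
    """Write flat key/value pairs, one 'key: value' line per constant."""
    lines = [f"{key}: {json.dumps(value)}" for key, value in params.items()]
    Path(path).write_text("\n".join(lines) + "\n")


def main(argv=()):
    """Regenerate params.yaml, then hand over to 'dvc repro'."""
    dump_params(load_config("config.py"))
    # Extra command-line arguments are passed through to dvc unchanged.
    return subprocess.call(["dvc", "repro", *argv])
```

A wrapper script (say, a hypothetical run_repro.py) would end with `raise SystemExit(main(sys.argv[1:]))` after importing `sys`. With this approach, dvc.yaml would take values such as `${processed_data_path}` from the generated params.yaml rather than listing config.py under vars.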

In the dvc.yaml file, we could formalize this as a hook that runs before repro.

A dvc.py-like discussion is in #5646. Although we have had a few discussions about this, there has not been strong interest from the community.

@asmith26
Author

Many thanks for the reply @skshetry, very useful info and thanks for the workaround.

@RomanSteinberg

RomanSteinberg commented Sep 3, 2021

@asmith26 I don't think this is a good way to handle constants in a project. I'd like to suggest another approach (a better one, in my opinion):

  1. Create config.yaml with the constants.
  2. Create config.py to parse it and store the values in a singleton class. Use lazy loading so the file is read into the class fields only once.
  3. Every script in your project gets its constants from this singleton class.

Advantages:

  • constants live in one place;
  • constants are separated from code;
  • every part of your code can access the constants.
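
The three steps above could be sketched roughly as follows. This is only an illustration: the class name `Config`, its `get` method, and the `path` and `parser` attributes are hypothetical, and the default parser assumes the third-party PyYAML package is installed and that config.yaml sits in the working directory.

```python
from pathlib import Path


def _parse_yaml(text):
    """Default parser; assumes PyYAML is available (imported only when needed)."""
    import yaml

    return yaml.safe_load(text)


class Config:
    """Singleton-style holder: state lives on the class, which is never instantiated."""

    path = Path("config.yaml")           # assumed location of the constants file
    parser = staticmethod(_parse_yaml)   # can be swapped, e.g. for tests
    _values = None                       # filled on first access, then reused

    @classmethod
    def get(cls, name):
        """Return one constant, reading and parsing the file only on the first call."""
        if cls._values is None:
            cls._values = cls.parser(cls.path.read_text())
        return cls._values[name]
```

Any script in the project would then call, for example, `Config.get("model_name")`. Since the constants are plain YAML, the same config.yaml could presumably also be listed under vars in dvc.yaml, so the pipeline and the Python code share one source of constants.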
