dvc.yaml: add matrix-do #9725
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #9725 +/- ##
==========================================
- Coverage 90.45% 90.45% -0.01%
==========================================
Files 482 483 +1
Lines 36719 36797 +78
Branches 5299 5309 +10
==========================================
+ Hits 33213 33283 +70
- Misses 2900 2907 +7
- Partials 606 607 +1
☔ View full report in Codecov by Sentry. |
Do you think it makes sense to keep |
We did not implement matrix initially, because we thought I am not sure if a lot has changed in the past 2 years. Maybe GitHub Actions has standardized the syntax/concepts now? So I am not sure about foreach in the short term. But in the future, if the users don't have difficulty picking it up, we can think of deprecating foreach. |
Thanks @skshetry! Looks really nice, and appreciate the example. It might be good to also include an example of accessing subitems like

stages:
  stage1:
    matrix:
      os: [windows, linux, macos]
      pyv: ["3.9", "3.10", "3.11"]
      dict:
        - arg1: 1
        - arg2: 2
      lst:
        - [1, 2]
        - [3, 4]
      model:
        - epochs: 3
          thresh: 10
        - epochs: 10
          thresh: 15
    do:
      cmd: echo ${item.os} ${item.pyv} ${item.dict} ${item.lst.0} ${item.lst.1} --model epochs=${item.model.epochs},thresh=${item.model.thresh}
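To make the subitem access above concrete, here is a minimal Python sketch of how a matrix could be expanded into one `item` per combination and how dotted paths such as `${item.lst.0}` or `${item.model.epochs}` could be resolved. The helpers `expand_matrix`, `lookup`, and `render` are hypothetical names for illustration only; this is not DVC's actual parser, and the matrix is trimmed to three axes to keep the output small.

```python
import re
from itertools import product


def expand_matrix(matrix):
    """Yield one `item` dict per combination of the matrix axes."""
    keys = list(matrix)
    for combo in product(*(matrix[k] for k in keys)):
        yield dict(zip(keys, combo))


def lookup(item, path):
    """Resolve a dotted path such as 'model.epochs' or 'lst.0' inside `item`."""
    value = item
    for part in path.split("."):
        # Lists are indexed by integer position, dicts by key.
        value = value[int(part)] if isinstance(value, list) else value[part]
    return value


def render(template, item):
    """Replace every ${item.<path>} placeholder with its looked-up value."""
    return re.sub(
        r"\$\{item\.([\w.]+)\}",
        lambda m: str(lookup(item, m.group(1))),
        template,
    )


# Assumed sample data, modelled on the YAML above.
matrix = {
    "os": ["windows", "linux"],
    "lst": [[1, 2], [3, 4]],
    "model": [{"epochs": 3, "thresh": 10}, {"epochs": 10, "thresh": 15}],
}
items = list(expand_matrix(matrix))
first_cmd = render(
    "echo ${item.os} ${item.lst.0} --model epochs=${item.model.epochs}",
    items[0],
)
print(len(items), first_cmd)
```

With two values on each of the three axes, the sketch produces eight combinations, and the first rendered command picks `windows`, the first element of `[1, 2]`, and `epochs` from the first `model` entry.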
My feeling is that having both will be more complex and confusing than asking users to learn |
I would personally prefer to only document |
I would prefer waiting a couple of months before we do that. If we do want to promote matrix, and deprecate foreach soon, let's scrutinize the syntax a bit more, as we don't really need to do things the way
Maybe the current way is fine, but I do want to touch on these issues and discuss explicitly. |
Great points, thanks for raising them.
Since matrix requires a dict, I like the simplicity of this, although we would need to handle conflicts with other vars.
The simplest solution would be to get rid of the

stages:
  stage1:
    matrix:
      os: [windows, linux, macos]
      pyv: ["3.9", "3.10", "3.11"]
      dict:
        - arg1: 1
        - arg2: 2
      lst:
        - [1, 2]
        - [3, 4]
    cmd: echo ${item.os} ${item.pyv} ${item.dict} ${item.lst.0} ${item.lst.1}
Wanted to chime in with a random thought that came to me just now. Imagine something like this:

# params.yaml
foo:
  bar: [a,b,c,d,e]
  baz: [1,2,3]

# dvc.yaml
stages:
  foo_bar:
    matrix:
      first: ${foo.bar}
      second: [x, y, z]
    <<: &stage_foo
      cmd: ...
      ...
  foo_baz:
    matrix:
      first: ${foo.baz}
      second: [x, y, z]
    <<: *stage_foo

Could there be a way to specify such a "ragged matrix" as a single
Good point that this is essentially just concatenating the two lists. I wonder if it could be accomplished using hydra. |
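As a rough check of the "just concatenating the two lists" observation, the Python sketch below compares the combinations produced by the two separate stages with those of a single matrix whose `first` axis is `bar + baz`. The helper `combos` is a hypothetical name, and the sketch only compares the sets of combinations; it ignores that the two stages would carry different names (`foo_bar` vs `foo_baz`).

```python
from itertools import product


def combos(first, second):
    """Return the set of (first, second) pairs for one matrix."""
    return set(product(first, second))


# Values taken from the params.yaml / dvc.yaml example above.
bar = ["a", "b", "c", "d", "e"]
baz = [1, 2, 3]
second = ["x", "y", "z"]

# Two stages, each with its own `first` axis.
two_stages = combos(bar, second) | combos(baz, second)

# One stage whose `first` axis is the concatenation of both lists.
one_stage = combos(bar + baz, second)

same = two_stages == one_stage
print(same, len(one_stage))
```

Both approaches yield the same 24 combinations, so what remains open is how to express the concatenation inside `dvc.yaml` itself.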
tests/func/parsing/test_matrix.py
data = {
    "matrix": matrix,
    "do": {
        "cmd": "echo ${item.os} ${item.pyv} ${item.dict}"
Q) Is dict unpacking expected/supposed to work here (i.e. echo ${item})?
yes
Not a blocker, but I think it would be good to add a test for that.
@skshetry Are you okay with the suggestions in #9725 (comment)? Once we agree on that, it's ready from a product perspective. |
On both variable addressing, and
I think the arguments are weak on both sides, but the grouping under
On both of these issues, I don't have a strong opinion. But I do want to hear pros and cons of those changes before deciding on this. |
Thanks @skshetry, agreed neither are true blockers, so I'm approving.
On both variable addressing, and matrix..do block, one advantage that I see with what you proposed, is that it makes it easy to convert a templated stage to a matrix.
Yes, I think it's enough reason to at least drop do since it saves you from having to make significant modifications to even a non-templated stage.
Maybe it is just a plus for me, but on the side of grouping I like the possibility of using dict unpacking on |
@daavoo I assume you would be fine with either |
yes |
One argument in favour of

matrix:
  os: [windows, linux, macos]
  pyv: ["3.9", "3.10", "3.11"]

You access Is that enough of a reason to move away from |
Let's agree on the name/syntax (no strong opinion from me) and remember to update https://github.com/iterative/dvcyaml-schema
cc @iterative/vs-code for awareness of something new worth adding to snippets
Personally I always found |
Dropping |
I think the simplest thing to do in VS Code is to drop the I think we can do this because We can add the new snippet once some more time has passed. LMK what you think. |
I'll leave this up for comments for a few days, and will merge by the end of this week. |
Thanks @skshetry! |
fixes #4741 upstream PR: iterative/dvc#9725 available since https://github.com/iterative/dvc/releases/3.12.0
* guide: document matrix in dvc.yaml
  fixes #4741
  upstream PR: iterative/dvc#9725
  available since https://github.com/iterative/dvc/releases/3.12.0
* Apply suggestions from code review
  Co-authored-by: Dave Berenbaum <dave@iterative.ai>
* Restyled by prettier (#4765)
  Co-authored-by: Restyled.io <commits@restyled.io>
* redirect from foreach to matrix; add a complete example for matrix with templating
* format matrix stage list
---------
Co-authored-by: Dave Berenbaum <dave@iterative.ai>
Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com>
Co-authored-by: Restyled.io <commits@restyled.io>

* schema for matrix
  See iterative/dvc#9725.
* add invalid examples
Introduces matrix:do, fixes #5172.

DVC will build the names of the generated stages from their variables. If the variables are numbers, strings, booleans, etc., they will use their string representation; for other types, DVC will use the variable name and index.

For example, for above, os and pyv will use their respective values, whereas dict/lst will use their variable name plus index. For example, one of the instances will look as follows:

This name will be its address (i.e. how you invoke it in dvc repro and other places), and its identity. Like foreach:do, you can invoke the whole group with just dvc repro stage1 as well.

Example output of above
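The naming rule described in this PR summary (string representation for primitive values, variable name plus index for everything else) can be sketched in Python as follows. This is an illustration only, not DVC's implementation: `stage_names` is a hypothetical helper, and the `@` and `-` separators are assumptions, since the example name is not preserved in the text above.

```python
from itertools import product

# Types whose string representation is used directly in the name.
PRIMITIVES = (str, int, float, bool)


def stage_names(stage, matrix, sep="-", at="@"):
    """Build one name per matrix combination.

    `sep` and `at` are assumed separators, chosen for illustration.
    """
    keys = list(matrix)
    names = []
    for idxs in product(*(range(len(matrix[k])) for k in keys)):
        parts = []
        for key, i in zip(keys, idxs):
            value = matrix[key][i]
            if isinstance(value, PRIMITIVES):
                parts.append(str(value))   # e.g. "windows", "3.9"
            else:
                parts.append(f"{key}{i}")  # e.g. "dict0", "lst1"
        names.append(f"{stage}{at}{sep.join(parts)}")
    return names


# Assumed sample data, a trimmed version of the matrix discussed above.
matrix = {
    "os": ["windows", "linux"],
    "pyv": ["3.9", "3.10"],
    "dict": [{"arg1": 1}, {"arg2": 2}],
}
names = stage_names("stage1", matrix)
print(names[0], names[-1])
```

Under these assumptions, each generated name carries the primitive values verbatim and refers to the dict entries only by position, which keeps names short and usable as an address.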