describe "meta" field in stage file and that we are preserving comments #306

Open
efiop opened this issue May 9, 2019 · 4 comments

@efiop
Member

commented May 9, 2019

@algomaster99

Collaborator

commented May 10, 2019

@efiop @shcheklein Can I pick this up now? Please brief me on what exactly has to be done. :)

@shcheklein

Member

commented May 10, 2019

Hi @algomaster99, sure! The section that should be reviewed and updated is this one: https://dvc.org/doc/user-guide/dvc-file-format (btw, @efiop does it make sense to include a comment with a link to this doc by default, when we generate the stage file?). It currently describes the schema of the file. Until the fix by @Suor, DVC threw an exception if the file contained fields with unknown names. DVC also removed any comments from those stage files every time you ran dvc repro and the files were updated, even though the only thing that should have been updated was the md5 checksums.

After the fix, DVC preserves comments between runs and across any other updates it makes to .dvc files. You can also use meta: to store any user-specific, custom information.
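To make the intended behaviour concrete, here is a minimal Python sketch of "update only the md5 checksum, keep everything else". It is an illustration only, not how DVC implements this: the helper name update_md5 and the line-based approach are assumptions, and it assumes the md5: line comes before the path: line within each list item, as in the samples in this thread.

```python
import re

# Matches an "md5:" line, optionally starting a list item ("- md5: ...").
MD5_LINE = re.compile(r"^(\s*(?:- )?md5:\s*)([0-9a-f]{32})(.*)$")
# Matches a "path:" line and captures the path value (up to the first space).
PATH_LINE = re.compile(r"^\s*(?:- )?path:\s*(\S+)")


def update_md5(text, path, new_md5):
    """Return text with the md5 of the item whose path matches replaced.

    Hypothetical helper: every other line, including comments, is left as-is.
    """
    lines = text.splitlines()
    md5_index = None  # index of the md5 line seen in the current list item
    for i, line in enumerate(lines):
        if line.lstrip().startswith("- "):
            md5_index = None  # a new list item starts here
        if MD5_LINE.match(line):
            md5_index = i
        found = PATH_LINE.match(line)
        if found and found.group(1) == path and md5_index is not None:
            parts = MD5_LINE.match(lines[md5_index])
            lines[md5_index] = parts.group(1) + new_md5 + parts.group(3)
    return "\n".join(lines) + "\n"


SAMPLE = """\
# stage comment
deps:
- md5: da2259ee7c12ace6db43644aef2b754c
  path: cmd.py  # A line comment example
- md5: e309de87b02312e746ec5a500844ce77
  path: input.data
"""

print(update_md5(SAMPLE, "cmd.py", "0" * 32))
```

Only the checksum for cmd.py changes in the printed result; both comments and the second dependency come through untouched.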

Let me know if it makes sense.

@Suor

Contributor

commented May 10, 2019

@algomaster99 I am the author of the change in dvc. Here are my two cents:

  • dvc still throws on any unknown key; the known keys are the ones already documented plus the new top-level meta key,
  • meta can have any structure and contain anything,
  • one can add comments to a .dvc file between lines and at line ends using the # comment syntax,
  • comments are preserved as long as the corresponding keys are preserved,
  • string values can use YAML multi-line formatting, which is retained when the .dvc file is rewritten. E.g. the command can contain newlines in its formatting.
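The first two points can be sketched in a few lines of Python. This is only an illustration of the rule, not DVC's schema code: validate_stage, UnknownKeyError and KNOWN_KEYS are hypothetical names, and the key set is taken only from the sample in this thread, so the documented schema may list more keys.

```python
# Assumption: top-level keys seen in the sample in this thread, plus "meta".
KNOWN_KEYS = {"cmd", "deps", "outs", "md5", "locked", "meta"}


class UnknownKeyError(ValueError):
    """Raised when a stage dict has a top-level key outside KNOWN_KEYS."""


def validate_stage(stage):
    """Reject unknown top-level keys; the value under "meta" is not inspected."""
    unknown = sorted(set(stage) - KNOWN_KEYS)
    if unknown:
        raise UnknownKeyError("unknown top-level keys: " + ", ".join(unknown))


# Both pass: meta may hold a nested dict or a plain string.
validate_stage({"cmd": "python cmd.py", "meta": {"author": {"name": "Alex"}}})
validate_stage({"cmd": "python cmd.py", "meta": "some-string"})

# This fails: "author" must live under meta, not at the top level.
try:
    validate_stage({"cmd": "python cmd.py", "author": "Alex"})
except UnknownKeyError as exc:
    print(exc)
```

So custom data placed under meta is accepted whatever its shape, while the same data at the top level is still rejected.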

Here is an updated dvc file sample:

# This is an example starting comment
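# cmd could be formatted like below; with > newlines become spaces when run,
# use | instead of > to make them literal. Keep continuation lines at the same
# indentation: more-indented lines keep their newlines even with >.
cmd: >
    python cmd.py
    input.data
    output.data metrics.json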
# YAML multi-line formatting is explained here - https://yaml-multiline.info/
deps:
- md5: da2259ee7c12ace6db43644aef2b754c
  path: cmd.py  # A line comment example
- md5: e309de87b02312e746ec5a500844ce77
  path: input.data
md5: 521ac615cfc7323604059d81d052ce00
outs:
- cache: true
  md5: 70f3c9157e3b92a6d2c93eb51439f822
  metric: false
  path: output.data
- cache: false
  md5: d7a82c3cdfd45c4ace13484a931fc526
  metric:
    type: json
    xpath: AUC
  path: metrics.json
locked: True
meta:  # optional key to contain arbitrary user data
    author:
        name: Alex
        email: alex@somedomain.org
    anykey: anydata
# "meta: some-string" is also possible; it doesn't have to contain a dict

Hope this helps more than it scares with its volume :)

@efiop

Member Author

commented May 10, 2019

@shcheklein

btw, @efiop does it make sense to include a comment with a link to this doc by default, when we generate the stage file

Seems ugly and unnecessary, I would prefer not to do that 🙂
