Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dvc status --json can output non-json #10242

Open
gregstarr opened this issue Jan 17, 2024 · 8 comments · Fixed by #10253
Open

dvc status --json can output non-json #10242

gregstarr opened this issue Jan 17, 2024 · 8 comments · Fixed by #10253
Labels
A: cli Related to the CLI bug Did we break something? p3-nice-to-have It should be done this or next sprint ui user interface / interaction

Comments

@gregstarr
Copy link

Bug Report

Description

When there are large files to hash which are not cached, dvc status --json will still print out the message, which makes the output not valid json. I believe the use case of dvc status --json is to be able to pipe the output to a file and easily read it with another program, so extra messages make this inconvenient.

I accidentally erased the output I had but I think this is the message that is printed out: https://github.com/iterative/dvc-data/blob/300a3e072e5baba50f7ac5f91240891c0e30d030/src/dvc_data/hashfile/hash.py#L174

Reproduce

  1. large data file stage dependency
  2. dvc status --json for the first time

Expected

dvc status --json only outputs valid json

Environment information

Output of dvc doctor:

DVC version: 3.33.4 (choco)
---------------------------
Platform: Python 3.11.6 on Windows-10-10.0.19045-SP0
Subprojects:
        dvc_data = 2.24.0
        dvc_objects = 2.0.1
        dvc_render = 1.0.0
        dvc_task = 0.3.0
        scmrepo = 1.6.0
Supports:
        azure (adlfs = 2023.12.0, knack = 0.11.0, azure-identity = 1.15.0),
        gdrive (pydrive2 = 1.19.0),
        gs (gcsfs = 2023.12.2.post1),
        http (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.1, aiohttp-retry = 2.8.3),
        oss (ossfs = 2023.12.0),
        s3 (s3fs = 2023.12.2, boto3 = 1.33.13),
        ssh (sshfs = 2023.10.0)
Config:
        Global: C:\Users\starrgw1\AppData\Local\iterative\dvc
        System: C:\ProgramData\iterative\dvc
@dberenbaum
Copy link
Contributor

I think dvc status -q --json should be returning what you expect (json only) instead of only a return code. We have similar behavior in other commands like dvc data status -q --json. @skshetry What do you think?

@gregstarr
Copy link
Author

I saw that in the documentation but it seemed like that would result in no output at all

do not write anything to standard output. Exit with 0 if data and pipelines are up to date, otherwise 1.

https://dvc.org/doc/command-reference/status#-q

@dberenbaum
Copy link
Contributor

You are right, @gregstarr, it outputs nothing today. I mean that we should change the behavior to work this way, since there's no reason to do dvc status -q --json unless you want JSON output. We generally will still show JSON even with -q in these scenarios, like in dvc data status -q --json for example.

@gregstarr
Copy link
Author

Personally, I think just --json should basically imply "-q --json" because I can't think of any use case for using --json but wanting extra output

@dberenbaum
Copy link
Contributor

Discussed with the team. There are two issues here:

  1. We should not be writing those additional non-json logging messages to stdout
  2. We should be showing json regardless of -q when --json flag is used

@gregstarr How are you using the command?

@dberenbaum dberenbaum added bug Did we break something? ui user interface / interaction A: cli Related to the CLI labels Jan 23, 2024
@gregstarr
Copy link
Author

I have been using it like this:

dvc status --json <target> > status.json

I am using it to check the status of my pipelines and pass the json data to a minimal flask server for viewing.

Basically came as a result of this discussion post: https://discuss.dvc.org/t/ignore-files-in-stage-external-dependency-output/1889/2

I have a periodic task which clears out all the DS_Store files from my remote and checks the status of my pipelines.

@dberenbaum
Copy link
Contributor

@gregstarr dvc status -q --json should now provide only the JSON output.

@skshetry Do you want to keep it open as a reminder to stream logs to stderr here?

@dberenbaum dberenbaum added the p3-nice-to-have It should be done this or next sprint label Jan 24, 2024
@gregstarr
Copy link
Author

Thanks for looking into this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: cli Related to the CLI bug Did we break something? p3-nice-to-have It should be done this or next sprint ui user interface / interaction
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants