plots: return error messages for failed plots #7692

Closed
1 of 3 tasks
shcheklein opened this issue May 4, 2022 · 29 comments · Fixed by #9146
Labels: A: api, A: plots, feature request, p1-important, product: VSCode

Comments

@shcheklein
Member

shcheklein commented May 4, 2022

Description and Motivation

If plots can't be processed, we log only a basic message and skip those plots in the JSON output:

$ dvc plots diff main workspace -o .dvc/tmp/plots --split --show-json -v --targets missclassified.jpg
DVC failed to load some plots for following revisions: 'workspace, main'.
{
  "missclassified.jpg": []
}

We need better results: granular messages about failed plots, so that we can show them properly in VS Code instead of silently ignoring them.

It's related to these issues: iterative/vscode-dvc#2277 and iterative/vscode-dvc#1649 in the VS Code repo. At a very high level, we need to distinguish absent plots from errors and show some signal to users, rather than silently ignoring things and/or showing misleading messages (e.g. a refresh button when there is nothing to refresh in an experiment).

Current Output

All examples use multiple revisions with the --json and --split flags.

Single image
"eval/importance.png": [
    {
      "type": "image",
      "revisions": [
        "workspace"
      ],
      "url": "/Users/ivan/Projects/example-repos-dev/example-get-started/build/example-get-started/dvc_plots/workspace_eval_importance.png"
    },
    {
      "type": "image",
      "revisions": [
        "c475deb7448319fab434d5650264dd2dd91bad43"
      ],
      "url": "/Users/ivan/Projects/example-repos-dev/example-get-started/build/example-get-started/dvc_plots/c475deb7448319fab434d5650264dd2dd91bad43_eval_importance.png"
    },
    {
      "type": "image",
      "revisions": [
        "7e4e86ca117f1bbef288f2abebfc7c97d0a9925d"
      ],
      "url": "/Users/ivan/Projects/example-repos-dev/example-get-started/build/example-get-started/dvc_plots/7e4e86ca117f1bbef288f2abebfc7c97d0a9925d_eval_importance.png"
    }
  ]
Flexible (top-level) plot
"dvc.yaml::Precision-Recall": [
    {
      "type": "vega",
      "revisions": [
        "7e4e86ca117f1bbef288f2abebfc7c97d0a9925d",
        "c475deb7448319fab434d5650264dd2dd91bad43",
        "workspace"
      ],
      "content": {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {
          "values": "<DVC_METRIC_DATA>"
        },
        "title": "dvc.yaml::Precision-Recall",
        "width": 300,
        "height": 300,
        "mark": {
          "type": "line",
          "point": true,
          "tooltip": {
            "content": "data"
          }
        },
        "encoding": {
          "x": {
            "field": "recall",
            "type": "quantitative",
            "title": "recall"
          },
          "y": {
            "field": "precision",
            "type": "quantitative",
            "title": "precision",
            "scale": {
              "zero": false
            }
          },
          "color": {
            "field": "rev",
            "type": "nominal"
          }
        }
      },
      "datapoints": {
        "workspace": [
          {
            "precision": 0.30321774445485783,
            "recall": 1.0,
            "threshold": 0.0,
            "dvc_data_version_info": {
              "revision": "workspace",
              "filename": "eval/prc/train.json",
              "field": "precision"
            }
          },
         {"...."},
         {
            "precision": 0.6694635900509439,
            "recall": 0.9359028068705488,
            "threshold": 0.20869278966952978,
            "dvc_data_version_info": {
              "revision": "workspace",
              "filename": "eval/prc/test.json",
              "field": "precision"
            }
          }
Multiple images
"mispredicted/croissant/muffin-16115-13825-26827-1d8e67e0bffdfebcdb3b337787823ab6.jpeg": [
    {
      "type": "image",
      "revisions": [
        "workspace"
      ],
      "url": "/Users/ivan/Projects/hackathon/dvc_plots/workspace_mispredicted_croissant_muffin-16115-13825-26827-1d8e67e0bffdfebcdb3b337787823ab6.jpeg"
    }
  ],
  "mispredicted/muffin/croissant-0295ed7610487b3118febb5563bc58fd.jpg": [
    {
      "type": "image",
      "revisions": [
        "workspace"
      ],
      "url": "/Users/ivan/Projects/hackathon/dvc_plots/workspace_mispredicted_muffin_croissant-0295ed7610487b3118febb5563bc58fd.jpg"
    }
  ],
  "mispredicted/muffin/croissant-3f488b602f2a668e-3fd6af132b0dceafe014dfcf7809d2ff.jpg": [
    {
      "type": "image",
      "revisions": [
        "workspace"
      ],
      "url": "/Users/ivan/Projects/hackathon/dvc_plots/workspace_mispredicted_muffin_croissant-3f488b602f2a668e-3fd6af132b0dceafe014dfcf7809d2ff.jpg"
    }
  ],
  "mispredicted/muffin/dog-bec8602c36317744-1827c47f8ae15e6a3c4ee660035781a4.jpg": [
    {
      "type": "image",
      "revisions": [
        "workspace"
      ],
      "url": "/Users/ivan/Projects/hackathon/dvc_plots/workspace_mispredicted_muffin_dog-bec8602c36317744-1827c47f8ae15e6a3c4ee660035781a4.jpg"
    }
  ]
Stage linear plot
"dvclive/scalars/eval/loss.tsv": [
    {
      "type": "vega",
      "revisions": [
        "d82452a"
      ],
      "content": {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {
          "values": "<DVC_METRIC_DATA>"
        },
        "title": "dvclive/scalars/eval/loss.tsv",
        "width": 300,
        "height": 300,
        "mark": {
          "type": "line",
          "point": true,
          "tooltip": {
            "content": "data"
          }
        },
        "encoding": {
          "x": {
            "field": "step",
            "type": "quantitative",
            "title": "step"
          },
          "y": {
            "field": "eval/loss",
            "type": "quantitative",
            "title": "eval/loss",
            "scale": {
              "zero": false
            }
          },
          "color": {
            "field": "rev",
            "type": "nominal"
          }
        }
      },
      "datapoints": {
        "d82452a": [
          {
            "timestamp": "1660180711394",
            "step": "0",
            "eval/loss": "2.4602549076080322",
            "dvc_data_version_info": {
              "revision": "d82452a",
              "filename": "dvclive/scalars/eval/loss.tsv",
              "field": "eval/loss"
            }
          },
          {
            "timestamp": "1660180723400",
            "step": "1",
            "eval/loss": "1.3761318922042847",
            "dvc_data_version_info": {
              "revision": "d82452a",
              "filename": "dvclive/scalars/eval/loss.tsv",
              "field": "eval/loss"
            }
          }
        ]
      }
    }
  ],
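
For reference, a minimal Python sketch of how a consumer could spot the gap using only the current output shape: it collects the revisions listed under each plot ID and reports which requested revisions have no entry. The helper name `missing_revisions` and the sample payload are illustrative assumptions, not DVC or VS Code code. Note that a missing revision can mean either "no such plot" or "failed to load", which is exactly the ambiguity this issue is about.

```python
import json
from typing import Dict, List


def missing_revisions(plots_json: str, requested: List[str]) -> Dict[str, List[str]]:
    """Map each plot ID to the requested revisions that have no entry."""
    plots = json.loads(plots_json)
    result = {}
    for plot_id, entries in plots.items():
        seen = set()
        for entry in entries:
            seen.update(entry.get("revisions", []))
        result[plot_id] = [rev for rev in requested if rev not in seen]
    return result


# Sample shaped like the output above (paths shortened, values illustrative).
sample = json.dumps({
    "missclassified.jpg": [],
    "eval/importance.png": [
        {"type": "image", "revisions": ["workspace"],
         "url": "dvc_plots/workspace_eval_importance.png"},
    ],
})

report = missing_revisions(sample, ["workspace", "main"])
print(report)
```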

Unblocks, Related

iterative/vscode-dvc#2277
iterative/vscode-dvc#1649

Next Steps

  • A bit of research. The JSON structure looks extremely suboptimal (tons of duplication). Since we are changing it, I'd like to have a better understanding of how it's being used. Entry point into VS Code is here.
  • ⌛ Try to add an error for a single image
  • Classify and suggest how to add errors in all other cases - including directories, regular plots (e.g. linear).
@pared
Contributor

pared commented May 4, 2022

What will the error data be used for? Do we only want to pass the error message to the user? Or will there be some logic involved in processing the errors on the VS Code side?

@mattseddon
Member

We'll be processing the errors before displaying anything to the user. Would be good to have a way to identify where certain revisions are missing data due to errors (as in the example provided in #7691).

@shcheklein shcheklein added the product: VSCode Integration with VSCode extension label May 4, 2022
@efiop
Contributor

efiop commented Jul 19, 2022

@pared Any progress on this one?

@pared
Contributor

pared commented Jul 20, 2022

No, but I believe we could include it as a part of implementing iterative/vscode-dvc#1757

@dberenbaum
Contributor

@pared Any updates on looking into this one?

@pared
Contributor

pared commented Aug 8, 2022

I consider it part of the aforementioned VS Code issue, but the estimate for VS Code depends on research on the Studio side, which is not finished yet.

@pared
Contributor

pared commented Sep 13, 2022

Note to self: since returning errors will probably require a data structure change, we need to remember to stop filling rev in datapoints, as VS Code sometimes needs to assign its own revision (e.g. 'main' vs the short SHA of main).
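
To illustrate the note above, here is a rough sketch of a consumer applying its own revision labels to split-mode datapoints, assuming DVC no longer fills rev itself. The function name `label_datapoints` and the `labels` mapping are hypothetical and do not come from DVC or the extension.

```python
from typing import Any, Dict, List


def label_datapoints(
    datapoints: Dict[str, List[Dict[str, Any]]],
    labels: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Flatten per-revision datapoints, tagging each row with a display label."""
    rows = []
    for revision, points in datapoints.items():
        # Fall back to the raw revision when the consumer has no custom label.
        label = labels.get(revision, revision)
        for point in points:
            rows.append({**point, "rev": label})  # copy, do not mutate the input
    return rows


datapoints = {
    "d82452a": [{"step": "0", "eval/loss": "2.46"}],
    "workspace": [{"step": "0", "eval/loss": "2.31"}],
}
rows = label_datapoints(datapoints, {"d82452a": "main"})
print([row["rev"] for row in rows])
```

The input is left untouched, so the same datapoints could be relabeled again if the consumer's naming changes.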

@pared
Contributor

pared commented Sep 30, 2022

I didn't leave any comment during the research, so:
We were able to implement top-level plots based on the old data format. In order to support errors, we will need to change the data structure returned by dvc plots ... --json.

@shcheklein
Member Author

@mattseddon could you please share the location of the code that parses the --json result for plots on our end.

@dberenbaum do we know if anyone else besides vs code depends on --json?

  • ⌛ A bit of research. The JSON structure looks extremely suboptimal (tons of duplication). Since we are changing it, I'd like to have a better understanding of how it's being used.
  • Another question to answer and agree on: how do we process directories? (If the whole plot dir can't be expanded and we don't know what files it has, we can't send an error per file; we'll have to send it per directory.)
  • Change the output

@mattseddon
Member

Use https://github.com/iterative/vscode-dvc/blob/main/extension/src/plots/model/index.ts#L108 as an entry point.

@shcheklein
Member Author

[Image: Screen Shot 2023-01-15 at 6 23 00 PM]

I see that data collection depends on the datapoints field and is not using data in the content:

https://github.com/iterative/vscode-dvc/blob/main/extension/src/plots/model/collect.ts#L372
https://github.com/iterative/vscode-dvc/blob/main/extension/src/plots/model/collect.ts#L423

@mattseddon do you remember off the top of your head whether we need data in the template? It looks identical (at least in the sample I have). Do we need it in VS Code? And why did we decide to keep both (e.g. why don't we parse plot.content.data instead of datapoints)?

Are there some proposals, tickets, PRs for the plots JSON format?

@mattseddon
Member

I do not think that we need it.

@mattseddon
Member

Are there some proposals, tickets, PRs for the plots JSON format?

The original PR is here: #7367. From reading the description it looks like the data being duplicated is a bug for the --split flag.

@dberenbaum
Contributor

@dberenbaum do we know if anyone else besides vs code depends on --json?

No, I don't think so.

For the duplicated data, I'm missing something because I have different output from what @shcheklein shows above. I don't see all the data in content.data.values. My output for dvc plots diff 504206e f586d67 workspace -o .dvc/tmp/plots --split --json looks like this:

{
  "dvc.yaml::Accuracy": [
    {
      "type": "vega",
      "revisions": [
        "504206e",
        "f586d67",
        "workspace"
      ],
      "content": {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {
          "values": "<DVC_METRIC_DATA>" # Nothing else shows up in this field.
        },
...
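
As a rough illustration of how the two fields relate in split mode, this sketch substitutes the flattened datapoints for the `<DVC_METRIC_DATA>` anchor in the template. The function name `fill_template` is an assumption, and the rows are passed through unchanged; this is not how DVC or the extension actually renders plots.

```python
import copy
from typing import Any, Dict

ANCHOR = "<DVC_METRIC_DATA>"


def fill_template(plot: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of plot["content"] with the anchor replaced by datapoints."""
    content = copy.deepcopy(plot["content"])
    values = []
    for points in plot.get("datapoints", {}).values():
        values.extend(points)
    # Only substitute when the template still carries the anchor.
    if content.get("data", {}).get("values") == ANCHOR:
        content["data"]["values"] = values
    return content


plot = {
    "type": "vega",
    "content": {"data": {"values": ANCHOR}, "title": "dvc.yaml::Accuracy"},
    "datapoints": {
        "504206e": [{"step": 0, "acc": 0.8}],
        "workspace": [{"step": 0, "acc": 0.9}],
    },
}
filled = fill_template(plot)
print(filled["data"]["values"])
```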

@shcheklein
Member Author

@dberenbaum my bad, I didn't use split I think. I wonder what's the purpose of datapoints in the non-split mode then? (not critical I think at all, since JSON is not used anywhere now).

@shcheklein
Member Author

shcheklein commented Jan 17, 2023

Updated the description with some examples of the current output. Next: try to add an error for an image plot (not the directory-with-images case for now), e.g. example-get-started's importance.png.

@skshetry
Member

In general, I find returning errors to be a mistake. It adds a lot of maintenance burden, for which we are not ready internally.

@shcheklein
Member Author

@skshetry it's a bad user experience to not show anything at all in case something fails and since we. I think if it's done right it won't be a bigger burden at all and code doesn't have to be complicated. We already have this data we just need to propagate it (I think so at least, I can be wrong). And to clarify, we don't talk here about processing specific types of errors, we just need a signal that plot exists in a revision and that it can't be loaded for some reason.

On the maintenance side, I think the whole plots logic and the related index part should be the first thing to improve. E.g. after the last refactoring we still have two plot accessors (_plots and plots), some custom collect logic, a lot of logic with path manipulations, and old code (like output plots, etc.). Those are the points that should be removed or refactored to make it lighter and simpler.

@skshetry
Member

We already have this data we just need to propagate it

That's where the complexity is, right? It's easy to log or suppress but extremely hard to propagate up. We need small sets of APIs at the high level where we do this. At the moment we are spreading this logic to deep layers which increases the burden.

I think there should be a symmetry between the product and the engineering side, and here I think the expectation on the product side is too high (or, was too high). :)

@shcheklein
Member Author

That's where the complexity is, right?

Doesn't have to be. Sometimes dropping some code (that removes and / or transforms things) instead of exposing them directly (which might be just fine in this case) can simplify it. We'll see how it goes. I definitely want to avoid adding tons of custom code for this.

I think there should be a symmetry between the product and the engineering side, and here I think the expectation on the product side is too high (or, was too high). :)

I think it's a wrong dichotomy in this case. I'm not sure if it's possible to do it now w/o complicating things. It definitely doesn't add much complexity to do this from scratch. If we had the standard in mind (it's not high at all), we would have spent only a small additional percentage of time.

Product expectation: we're talking about VS Code, right (that's what I have in mind in the first place), not DVC? Just in case. I'm fine (more or less) with DVC returning a generic error (and writing something in the logs). In VS Code it leads to a bad experience. It's not top priority (that's why I'm doing this in the background), but it can and should be fixed. And we should have a higher standard for our products.

@shcheklein
Member Author

For visibility: I got distracted by some other plots issues (broken smooth templates, new DVCLive release) and didn't have capacity for this hands-on work (which is not a lot of time by default). I'll try to get back to this asap.

Some design decisions are tricky here. If we have a plot directory, we expand each file in that directory as its own plot when we return the result. That's fine. The problem is that we don't know the layout if we can't download the .dir in the first place. So, for these granular plots, we can't communicate errors at all: we don't know for sure whether they exist in the directory in the failed revision. We'll have to assume that they don't, I guess, and communicate that we were not able to process the whole directory.
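
A small sketch of the directory case, under the assumption that errors are reported per (path, revision) pair and may point at a whole directory. The names `plot_status`, `loaded`, and `dir_errors` are hypothetical and only illustrate the lookup; they are not a proposed DVC format.

```python
from pathlib import PurePosixPath
from typing import Dict, Set, Tuple


def plot_status(
    plot_path: str,
    rev: str,
    loaded: Dict[str, Set[str]],
    dir_errors: Set[Tuple[str, str]],
) -> str:
    """Classify a plot file in a revision as loaded, covered by an error, or absent."""
    if rev in loaded.get(plot_path, set()):
        return "loaded"
    path = PurePosixPath(plot_path)
    # Check the file itself and every parent directory for a reported error.
    for candidate in [path, *path.parents]:
        if (str(candidate), rev) in dir_errors:
            return "directory_error"
    return "absent"


loaded = {"mispredicted/muffin/croissant.jpg": {"workspace"}}
dir_errors = {("mispredicted", "main")}

for rev in ("workspace", "main", "exp-1"):
    print(rev, plot_status("mispredicted/muffin/croissant.jpg", rev, loaded, dir_errors))
```

With a directory-level error, the consumer can only say that the file's state in that revision is unknown, which matches the limitation described above.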

@dberenbaum
Contributor

@shcheklein Can you clarify the full scope of the issue? Is it only about plot directories, or is that merely one case you are trying to solve for?

@shcheklein
Member Author

shcheklein commented Feb 8, 2023

Yes, @dberenbaum. It's related to these issues: iterative/vscode-dvc#2277 and iterative/vscode-dvc#1649 in the VS Code repo. At a very high level, we need to distinguish absent plots from errors and show some signal to users, rather than silently ignoring things and/or showing misleading messages (e.g. a refresh button when there is nothing to refresh in an experiment).

Can you clarify the full scope of the issue? Is it only about plot directories, or is that merely one case you are trying to solve for?

Thus, the full scope: show error messages for all plot definitions, not only directories / images.

@dberenbaum
Contributor

@skshetry Can you follow up with questions you have, and @shcheklein and I can respond to define the scope better? By next week when you are finished with support duty, let's try to have a solid plan and estimate 🙏 .

@skshetry
Member

I could not look into this during support duty, as some p0s/bugs came.

@dberenbaum
Contributor

@skshetry
Member

We do seem to preserve errors during plots.collect(). We transform internal representation to the JSON format, where we lose most of the information. We could start with exposing that, what would be a good json format for incorporating errors for vscode?
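
As a starting point for discussion, here is a sketch of carrying errors through to the JSON output instead of dropping them. The input shape (revision -> plot name -> result or exception), the function name `to_json_payload`, and the `data`/`errors` keys are all assumptions made for illustration; this is not what plots.collect() actually returns.

```python
import json
from typing import Any, Dict, List


def to_json_payload(collected: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Split collected results into loaded plots and a flat list of errors."""
    data: Dict[str, List[str]] = {}
    errors: List[Dict[str, str]] = []
    for rev, plots in collected.items():
        for name, result in plots.items():
            if isinstance(result, Exception):
                # Keep only a generic signal: which plot, which revision, what failed.
                errors.append({
                    "name": name,
                    "rev": rev,
                    "type": type(result).__name__,
                    "msg": str(result),
                })
            else:
                data.setdefault(name, []).append(rev)
    return {"data": data, "errors": errors}


# Hypothetical internal representation with one failed revision.
collected = {
    "workspace": {"eval/importance.png": "loaded"},
    "main": {"eval/importance.png": FileNotFoundError("eval/importance.png")},
}
payload = to_json_payload(collected)
print(json.dumps(payload, indent=2))
```

A flat error list keeps the consumer side simple: it only needs the plot name and revision to tell a failed plot from an absent one.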

@shcheklein
Member Author

We could start with exposing that, what would be a good json format for incorporating errors for vscode?

@skshetry 🤔 tbh I don't think VS Code requires anything specific here. We should come up with a decent general format for this data. We can adjust VS Code if needed.

@dberenbaum
Contributor

I think what we've learned is that it's helpful to share drafts early and often to get feedback as you go so we know mostly what works in both products by the time we are ready to merge.

@dberenbaum dberenbaum added p3-nice-to-have It should be done this or next sprint p1-important Important, aka current backlog of things to do and removed p3-nice-to-have It should be done this or next sprint labels Mar 16, 2023

6 participants