Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plots smoothing #3906

Closed
dmpetrov opened this issue May 29, 2020 · 12 comments · Fixed by #4046
Closed

plots smoothing #3906

dmpetrov opened this issue May 29, 2020 · 12 comments · Fixed by #4046
Assignees
Labels
feature request Requesting a new feature
Milestone

Comments

@dmpetrov
Copy link
Member

Our default plot is not very informative when the number of data points is huge (1k):
visualization_

We need to introduce some smoothing into the default plot or create a new smoothing template. The result should look like:
visualization (2)

PS: I've done the last plot by adding loess transformation to the template which breaks the revision. We need to fix the revision or find another solution.

    "transform": [{
        "loess": "<DVC_METRIC_Y>",
        "on": "<DVC_METRIC_X>"
    }
    ],
@dmpetrov dmpetrov added feature request Requesting a new feature triage Needs to be triaged labels May 29, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label May 29, 2020
@pared
Copy link
Contributor

pared commented Jun 5, 2020

So, I see a few ways how we can approach that:

  1. Create new template, default_smoot or something like that - easy, but if we decide to go that way, soon we will have many plot templates, which is probably not desired.

  2. Since we already support plot definition in pipelines file, maybe we should consider introducing update_dict parameter, that would be a way to pass any desired modification, that user would like to made to existing plot. So in our case, entry for plot would probably look something like:

copy3:
    cmd: cat data >> result3
    deps:
    - data
    plots:
    - result3:
        update_dict:
        - transform: 
              - loess: <DVC_METRIC_Y>
              - on: <DVC_METRIC_X>

The cons:

  • in that case the user needs to come up with JSON update and transform it to yaml (since we work with Vega, I would assume that this would be the default way to go -> find a solution in vega -> apply to dvc to preserve it). That is not very convenient. Maybe we could add another plots command that would make this easier.
  • if we decide to go this way, I would reconsider using anchors. (DVC_METRIC_Y, DVC_METRIC_X, and others). In the proposed solution user needs to learn vega -> learn how to transform -> create transformation -> convert to yaml -> learn what anchors are -> replace x/y witch anchors -> write to yaml file.
    Anchors were introduced to let the user create their own HTML templates, but since diff already supports plotting multiple targets, and we have --show-vega option, I am not sure we need them anymore.

The pros:

  • this approach lets user modify the template in any way he/she wants, so basing on default template, user can modify the title, width,height, whatever comes to mind. Assuming certain level of vega.js knowledge, of course.

@dmpetrov
Copy link
Member Author

dmpetrov commented Jun 5, 2020

The 2nd approach makes sense for many different transformations. However, we should not use it as a default approach for smoothing since smoothing is a super common use case and we don't want to complicate the pipeline every time.

If it possible I'd add the smoothing in the default one. If not possible - create a new template.

@pared
Copy link
Contributor

pared commented Jun 9, 2020

@dmpetrov Tried adding smoothing by default, but "normal" use cases can get ugly. So I guess we should go with new template.

@dmpetrov dmpetrov added this to the 1.0 milestone Jun 9, 2020
@dmpetrov
Copy link
Member Author

dmpetrov commented Jun 9, 2020

@pared could you please share the result with code? Were you able to solve the revision issue?

@pared
Copy link
Contributor

pared commented Jun 10, 2020

@dmpetrov sure!
So, the thing that we had to do here was add groupby to our transform:

"transform": [
{
        "loess": "<DVC_METRIC_Y>",
        "on": "<DVC_METRIC_X>",
	"groupby": ["rev"]
}
]

And the results for "normal" use case looks as follows (example have 100 points, evenly spaced):

visualization (1)

and with loess applied:
visualization

@pared
Copy link
Contributor

pared commented Jun 10, 2020

@dmpetrov also, regarding the update_params: I agree we should implement it at some point, I could totally imagine, user preserving the original plot AND applying some transformation, just like in this example.

@pared
Copy link
Contributor

pared commented Jun 15, 2020

@dmpetrov
I played around with smoothing values and here are results for different bandwidth values:

  • 0.01
    001

  • 0.05
    005

  • 0.1
    01

  • 0.2
    02

It seems to me that setting default smoothing to very small value is a reasonable decision for the default plot, 0.01 case seems to affect only plot with really big amount of points. It does not require too much additional work from us, and we can deal with update_dict later. If we agree on that, I would mention the smoothing in docs.

@pared
Copy link
Contributor

pared commented Jun 15, 2020

To make the comparison easier I created new plots out of 12500 point case, by extracting particular amount of points from the dataset, evenly spaced. Number of were 1250, 125, 25:

  • 0.01
    new_001

  • 0.05
    new_005

  • 0.1
    new_01

  • 0.2
    new_02

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Jun 15, 2020

  1. Create new template, default_smoot or something like that - easy, but if we decide to go that way, soon we will have many plot templates

A smooth template is not a bad idea TBH. Quick and easy, we just document the template (e.g. link here) and emphasize that it can be customized to change the parameters.

  1. Since we already support plot definition in pipelines file, maybe we should consider introducing update_dict parameter, that would be a way to pass any desired modification

I don't understand why update_dict is needed in the YAML structure but anyway, if this means dvc plots modify will work more like dvc config accepting key:value pairs to set the display props (instead of having a specific -- flag per supported prop, which is a little confusing and limiting), I support this route.

@efiop
Copy link
Contributor

efiop commented Jun 15, 2020

agree with @jorgeorpinel about making plots modify more like config. It was a small experiment with plots modify in an attempt to possibly make it better than metrics modify that we used to have, but it feels more confusing indeed.

@jorgeorpinel
Copy link
Contributor

Per #3906 (comment) sounds like we're going with a template for this one, so should we extract the part about making plots modify more like config? Pawel suggested I commented here but I'd be happy to create a separate ticket.

@dmpetrov
Copy link
Member Author

@jorgeorpinel & @efiop yes, it has to be configurable (otherwise we need dozens of the templates :) ).

We already support dvc plots modify. The smoothing option should be part of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Requesting a new feature
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

4 participants