Plots: Better confusion matrix, and normalized version to #4775

Merged

@sjawhar sjawhar commented Oct 22, 2020

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Description of Changes

Relates to iterative/dvc-render#20

I really wasn't happy with the default confusion matrix provided in DVC, so I dove into Vega-Lite and figured out how to add a few features:

  • Text labels on each cell showing record counts
  • Added new confusion_normalized template to show counts normalized by actual label
    • I thought about adding a new anchor to control the normalization direction (groupby actual vs. predicted), but this can also easily be achieved by simply switching x and y 😄
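
As a rough illustration of what "normalized by actual label" means, and of why swapping x and y flips the normalization direction, here is a minimal Python sketch. It is not the Vega-Lite template from this PR; the function name `normalized_confusion` and the sample records are hypothetical.

```python
from collections import Counter


def normalized_confusion(pairs):
    """Map each (first, second) cell to its share of the first label's total.

    With (actual, predicted) pairs, the shares for each actual label sum to 1.
    """
    counts = Counter(pairs)
    totals = Counter()
    for (first, _second), n in counts.items():
        totals[first] += n
    return {cell: n / totals[cell[0]] for cell, n in counts.items()}


# Hypothetical (actual, predicted) records, not data from this PR.
records = [("cat", "cat"), ("cat", "cat"), ("cat", "dog"), ("dog", "dog")]

# Normalize by actual label.
by_actual = normalized_confusion(records)

# Swapping the two fields normalizes by predicted label instead;
# the keys of this result are (predicted, actual).
by_predicted = normalized_confusion([(p, a) for a, p in records])
```

In the template, the same swap would presumably be done by exchanging the fields passed as x and y, which is why a separate anchor for the direction is not strictly needed.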

Screenshots

New confusion template
visualization

Results of dvc plots diff with new confusion_normalized template
visualization

Feedback Wanted

Here are some stylistic decisions I made that I'm open to reverting in the interest of generality:

  • Made the plot much bigger (can't reasonably overlay text on the current tiny size)
  • Hid the title next to the color bar
  • In the normalized plot, the scale is clamped from 0 to 1
  • In the unnormalized plot, the scale minimum is clamped to 0 and the max is set to nice: true
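
For readers unfamiliar with Vega-Lite, the scale and legend choices above might look roughly like the color encodings below, written here as Python dicts that serialize to JSON. This is a sketch under assumptions, not the merged template: the field names `share` and `count` are hypothetical, and using `zero: true` to pin the minimum at 0 is one possible way to express that decision.

```python
import json

# Color encoding for the normalized plot (assumed field name "share").
normalized_color = {
    "field": "share",
    "type": "quantitative",
    "scale": {"domain": [0, 1]},  # clamp the color scale to the 0..1 range
    "legend": {"title": None},    # hide the title next to the color bar
}

# Color encoding for the unnormalized plot (assumed field name "count").
raw_color = {
    "field": "count",
    "type": "quantitative",
    # Include 0 in the domain and round the maximum to a "nice" value.
    "scale": {"zero": True, "nice": True},
    "legend": {"title": None},
}

# None serializes to JSON null, which is how Vega-Lite removes a legend title.
spec_fragment = json.dumps({"color": normalized_color}, indent=2)
print(spec_fragment)
```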

Also, cells that are zero are empty. This is because of how the grouping works, and I couldn't find a way to "fill in" the missing values. If someone more expert in Vega syntax can help me figure that out, that would be awesome. Update: fixed!

@sjawhar sjawhar force-pushed the feature/plots-confusion-improvements branch from b9f365e to 9b355e8 Compare October 22, 2020 16:47
@efiop efiop requested a review from pared October 22, 2020 20:51
@efiop
Contributor

efiop commented Oct 22, 2020

@elleobrien @dmpetrov Would appreciate your thoughts on this 🙂

@elleobrien

elleobrien commented Oct 22, 2020

I LOVE this. It's really needed- I had a number of things I disliked about the default template that I was just modifying in my local files, and I suspect these changes will be an upgrade for a lot of data scientists.

All good from me- I really don't see any downsides. Switching the scale to normalized 0-1 also has a nice side effect of removing the awkwardly long default legend label "Count of Records", which looks bad in CML reports.

100% approve from me. Thanks @sjawhar!

@elleobrien

Also, empty cells are fine because they default to a color that isn't on the color scale. The only possible improvement might be to default them to a gray that can be universally understood as NaN :)

Definitely not a blocker.

@pared
Contributor

pared commented Oct 23, 2020

This is a great change @sjawhar!

> Also, cells that are zero are empty.

That is not a blocker, though I do believe it is kind of a bug: the scale shows that 0 is light yellow, so I guess zero cells should be marked that way. I think we should ask the Vega developers what they think about it. Since their default question channel is Stack Overflow, we could post there. @sjawhar, since you are the first to raise this issue, do you want to do that? If not, we will take care of it.

@sjawhar
Contributor Author

sjawhar commented Oct 23, 2020

@pared I agree that it's a bug. I've posted the question on Stack Overflow.

@sjawhar sjawhar force-pushed the feature/plots-confusion-improvements branch from 56dd154 to 3a28b78 Compare October 25, 2020 14:17
@sjawhar
Contributor Author

sjawhar commented Oct 25, 2020

@pared @elleobrien Fixed the missing values. It should work in any case where a particular class isn't entirely missing from a column (i.e. class A was never predicted)
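
To make the caveat concrete, here is a small Python sketch of filling missing cells with zero. It only mirrors the idea described above and is not the Vega-Lite fix itself; the helper name `fill_missing_cells` and the sample records are hypothetical. The grid is built from the labels that actually appear in each column, so a class that was never predicted still gets no column.

```python
from collections import Counter
from itertools import product


def fill_missing_cells(pairs):
    """Return counts for every (actual, predicted) combination, using 0 for gaps.

    Only labels observed in each column are used, so a class that never
    appears as a prediction cannot be filled in.
    """
    counts = Counter(pairs)
    actual_labels = sorted({a for a, _ in counts})
    predicted_labels = sorted({p for _, p in counts})
    return {
        cell: counts.get(cell, 0)
        for cell in product(actual_labels, predicted_labels)
    }


# Hypothetical (actual, predicted) records; "bird" is never predicted.
records = [("cat", "cat"), ("dog", "dog"), ("bird", "cat")]
grid = fill_missing_cells(records)
```

Here `grid` contains a zero for the ("cat", "dog") cell, but there is no ("bird", "bird") cell at all because "bird" never occurs among the predictions.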

@pared
Contributor

pared commented Nov 3, 2020

@sjawhar sorry for taking the time with merging. We can merge it right away, will you solve conflicts, or should I do that?

@sjawhar
Contributor Author

sjawhar commented Nov 3, 2020

> @sjawhar sorry for taking the time with merging. We can merge it right away, will you solve conflicts, or should I do that?

No worries, I'll take care of it today.

@sjawhar sjawhar force-pushed the feature/plots-confusion-improvements branch from 3a28b78 to f354bc6 Compare November 4, 2020 02:00
@sjawhar
Contributor Author

sjawhar commented Nov 4, 2020

Rebased on the latest master branch.

@pared
Contributor

pared commented Nov 4, 2020

This is just amazing, thank you @sjawhar! 🚀 🔥

@pared pared merged commit 6494bd0 into iterative:master Nov 4, 2020
@skshetry skshetry added the feature is a feature label Nov 12, 2020
@sjawhar sjawhar deleted the feature/plots-confusion-improvements branch November 20, 2020 01:18
@jorgeorpinel
Contributor

Hello! Should we update docs examples per this improvement? Specifically in https://dvc.org/doc/command-reference/plots#example-confusion-matrix and https://dvc.org/doc/command-reference/plots/diff#example-confusion-matrix.

@pared
Contributor

pared commented Nov 26, 2020

@jorgeorpinel great point! Created PR.

@jorgeorpinel
Contributor

Ty @pared
