Oddant1
Member

@Oddant1 Oddant1 commented Aug 2, 2019

Closes #148. Adds the ability to adjust the color scale of heatmaps derived from confusion matrices via seaborn's vmin and vmax parameters. Is this along the lines of what you want, @nbokulich? Once the parameters are properly implemented I can add some tests.

@nbokulich nbokulich self-assigned this Aug 7, 2019
Member

@nbokulich nbokulich left a comment

Thanks @Oddant1 ! I pulled down these changes and tested this locally — all seems to work as intended.

However, these parameters make it easy to abuse the color scale or (worse yet) unintentionally mask data. I wonder whether we should at least describe this behavior in the parameter descriptions, or even raise a warning if vmin/vmax are set to values inside the range of the data (I am not too keen to bog down the code with a validation test, so I much prefer the former to the latter). @thermokarst any thoughts?

E.g., I made this plot from the moving pictures data and set vmin=0.9 and vmax=0.91:
[image: heatmap with vmin=0.9 and vmax=0.91]
This just causes all lower and higher values to be colored uniformly, which is very misleading. Compare to vmin/vmax="auto":
[image: heatmap with vmin/vmax="auto"]
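
To make the effect concrete, here is a minimal sketch of linear color normalization with clipping, which is roughly how a colormap anchored at vmin and vmax treats off-scale values. The function name `normalize_for_colormap` is hypothetical, and this is a simplified model rather than seaborn's or matplotlib's actual implementation.

```python
def normalize_for_colormap(value, vmin, vmax):
    """Map value to a 0..1 colormap position, clipping off-scale values.

    Hypothetical helper: a simplified model of linear color normalization,
    not seaborn's or matplotlib's real code.
    """
    position = (value - vmin) / (vmax - vmin)
    # Anything below vmin or above vmax is pinned to the ends of the colormap.
    return min(1.0, max(0.0, position))


# With a narrow range, very different values end up with the same color.
low_values = [normalize_for_colormap(v, vmin=0.9, vmax=0.91)
              for v in (0.0, 0.5, 0.89)]
high_values = [normalize_for_colormap(v, vmin=0.9, vmax=0.91)
               for v in (0.92, 1.0)]
print(low_values)   # every entry is 0.0, the lowest color
print(high_values)  # every entry is 1.0, the highest color
```

Under this model a true 0.0 and a 0.89 are indistinguishable once vmin=0.9, which matches the uniform coloring described above.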

parameter_descriptions={
    'truth': 'Metadata column (true values) to plot on y axis.',
    'missing_samples': parameter_descriptions['base']['missing_samples'],
    'vmin': 'The minimum color value for the heatmap',
Member

How about we modify from the seaborn descriptions:
'vmin': 'The minimum value to use for anchoring the colormap. If "auto", vmin is set to the minimum value in the data.',
'vmax': 'The maximum value to use for anchoring the colormap. If "auto", vmax is set to the maximum value in the data.'

Member

@Oddant1 please see this old change needed

@thermokarst
Contributor

@nbokulich - the first thing that comes to my mind RE the unintentional masking of data is to promote that to a real mask (using the mask parameter in sns.heatmap).

@nbokulich
Member

> promote that to a real mask (using the mask parameter in sns.heatmap).

I suppose that works. @Oddant1 what do you think about adding a mask to _plot_heatmap_from_confusion_matrix if vmax or vmin is not None? See https://seaborn.pydata.org/generated/seaborn.heatmap.html for more details on masking.

@Oddant1
Member Author

Oddant1 commented Aug 22, 2019

@nbokulich and @thermokarst, if I'm reading this correctly, what's happening is it's too easy to set vmin and vmax to values that cause some of the cells to fade into the background unintentionally. If this is the case, I'm not sure I understand how the mask is going to solve this. How would adding a mask change the appearance of this heatmap?
image
It says the mask allows you to intentionally cover up cells, but I don't see how allowing you to do it intentionally stops it from happening unintentionally. I apologize for my lack of understanding of how this particular utility works.

@nbokulich
Member

@Oddant1 thank you for asking; I scratched my head for a bit at first at @thermokarst's suggestion. The problem is that vmin/vmax can lead to unintentional (or nefarious) "masking" (in my parlance) of data that are off-scale, which causes two distinct issues:

  1. Off-scale values cannot be distinguished from one another, so datapoints can look higher or lower than they actually are.
  2. Values that really are zero will appear to be vmin. E.g., in the heatmap above it looks like most squares are 0.9 when most are 0.0.

Masking (in seaborn parlance, not mine) will allow values lower than vmin to be dropped, so that the square appears white instead of whatever the lowest color value is. This is effectively a fix for issue 2, but unfortunately not for issue 1. However, issue 1 might not be our job to solve, just as it's not seaborn's job; users should be responsible and pay attention, it is a mostly foolproof step, and we can't run too much of a nanny framework here. Even if a user accidentally uses an inappropriate vmin value (e.g., they are copying/pasting from an existing workflow), the masked values will make it apparent to them that there's a problem and they need to re-run that step.

So this is a "good enough" fix in my mind, even if it's not perfect.
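
A rough sketch of the masking idea discussed above: build a boolean grid marking cells below vmin and pass it to the mask parameter of sns.heatmap, which leaves a cell blank wherever the mask is True. The helper names `below_vmin_mask` and `plot_masked_heatmap` are hypothetical and are not taken from this PR.

```python
def below_vmin_mask(matrix, vmin):
    """Return a boolean grid that is True where a cell falls below vmin."""
    return [[value < vmin for value in row] for row in matrix]


def plot_masked_heatmap(matrix, vmin, vmax):
    """Draw a heatmap in which cells below vmin are left blank.

    Hypothetical wrapper; numpy and seaborn are imported lazily so the
    mask helper above can be used without any plotting libraries.
    """
    import numpy as np
    import seaborn as sns

    mask = np.asarray(below_vmin_mask(matrix, vmin))
    # seaborn does not draw cells where mask is True.
    return sns.heatmap(np.asarray(matrix), vmin=vmin, vmax=vmax, mask=mask)


confusion = [[0.95, 0.0], [0.05, 0.9]]
mask = below_vmin_mask(confusion, vmin=0.9)
print(mask)  # [[False, True], [True, False]]
```

The blank cells address issue 2 only: values above vmax are still drawn in the top color, so issue 1 remains.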

@Oddant1
Member Author

Oddant1 commented Aug 22, 2019

@nbokulich so it wouldn't eliminate the issue, but it would make it obvious that the issue had occurred, which would hopefully get the user to rerun and pay closer attention to their vmin and vmax values.

@nbokulich
Member

yep! I am also on the fence about whether we should expose a mask parameter to toggle this behavior. I'm thinking no for now, and if it bugs us or users, we can add it at a later date. What do you and @thermokarst think?

@thermokarst
Contributor

thermokarst commented Aug 22, 2019

One way to expose it in a limited manner is to use TypeMap!

A, B, __ = TypeMap({
    (Float, Bool % Choices(False)): Visualization,
    (Str % Choices('auto'), Bool): Visualization,
})

...

parameters={
    'foo': A,
    'bar': B
...

This produces a signature like this:

image

Let me know if you want the example expanded on.

@Oddant1
Member Author

Oddant1 commented Aug 22, 2019

So in that example, vmin and vmax look like foo, and mask looks like bar?

@thermokarst thermokarst assigned Oddant1 and unassigned nbokulich Aug 22, 2019
@Oddant1 Oddant1 assigned nbokulich and unassigned Oddant1 Aug 22, 2019
@nbokulich
Member

@Oddant1 what is the status of this PR?

My understanding is that you are going to refactor to toss the TypeMap and masking, but keep vmin/vmax and guard against vmin > min value and vmax < max value. Is that still the plan? (Or maybe that message never made it through.)

I am reassigning to you to make these changes — let's connect on slack if you have questions.

@nbokulich nbokulich assigned Oddant1 and unassigned nbokulich Aug 28, 2019
@Oddant1
Member Author

Oddant1 commented Aug 28, 2019

@nbokulich I'm not aware of any plan to get rid of TypeMap? Where did you message me? I probably just missed it.

@Oddant1
Member Author

Oddant1 commented Aug 28, 2019

977b1bb Not sure why the tests in this commit fail (the regexes aren't matching for some reason, I'm sure I'm just missing something stupid), but is this along the lines of what we want?

@nbokulich
Member

yep! that looks like what we need — but pull out the vmin/vmax check into a separate utility function and call it as early as possible in the _plot_confusion_matrix function, rather than in the _plot_heatmap_from_confusion_matrix function.
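
For illustration, a separate check of this kind might look like the sketch below. The function name `check_vmin_vmax` and its signature are hypothetical, None is assumed to mean "pick the bound automatically" (following the `vmin is not None` test in the diff), and the error wording is the wording requested later in this review. The code actually merged may differ.

```python
def check_vmin_vmax(frequencies, vmin=None, vmax=None):
    """Raise ValueError if vmin/vmax would push any value off the color scale.

    Hypothetical utility: `frequencies` is a nested list of predicted class
    frequencies, and None means "let the plot pick the bound automatically".
    """
    values = [value for row in frequencies for value in row]
    lowest_frequency = min(values)
    highest_frequency = max(values)

    errors = []
    if vmin is not None and vmin > lowest_frequency:
        errors.append('vmin must be less than or equal to the lowest '
                      'predicted class frequency')
    if vmax is not None and vmax < highest_frequency:
        errors.append('vmax must be greater than or equal to the highest '
                      'predicted class frequency')
    if errors:
        # Report every problem at once, one message per line.
        raise ValueError('\n'.join(errors))


frequencies = [[0.95, 0.0], [0.05, 0.9]]
check_vmin_vmax(frequencies, vmin=0.0, vmax=1.0)  # passes silently

try:
    check_vmin_vmax(frequencies, vmin=0.9, vmax=0.91)
except ValueError as err:
    message = str(err)  # both messages, separated by a newline
```

Calling such a check before any plotting work means a bad vmin or vmax fails fast, and the plotting function itself does not need to know about validation.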

Member

@nbokulich nbokulich left a comment

thanks @Oddant1 ! just 3 tiny language tweaks and this is ready to go!

parameter_descriptions={
    'truth': 'Metadata column (true values) to plot on y axis.',
    'missing_samples': parameter_descriptions['base']['missing_samples'],
    'vmin': 'The minimum color value for the heatmap',
Member

@Oddant1 please see this old change needed

error = ''
if vmin is not None:
    if vmin > lowest_frequency:
        error += ('Your vmin value must be less than or equal to the '
Member

please change to:
"vmin must be less than or equal to the lowest predicted class frequency"

if vmax < highest_frequency:
    if error:
        error += '\n'
    error += ('Your vmax value must be greater than or equal to the '
Member

Please change to:
"vmax must be greater than or equal to the highest predicted class frequency"

Member

@nbokulich nbokulich left a comment

LGTM thanks @Oddant1 !

@nbokulich nbokulich merged commit 76878b2 into qiime2:master Aug 30, 2019
@Oddant1 Oddant1 deleted the color_scale branch June 25, 2020 20:44
Development

Successfully merging this pull request may close these issues.

confusion-matrix: add parameter for adjusting color scale?
