Improve multiple semantic legend titles #1440

Closed
mwaskom opened this issue May 30, 2018 · 24 comments

Comments

@mwaskom
Owner

mwaskom commented May 30, 2018

Currently the legends for lineplot (#1285) and scatterplot (#1436) don't indicate which variable each set of levels represents. I punted on this, but it needs a solution. One option is to cram everything into the legend title, but that's not ideal. I think doing something with legend entries that have invisible handles and then bumping the legend text so it looks centered would work:

import matplotlib.pyplot as plt
from matplotlib import transforms

f, ax = plt.subplots()

h1 = ax.scatter([], [], s=0, label="Title 1")
h2 = ax.scatter([], [], label="level 1")
h3 = ax.scatter([], [], s=0, label="Title 2")
h4 = ax.scatter([], [], label="level 2")

legend = ax.legend()
for t in legend.get_texts()[::2]:
    xfm = transforms.offset_copy(t.get_transform(), ax.figure, x=-10, units="points")
    t.set_transform(xfm)

image

But getting this to be really robust is going to be tricky.

@mwaskom mwaskom changed the title Improve multiple semantic legend titles: Improve multiple semantic legend titles May 30, 2018
@CRiddler
Contributor

CRiddler commented Jun 7, 2018

Had my eye on this one for a few days now, and I just got some time to look into it, so here are some thoughts.

I think doing something with legend entries that have invisible handles and then bumping the legend text so it looks centered would work

This was my thought at first too. However, even when you specify s=0 and/or marker="", a legend marker is still drawn and is technically visible. Based on your example, instead of looking for a marker that's invisible, we can look for a marker with a size of 0 (or an alpha of 0 if we want to go that route) and set it to be invisible. Once the marker is invisible, we can center-align the legend column(s), which properly aligns the entries that no longer have a visible marker. Copy-paste example:

import matplotlib.pyplot as plt
import matplotlib

f, ax = plt.subplots()

h1 = ax.scatter([], [], alpha=0, label="Title 1") #example with alpha=0
h2 = ax.scatter([], [], label="level 1")
h3 = ax.scatter([], [], s=0, label="Title 2") #example with size=0
h4 = ax.scatter([], [], label="level 2")

legend = ax.legend()

for vpack in legend._legend_handle_box.get_children():
    vpack.align = 'center'
    
    for hpack in vpack.get_children():
        draw_area, text_area = hpack.get_children()
        for collection in draw_area.get_children():
            alpha = collection.get_alpha()
            sizes = collection.get_sizes()
            if alpha == 0 or all(sizes == 0):
                draw_area.set_visible(False)

plt.show()

centered_legend

@mwaskom
Owner Author

mwaskom commented Jun 13, 2018

That's a nice idea but unfortunately it won't work because we can't touch the private legend._legend_handle_box attribute. Is there a way to get at those boxes through the public API?

@mwaskom
Owner Author

mwaskom commented Jun 13, 2018

It's occurred to me that any tricks we do with the legend drawn by the axes-level plots won't propagate to the legends drawn by FacetGrid/PairGrid, so I'm starting to come to terms with adding labels for the semantics but just leaving them left-justified in the right column.

Perhaps there could be a special glyph to use that will identify what is a section title rather than an entry? Not sure what, though.

@CRiddler
Contributor

CRiddler commented Jun 14, 2018

It's occurred to me that any tricks we do with the legend drawn by the axes-level plots won't propagate to the legends drawn by FacetGrid/PairGrid

Agreed. However, since you end up building the legend by hand and adding it for the user in both your axes-level plots and your FacetGrids, could we supply a post-hoc way of centering labels?

import matplotlib.offsetbox
import matplotlib.pyplot as plt

# Ex1: post-hoc centering of legend labels whose handles have alpha=0
def center_subtitles(legend):
    """Centers legend labels with alpha=0
    """
    vpackers = legend.findobj(matplotlib.offsetbox.VPacker)
    for vpack in vpackers[:-1]: #Last vpack will be the title box
        vpack.align = 'center'
        for hpack in vpack.get_children():
            draw_area, text_area = hpack.get_children()
            for collection in draw_area.get_children():
                alpha = collection.get_alpha()
                sizes = collection.get_sizes()
                if alpha == 0 or all(sizes == 0):
                    draw_area.set_visible(False)
    return legend


def ex1_mpl_posthoc_centering():
    """Explicit call to center the legend subtitles
    """
    f, ax = plt.subplots()
    
    h1 = ax.scatter([], [], alpha=0, label="Title 1")
    h2 = ax.scatter([], [], label="level 1")
    h3 = ax.scatter([], [], label="level 2")
    h4 = ax.scatter([], [], alpha=0, label="Title 2")
    h5 = ax.scatter([], [], label="level 1")
    
    ax.set_title("Ex1. outside func to center legend\nafter it has been created")
    ax_legend = ax.legend(loc="upper left")
    fig_legend = f.legend()
    
    #Center legend labels w/ alpha==0
    center_subtitles(ax_legend)
    center_subtitles(fig_legend)
    
    plt.show()

ex1_mpl_posthoc_centering()

figure_1

You could build the legend regularly, adding in labels with alpha=0 and then call it for

@mwaskom
Owner Author

mwaskom commented Jun 14, 2018

That looks great.

I'm a little worried about unintended consequences of simply grabbing all the vpackers, but it seems robust to trying a few things.

One thing that doesn't work, unfortunately, is that it's not robust to multiple calls to legend(). That is going to make it difficult to revise the legend after it's been drawn, e.g. to move it to a new place. The easiest way to do that in other places in seaborn is just to call ax.legend again.

As for interfacing with FacetGrid, the basic approach there has been to make that as general as possible and to not make any assumptions about the functions that are being passed to it to draw. But there will be a higher-level function to handle important bookkeeping when faceting lineplot and scatterplot, similar to the relationship between factorplot and the categorical plotters. This logic could go in there. And perhaps it could be a public-facing utility function, although that feels like a broken abstraction.

Ideally the best solution would be official support for "sectioned" legends in matplotlib but seaborn wouldn't be able to take advantage of that at this point anyway.

@CRiddler
Contributor

I'm a little worried about unintended consequences of simply grabbing all the vpackers, but it seems robust to trying a few things.

Yeah I was worried about this as well. But, as long as the user hasn't meddled with the legend packing (I'm not sure if anyone does, because it would mean drawing up their own ax.legend() function essentially) this should work.

unfortunately, is that it's not robust to multiple calls to legend()

Yep. Maybe we could expose the centering function to users and they could use it if needed? Definitely not an ideal way of setting it up though. I did come up with a more robust solution, but it involves monkey patching the legend handlers, which I'm not too keen on. I'll post it here in case you want to take a look at it.

import matplotlib.pyplot as plt
from matplotlib.legend_handler import HandlerPathCollection
from matplotlib.legend import Legend
import functools

def subtitle_decorator(handler):
    @functools.wraps(handler)
    def wrapper(legend, orig_handle, fontsize, handlebox):
        handle_marker = handler(legend, orig_handle, fontsize, handlebox)
        if handle_marker.get_alpha() == 0:
            handlebox.set_visible(False)
        # Return the artist so the legend can still track its handle
        return handle_marker
    return wrapper

def mpl_patch_default_handler_map():
    
    #Adds our decorator to all legend handler functions
    for handler in Legend.get_default_handler_map().values():
        handler.legend_artist = subtitle_decorator(handler.legend_artist)
    
    f, ax = plt.subplots()
    
    h1 = ax.scatter([], [], alpha=0, label="Title 1")
    h2 = ax.scatter([], [], label="level 1")
    h3 = ax.scatter([], [], label="level 2")
    h4 = ax.scatter([], [], alpha=0, label="Title 2")
    h5 = ax.scatter([], [], label="level 1")
    
    ax.set_title("Ex2. set width of draw area to 0\nleft adjusts the text")
    ax_legend = ax.legend(loc="upper left")
    fig_legend = f.legend()
    plt.show()

mpl_patch_default_handler_map()

figure_1

But as you'll notice, we're gaining robustness at the cost of mucking around with matplotlib's internals which will cause the unintended side effect of users wondering why all of their markers with alpha=0 are left justifying their associated text in the legend.

Ideally the best solution would be official support for "sectioned" legends in matplotlib but seaborn wouldn't be able to take advantage of that at this point anyway

Yeah, it'd be a lot of code refactoring :(

@mwaskom
Owner Author

mwaskom commented Jun 14, 2018

That is clever, but illegal :)

However, is there no way to get something similar to work with the handler_map that you can pass to ax.legend?

@CRiddler
Contributor

I did have a working function where you would pass it the handles you wanted to be subtitles and it would return a handler_map you can pass to ax.legend. But I figured we would hit the same issue as earlier: if a user repeated the call to ax.legend, it would lose the handler_map data unless the user had access to it and knew to pass it into ax.legend().

However, thinking about it, I may have another idea. I'll do some testing and post back.

@CRiddler
Contributor

Instead of changing the defaults to check for alpha=0, we can create our own handler_map with a class factory. Essentially, we iterate through the handles we want to be "subtitles", look up the appropriate Handler class for each, subclass it, and override the legend_artist method to hide the handle box. Then we return a handler_map that maps each subtitle handle instance to its subclassed Handler. This way we don't step on any of matplotlib's defaults.

The only downside is that the user will need to supply the handler_map to calls of ax.legend().

import matplotlib
import matplotlib.pyplot as plt
from matplotlib.legend import Legend
from matplotlib.legend_handler import HandlerBase, update_from_first_child

def subtitle_handler_factory(inherit_from):
    """Class factory to subclass Handlers and add our custom functionality
    """
    class SubtitleHandler(inherit_from):
        def legend_artist(self, legend, orig_handle, fontsize, handlebox):
            handlebox.set_visible(False)
            return inherit_from.legend_artist(self, legend,
                                              orig_handle, fontsize,
                                              handlebox)
    
    #HandlerPatch class needs a special update_func
    if inherit_from is matplotlib.legend_handler.HandlerPatch:
        return SubtitleHandler(update_func=update_from_first_child)
    return SubtitleHandler()

def subtitle_handler_map(subtitles):
    defaults_handler_map = Legend.get_default_handler_map()
    handler_map = {}
    
    for orig_handle in subtitles:
        handler = Legend.get_legend_handler(defaults_handler_map, orig_handle)
        
        #Subclass the Handler
        new_handler = subtitle_handler_factory(type(handler))
        handler_map[orig_handle] = new_handler
    return handler_map

def mpl_new_default_handler_map():
    f, ax = plt.subplots()

    h1 = ax.scatter([],[], label="Title 1")
    h2 = ax.scatter([],[], label="level 1")
    h3 = ax.scatter([],[], label="level 2")
    h4 = ax.scatter([],[], label="Title 2")
    h5 = ax.scatter([],[], label="level 1")

    h6 = ax.bar(0,0, label="Title 3")
    h7 = ax.bar(0,0, label="level 1")
    h8 = ax.bar(0,0, label="level 2")
    h9 = ax.bar(0,0, label="Title 4")
    h10 = ax.bar(0,0, label="level 1")

    subtitles = [h1, h4, h6, h9]
    handler_map = subtitle_handler_map(subtitles)
    
    ax.set_title("Ex2. Class factory to pass subtitles to")
    ax_legend = ax.legend(loc="upper left", handler_map=handler_map)
    fig_legend = f.legend(handler_map=handler_map)
    
    plt.show()

mpl_new_default_handler_map()

figure_1

My only other idea, which I'm not even 100% sure how to implement, is to dynamically subclass the Handles themselves to be a Subtitle.[OriginalHandle] (in addition to subclassing their Handler counterparts), then insert the new {Subtitle.[OriginalHandle]: subclassed-Handler} pair into the default Handler dictionary. Seaborn would still need to keep track of which handles it wants to become subtitles, but it would circumvent the need to pass a handler_map dictionary to ax.legend() all the time.
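For illustration, here is a rough sketch of that last idea, simplified to a single dedicated handle class rather than one subclass per original handle type. SubtitleHandle and SubtitleHandler are hypothetical names, not part of matplotlib or seaborn; the sketch assumes that registering them through Legend.update_default_handler_map is acceptable, even though it changes process-wide state.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.legend import Legend
from matplotlib.legend_handler import HandlerPatch
from matplotlib.offsetbox import DrawingArea
from matplotlib.patches import Patch


class SubtitleHandle(Patch):
    """Hypothetical marker class: a patch that exists only to carry a label."""


class SubtitleHandler(HandlerPatch):
    """Hypothetical handler that hides the handle box of a SubtitleHandle."""

    def legend_artist(self, legend, orig_handle, fontsize, handlebox):
        # A hidden box is skipped when the row is laid out, so the label
        # shifts left into the space the marker would have used.
        handlebox.set_visible(False)
        return super().legend_artist(legend, orig_handle, fontsize, handlebox)


# Register once; this is global state, but it only affects the new class.
Legend.update_default_handler_map({SubtitleHandle: SubtitleHandler()})

fig, ax = plt.subplots()
level_1 = ax.scatter([], [], label="level 1")
level_2 = ax.scatter([], [], label="level 2")
handles = [SubtitleHandle(label="Title 1"), level_1, level_2]

# No handler_map argument is needed, even when the legend is rebuilt.
legend = ax.legend(handles=handles, loc="upper left")
legend = ax.legend(handles=handles, loc="upper right")

hidden_boxes = [box for box in legend.findobj(DrawingArea) if not box.get_visible()]
legend_labels = [text.get_text() for text in legend.get_texts()]
```

The caller still has to pass the subtitle handles explicitly, so this only removes the handler_map bookkeeping, not the need to know which entries are subtitles.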

@mwaskom
Owner Author

mwaskom commented Jun 15, 2018

This last approach is definitely the most sophisticated, but I think if both options require user involvement, "call this function and it will left-align labels for non-visible legend entries" is a bit easier to understand than "pass this handler map when you call ax.legend and magic will happen".

@CRiddler
Contributor

I’m in agreement here. I like the nonmagicalness of the center_legend function. However I’m not 100% a fan of how it relies on detecting invisible legend entries to determine if a label becomes a subtitle. Seems like a roundabout way and not as explicit as I would like.

Just bouncing ideas here, but what if the center_legend also had an argument where the user could pass a dictionary to restructure the legend ordering. Something like

subtitle_legend(ax, legend_order)

Where legend_order would be a user-generated dictionary of the subtitles they wish to add and the labels that would appear under each subtitle, like so:

{'subtitle1': ['group1', 'group2'],
 'subtitle2': ['level1', 'level2']}

Then we can go through the legend handles/labels, pull out the handles that match the inputted labels and add a subtitle to them. Then submit that ordering to a call of ax.legend(handles, labels). I’ll write up a function to better demonstrate what I’m talking about when I get some time today. The only down side I see here is that a label under one subtitle can’t be the same as a label under another subtitle.

@mwaskom
Owner Author

mwaskom commented Jun 15, 2018

I think rather than checking for 0 alpha/size, we can use legend artist visibility as a proxy for whether the label should be left-aligned (which I actually think looks a bit better than centered).

@CRiddler
Contributor

Ah, I just saw your comment as I was coming to post this. I think checking the legend artist visibility wouldn't be too hard, and it would be a more appropriate proxy than alpha because we can set the visibility of the handle afterwards. What I was just thinking is that we could avoid adding the subtitles to the Axes at all. Right now we're calling ax.scatter([], [], label="title 1") solely for the label; we don't really want anything to do with the scatter. What if we had the user supply the ordering in a wrapper around ax.legend()? Sorry for throwing a billion ideas at you!

import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt

def subtitle_legend(ax, legend_format):
    new_handles = []
    
    handles, labels = ax.get_legend_handles_labels()
    label_dict = dict(zip(labels, handles))
    
    #Means 2 labels were the same
    if len(label_dict) != len(labels):
        raise ValueError("Can not have repeated levels in labels!")
    
    for subtitle, level_order in legend_format.items():
        #Roll a blank handle to add in the subtitle
        blank_handle = matplotlib.patches.Patch(visible=False, label=subtitle)
        new_handles.append(blank_handle)
        
        for level in level_order:
            handle = label_dict[level]
            new_handles.append(handle)

    #Labels are populated from handle.get_label() when we only supply handles as an arg
    legend = ax.legend(handles=new_handles)

    #Turn off DrawingArea visibility to left justify the text if it contains a subtitle
    for draw_area in legend.findobj(matplotlib.offsetbox.DrawingArea):
        for handle in draw_area.get_children():
            if handle.get_label() in legend_format:
                draw_area.set_visible(False)

    return legend


def seaborn_scatter():
    sns.set()
    
    tips = sns.load_dataset('tips')
    ax = sns.scatterplot(x="total_bill", y="tip",
                         hue="sex", style="smoker", size="size",
                         data=tips)

    #nice and explicit call to reorder the legend
    legend_format = {'Sex (hue)': ['Male', 'Female'],
                     'Group Size (size)': ['0', '2', '4', '6'],
                     'Smoker (shape)': ['Yes', 'No']
                    }

    # Use this instead of `ax.legend()`
    #   this returns the legend as well for further tweaking
    #   can also set it up to take kwargs to pass onto `ax.legend` call
    subtitle_legend(ax, legend_format=legend_format)

    plt.show()

seaborn_scatter()

figure_1

@mwaskom
Owner Author

mwaskom commented Jun 22, 2018

So to summarize, there are two good options:

  1. A function that can be called with no arguments and relies on the assumption that invisible handles in an existing legend are meant to be treated as section titles. All it does is adjust their alignment.
  2. A function that can flexibly restructure a legend by adding invisible artists to the axes and using their labels as section titles.

I think both functions are useful and are complementary, since they do more or less different things. I would lean towards (1) as being most useful for a user who needs to quickly regenerate a legend, since they won't necessarily have access to the ordering that seaborn originally used. But (2) is a very nice utility function itself and I think would be nice to have ... it also is probably the most robust option to use inside of seaborn itself.

If you agree, can you please open a PR adding these functions to seaborn.utils?

@CRiddler
Contributor

I agree that these options are good. I think they can be wrapped into a single function that changes how it works depending whether or not a legend_format argument is supplied.

My only qualm with this approach now is that we're planning for scatterplot and lineplot to insert these subtitles automatically, which means that if a user does this:

ax = sns.scatterplot(…, hue="hello") #make a plot with hue
ax.legend()

The legend will have an odd right-adjusted subtitle of "hello", which won't be what the user expects. We could use an extra function to hide the subtitle legend entries from ax.legend() (by prefixing the label with "_") and then use the seaborn legend functions to "unhide" them when they draw the legend.
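A minimal sketch of that hide/unhide idea, relying on the fact that matplotlib's automatic legend skips labels that start with an underscore. The "_subtitle:" prefix and the legend_with_subtitles helper are hypothetical, and the helper only inspects ax.collections (scatter-style artists) to keep the example short.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

SUBTITLE_PREFIX = "_subtitle:"  # hypothetical naming convention

fig, ax = plt.subplots()
ax.scatter([], [], visible=False, label=SUBTITLE_PREFIX + "hello")
ax.scatter([], [], label="level 1")
ax.scatter([], [], label="level 2")

# A plain ax.legend() ignores the underscore-prefixed subtitle entry.
plain_labels = [text.get_text() for text in ax.legend().get_texts()]


def legend_with_subtitles(ax, **kwargs):
    """Hypothetical helper: rebuild the legend with hidden subtitles restored."""
    handles, labels = [], []
    for artist in ax.collections:
        label = artist.get_label()
        if label.startswith(SUBTITLE_PREFIX):
            # Strip the prefix so the subtitle text shows up in the legend.
            handles.append(artist)
            labels.append(label[len(SUBTITLE_PREFIX):])
        elif label and not label.startswith("_"):
            handles.append(artist)
            labels.append(label)
    return ax.legend(handles, labels, **kwargs)


restored_labels = [text.get_text() for text in legend_with_subtitles(ax).get_texts()]
```

Here plain_labels holds only the two level labels, while restored_labels also starts with "hello".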

@mwaskom
Owner Author

mwaskom commented Jun 27, 2018

To be clear, scatterplot will draw its own legend and use this function internally to make the subtitles look nice.

A user might want to call ax.legend again because sometimes that's the easiest way to adjust the legend (I don't think the matplotlib legend object has an obvious public API for moving it, for example). And that's why we are making this function public, so that they can call it to fix the new legend. It's basically a hack, but I don't see any other way to get subtitles into the legend. (And subtitles in the right column convey the same information; they just don't look as nice.)
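To make the "call it again on the new legend" workflow concrete, here is a small sketch. hide_invisible_handle_boxes is a hypothetical stand-in for the public function being discussed; it assumes that subtitle handles are marked with visible=False.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.offsetbox import DrawingArea


def hide_invisible_handle_boxes(legend):
    """Hypothetical stand-in for the alignment helper discussed in this thread."""
    for box in legend.findobj(DrawingArea):
        artists = box.get_children()
        if artists and not any(artist.get_visible() for artist in artists):
            box.set_visible(False)
    return legend


def count_hidden_boxes(legend):
    return sum(not box.get_visible() for box in legend.findobj(DrawingArea))


fig, ax = plt.subplots()
ax.scatter([], [], visible=False, label="Title 1")
ax.scatter([], [], label="level 1")

first = hide_invisible_handle_boxes(ax.legend(loc="upper left"))
hidden_in_first = count_hidden_boxes(first)

# Moving the legend by calling ax.legend again builds a brand-new Legend ...
moved = ax.legend(loc="lower right")
hidden_before_fix = count_hidden_boxes(moved)

# ... so the helper has to be applied once more to the new object.
hidden_after_fix = count_hidden_boxes(hide_invisible_handle_boxes(moved))
```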

@CRiddler
Contributor

I see, so scatterplot will be changed to add in the subtitles as it adds in the points? Then these subtitle labels will show up in the legend no matter what (except for maybe the case where the user only passes 1 argument of hue or size or shape)?

What if by default the subtitles were hidden from the legend? So we had something that works like this:

  • User calls ax.legend(), legend is formed- all subtitles are invisible
  • User/seaborn calls sns.subtitle_legend(ax, **kwargs), legend is formed, and subtitles that were tucked away are now present.
  • User/seaborn calls sns.subtitle_legend(ax, legend_format, **kwargs) (where legend_format is a dictionary representing what subtitles map to what labels, as seen earlier in this thread). Legend is formed, and subtitles that were tucked away are ignored. New subtitles are added depending on the keys in the legend_format dictionary

subtitle_legend(fig_or_ax, legend_format=None, **kwargs), where fig_or_ax is a Figure or Axes, legend_format is an optional dictionary mapping {"subtitle1": ["label1", "label2"]…}, and kwargs are passed to a call of fig_or_ax.legend

This flexibility lets the user choose whether they want subtitles in the first place, redraw the legend to update some argument in **kwargs, or entirely change the ordering of the subtitles/labels via the legend_format option. I prefer this approach because then the user isn't stuck in a box of "If you don't want the subtitles, then you must delete the handle from inside the Axes."

figure_1

fig, ax = plt.subplots()

h1 = ax.scatter([], [], label="Title 1")
h2 = ax.scatter([], [], label="level 1")
h3 = ax.scatter([], [], label="level 2")
h4 = ax.scatter([], [], label="Title 2")
h5 = ax.scatter([], [], label="group A")
h6 = ax.scatter([], [], label="group B")
register_subtitles(h1, h4)

ax.set_title("Flexible Legend handling")
legend_format = {
    "custom Title 1": ["level 1", "level 2"],
    "custom Title 2": ["group A", "group B"],
}

### Add our Axes level legends
# Regular calls to construct legend, subtitles remain hidden
original_legend = ax.legend(loc="upper left", title="ax.legend()")
ax.add_artist(original_legend)

# Seaborn call to construct legend, subtitles are visible and centered
new_legend_noargs = subtitle_legend(ax, loc="upper center", title="subtitle_legend(ax)")
ax.add_artist(new_legend_noargs)

# Seaborn call to construct legend, subtitles follow the format provided
subtitle_legend(
    ax,
    legend_format=legend_format,
    loc="upper right",
    title="subtitle_legend(\n    ax, legend_format\n)",
)


### Do the same as above, but at the Figure level
fig.legend(loc="lower left", title="fig.legend()")

subtitle_legend(
    fig, loc="lower center", title="subtitle_legend(fig)"
)

subtitle_legend(
    fig,
    legend_format=legend_format,
    loc="lower right",
    title="subtitle_legend(\n    fig, legend_format\n)",
)

plt.show()

This is a little more complex to set up on the backend, but I think it makes the function magical when it needs to be ("subtitle_legend(ax) creates a legend with subtitles from seemingly nothing") but also transparent when it should be ("subtitle_legend(ax, legend_format) makes subtitles that I choose").

But if we want to keep it simple we can definitely implement a function that centers legend entries with alpha==0 and the other subtitled legend function that takes a dictionary.

@mwaskom
Owner Author

mwaskom commented Jun 27, 2018

What if by default the subtitles were hidden from the legend? So we had something that works like this.

seaborn's default behavior in most contexts is to add semantic labels where they exist, and would be here too.

@CRiddler
Contributor

Fair enough, I'll get started on a PR implementing the two aforementioned functions.

  • center_subtitles(legend)
    centers any labels that have a handle with alpha=0

  • subtitle_legend(ax_or_fig, legend_format)
    creates a legend that follows the legend_format provided and uses the handles found in the supplied Axes (or if a Figure is provided, handles from Figure.axes)

@mwaskom
Owner Author

mwaskom commented Jun 27, 2018

A few initial suggestions:

center_subtitles(legend)

This would be better as align_subtitles (or ideally align_legend_subtitles) with a position parameter that defaults to "left" but accepts other alignments.

centers any labels that have a handle with alpha=0

I think we agreed that visible=False is a stronger signal.
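As a sketch of how those two suggestions might combine, assuming the legend keeps matplotlib's usual packing of one (handle box, label) row per entry inside column packers. The function below is illustrative only, not seaborn's implementation; it uses visible=False as the subtitle signal and maps position onto the column packer's align attribute.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.offsetbox import DrawingArea, HPacker, VPacker


def align_legend_subtitles(legend, position="left"):
    """Sketch: treat entries whose handle has visible=False as subtitles."""
    if position not in ("left", "center", "right"):
        raise ValueError(f"Unsupported position: {position!r}")
    for column in legend.findobj(VPacker):
        found_subtitle = False
        for row in column.get_children():
            parts = row.get_children() if isinstance(row, HPacker) else []
            if len(parts) != 2 or not isinstance(parts[0], DrawingArea):
                continue  # not a (handle, label) row, e.g. the legend title
            artists = parts[0].get_children()
            if artists and not any(artist.get_visible() for artist in artists):
                parts[0].set_visible(False)  # drop the empty handle box
                found_subtitle = True
        if found_subtitle and position != "left":
            # Applies to every row in the column, not only the subtitles.
            column.align = position
    return legend


fig, ax = plt.subplots()
ax.scatter([], [], visible=False, label="Title 1")
ax.scatter([], [], label="level 1")
ax.scatter([], [], visible=False, label="Title 2")
ax.scatter([], [], label="level 2")

legend = align_legend_subtitles(ax.legend(), position="center")
hidden = [box for box in legend.findobj(DrawingArea) if not box.get_visible()]
```

With position="left" the function only hides the empty handle boxes, which is what pulls the subtitle text to the left edge of the column.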

@mwaskom
Owner Author

mwaskom commented Jun 27, 2018

subtitle_legend could have an optional remove boolean parameter that does the work of getting rid of the subtitles. Not an obvious API but would save some effort for those who know about it.

@mwaskom
Owner Author

mwaskom commented Jun 27, 2018

Also not something that needs to be in the first pass but it would be nice if subtitle_legend were eventually enhanced to let each subtitled group take up a column in the legend. I think this will often make for a more efficient use of space in a wide-aspect plot.

So to the extent that there are design decisions with tradeoffs, try to avoid making this impossible to add in the future (unclear if this will be a concern).
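One possible way to get a column per group, sketched under the assumption that matplotlib fills legend columns in order with an even split of the entries: pad every group to the same length with blank invisible entries and set ncol to the number of groups. columnar_subtitle_legend is a hypothetical name, and the legend_format dictionary follows the shape used earlier in this thread.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.offsetbox import HPacker, TextArea, VPacker
from matplotlib.patches import Patch


def columnar_subtitle_legend(ax, legend_format, **kwargs):
    """Hypothetical helper: lay out one legend column per subtitle group."""
    found_handles, found_labels = ax.get_legend_handles_labels()
    by_label = dict(zip(found_labels, found_handles))
    rows = 1 + max(len(levels) for levels in legend_format.values())

    handles, labels = [], []
    for subtitle, levels in legend_format.items():
        group = [(Patch(visible=False), subtitle)]
        group += [(by_label[level], level) for level in levels]
        # Pad short groups with blank rows so every group has the same length.
        group += [(Patch(visible=False), "")] * (rows - len(group))
        for handle, label in group:
            handles.append(handle)
            labels.append(label)

    # Entries fill the legend column by column, so equal-length groups
    # each end up in a column of their own.
    return ax.legend(handles, labels, ncol=len(legend_format), **kwargs)


fig, ax = plt.subplots()
for name in ["Male", "Female", "Yes"]:
    ax.scatter([], [], label=name)

legend = columnar_subtitle_legend(ax, {"sex": ["Male", "Female"], "smoker": ["Yes"]})

# Read back the label text of each column to check the layout.
columns = []
for column in legend.findobj(VPacker):
    texts = [
        part.get_text()
        for row in column.get_children() if isinstance(row, HPacker)
        for part in row.get_children() if isinstance(part, TextArea)
    ]
    if texts:
        columns.append(texts)
```

The padding trick keeps the first-pass API unchanged (a plain dictionary), which is one way to avoid closing off the column layout later.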

@CRiddler
Contributor

visible=False

Yes, but we can't pass visible=False to ax.scatter or the like, whereas we can with alpha.

@mwaskom
Owner Author

mwaskom commented Jun 28, 2018

Why not?

f, ax = plt.subplots()
ax.scatter(1, 1, visible=False)

image
