Feature request: allow seaborn to annotate a subplot with 0 counts, when there are 0 counts for all bars in a subplot #3568

Closed
jonng1000 opened this issue Nov 22, 2023 · 9 comments

Comments

@jonng1000

Hi, first of all, thanks to the developer for seaborn! It has been a great tool for visualising my data =). Would it be possible to add the ability for seaborn to annotate a subplot with 0 counts when every bar in that subplot has a count of 0? A more detailed explanation is below.

I have code using seaborn catplot, to draw categorical plots onto a FacetGrid. I am using a countplot in the catplot function, hence am using kind='count'. The col argument in the catplot is set to the col_cat variable, which in this context is defined as age_category. age_category is a column in my df, which as its name suggests, represents age categories. This is an ordered pandas categorical dtype.

My df is as follows:

ipdb> df
                         spirometryResult_category     age_category habits-smoking
_id                                                                               
63bb97708e5f58ef85f6e4ea                    Normal  20-39 years old            Yes
63bd1b228e5f58ef85f73130                    Normal  20-39 years old            Yes
6423cb1c174e67af0aa0f0fc                    Normal  20-39 years old             No
6423d85e174e67af0aa10cda               Restrictive  20-39 years old             No
6423d8bb174e67af0aa10d98               Obstructive  20-39 years old             No
...                                            ...              ...            ...
6549a0df0941d048fdfd94c4               Obstructive  20-39 years old             No
6549d0ab0941d048fdfd960d                    Normal  40-59 years old             No
6549d0ee0941d048fdfd962b                    Normal  20-39 years old             No
654b17a20941d048fdfda256                    Normal  20-39 years old             No
654d81700941d048fdfdc27d                    Normal  40-59 years old             No

[106 rows x 3 columns]

The age_category column in df is as follows:

ipdb> df['age_category']
_id
63bb97708e5f58ef85f6e4ea    20-39 years old
63bd1b228e5f58ef85f73130    20-39 years old
6423cb1c174e67af0aa0f0fc    20-39 years old
6423d85e174e67af0aa10cda    20-39 years old
6423d8bb174e67af0aa10d98    20-39 years old
                                 ...       
6549a0df0941d048fdfd94c4    20-39 years old
6549d0ab0941d048fdfd960d    40-59 years old
6549d0ee0941d048fdfd962b    20-39 years old
654b17a20941d048fdfda256    20-39 years old
654d81700941d048fdfdc27d    40-59 years old
Name: age_category, Length: 106, dtype: category
Categories (4, object): ['20-39 years old' < '40-59 years old' < '60-79 years old' < '>= 80 years old']

The distribution of categories in the age_category column is as follows:

ipdb> df['age_category'].value_counts()
age_category
20-39 years old    89
40-59 years old    14
60-79 years old     3
>= 80 years old     0
Name: count, dtype: int64

The number of subjects in the age category of '>= 80 years old' is 0, which gives me problems in plotting its annotations for the bars.
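The behaviour described above, where a category with no rows still appears with a count of 0, can be sketched with pandas alone. This is a minimal illustration on made-up ages, not the original data; the bin edges and labels mirror the ones used later in this thread, and the variable names are only illustrative.

```python
import pandas as pd

age_labels = ["20-39 years old", "40-59 years old",
              "60-79 years old", ">= 80 years old"]

# Made-up ages for illustration; none of them is 80 or above
ages = pd.Series([25, 31, 38, 45, 52, 67])

# pd.cut returns an ordered categorical that keeps every label,
# including labels that no row falls into
age_category = pd.cut(ages, [20, 40, 60, 80, 101], right=False,
                      labels=age_labels)

# value_counts on a categorical reports unobserved categories with 0
counts = age_category.value_counts(sort=False)
print(counts)

# Dropping unused categories removes the empty one entirely
trimmed = age_category.cat.remove_unused_categories()
print(list(trimmed.cat.categories))
```

Because the empty category is still part of the dtype, a faceted plot gets a panel for it even though no rows belong there.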

In general, the code which is below works. My objective is to plot multiple subplots, one for each age category, showing the subject counts for each combination of spirometryResult_category and habits-smoking.

    # Getting colours as specified in the config, for each hue category
    # Need to remove this hardcoding when i improve script
    colour_map =  config['seaborn_colourmaps'][hue_cat]

    # Plotting graph
    # count refers to param_category counts
    plt.subplots(figsize=figsize)
    # Not sure why setting axes.labelsize here doesnt
    # work
    sns.set_context('paper', rc={'font.size':fontsize})
    # height=4, aspect=.6,
    g = sns.catplot(
        data=df, x=param_category, hue=hue_cat, col=col_cat,
        kind='count', palette=colour_map, col_wrap=wrap_num,
        saturation=1
    )

    for ax in g.axes: 
        ax.tick_params(left=False, labelbottom=True)
        ax.set_xticklabels(ax.get_xticklabels(), size=fontsize)
        # Replacing subplot title if needed
        if col_cat in config['seaborn_alt_names']:
            new_title = config['seaborn_alt_names'][col_cat]
            ax.set_title( ax.get_title().replace(col_cat, new_title), size=fontsize)
        # Auto-label bars
        for container in ax.containers:
            container.datavalues = np.nan_to_num(container.datavalues)
            ax.bar_label(container, fmt='%.0f', padding=2)

    # In contrast to prev plotting code, despine goes here, as facetgrid
    # requires it to be done this way
    g.despine(top=True, right=True, left=True)
    # Fine adjustment of aesthetics    
    g.set(yticklabels=[], ylabel=None, xlabel=None)
    g.tick_params('x', rotation=90)
    # Checking if legend title is needed
    legend = False
    if 'legend' in plot_info:
        legend = plot_info['legend']
    if not legend:
        g.legend.set_title(None)
    else:
        # If an alternative legend title is specified,
        # use that, if not, use the default one
        if hue_cat in config['seaborn_alt_names']:
            new_title = config['seaborn_alt_names'][hue_cat]
            g.legend.set_title(new_title)
    # Continuing adjustment of aesthetics
    plt.subplots_adjust(hspace=1, wspace=0.3)
    g.figure.savefig(filename, bbox_inches='tight')
    plt.close()

The output picture is shown here:
image

As you can see, the category of ">= 80 years old" has no subjects, so the text "0" is not plotted at all in its subplot. All other age categories have their bars and annotations created correctly. In this case, where ">= 80 years old" has no subjects, ax.containers is an empty list, so my loop (for container in ax.containers:) that annotates the bars never runs for that subplot.

How do I force seaborn to annotate subplots with 0 counts, in the correct location (decided automatically by seaborn so I don't have to hardcode anything), when the category has 0 subjects and ax.containers is an empty list? Seaborn doesn't seem to allow me to do that, so would it be possible to add this please?
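A reduced sketch of the situation described above, using made-up data rather than the original dataset. The column names `result` and `smoker` and the `Agg` backend are assumptions for illustration. The post reports that `ax.containers` is empty for the facet with no rows; the loop below only prints what each facet holds, so this can be checked against the installed seaborn version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import pandas as pd
import seaborn as sns

age_labels = ["20-39 years old", "40-59 years old",
              "60-79 years old", ">= 80 years old"]

# Made-up data for illustration; nobody is 80 or older
df = pd.DataFrame({
    "age": [25, 31, 38, 45, 52, 67],
    "result": ["Normal", "Normal", "Restrictive",
               "Obstructive", "Normal", "Normal"],
    "smoker": ["Yes", "No", "No", "No", "Yes", "No"],
})
df["age_category"] = pd.cut(df["age"], [20, 40, 60, 80, 101],
                            right=False, labels=age_labels)

# One facet per category of age_category, including the unobserved one
g = sns.catplot(data=df, x="result", hue="smoker",
                col="age_category", kind="count", col_wrap=2)

# Report how many bar containers each facet ended up with
for ax in g.axes.flat:
    print(ax.get_title(), "->", len(ax.containers), "bar container(s)")
```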

@thuiop
Contributor

thuiop commented Nov 22, 2023

First, it is better to provide a reproducible example, typically using the datasets built into seaborn. But at least your problem is somewhat clear.

Secondly, this is not entirely a seaborn issue. You are basically using a "hack" to get your desired result, as catplot does not provide a way to annotate bars; this part is pure matplotlib. You could achieve your desired result by manually adding the text at the correct positions with ax.text; you can get those positions by checking ax.texts on the axes where you do have the text.
If you really want to do this entirely with seaborn you can probably achieve it using the objects interface; I would have provided the necessary code but since I do not have your dataset I cannot ensure it gives the desired result.
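A rough matplotlib-only sketch of the manual approach suggested above: read the label positions from an axes that already has bar labels, then write "0" at the same x positions on an axes that has no bars. The two-panel figure, the bar heights and the names `ax_full` and `ax_empty` are made up for illustration; in a catplot the populated and empty facets would play these roles.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt

fig, (ax_full, ax_empty) = plt.subplots(1, 2, sharex=True, sharey=True)

# Populated panel: bars with automatic labels
bars = ax_full.bar([0, 1, 2], [5, 3, 1], width=0.4)
ax_full.bar_label(bars, fmt="%.0f", padding=2)

# Empty panel: no bars, so no containers to label
if not ax_empty.containers:
    # Each bar label is an Annotation whose xy is (bar centre, bar top)
    for label in ax_full.texts:
        x_center, _ = label.xy
        ax_empty.annotate("0", xy=(x_center, 0), xytext=(0, 2),
                          textcoords="offset points",
                          ha="center", va="bottom")

zero_labels = [t.get_text() for t in ax_empty.texts]
zero_positions = [t.xy[0] for t in ax_empty.texts]
```

This copies positions rather than hardcoding them, but it relies on the populated panel having a label for every bar position that the empty panel should show.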

@mwaskom
Owner

mwaskom commented Nov 22, 2023

Thanks @thuiop, that's one good suggestion. Alternatively, if you want to have "0 height bars" you can compute the counts yourself (including for the unobserved categories) and then use barplot.

I am going to close as this is a question about building on top of seaborn not a question about seaborn functionality itself. You may want to ask on StackOverflow (but you'll definitely want to reduce the example to something that others can work with).

@mwaskom mwaskom closed this as completed Nov 22, 2023
@jonng1000
Author

@thuiop thanks for your advice, will take it to heart for future posts =). Thanks for your advice on how to solve this, I am interested in seeing your solution and learning from you, so would it be possible for me to see your code please? Not sure if this is allowed since mwaskom has closed this thread, but if it is, I have provided a modified form of my dataset here
modified_spiro_data.csv

I also modified my code above to make it work with this attached dataset, not sure if you want it, but it is below

import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt

FILE = 'modified_spiro_data.csv'
OUTPUT = 'spiro_github.png'

df = pd.read_csv(FILE, sep=',') 

# Get age_category
categories = pd.cut(df['age'], [20, 40, 60, 80, 101], right=False, 
                    labels=['20-39 years old', '40-59 years old', 
                            '60-79 years old', '>= 80 years old'])
df['age_category'] = categories

# Getting colours as specified in the config, for each hue category
# Need to remove this hardcoding when i improve script
colour_map =  {'Yes': '#FF5E5E',
               'No': '#33CCCC'}
# Making a mock config dict, as this is a stand alone script, for testing
config = {'seaborn_alt_names': {'habits-smoking': 'Smoking'}}

# Plotting graph
# count refers to param_category counts
plt.subplots(figsize=(6,6))
# Not sure why setting axes.labelsize here doesnt
# work
sns.set_context('paper', rc={'font.size':11})
# height=4, aspect=.6,
# Defining arguments, as this is a stand alone script, for testing
param_category = 'spirometryResult'
hue_cat = 'habits-smoking'
col_cat = 'age_category'
wrap_num = 2
# Makes plot
g = sns.catplot(
    data=df, x=param_category, hue=hue_cat, col=col_cat,
    kind='count', palette=colour_map, col_wrap=wrap_num,
    saturation=1
)

# Customise annotation and axes labels and ticks
for ax in g.axes: 
    ax.tick_params(left=False, labelbottom=True)
    ax.set_xticklabels(ax.get_xticklabels(), size=11)
    # Replacing subplot title if needed
    if col_cat in config['seaborn_alt_names']:
        new_title = config['seaborn_alt_names'][col_cat]
        ax.set_title( ax.get_title().replace(col_cat, new_title), size=11)
    # Auto-label bars
    for container in ax.containers:
        container.datavalues = np.nan_to_num(container.datavalues)
        ax.bar_label(container, fmt='%.0f', padding=2)

# In contrast to prev plotting code, despine goes here, as facetgrid
# requires it to be done this way
# g.despine can be used here, as this is for the whole figure
g.despine(top=True, right=True, left=True)

# Fine adjustment of aesthetics    
g.set(yticklabels=[], ylabel=None, xlabel=None)
g.tick_params('x', rotation=90)
# If an alternative legend title is specified,
# use that, if not, use the default one
new_title = config['seaborn_alt_names'][hue_cat]
g.legend.set_title(new_title)
# Continuing adjustment of aesthetics
plt.subplots_adjust(hspace=1, wspace=0.3)
g.figure.savefig(OUTPUT, bbox_inches='tight')
plt.close()

Picture of the output is below
spiro_github

@jonng1000
Author

@mwaskom thanks very much for your reply too. Yes using a barplot instead could work, so I will keep it in mind =)

@mwaskom
Owner

mwaskom commented Nov 23, 2023

@jonng1000 my unsolicited advice is that you'd make it a lot easier for someone to help you if you reduce your example much further to eliminate any steps that aren't relevant to the specific question that you are asking. For example, all of the aesthetic tweaks you're doing (despining, modifying labels, setting non-default colors, etc.) are orthogonal to the specific question that you have, but they make it harder to digest your code and suggest where you want to make changes. Additionally, it is a good exercise for you: often the process of simplifying an example will help you understand the parts better and perhaps even give rise to a spontaneous insight about the solution.

This post may be helpful (it is written about "bug reports" but most of it applies here too): https://matthewrocklin.com/minimal-bug-reports

@thuiop
Contributor

thuiop commented Nov 23, 2023

@jonng1000 it was trickier than I thought actually, so I had to do the counting with pandas instead of doing everything in seaborn; this code should do the trick

import seaborn.objects as so

# Count every combination; observed=False keeps unobserved categories as zero rows
sizes = df.groupby([col_cat, hue_cat, param_category], observed=False).size()
grouped_df = pd.DataFrame({"size": sizes})

fig = plt.figure()
p = (
    so.Plot(data=grouped_df, x=param_category, y="size", color=hue_cat, text="size")
    .facet(col=col_cat, wrap=wrap_num)
    .add(so.Bar(), so.Dodge(empty="fill"))
    .add(so.Text(valign="bottom", color="black"), so.Dodge(empty="fill"))
    .scale(color=so.Nominal(['#FF5E5E', '#33CCCC'], order=["Yes", "No"]))
)
p.on(fig).plot()
# Rotate the x tick labels on every facet
for ax in fig.axes:
    ax.xaxis.set_tick_params(rotation=90)
plt.show()

I did not do all the aesthetic stuff; I leave that to you to tinker with (I agree with the above post though, a lot of it is a bit outside the scope of your real question).
@mwaskom I actually ran into a rough edge here: adding the text argument to Plot has a weird effect on the widths of the bars. I believe this is because the additional grouping introduces a lot of elements with a NaN width, and in the default mode for Dodge these are filled with zeros. Adding empty="fill" restores the original behaviour. I don't know if the default should be changed there? This does seem like a messy issue. Here is some code to reproduce if needed:

import seaborn as sns
import seaborn.objects as so

tips = sns.load_dataset("tips")
print(tips)
p = (
    so.Plot(tips, x="time", color="smoker",text="time") # Remove the text argument here
    .add(so.Bar(), so.Count(), so.Dodge())
)
p.show()

@mwaskom
Owner

mwaskom commented Nov 23, 2023

I think that might be the issue tracked here? #2981

@thuiop
Contributor

thuiop commented Nov 23, 2023

Oh, yes, seems like it.

@jonng1000
Author

@mwaskom and @thuiop ok thanks so much for your advice on how to improve future questions, will keep it in mind. thuiop, thanks so much for giving me your solution too =)
