When using log-scale on the x-axis catplot gives wrong results #2006

Closed
cbpygit opened this issue Mar 25, 2020 · 4 comments

cbpygit commented Mar 25, 2020

The x-axis appears broken when switching to log scale via g.set(xscale="log") after plotting with catplot. Here is a minimal working example:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

xx, yy = np.meshgrid(np.arange(1., 4.),
                     np.round(np.logspace(0, 2, 10), 2))

df = pd.DataFrame(np.stack([xx.ravel(), yy.ravel()]).T,
                  columns=['variable', 'x'])

df['value'] = np.log10(df['variable'] * df['x'])

g = sns.catplot(x='x', y='value', hue='variable',
                data=df, aspect=1.5, kind="point")

g.set(xscale="log")  # <-- comment/uncomment this line to see the issue
plt.grid(which='both')

With linear scale (g.set(xscale="log") commented out) this creates:
Screenshot from 2020-03-25 10-10-27

With log-scale it gives:
Screenshot from 2020-03-25 10-10-42

Issues observed with log-scale:

  • The x-axis should run from 10^0 to 10^2; the points are at the wrong x-positions
  • Lines extend in the negative x-direction beyond the data limits, as a straight line

Environment:

  • matplotlib: 3.0.2
  • seaborn: 0.9.0
  • matplotlib backend: 'module://ipykernel.pylab.backend_inline'
Owner

mwaskom commented Mar 25, 2020

Setting a log scale on the x axis doesn't make sense. catplot is a categorical plot. From the docs:

This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.

Because you're setting the attributes directly at the matplotlib layer, there's nothing that can be done to prevent you from trying. But there should be no expectation that it will work.
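To make the point about ordinal positions concrete, here is a minimal sketch in plain Python (no seaborn involved). It assumes the ten x values from the example above and compares the logarithm of the ordinal positions, which is what a log-scaled categorical axis actually transforms, with the logarithm of the data values. The helper name log10_or_none is made up for this illustration.

```python
import math

# x values approximating the example above: 10 log-spaced points from 1 to 100
x_values = [round(10 ** (2 * i / 9), 2) for i in range(10)]

# A categorical axis ignores the numeric values and uses ordinal positions
positions = list(range(len(x_values)))


def log10_or_none(v):
    """Return log10(v), or None where the logarithm is undefined (v <= 0)."""
    return math.log10(v) if v > 0 else None


# Columns: ordinal position, data value, log10(position), log10(value)
for pos, x in zip(positions, x_values):
    print(pos, x, log10_or_none(pos), round(math.log10(x), 3))
```

The first category sits at position 0, where the logarithm is undefined, and the remaining positions 1 to 9 bear no relation to the data values 1.67 to 100. That is presumably why the points end up at the wrong x-positions; how matplotlib treats the non-positive position depends on its log-scale settings.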

mwaskom closed this as completed Mar 25, 2020

Author

cbpygit commented Mar 25, 2020

Thanks for the response @mwaskom!

I am sorry to say it, but I find it hard to agree here. First, data visualization of this kind should be agnostic to what the user is trying to visualize. Why is it not valid to investigate a logarithmic functional dependency on a categorical variable? Especially since the function has a kind="point" option that draws lines between the points! That is surely only valid if we assume continuity of y(x), isn't it? Otherwise it would be very unclear what the lines are meant to indicate.

Apart from that, the note in the docs and the presumed fact that "log-scale does not make sense if x is categorical" will most probably not be apparent to the majority of seaborn users. Likewise, g.set(xscale="log") does not feel like "leaving the seaborn domain" or brute-forcing a matplotlib solution on top.

Hence, I see quite some risk that this can be a pitfall for many others. And without knowing how complex it would be to fix it, I would love to simply see it working as (wrongly) expected :)

Author

cbpygit commented Mar 25, 2020

Anyway, just in case others stumble upon this issue:

You can simply use the FacetGrid directly and resolve the issue. Here is a fixed version of the code above which plots properly:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

xx, yy = np.meshgrid(np.arange(1., 4.),
                     np.round(np.logspace(0, 2, 10), 2))

df = pd.DataFrame(np.stack([xx.ravel(), yy.ravel()]).T,
                  columns=['variable', 'x'])

df['value'] = np.log10(df['variable'] * df['x'])


def logplot(x, y, **kwargs):
    ax = plt.gca()
    data = kwargs.pop("data")
    data.plot(x, y, ax=ax, style='.-', ms=15, lw=2, grid=True, **kwargs)
    ax.set_xscale("log")

    
g = sns.FacetGrid(df, hue='variable', aspect=1.5, height=5.)
g = g.map_dataframe(logplot, "x", "value")
g.add_legend()

Result:

Screenshot from 2020-03-25 19-46-47

Author

cbpygit commented Mar 26, 2020

Just another proposal @mwaskom: to be on the safe side it would be a good measure to protect the user from making this mistake. Since, for example, g.set(yscale="log") would be perfectly valid, while g.set(xscale="log") is not, we could add a check for it as indicated here:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

xx, yy = np.meshgrid(np.arange(1., 4.),
                     np.round(np.logspace(0, 2, 10), 2))

df = pd.DataFrame(np.stack([xx.ravel(), yy.ravel()]).T,
                  columns=['variable', 'x'])

df['value'] = np.log10(df['variable'] * df['x'])

g = sns.catplot(x='x', y='value', hue='variable',
                data=df, aspect=1.5, kind="point")


# Example of how to guard the `set` method
def set_safe(self, **kwargs):
    # Reject a log scale on the categorical x-axis before delegating to `set`
    if kwargs.get('xscale') == 'log':
        raise ValueError('Setting log scale on a categorical axis is '
                         'not valid.')
    return self.set(**kwargs)

set_safe(g, xscale="log")  # <-- raises ValueError instead of drawing a broken plot
plt.grid(which='both')

Of course this should happen as a monkey-patch on the FacetGrid returned by catplot. Another way would be to support certain axis protections in FacetGrid, which are then activated on initialization in catplot.

Note: a cleaner way would be to use a subclass of matplotlib.axis.Axis for the x-axis in catplot that prohibits certain transformations, because this would be the only way to prevent this mistake in all cases.
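As a rough sketch of the "axis protection" idea, the check could live in a thin wrapper instead of a patched function. GuardedGrid and its categorical_axes parameter are hypothetical names invented for this illustration and are not part of seaborn. The wrapper only assumes the wrapped object has a set(**kwargs) method, as FacetGrid does.

```python
class GuardedGrid:
    """Hypothetical wrapper that refuses non-linear scales on categorical axes."""

    def __init__(self, grid, categorical_axes=("x",)):
        self._grid = grid
        self._categorical_axes = tuple(categorical_axes)

    def set(self, **kwargs):
        # Check e.g. "xscale" for every axis that is marked as categorical
        for axis in self._categorical_axes:
            key = f"{axis}scale"
            if kwargs.get(key, "linear") != "linear":
                raise ValueError(
                    f"Setting {key}={kwargs[key]!r} on a categorical axis "
                    "is not valid."
                )
        return self._grid.set(**kwargs)

    def __getattr__(self, name):
        # Delegate everything else to the wrapped grid
        return getattr(self._grid, name)


# Intended usage (assumption): g = GuardedGrid(sns.catplot(...))
# g.set(yscale="log") passes through, g.set(xscale="log") raises ValueError
```

As with the function above, this only guards calls made through the wrapper; setting the scale directly on the matplotlib axes would still go through.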
