Frequently Asked Questions (FAQs)
This is a collection of answers to questions that are commonly raised about seaborn. It may eventually be integrated into the docs. Suggestions for topics to add are very welcome!
It looks like you successfully installed seaborn by doing pip install seaborn, but it cannot be imported: you get ModuleNotFoundError: No module named 'seaborn' when you try.
This is probably not a seaborn problem, per se. If you have multiple Python environments on your computer, it is possible that you did pip install in one environment and tried to import the library in another. On a Unix system, you could check whether the terminal commands which pip, which python, and (if applicable) which jupyter point to the same bin/ directory. If not, you'll need to sort out the definition of your $PATH variable.
Two alternate patterns for installing with pip may also be more robust to this problem:
- Invoke pip on the command line with python -m pip install <package> rather than pip install <package>
- Use %pip install <package> in a Jupyter notebook to install it in the same place as the kernel
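The environment check can also be done from inside Python. The following is a minimal sketch that uses only the standard library; the helper name describe_environment is hypothetical and is not part of seaborn or pip.

```python
import importlib.util
import sys


def describe_environment(package="seaborn"):
    """Report which interpreter is running and where (if anywhere) it finds `package`."""
    spec = importlib.util.find_spec(package)
    return {
        "executable": sys.executable,  # the Python binary running this code
        "prefix": sys.prefix,          # root directory of the active environment
        # None means the package is not importable from this interpreter
        "location": spec.origin if spec is not None else None,
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

Running this both in the terminal's python and in the notebook kernel, then comparing the reported executable, should show whether the two are the same environment.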
You've definitely installed seaborn in the right place, but importing it produces a long traceback and a confusing error message, perhaps something like ImportError: DLL load failed: The specified module could not be found.
Such errors usually indicate a problem with the way Python libraries are using compiled resources. Because seaborn is pure Python, it won't directly encounter these problems, but its dependencies (numpy, scipy, matplotlib and pandas) might. To fix the problem, you'll first need to read through the traceback and figure out which dependency was being imported at the time of the error. Then consult the installation documentation for the relevant package, which might have advice for getting an installation working on your specific system.
The most common culprit of these issues is scipy, which has many compiled components. Starting in seaborn version 0.12, scipy will become an optional dependency, which should help to reduce the frequency of these issues.
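If the traceback is hard to read, one option is to import each dependency on its own and see which one fails. This is a rough sketch using only the standard library; the function name probe_imports and the package tuple are illustrative assumptions.

```python
import importlib

# The dependencies named above; adjust the tuple for your installation.
DEPENDENCIES = ("numpy", "scipy", "matplotlib", "pandas")


def probe_imports(names=DEPENDENCIES):
    """Try importing each package and record the error for any that fail."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as err:  # ImportError, or errors raised by compiled extensions
            failures[name] = f"{type(err).__name__}: {err}"
    return failures


failures = probe_imports()
for name, message in failures.items():
    print(f"{name} failed to import -> {message}")
```

Any package that shows up in the output is the one whose installation documentation is worth consulting first.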
You're calling seaborn functions (maybe in a terminal or IDE with an integrated IPython console) and they are printing statements like <AxesSubplot:> or <seaborn.axisgrid.FacetGrid at 0x7fe0a963ec40> but not showing any plots.
In matplotlib, there is a distinction between creating a figure and showing it, and in some cases it's necessary to explicitly call plt.show() at the point when you want to see the plot. Because that command blocks by default and is not always desired (for instance, you may be executing a script that saves files to disk), seaborn does not deviate from standard matplotlib practice here.
Yet most of the examples in the seaborn docs do not have this line, because there are multiple ways to avoid needing it. In a Jupyter notebook with the "inline" (default) or "widget" backends, plt.show is automatically called, so any figures will appear in a cell's outputs. You can also activate a more interactive experience by executing %matplotlib in any Jupyter or IPython interface or by calling plt.ion() anywhere in Python. Both methods will configure matplotlib to show or update the figure after every plotting command.
You're using seaborn in a Jupyter notebook and every cell prints something like <AxesSubplot:> or <seaborn.axisgrid.FacetGrid at 0x7f840e279c10> before showing the plot.
Jupyter notebooks will show the result of the final statement in the cell as part of its output, and each of seaborn's plotting functions returns a reference to the matplotlib or seaborn object that contains the plot. If this is bothersome, you can suppress this output in a few ways:
- Always assign the result of the final statement to a variable (e.g. ax = sns.histplot(...))
- Add a semicolon to the end of the final statement (e.g. sns.histplot(...);)
- End every cell with a function that has no return value (e.g. plt.show(), which isn't needed but also causes no problems)
- Add cell metadata tags, if you're converting the notebook to a different representation
The default "inline" backend (defined by IPython) uses an unusually low dpi ("dots per inch") for figure output. This is a space-saving measure: lower dpi figures take up less disk space. (Also, lower dpi inline graphics appear physically smaller because they are represented as PNGs, which do not exactly have a concept of resolution.) So one faces an economy/quality tradeoff.
You can increase the dpi for all plots by doing plt.rcParams.update({"figure.dpi": 100}) or sns.set_theme(..., rc={"figure.dpi": 100}). Or, if you have a high pixel-density monitor, you can make your plots sharper by setting %config InlineBackend.figure_format = "retina". This won't change the apparent size of your plots in a Jupyter interface, but they might appear very large in other contexts (e.g. on GitHub). And they will take up 4x the disk space.
Alternatively, you can set %config InlineBackend.figure_format = "svg" to output vector graphics with "infinite resolution". The downside is that file size will now scale with the number and complexity of the artists in your plot and in some cases (e.g., a large scatterplot matrix) the load will impact browser responsiveness.
You've encountered the term "figure-level" or "axes-level", maybe in the seaborn docs, a StackOverflow answer, or a GitHub thread, but you don't understand what it means.
In brief, all plotting functions in seaborn fall into one of two categories:
- "axes-level" functions, which plot onto a single subplot that may or may not exist at the time the function is called
- "figure-level" functions, which internally create a matplotlib figure, potentially including multiple subplots
This design is intended to satisfy two objectives:
- seaborn should offer functions that are "drop-in" replacements for matplotlib methods
- seaborn should be able to produce figures that show "facets" or marginal distributions on distinct subplots
The figure-level functions always combine one or more axes-level functions with a Grid object that manages the figure. So, for example, relplot is a figure-level function that combines either scatterplot or lineplot with a FacetGrid. jointplot is a figure-level function that can combine multiple different axes-level functions (scatterplot and histplot by default) with a JointGrid.
If all you're doing is creating a plot with a single seaborn function call, this is not something you need to worry too much about. But it becomes relevant when you want to customize at a level beyond what the API of each function offers. It is also the source of various other points of confusion, so it is an important distinction to understand (at least broadly) and keep in mind.
This is explained in more detail in the seaborn user guide and in this blog post.
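The difference is easiest to see in what each kind of function returns. Here is a small sketch with made-up data; the Agg backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit in an interactive session
import pandas as pd
import seaborn as sns

# Made-up data for illustration only
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 1, 4, 3, 6, 5],
    "group": ["a", "a", "a", "b", "b", "b"],
})

# Axes-level: draws onto a single matplotlib Axes and returns it
ax = sns.scatterplot(data=df, x="x", y="y")
print(type(ax).__name__)

# Figure-level: creates its own figure and returns a FacetGrid
g = sns.relplot(data=df, x="x", y="y", col="group")
print(type(g).__name__, g.axes.shape)
```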
Next to the figure-level/axes-level distinction, this concept is probably the second biggest source of confusing behavior.
Several seaborn functions are referred to as "categorical" because they are designed to support a use-case where either the x or y variable in a plot is categorical (that is, the variable takes a finite number of potentially non-numeric values).
At the time these functions were written, matplotlib did not have any direct support for non-numeric data types. So seaborn internally builds a mapping from unique values in the data to 0-based integer indexes, which is what it passes to matplotlib. If your data are strings, that's great, and it more-or-less matches how matplotlib now handles string-typed data.
But a major gotcha is that these functions always do this, even if both the x and y variables are numeric. This gives rise to a number of confusing behaviors, especially when mixing categorical and non-categorical plots (e.g., a combo bar-and-line plot).
A future release will add the option to treat numeric variables as numeric in this set of functions, but the current behavior will almost certainly remain the default, so this is an important API wrinkle to understand.
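The mapping can be pictured with a few lines of plain Python. This is a conceptual sketch for numeric values, not seaborn's actual implementation, and categorical_positions is a hypothetical helper name.

```python
def categorical_positions(values):
    """Map each value to the 0-based index of its level among the sorted unique values."""
    levels = sorted(set(values))
    index_of = {level: i for i, level in enumerate(levels)}
    return [index_of[v] for v in values], levels


positions, tick_labels = categorical_positions([20, 1, 94, 6, 1])
print(positions)    # where each observation is drawn on the categorical axis
print(tick_labels)  # what the tick labels display at positions 0, 1, 2, ...
```

The plot elements are drawn at the integer positions, while the original values survive only as tick labels.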
To get the most out of seaborn, your data should have a "long-form" or "tidy" representation. In a dataframe, this means that each variable has its own column, each observation has its own row, and each value has its own cell. With long-form data, you can succinctly and exactly specify a visualization by assigning variables in the dataset (columns) to roles in the plot.
Data organization is a common stumbling block for beginners, in part because data are often not collected or stored in a long-form representation. Therefore, it is often necessary to reshape the data using pandas before plotting, and data reshaping is a complex undertaking that requires both a solid grasp of dataframe structure and knowledge of the pandas API.
But while seaborn is most powerful when provided with long-form data, nearly every seaborn function will accept and plot "wide-form" data, which you can achieve by passing a dataset to seaborn's data= parameter without specifying any other plot variables (x, y, and so on). You'll be much more limited when using wide-form data: each function can make only one kind of wide-form plot. In most cases, seaborn tries to match what matplotlib or pandas would do with a dataset of the same structure. But while reshaping your data into long-form will give you substantially more flexibility, it can be helpful to take a quick look at your data very early in the process, and seaborn tries to make this possible.
Understanding how your data should be represented (and how to get it that way if it starts out messy) is very important for making efficient and complete use of seaborn, and it is elaborated on at length in the user guide.
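As one example of such reshaping, pandas' melt method converts a wide table into long form. The column names and numbers below are made up for illustration.

```python
import pandas as pd

# Made-up wide-form data: one column per region
wide = pd.DataFrame({
    "year": [2020, 2021, 2022],
    "north": [10, 12, 15],
    "south": [7, 9, 8],
})

# Long-form: one row per observation, one column per variable
long = wide.melt(id_vars="year", var_name="region", value_name="sales")
print(long)
```

The long-form result can then be plotted by assigning columns to roles, for instance sns.lineplot(data=long, x="year", y="sales", hue="region").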
Generally speaking, no: seaborn is quite flexible about how your dataset needs to be represented.
In most cases, long-form data represented by multiple vector-like types can be passed directly to x, y, or other plotting parameters. Or you can pass a dictionary of vector types to data rather than a DataFrame. And when plotting with wide-form data, you can use a 2D numpy array or even nested lists to plot in wide-form mode.
There are a couple of older functions (namely, catplot and lmplot) that do require you to pass a pandas DataFrame. But at this point, they are the exception, and they will gain more flexibility over the next few release cycles.
This is going to be more complicated than you might hope, in part because there are multiple ways to change the figure size in matplotlib, and in part because of the figure-level/axes-level distinction in seaborn.
In matplotlib, you can usually set the default size for all figures with the figure.figsize rcParam, and you can set the size of an individual figure when you create it (e.g. plt.subplots(figsize=(w, h))). If you're using an axes-level seaborn function, both of these will work as expected.
Figure-level functions both ignore the default figure size in the rcParams and parameterize the figure size differently. When calling a figure-level function, you can pass values to height= and aspect= to set (roughly) the size of each subplot. The advantage here is that the size of the figure automatically adapts when you add faceting variables. But it can be confusing.
Fortunately, there's a consistent way to set the exact figure size in a function-independent manner. Instead of setting the figure size when the figure is created, modify it after you plot by calling obj.figure.set_size_inches(...), where obj is either a matplotlib axes (usually assigned to ax) or a seaborn FacetGrid (usually assigned to g).
(Note that g.figure exists only on seaborn >= 0.11.2; before that you'll have to access g.fig.)
Also, if you're making PNGs (or working in a Jupyter notebook), you can, perhaps surprisingly, scale all your plots up or down by changing the dpi (figure.set_dpi(...)).
You've explicitly created a matplotlib figure with one or more subplots and tried to draw a seaborn plot on it, but you end up with an extra figure and a blank subplot. Perhaps your code looks something like
f, ax = plt.subplots()
sns.catplot(..., ax=ax)
This is a figure-level/axes-level gotcha. Figure-level functions always create their own figure, so you can't direct them towards an existing axes the way you can with axes-level functions. Most functions will warn you when this happens, suggest the appropriate axes-level function, and ignore the ax= parameter. A few older functions might put the plot where you want it (because they internally pass ax to their axes-level function) while still creating an extra figure. This latter behavior should be considered a bug, and it is not to be relied on.
The way things currently work, you can either set up the matplotlib figure yourself, or you can use a figure-level function, but you can't do both at the same time.
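When you do want to control the figure yourself, the working pattern is to pair plt.subplots with axes-level functions. A sketch with made-up data, choosing stripplot and boxplot as the axes-level counterparts of a catplot call:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit in an interactive session
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data for illustration only
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

# Set up the figure yourself, then direct axes-level functions to each subplot
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))
sns.stripplot(data=df, x="group", y="value", ax=ax_left)
sns.boxplot(data=df, x="group", y="value", ax=ax_right)
```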
You're trying to create a single plot using multiple seaborn functions, perhaps by drawing a lineplot or regplot over a barplot, boxplot, stripplot, or violinplot. You expect the line to go through the mean value for each box (etc.), but it doesn't seem to line up, or maybe it's all the way off to the side.
You are trying to combine a "categorical plot" with another plot type. If your x variable has numeric values, it seems like this should work. But recall: seaborn's categorical plots map unique values on the categorical axis to integer indexes. So if your data have unique x values of 1, 6, 20, 94, the corresponding plot elements will get drawn at 0, 1, 2, 3 (and the tick labels will be changed to represent the actual value).
The line or regression plot doesn't know that this has happened, so it will use the actual numeric values, and the plots won't line up at all.
As of now, there are two ways to work around this. In situations where you want to draw a line, you could use the (somewhat misleadingly named) pointplot function, which is also a "categorical" function and will use the same rules for drawing the plot. If this doesn't solve the problem (for one, it's not as visually flexible as lineplot), you could implement the mapping from actual values to integer indexes yourself and draw the plot that way:
sns.violinplot(data=df, x="x", y="y")
unique_xs = list(sorted(df["x"].unique()))
sns.lineplot(data=df, x=df["x"].map(unique_xs.index), y="y")
This is something that will be easier in a planned future release, as it will become possible to make the categorical functions treat numeric data as numeric.
TODO (sns.move_legend)
TODO (matplotlib kwargs, matplotlib axes methods, or matplotlib artist attributes)
TODO (yep)
You prefer to use matplotlib's explicit or "object-oriented" interface, because it makes your code easier to reason about and maintain. But the object-oriented interface consists of methods on matplotlib objects, whereas seaborn offers you independent functions.
This is another case where it will be helpful to keep the figure-level/axes-level distinction in mind.
Axes-level functions can be used like any matplotlib axes method, but instead of calling ax.func(...), you call func(..., ax=ax). They also return the axes object (which they may have created, if no figure was currently active in matplotlib's global state). You can use the methods on that object to further customize the plot even if you didn't start with plt.figure or plt.subplots:
ax = sns.histplot(...)
ax.set(...)
Figure-level functions can't be directed towards an existing figure, but they do store the matplotlib objects on the FacetGrid object that they return (which seaborn docs always assign to a variable named g).
If your figure-level function created only one subplot, you can access it directly:
g = sns.displot(...)
g.ax.set(...)
For multiple subplots, you can either use g.axes (which is always a 2D array of axes) or g.axes_dict (which maps the row/col keys to the corresponding matplotlib object):
g = sns.displot(..., col=...)
for col, ax in g.axes_dict.items():
    ax.set(...)
But if you're batch-setting attributes on all subplots, use the set method on the Grid object itself rather than iterating over the individual axes:
g = sns.displot(...)
g.set(...)
To access the underlying matplotlib figure, use g.figure on seaborn >= 0.11.2 or g.fig on any other version.
Nothing like this is built into seaborn, but matplotlib v3.4.0 added a convenience function that makes it relatively easy. Here are a couple of recipes; note that you'll need to use a different approach depending on whether your bars come from a figure-level or axes-level function:
# Axes-level
ax = sns.histplot(df, x="x_var")
for bars in ax.containers:
    ax.bar_label(bars)
# Figure-level, one subplot
g = sns.displot(df, x="x_var")
for bars in g.ax.containers:
    g.ax.bar_label(bars)
# Figure-level, multiple subplots
g = sns.displot(df, x="x_var", col="col_var")
for ax in g.axes.flat:
    for bars in ax.containers:
        ax.bar_label(bars)
There's no direct support for this in seaborn, but matplotlib has a "dark_background" style-sheet that you could use, e.g.:
sns.set_theme(style="ticks", rc=plt.style.library["dark_background"])
Note that "dark_background" changes the default color palette to "Set2", and that will override any palette you define in set_theme. If you'd rather use a seaborn (or any other) color palette, you'll have to call sns.set_palette separately. The default seaborn palette ("deep") has poor contrast against a dark background, so you'd be better off using "muted", "bright", or "pastel".
TODO (nope)
This is an obscure reference to the namesake of the library, but you can also think of it as "seaborn name space".
Good question. Probably because you get to use the word "geom" a lot, and it's fun to say. "Geom". "Geeeeeooom".