Commit

Merge branch 'master' into v0.13
mwaskom committed Sep 29, 2023
2 parents 43d762e + a8b6cac commit d657c54
Showing 11 changed files with 57 additions and 29 deletions.
2 changes: 1 addition & 1 deletion doc/_docstrings/catplot.ipynb
@@ -87,7 +87,7 @@
"source": [
"sns.catplot(\n",
" data=df, x=\"age\", y=\"class\", hue=\"sex\",\n",
" kind=\"violin\", bw=.25, cut=0, split=True,\n",
" kind=\"violin\", bw_adjust=.5, cut=0, split=True,\n",
")"
]
},
1 change: 1 addition & 0 deletions doc/_templates/layout.html
@@ -14,6 +14,7 @@
Archive
</a>
<div class="dropdown-menu dropdown-menu-right" aria-labelledby="dropdownMenuLink">
<a class="dropdown-item" href="/archive/0.12/index.html">v0.12</a>
<a class="dropdown-item" href="/archive/0.11/index.html">v0.11</a>
<a class="dropdown-item" href="/archive/0.10/index.html">v0.10</a>
<a class="dropdown-item" href="/archive/0.9/index.html">v0.9</a>
27 changes: 20 additions & 7 deletions doc/_tutorial/categorical.ipynb
@@ -133,7 +133,7 @@
"cell_type": "raw",
"metadata": {},
"source": [
"Unlike with numerical data, it is not always obvious how to order the levels of the categorical variable along its axis. In general, the seaborn categorical plotting functions try to infer the order of categories from the data. If your data have a pandas ``Categorical`` datatype, then the default order of the categories can be set there. If the variable passed to the categorical axis looks numerical, the levels will be sorted. But the data are still treated as categorical and drawn at ordinal positions on the categorical axes (specifically, at 0, 1, ...) even when numbers are used to label them:"
"Unlike with numerical data, it is not always obvious how to order the levels of the categorical variable along its axis. In general, the seaborn categorical plotting functions try to infer the order of categories from the data. If your data have a pandas ``Categorical`` datatype, then the default order of the categories can be set there. If the variable passed to the categorical axis looks numerical, the levels will be sorted. But, by default, the data are still treated as categorical and drawn at ordinal positions on the categorical axes (specifically, at 0, 1, ...) even when numbers are used to label them:"
]
},
{
@@ -145,6 +145,22 @@
"sns.catplot(data=tips.query(\"size != 3\"), x=\"size\", y=\"total_bill\")"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"As of v0.13.0, all categorical plotting functions have a `native_scale` parameter, which can be set to `True` when you want to use numeric or datetime data for categorical grouping without changing the underlying data properties: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.catplot(data=tips.query(\"size != 3\"), x=\"size\", y=\"total_bill\", native_scale=True)"
]
},
{
"cell_type": "raw",
"metadata": {},
@@ -205,7 +221,7 @@
"cell_type": "raw",
"metadata": {},
"source": [
"When adding a ``hue`` semantic, the box for each level of the semantic variable is moved along the categorical axis so they don't overlap:"
"When adding a ``hue`` semantic, the box for each level of the semantic variable is made narrower and shifted along the categorical axis:"
]
},
{
@@ -221,7 +237,7 @@
"cell_type": "raw",
"metadata": {},
"source": [
"This behavior is called \"dodging\" and is turned on by default because it is assumed that the semantic variable is nested within the main categorical variable. If that's not the case, you can disable the dodging:"
"This behavior is called \"dodging\", and it is controlled by the `dodge` parameter. By default (as of v0.13.0), elements dodge only if they would otherwise overlap:"
]
},
{
@@ -231,10 +247,7 @@
"outputs": [],
"source": [
"tips[\"weekend\"] = tips[\"day\"].isin([\"Sat\", \"Sun\"])\n",
"sns.catplot(\n",
" data=tips, x=\"day\", y=\"total_bill\", hue=\"weekend\",\n",
" kind=\"box\", dodge=False,\n",
")"
"sns.catplot(data=tips, x=\"day\", y=\"total_bill\", hue=\"weekend\", kind=\"box\")"
]
},
{
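The tutorial text changed in this file describes dodging as making each hue level's element narrower and shifting it along the categorical axis. A minimal sketch of that geometry follows, assuming a total group width of 0.8 and evenly split slots. The helper name `dodge_layout` and the width value are hypothetical and chosen for illustration; this is not seaborn's internal implementation.

```python
def dodge_layout(n_hue_levels, group_width=0.8):
    """Return (offsets, element_width) for dodged elements.

    Hypothetical helper: splits `group_width` evenly among the hue levels
    and centers the resulting slots on the category position.
    """
    element_width = group_width / n_hue_levels
    # Slot i is shifted so that the set of slots is centered on zero.
    offsets = [
        (i - (n_hue_levels - 1) / 2) * element_width
        for i in range(n_hue_levels)
    ]
    return offsets, element_width


# Two hue levels: each element is half as wide and shifted left or right.
two_offsets, two_width = dodge_layout(2)

# One hue level: nothing to dodge, so the element keeps the full width.
one_offsets, one_width = dodge_layout(1)
```

With a single element per category position there is nothing to separate, which matches the new default described in the tutorial text: elements dodge only if they would otherwise overlap.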
2 changes: 1 addition & 1 deletion doc/_tutorial/data_structure.ipynb
@@ -19,7 +19,7 @@
"As a data visualization library, seaborn requires that you provide it with data. This chapter explains the various ways to accomplish that task. Seaborn supports several different dataset formats, and most functions accept data represented with objects from the `pandas <https://pandas.pydata.org/>`_ or `numpy <https://numpy.org/>`_ libraries as well as built-in Python types like lists and dictionaries. Understanding the usage patterns associated with these different options will help you quickly create useful visualizations for nearly any dataset.\n",
"\n",
".. note::\n",
" As of current writing (v0.11.0), the full breadth of options covered here are supported by only a subset of the modules in seaborn (namely, the :ref:`relational <relational_api>` and :ref:`distribution <distribution_api>` modules). The other modules offer much of the same flexibility, but have some exceptions (e.g., :func:`catplot` and :func:`lmplot` are limited to long-form data with named variables). The data-ingest code will be standardized over the next few release cycles, but until that point, be mindful of the specific documentation for each function if it is not doing what you expect with your dataset."
" As of current writing (v0.13.0), the full breadth of options covered here is supported by most, but not all, of the functions in seaborn. Namely, a few older functions (e.g., :func:`lmplot` and :func:`regplot`) are more limited in what they accept."
]
},
{
4 changes: 2 additions & 2 deletions doc/faq.rst
@@ -122,9 +122,9 @@ Several :ref:`seaborn functions <categorical_api>` are referred to as "categoric

At the time these functions were written, matplotlib did not have any direct support for non-numeric data types. So seaborn internally builds a mapping from unique values in the data to 0-based integer indexes, which is what it passes to matplotlib. If your data are strings, that's great, and it more-or-less matches how `matplotlib now handles <https://matplotlib.org/stable/gallery/lines_bars_and_markers/categorical_variables.html>`_ string-typed data.

But a potential gotcha is that these functions *always do this*, even if both the x and y variables are numeric. This gives rise to a number of confusing behaviors, especially when mixing categorical and non-categorical plots (e.g., a combo bar-and-line plot).
But a potential gotcha is that these functions *always do this by default*, even if both the x and y variables are numeric. This gives rise to a number of confusing behaviors, especially when mixing categorical and non-categorical plots (e.g., a combo bar-and-line plot).

The v0.12 release added a `native_scale` parameter to :func:`stripplot` and :func:`swarmplot`, which provides control over this behavior. It will be rolled out to other categorical functions in future releases. But the current behavior will almost certainly remain the default, so this is an important API wrinkle to understand.
The v0.13 release added a `native_scale` parameter which provides control over this behavior. It is `False` by default, but setting it to `True` will preserve the original properties of the data used for categorical grouping.

Specifying data
---------------
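The FAQ text in this file says the categorical functions map unique values to 0-based integer indexes by default, and that `native_scale=True` preserves the original properties of the data. A minimal pure-Python sketch of that distinction follows. The helper `categorical_positions` is hypothetical and only mimics the described behavior for numeric input; it does not call seaborn.

```python
def categorical_positions(values, native_scale=False):
    """Return the axis position for each value.

    Hypothetical helper illustrating the behavior described in the FAQ:
    by default, sorted unique levels are mapped to 0, 1, 2, ...; with
    native_scale=True the numeric values themselves are used.
    """
    if native_scale:
        return [float(v) for v in values]
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    return [index[v] for v in values]


sizes = [1, 2, 4, 2, 6]  # note that 3 and 5 are absent

# Default: ordinal positions, so the gaps at 3 and 5 disappear.
ordinal = categorical_positions(sizes)

# native_scale=True: positions keep their numeric spacing.
native = categorical_positions(sizes, native_scale=True)
```

The ordinal result places the levels at evenly spaced positions regardless of their values, which is why mixing a default categorical plot with a numeric line plot on the same axes can misalign the two.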
8 changes: 4 additions & 4 deletions doc/whatsnew/v0.13.0.rst
@@ -1,5 +1,5 @@
v0.13.0 (Unreleased)
--------------------
v0.13.0 (September 2023)
------------------------

This is a major release with a number of important new features and changes. The highlight is a major overhaul to seaborn's categorical plotting functions, providing them with many new capabilities and better aligning their API with the rest of the library. There is also provisional support for alternate dataframe libraries like `polars <https://www.pola.rs>`_, a new theme and display configuration system for :class:`objects.Plot`, and many smaller bugfixes and enhancements.

@@ -34,8 +34,8 @@ Two related idiosyncratic color specifications are deprecated, but they will con

Finally, like other seaborn functions, the default palette now depends on the variable type, and a sequential palette will be used with numeric data. To retain the previous behavior, pass the name of a qualitative palette (e.g., `palette="deep"` for seaborn's default). Accordingly, the functions have gained a parameter to control numeric color mappings (`hue_norm`).

Other features, enhancements, and changes to categorical plots
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Other features, enhancements, and changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following updates apply to multiple categorical functions.

7 changes: 3 additions & 4 deletions examples/grouped_violinplots.py
@@ -5,13 +5,12 @@
_thumb: .44, .47
"""
import seaborn as sns
sns.set_theme(style="whitegrid")
sns.set_theme(style="dark")

# Load the example tips dataset
tips = sns.load_dataset("tips")

# Draw a nested violinplot and split the violins for easier comparison
sns.violinplot(data=tips, x="day", y="total_bill", hue="smoker",
split=True, inner="quart", linewidth=1,
palette={"Yes": "b", "No": ".85"})
sns.despine(left=True)
split=True, inner="quart", fill=False,
palette={"Yes": "g", "No": ".35"})
4 changes: 2 additions & 2 deletions seaborn/_core/groupby.py
@@ -93,7 +93,7 @@ def agg(self, data: DataFrame, *args, **kwargs) -> DataFrame:

res = (
data
.groupby(grouper, sort=False, observed=True)
.groupby(grouper, sort=False, observed=False)
.agg(*args, **kwargs)
.reindex(groups)
.reset_index()
@@ -113,7 +113,7 @@ def apply(
return self._reorder_columns(func(data, *args, **kwargs), data)

parts = {}
for key, part_df in data.groupby(grouper, sort=False):
for key, part_df in data.groupby(grouper, sort=False, observed=False):
parts[key] = func(part_df, *args, **kwargs)
stack = []
for key in groups:
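The hunks in this file pass `observed` explicitly to `DataFrame.groupby`. The commit does not state its motivation, so the following is only a sketch of what the parameter controls when the grouping key is a pandas `Categorical`. The toy frame and column names are assumptions made for illustration.

```python
import pandas as pd

# Toy data: category "c" is declared but never appears in the rows.
df = pd.DataFrame({
    "group": pd.Categorical(["a", "a", "b"], categories=["a", "b", "c"]),
    "value": [1.0, 2.0, 3.0],
})

# observed=True keeps only the categories present in the data.
observed_only = df.groupby("group", sort=False, observed=True).size()

# observed=False also yields the unobserved category, with a count of zero.
all_levels = df.groupby("group", sort=False, observed=False).size()
```

Spelling the argument out makes the result independent of the library default for categorical groupers, whichever value is chosen.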
4 changes: 2 additions & 2 deletions seaborn/axisgrid.py
@@ -1522,7 +1522,7 @@ def _map_diag_iter_hue(self, func, **kwargs):
fixed_color = kwargs.pop("color", None)

for var, ax in zip(self.diag_vars, self.diag_axes):
hue_grouped = self.data[var].groupby(self.hue_vals)
hue_grouped = self.data[var].groupby(self.hue_vals, observed=True)

plot_kwargs = kwargs.copy()
if str(func.__module__).startswith("seaborn"):
@@ -1629,7 +1629,7 @@ def _plot_bivariate_iter_hue(self, x_var, y_var, ax, func, **kwargs):
else:
axes_vars = [x_var, y_var]

hue_grouped = self.data.groupby(self.hue_vals)
hue_grouped = self.data.groupby(self.hue_vals, observed=True)
for k, label_k in enumerate(self._hue_order):

kws = kwargs.copy()
17 changes: 11 additions & 6 deletions seaborn/categorical.py
@@ -27,6 +27,7 @@
_scatter_legend_artist,
_version_predates,
)
from seaborn._compat import MarkerStyle
from seaborn._statistics import EstimateAggregator, LetterValues
from seaborn.palettes import light_palette
from seaborn.axisgrid import FacetGrid, _facet_docs
@@ -481,6 +482,9 @@ def plot_strips(
ax = self.ax
dodge_move = jitter_move = 0

if "marker" in plot_kws and not MarkerStyle(plot_kws["marker"]).is_filled():
plot_kws.pop("edgecolor", None)

for sub_vars, sub_data in self.iter_data(iter_vars,
from_comp_data=True,
allow_empty=True):
@@ -521,6 +525,9 @@ def plot_swarms(
point_collections = {}
dodge_move = 0

if "marker" in plot_kws and not MarkerStyle(plot_kws["marker"]).is_filled():
plot_kws.pop("edgecolor", None)

for sub_vars, sub_data in self.iter_data(iter_vars,
from_comp_data=True,
allow_empty=True):
@@ -534,6 +541,7 @@ def plot_swarms(
sub_data[self.orient] = sub_data[self.orient] + dodge_move

self._invert_scale(ax, sub_data)

points = ax.scatter(sub_data["x"], sub_data["y"], color=color, **plot_kws)
if "hue" in self.variables:
points.set_facecolors(self._hue_map(sub_data["hue"]))
@@ -2755,17 +2763,14 @@ def catplot(
elif x is not None and y is not None:
raise ValueError("Cannot pass values for both `x` and `y`.")

if kind == "point" and palette is None and color is None:
# Handle special backwards compatibility where pointplot originally
# did *not* default to multi-colored unless a palette was specified.
color = "C0"

p = Plotter(
data=data,
variables=dict(x=x, y=y, hue=hue, row=row, col=col, units=units),
order=order,
orient=orient,
color=color,
# Handle special backwards compatibility where pointplot originally
# did *not* default to multi-colored unless a palette was specified.
color="C0" if kind == "point" and palette is None and color is None else color,
legend=legend,
)

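The guard added to `plot_strips` and `plot_swarms` in this file drops `edgecolor` from the plotting keywords when the marker is unfilled. A standalone sketch of the same check follows, assuming `matplotlib.markers.MarkerStyle` can be imported directly (the diff imports it through `seaborn._compat` instead). The wrapper function `drop_edgecolor_for_unfilled` is hypothetical.

```python
from matplotlib.markers import MarkerStyle


def drop_edgecolor_for_unfilled(plot_kws):
    """Return a copy of plot_kws without `edgecolor` for unfilled markers.

    Hypothetical wrapper around the check shown in the diff. Unfilled
    markers such as "x" are drawn entirely with their edge, so a separate
    edgecolor would override the requested color.
    """
    kws = dict(plot_kws)  # copy so the caller's dict is left untouched
    if "marker" in kws and not MarkerStyle(kws["marker"]).is_filled():
        kws.pop("edgecolor", None)
    return kws


# "x" is an unfilled marker, so edgecolor is removed.
unfilled = drop_edgecolor_for_unfilled(
    {"marker": "x", "edgecolor": "k", "color": "r"}
)

# "o" is a filled marker, so edgecolor is kept.
filled = drop_edgecolor_for_unfilled(
    {"marker": "o", "edgecolor": "k", "color": "r"}
)
```

Unlike the code in the diff, this sketch copies the dictionary rather than mutating it in place, which is a design choice made here to keep the example free of side effects.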
10 changes: 10 additions & 0 deletions tests/test_categorical.py
@@ -1,5 +1,6 @@
import itertools
from functools import partial
import warnings

import numpy as np
import pandas as pd
@@ -256,6 +257,15 @@ def test_supplied_color_array(self, long_df):
_draw_figure(ax.figure)
assert_array_equal(ax.collections[0].get_facecolors(), colors)

def test_unfilled_marker(self, long_df):

with warnings.catch_warnings():
warnings.simplefilter("error", UserWarning)
ax = self.func(long_df, x="y", y="a", marker="x", color="r")
for points in ax.collections:
assert same_color(points.get_facecolors().squeeze(), "r")
assert same_color(points.get_edgecolors().squeeze(), "r")

@pytest.mark.parametrize(
"orient,data_type", [
("h", "dataframe"), ("h", "dict"),
