Merge branch 'master' into v0.13

mwaskom · Sep 29, 2023 · d657c54
2 parents 43d762e + a8b6cac

Showing 11 changed files with 57 additions and 29 deletions.
diff --git a/doc/_docstrings/catplot.ipynb b/doc/_docstrings/catplot.ipynb
@@ -87,7 +87,7 @@
    "source": [
     "sns.catplot(\n",
     "    data=df, x=\"age\", y=\"class\", hue=\"sex\",\n",
-    "    kind=\"violin\", bw=.25, cut=0, split=True,\n",
+    "    kind=\"violin\", bw_adjust=.5, cut=0, split=True,\n",
     ")"
    ]
   },

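The `catplot` docstring now uses `bw_adjust` in place of the legacy `bw` argument. The two are not a drop-in rename: in v0.13, `bw_adjust` multiplies the bandwidth chosen by `bw_method`, which likely explains why the value changed as well. A minimal sketch, assuming seaborn v0.13 or later; the DataFrame below is a synthetic stand-in for the docstring's `df`, with made-up values:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed

import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the docstring's data (values are illustrative only)
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.uniform(1, 80, n),
    "class": rng.choice(["First", "Second", "Third"], n),
    "sex": rng.choice(["male", "female"], n),
})

# bw_adjust scales the bandwidth picked by bw_method (default "scott"):
# values below 1 smooth less, values above 1 smooth more.
g = sns.catplot(
    data=df, x="age", y="class", hue="sex",
    kind="violin", bw_adjust=.5, cut=0, split=True,
)
```
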
diff --git a/doc/_templates/layout.html b/doc/_templates/layout.html
@@ -14,6 +14,7 @@
             Archive
         </a>
         <div class="dropdown-menu dropdown-menu-right" aria-labelledby="dropdownMenuLink">
+          <a class="dropdown-item" href="/archive/0.12/index.html">v0.12</a>
           <a class="dropdown-item" href="/archive/0.11/index.html">v0.11</a>
           <a class="dropdown-item" href="/archive/0.10/index.html">v0.10</a>
           <a class="dropdown-item" href="/archive/0.9/index.html">v0.9</a>

diff --git a/doc/_tutorial/categorical.ipynb b/doc/_tutorial/categorical.ipynb
@@ -133,7 +133,7 @@
    "cell_type": "raw",
    "metadata": {},
    "source": [
-    "Unlike with numerical data, it is not always obvious how to order the levels of the categorical variable along its axis. In general, the seaborn categorical plotting functions try to infer the order of categories from the data. If your data have a pandas ``Categorical`` datatype, then the default order of the categories can be set there. If the variable passed to the categorical axis looks numerical, the levels will be sorted. But the data are still treated as categorical and drawn at ordinal positions on the categorical axes (specifically, at 0, 1, ...) even when numbers are used to label them:"
+    "Unlike with numerical data, it is not always obvious how to order the levels of the categorical variable along its axis. In general, the seaborn categorical plotting functions try to infer the order of categories from the data. If your data have a pandas ``Categorical`` datatype, then the default order of the categories can be set there. If the variable passed to the categorical axis looks numerical, the levels will be sorted. But, by default, the data are still treated as categorical and drawn at ordinal positions on the categorical axes (specifically, at 0, 1, ...) even when numbers are used to label them:"
    ]
   },
   {
@@ -145,6 +145,22 @@
     "sns.catplot(data=tips.query(\"size != 3\"), x=\"size\", y=\"total_bill\")"
    ]
   },
+  {
+   "cell_type": "raw",
+   "metadata": {},
+   "source": [
+    "As of v0.13.0, all categorical plotting functions have a `native_scale` parameter, which can be set to `True` when you want to use numeric or datetime data for categorical grouping without changing the underlying data properties: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sns.catplot(data=tips.query(\"size != 3\"), x=\"size\", y=\"total_bill\", native_scale=True)"
+   ]
+  },
   {
    "cell_type": "raw",
    "metadata": {},
@@ -205,7 +221,7 @@
    "cell_type": "raw",
    "metadata": {},
    "source": [
-    "When adding a ``hue`` semantic, the box for each level of the semantic variable is moved along the categorical axis so they don't overlap:"
+    "When adding a ``hue`` semantic, the box for each level of the semantic variable is made narrower and shifted along the categorical axis:"
    ]
   },
   {
@@ -221,7 +237,7 @@
    "cell_type": "raw",
    "metadata": {},
    "source": [
-    "This behavior is called \"dodging\" and is turned on by default because it is assumed that the semantic variable is nested within the main categorical variable. If that's not the case, you can disable the dodging:"
+    "This behavior is called \"dodging\", and it is controlled by the `dodge` parameter. By default (as of v0.13.0), elements dodge only if they would otherwise overlap:"
    ]
   },
   {
@@ -231,10 +247,7 @@
    "outputs": [],
    "source": [
     "tips[\"weekend\"] = tips[\"day\"].isin([\"Sat\", \"Sun\"])\n",
-    "sns.catplot(\n",
-    "    data=tips, x=\"day\", y=\"total_bill\", hue=\"weekend\",\n",
-    "    kind=\"box\", dodge=False,\n",
-    ")"
+    "sns.catplot(data=tips, x=\"day\", y=\"total_bill\", hue=\"weekend\", kind=\"box\")"
    ]
   },
   {

diff --git a/doc/_tutorial/data_structure.ipynb b/doc/_tutorial/data_structure.ipynb
@@ -19,7 +19,7 @@
     "As a data visualization library, seaborn requires that you provide it with data. This chapter explains the various ways to accomplish that task. Seaborn supports several different dataset formats, and most functions accept data represented with objects from the `pandas <https://pandas.pydata.org/>`_ or `numpy <https://numpy.org/>`_ libraries as well as built-in Python types like lists and dictionaries. Understanding the usage patterns associated with these different options will help you quickly create useful visualizations for nearly any dataset.\n",
     "\n",
     ".. note::\n",
-    "   As of current writing (v0.11.0), the full breadth of options covered here are supported by only a subset of the modules in seaborn (namely, the :ref:`relational <relational_api>` and :ref:`distribution <distribution_api>` modules). The other modules offer much of the same flexibility, but have some exceptions (e.g., :func:`catplot` and :func:`lmplot` are limited to long-form data with named variables). The data-ingest code will be standardized over the next few release cycles, but until that point, be mindful of the specific documentation for each function if it is not doing what you expect with your dataset."
+    "   As of current writing (v0.13.0), the full breadth of options covered here are supported by most, but not all, of the functions in seaborn. Namely, a few older functions (e.g., :func:`lmplot` and :func:`regplot`) are more limited in what they accept."
    ]
   },
   {

diff --git a/doc/faq.rst b/doc/faq.rst
@@ -122,9 +122,9 @@ Several :ref:`seaborn functions <categorical_api>` are referred to as "categoric
 
 At the time these functions were written, matplotlib did not have any direct support for non-numeric data types. So seaborn internally builds a mapping from unique values in the data to 0-based integer indexes, which is what it passes to matplotlib. If your data are strings, that's great, and it more-or-less matches how `matplotlib now handles <https://matplotlib.org/stable/gallery/lines_bars_and_markers/categorical_variables.html>`_ string-typed data.
 
-But a potential gotcha is that these functions *always do this*, even if both the x and y variables are numeric. This gives rise to a number of confusing behaviors, especially when mixing categorical and non-categorical plots (e.g., a combo bar-and-line plot).
+But a potential gotcha is that these functions *always do this by default*, even if both the x and y variables are numeric. This gives rise to a number of confusing behaviors, especially when mixing categorical and non-categorical plots (e.g., a combo bar-and-line plot).
 
-The v0.12 release added a `native_scale` parameter to :func:`stripplot` and :func:`swarmplot`, which provides control over this behavior. It will be rolled out to other categorical functions in future releases. But the current behavior will almost certainly remain the default, so this is an important API wrinkle to understand.
+The v0.13 release added a `native_scale` parameter which provides control over this behavior. It is `False` by default, but setting it to `True` will preserve the original properties of the data used for categorical grouping.
 
 Specifying data
 ---------------

diff --git a/doc/whatsnew/v0.13.0.rst b/doc/whatsnew/v0.13.0.rst
@@ -1,5 +1,5 @@
-v0.13.0 (Unreleased)
---------------------
+v0.13.0 (September 2023)
+------------------------
 
 This is a major release with a number of important new features and changes. The highlight is a major overhaul to seaborn's categorical plotting functions, providing them with many new capabilities and better aligning their API with the rest of the library. There is also provisional support for alternate dataframe libraries like `polars <https://www.pola.rs>`_, a new theme and display configuration system for :class:`objects.Plot`, and many smaller bugfixes and enhancements.
 
@@ -34,8 +34,8 @@ Two related idiosyncratic color specifications are deprecated, but they will con
 
 Finally, like other seaborn functions, the default palette now depends on the variable type, and a sequential palette will be used with numeric data. To retain the previous behavior, pass the name of a qualitative palette (e.g., `palette="deep"` for seaborn's default). Accordingly, the functions have gained a parameter to control numeric color mappings (`hue_norm`).
 
-Other features, enhancements, and changes to categorical plots
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Other features, enhancements, and changes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The following updates apply to multiple categorical functions.
 

diff --git a/examples/grouped_violinplots.py b/examples/grouped_violinplots.py
@@ -5,13 +5,12 @@
 _thumb: .44, .47
 """
 import seaborn as sns
-sns.set_theme(style="whitegrid")
+sns.set_theme(style="dark")
 
 # Load the example tips dataset
 tips = sns.load_dataset("tips")
 
 # Draw a nested violinplot and split the violins for easier comparison
 sns.violinplot(data=tips, x="day", y="total_bill", hue="smoker",
-               split=True, inner="quart", linewidth=1,
-               palette={"Yes": "b", "No": ".85"})
-sns.despine(left=True)
+               split=True, inner="quart", fill=False,
+               palette={"Yes": "g", "No": ".35"})

diff --git a/seaborn/_core/groupby.py b/seaborn/_core/groupby.py
@@ -93,7 +93,7 @@ def agg(self, data: DataFrame, *args, **kwargs) -> DataFrame:
 
         res = (
             data
-            .groupby(grouper, sort=False, observed=True)
+            .groupby(grouper, sort=False, observed=False)
             .agg(*args, **kwargs)
             .reindex(groups)
             .reset_index()
@@ -113,7 +113,7 @@ def apply(
             return self._reorder_columns(func(data, *args, **kwargs), data)
 
         parts = {}
-        for key, part_df in data.groupby(grouper, sort=False):
+        for key, part_df in data.groupby(grouper, sort=False, observed=False):
             parts[key] = func(part_df, *args, **kwargs)
         stack = []
         for key in groups:

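The `groupby` calls in `seaborn/_core/groupby.py` now pass `observed` explicitly. pandas 2.1 deprecated relying on the implicit default for categorical groupers (it is slated to change from `False` to `True`), so an explicit value presumably both silences the `FutureWarning` and pins the behavior. The difference only matters for categorical keys with unused levels, as this plain-pandas sketch shows:

```python
import pandas as pd

# "c" is a declared category that never occurs in the data
df = pd.DataFrame({
    "g": pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"]),
    "v": [1.0, 2.0, 3.0],
})

# observed=False: every declared category gets a row, even with no data
kept = df.groupby("g", sort=False, observed=False)["v"].mean()

# observed=True: only categories that actually appear are returned
dropped = df.groupby("g", sort=False, observed=True)["v"].mean()

print(kept.to_dict())
print(dropped.to_dict())
```
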
diff --git a/seaborn/axisgrid.py b/seaborn/axisgrid.py
@@ -1522,7 +1522,7 @@ def _map_diag_iter_hue(self, func, **kwargs):
         fixed_color = kwargs.pop("color", None)
 
         for var, ax in zip(self.diag_vars, self.diag_axes):
-            hue_grouped = self.data[var].groupby(self.hue_vals)
+            hue_grouped = self.data[var].groupby(self.hue_vals, observed=True)
 
             plot_kwargs = kwargs.copy()
             if str(func.__module__).startswith("seaborn"):
@@ -1629,7 +1629,7 @@ def _plot_bivariate_iter_hue(self, x_var, y_var, ax, func, **kwargs):
         else:
             axes_vars = [x_var, y_var]
 
-        hue_grouped = self.data.groupby(self.hue_vals)
+        hue_grouped = self.data.groupby(self.hue_vals, observed=True)
         for k, label_k in enumerate(self._hue_order):
 
             kws = kwargs.copy()

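`PairGrid` makes the opposite choice, `observed=True`, for its per-hue loops, so unused levels simply do not appear among the groups, while the loop itself is driven by `self._hue_order`. A loop like that needs a fallback for absent levels. A sketch of the pattern under hypothetical names (`hue_vals`, `values`, `hue_order`, `parts`); it illustrates the pandas behavior and is not seaborn's actual loop:

```python
import pandas as pd

# Hypothetical stand-ins for a grid's hue column, data column, and hue order
hue_vals = pd.Series(pd.Categorical(["x", "y", "x", "y"], categories=["x", "y", "z"]))
values = pd.Series([1.0, 2.0, 3.0, 4.0])
hue_order = ["x", "y", "z"]

# observed=True: iterating yields only the levels present in the data
hue_grouped = values.groupby(hue_vals, observed=True)
parts = {level: part for level, part in hue_grouped}

for level in hue_order:
    # Fall back to an empty Series for levels with no observations
    data_k = parts.get(level, pd.Series([], dtype=float))
    print(level, data_k.tolist())
```
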
diff --git a/seaborn/categorical.py b/seaborn/categorical.py
@@ -27,6 +27,7 @@
     _scatter_legend_artist,
     _version_predates,
 )
+from seaborn._compat import MarkerStyle
 from seaborn._statistics import EstimateAggregator, LetterValues
 from seaborn.palettes import light_palette
 from seaborn.axisgrid import FacetGrid, _facet_docs
@@ -481,6 +482,9 @@ def plot_strips(
         ax = self.ax
         dodge_move = jitter_move = 0
 
+        if "marker" in plot_kws and not MarkerStyle(plot_kws["marker"]).is_filled():
+            plot_kws.pop("edgecolor", None)
+
         for sub_vars, sub_data in self.iter_data(iter_vars,
                                                  from_comp_data=True,
                                                  allow_empty=True):
@@ -521,6 +525,9 @@ def plot_swarms(
         point_collections = {}
         dodge_move = 0
 
+        if "marker" in plot_kws and not MarkerStyle(plot_kws["marker"]).is_filled():
+            plot_kws.pop("edgecolor", None)
+
         for sub_vars, sub_data in self.iter_data(iter_vars,
                                                  from_comp_data=True,
                                                  allow_empty=True):
@@ -534,6 +541,7 @@ def plot_swarms(
                 sub_data[self.orient] = sub_data[self.orient] + dodge_move
 
             self._invert_scale(ax, sub_data)
+
             points = ax.scatter(sub_data["x"], sub_data["y"], color=color, **plot_kws)
             if "hue" in self.variables:
                 points.set_facecolors(self._hue_map(sub_data["hue"]))
@@ -2755,17 +2763,14 @@ def catplot(
         elif x is not None and y is not None:
             raise ValueError("Cannot pass values for both `x` and `y`.")
 
-    if kind == "point" and palette is None and color is None:
-        # Handle special backwards compatibility where pointplot originally
-        # did *not* default to multi-colored unless a palette was specified.
-        color = "C0"
-
     p = Plotter(
         data=data,
         variables=dict(x=x, y=y, hue=hue, row=row, col=col, units=units),
         order=order,
         orient=orient,
-        color=color,
+        # Handle special backwards compatibility where pointplot originally
+        # did *not* default to multi-colored unless a palette was specified.
+        color="C0" if kind == "point" and palette is None and color is None else color,
         legend=legend,
     )
 

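`plot_strips` and `plot_swarms` now drop any `edgecolor` entry when the requested marker cannot be filled (for example `"x"` or `"+"`), since matplotlib ignores, and in recent versions warns about, an edge color for such markers. A sketch of the same check as a standalone, hypothetical helper (`drop_edgecolor_for_unfilled`), using `matplotlib.markers.MarkerStyle` directly rather than seaborn's private `seaborn._compat.MarkerStyle` wrapper:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
from matplotlib.markers import MarkerStyle


def drop_edgecolor_for_unfilled(plot_kws):
    """Return a copy of plot_kws without "edgecolor" if the marker is unfilled.

    Hypothetical helper mirroring the check added to plot_strips/plot_swarms.
    """
    kws = dict(plot_kws)
    if "marker" in kws and not MarkerStyle(kws["marker"]).is_filled():
        kws.pop("edgecolor", None)
    return kws


fig, ax = plt.subplots()
kws = drop_edgecolor_for_unfilled({"marker": "x", "edgecolor": "w"})
points = ax.scatter([0, 1, 2], [0, 1, 2], color="r", **kws)
```

Unlike the seaborn code, which pops the key from `plot_kws` in place, the helper returns a copy so the caller's dictionary is left untouched.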
diff --git a/tests/test_categorical.py b/tests/test_categorical.py
@@ -1,5 +1,6 @@
 import itertools
 from functools import partial
+import warnings
 
 import numpy as np
 import pandas as pd
@@ -256,6 +257,15 @@ def test_supplied_color_array(self, long_df):
         _draw_figure(ax.figure)
         assert_array_equal(ax.collections[0].get_facecolors(), colors)
 
+    def test_unfilled_marker(self, long_df):
+
+        with warnings.catch_warnings():
+            warnings.simplefilter("error", UserWarning)
+            ax = self.func(long_df, x="y", y="a", marker="x", color="r")
+            for points in ax.collections:
+                assert same_color(points.get_facecolors().squeeze(), "r")
+                assert same_color(points.get_edgecolors().squeeze(), "r")
+
     @pytest.mark.parametrize(
         "orient,data_type", [
             ("h", "dataframe"), ("h", "dict"),
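
The new `test_unfilled_marker` escalates `UserWarning` to an error, so it fails if matplotlib still warns about an edge color on an unfilled marker. A stripped-down version of the same pattern against plain `Axes.scatter`, assuming a matplotlib version that provides `matplotlib.colors.same_color`:

```python
import warnings

import matplotlib
matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
from matplotlib.colors import same_color

fig, ax = plt.subplots()

# Escalate UserWarning to an error so any warning fails loudly
with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)
    # No explicit edgecolor is passed, so there is nothing to warn about
    points = ax.scatter([0, 1, 2], [0, 1, 2], marker="x", color="r")

# For an unfilled marker, the requested color ends up on the face, and the
# edge color follows the face
face_ok = same_color(points.get_facecolors(), "r")
edge_ok = same_color(points.get_edgecolors(), "r")
```
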