# Next-generation seaborn interface

Over the past 8 months, I have been developing an entirely new interface for making plots with seaborn. This page demonstrates some of its functionality.

## Background and goals

This work grew out of long-running efforts to refactor the seaborn internals so that its functions could rely on common code-paths. At a certain point, I decided that I was developing an API that would also be interesting for external users.

Of course, "write a new interface" quickly turned into "rethink every aspect of the library." The current interface has some [pain points](https://michaelwaskom.medium.com/three-common-seaborn-difficulties-10fdd0cc2a8b) that arise from early constraints and path dependence. Starting fresh makes it possible to avoid them.

More broadly, seaborn was originally conceived as a toolbox of domain-specific statistical graphics to be used alongside matplotlib. As the library (and data science) grew, it became more common to reach for — or even learn — seaborn first. But one inevitably desires some customization that is not offered within the (already much-too-long) list of parameters in seaborn's functions. Currently, this necessitates direct use of matplotlib.

I've always thought that, if you're comfortable with both libraries, this setup offers a powerful blend of convenience and flexibility. But it can be hard to know which library will let you accomplish some specific task. And, as seaborn has become more powerful, one has to write increasing amounts of matplotlib code to recreate what it is doing.

So the goal is to expose seaborn's core features — integration with pandas, automatic mapping between data and graphics, statistical transformations — within an interface that is more compositional, extensible, and comprehensive.

One will note that the result looks a bit (a lot?) like ggplot. That's not unintentional, but the goal is also *not* to "port ggplot2 to Python". (If that's what you're looking for, check out the very nice [plotnine](https://plotnine.readthedocs.io/en/stable/) package.) There is an immense amount of wisdom in the grammar of graphics and in its particular implementation as ggplot2. But I think that, as languages, R and Python are just too different for idioms from one to feel natural when translated literally into the other. So while I have taken much inspiration from ggplot, I've also made plenty of choices differently, for better or for worse.

---

## The basic interface

OK, enough preamble. What does this look like? The new interface exists as a set of classes that can be accessed through a single namespace import:

In [None]:
import seaborn.objects as so

This is a clean namespace, and I'm leaning towards recommending `from seaborn.objects import *` for interactive use cases. But let's not go so far just yet.

Let's also import the main namespace so we can load our trusty example datasets.

In [None]:
import seaborn
seaborn.set_theme()

The main object is `seaborn.objects.Plot`. You instantiate it by passing data and some assignments from columns in the data to roles in the plot:

In [None]:
tips = seaborn.load_dataset("tips")
so.Plot(tips, x="total_bill", y="tip")

But instantiating the `Plot` object doesn't actually plot anything. For that you need to add some layers:

In [None]:
so.Plot(tips, x="total_bill", y="tip").add(so.Scatter())

Variables can be defined globally, or for a specific layer:

In [None]:
so.Plot(tips).add(so.Scatter(), x="total_bill", y="tip")

Each layer can also have its own data:

In [None]:
(
    so.Plot(tips, x="total_bill", y="tip")
    .add(so.Scatter(color=".6"))
    .add(so.Scatter(), data=tips.query("size == 2"))
)

As in the existing interface, variables can be keys to the `data` object or vectors of various kinds:

In [None]:
(
    so.Plot(tips.to_dict(), x="total_bill")
    .add(so.Scatter(), y=tips["tip"].to_numpy())
)

The interface also supports semantic mappings between data and plot variables. But the specification of those mappings uses more explicit parameter names:

In [None]:
so.Plot(tips, x="total_bill", y="tip", color="time").add(so.Scatter())

It also offers a wider range of mappable features:

In [None]:
(
    so.Plot(tips, x="total_bill", y="tip", color="day", fill="time")
    .add(so.Scatter(fillalpha=.8))
)

---

## Core components

### Visual representation: the Mark

Each layer needs a `Mark` object, which defines how to draw the plot. There will be marks corresponding to existing seaborn functions and ones offering new functionality. But not many have been implemented yet:

In [None]:
fmri = seaborn.load_dataset("fmri").query("region == 'parietal'")
so.Plot(fmri, x="timepoint", y="signal").add(so.Line())

`Mark` objects will expose an API to set features directly, rather than mapping them:

In [None]:
so.Plot(tips, y="day", x="total_bill").add(so.Dot(color="#698", alpha=.5))

### Data transformation: the Stat


Built-in statistical transformations are one of seaborn's key features. But currently, they are tied up with the different visual representations. E.g., you can aggregate data in `lineplot`, but not in `scatterplot`.

In the new interface, these concerns are separated. Each layer can accept a `Stat` object that applies a data transformation:

In [None]:
so.Plot(fmri, x="timepoint", y="signal").add(so.Line(), so.Agg())

The `Stat` is computed on subsets of data defined by the semantic mappings:

In [None]:
so.Plot(fmri, x="timepoint", y="signal", color="event").add(so.Line(), so.Agg())
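
To make the grouping concrete, here is a rough pandas equivalent of what such a layer computes. This is only a sketch: it assumes the aggregation is a mean of `y` at each `x` value, it uses a small made-up frame rather than the `fmri` dataset, and the helper `aggregate_layer` is hypothetical (it is not part of seaborn).

```python
import pandas as pd

# Made-up stand-in for the fmri data: two events, two timepoints.
toy = pd.DataFrame({
    "timepoint": [0, 0, 1, 1, 0, 0, 1, 1],
    "event": ["cue", "cue", "cue", "cue", "stim", "stim", "stim", "stim"],
    "signal": [1.0, 3.0, 2.0, 4.0, 10.0, 20.0, 30.0, 50.0],
})


def aggregate_layer(data, x, y, by=()):
    """Hypothetical helper: average y at each x, separately per grouping variable."""
    keys = [*by, x]
    return data.groupby(keys, as_index=False)[y].mean()


# No semantic mapping: a single series, averaged over all rows at each timepoint.
overall = aggregate_layer(toy, "timepoint", "signal")

# color="event": the statistic is computed separately within each event.
per_event = aggregate_layer(toy, "timepoint", "signal", by=["event"])
```

With the mapping in place, the result has one row per combination of `event` and `timepoint` rather than one row per `timepoint`, which is why the plot shows one aggregated line per event.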

Each mark also accepts a `group` mapping that creates subsets without altering visual properties:

In [None]:
(
    so.Plot(fmri, x="timepoint", y="signal", color="event")
    .add(so.Line(), so.Agg(), group="subject")
)

The `Mark` and `Stat` objects allow for more compositionality and customization. There will be guidelines for how to define your own objects to plug into the broader system:

In [None]:
class PeakAnnotation(so.Mark):
    def _plot_split(self, keys, data, ax, kws):
        ix = data["y"].idxmax()
        ax.annotate(
            "The peak", data.loc[ix, ["x", "y"]],
            xytext=(10, -100), textcoords="offset points",
            va="top", ha="center",
            arrowprops=dict(arrowstyle="->", color=".2"),
        )

(
    so.Plot(fmri, x="timepoint", y="signal")
    .add(so.Line(), so.Agg())
    .add(PeakAnnotation(), so.Agg())
)

The new interface understands not just `x` and `y`, but also range specifiers; some `Stat` objects will output ranges, and some `Mark` objects will accept them. This means that it will finally be possible to pass pre-defined error bars into seaborn:

In [None]:
(
    fmri
    .groupby("timepoint")
    .signal
    .describe()
    .pipe(so.Plot, x="timepoint")
    .add(so.Line(), y="mean")
    .add(so.Area(alpha=.2), ymin="min", ymax="max")
)
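
The same idea extends to intervals you compute yourself. The sketch below prepares a mean plus or minus one standard error in pandas, using made-up data; the column names `low` and `high` are arbitrary choices for illustration.

```python
import pandas as pd

# Made-up measurements: three observations at each of two timepoints.
toy = pd.DataFrame({
    "timepoint": [0, 0, 0, 1, 1, 1],
    "signal": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

# One row per timepoint, with the mean and its standard error as columns.
summary = toy.groupby("timepoint")["signal"].agg(["mean", "sem"])

# Pre-computed interval bounds, stored as ordinary columns.
summary["low"] = summary["mean"] - summary["sem"]
summary["high"] = summary["mean"] + summary["sem"]
```

Following the pattern in the cell above, a frame like `summary` could then presumably be piped into `so.Plot` with `ymin="low"` and `ymax="high"` assigned on a range-accepting mark.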

---

### Overplotting resolution: the Move

Existing seaborn functions have parameters that allow adjustments for overplotting, such as `dodge=` in several categorical functions, `jitter=` in several functions based on scatterplots, and the `multiple=` parameter in distribution functions. In the new interface, those adjustments are abstracted away from the particular visual representation into the concept of a `Move`:

In [None]:
(
    so.Plot(tips, "day", "total_bill", color="time")
    .add(so.Bar(), so.Agg(), move=so.Dodge())
)
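
For intuition about what a dodge does to positions, here is a small standalone sketch of the arithmetic. It is an illustration of the general technique, not seaborn's implementation: the function `dodge_offsets` is hypothetical, and its `width` and `gap` parameters are assumptions about how such an adjustment might be parameterized.

```python
def dodge_offsets(n_groups, width=0.8, gap=0.0):
    """Hypothetical helper: offsets and width for marks sharing one position.

    The total `width` is split evenly into `n_groups` slots; `gap` is the
    fraction of each slot that is left empty.
    """
    slot = width / n_groups
    mark_width = slot * (1 - gap)
    # Center of slot i, measured relative to the shared nominal position.
    offsets = [(i + 0.5) * slot - width / 2 for i in range(n_groups)]
    return offsets, mark_width


# Two levels (e.g. Lunch and Dinner) at each day: bars sit on either side
# of the tick, each taking half of the available width.
offsets, mark_width = dodge_offsets(2)
```

Because this computation only touches positions and widths, it can be written once and applied to bars, dots, or any other mark, which is the point of pulling it out of the individual plotting functions.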

Separating out the positional adjustment makes it possible to add flexibility without overwhelming the signature of a single function. For example, there will be more options for handling missing levels when dodging and for fine-tuning the adjustment:

In [None]:
(
    so.Plot(tips, "day", "total_bill", color="time")
    .add(so.Bar(), so.Agg(), move=so.Dodge(empty="fill", gap=.1))
)

By default, the `move` will resolve all overlapping semantic mappings:

In [None]:
(
    so.Plot(tips, "day", "total_bill", color="time", alpha="sex")
    .add(so.Bar(), so.Agg(), move=so.Dodge())
)

But you can specify a subset:

In [None]:
(
    so.Plot(tips, "day", "total_bill", color="time", alpha="smoker")
    .add(so.Dot(), move=so.Dodge(by=["color"]))
)

It's also possible to stack multiple moves or kinds of moves by passing a list:

In [None]:
(
    so.Plot(tips, "day", "total_bill", color="time", alpha="smoker")
    .add(so.Dot(), move=[so.Dodge(by=["color"]), so.Jitter(.5)])
)
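
One way to think about a list of moves is as a pipeline, where each adjustment receives the positions produced by the previous one. The sketch below shows that idea in plain Python; the names `make_dodge`, `make_jitter`, and `apply_moves` are hypothetical and do not correspond to seaborn internals.

```python
import random


def make_dodge(offsets):
    """Return a move that shifts each point by the offset for its group."""
    def move(points):
        return [(x + offsets[group], y, group) for x, y, group in points]
    return move


def make_jitter(width, seed=0):
    """Return a move that adds uniform noise of total extent `width` to x."""
    rng = random.Random(seed)

    def move(points):
        return [
            (x + rng.uniform(-width / 2, width / 2), y, group)
            for x, y, group in points
        ]
    return move


def apply_moves(points, moves):
    """Apply each move in order, feeding the result of one into the next."""
    for move in moves:
        points = move(points)
    return points


# Two observations at the same nominal position, in different groups.
points = [(0.0, 12.5, "Lunch"), (0.0, 20.1, "Dinner")]
moves = [make_dodge({"Lunch": -0.2, "Dinner": 0.2}), make_jitter(0.1)]
moved = apply_moves(points, moves)
```

Here the jitter is applied around the already-dodged positions, so each group stays within its own slot as long as the jitter width is smaller than the slot.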

Separating the `Stat` and `Move` from the visual representation affords more flexibility, greatly expanding the space of graphics that can be created.

---

## Configuration and customization

All of the existing customization (and more) is available, but in dedicated methods rather than one long list of keyword arguments:

In [None]:
planets = seaborn.load_dataset("planets").query("distance < 1000000")
(
    so.Plot(planets, x="mass", y="distance", color="year")
    .map_color("flare", norm=(2000, 2010))
    .scale_numeric("x", "log")
    .add(so.Scatter(pointsize=3))
)

The interface is declarative; methods can be called in any order:

In [None]:
(
    so.Plot(planets, x="mass", y="distance", color="year")
    .add(so.Scatter(pointsize=3))
    .scale_numeric("x", "log")
    .map_color("flare", norm=(2000, 2010))
)

When an axis has a nonlinear scale, any statistical transformations or adjustments take place in the appropriate space:

In [None]:
(
    so.Plot(planets, x="year", y="orbital_period")
    .scale_numeric("y", "log")
    .add(so.Scatter(alpha=.5, marker="x"), color="method")
    .add(so.Line(linewidth=2, color=".2"), so.Agg())
)
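
To see why the space matters, compare an average taken on the raw values with one taken on a log scale. This is a sketch of the general principle with made-up numbers; `mean_in_scale` is a hypothetical helper, and the assumption is that the aggregate is a mean computed on transformed values and then mapped back.

```python
import math


def mean_in_scale(values, forward, inverse):
    """Average `values` after applying `forward`, then map back with `inverse`."""
    transformed = [forward(v) for v in values]
    return inverse(sum(transformed) / len(transformed))


# Made-up orbital periods spanning several orders of magnitude.
periods = [1.0, 10.0, 1000.0]

# Averaging the raw values is dominated by the largest observation.
linear_mean = sum(periods) / len(periods)

# Averaging in log space gives the geometric mean, which sits where the
# eye expects it when the axis itself is logarithmic.
log_mean = mean_in_scale(periods, math.log10, lambda v: 10 ** v)
```

For these values the linear mean is 337, while the log-space mean is roughly 21.5, so a line drawn through the former would float well above most of the points on a log axis.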

The object tries to do inference and use smart defaults for mapping and scaling:

In [None]:
so.Plot(tips, x="size", y="total_bill", color="size").add(so.Dot())

But also allows explicit control:

In [None]:
(
    so.Plot(tips, x="size", y="total_bill", color="size")
    .scale_categorical("x")
    .scale_categorical("color")
    .add(so.Dot())
)

As well as passing through literal values for the visual properties:

In [None]:
(
    so.Plot(x=[1, 2, 3], y=[1, 2, 3], color=["dodgerblue", "#569721", "C3"])
    .scale_identity("color")
    .add(so.Dot(pointsize=20))
)

Layers can be generically passed an `orient` parameter that controls the axis of statistical transformation and how the mark is drawn:

In [None]:
(
    so.Plot(planets, y="year", x="orbital_period")
    .scale_numeric("x", "log")
    .add(so.Scatter(alpha=.5, marker="x"), color="method")
    .add(so.Line(linewidth=2, color=".2"), so.Agg(), orient="h")
)

---

## Defining subplot structure

Faceting is built into the interface implicitly by assigning a faceting variable:

In [None]:
so.Plot(tips, x="total_bill", y="tip", col="time").add(so.Scatter())

Or by explicit declaration:

In [None]:
(
    so.Plot(tips, x="total_bill", y="tip")
    .facet("time", order=["Dinner", "Lunch"])
    .add(so.Scatter())
)

Unlike with the existing `FacetGrid`, it is simple to *not* facet a layer, so that a plot is simply replicated across each column (or row):

In [None]:
(
    so.Plot(tips, x="total_bill", y="tip", col="day")
    .add(so.Scatter(color=".75"), col=None)
    .add(so.Scatter(), color="day")
    .configure(figsize=(7, 3))
)

The `Plot` object *also* subsumes the `PairGrid` functionality:

In [None]:
(
    so.Plot(tips, y="day")
    .pair(x=["total_bill", "tip"])
    .add(so.Dot())
)

Pairing and faceting can be combined in the same plot:

In [None]:
(
    so.Plot(tips, x="day")
    .facet("sex")
    .pair(y=["total_bill", "tip"])
    .add(so.Dot())
)

Or the `Plot.pair` functionality can be used to define unique pairings between variables:

In [None]:
(
    so.Plot(tips, x="day")
    .pair(x=["day", "time"], y=["total_bill", "tip"], cartesian=False)
    .add(so.Dot())
)
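
The difference between the two pairing modes can be sketched with the standard library. This reflects my reading of `cartesian=False` as position-by-position matching, which is an assumption based on the description above rather than a statement about the implementation.

```python
from itertools import product

x_vars = ["day", "time"]
y_vars = ["total_bill", "tip"]

# Default pairing: every x variable crossed with every y variable,
# giving a 2 x 2 grid of subplots.
crossed = list(product(x_vars, y_vars))

# Non-cartesian pairing, as read here: variables are matched up in order,
# giving one subplot per (x, y) pair.
matched = list(zip(x_vars, y_vars))
```

Under this reading, the two lists need to have the same length when the pairing is not cartesian.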

It's additionally possible to "pair" with a single variable, for univariate plots like histograms.

Both faceted and paired plots with subplots along a single dimension can be "wrapped", and this works both columnwise and rowwise:

In [None]:
class Histogram(so.Mark):  # TODO replace once we implement
    def _plot_split(self, keys, data, ax, kws):
        ax.hist(data["x"], bins="auto", **kws)
        ax.set_ylabel("count")

(
    so.Plot(tips)
    .pair(x=tips.columns, wrap=3)
    .configure(sharey=False)
    .add(Histogram())
)

Importantly, there's no distinction between "axes-level" and "figure-level" here. Any kind of plot can be faceted or paired by adding a method call to the `Plot` definition, without changing anything else about how you are creating the figure.

---

## Iterating and displaying

It is possible (and in fact the default behavior) to be completely pyplot-free, with all the drawing done by hooking directly into Jupyter's rich display system. Unlike in normal usage of the inline backend, writing code in a cell to define a plot is independent from showing it:

In [None]:
p = so.Plot(fmri, x="timepoint", y="signal").add(so.Line(), so.Agg())

In [None]:
p

By default, the methods on `Plot` do *not* mutate the object they are called on. This means that you can define a common base specification and then iterate on different versions of it.

In [None]:
p = (
    so.Plot(fmri, x="timepoint", y="signal", color="event")
    .map_color(palette="crest")
)

In [None]:
p.add(so.Line())

In [None]:
p.add(so.Line(), group="subject")

In [None]:
p.add(so.Line(), so.Agg())

In [None]:
(
    p
    .add(so.Line(linewidth=.5, alpha=.5), group="subject")
    .add(so.Line(linewidth=3), so.Agg())
)
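
The non-mutating behavior can be illustrated with a toy class in which every method works on a copy. This is a minimal sketch of the pattern, not a description of how `Plot` is implemented; the `Spec` class and its attributes are hypothetical.

```python
import copy


class Spec:
    """Toy stand-in for a plot specification; not seaborn's implementation."""

    def __init__(self, **variables):
        self.variables = variables
        self.layers = []

    def add(self, mark, **layer_variables):
        # Work on a copy so the object the method was called on is unchanged.
        new = copy.deepcopy(self)
        new.layers.append((mark, layer_variables))
        return new


base = Spec(x="timepoint", y="signal")

# Each call branches off the same base without affecting it or its siblings.
lines = base.add("line")
grouped = base.add("line", group="subject")
```

The tradeoff is that a forgotten assignment silently discards the result, so `p.add(...)` on its own line only has an effect when it is the value displayed by the cell. A real implementation would also likely avoid deep-copying the underlying data.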

It's also possible to hook into the `pyplot` system by calling `Plot.show`, as you might in a terminal interface or to use a GUI. Notice how this looks lower-res: that's because `Plot` is generating "high-DPI" figures internally!

In [None]:
(
    p
    .add(so.Line(linewidth=.5, alpha=.5), group="subject")
    .add(so.Line(linewidth=3), so.Agg())
    .show()
)

---

## Matplotlib integration

It's always been a design aim in seaborn to allow complicated seaborn plots to coexist within the context of a larger matplotlib figure. This is achieved within the "axes-level" functions, which accept an `ax=` parameter. The `Plot` object *will* provide a similar functionality:

In [None]:
import matplotlib as mpl
_, ax = mpl.figure.Figure(constrained_layout=True).subplots(1, 2)
(
    so.Plot(tips, x="total_bill", y="tip")
    .on(ax)
    .add(so.Scatter())
)

But a limitation has been that the "figure-level" functions, which can produce multiple subplots, cannot be directed towards an existing figure. That is no longer the case; `Plot.on()` also accepts a `Figure` object (created either with or without `pyplot`):

In [None]:
f = mpl.figure.Figure(constrained_layout=True)
(
    so.Plot(tips, x="total_bill", y="tip")
    .on(f)
    .add(so.Scatter())
    .facet("time")
)

Providing an existing figure is perhaps only marginally useful. While it will ease the integration of seaborn with GUI frameworks, seaborn is still using up the whole figure canvas. But with the introduction of the `SubFigure` concept in matplotlib 3.4, it becomes possible to place a small-multiples plot *within* a larger set of subplots:

In [None]:
f = mpl.figure.Figure(constrained_layout=True, figsize=(8, 4))
sf1, sf2 = f.subfigures(1, 2)
(
    so.Plot(tips, x="total_bill", y="tip", color="day")
    .add(so.Scatter())
    .on(sf1)
    .plot()
)
(
    so.Plot(tips, x="total_bill", y="tip", color="day")
    .facet("day", wrap=2)
    .add(so.Scatter())
    .on(sf2)
    .plot()
)