LOWESS Smoother for The .objects Interface #3320

tbpassin · 2023-04-13T13:11:30Z

Here's a LOWESS smoother for the .objects interface. Like @tomicapretto, I slightly modified the PolyFit implementation. Until there is a release with a LOWESS smoother, this may do.

"""A smoother that has the same interface as the Seaborn PolyFit class."""

from __future__ import annotations
from dataclasses import dataclass

import pandas as pd

from seaborn._stats.base import Stat
import statsmodels.api as sm


@dataclass
class Lowess(Stat):
    """
    Fit a LOWESS smooth of data.  Modeled on PolyFit.
    """
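    frac: float = 0.2  # Fraction of the data used for each local fit.

    def _fit(self, data):
        # frac should be between 0 and 1, so cap larger values
        # (without mutating the dataclass field).
        frac = min(self.frac, 1.0)
        x = data["x"]
        y = data["y"]
        # Returns an (n, 2) array: sorted x values and the fitted y values.
        yy = sm.nonparametric.lowess(exog=x, endog=y, frac=frac)
        return pd.DataFrame(data=yy, columns=["x", "y"])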

    # TODO we should have a way of identifying the method that will be applied
    # and then only define __call__ on a base-class of stats with this pattern

    def __call__(self, data, groupby, orient, scales):

        return (
            groupby
            .apply(data.dropna(subset=["x", "y"]), self._fit)
        )


kcarnold · 2023-08-21T19:55:05Z

Thanks! Note: this implementation will (if I understand correctly) evaluate the curve only at the x values of the data, which leaves too few points wherever there's a big gap in x (so it'll look jagged there). Revision:

import numpy as np
import pandas as pd
from dataclasses import dataclass
from seaborn._stats.base import Stat
import statsmodels.api as sm


@dataclass
class Lowess(Stat):
    """
    Fit a locally-weighted regression to smooth the data.
    """
    frac: float = 0.2   # Fraction of data to use when estimating each y-value
    gridsize: int = 100 # How fine-grained to plot the curve. Increase if jagged.

    def _fit_predict(self, data):
        x = data['x']
        xx = np.linspace(x.min(), x.max(), self.gridsize)
        # https://www.statsmodels.org/devel/generated/statsmodels.nonparametric.smoothers_lowess.lowess.html
        yy = sm.nonparametric.lowess(
            exog=x, endog=data['y'],
            xvals=xx,
            frac=self.frac)
        return pd.DataFrame(dict(x=xx, y=yy))

    def __call__(self, data, groupby, orient, scales):
        return (
            groupby
            .apply(data.dropna(subset=["x", "y"]), self._fit_predict)
        )

tbpassin · 2023-08-21T20:45:41Z

I find I almost always want to get output points at the same x values as the input. If that's not a goal, a variation like this seems fine.

mwaskom added the wishlist label Aug 27, 2023

nickeubank mentioned this issue Nov 3, 2023

Interest in seaborn.objects API contributions? #3546

Closed
