LOWESS Smoother for The .objects Interface #3320

Open
tbpassin opened this issue Apr 13, 2023 · 2 comments
@tbpassin

Here's a LOWESS smoother for the .objects interface. Like @tomicapretto, I slightly modified the PolyFit implementation. Until there is a release with a built-in LOWESS smoother, this may do.

"""A smoother that has the same interface as the Seaborn PolyFit class."""

from __future__ import annotations
from dataclasses import dataclass

import pandas as pd

from seaborn._stats.base import Stat
import statsmodels.api as sm


@dataclass
class Lowess(Stat):
    """
    Fit a LOWESS smooth of data.  Modeled on PolyFit.
    """
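    # Fraction of the data used for each local fit.
    # The type annotation is what makes this a dataclass field (and so a constructor argument).
    frac: float = 0.2

    def _fit(self, data):
        frac = min(self.frac, 1.0)  # lowess expects frac between 0 and 1
        x = data["x"]
        y = data["y"]
        # Returns a two-column array: sorted x values and the fitted y values.
        yy = sm.nonparametric.lowess(exog=x, endog=y, frac=frac)
        df = pd.DataFrame(data=yy, columns=("x", "y"))

        return df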

    # TODO we should have a way of identifying the method that will be applied
    # and then only define __call__ on a base-class of stats with this pattern

    def __call__(self, data, groupby, orient, scales):

        return (
            groupby
            .apply(data.dropna(subset=["x", "y"]), self._fit)
        )
@kcarnold

Thanks! Note: if I understand correctly, this implementation evaluates the smooth only at the x values of the data, so regions with a big gap in x get too few points and the curve looks jagged there. Revision:

import numpy as np
import pandas as pd
from dataclasses import dataclass
from seaborn._stats.base import Stat
import statsmodels.api as sm


@dataclass
class Lowess(Stat):
    """
    Fit a locally-weighted regression to smooth the data.
    """
    frac: float = 0.2   # Fraction of data to use when estimating each y-value
    gridsize: int = 100 # How fine-grained to plot the curve. Increase if jagged.

    def _fit_predict(self, data):
        x = data['x']
        xx = np.linspace(x.min(), x.max(), self.gridsize)
        # https://www.statsmodels.org/devel/generated/statsmodels.nonparametric.smoothers_lowess.lowess.html
        yy = sm.nonparametric.lowess(
            exog=x, endog=data['y'],
            xvals=xx,
            frac=self.frac)
        return pd.DataFrame(dict(x=xx, y=yy))

    def __call__(self, data, groupby, orient, scales):
        return (
            groupby
            .apply(data.dropna(subset=["x", "y"]), self._fit_predict)
        )

@tbpassin
Author

I find I almost always want to get output points at the same x values as the input. If that's not a goal, a variation like this seems fine.
