derivative method for Series and DataFrame #26680

Closed
scls19fr opened this issue Jun 5, 2019 · 3 comments

scls19fr commented Jun 5, 2019

Hello,

When dealing with sensor data (energy meter, volumetric meter, ...) stored in a DataFrame or a Series, it is convenient to be able to calculate the derivative of the values easily, generally with respect to local time. For example, power can be calculated from energy consumption, or flow rate from volume over time.

This question has been asked on Stack Overflow at least twice:
https://stackoverflow.com/questions/39235712/calculate-local-time-derivative-of-series
https://stackoverflow.com/questions/26245242/time-differentiation-in-pandas/26246562

Maybe Series and DataFrame could have a derivative method.

I wrote some convenience methods (for both DataFrame and Series) to calculate the derivative.

I currently monkey patch Pandas using the following code:

import pandas as pd


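def _derivative_series(self):
    # Denominator: spacing between consecutive index values
    # (expressed in seconds for a DatetimeIndex).
    if isinstance(self.index, pd.DatetimeIndex):
        # to_series() keeps the timezone by default; recent pandas
        # versions no longer accept the keep_tz argument.
        den = self.index.to_series().diff().dt.total_seconds()
    else:
        den = self.index.to_series().diff()
    # Numerator: difference between consecutive values.
    num = self.diff()
    return num.div(den, axis=0)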


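def _derivative_dataframe(self, var=None):
    if var is None:
        # Differentiate every column with respect to the index.
        if isinstance(self.index, pd.DatetimeIndex):
            # to_series() keeps the timezone by default; recent pandas
            # versions no longer accept the keep_tz argument.
            den = self.index.to_series().diff().dt.total_seconds()
        else:
            den = self.index.to_series().diff()
        num = self.diff()
        return num.div(den, axis=0)
    else:
        # Differentiate every other column with respect to column `var`.
        if pd.api.types.is_datetime64_any_dtype(self[var]):
            den = self[var].diff().dt.total_seconds()
        else:
            den = self[var].diff()
        num = self.loc[:, self.columns != var].diff()
        result = num.div(den, axis=0)
        # Put `var` back unchanged and restore the original column order.
        result[var] = self[var]
        return result.loc[:, self.columns]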


def monkey_patch_pandas(pd):
    pd.Series.derivative = _derivative_series
    pd.DataFrame.derivative = _derivative_dataframe

You can now use the derivative method on a DataFrame with a DatetimeIndex:

import pandas as pd

monkey_patch_pandas(pd)

from io import StringIO

dat = """time,sensor1,sensor2
2019-05-27 13:49:47.703850+02:00,0.0,100.2
2019-05-27 13:49:47.827518+02:00,0.4,102.2
2019-05-27 13:49:47.974124+02:00,0.8,102.4
2019-05-27 13:49:48.097793+02:00,1.1,104.1
2019-05-27 13:49:48.222461+02:00,1.2,101.1
2019-05-27 13:49:48.355105+02:00,1.4,102.0
"""

df = pd.read_csv(StringIO(dat), index_col='time', parse_dates=True)

print("df:")
print(df)
print("")
print("df.derivative():")
print(df.derivative())
print("")

It should also work on a Series with a DatetimeIndex:

print("df['sensor1'].derivative():")
print(df['sensor1'].derivative())
print("")

You can also differentiate with respect to a column by passing its name as var:

print("df.reset_index().derivative(var='time'):")
print(df.reset_index().derivative(var='time'))

The derivative method should also work on a DataFrame with a float index, for example:

dat2 = """x,y1,y2
0.0,0.0,100.2
0.2,0.4,102.2
0.3,0.8,102.4
0.45,1.1,104.1
0.5,1.2,101.1
0.7,1.4,102.0
"""

df = pd.read_csv(StringIO(dat2), index_col='x')

print("df:")
print(df)
print("")
print("df.derivative():")
print(df.derivative())
print("")

It also works on a Series:

print("df['y1'].derivative():")
print(df['y1'].derivative())
print("")

And with a variable name (a DataFrame column name):

print("df.reset_index().derivative(var='x'):")
print(df.reset_index().derivative(var='x'))

I wonder whether other Pandas users have similar use cases, and whether adding such a feature directly to the Pandas code base would be valuable.

Kind regards

PS: Another approach could be to use the numpy.gradient function.
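The numpy.gradient idea from the PS could look roughly like the sketch below. The helper name gradient_series is hypothetical and not part of pandas or NumPy. Unlike the diff-based methods above, which use backward differences and leave NaN in the first row, numpy.gradient uses central differences in the interior and one-sided differences at both ends, so the numbers will generally differ.

```python
import numpy as np
import pandas as pd


def gradient_series(s):
    """Differentiate a Series with respect to its index using numpy.gradient.

    Hypothetical helper for illustration only. A DatetimeIndex is
    converted to elapsed seconds; any other index is read as floats.
    """
    if isinstance(s.index, pd.DatetimeIndex):
        # Elapsed seconds since the first sample.
        x = (s.index - s.index[0]).total_seconds().to_numpy()
    else:
        x = s.index.to_numpy(dtype=float)
    # Passing the coordinates lets numpy.gradient handle uneven spacing.
    values = np.gradient(s.to_numpy(dtype=float), x)
    return pd.Series(values, index=s.index, name=s.name)


s = pd.Series(
    [0.0, 0.4, 0.8, 1.1, 1.2, 1.4],
    index=[0.0, 0.2, 0.3, 0.45, 0.5, 0.7],
    name="y1",
)
grad = gradient_series(s)
print(grad)
```

The result has no leading NaN, which may or may not be what you want for meter data.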


TomAugspurger commented Jun 5, 2019 via email


scls19fr commented Jun 5, 2019

Differentiation and integration are not really domain-specific calculations. I am not a big fan of the monkey patch approach.


jreback commented Jun 5, 2019

these are out of scope for pandas; can you not simply use the scipy methods directly?

alternatively, if someone wanted to make a package that adds these as accessors, that could easily be done I think, e.g.

df['float_column'].derivative.integrate()

via the accessors API: http://pandas.pydata.org/pandas-docs/stable/development/extending.html#registering-custom-accessors
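As a rough illustration of the accessor route, here is a minimal sketch built on pandas.api.extensions.register_series_accessor. The accessor name "calculus", the class name and the method names are all hypothetical choices, not an existing package. The derivative follows the backward-difference logic from the issue, and integrate is a cumulative trapezoidal rule added here as an assumption about what such a package might offer.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("calculus")  # hypothetical name
class CalculusAccessor:
    def __init__(self, series):
        self._s = series

    def _dx(self):
        # Spacing between consecutive index values, in seconds for a
        # DatetimeIndex and in index units otherwise.
        idx = self._s.index
        if isinstance(idx, pd.DatetimeIndex):
            return idx.to_series().diff().dt.total_seconds()
        return idx.to_series().diff()

    def derivative(self):
        # Backward difference; the first row is NaN.
        return self._s.diff().div(self._dx())

    def integrate(self):
        # Cumulative trapezoidal rule, starting from 0 at the first row.
        mean = (self._s + self._s.shift()) / 2
        return (mean * self._dx()).fillna(0.0).cumsum()


s = pd.Series([0.0, 1.0, 4.0], index=[0.0, 1.0, 2.0])
d = s.calculus.derivative()
area = s.calculus.integrate()
print(d)
print(area)
```

A DataFrame version would use register_dataframe_accessor in the same way, and neither requires monkey patching pandas itself.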

3 participants