pandas.DataFrame.transform
- DataFrame.transform(func, axis=0, *args, **kwargs)
- Call func on self, producing a DataFrame with the same axis shape as self. The result keeps the index of the input, so func must not aggregate.

In [None]:
import pandas as pd
import numpy as np

In [18]:
df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})
df

   A  B
0  0  1
1  1  2
2  2  3


In [19]:
df.transform(lambda x: x + 1)

   A  B
0  1  2
1  2  3
2  3  4
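
Because the result must keep the shape of the input, a function that reduces each column to a single value is rejected. The following is a minimal sketch of that behavior, assuming a reasonably recent pandas release. The error message differs between versions, so only the exception type is caught.

```python
import pandas as pd

df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})

# An elementwise function keeps the index, so transform accepts it.
shifted = df.transform(lambda x: x + 1)

# 'sum' collapses each column to one value, so transform refuses it.
try:
    df.transform('sum')
    raised = False
except ValueError:
    raised = True
```

For a reduction, use agg or apply instead.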


In [20]:
s = pd.Series(range(3))
s

0    0
1    1
2    2
dtype: int64

In [21]:
# A list of functions returns a DataFrame with one column per function
s.transform([np.sqrt, np.exp])

       sqrt       exp
0  0.000000  1.000000
1  1.000000  2.718282
2  1.414214  7.389056
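
When called on a DataFrame, transform also accepts a dict that maps column labels to functions. The sketch below reuses np.sqrt and np.exp from the example above. The frame and the choice of which function goes with which column are illustrative, not part of the original notebook.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})

# Apply a different elementwise function to each column.
out = df.transform({'A': np.sqrt, 'B': np.exp})
```

The result has the same index and column labels as df, with each column transformed by its own function.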


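You can also call transform on a GroupBy object. The function is applied to each group, and the result is broadcast back so that it has the same index as the original data: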

In [22]:
df = pd.DataFrame({
    "Date": [
        "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05",
        "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"],
    "Data": [5, 8, 6, 1, 50, 100, 60, 120],
})
df

         Date  Data
0  2015-05-08     5
1  2015-05-07     8
2  2015-05-06     6
3  2015-05-05     1
4  2015-05-08    50
5  2015-05-07   100
6  2015-05-06    60
7  2015-05-05   120


In [23]:
df.groupby('Date')['Data'].transform('sum')

0     55
1    108
2     66
3    121
4     55
5    108
6     66
7    121
Name: Data, dtype: int64
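
To make the difference from a plain aggregation concrete, the sketch below runs both on the same data. The variable names (grouped, totals, per_row_totals, share) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": [
        "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05",
        "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"],
    "Data": [5, 8, 6, 1, 50, 100, 60, 120],
})

grouped = df.groupby('Date')['Data']

# Aggregation: one value per group, indexed by the group key.
totals = grouped.sum()

# Transformation: the group total repeated on every row of the group,
# indexed like the original frame.
per_row_totals = grouped.transform('sum')

# Because the index lines up with df, the result combines directly with
# the original column, e.g. each row's share of its day's total.
share = df['Data'] / per_row_totals
```

Here totals has four rows (one per date), while per_row_totals has eight, matching df.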

In [24]:
df = pd.DataFrame({
    "c": [1, 1, 1, 2, 2, 2, 2],
    "type": ["m", "n", "o", "m", "m", "n", "n"]
})
df

   c type
0  1    m
1  1    n
2  1    o
3  2    m
4  2    m
5  2    n
6  2    n


In [25]:
# Broadcast each group's row count back to every row of that group
df['size'] = df.groupby('c')['type'].transform(len)
df

   c type  size
0  1    m     3
1  1    n     3
2  1    o     3
3  2    m     4
4  2    m     4
5  2    n     4
6  2    n     4
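
The same group sizes can also be requested by name. The sketch below compares the callable form used above with the string 'size', assuming a pandas version that accepts 'size' in GroupBy.transform. The names by_len and by_name are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "c": [1, 1, 1, 2, 2, 2, 2],
    "type": ["m", "n", "o", "m", "m", "n", "n"]
})

# Callable form, as in the cell above: len is called once per group.
by_len = df.groupby('c')['type'].transform(len)

# String form: dispatches to the built-in GroupBy size method.
by_name = df.groupby('c')['type'].transform('size')
```

Both give 3 for the rows with c == 1 and 4 for the rows with c == 2.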


In [5]:
# Daily index starting 1 October 1999 (ISO format avoids day/month ambiguity)
index = pd.date_range("1999-10-01", periods=1100)
index

DatetimeIndex(['1999-10-01', '1999-10-02', '1999-10-03', '1999-10-04',
               '1999-10-05', '1999-10-06', '1999-10-07', '1999-10-08',
               '1999-10-09', '1999-10-10',
               ...
               '2002-09-25', '2002-09-26', '2002-09-27', '2002-09-28',
               '2002-09-29', '2002-09-30', '2002-10-01', '2002-10-02',
               '2002-10-03', '2002-10-04'],
              dtype='datetime64[ns]', length=1100, freq='D')

In [14]:
ts = pd.Series(np.random.normal(0.5, 2, 1100), index)
ts

1999-10-01    0.478447
1999-10-02   -0.975488
1999-10-03   -1.454406
1999-10-04    2.105756
1999-10-05   -0.615747
                ...   
2002-09-30   -1.142066
2002-10-01    3.477890
2002-10-02    2.881521
2002-10-03    2.065927
2002-10-04   -1.725988
Freq: D, Length: 1100, dtype: float64

In [15]:
# 100-day rolling mean; dropna() removes the first 99 incomplete windows
ts2 = ts.rolling(window=100, min_periods=100).mean().dropna()
ts2

2000-01-08    0.492242
2000-01-09    0.504473
2000-01-10    0.543029
2000-01-11    0.569492
2000-01-12    0.525283
                ...   
2002-09-30    0.720988
2002-10-01    0.747493
2002-10-02    0.758747
2002-10-03    0.802045
2002-10-04    0.779451
Freq: D, Length: 1001, dtype: float64

In [17]:
# Standardize within each calendar year (z-score per group)
transformed = ts2.groupby(lambda x: x.year).transform(
    lambda x: (x - x.mean()) / x.std()
)
transformed

2000-01-08   -1.353126
2000-01-09   -1.225617
2000-01-10   -0.823701
2000-01-11   -0.547838
2000-01-12   -1.008688
                ...   
2002-09-30   -0.015952
2002-10-01    0.154131
2002-10-02    0.226343
2002-10-03    0.504192
2002-10-04    0.359202
Freq: D, Length: 1001, dtype: float64
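
One way to check the standardization is to regroup the result by year and confirm that each group has mean 0 and standard deviation 1. The sketch below rebuilds the series with a seeded generator (np.random.default_rng(0), an assumption made here for reproducibility, so the values differ from the output above) and then runs that check.

```python
import numpy as np
import pandas as pd

# Seeded generator: an assumption for reproducibility, not part of the
# original notebook, which used np.random.normal without a seed.
rng = np.random.default_rng(0)

index = pd.date_range("1999-10-01", periods=1100)
ts = pd.Series(rng.normal(0.5, 2, 1100), index)
ts2 = ts.rolling(window=100, min_periods=100).mean().dropna()

# Z-score each value against the mean and std of its own year.
transformed = ts2.groupby(lambda x: x.year).transform(
    lambda x: (x - x.mean()) / x.std()
)

# Regroup the standardized values and summarize each year.
check = transformed.groupby(lambda x: x.year)
group_means = check.mean()
group_stds = check.std()
```

Up to floating-point error, group_means is 0 and group_stds is 1 for every year, while transformed keeps the same index as ts2.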