Resampling refers to the process of converting a time series from one frequency to
another. Aggregating higher frequency data to lower frequency is called downsampling,
while converting lower frequency to higher frequency is called upsampling. Not
all resampling falls into either of these categories; for example, converting W-WED
(weekly on Wednesday) to W-FRI is neither upsampling nor downsampling.

pandas objects are equipped with a resample method, which is the workhorse function
for all frequency conversion. resample has a similar API to groupby; you call
resample to group the data, then call an aggregation function:

In [1]:
import pandas as pd
import numpy as np


In [2]:
rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts

2000-01-01   -0.221094
2000-01-02    2.336370
2000-01-03    0.747098
2000-01-04    1.018634
2000-01-05   -0.141017
                ...   
2000-04-05    0.417651
2000-04-06    0.791793
2000-04-07    0.942789
2000-04-08    0.872599
2000-04-09    0.506693
Freq: D, Length: 100, dtype: float64

In [3]:
ts.resample('M').mean()

2000-01-31    0.123695
2000-02-29   -0.150505
2000-03-31   -0.042518
2000-04-30    0.683523
Freq: M, dtype: float64

In [4]:
ts.resample('M', kind='period').mean()

2000-01    0.123695
2000-02   -0.150505
2000-03   -0.042518
2000-04    0.683523
Freq: M, dtype: float64
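
The groupby analogy can be made concrete. The following is a minimal sketch on hypothetical data (28 daily values, not the random series used above): it bins the observations by week once with resample and once with groupby and a time-based pd.Grouper, then checks that the two results agree.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 28 daily observations with deterministic values.
rng = pd.date_range('2000-01-01', periods=28, freq='D')
ts = pd.Series(np.arange(28, dtype=float), index=rng)

# resample groups the observations into weekly bins, then aggregates each bin.
weekly_resample = ts.resample('W').mean()

# The same grouping expressed through groupby with a time-based Grouper.
weekly_groupby = ts.groupby(pd.Grouper(freq='W')).mean()

# Both routes should produce the same index and the same values.
print(weekly_resample.equals(weekly_groupby))
```

A weekly frequency is used here only because it keeps the example small; the same comparison should hold for other target frequencies.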

resample is a flexible and high-performance method that can be used to process very
large time series. The examples in the following sections illustrate its semantics and
use. Table 11-5 summarizes some of its options.

Table 11-5. Resample method arguments

Argument            Description

freq                String or DateOffset indicating desired resampled
                    frequency (e.g., 'M', '5min', or Second(15))

axis                Axis to resample on; default axis=0

fill_method         How to interpolate when upsampling, as in 'ffill' or 'bfill';
                    by default does no interpolation (older pandas only; in
                    current versions call ffill or bfill on the resampled object)

closed              In downsampling, which end of each interval is closed
                    (inclusive), 'right' or 'left'

label               In downsampling, how to label the aggregated result,
                    with the 'right' or 'left' bin edge (e.g., the
                    9:30 to 9:35 five-minute interval could be labeled 9:30 or 9:35)

loffset             Time adjustment to the bin labels, such as '-1s' / Second(-1)
                    to shift the aggregate labels one second earlier (removed in
                    pandas 2.0; adjust the index of the result instead)

limit               When forward or backward filling, the maximum number of periods to fill
                    (older pandas only; now passed to ffill or bfill, as in ffill(limit=2))

kind                Aggregate to periods ('period') or timestamps ('timestamp');
                    defaults to the type of index the time series has

convention          When resampling periods, the convention ('start' or 'end')
                    for converting the low-frequency period to high frequency; defaults to 'start'

### Downsampling

Aggregating data to a regular, lower frequency is a common time series task. The
data you're aggregating doesn't need to have a fixed frequency; the desired frequency
defines bin edges that are used to slice the time series into pieces to aggregate. For
example, to convert to monthly, 'M' or 'BM', you need to chop up the data into one-
month intervals. Each interval is said to be half-open; a data point can only belong to
one interval, and the union of the intervals must make up the whole time frame.
There are a couple of things to think about when using resample to downsample data:

• Which side of each interval is closed

• How to label each aggregated bin, either with the start of the interval or the end

In [5]:
rng = pd.date_range('2000-01-01', periods=12, freq='T')
ts = pd.Series(np.arange(12), index=rng)
ts

2000-01-01 00:00:00     0
2000-01-01 00:01:00     1
2000-01-01 00:02:00     2
2000-01-01 00:03:00     3
2000-01-01 00:04:00     4
2000-01-01 00:05:00     5
2000-01-01 00:06:00     6
2000-01-01 00:07:00     7
2000-01-01 00:08:00     8
2000-01-01 00:09:00     9
2000-01-01 00:10:00    10
2000-01-01 00:11:00    11
Freq: T, dtype: int32

In [6]:
# Suppose you wanted to aggregate this data into five-minute chunks or bars by taking
# the sum of each group:
ts.resample('5min').sum()

2000-01-01 00:00:00    10
2000-01-01 00:05:00    35
2000-01-01 00:10:00    21
Freq: 5T, dtype: int32

The frequency you pass defines bin edges in five-minute increments. By default,
the left bin edge is inclusive, so the 00:00 value is included in the 00:00 to 00:05
interval.

The choice of the default values for closed and label might seem a bit odd to some users. In practice the
choice is somewhat arbitrary; for some target frequencies, closed='left' is preferable, while for others
closed='right' makes more sense. The important thing is that you keep in mind exactly how you are
segmenting the data.

Passing closed='right' changes the interval to be closed on the right:

In [7]:
ts.resample('5min', closed='right').sum()

1999-12-31 23:55:00     0
2000-01-01 00:00:00    15
2000-01-01 00:05:00    40
2000-01-01 00:10:00    11
Freq: 5T, dtype: int32

The resulting time series is labeled by the timestamps from the left side of each bin.
By passing label='right' you can label them with the right bin edge:

In [8]:
ts.resample('5min', closed='right', label='right').sum()

2000-01-01 00:00:00     0
2000-01-01 00:05:00    15
2000-01-01 00:10:00    40
2000-01-01 00:15:00    11
Freq: 5T, dtype: int32
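
To see how closed and label interact, the four combinations can be computed side by side. This is a small sketch that rebuilds the same twelve-minute series (using the 'min' alias for one-minute frequency); the results dictionary is a name introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Twelve one-minute observations holding the values 0 through 11.
rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

# Collect the five-minute sums for every closed/label combination.
results = {}
for closed in ('left', 'right'):
    for label in ('left', 'right'):
        results[(closed, label)] = ts.resample(
            '5min', closed=closed, label=label).sum()

for (closed, label), result in results.items():
    print(f"closed={closed!r}, label={label!r}: "
          f"first label {result.index[0]}, sums {result.tolist()}")
```

closed decides which observations land in each bin, so it changes the sums; label only decides which bin edge is used as the timestamp, so it changes the index and leaves the sums alone.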

Lastly, you might want to shift the result index by some amount, say subtracting one
second from the right edge to make it clearer which interval the timestamp refers
to. Older pandas versions accepted a string or date offset through the loffset argument
for this; pandas 2.0 removed that argument (passing it raises a TypeError), so adjust
the index of the result instead:

In [9]:
# You could accomplish the same effect by calling the shift method
# on the result with a freq argument.
result = ts.resample('5min', closed='right', label='right').sum()
result.index = result.index - pd.Timedelta('1s')
result

1999-12-31 23:59:59     0
2000-01-01 00:04:59    15
2000-01-01 00:09:59    40
2000-01-01 00:14:59    11
Freq: 5T, dtype: int32
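
The shift-based route mentioned above can be sketched as follows. The example resamples without loffset, then moves the labels one second earlier with shift and a freq argument. The names one_second, shifted, and manual are introduced here for illustration, and the series is rebuilt so the block runs on its own.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

one_second = pd.Timedelta(seconds=1)
result = ts.resample('5min', closed='right', label='right').sum()

# shift with a freq moves the index labels and leaves the values in place.
shifted = result.shift(-1, freq=one_second)

# The same adjustment written as plain index arithmetic.
manual = result.copy()
manual.index = manual.index - one_second

print(shifted)
print(shifted.index.equals(manual.index))
```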

__Open-High-Low-Close (OHLC) resampling__

In finance, a popular way to aggregate a time series is to compute four values for each
bucket: the first (open), last (close), maximum (high), and minimum (low) values. By
using the ohlc aggregate function you will obtain a DataFrame having columns
containing these four aggregates, which are efficiently computed in a single sweep of the
data:

In [10]:
ts.resample('5min').ohlc()

                     open  high  low  close
2000-01-01 00:00:00     0     4    0      4
2000-01-01 00:05:00     5     9    5      9
2000-01-01 00:10:00    10    11   10     11
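
As a cross-check on what ohlc computes, the same four columns can be assembled from separate aggregations. This is only a sketch on the twelve-minute series (rebuilt here so the block is self-contained); the manual frame and its column renaming are illustrative, not how pandas implements ohlc internally.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

# Built-in open/high/low/close aggregation per five-minute bar.
bars = ts.resample('5min').ohlc()

# The same quantities from four separate aggregations, renamed to match.
manual = ts.resample('5min').agg(['first', 'max', 'min', 'last'])
manual.columns = ['open', 'high', 'low', 'close']

print((bars.to_numpy() == manual.to_numpy()).all())
```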


### Upsampling and Interpolation

When converting from a low frequency to a higher frequency, no aggregation is
needed. Let’s consider a DataFrame with some weekly data:

In [11]:
frame = pd.DataFrame(np.random.randn(2, 4),
                     index=pd.date_range('1/1/2000', periods=2,
                                         freq='W-WED'),
                     columns=['Colorado', 'Texas', 'New York', 'Ohio'])
frame

            Colorado     Texas  New York      Ohio
2000-01-05 -2.351594 -0.315756  0.600572 -0.880947
2000-01-12 -0.203121  0.916661 -0.704407  0.109152


If you apply an aggregation function to this data at a higher frequency, each group
holds at most one value, and the groups with no observations show up as missing
values. Use the asfreq method to convert to the higher frequency without any aggregation:

In [12]:
df_daily = frame.resample('D').asfreq()
df_daily

            Colorado     Texas  New York      Ohio
2000-01-05 -2.351594 -0.315756  0.600572 -0.880947
2000-01-06       NaN       NaN       NaN       NaN
2000-01-07       NaN       NaN       NaN       NaN
2000-01-08       NaN       NaN       NaN       NaN
2000-01-09       NaN       NaN       NaN       NaN
2000-01-10       NaN       NaN       NaN       NaN
2000-01-11       NaN       NaN       NaN       NaN
2000-01-12 -0.203121  0.916661 -0.704407  0.109152


Suppose you wanted to fill forward each weekly value on the non-Wednesdays. The
same filling or interpolation methods available in the fillna and reindex methods
are available for resampling:

In [13]:
frame.resample('D').ffill()

            Colorado     Texas  New York      Ohio
2000-01-05 -2.351594 -0.315756  0.600572 -0.880947
2000-01-06 -2.351594 -0.315756  0.600572 -0.880947
2000-01-07 -2.351594 -0.315756  0.600572 -0.880947
2000-01-08 -2.351594 -0.315756  0.600572 -0.880947
2000-01-09 -2.351594 -0.315756  0.600572 -0.880947
2000-01-10 -2.351594 -0.315756  0.600572 -0.880947
2000-01-11 -2.351594 -0.315756  0.600572 -0.880947
2000-01-12 -0.203121  0.916661 -0.704407  0.109152
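
The connection to reindex can be checked directly. Below is a minimal sketch on a small hypothetical frame (two columns with made-up values, not the random data above): forward filling through resample should match reindexing onto an explicit daily index with method='ffill'.

```python
import pandas as pd

# Hypothetical weekly data observed on two consecutive Wednesdays.
index = pd.date_range('2000-01-05', periods=2, freq='W-WED')
frame = pd.DataFrame({'Colorado': [1.0, 2.0], 'Texas': [3.0, 4.0]},
                     index=index)

# Daily index spanning the first through the last observation.
daily_index = pd.date_range(index[0], index[-1], freq='D')

via_resample = frame.resample('D').ffill()
via_reindex = frame.reindex(daily_index, method='ffill')

print(via_resample.equals(via_reindex))
```

resample saves you from building the target index by hand, which is the main practical difference between the two spellings in this case.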


You can similarly choose to only fill a certain number of periods forward to limit how
far to continue using an observed value:

In [14]:
frame.resample('D').ffill(limit=2)

            Colorado     Texas  New York      Ohio
2000-01-05 -2.351594 -0.315756  0.600572 -0.880947
2000-01-06 -2.351594 -0.315756  0.600572 -0.880947
2000-01-07 -2.351594 -0.315756  0.600572 -0.880947
2000-01-08       NaN       NaN       NaN       NaN
2000-01-09       NaN       NaN       NaN       NaN
2000-01-10       NaN       NaN       NaN       NaN
2000-01-11       NaN       NaN       NaN       NaN
2000-01-12 -0.203121  0.916661 -0.704407  0.109152


In [15]:
# Notably, the new date index need not overlap with the old one at all:
frame.resample('W-THU').ffill()

            Colorado     Texas  New York      Ohio
2000-01-06 -2.351594 -0.315756  0.600572 -0.880947
2000-01-13 -0.203121  0.916661 -0.704407  0.109152
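
Forward filling is not the only way to populate the gaps created by upsampling. As a sketch, assuming that a straight line between neighboring observations is a reasonable model for the data, the resampled object's interpolate method (linear by default) fills the new daily slots. The weekly series below is hypothetical and chosen so the result is easy to verify by eye.

```python
import pandas as pd

# Hypothetical weekly series: 0.0 on one Wednesday, 7.0 on the next.
index = pd.date_range('2000-01-05', periods=2, freq='W-WED')
weekly = pd.Series([0.0, 7.0], index=index)

# Upsample to daily frequency and interpolate linearly between observations.
daily = weekly.resample('D').interpolate()

print(daily)
```

Because the two observations are seven days apart and differ by 7.0, each interpolated day should rise by one unit.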


### Resampling with Periods

Resampling data indexed by periods is similar to resampling data indexed by timestamps:

In [17]:
frame = pd.DataFrame(np.random.randn(24, 4),
                     index=pd.period_range('1-2000', '12-2001', freq='M'),
                     columns=['Colorado', 'Texas', 'New York', 'Ohio'])
frame[:5]

         Colorado     Texas  New York      Ohio
2000-01 -0.165510 -0.537824 -0.866164  0.561937
2000-02 -0.360703 -1.290298  0.815228  0.507127
2000-03  0.660895 -0.273805 -1.559603 -0.820251
2000-04  0.078544 -2.313172  0.667439 -0.116105
2000-05  0.385366  0.267598  0.567636 -0.471763


In [18]:
annual_frame = frame.resample('A-DEC').mean()
annual_frame

      Colorado     Texas  New York      Ohio
2000 -0.109284 -0.362830  0.335046  0.098427
2001 -0.484530  0.185885 -0.317265  0.158048


Upsampling is more nuanced, as you must make a decision about which end of the
timespan in the new frequency to place the values before resampling, just like the
asfreq method. The convention argument defaults to 'start' but can also be 'end':

In [19]:
# Q-DEC: Quarterly, year ending in December
annual_frame.resample('Q-DEC').ffill()

        Colorado     Texas  New York      Ohio
2000Q1 -0.109284 -0.362830  0.335046  0.098427
2000Q2 -0.109284 -0.362830  0.335046  0.098427
2000Q3 -0.109284 -0.362830  0.335046  0.098427
2000Q4 -0.109284 -0.362830  0.335046  0.098427
2001Q1 -0.484530  0.185885 -0.317265  0.158048
2001Q2 -0.484530  0.185885 -0.317265  0.158048
2001Q3 -0.484530  0.185885 -0.317265  0.158048
2001Q4 -0.484530  0.185885 -0.317265  0.158048


In [20]:
annual_frame.resample('Q-DEC', convention='end').ffill()

        Colorado     Texas  New York      Ohio
2000Q4 -0.109284 -0.362830  0.335046  0.098427
2001Q1 -0.109284 -0.362830  0.335046  0.098427
2001Q2 -0.109284 -0.362830  0.335046  0.098427
2001Q3 -0.109284 -0.362830  0.335046  0.098427
2001Q4 -0.484530  0.185885 -0.317265  0.158048
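
The start-versus-end decision is the same one that asfreq exposes through its how argument. The sketch below illustrates it on a single quarterly Period converted to monthly frequency (a smaller analogue of the annual-to-quarterly case above, chosen for brevity): 'start' places the value in the first month of the quarter and 'end' in the last.

```python
import pandas as pd

# One quarterly period: the first quarter of 2000, with the year ending in December.
quarter = pd.Period('2000Q1', freq='Q-DEC')

# Convert to monthly frequency, anchoring at either end of the quarter.
first_month = quarter.asfreq('M', how='start')
last_month = quarter.asfreq('M', how='end')

print(first_month, last_month)
```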


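Since periods refer to timespans, the rules about upsampling and downsampling are
more rigid:

• In downsampling, the target frequency must be a superperiod of the source
frequency; that is, each source period must fall entirely within one target period.

• In upsampling, the target frequency must be a subperiod of the source
frequency; that is, each target period must fall entirely within one source period.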

If these rules are not satisfied, an exception will be raised. This mainly affects the
quarterly, annual, and weekly frequencies; for example, the timespans defined by
Q-MAR only line up with A-MAR, A-JUN, A-SEP, and A-DEC:

In [21]:
annual_frame.resample('Q-MAR').ffill()

        Colorado     Texas  New York      Ohio
2000Q4 -0.109284 -0.362830  0.335046  0.098427
2001Q1 -0.109284 -0.362830  0.335046  0.098427
2001Q2 -0.109284 -0.362830  0.335046  0.098427
2001Q3 -0.109284 -0.362830  0.335046  0.098427
2001Q4 -0.484530  0.185885 -0.317265  0.158048
2002Q1 -0.484530  0.185885 -0.317265  0.158048
2002Q2 -0.484530  0.185885 -0.317265  0.158048
2002Q3 -0.484530  0.185885 -0.317265  0.158048
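
One way to see why Q-MAR lines up with annual frequencies ending in March, June, September, or December is to look at where its quarters end. This is a small illustrative sketch; the variable names are introduced here and are not part of the example above.

```python
import pandas as pd

# Four consecutive quarters of a fiscal year that ends in March.
quarters = pd.period_range('2000Q1', periods=4, freq='Q-MAR')

# Calendar months in which those quarters end.
end_months = sorted(q.end_time.month for q in quarters)
print(end_months)

# Under Q-MAR, fiscal 2000Q4 covers January through March 2000, which is
# why a calendar-year value for 2000 starts at the label 2000Q4.
fiscal_q4 = pd.Period('2000Q4', freq='Q-MAR')
print(fiscal_q4.start_time, fiscal_q4.end_time)
```

An annual period can only be split cleanly into these quarters if it ends in one of those months; otherwise a quarter would straddle two years.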
