In [1]:
import pandas as pd
import numpy as np

In [2]:
rng = pd.date_range('1/1/2011', periods=72, freq='H')
ts = pd.Series(range(len(rng)), index=rng)
ts.head()

2011-01-01 00:00:00    0
2011-01-01 01:00:00    1
2011-01-01 02:00:00    2
2011-01-01 03:00:00    3
2011-01-01 04:00:00    4
Freq: H, dtype: int64

In [3]:
converted = ts.asfreq('45Min', method='ffill')
converted.head()

2011-01-01 00:00:00    0
2011-01-01 00:45:00    0
2011-01-01 01:30:00    1
2011-01-01 02:15:00    2
2011-01-01 03:00:00    3
Freq: 45T, dtype: int64

In [4]:
# Does asfreq change the # of rows?
print(ts.shape)
print(converted.shape)

(72,)
(95,)


In [5]:
# What do the different methods do?
# method : {'backfill', 'bfill', 'pad', 'ffill', None}
# 'pad' is the same as 'ffill'
converted = ts.asfreq('45Min', method='pad')
converted.head()

2011-01-01 00:00:00    0
2011-01-01 00:45:00    0
2011-01-01 01:30:00    1
2011-01-01 02:15:00    2
2011-01-01 03:00:00    3
Freq: 45T, dtype: int64

In [6]:
# 'backfill' is the same as 'bfill'
converted = ts.asfreq('45Min', method='bfill')
converted.head()

2011-01-01 00:00:00    0
2011-01-01 00:45:00    1
2011-01-01 01:30:00    2
2011-01-01 02:15:00    3
2011-01-01 03:00:00    3
Freq: 45T, dtype: int64

In [7]:
converted = ts.asfreq('45Min', method=None)
converted.head()

2011-01-01 00:00:00    0.0
2011-01-01 00:45:00    NaN
2011-01-01 01:30:00    NaN
2011-01-01 02:15:00    NaN
2011-01-01 03:00:00    3.0
Freq: 45T, dtype: float64

### Might any of these methods have pitfalls from a logical point of view?
Backfilling copies values from the future into earlier timestamps. In a time series this leaks information, which can invalidate a predictive model built on the result. With method=None, every new timestamp that has no existing observation gets NaN, and the integer series is converted to float64 to hold those NaN values.
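A minimal sketch of the look-ahead problem, rebuilding the same hourly series. The aliases '60min' and '45min' are used here instead of 'H' and '45Min' only because newer pandas releases deprecate some of the older spellings; that substitution is an assumption of this example, not part of the cells above.

```python
import pandas as pd

# Same 72-point hourly series as in the cells above.
rng = pd.date_range('1/1/2011', periods=72, freq='60min')
ts = pd.Series(range(len(rng)), index=rng)

filled_back = ts.asfreq('45min', method='bfill')
filled_fwd = ts.asfreq('45min', method='ffill')
unfilled = ts.asfreq('45min')  # method=None is the default

t = pd.Timestamp('2011-01-01 00:45')
print(filled_back[t])  # taken from the 01:00 observation, i.e. the future
print(filled_fwd[t])   # taken from the 00:00 observation, i.e. the past

# Only timestamps shared with the original hourly index keep a value.
print(unfilled.notna().sum(), 'of', len(unfilled), 'rows have data')
print(unfilled.dtype)
```

At 00:45 the backfilled series already reports the value that is not observed until 01:00, which is the leak to watch for when preparing training data.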

### What's the difference between going to a higher frequency and a lower frequency?
Going to a lower frequency is simply a matter of selecting a subset of the existing data when the new period is an integer multiple of the original one, such as going from hourly data to data every two hours. Moving to a higher frequency makes the dataset grow, and the values between the original data points have to be inferred.

In [8]:
converted = ts.asfreq('2H', method='ffill')
converted.head()

2011-01-01 00:00:00    0
2011-01-01 02:00:00    2
2011-01-01 04:00:00    4
2011-01-01 06:00:00    6
2011-01-01 08:00:00    8
Freq: 2H, dtype: int64

### What's different logically about going to a higher frequency vs a lower frequency? 
Going to a lower frequency involves removing data, while going to a higher frequency involves adding data at times not yet sampled.
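The contrast can be checked directly. This is a sketch under the same assumptions as the cells above (the hourly series `ts`), with '60min', '120min' and '30min' written out as minute-based aliases for portability across pandas versions.

```python
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='60min')
ts = pd.Series(range(len(rng)), index=rng)

down = ts.asfreq('120min')  # lower frequency: every two hours
up = ts.asfreq('30min')     # higher frequency: every half hour, unfilled

print(len(ts), len(down), len(up))

# Every downsampled timestamp already existed in the original index.
print(down.index.isin(ts.index).all())

# Upsampling creates timestamps that were never observed.
print(up.index.isin(ts.index).all())
print(up.isna().sum(), 'new rows have no data')
```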

### What do you want to do when switching to a lower frequency that is not logical when switching to a higher frequency?
Ideally, when switching to a lower frequency, data already exists at every new timestamp, so rows only need to be dropped or combined into a summary value such as a mean or a sum. When switching to a higher frequency, it is impossible to already have data at all the desired times, so there is nothing to combine.

In [9]:
ts.resample('2H').mean()[1:10]

2011-01-01 02:00:00     2.5
2011-01-01 04:00:00     4.5
2011-01-01 06:00:00     6.5
2011-01-01 08:00:00     8.5
2011-01-01 10:00:00    10.5
2011-01-01 12:00:00    12.5
2011-01-01 14:00:00    14.5
2011-01-01 16:00:00    16.5
2011-01-01 18:00:00    18.5
Freq: 2H, dtype: float64

In [10]:
ts.resample('D').sum()

2011-01-01     276
2011-01-02     852
2011-01-03    1428
Freq: D, dtype: int64

In [11]:
# What if you want to downsample and you don't want to ffill or bfill?
# There is the None option, but it returns NaN for every timestamp we do not already have data for.
converted = ts.asfreq('90Min', method=None)
converted.head()

2011-01-01 00:00:00    0.0
2011-01-01 01:30:00    NaN
2011-01-01 03:00:00    3.0
2011-01-01 04:30:00    NaN
2011-01-01 06:00:00    6.0
Freq: 90T, dtype: float64
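One possible alternative, not used in the cells above, is to interpolate instead of filling. The sketch below assumes linear-in-time interpolation is acceptable for the data. It first moves to a 30-minute grid, which contains both the hourly and the 90-minute timestamps, so that no hourly observation is discarded before interpolating.

```python
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='60min')
ts = pd.Series(range(len(rng)), index=rng)

# 30 minutes divides both 60 and 90, so this grid holds every original
# observation plus every target timestamp.
fine = ts.asfreq('30min').interpolate(method='time')

# Select the 90-minute timestamps from the interpolated series.
interpolated = fine.asfreq('90min')
print(interpolated.head())
```

Unlike ffill or bfill, the value at 01:30 is estimated from both neighbouring observations, but it is still an inferred value rather than a measurement.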

### What is the difference between .resample() and .asfreq()?
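The asfreq method only changes the frequency: it picks (or fills) the value at each new timestamp. The resample method is much more flexible: it groups the data into bins at the new frequency, which allows aggregation operations over each bin.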
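A side-by-side sketch on the same hourly series, again with minute-based aliases as an assumption for portability:

```python
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='60min')
ts = pd.Series(range(len(rng)), index=rng)

picked = ts.asfreq('120min')             # value found at each 2-hour mark
averaged = ts.resample('120min').mean()  # mean of both hours in each bin

print(picked.head(3))
print(averaged.head(3))
```

Both results share the same index, but asfreq discards the odd hours while resample folds them into each bin's mean.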

### What are some special things you can do with .resample() you can't do with .asfreq()?
The resample method allows users to group data at a specified frequency and apply functions to the grouped data.
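For instance, several aggregations or a custom function can be applied per group. This is a small sketch; the lambda computing a daily range is a hypothetical example chosen for illustration.

```python
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='60min')
ts = pd.Series(range(len(rng)), index=rng)

# Several built-in aggregations at once: one column per function.
daily = ts.resample('D').agg(['min', 'max', 'sum'])
print(daily)

# A custom function applied to each daily group.
daily_range = ts.resample('D').apply(lambda day: day.max() - day.min())
print(daily_range)
```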