In [1]:
import pandas as pd
import numpy as np

In [2]:
rng = pd.date_range('1/1/2011', periods=72, freq='H')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [3]:
converted = ts.asfreq('45Min', method='pad')

In [4]:
# Does asfreq change the # of rows?
converted.shape

(95,)

In [5]:
ts.shape

(72,)

Yes, the number of rows changes: the 72 hourly observations span 71 hours, and a 45-minute grid over that span has 95 points.
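The row count can be checked by hand from the time span and the new step. The sketch below assumes the same 72-hour hourly index as above; `expected_rows` is a hypothetical helper written for this illustration, not a pandas function, and the lowercase aliases 'h' and 'min' are used because recent pandas versions prefer them.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='h')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

def expected_rows(index, step):
    """Count the points on a regular grid of width `step` covering `index` (hypothetical helper)."""
    span = index[-1] - index[0]              # 71 hours for this series
    return span // pd.Timedelta(step) + 1    # whole steps that fit, plus the first point

n_45 = expected_rows(ts.index, '45min')
n_actual = len(ts.asfreq('45min', method='pad'))
print(n_45, n_actual)
```

Both numbers should agree, since asfreq builds a regular grid from the first to the last timestamp of the original index.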

In [6]:
# What do the different methods do?
# method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}

From the pandas documentation:
* ‘pad’ / ‘ffill’: propagate last valid observation forward to next valid
* ‘backfill’ / ‘bfill’: use NEXT valid observation to fill
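To see the methods side by side, here is a minimal sketch on a three-point series. The values 10, 20, 30 and the names `small` and `filled` are made up for illustration.

```python
import pandas as pd

idx = pd.date_range('2011-01-01', periods=3, freq='h')
small = pd.Series([10.0, 20.0, 30.0], index=idx)

# Upsample from hourly to 30 minutes with each fill method.
filled = pd.DataFrame({
    'none': small.asfreq('30min'),                    # new timestamps stay NaN
    'pad': small.asfreq('30min', method='pad'),       # copy the previous observation
    'bfill': small.asfreq('30min', method='bfill'),   # copy the next observation
})
print(filled)
```

In the 00:30 row, the 'pad' column holds 10.0 (the 00:00 value) and the 'bfill' column holds 20.0 (the 01:00 value).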

In [7]:
# Might any of these methods have pitfalls from a logical point of view?

Pad (forward fill) is usually the safer choice. Backfill copies the next observation backward, so every filled value uses information from the future. In a predictive model this is look-ahead leakage, because future values are not available at prediction time.

In [8]:
# What's the difference between going to a higher frequency and a lower frequency?

A higher frequency means more data points in a given time range (upsampling). A lower frequency means fewer data points in the same time range (downsampling).

In [9]:
converted = ts.asfreq('90Min', method='bfill')
converted

2011-01-01 00:00:00    0.840635
2011-01-01 01:30:00   -1.548443
2011-01-01 03:00:00    0.368248
2011-01-01 04:30:00   -0.331805
2011-01-01 06:00:00   -0.069850
2011-01-01 07:30:00   -1.929173
2011-01-01 09:00:00   -1.357124
2011-01-01 10:30:00    0.048016
2011-01-01 12:00:00   -0.644281
2011-01-01 13:30:00    0.403824
2011-01-01 15:00:00    0.532065
2011-01-01 16:30:00    0.720402
2011-01-01 18:00:00    0.921410
2011-01-01 19:30:00    1.306856
2011-01-01 21:00:00   -0.970646
2011-01-01 22:30:00    1.402118
2011-01-02 00:00:00    0.828029
2011-01-02 01:30:00    0.640548
2011-01-02 03:00:00   -1.190857
2011-01-02 04:30:00   -0.326832
2011-01-02 06:00:00    0.290995
2011-01-02 07:30:00   -0.624131
2011-01-02 09:00:00    0.394241
2011-01-02 10:30:00   -0.852693
2011-01-02 12:00:00    0.270712
2011-01-02 13:30:00   -0.391201
2011-01-02 15:00:00   -0.614069
2011-01-02 16:30:00    0.632296
2011-01-02 18:00:00    0.777989
2011-01-02 19:30:00   -0.348001
2011-01-02 21:00:00   -0.120516
2011-01-

In [10]:
# What's different logically about going to a higher frequency vs a lower frequency? 
# What do you want to do when switching to a lower frequency that is not logical when switching to a higher frequency?

Going to a higher frequency (upsampling) creates timestamps that were never observed, so the only decision is how to fill them, for example with ffill or bfill; no data is lost. Going to a lower frequency (downsampling) puts several observations into each new interval, so we have to decide how to combine them, for example with a sum or a mean. Aggregating is not meaningful when upsampling, because each new interval holds at most one original observation.
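A minimal sketch of the two directions, assuming an hourly series whose values are simply 0 to 71 (instead of random numbers) so the results are easy to check by eye:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='h')
ts = pd.Series(np.arange(72, dtype=float), index=rng)

# Downsampling: 24 hourly values fall into each day, so they must be combined.
daily_mean = ts.resample('D').mean()

# Upsampling: half-hour timestamps were never observed, so they must be filled.
half_hourly = ts.resample('30min').ffill()

print(daily_mean)
print(half_hourly.head(4))
```

The daily result has one row per day, while the half-hourly result repeats each hourly value at the following half-hour mark.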

In [11]:
ts.resample('D').sum()

2011-01-01    3.956656
2011-01-02    0.461862
2011-01-03   -2.103953
Freq: D, dtype: float64

In [12]:
# What if you want to downsample and you don't want to ffill or bfill?

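Leave the fill method as None (the default). .asfreq() then reports the value observed at each new timestamp and inserts NaN wherever the original series has no observation at that timestamp. If the goal is to summarise each interval instead, use .resample() with an aggregation such as .mean() or .sum().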
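As a rough illustration, assume an hourly series with values 0 to 71 and a 90-minute target frequency. Every second target timestamp falls on a half hour, which the hourly series never observed:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='h')
ts = pd.Series(np.arange(72, dtype=float), index=rng)

# method=None is the default: no filling, unmatched timestamps become NaN.
unfilled = ts.asfreq('90min')
n_missing = int(unfilled.isna().sum())
print(unfilled.head(4))
print(n_missing)

# Alternative without any fill: average the observations in each 90-minute bin.
averaged = ts.resample('90min').mean()
print(averaged.head(3))
```

The first version keeps only exact matches, and the second uses every observation once.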

In [13]:
# What is the difference between .resample() and .asfreq()?

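.asfreq() is more limited than .resample(). .asfreq() only reindexes the series onto the new frequency, optionally filling the gaps. .resample() returns a Resampler object that groups the data into time bins, much like groupby, and can then apply aggregations that are not possible with .asfreq().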
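The contrast shows up clearly on a daily target frequency. This sketch again assumes an hourly series with values 0 to 71:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='h')
ts = pd.Series(np.arange(72, dtype=float), index=rng)

# asfreq selects the single observation sitting on each daily timestamp (midnight).
picked = ts.asfreq('D')

# resample returns a grouping object; nothing is computed until a method is called on it.
resampler = ts.resample('D')
print(type(resampler).__name__)

# Calling an aggregation uses all 24 observations of each day.
summed = resampler.sum()
print(picked)
print(summed)
```

Here asfreq discards 23 of every 24 values, whereas the resampled sum includes all of them.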

In [14]:
# What are some special things you can do with .resample() you can't do with .asfreq()?

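With .resample() you can aggregate the values in each interval while downsampling, for example with .sum(), .mean(), .ohlc(), or several functions at once through .agg(). None of this is possible with .asfreq().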
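Two such aggregations, sketched on an assumed hourly series with values 0 to 71:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=72, freq='h')
ts = pd.Series(np.arange(72, dtype=float), index=rng)

# Several summary statistics per day in one call; the result is a DataFrame.
stats = ts.resample('D').agg(['min', 'max', 'mean'])
print(stats)

# Open/high/low/close bars per day, a common summary for price-like series.
bars = ts.resample('D').ohlc()
print(bars)
```

Each call returns one row per day with one column per statistic.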