In [1]:
import numpy as np
import pandas as pd

from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline
pd.set_option('display.precision', 3)

## Contents
- Pandas is at its most powerful when handling time-series data
- Many more time-series capabilities exist than are shown here
  - See the documentation: http://pandas.pydata.org/pandas-docs/stable/timeseries.html

### Scraping web data
~~~~
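from pandas_datareader import data, wb
# Bind the result to a new name so the imported `data` module is not overwritten.
# Note: the 'google' data source may be unavailable in recent pandas_datareader releases.
msft = data.DataReader('MSFT', data_source='google', start='2012-01-01', end='2013-12-30')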
~~~~
### DatetimeIndex

- Specific dates are represented as a pandas Timestamp class
     - Based on the NumPy datetime64 high precision objects (_nanoseconds_!!!!)
- Sequences of timestamp objects are represented as a DatetimeIndex object
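The two bullets above can be checked directly. This is a minimal sketch; the variable names `ts_ns` and `idx` are made up for illustration.

```python
import pandas as pd

# A single point in time is a Timestamp, which keeps nanosecond precision.
ts_ns = pd.Timestamp('2014-08-01 00:00:00.000000001')
print(ts_ns.nanosecond)          # nanosecond component of the timestamp

# A sequence of Timestamps is stored as a DatetimeIndex backed by datetime64.
idx = pd.DatetimeIndex([pd.Timestamp('2014-08-01'), pd.Timestamp('2014-08-02')])
print(type(idx[0]).__name__)     # each element comes back as a Timestamp
print(idx.dtype.kind)            # 'M' is NumPy's type code for datetime64
```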

#### distinct datetime objects --> a single DatetimeIndex object
~~~~
dates = [datetime(2014, 8, 1), datetime(2014, 8, 2)]
dti = pd.DatetimeIndex(dates)
~~~~
#### Series and DataFrame with DatetimeIndex
**Creation of a Series with a DatetimeIndex**
1. Create by passing a pre-existing index of datetime objects
~~~~
dates = [datetime(2014, 8, 1), datetime(2014, 8, 2)]
ts = pd.Series(np.random.randn(2), index = dates)
~~~~
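2. Create by passing strings in any common format, converted to a DatetimeIndex
~~~~
dates = ['2012-01-01', '2012-01-02']
# Plain strings alone would give an ordinary string index, so convert them first
ts = pd.Series(np.random.randn(2), index = pd.DatetimeIndex(dates))
~~~~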
3. Create the DatetimeIndex directly by passing the values to pd.to_datetime()
~~~~
dates = ['Aug 1, 2014', '2014-08-02', '2014.8.3', None]
dti = pd.to_datetime(dates)
pd.Series(np.random.randn(4), index = dti)
~~~~

4. Use **errors = 'coerce'** to make pd.to_datetime() tolerate values it cannot parse
   - Each unparseable value is set to NaT (Not a Time)
~~~~
pd.to_datetime(['Aug 1, 2014', 'foo'], errors='coerce')
~~~~
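As a hedged illustration of the creation options above, the sketch below compares the index type produced by raw strings with the one produced after conversion, and shows what `errors='coerce'` returns. The names `raw`, `s_str`, `s_dt` and `parsed` are illustrative only.

```python
import pandas as pd

raw = ['2012-01-01', '2012-01-02']

# Plain strings are kept as labels; the index is not date-aware.
s_str = pd.Series([1.0, 2.0], index=raw)
print(isinstance(s_str.index, pd.DatetimeIndex))   # False

# Converting first gives a DatetimeIndex.
s_dt = pd.Series([1.0, 2.0], index=pd.to_datetime(raw))
print(isinstance(s_dt.index, pd.DatetimeIndex))    # True

# errors='coerce' turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(['2014-08-01', 'foo'], errors='coerce')
print(parsed.isna().tolist())                      # [False, True]
```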

**Best practice for time-series: keep the data in a DataFrame and select rows with .loc**
- Example of the inconsistency:

  - **Series selection**
~~~~
ts['2012-01-01'] # works!
ts['2012-01-01':'2012-01-05'] # works!
~~~~
  - **DataFrame selection**
~~~~
ts = pd.DataFrame(ts)
ts['2012-01-01':'2012-01-05'] # works!
ts['2012-01-01'] # FAILS!  DataFrame thinks you're saying find a column called '2012-01-01'
~~~~

- Best practice with .loc:

  - **Use df.loc**.  It always selects rows by index label
~~~~
ts = pd.DataFrame(ts)
ts.loc['2012-01-01'] 
ts.loc['2012-01-01':'2012-01-05']
~~~~
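The inconsistency can be reproduced on a small synthetic frame. This is a sketch under assumptions: the column name `value` and the five-day index are invented for the example and are not the data used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one column 'value' over five consecutive days.
idx = pd.date_range('2012-01-01', periods=5)
df = pd.DataFrame({'value': np.arange(5)}, index=idx)

# df['2012-01-01'] looks for a *column* with that name and raises KeyError.
try:
    df['2012-01-01']
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True
print(column_lookup_failed)

# .loc always addresses rows by index label.
row = df.loc['2012-01-01']                    # a single row, returned as a Series
window = df.loc['2012-01-02':'2012-01-04']    # both endpoints are included
print(row['value'], len(window))
```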

### Create time-series with specific frequencies, Date offsets

**freq** keyword in pd.date_range()
- freq = 'T' --> minutes ('min' is an equivalent alias)
- See full list of offset aliases here: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
~~~~
minute_interval = pd.date_range('2014-08-01', '2014-10-29 23:59:00', freq='T')
bymin = pd.Series(np.arange(0, len(minute_interval)), index = minute_interval)
~~~~
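A quick sanity check of the minute-level index, written as a sketch. It uses the `'min'` alias rather than `'T'`, and the names `minute_idx`, `bymin_demo` and `half_hour` are illustrative.

```python
import numpy as np
import pandas as pd

# 'min' is the minute alias; the text above uses the older spelling 'T'.
minute_idx = pd.date_range('2014-08-01', '2014-10-29 23:59:00', freq='min')
bymin_demo = pd.Series(np.arange(len(minute_idx)), index=minute_idx)

# 90 days * 24 hours * 60 minutes = 129600 rows
print(len(bymin_demo))

# String slices on a DatetimeIndex include both endpoints.
half_hour = bymin_demo.loc['2014-08-01 12:30':'2014-08-01 12:59']
print(len(half_hour), half_hour.iloc[0], half_hour.iloc[-1])
```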

### Representations of time intervals using periods

1. 'Period' objects efficiently store bounded intervals
~~~~
aug2014 = pd.Period('2014-08', freq='M')
aug2014_T = pd.Period('2014-08', freq='T')
~~~~
2. Attributes such as start_time and end_time give useful info
~~~~
aug2014.start_time, aug2014.end_time
# end_time has nanosecond resolution because it is the last observable instant of the month
~~~~
3. Period objects are useful because they handle the subtleties of date & time arithmetic
~~~~
feb_2014 = pd.Period('2014-02', freq='M') # A Period object
feb_2014.start_time, feb_2014.end_time # returns (Timestamp('2014-02-01 00:00:00'), Timestamp('2014-02-28 23:59:59.999999999'))
feb_2014 + 1  # Returns a new Period object that is equivalent to pd.Period('2014-03', freq='M')
    # The addition moves in units of the original period's frequency (here, whole months)
~~~~

4. **pd.period_range** generates a sequence of Period objects as a PeriodIndex
5. Use a PeriodIndex to construct a Series
- **The time series is indexed by whole periods, not by single timestamps**
- **This will become important later when we resample to different frequencies**
~~~~
my_monthly_periods = pd.period_range('1/1/2013', '12/1/2013', freq='M')
pd.Series(np.random.randn(len(my_monthly_periods)), index = my_monthly_periods)
~~~~
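To make "indexed by whole periods" concrete, here is a small sketch. The names `months`, `monthly` and `as_stamps` are illustrative, and the values are just 0 to 11 so the lookups are easy to follow.

```python
import numpy as np
import pandas as pd

months = pd.period_range('2013-01', '2013-12', freq='M')
monthly = pd.Series(np.arange(12), index=months)

# Each label is a whole month, so any date inside March maps to the March period.
march = pd.Period('2013-03-15', freq='M')
print(march, monthly.loc[march])

# to_timestamp() converts period labels to timestamps at the start of each period.
as_stamps = monthly.to_timestamp()
print(as_stamps.index[0])
```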

### Shifting and lagging

### Frequency conversion 

### Upsampling and downsampling

### DatetimeIndex


In [2]:
print('Create a DatetimeIndex using the datetime library')
dates = [datetime(2014, 8, 1), datetime(2014, 8, 2)]
dti = pd.DatetimeIndex(dates)
dti

Create a DatetimeIndex using the datetime library


DatetimeIndex(['2014-08-01', '2014-08-02'], dtype='datetime64[ns]', freq=None)

In [3]:
print('A series passed datetime objects will automatically create a DatetimeIndex from them')
np.random.seed(123456)
ts = pd.Series(np.random.randn(2), index = dates)
type(ts.index)

A series passed datetime objects will automatically create a DatetimeIndex from them


pandas.core.indexes.datetimes.DatetimeIndex

In [4]:
print('Our Series with DatetimeIndex:')
ts

Our Series with DatetimeIndex:


2014-08-01    0.469
2014-08-02   -0.283
dtype: float64

In [5]:
print('Flexibly accessing the elements in the Series:')
print(ts[datetime(2014, 8, 2)])
print(ts['2014-8-2'])

Flexibly accessing the elements in the Series:
-0.282863344329
-0.282863344329


In [6]:
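print('Using strings instead of datetime objects for Date Series creation:')
dates = ['2014-08-01', '2014-08-02']
# Convert the strings so the Series gets a DatetimeIndex rather than a plain string index
ts = pd.Series(np.random.randn(2), pd.DatetimeIndex(dates))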

Using strings instead of datetime objects for Date Series creation:


#### pd.to_datetime() - string manipulation to DatetimeIndex

In [7]:
print('Create a DatetimeIndex even from gratuitously mixing date formats.')
print('An invalid type is converted to NaT, which means Not A Time')
dti = pd.to_datetime(['Aug 1, 2014', '2014-08-02', '2014.8.3', None])
dti

Create a DatetimeIndex even from gratuitously mixing date formats.
An invalid type is converted to NaT, which means Not A Time


DatetimeIndex(['2014-08-01', '2014-08-02', '2014-08-03', 'NaT'], dtype='datetime64[ns]', freq=None)

In [8]:
print('Pandas will unexpectedly throw an error if it can\'t parse a value in to_datetime()')
print('To avoid that, set errors = \'coerce\'')
pd.to_datetime(['Aug 1, 2014', 'foo'], errors='coerce')

Pandas will unexpectedly throw an error if it can't parse a value in to_datetime()
To avoid that, set errors = 'coerce'


DatetimeIndex(['2014-08-01', 'NaT'], dtype='datetime64[ns]', freq=None)

In [9]:
print('Set date strings to be month or day first.  Month first (AMERICA) is default.')
dti1 = pd.to_datetime(['8/1/2014'])
dti2 = pd.to_datetime(['2/8/2014'], dayfirst=True)
dti1[0], dti2[0]

Set date strings to be month or day first.  Month first (AMERICA) is default.


(Timestamp('2014-08-01 00:00:00'), Timestamp('2014-08-02 00:00:00'))
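When the layout of the strings is known in advance, an explicit `format` removes the month-first versus day-first guess altogether. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

ambiguous = '2/8/2014'

month_first = pd.to_datetime(ambiguous)                    # read as February 8 (default)
day_first = pd.to_datetime(ambiguous, dayfirst=True)       # read as August 2
explicit = pd.to_datetime(ambiguous, format='%d/%m/%Y')    # August 2, no guessing

print(month_first.date(), day_first.date(), explicit.date())
```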

#### pd.date_range(start, end, periods = X) - Create a DatetimeIndex.  Must specify two of start, end, periods

In [10]:
np.random.seed(123456)
dates = pd.date_range('8/1/2014', periods = 10)
s1 = pd.Series(np.random.randn(10), dates)
s1[:5]

2014-08-01    0.469
2014-08-02   -0.283
2014-08-03   -1.509
2014-08-04   -1.136
2014-08-05    1.212
Freq: D, dtype: float64
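The "two of three" rule can be sketched as follows: with the default daily frequency, each pair of arguments describes the same ten days. The names `a`, `b` and `c` are illustrative.

```python
import pandas as pd

# start + periods
a = pd.date_range(start='2014-08-01', periods=10)
# end + periods
b = pd.date_range(end='2014-08-10', periods=10)
# start + end
c = pd.date_range(start='2014-08-01', end='2014-08-10')

print(a.equals(b) and b.equals(c))
print(a[0].date(), a[-1].date())
```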

In [11]:
print('Read data from the web about Microsoft in 2012 - 2013')
#pd.read_csv("data/Chapter03/data/msft.csv")
from pandas_datareader import data, wb
msft = data.DataReader('MSFT', data_source='google', start='2012-01-01', end='2013-12-30')

Read data from the web about Microsoft in 2012 - 2013


In [12]:
print('Slice a range by passing strings')
msft['2012-01-01':'2012-01-05']

Slice a range by passing strings


             Open   High    Low  Close    Volume
Date
2012-01-03  26.55  26.96  26.39  26.76  64735391
2012-01-04  26.82  27.47  26.78  27.40  80519402
2012-01-05  27.38  27.73  27.29  27.68  56082205


In [13]:
print('However, passing a single value doesn\'t work because that\'s interpreted as a column')
msft['2012-01-01']

However, passing a single value doesn't work because that's interpreted as a column


KeyError: '2012-01-01'

In [14]:
print('If we create a Series from the DataFrame, then passing a single value does work.  CONFUSING!')
msftC = msft['Close']
msftC['2012-01-03']

If we create a Series from the DataFrame, then passing a single value does work.  CONFUSING!


26.760000000000002

**Best practice for time-series is to always convert a Series to a DataFrame and use .loc!**

In [15]:
print('Use loc like any other DataFrame')
print('A single index value:')
msft.loc['2012-01-03']

Use loc like any other DataFrame
A single index value:


Open      2.655e+01
High      2.696e+01
Low       2.639e+01
Close     2.676e+01
Volume    6.474e+07
Name: 2012-01-03 00:00:00, dtype: float64

In [16]:
print('A Datetime index range selection:')
msft.loc['2012-01-03':'2012-01-5']

A Datetime index range selection:


             Open   High    Low  Close    Volume
Date
2012-01-03  26.55  26.96  26.39  26.76  64735391
2012-01-04  26.82  27.47  26.78  27.40  80519402
2012-01-05  27.38  27.73  27.29  27.68  56082205


In [17]:
print('Can pass partial datetime values and get sensible output:')
msft.loc['2012-02':'2012-03'] # Feb and March 

Can pass partial datetime values and get sensible output:


             Open   High    Low  Close    Volume
Date
2012-02-01  29.79  30.05  29.76  29.89  67413817
2012-02-02  29.90  30.17  29.71  29.95  52226255
2012-02-03  30.14  30.40  30.09  30.24  41845397
2012-02-06  30.04  30.22  29.97  30.20  28040378
2012-02-07  30.15  30.48  30.05  30.35  39242529
2012-02-08  30.26  30.67  30.22  30.66  49662740
2012-02-09  30.68  30.80  30.48  30.77  50481549
2012-02-10  30.64  30.80  30.36  30.50  44606751
2012-02-13  30.63  30.77  30.43  30.58  33322516
2012-02-14  30.33  30.46  29.85  30.25  59662711
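Partial-string selection can be tried without downloading anything. The sketch below uses a synthetic stand-in for the price data: one row per business day in 2012 (weekdays only, holidays not removed), with a made-up `Close` column. The names `bdays`, `prices`, `feb` and `feb_mar` are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the price data: one row per business day in 2012.
bdays = pd.bdate_range('2012-01-01', '2012-12-31')
prices = pd.DataFrame({'Close': np.arange(len(bdays), dtype=float)}, index=bdays)

feb = prices.loc['2012-02']                  # every row in February
feb_mar = prices.loc['2012-02':'2012-03']    # February through the end of March

print(len(feb), len(feb_mar))
print(feb_mar.index.min().date(), feb_mar.index.max().date())
```

A month string used as a slice endpoint covers the whole month, which is why the second selection runs through the last business day of March.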


### Time-series with Specific Frequencies

In [18]:
print('A Series with minute-level intervals.')
minute_interval = pd.date_range('2014-08-01', '2014-10-29 23:59:00', freq='T')
bymin = pd.Series(np.arange(0, len(minute_interval)), minute_interval)
bymin

A Series with minute-level intervals.


2014-08-01 00:00:00         0
2014-08-01 00:01:00         1
2014-08-01 00:02:00         2
2014-08-01 00:03:00         3
2014-08-01 00:04:00         4
2014-08-01 00:05:00         5
2014-08-01 00:06:00         6
2014-08-01 00:07:00         7
2014-08-01 00:08:00         8
2014-08-01 00:09:00         9
2014-08-01 00:10:00        10
2014-08-01 00:11:00        11
2014-08-01 00:12:00        12
2014-08-01 00:13:00        13
2014-08-01 00:14:00        14
2014-08-01 00:15:00        15
2014-08-01 00:16:00        16
2014-08-01 00:17:00        17
2014-08-01 00:18:00        18
2014-08-01 00:19:00        19
2014-08-01 00:20:00        20
2014-08-01 00:21:00        21
2014-08-01 00:22:00        22
2014-08-01 00:23:00        23
2014-08-01 00:24:00        24
2014-08-01 00:25:00        25
2014-08-01 00:26:00        26
2014-08-01 00:27:00        27
2014-08-01 00:28:00        28
2014-08-01 00:29:00        29
                        ...  
2014-10-29 23:30:00    129570
2014-10-29 23:31:00    129571
2014-10-29

In [19]:
print('Slice the series - all data in a half hour range')
bymin['2014-08-01 12:30':'2014-08-01 12:59']

Slice the series - all data in a half hour range


2014-08-01 12:30:00    750
2014-08-01 12:31:00    751
2014-08-01 12:32:00    752
2014-08-01 12:33:00    753
2014-08-01 12:34:00    754
2014-08-01 12:35:00    755
2014-08-01 12:36:00    756
2014-08-01 12:37:00    757
2014-08-01 12:38:00    758
2014-08-01 12:39:00    759
2014-08-01 12:40:00    760
2014-08-01 12:41:00    761
2014-08-01 12:42:00    762
2014-08-01 12:43:00    763
2014-08-01 12:44:00    764
2014-08-01 12:45:00    765
2014-08-01 12:46:00    766
2014-08-01 12:47:00    767
2014-08-01 12:48:00    768
2014-08-01 12:49:00    769
2014-08-01 12:50:00    770
2014-08-01 12:51:00    771
2014-08-01 12:52:00    772
2014-08-01 12:53:00    773
2014-08-01 12:54:00    774
2014-08-01 12:55:00    775
2014-08-01 12:56:00    776
2014-08-01 12:57:00    777
2014-08-01 12:58:00    778
2014-08-01 12:59:00    779
Freq: T, dtype: int64

### Intervals of time using Periods


In [20]:
print('The passed time string is coerced as needed to match the format of the frequency')
aug2014 = pd.Period('2014-08', freq='M')
aug2014_T = pd.Period('2014-08', freq='T')
(aug2014, aug2014_T)

The passed time string is coerced as needed to match the format of the frequency


(Period('2014-08', 'M'), Period('2014-08-01 00:00', 'T'))

In [21]:
print('Period objects have a lot of internal variables that can be accessed ')
aug2014.start_time, aug2014.end_time

Period objects have a lot of internal variables that can be accessed 


(Timestamp('2014-08-01 00:00:00'), Timestamp('2014-08-31 23:59:59.999999999'))

In [22]:
print('Operations are overloaded to conform to the Period\'s initial freq')
aug2014 + 1, aug2014_T + 1

Operations are overloaded to conform to the Period's initial freq


(Period('2014-09', 'M'), Period('2014-08-01 00:01', 'T'))

In [23]:
print('The result of the operation is a new Period object that mimics the initial one,')
print('as if the new object were created by pd.Period(_new_time_, freq=_freq_).')
print('Adding 1 to aug2014 gives start_time and end_time of:')
sep2014 = aug2014 + 1
sep2014.start_time, sep2014.end_time

The result of the operation is a new Period object that mimics the initial one,
as if the new object were created by pd.Period(_new_time_, freq=_freq_).
Adding 1 to aug2014 gives start_time and end_time of:


(Timestamp('2014-09-01 00:00:00'), Timestamp('2014-09-30 23:59:59.999999999'))
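Month lengths and year boundaries are where Period arithmetic saves the most effort. A small sketch (the `feb_2012` leap-year case is an added illustration, not part of the cells above):

```python
import pandas as pd

feb_2014 = pd.Period('2014-02', freq='M')
feb_2012 = pd.Period('2012-02', freq='M')   # leap year

print(feb_2014.end_time.day)   # 28
print(feb_2012.end_time.day)   # 29

# Adding an integer moves by whole periods of the same frequency.
print(feb_2014 + 1)            # 2014-03
print(feb_2014 + 11)           # 2015-01
```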

#### PeriodIndex

In [24]:
print('A data structure containing month-level frequencies for the year 2013')
mp2013 = pd.period_range('1/1/2013', '12/31/2013', freq='M')
mp2013

A data structure containing month-level frequencies for the year 2013


PeriodIndex(['2013-01', '2013-02', '2013-03', '2013-04', '2013-05', '2013-06',
             '2013-07', '2013-08', '2013-09', '2013-10', '2013-11', '2013-12'],
            dtype='period[M]', freq='M')

In [25]:
print('Examine the contents of the data structure')
for p in mp2013:
    print("%s %s %s %s" % (p, p.freq, p.start_time, p.end_time))

Examine the contents of the data structure
2013-01 <MonthEnd> 2013-01-01 00:00:00 2013-01-31 23:59:59.999999999
2013-02 <MonthEnd> 2013-02-01 00:00:00 2013-02-28 23:59:59.999999999
2013-03 <MonthEnd> 2013-03-01 00:00:00 2013-03-31 23:59:59.999999999
2013-04 <MonthEnd> 2013-04-01 00:00:00 2013-04-30 23:59:59.999999999
2013-05 <MonthEnd> 2013-05-01 00:00:00 2013-05-31 23:59:59.999999999
2013-06 <MonthEnd> 2013-06-01 00:00:00 2013-06-30 23:59:59.999999999
2013-07 <MonthEnd> 2013-07-01 00:00:00 2013-07-31 23:59:59.999999999
2013-08 <MonthEnd> 2013-08-01 00:00:00 2013-08-31 23:59:59.999999999
2013-09 <MonthEnd> 2013-09-01 00:00:00 2013-09-30 23:59:59.999999999
2013-10 <MonthEnd> 2013-10-01 00:00:00 2013-10-31 23:59:59.999999999
2013-11 <MonthEnd> 2013-11-01 00:00:00 2013-11-30 23:59:59.999999999
2013-12 <MonthEnd> 2013-12-01 00:00:00 2013-12-31 23:59:59.999999999


In [26]:
print('Series from a PeriodIndex')
np.random.seed(123456)
ps = pd.Series(np.random.randn(12), index=mp2013)
ps

Series from a PeriodIndex


2013-01    0.469
2013-02   -0.283
2013-03   -1.509
2013-04   -1.136
2013-05    1.212
2013-06   -0.173
2013-07    0.119
2013-08   -1.044
2013-09   -0.862
2013-10   -2.105
2013-11   -0.495
2013-12    1.072
Freq: M, dtype: float64

### Shifting and Lagging time-series data

Practical Applications
- Calculating day-over-day changes in closing prices

Notes
1. Shifting by a number of rows while keeping the original index
 -  Values pushed past the head or tail of the index are lost, and the vacated positions become NaN
~~~~
msftC.shift(2) # The first two date values become NaN's 
~~~~

2. Shifting by a multiple of a given frequency
 - Will NOT lose data. The index labels move by the given offset and the values stay attached to them
~~~~
msft.shift(2, freq='S')
~~~~

3. tshift is a more specific version of shift that always uses frequency shifting. Syntax is identical. (tshift was deprecated in pandas 1.1 and removed in 2.0; shift with freq does the same job.)
~~~~
msft.tshift(2, freq='S')
~~~~
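The difference between the two kinds of shift is easiest to see side by side. A minimal sketch on a four-day series; the names `s`, `by_rows` and `by_time` are illustrative.

```python
import pandas as pd

idx = pd.date_range('2012-01-03', periods=4, freq='D')
s = pd.Series([26.76, 27.40, 27.68, 28.10], index=idx)

by_rows = s.shift(1)            # values slide down one row; the first becomes NaN
by_time = s.shift(1, freq='D')  # labels move forward one day; values are untouched

print(by_rows.isna().sum())             # 1
print(by_time.index[0].date())          # 2012-01-04
print(by_time.tolist() == s.tolist())   # True
```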

In [42]:
print('Shifting forward 1 time unit (can also shift backwards using negative numbers passed to shift() ): ')
print('Original:')
print(msftC[:5])
shifted_forward = msftC.shift(1)

print('\nShifted:')
print(shifted_forward[:5])

Shifting forward 1 time unit (can also shift backwards using negative numbers passed to shift() ): 
Original:
Date
2012-01-03    26.76
2012-01-04    27.40
2012-01-05    27.68
2012-01-06    28.10
2012-01-09    27.74
Name: Close, dtype: float64

Shifted:
Date
2012-01-03      NaN
2012-01-04    26.76
2012-01-05    27.40
2012-01-06    27.68
2012-01-09    28.10
Name: Close, dtype: float64


In [45]:
print('Shifting by a constant frequency value. No data is lost: ')
msftC.shift(1, freq='S')[:5]

Shifting by a constant frequency value. No data is lost: 


Date
2012-01-03 00:00:01    26.76
2012-01-04 00:00:01    27.40
2012-01-05 00:00:01    27.68
2012-01-06 00:00:01    28.10
2012-01-09 00:00:01    27.74
Name: Close, dtype: float64

In [53]:
print('tshift is an alternate form that is more specific for time series.')
print('Shift will work on all series.  Tshift will always use a frequency shift, either provided or inferred from the index.')
msftC.tshift(1, freq='D')[:5]

tshift is an alternate form that is more specific for time series.
Shift will work on all series.  Tshift will always use a frequency shift, either provided or inferred from the index.


Date
2012-01-04    26.76
2012-01-05    27.40
2012-01-06    27.68
2012-01-07    28.10
2012-01-10    27.74
Name: Close, dtype: float64

In [57]:
print('Calculate day-over-day change in the closing price:')
msftC / msftC.shift(1, freq='D') - 1

Calculate day-over-day change in the closing price:


Date
2012-01-03      NaN
2012-01-04    0.024
2012-01-05    0.010
2012-01-06    0.015
2012-01-07      NaN
2012-01-09      NaN
2012-01-10    0.004
2012-01-11   -0.004
2012-01-12    0.010
2012-01-13    0.009
2012-01-14      NaN
2012-01-17      NaN
2012-01-18   -0.001
2012-01-19   -0.004
2012-01-20    0.057
2012-01-21      NaN
2012-01-23      NaN
2012-01-24   -0.013
2012-01-25    0.007
2012-01-26   -0.002
2012-01-27   -0.009
2012-01-28      NaN
2012-01-30      NaN
2012-01-31   -0.003
2012-02-01    0.012
2012-02-02    0.002
2012-02-03    0.010
2012-02-04      NaN
2012-02-06      NaN
2012-02-07    0.005
              ...  
2013-11-27    0.007
2013-11-28      NaN
2013-11-29      NaN
2013-11-30      NaN
2013-12-02      NaN
2013-12-03   -0.004
2013-12-04    0.016
2013-12-05   -0.024
2013-12-06    0.009
2013-12-07      NaN
2013-12-09      NaN
2013-12-10   -0.015
2013-12-11   -0.013
2013-12-12   -0.010
2013-12-13   -0.014
2013-12-14      NaN
2013-12-16      NaN
2013-12-17   -0.010
2013-12-18    0
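The NaN entries in the output above arise because shifting with `freq='D'` lines each close up against the previous calendar day, which has no data after a weekend or holiday. One possible alternative, sketched here on three closing prices taken from the earlier output, is to compare against the previous row instead. The names `close`, `calendar` and `by_row` are illustrative.

```python
import pandas as pd

idx = pd.to_datetime(['2012-01-05', '2012-01-06', '2012-01-09'])  # Thu, Fri, Mon
close = pd.Series([27.68, 28.10, 27.74], index=idx)

calendar = close / close.shift(1, freq='D') - 1   # aligns on calendar days
by_row = close / close.shift(1) - 1               # compares with the previous row

print(calendar)              # Monday is NaN: there is no Sunday close to divide by
print(by_row)                # Monday is compared with Friday
print(close.pct_change())    # built-in equivalent of the row-based version
```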

### Frequency Conversion of time-series data

1. df.asfreq() converts a time series to a new frequency
2. Upsampling: requires filling in data or you're left with NaN's
3. Downsampling: requires a data reduction (e.g. aggregation with min, max, mean, etc.)

~~~~
print(sample.asfreq("H")) # Upsampling to hourly leaves us with NaN
print(sample.asfreq("H").ffill()) # Forward Fill, for example, to fill them in
~~~~
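Both directions can be sketched on a two-point series. This is an illustration under assumptions: a 6-hour offset object is used instead of the `"H"` alias, the downsampling step uses `resample(...).mean()` as one possible reduction, and the names `sample_demo`, `six_hourly`, `filled` and `daily_mean` are made up.

```python
import pandas as pd

idx = pd.to_datetime(['2012-01-03', '2012-01-04'])
sample_demo = pd.Series([26.76, 27.40], index=idx)

# Upsample: the new 6-hour slots have no data, so they start out as NaN.
six_hourly = sample_demo.asfreq(pd.offsets.Hour(6))
filled = six_hourly.ffill()            # carry the last known value forward

# Downsample: several rows collapse into one, so an aggregation is required.
daily_mean = filled.resample('D').mean()

print(six_hourly.isna().sum())   # 3
print(filled.tolist())
print(daily_mean)
```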

In [65]:
print('Frequency conversion with df.asfreq()')
sample = msftC[:2]

print('Upsampling to hourly leaves us with NaN\'s:')
print(sample.asfreq("H"))

print('\nForward Fill, for example, to fill them in: ')
print(sample.asfreq("H").ffill())

Frequency conversion with df.asfreq()
Upsampling to hourly leaves us with NaN's:
Date
2012-01-03 00:00:00    26.76
2012-01-03 01:00:00      NaN
2012-01-03 02:00:00      NaN
2012-01-03 03:00:00      NaN
2012-01-03 04:00:00      NaN
2012-01-03 05:00:00      NaN
2012-01-03 06:00:00      NaN
2012-01-03 07:00:00      NaN
2012-01-03 08:00:00      NaN
2012-01-03 09:00:00      NaN
2012-01-03 10:00:00      NaN
2012-01-03 11:00:00      NaN
2012-01-03 12:00:00      NaN
2012-01-03 13:00:00      NaN
2012-01-03 14:00:00      NaN
2012-01-03 15:00:00      NaN
2012-01-03 16:00:00      NaN
2012-01-03 17:00:00      NaN
2012-01-03 18:00:00      NaN
2012-01-03 19:00:00      NaN
2012-01-03 20:00:00      NaN
2012-01-03 21:00:00      NaN
2012-01-03 22:00:00      NaN
2012-01-03 23:00:00      NaN
2012-01-04 00:00:00    27.40
Freq: H, Name: Close, dtype: float64

Forward Fill, for example, to fill them in: 
Date
2012-01-03 00:00:00    26.76
2012-01-03 01:00:00    26.76
2012-01-03 02:00:00    26.76
2012-01-03 03:

### Resampling of time-series