[View in Colaboratory](https://colab.research.google.com/github/gauravsingh1012/Understanding-Pandas-For-Data-Analysis-and-Data-Visualisation/blob/master/Pandas_For_Finance_Chapter_2_Time_series.ipynb)

# Chapter 2: Time-series

**Notebook Setup: Loading historical stock data from Quandl**


In [3]:
!pip install quandl


In [0]:
import quandl
import pandas as pd
import numpy as np
import time

In [0]:
quandl.ApiConfig.api_key = input('Please enter your quandl Key: ')
exchange = 'WIKI'

In [6]:
ticker = 'MSFT'
msft = quandl.get('%s/%s' % (exchange, ticker))
msft.head(1)

Date,Open,High,Low,Close,Volume,Ex-Dividend,Split Ratio,Adj. Open,Adj. High,Adj. Low,Adj. Close,Adj. Volume
1986-03-13,25.5,29.25,25.5,28.0,3582600.0,0.0,1.0,0.058941,0.067609,0.058941,0.06472,1031789000.0


>**1. DatetimeIndex and its use in time-series data**

>pandas was originally created for use in finance and excels at manipulating
time-series data. From its inception, it has included facilities for date and
time-series operations that handle complex financial scenarios. These capabilities have
been progressively expanded and refined across its versions.

>pandas provides its own representations of dates, times, time intervals, and periods,
which go beyond those provided by other Python libraries such as SciPy and NumPy.
The pandas implementations provide the additional capabilities that are required to
model time-series data, and to transform data across different frequencies, periods,
and calendars for different organizations and financial markets.

>Specific dates and times in pandas are represented using the pandas Timestamp
class. Timestamp is based on NumPy's datetime64 dtype and supports nanosecond
resolution, whereas Python's built-in datetime object is limited to microseconds.
This increased precision is frequently required for accurate financial calculations.
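
>The precision difference can be seen with a small sketch. The timestamp value below is made up for illustration and is not part of the MSFT data:

```python
import pandas as pd

# A timestamp with nine fractional-second digits (nanosecond resolution).
stamp = pd.Timestamp("2014-08-01 09:30:00.123456789")

# Converting to Python's datetime keeps only the microseconds;
# warn=False silences the warning about the discarded nanoseconds.
as_datetime = stamp.to_pydatetime(warn=False)

print(stamp.nanosecond)         # nanosecond component kept by pandas
print(as_datetime.microsecond)  # finest unit a datetime object can hold
```

>The nanosecond component survives only on the pandas side, because datetime has no field to hold it.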

>Sequences of Timestamp objects are represented by pandas as a DatetimeIndex,
which is a type of pandas index that is optimized for indexing by dates and times.
There are several ways to create DatetimeIndex objects in pandas. The following
command creates a DatetimeIndex from the Date values of the msft DataFrame:

In [7]:
pd.DatetimeIndex(msft.reset_index(level=0)["Date"])


DatetimeIndex(['1986-03-13', '1986-03-14', '1986-03-17', '1986-03-18',
               '1986-03-19', '1986-03-20', '1986-03-21', '1986-03-24',
               '1986-03-25', '1986-03-26',
               ...
               '2018-03-14', '2018-03-15', '2018-03-16', '2018-03-19',
               '2018-03-20', '2018-03-21', '2018-03-22', '2018-03-23',
               '2018-03-26', '2018-03-27'],
              dtype='datetime64[ns]', name='Date', length=8076, freq=None)

In [0]:
dates = pd.DatetimeIndex(msft.reset_index(level=0)["Date"])

In [0]:
np.random.seed(123456)
ts = pd.Series(np.random.randn(len(dates)),dates)

In [10]:
type(ts.index)

pandas.core.indexes.datetimes.DatetimeIndex

>A Series will also automatically construct a DatetimeIndex as its index when
passing a list of datetime objects as the index parameter.

>The Series object has taken the datetime objects and constructed a DatetimeIndex
from the date values, where each value of the DatetimeIndex is a Timestamp object,
and each element of the index can be used to access the corresponding value in the
Series object. To demonstrate this, the following commands display the index and
then access the value in the Series with the date 2018-03-14 as the
index label:

In [11]:
ts.index

DatetimeIndex(['1986-03-13', '1986-03-14', '1986-03-17', '1986-03-18',
               '1986-03-19', '1986-03-20', '1986-03-21', '1986-03-24',
               '1986-03-25', '1986-03-26',
               ...
               '2018-03-14', '2018-03-15', '2018-03-16', '2018-03-19',
               '2018-03-20', '2018-03-21', '2018-03-22', '2018-03-23',
               '2018-03-26', '2018-03-27'],
              dtype='datetime64[ns]', name='Date', length=8076, freq=None)

In [12]:
ts['2018-03-14']

-0.2947681722979329
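
>The same label can be given in several forms. The sketch below uses a small made-up Series (the names index and demo are illustrative) and looks up one date by string, by Python datetime, and by pandas Timestamp:

```python
import datetime

import numpy as np
import pandas as pd

# Five consecutive days with the values 0.0 .. 4.0.
index = pd.date_range("2018-03-12", periods=5, freq="D")
demo = pd.Series(np.arange(5.0), index=index)

by_string = demo.loc["2018-03-14"]                      # date string
by_datetime = demo.loc[datetime.datetime(2018, 3, 14)]  # Python datetime
by_timestamp = demo.loc[pd.Timestamp("2018-03-14")]     # pandas Timestamp

print(by_string, by_datetime, by_timestamp)
```

>All three lookups resolve to the same index label and therefore return the same value.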

>One of the advantages of pandas is the ability to select based upon partial
datetime specifications. As an example, the following command selects the data
for the month of March 2018:

In [13]:
ts['2018-03']

Date
2018-03-01    1.069782
2018-03-02   -0.424193
2018-03-05    0.986069
2018-03-06    0.209142
2018-03-07   -0.978009
2018-03-08    0.018495
2018-03-09    0.284592
2018-03-12   -1.092970
2018-03-13    0.035153
2018-03-14   -0.294768
2018-03-15    0.965046
2018-03-16    0.025081
2018-03-19   -0.792228
2018-03-20    1.020248
2018-03-21   -0.141247
2018-03-22    0.279738
2018-03-23   -1.276618
2018-03-26    0.501666
2018-03-27   -1.257674
dtype: float64

>Note that this did not require the .loc indexer: pandas
first identifies the key as a partial date string and then looks along the index of
the Series (although .loc can be used to
perform an equivalent operation).
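
>As a minimal sketch of the .loc form, the following uses a made-up ten-day DataFrame (the names frame and value are illustrative) and selects the March rows from both the DataFrame and one of its columns:

```python
import numpy as np
import pandas as pd

# Ten consecutive days that straddle the end of February 2018.
index = pd.date_range("2018-02-26", periods=10, freq="D")
frame = pd.DataFrame({"value": np.arange(10)}, index=index)

# A partial date string passed to .loc selects every row in that month.
march_rows = frame.loc["2018-03"]
march_values = frame["value"].loc["2018-03"]

print(march_rows)
```

>Using .loc makes it explicit that the string refers to the index, which matters for a DataFrame because plain square brackets select columns.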

>Also provided by pandas is the pd.to_datetime() function, which is used to
perform a conversion of a list of potentially mixed type items into a DatetimeIndex :

In [14]:
dti = pd.to_datetime(['Aug 1, 2014', '2014-08-02',
'2014.8.3', None])
# None is converted into a not-a-time value, NaT, which indicates that
# the source value could not be converted into a datetime.
dti

DatetimeIndex(['2014-08-01', '2014-08-02', '2014-08-03', 'NaT'], dtype='datetime64[ns]', freq=None)

>By default, pandas parses ambiguous date strings month first. If you need to parse
dates with the day as the first component, you can use the dayfirst=True option.
This is often useful because data, particularly non-U.S. data, frequently puts the
day first. The following command parses the same date written in both orders:

In [15]:
dti1 = pd.to_datetime(['8/1/2014'])
dti2 = pd.to_datetime(['1/8/2014'], dayfirst=True)
dti1[0], dti2[0]

(Timestamp('2014-08-01 00:00:00'), Timestamp('2014-08-01 00:00:00'))
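
>When the layout of the date strings is known in advance, an explicit format string removes the ambiguity altogether. This is a sketch with made-up inputs rather than part of the original notebook:

```python
import pandas as pd

# Day/month/year strings parsed with an explicit format.
day_first = pd.to_datetime(["01/08/2014", "15/08/2014"], format="%d/%m/%Y")

# errors="coerce" turns values that do not match the format into NaT
# instead of raising an exception.
coerced = pd.to_datetime(["01/08/2014", "not a date"],
                         format="%d/%m/%Y", errors="coerce")

print(day_first)
print(coerced)
```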

>A range of timestamps at a specific frequency can easily be created using the
pd.date_range() function. The following command creates a Series from a
DatetimeIndex of 10 consecutive days:

In [16]:
np.random.seed(123456)
dates = pd.date_range('8/1/2014', periods=10)
s1 = pd.Series(np.random.randn(10), dates)
s1[:10]

2014-08-01    0.469112
2014-08-02   -0.282863
2014-08-03   -1.509059
2014-08-04   -1.135632
2014-08-05    1.212112
2014-08-06   -0.173215
2014-08-07    0.119209
2014-08-08   -1.044236
2014-08-09   -0.861849
2014-08-10   -2.104569
Freq: D, dtype: float64

>Like any pandas index, a DatetimeIndex can be used for various index operations,
such as data alignment, selection, and slicing.

In [17]:
msft.loc['2012-01-01':'2012-01-05'] #slicing

Date,Open,High,Low,Close,Volume,Ex-Dividend,Split Ratio,Adj. Open,Adj. High,Adj. Low,Adj. Close,Adj. Volume
2012-01-03,26.55,26.96,26.39,26.765,64731500.0,0.0,1.0,22.609162,22.958305,22.47291,22.792249,64731500.0
2012-01-04,26.8199,27.47,26.78,27.4,80516100.0,0.0,1.0,22.839,23.392605,22.805022,23.332995,80516100.0
2012-01-05,27.38,27.728,27.29,27.68,56081400.0,0.0,1.0,23.315964,23.61231,23.239323,23.571435,56081400.0


In [18]:
msft.loc['2012-01':'2012-01-05'] #slicing till particular date

Date,Open,High,Low,Close,Volume,Ex-Dividend,Split Ratio,Adj. Open,Adj. High,Adj. Low,Adj. Close,Adj. Volume
2012-01-03,26.55,26.96,26.39,26.765,64731500.0,0.0,1.0,22.609162,22.958305,22.47291,22.792249,64731500.0
2012-01-04,26.8199,27.47,26.78,27.4,80516100.0,0.0,1.0,22.839,23.392605,22.805022,23.332995,80516100.0
2012-01-05,27.38,27.728,27.29,27.68,56081400.0,0.0,1.0,23.315964,23.61231,23.239323,23.571435,56081400.0


>As shown above, it is also possible to slice starting at the beginning of a specific month and ending at
a specific day of the month.

>A specific item can be retrieved from a time-series represented by a DataFrame by
specifying the date/time index value and using the .loc indexer. The result is a
Series where the index labels are the column names and the values are those of the
selected row:

In [19]:
msft.loc['2012-01-03']

Open           2.655000e+01
High           2.696000e+01
Low            2.639000e+01
Close          2.676500e+01
Volume         6.473150e+07
Ex-Dividend    0.000000e+00
Split Ratio    1.000000e+00
Adj. Open      2.260916e+01
Adj. High      2.295830e+01
Adj. Low       2.247291e+01
Adj. Close     2.279225e+01
Adj. Volume    6.473150e+07
Name: 2012-01-03 00:00:00, dtype: float64

In [20]:
# msft['2012-01-03'] does not work on a DataFrame, where [] selects columns;
# on a Series, the same syntax looks up the index label
msftAC = msft['Adj. Close']
msftAC['2012-01-03']

22.792248934631

>This is a subtle difference that sometimes causes headaches when using
time-series data in pandas: square brackets look up the index on a Series but
select columns on a DataFrame. **To be safe, use .loc whenever you look up
by index label, because it works the same way on both Series and DataFrame objects.**

>**2. Creating time-series with specific frequencies**

>Time-series data in pandas can also be created to represent intervals of time other
than daily frequency. Different frequencies can be generated with **pd.date_range()**
by utilizing the freq parameter. This parameter defaults to a value of D , which
represents daily frequency.

>To introduce the creation of non-daily frequencies, the following command creates
a DatetimeIndex with one-minute intervals using **freq='T'** (newer pandas releases prefer the equivalent alias 'min'):

In [24]:
date_range = pd.date_range('2014-08-01','2014-08-01 00:19:00',freq='T')

bymin = pd.Series(np.arange(0, len(date_range)),date_range)
bymin

2014-08-01 00:00:00     0
2014-08-01 00:01:00     1
2014-08-01 00:02:00     2
2014-08-01 00:03:00     3
2014-08-01 00:04:00     4
2014-08-01 00:05:00     5
2014-08-01 00:06:00     6
2014-08-01 00:07:00     7
2014-08-01 00:08:00     8
2014-08-01 00:09:00     9
2014-08-01 00:10:00    10
2014-08-01 00:11:00    11
2014-08-01 00:12:00    12
2014-08-01 00:13:00    13
2014-08-01 00:14:00    14
2014-08-01 00:15:00    15
2014-08-01 00:16:00    16
2014-08-01 00:17:00    17
2014-08-01 00:18:00    18
2014-08-01 00:19:00    19
Freq: T, dtype: int64
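
>The freq parameter also accepts multiples and other calendar rules. A short sketch with made-up start dates, using a 15-minute step and the business-day alias 'B':

```python
import pandas as pd

# Four timestamps spaced 15 minutes apart.
quarter_hours = pd.date_range("2014-08-01 09:30", periods=4, freq="15min")

# Three business days starting on Friday 2014-08-01; the weekend is skipped.
business_days = pd.date_range("2014-08-01", periods=3, freq="B")

print(quarter_hours)
print(business_days)
```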

>This time-series allows us to use forms of slicing at finer resolution. Earlier, we saw
slicing at day and month levels, but now we have a time-series with minute-based
data that we can slice down to hours and minutes (and smaller intervals if we use
finer frequencies):

In [25]:
bymin['2014-08-01 00:05':'2014-08-01 00:09']

2014-08-01 00:05:00    5
2014-08-01 00:06:00    6
2014-08-01 00:07:00    7
2014-08-01 00:08:00    8
2014-08-01 00:09:00    9
Freq: T, dtype: int64

>**3. Representing intervals of time using periods**

>It is often necessary to represent not just a specific time or a sequence of timestamps,
but an interval of time with a start date and an end date (a financial quarter, for
example). Such a bounded interval of time is represented in pandas using **pd.Period** objects.
Period objects consist of a start time and an end time and are created from a
start date with a given frequency. The start time is referred to as the anchor of the
Period object, and the end time is then calculated from the start date and the period
specification.

>To demonstrate this, the following command creates a period representing a 1-month
period (**freq='M'**) anchored in August 2014:

In [26]:
aug2014 = pd.Period('2014-08', freq='M')
aug2014 , aug2014.start_time, aug2014.end_time

(Period('2014-08', 'M'),
 Timestamp('2014-08-01 00:00:00'),
 Timestamp('2014-08-31 23:59:59.999999999'))

>Arithmetic operators are overloaded on Period objects so that another period can be
calculated from an existing one. As an example, the following command creates a new
Period based upon the aug2014 period object by adding 1 to the period, which moves it
forward by one unit of its frequency (one month here):

In [28]:
sep2014 = aug2014 + 1
sep2014,sep2014.start_time, sep2014.end_time

(Period('2014-09', 'M'),
 Timestamp('2014-09-01 00:00:00'),
 Timestamp('2014-09-30 23:59:59.999999999'))
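
>The same mechanics apply to the financial-quarter case mentioned earlier. The following is a sketch that assumes calendar quarters ending in December (the default for freq='Q'); the name q3 is illustrative:

```python
import pandas as pd

# Third calendar quarter of 2014: July through September.
q3 = pd.Period("2014Q3", freq="Q")
q4 = q3 + 1  # adding 1 moves forward by one quarter

# Any date inside the quarter maps to the same Period.
same_quarter = pd.Period("2014-08-15", freq="Q") == q3

print(q3.start_time, q3.end_time)
print(q4.start_time, same_quarter)
```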

>Period objects are useful when combined into a collection referred to as a
PeriodIndex . The following command creates a pandas PeriodIndex consisting of
1-month intervals for the year of 2013:

In [29]:
mp2013 = pd.period_range('1/1/2013', '12/31/2013', freq='M')
mp2013

PeriodIndex(['2013-01', '2013-02', '2013-03', '2013-04', '2013-05', '2013-06',
             '2013-07', '2013-08', '2013-09', '2013-10', '2013-11', '2013-12'],
            dtype='period[M]', freq='M')

>A PeriodIndex differs from a DatetimeIndex in that its index labels are Period objects rather than Timestamp objects:

In [33]:
for p in mp2013:
  print("{0} {1} {2} {3}".format(p, p.freq,  p.start_time, p.end_time))

2013-01 <MonthEnd> 2013-01-01 00:00:00 2013-01-31 23:59:59.999999999
2013-02 <MonthEnd> 2013-02-01 00:00:00 2013-02-28 23:59:59.999999999
2013-03 <MonthEnd> 2013-03-01 00:00:00 2013-03-31 23:59:59.999999999
2013-04 <MonthEnd> 2013-04-01 00:00:00 2013-04-30 23:59:59.999999999
2013-05 <MonthEnd> 2013-05-01 00:00:00 2013-05-31 23:59:59.999999999
2013-06 <MonthEnd> 2013-06-01 00:00:00 2013-06-30 23:59:59.999999999
2013-07 <MonthEnd> 2013-07-01 00:00:00 2013-07-31 23:59:59.999999999
2013-08 <MonthEnd> 2013-08-01 00:00:00 2013-08-31 23:59:59.999999999
2013-09 <MonthEnd> 2013-09-01 00:00:00 2013-09-30 23:59:59.999999999
2013-10 <MonthEnd> 2013-10-01 00:00:00 2013-10-31 23:59:59.999999999
2013-11 <MonthEnd> 2013-11-01 00:00:00 2013-11-30 23:59:59.999999999
2013-12 <MonthEnd> 2013-12-01 00:00:00 2013-12-31 23:59:59.999999999


>With a PeriodIndex , we can then construct a Series using it as the index:

In [34]:
np.random.seed(123456)
ps = pd.Series(np.random.randn(12), mp2013)
ps

2013-01    0.469112
2013-02   -0.282863
2013-03   -1.509059
2013-04   -1.135632
2013-05    1.212112
2013-06   -0.173215
2013-07    0.119209
2013-08   -1.044236
2013-09   -0.861849
2013-10   -2.104569
2013-11   -0.494929
2013-12    1.071804
Freq: M, dtype: float64

>We now have a time-series where the value at a specific index label represents a
measurement that spans a period of time, such as the average value of a security
in a given month, instead of at a specific time. This becomes very useful when we
perform resampling of the time-series to another frequency, which we will do a little
later in this chapter.
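
>As a small illustration of a value that describes a whole period, the sketch below averages made-up daily values into monthly periods. It groups by the period of each timestamp instead of resampling, and the names daily and monthly_mean are illustrative:

```python
import pandas as pd

# Four daily observations: Jan 30, Jan 31, Feb 1, Feb 2.
days = pd.date_range("2013-01-30", periods=4, freq="D")
daily = pd.Series([1.0, 2.0, 3.0, 4.0], index=days)

# Map each timestamp to its month period, then average within each period.
monthly_mean = daily.groupby(daily.index.to_period("M")).mean()

print(monthly_mean)
```

>The result is indexed by Period objects, so each value is tied to a month-long interval rather than to a single instant.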

> **4. Shifting and lagging time-series data**

>A common operation on time-series data is to shift or "lag" the values backward or
forward in time, for example to calculate the percentage change from sample to sample. The
pandas method for this is **.shift()**, which moves the values relative to the index by a
specified number of periods.
To demonstrate shifting and lagging, we will use the adjusted close values for MSFT.
As a refresher, the following command shows the first 5 items in that time-series:

In [35]:
msftAC[:5]

Date
1986-03-13    0.064720
1986-03-14    0.067031
1986-03-17    0.068187
1986-03-18    0.066454
1986-03-19    0.065298
Name: Adj. Close, dtype: float64

>The following command shifts the adjusted closing prices forward by 1 day:

In [36]:
shifted_forward = msftAC.shift(1)
shifted_forward[:5]

Date
1986-03-13         NaN
1986-03-14    0.064720
1986-03-17    0.067031
1986-03-18    0.068187
1986-03-19    0.066454
Name: Adj. Close, dtype: float64

>Notice that the value at the index label 1986-03-13 is now NaN. When shifting
at the same frequency as that of the index, the shift will result in one or more NaN
values being added for the labels at one end of the Series, and a loss of the same
number of values at the other end. The number of NaN values is the same as the
number of specified periods.

>If we examine the tail of both the original and shifted Series , we will see that the
last value in the Series was shifted away:

In [37]:
msftAC.tail(5), shifted_forward.tail(5)

(Date
 2018-03-21    92.48
 2018-03-22    89.79
 2018-03-23    87.18
 2018-03-26    93.78
 2018-03-27    89.47
 Name: Adj. Close, dtype: float64, Date
 2018-03-21    93.13
 2018-03-22    92.48
 2018-03-23    89.79
 2018-03-26    87.18
 2018-03-27    93.78
 Name: Adj. Close, dtype: float64)

>So the last value of the original Series is lost in the shifted one.
It is also possible to shift values in the opposite direction. The following command
demonstrates this by **shifting the Series by -2, which results in two NaN values at the end**:

In [40]:
shifted_backwards = msftAC.shift(-2)
shifted_backwards[:5], shifted_backwards.tail(5)

(Date
 1986-03-13    0.068187
 1986-03-14    0.066454
 1986-03-17    0.065298
 1986-03-18    0.063564
 1986-03-19    0.061831
 Name: Adj. Close, dtype: float64, Date
 2018-03-21    87.18
 2018-03-22    93.78
 2018-03-23    89.47
 2018-03-26      NaN
 2018-03-27      NaN
 Name: Adj. Close, dtype: float64)

>It is possible to shift by different frequencies using the freq parameter. This will
create a time-series with a new index, where the index labels are adjusted by the
number of specified units of the given frequency. As an example, the following
command shifts forward the time-series with a frequency of 1 day by one second:

In [41]:
msftAC.shift(1, freq="S")[:5]

Date
1986-03-13 00:00:01    0.064720
1986-03-14 00:00:01    0.067031
1986-03-17 00:00:01    0.068187
1986-03-18 00:00:01    0.066454
1986-03-19 00:00:01    0.065298
Name: Adj. Close, dtype: float64

>The resulting DataFrame or Series is essentially the same as the original, with the
specified number of units of frequency added to each index label. No data will be
shifted out or replaced with NaN as this is not performing realignment.
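
>The difference between the two forms of shifting can be sketched on a made-up three-day Series (the names realigned and relabeled are illustrative):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range("2020-01-01", periods=3, freq="D"))

# Without freq: the values move down one row, the index stays put,
# and a NaN appears at the start.
realigned = s.shift(1)

# With freq: the index labels move forward one day, the values stay put,
# and nothing is lost.
relabeled = s.shift(1, freq="D")

print(realigned)
print(relabeled)
```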

>**An alternate form of shifting was provided by older pandas versions through the .tshift() method**
(deprecated in pandas 1.1 and removed in pandas 2.0, where .shift() with a freq argument does the same job).
Rather than changing the alignment of the data, .tshift() simply results in a
new Series or DataFrame, where the **values of the index labels are changed by the
specified number of offsets of the value of the freq parameter**. This is demonstrated
by the following command, which modifies the index labels by 1 day:

In [43]:
msftAC.tshift(1, freq="D")[:5]

Date
1986-03-14    0.064720
1986-03-15    0.067031
1986-03-18    0.068187
1986-03-19    0.066454
1986-03-20    0.065298
Name: Adj. Close, dtype: float64

>**A practical application of shifting is the calculation of daily percentage changes from
the previous day. The following command calculates the day-to-day percentage
change in the adjusted closing price for MSFT**:

In [46]:
(msftAC / msftAC.shift(1) - 1) [:5]

Date
1986-03-13         NaN
1986-03-14    0.035714
1986-03-17    0.017241
1986-03-18   -0.025424
1986-03-19   -0.017391
Name: Adj. Close, dtype: float64
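
>pandas also provides .pct_change(), which performs the same calculation. The sketch below compares the two on made-up prices (not MSFT data):

```python
import pandas as pd

prices = pd.Series([100.0, 102.0, 99.96],
                   index=pd.date_range("2020-01-01", periods=3, freq="D"))

# Percentage change computed by hand with .shift() ...
manual = prices / prices.shift(1) - 1

# ... and with the built-in helper.
builtin = prices.pct_change()

print(manual)
print(builtin)
```

>Both results start with NaN, because the first sample has no previous value to compare against.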

>**5. Frequency conversion of time-series data**

>The frequency of the data in a time-series can be converted in pandas using the
**.asfreq() method of a Series or DataFrame** . To demonstrate, we will use the
following small subset of the MSFT stock closing values:

In [48]:
sample = msftAC[:2]
sample

Date
1986-03-13    0.064720
1986-03-14    0.067031
Name: Adj. Close, dtype: float64

In [49]:
sample.asfreq("H")

Date
1986-03-13 00:00:00    0.064720
1986-03-13 01:00:00         NaN
1986-03-13 02:00:00         NaN
1986-03-13 03:00:00         NaN
1986-03-13 04:00:00         NaN
1986-03-13 05:00:00         NaN
1986-03-13 06:00:00         NaN
1986-03-13 07:00:00         NaN
1986-03-13 08:00:00         NaN
1986-03-13 09:00:00         NaN
1986-03-13 10:00:00         NaN
1986-03-13 11:00:00         NaN
1986-03-13 12:00:00         NaN
1986-03-13 13:00:00         NaN
1986-03-13 14:00:00         NaN
1986-03-13 15:00:00         NaN
1986-03-13 16:00:00         NaN
1986-03-13 17:00:00         NaN
1986-03-13 18:00:00         NaN
1986-03-13 19:00:00         NaN
1986-03-13 20:00:00         NaN
1986-03-13 21:00:00         NaN
1986-03-13 22:00:00         NaN
1986-03-13 23:00:00         NaN
1986-03-14 00:00:00    0.067031
Freq: H, Name: Adj. Close, dtype: float64

>A new index with hourly index labels has been created by pandas, but when aligning
to the original time-series, only two values were found, **thereby leaving the others
filled with NaN**.
We can change this default behavior using the method parameter of the **.asfreq()**
method. **One option is pad (also spelled ffill), which fills with the last known value**:

In [50]:
sample.asfreq("H", method="ffill")

Date
1986-03-13 00:00:00    0.064720
1986-03-13 01:00:00    0.064720
1986-03-13 02:00:00    0.064720
1986-03-13 03:00:00    0.064720
1986-03-13 04:00:00    0.064720
1986-03-13 05:00:00    0.064720
1986-03-13 06:00:00    0.064720
1986-03-13 07:00:00    0.064720
1986-03-13 08:00:00    0.064720
1986-03-13 09:00:00    0.064720
1986-03-13 10:00:00    0.064720
1986-03-13 11:00:00    0.064720
1986-03-13 12:00:00    0.064720
1986-03-13 13:00:00    0.064720
1986-03-13 14:00:00    0.064720
1986-03-13 15:00:00    0.064720
1986-03-13 16:00:00    0.064720
1986-03-13 17:00:00    0.064720
1986-03-13 18:00:00    0.064720
1986-03-13 19:00:00    0.064720
1986-03-13 20:00:00    0.064720
1986-03-13 21:00:00    0.064720
1986-03-13 22:00:00    0.064720
1986-03-13 23:00:00    0.064720
1986-03-14 00:00:00    0.067031
Freq: H, Name: Adj. Close, dtype: float64
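
>The fill options can be compared side by side on a smaller case. This sketch uses two made-up daily values and a 12-hour target frequency so that there is exactly one gap to fill:

```python
import pandas as pd

two_days = pd.Series([1.0, 2.0],
                     index=pd.to_datetime(["2020-01-01", "2020-01-02"]))

raw = two_days.asfreq("12h")                         # gap left as NaN
padded = two_days.asfreq("12h", method="ffill")      # gap takes the previous value
back_filled = two_days.asfreq("12h", method="bfill") # gap takes the next value

print(raw)
print(padded)
print(back_filled)
```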

>**6. Resampling of time-series**

>Frequency conversion provides basic conversion of data using the new frequency
intervals and allows the filling of missing data using either NaN, forward filling,
or backward filling. More elaborate control is provided through the process of
resampling.

>Resampling can be either **downsampling, where data is converted to wider
frequency ranges (such as downsampling from day-to-day to month-to-month)**
or **upsampling, where data is converted to narrower time ranges**. Data for the
associated labels are then calculated by a function provided to pandas instead
of simple filling.

>To demonstrate downsampling, we will calculate the daily cumulative returns for
the MSFT stock and resample them to monthly frequency. We will
examine the return calculation in more detail in Chapter 5, Time-series Stock Data,
but for now, we will use it as a demonstration of the mechanics of downsampling and
upsampling time-series data.

>**The cumulative daily return for MSFT can be calculated with the following command
using .shift() and application of the .cumprod() method**, as shown here:

In [53]:
msft_cum_ret = (1 + (msftAC / msftAC.shift() - 1)).cumprod()
msft_cum_ret.head(),msft_cum_ret.tail()

(Date
 1986-03-13         NaN
 1986-03-14    1.035714
 1986-03-17    1.053571
 1986-03-18    1.026786
 1986-03-19    1.008929
 Name: Adj. Close, dtype: float64, Date
 2018-03-21    1428.925048
 2018-03-22    1387.361376
 2018-03-23    1347.033799
 2018-03-26    1449.011581
 2018-03-27    1382.416999
 Name: Adj. Close, dtype: float64)

>A time-series can be resampled using the .resample() method. This method
provides a very flexible means to specify the frequency conversion involved in the
resampling, as well as the means by which the resampled values are selected or
calculated.

>**The following command downsamples the daily cumulative returns from day-to-day
to month-to-month**:

In [55]:
msft_monthly_cum_ret = msft_cum_ret.resample("M").mean()
msft_monthly_cum_ret[:5]


Date
1986-03-31    0.989448
1986-04-30    1.053766
1986-05-31    1.144133
1986-06-30    1.145187
1986-07-31    1.054172
Freq: M, Name: Adj. Close, dtype: float64

>As the resample period is specified as monthly, **pandas will break the index labels
into monthly intervals bounded on calendar months**, and the new index label for a
group will be the month's end date. **The value for each index entry will be the mean
of the values for the month. This can be verified for March 1986 with the following
command**:

In [56]:
msft_cum_ret['1986-03'].mean()

0.9894480519480549

>So the monthly values above are means. The way the value for each index label is calculated is chosen by calling an
aggregation method on the result of .resample(), such as .mean(), .sum(), or .ohlc()
(older pandas versions took the function name through a **how parameter** instead, for example how="ohlc").
Calling **.ohlc()** gives us a summary of the open, high, low, and
close values during each sampling period.

>For each resampling period (monthly in
this example), **pandas will return the first value in the period ( open ), the maximum
value ( high ), the lowest value ( low ), and the final value in the period ( close )**:

In [57]:
msft_cum_ret.resample("M").ohlc()[:5]


Date,open,high,low,close
1986-03-31,1.035714,1.053571,0.928571,0.982143
1986-04-30,0.973214,1.214286,0.973214,1.151786
1986-05-31,1.133929,1.25,1.107143,1.25
1986-06-30,1.214286,1.223214,1.0625,1.098214
1986-07-31,1.098214,1.116071,0.973214,1.017857


>The **type of index** resulting from a resampling is controlled by the **kind** parameter,
which **can be set to timestamp (the default) or period**. In the resampling examples
up to this point, the resample has returned Timestamp labels, specifically
the last day of each month. The following command demonstrates **returning an index
based on periods instead of timestamps, which can be quite useful if we need to
have the start and end timestamps for each sample**:

In [62]:
def printPeriodicData(periodObj, Index=5):
  # Print the start time, end time, and value of the first few periods.
  for i in periodObj.index[:Index]:
    print("{0}:{1} {2}".format(i.start_time, i.end_time, periodObj[i]))

by_periods = msft_cum_ret.resample("M", kind="period").mean()

printPeriodicData(by_periods)

1986-03-01 00:00:00:1986-03-31 23:59:59.999999999 0.9894480519480549
1986-04-01 00:00:00:1986-04-30 23:59:59.999999999 1.0537662337662366
1986-05-01 00:00:00:1986-05-31 23:59:59.999999999 1.1441326530612284
1986-06-01 00:00:00:1986-06-30 23:59:59.999999999 1.1451870748299353
1986-07-01 00:00:00:1986-07-31 23:59:59.999999999 1.05417207792208

>To demonstrate **upsampling**, we will examine the process using the second and third
days of MSFT's cumulative returns:

In [64]:
sample = msft_cum_ret[1:3]
sample

Date
1986-03-14    1.035714
1986-03-17    1.053571
Name: Adj. Close, dtype: float64

In [71]:
by_hour = sample.resample("H").mean()
by_hour.head(2),by_hour.tail(2)



(Date
 1986-03-14 00:00:00    1.035714
 1986-03-14 01:00:00         NaN
 Freq: H, Name: Adj. Close, dtype: float64, Date
 1986-03-16 23:00:00         NaN
 1986-03-17 00:00:00    1.053571
 Freq: H, Name: Adj. Close, dtype: float64)

>Hourly index labels have been created by pandas, but **the alignment only propagates
two values into the new time-series and fills the others with NaN**. This is an inherent
issue with upsampling, because the result needs information that the source does not contain. By default,
pandas uses NaN, but it provides other methods to fill in values.

>As with frequency conversion, the **new index labels can be forward filled or back
filled** by calling **.ffill() or .bfill()** on the result of .resample() (older pandas
versions took a fill_method parameter instead). **Another
option is to interpolate the missing data, which can be done using the time-series
object's .interpolate() method, which will perform a linear interpolation**:

In [73]:
by_hour.interpolate()[:10]

Date
1986-03-14 00:00:00    1.035714
1986-03-14 01:00:00    1.035962
1986-03-14 02:00:00    1.036210
1986-03-14 03:00:00    1.036458
1986-03-14 04:00:00    1.036706
1986-03-14 05:00:00    1.036954
1986-03-14 06:00:00    1.037202
1986-03-14 07:00:00    1.037450
1986-03-14 08:00:00    1.037698
1986-03-14 09:00:00    1.037946
Freq: H, Name: Adj. Close, dtype: float64
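
>The three ways of filling an upsampled series can be compared on a minimal made-up case with a single gap. The 12-hour frequency and the variable names are chosen here for illustration:

```python
import pandas as pd

s = pd.Series([1.0, 2.0],
              index=pd.to_datetime(["2020-01-01", "2020-01-02"]))

upsampled = s.resample("12h")       # deferred resampler with one empty bin

filled_forward = upsampled.ffill()  # empty bin takes the previous value
filled_back = upsampled.bfill()     # empty bin takes the next value
linear = upsampled.mean().interpolate()  # empty bin gets the midpoint

print(filled_forward)
print(filled_back)
print(linear)
```

>Forward and backward filling repeat a known value, whereas interpolation invents an intermediate one, so the right choice depends on what the data represents.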

# Summary

In this chapter, we have covered the following:


1.   DatetimeIndex: DatetimeIndex and time-series data
2.   Creating time-series with specific frequencies: with pd.date_range('2014-08-01','2014-08-01 00:19:00',freq='T')
3.   Representation of intervals of time using periods: with pd.Period('2014-08', freq='M')
4.   Shifting and lagging time-series data: msftAC.shift(1) and msftAC.shift(-2)
5.   Frequency conversion of time-series data: msftAC[:2].asfreq("H", method="ffill")
6.   Downsampling and upsampling of time-series data: msft_cum_ret.resample("M").ohlc()[:5] and msft_cum_ret[1:3].resample("H").mean()