## Working on Time Series data with Pandas
### gyleodhis@outlook.com
### [@gyleodhis](https://www.twitter.com/gyleodhis)
### ![@gyleodhis](../img/gyle.jpg)
#### Licence:
You may use this code for any purpose you wish, provided you leave this notice in place:
#### AS IS; HOW IS, WHERE IS

Pandas was developed in the context of financial modeling, so as you might expect, it
contains a fairly extensive set of tools for working with dates, times, and time-indexed data. Date and time data comes in a few flavors:

- Time stamps: refer to a particular moment in time.
- Time intervals and periods: refer to a length of time between a particular start and end point.
- Time deltas or durations: refer to an exact length of time.
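
As a rough sketch of how these three flavors map onto pandas objects, `pd.Timestamp`, `pd.Period`, and `pd.Timedelta` can each be constructed directly. The dates and durations below are illustrative values chosen for this example only.

```python
import pandas as pd

# A time stamp: one particular moment.
stamp = pd.Timestamp("2015-07-04 12:00")

# A period: a span with a defined start and end (here, the month of July 2015).
period = pd.Period("2015-07", freq="M")

# A time delta: an exact duration, independent of any calendar position.
delta = pd.Timedelta(days=1, hours=6)

print(stamp)                               # the moment itself
print(period.start_time, period.end_time)  # the bounds of the period
print(stamp + delta)                       # a time stamp shifted by a duration
```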

## Dates and Times in Python
### Native Python dates and times: datetime and dateutil
Python’s basic objects for working with dates and times reside in the built-in datetime
module. Along with the third-party dateutil module, you can use it to quickly perform a host of useful operations on dates and times.

In [1]:
import numpy as np
import pandas as pd
from datetime import datetime
datetime(year = 1994, month=12, day=2)

datetime.datetime(1994, 12, 2, 0, 0)

In [2]:
# With the dateutil module, we can parse dates from a variety of string formats.
from dateutil import parser
date = parser.parse("2nd December, 1994")
date

datetime.datetime(1994, 12, 2, 0, 0)

In [3]:
date.strftime('%A') # Once we have a datetime object, we can do things like find the day of the week.

'Friday'
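
When the input format is known in advance, the standard library alone is often enough. The sketch below assumes an ISO-style string (an illustrative choice) and uses `datetime.strptime` in place of dateutil, plus `timedelta` for simple date arithmetic.

```python
from datetime import datetime, timedelta

# Parse a string whose layout is known, using an explicit format.
date = datetime.strptime("1994-12-02", "%Y-%m-%d")

# weekday() counts from Monday = 0, so 4 corresponds to Friday.
day_number = date.weekday()

# Adding a timedelta shifts the date by an exact duration.
next_week = date + timedelta(days=7)

print(day_number, next_week)
```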

In [4]:
# Using NumPy's datetime64 dtype, which encodes dates as 64-bit integers.
date = np.array('2015-07-04', dtype=np.datetime64)
date

array('2015-07-04', dtype='datetime64[D]')

In [5]:
# Once we have this, we can perform vectorized operations on it:
date + np.arange(10) # 10 consecutive dates starting from 2015-07-04.


array(['2015-07-04', '2015-07-05', '2015-07-06', '2015-07-07',
       '2015-07-08', '2015-07-09', '2015-07-10', '2015-07-11',
       '2015-07-12', '2015-07-13'], dtype='datetime64[D]')
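
Subtraction is vectorized in the same way. As a small sketch with made-up dates, `np.diff` on a `datetime64` array returns `timedelta64` values, which can be divided by a unit-sized `timedelta64` to get plain numbers.

```python
import numpy as np

# Three illustrative dates stored with day precision.
dates = np.array(["2015-07-04", "2015-07-10", "2015-08-01"], dtype="datetime64[D]")

# Differences between consecutive dates come back as timedelta64 values.
gaps = np.diff(dates)

# Dividing by a one-day timedelta converts the gaps to plain floats (days).
gap_days = gaps / np.timedelta64(1, "D")

print(gaps)
print(gap_days)
```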

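Each datetime64 value carries a fundamental time unit, which NumPy infers from the input; a date-only string such as '2015-07-04' yields day precision, shown as datetime64[D]. Values of this type are timezone-naive, so no local time zone is attached (only older NumPy releases applied the local zone when parsing). You can force any desired fundamental unit using one of many format codes; for example, here we’ll force a nanosecond-based time: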

In [6]:
np.datetime64('2015-07-04 12:59:59.50', 'ns')

numpy.datetime64('2015-07-04T12:59:59.500000000')
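
To see how the unit is inferred and how to change it afterwards, here is a minimal sketch; the input strings are illustrative. A finer unit gives more resolution but covers a narrower range of representable dates, since the value is always a 64-bit integer count of that unit.

```python
import numpy as np

# The unit is inferred from the precision of the input string.
day = np.datetime64("2015-07-04")           # day precision
minute = np.datetime64("2015-07-04 12:00")  # minute precision

# astype converts an existing value to a different unit.
day_as_ns = day.astype("datetime64[ns]")

print(day.dtype, minute.dtype, day_as_ns.dtype)
```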

## Pandas Time Series: Indexing by Time

In [7]:
# Let's construct a Series object that has time-indexed data:
index = pd.DatetimeIndex(['2014-07-04', '2014-08-04','2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)
data

2014-07-04    0
2014-08-04    1
2015-07-04    2
2015-08-04    3
dtype: int64

In [8]:
# We can now perform any Series operation on it, e.g. slicing:
data['2014-07-04':'2015-07-04']

2014-07-04    0
2014-08-04    1
2015-07-04    2
dtype: int64

In [9]:
# We can also pass a year to obtain a slice of all the dates in that year:
data['2014']

2014-07-04    0
2014-08-04    1
dtype: int64
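
Beyond string slices, a DatetimeIndex exposes date components such as `.month` and `.year`, which can drive boolean selection. The sketch below rebuilds the same four-row series and also uses `.loc`, which is the more explicit way to spell label-based selection.

```python
import pandas as pd

index = pd.DatetimeIndex(["2014-07-04", "2014-08-04", "2015-07-04", "2015-08-04"])
data = pd.Series([0, 1, 2, 3], index=index)

# Boolean mask built from a component of the index: all July entries, any year.
july = data[data.index.month == 7]

# Partial-string selection through .loc: everything in 2015.
in_2015 = data.loc["2015"]

print(july)
print(in_2015)
```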

## Pandas Time Series Data Structures
While objects such as Timestamp and DatetimeIndex can be constructed directly, it is more common to use the pd.to_datetime() function, which can parse a wide variety of formats. Passing a single date to pd.to_datetime() yields a Timestamp; passing a sequence of dates by default yields a DatetimeIndex:

In [10]:
dates = pd.to_datetime([datetime(2015, 7, 3), '4th of July, 2015','2015-Jul-6', '07-07-2015', '20150708'])
dates

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
               '2015-07-08'],
              dtype='datetime64[ns]', freq=None)
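
When all inputs share one layout, passing an explicit `format` avoids any guessing about day/month order, and `errors="coerce"` turns unparseable entries into `NaT` instead of raising. This is a sketch with made-up input strings, assuming day-first dates.

```python
import pandas as pd

# Illustrative raw values: day/month/year, plus one bad entry.
raw = ["03/07/2015", "04/07/2015", "not a date"]

# An explicit format removes ambiguity; coerce maps failures to NaT.
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

print(parsed)
```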

Any DatetimeIndex can be converted to a PeriodIndex with the to_period() method by supplying a frequency code; here we’ll use 'D' to indicate daily frequency:

In [11]:
dates.to_period('D') # You can also use Y for year, M for month, etc.

PeriodIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
             '2015-07-08'],
            dtype='period[D]', freq='D')
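
A coarser frequency groups several dates into the same period, and `to_timestamp()` converts back, by default to the start of each period. A small sketch with illustrative dates:

```python
import pandas as pd

dates = pd.to_datetime(["2015-07-03", "2015-07-04", "2015-08-06"])

# Monthly periods: the first two dates fall into the same period.
months = dates.to_period("M")

# Back to time stamps, anchored at the start of each period.
month_starts = months.to_timestamp()

print(months)
print(month_starts)
```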

In [12]:
# A TimedeltaIndex is created, for example, when one date (here dates[1]) is subtracted from a set of dates:
dates - dates[1]

TimedeltaIndex(['-1 days', '0 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)
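
A TimedeltaIndex can be turned into plain numbers when needed. This sketch, using illustrative dates, reads the whole-day component with `.days` and converts to hours by dividing by a one-hour `pd.Timedelta`.

```python
import pandas as pd

dates = pd.to_datetime(["2015-07-03", "2015-07-04", "2015-07-06"])

# Time elapsed since the first date.
elapsed = dates - dates[0]

# Whole days as integers, and the same durations expressed in hours.
elapsed_days = elapsed.days
elapsed_hours = elapsed / pd.Timedelta(hours=1)

print(elapsed_days)
print(elapsed_hours)
```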

## Regular Sequences
To make the creation of regular date sequences more convenient, Pandas offers a few
functions for this purpose: pd.date_range() for timestamps, pd.period_range() for
periods, and pd.timedelta_range() for time deltas.
### pd.date_range()
pd.date_range() accepts a start date, an end date, and an optional frequency code to create a regular sequence of dates. By default, the frequency is one day:

In [13]:
pd.date_range('2015-07-03', '2015-07-10')

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10'],
              dtype='datetime64[ns]', freq='D')

Alternatively, the date range can be specified not with a start- and endpoint, but with a startpoint and a number of periods:

In [14]:
pd.date_range('2015-07-03', periods=10)

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10',
               '2015-07-11', '2015-07-12'],
              dtype='datetime64[ns]', freq='D')

You can modify the spacing by altering the freq argument, which defaults to D. For
example, here we will construct a range of hourly timestamps:

In [15]:
pd.date_range('2015-07-03', periods=8, freq='H')

DatetimeIndex(['2015-07-03 00:00:00', '2015-07-03 01:00:00',
               '2015-07-03 02:00:00', '2015-07-03 03:00:00',
               '2015-07-03 04:00:00', '2015-07-03 05:00:00',
               '2015-07-03 06:00:00', '2015-07-03 07:00:00'],
              dtype='datetime64[ns]', freq='H')
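
Frequency codes can also carry a multiplier. As a sketch, the string `'90min'` (an illustrative choice) produces time stamps spaced an hour and a half apart:

```python
import pandas as pd

# A number in front of the code multiplies the step: one stamp every 90 minutes.
times = pd.date_range("2015-07-03", periods=4, freq="90min")

# Show only the time-of-day part for readability.
clock_times = times.strftime("%H:%M").tolist()

print(clock_times)
```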

To create regular sequences of period or time delta values, the very similar pd.period_range() and pd.timedelta_range() functions are useful. Here are some monthly periods:

In [16]:
pd.period_range('2015-07', periods=10, freq='M')

PeriodIndex(['2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12',
             '2016-01', '2016-02', '2016-03', '2016-04'],
            dtype='period[M]', freq='M')

In [17]:
# We can also create a sequence of durations increasing by an hour:
pd.timedelta_range(0, periods=10, freq='H')

TimedeltaIndex(['00:00:00', '01:00:00', '02:00:00', '03:00:00', '04:00:00',
                '05:00:00', '06:00:00', '07:00:00', '08:00:00', '09:00:00'],
               dtype='timedelta64[ns]', freq='H')
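
One use for such a sequence of durations is to add it to a single starting time stamp, which yields a DatetimeIndex. The start time and 30-minute step in this sketch are arbitrary values for illustration.

```python
import pandas as pd

# Four offsets: 0, 30, 60, and 90 minutes.
offsets = pd.timedelta_range(start="0 days", periods=4, freq="30min")

# Adding the offsets to one time stamp gives a regular schedule.
schedule = pd.Timestamp("2015-07-03 09:00") + offsets

schedule_times = schedule.strftime("%H:%M").tolist()
print(schedule_times)
```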

### Frequencies and Offsets
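The main frequency codes are:

| Code | Description |
|------|-------------|
| D | Calendar day |
| W | Weekly |
| M | Month end |
| Q | Quarter end |
| A | Year end |
| H | Hours |
| T | Minutes |
| S | Seconds |
| L | Milliseconds |
| U | Microseconds |
| N | Nanoseconds |
| B | Business day (Monday to Friday) |

Pandas 2.2 and later deprecate several of these spellings in favor of newer aliases, for example 'h' for hours, 'min' for minutes, 's' for seconds, and 'ME', 'QE', and 'YE' for month, quarter, and year end.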
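
As a quick illustration of the business-day code, the sketch below starts on Friday, 3 July 2015, and asks for five business days; the weekend is skipped automatically.

```python
import pandas as pd

# 2015-07-03 is a Friday, so the next business day is Monday the 6th.
business_days = pd.date_range("2015-07-03", periods=5, freq="B")

day_labels = business_days.strftime("%Y-%m-%d").tolist()
print(day_labels)
```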