<b><em>TODO: (Maybe) Name the repo DateTimes, Timedeltas, DateRanges, and Time_Series</em></b>

In [51]:
import pandas as pd
import numpy as np

In [52]:
%%HTML
<style>
b em {
    color: blue;
}

em b {
    color: blue;
}
</style>

### Timestamp

In [53]:
pd.Timestamp('9/1/2016 10:05AM')

Timestamp('2016-09-01 10:05:00')

### Period

In [54]:
pd.Period('1/2016')

Period('2016-01', 'M')

In [55]:
pd.Period('3/5/2016')

Period('2016-03-05', 'D')

### DatetimeIndex

In [56]:
t1 = pd.Series(list('abc'), [pd.Timestamp('2016-09-01'), pd.Timestamp('2016-09-02'), pd.Timestamp('2016-09-03')])
t1

2016-09-01    a
2016-09-02    b
2016-09-03    c
dtype: object

In [57]:
type(t1.index)

pandas.tseries.index.DatetimeIndex

### PeriodIndex

In [58]:
t2 = pd.Series(list('def'), [pd.Period('2016-09'), pd.Period('2016-10'), pd.Period('2016-11')])
t2

2016-09    d
2016-10    e
2016-11    f
Freq: M, dtype: object

In [59]:
type(t2.index)

pandas.tseries.period.PeriodIndex

In [60]:
tDaily = pd.Series(list('def'), [pd.Period('2016-09-01'),pd.Period('2016-09-02'),pd.Period('2016-09-03')])
tDaily

2016-09-01    d
2016-09-02    e
2016-09-03    f
Freq: D, dtype: object

In [61]:
type(tDaily.index)

pandas.tseries.period.PeriodIndex

In [62]:
t_ambiguous_frequency = pd.Series(list('def'), [pd.Period('2016-09-01'), pd.Period('2016-09-05'), pd.Period('2016-09-25')])
t_ambiguous_frequency

2016-09-01    d
2016-09-05    e
2016-09-25    f
Freq: D, dtype: object

<b><em>It turns out that the Freq attribute of the index in the series above is daily ('D'), even though the first and second index elements are 4 days apart and the second and third are 20 days apart. The freq describes the span that each Period covers (one day), not the spacing between consecutive index entries.</em></b>
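
*The note above can be checked directly. This is a minimal sketch that rebuilds the same three daily periods with an explicit `pd.PeriodIndex` (the variable names are only for illustration) and compares the width of each period with the gaps between them.*

```python
import pandas as pd

# Same irregularly spaced daily periods as in the cell above.
idx = pd.PeriodIndex(['2016-09-01', '2016-09-05', '2016-09-25'], freq='D')

# freq describes the width of each period: every entry covers one calendar day.
first_moments = idx.start_time   # first instant of each period
last_moments = idx.end_time      # last instant of each period

# The spacing between consecutive entries is a separate matter.
gaps = first_moments[1:] - first_moments[:-1]

print(idx.freqstr)       # D
print(list(gaps.days))   # [4, 20]
```

*Each period starts and ends on the same calendar day, so 'D' is accurate even though the entries are 4 and 20 days apart.*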

In [63]:
type(t_ambiguous_frequency.index)

pandas.tseries.period.PeriodIndex

**Question: What if some of the periods in the PeriodIndex of the series were days and others were months? What would the freq of the series be listed as? Would an error be thrown when we try to construct the series?**

In [64]:
#t_someDays_othersMonths = pd.Series(list('def'),[pd.Period('2016-09'), pd.Period('2016-09-05'), pd.Period('2016-10-03')])
#t_someDays_othersMonths

**Answer: <em>It turns out that we do get an error if some Periods in the index are days and others are months (that's why the code in the above cell is commented out). Specifically, we get the error:</em> <b><em style = 'color:red;'>'IncompatibleFrequency: Input has different freq=D from PeriodIndex(freq=M)'</em></b>**
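
*A minimal sketch of the same experiment, written so that it runs without stopping the notebook. It builds the index explicitly with `pd.PeriodIndex` and catches the error as a `ValueError`, on the assumption that IncompatibleFrequency is a subclass of it. Depending on the pandas version, passing the mixed list straight to `pd.Series` may behave differently, so only the explicit constructor is shown here.*

```python
import pandas as pd

# One monthly period followed by two daily periods.
mixed = [pd.Period('2016-09'), pd.Period('2016-09-05'), pd.Period('2016-10-03')]

try:
    pd.PeriodIndex(mixed)
    raised = False
except ValueError as err:  # assumption: IncompatibleFrequency subclasses ValueError
    raised = True
    print(type(err).__name__, err)

# Converting every period to one frequency first avoids the conflict.
monthly = pd.PeriodIndex([p.asfreq('M') for p in mixed])
print(monthly)
```

*Converting with `asfreq('M')` is just one way to reconcile the frequencies; it maps each daily period to the month that contains it.*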

### Converting strings to DateTime objects

In [65]:
d1 = ['2 June 2013', 'Aug 29, 2014', '2015-06-26', '7/12/16']
ts3 = pd.DataFrame(np.random.randint(10, 100, (4,2)), index=d1, columns=list('ab'))
ts3

Unnamed: 0,a,b
2 June 2013,34,29
"Aug 29, 2014",52,63
2015-06-26,23,78
7/12/16,79,46


In [66]:
ts3.index

Index(['2 June 2013', 'Aug 29, 2014', '2015-06-26', '7/12/16'], dtype='object')

*Pandas's to_datetime function intelligently converts strings to datetime objects: a single string becomes a Timestamp, and a list-like of strings (such as this index) becomes a DatetimeIndex. It does not produce Periods. By default, to_datetime assumes that the month is listed before the day when a date is ambiguous.*

In [67]:
ts3.index = pd.to_datetime(ts3.index)
ts3

Unnamed: 0,a,b
2013-06-02,34,29
2014-08-29,52,63
2015-06-26,23,78
2016-07-12,79,46


In [68]:
ts3.index

DatetimeIndex(['2013-06-02', '2014-08-29', '2015-06-26', '2016-07-12'], dtype='datetime64[ns]', freq=None)
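
*The four strings above each use a different format. As far as I can tell, newer pandas releases (2.0 onward) infer a single format from the first string and may reject a list that mixes formats. The sketch below parses each string on its own instead, which should give the same result as the cell above regardless of version; treat it as a workaround under that assumption rather than the recommended approach.*

```python
import pandas as pd

d1 = ['2 June 2013', 'Aug 29, 2014', '2015-06-26', '7/12/16']

# Parse each string separately so that no single format has to fit all four.
parsed = pd.DatetimeIndex([pd.to_datetime(s) for s in d1])
print(parsed)
```

*Note that '7/12/16' is read month-first, giving July 12th, 2016.*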

<b><em>Europeans like to put their days before their months, so July 4th, 2021 would be <span style = 'color:red;'>'4.7.2021'</span></em></b>

*Pandas's to_datetime function can handle strings in this format so long as the parameter <b><em>dayfirst</em></b> is set to True*

In [69]:
pd.to_datetime('4.7.2021', dayfirst=True)

Timestamp('2021-07-04 00:00:00')
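
*To see what dayfirst actually changes, this short sketch parses the same ambiguous string both ways (the variable names are only for illustration).*

```python
import pandas as pd

ambiguous = '4.7.2021'

month_first = pd.to_datetime(ambiguous)                # default reading: April 7th
day_first = pd.to_datetime(ambiguous, dayfirst=True)   # European reading: July 4th

print(month_first.date(), day_first.date())
```

*The same string yields two different dates, so it is worth setting dayfirst explicitly whenever the source of the data is known.*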

### Timedeltas

*If you subtract two Timestamps, you get a Timedelta*

In [70]:
pd.Timestamp('9/3/2016')-pd.Timestamp('9/1/2016')

Timedelta('2 days 00:00:00')

*If you add a Timedelta to a Timestamp, you get a Timestamp corresponding to the appropriate time*

In [71]:
pd.Timestamp('9/2/2016 8:10AM') + pd.Timedelta('12D 3H')

Timestamp('2016-09-14 11:10:00')
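
*Both rules can be checked in one round trip. This sketch builds the Timedelta from keyword arguments instead of the '12D 3H' string, which sidesteps any differences in how unit abbreviations are spelled across pandas versions.*

```python
import pandas as pd

start = pd.Timestamp('2016-09-02 08:10')
delta = pd.Timedelta(days=12, hours=3)   # same span as the '12D 3H' string above

later = start + delta      # Timestamp + Timedelta -> Timestamp
elapsed = later - start    # Timestamp - Timestamp -> Timedelta

print(later)
print(elapsed.days, elapsed.seconds)   # whole days, then leftover seconds
```

*Subtracting the original Timestamp recovers exactly the Timedelta that was added.*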

### date_range function

*The pandas date_range function can construct a DatetimeIndex when supplied a starting Timestamp, a number of periods, and a frequency. The frequency can be supplied either with or without an anchoring offset (for example, the day of the week on which a weekly frequency should fall)*
<br>
* <b><em>frequency <u>with</u> an offset: freq = '2W-Sun'</em></b>
<br>
* <b><em>frequency <u>without</u> an offset: freq = '5H'</em></b> 

*The cell below creates a DatetimeIndex beginning on the first Sunday on or after October 1st, 2016, with 9 total elements spaced 2 weeks apart*

In [72]:
dates = pd.date_range('10-01-2016', periods = 9, freq = '2W-Sun')
dates

DatetimeIndex(['2016-10-02', '2016-10-16', '2016-10-30', '2016-11-13',
               '2016-11-27', '2016-12-11', '2016-12-25', '2017-01-08',
               '2017-01-22'],
              dtype='datetime64[ns]', freq='2W-SUN')
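
*A small sketch contrasting the two kinds of frequency from the same start date. The 5-hour frequency is passed as a `pd.Timedelta` rather than the '5H' string, on the assumption that the spelling of hourly aliases differs between pandas versions.*

```python
import pandas as pd

start = '2016-10-01'   # a Saturday

# Anchored: the first date is rolled forward to the next Sunday.
anchored = pd.date_range(start, periods=3, freq='2W-SUN')

# Unanchored: the range begins exactly at the start date.
unanchored = pd.date_range(start, periods=3, freq=pd.Timedelta(hours=5))

print(anchored)
print(unanchored)
```

*With the anchored frequency the start date itself does not appear in the result, because October 1st, 2016 is not a Sunday.*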

*Let's make the above DatetimeIndex the index of a DataFrame. We'll make the columns two 9-element random walks starting near 100 and 120, respectively.*

In [73]:
v100 = 100 + np.random.randint(-5,5,9).cumsum()
v120 = 120 + np.random.randint(-5,5,9).cumsum()
df = pd.DataFrame({'Walk_Around_100':v100, 'Walk_Around_120':v120}, index = dates)
df

Unnamed: 0,Walk_Around_100,Walk_Around_120
2016-10-02,104,117
2016-10-16,106,115
2016-10-30,105,113
2016-11-13,107,116
2016-11-27,105,111
2016-12-11,109,113
2016-12-25,111,109
2017-01-08,109,110
2017-01-22,113,114
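
*The values above change on every run, and `np.random.randint(-5, 5, 9)` excludes the upper bound, so its steps range from -5 to 4. The sketch below is one possible reproducible variant with symmetric steps; the seed and the `random_walk` helper are illustrative choices, not part of the original notebook.*

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # fixed seed (arbitrary) for repeatable output
dates = pd.date_range('2016-10-01', periods=9, freq='2W-SUN')

def random_walk(start, n, rng):
    """Hypothetical helper: cumulative sum of integer steps in -5..5 inclusive."""
    steps = rng.integers(-5, 5, size=n, endpoint=True)
    return start + steps.cumsum()

df_seeded = pd.DataFrame({'Walk_Around_100': random_walk(100, 9, rng),
                          'Walk_Around_120': random_walk(120, 9, rng)},
                         index=dates)
print(df_seeded)
```

*The result is stored under a separate name so that it does not overwrite the df used in the rest of the notebook.*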


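### weekday_name attribute

*Accessing the weekday_name attribute of a Timestamp gives you the name of the weekday. It is an attribute rather than a function, so it takes no parentheses. Newer pandas releases removed it in favor of the day_name() method.*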

In [74]:
df.index.tolist()[0].weekday_name

'Sunday'
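
*On pandas versions where weekday_name is no longer available, `day_name()` gives the same answer. A minimal sketch, rebuilding the same dates so that it runs on its own:*

```python
import pandas as pd

dates = pd.date_range('2016-10-01', periods=9, freq='2W-SUN')

first_name = dates[0].day_name()   # method call on a single Timestamp
all_names = dates.day_name()       # vectorized over the whole DatetimeIndex

print(first_name)
print(all_names.unique())
```

*Calling it on the index directly also avoids the `tolist()` round trip.*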

### Basic Time Series

*Time series data (in the context of data frames) is just a data frame with an index that's a DatetimeIndex*

<b><em>The .loc accessor smartly and flexibly accesses the data at a particular location in the DatetimeIndex</em></b>

In [75]:
df.loc['2017-01-08']

Walk_Around_100    109
Walk_Around_120    110
Name: 2017-01-08 00:00:00, dtype: int64

In [76]:
df.loc['January 8, 2017']

Walk_Around_100    109
Walk_Around_120    110
Name: 2017-01-08 00:00:00, dtype: int64

In [77]:
df.loc['January 8th, 2017']

Walk_Around_100    109
Walk_Around_120    110
Name: 2017-01-08 00:00:00, dtype: int64

In [78]:
df.loc['2017 8th January']

Walk_Around_100    109
Walk_Around_120    110
Name: 2017-01-08 00:00:00, dtype: int64

In [79]:
df.loc['2017 January 8th']

Walk_Around_100    109
Walk_Around_120    110
Name: 2017-01-08 00:00:00, dtype: int64

<b><em>Notice how all of the index access methods in the five cells above return the same data element</em></b>

<b><em>Partial matching is possible too</em></b>

Just the month

In [80]:
df.loc['2017 January']

Unnamed: 0,Walk_Around_100,Walk_Around_120
2017-01-08,109,110
2017-01-22,113,114


Just the year

In [81]:
df.loc['2016']

Unnamed: 0,Walk_Around_100,Walk_Around_120
2016-10-02,104,117
2016-10-16,106,115
2016-10-30,105,113
2016-11-13,107,116
2016-11-27,105,111
2016-12-11,109,113
2016-12-25,111,109


### Downsampling

*Downsampling is fairly clean: it combines several observations into one, so unlike upsampling, you don't have to fill in missing values*

Here we downsample monthly by taking the first value of the month as representative of that month's data.

<b><em>Notice that even though the data used in the monthly sample is the first data element of the month in the original dataset, the actual date in the index ends up being the last day of each month</em></b>
<br>
<br>
<br>
<b><em style = 'color: red;'>TODO: figure out how to change dates that show up in index when downsampling a time series monthly</em></b>

In [83]:
df.resample('M').first()

Unnamed: 0,Walk_Around_100,Walk_Around_120
2016-10-31,104,117
2016-11-30,107,116
2016-12-31,109,113
2017-01-31,109,110
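
*One possible answer to the TODO above, offered as a sketch rather than the definitive fix: the 'MS' (month start) alias labels each bin with the first day of the month instead of the last. If the goal is to keep the date of the first actual observation, that date can be looked up per month and used to select rows. The DataFrame is rebuilt with the values shown earlier so that the example runs on its own.*

```python
import pandas as pd

dates = pd.date_range('2016-10-01', periods=9, freq='2W-SUN')
df = pd.DataFrame(
    {'Walk_Around_100': [104, 106, 105, 107, 105, 109, 111, 109, 113],
     'Walk_Around_120': [117, 115, 113, 116, 111, 113, 109, 110, 114]},
    index=dates)

# Option 1: label each monthly bin with the first day of the month.
by_month_start = df.resample('MS').first()

# Option 2: keep the actual date of the first observation in each month.
first_dates = df.index.to_series().resample('MS').first()
first_rows = df.loc[pd.DatetimeIndex(first_dates)]

print(by_month_start)
print(first_rows)
```

*Both options return the same values as the cell above; only the index labels differ.*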
