## Working with Time Series Data in Python

#### Using dates and times with pandas

Pushkar Sheth

https://ph.sheth.cc

19 August 2019



#### Date and Time Series Functionality
* At the root: data types for date and time information
    * Objects for points in time and periods.
    * Attributes and methods reflect time-related details
* Sequences of dates and periods:
    * Series or DataFrame columns
    * Index: using dates as the index turns the object into a time series
* Many Series/DataFrame methods rely on time information in the index to provide time-series functionality.
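Here is a minimal sketch of that last point. It assumes a hypothetical daily Series whose values are just the integers 0 to 59. Once the index holds dates, a partial date string such as '2019-01' selects a whole month:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 60 consecutive days starting 1 January 2019
index = pd.date_range(start='2019-01-01', periods=60, freq='D')
ts = pd.Series(np.arange(60), index=index)

# The DatetimeIndex lets a partial date string select every row in January
january = ts.loc['2019-01']
print(len(january))       # 31
print(january.index[-1])  # 2019-01-31 00:00:00
```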
    

#### Basic Building Block: pd.Timestamp

In [1]:
import pandas as pd
import numpy as np

In [2]:
from datetime import datetime # To manually create dates

In [3]:
time_stamp = pd.Timestamp(datetime(2019,1,1))

In [4]:
pd.Timestamp('2019-01-01') == time_stamp # Understands dates as strings

True
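pd.Timestamp accepts several input forms besides a datetime object. The short sketch below shows that a few of them all produce the same point in time. The input strings are chosen purely for illustration:

```python
from datetime import datetime

import pandas as pd

reference = pd.Timestamp(datetime(2019, 1, 1))

# Different spellings of the same date; each is parsed into a Timestamp
candidates = [
    pd.Timestamp('2019-01-01'),       # ISO 8601 string
    pd.Timestamp('January 1, 2019'),  # written-out month name
    pd.Timestamp(2019, 1, 1),         # year, month, day as integers
]
print(all(ts == reference for ts in candidates))  # True
```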

In [5]:
time_stamp

Timestamp('2019-01-01 00:00:00')

In [6]:
type(time_stamp)

pandas._libs.tslibs.timestamps.Timestamp

In [7]:
time_stamp.year

2019

In [8]:
time_stamp.day_name()  # replaces the deprecated .weekday_name attribute

'Tuesday'

The Timestamp object has many attributes that store time-specific information.
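A brief sketch of a few of those attributes, reusing the same 1 January 2019 timestamp:

```python
from datetime import datetime

import pandas as pd

time_stamp = pd.Timestamp(datetime(2019, 1, 1))

# A few of the time-specific attributes and methods of a Timestamp
print(time_stamp.year, time_stamp.month, time_stamp.day)  # 2019 1 1
print(time_stamp.quarter)         # 1
print(time_stamp.dayofweek)       # 1 (Monday is 0, so Tuesday is 1)
print(time_stamp.day_name())      # Tuesday
print(time_stamp.is_month_start)  # True
```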

#### More building blocks: pd.Period and freq

In [9]:
period = pd.Period('2019-01')

In [10]:
period  # frequency inferred from the string: monthly ('M')

Period('2019-01', 'M')

The Period object has a freq attribute that stores frequency information.
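A monthly Period stands for the whole month, not a single instant, which is the main difference from a Timestamp. Its start_time and end_time attributes make this visible, as in this brief sketch:

```python
import pandas as pd

period = pd.Period('2019-01')  # monthly period covering all of January 2019

print(period.start_time)  # 2019-01-01 00:00:00
print(period.end_time)    # 2019-01-31 23:59:59.999999999

# Any instant inside January falls between the two bounds
mid_january = pd.Timestamp('2019-01-15')
print(period.start_time <= mid_january <= period.end_time)  # True
```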

In [11]:
period.asfreq('D') # convert to daily frequency (end of the period by default)

Period('2019-01-31', 'D')
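asfreq also takes a how argument that picks which end of the month the daily period lands on. A short sketch comparing both choices:

```python
import pandas as pd

period = pd.Period('2019-01')

# 'start' maps to the first day of the month, 'end' to the last
first_day = period.asfreq('D', how='start')
last_day = period.asfreq('D', how='end')
print(first_day)  # 2019-01-01
print(last_day)   # 2019-01-31
```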

In [12]:
period.to_timestamp().to_period('M') # Convert pd.Period() to pd.Timestamp() and back

Period('2019-01', 'M')

In [13]:
period + 2

Period('2019-03', 'M')
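Arithmetic also works between two periods of the same frequency. As this small sketch shows, subtracting them yields an offset whose n attribute counts the periods in between:

```python
import pandas as pd

start = pd.Period('2019-01')
later = start + 2  # two months later

# Period minus Period returns an offset; .n is the number of months apart
gap = later - start
print(later)  # 2019-03
print(gap.n)  # 2
```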

In [14]:
pd.Timestamp('2019-01-31') + pd.offsets.MonthEnd(1)  # roll forward to the next month-end

Timestamp('2019-02-28 00:00:00')
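Recent pandas versions no longer accept a plain integer added to a Timestamp. Calendar-aware offsets from pd.offsets take over that job. The sketch below contrasts two of them on dates chosen for illustration:
* pd.offsets.MonthEnd rolls forward to the next month-end.
* pd.DateOffset(months=1) shifts the calendar month and clips to the last valid day.

```python
import pandas as pd

mid_month = pd.Timestamp('2019-01-15')
month_end = pd.Timestamp('2019-01-31')

# MonthEnd: from mid-month it reaches the current month's end,
# from a month-end it moves on to the following month's end
print(mid_month + pd.offsets.MonthEnd(1))  # 2019-01-31 00:00:00
print(month_end + pd.offsets.MonthEnd(1))  # 2019-02-28 00:00:00

# DateOffset(months=1): same day next month, clipped to February's length
print(mid_month + pd.DateOffset(months=1))  # 2019-02-15 00:00:00
print(month_end + pd.DateOffset(months=1))  # 2019-02-28 00:00:00
```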

#### Sequences of Dates and Times

In [15]:
index = pd.date_range(start='2019-1-1', periods=12, freq='M')

# pd.date_range: start, end, periods, freq

In [16]:
index

DatetimeIndex(['2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30',
               '2019-05-31', '2019-06-30', '2019-07-31', '2019-08-31',
               '2019-09-30', '2019-10-31', '2019-11-30', '2019-12-31'],
              dtype='datetime64[ns]', freq='M')
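Here are two other common ways to call pd.date_range. The first gives start and end instead of periods, in which case freq defaults to daily. The second uses 'MS', which anchors each date at the start of the month rather than the end:

```python
import pandas as pd

# start and end given: freq defaults to daily ('D')
week = pd.date_range(start='2019-01-01', end='2019-01-07')
print(len(week))  # 7

# 'MS' places each date on the first day of the month
month_starts = pd.date_range(start='2019-01-01', periods=12, freq='MS')
print(month_starts[0], month_starts[-1])  # 2019-01-01 00:00:00 2019-12-01 00:00:00
```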

In [17]:
index[0]

Timestamp('2019-01-31 00:00:00', freq='M')

In [18]:
index.to_period()

PeriodIndex(['2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06',
             '2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2019-12'],
            dtype='period[M]', freq='M')
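A PeriodIndex can also be built directly with pd.period_range and converted back into timestamps. This is a minimal sketch; note that to_timestamp maps each period to its start by default, so the round trip yields month starts rather than the original month-ends:

```python
import pandas as pd

# Build the monthly PeriodIndex directly, without going through timestamps
months = pd.period_range(start='2019-01', periods=12, freq='M')

# Convert back to timestamps; by default each period maps to its start
starts = months.to_timestamp()
print(starts[0])   # 2019-01-01 00:00:00
print(starts[-1])  # 2019-12-01 00:00:00
```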

#### Create a Time Series: pd.DatetimeIndex

In [19]:
pd.DataFrame({'data': index}).info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 1 columns):
data    12 non-null datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 176.0 bytes


ns stands for nanoseconds: the dates are stored as datetime64 values with nanosecond resolution.
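Real data often arrives with dates stored as plain strings. The sketch below converts such a column with pd.to_datetime and then moves it into the index, which turns the frame into a time series. The column names 'date' and 'value' are hypothetical:

```python
import pandas as pd

# Hypothetical raw data with dates stored as plain strings
df = pd.DataFrame({'date': ['2019-01-31', '2019-02-28', '2019-03-31'],
                   'value': [1.0, 2.0, 3.0]})

# pd.to_datetime converts the strings into a datetime64 column
df['date'] = pd.to_datetime(df['date'])
print(pd.api.types.is_datetime64_any_dtype(df['date']))  # True

# Moving the dates into the index turns the frame into a time series
ts = df.set_index('date')
print(type(ts.index).__name__)  # DatetimeIndex
```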

In [20]:
data = np.random.random(size=(12, 2))  # Random floats in [0, 1); 12 rows, 2 columns

In [21]:
pd.DataFrame(data=data, index=index).info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 12 entries, 2019-01-31 to 2019-12-31
Freq: M
Data columns (total 2 columns):
0    12 non-null float64
1    12 non-null float64
dtypes: float64(2)
memory usage: 288.0 bytes
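With the dates in the index, time-aware methods such as resample become available. The frame above uses random values, so this sketch makes two substitutions to keep the result predictable:
* the values are the deterministic integers 0 to 23;
* the index uses month starts ('MS').

The sketch then aggregates the months into quarters:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data: 0, 1, ..., 23 in 12 rows x 2 columns
data = np.arange(24).reshape(12, 2)
index = pd.date_range(start='2019-01-01', periods=12, freq='MS')
df = pd.DataFrame(data=data, index=index)

# resample uses the DatetimeIndex to group the months into quarters
quarterly = df.resample('QS').sum()
print(quarterly)
```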


#### Frequency Aliases and Time Info

There are many frequency aliases besides 'M' and 'D':

| Period  | Alias |
|  :----:  | :----: |
| Hour    | H |
| Day     | D |
| Week    | W |
| Month   | M |
| Quarter | Q |
| Year    | Y |

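These aliases can be further refined to mark the beginning or end of a period, or to use business-specific definitions (for example, 'MS' for month start and 'B' for business day). In pandas 2.2 and later, the date-range aliases 'M', 'Q', 'Y' and 'H' are deprecated in favour of 'ME', 'QE', 'YE' and 'h'.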
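A short sketch of a few of these variants, all starting on Tuesday 1 January 2019. 'B' skips weekends, 'W' lands on Sundays (its default anchor), and 'MS' picks the first day of each month:

```python
import pandas as pd

start = '2019-01-01'  # a Tuesday

business_days = pd.date_range(start=start, periods=5, freq='B')  # Mon-Fri only
weeks = pd.date_range(start=start, periods=3, freq='W')          # weeks ending Sunday
month_starts = pd.date_range(start=start, periods=3, freq='MS')  # first day of month

print(business_days[-1].date())  # 2019-01-07 (the weekend is skipped)
print(weeks[0].date())           # 2019-01-06
print(month_starts[-1].date())   # 2019-03-01
```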

You can also access these pd.Timestamp attributes:

| Attributes |
|   :----:   |
| .second, .minute, .hour |
| .day, .month, .quarter, .year |
| .dayofweek |
| .weekofyear |
| .dayofyear |
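The same attributes also work element-wise on a whole DatetimeIndex. The three dates below are chosen to cross a year boundary. For ISO week numbers of an entire index, recent pandas versions provide isocalendar() rather than a vectorized .weekofyear:

```python
import pandas as pd

# Three days spanning a year boundary: 30 Dec 2019 to 1 Jan 2020
dates = pd.date_range(start='2019-12-30', periods=3, freq='D')

# tolist() converts the results into plain Python ints
print(dates.year.tolist())       # [2019, 2019, 2020]
print(dates.quarter.tolist())    # [4, 4, 1]
print(dates.dayofweek.tolist())  # [0, 1, 2]
print(dates.dayofyear.tolist())  # [364, 365, 1]

# ISO weeks: 30 Dec 2019 already belongs to week 1 of 2020
iso_weeks = [int(week) for week in dates.isocalendar()['week']]
print(iso_weeks)  # [1, 1, 1]
```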



Exercise:

In [22]:
# Create the range of dates here
seven_days = pd.date_range(start='2017-1-1', periods=7, freq='D')

In [23]:
# Iterate over the dates and print the number and name of the weekday
for day in seven_days:
    print(day.dayofweek, day.day_name())

6 Sunday
0 Monday
1 Tuesday
2 Wednesday
3 Thursday
4 Friday
5 Saturday
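The loop above can also be written without iterating, because a DatetimeIndex exposes dayofweek and day_name() for all dates at once. A minimal sketch that collects both into a small table (the column names 'number' and 'name' are illustrative):

```python
import pandas as pd

seven_days = pd.date_range(start='2017-01-01', periods=7, freq='D')

# Vectorised alternative to the loop: build both columns in one step
summary = pd.DataFrame({'number': seven_days.dayofweek,
                        'name': seven_days.day_name()},
                       index=seven_days)
print(summary)
```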




#### EOC