# Time Methods

## Python Datetime Review

Python's standard library, independent of pandas, includes a `datetime` module:

In [1]:
import numpy as np
import pandas as pd

from datetime import datetime

In [2]:
# Illustrates the argument order used to construct a datetime object
my_year = 2017
my_month = 1
my_day = 2
my_hour = 13
my_minute = 30
my_second = 15

In [3]:
mydate = datetime(my_year,my_month,my_day)

In [4]:
mydate

datetime.datetime(2017, 1, 2, 0, 0)

In [5]:
mydatetime = datetime(my_year,my_month,my_day,my_hour,my_minute,my_second)

In [6]:
mydatetime

datetime.datetime(2017, 1, 2, 13, 30, 15)

You can access any component of the datetime object as an attribute:

In [7]:
mydatetime.year

2017
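The other components work the same way as `.year`. The following is a small sketch using the same values as `mydatetime` above; the variable names `parts`, `weekday_number`, `iso_string`, and `pretty_string` are only illustrative.

```python
from datetime import datetime

# Same values as the mydatetime example above
mydatetime = datetime(2017, 1, 2, 13, 30, 15)

# Individual components are plain integer attributes
parts = (mydatetime.month, mydatetime.day, mydatetime.hour,
         mydatetime.minute, mydatetime.second)

# weekday() counts from Monday = 0
weekday_number = mydatetime.weekday()

# isoformat() and strftime() turn the object back into a string
iso_string = mydatetime.isoformat()
pretty_string = mydatetime.strftime("%d %b %Y, %H:%M")

print(parts, weekday_number, iso_string, pretty_string)
```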

# Pandas

# Converting to datetime

Often when data sets are stored, the time component may be a string. Pandas easily converts strings to datetime objects.

In [8]:
myser = pd.Series(["Nov 3,1990","2000-01-01",None])

In [9]:
myser

Unnamed: 0,0
0,"Nov 3,1990"
1,2000-01-01
2,


In [10]:
myser[0]

'Nov 3,1990'

### pd.to_datetime()

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#converting-to-timestamps

In [11]:
timeser = pd.to_datetime(myser)

In [12]:
timeser

Unnamed: 0,0
0,1990-11-03
1,2000-01-01
2,NaT


In [13]:
timeser[0].year

1990

In [14]:
obvi_euro_date = '31-12-2000'

In [15]:
pd.to_datetime(obvi_euro_date)

Timestamp('2000-12-31 00:00:00')

In [16]:
# 10 December or 12 October? The string alone is ambiguous,
# so we may need to tell pandas which convention applies
euro_date = '10-12-2000'

In [17]:
pd.to_datetime(euro_date)

Timestamp('2000-10-12 00:00:00')

In [18]:
pd.to_datetime(euro_date,dayfirst=True)

Timestamp('2000-12-10 00:00:00')
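As a quick comparison, the sketch below parses the same ambiguous string three ways. Passing an explicit `format` string is an alternative to `dayfirst=True` that leaves no room for guessing; the variable names are only illustrative.

```python
import pandas as pd

euro_date = "10-12-2000"

# Default parsing reads this string as month-day-year
default_guess = pd.to_datetime(euro_date)

# Both of these read it as day-month-year
with_dayfirst = pd.to_datetime(euro_date, dayfirst=True)
with_format = pd.to_datetime(euro_date, format="%d-%m-%Y")

print(default_guess, with_dayfirst, with_format)
```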

## Custom Time String Formatting

Sometimes dates come in a non-standard format. In that case you can tell pandas the exact format to expect. Specifying the format can also speed up the conversion, so it may be worth doing even when pandas could parse the strings on its own.

A full table of codes can be found here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes

In [19]:
style_date = "12--Dec--2000"

In [20]:
pd.to_datetime(style_date,format="%d--%b--%Y")

Timestamp('2000-12-12 00:00:00')

In [21]:
custom_date = "12th of Dec 2000"

In [22]:
pd.to_datetime(custom_date)

Timestamp('2000-12-12 00:00:00')
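The same `format` argument applies to a whole Series. The sketch below uses made-up strings in the `12--Dec--2000` style and adds `errors="coerce"`, which turns values that do not match the format into `NaT` instead of raising an error.

```python
import pandas as pd

# Made-up example strings; the last one is deliberately not a date
raw = pd.Series(["12--Dec--2000", "25--Jan--2001", "not a date"])

# One explicit format for every element; non-matching values become NaT
parsed = pd.to_datetime(raw, format="%d--%b--%Y", errors="coerce")

print(parsed)
```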

## Data

Retail Sales: Beer, Wine, and Liquor Stores

Units:  Millions of Dollars, Not Seasonally Adjusted

Frequency:  Monthly


U.S. Census Bureau, Retail Sales: Beer, Wine, and Liquor Stores [MRTSSM4453USN], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/MRTSSM4453USN, July 2, 2020.

In [25]:
sales = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Pandas/RetailSales_BeerWineLiquor.csv")

In [26]:
sales

Unnamed: 0,DATE,MRTSSM4453USN
0,1992-01-01,1509
1,1992-02-01,1541
2,1992-03-01,1597
3,1992-04-01,1675
4,1992-05-01,1822
...,...,...
335,2019-12-01,6630
336,2020-01-01,4388
337,2020-02-01,4533
338,2020-03-01,5562


In [35]:
sales.iloc[0]['DATE']

'1992-01-01'

In [36]:
type(sales.iloc[0]['DATE'])

str

In [28]:
sales["DATE"]
# dtype: object, so pandas is treating the dates as strings
# We need to convert this column to datetime objects

Unnamed: 0,DATE
0,1992-01-01
1,1992-02-01
2,1992-03-01
3,1992-04-01
4,1992-05-01
...,...
335,2019-12-01
336,2020-01-01
337,2020-02-01
338,2020-03-01


In [29]:
sales["DATE"] = pd.to_datetime(sales["DATE"])

In [31]:
sales["DATE"]

Unnamed: 0,DATE
0,1992-01-01
1,1992-02-01
2,1992-03-01
3,1992-04-01
4,1992-05-01
...,...
335,2019-12-01
336,2020-01-01
337,2020-02-01
338,2020-03-01


In [33]:
sales["DATE"].dt.year

Unnamed: 0,DATE
0,1992
1,1992
2,1992
3,1992
4,1992
...,...
335,2019
336,2020
337,2020
338,2020


## Attempt to Parse Dates Automatically

**parse_dates** - bool or list of int or names or list of lists or dict, default False

The behavior is as follows:

- boolean. If True -> try parsing the index.
- list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
- list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
- dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'

If a column or index cannot be represented as an array of datetimes, say because of an unparseable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more.

In [34]:
sales.head()

Unnamed: 0,DATE,MRTSSM4453USN
0,1992-01-01,1509
1,1992-02-01,1541
2,1992-03-01,1597
3,1992-04-01,1675
4,1992-05-01,1822


In [37]:
# Parse the dates while reading the CSV instead of converting afterwards
# parse_dates=[0] parses the column at index 0 as datetime
sales = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Pandas/RetailSales_BeerWineLiquor.csv",parse_dates=[0])

In [38]:
sales

Unnamed: 0,DATE,MRTSSM4453USN
0,1992-01-01,1509
1,1992-02-01,1541
2,1992-03-01,1597
3,1992-04-01,1675
4,1992-05-01,1822
...,...,...
335,2019-12-01,6630
336,2020-01-01,4388
337,2020-02-01,4533
338,2020-03-01,5562


In [39]:
type(sales.iloc[0]['DATE'])
# iloc stands for "integer location":
# it selects data by integer position rather than by label.

pandas._libs.tslibs.timestamps.Timestamp
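Columns can also be named in `parse_dates` rather than given by position. The sketch below reads a tiny in-memory CSV (made-up numbers and a hypothetical `VALUE` column, standing in for the real file) and uses `index_col` to set the parsed dates as the index in the same call.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk
csv_text = """DATE,VALUE
1992-01-01,100
1992-02-01,110
1992-03-01,120
"""

# Parse the DATE column by name and use it as the index in one step
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["DATE"], index_col="DATE")

print(df.index)
```

Reading the data this way gives a `DatetimeIndex` straight away, which is what time-based methods such as `resample()` operate on.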

## Resample
A common operation with time series data is resampling based on the time series index. Let's see how to use the resample() method. [[reference](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html)]

In [40]:
sales = sales.set_index("DATE")

In [41]:
sales

Unnamed: 0_level_0,MRTSSM4453USN
DATE,Unnamed: 1_level_1
1992-01-01,1509
1992-02-01,1541
1992-03-01,1597
1992-04-01,1675
1992-05-01,1822
...,...
2019-12-01,6630
2020-01-01,4388
2020-02-01,4533
2020-03-01,5562


In [44]:
# Yearly means: group the rows by year and take the average for each year
# Note: in pandas 2.2+, the "A" alias is deprecated in favor of "YE"
sales.resample(rule="A").mean()

Unnamed: 0_level_0,MRTSSM4453USN
DATE,Unnamed: 1_level_1
1992-12-31,1807.25
1993-12-31,1794.833333
1994-12-31,1841.75
1995-12-31,1833.916667
1996-12-31,1929.75
1997-12-31,2006.75
1998-12-31,2115.166667
1999-12-31,2206.333333
2000-12-31,2375.583333
2001-12-31,2468.416667


When calling `.resample()` you first need to pass in a **rule** parameter, then you need to call some sort of aggregation function.

The **rule** parameter describes the frequency at which to apply the aggregation function (daily, monthly, yearly, etc.).<br>
It is passed in using an "offset alias"; the full list of aliases is in the pandas documentation. [[reference](http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases)]

The aggregation function is needed because resampling groups several rows into each new time bin, so we need a mathematical rule (mean, sum, count, etc.) to combine them into a single value.
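To make the rule and aggregation pairing concrete, here is a small sketch on made-up monthly data (the values 1 to 24 and the column name `value` are assumptions for illustration). It uses the `"YS"` (year start) and `"QS"` (quarter start) aliases, which label each bin by the start of its period rather than the end.

```python
import pandas as pd

# Made-up monthly data: the values 1..24 on the first day of each month
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
demo = pd.DataFrame({"value": range(1, 25)}, index=idx)

# Same data, two different rules and two different aggregations
yearly_sum = demo.resample(rule="YS").sum()       # one row per year
quarterly_mean = demo.resample(rule="QS").mean()  # one row per quarter

print(yearly_sum)
print(quarterly_mean.head())
```

Changing only the aggregation (for example `.max()` instead of `.sum()`) keeps the same time bins and changes how the rows in each bin are combined.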

# .dt Method Calls

Once a column is in a datetime format, you can access a variety of properties and methods through the `.dt` accessor in pandas:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html

In [45]:
sales = sales.reset_index()

In [46]:
sales

Unnamed: 0,DATE,MRTSSM4453USN
0,1992-01-01,1509
1,1992-02-01,1541
2,1992-03-01,1597
3,1992-04-01,1675
4,1992-05-01,1822
...,...,...
335,2019-12-01,6630
336,2020-01-01,4388
337,2020-02-01,4533
338,2020-03-01,5562


In [47]:
help(sales['DATE'].dt)

Help on DatetimeProperties in module pandas.core.indexes.accessors object:

class DatetimeProperties(Properties)
 |  DatetimeProperties(data: 'Series', orig) -> 'None'
 |  
 |  Accessor object for datetimelike properties of the Series values.
 |  
 |  Examples
 |  --------
 |  >>> seconds_series = pd.Series(pd.date_range("2000-01-01", periods=3, freq="s"))
 |  >>> seconds_series
 |  0   2000-01-01 00:00:00
 |  1   2000-01-01 00:00:01
 |  2   2000-01-01 00:00:02
 |  dtype: datetime64[ns]
 |  >>> seconds_series.dt.second
 |  0    0
 |  1    1
 |  2    2
 |  dtype: int32
 |  
 |  >>> hours_series = pd.Series(pd.date_range("2000-01-01", periods=3, freq="h"))
 |  >>> hours_series
 |  0   2000-01-01 00:00:00
 |  1   2000-01-01 01:00:00
 |  2   2000-01-01 02:00:00
 |  dtype: datetime64[ns]
 |  >>> hours_series.dt.hour
 |  0    0
 |  1    1
 |  2    2
 |  dtype: int32
 |  
 |  >>> quarters_series = pd.Series(pd.date_range("2000-01-01", periods=3, freq="QE"))
 |  >>> quarters_series
 |  0   200

In [48]:
sales['DATE'].dt.month

Unnamed: 0,DATE
0,1
1,2
2,3
3,4
4,5
...,...
335,12
336,1
337,2
338,3


In [49]:
sales['DATE'].dt.is_leap_year

Unnamed: 0,DATE
0,True
1,True
2,True
3,True
4,True
...,...
335,False
336,True
337,True
338,True
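A few more `.dt` members are sketched below on a small made-up frame (the dates echo the sales data, but the `value` column is invented for illustration). The last line shows one possible use: grouping by `.dt.year` to aggregate without first setting the dates as the index.

```python
import pandas as pd

# Made-up frame with a datetime column
demo = pd.DataFrame({
    "DATE": pd.to_datetime(["1992-01-01", "1992-02-01", "2019-12-01", "2020-02-01"]),
    "value": [10, 20, 30, 40],
})

month_names = demo["DATE"].dt.month_name()   # full month names
quarters = demo["DATE"].dt.quarter           # quarter of the year, 1 to 4
labels = demo["DATE"].dt.strftime("%Y-%m")   # formatted back to strings

# Group by a .dt property to aggregate per calendar year
per_year = demo.groupby(demo["DATE"].dt.year)["value"].sum()

print(month_names.tolist(), quarters.tolist(), labels.tolist())
print(per_year)
```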
