* [the construction of a datetime object](#construction)
* [pd.to_datetime() function](#pd.to_datetime)
* [pd.to_datetime(dayfirst=True)](#dayfirst)
* [string codes for formatting, pd.to_datetime(format=)](#strcodesformat)
* [pd.read_csv(parse_dates=)](#parseDates)
* [DataFrame.resample()](#resample)
* [.dt method calls](#dt)

___

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#converting-to-timestamps

In [53]:
import numpy as np
import pandas as pd

from datetime import datetime

Just as pandas offers a list of __.str methods__ for strings, it offers a list of __.dt methods__: datetime methods that make it easy to extract information from a datetime object.

Pandas will allow us to extract information from the timestamp, such as:
- day of the week
- weekend vs weekday
- AM or PM and so on

Many machine learning methods cannot make direct use of a full datetime object.
<br>However, they can readily use features that are more categorical, such as the day of the week, weekend versus weekday, or AM vs PM.
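
As a minimal sketch of that idea, the snippet below derives those three categorical features from a small, made-up Series of timestamps. The sample timestamps and the column names (`day_of_week`, `is_weekend`, `am_pm`) are illustrative assumptions, not part of any dataset used later.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps, made up for illustration
stamps = pd.Series(pd.to_datetime([
    '2015-01-01 02:30:15',   # a Thursday, early morning
    '2015-01-03 14:00:00',   # a Saturday, afternoon
    '2015-01-05 23:59:59',   # a Monday, late evening
]))

features = pd.DataFrame({
    'day_of_week': stamps.dt.day_name(),                 # e.g. 'Thursday'
    'is_weekend': stamps.dt.dayofweek >= 5,              # Monday=0 ... Sunday=6
    'am_pm': np.where(stamps.dt.hour < 12, 'AM', 'PM'),  # before noon or not
})
print(features)
```

Each of the three columns is something a model can treat as a category or a flag, which is usually easier to work with than the raw timestamp.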

<a id='construction'></a>
## the construction of a datetime object

In [54]:
myyear = 2015
mymonth = 1
myday = 1
myhour = 2
mymin = 30
mysec = 15

In [55]:
mydate = datetime(myyear, mymonth, myday)

In [56]:
mydate

# notice that it automatically fills in the hour, minute and second with zeros, essentially everything smaller than what we've provided

datetime.datetime(2015, 1, 1, 0, 0)

In [57]:
mydatetime = datetime(myyear, mymonth, myday, myhour, mymin, mysec)

In [58]:
mydatetime # this is a datetime object

datetime.datetime(2015, 1, 1, 2, 30, 15)

Type __mydatetime.__ and press __TAB__ to see the various attributes that are available here.

In [59]:
mydatetime.year

2015

In [98]:
mydatetime.day

1

In [99]:
mydatetime.hour

2

___

There are different ways to format dates.

__European formatting__ - the day, then the month, then the year
<br>__American style formatting__ - the month, then the day, then the year
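
To make the difference concrete, here is a small sketch that renders one date in a day-first and a month-first layout with `strftime`. The variable name `some_date` and the slash separator are assumptions chosen for the example.

```python
from datetime import datetime

some_date = datetime(2000, 12, 10)  # 10 December 2000

day_first = some_date.strftime('%d/%m/%Y')    # day, month, year
month_first = some_date.strftime('%m/%d/%Y')  # month, day, year

print(day_first)    # 10/12/2000
print(month_first)  # 12/10/2000
```

The two strings contain the same digits in a different order, which is exactly why a parser cannot tell them apart without extra information.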

____

<a id='pd.to_datetime'></a>
## pd.to_datetime() function

<u>default format - year, month, day, hour, min, sec (from largest to smallest)</u>

__Keep in mind__, you can always call pd.to_datetime() on an entire pandas Series, not just on a single date.

In [60]:
myser = pd.Series(['Nov 3, 1990', '2000-01-01', None])

In [61]:
myser # dtype object meaning strings

0    Nov 3, 1990
1     2000-01-01
2           None
dtype: object

In [62]:
myser[0]

'Nov 3, 1990'

In [64]:
pd.to_datetime(myser)

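# three different kinds of entries (two date formats and a missing value), and here pandas doesn't complain.
# the output is always displayed as year, month and then day.
# note: pandas 2.0 and later infer a single format from the first value, so mixed input like this may need format='mixed'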

0   1990-11-03
1   2000-01-01
2          NaT
dtype: datetime64[ns]
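
Missing values become `NaT` ("not a time"), as in the last row above. A related case, sketched below, is a string that cannot be parsed at all: by default `pd.to_datetime()` raises an error, while `errors='coerce'` turns such entries into `NaT` as well. The sample Series `messy` is made up for illustration.

```python
import pandas as pd

# Made-up input: one valid date, one junk string, one missing value
messy = pd.Series(['2000-01-01', 'not a date', None])

# errors='coerce' replaces anything unparseable with NaT instead of raising
cleaned = pd.to_datetime(messy, errors='coerce')
print(cleaned)
print(cleaned.isna().tolist())
```

Coercing is convenient, but it also hides bad input, so it is worth counting the `NaT` values afterwards.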

The obvious question is: how does pandas actually know whether something is a European date or a US date?

In [65]:
timeser = pd.to_datetime(myser)
timeser                           # stored with nanosecond precision

0   1990-11-03
1   2000-01-01
2          NaT
dtype: datetime64[ns]

In [66]:
timeser[0], timeser[0].year

(Timestamp('1990-11-03 00:00:00'), 1990)

In [67]:
obvi_euro_date = '31-12-2000'  

# here it's obvious that 31 is the day, not the month

In [68]:
pd.to_datetime(obvi_euro_date)

Timestamp('2000-12-31 00:00:00')

Let's make it not so obvious.

In [72]:
euro_date = '10-12-2000' 

# because this is a European style date, I know that this should be the 10th of December 2000
# but an American might think that this is actually the 12th of October.

In [73]:
pd.to_datetime(euro_date)

# when passing this to pandas, 10 is treated as the month,
# because pandas assumes month-first by default (dayfirst=False)

Timestamp('2000-10-12 00:00:00')

<a id='dayfirst'></a>

In [74]:
# But if this is actually a European date, we want the day to be first (this is the 10th of December).
# Just set the 'dayfirst' parameter to True

pd.to_datetime(euro_date, dayfirst=True)

Timestamp('2000-12-10 00:00:00')
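
Note that the pandas documentation describes `dayfirst=True` as a preference rather than a strict rule. When the layout of the strings is known in advance, an explicit `format` is an unambiguous alternative; a minimal sketch, reusing the same `'10-12-2000'` string:

```python
import pandas as pd

euro_date = '10-12-2000'

# %d = day, %m = month, %Y = four-digit year
explicit = pd.to_datetime(euro_date, format='%d-%m-%Y')
print(explicit)  # 2000-12-10 00:00:00

# Same result as asking pandas to prefer day-first parsing
print(explicit == pd.to_datetime(euro_date, dayfirst=True))
```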

___

<a id='strcodesformat'></a>
## string codes for formatting

https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

Sometimes dates come in a non-standard format. Luckily, you can always tell pandas the format explicitly. Note that this can also speed up the conversion, so it may be worth doing even if pandas can parse the dates on its own.

In [139]:
style_date = '12--Dec--2000'

In [77]:
pd.to_datetime(style_date, format='%d--%b--%Y') 

# the string codes for the format come from the link above

Timestamp('2000-12-12 00:00:00')

In [78]:
custom_date = '12th of Dec 2000'

In [79]:
pd.to_datetime(custom_date)

Timestamp('2000-12-12 00:00:00')
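
pandas manages to parse this string without help, but the same string codes can describe it explicitly, with the literal text `th of` written into the format. This is a sketch under the assumption that every value uses the `th` suffix (a string such as `'1st of Dec 2000'` would not match this pattern):

```python
import pandas as pd

custom_date = '12th of Dec 2000'

# Literal characters in the format must match the input exactly;
# %d = day, %b = abbreviated month name, %Y = four-digit year
parsed = pd.to_datetime(custom_date, format='%dth of %b %Y')
print(parsed)
```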

___

In [109]:
sales = pd.read_csv('RetailSales_BeerWineLiquor.csv')

In [110]:
sales

Unnamed: 0,DATE,MRTSSM4453USN
0,1992-01-01,1509
1,1992-02-01,1541
2,1992-03-01,1597
3,1992-04-01,1675
4,1992-05-01,1822
...,...,...
335,2019-12-01,6630
336,2020-01-01,4388
337,2020-02-01,4533
338,2020-03-01,5562


In [111]:
sales['DATE'] # object (string) dtype

0      1992-01-01
1      1992-02-01
2      1992-03-01
3      1992-04-01
4      1992-05-01
          ...    
335    2019-12-01
336    2020-01-01
337    2020-02-01
338    2020-03-01
339    2020-04-01
Name: DATE, Length: 340, dtype: object

In [112]:
sales.iloc[0]['DATE']

'1992-01-01'

In [113]:
type(sales.iloc[0]['DATE'])

str

In [114]:
sales['DATE'] = pd.to_datetime(sales["DATE"])

In [115]:
sales['DATE']

0     1992-01-01
1     1992-02-01
2     1992-03-01
3     1992-04-01
4     1992-05-01
         ...    
335   2019-12-01
336   2020-01-01
337   2020-02-01
338   2020-03-01
339   2020-04-01
Name: DATE, Length: 340, dtype: datetime64[ns]

In [116]:
sales['DATE'][0]

Timestamp('1992-01-01 00:00:00')

<a id='parseDates'></a>

## Parsing dates automatically when reading data

Here is a way to read a CSV file and parse the dates from the very start (instead of converting the column manually afterwards).

In [117]:
sales = pd.read_csv('RetailSales_BeerWineLiquor.csv', parse_dates=[0])

# inside the list, you pass the positions (or names) of the columns that you want to be parsed as datetime objects.

In [118]:
sales.head()

Unnamed: 0,DATE,MRTSSM4453USN
0,1992-01-01,1509
1,1992-02-01,1541
2,1992-03-01,1597
3,1992-04-01,1675
4,1992-05-01,1822


In [119]:
sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 340 entries, 0 to 339
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   DATE           340 non-null    datetime64[ns]
 1   MRTSSM4453USN  340 non-null    int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 5.4 KB
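
`parse_dates` also accepts column names instead of positions, which tends to be easier to read. The sketch below uses an in-memory CSV (via `io.StringIO`) so it runs without the data file; its three rows are copied from the first rows shown above.

```python
import io
import pandas as pd

# Stand-in for the CSV file: header plus the first three rows shown above
csv_text = """DATE,MRTSSM4453USN
1992-01-01,1509
1992-02-01,1541
1992-03-01,1597
"""

# Refer to the date column by name rather than by position
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['DATE'])
print(sample.dtypes)
```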


___

<a id='resample'></a>

__Resampling__ means grouping by time when the time series has the timestamps as its index.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases

In [130]:
sales = sales.set_index('DATE')

In [131]:
sales.head()

Unnamed: 0_level_0,MRTSSM4453USN
DATE,Unnamed: 1_level_1
1992-01-01,1509
1992-02-01,1541
1992-03-01,1597
1992-04-01,1675
1992-05-01,1822


When you call resample, you can think of it as a group by. But in this case, because the index holds timestamps, we can say something like group by the year or group by the month.

If I want to group by the year, I simply set the rule to 'A' (year end). Note that pandas 2.2 and later prefer the spelling 'YE' for this alias and warn about 'A'.

The **rule** parameter describes the frequency at which to apply the aggregation function (daily, monthly, yearly, etc.).

In [133]:
sales.resample(rule='A')

<pandas.core.resample.DatetimeIndexResampler object at 0x0000022E3EDBEB08>

The result is essentially like a group by object: it's waiting for an aggregation method.

In [134]:
sales.resample(rule='A').mean()

Unnamed: 0_level_0,MRTSSM4453USN
DATE,Unnamed: 1_level_1
1992-12-31,1807.25
1993-12-31,1794.833333
1994-12-31,1841.75
1995-12-31,1833.916667
1996-12-31,1929.75
1997-12-31,2006.75
1998-12-31,2115.166667
1999-12-31,2206.333333
2000-12-31,2375.583333
2001-12-31,2468.416667
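
To see the group-by analogy in isolation, the sketch below builds a small synthetic monthly series (made-up values, not the sales data) and checks that a yearly resample gives the same means as grouping by the index's year. It uses the year-start alias `'YS'`, an assumption made here because the spelling of the year-end alias differs between pandas versions; the only visible difference is that each group is labelled with 1 January instead of 31 December.

```python
import numpy as np
import pandas as pd

# Two years of synthetic monthly data: values 0..23 on month starts
idx = pd.date_range('2000-01-01', periods=24, freq='MS')
toy = pd.DataFrame({'value': np.arange(24, dtype=float)}, index=idx)

# Yearly mean via resample (groups labelled by the first day of each year)
by_resample = toy.resample(rule='YS').mean()

# Same aggregation via an ordinary group by on the year of the index
by_groupby = toy.groupby(toy.index.year).mean()

print(by_resample)
print(by_groupby)
```

Both results hold the same two yearly means; resample simply keeps a timestamp as the group label, while the plain group by keeps the integer year.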


<table style="display: inline-block">
    <caption style="text-align: center"><strong>TIME SERIES OFFSET ALIASES</strong></caption>
<tr><th>ALIAS</th><th>DESCRIPTION</th></tr>
<tr><td>B</td><td>business day frequency</td></tr>
<tr><td>C</td><td>custom business day frequency (experimental)</td></tr>
<tr><td>D</td><td>calendar day frequency</td></tr>
<tr><td>W</td><td>weekly frequency</td></tr>
<tr><td>M</td><td>month end frequency</td></tr>
<tr><td>SM</td><td>semi-month end frequency (15th and end of month)</td></tr>
<tr><td>BM</td><td>business month end frequency</td></tr>
<tr><td>CBM</td><td>custom business month end frequency</td></tr>
<tr><td>MS</td><td>month start frequency</td></tr>
<tr><td>SMS</td><td>semi-month start frequency (1st and 15th)</td></tr>
<tr><td>BMS</td><td>business month start frequency</td></tr>
<tr><td>CBMS</td><td>custom business month start frequency</td></tr>
<tr><td>Q</td><td>quarter end frequency</td></tr>
<tr><td></td><td><font color=white>intentionally left blank</font></td></tr></table>

<table style="display: inline-block; margin-left: 40px">
<caption style="text-align: center"></caption>
<tr><th>ALIAS</th><th>DESCRIPTION</th></tr>
<tr><td>BQ</td><td>business quarter end frequency</td></tr>
<tr><td>QS</td><td>quarter start frequency</td></tr>
<tr><td>BQS</td><td>business quarter start frequency</td></tr>
<tr><td>A</td><td>year end frequency</td></tr>
<tr><td>BA</td><td>business year end frequency</td></tr>
<tr><td>AS</td><td>year start frequency</td></tr>
<tr><td>BAS</td><td>business year start frequency</td></tr>
<tr><td>BH</td><td>business hour frequency</td></tr>
<tr><td>H</td><td>hourly frequency</td></tr>
<tr><td>T, min</td><td>minutely frequency</td></tr>
<tr><td>S</td><td>secondly frequency</td></tr>
<tr><td>L, ms</td><td>milliseconds</td></tr>
<tr><td>U, us</td><td>microseconds</td></tr>
<tr><td>N</td><td>nanoseconds</td></tr></table>

___

<a id='dt'></a>
## .dt method calls

Once we have a datetime object, we can read attributes of it, such as the year, the month, the day, etc. The same attributes are available for a whole pandas Series.

The way you reach them is through __.dt__, which pandas calls an accessor.

In [135]:
sales = pd.read_csv('RetailSales_BeerWineLiquor.csv', parse_dates=[0])
sales.head()

Unnamed: 0,DATE,MRTSSM4453USN
0,1992-01-01,1509
1,1992-02-01,1541
2,1992-03-01,1597
3,1992-04-01,1675
4,1992-05-01,1822


In [136]:
sales['DATE'].dt.year

0      1992
1      1992
2      1992
3      1992
4      1992
       ... 
335    2019
336    2020
337    2020
338    2020
339    2020
Name: DATE, Length: 340, dtype: int64

In [137]:
sales['DATE'].dt.month

0       1
1       2
2       3
3       4
4       5
       ..
335    12
336     1
337     2
338     3
339     4
Name: DATE, Length: 340, dtype: int64

In [138]:
sales['DATE'].dt.is_leap_year

0       True
1       True
2       True
3       True
4       True
       ...  
335    False
336     True
337     True
338     True
339     True
Name: DATE, Length: 340, dtype: bool
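
One detail worth keeping in mind: the `.dt` accessor exists on a Series, whereas a `DatetimeIndex` (what you get after `set_index('DATE')`) exposes the same attributes directly. A minimal sketch with three rows copied from the table above:

```python
import pandas as pd

sample = pd.DataFrame({
    'DATE': pd.to_datetime(['1992-01-01', '1992-02-01', '1992-03-01']),
    'MRTSSM4453USN': [1509, 1541, 1597],
})

# Column (Series): go through the .dt accessor
print(sample['DATE'].dt.month.tolist())   # [1, 2, 3]

# Index (DatetimeIndex): attributes are available without .dt
indexed = sample.set_index('DATE')
print(indexed.index.month.tolist())       # [1, 2, 3]
```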