### Working with Dates and Times
Topics:
- Python standard datetime library
- Timestamps
- Datetime Index Object
- pd.to_datetime( )
- Create a range of dates
- data reader library
- Date offset object
- Time series offsets
- Time delta object
- Time deltas in a dataset

In [31]:
import pandas as pd

# from the Python standard library, for dates and times
import datetime as dt

### Review of the Python datetime module


In [32]:
# Create a date object - year, month, day
someday = dt.date(2021, 5, 15)

# press Tab to see the attributes and methods available on the object
someday.year
someday.month
someday.day

15

In [33]:
# create a datetime object - time is not required, default is midnight
dt.datetime(2021, 5, 15)

datetime.datetime(2021, 5, 15, 0, 0)

In [34]:
# the order of the time arguments is hours, minutes, seconds, etc.
# 8:15:57 am
dt.datetime(2021, 5, 15, 8, 15, 57)

datetime.datetime(2021, 5, 15, 8, 15, 57)

In [35]:
# for anything in the afternoon, use 24-hour time: hour 17 is 5 pm, so this is 5:15:57 pm
# wrap the object in str() for a more human-readable format; the datetime attributes still work on the object itself
str(dt.datetime(2021, 5, 15, 17, 15, 57))

'2021-05-15 17:15:57'
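
Besides `str()`, the standard library offers other ways to turn a datetime into text. A minimal sketch, assuming the same afternoon datetime as above; the variable names are illustrative only:

```python
import datetime as dt

moment = dt.datetime(2021, 5, 15, 17, 15, 57)

# ISO 8601 text, with a "T" between the date and the time
iso_text = moment.isoformat()

# custom layout using strftime format codes (%H is the 24-hour hour)
custom_text = moment.strftime('%Y-%m-%d %H:%M')

print(iso_text)     # 2021-05-15T17:15:57
print(custom_text)  # 2021-05-15 17:15
```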

In [36]:
sometime2 = dt.datetime(2021, 5, 15, 17, 15, 57)

sometime2.year
sometime2.month
sometime2.day
sometime2.hour
sometime2.minute
sometime2.second

57

### Pandas Timestamp object
- Essentially the pandas version of Python's datetime
- Defaults to midnight if no time is specified
- Whether the date is written as 2021-5-1, 2021/5/1 or 2021,5,1 doesn't matter; pandas will parse it.

In [37]:
pd.Timestamp('2021-3-31')
pd.Timestamp('2021/3/31')

# with a morning (am) time
pd.Timestamp('2021-3-31 08:35:15')

# with a pm time; pandas converts it to 24-hour time
pd.Timestamp('2021-3-31 06:10:15 pm')

Timestamp('2021-03-31 18:10:15')
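
To illustrate the first bullet above, here is a small sketch showing that a Timestamp behaves like a standard library datetime and adds a few pandas conveniences. The variable names are illustrative assumptions:

```python
import datetime as dt
import pandas as pd

stamp = pd.Timestamp('2021-03-31 18:10:15')

# a Timestamp is a subclass of the standard library datetime
is_datetime = isinstance(stamp, dt.datetime)

# the familiar attributes are available, plus pandas extras such as day_name()
hour = stamp.hour
weekday = stamp.day_name()

print(is_datetime, hour, weekday)  # True 18 Wednesday
```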

#### Pandas DatetimeIndex Object
- essentially a collection of pandas Timestamps

In [38]:
dates = ['2016-01-02', '2021-05-15', '2009-09-7']

# converts a list of strings to a DatetimeIndex object
pd.DatetimeIndex(dates)

DatetimeIndex(['2016-01-02', '2021-05-15', '2009-09-07'], dtype='datetime64[ns]', freq=None)

In [39]:
# view type
type(pd.DatetimeIndex(dates))

pandas.core.indexes.datetimes.DatetimeIndex

In [40]:
# use the Python datetime module to recreate the dates list
dates = [dt.date(2016, 1, 10), dt.date(2020, 1, 1), dt.date(2003, 4, 15)]

pd.DatetimeIndex(dates)

DatetimeIndex(['2016-01-10', '2020-01-01', '2003-04-15'], dtype='datetime64[ns]', freq=None)

In [41]:
# create a series and we will use the Datetime index as the index for this series

dtIndex = pd.DatetimeIndex(dates)

values = [100,200,300]
pd.Series(data = values, index = dtIndex)

2016-01-10    100
2020-01-01    200
2003-04-15    300
dtype: int64
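
Once a Series has a DatetimeIndex, its rows can be looked up and sorted by date. A minimal sketch using the same dates and values as above; the variable names are illustrative:

```python
import pandas as pd

index = pd.DatetimeIndex(['2016-01-10', '2020-01-01', '2003-04-15'])
series = pd.Series(data=[100, 200, 300], index=index)

# look up a single value by its timestamp label
value = series.loc[pd.Timestamp('2020-01-01')]

# sort_index() orders the rows chronologically
ordered = series.sort_index()

print(value)             # 200
print(ordered.tolist())  # [300, 100, 200]
```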

### The pd.to_datetime( ) method
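- converts its argument to the matching pandas type: a single value becomes a Timestamp, a list becomes a DatetimeIndex, and a Series becomes a datetime64 Series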

In [42]:
# automatically converts to a pandas timestamp
pd.to_datetime('2001/04/19')

Timestamp('2001-04-19 00:00:00')

In [43]:
# convert a Python datetime object to a Timestamp
pd.to_datetime(dt.datetime(2015, 1, 1, 14, 35, 20))

Timestamp('2015-01-01 14:35:20')

In [44]:
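# pandas can take a list with multiple formats as well
# (pandas 2.0 and later may need format='mixed' for this; the output below comes from an older version)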
pd.to_datetime(['2015-01-03', 'July 4th, 1999', '2014/02/08'])

DatetimeIndex(['2015-01-03', '1999-07-04', '2014-02-08'], dtype='datetime64[ns]', freq=None)

In [45]:
# create a series and convert it to datetime
times = pd.Series(['2015-01-03', 'July 4th, 1999', '2014/02/08'])

times

0        2015-01-03
1    July 4th, 1999
2        2014/02/08
dtype: object

In [46]:
# convert the series to datetime by passing it to pd.to_datetime
pd.to_datetime(times)

0   2015-01-03
1   1999-07-04
2   2014-02-08
dtype: datetime64[ns]
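
When every string shares one layout, that layout can be stated explicitly with the `format` parameter instead of leaving pandas to infer it. A hedged sketch, assuming day/month/year input; the sample strings and variable names are made up for illustration:

```python
import pandas as pd

# day/month/year text is ambiguous, so spell the layout out with format=
day_first = pd.Series(['03/01/2015', '04/07/1999', '08/02/2014'])
parsed = pd.to_datetime(day_first, format='%d/%m/%Y')

print(parsed.dt.strftime('%Y-%m-%d').tolist())
# ['2015-01-03', '1999-07-04', '2014-02-08']
```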

#### Dealing with errors from bad data when using pd.to_datetime( )

In [47]:
# create a series from a list with some bad data in it: a word and an impossible date in February
dates = pd.Series(['2015-01-03', 'July 4th, 1999', 'Hello', '2021-02-31'])
dates

0        2015-01-03
1    July 4th, 1999
2             Hello
3        2021-02-31
dtype: object

In [48]:
# passing this to pd.to_datetime as-is results in an error; use errors="coerce" instead
# this turns the bad data into NaT (not a time), the null value for datetimes
pd.to_datetime(dates, errors="coerce")

0   2015-01-03
1   1999-07-04
2          NaT
3          NaT
dtype: datetime64[ns]
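
After coercing, it is often useful to see which original values failed to parse. A minimal sketch, assuming a small series of ISO-style strings plus two bad entries; the variable names are illustrative:

```python
import pandas as pd

raw = pd.Series(['2015-01-03', 'Hello', '2021-02-31'])
parsed = pd.to_datetime(raw, errors='coerce')

# NaT marks the rows that could not be parsed
bad_rows = raw[parsed.isna()]

print(bad_rows.tolist())  # ['Hello', '2021-02-31']
```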

### Create a range of Dates with the pd.date_range( ) method pt1
- Using the start and end parameters
- The default freq is D for days

In [50]:
# requires at least 2 parameters; use Shift+Tab to view the docs
# this code generates a DatetimeIndex between these two dates, one entry per day
times = pd.date_range(start = '2016-01-01', end = '2016-01-10', freq='D')

type(times)

pandas.core.indexes.datetimes.DatetimeIndex

In [51]:
times

DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
               '2016-01-09', '2016-01-10'],
              dtype='datetime64[ns]', freq='D')

In [52]:
# 1st item
times[0]

Timestamp('2016-01-01 00:00:00', freq='D')

In [53]:
# additional use case: to get only business days use freq='B', which excludes weekends
pd.date_range(start = '2016-01-01', end = '2016-01-10', freq='B')

DatetimeIndex(['2016-01-01', '2016-01-04', '2016-01-05', '2016-01-06',
               '2016-01-07', '2016-01-08'],
              dtype='datetime64[ns]', freq='B')
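
One way to confirm that the business-day range really skips weekends is to inspect the day of the week of each entry. A small sketch over the same date span; the variable names are illustrative:

```python
import pandas as pd

business_days = pd.date_range(start='2016-01-01', end='2016-01-10', freq='B')

# dayofweek counts Monday as 0, so Saturday and Sunday are 5 and 6
has_weekend = bool((business_days.dayofweek >= 5).any())

print(len(business_days))  # 6
print(has_weekend)         # False
```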

### Create a range of Dates with the pd.date_range( ) method pt2
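
Besides start and end, `pd.date_range` also accepts a `periods` parameter that fixes how many dates to generate. The sketch below is an assumption about what this part is meant to cover, shown with a daily frequency; the variable names are illustrative:

```python
import pandas as pd

# start + periods: count forward from the start date
forward = pd.date_range(start='2016-01-01', periods=5, freq='D')

# end + periods: count backward so the range finishes on the end date
backward = pd.date_range(end='2016-01-10', periods=3, freq='D')

print(forward[-1])   # 2016-01-05 00:00:00
print(backward[0])   # 2016-01-08 00:00:00
```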


### Create a range of Dates with the pd.date_range( ) method pt3

### .dt Accessor
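
The `.dt` accessor exposes datetime attributes and methods on every value of a datetime Series at once. A minimal sketch with made-up sample dates; the variable names are illustrative:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2016-01-01', '2016-01-04', '2016-01-10']))

# attribute access returns a Series of integers
years = dates.dt.year

# method access works the same way
day_names = dates.dt.day_name()

print(years.tolist())      # [2016, 2016, 2016]
print(day_names.tolist())  # ['Friday', 'Monday', 'Sunday']
```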

### Time Delta Object
- a measurement of time that captures the difference between two points in time
- can use days, hours, minutes, and seconds all at the same time in the parameters

In [56]:
# create two timestamps and subtract one from the other
time_a = pd.to_datetime('2020-03-31 04:35:16PM')

time_b = pd.to_datetime('2020-03-20 02:15:49AM')

# the subtraction produces a Timedelta object
time_a - time_b

Timedelta('11 days 14:19:27')

In [58]:
# create our own Timedeltas with pandas!
# you can provide parameters for days, hours, minutes, etc. (months are not supported)
# below adds 3 days to time_a, which was 2020-03-31, giving 2020-04-03
time_a + pd.Timedelta(days=3)

Timestamp('2020-04-03 16:35:16')

In [60]:
# Timedelta can parse text as well
pd.Timedelta("6 hours 5 minutes")

Timedelta('0 days 06:05:00')
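
A Timedelta can also be broken back down into its parts. The sketch below reuses the two timestamps from this section, written in 24-hour form; the variable names are illustrative:

```python
import pandas as pd

gap = pd.Timestamp('2020-03-31 16:35:16') - pd.Timestamp('2020-03-20 02:15:49')

whole_days = gap.days                # only the days part
hours_part = gap.components.hours    # the hours left over after whole days
total_seconds = gap.total_seconds()  # the entire gap expressed in seconds

# text and keyword arguments describe the same Timedelta
same = pd.Timedelta('6 hours 5 minutes') == pd.Timedelta(hours=6, minutes=5)

print(whole_days, hours_part, total_seconds, same)  # 11 14 1001967.0 True
```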

### Time Deltas in a dataset
- applying Timedeltas to a real-world dataset

In [78]:
# import a dataset to use, make sure to parse dates since we are using the dates for the delta
shipping = pd.read_csv('../datasets/ecommerce.csv', index_col="ID", parse_dates=['order_date', 'delivery_date'])

shipping.head(3)

ID,order_date,delivery_date
1,1998-05-24,1999-02-05
2,1992-04-22,1998-03-06
4,1991-02-10,1992-08-26


#### How long did it take to ship for each customer?

In [79]:
# add as a brand new column in the DF
shipping['Delivery_Time'] = shipping['delivery_date'] - shipping['order_date']

shipping.head()

ID,order_date,delivery_date,Delivery_Time
1,1998-05-24,1999-02-05,257 days
2,1992-04-22,1998-03-06,2144 days
4,1991-02-10,1992-08-26,563 days
5,1992-07-21,1997-11-20,1948 days
7,1993-09-02,1998-06-10,1742 days
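
If plain numbers are easier to work with than Timedelta values, the `.dt.days` attribute converts the column to whole days. A hedged sketch on a two-row stand-in for the dataset (rows 1 and 4 from the output above); the `Delivery_Days` column name is a hypothetical addition:

```python
import pandas as pd

# small stand-in for the ecommerce data, built inline so the sketch is self-contained
sample = pd.DataFrame(
    {
        'order_date': pd.to_datetime(['1998-05-24', '1991-02-10']),
        'delivery_date': pd.to_datetime(['1999-02-05', '1992-08-26']),
    },
    index=pd.Index([1, 4], name='ID'),
)

sample['Delivery_Time'] = sample['delivery_date'] - sample['order_date']

# hypothetical extra column: the same gap as an integer number of days
sample['Delivery_Days'] = sample['Delivery_Time'].dt.days

print(sample['Delivery_Days'].tolist())  # [257, 563]
```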


In [80]:
# add a new column for the expected delivery date: order date + 30 days, plus 1 extra day (31 days in total)
shipping['Expected_Delivery_Date30'] = shipping['order_date'] + pd.Timedelta(days=30)+pd.Timedelta(days=1)

shipping.head()

ID,order_date,delivery_date,Delivery_Time,Expected_Delivery_Date30
1,1998-05-24,1999-02-05,257 days,1998-06-24
2,1992-04-22,1998-03-06,2144 days,1992-05-23
4,1991-02-10,1992-08-26,563 days,1991-03-13
5,1992-07-21,1997-11-20,1948 days,1992-08-21
7,1993-09-02,1998-06-10,1742 days,1993-10-03


In [81]:
# same calculation with a 60-day window (60 days plus 1 extra day)
shipping['Expected_Delivery_Date60'] = shipping['order_date'] + pd.Timedelta(days=60)+pd.Timedelta(days=1)

shipping.head()

ID,order_date,delivery_date,Delivery_Time,Expected_Delivery_Date30,Expected_Delivery_Date60
1,1998-05-24,1999-02-05,257 days,1998-06-24,1998-07-24
2,1992-04-22,1998-03-06,2144 days,1992-05-23,1992-06-22
4,1991-02-10,1992-08-26,563 days,1991-03-13,1991-04-12
5,1992-07-21,1997-11-20,1948 days,1992-08-21,1992-09-20
7,1993-09-02,1998-06-10,1742 days,1993-10-03,1993-11-02


In [83]:
# same calculation with a 90-day window (90 days plus 1 extra day)
shipping['Expected_Delivery_Date90'] = shipping['order_date'] + pd.Timedelta(days=90)+pd.Timedelta(days=1)

shipping.head()

ID,order_date,delivery_date,Delivery_Time,Expected_Delivery_Date30,Expected_Delivery_Date60,Expected_Delivery_Date90
1,1998-05-24,1999-02-05,257 days,1998-06-24,1998-07-24,1998-08-23
2,1992-04-22,1998-03-06,2144 days,1992-05-23,1992-06-22,1992-07-22
4,1991-02-10,1992-08-26,563 days,1991-03-13,1991-04-12,1991-05-12
5,1992-07-21,1997-11-20,1948 days,1992-08-21,1992-09-20,1992-10-20
7,1993-09-02,1998-06-10,1742 days,1993-10-03,1993-11-02,1993-12-02


In [85]:
# find the worst orders: those that took more than 1 year (365 days)
mask = shipping['Delivery_Time'] > '365 days'

shipping[mask]

ID,order_date,delivery_date,Delivery_Time,Expected_Delivery_Date30,Expected_Delivery_Date60,Expected_Delivery_Date90
2,1992-04-22,1998-03-06,2144 days,1992-05-23,1992-06-22,1992-07-22
4,1991-02-10,1992-08-26,563 days,1991-03-13,1991-04-12,1991-05-12
5,1992-07-21,1997-11-20,1948 days,1992-08-21,1992-09-20,1992-10-20
7,1993-09-02,1998-06-10,1742 days,1993-10-03,1993-11-02,1993-12-02
9,1990-01-25,1994-10-02,1711 days,1990-02-25,1990-03-27,1990-04-26
...,...,...,...,...,...,...
986,1990-12-10,1992-12-16,737 days,1991-01-10,1991-02-09,1991-03-11
990,1991-06-24,1996-02-02,1684 days,1991-07-25,1991-08-24,1991-09-23
991,1991-09-09,1998-03-30,2394 days,1991-10-10,1991-11-09,1991-12-09
993,1990-11-16,1998-04-27,2719 days,1990-12-17,1991-01-16,1991-02-15
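
The same filter can be written with an explicit `pd.Timedelta` instead of a string, and the matches can then be sorted so the slowest orders come first. A minimal sketch on two delivery times taken from the output above; the variable names are illustrative:

```python
import pandas as pd

# two delivery times from the dataset output, keyed by order ID
delivery_time = pd.Series(
    pd.to_timedelta([257, 563], unit='D'),
    index=pd.Index([1, 4], name='ID'),
)

# compare against an explicit Timedelta rather than the string '365 days'
one_year = pd.Timedelta(days=365)
mask = delivery_time > one_year

# keep only the slow orders, longest first
worst_first = delivery_time[mask].sort_values(ascending=False)

print(mask.tolist())               # [False, True]
print(worst_first.index.tolist())  # [4]
```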


In [87]:
# find the longest shipment time
shipping['Delivery_Time'].max()

Timedelta('3583 days 00:00:00')
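
Other aggregations work on a Timedelta column in the same way as `max()`. A hedged sketch on two delivery times from the earlier output, so the numbers here are not those of the full dataset; the variable names are illustrative:

```python
import pandas as pd

delivery_time = pd.Series(
    pd.to_timedelta([257, 563], unit='D'),
    index=pd.Index([1, 4], name='ID'),
)

longest = delivery_time.max()        # largest Timedelta in the column
slowest_id = delivery_time.idxmax()  # index label (order ID) of that row
average = delivery_time.mean()       # mean is also a Timedelta

print(longest.days, slowest_id, average.days)  # 563 4 410
```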