# Working with dates and times
* Created on: 05/28/2021
* Created by: Michael Monahan

## Module Overview
* Section 1: Dates and Calendars
* Section 2: Combining dates and times
* Section 3: Time zones and daylight savings
* Section 4: Dates and times in Pandas

## Summary
* the `date()` class takes year, month, and day as arguments
* `date` objects have attributes such as `.year` and methods such as `.weekday()`
* `date` objects can be compared, aggregated, and sorted with `min()`, `max()`, and `sorted()`
* to calculate aggregates such as `.first()`, `.min()`, or `.mean()` by group, use `.groupby()`
* subtracting one date from another gives a `timedelta`
* `.isoformat()` or `.strftime()` can be used to change `date` objects into strings
* the `datetime()` class expands on the `date()` class and adds hour, minute, second, and microsecond
    * any of these that are omitted default to zero
* the `.replace()` method returns a copy of a `datetime` with any of its fields replaced
* convert a `timedelta` into a number of seconds (a float) with the `.total_seconds()` method
* turn dates into strings with `.strftime()` and strings into dates with `.strptime()`
* a `datetime` is "timezone naive" unless its `tzinfo` parameter has been set
* setting `tzinfo` tells a `datetime` how to align/offset itself from UTC
* to shift the date and time to match a new timezone, use the `.astimezone()` method
* import `tz` from `dateutil` to access a comprehensive, up-to-date timezone database
* use the `parse_dates` argument of `pd.read_csv()` to parse date/time columns
    * if `parse_dates` does not work, try the `pd.to_datetime()` function
* use `.tz_localize()` to set a timezone, keeping the date and time the same
* use `.tz_convert()` to change the date and time to match a new timezone

## Import the required libraries

In [None]:
import pandas as pd
from matplotlib import pyplot as plt
import statsmodels.api as sm
import seaborn as sns
import numpy as np

## Dates in Python
### Why date classes are needed
* Date classes make it easy to:
    * Figure out how many days have elapsed
    * Verify that dates are in temporal order
    * Know which day of the week a date falls on
    * Filter data between specified dates

In [13]:
# Example
two_events = ['9/5/2016','6/22/2020']
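Kept as strings, these dates cannot be ordered reliably: Python compares strings character by character, so `'6/22/2020'` sorts ahead of `'9/5/2016'` even though it is the later date. A minimal comparison using the same two dates:

```python
from datetime import date

two_events = ['9/5/2016', '6/22/2020']
two_event_dates = [date(2016, 9, 5), date(2020, 6, 22)]

# String sort: compares '6' with '9', so the 2020 date comes first
print(sorted(two_events))

# date sort: compares year, then month, then day
print(sorted(two_event_dates))
```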

In [17]:
# Create date objects

# Import date
from datetime import date

# Create dates
two_event_dates = [date(2016,9,5), date(2020,6,22)]
print(two_event_dates[0].year)
print(two_event_dates[0].month)
print(two_event_dates[0].day)
print(two_event_dates[0].weekday())

2016
9
5
0
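`.weekday()` counts from Monday = 0 to Sunday = 6, which is why September 5, 2016 (a Monday) prints `0`. For comparison, `.isoweekday()` counts from Monday = 1 to Sunday = 7, and `strftime('%A')` gives the weekday name:

```python
from datetime import date

d = date(2016, 9, 5)
print(d.weekday())       # 0  (Monday = 0 ... Sunday = 6)
print(d.isoweekday())    # 1  (Monday = 1 ... Sunday = 7)
print(d.strftime('%A'))  # weekday name; the language depends on the active locale
```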


## Math with Dates

In [25]:
# Example numbers
a = 11
b = 14
l = [a,b]
# Find the smallest number in the list
print(min(l))

# Subtract two numbers
print(b-a)

# Add 4 to a number
print(b + 4)

11
3
18


In [33]:
# Now with dates
d1 = date(2020,6,6)
d2 = date(2021,6,6)
l2 = [d1,d2]
print(d2 - d1)
delta = d2 - d1
print(delta.days)

# Create a date object for May 9th, 2007
start = date(2007, 5, 9)

# Create a date object for December 13th, 2007
end = date(2007, 12, 13)

# Subtract the two dates and print the number of days
print((end - start).days)

365 days, 0:00:00
365
218
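The list `l2` built above can be aggregated and sorted just like the list of numbers, because `date` objects compare chronologically. A short sketch (the list is deliberately built out of order):

```python
from datetime import date

d1 = date(2020, 6, 6)
d2 = date(2021, 6, 6)
l2 = [d2, d1]  # deliberately out of order

print(min(l2))     # 2020-06-06, the earliest date
print(max(l2))     # 2021-06-06, the latest date
print(sorted(l2))  # chronological order
print(d1 < d2)     # True: earlier dates compare as "less than"
```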


In [32]:
# Import timedelta
from datetime import timedelta
# Create a 30-day timedelta
td = timedelta(days = 30)
print(d1 + td)

2020-07-06
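A `timedelta` can also be subtracted, and combined with date comparisons it answers the "filter between dates" question from the start of this notebook. A small sketch; the `events` list is made-up example data:

```python
from datetime import date, timedelta

d1 = date(2020, 6, 6)
week_before = d1 - timedelta(weeks=1)
print(week_before)  # 2020-05-30

# Hypothetical event dates
events = [date(2020, 5, 1), date(2020, 6, 1), date(2020, 7, 1)]

# Keep only events that fall in the week leading up to d1 (inclusive)
in_window = [e for e in events if week_before <= e <= d1]
print(in_window)
```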


## Formatting dates

In [34]:
# Turning dates into strings

# Example date 
d = date(2011,7,22)
# ISO 8601 format YYYY-MM-DD
print(d)

2011-07-22


In [35]:
# Express the date in ISO 8601 format and put into a list
print([d.isoformat()])

['2011-07-22']
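ISO 8601 strings are useful beyond display: `date.fromisoformat()` (Python 3.7+) parses them back into `date` objects, and because the fields run from largest to smallest, sorting ISO strings puts them in chronological order:

```python
from datetime import date

d = date(2011, 7, 22)
s = d.isoformat()

# Round trip: string back to an equal date object
print(date.fromisoformat(s) == d)  # True

# YYYY-MM-DD strings sort chronologically
print(sorted(['2011-07-22', '2009-12-01', '2010-01-15']))
```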


In [43]:
# strftime formatting

# Example date
d = date(2021,7,22)
print(d.strftime('%Y'))

# To format a string with additional text in it
print(d.strftime('The current year is %Y'))

# Change the formatting
print(d.strftime('%Y/%m/%d'))
print(d.strftime('%d %B %Y'))
print(d.strftime('%d-%b-%y'))

2021
The current year is 2021
2021/07/22
22 July 2021
22-Jul-21


## Adding time to the date

In [49]:
# Import datetime
from datetime import datetime

# Create a datetime object
dt = datetime(2017, 10, 1, 15, 26, 26)

# Print the results in ISO 8601 format
print(dt.isoformat())

2017-10-01T15:26:26


In [48]:
dt_hr  = dt.replace(minute = 0, second = 0, microsecond = 0)
print(dt_hr)

2017-10-01 15:00:00
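`.replace()` returns a new `datetime` and leaves the original untouched, so it can be used to truncate a timestamp to different resolutions. A short sketch using the same `dt` values as above:

```python
from datetime import datetime

dt = datetime(2017, 10, 1, 15, 26, 26)

# Truncate to the start of the hour and to the start of the day
start_of_hour = dt.replace(minute=0, second=0, microsecond=0)
start_of_day = dt.replace(hour=0, minute=0, second=0, microsecond=0)

print(start_of_hour)  # 2017-10-01 15:00:00
print(start_of_day)   # 2017-10-01 00:00:00
print(dt)             # unchanged: 2017-10-01 15:26:26
```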


## Printing and parsing datetimes

In [52]:
# Printing datetimes
dt = datetime(2017, 12, 30, 15, 19, 13)
print(dt.strftime('%Y-%m-%d'))
print(dt.strftime('%Y-%m-%d %H:%M:%S'))
print(dt.isoformat())

2017-12-30
2017-12-30 15:19:13
2017-12-30T15:19:13


In [110]:
# Parsing datetimes
dt = datetime.strptime('12/30/2017 15:19:13',
                       '%m/%d/%Y %H:%M:%S')
print(type(dt))
print(dt)

<class 'datetime.datetime'>
2017-12-30 15:19:13
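`strptime()` requires the string to match the format exactly; any mismatch raises a `ValueError`. A sketch of guarding against that, where `parse_or_none` is a hypothetical helper, not part of the standard library:

```python
from datetime import datetime

def parse_or_none(text, fmt='%m/%d/%Y %H:%M:%S'):
    """Hypothetical helper: return a datetime if text matches fmt, else None."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

print(parse_or_none('12/30/2017 15:19:13'))  # 2017-12-30 15:19:13
print(parse_or_none('2017-12-30 15:19:13'))  # None: wrong format
```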


In [75]:
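# Parsing a Unix timestamp (seconds since 1970-01-01 00:00:00 UTC)
ts = 2147483649.0
# Convert to a naive datetime in the machine's local time zone and print
# (the output below reflects a machine set to a UTC-6 time zone)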
print(datetime.fromtimestamp(ts))

2038-01-18 21:14:09
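Because `fromtimestamp()` uses the local time zone by default, the same timestamp prints differently on different machines. Passing `tz=timezone.utc` gives a timezone-aware result that does not depend on the machine, and `.timestamp()` converts back:

```python
from datetime import datetime, timezone

ts = 2147483649.0
dt_utc = datetime.fromtimestamp(ts, tz=timezone.utc)

print(dt_utc)              # 2038-01-19 03:14:09+00:00
print(dt_utc.timestamp())  # 2147483649.0
```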


## Working with durations

In [78]:
# Create example datetimes
start = datetime(2017,10,8,23,46,47)
end = datetime(2017,10,9,0,10,57)

# Subtract datetimes to create time delta
duration = end - start
print(duration)
print(duration.total_seconds())

0:24:10
1450.0
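Besides `.total_seconds()`, a `timedelta` can be divided by another `timedelta` to express it in any unit: `/` gives a float, `//` gives whole units, and `%` gives the remainder.

```python
from datetime import datetime, timedelta

start = datetime(2017, 10, 8, 23, 46, 47)
end = datetime(2017, 10, 9, 0, 10, 57)
duration = end - start

print(duration / timedelta(minutes=1))   # about 24.17 minutes
print(duration // timedelta(minutes=1))  # 24 whole minutes
print(duration % timedelta(minutes=1))   # 0:00:10 left over
```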


## UTC offsets

In [81]:
# Import relevant classes
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for US Eastern and Central Standard Time
ET = timezone(timedelta(hours=-5))
CT = timezone(timedelta(hours=-6))

# Timezone-specific datetime
dt = datetime(2017, 12, 30, 15, 19, 13, tzinfo = CT)
print(dt)

2017-12-30 15:19:13-06:00
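With a `tzinfo` set, `.astimezone()` moves the same instant onto another clock, while `.replace(tzinfo=...)` keeps the clock reading and therefore changes the instant. A sketch using the fixed offsets defined above:

```python
from datetime import datetime, timedelta, timezone

ET = timezone(timedelta(hours=-5))
CT = timezone(timedelta(hours=-6))
dt = datetime(2017, 12, 30, 15, 19, 13, tzinfo=CT)

# Same instant, shown on the Eastern clock
print(dt.astimezone(ET))            # 2017-12-30 16:19:13-05:00
# Same instant, shown in UTC
print(dt.astimezone(timezone.utc))  # 2017-12-30 21:19:13+00:00
# replace() keeps the clock time, so this is a different instant
print(dt.replace(tzinfo=ET))        # 2017-12-30 15:19:13-05:00
```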


In [86]:
# Access the TZ database
from dateutil import tz

# US Central time
ct = tz.gettz('America/Chicago')
dt = datetime(2017, 12, 30, 15, 19, 13, tzinfo = ct)
print(dt)

2017-12-30 15:19:13-06:00
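Unlike the fixed `CT` offset above, a `tz.gettz()` zone looks up the offset for each date, so the same zone reports CDT (UTC-5) in summer and CST (UTC-6) in winter. The July date is an arbitrary example:

```python
from datetime import datetime, timedelta
from dateutil import tz

ct = tz.gettz('America/Chicago')
winter = datetime(2017, 12, 30, 15, 19, 13, tzinfo=ct)
summer = datetime(2017, 7, 4, 15, 19, 13, tzinfo=ct)

print(winter.isoformat())  # 2017-12-30T15:19:13-06:00
print(summer.isoformat())  # 2017-07-04T15:19:13-05:00
print(summer.tzname())     # CDT
```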


In [94]:
# Daylight saving time: on March 12, 2017, US clocks jumped from 2:00 AM to 3:00 AM
spring_ahead_159am = datetime(2017,3,12,1,59,59)
spring_ahead_159am.isoformat()

'2017-03-12T01:59:59'

In [95]:
spring_ahead_3am = datetime(2017,3,12,3,0,0)
spring_ahead_3am.isoformat()

'2017-03-12T03:00:00'

In [96]:
# Naive datetimes know nothing about the clock change
(spring_ahead_3am-spring_ahead_159am).total_seconds()

3601.0

In [97]:
from datetime import timezone, timedelta

EST = timezone(timedelta(hours=-5))
EDT = timezone(timedelta(hours=-4))

In [98]:
spring_ahead_159am = spring_ahead_159am.replace(tzinfo = EST)
spring_ahead_159am.isoformat()

'2017-03-12T01:59:59-05:00'

In [99]:
spring_ahead_3am = spring_ahead_3am.replace(tzinfo = EDT)
spring_ahead_3am.isoformat()

'2017-03-12T03:00:00-04:00'

In [102]:
# Daylight saving time with dateutil
# Import tz
from dateutil import tz

# Create eastern timezone
eastern = tz.gettz('America/New_York')
spring_ahead_159am = datetime(2017,3,12,1,59,59, tzinfo = eastern)
spring_ahead_3am = datetime(2017,3,12,3,0,0, tzinfo = eastern)
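Subtracting two `dateutil`-aware datetimes like these is a common trap: when both share the same `tzinfo` object, Python subtracts the wall-clock times and ignores the offsets, reporting an hour and one second. Converting to UTC first gives the real elapsed time across the March 12, 2017 spring-forward transition (the dates are set explicitly here so the sketch is self-contained):

```python
from datetime import datetime, timezone
from dateutil import tz

eastern = tz.gettz('America/New_York')
spring_ahead_159am = datetime(2017, 3, 12, 1, 59, 59, tzinfo=eastern)
spring_ahead_3am = datetime(2017, 3, 12, 3, 0, 0, tzinfo=eastern)

# Same tzinfo object: offsets are ignored, so this is wall-clock time
print((spring_ahead_3am - spring_ahead_159am).total_seconds())  # 3601.0

# Convert both to UTC to measure the actual elapsed time
elapsed = (spring_ahead_3am.astimezone(timezone.utc)
           - spring_ahead_159am.astimezone(timezone.utc))
print(elapsed.total_seconds())  # 1.0
```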

### How many hours elapsed around daylight saving?

In [103]:
# Import datetime, timedelta, tz, timezone
from datetime import datetime, timedelta, timezone
from dateutil import tz

# Start on March 12, 2017, midnight, then add 6 hours
start = datetime(2017, 3, 12, tzinfo = tz.gettz('America/New_York'))
end = start + timedelta(hours=6)
print(start.isoformat() + " to " + end.isoformat())

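# How many hours have elapsed? Both datetimes share the same tzinfo object,
# so Python subtracts the wall-clock times and ignores the offset change
print((end - start).total_seconds()/(60*60))

# What if we move to UTC? Now the one-hour DST jump is accounted for
print((end.astimezone(timezone.utc) - start.astimezone(timezone.utc))
      .total_seconds()/(60*60))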

2017-03-12T00:00:00-05:00 to 2017-03-12T06:00:00-04:00
6.0
5.0


### Daylight saving rules are complicated
* They differ from place to place, change over time, and usually start on a Sunday, so the start date moves around the calendar.
* For example, in the United Kingdom, as of the time this lesson was written, Daylight Saving begins on the last Sunday in March. Let's look at the UTC offset for March 29 at midnight for the years 2000 to 2010.

In [108]:
# Import datetime and tz
from datetime import datetime
from dateutil import tz

# Create starting date
dt = datetime(2000, 3, 29, tzinfo = tz.gettz('Europe/London'))

# Loop over the dates, replacing the year, and print the ISO timestamp
for y in range(2000, 2011):
    print(dt.replace(year=y).isoformat())

2000-03-29T00:00:00+01:00
2001-03-29T00:00:00+01:00
2002-03-29T00:00:00+00:00
2003-03-29T00:00:00+00:00
2004-03-29T00:00:00+01:00
2005-03-29T00:00:00+01:00
2006-03-29T00:00:00+01:00
2007-03-29T00:00:00+01:00
2008-03-29T00:00:00+00:00
2009-03-29T00:00:00+00:00
2010-03-29T00:00:00+01:00
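The pattern in this output follows from the rule stated above: midnight on March 29 is only in British Summer Time when the last Sunday in March falls on or before March 28. A sketch that computes that Sunday with plain date arithmetic; it assumes the current UK rule held for every year, and the tz database remains the authoritative source:

```python
from datetime import date, timedelta

def last_sunday_of_march(year):
    """Return the last Sunday in March of the given year."""
    march_31 = date(year, 3, 31)
    # weekday(): Monday=0 ... Sunday=6, so step back to the nearest Sunday
    days_back = (march_31.weekday() - 6) % 7
    return march_31 - timedelta(days=days_back)

for y in range(2000, 2011):
    start = last_sunday_of_march(y)
    in_bst = start <= date(y, 3, 28)  # DST already started by March 29
    print(y, start, '+01:00' if in_bst else '+00:00')
```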


### Ending Daylight Savings Time

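In [None]:
eastern = tz.gettz('US/Eastern')
# 2017-11-05 01:00:00 happens twice: once in EDT, then again in EST
first_1am = datetime(2017,11,5,1,0,0,
                     tzinfo = eastern)
tz.datetime_ambiguous(first_1am)  # True

second_1am = datetime(2017,11,5,1,0,0,
                     tzinfo = eastern)
# Mark this as the second (post-transition) occurrence
second_1am = tz.enfold(second_1am)

# 0.0: with the same tzinfo, Python compares wall-clock times only
(first_1am - second_1am).total_seconds()

In [None]:
# Convert to UTC to compare the actual instants
first_1am = first_1am.astimezone(tz.UTC)
second_1am = second_1am.astimezone(tz.UTC)
(second_1am - first_1am).total_seconds()  # 3600.0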
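The two 1:00 AM values differ only in their `fold` attribute (0 for the first occurrence, 1 for the second, set here by `tz.enfold()`), and that is enough for `dateutil` to report different UTC offsets. This sketch uses `America/New_York`, the canonical name for the zone also known as `US/Eastern`:

```python
from datetime import datetime, timedelta
from dateutil import tz

eastern = tz.gettz('America/New_York')
first_1am = datetime(2017, 11, 5, 1, 0, 0, tzinfo=eastern)
second_1am = tz.enfold(first_1am)  # same wall time, fold=1

print(first_1am.fold, first_1am.isoformat())    # 0 2017-11-05T01:00:00-04:00
print(second_1am.fold, second_1am.isoformat())  # 1 2017-11-05T01:00:00-05:00
```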

### Reading date and time with pandas
* A simple Pandas example
```python 
# Load Pandas
import pandas as pd
# Import the data 
data = pd.read_csv('file_name.csv')
# Look at the first three rows of data
print(data.head(3))
```

### Loading datetimes with parse_dates

```python
# Import file and have pandas parse the dates
data = pd.read_csv('file_name.csv',
                   parse_dates = ['Date_column'])
```
```python
# Or convert the column after loading:
data['Date_column'] = pd.to_datetime(data['Date_column'],
                                     format = '%Y-%m-%d %H:%M:%S')
```
```python
# Select the date in the second row (positions start at 0)
data['Date_column'].iloc[1]
```
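To try `parse_dates` without a file on disk, the CSV text can be read from an in-memory buffer. The column names and values below are made-up placeholders standing in for `file_name.csv`:

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for 'file_name.csv'
csv_text = """Date_column,value
2017-10-01 15:26:26,1
2017-10-02 08:00:00,2
2017-10-03 09:30:00,3
"""

data = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date_column'])
print(data.dtypes)                  # Date_column is a datetime64 column
print(data['Date_column'].iloc[1])  # 2017-10-02 08:00:00
```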

### Summarizing time data in Pandas

```python
# Average time
data['column_duration'].mean()
```
```python
# Total time
data['column_duration'].sum()
```
```python
# Fraction of a 30-day period covered by the total duration
from datetime import timedelta
data['column_duration'].sum() / timedelta(days = 30)
```
```python
# Summarize by grouping
data['interesting_column'].value_counts()
```
```python
# Share of rows in each group (same as value_counts(normalize=True))
data['interesting_column'].value_counts() / len(data)
```
```python
# Resampling dates: 'M' for month end ('ME' in pandas 2.2+), 'D' for days
data.resample('M', on = 'date_column')['duration_column'].mean()
```
```python
# Size per group
data.groupby('interesting_column').size()
```
```python
# First value per group
data.groupby('interesting_column').first()
```
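The snippets above can be exercised on a tiny made-up DataFrame that uses the same placeholder column names. This sketch uses `'MS'` (month start) for resampling, which is accepted across pandas versions:

```python
import pandas as pd

# Hypothetical data using the placeholder column names from above
data = pd.DataFrame({
    'date_column': pd.to_datetime(['2017-10-01', '2017-10-15', '2017-11-02']),
    'duration_column': pd.to_timedelta([600, 1200, 300], unit='s'),
    'interesting_column': ['a', 'b', 'a'],
})

print(data['duration_column'].mean())  # 0 days 00:11:40
print(data['duration_column'].sum())   # 0 days 00:35:00
print(data['interesting_column'].value_counts(normalize=True))

# Total duration per group and per calendar month
by_group = data.groupby('interesting_column')['duration_column'].sum()
monthly = data.resample('MS', on='date_column')['duration_column'].sum()
print(by_group)
print(monthly)
```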

### Timezones in Pandas

```python
# Try to set a timezone in Pandas...
data['Date_column'] = data['Date_column']\
    .dt.tz_localize('America/Chicago')
# May raise an error for ambiguous times; if so, try the 'ambiguous' argument
```

```python
# To handle ambiguous times
data['Date_column'] = data['Date_column']\
    .dt.tz_localize('America/Chicago', ambiguous = 'NaT')  # NaT = Not a Time
```
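A small runnable sketch of both cells above, using made-up timestamps around the November 5, 2017 fall-back transition in Chicago. 1:30 AM happens twice that night, so `ambiguous='NaT'` replaces it with `NaT`; `.dt.tz_convert()` then shows the remaining times in UTC:

```python
import pandas as pd

times = pd.Series(pd.to_datetime([
    '2017-11-05 00:30:00',  # CDT (UTC-5)
    '2017-11-05 01:30:00',  # occurs twice: ambiguous
    '2017-11-05 02:30:00',  # CST (UTC-6)
]))

local = times.dt.tz_localize('America/Chicago', ambiguous='NaT')
utc = local.dt.tz_convert('UTC')
print(local)
print(utc)
```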

* Other datetime operations
```python
# Year of first five rows
data['Start date']\
    .head()\
    .dt.year
```
```python
# To see the weekdays
data['Date column']\
    .head()\
    .dt.day_name()
```
```python
# Shift a date column down one row; the first value becomes 'NaT'
data['Date_column']\
    .shift(1)\
    .head()
```
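Combining `.shift(1)` with subtraction is a common way to get the time between consecutive rows. A sketch on made-up timestamps that also shows `.dt.year` and `.dt.day_name()`:

```python
import pandas as pd

# Hypothetical timestamps
data = pd.DataFrame({'Date_column': pd.to_datetime([
    '2017-10-01 08:00', '2017-10-01 09:30', '2017-10-02 08:00'])})

print(data['Date_column'].dt.year.tolist())        # [2017, 2017, 2017]
print(data['Date_column'].dt.day_name().tolist())  # ['Sunday', 'Sunday', 'Monday']

# Gap since the previous row; the first row has no predecessor, so NaT
gaps = data['Date_column'] - data['Date_column'].shift(1)
print(gaps)
```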