# Dates and times in Python

In addition to common data types like strings, integers and booleans, Python also has date and time data types. You'll probably come across these most often via the standard library's [`datetime`](https://docs.python.org/3/library/datetime.html) module, so that's where we'll start in this notebook.

Here's what we'll cover in this notebook:
- [Creating new dates and times](#Creating-new-dates-and-times)
- [Parsing dates and times from text](#Parsing-dates-and-times-from-text)
- [Formatting dates as text](#Formatting-dates-as-text)
- [Calculating the difference between two datetimes](#Calculating-the-difference-between-two-datetimes)
- [Working with dates in pandas](#Working-with-dates-in-pandas)

Let's start by importing the `datetime` class from the `datetime` module (the class and the module share a name). It lets us represent a date _and_ a time of day together, which is a common need.

In [2]:
from datetime import datetime

### Creating new dates and times

The `datetime` constructor takes its arguments in this order: year, month, day, hour, minute, second, microsecond, and `tzinfo`, which holds timezone information and is usually passed as a keyword argument. Only year, month and day are required; the time components default to zero. Let's create a `datetime` for Oct. 7, 2018.

In [3]:
our_date = datetime(2018, 10, 7)

In [4]:
print(our_date)

2018-10-07 00:00:00


You can access attributes of this date now, like `year` and `month`:

In [34]:
our_date.year

2018

In [35]:
our_date.month

10
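Beyond `year` and `month`, a `datetime` also exposes `day`, `hour` and the other components, plus a few handy methods. Here's a short sketch that re-creates `our_date` so it runs on its own:

```python
from datetime import datetime

our_date = datetime(2018, 10, 7)

# Attributes: the individual pieces of the date and time
print(our_date.year, our_date.month, our_date.day)  # 2018 10 7
print(our_date.hour, our_date.minute)               # 0 0 (we didn't pass a time)

# weekday() counts Monday as 0, so Sunday is 6
print(our_date.weekday())  # 6

# date() drops the time portion and returns a datetime.date
print(our_date.date())     # 2018-10-07
```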

We could also make a specific time on that day -- say, 1:30 p.m.:

In [5]:
our_datetime = datetime(2018, 10, 7, 13, 30)

In [6]:
print(our_datetime)

2018-10-07 13:30:00


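We could also make it 1:30 p.m. Eastern time. (On Oct. 7, New York is on daylight saving time, so that's EDT, four hours behind UTC, rather than EST.)

(Dealing with timezones in Python can be a huge pain, even for uncomplicated data, so it's usually easier to use a third-party library like [`maya`](https://github.com/kennethreitz/maya) or [`pytz`](http://pytz.sourceforge.net/) -- which we'll use here -- than to calculate timezone offsets by hand with a `datetime.timezone` object.)

In [8]:
import pytz

In [12]:
est = pytz.timezone('America/New_York')

In [None]:
# you can get a list of all available timezones by running this cell
pytz.all_timezones

One pytz quirk to watch for: attach its timezones with the timezone's `localize()` method, not with the `tzinfo` argument. Passing a pytz timezone straight to `tzinfo=` silently uses the first offset in the zone's history (local mean time, -4:56 for New York) instead of the correct offset for that date.

In [14]:
our_datetime_with_tz = est.localize(datetime(2018, 10, 7, 13, 30))

In [15]:
print(our_datetime_with_tz)

2018-10-07 13:30:00-04:00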
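If you're on Python 3.9 or later, the standard library's `zoneinfo` module is another option (swapped in here for comparison; the rest of this notebook sticks with pytz). Unlike pytz zones, `ZoneInfo` objects can be passed straight to `tzinfo=`, and the offset follows the daylight saving rules for each date. This sketch assumes your system has timezone data installed (on Windows you may need the `tzdata` package):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

eastern = ZoneInfo('America/New_York')

# The same wall-clock time gets a different offset depending on DST
fall = datetime(2018, 10, 7, 13, 30, tzinfo=eastern)
winter = datetime(2018, 12, 7, 13, 30, tzinfo=eastern)

print(fall, fall.tzname())      # 2018-10-07 13:30:00-04:00 EDT
print(winter, winter.tzname())  # 2018-12-07 13:30:00-05:00 EST

# astimezone() converts to another zone while keeping the same instant
print(fall.astimezone(timezone.utc))  # 2018-10-07 17:30:00+00:00
```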


If you need to get the datetime of _now_ -- i.e., when the script is run -- you can use the handy method `now()`.

In [23]:
datetime.now()

datetime.datetime(2018, 8, 13, 14, 15, 45, 526915)
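Note that `now()` with no arguments returns your machine's local time as a _naive_ datetime, meaning no timezone is attached. If you want an aware timestamp, one standard-library option is to pass a timezone such as `timezone.utc`:

```python
from datetime import datetime, timezone

local_now = datetime.now()            # naive: tzinfo is None
utc_now = datetime.now(timezone.utc)  # aware: tzinfo is UTC

print(local_now.tzinfo)  # None
print(utc_now.tzinfo)    # UTC

# datetime.utcnow() also exists, but it returns a naive value
# and is deprecated as of Python 3.12
```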

### Parsing dates and times from text

Let's say we've got a list of dates stored as strings (a common thing), and we want to make them into dates for future analysis:

In [17]:
our_dates = [
    '2018-09-10',
    '2018-10-30',
    '2017-03-13',
    '2000-01-02'
]

We can use datetime's `strptime` method for this. It expects two arguments:
- The string to parse into a date
- The _pattern_ of the dates

In this case, the pattern is: four-digit year, dash, two-digit month, dash, two-digit day. In the [mini-language of directives](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) used to represent each piece of a date, that translates to `'%Y-%m-%d'`.

I can never remember these, so I have [strftime.org](http://strftime.org/) bookmarked as a reference, and I check it _all the time_.

In [24]:
for d in our_dates:
    native_date = datetime.strptime(d, '%Y-%m-%d')
    print(native_date, type(native_date))

2018-09-10 00:00:00 <class 'datetime.datetime'>
2018-10-30 00:00:00 <class 'datetime.datetime'>
2017-03-13 00:00:00 <class 'datetime.datetime'>
2000-01-02 00:00:00 <class 'datetime.datetime'>
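A few variations on the loop above: a list comprehension keeps the parsed values around instead of just printing them, a different text layout needs a matching pattern, and a string that doesn't fit the pattern raises a `ValueError`. A small sketch (the `'10/07/2018 1:30 PM'` string is made up for illustration):

```python
from datetime import datetime

our_dates = ['2018-09-10', '2018-10-30', '2017-03-13', '2000-01-02']

# Collect the parsed datetimes in a new list
parsed = [datetime.strptime(d, '%Y-%m-%d') for d in our_dates]

# %I is the hour on a 12-hour clock and %p is AM/PM
with_time = datetime.strptime('10/07/2018 1:30 PM', '%m/%d/%Y %I:%M %p')
print(with_time)  # 2018-10-07 13:30:00

# Slashes don't match the dashes in the pattern, so this raises ValueError
try:
    datetime.strptime('2018/09/10', '%Y-%m-%d')
except ValueError as err:
    print('Could not parse:', err)
```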


### Formatting dates as text

You can use the `strftime` method to turn date objects into strings, formatted however you like and using the same mini-language of date directives we used with `strptime`. For these examples, we'll use the `our_date` variable.

In [25]:
our_date

datetime.datetime(2018, 10, 7, 0, 0)

In [26]:
our_date.strftime('%Y-%m-%d')

'2018-10-07'

In [27]:
our_date.strftime('%m/%d/%Y')

'10/07/2018'

In [33]:
our_date.strftime('It is %A, %B %-d, Year of our Lord %Y')

'It is Sunday, October 7, Year of our Lord 2018'
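One caveat: `%-d` (day of the month without a leading zero) is a platform extension supported on Linux and macOS, and it may not work on Windows. A more portable sketch builds that piece from the `day` attribute, using an f-string, which passes anything after the colon to `strftime`:

```python
from datetime import datetime

our_date = datetime(2018, 10, 7)

# Format spec after the colon is handed to strftime; the day comes from .day
label = f"It is {our_date:%A, %B} {our_date.day}, Year of our Lord {our_date:%Y}"
print(label)  # It is Sunday, October 7, Year of our Lord 2018

# isoformat() gives an unambiguous, sortable string without any pattern
print(our_date.isoformat())  # 2018-10-07T00:00:00
```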

### Calculating the difference between two datetimes

How old was each victim when they died? For each criminal case, how much time elapsed between the indictment and the sentencing? On average, how soon after the market-moving tweet did investors begin dumping stock in the company?

"Date diff" questions like this show up all the time in newsrooms. If you're using Python to work with your data, you can subtract one datetime from another; the result is a [`datetime.timedelta`](https://docs.python.org/3/library/datetime.html#timedelta-objects) object representing the difference between them.

(For more complex data, using a third-party library like [`dateutil`](https://github.com/dateutil/dateutil) would be worth your while.)

But let's start with a simple example: How many minutes elapsed between two datetimes on the same day?

In [65]:
datetime1 = datetime(2018, 10, 7, 13, 30)
datetime2 = datetime(2018, 10, 7, 17, 45)

In [66]:
datetime2 - datetime1

datetime.timedelta(seconds=15300)

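We can get the elapsed time in seconds from the `timedelta` object that this subtraction returns. Use the `total_seconds()` method rather than the `seconds` attribute: `seconds` only holds the leftover seconds after whole days are counted, so it would undercount any gap longer than a day.

In [67]:
elapsed_time = datetime2 - datetime1
print(elapsed_time.total_seconds())

15300.0


... and if we want minutes, just divide by 60.

In [68]:
print(elapsed_time.total_seconds() / 60)

255.0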
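To see the difference between the `seconds` attribute and the `total_seconds()` method, try a gap longer than a day. In this sketch the end time is moved two days later than in the example above:

```python
from datetime import datetime

start = datetime(2018, 10, 7, 13, 30)
end = datetime(2018, 10, 9, 17, 45)  # two days later
elapsed = end - start

print(elapsed)                        # 2 days, 4:15:00
print(elapsed.days, elapsed.seconds)  # 2 15300 -> seconds ignores the whole days
print(elapsed.total_seconds())        # 188100.0
print(elapsed.total_seconds() / 3600) # 52.25 hours
```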


`timedelta` objects are also useful for "what was the date 248 days ago"-type problems:

In [70]:
from datetime import timedelta

rn = datetime.now()
date_248_days_ago = rn - timedelta(days=248)

print(date_248_days_ago)

2017-12-08 14:46:51.410051


In [71]:
# two weeks ago
rn - timedelta(weeks=2)

datetime.datetime(2018, 7, 30, 14, 46, 51, 410051)

In [72]:
# 89382 seconds ago
rn - timedelta(seconds=89382)

datetime.datetime(2018, 8, 12, 13, 57, 9, 410051)

Date math can get a little complicated (time zones! leap years! birthdays!), but if you get stuck, chances are someone on the Internet has already solved your problem.
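To make the birthday and leap-year points concrete: dividing a span of days by 365 is a tempting shortcut for ages, but leap days make it drift. Here's a minimal sketch comparing it with a calendar-based count. `age_in_years` is a hypothetical helper written for this example, and the dates are made up. Also note that `timedelta` has no `months` or `years` argument; `weeks` is its largest unit.

```python
from datetime import date, timedelta


def age_in_years(born, on):
    """Hypothetical helper: whole years from `born` to `on`."""
    years = on.year - born.year
    # If the birthday hasn't arrived yet in the `on` year, it doesn't count
    if (on.month, on.day) < (born.month, born.day):
        years -= 1
    return years


born = date(1990, 10, 8)
on = date(2018, 10, 7)

print(age_in_years(born, on))  # 27 (the birthday is one day away)

# Dividing elapsed days by 365 overshoots here because of leap days
print((on - born).days / 365)  # a little over 28

# timedelta knows about leap years when it rolls dates forward
print(date(2020, 2, 28) + timedelta(days=1))  # 2020-02-29
print(date(2019, 2, 28) + timedelta(days=1))  # 2019-03-01
```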

### Working with dates in pandas

Let's take a look at a couple of things you might want to do when working with dates in pandas: Parsing values as dates when you _import_ the data into a dataframe, and coercing _existing_ data to dates.

Let's import pandas and load up some congressional junkets data (`../data/congress_junkets.csv`) with date values in two of the columns (`DepartureDate` and `ReturnDate`).

We're going to specify the [`parse_dates`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) argument when we call the `read_csv()` function -- we'll hand this keyword argument a _list_ of columns to parse as dates.

In [74]:
import pandas as pd

In [77]:
df = pd.read_csv('../data/congress_junkets.csv',
                 parse_dates=['DepartureDate', 'ReturnDate'])

In [78]:
df.head()

| | DocID | FilerName | MemberName | State | District | Year | Destination | FilingType | DepartureDate | ReturnDate | TravelSponsor |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 500005076 | Bobby Cornett | Franks, Trent | AZ | 8.0 | 2011 | Las Vegas, NV | Original | 2011-01-07 | 2011-01-09 | Consumer Electronics Association |
| 1 | 500005077 | Michael Strittmatter | Franks, Trent | AZ | 8.0 | 2011 | Las Vegas, NV | Original | 2011-01-07 | 2011-01-09 | CEA Leaders in Technology |
| 2 | 500005081 | Diane Rinaldo | Rogers, Mike | AL | 3.0 | 2011 | Las Vegas, NV | Original | 2011-01-06 | 2011-01-08 | Consumer Electronics Association |
| 3 | 500005082 | Kenneth DeGraff | Doyle, Michael | PA | 14.0 | 2011 | Las Vegas, NV | Original | 2011-01-06 | 2011-01-09 | Consumer Electronics Association |
| 4 | 500005083 | Michael Ryan Clough | Lofgren, Zoe | CA | 19.0 | 2011 | Las Vegas, NV | Original | 2011-01-06 | 2011-01-08 | Consumer Electronics Association |


Letting pandas work out the date format on its own can be slow if you have a lot of data; one way to speed it up is to supply the correct date format using the directive mini-language mentioned above ([here's an example from StackOverflow](https://stackoverflow.com/questions/23797491/parse-dates-in-pandas)).

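We'll pass a [lambda expression](Functions.ipynb#Lambda-expressions) as the `date_parser` argument here for brevity, but you could also define a regular function and pass that in. (In pandas 2.0 and later, `date_parser` is deprecated; you can pass the format string directly with `date_format='%m/%d/%Y'` instead.)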

In [79]:
df = pd.read_csv('../data/congress_junkets.csv',
                 parse_dates=['DepartureDate', 'ReturnDate'],
                 date_parser=lambda x: datetime.strptime(x, '%m/%d/%Y'))

Much quicker. We can verify that both columns are now date columns by checking the dataframe's `dtypes` attribute:

In [80]:
df.dtypes

DocID                     int64
FilerName                object
MemberName               object
State                    object
District                float64
Year                      int64
Destination              object
FilingType               object
DepartureDate    datetime64[ns]
ReturnDate       datetime64[ns]
TravelSponsor            object
dtype: object

Sometimes, for Reasons™️, it makes more sense to do the conversion after you've loaded your data. Let's use the same data but do the conversion _after_ it's in the dataframe.

In [81]:
df = pd.read_csv('../data/congress_junkets.csv')

In [82]:
df.head()

| | DocID | FilerName | MemberName | State | District | Year | Destination | FilingType | DepartureDate | ReturnDate | TravelSponsor |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 500005076 | Bobby Cornett | Franks, Trent | AZ | 8.0 | 2011 | Las Vegas, NV | Original | 1/7/2011 | 1/9/2011 | Consumer Electronics Association |
| 1 | 500005077 | Michael Strittmatter | Franks, Trent | AZ | 8.0 | 2011 | Las Vegas, NV | Original | 1/7/2011 | 1/9/2011 | CEA Leaders in Technology |
| 2 | 500005081 | Diane Rinaldo | Rogers, Mike | AL | 3.0 | 2011 | Las Vegas, NV | Original | 1/6/2011 | 1/8/2011 | Consumer Electronics Association |
| 3 | 500005082 | Kenneth DeGraff | Doyle, Michael | PA | 14.0 | 2011 | Las Vegas, NV | Original | 1/6/2011 | 1/9/2011 | Consumer Electronics Association |
| 4 | 500005083 | Michael Ryan Clough | Lofgren, Zoe | CA | 19.0 | 2011 | Las Vegas, NV | Original | 1/6/2011 | 1/8/2011 | Consumer Electronics Association |


To do this, we'll create a new column for each date with the `pd.to_datetime()` function. We'll hand it three things:
- The column to convert to a datetime
- The expected `format` of the dates (m/d/y, in this case)
- What to do if the parser runs into a value it can't parse -- instead of throwing an error, we'll `coerce` (unparseable values will become `NaT`, pandas' "not a time" marker for missing dates)
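To see what `errors='coerce'` does with bad input, here's a small sketch on made-up strings (it assumes pandas is installed). Values that don't match the format come back as `NaT` instead of raising an error:

```python
import pandas as pd

# Two good values, one impossible date, and one non-date
raw = pd.Series(['1/7/2011', '1/9/2011', '13/45/2011', 'unknown'])

parsed = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')
print(parsed)

# Coerced failures are NaT, which isna() counts as missing
print(parsed.isna().sum())  # 2
```

Counting `NaT` values after a coerced conversion is a quick way to check how many rows failed to parse before you start doing date math on the column.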

In [93]:
df['departure_date_new'] = pd.to_datetime(df['DepartureDate'],
                                          format='%m/%d/%Y',
                                          errors='coerce')

df['return_date_new'] = pd.to_datetime(df['ReturnDate'],
                                       format='%m/%d/%Y',
                                       errors='coerce')

In [91]:
df.head()

| | DocID | FilerName | MemberName | State | District | Year | Destination | FilingType | DepartureDate | ReturnDate | TravelSponsor | departure_date_new | return_date_new |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 500005076 | Bobby Cornett | Franks, Trent | AZ | 8.0 | 2011 | Las Vegas, NV | Original | 1/7/2011 | 1/9/2011 | Consumer Electronics Association | 2011-01-07 | 2011-01-09 |
| 1 | 500005077 | Michael Strittmatter | Franks, Trent | AZ | 8.0 | 2011 | Las Vegas, NV | Original | 1/7/2011 | 1/9/2011 | CEA Leaders in Technology | 2011-01-07 | 2011-01-09 |
| 2 | 500005081 | Diane Rinaldo | Rogers, Mike | AL | 3.0 | 2011 | Las Vegas, NV | Original | 1/6/2011 | 1/8/2011 | Consumer Electronics Association | 2011-01-06 | 2011-01-08 |
| 3 | 500005082 | Kenneth DeGraff | Doyle, Michael | PA | 14.0 | 2011 | Las Vegas, NV | Original | 1/6/2011 | 1/9/2011 | Consumer Electronics Association | 2011-01-06 | 2011-01-09 |
| 4 | 500005083 | Michael Ryan Clough | Lofgren, Zoe | CA | 19.0 | 2011 | Las Vegas, NV | Original | 1/6/2011 | 1/8/2011 | Consumer Electronics Association | 2011-01-06 | 2011-01-08 |
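Once both columns are real datetimes, the "date diff" math from earlier works on whole columns at once. As a sketch, here's how you might compute trip length in days, using a small stand-in dataframe built from a few of the departure/return pairs shown in the rows above (it assumes pandas is installed; `trip_days` is a column name made up for this example):

```python
import pandas as pd

# Stand-in for df, using departure/return dates from a few rows shown above
trips = pd.DataFrame({
    'departure_date_new': pd.to_datetime(['1/7/2011', '1/6/2011', '1/6/2011'], format='%m/%d/%Y'),
    'return_date_new': pd.to_datetime(['1/9/2011', '1/8/2011', '1/9/2011'], format='%m/%d/%Y'),
})

# Subtracting two datetime columns gives a timedelta column;
# the .dt accessor pulls out the whole number of days
trips['trip_days'] = (trips['return_date_new'] - trips['departure_date_new']).dt.days

print(trips)
print(trips['trip_days'].mean())  # average trip length, about 2.33 days here
```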
