In [1]:
import pandas as pd
import numpy as np

# Handling Date and Time

Pandas is chock-full of methods for handling and manipulating dates and times. We will cover the following:
* datetime foundations in pure Python and Numpy
* the pandas Timestamp and DatetimeIndex
* up- and down-sampling with `resample()` and `asfreq()`
* interpolation and aggregation
* rolling windows and moving averages

Time series data is handled as regular Series and DataFrame objects, which is good because our previous knowledge will transfer easily.

## The Python *datetime* Module

Python has a built-in module called `datetime` that allows you to work with dates and times. However, it is not in the default namespace, so it has to be imported.

* https://docs.python.org/3/library/datetime.html

We'll be covering:
* the "what" and "why" of the datetime module
* the `date` class
* the `time` class
* the `datetime` class

Let's start by importing the date and time classes from the datetime module.


In [2]:
from datetime import date, time

Using `date`, we can create Python objects that store date information. A date is characterized by year, month, and day.

In [3]:
date_A = date(2020, 4, 25)

In [4]:
date_A

datetime.date(2020, 4, 25)

In [5]:
type(date_A)

datetime.date

Why would we want to make such objects, as opposed to storing dates in, for example, a plain string? The biggest reason is that the `date` object is much more easily manipulated, combined, changed, or updated with a suite of methods designed specifically for working with dates. This unlocks a lot of functionality that would be difficult to replicate using just strings, because strings have no date-specific methods.
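As a minimal sketch of what that buys us, the cell below compares two dates and does arithmetic on them. Note that `timedelta` is another class from the standard `datetime` module that has not been introduced above, and the names `start`, `end`, and `gap` are made up for this illustration.

```python
from datetime import date, timedelta

start = date(2020, 4, 25)
end = date(2020, 5, 2)

# date objects compare chronologically, not alphabetically
print(start < end)

# subtracting two dates yields a timedelta; .days is the gap in whole days
gap = end - start
print(gap.days)

# adding a timedelta moves the date forward, rolling over the month boundary
later = start + timedelta(days=10)
print(later)
```

Doing the same with strings would mean splitting the text apart and handling month lengths and leap years by hand.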

Let's check out some of the **date attributes**, such as `.day` and `.year`

In [6]:
date_A.day

25

In [7]:
date_A.year

2020

Now let's take a look at the `time` class, which stores a time as hour, minute, second, and microsecond.
* If you omit one or more of these attributes, Python will default them to 0.

In [8]:
time_A = time(4, 30, 12, 943212)

In [9]:
time_A

datetime.time(4, 30, 12, 943212)

In [10]:
time_B = time(6)

In [11]:
time_B

datetime.time(6, 0)

Time also has dedicated attributes:

In [12]:
time_B.microsecond

0

Both the **date** and **time** classes have methods associated with them as well. One of them, `isoformat()`, returns a string in ISO 8601 format: YYYY-MM-DD for a date, and HH:MM:SS.ffffff for a time (the fractional part is omitted when the microsecond is 0).

In [13]:
time_A.isoformat()

'04:30:12.943212'
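For comparison, here is a small sketch that calls `isoformat()` on both a date and a time. The `fromisoformat()` round trip at the end is an addition to what is covered above; it is available on `date` from Python 3.7 onward. The variable names are illustrative.

```python
from datetime import date, time

d = date(2020, 4, 25)
t = time(4, 30, 12, 943212)

# a date produces YYYY-MM-DD, a time produces HH:MM:SS.ffffff
d_text = d.isoformat()
t_text = t.isoformat()
print(d_text)
print(t_text)

# fromisoformat() reverses the conversion and rebuilds an equal date object
restored = date.fromisoformat(d_text)
print(restored == d)
```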

The datetime module also has a `datetime` class

In [14]:
from datetime import datetime

The `datetime` class is a standalone container that stores the attributes of both a date and a time in a single object.

In [15]:
dt_A = datetime(2020, 4, 25, 19, 1, 23, 123123)

In [16]:
dt_A

datetime.datetime(2020, 4, 25, 19, 1, 23, 123123)
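To make the relationship between the three classes concrete, here is a short sketch that assembles a `datetime` from a separate `date` and `time` with `datetime.combine()`, then splits it back apart with `.date()` and `.time()`. These calls are part of the standard library but are not used elsewhere in this notebook; the variable names are illustrative.

```python
from datetime import date, time, datetime

date_part = date(2020, 4, 25)
time_part = time(19, 1, 23, 123123)

# combine() merges a date and a time into one datetime
combined = datetime.combine(date_part, time_part)
print(combined)

# .date() and .time() recover the two halves
print(combined.date() == date_part)
print(combined.time() == time_part)
```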

We can also ask Python to construct a new datetime object capturing a precise point in time. One example is the `.now()` method, which generates a datetime object for that precise moment.

In [17]:
datetime.now()

datetime.datetime(2021, 11, 16, 0, 58, 40, 243272)

We can extract attributes from datetime objects:

In [18]:
dt_A.year

2020

In [19]:
dt_A.microsecond

123123

We can also print it all, giving us a nice visual of the datetime.

In [20]:
print(dt_A)

2020-04-25 19:01:23.123123


In [21]:
print(datetime.now())

2021-11-16 00:58:40.346647


## Parsing Dates from Text with `strptime()`

Oftentimes we will not have datetime objects already made for us. Rather, we will have to work with raw text (strings) and will need to tease out or convert this text into dates. 

So, what do we do if we don't have date objects? It turns out that the `datetime` class has a method dedicated to extracting dates out of text, called `strptime` (string parse time).
* https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes

It takes two arguments: the string containing the datetime that we want to parse, and the format in which the date-like or time-like text is expressed. The latter makes use of format codes for the individual items, which can be viewed in the link above.

In [22]:
datetime.strptime('2019-10-31','%Y-%m-%d')

datetime.datetime(2019, 10, 31, 0, 0)

Notice how this method converts our text into a datetime object that we can now work with. Let's save it as a variable.

In [23]:
dt_B = datetime.strptime('2019-10-31','%Y-%m-%d')

In [24]:
dt_B.year

2019

In [25]:
dt_B.isoformat()

'2019-10-31T00:00:00'

Going back to the conversion step, we defined the structure of the date using the **format codes**. Don't be scared by these. They are simply well-defined codes for describing the different datetime elements.

Let's try another example, with a future date and completely different formatting.

In [26]:
try_this = "jan 20 2090 4pm"

Our previous format string will not work here, because the layout it describes ('%Y-%m-%d') does not match the layout of the date in our new string.

In [28]:
## Results in a ValueError: time data 'jan 20 2090 4pm' does not match format '%Y-%m-%d'
# datetime.strptime(try_this, '%Y-%m-%d')

This means we need to define a new format structure using the format codes. The task now is to scan the format codes table and determine which ones to use for our datetime string.

Important note: the spacing in the format string is critical. It must mirror the spacing in the string, and must account for any extraneous or excess characters.

In [30]:
datetime.strptime(try_this, '%b %d %Y %I%p')

datetime.datetime(2090, 1, 20, 16, 0)
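When the incoming text may not match the expected layout, one option is to catch the `ValueError` rather than let it stop the notebook. The sketch below assumes a hypothetical helper named `try_parse` (not part of the `datetime` module), and the second sample string is made up to show literal characters such as `/` and `at` appearing in the format.

```python
from datetime import datetime

def try_parse(value, fmt):
    """Return a datetime, or None if value does not match fmt."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None

text = "jan 20 2090 4pm"

# the old format does not describe this string, so we get None
mismatch = try_parse(text, "%Y-%m-%d")
print(mismatch)

# the matching format parses cleanly
match = try_parse(text, "%b %d %Y %I%p")
print(match)

# literal characters in the string must also appear in the format
with_literals = try_parse("2019/10/31 at 14:05", "%Y/%m/%d at %H:%M")
print(with_literals)
```

Note that `%b` and `%p` depend on the locale, so this sketch assumes English month names and AM/PM markers.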

And there you have it, we've successfully parsed dates using the `strptime` method.

## An Even Better Way To Parse Dates and Times: `dateutil`

The `strptime` method works great, but it's a bit laborious because one has to utilize those pesky format codes. There are alternatives to this, and one of the most popular is `dateutil`. It is external to the Python standard library, but it is included in most data science distributions, such as Anaconda and the Colaboratory.

In [31]:
pip show python-dateutil

Name: python-dateutil
Version: 2.8.2
Summary: Extensions to the standard Python datetime module
Home-page: https://github.com/dateutil/dateutil
Author: Gustavo Niemeyer
Author-email: gustavo@niemeyer.net
License: Dual License
Location: /usr/local/lib/python3.7/dist-packages
Requires: six
Required-by: pandas, matplotlib, LunarCalendar, kaggle, jupyter-client, holidays, fbprophet, bokeh


If for some reason you need to install it, you can use the following command. Colab already has it installed, so there the command will simply report that the requirement is already satisfied.

In [32]:
pip install python-dateutil



Let's get to using it. The heavy lifting will be done with the `parser` module, which we have to import from `dateutil`.

In [33]:
from dateutil import parser

The beauty of using `parser` is that we can easily convert text to dates *without* needing to define the structure of the date.

In [34]:
parser.parse('Jan 21st 1990')

datetime.datetime(1990, 1, 21, 0, 0)

That's incredible. Let's try another example with something absolutely insane.

In [35]:
parser.parse('22 apriL 2068 at 4pm and 17 minutes 20 seconds')

datetime.datetime(2068, 4, 22, 16, 17, 20)

Are you freaking kidding me?! That's awesome. The dateutil parser has a very forgiving mechanism of action and is quite powerful. However, it's always a good idea to double-check and make sure the parser did what you wanted it to do.

The `dateutil` package has many other methods as well, but the instructor finds that `parser` is among the most useful.
* https://dateutil.readthedocs.io/en/stable/parser.html
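One case where that double-checking matters is an all-numeric date, where the parser has to guess which number is the month. As a minimal sketch, the cell below parses the same made-up string twice, once with the defaults and once with the `dayfirst=True` argument of `parser.parse()`.

```python
from datetime import datetime
from dateutil import parser

ambiguous = "01/02/2020"

# by default the first number is read as the month
month_first = parser.parse(ambiguous)
print(month_first)

# dayfirst=True reads the first number as the day instead
day_first = parser.parse(ambiguous, dayfirst=True)
print(day_first)
```

Both calls succeed without complaint, so only knowledge of the data source can tell you which reading is right.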

## Going from Datetime to String using `strftime()`

We've seen how to convert dates and times from strings to datetime objects using `strptime()` and dateutil's `parser.parse()`. But how do we go the other way around and convert our datetime objects into strings? In this lecture we'll learn exactly how.

The first method is `strftime()`, which works very similarly to `strptime()` but does the exact opposite. It takes a datetime object and converts it into a string. 
* https://docs.python.org/3/library/datetime.html#datetime.date.strftime

To illustrate, let's start with a new datetime object where we capture the exact moment in time when we execute it.

In [36]:
dt = datetime.now()

In [37]:
dt

datetime.datetime(2021, 11, 16, 2, 8, 41, 223995)

Now we'll use `strftime()` and format codes to create a string of our design. All you need to do is stick your format code into the string template that you provide, and the method takes care of the rest.

Suppose we just want to get the year into a string. We can do that.

In [38]:
dt.strftime('%Y')

'2021'

We have full control of how we structure our string. We can throw in a month, day, and extra text if we so choose.

In [39]:
dt.strftime('Year: %Y; Month: %m; Day: %d')

'Year: 2021; Month: 11; Day: 16'

Let's do one better and try to represent our datetime fully with a single format code. For this we can use `%c`, which produces the locale's standard date and time representation, including the weekday, month, day, time, and year.

In [40]:
dt.strftime('%c')

'Tue Nov 16 02:08:41 2021'

That's great - we got a very nicely formatted string with minimal formatting code.

There is another way of converting dates to custom string representation that works a bit differently. We start with a string that embeds a format code, such that the formatted datetime string is now embedded within our wider string. We can do this with old-school `.format()` string formatting.

In [44]:
"My date is {:%c}".format(dt)

'My date is Tue Nov 16 02:08:41 2021'
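The same format codes also work inside f-strings, which are not covered above but follow the same rule: everything after the first colon is passed to the datetime as its format. Here is a short sketch with a fixed datetime (so the result is reproducible) showing that all three approaches produce the same text. The variable names are illustrative.

```python
from datetime import datetime

moment = datetime(2021, 11, 16, 2, 8, 41)

# three routes to the same string
via_strftime = moment.strftime("%Y-%m-%d %H:%M")
via_format = "{:%Y-%m-%d %H:%M}".format(moment)
via_fstring = f"{moment:%Y-%m-%d %H:%M}"

print(via_fstring)
print(via_strftime == via_format == via_fstring)
```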

## Datetimes with **Numpy**

Sometimes we need to manipulate large arrays of dates and times. The pure Python approach is not appropriate for this. Numpy has created a special datatype, `datetime64`, that encodes and stores datetime information more efficiently, allowing data scientists to conduct large-scale operations on dates.
* https://numpy.org/doc/stable/reference/arrays.datetime.html

Let's start by building a datetime64 object. As the name implies, Numpy encodes these values as 64-bit integers.


In [47]:
np.datetime64('2020-03-04')

numpy.datetime64('2020-03-04')

And assign it to a variable.

In [48]:
a = np.datetime64('2020-03-04')

And we'll create a second datetime64 variable capturing the current date and time.

In [49]:
b = np.datetime64(datetime.now())

In [50]:
b

numpy.datetime64('2021-11-16T02:24:06.386045')

Now that we have these two datetime64 objects, we can conduct operations on them.

We can do things like add days to a date.

In [51]:
a + 10

numpy.datetime64('2020-03-14')

What if we add 10 to the other date? What will it do?

In [52]:
b + 10

numpy.datetime64('2021-11-16T02:24:06.386055')

Interestingly, this command added 10 to the microseconds counter. The reason is that our two numpy datetime64s have two fundamentally different time units. "a" has a unit of day as the lowest level of precision encoded, so when we add 10, we get 10 more days. But "b" has the microsecond as its lowest level of precision, so when we add 10, we get 10 more microseconds.

We can change the time unit by rescaling the numpy datetime.

In [53]:
np.datetime64(b, "D")

numpy.datetime64('2021-11-16')

Now when we add 10, we'll get 10 more days instead of 10 more microseconds.

In [55]:
np.datetime64(b, "D") + 10

numpy.datetime64('2021-11-26')

Keep in mind that the numpy datetime64 type is fixed-length at 64 bits, so changing the time unit changes the precision of our dates. When we lower the precision, as we did for "b" above, we increase the time span. In other words, the range of dates that we can represent in numpy increases. There is a tradeoff between precision and span.
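Adding a bare integer relies on whatever unit the datetime64 happens to carry. A more explicit alternative, sketched below, is to inspect the unit through `.dtype` and to add a `np.timedelta64` that names its own unit. `np.timedelta64` is part of Numpy but has not been introduced above, and the variable names are illustrative.

```python
import numpy as np

day_res = np.datetime64("2020-03-04")
micro_res = np.datetime64("2021-11-16T02:24:06.386045")

# the unit is part of the dtype
print(day_res.dtype)
print(micro_res.dtype)

# a timedelta64 states its unit, so "10 days" means 10 days for both values
ten_days = np.timedelta64(10, "D")
day_shifted = day_res + ten_days
micro_shifted = micro_res + ten_days
print(day_shifted)
print(micro_shifted)
```

The microsecond-precision value keeps its precision after the addition; only the date portion moves.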

datetime64 objects allow us to perform efficient **vectorized operations** on a collection of dates. Let's create an array of dates.

In [58]:
np.array([
          '2019-02-20',
          '2019-06-20',
          '2090-03-23'
], dtype = np.datetime64)

array(['2019-02-20', '2019-06-20', '2090-03-23'], dtype='datetime64[D]')

Let's analyze the above. We create a numpy array by passing in a Python list of dates. By specifying the dtype as `np.datetime64`, the method parses those dates as `datetime64` objects. Notice also how the time unit is day (the capitalized D). 

Let's assign this to a variable that we can work with.

In [59]:
dates = np.array([
          '2019-02-20',
          '2019-06-20',
          '2090-03-23'
], dtype = np.datetime64)

Now we can do things like subtract days from each date in the array very quickly and efficiently.

In [60]:
dates - 10

array(['2019-02-10', '2019-06-10', '2090-03-13'], dtype='datetime64[D]')
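A few more vectorized operations on the same array, as a rough sketch: the gap between consecutive dates with `np.diff()`, an element-wise comparison against a cutoff, and a shift by a `np.timedelta64` of one week. The cutoff date and the variable names are made up for this illustration.

```python
import numpy as np

dates = np.array(["2019-02-20", "2019-06-20", "2090-03-23"], dtype=np.datetime64)

# gaps between neighbouring dates, returned as timedelta64 values in days
gaps = np.diff(dates)
print(gaps)

# element-wise comparison yields a boolean array
after_cutoff = dates > np.datetime64("2020-01-01")
print(after_cutoff)

# shift every date by one week in a single operation
week_later = dates + np.timedelta64(1, "W")
print(week_later)
```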

We can do more complicated calculations, like getting the date that falls 10 business days before each of the dates in the array. This specific task can be accomplished using the `numpy.busday_offset()` function.
* https://numpy.org/doc/stable/reference/generated/numpy.busday_offset.html#numpy.busday_offset

In [61]:
np.busday_offset(dates, offsets = -10, roll = 'backward')

array(['2019-02-06', '2019-06-06', '2090-03-09'], dtype='datetime64[D]')
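One way to sanity-check that result is with two related Numpy functions that are not used above: `np.is_busday()`, which tests whether each date is a business day, and `np.busday_count()`, which counts business days from a start date up to (but not including) an end date. The sketch below assumes the default Monday-to-Friday week with no holidays.

```python
import numpy as np

dates = np.array(["2019-02-20", "2019-06-20", "2090-03-23"], dtype="datetime64[D]")
shifted = np.busday_offset(dates, offsets=-10, roll="backward")

# all three original dates fall on weekdays
on_weekday = np.is_busday(dates)
print(on_weekday)

# counting business days from each shifted date back up to the original
counts = np.busday_count(shifted, dates)
print(counts)
```

Each count comes out to 10, confirming that the offset is measured in business days rather than calendar days.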

We could spend a whole course talking about how Numpy handles datetimes, but we're exploring Numpy as a stepping stone to understanding how Pandas handles datetimes. The key takeaway is that Numpy has very powerful tools that make handling large collections of datetimes much easier than using pure Python methods, as great as they are.