# Datetime, Timedelta, and Period Data Types

In this chapter, we cover the datetime, timedelta, and period data types. All three relate to time, but each is a distinct data type. This chapter does not go into the usage of these data types within a data analysis; that is left to the chapters in the Time Series part. We focus on what these data types are and how they are constructed in an array or a pandas Series.

## Definitions

* **Datetime** - A single moment in time. It contains both a date (year, month, day) and a time (hour, minute, second, fraction of a second) component. Example: June 8, 1989 at 10:45 AM and 33.456 seconds.
* **Timedelta** - An amount of time. It has units of days, hours, minutes, seconds, and fractions of a second. It is not attached to any date. Example: 5 hours, 34 minutes, and 2.89 seconds.
* **Period** - A span of time with a start and an end. The units depend on the length of the time period. Example: June 8, 1989, which starts at midnight on June 8, 1989 and ends just before midnight at the start of June 9, 1989. Another example: February 2, 1900 to February 10, 2020.
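
As a rough illustration of the three definitions, the sketch below builds one scalar of each kind with pandas. The constructors `pd.Timestamp`, `pd.Timedelta`, and `pd.Period` are not covered in this chapter and are used here only to make the definitions concrete; the variable names are arbitrary.

```python
import pandas as pd

# Datetime: a single moment in time
moment = pd.Timestamp('1989-06-08 10:45:33.456')

# Timedelta: an amount of time, not attached to any date
amount = pd.Timedelta(hours=5, minutes=34, seconds=2.89)

# Period: a span of time, here the whole day of June 8, 1989
span = pd.Period('1989-06-08', freq='D')

print(moment)
print(amount)
print(span.start_time)  # first instant of the span
print(span.end_time)    # last instant of the span
```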

## The numpy datetime64 data type

Both numpy and pandas have a 'datetime64' data type. pandas uses numpy's datetime64 data type as a base and builds quite a bit more functionality on top of it. As the name implies, a datetime64 value always uses 64 bits of memory; no other size is available. However, every datetime64 object must have a date or time **unit**. The units can be years, months, weeks, days, hours, minutes, seconds, and fractions of a second down to an attosecond (10<sup>-18</sup> of a second).

The unit determines the precision of a datetime value. For instance, if the unit is months then each datetime will have a year and month component. If the unit is hours then each datetime will have year, month, day, and hour components.

The official string representation of datetime64 data types must contain the units placed within brackets following the word 'datetime'. For instance, `'datetime64[s]'` is the official string representation for a datetime64 object with second precision and `'datetime64[ns]'` has nanosecond precision. Visit the [numpy documentation to view all of the possible units][1].
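
To make the relationship between the unit string and the fixed 64-bit size concrete, here is a small sketch that inspects a handful of datetime64 dtypes. The selection of units is arbitrary, and `np.datetime_data` is simply one way to read the unit back out of a dtype.

```python
import numpy as np

units = ['Y', 'M', 'D', 'h', 's', 'ns']

# itemsize is reported in bytes: 8 bytes is 64 bits
sizes = {unit: np.dtype(f'datetime64[{unit}]').itemsize for unit in units}
print(sizes)

# np.datetime_data returns the unit and step count stored in the dtype
print(np.datetime_data(np.dtype('datetime64[s]')))
```

Whatever the unit, the size stays the same; only the meaning of each integer step changes.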

There are a few ways to create a datetime array in numpy. Just as we did previously, we will set the `dtype` parameter to the precise data type we desire. In this approach, the values passed to `np.array` are integers, which numpy converts to datetimes with the specified unit. It does this by treating 0 as the **Unix epoch**, which is January 1, 1970 at midnight. In the following example, we create an array from the three integers 10, -120, and 410. The data type is a datetime with month precision (referenced by the string 'M'). The integer 10 corresponds to 10 months after the epoch, or November 1970. The integer -120 corresponds to 120 months (10 years) before the epoch, or January 1960, and 410 corresponds to 410 months after the epoch, or March 2004.

[1]: https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html#datetime-units

In [39]:
import numpy as np
import pandas as pd
np.array([10, -120, 410], dtype='datetime64[M]')

array(['1970-11', '1960-01', '2004-03'], dtype='datetime64[M]')
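
The month arithmetic described above can be checked by hand. The sketch below uses a hypothetical helper, `months_since_epoch_to_year_month`, which is not part of numpy, to turn a month offset into a year and month, and prints it next to the value numpy produces.

```python
import numpy as np

def months_since_epoch_to_year_month(n):
    """Hypothetical helper: convert a month offset from 1970-01 to (year, month)."""
    years, months = divmod(n, 12)  # floor division also handles negative offsets
    return 1970 + years, 1 + months

for n in [10, -120, 410]:
    year_month = months_since_epoch_to_year_month(n)
    numpy_value = np.array([n], dtype='datetime64[M]')[0]
    print(n, year_month, numpy_value)
```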

Let's use second precision with the same integers. Now, the first value corresponds to 10 seconds after the epoch. Notice how the last unit in the output is seconds. A capital 'T' separates the date and time components.

In [40]:
np.array([10, -120, 410], dtype='datetime64[s]')

array(['1970-01-01T00:00:10', '1969-12-31T23:58:00',
       '1970-01-01T00:06:50'], dtype='datetime64[s]')

In this final example, hour precision is used with the same data. Notice again in the output that the precision stops at hours.

In [41]:
np.array([10, -120, 410], dtype='datetime64[h]')

array(['1970-01-01T10', '1969-12-27T00', '1970-01-18T02'],
      dtype='datetime64[h]')
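
Integers are not the only accepted input. As a small sketch, the same hour-precision array can also be built from ISO 8601 style strings, and the two arrays compare equal. The variable names are arbitrary.

```python
import numpy as np

from_ints = np.array([10, -120, 410], dtype='datetime64[h]')
from_strings = np.array(['1970-01-01T10', '1969-12-27T00', '1970-01-18T02'],
                        dtype='datetime64[h]')

# Both arrays hold the same datetimes
print(np.array_equal(from_ints, from_strings))
```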

### Available integers

Any integer that fits in a signed 64-bit integer can be passed. Let's print the range of these integers using the `iinfo` function.

In [42]:
np.iinfo('int64')

iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)

### Available time span

The precision of the datetime limits its available time span. For instance, the latest possible datetime corresponds to the maximum 64-bit integer, which is 9223372036854775807 (2<sup>63</sup> - 1). Let's convert the min and max 64-bit integers to datetimes with millisecond precision.

In [43]:
np.array([-9223372036854775808, 9223372036854775807], dtype='datetime64[ms]')

array([                         'NaT', '292278994-08-17T07:12:55.807'],
      dtype='datetime64[ms]')

### NaT

Notice that the first value was returned as 'NaT', which stands for 'Not a Time'. Instead of using the minimum integer as a datetime, numpy uses it to signal a missing value. A value such as this is usually referred to as a **sentinel value**, a special value reserved for a specific situation. The minimum 64-bit integer is therefore not available as a normal datetime.
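
One way to detect this sentinel in an array is numpy's `np.isnat` function, which the chapter does not otherwise use. The sketch below builds a two-element array from the minimum 64-bit integer and 0, then checks each element.

```python
import numpy as np

min_int = np.iinfo('int64').min
a = np.array([min_int, 0], dtype='datetime64[ms]')

# Element-wise check for the NaT sentinel
missing = np.isnat(a)
print(a)
print(missing)
```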

### Finding the real available time span

Let's use the second lowest integer instead to find the actual available timespan.

In [44]:
np.array([-9223372036854775807, 9223372036854775807], dtype='datetime64[ms]')

array(['-292275055-05-16T16:47:04.193',  '292278994-08-17T07:12:55.807'],
      dtype='datetime64[ms]')

With millisecond precision, we are able to use dates from about 292 million years ago to about 292 million years in the future. If we require nanosecond precision, our available time span shrinks dramatically: it begins on September 21, 1677 and ends on April 11, 2262. Any datetime outside of that range cannot be represented with nanosecond precision.

In [45]:
np.array([-9223372036854775807, 9223372036854775807], dtype='datetime64[ns]')

array(['1677-09-21T00:12:43.145224193', '2262-04-11T23:47:16.854775807'],
      dtype='datetime64[ns]')
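
These limits follow from simple arithmetic on the largest 64-bit integer. The sketch below estimates how many years that integer covers for a few units; it assumes an average year of 365.2425 days, so the results are approximations rather than exact calendar boundaries.

```python
# Largest 64-bit signed integer
max_int = 2 ** 63 - 1

# Average Gregorian year in seconds (an approximation)
seconds_per_year = 365.2425 * 24 * 3600

# Units expressed as the number of ticks in one second
ticks_per_second = {'ms': 10 ** 3, 'us': 10 ** 6, 'ns': 10 ** 9}

span_years = {unit: max_int / ticks / seconds_per_year
              for unit, ticks in ticks_per_second.items()}

for unit, years in span_years.items():
    print(unit, round(years, 1))
```

Roughly 292 years on either side of 1970 gives the 1677 to 2262 range seen with nanosecond precision.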

## The pandas datetime64 data type

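The pandas datetime64 data type is very similar to numpy's, but not quite the same. One major difference is its precision. In the pandas version used for the output in this chapter, datetime64 is only available with **nanosecond** precision (pandas 2.0 and later also support second, millisecond, and microsecond precision, so the resulting data types may differ there). Let's create a pandas Series from a numpy datetime array that has month precision.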

In [46]:
a = np.array([10, -120, 410], dtype='datetime64[M]')
a

array(['1970-11', '1960-01', '2004-03'], dtype='datetime64[M]')

We take this array and pass it to the Series constructor.

In [47]:
s = pd.Series(a)
s

0   1970-11-01
1   1960-01-01
2   2004-03-01
dtype: datetime64[ns]

Notice that the Series data type has nanosecond precision even though it was created from a numpy array with month precision. You might be wondering why the hour, minute, second, and nanoseconds are not viewable in the above Series output. These components do exist, but pandas intelligently does not output them as showing lots of zeros would dilute the information. You can view the underlying numpy array to verify that the nanosecond precision exists. A nanosecond is one-billionth of a second, which is in the 9th decimal place.

In [48]:
s.values

array(['1970-11-01T00:00:00.000000000', '1960-01-01T00:00:00.000000000',
       '2004-03-01T00:00:00.000000000'], dtype='datetime64[ns]')

### Converting integer Series to datetime

You can convert a Series of integers to a datetime with the `astype` method. Let's first create a Series of integers.

In [49]:
s = pd.Series([0, 4, 30])
s

0     0
1     4
2    30
dtype: int64

We convert the Series to a datetime with year precision. As before, the resulting precision is nanoseconds.

In [50]:
s.astype('datetime64[Y]')

0   1970-01-01
1   1974-01-01
2   2000-01-01
dtype: datetime64[ns]

Next, we use months as the unit, which again ends up as nanosecond precision.

In [51]:
s.astype('datetime64[M]')

0   1970-01-01
1   1970-05-01
2   1972-07-01
dtype: datetime64[ns]
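
An alternative route, sketched below, lets numpy interpret the integers first and then passes the resulting array to the Series constructor. This avoids relying on how a particular pandas version handles `astype` with coarse units. The resolution reported by `s.dtype` is an assumption that depends on the installed pandas version, so it is printed rather than stated.

```python
import numpy as np
import pandas as pd

# Let numpy interpret the integers as years since 1970, then wrap them in a Series
years_since_epoch = np.array([0, 4, 30], dtype='datetime64[Y]')
s = pd.Series(years_since_epoch)

print(s)
print(s.dtype)  # the resolution shown here depends on the installed pandas version
```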

### Using strings to construct datetime Series

You can pass strings of the format `'YYYY-MM-DD hh:mm:ss'` (optionally followed by a decimal fraction of a second) to the Series constructor while setting the `dtype` parameter to `'datetime64[ns]'`.

In [52]:
pd.Series(['2001-10-01', '2022-01-09 14:29:33.51'], dtype='datetime64[ns]')

0   2001-10-01 00:00:00.000
1   2022-01-09 14:29:33.510
dtype: datetime64[ns]

It's possible to create a Series with missing values by using either the string 'NaT' or the actual pandas object `pd.NaT`.

In [53]:
pd.Series(['2001-10-01', '2022-01-09 14:29', 'NaT', pd.NaT], dtype='datetime64[ns]')

0   2001-10-01 00:00:00
1   2022-01-09 14:29:00
2                   NaT
3                   NaT
dtype: datetime64[ns]
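
To confirm that both spellings of the missing value end up as the same thing, a quick sketch using the Series method `isna` (not otherwise used in this chapter) is shown below. The date strings are arbitrary.

```python
import pandas as pd

s = pd.Series(['2001-10-01', '2022-01-09', 'NaT', pd.NaT], dtype='datetime64[ns]')

# isna flags missing datetimes regardless of how they were entered
missing = s.isna()
print(missing)
```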

### More ways to create datetimes

There are more ways to create datetimes, such as with `pd.Timestamp`, `pd.to_datetime`, and `pd.date_range`. These methods will be discussed in the Time Series part of the book.

## The numpy timedelta64 data type

A timedelta refers to an amount of time, like 4 days and 24 minutes or 123 milliseconds. In numpy, a timedelta is expressed as an integer along with a unit ranging from years to attoseconds, using the same character abbreviations as datetimes. Let's create a numpy array of `timedelta64[D]` values. The `D` represents day precision.

In [54]:
np.array([1, 2, 100], dtype='timedelta64[D]')

array([  1,   2, 100], dtype='timedelta64[D]')

The values of a numpy timedelta are always integers. If you try to use a float to represent a fraction of a unit, the value will be truncated.

In [55]:
np.array([1.2, 2.7]).astype('timedelta64[Y]')

array([1, 2], dtype='timedelta64[Y]')

Because of this, you'll need to express the amount of time in the smallest unit you need if you want to use numpy's timedelta. For instance, if you want to create a timedelta of 5 hours, 34 minutes, and 17 seconds, you'd need to convert to seconds (5 * 3600 + 34 * 60 + 17).

In [56]:
5 * 3600 + 34 * 60 + 17

20057

In [57]:
np.array([20057], dtype='timedelta64[s]')

array([20057], dtype='timedelta64[s]')
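
Instead of doing the conversion by hand, one possible approach is to build each component as its own `np.timedelta64` scalar and add them. In this sketch numpy converts the sum to the finest unit involved, which here is seconds.

```python
import numpy as np

# Build each component with its own unit and add them together
total = (np.timedelta64(5, 'h')
         + np.timedelta64(34, 'm')
         + np.timedelta64(17, 's'))

print(total)
print(total == np.timedelta64(20057, 's'))
```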

## The pandas timedelta64 data type

The pandas timedelta64 data type is more intuitive to use than numpy's and has more features. In the pandas version used for the output in this chapter, all timedeltas have nanosecond precision, and you are not given a choice. Below, we convert a Series of integers to timedeltas. We specify the unit as minutes ('m'), but pandas converts this to nanosecond precision.

In [58]:
pd.Series([10, 50, 423]).astype('timedelta64[m]')

0   0 days 00:10:00
1   0 days 00:50:00
2   0 days 07:03:00
dtype: timedelta64[ns]

Take a look at the value 423. We are telling pandas to treat this as 423 minutes, which is 7 hours, 3 minutes, 0 seconds, and 0 nanoseconds. This is the value that is returned. Let's use the same Series of integers and use hours as the units.

In [59]:
s = pd.Series([10, 50, 423]).astype('timedelta64[h]')
s

0    0 days 10:00:00
1    2 days 02:00:00
2   17 days 15:00:00
dtype: timedelta64[ns]

To prove that pandas uses nanosecond precision, view the underlying array.

In [60]:
s.values

array([  36000000000000,  180000000000000, 1522800000000000],
      dtype='timedelta64[ns]')
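
The integers in this array can be reproduced with plain arithmetic. The sketch below converts each hour count to nanoseconds and also splits it into whole days and leftover hours, matching the Series display above. The constant name is arbitrary.

```python
NS_PER_HOUR = 3600 * 10 ** 9  # nanoseconds in one hour

hours = [10, 50, 423]

# Hour counts expressed in nanoseconds
nanoseconds = [h * NS_PER_HOUR for h in hours]
print(nanoseconds)

# Each hour count split into (whole days, leftover hours)
days_and_hours = [divmod(h, 24) for h in hours]
print(days_and_hours)
```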

Even though numpy allows timedeltas to have year precision, the largest unit used in the representation of timedeltas within pandas is days. The following Series still has nanosecond precision, but the visual representation is shown in days. pandas does not use years in its representation because a year is not a consistent measure of time: one year from now could mean 365 or 366 days. Similarly, months are not consistent measures of time. A day is the largest unit of time that has a consistent measure (always 24 hours), so this is what pandas has chosen to represent timedeltas that span more than 24 hours.

In [61]:
pd.Series([1, 10, 50]).astype('timedelta64[Y]')

0     365 days
1    3652 days
2   18262 days
dtype: timedelta64[ns]
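
The day counts above are consistent with each year being treated as an average Gregorian year of 365.2425 days (146097 days per 400 years), with only the whole days displayed. That is an inference from the output rather than something stated in this chapter, and the sketch below merely reproduces the numbers under that assumption.

```python
# Assumption: one year is the average Gregorian year,
# 146097 days per 400 years (365.2425 days)
DAYS_PER_400_YEARS = 146097

years = [1, 10, 50]

# Whole days in each number of years, using integer arithmetic
whole_days = [y * DAYS_PER_400_YEARS // 400 for y in years]
print(whole_days)
```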

## The pandas period data type

The period data type is unique to pandas and does not exist in numpy. To construct a Series with data type period, use a list of strings with precision up to the unit you desire. Below, we construct a Series with three strings that have a year and month component, but no further precision. In the constructor, we set the `dtype` parameter to `period[M]` where the 'M' represents month precision.

In [62]:
s = pd.Series(['2000-10', '2002-06', '2010-08'], dtype='period[M]')
s

0    2000-10
1    2002-06
2    2010-08
dtype: period[M]

The first value, `2000-10` represents the entire month of October, 2000. Technically, this is from October 1, 2000 at midnight until one nanosecond before midnight of November 1, 2000. The other values each represent an entire month of time.
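
The start and end of each period can be inspected directly. The sketch below uses the `dt` accessor's `start_time` and `end_time` attributes, which are not covered in this chapter, on the same three monthly periods.

```python
import pandas as pd

s = pd.Series(['2000-10', '2002-06', '2010-08'], dtype='period[M]')

starts = s.dt.start_time  # first instant of each month
ends = s.dt.end_time      # last instant of each month
print(starts)
print(ends)
```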

It is also possible to convert a Series of strings to the period data type with the `astype` method in a similar fashion.

In [63]:
pd.Series(['2000-10', '2002-06', '2010-08']).astype('period[M]')

0    2000-10
1    2002-06
2    2010-08
dtype: period[M]

You can also convert a datetime Series to a period with the `astype` method. Below, we create a two-item Series of datetimes.

In [64]:
s = pd.Series(['2001-10-15', '2002-01-09 14:29:33.51'], dtype='datetime64[ns]')
s

0   2001-10-15 00:00:00.000
1   2002-01-09 14:29:33.510
dtype: datetime64[ns]

Here, we complete the conversion to a period with the specified unit (month in this instance).

In [65]:
s.astype('period[M]')

0    2001-10
1    2002-01
dtype: period[M]

## Datetime, Timedelta, and Period data type summary

![0]

[0]: images/datetime_dtypes.png

## Exercises

### Exercise 1

<span style="color:green; font-size:16px">Create a numpy array of datetimes with year precision for the years 2000, 2010, and 2020. Assign the result to a variable.</span>

### Exercise 2

<span style="color:green; font-size:16px">Staying in numpy, convert the array created in exercise 1 to a data type with second precision and assign the result to a new variable.</span>

### Exercise 3

<span style="color:green; font-size:16px">Staying in numpy, use the `astype` method to return the number of seconds after the epoch for each value from the array created in exercise 2.</span>

### Exercise 4

<span style="color:green; font-size:16px">Use the integers from exercise 3 within the numpy array constructor to get the same result as exercise 2.</span>

### Exercise 5

<span style="color:green; font-size:16px">Construct a Series of integers for the years 2000, 2010, and 2020. Then convert it to datetime with the `astype` method.</span>

### Exercise 6

<span style="color:green; font-size:16px">What month is it 1 million minutes after the unix epoch?</span>

### Exercise 7

<span style="color:green; font-size:16px">Construct a datetime Series using strings with precision down to nanoseconds (9 digits after the decimal).</span>

### Exercise 8

<span style="color:green; font-size:16px">Using only arithmetic operations, find the amount of time 1 million seconds is. Report your answer as 'W days, X hours, Y minutes, Z seconds'.</span>

### Exercise 9

<span style="color:green; font-size:16px">Verify the results of exercise 8 by creating a pandas timedelta Series.</span>

### Exercise 10

<span style="color:green; font-size:16px">Construct a Series with the data type period that has the hour 10 a.m. through 11 a.m. as the time period on January 1st for the years 2019, 2020, and 2021.</span>