# Working with Time Series

Pandas was developed in the context of financial modeling, so as you might expect, it contains a fairly extensive set of tools for working with dates, times, and time-indexed data. Date and time data comes in a few flavors, which we will discuss here:

- Time stamps reference particular moments in time (e.g., July 4th, 2015 at 7:00am).
- Time intervals and periods reference a length of time between a particular beginning and end point; for example, the year 2015. Periods usually reference a special case of time intervals in which each interval is of uniform length and does not overlap (e.g., 24 hour-long periods comprising days).
- Time deltas or durations reference an exact length of time (e.g., a duration of 22.56 seconds).
   
In this section, we will introduce how to work with each of these types of date/time data in Pandas. This short section is by no means a complete guide to the time series tools available in Python or Pandas, but instead is intended as a broad overview of how you as a user should approach working with time series. We will start with a brief discussion of tools for dealing with dates and times in Python, before moving more specifically to a discussion of the tools provided by Pandas. After listing some resources that go into more depth, we will review some short examples of working with time series data in Pandas.

## Dates and Times in Python

The Python world has a number of available representations of dates, times, deltas, and timespans. While the time series tools provided by Pandas tend to be the most useful for data science applications, it is helpful to see their relationship to other packages used in Python.

## Native Python dates and times: datetime & dateutil

Python's basic objects for working with dates and times reside in the built-in datetime module. Along with the third-party dateutil module, you can use it to quickly perform a host of useful operations on dates and times. For example, you can manually build a date using the datetime type:

In [8]:
from datetime import datetime 
datetime(year=2015, month=7, day=4)

datetime.datetime(2015, 7, 4, 0, 0)

or, using the dateutil module, you can parse dates from a variety of string formats:

In [11]:
from dateutil import parser

date = parser.parse("4th of july, 2015")
date

datetime.datetime(2015, 7, 4, 0, 0)

Once you have a datetime object, you can do things like printing the day of the week:

In [13]:
date.strftime('%A')

'Saturday'

In the final line, we've used one of the standard string format codes for printing dates ("%A"), which you can read about in the [strftime](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior) section of Python's [datetime documentation](https://docs.python.org/3/library/datetime.html). Documentation of other useful date utilities can be found in [dateutil's online documentation](http://labix.org/python-dateutil). A related package to be aware of is pytz, which contains tools for working with the most migraine-inducing piece of time series data: time zones.
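As a small illustration of the format codes mentioned above, the following sketch formats one datetime several ways and then parses a string back with strptime. The variable names are only illustrative, and the weekday and month names assume Python's default (English) locale.

```python
from datetime import datetime

# The example moment from the introduction: July 4th, 2015 at 7:00am
moment = datetime(2015, 7, 4, 7, 0)

weekday = moment.strftime("%A")           # full weekday name
iso_day = moment.strftime("%Y-%m-%d")     # year-month-day
long_form = moment.strftime("%B %d, %Y")  # month name, zero-padded day, year
day_of_year = moment.strftime("%j")       # day of the year, zero-padded to 3 digits

# strptime goes the other way: it parses a string using an explicit format
parsed = datetime.strptime("2015-07-04 07:00", "%Y-%m-%d %H:%M")

print(weekday, iso_day, long_form, day_of_year)
print(parsed == moment)
```

Unlike dateutil's parser, strptime does not guess: the string must match the given format exactly, which is stricter but also more predictable.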

The power of datetime and dateutil lies in their flexibility and easy syntax: you can use these objects and their built-in methods to easily perform nearly any operation you might be interested in. Where they break down is when you wish to work with large arrays of dates and times: just as lists of Python numerical variables are suboptimal compared to NumPy-style typed numerical arrays, lists of Python datetime objects are suboptimal compared to typed arrays of encoded dates.

## Typed arrays of times: NumPy's datetime64

The weaknesses of Python's datetime format inspired the NumPy team to add a set of native time series data types to NumPy. The datetime64 dtype encodes dates as 64-bit integers, and thus allows arrays of dates to be represented very compactly. The datetime64 dtype requires a very specific input format:

In [19]:
import numpy as np
date = np.array('2015-07-04', dtype=np.datetime64)
date

array('2015-07-04', dtype='datetime64[D]')
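To see what the input-format requirement means in practice, here is a minimal sketch: an ISO 8601 string is accepted, a free-form string is rejected, and one possible workaround is to parse with the standard library first and hand the resulting date object to NumPy. The names are illustrative, and the month-name parsing assumes the default (English) locale.

```python
from datetime import datetime

import numpy as np

# An ISO 8601 style string is accepted directly
iso = np.datetime64("2015-07-04")

# A free-form string is rejected rather than guessed at
try:
    np.datetime64("July 4, 2015")
    parsed_ok = True
except ValueError:
    parsed_ok = False

# One workaround: parse with datetime.strptime, then convert the date object
py_date = datetime.strptime("July 4, 2015", "%B %d, %Y").date()
converted = np.datetime64(py_date)

print(parsed_ok)
print(converted == iso)
```

The dateutil parser shown earlier could serve the same purpose as strptime here when the input format is not known in advance.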

Once we have this date formatted, however, we can quickly do vectorized operations on it:

In [20]:
date + np.arange(12)

array(['2015-07-04', '2015-07-05', '2015-07-06', '2015-07-07',
       '2015-07-08', '2015-07-09', '2015-07-10', '2015-07-11',
       '2015-07-12', '2015-07-13', '2015-07-14', '2015-07-15'], dtype='datetime64[D]')

Because of the uniform type in NumPy datetime64 arrays, this type of operation can be accomplished much more quickly than if we were working directly with Python's datetime objects, especially as arrays get large (we introduced this type of vectorization in Computation on NumPy Arrays: Universal Functions).
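The contrast can be sketched as follows: the same twelve consecutive days are built once with a Python-level loop over datetime objects and once with a single vectorized NumPy expression. This is only an illustration of the two styles, not a benchmark, and the variable names are hypothetical.

```python
from datetime import date, timedelta

import numpy as np

# Pure-Python approach: one object and one addition per element
start = date(2015, 7, 4)
py_dates = [start + timedelta(days=i) for i in range(12)]

# NumPy approach: a single vectorized addition on 64-bit integers
np_dates = np.datetime64("2015-07-04") + np.arange(12)

# Differences between datetime64 values are timedelta64 values
gaps = np.diff(np_dates)

# Both approaches describe the same calendar days
print(np_dates.tolist() == py_dates)
print((gaps == np.timedelta64(1, "D")).all())
```

For a dozen dates the difference is negligible; the advantage of the typed array shows up as the number of elements grows.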

One detail of the datetime64 and timedelta64 objects is that they are built on a fundamental time unit. Because the datetime64 object is limited to 64-bit precision, the range of encodable times is $2^{64}$ times this fundamental unit. In other words, datetime64 imposes a trade-off between time resolution and maximum time span.

For example, if you want a time resolution of one nanosecond, you only have enough information to encode a range of $2^{64}$ nanoseconds, or just under 600 years. NumPy will infer the desired unit from the input; for example, here is a day-based datetime:

In [25]:
np.datetime64('2015-07-04')

numpy.datetime64('2015-07-04')

Here is a minute-based datetime:

In [26]:
np.datetime64('2015-07-04 12:00')

numpy.datetime64('2015-07-04T12:00')

Note that datetime64 values are timezone-naive: the time is stored exactly as written, with no time zone attached. You can force any desired fundamental unit using one of many format codes; for example, here we'll force a nanosecond-based time:

In [27]:
np.datetime64('2015-07-05 12:59:59.5', 'ns')

numpy.datetime64('2015-07-05T12:59:59.500000000')
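The trade-off between resolution and time span described above can be checked with some quick arithmetic. The sketch below assumes a year of 365.25 days, so the spans are approximate, and it covers only a handful of the available unit codes.

```python
import numpy as np

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # approximation: ignores leap seconds

# Length of one fundamental unit, in seconds, for a few unit codes
units_in_seconds = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "D": 86400.0}

for code, seconds in units_in_seconds.items():
    # 2**64 distinct values, each one fundamental unit apart
    span_years = 2**64 * seconds / SECONDS_PER_YEAR
    print(f"{code:>2}: about {span_years:.3g} years")

# Forcing a unit changes the dtype, not the moment being described
day = np.datetime64("2015-07-04")
nano = np.datetime64("2015-07-04", "ns")
print(day.dtype, nano.dtype)
print(day == nano)
```

The nanosecond row comes out at roughly 585 years, matching the "just under 600 years" figure quoted earlier, while coarser units trade precision for a vastly longer span.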