In [34]:
from pandas import DataFrame, Series
import pandas as pd
import sys
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline


from datetime import datetime
from datetime import timedelta
from dateutil.parser import parse

# Time series
Time series data is an important form of structured data in many different fields, such
as finance, economics, ecology, neuroscience, or physics. Anything that is observed or
measured at many points in time forms a time series. Many time series are fixed frequency,
which is to say that data points occur at regular intervals according to some
rule, such as every 15 seconds, every 5 minutes, or once per month. Time series can
also be irregular without a fixed unit of time or offset between units. How you mark
and refer to time series data depends on the application and you may have one of the
following:

* Timestamps, specific instants in time
* Fixed periods, such as the month January 2007 or the full year 2010
* Intervals of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals
* Experiment or elapsed time; each timestamp is a measure of time relative to a particular start time. For example, the diameter of a cookie baking each second since being placed in the oven
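
As a rough illustration, each of these four ways of marking time has a natural counterpart in pandas. The sketch below uses `pd.Timestamp`, `pd.Period`, and `pd.Interval`; the variable names and the cookie measurements are made up for the example and are not part of the original text.

```python
import pandas as pd

# 1. Timestamp: a specific instant in time
instant = pd.Timestamp('2011-01-03 09:30')

# 2. Fixed period: the whole month of January 2007
january = pd.Period('2007-01', freq='M')

# 3. Interval: an explicit start and end timestamp (closed on the left here)
window = pd.Interval(pd.Timestamp('2007-01-01'),
                     pd.Timestamp('2007-02-01'), closed='left')

# 4. Elapsed time: the index is seconds since the start (hypothetical values)
cookie_diameter = pd.Series(
    [5.0, 5.2, 5.5],
    index=pd.Index([0.0, 1.0, 2.0], name='seconds_in_oven'))

# A period behaves like an interval whose endpoints are implied
print(january.start_time, january.end_time)
print(pd.Timestamp('2007-01-15') in window)
```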

In this chapter, I am mainly concerned with time series in the first 3 categories, though
many of the techniques can be applied to experimental time series where the index may
be an integer or floating point number indicating elapsed time from the start of the
experiment. The simplest and most widely used kind of time series are those indexed
by timestamp.

pandas provides a standard set of time series tools and data algorithms. With this, you
can efficiently work with very large time series and easily slice and dice, aggregate, and
resample irregular and fixed frequency time series. As you might guess, many of these
tools are especially useful for financial and economics applications, but you could certainly
use them to analyze server log data, too.

# Date and Time Data Types and Tools
The Python standard library includes data types for date and time data, as well as
calendar-related functionality. The datetime, time, and calendar modules are the main
places to start. The datetime.datetime type, or simply datetime, is widely used:

In [1]:
from datetime import datetime

In [2]:
now = datetime.now()

In [3]:
now

datetime.datetime(2018, 3, 9, 19, 37, 5, 394123)

In [4]:
now.year, now.month, now.day

(2018, 3, 9)

datetime stores both the date and time down to the microsecond. datetime.timedelta represents the temporal difference between two datetime objects:

In [5]:
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

In [6]:
delta

datetime.timedelta(926, 56700)

In [8]:
delta.days

926

In [10]:
delta.seconds

56700
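
Note that `days` and `seconds` are separate components of the difference rather than two views of the same total. A minimal sketch, reusing the same two dates, of how the parts relate to `total_seconds()`:

```python
from datetime import datetime, timedelta

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# days and seconds are stored separately
print(delta.days)      # 926
print(delta.seconds)   # 56700, the leftover 15 hours 45 minutes

# total_seconds() folds every component into a single number
print(delta.total_seconds())  # 80063100.0

# Rebuilding the timedelta from its parts gives back the same value
rebuilt = timedelta(days=delta.days, seconds=delta.seconds)
print(rebuilt == delta)  # True
```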

You can add (or subtract) a timedelta or multiple thereof to a datetime object to yield a new shifted object:

In [11]:
from datetime import timedelta

In [12]:
start = datetime(2011, 1, 7)

In [None]:
start + timedelta(12)
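
A short sketch of shifting in both directions; the names `later` and `earlier` are only for illustration. The first positional argument of `timedelta` is a number of days.

```python
from datetime import datetime, timedelta

start = datetime(2011, 1, 7)

later = start + timedelta(12)        # 12 days after start
earlier = start - 2 * timedelta(12)  # 24 days before start

print(later)    # 2011-01-19 00:00:00
print(earlier)  # 2010-12-14 00:00:00

# Subtracting two datetimes recovers the total shift
print(later - earlier == 3 * timedelta(12))  # True
```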

The data types in the datetime module are summarized in Table 10-1. While this chapter is mainly concerned with the data types in pandas and higher level time series manipulation, you will undoubtedly encounter the datetime-based types in many other places in Python in the wild.

Table 10-1. Types in the datetime module

| Type | Description |
|------|-------------|
| date | Stores a calendar date (year, month, day) using the Gregorian calendar |
| time | Stores the time of day as hours, minutes, seconds, and microseconds |
| datetime | Stores both date and time |
| timedelta | Represents the difference between two datetime values (as days, seconds, and microseconds) |
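
To make the relationship between these four types concrete, here is a small sketch (the values are arbitrary) that builds a `date` and a `time`, combines them into a `datetime`, and takes a difference to get a `timedelta`:

```python
from datetime import date, time, datetime, timedelta

d = date(2011, 1, 3)   # calendar date only
t = time(14, 30)       # time of day only

# datetime.combine joins the two into a single datetime
combined = datetime.combine(d, t)
print(combined)  # 2011-01-03 14:30:00

# A datetime can be split back into its date and time parts
print(combined.date() == d, combined.time() == t)  # True True

# Subtracting two datetimes yields a timedelta
gap = datetime(2011, 1, 4) - combined
print(gap)  # 9:30:00
```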

# Converting between string and datetime
datetime objects and pandas Timestamp objects, which I’ll introduce later, can be formatted as strings using str or the strftime method, passing a format specification:

In [16]:
stamp = datetime(2011, 1, 3)

In [17]:
str(stamp)

'2011-01-03 00:00:00'

In [19]:
stamp.strftime('%Y-%m-%d')

'2011-01-03'

See Table 10-2 for a complete list of the format codes. These same format codes can be used to convert strings to dates using datetime.strptime:


Table 10-2. Datetime format specification (ISO C89 compatible)

| Type | Description |
|------|-------------|
| %Y | 4-digit year |
| %y | 2-digit year |
| %m | 2-digit month [01, 12] |
| %d | 2-digit day [01, 31] |
| %H | Hour (24-hour clock) [00, 23] |
| %I | Hour (12-hour clock) [01, 12] |
| %M | 2-digit minute [00, 59] |
| %S | Second [00, 61] (seconds 60, 61 account for leap seconds) |
| %w | Weekday as integer [0 (Sunday), 6] |
| %U | Week number of the year [00, 53]. Sunday is considered the first day of the week, and days before the first Sunday of the year are “week 0”. |
| %W | Week number of the year [00, 53]. Monday is considered the first day of the week, and days before the first Monday of the year are “week 0”. |
| %z | UTC time zone offset as +HHMM or -HHMM, empty if time zone naive |
| %F | Shortcut for %Y-%m-%d, for example 2012-04-18 |
| %D | Shortcut for %m/%d/%y, for example 04/18/12 |
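
A few of the codes from Table 10-2 applied to one value, as a quick sanity check. The date is arbitrary (April 18, 2012 was a Wednesday). The %F and %D shortcuts are left out here because their availability can depend on the platform's C library.

```python
from datetime import datetime

stamp = datetime(2012, 4, 18, 16, 20, 57)  # a Wednesday

print(stamp.strftime('%Y-%m-%d'))  # 2012-04-18
print(stamp.strftime('%y'))        # 12
print(stamp.strftime('%H:%M:%S'))  # 16:20:57
print(stamp.strftime('%I'))        # 04 (12-hour clock)
print(stamp.strftime('%w'))        # 3 (Sunday is 0)
print(stamp.strftime('%U %W'))     # 16 16
```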

In [22]:
value = '2011-01-03'

In [23]:
datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)

In [24]:
datestrs = ['7/6/2011', '8/6/2011']

In [25]:
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]

[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]
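
When the input does not match the format, `datetime.strptime` raises a `ValueError`. One possible way to handle that is sketched below; `parse_us_date` is a hypothetical helper written for this example, not something provided by the standard library.

```python
from datetime import datetime

def parse_us_date(text):
    """Parse month/day/year text; return None instead of raising on a mismatch."""
    try:
        return datetime.strptime(text, '%m/%d/%Y')
    except ValueError:
        return None

stamp = parse_us_date('7/6/2011')
print(stamp)                        # 2011-07-06 00:00:00

# Formatting with the same codes zero-pads the month and day
print(stamp.strftime('%m/%d/%Y'))   # 07/06/2011

# An ISO-style string does not match this format
print(parse_us_date('2011-07-06'))  # None
```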

datetime.strptime is the best way to parse a date with a known format. However, it can be a bit annoying to have to write a format spec each time, especially for common date formats. In this case, you can use the parser.parse function in the third-party dateutil package:

In [26]:
from dateutil.parser import parse

In [27]:
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

dateutil is capable of parsing almost any human-intelligible date representation:

In [28]:
parse('Jan 31, 1997 10:45 PM')

datetime.datetime(1997, 1, 31, 22, 45)

In [29]:
test = parse('Jan 31, 1997 10:45 PM')

In international locales, day appearing before month is very common, so you can pass dayfirst=True to indicate this:

In [31]:
parse('6/12/2011', dayfirst=True)

datetime.datetime(2011, 12, 6, 0, 0)
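
To see why the flag matters, the sketch below parses the same string both ways and compares the result with an explicit `strptime` format. It assumes the dateutil package is installed; the variable names are only for illustration.

```python
from datetime import datetime
from dateutil.parser import parse

text = '6/12/2011'

month_first = parse(text)                # default: month before day
day_first = parse(text, dayfirst=True)   # day before month

print(month_first)  # 2011-06-12 00:00:00
print(day_first)    # 2011-12-06 00:00:00

# The same intent spelled out with an explicit format
explicit = datetime.strptime(text, '%d/%m/%Y')
print(explicit == day_first)  # True
```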

pandas is generally oriented toward working with arrays of dates, whether used as an axis index or a column in a DataFrame. The to_datetime function parses many different kinds of date representations. Standard date formats like ISO 8601 can be parsed very quickly.

In [32]:
datestrs

['7/6/2011', '8/6/2011']

In [35]:
pd.to_datetime(datestrs)

DatetimeIndex(['2011-07-06', '2011-08-06'], dtype='datetime64[ns]', freq=None)

It also handles values that should be considered missing (None, empty string, etc.):

In [36]:
idx = pd.to_datetime(datestrs + [None])

In [37]:
idx

DatetimeIndex(['2011-07-06', '2011-08-06', 'NaT'], dtype='datetime64[ns]', freq=None)

In [38]:
idx[2]

NaT

In [40]:
pd.isnull(idx)

array([False, False,  True], dtype=bool)

NaT (Not a Time) is pandas’s NA value for timestamp data.
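
Strings that cannot be parsed can also be turned into NaT rather than raising an error. A minimal sketch, assuming the inputs are expected to follow a month/day/year format; the sample values are made up:

```python
import pandas as pd

raw = ['7/6/2011', 'not a date', None]

# errors='coerce' replaces anything that fails to parse with NaT
idx = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')
print(idx)

# pd.isna flags the missing entries; dropna removes them
missing = pd.isna(idx)
print(missing)
print(idx.dropna())
```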

dateutil.parser is a useful but imperfect tool. Notably, it will recognize some strings as dates that you might prefer it didn’t; for example, '42' will be parsed as the year 2042 with today’s calendar date.
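
If that leniency is a problem, one option is to accept only a short list of known formats. The sketch below is one way to do so with the standard library; `parse_strict` and `KNOWN_FORMATS` are hypothetical names chosen for this example, and the list of formats is an assumption about what the input looks like.

```python
from datetime import datetime

# Formats we are willing to accept (an assumption for this example)
KNOWN_FORMATS = ('%Y-%m-%d', '%m/%d/%Y', '%b %d, %Y %I:%M %p')

def parse_strict(text, formats=KNOWN_FORMATS):
    """Return the first successful strptime result, or raise ValueError."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f'no known date format matches {text!r}')

print(parse_strict('2011-01-03'))
print(parse_strict('Jan 31, 1997 10:45 PM'))

# A bare number is rejected instead of being guessed at
try:
    parse_strict('42')
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```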


datetime objects also have a number of locale-specific formatting options for systems
in other countries or languages. For example, the abbreviated month names will be
different on German or French systems compared with English systems.

Table 10-3. Locale-specific date formatting

| Type | Description |
|------|-------------|
| %a | Abbreviated weekday name |
| %A | Full weekday name |
| %b | Abbreviated month name |
| %B | Full month name |
| %c | Full date and time, for example 'Tue 01 May 2012 04:20:57 PM' |
| %p | Locale equivalent of AM or PM |
| %x | Locale-appropriate formatted date; for example, in the US, May 1, 2012 yields '05/01/2012' |
| %X | Locale-appropriate time, for example '04:24:12 PM' |
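
The output of these codes depends on the process's LC_TIME locale setting. The sketch below switches that setting temporarily with the standard `locale` module. `format_in_locale` is a hypothetical helper, and the locale name 'de_DE.UTF-8' is an assumption: it may not be installed on a given system, in which case the helper returns None.

```python
import locale
from datetime import datetime

def format_in_locale(dt, fmt, locale_name):
    """Format dt under the given LC_TIME locale, or return None if unavailable."""
    saved = locale.setlocale(locale.LC_TIME)  # query the current setting
    try:
        locale.setlocale(locale.LC_TIME, locale_name)
        return dt.strftime(fmt)
    except locale.Error:
        return None
    finally:
        # Always restore the original setting
        locale.setlocale(locale.LC_TIME, saved)

may_day = datetime(2012, 5, 1, 16, 20, 57)

english = format_in_locale(may_day, '%A %d %B %Y', 'C')
german = format_in_locale(may_day, '%A %d %B %Y', 'de_DE.UTF-8')

print(english)  # Tuesday 01 May 2012
print(german)   # depends on whether the German locale is installed
```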
