Time series data is an important form of structured data. Anything that is observed or
measured at many points in time forms a time series. How you mark
and refer to time series data depends on the application and you may have one of the
following:

• Timestamps, specific instants in time

• Fixed periods, such as the month January 2007 or the full year 2010

• Intervals of time, indicated by a start and end timestamp. Periods can be thought
of as special cases of intervals.

• Experiment or elapsed time; each timestamp is a measure of time relative to a
particular start time.
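As a rough illustration, the four kinds of time reference listed above could be represented with pandas objects as follows. Mapping them to pd.Timestamp, pd.Period, pd.Interval, and pd.Timedelta is an assumption of this sketch, not something the text prescribes, and the variable names are made up for the example.

```python
import pandas as pd

# A timestamp: one specific instant
instant = pd.Timestamp('2007-01-15 09:30')

# A fixed period: the whole month of January 2007
month = pd.Period('2007-01', freq='M')

# An interval: an arbitrary span given by a start and an end timestamp
span = pd.Interval(pd.Timestamp('2007-01-10'), pd.Timestamp('2007-01-20'))

# Elapsed time: an offset measured from a chosen start time
experiment_start = pd.Timestamp('2007-01-15 09:00')
elapsed = instant - experiment_start

print(month.start_time)   # first instant covered by the period
print(instant in span)    # whether the instant falls inside the interval
print(elapsed)            # a Timedelta of 30 minutes
```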

The Python standard library includes data types for date and time data, as well as
calendar-related functionality. The datetime, time, and calendar modules are the main
places to start. The datetime.datetime type, or simply datetime, is widely used:

In [3]:
from datetime import datetime

now = datetime.now()

In [4]:
now

datetime.datetime(2017, 7, 21, 11, 34, 47, 222650)

In [5]:
now.year, now.month, now.day

(2017, 7, 21)

datetime stores both the date and time down to the microsecond. datetime.timedelta
represents the temporal difference between two datetime objects:

In [6]:
delta = datetime(2017, 7, 21) - datetime(2014, 12, 22, 8, 15)

In [7]:
delta

datetime.timedelta(941, 56700)

In [8]:
delta.days

941

In [9]:
delta.seconds

56700
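Note that days and seconds are separate components of the same difference, so seconds alone is not the full duration. A small sketch using the same two dates shows how total_seconds() combines them:

```python
from datetime import datetime

delta = datetime(2017, 7, 21) - datetime(2014, 12, 22, 8, 15)

# .days and .seconds are stored separately; .seconds is only the remainder
print(delta.days, delta.seconds)      # 941 56700

# total_seconds() folds days, seconds, and microseconds into one number
print(delta.total_seconds())          # 81359100.0

# Split the seconds component into hours and minutes
hours, remainder = divmod(delta.seconds, 3600)
minutes = remainder // 60
print(hours, minutes)                 # 15 45
```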

You can add (or subtract) a timedelta or multiple thereof to a datetime object to yield
a new shifted object:

In [11]:
from datetime import timedelta

In [12]:
start = datetime(2015, 1, 7)

In [13]:
start + timedelta(12)

datetime.datetime(2015, 1, 19, 0, 0)

In [16]:
start - 2 * timedelta(12)

datetime.datetime(2014, 12, 14, 0, 0)
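Multiplying a timedelta by an integer is a simple way to build evenly spaced datetimes. The following is a minimal sketch; the one-week step and the variable names are chosen only for illustration.

```python
from datetime import datetime, timedelta

start = datetime(2015, 1, 7)
step = timedelta(weeks=1)

# Multiply the step by an integer to get evenly spaced datetimes
weekly = [start + i * step for i in range(4)]
for d in weekly:
    print(d.date())
# 2015-01-07, 2015-01-14, 2015-01-21, 2015-01-28 (one per line)

# Subtracting two datetimes gives back a timedelta
print(weekly[-1] - weekly[0])         # 21 days, 0:00:00
```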

Types in datetime module:

    date > Stores a calendar date (year, month, day) using the Gregorian calendar
    
    time > Stores the time of day as hours, minutes, seconds, and microseconds
    
    datetime > Stores both date and time
    
    timedelta > Represents the difference between two datetime values (as days, seconds, and microseconds)
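The four types can be seen working together in a short sketch. The specific values are arbitrary and chosen only for illustration.

```python
from datetime import date, time, datetime, timedelta

d = date(2017, 7, 21)                 # calendar date only
t = time(11, 34, 47)                  # time of day only
dt = datetime.combine(d, t)           # both together

print(dt)                             # 2017-07-21 11:34:47
print(dt.date() == d, dt.time() == t) # True True

# The difference between two dates is also a timedelta
gap = date(2017, 7, 21) - date(2017, 1, 1)
print(gap.days)                       # 201
```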

Converting between string and datetime: 

datetime objects and pandas Timestamp objects can be formatted
as strings using str or the strftime method, passing a format specification:

In [17]:
stamp = datetime(2017, 1, 3)

In [18]:
str(stamp)

'2017-01-03 00:00:00'

In [19]:
stamp.strftime('%Y-%m-%d')

'2017-01-03'

These same format codes can be
used to convert strings to dates using datetime.strptime:

In [20]:
date_input = '2017-07-21'

In [21]:
datetime.strptime(date_input, '%Y-%m-%d')

datetime.datetime(2017, 7, 21, 0, 0)

In [22]:
datestrs = ['7/6/2017', '8/6/2017']

In [23]:
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]

[datetime.datetime(2017, 7, 6, 0, 0), datetime.datetime(2017, 8, 6, 0, 0)]
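When the input may arrive in one of a few known formats, one possible approach is to try each format in turn. The helper parse_date and the tuple KNOWN_FORMATS below are hypothetical names introduced for this sketch; they are not part of the datetime module.

```python
from datetime import datetime

# Hypothetical list of the formats we expect to see
KNOWN_FORMATS = ('%Y-%m-%d', '%m/%d/%Y')

def parse_date(text, formats=KNOWN_FORMATS):
    """Return a datetime using the first format that matches text."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue   # strptime raises ValueError on a mismatch
    raise ValueError(f'no known format matches {text!r}')

print(parse_date('2017-07-21'))   # 2017-07-21 00:00:00
print(parse_date('7/6/2017'))     # 2017-07-06 00:00:00
```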

datetime.strptime is the best way to parse a date with a known format. However, it
can be a bit annoying to have to write a format spec each time, especially for common
date formats. In this case, you can use the parser.parse function in the third-party
dateutil package:

In [24]:
from dateutil.parser import parse

In [25]:
parse('2017-01-08')

datetime.datetime(2017, 1, 8, 0, 0)

dateutil is capable of parsing almost any human-intelligible date representation:

In [27]:
parse('Jul 21, 2017 12:02 PM')

datetime.datetime(2017, 7, 21, 12, 2)

In international locales, day appearing before month is very common, so you can pass
dayfirst=True to indicate this:

In [28]:
parse('21/7/2017', dayfirst=True)

datetime.datetime(2017, 7, 21, 0, 0)
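The effect of dayfirst is easiest to see on a string that is genuinely ambiguous. The sketch below assumes dateutil is installed and reuses one of the date strings from earlier:

```python
from dateutil.parser import parse

ambiguous = '7/6/2017'

# Default: month first, so this is July 6
print(parse(ambiguous))                  # 2017-07-06 00:00:00

# dayfirst=True: the same string becomes June 7
print(parse(ambiguous, dayfirst=True))   # 2017-06-07 00:00:00

# When the first number cannot be a month, dateutil treats it as the day
print(parse('21/7/2017'))                # 2017-07-21 00:00:00
```

For ambiguous input like this, passing an explicit format to datetime.strptime is the safer choice.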

pandas is generally oriented toward working with arrays of dates, whether used as an
axis index or a column in a DataFrame. The pd.to_datetime function parses many different
kinds of date representations. Standard date formats like ISO 8601 can be parsed very
quickly.

In [29]:
import pandas as pd

In [30]:
datestrs

['7/6/2017', '8/6/2017']

In [31]:
pd.to_datetime(datestrs)

DatetimeIndex(['2017-07-06', '2017-08-06'], dtype='datetime64[ns]', freq=None)

It also handles values that should be considered missing (None, empty string, etc.):

In [32]:
idx = pd.to_datetime(datestrs + [None])

In [33]:
idx

DatetimeIndex(['2017-07-06', '2017-08-06', 'NaT'], dtype='datetime64[ns]', freq=None)

In [34]:
idx[2]

NaT

NaT (Not a Time) is pandas’s NA value for timestamp data.

In [35]:
pd.isnull(idx)

array([False, False,  True], dtype=bool)
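Strings that cannot be parsed at all raise an error by default. One way to turn them into NaT instead is the errors='coerce' option of pd.to_datetime, sketched below with an explicit format; the sample list raw is made up for the example.

```python
import pandas as pd

raw = ['7/6/2017', '8/6/2017', 'not a date', None]

# An explicit format avoids guessing; errors='coerce' maps bad values to NaT
idx = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')
print(idx)

# NaT entries can then be detected and dropped like other missing values
print(pd.isnull(idx))
clean = idx.dropna()
print(clean)
```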

Datetime format specification (ISO C89 compatible):
    
    %Y > 4-digit year
    
    %y > 2-digit year
    
    %m > 2-digit month [01, 12]
    
    %d > 2-digit day [01, 31]
    
    %H > Hour (24-hour clock) [00, 23]
    
    %I > Hour (12-hour clock) [01, 12]
    
    %M > 2-digit minute [00, 59]
    
    %S > Second [00, 61] (seconds 60, 61 account for leap seconds)
    
    %w > Weekday as integer [0 (Sunday), 6]
    
    %U > Week number of the year [00, 53]. Sunday is considered the first day of the week, and days before the first
         Sunday of the year are “week 0”.
     
    %W > Week number of the year [00, 53]. Monday is considered the first day of the week, and days before the first
         Monday of the year are “week 0”.
    
    %z > UTC time zone offset as +HHMM or -HHMM, empty if time zone naive.
    
    %F > Shortcut for %Y-%m-%d, for example 2012-04-18
    
    %D > Shortcut for %m/%d/%y, for example 04/18/12
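A few of these codes in action, as a minimal sketch. The example spells out %Y-%m-%d and %m/%d/%y rather than using the %F and %D shortcuts, on the assumption that the shortcuts are not available on every platform.

```python
from datetime import datetime

stamp = datetime(2012, 4, 18, 16, 20, 57)

print(stamp.strftime('%Y-%m-%d %H:%M:%S'))   # 2012-04-18 16:20:57
print(stamp.strftime('%m/%d/%y'))            # 04/18/12
print(stamp.strftime('%I:%M'))               # 04:20
print(stamp.strftime('%w'))                  # 3 (Wednesday)

# The same codes drive parsing with strptime
parsed = datetime.strptime('18/04/2012 16:20', '%d/%m/%Y %H:%M')
print(parsed)                                # 2012-04-18 16:20:00
```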

datetime objects also have a number of locale-specific formatting options for systems
in other countries or languages. For example, the abbreviated month names will be
different on German or French systems compared with English systems.

Locale-specific date formatting: 

    %a > Abbreviated weekday name
    
    %A > Full weekday name
    
    %b > Abbreviated month name
    
    %B > Full month name
    
    %c > Full date and time, for example ‘Tue 01 May 2012 04:20:57 PM’
    
    %p > Locale equivalent of AM or PM
    
    %x > Locale-appropriate formatted date; e.g., in the US, May 1, 2012 yields ‘05/01/2012’
    
    %X > Locale-appropriate time, e.g. ‘04:24:12 PM’

Time Series basics:

The most basic kind of time series object in pandas is a Series indexed by timestamps,
which are often represented outside of pandas as Python strings or datetime objects:
    
    

In [37]:
from datetime import datetime
import pandas as pd
import numpy as np

In [36]:
dates = [datetime(2017, 7, 1), datetime(2017, 7, 4), datetime(2017, 7, 7),
        datetime(2017, 7, 10), datetime(2017, 7, 12), datetime(2017, 7, 15)]

In [39]:
ts = pd.Series(np.random.randn(6), index = dates)

In [40]:
ts

2017-07-01   -0.624124
2017-07-04   -0.506535
2017-07-07   -0.796637
2017-07-10    1.556780
2017-07-12    0.156342
2017-07-15   -0.953531
dtype: float64

In [41]:
type(ts)

pandas.core.series.Series

In [42]:
ts.index

DatetimeIndex(['2017-07-01', '2017-07-04', '2017-07-07', '2017-07-10',
               '2017-07-12', '2017-07-15'],
              dtype='datetime64[ns]', freq=None)

Like other Series, arithmetic operations between differently-indexed time series automatically
align on the dates:

In [43]:
ts + ts[::2]

2017-07-01   -1.248247
2017-07-04         NaN
2017-07-07   -1.593273
2017-07-10         NaN
2017-07-12    0.312684
2017-07-15         NaN
dtype: float64
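The alignment behaviour is easier to follow with fixed values. The sketch below uses two small made-up series, a and b, and also shows Series.add with fill_value as one way to avoid the NaN results:

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0],
              index=pd.to_datetime(['2017-07-01', '2017-07-04', '2017-07-07']))
b = pd.Series([10.0, 30.0],
              index=pd.to_datetime(['2017-07-01', '2017-07-07']))

# Dates present in only one operand give NaN
total = a + b
print(total)

# add() with fill_value treats a missing side as 0 instead
filled = a.add(b, fill_value=0)
print(filled)
```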

pandas stores timestamps using NumPy’s datetime64 data type at the nanosecond resolution:

In [50]:
str(ts.index.dtype)

'datetime64[ns]'

Scalar values from a DatetimeIndex are pandas Timestamp objects:

In [52]:
stamp = ts.index[0]

In [53]:
stamp

Timestamp('2017-07-01 00:00:00')

A Timestamp can be substituted anywhere you would use a datetime object. Additionally,
it can store frequency information (if any) and understands how to do time zone
conversions and other kinds of manipulations.
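A short sketch of that interchangeability, using only comparisons and arithmetic with standard-library objects:

```python
from datetime import datetime, timedelta
import pandas as pd

stamp = pd.Timestamp('2017-07-01')

# Timestamp is a subclass of datetime.datetime
print(isinstance(stamp, datetime))           # True

# It compares and does arithmetic with plain datetime and timedelta objects
print(stamp == datetime(2017, 7, 1))         # True
print(stamp + timedelta(days=3))             # 2017-07-04 00:00:00

# Converting back to a standard-library datetime
print(stamp.to_pydatetime())                 # 2017-07-01 00:00:00
```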

Indexing, Selection, Subsetting:

A time series is an ordinary Series with a DatetimeIndex and thus behaves in the same
way with regard to indexing and selecting data based on label:

In [56]:
ts

2017-07-01   -0.624124
2017-07-04   -0.506535
2017-07-07   -0.796637
2017-07-10    1.556780
2017-07-12    0.156342
2017-07-15   -0.953531
dtype: float64

In [54]:
stamp = ts.index[2]

In [55]:
ts[stamp]

-0.79663665572425824

As a convenience, you can also pass a string that is interpretable as a date:

In [57]:
ts['7/12/2017']

0.15634202286856422

In [58]:
ts['20170704']

-0.50653504373114522

For longer time series, a year or only a year and month can be passed to easily select
slices of data:

In [60]:
longer_ts = pd.Series(np.random.randn(1000), index = pd.date_range('1/1/2017', periods=1000))

In [61]:
longer_ts

2017-01-01   -1.020274
2017-01-02    0.525391
2017-01-03    0.542173
2017-01-04    0.223636
2017-01-05   -0.841831
2017-01-06    1.749637
2017-01-07   -0.123948
2017-01-08   -1.179558
2017-01-09    0.361225
2017-01-10   -1.773824
2017-01-11    0.291970
2017-01-12    0.194238
2017-01-13   -1.896861
2017-01-14    0.867734
2017-01-15   -0.298983
2017-01-16   -0.312290
2017-01-17   -0.641412
2017-01-18   -0.534009
2017-01-19    0.109856
2017-01-20    0.585710
2017-01-21   -1.033172
2017-01-22    0.863000
2017-01-23    0.394097
2017-01-24   -0.495147
2017-01-25    0.612105
2017-01-26   -0.288252
2017-01-27   -0.695481
2017-01-28   -1.581062
2017-01-29    0.726267
2017-01-30    0.604149
                ...   
2019-08-29   -1.134327
2019-08-30    1.537903
2019-08-31   -0.097935
2019-09-01    0.412928
2019-09-02    0.757533
2019-09-03    0.309326
2019-09-04   -0.323455
2019-09-05    0.653967
2019-09-06   -0.696215
2019-09-07   -0.308931
2019-09-08   -0.232449
2019-09-09    3.372820
2019-09-10 

In [62]:
longer_ts['2017']

2017-01-01   -1.020274
2017-01-02    0.525391
2017-01-03    0.542173
2017-01-04    0.223636
2017-01-05   -0.841831
2017-01-06    1.749637
2017-01-07   -0.123948
2017-01-08   -1.179558
2017-01-09    0.361225
2017-01-10   -1.773824
2017-01-11    0.291970
2017-01-12    0.194238
2017-01-13   -1.896861
2017-01-14    0.867734
2017-01-15   -0.298983
2017-01-16   -0.312290
2017-01-17   -0.641412
2017-01-18   -0.534009
2017-01-19    0.109856
2017-01-20    0.585710
2017-01-21   -1.033172
2017-01-22    0.863000
2017-01-23    0.394097
2017-01-24   -0.495147
2017-01-25    0.612105
2017-01-26   -0.288252
2017-01-27   -0.695481
2017-01-28   -1.581062
2017-01-29    0.726267
2017-01-30    0.604149
                ...   
2017-12-02    0.317193
2017-12-03   -0.595492
2017-12-04    1.726154
2017-12-05   -0.636209
2017-12-06    0.097057
2017-12-07    0.913001
2017-12-08   -0.421553
2017-12-09    1.526800
2017-12-10    1.623848
2017-12-11    0.345322
2017-12-12   -1.484703
2017-12-13   -0.387909
2017-12-14 

In [63]:
longer_ts['2017-07']

2017-07-01    1.803742
2017-07-02   -1.119635
2017-07-03    0.187859
2017-07-04   -1.347123
2017-07-05    0.439801
2017-07-06    0.437486
2017-07-07    0.783154
2017-07-08   -0.392527
2017-07-09   -0.477032
2017-07-10    1.856573
2017-07-11    0.660579
2017-07-12   -0.969307
2017-07-13    2.158718
2017-07-14    1.754190
2017-07-15    0.310822
2017-07-16    1.004142
2017-07-17   -0.013859
2017-07-18    2.800424
2017-07-19    0.782208
2017-07-20   -0.931560
2017-07-21   -1.262167
2017-07-22    0.702078
2017-07-23    0.192346
2017-07-24   -0.016388
2017-07-25   -1.460406
2017-07-26    0.433458
2017-07-27   -0.506552
2017-07-28   -0.051210
2017-07-29    0.196609
2017-07-30    0.262469
2017-07-31    1.336575
Freq: D, dtype: float64
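Because the values above are random, the selection is easier to check on a series with known contents. This sketch uses np.arange for the values and .loc for the partial-string lookup:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1000),
              index=pd.date_range('2017-01-01', periods=1000))

year = s.loc['2017']        # every day in 2017
month = s.loc['2017-07']    # every day in July 2017

print(len(year), len(month))                          # 365 31
print(month.index[0].date(), month.index[-1].date())  # 2017-07-01 2017-07-31
```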

In [64]:
# Slicing with dates works just like with a regular Series:

ts

2017-07-01   -0.624124
2017-07-04   -0.506535
2017-07-07   -0.796637
2017-07-10    1.556780
2017-07-12    0.156342
2017-07-15   -0.953531
dtype: float64

In [65]:
ts[datetime(2017,7,7):]

2017-07-07   -0.796637
2017-07-10    1.556780
2017-07-12    0.156342
2017-07-15   -0.953531
dtype: float64

Because most time series data is ordered chronologically, you can slice with timestamps
not contained in a time series to perform a range query:

In [66]:
ts['7/7/2017':'7/12/2017']

2017-07-07   -0.796637
2017-07-10    1.556780
2017-07-12    0.156342
dtype: float64
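The example above happens to use endpoints that exist in the index. A small sketch with endpoints that do not exist makes the range-query behaviour clearer; the series s reuses the same dates with integer values so the result is easy to check.

```python
import pandas as pd

dates = pd.to_datetime(['2017-07-01', '2017-07-04', '2017-07-07',
                        '2017-07-10', '2017-07-12', '2017-07-15'])
s = pd.Series(range(6), index=dates)

# Neither endpoint is in the index; rows falling between them are returned
between = s.loc['2017-07-05':'2017-07-11']
print(between)

# Endpoints that do match a label are included in the result
print(s.loc['2017-07-07':'2017-07-12'].tolist())   # [2, 3, 4]
```

This relies on the index being sorted chronologically.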

As before, you can pass a string date, a datetime, or a Timestamp. Remember that
slicing in this manner produces views on the source time series, just like slicing NumPy
arrays. There is an equivalent instance method, truncate, which slices a Series between
two dates:

In [72]:
ts.truncate(after='7/10/2017')

2017-07-01   -0.624124
2017-07-04   -0.506535
2017-07-07   -0.796637
2017-07-10    1.556780
dtype: float64

In [73]:
ts.truncate(before='7/10/2017')

2017-07-10    1.556780
2017-07-12    0.156342
2017-07-15   -0.953531
dtype: float64
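The before and after arguments can also be combined in a single call. A minimal sketch on a series with known values:

```python
import pandas as pd

dates = pd.to_datetime(['2017-07-01', '2017-07-04', '2017-07-07',
                        '2017-07-10', '2017-07-12', '2017-07-15'])
s = pd.Series(range(6), index=dates)

# Keep only the rows from July 4 through July 10, inclusive
middle = s.truncate(before='2017-07-04', after='2017-07-10')
print(middle)

# The same rows via label slicing
print(middle.equals(s.loc['2017-07-04':'2017-07-10']))   # True
```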

In [82]:
# All of the above holds true for DataFrame as well, indexing on its rows:

dates = pd.date_range('1/1/2017', periods=100, freq='W-WED')


In [86]:
long_df = pd.DataFrame(np.random.randn(100,4),
                    index=dates,
                   columns=['Bangalore', 'Gurgaon', 'Mumbai', 'Pune'])

In [87]:
long_df.loc['5-2017']

            Bangalore   Gurgaon    Mumbai      Pune
2017-05-03   1.035246 -0.084437  0.514409 -2.396768
2017-05-10  -1.148397 -1.156102  1.115474  0.052190
2017-05-17   1.668546 -0.123014  0.366537 -0.478370
2017-05-24  -0.300024 -1.024657  1.432893 -1.249161
2017-05-31  -1.776464  0.447122  1.414563 -1.120234


Time Series with Duplicate Indices:

In [88]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000', '1/3/2000'])

In [89]:
dup_ts = pd.Series(np.arange(5), index=dates)

In [90]:
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int32

In [91]:
# We can tell that the index is not unique by checking its is_unique property:

dup_ts.index.is_unique

False

Indexing into this time series will now either produce scalar values or slices depending
on whether a timestamp is duplicated:

In [92]:
dup_ts['1/3/2000'] # not duplicated

4

In [93]:
dup_ts['1/2/2000'] # duplicated

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int32

Suppose you wanted to aggregate the data having non-unique timestamps. One way
to do this is to use groupby and pass level=0 (the only level of indexing!):

In [94]:
grouped = dup_ts.groupby(level=0)

In [96]:
grouped.mean()

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int32

In [97]:
grouped.count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
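Aggregating is not the only option. If keeping a single observation per timestamp is acceptable, one alternative is to drop the repeats with Index.duplicated. The sketch below rebuilds the same dup_ts and is an illustration, not a recommendation for any particular dataset.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

# duplicated() flags repeats; keep='first' leaves the first of each group
first_only = dup_ts[~dup_ts.index.duplicated(keep='first')]
print(first_only.tolist())                     # [0, 1, 4]

# Other aggregations work through the same groupby
largest = dup_ts.groupby(level=0).max()
print(largest.tolist())                        # [0, 3, 4]
```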