## Chapter 11. Time Series
<a id='index'></a>

Time series data can be marked and referred to in several ways, depending on the application:

* Timestamps, specific instants in time
* Fixed periods, such as the month January 2007 or the full year 2010
* Intervals of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals
* Experiment or elapsed time; each timestamp is a measure of time relative to a particular start time (e.g., the diameter of a cookie baking each second since being placed in the oven)

***pandas provides many built-in time series tools and data algorithms.***

## Table of Contents
- [11.1 Date and Time Data Types and Tools](#111)
    - [11.1.1 Converting Between String and Datetime](#1111)
- [11.2 Time Series Basics](#112)
    - [11.2.1 Indexing, Selection, Subsetting](#1121)
    - [11.2.2 Time Series with Duplicate Indices](#1122)
- [11.3 Date Ranges, Frequencies, and Shifting](#113)
    - [11.3.1 Generating Date Ranges](#1131)
    - [11.3.2 Frequencies and Date Offsets](#1132)
        - [11.3.2.1 Week of month dates](#11321)
    - [11.3.3 Shifting (Leading and Lagging) Data](#1133)
        - [11.3.3.1 Shifting dates with offsets](#11331)
- [11.4 Time Zone Handling](#114)
    - [11.4.1 Time Zone Localization and Conversion](#1141)
    - [11.4.2 Operations with Time Zone-Aware Timestamp Objects](#1142)
    - [11.4.3 Operations Between Different Time Zones](#1143)
- [11.5 Periods and Period Arithmetic](#115)
    - [11.5.1 Period Frequency Conversion](#1151)

<hr>

In [3]:
import pandas as pd
import numpy as np
from datetime import datetime

## 11.1 Date and Time Data Types and Tools
<a id='111'></a>

In [4]:
now = datetime.now()
now

datetime.datetime(2017, 12, 27, 11, 22, 8, 238883)

In [5]:
now.year, now.month, now.day

(2017, 12, 27)

In [6]:
# timedelta represents the temporal difference between two datetime objects:
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
delta

datetime.timedelta(926, 56700)

In [7]:
delta.days, delta.seconds

(926, 56700)

In [8]:
# You can add (or subtract) a timedelta or multiple thereof to a datetime object to yield a new shifted object:
from datetime import timedelta
start = datetime(2011, 1, 7)
start + timedelta(12)

datetime.datetime(2011, 1, 19, 0, 0)

In [9]:
start - 2 * timedelta(12)

datetime.datetime(2010, 12, 14, 0, 0)
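
One detail of `timedelta` that is easy to misread: `.seconds` is only the remainder within the last day, not the whole span. The sketch below reuses the two dates from the cells above; the variable names `leftover` and `total` are illustrative.

```python
from datetime import datetime

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# .seconds holds only the leftover seconds within the final day (0..86399)
leftover = delta.seconds

# total_seconds() covers the whole span, days included
total = delta.total_seconds()

print(leftover, total)
```

When you need the full elapsed time, `total_seconds()` is usually the safer choice.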

### 11.1.1 Converting Between String and Datetime
<a id='1111'></a>

In [10]:
stamp = datetime(2011, 1, 3)
str(stamp)

'2011-01-03 00:00:00'

In [11]:
stamp.strftime("%Y-%m-%d")

'2011-01-03'

In [12]:
value = '2011-01-03'

datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)

In [13]:
datestrs=['7/6/2011', '8/6/2011']
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]

[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]

In [15]:
# use the parser.parse method in the third-party dateutil package 
# (this is installed automatically when you install pandas):
from dateutil.parser import parse
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

In [16]:
parse('Jan 31, 1997 10:45 PM')

datetime.datetime(1997, 1, 31, 22, 45)

In [19]:
# In international locales, day appearing before month is very common, so you can pass dayfirst=True to indicate this:
parse('6/12/2011', dayfirst=True)

datetime.datetime(2011, 12, 6, 0, 0)
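
The same day-first ambiguity can be resolved with explicit format strings in the standard library. This is a minimal sketch using only `datetime.strptime`; the variable names are illustrative.

```python
from datetime import datetime

raw = '6/12/2011'

# Month-first reading: June 12, 2011
month_first = datetime.strptime(raw, '%m/%d/%Y')

# Day-first reading: December 6, 2011
day_first = datetime.strptime(raw, '%d/%m/%Y')

print(month_first.date(), day_first.date())
```

An explicit format is stricter than `parse`, which may be preferable when the input convention is known in advance.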

pandas is generally oriented toward working with arrays of dates, whether used as an axis index or a column in a DataFrame. The ***to_datetime*** function parses many different kinds of date representations.

In [20]:
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']
pd.to_datetime(datestrs)

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)

In [22]:
# It also handles values that should be considered missing (None, empty string, etc.):
idx = pd.to_datetime(datestrs + [None])
idx

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)

In [25]:
# NaT (Not a Time) is pandas’s null value for timestamp data.
idx[2]

NaT

In [24]:
pd.isnull(idx)

array([False, False,  True], dtype=bool)
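
Real data often contains strings that are not dates at all. A possible way to handle them, sketched here under the assumption that silently dropping bad entries is acceptable, is the `errors='coerce'` option of `pd.to_datetime`; the sample values are made up.

```python
import pandas as pd

raw = ['2011-07-06 12:00:00', 'not a date', None]

# errors='coerce' turns unparseable entries into NaT instead of raising
idx = pd.to_datetime(raw, errors='coerce')

# Keep only the entries that parsed successfully
valid = idx[idx.notna()]
print(idx.isna(), len(valid))
```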

<hr>

## 11.2 Time Series Basics
<a id='112'></a>

In [29]:
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]

ts = pd.Series(np.random.randn(6), index=dates)
ts

2011-01-02   -1.125144
2011-01-05    1.285811
2011-01-07   -0.333656
2011-01-08    0.767509
2011-01-10    0.361046
2011-01-12    0.715506
dtype: float64

In [30]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [31]:
ts + ts[::2]

2011-01-02   -2.250289
2011-01-05         NaN
2011-01-07   -0.667311
2011-01-08         NaN
2011-01-10    0.722092
2011-01-12         NaN
dtype: float64

In [33]:
# pandas stores timestamps using NumPy’s datetime64 data type at the nanosecond resolution:
ts.index.dtype

dtype('<M8[ns]')

In [35]:
# A Timestamp can be substituted anywhere you would use a datetime object. 
# Additionally, it can store frequency information (if any) and understands 
# how to do time zone conversions and other kinds of manipulations. More on both of these things later.

stamp = ts.index[0]
stamp

Timestamp('2011-01-02 00:00:00')

### 11.2.1 Indexing, Selection, Subsetting
<a id='1121'></a>

In [36]:
stamp = ts.index[2]
ts[stamp]

-0.33365551907458257

In [38]:
# you can also pass a string that is interpretable as a date:
ts['1/07/2011']

-0.33365551907458257

In [40]:
ts['20110107']

-0.33365551907458257

In [43]:
longer_ts = pd.Series(np.random.rand(1000), index=pd.date_range('1/1/2000', periods=1000))
longer_ts.head()

2000-01-01    0.360510
2000-01-02    0.457516
2000-01-03    0.729575
2000-01-04    0.274396
2000-01-05    0.616497
Freq: D, dtype: float64

In [45]:
longer_ts['2001']

2001-01-01    0.596949
2001-01-02    0.552498
2001-01-03    0.617247
2001-01-04    0.529737
2001-01-05    0.003253
2001-01-06    0.638703
2001-01-07    0.885529
2001-01-08    0.005200
2001-01-09    0.533438
2001-01-10    0.450961
2001-01-11    0.671177
2001-01-12    0.963391
2001-01-13    0.157949
2001-01-14    0.461103
2001-01-15    0.643402
2001-01-16    0.705398
2001-01-17    0.382288
2001-01-18    0.254731
2001-01-19    0.116858
2001-01-20    0.660160
2001-01-21    0.735336
2001-01-22    0.762415
2001-01-23    0.067742
2001-01-24    0.947596
2001-01-25    0.607270
2001-01-26    0.045495
2001-01-27    0.110533
2001-01-28    0.486155
2001-01-29    0.233867
2001-01-30    0.342687
                ...   
2001-12-02    0.547769
2001-12-03    0.137738
2001-12-04    0.200636
2001-12-05    0.065570
2001-12-06    0.425774
2001-12-07    0.896202
2001-12-08    0.688910
2001-12-09    0.660590
2001-12-10    0.425618
2001-12-11    0.304047
2001-12-12    0.370372
2001-12-13    0.405013
2001-12-14 

In [47]:
longer_ts['2001-05']

2001-05-01    0.335397
2001-05-02    0.041896
2001-05-03    0.778588
2001-05-04    0.581564
2001-05-05    0.956404
2001-05-06    0.363826
2001-05-07    0.452233
2001-05-08    0.085144
2001-05-09    0.974512
2001-05-10    0.260081
2001-05-11    0.475670
2001-05-12    0.160935
2001-05-13    0.978727
2001-05-14    0.806105
2001-05-15    0.113885
2001-05-16    0.338481
2001-05-17    0.222093
2001-05-18    0.650848
2001-05-19    0.374187
2001-05-20    0.340782
2001-05-21    0.781718
2001-05-22    0.030111
2001-05-23    0.070029
2001-05-24    0.427490
2001-05-25    0.780814
2001-05-26    0.724740
2001-05-27    0.769652
2001-05-28    0.047124
2001-05-29    0.299669
2001-05-30    0.429473
2001-05-31    0.190014
Freq: D, dtype: float64

In [52]:
# Datetime slicing
ts[datetime(2011, 1, 7):]

2011-01-07   -0.333656
2011-01-08    0.767509
2011-01-10    0.361046
2011-01-12    0.715506
dtype: float64

In [53]:
ts

2011-01-02   -1.125144
2011-01-05    1.285811
2011-01-07   -0.333656
2011-01-08    0.767509
2011-01-10    0.361046
2011-01-12    0.715506
dtype: float64

In [54]:
# Slicing with date strings; the endpoints need not appear in the index
ts['1/6/2011':'1/11/2011']

2011-01-07   -0.333656
2011-01-08    0.767509
2011-01-10    0.361046
dtype: float64

In [59]:
ts.truncate(after='1/7/2011')

2011-01-02   -1.125144
2011-01-05    1.285811
2011-01-07   -0.333656
dtype: float64

In [68]:
# All of this holds true for DataFrame as well, indexing on its rows:

dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')
long_df = pd.DataFrame(np.random.randn(100, 4), 
                       index=dates, 
                       columns=['Colorado', 'Texas', 'New York', 'Ohio'])

long_df.loc['5-2001']

| |Colorado|Texas|New York|Ohio|
|:---|:---|:---|:---|:---|
|2001-05-02|-0.820784|-0.114605|-0.532565|0.227199|
|2001-05-09|-0.271915|1.440539|0.488478|0.128919|
|2001-05-16|-0.445741|0.384456|-1.499272|-0.043417|
|2001-05-23|0.282985|0.430544|-0.095648|0.39702|
|2001-05-30|0.817583|0.19013|0.466997|1.574958|


### 11.2.2 Time Series with Duplicate Indices
<a id='1122'></a>

In [70]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int64

In [71]:
dup_ts.index.is_unique

False

In [74]:
# Timestamp that is not duplicated: indexing returns a scalar
dup_ts['1/3/2000']

4

In [75]:
# Duplicated timestamp: indexing returns a slice of the Series
dup_ts['1/2/2000']

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int64

In [76]:
grouped = dup_ts.groupby(level=0)
grouped.mean()

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int64

In [77]:
grouped.count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64

In [78]:
grouped.sum()

2000-01-01    0
2000-01-02    6
2000-01-03    4
dtype: int64
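
Aggregating with `groupby(level=0)` is one way to deal with duplicate timestamps. Another option, shown here as a sketch, is to keep only the first observation per timestamp with `Index.duplicated`. The data mirrors `dup_ts` above; the names `is_repeat` and `first_only` are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-02',
                          '2000-01-02', '2000-01-03'])
dup_ts = pd.Series(np.arange(5), index=dates)

# duplicated() flags every repeat after the first occurrence
is_repeat = dup_ts.index.duplicated(keep='first')

# Keep the first observation for each timestamp
first_only = dup_ts[~is_repeat]
print(first_only)
```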

<hr>

## 11.3 Date Ranges, Frequencies, and Shifting
<a id='113'></a>
Generic time series in pandas are assumed to be irregular; that is, they have no fixed frequency. For many applications this is sufficient. However, it’s often desirable to work relative to a fixed frequency, such as daily, monthly, or every 15 minutes, even if that means introducing missing values into a time series. Fortunately pandas has a full suite of standard time series frequencies and tools for resampling, inferring frequencies, and generating fixed-frequency date ranges. For example, you can convert the sample time series to be fixed daily frequency by calling ***resample***:

In [79]:
ts

2011-01-02   -1.125144
2011-01-05    1.285811
2011-01-07   -0.333656
2011-01-08    0.767509
2011-01-10    0.361046
2011-01-12    0.715506
dtype: float64

In [82]:
# The string 'D' is interpreted as daily frequency.
resampler = ts.resample('D')
resampler

DatetimeIndexResampler [freq=<Day>, axis=0, closed=left, label=left, convention=start, base=0]
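
The resampler object is lazy; nothing is computed until a method is called on it. The sketch below uses a small made-up series (not the random `ts` above) to show how a fixed daily frequency introduces missing values, and one way to fill them.

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 3.0],
               index=pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07']))

# asfreq() materializes the daily grid; days with no observation become NaN
daily = ts.resample('D').asfreq()

# ffill() carries the last observation forward instead
daily_filled = ts.resample('D').ffill()
print(daily)
```

Whether to leave the gaps as `NaN` or fill them depends on what the data represents.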

### 11.3.1 Generating Date Ranges
<a id='1131'></a>

In [83]:
index = pd.date_range('2012-04-01', '2012-06-01')
index

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20',
               '2012-04-21', '2012-04-22', '2012-04-23', '2012-04-24',
               '2012-04-25', '2012-04-26', '2012-04-27', '2012-04-28',
               '2012-04-29', '2012-04-30', '2012-05-01', '2012-05-02',
               '2012-05-03', '2012-05-04', '2012-05-05', '2012-05-06',
               '2012-05-07', '2012-05-08', '2012-05-09', '2012-05-10',
               '2012-05-11', '2012-05-12', '2012-05-13', '2012-05-14',
               '2012-05-15', '2012-05-16', '2012-05-17', '2012-05-18',
               '2012-05-19', '2012-05-20', '2012-05-21', '2012-05-22',
               '2012-05-23', '2012-05-24', '2012-05-25', '2012-05-26',
      

In [89]:
# If you pass only a start or end date, you must also pass a number of periods to generate:
pd.date_range(start='2012-04-01', periods=20)

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],
              dtype='datetime64[ns]', freq='D')

In [90]:
# With only an end date, the periods are counted backward from it:
pd.date_range(end='2012-06-01', periods=20)

DatetimeIndex(['2012-05-13', '2012-05-14', '2012-05-15', '2012-05-16',
               '2012-05-17', '2012-05-18', '2012-05-19', '2012-05-20',
               '2012-05-21', '2012-05-22', '2012-05-23', '2012-05-24',
               '2012-05-25', '2012-05-26', '2012-05-27', '2012-05-28',
               '2012-05-29', '2012-05-30', '2012-05-31', '2012-06-01'],
              dtype='datetime64[ns]', freq='D')

In [92]:
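# The start and end dates define strict boundaries for the generated date index.
# For example, if you wanted a date index containing the last business day of each month, 
# you would pass the 'BM' frequency (business end of month; 
# see Table 11-4 for a more complete listing of frequencies):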
pd.date_range('2000-01-01', '2000-12-01', freq='BM')

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-29', '2000-10-31', '2000-11-30'],
              dtype='datetime64[ns]', freq='BM')

###### Table 11-4. Base time series frequencies (not comprehensive)
|Alias|Offset type|Description|
|:---|:---|:---|
|D|Day|Calendar daily|
|B|BusinessDay|Business daily|
|H|Hour|Hourly|
|T or min|Minute|Minutely|
|S|Second|Secondly|
|L or ms|Milli|Millisecond (1/1,000 of 1 second)|
|U|Micro|Microsecond (1/1,000,000 of 1 second)|
|M|MonthEnd|Last calendar day of month|
|BM|BusinessMonthEnd|Last business day (weekday) of month|
|MS|MonthBegin|First calendar day of month|
|BMS|BusinessMonthBegin|First weekday of month|
|W-MON, W-TUE, ...|Week|Weekly on given day of week (MON, TUE, WED, THU, FRI, SAT, or SUN)|
|WOM-1MON, WOM-2MON, ...|WeekOfMonth|Generate weekly dates in the first, second, third, or fourth week of the month (e.g., WOM-3FRI for the third Friday of each month)|
|Q-JAN, Q-FEB, ...|QuarterEnd|Quarterly dates anchored on last calendar day of each month, for year ending in indicated month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, or DEC)|
|BQ-JAN, BQ-FEB, ...|BusinessQuarterEnd|Quarterly dates anchored on last weekday of each month, for year ending in indicated month|
|QS-JAN, QS-FEB, ...|QuarterBegin|Quarterly dates anchored on first calendar day of each month, for year ending in indicated month|
|BQS-JAN, BQS-FEB, ...|BusinessQuarterBegin|Quarterly dates anchored on first weekday of each month, for year ending in indicated month|
|A-JAN, A-FEB, ...|YearEnd|Annual dates anchored on last calendar day of given month (JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, or DEC)|
|BA-JAN, BA-FEB, ...|BusinessYearEnd|Annual dates anchored on last weekday of given month|
|AS-JAN, AS-FEB, ...|YearBegin|Annual dates anchored on first day of given month|
|BAS-JAN, BAS-FEB, ...|BusinessYearBegin|Annual dates anchored on first weekday of given month|


In [94]:
# date_range by default preserves the time (if any) of the start or end timestamp:
pd.date_range('2012-05-02 12:56:31', periods=5)

DatetimeIndex(['2012-05-02 12:56:31', '2012-05-03 12:56:31',
               '2012-05-04 12:56:31', '2012-05-05 12:56:31',
               '2012-05-06 12:56:31'],
              dtype='datetime64[ns]', freq='D')

In [96]:
# Sometimes you will have start or end dates with time information but want to 
# generate a set of timestamps normalized to midnight as a convention. To do this, there is a normalize option:
pd.date_range('2012-05-02 12:56:31', periods=5, normalize=True)

DatetimeIndex(['2012-05-02', '2012-05-03', '2012-05-04', '2012-05-05',
               '2012-05-06'],
              dtype='datetime64[ns]', freq='D')

### 11.3.2 Frequencies and Date Offsets
<a id='1132'></a>
Frequencies in pandas are composed of a base frequency and a multiplier. 

Base frequencies are typically referred to by a string alias, like 'M' for monthly or 'H' for hourly. For each base frequency, there is an object defined generally referred to as a date offset.

In [97]:
from pandas.tseries.offsets import Hour, Minute
hour = Hour()
hour

<Hour>

In [98]:
four_hours = Hour(4)
four_hours

<4 * Hours>

In [99]:
# In most applications, you would never need to explicitly create one 
# of these objects, instead using a string alias like 'H' or '4H'.
pd.date_range('2000-01-01', '2000-01-03 23:59', freq="4H")

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
               '2000-01-01 08:00:00', '2000-01-01 12:00:00',
               '2000-01-01 16:00:00', '2000-01-01 20:00:00',
               '2000-01-02 00:00:00', '2000-01-02 04:00:00',
               '2000-01-02 08:00:00', '2000-01-02 12:00:00',
               '2000-01-02 16:00:00', '2000-01-02 20:00:00',
               '2000-01-03 00:00:00', '2000-01-03 04:00:00',
               '2000-01-03 08:00:00', '2000-01-03 12:00:00',
               '2000-01-03 16:00:00', '2000-01-03 20:00:00'],
              dtype='datetime64[ns]', freq='4H')

In [100]:
Hour(2) + Minute(3)

<123 * Minutes>

In [101]:
pd.date_range('2000-01-01', periods=10, freq='1h30min')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:30:00',
               '2000-01-01 03:00:00', '2000-01-01 04:30:00',
               '2000-01-01 06:00:00', '2000-01-01 07:30:00',
               '2000-01-01 09:00:00', '2000-01-01 10:30:00',
               '2000-01-01 12:00:00', '2000-01-01 13:30:00'],
              dtype='datetime64[ns]', freq='90T')
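
To make the link between string aliases and offset objects concrete, the sketch below builds the same index both ways. It assumes the lowercase `'4h'` spelling, which the `'1h30min'` example above suggests is accepted; the variable names are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

# An offset object and the equivalent string alias give the same index
by_object = pd.date_range('2000-01-01', periods=4, freq=Hour(4))
by_alias = pd.date_range('2000-01-01', periods=4, freq='4h')

# Offsets of different units can be added into a single offset
ninety = Hour(1) + Minute(30)
by_sum = pd.date_range('2000-01-01', periods=3, freq=ninety)
print(by_sum)
```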

Some frequencies describe points in time that are not evenly spaced. For example, '**M**' (calendar month end) and '**BM**' (last business/weekday of month) depend on the number of days in a month and, in the latter case, whether the month ends on a weekend or not. We refer to these as ***anchored offsets***.

#### 11.3.2.1 Week of month dates
<a id='11321'></a>
One useful frequency class is “**week of month**,” starting with **WOM**. This enables you to get dates like the third Friday of each month:

In [104]:
rng = pd.date_range('2012-01-01', '2012-09-01', freq="WOM-3FRI")
list(rng)

[Timestamp('2012-01-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-02-17 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-03-16 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-04-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-05-18 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-06-15 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-07-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-08-17 00:00:00', freq='WOM-3FRI')]
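
As a quick sanity check on what `WOM-3FRI` produces, the following sketch verifies that every generated date is a Friday falling in the third week of its month. The names `all_fridays` and `in_third_week` are illustrative.

```python
import pandas as pd

rng = pd.date_range('2012-01-01', '2012-09-01', freq='WOM-3FRI')

# Monday is 0 in pandas, so Friday is 4
all_fridays = all(d.dayofweek == 4 for d in rng)

# The third occurrence of a weekday always falls on day 15 through 21
in_third_week = all(15 <= d.day <= 21 for d in rng)
print(all_fridays, in_third_week)
```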

### 11.3.3 Shifting (Leading and Lagging) Data
<a id='1133'></a>
“Shifting” refers to moving data backward and forward through time. Both Series and DataFrame have a shift method for doing naive shifts forward or backward, leaving the index unmodified:

In [106]:
ts = pd.Series(np.random.randn(4), 
               index=pd.date_range('1/1/2000', periods=4, freq='M'))
ts

2000-01-31    1.275195
2000-02-29    0.217797
2000-03-31   -0.325946
2000-04-30    1.037655
Freq: M, dtype: float64

In [107]:
ts.shift(2)

2000-01-31         NaN
2000-02-29         NaN
2000-03-31    1.275195
2000-04-30    0.217797
Freq: M, dtype: float64

In [108]:
ts.shift(-2)

2000-01-31   -0.325946
2000-02-29    1.037655
2000-03-31         NaN
2000-04-30         NaN
Freq: M, dtype: float64

<div style='color:red;'>
    <h3>Very important!!!</h3>
    <p>To calculate percent changes:</p>
    <p style='padding-left:20px;'><span style='border:3px solid red; padding:3px 3px 3px 3px'>ts / ts.shift(1) - 1</span></p>
</div>

In [110]:
# A common use of shift is computing percent changes in a time series or
# multiple time series as DataFrame columns. This is expressed as:
ts / ts.shift(1) - 1

2000-01-31         NaN
2000-02-29   -0.829205
2000-03-31   -2.496563
2000-04-30   -4.183516
Freq: M, dtype: float64
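
The percent-change expression is easier to verify on round numbers. The sketch below uses a made-up `prices` series and compares the manual formula with the built-in `pct_change` method, which computes the same quantity.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0],
                   index=pd.date_range('2000-01-01', periods=3, freq='D'))

# Manual percent change: each value relative to the previous one
manual = prices / prices.shift(1) - 1

# pct_change() is the built-in shortcut for the same calculation
builtin = prices.pct_change()
print(manual)
```

The first entry is `NaN` in both cases because there is no earlier value to compare against.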

In [111]:
ts.shift(2, freq='M')

2000-03-31    1.275195
2000-04-30    0.217797
2000-05-31   -0.325946
2000-06-30    1.037655
Freq: M, dtype: float64

In [112]:
ts.shift(3, freq='D')

2000-02-03    1.275195
2000-03-03    0.217797
2000-04-03   -0.325946
2000-05-03    1.037655
dtype: float64

In [113]:
ts.shift(1, freq='90T')

2000-01-31 01:30:00    1.275195
2000-02-29 01:30:00    0.217797
2000-03-31 01:30:00   -0.325946
2000-04-30 01:30:00    1.037655
Freq: M, dtype: float64
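
The difference between a naive shift and a shift with a frequency can be seen side by side. This sketch uses a small made-up series and passes `MonthEnd()` offset objects rather than string aliases; the names `naive` and `moved` are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

idx = pd.date_range('2000-01-31', periods=3, freq=MonthEnd())
ts = pd.Series([1.0, 2.0, 3.0], index=idx)

# Naive shift: values slide down, index unchanged, a NaN appears at the start
naive = ts.shift(1)

# Shift with a frequency: timestamps advance, values untouched, no NaN
moved = ts.shift(1, freq=MonthEnd())
print(naive, moved, sep='\n')
```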

#### 11.3.3.1 Shifting dates with offsets
<a id='11331'></a>

In [118]:
from pandas.tseries.offsets import Day, MonthEnd
now = datetime(2011, 11, 17)
now + 3 * Day()

Timestamp('2011-11-20 00:00:00')

In [119]:
# If you add an anchored offset like MonthEnd, the first increment will 
# “roll forward” a date to the next date according to the frequency rule:
now + MonthEnd()

Timestamp('2011-11-30 00:00:00')

In [122]:
now + MonthEnd(2)

Timestamp('2011-12-31 00:00:00')

In [124]:
# Anchored offsets can explicitly “roll” dates forward or backward 
# by simply using their rollforward and rollback methods, respectively:
offset = MonthEnd()
offset.rollforward(now)

Timestamp('2011-11-30 00:00:00')

In [125]:
offset.rollback(now)

Timestamp('2011-10-31 00:00:00')
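
Adding an anchored offset and calling `rollforward` differ when the date already sits on the anchor. A minimal sketch with `MonthEnd`; the variable names are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

offset = MonthEnd()
on_anchor = pd.Timestamp('2011-11-30')   # already a month end
mid_month = pd.Timestamp('2011-11-17')

# Adding the offset always moves forward, even from an anchor date
added = on_anchor + offset

# rollforward leaves a date alone if it already sits on the anchor
rolled = offset.rollforward(on_anchor)
rolled_mid = offset.rollforward(mid_month)
print(added, rolled, rolled_mid)
```

This is why `rollforward` works as a grouping key: dates that are already month ends stay in their own month.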

In [127]:
# Example:
ts = pd.Series(np.random.randn(20), 
               index=pd.date_range('1/15/2000', periods=20, freq='4d'))

ts

2000-01-15   -1.147290
2000-01-19    0.885840
2000-01-23   -0.589457
2000-01-27   -1.014008
2000-01-31   -2.212851
2000-02-04    1.483324
2000-02-08   -0.145954
2000-02-12   -0.293128
2000-02-16    1.012476
2000-02-20   -0.398977
2000-02-24   -0.411772
2000-02-28   -1.174715
2000-03-03    0.134170
2000-03-07    0.904943
2000-03-11   -0.357072
2000-03-15   -0.527782
2000-03-19   -1.146566
2000-03-23   -0.398036
2000-03-27    0.265019
2000-03-31   -0.290907
Freq: 4D, dtype: float64

In [128]:
ts.groupby(offset.rollforward).mean()

2000-01-31   -0.815553
2000-02-29    0.010179
2000-03-31   -0.177029
dtype: float64

In [129]:
# Of course, an easier and faster way to do this is using resample
ts.resample('M').mean()

2000-01-31   -0.815553
2000-02-29    0.010179
2000-03-31   -0.177029
Freq: M, dtype: float64

<hr>

## 11.4 Time Zone Handling
<a id='114'></a>
In Python, time zone information comes from the third-party pytz library (installable with pip or conda), which exposes the Olson database, a compilation of world time zone information.

In [131]:
import pytz

In [133]:
pytz.common_timezones[-5:]

['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']

In [135]:
tz = pytz.timezone('America/New_York')
tz

<DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD>

### 11.4.1 Time Zone Localization and Conversion
<a id='1141'></a>

In [138]:
rng = pd.date_range('3/9/2012 9:30', periods=6, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts

2012-03-09 09:30:00    0.302952
2012-03-10 09:30:00   -0.166772
2012-03-11 09:30:00   -0.872938
2012-03-12 09:30:00   -0.219430
2012-03-13 09:30:00   -0.359165
2012-03-14 09:30:00    1.293987
Freq: D, dtype: float64

In [141]:
print(ts.index.tz)

None


In [139]:
pd.date_range('3/9/2012 9:30', periods=10, freq='D', tz='UTC')

DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
               '2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
               '2012-03-15 09:30:00+00:00', '2012-03-16 09:30:00+00:00',
               '2012-03-17 09:30:00+00:00', '2012-03-18 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')

In [143]:
# Conversion from naive to localized is handled by the tz_localize method.
# Once localized, a time series can be converted to another time zone with tz_convert:
ts_utc = ts.tz_localize('UTC')
ts_utc

2012-03-09 09:30:00+00:00    0.302952
2012-03-10 09:30:00+00:00   -0.166772
2012-03-11 09:30:00+00:00   -0.872938
2012-03-12 09:30:00+00:00   -0.219430
2012-03-13 09:30:00+00:00   -0.359165
2012-03-14 09:30:00+00:00    1.293987
Freq: D, dtype: float64

In [144]:
ts_utc.index

DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
               '2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')

In [145]:
ts_utc.tz_convert('America/New_York')

2012-03-09 04:30:00-05:00    0.302952
2012-03-10 04:30:00-05:00   -0.166772
2012-03-11 05:30:00-04:00   -0.872938
2012-03-12 05:30:00-04:00   -0.219430
2012-03-13 05:30:00-04:00   -0.359165
2012-03-14 05:30:00-04:00    1.293987
Freq: D, dtype: float64

In [146]:
# In the case of the preceding time series, which straddles a DST 
# transition in the America/New_York time zone, we could localize 
# to US Eastern time and convert to, say, UTC or Berlin time:
ts_eastern = ts.tz_localize('America/New_York')
ts_eastern.tz_convert('UTC')

2012-03-09 14:30:00+00:00    0.302952
2012-03-10 14:30:00+00:00   -0.166772
2012-03-11 13:30:00+00:00   -0.872938
2012-03-12 13:30:00+00:00   -0.219430
2012-03-13 13:30:00+00:00   -0.359165
2012-03-14 13:30:00+00:00    1.293987
Freq: D, dtype: float64

In [147]:
ts_eastern.tz_convert('Europe/Berlin')

2012-03-09 15:30:00+01:00    0.302952
2012-03-10 15:30:00+01:00   -0.166772
2012-03-11 14:30:00+01:00   -0.872938
2012-03-12 14:30:00+01:00   -0.219430
2012-03-13 14:30:00+01:00   -0.359165
2012-03-14 14:30:00+01:00    1.293987
Freq: D, dtype: float64

In [148]:
# tz_localize and tz_convert are also instance methods on DatetimeIndex:
ts.index.tz_localize('Asia/Shanghai')

DatetimeIndex(['2012-03-09 09:30:00+08:00', '2012-03-10 09:30:00+08:00',
               '2012-03-11 09:30:00+08:00', '2012-03-12 09:30:00+08:00',
               '2012-03-13 09:30:00+08:00', '2012-03-14 09:30:00+08:00'],
              dtype='datetime64[ns, Asia/Shanghai]', freq='D')
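
The distinction between the two methods is that `tz_localize` assigns a zone to a naive timestamp, while `tz_convert` re-expresses an already localized instant in another zone. A small sketch on a single `Timestamp`; the variable names are illustrative.

```python
from datetime import timedelta
import pandas as pd

naive = pd.Timestamp('2012-03-09 09:30')

# tz_localize attaches a zone to a naive timestamp; the wall-clock time stays
utc = naive.tz_localize('UTC')

# tz_convert re-expresses the same instant in another zone
new_york = utc.tz_convert('America/New_York')

same_instant = (new_york == utc)
print(new_york, same_instant)
```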

### 11.4.2 Operations with Time Zone-Aware Timestamp Objects
<a id='1142'></a>

In [149]:
stamp = pd.Timestamp('2011-03-12 04:00')
stamp_utc = stamp.tz_localize('UTC')
stamp_utc.tz_convert('America/New_York')

Timestamp('2011-03-11 23:00:00-0500', tz='America/New_York')

In [151]:
stamp_moscow = pd.Timestamp('2011-03-12 04:00', tz='Europe/Moscow')
stamp_moscow

Timestamp('2011-03-12 04:00:00+0300', tz='Europe/Moscow')

In [152]:
# When performing time arithmetic using pandas’s DateOffset objects, 
# pandas respects daylight saving time transitions where possible. 
# Here we construct timestamps that occur right before DST transitions 
# (forward and backward). First, 30 minutes before transitioning to DST
# (in 2012, US DST began on March 11 at 2:00 a.m.):
from pandas.tseries.offsets import Hour
stamp = pd.Timestamp('2012-03-11 01:30', tz='US/Eastern')
stamp

Timestamp('2012-03-11 01:30:00-0500', tz='US/Eastern')

In [153]:
stamp + Hour()

Timestamp('2012-03-11 03:30:00-0400', tz='US/Eastern')

In [155]:
# Then, 90 minutes before transitioning out of DST
stamp = pd.Timestamp('2012-11-04 00:30', tz='US/Eastern')
stamp

Timestamp('2012-11-04 00:30:00-0400', tz='US/Eastern')

In [156]:
stamp + 2 * Hour()

Timestamp('2012-11-04 01:30:00-0500', tz='US/Eastern')

### 11.4.3 Operations Between Different Time Zones
<a id='1143'></a>

In [157]:
rng = pd.date_range('3/7/2012 9:30', periods=10, freq='B')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts

2012-03-07 09:30:00   -0.311088
2012-03-08 09:30:00   -0.713923
2012-03-09 09:30:00    0.303602
2012-03-12 09:30:00    0.066331
2012-03-13 09:30:00   -1.380534
2012-03-14 09:30:00    0.351114
2012-03-15 09:30:00    0.944967
2012-03-16 09:30:00    0.082544
2012-03-19 09:30:00   -1.163346
2012-03-20 09:30:00   -1.845309
Freq: B, dtype: float64

In [160]:
# If two time series with different time zones are combined, the result will be UTC.
ts1 = ts[:7].tz_localize('Europe/London')
ts2 = ts1[2:].tz_convert('Europe/Moscow')
result = ts1 + ts2
result.index

DatetimeIndex(['2012-03-07 09:30:00+00:00', '2012-03-08 09:30:00+00:00',
               '2012-03-09 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
               '2012-03-15 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='B')
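
A smaller version of the same idea, with fixed values so the arithmetic is easy to follow. This is a sketch: the series is made up, and the names `london` and `moscow` are illustrative.

```python
import pandas as pd

rng = pd.date_range('2012-03-07 09:30', periods=3, freq='B')
ts = pd.Series([1.0, 2.0, 3.0], index=rng)

london = ts.tz_localize('Europe/London')
moscow = london.tz_convert('Europe/Moscow')   # same instants, different zone

# Alignment happens on the underlying instants; the result index is in UTC
result = london + moscow
print(result.index.tz, result.tolist())
```

Because both series refer to the same instants, every row aligns and the values simply double.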

## 11.5 Periods and Period Arithmetic
<a id='115'></a>
*Periods* represent timespans, like days, months, quarters, or years. The **Period** class represents this data type, requiring a string or integer and a frequency.

In [162]:
# In this case, the Period object represents the full timespan from January 1, 2007, to December 31, 2007, inclusive.
p = pd.Period(2007, freq='A-DEC')
p

Period('2007', 'A-DEC')

In [163]:
p + 5

Period('2012', 'A-DEC')

In [164]:
p - 2

Period('2005', 'A-DEC')

In [165]:
# If two periods have the same frequency, their difference is the number of units between them:
pd.Period('2014', freq='A-DEC') - p

7

In [166]:
# Regular ranges of periods can be constructed with the period_range function:
rng = pd.period_range('2001-01-01', '2001-06-30', freq='M')
rng

PeriodIndex(['2001-01', '2001-02', '2001-03', '2001-04', '2001-05', '2001-06'], dtype='period[M]', freq='M')

In [168]:
pd.Series(np.random.randn(6), index=rng)

2001-01   -0.023847
2001-02    1.291398
2001-03    0.084439
2001-04    0.882810
2001-05    0.734414
2001-06    0.590948
Freq: M, dtype: float64

In [169]:
value = ['2001Q3', '2002Q2', '2003Q1']
index = pd.PeriodIndex(value, freq='Q-DEC')
index

PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='period[Q-DEC]', freq='Q-DEC')
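
Since a period stands for a whole span of time rather than a single instant, it can help to look at its boundaries. The sketch below uses a monthly period; the names `later`, `first_instant`, and `last_day` are illustrative.

```python
import pandas as pd

p = pd.Period('2007-08', freq='M')

# Adding an integer shifts by that many units of the period's frequency
later = p + 5

# A period spans an interval; start_time and end_time are its boundaries
first_instant = p.start_time
last_day = p.end_time.normalize()
print(later, first_instant, last_day)
```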

### 11.5.1 Period Frequency Conversion
<a id='1151'></a>
Periods and PeriodIndex objects can be converted to another frequency with their ***asfreq*** method.

In [170]:
p = pd.Period('2007', freq='A-DEC')
p

Period('2007', 'A-DEC')

In [171]:
p.asfreq('M', how='start')

Period('2007-01', 'M')

In [172]:
p.asfreq('M', how='end')

Period('2007-12', 'M')

In [176]:
# For a fiscal year ending on a month other than December, the corresponding monthly subperiods are different:
p = pd.Period('2007', freq='A-JUN')
p

Period('2007', 'A-JUN')

In [174]:
p.asfreq('M', 'start')

Period('2006-07', 'M')

In [175]:
p.asfreq('M', 'end')

Period('2007-06', 'M')

In [177]:
# When you are converting from high to low frequency, pandas determines the superperiod depending 
# on where the subperiod “belongs.”
p = pd.Period('Aug-2007', 'M')
p.asfreq('A-JUN')

Period('2008', 'A-JUN')

In [178]:
rng = pd.period_range('2006', '2009', freq='A-DEC')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts

2006   -0.281711
2007    0.243721
2008    1.877977
2009   -0.694283
Freq: A-DEC, dtype: float64

In [179]:
ts.asfreq('M', how='start')

2006-01   -0.281711
2007-01    0.243721
2008-01    1.877977
2009-01   -0.694283
Freq: M, dtype: float64

In [180]:
ts.asfreq('B', how='end')

2006-12-29   -0.281711
2007-12-31    0.243721
2008-12-31    1.877977
2009-12-31   -0.694283
Freq: B, dtype: float64
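
The same superperiod logic applies to quarterly frequencies. As a sketch, assume a fiscal year ending in January (`'Q-JAN'`, chosen here only for illustration): August 2007 then belongs to the third quarter of fiscal 2008, and converting back down recovers the months that quarter spans.

```python
import pandas as pd

month = pd.Period('2007-08', freq='M')

# Q-JAN: quarterly periods for a fiscal year that ends in January
quarter = month.asfreq('Q-JAN')

# Converting back down shows which months the quarter spans
first_month = quarter.asfreq('M', how='start')
last_month = quarter.asfreq('M', how='end')
print(quarter, first_month, last_month)
```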

<hr>
[Back to top](#index)