<h1><font color = #fc7cc9> Ch. 11 Time Series
    <br>pg. 317 - 362</h1>
<p> 

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from datetime import datetime
from datetime import timedelta

<h2> <font color = #39abed> 11.1 Date and Time Data Types and Tools
    </h2>
<p><blockquote>Anything that is observed or measured at many points in time forms a time series. Many time series are fixed frequency, which is to say that data points occur at regular intervals according to some rule, such as every 15 seconds, every 5 minutes, or once per month. Time series can also be irregular without a fixed unit of time or offset between units.</blockquote>
<p>Specific terminology for different time-related data:
    <ul>
    <li>Timestamps = specific instances in time
    <li>Fixed periods = such as the month January 2007 or the full year 2010
    <li>Intervals of time = as indicated by a start and end timestamp. Periods can be thought of as special cases of intervals
    <li>Experiment or elapsed time = each timestamp is a measure of time relative to a particular start time (e.g., the diameter of a cookie baking each second since being put in the oven; here you could use pandas timedeltas!)
        </ul>
<p>This chapter focuses on the first three categories. The most common kind of time series are those <b>indexed by timestamp</b>.

In [2]:
# Mainly use the datetime or calendar modules; datetime is more common.

now = datetime.now()
now

datetime.datetime(2021, 3, 4, 10, 31, 37, 486535)

In [3]:
now.year, now.month, now.day

(2021, 3, 4)

In [4]:
# A timedelta represents the temporal difference between two datetime objects

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
delta

datetime.timedelta(days=926, seconds=56700)

In [5]:
delta.days

926

In [6]:
delta.seconds

56700
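One thing that is easy to trip over here: <code>delta.seconds</code> is only the leftover seconds within the final day, not the whole span. A minimal sketch using the same two datetimes, with <code>total_seconds()</code> for the full duration:

```python
from datetime import datetime

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# .days and .seconds are separate components of the same duration
leftover_hours = delta.seconds / 3600   # hours past the last whole day

# total_seconds() folds days and seconds into a single number
total = delta.total_seconds()
print(delta.days, delta.seconds, leftover_hours, total)
```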

In [7]:
# Can add or subtract a timedelta (or a multiple thereof) to or from a datetime object to make a new, shifted object

from datetime import timedelta

In [8]:
start = datetime(2011, 1, 7)

In [9]:
start + timedelta(12) # Adding 12 days to the start datetime.

datetime.datetime(2011, 1, 19, 0, 0)

In [10]:
start - 2 * timedelta(12) # Subtracting 24 days (2 * 12).

datetime.datetime(2010, 12, 14, 0, 0)

<b>See page 319 for table 11.1 with more of the types available in the datetime module </b>

<h3> <font color = #39abed> Converting Between String and Datetime
    </h3>
 <p>

In [11]:
# Can format datetime objects and pandas Timestamp objects as strings
# using str() or the strftime method

stamp = datetime(2011, 1, 3, 13, 45)

In [12]:
str(stamp)

'2011-01-03 13:45:00'

In [13]:
stamp.strftime('%Y-%m-%d')

'2011-01-03'

<b>See page 319 for table 11.2 for more datetime format specs (e.g. %y, %Y and so on)</b>

In [14]:
# Can convert strings to dates using same format code
# And by using 'datetime.strptime'

value = '2011-01-03'
datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)

In [15]:
datestrs = ['7/6/2011', '8/6/2011']
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]

[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]
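<code>strptime</code> raises a <code>ValueError</code> when the string does not match the format spec. A small sketch of guarding against that; <code>try_parse</code> is a hypothetical helper name, not something from the book or the standard library:

```python
from datetime import datetime

def try_parse(text, fmt):
    """Return a datetime, or None if text does not match fmt (hypothetical helper)."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

good = try_parse('7/6/2011', '%m/%d/%Y')
bad = try_parse('2011-07-06', '%m/%d/%Y')   # wrong format spec for this string

# Formatting with the same spec gives the zero-padded form of the input
round_trip = good.strftime('%m/%d/%Y')
print(good, bad, round_trip)
```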

<blockquote><code>datetime.strptime</code> is a good way to parse a date with a known format. However, it can be a bit annoying to have to write a format spec each time, especially for common date formats. In this case, you can use the <code>parser.parse</code>  method in the third-party <code>dateutil</code> package (this is installed automatically when you install pandas): 

In [16]:
from dateutil.parser import parse

In [17]:
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

In [18]:
# dateutil can parse other things too! Most date representations

parse('Jan 31, 1997 10:45 PM')

datetime.datetime(1997, 1, 31, 22, 45)

In [19]:
# For more international styles of date, with day written first, can do this:

parse('6/12/2011', dayfirst = True)

datetime.datetime(2011, 12, 6, 0, 0)

pandas is more for working with <b>arrays</b> of dates. The <code>to_datetime</code> method parses many different kinds of date representations, including standard date formats like ISO 8601:

In [20]:
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']
pd.to_datetime(datestrs)

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)

In [21]:
# to_datetime can also handle None and empty string values

idx = pd.to_datetime(datestrs + [None]) # Adding another date/string that is None
idx

# NaT = Not a Time. This is pandas' null value for timestamp data

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)
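Once <code>NaT</code> values are in an index, they can be detected like any other missing value. The sketch below also uses <code>errors='coerce'</code>, which is not covered in these notes, to turn an unparseable string into <code>NaT</code> instead of raising:

```python
import pandas as pd

datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']
idx = pd.to_datetime(datestrs + [None])

# isna() flags the NaT entries
missing = idx.isna()

# errors='coerce' converts strings that cannot be parsed into NaT
coerced = pd.to_datetime(['2011-07-06', 'not a date'], errors='coerce')
print(list(missing))
print(coerced)
```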

<b> See page 321, table 11.3 for more locale-specific date formatting</b>

<h2> <font color = #39abed> 11.2 Time Series Basics
    </h2>
<p>A common time series object in pandas is a Series indexed by timestamps, which are often represented outside pandas as Python strings or datetime objects.

In [22]:
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
dates

[datetime.datetime(2011, 1, 2, 0, 0),
 datetime.datetime(2011, 1, 5, 0, 0),
 datetime.datetime(2011, 1, 7, 0, 0),
 datetime.datetime(2011, 1, 8, 0, 0),
 datetime.datetime(2011, 1, 10, 0, 0),
 datetime.datetime(2011, 1, 12, 0, 0)]

In [23]:
ts = pd.Series(np.random.randn(6), index = dates)
ts

# The Index in this series is the dates, the right col is just random data

2011-01-02   -0.148958
2011-01-05   -0.617395
2011-01-07    0.604750
2011-01-08    1.438137
2011-01-10   -1.154009
2011-01-12   -1.048098
dtype: float64

Under the hood, the <code>datetime</code> objects have been put in a <code>DatetimeIndex</code>:

In [24]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [25]:
# Just as with other Series, arithmetic operations between differently indexed time series automatically align on the dates

ts + ts[::2]

2011-01-02   -0.297916
2011-01-05         NaN
2011-01-07    1.209500
2011-01-08         NaN
2011-01-10   -2.308019
2011-01-12         NaN
dtype: float64

In [26]:
ts[::2] # every other value in TS

2011-01-02   -0.148958
2011-01-07    0.604750
2011-01-10   -1.154009
dtype: float64

In [27]:
# Scalar values from a DatetimeIndex are pandas Timestamp objects:

stamp = ts.index[0]
stamp

Timestamp('2011-01-02 00:00:00')

<code>Timestamp</code> can be used anywhere you would use a <code>datetime</code> object. 

<h3> <font color = #39abed> Indexing, Selecting, Subsetting
    </h3>
 <p>Time series behave like any other pandas.Series when indexing and selecting data based on labels:

In [28]:
ts

2011-01-02   -0.148958
2011-01-05   -0.617395
2011-01-07    0.604750
2011-01-08    1.438137
2011-01-10   -1.154009
2011-01-12   -1.048098
dtype: float64

In [29]:
stamps = ts.index[2] # Set stamps to the 3rd index label (a Timestamp) of ts

In [30]:
ts[stamps]

0.6047502123873608

In [31]:
# Can also pass a string that is interpretable as a date:

ts['1/10/2011']

-1.15400931938646

In [32]:
ts['20110110']

-1.15400931938646

In [33]:
# For longer time series, a year or only a year and month can be passed to easily select slices of data

longer_ts = pd.Series(np.random.randn(1000),
                      index = pd.date_range('1/1/2000', periods = 1000)) 

# Periods = how many periods /dates to generate

longer_ts



2000-01-01   -0.265631
2000-01-02   -0.580875
2000-01-03   -0.480520
2000-01-04   -1.795227
2000-01-05   -1.136288
                ...   
2002-09-22    0.289269
2002-09-23    0.391838
2002-09-24   -0.010930
2002-09-25   -0.640284
2002-09-26   -0.076628
Freq: D, Length: 1000, dtype: float64

In [34]:
longer_ts['2001'] # All the 2001 data; '2001' is interpreted as a year here.

2001-01-01   -0.245397
2001-01-02   -0.837936
2001-01-03   -0.016314
2001-01-04   -0.636982
2001-01-05    0.527905
                ...   
2001-12-27   -1.050327
2001-12-28   -1.302271
2001-12-29   -0.002487
2001-12-30    1.153363
2001-12-31    0.886509
Freq: D, Length: 365, dtype: float64

In [35]:
# Can also specify months

longer_ts['2001-05']

2001-05-01   -0.973916
2001-05-02    0.389325
2001-05-03    0.736916
2001-05-04    0.751678
2001-05-05    2.499908
2001-05-06    0.231253
2001-05-07    1.598478
2001-05-08    2.093363
2001-05-09    2.418706
2001-05-10    0.469855
2001-05-11   -1.806469
2001-05-12   -0.787731
2001-05-13    0.026498
2001-05-14   -0.491243
2001-05-15   -0.940259
2001-05-16   -1.350170
2001-05-17   -0.063753
2001-05-18    0.975852
2001-05-19    1.343401
2001-05-20   -0.492831
2001-05-21   -0.457161
2001-05-22    1.002743
2001-05-23    1.291650
2001-05-24    1.817059
2001-05-25    0.860634
2001-05-26    0.210050
2001-05-27    0.353174
2001-05-28   -1.205139
2001-05-29    0.798252
2001-05-30    0.074861
2001-05-31    1.802055
Freq: D, dtype: float64

In [36]:
# You can also SLICE with 'datetime' objects as well

ts[datetime(2011, 1, 7):] # Everything from 2011-01-07 (inclusive) to the end of the series

2011-01-07    0.604750
2011-01-08    1.438137
2011-01-10   -1.154009
2011-01-12   -1.048098
dtype: float64

<blockquote>Because most time series data is ordered chronologically, you can slice with timestamps not contained in a time series to perform a range query: 

In [37]:
ts['1/6/2011':'1/11/2011'] # Notice that these dates are not in the original dataset!

2011-01-07    0.604750
2011-01-08    1.438137
2011-01-10   -1.154009
dtype: float64
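With random values it is hard to see exactly which rows a range query keeps. The sketch below repeats the query on a series whose values are simply 0 through 5 (<code>demo</code> is a made-up name for illustration), showing that both ends of a label slice are inclusive:

```python
from datetime import datetime
import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
demo = pd.Series(np.arange(6.0), index=dates)   # values 0..5 instead of random data

# Neither endpoint is in the index; everything between them is returned
window = demo['2011-01-06':'2011-01-11']

# An endpoint that is in the index is included in the result
through_8th = demo[:'2011-01-08']
print(window, through_8th, sep='\n')
```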

In [38]:
ts # just for reference

2011-01-02   -0.148958
2011-01-05   -0.617395
2011-01-07    0.604750
2011-01-08    1.438137
2011-01-10   -1.154009
2011-01-12   -1.048098
dtype: float64

Remember that, just like with NumPy arrays, slicing in this manner produces a view on the source time series rather than a copy. No data is copied, so modifying the slice can modify the original data.
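Whether a bare slice shares memory with its source can depend on the pandas version and its copy-on-write setting (an assumption about the reader's environment, not something from the book). Taking an explicit <code>.copy()</code> is one way to be sure the slice is independent:

```python
import numpy as np
import pandas as pd

demo = pd.Series(np.arange(6.0),
                 index=pd.date_range('2011-01-01', periods=6))

# .copy() gives an independent Series regardless of view/copy semantics
independent = demo['2011-01-02':'2011-01-04'].copy()
independent.iloc[0] = 99.0

# The source series still holds its original value for that date
print(demo['2011-01-02'], independent.iloc[0])
```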

In [39]:
# the method 'truncate' slices a Series between two dates...

ts.truncate(after = '1/9/2011') # notice again that 9th is not included in the original data set! but can slice there anyway

2011-01-02   -0.148958
2011-01-05   -0.617395
2011-01-07    0.604750
2011-01-08    1.438137
dtype: float64

In [40]:
# The above also works for dataframes, indexing on its rows:

dates = pd.date_range('1/1/2000', periods = 100, freq = 'W-WED')

long_df = pd.DataFrame(np.random.randn(100, 4),
                      index = dates,
                      columns = ['Colorado', 'Texas',
                                 'New York', 'Ohio'])
long_df

Unnamed: 0,Colorado,Texas,New York,Ohio
2000-01-05,-0.174764,0.585874,-0.040467,-2.214267
2000-01-12,-1.196074,-0.973859,1.264964,0.300146
2000-01-19,1.279673,0.981989,-0.443785,0.120396
2000-01-26,1.474784,-1.154225,-0.335140,-0.097445
2000-02-02,0.484455,0.427458,1.595978,-1.326458
...,...,...,...,...
2001-10-31,1.090445,-0.441111,1.918842,-0.815012
2001-11-07,-2.123394,1.610230,-0.792265,2.316817
2001-11-14,0.110368,0.968701,-0.659532,1.902736
2001-11-21,0.731133,0.461295,0.497530,-0.304626


In [41]:
long_df.loc['5-2001']

Unnamed: 0,Colorado,Texas,New York,Ohio
2001-05-02,-0.456719,0.678199,0.935796,-1.454754
2001-05-09,-0.965694,0.134877,-1.63049,-1.529608
2001-05-16,0.600285,0.257534,0.083965,0.548121
2001-05-23,-0.963425,1.90926,0.849372,-0.58071
2001-05-30,-0.857178,1.054397,0.44115,0.083045


<h3> <font color = #39abed> Time Series with Duplicate Indices
    </h3>
 <p>You may have several data observations falling on the same timestamp, like so:

In [42]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '1/2/2000', '1/3/2000'])

In [43]:
dup_ts = pd.Series(np.arange(5), index = dates)

dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int32

In [44]:
# Can check whether the index is unique (i.e., every label appears only once) like this:

dup_ts.index.is_unique

False

In [45]:
# Indexing returns a scalar for a timestamp that appears once, or a slice (Series) for a duplicated one:

dup_ts['1/3/2000'] # not duplicated

4

In [46]:
dup_ts['1/2/2000'] # duplicated

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int32

If you want to aggregate the data that has non-unique timestamps, one way to do this is to use <code>groupby</code> and pass <code>level = 0</code>:

In [47]:
grouped = dup_ts.groupby(level = 0)

In [48]:
grouped.mean()

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int32

In [49]:
grouped.count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
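Aggregating is not the only option. If only one observation per timestamp is wanted, <code>Index.duplicated</code> gives a boolean mask that can be used to drop the repeats. This is a sketch of an alternative, not the approach the book shows:

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

# True for every repeat after the first occurrence of a label
is_repeat = dup_ts.index.duplicated(keep='first')

# Keep only the first observation per timestamp
first_only = dup_ts[~is_repeat]
print(first_only)
```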

<h2> <font color = #39abed> 11.3 Date Ranges, Frequencies, and Shifting
    </h2>
<p>Generic time series in pandas are assumed to be irregular (i.e., they have no fixed frequency). Often, though, you want to work with a fixed frequency (e.g., daily, monthly, or every 15 minutes), even if that means introducing missing values into the time series. One way to do this is to use pandas' <code>resample</code> method:

In [53]:
ts

2011-01-02   -0.148958
2011-01-05   -0.617395
2011-01-07    0.604750
2011-01-08    1.438137
2011-01-10   -1.154009
2011-01-12   -1.048098
dtype: float64

In [52]:
resampler = ts.resample('D') # 'D' is interpreted as daily frequency; more about this in 11.6
resampler

<pandas.core.resample.DatetimeIndexResampler object at 0x00000188EE95C5B0>
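The resampler on its own is lazy and holds no result until a method is called on it. A minimal sketch of two ways to materialize the daily grid, using values 0 through 5 in place of the random data (<code>demo</code> is a made-up name):

```python
from datetime import datetime
import numpy as np
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
demo = pd.Series(np.arange(6.0), index=dates)

# asfreq() builds the daily index; days without an observation become NaN
daily = demo.resample('D').asfreq()

# ffill() instead carries the last observation forward into the gaps
daily_filled = demo.resample('D').ffill()
print(daily, daily_filled, sep='\n')
```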

<h3> <font color = #39abed> Generating Date Ranges
    </h3>
    <p>You can use <code>pandas.date_range</code> to generate a <code>DatetimeIndex</code> with an indicated length according to a particular frequency. By default, it generates daily timestamps. 

In [54]:
index = pd.date_range('2012-04-01', '2012-06-01')
index

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20',
               '2012-04-21', '2012-04-22', '2012-04-23', '2012-04-24',
               '2012-04-25', '2012-04-26', '2012-04-27', '2012-04-28',
               '2012-04-29', '2012-04-30', '2012-05-01', '2012-05-02',
               '2012-05-03', '2012-05-04', '2012-05-05', '2012-05-06',
               '2012-05-07', '2012-05-08', '2012-05-09', '2012-05-10',
               '2012-05-11', '2012-05-12', '2012-05-13', '2012-05-14',
               '2012-05-15', '2012-05-16', '2012-05-17', '2012-05-18',
               '2012-05-19', '2012-05-20', '2012-05-21', '2012-05-22',
               '2012-05-23', '2012-05-24', '2012-05-25', '2012-05-26',
      

In [55]:
# If you only pass a single start or single end date, you need to pass the no. of periods to generate.

pd.date_range(start = '2012-04-01', periods = 20)

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],
              dtype='datetime64[ns]', freq='D')

<font color = #f00ece>If you want a date index to contain the <b>last business day</b> of each month, you can pass <code>'BM'</code> frequency. See table 11.4 on page 329 for more.

In [59]:
# E.g., use the last business day of each month for paydays or similar. For every business day, use 'B'

pd.date_range('2000-01-01', '2000-12-01', freq = 'BM')

DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-28',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-29', '2000-10-31', '2000-11-30'],
              dtype='datetime64[ns]', freq='BM')
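String aliases are convenient, but as far as I know some of them have been renamed in newer pandas releases (for example <code>'BM'</code> to <code>'BME'</code>). Passing the offset object itself is one way to avoid depending on the alias spelling. A sketch, assuming <code>BMonthEnd</code> is the object behind the alias used above:

```python
import pandas as pd
from pandas.tseries.offsets import BMonthEnd

rng = pd.date_range('2000-01-01', '2000-12-01', freq=BMonthEnd())

# Every generated date should fall on a weekday (Monday=0 ... Friday=4)
all_weekdays = bool((rng.dayofweek < 5).all())
print(rng[:4])
print(all_weekdays)
```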

In [61]:
# By default, date_range is inclusive of the given start or end timestamp

pd.date_range('2012-05-02 12:56:31', periods = 5)

DatetimeIndex(['2012-05-02 12:56:31', '2012-05-03 12:56:31',
               '2012-05-04 12:56:31', '2012-05-05 12:56:31',
               '2012-05-06 12:56:31'],
              dtype='datetime64[ns]', freq='D')

<blockquote>Sometimes you will have start or end dates with time information but want to generate a set of timestamps normalized to midnight as a convention. To do this, there is a normalize option: 

In [62]:
pd.date_range('2012-05-02 12:56:31', periods = 5, normalize = True)

DatetimeIndex(['2012-05-02', '2012-05-03', '2012-05-04', '2012-05-05',
               '2012-05-06'],
              dtype='datetime64[ns]', freq='D')

<h3> <font color = #39abed> Frequencies and Date Offsets
    </h3>
    <p>Frequencies in pandas are composed of a <i>base frequency</i> and a multiplier. Base frequencies are typically referred to by a string alias, like 'M' for monthly. For each base frequency, there is an object generally referred to as a <i>date offset</i>. E.g., hourly frequency can be represented with the <code>Hour</code> class:

In [63]:
from pandas.tseries.offsets import Hour, Minute

In [65]:
hour = Hour()
hour

<Hour>

In [66]:
# You can define a multiple of an offset by passing an integer
four_hours = Hour(4)
four_hours

<4 * Hours>

In [67]:
# In most cases you don't need to explicitly create such an object; you can just use a string alias like '4h'
pd.date_range('2000-01-01', '2000-01-03 23:59', freq = '4h')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
               '2000-01-01 08:00:00', '2000-01-01 12:00:00',
               '2000-01-01 16:00:00', '2000-01-01 20:00:00',
               '2000-01-02 00:00:00', '2000-01-02 04:00:00',
               '2000-01-02 08:00:00', '2000-01-02 12:00:00',
               '2000-01-02 16:00:00', '2000-01-02 20:00:00',
               '2000-01-03 00:00:00', '2000-01-03 04:00:00',
               '2000-01-03 08:00:00', '2000-01-03 12:00:00',
               '2000-01-03 16:00:00', '2000-01-03 20:00:00'],
              dtype='datetime64[ns]', freq='4H')

In [68]:
# You can combine offsets with addition

Hour(2) + Minute(30)

<150 * Minutes>

In [69]:
# You can also pass frequency strings, like '1h30min'
pd.date_range('2000-01-01', periods = 10, freq = '1h30min')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:30:00',
               '2000-01-01 03:00:00', '2000-01-01 04:30:00',
               '2000-01-01 06:00:00', '2000-01-01 07:30:00',
               '2000-01-01 09:00:00', '2000-01-01 10:30:00',
               '2000-01-01 12:00:00', '2000-01-01 13:30:00'],
              dtype='datetime64[ns]', freq='90T')

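<i>Anchored offsets</i> describe points in time that are not evenly spaced. For example, 'M' (calendar month end) and 'BM' (last business day of the month) depend on the number of days in the month and, for the latter, on whether the month ends on a weekend.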

#### Week of month dates 
<p> <blockquote> One useful frequency class is “week of month,” starting with WOM. This enables you to get dates like the third Friday of each month: 

In [71]:
rng = pd.date_range('2012-01-01', '2012-09-01', freq = 'WOM-3FRI') # The third Friday of each month

In [72]:
list(rng)

[Timestamp('2012-01-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-02-17 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-03-16 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-04-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-05-18 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-06-15 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-07-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2012-08-17 00:00:00', freq='WOM-3FRI')]
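A quick sanity check on the result: every date should be a Friday, and the third occurrence of any weekday in a month always lands on day 15 through 21. A minimal sketch:

```python
import pandas as pd

rng = pd.date_range('2012-01-01', '2012-09-01', freq='WOM-3FRI')

# dayofweek uses Monday=0, so Friday is 4
all_fridays = bool((rng.dayofweek == 4).all())

# The third Friday can be no earlier than the 15th and no later than the 21st
in_third_week = bool(((rng.day >= 15) & (rng.day <= 21)).all())
print(all_fridays, in_third_week)
```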

<h3> <font color = #39abed> Shifting (Leading and Lagging) Data
    </h3>
 <p>Shifting = moving data backward and forward through time. Both Series and DataFrame have a <code>shift</code> method for doing naive shifts forward or backward, leaving the index unmodified:

In [74]:
ts = pd.Series(np.random.randn(4),
               index = pd.date_range('1/1/2000', periods = 4, freq = 'M')) # M = last day of the month

ts

2000-01-31    0.373379
2000-02-29    1.030767
2000-03-31   -0.474707
2000-04-30    0.180316
Freq: M, dtype: float64

In [76]:
ts.shift(2) # Values move forward 2 periods; the index stays put, so the first 2 entries become NaN and the last 2 values drop off

2000-01-31         NaN
2000-02-29         NaN
2000-03-31    0.373379
2000-04-30    1.030767
Freq: M, dtype: float64

In [77]:
ts.shift(-2) # Values move backward 2 periods; the last 2 entries become NaN and the first 2 values drop off

2000-01-31   -0.474707
2000-02-29    0.180316
2000-03-31         NaN
2000-04-30         NaN
Freq: M, dtype: float64

In [78]:
# A common use of shift is computing percentage changes in a time series or DataFrame, like so:

ts / ts.shift(1) - 1

2000-01-31         NaN
2000-02-29    1.760648
2000-03-31   -1.460538
2000-04-30   -1.379848
Freq: M, dtype: float64
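pandas also has a built-in <code>pct_change</code> method that computes the same quantity. The sketch below compares the two on made-up prices (<code>prices</code> is hypothetical data, chosen so the changes are easy to read); the index is built from a <code>MonthEnd</code> offset object rather than an alias string:

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

prices = pd.Series([100.0, 110.0, 99.0, 108.9],
                   index=pd.date_range('2000-01-31', periods=4, freq=MonthEnd()))

# The manual version from the cell above
manual = prices / prices.shift(1) - 1

# fill_method=None asks pandas not to fill missing values before computing
builtin = prices.pct_change(fill_method=None)

print(manual)
print(np.allclose(manual.iloc[1:], builtin.iloc[1:]))
```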

<blockquote>Because naive shifts leave the index unmodified, some data is discarded. Thus if the frequency is known, it can be passed to shift  to advance the timestamps instead of simply the data: 

In [79]:
ts.shift(2, freq = 'M') # To move the timestamps instead of the data (and avoid the NaNs above), pass a freq

2000-03-31    0.373379
2000-04-30    1.030767
2000-05-31   -0.474707
2000-06-30    0.180316
Freq: M, dtype: float64
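The difference between the two kinds of shift is easier to see side by side on simple values. A minimal sketch (<code>demo</code> is a made-up series; a <code>MonthEnd</code> offset object stands in for the alias string):

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

demo = pd.Series([1.0, 2.0, 3.0, 4.0],
                 index=pd.date_range('2000-01-31', periods=4, freq=MonthEnd()))

# Naive shift: values move, the index stays, and two NaNs appear
naive = demo.shift(2)

# Shift with a frequency: the index moves two month ends, values are untouched
by_freq = demo.shift(2, freq=MonthEnd())

print(naive, by_freq, sep='\n')
```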

In [81]:
# Other types of frequencies can be passed too

ts.shift(3, freq = 'D')

# or

ts.shift(1, freq = '90T') # T is for Minutes

2000-01-31 01:30:00    0.373379
2000-02-29 01:30:00    1.030767
2000-03-31 01:30:00   -0.474707
2000-04-30 01:30:00    0.180316
Freq: M, dtype: float64

#### Shifting dates with offsets

In [83]:
from pandas.tseries.offsets import Day, MonthEnd

In [85]:
now = datetime(2011, 11, 17)
now

datetime.datetime(2011, 11, 17, 0, 0)

In [89]:
now + 3 * Day()

Timestamp('2011-11-20 00:00:00')

<blockquote> If you add an anchored offset like MonthEnd, the first increment will “roll forward” a date to the next date according to the frequency rule: 

In [90]:
now + MonthEnd()

Timestamp('2011-11-30 00:00:00')

In [92]:
now + MonthEnd(3)

Timestamp('2012-01-31 00:00:00')

In [93]:
# Can also specify if you want it to roll forward or backward

offset = MonthEnd()

In [95]:
offset.rollforward(now)

Timestamp('2011-11-30 00:00:00')
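<code>rollback</code> is the counterpart of <code>rollforward</code>. The sketch below also shows a subtlety: rolling a date that is already a month end leaves it unchanged, while adding the offset still jumps to the next month end.

```python
from datetime import datetime
from pandas.tseries.offsets import MonthEnd

offset = MonthEnd()
mid_month = datetime(2011, 11, 17)
on_anchor = datetime(2011, 11, 30)

back = offset.rollback(mid_month)      # previous month end
stay = offset.rollforward(on_anchor)   # already a month end, so unchanged
jump = on_anchor + offset              # addition moves to the next month end
print(back, stay, jump)
```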

#### Using offsets with groupby (an easier and faster way is 'resample', shown after)

In [97]:
ts = pd.Series(np.random.randn(20),
               index = pd.date_range('1/15/2000', periods = 20, freq = '4d'))
ts

2000-01-15   -0.141408
2000-01-19    0.584191
2000-01-23    1.341504
2000-01-27   -1.888553
2000-01-31    0.309497
2000-02-04    1.640739
2000-02-08   -1.039778
2000-02-12   -0.567509
2000-02-16    0.031717
2000-02-20   -0.681851
2000-02-24    0.837703
2000-02-28   -0.259262
2000-03-03    0.063148
2000-03-07   -0.556264
2000-03-11    1.593275
2000-03-15   -2.308009
2000-03-19   -0.822295
2000-03-23   -1.226401
2000-03-27   -2.489695
2000-03-31   -0.182224
Freq: 4D, dtype: float64

In [98]:
ts.groupby(offset.rollforward).mean()

2000-01-31    0.041046
2000-02-29   -0.005463
2000-03-31   -0.741058
dtype: float64

In [101]:
ts.resample('M').mean() # Group by month end, then calculate the mean

2000-01-31    0.041046
2000-02-29   -0.005463
2000-03-31   -0.741058
Freq: M, dtype: float64
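To check that the two approaches really agree, here is a sketch on deterministic values (0 through 19 on the same 4-day grid), with the <code>MonthEnd</code> offset object passed to <code>resample</code> in place of the alias string:

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

demo = pd.Series(np.arange(20.0),
                 index=pd.date_range('2000-01-15', periods=20, freq='4D'))
offset = MonthEnd()

# groupby calls rollforward on each index label to build the group key
by_group = demo.groupby(offset.rollforward).mean()

# resample bins the same observations by month end
by_resample = demo.resample(offset).mean()

print(by_group)
print(np.allclose(by_group.to_numpy(), by_resample.to_numpy()))
```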

<h2> <font color = #39abed> 11.4 Time Zone Handling
    </h2>
<p>Time zones are usually expressed as offsets from UTC. In Python, time zone information comes from the third-party pytz library, which pandas wraps.

In [102]:
import pytz

In [104]:
pytz.common_timezones[-5:] # a brief look at some of the common time zones in lib pytz

['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']

In [105]:
# Get a time zone object from 'pytz' using 'pytz.timezone'

tz = pytz.timezone('America/New_York')
tz

<DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD>

<h3> <font color = #39abed> Time Zone Localisation and Conversion
    </h3>
    <p>By default, time series in pandas are <i>time zone naive</i>. See below:

In [106]:
rng = pd.date_range('3/9/2012 9:30', periods = 6, freq = 'D')
rng

DatetimeIndex(['2012-03-09 09:30:00', '2012-03-10 09:30:00',
               '2012-03-11 09:30:00', '2012-03-12 09:30:00',
               '2012-03-13 09:30:00', '2012-03-14 09:30:00'],
              dtype='datetime64[ns]', freq='D')

In [107]:
ts = pd.Series(np.random.randn(len(rng)), index = rng)
ts

2012-03-09 09:30:00   -2.143832
2012-03-10 09:30:00   -0.997192
2012-03-11 09:30:00   -0.750244
2012-03-12 09:30:00   -1.179428
2012-03-13 09:30:00   -0.349924
2012-03-14 09:30:00   -0.282618
Freq: D, dtype: float64

In [108]:
# The index's tz field is None

print(ts.index.tz)

None


In [109]:
# Conversion from naive to localised is handled by the tz_localize method
ts

2012-03-09 09:30:00   -2.143832
2012-03-10 09:30:00   -0.997192
2012-03-11 09:30:00   -0.750244
2012-03-12 09:30:00   -1.179428
2012-03-13 09:30:00   -0.349924
2012-03-14 09:30:00   -0.282618
Freq: D, dtype: float64

In [114]:
ts_utc = ts.tz_localize('UTC')
ts_utc

2012-03-09 09:30:00+00:00   -2.143832
2012-03-10 09:30:00+00:00   -0.997192
2012-03-11 09:30:00+00:00   -0.750244
2012-03-12 09:30:00+00:00   -1.179428
2012-03-13 09:30:00+00:00   -0.349924
2012-03-14 09:30:00+00:00   -0.282618
Freq: D, dtype: float64

In [111]:
ts_utc.index

DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
               '2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')

<blockquote>Once a time series has been localized to a particular time zone, it can be converted to another time zone with <code>tz_convert</code>: 

In [115]:
ts_utc.tz_convert('America/New_York')

2012-03-09 04:30:00-05:00   -2.143832
2012-03-10 04:30:00-05:00   -0.997192
2012-03-11 05:30:00-04:00   -0.750244
2012-03-12 05:30:00-04:00   -1.179428
2012-03-13 05:30:00-04:00   -0.349924
2012-03-14 05:30:00-04:00   -0.282618
Freq: D, dtype: float64
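The jump from <code>-05:00</code> to <code>-04:00</code> in the output above is the US daylight saving transition on 2012-03-11. A small sketch, on made-up values, confirming that conversion changes only the wall-clock label and not the underlying instant:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2012-03-09 09:30', periods=6, freq='D')
utc = pd.Series(np.arange(6.0), index=rng).tz_localize('UTC')
eastern = utc.tz_convert('America/New_York')

# Comparing aware indexes compares instants, so every element matches
same_instant = bool((utc.index == eastern.index).all())

# The UTC offset differs before and after the DST transition
offset_before = eastern.index[0].utcoffset()
offset_after = eastern.index[2].utcoffset()
print(same_instant, offset_before, offset_after)
```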

In [116]:
ts_utc.tz_convert('Europe/Stockholm')

2012-03-09 10:30:00+01:00   -2.143832
2012-03-10 10:30:00+01:00   -0.997192
2012-03-11 10:30:00+01:00   -0.750244
2012-03-12 10:30:00+01:00   -1.179428
2012-03-13 10:30:00+01:00   -0.349924
2012-03-14 10:30:00+01:00   -0.282618
Freq: D, dtype: float64

In [117]:
# tz_localize and tz_convert are also instance methods on DatetimeIndex
ts.index.tz_localize('Asia/Shanghai')

DatetimeIndex(['2012-03-09 09:30:00+08:00', '2012-03-10 09:30:00+08:00',
               '2012-03-11 09:30:00+08:00', '2012-03-12 09:30:00+08:00',
               '2012-03-13 09:30:00+08:00', '2012-03-14 09:30:00+08:00'],
              dtype='datetime64[ns, Asia/Shanghai]', freq='D')

<h3> <font color = #39abed> Operations and Time Zone-Aware Timestamp Objects
    </h3>
 <p>

<h3> <font color = #39abed> Operations Between Different Time Zones
    </h3>
 <p>

<h2> <font color = #39abed> 11.5 Periods and Period Arithmetic
    </h2>
<p>

<h3> <font color = #39abed> Period Frequency Conversion
    </h3>
 <p>

<h3> <font color = #39abed> Quarterly Period Frequencies
    </h3>
 <p>

<h3> <font color = #39abed> Converting Timestamps to Periods (and Back)
    </h3>
 <p>

<h3> <font color = #39abed> Creating PeriodIndex from Arrays
    </h3>
 <p>

<h2> <font color = #39abed> 11.6 Resampling and Frequency Conversion
    </h2>
<p>

<h3> <font color = #39abed> Downsampling
    </h3>
 <p>

<h3> <font color = #39abed> Upsampling and Interpolation
    </h3>
 <p>

<h3> <font color = #39abed> Resampling with Periods
    </h3>
 <p>

<h2> <font color = #39abed> 11.7 Moving Window Functions
    </h2>
<p>

<h3> <font color = #39abed> Exponentially Weighted Functions
    </h3>
 <p>

<h3> <font color = #39abed> Binary Moving Window Functions
    </h3>
 <p>

<h3> <font color = #39abed> User-Defined Moving Window Functions
    </h3>
 <p>

<h2> <font color = #39abed> 11.8 Conclusion
    </h2>
<p>