# Time Series 
How you mark and refer to time series data depends on the application. You may have one of the following:
* Timestamps: specific instants in time
* Fixed periods, such as the month of January 2007 or the full year 2010
* Intervals of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals
* Experiment or elapsed time: each timestamp is a measure of time relative to a particular start time (e.g., the diameter of a cookie baking each second since being placed in the oven)
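
As a rough illustration, the four kinds of time reference listed above can be mapped onto pandas objects. This is only a sketch; the variable names and example dates are made up for the purpose.

```python
import pandas as pd

# Timestamp: a specific instant in time
instant = pd.Timestamp('2007-01-15 09:30')

# Fixed period: the whole month of January 2007
month = pd.Period('2007-01', freq='M')

# Interval: an arbitrary span given by a start and an end timestamp
span = pd.Interval(pd.Timestamp('2007-01-10'), pd.Timestamp('2007-01-20'))

# Elapsed time: a duration measured relative to some start time
elapsed = instant - pd.Timestamp('2007-01-15 09:00')

print(month.start_time, month.end_time)
print(instant in span)
print(elapsed)
```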

In [6]:
import pandas as pd

In [7]:

from datetime import datetime
now = datetime.now()
now

datetime.datetime(2020, 8, 4, 12, 44, 6, 529635)

In [8]:
now.year, now.month, now.day

(2020, 8, 4)

In [9]:
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
delta

datetime.timedelta(926, 56700)

In [10]:
from datetime import timedelta
start = datetime(2011, 1, 7)
start + 2 * timedelta(12)

datetime.datetime(2011, 1, 31, 0, 0)
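
The `timedelta` arithmetic above can be unpacked a little further. The following sketch reuses the same dates and shows how the difference is stored as days plus leftover seconds.

```python
from datetime import datetime, timedelta

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# A timedelta stores whole days and the leftover seconds separately
print(delta.days, delta.seconds)   # 926 56700
print(delta.total_seconds())       # 80063100.0

# timedelta(12) means 12 days, so this adds 24 days to the start date
start = datetime(2011, 1, 7)
later = start + 2 * timedelta(days=12)
print(later)                       # 2011-01-31 00:00:00
```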

## Converting Between String and Datetime

In [11]:
stamp = datetime(1998, 12, 28)
str(stamp)

'1998-12-28 00:00:00'

In [12]:
stamp.strftime('%Y-%m-%d')

'1998-12-28'

## Datetime format specification 
%Y -- Four-digit year  
%y -- Two-digit year  
%m -- Two-digit month  
%d -- Two-digit day  
%H -- Hour (24-hour clock)  
%I -- Hour (12-hour clock)  
%M -- Two-digit minute  
%S -- Second [00, 61] (seconds 60 and 61 account for leap seconds)  
%w -- Weekday as an integer [0 (Sunday), 6]  
%U -- Week number of the year [00, 53]; Sunday is considered the first day of the week, and days before the first Sunday of the year are "week 0"  
%W -- Week number of the year [00, 53]; Monday is considered the first day of the week, and days before the first Monday of the year are "week 0"  
%z -- UTC time zone offset as +HHMM or -HHMM; empty if time zone naive  
%F -- Shortcut for %Y-%m-%d  
%D -- Shortcut for %m/%d/%y  
<br/>
You can use the same format codes to convert strings to dates with datetime.strptime.

In [13]:
value = '2011-01-03'
datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)
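
A minimal sketch of a string-to-datetime round trip follows. The date strings and the `fmt` value are illustrative, not taken from the notebook.

```python
from datetime import datetime

fmt = '%m/%d/%Y'
datestrs = ['7/6/2011', '8/6/2011']

# Parse each string with an explicit, known format
parsed = [datetime.strptime(s, fmt) for s in datestrs]

# Formatting with the same codes gives back zero-padded strings
roundtrip = [d.strftime(fmt) for d in parsed]

# A string that does not match the format raises ValueError
try:
    datetime.strptime('2011-07-06', fmt)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(parsed)
print(roundtrip)
```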

In [14]:
import numpy as np
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7), datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.random.randn(6), index=dates)
ts

2011-01-02    0.342957
2011-01-05   -1.095502
2011-01-07    2.092905
2011-01-08   -0.152931
2011-01-10   -1.758786
2011-01-12   -1.015951
dtype: float64

In [15]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [16]:
ts + ts[::2] # ts[::2] selects every other point; addition aligns on the dates, so unmatched dates become NaN

2011-01-02    0.685914
2011-01-05         NaN
2011-01-07    4.185811
2011-01-08         NaN
2011-01-10   -3.517572
2011-01-12         NaN
dtype: float64
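
The alignment behaviour is easier to see with fixed values. The sketch below uses a small made-up series, and also shows `Series.add` with `fill_value` as one way to avoid the NaN results.

```python
import pandas as pd

idx = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08'])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Plain addition aligns on the index; dates missing from s[::2] give NaN
aligned = s + s[::2]

# add() with fill_value treats the missing side as 0 instead
filled = s.add(s[::2], fill_value=0)

print(aligned)
print(filled)
```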

In [17]:
stamp = ts.index[0]
stamp

Timestamp('2011-01-02 00:00:00')

In [18]:
ts[stamp]

0.3429570414720186

In [19]:
ts['2011/01/02'] # can pass a date as a string 

0.3429570414720186

In [20]:
# a longer time series with a daily index
longer_ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
longer_ts.tail()

2002-09-22   -0.494385
2002-09-23   -0.226518
2002-09-24    0.856928
2002-09-25    0.209116
2002-09-26    0.177636
Freq: D, dtype: float64

In [21]:
longer_ts['2001'].tail()

2001-12-27    0.963591
2001-12-28   -0.296360
2001-12-29    1.119830
2001-12-30   -0.676404
2001-12-31   -1.928090
Freq: D, dtype: float64

In [22]:
longer_ts['2002-05'].tail()

2002-05-27    0.372464
2002-05-28   -0.456472
2002-05-29    0.900367
2002-05-30   -0.654095
2002-05-31   -0.382102
Freq: D, dtype: float64

In [23]:
ts[datetime(2009, 5, 5):]

2011-01-02    0.342957
2011-01-05   -1.095502
2011-01-07    2.092905
2011-01-08   -0.152931
2011-01-10   -1.758786
2011-01-12   -1.015951
dtype: float64

In [24]:
ts['1/6/2011':'1/11/2011']

2011-01-07    2.092905
2011-01-08   -0.152931
2011-01-10   -1.758786
dtype: float64

In [25]:
ts.truncate(after='1/9/2011')

2011-01-02    0.342957
2011-01-05   -1.095502
2011-01-07    2.092905
2011-01-08   -0.152931
dtype: float64
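
The selection techniques above can be checked on a series with known contents. This sketch uses `.loc` as the explicit label-based form of the same lookups; the series `s` is made up.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1000), index=pd.date_range('2000-01-01', periods=1000))

year_2001 = s.loc['2001']                   # every day in 2001
may_2002 = s.loc['2002-05']                 # every day in May 2002
window = s.loc['2001-01-06':'2001-01-11']   # both endpoints are included

# truncate drops everything after the given date
head = s.truncate(after='2000-01-09')

print(len(year_2001), len(may_2002), len(window), len(head))
```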

## Time Series with Duplicate Indices

In [26]:
dates = pd.DatetimeIndex(['1/1/2002', '1/2/2002', '1/3/2002', '1/2/2002', '1/2/2002' ])
dup_ts = pd.Series(np.random.randn(5), index=dates)
dup_ts

2002-01-01   -0.401372
2002-01-02    0.578632
2002-01-03    0.322778
2002-01-02   -0.660494
2002-01-02   -0.977648
dtype: float64

In [27]:
dup_ts.index.unique()

DatetimeIndex(['2002-01-01', '2002-01-02', '2002-01-03'], dtype='datetime64[ns]', freq=None)

In [28]:
# suppose you wanted to group non-unique dates
non_unique = dup_ts.groupby(level=0)
non_unique.mean()

2002-01-01   -0.401372
2002-01-02   -0.353170
2002-01-03    0.322778
dtype: float64

In [29]:
non_unique.count()

2002-01-01    1
2002-01-02    3
2002-01-03    1
dtype: int64
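
A small sketch with fixed values shows how a duplicated index behaves on lookup and how the duplicates can be collapsed. The data are invented for illustration.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2002-01-01', '2002-01-02', '2002-01-02', '2002-01-03'])
dup = pd.Series([10.0, 20.0, 40.0, 50.0], index=dates)

print(dup.index.is_unique)

# A label that occurs once returns a scalar; a duplicated label returns a Series
single = dup.loc[pd.Timestamp('2002-01-01')]
several = dup.loc[pd.Timestamp('2002-01-02')]

# Collapse the duplicates by averaging within each date
deduped = dup.groupby(level=0).mean()
print(deduped)
```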

## Date Ranges, Frequencies, and Shifting  
Often your time series may have irregularly spaced dates. For most purposes this is fine, but should you wish to have equally spaced dates, you can resample to a fixed frequency as follows.

In [30]:
ts

2011-01-02    0.342957
2011-01-05   -1.095502
2011-01-07    2.092905
2011-01-08   -0.152931
2011-01-10   -1.758786
2011-01-12   -1.015951
dtype: float64

In [31]:
resampler = ts.resample('D') #D for daily
resampler.mean()

2011-01-02    0.342957
2011-01-03         NaN
2011-01-04         NaN
2011-01-05   -1.095502
2011-01-06         NaN
2011-01-07    2.092905
2011-01-08   -0.152931
2011-01-09         NaN
2011-01-10   -1.758786
2011-01-11         NaN
2011-01-12   -1.015951
Freq: D, dtype: float64
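
Once the gaps are visible as NaN, they can be filled. The sketch below, on made-up data, forward-fills at most one missing day per gap; whether filling is appropriate depends on the data.

```python
import pandas as pd

idx = pd.to_datetime(['2011-01-02', '2011-01-05', '2011-01-07'])
irregular = pd.Series([1.0, 2.0, 3.0], index=idx)

# Insert the missing calendar days; the new rows are NaN
daily = irregular.resample('D').mean()

# Carry the last observation forward, but at most one day
daily_ffill = irregular.resample('D').ffill(limit=1)

print(daily)
print(daily_ffill)
```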

In [32]:
index = pd.date_range('2009-04-01', '2009-05-01')
index

DatetimeIndex(['2009-04-01', '2009-04-02', '2009-04-03', '2009-04-04',
               '2009-04-05', '2009-04-06', '2009-04-07', '2009-04-08',
               '2009-04-09', '2009-04-10', '2009-04-11', '2009-04-12',
               '2009-04-13', '2009-04-14', '2009-04-15', '2009-04-16',
               '2009-04-17', '2009-04-18', '2009-04-19', '2009-04-20',
               '2009-04-21', '2009-04-22', '2009-04-23', '2009-04-24',
               '2009-04-25', '2009-04-26', '2009-04-27', '2009-04-28',
               '2009-04-29', '2009-04-30', '2009-05-01'],
              dtype='datetime64[ns]', freq='D')

In [33]:
index2 = pd.date_range(start='1998-12-28', periods=365) # 'end' can also be used as an argument
# if you want an index containing only the last business day of each month
index2 = pd.date_range(start='1998-12-28', end='2020-12-28', freq='BM') # 'BM' for business month end

### Base time series frequencies 
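D -- Day: calendar daily  
B -- BusinessDay: business daily  
H -- Hour  
T/min -- Minute  
S -- Second  
L/ms -- Milli (millisecond)  
U -- Micro (microsecond)  
M -- MonthEnd: last calendar day of the month  
BM -- BusinessMonthEnd: last business day (weekday) of the month  
MS -- MonthBegin: first calendar day of the month  
BMS -- BusinessMonthBegin: first weekday of the month  
W-MON, W-TUE, ... -- Week: weekly on the given day of the week  
WOM-1MON, WOM-2MON, ... -- WeekOfMonth: weekly dates in the first, second, third, or fourth week of the month (e.g., WOM-3FRI for the third Friday of each month)  
Q-JAN, Q-FEB, ... -- QuarterEnd: quarterly dates anchored on the last calendar day of each month, for year ending in indicated month  
BQ-JAN, BQ-FEB, ... -- BusinessQuarterEnd: quarterly dates anchored on the last weekday of each month, for year ending in indicated month  
QS-JAN, QS-FEB, ... -- QuarterBegin: quarterly dates anchored on the first calendar day of each month, for year ending in indicated month  
BQS-JAN, BQS-FEB, ... -- BusinessQuarterBegin: quarterly dates anchored on the first weekday of each month, for year ending in indicated month  
A-JAN, A-FEB, ... -- YearEnd: annual dates anchored on the last calendar day of the given month  
BA-JAN, BA-FEB, ... -- BusinessYearEnd: annual dates anchored on the last weekday of the given month  
AS-JAN, AS-FEB, ... -- YearBegin: annual dates anchored on the first day of the given month  
BAS-JAN, BAS-FEB, ... -- BusinessYearBegin: annual dates anchored on the first weekday of the given month  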
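
A few of these frequencies are sketched below. Offset objects from `pd.offsets` are used for two of them instead of string aliases, on the assumption that this is less sensitive to alias spellings, which differ between pandas versions. The date ranges are arbitrary.

```python
import pandas as pd

# Last business day of each month, using an offset object
bm_end = pd.date_range('2000-01-01', '2000-04-30', freq=pd.offsets.BMonthEnd())

# Third Friday of each month via the week-of-month alias
third_fri = pd.date_range('2012-01-01', '2012-03-31', freq='WOM-3FRI')

# Multiples of a base frequency: every 4 hours
four_hourly = pd.date_range('2000-01-01', periods=4, freq=pd.offsets.Hour(4))

print(bm_end)
print(third_fri)
print(four_hourly)
```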


In [34]:
pd.date_range('2001-11-23 12:56:31', periods=5)

DatetimeIndex(['2001-11-23 12:56:31', '2001-11-24 12:56:31',
               '2001-11-25 12:56:31', '2001-11-26 12:56:31',
               '2001-11-27 12:56:31'],
              dtype='datetime64[ns]', freq='D')

In [35]:
pd.date_range('2001-11-23 12:56:31', periods=5, normalize=True) #should you want to normalise them to midnight 

DatetimeIndex(['2001-11-23', '2001-11-24', '2001-11-25', '2001-11-26',
               '2001-11-27'],
              dtype='datetime64[ns]', freq='D')

## Shifting Data
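Shifting moves data backward or forward through time. By default, shift moves the values and leaves the index unchanged; when freq is passed, the timestamps are shifted instead.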

In [36]:
dates = pd.date_range('1998-12-05', periods = 6)
ts = pd.Series(np.random.randn(6), index=dates)
ts

1998-12-05   -0.353879
1998-12-06   -0.463790
1998-12-07   -1.287616
1998-12-08   -0.930545
1998-12-09    1.623138
1998-12-10   -1.473908
Freq: D, dtype: float64

In [37]:
ts.shift(3)

1998-12-05         NaN
1998-12-06         NaN
1998-12-07         NaN
1998-12-08   -0.353879
1998-12-09   -0.463790
1998-12-10   -1.287616
Freq: D, dtype: float64

In [38]:
ts.shift(-2) 

1998-12-05   -1.287616
1998-12-06   -0.930545
1998-12-07    1.623138
1998-12-08   -1.473908
1998-12-09         NaN
1998-12-10         NaN
Freq: D, dtype: float64

In [39]:
ts.shift(1, freq='90S')

1998-12-05 00:01:30   -0.353879
1998-12-06 00:01:30   -0.463790
1998-12-07 00:01:30   -1.287616
1998-12-08 00:01:30   -0.930545
1998-12-09 00:01:30    1.623138
1998-12-10 00:01:30   -1.473908
Freq: D, dtype: float64
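
One common use of `shift` is computing period-over-period changes. The sketch below uses invented prices and an offset object (`pd.offsets.Second(90)`) in place of the `'90S'` string.

```python
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0],
                   index=pd.date_range('1998-12-05', periods=3))

# Without freq the values move and the index stays put,
# which makes day-over-day changes easy to compute
pct_change = prices / prices.shift(1) - 1

# With freq the timestamps move instead and no data is lost
moved = prices.shift(1, freq=pd.offsets.Second(90))

print(pct_change)
print(moved)
```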

# Time Zone Handling
Working with time zones is generally considered a pain. As a result, many time series users choose to work in UTC (coordinated universal time), the successor to GMT and the current international standard. Time zones are expressed as offsets from UTC. In Python, time zone information comes from the pytz library, which you can install with pip or conda. pandas wraps pytz's functionality, so you can ignore its API outside of the time zone names. Time zone names can be found interactively and in the docs.

In [40]:
import pytz 
pytz.common_timezones[-5:]

['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']

In [41]:
# To get a time zone object from pytz, use pytz.timezone
tz = pytz.timezone('America/New_York')
tz

<DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD>

## Time Zone Localization and Conversion
By default, time series in pandas are time zone naive. For example, consider the following time series:

In [42]:
rng = pd.date_range('3/9/2012 9:20', periods=7, freq='D')
rng

DatetimeIndex(['2012-03-09 09:20:00', '2012-03-10 09:20:00',
               '2012-03-11 09:20:00', '2012-03-12 09:20:00',
               '2012-03-13 09:20:00', '2012-03-14 09:20:00',
               '2012-03-15 09:20:00'],
              dtype='datetime64[ns]', freq='D')

In [43]:
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts

2012-03-09 09:20:00   -0.919565
2012-03-10 09:20:00    0.355124
2012-03-11 09:20:00    1.363503
2012-03-12 09:20:00    0.551122
2012-03-13 09:20:00   -1.313585
2012-03-14 09:20:00    0.313324
2012-03-15 09:20:00    0.228200
Freq: D, dtype: float64

In [44]:
print(ts.index.tz) # Shows the series does not have an associated timezone

None


In [45]:
pd.date_range('3/9/2012 9:30', periods=10, freq='D', tz='UTC') # a date range can be generated with a time zone set

DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
               '2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
               '2012-03-15 09:30:00+00:00', '2012-03-16 09:30:00+00:00',
               '2012-03-17 09:30:00+00:00', '2012-03-18 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')

In [46]:
ts_utc = ts.tz_localize('UTC') #localises time series to UTC 
ts_utc

2012-03-09 09:20:00+00:00   -0.919565
2012-03-10 09:20:00+00:00    0.355124
2012-03-11 09:20:00+00:00    1.363503
2012-03-12 09:20:00+00:00    0.551122
2012-03-13 09:20:00+00:00   -1.313585
2012-03-14 09:20:00+00:00    0.313324
2012-03-15 09:20:00+00:00    0.228200
Freq: D, dtype: float64

In [47]:
ts_utc.index

DatetimeIndex(['2012-03-09 09:20:00+00:00', '2012-03-10 09:20:00+00:00',
               '2012-03-11 09:20:00+00:00', '2012-03-12 09:20:00+00:00',
               '2012-03-13 09:20:00+00:00', '2012-03-14 09:20:00+00:00',
               '2012-03-15 09:20:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')

In [48]:
ts_utc.tz_convert('America/New_York') # converts it to another time zone

2012-03-09 04:20:00-05:00   -0.919565
2012-03-10 04:20:00-05:00    0.355124
2012-03-11 05:20:00-04:00    1.363503
2012-03-12 05:20:00-04:00    0.551122
2012-03-13 05:20:00-04:00   -1.313585
2012-03-14 05:20:00-04:00    0.313324
2012-03-15 05:20:00-04:00    0.228200
Freq: D, dtype: float64

In [49]:
ts_eastern = ts.tz_localize('America/New_York') #localise to America/New_York
ts_eastern.tz_convert('UTC') #Converts to UTC

2012-03-09 14:20:00+00:00   -0.919565
2012-03-10 14:20:00+00:00    0.355124
2012-03-11 13:20:00+00:00    1.363503
2012-03-12 13:20:00+00:00    0.551122
2012-03-13 13:20:00+00:00   -1.313585
2012-03-14 13:20:00+00:00    0.313324
2012-03-15 13:20:00+00:00    0.228200
Freq: D, dtype: float64

In [50]:
ts_eastern.tz_convert('Europe/Berlin')

2012-03-09 15:20:00+01:00   -0.919565
2012-03-10 15:20:00+01:00    0.355124
2012-03-11 14:20:00+01:00    1.363503
2012-03-12 14:20:00+01:00    0.551122
2012-03-13 14:20:00+01:00   -1.313585
2012-03-14 14:20:00+01:00    0.313324
2012-03-15 14:20:00+01:00    0.228200
Freq: D, dtype: float64
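
The one-hour jump in the outputs above comes from the US daylight saving change on March 11, 2012. A minimal sketch with two individual timestamps shows that conversion changes the wall-clock reading but not the instant.

```python
import pandas as pd

# Naive timestamps on either side of the US daylight saving change
before = pd.Timestamp('2012-03-09 09:20').tz_localize('America/New_York')
after = pd.Timestamp('2012-03-12 09:20').tz_localize('America/New_York')

# Converting changes the wall-clock reading, not the instant
before_utc = before.tz_convert('UTC')
after_utc = after.tz_convert('UTC')

print(before_utc.hour, after_utc.hour)   # 14 13
print(before == before_utc)              # True
```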

## Operations Between Different Time Zones
If two Series with different time zones are combined, the result will be in UTC, since the timestamps are stored under the hood in UTC.

In [52]:
rng = pd.date_range('3/7/2012 9:30', periods=10, freq='B')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts

2012-03-07 09:30:00   -1.008152
2012-03-08 09:30:00   -0.145881
2012-03-09 09:30:00    0.857205
2012-03-12 09:30:00    1.367619
2012-03-13 09:30:00    0.613715
2012-03-14 09:30:00   -0.425579
2012-03-15 09:30:00    0.361737
2012-03-16 09:30:00    1.274083
2012-03-19 09:30:00    0.506632
2012-03-20 09:30:00    0.572335
Freq: B, dtype: float64

In [60]:
ts1 = ts[:7].tz_localize('Europe/London')
ts2 = ts1[2:].tz_convert('Europe/Moscow')
result = ts1 + ts2
result.index # Resultant time zone is UTC 

DatetimeIndex(['2012-03-07 09:30:00+00:00', '2012-03-08 09:30:00+00:00',
               '2012-03-09 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
               '2012-03-15 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='B')
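
The same behaviour can be checked with fixed values. In this sketch (made-up data), only the instants present in both series produce a sum.

```python
import pandas as pd

rng = pd.date_range('2012-03-07 09:30', periods=4, freq='D')
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=rng)

london = s.tz_localize('Europe/London')
moscow = london[2:].tz_convert('Europe/Moscow')

# Alignment happens on the underlying instants; the result is in UTC
total = london + moscow

print(total.index.tz)
print(total)
```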

## Periods and Period Arithmetic 
Periods represent timespans, like days, months, quarters, or years. A period is represented by the Period class.

In [62]:
p = pd.Period(2007, freq='A-Dec')
p

Period('2007', 'A-DEC')

In [63]:
# adding or subtracting an integer shifts the period by that many units of its frequency
p + 5

Period('2012', 'A-DEC')

In [65]:
# regular ranges of periods can be constructed with period_range
rng = pd.period_range('2000-01-01', '2000-06-30', freq='M')
rng

PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]', freq='M')

In [67]:
pd.Series(np.random.randn(len(rng)), index=rng) # a PeriodIndex can be used as an axis index in any pandas data structure

2000-01   -1.637263
2000-02   -0.302098
2000-03   -0.671930
2000-04    1.017570
2000-05   -0.111502
2000-06    1.055387
Freq: M, dtype: float64

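## Period Frequency Conversion
Periods and PeriodIndex objects can be converted to another frequency with asfreq. For example, suppose you have an annual period and wish to convert it into a monthly period at the start or end of the year.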


In [68]:
p = pd.Period('2007', freq='A-DEC')
p

Period('2007', 'A-DEC')

In [69]:
p.asfreq('M', how='start')

Period('2007-01', 'M')

In [70]:
p.asfreq('M', how='end')

Period('2007-12', 'M')

PeriodIndex objects or time series can be similarly converted with the same semantics.

In [75]:
rng = pd.period_range('2006','2012', freq='A-DEC')
p = pd.Series(np.random.randn(len(rng)), index=rng)
p

2006   -1.675085
2007    0.848519
2008    0.393645
2009   -0.761129
2010    0.669436
2011   -0.603207
2012   -0.737702
Freq: A-DEC, dtype: float64

In [79]:
p.asfreq('B', how='end') # last business day of each year

2006-12-29   -1.675085
2007-12-31    0.848519
2008-12-31    0.393645
2009-12-31   -0.761129
2010-12-31    0.669436
2011-12-30   -0.603207
2012-12-31   -0.737702
Freq: B, dtype: float64
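
The `how` argument matters only when moving to a higher frequency. The following sketch uses a monthly period (an illustrative choice) and converts it in both directions.

```python
import pandas as pd

feb = pd.Period('2007-02', freq='M')

# Going to a higher frequency requires choosing the start or end sub-period
first_day = feb.asfreq('D', how='start')
last_day = feb.asfreq('D', how='end')

# Going to a lower frequency picks the enclosing period
quarter = pd.Period('2007-08', freq='M').asfreq('Q')

print(first_day, last_day, quarter)
```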

## Converting Timestamps to Periods (and Back)
Series and DataFrame objects indexed by timestamps can be converted to periods with the to_period method

In [82]:
rng = pd.date_range('2000-01-01', periods=3, freq='M')
ts = pd.Series(np.random.randn(3), index=rng)
ts

2000-01-31    1.189275
2000-02-29    0.112156
2000-03-31    0.228564
Freq: M, dtype: float64

In [84]:
pts = ts.to_period()
pts

2000-01    1.189275
2000-02    0.112156
2000-03    0.228564
Freq: M, dtype: float64

Since periods refer to non-overlapping timespans, a timestamp can only belong to a single period for a given frequency. There is, however, no problem with having duplicate periods in the result.

In [86]:
rng = pd.date_range('1/29/2000', periods=6, freq='D')
ts2 = pd.Series(np.random.randn(6), index=rng)
ts2

2000-01-29   -1.335982
2000-01-30   -0.838258
2000-01-31    0.125719
2000-02-01    0.294786
2000-02-02    0.263647
2000-02-03   -0.700288
Freq: D, dtype: float64

In [89]:
pts2 = ts2.to_period('M')
pts2

2000-01   -1.335982
2000-01   -0.838258
2000-01    0.125719
2000-02    0.294786
2000-02    0.263647
2000-02   -0.700288
Freq: M, dtype: float64

In [90]:
# To convert back to timestamp use to_timestamp()

In [92]:
ts3 = pts2.to_timestamp(how='end')
ts3

2000-01-31 23:59:59.999999999   -1.335982
2000-01-31 23:59:59.999999999   -0.838258
2000-01-31 23:59:59.999999999    0.125719
2000-02-29 23:59:59.999999999    0.294786
2000-02-29 23:59:59.999999999    0.263647
2000-02-29 23:59:59.999999999   -0.700288
dtype: float64
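
If duplicate periods are not wanted, one option is to aggregate within each period before converting back. The sketch below uses invented values; note that `to_timestamp` defaults to the start of each period.

```python
import pandas as pd

daily = pd.Series([1.0, 2.0, 3.0, 4.0],
                  index=pd.date_range('2000-01-30', periods=4, freq='D'))

# Several days fall into the same month, so the period index has duplicates
monthly = daily.to_period('M')
print(monthly.index.is_unique)

# Aggregate within each period to get one value per month
per_month = monthly.groupby(level=0).sum()

# Convert back; each period maps to its first instant by default
stamps = per_month.to_timestamp()
print(stamps)
```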

# Resampling and Frequency Conversion
Resampling refers to the process of converting a time series from one frequency to another. Downsampling aggregates higher-frequency data to a lower frequency, while upsampling converts lower-frequency data to a higher frequency. The resample method is the main workhorse for all frequency conversion.

In [99]:
rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts.head(8)

2000-01-01    0.873980
2000-01-02    1.106158
2000-01-03    0.813009
2000-01-04    0.623379
2000-01-05   -0.722063
2000-01-06    0.444274
2000-01-07    1.054500
2000-01-08   -1.209713
Freq: D, dtype: float64

In [102]:
ts.resample('M').mean()

2000-01-31    0.081158
2000-02-29   -0.239076
2000-03-31   -0.011235
2000-04-30   -0.059235
Freq: M, dtype: float64

In [104]:
ts.resample('M', kind='period').mean()

2000-01    0.081158
2000-02   -0.239076
2000-03   -0.011235
2000-04   -0.059235
Freq: M, dtype: float64
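
Downsampling is essentially a grouping operation. As a sketch of that idea (not how pandas implements `resample` internally), the monthly mean can be reproduced by grouping on the month each day falls in, here with deterministic values.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(100.0),
               index=pd.date_range('2000-01-01', periods=100, freq='D'))

# Label each day with the month it falls in, then average within each month
monthly_mean = ts.groupby(ts.index.to_period('M')).mean()
print(monthly_mean)
```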

## Resample method arguments  
freq -- String or DateOffset indicating the desired resampled frequency  
axis -- Axis to resample on; default=0  
fill_method -- How to interpolate when upsampling, as in 'ffill' or 'bfill' (recent pandas versions expect a call to .ffill() or .bfill() on the resampler instead)  
closed -- In downsampling, which end of each interval is closed (inclusive): 'right' or 'left'  
label -- In downsampling, how to label the aggregated result: with the 'right' or 'left' bin edge  
limit -- When forward or backward filling, the maximum number of periods to fill  
kind -- Aggregate to periods ('period') or timestamps ('timestamp')  
convention -- When resampling periods, the convention ('start' or 'end') for converting the low-frequency period to high frequency; defaults to 'start'  
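
The effect of `closed` and `label` is easiest to see on one-minute data aggregated into five-minute bins. This is a small sketch with made-up values.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

# Default for minute bins: left edge closed, labeled by the left edge
left = ts.resample('5min').sum()

# Right edge closed, labeled by the right edge
right = ts.resample('5min', closed='right', label='right').sum()

print(left)
print(right)
```

With the right edge closed, the 00:00 observation falls into a bin of its own that ends at 00:00, so the result has one more row.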