# Time Series 
How you mark and refer to time series data depends on the application. You may have one of the following:
* Timestamps: specific instants in time.
* Fixed periods, such as the month January 2007 or the full year 2010.
* Intervals of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals.
* Experiment or elapsed time: each timestamp is a measure of time relative to a particular start time (e.g., the diameter of a cookie measured each second since it was placed in the oven).
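Each of the four kinds of time data above has a roughly matching pandas type. The sketch below assumes a recent pandas. It uses `pd.Timestamp` for an instant, `pd.Period` for a fixed period, `pd.Interval` for a start/end span and `pd.Timedelta` for elapsed time. The specific dates and the variable names (`instant`, `span`, `oven_start`, ...) are only illustrative.

```python
import pandas as pd

# Timestamp: a specific instant in time
instant = pd.Timestamp('2011-01-02 09:30')

# Fixed periods: the month of January 2007 and the full year 2010
month = pd.Period('2007-01', freq='M')
year = pd.Period('2010', freq='Y')

# Interval: explicit start and end timestamps (closed on the right by default)
span = pd.Interval(pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-31'))

# Elapsed time relative to a start, e.g. time since the cookie entered the oven
oven_start = pd.Timestamp('2020-08-04 09:00')
elapsed = pd.Timestamp('2020-08-04 09:00:30') - oven_start

print(instant, month, year, span, elapsed, sep='\n')
# A period covers a whole span of time, from start_time to end_time
print(month.start_time, month.end_time)
```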

In [1]:
import pandas as pd

In [2]:

from datetime import datetime
now = datetime.now()
now

datetime.datetime(2020, 8, 4, 9, 30, 29, 933331)

In [3]:
now.year, now.month, now.day

(2020, 8, 4)

In [4]:
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
delta

datetime.timedelta(926, 56700)

In [5]:
from datetime import timedelta
start = datetime(2011, 1, 7)
start + 2 * timedelta(12)

datetime.datetime(2011, 1, 31, 0, 0)
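Internally, a `timedelta` stores days, seconds and microseconds. That is why `delta` above displays as `(926, 56700)`: 926 days plus 56,700 seconds, which is 15 hours 45 minutes. The following sketch reads those components back and repeats the scaling example.

```python
from datetime import datetime, timedelta

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

print(delta.days)             # whole days
print(delta.seconds)          # remaining seconds within the last day
print(delta.total_seconds())  # entire duration expressed in seconds

# Multiplying a timedelta scales it: 2 * 12 days = 24 days
start = datetime(2011, 1, 7)
print(start + 2 * timedelta(days=12))
```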

## Converting Between String and Datetime

In [6]:
stamp = datetime(1998, 12, 28)
str(stamp)

'1998-12-28 00:00:00'

In [7]:
stamp.strftime('%Y-%m-%d')

'1998-12-28'

## Datetime format specification

| Code | Meaning |
|---|---|
| `%Y` | Four-digit year |
| `%y` | Two-digit year |
| `%m` | Two-digit month [01, 12] |
| `%d` | Two-digit day [01, 31] |
| `%H` | Hour (24-hour clock) [00, 23] |
| `%I` | Hour (12-hour clock) [01, 12] |
| `%M` | Two-digit minute [00, 59] |
| `%S` | Second [00, 61] (60 allows for a leap second; 61 is kept for historical reasons) |
| `%w` | Weekday as integer [0 (Sunday), 6] |
| `%U` | Week number of the year [00, 53]; Sunday is considered the first day of the week, and days before the first Sunday of the year are "week 0" |
| `%W` | Week number of the year [00, 53]; Monday is considered the first day of the week, and days before the first Monday of the year are "week 0" |
| `%z` | UTC time zone offset as +HHMM or -HHMM; empty if time zone naive |
| `%F` | Shortcut for `%Y-%m-%d` (platform-dependent; may not be supported by `strptime`) |
| `%D` | Shortcut for `%m/%d/%y` (platform-dependent; may not be supported by `strptime`) |

You can use the same format codes to convert strings to dates with `datetime.strptime`:

In [8]:
value = '2011-01-03'
datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)
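The sketch below exercises several codes from the table in both directions: `strftime` formats a datetime as a string and `strptime` parses a string back. The timestamp is an arbitrary example.

```python
from datetime import datetime

stamp = datetime(2011, 1, 3, 15, 30, 5)

# Formatting: datetime -> string
print(stamp.strftime('%Y-%m-%d %H:%M:%S'))  # 24-hour clock
print(stamp.strftime('%I:%M'))              # 12-hour clock
print(stamp.strftime('%m/%d/%y'))           # two-digit year
print(stamp.strftime('%w'))                 # weekday as integer, 0 = Sunday

# Parsing: string -> datetime, using the matching format
parsed = datetime.strptime('2011-01-03 15:30:05', '%Y-%m-%d %H:%M:%S')
short = datetime.strptime('1/3/11', '%m/%d/%y')
print(parsed == stamp, short)
```

The format string must match the input exactly. A mismatch such as parsing `'1/3/11'` with `'%Y-%m-%d'` raises a `ValueError`.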

In [12]:
import numpy as np
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7), datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.random.randn(6), index=dates)
ts

2011-01-02    0.495205
2011-01-05   -0.269290
2011-01-07    0.856273
2011-01-08   -0.707436
2011-01-10    0.942841
2011-01-12   -0.609319
dtype: float64

In [13]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [14]:
ts + ts[::2]  # ts[::2] selects every other point; addition aligns on the index, so those points are doubled and the rest become NaN

2011-01-02    0.990409
2011-01-05         NaN
2011-01-07    1.712547
2011-01-08         NaN
2011-01-10    1.885682
2011-01-12         NaN
dtype: float64
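The NaN values appear because arithmetic between two Series aligns on their index labels. A date present in only one operand has nothing to be added to. The sketch below uses fixed values instead of random ones to make the alignment visible. It also shows `Series.add` with `fill_value=0`, which treats the missing side as 0. The name `every_other` is illustrative.

```python
import pandas as pd
from datetime import datetime

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7), datetime(2011, 1, 8)]
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)

every_other = ts[::2]                       # 2011-01-02 and 2011-01-07 only
aligned = ts + every_other                  # unmatched dates become NaN
filled = ts.add(every_other, fill_value=0)  # unmatched dates keep their ts value

print(aligned)
print(filled)
```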

In [16]:
stamp = ts.index[0]
stamp

Timestamp('2011-01-02 00:00:00')

In [17]:
ts[stamp]

0.4952045210405504

In [18]:
ts['2011/01/02'] # can pass a date as a string 

0.4952045210405504
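`pd.Timestamp` is a subclass of Python's `datetime`, so a plain `datetime` also works as a lookup key. The following sketch uses fixed values and assumes, as above, that the index holds midnight timestamps. It looks up the same label in several equivalent ways.

```python
import pandas as pd
from datetime import datetime

ts = pd.Series([0.5, -0.3], index=[datetime(2011, 1, 2), datetime(2011, 1, 5)])
stamp = ts.index[0]

print(type(stamp).__name__, isinstance(stamp, datetime))
# Equivalent ways to look up the same label
print(ts[stamp], ts[datetime(2011, 1, 2)], ts['2011-01-02'], ts['1/2/2011'])
```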

In [23]:
# A longer series: 1000 daily observations starting on 2000-01-01
longer_ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
longer_ts.tail()

2002-09-22   -0.535371
2002-09-23   -1.157224
2002-09-24   -0.263808
2002-09-25    1.188756
2002-09-26    2.559750
Freq: D, dtype: float64

In [28]:
longer_ts['2001'].tail()

2001-12-27    1.400298
2001-12-28    0.590972
2001-12-29    0.281058
2001-12-30    1.065767
2001-12-31   -0.380578
Freq: D, dtype: float64

In [30]:
longer_ts['2002-05'].tail()

2002-05-27   -1.366162
2002-05-28    1.653971
2002-05-29    0.288634
2002-05-30    1.218602
2002-05-31   -0.278151
Freq: D, dtype: float64
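A year string or a year-month string selects every observation within that period. This is equivalent to an explicit slice that includes both endpoints. The sketch below substitutes deterministic values (`np.arange`) for the random numbers and uses `.loc` to make the label-based selection explicit.

```python
import numpy as np
import pandas as pd

longer_ts = pd.Series(np.arange(1000.0), index=pd.date_range('1/1/2000', periods=1000))

year_2001 = longer_ts.loc['2001']
may_2002 = longer_ts.loc['2002-05']

print(len(year_2001), len(may_2002))  # 365 days in 2001, 31 days in May
# The same selection written as an explicit slice (both endpoints included)
print(year_2001.equals(longer_ts.loc['2001-01-01':'2001-12-31']))
```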

In [32]:
ts[datetime(2009, 5, 5):]

2011-01-02    0.495205
2011-01-05   -0.269290
2011-01-07    0.856273
2011-01-08   -0.707436
2011-01-10    0.942841
2011-01-12   -0.609319
dtype: float64

In [47]:
ts['1/6/2011':'1/11/2011']

2011-01-07    0.856273
2011-01-08   -0.707436
2011-01-10    0.942841
dtype: float64

In [48]:
ts.truncate(after='1/9/2011')

2011-01-02    0.495205
2011-01-05   -0.269290
2011-01-07    0.856273
2011-01-08   -0.707436
dtype: float64
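On a sorted DatetimeIndex, a label slice does not need its endpoints to exist in the index. `truncate(after=...)` keeps the same rows as an open-ended slice up to that date. The sketch below uses fixed values, and the name `window` is illustrative.

```python
import pandas as pd
from datetime import datetime

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# Neither 2011-01-06 nor 2011-01-11 is in the index; the slice still works
window = ts.loc['1/6/2011':'1/11/2011']
print(window)

# truncate(after=...) drops everything later than the given date
print(ts.truncate(after='1/9/2011').equals(ts.loc[:'1/9/2011']))
```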

## Time Series with Duplicate Indices

In [53]:
dates = pd.DatetimeIndex(['1/1/2002', '1/2/2002', '1/3/2002', '1/2/2002', '1/2/2002' ])
dup_ts = pd.Series(np.random.randn(5), index=dates)
dup_ts

2002-01-01    0.338043
2002-01-02    0.420432
2002-01-03    1.040398
2002-01-02    0.015719
2002-01-02    0.919147
dtype: float64

In [55]:
dup_ts.index.unique()

DatetimeIndex(['2002-01-01', '2002-01-02', '2002-01-03'], dtype='datetime64[ns]', freq=None)
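`is_unique` reports whether the index contains duplicates. Indexing also behaves differently for duplicated labels: a unique label returns a scalar, while a duplicated label returns a Series. The sketch below replaces the random values with fixed ones.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2002', '1/2/2002', '1/3/2002', '1/2/2002', '1/2/2002'])
dup_ts = pd.Series(np.arange(5.0), index=dates)

print(dup_ts.index.is_unique)                  # False
print(dup_ts.loc[pd.Timestamp('2002-01-03')])  # unique label -> scalar
print(dup_ts.loc[pd.Timestamp('2002-01-02')])  # duplicated label -> Series of 3 rows
```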

In [59]:
# Aggregate values that share a timestamp by grouping on the index (level 0)
non_unique = dup_ts.groupby(level=0)
non_unique.mean()

2002-01-01    0.338043
2002-01-02    0.451766
2002-01-03    1.040398
dtype: float64

In [60]:
non_unique.count()

2002-01-01    1
2002-01-02    3
2002-01-03    1
dtype: int64
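`agg` computes several aggregations in a single pass. To keep just one observation per timestamp instead, `Index.duplicated` provides a mask. The sketch below uses fixed values so the results are easy to check, and the names `summary` and `first_only` are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['1/1/2002', '1/2/2002', '1/3/2002', '1/2/2002', '1/2/2002'])
dup_ts = pd.Series(np.arange(5.0), index=dates)

# Mean and count per timestamp in a single DataFrame
summary = dup_ts.groupby(level=0).agg(['mean', 'count'])
print(summary)

# Keep only the first observation for each timestamp
first_only = dup_ts[~dup_ts.index.duplicated(keep='first')]
print(first_only)
```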

## Date Ranges, Frequencies, and Shifting
Often your time series will have irregularly spaced dates. For most purposes this is fine, but if you want equally spaced dates at a fixed frequency, you can resample the series:

In [61]:
ts

2011-01-02    0.495205
2011-01-05   -0.269290
2011-01-07    0.856273
2011-01-08   -0.707436
2011-01-10    0.942841
2011-01-12   -0.609319
dtype: float64

In [66]:
resampler = ts.resample('D') #D for daily
resampler.mean()

2011-01-02    0.495205
2011-01-03         NaN
2011-01-04         NaN
2011-01-05   -0.269290
2011-01-06         NaN
2011-01-07    0.856273
2011-01-08   -0.707436
2011-01-09         NaN
2011-01-10    0.942841
2011-01-11         NaN
2011-01-12   -0.609319
Freq: D, dtype: float64
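When each day has at most one observation, as here, `asfreq('D')` produces the same regular daily index as `resample('D').mean()`, with NaN in the gaps. A resampler can also fill the gaps, for example by carrying the last value forward with `ffill`. The sketch below uses fixed values in place of the random ones.

```python
import pandas as pd
from datetime import datetime

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

daily_mean = ts.resample('D').mean()    # NaN where a day has no observation
daily_asfreq = ts.asfreq('D')           # same daily index, same NaN gaps
daily_ffill = ts.resample('D').ffill()  # gaps filled with the previous value

print(daily_ffill)
```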

In [68]:
index = pd.date_range('2009-04-01', '2009-05-01')
index

DatetimeIndex(['2009-04-01', '2009-04-02', '2009-04-03', '2009-04-04',
               '2009-04-05', '2009-04-06', '2009-04-07', '2009-04-08',
               '2009-04-09', '2009-04-10', '2009-04-11', '2009-04-12',
               '2009-04-13', '2009-04-14', '2009-04-15', '2009-04-16',
               '2009-04-17', '2009-04-18', '2009-04-19', '2009-04-20',
               '2009-04-21', '2009-04-22', '2009-04-23', '2009-04-24',
               '2009-04-25', '2009-04-26', '2009-04-27', '2009-04-28',
               '2009-04-29', '2009-04-30', '2009-05-01'],
              dtype='datetime64[ns]', freq='D')
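`date_range` can also take a start date and a number of periods instead of an end date. The `freq` argument controls the spacing, and `'B'` (business days) skips Saturdays and Sundays. A short sketch with illustrative variable names:

```python
import pandas as pd

# Start and end dates are both included in the result
full = pd.date_range('2009-04-01', '2009-05-01')

# Start plus a count of periods, with the default daily frequency
first_five = pd.date_range('2009-04-01', periods=5)

# Business-day frequency skips weekends
bdays = pd.date_range('2009-04-01', periods=5, freq='B')

print(len(full))
print(first_five)
print(bdays)
```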

In [None]:
index2 = 