# UNIT4: time series
* Date and Time Data Types and Tools

# Ways to refer to time series data
* Timestamps, specific instants in time
* Fixed periods, such as the month January 2007 or the full year 2010
* Intervals of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals
* Experiment or elapsed time relative to a particular start time
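As a sketch, each of the four notions above has a pandas counterpart: Timestamp, Period, Interval and Timedelta. The particular dates are arbitrary examples, not data from this notebook.

```python
import pandas as pd

# Timestamp: a specific instant in time
instant = pd.Timestamp("2007-01-15 09:30")

# Period: a fixed span, here the month January 2007
jan_2007 = pd.Period("2007-01", freq="M")

# Interval: delimited by a start and an end timestamp (closed on the right by default)
year_2007 = pd.Interval(pd.Timestamp("2007-01-01"), pd.Timestamp("2007-12-31"))

# Elapsed time relative to a chosen start time: a Timedelta
t0 = pd.Timestamp("2007-01-01")
elapsed = instant - t0

print(instant in year_2007)                # True
print(instant.to_period("M") == jan_2007)  # True
print(jan_2007.start_time)                 # 2007-01-01 00:00:00
print(elapsed)                             # 14 days 09:30:00
```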

In [1]:
import matplotlib.pyplot as plt 
import pandas as pd 
import numpy as np

## datetime(year, month, day) stores both the date and the time, down to the microsecond

In [6]:
from datetime import datetime
now = datetime.now()
now
now.year, now.month, now.day

(2022, 10, 20)

## Subtracting one datetime from another gives a timedelta
* A timedelta represents the difference between two datetime objects and stores it as days, seconds and microseconds

In [3]:
deltadiff = datetime(2021, 10, 1) - datetime(2021, 1, 1, 0, 0)
deltadiff


datetime.timedelta(days=273)

In [4]:
deltadiff.days

273

In [5]:
deltadiff.seconds

0
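`.seconds` is only the part left over after the whole days, not the total duration; it is 0 above because the difference is exactly 273 days. A small sketch with a difference that is not a whole number of days (the dates are arbitrary); `total_seconds()` gives the full length:

```python
from datetime import datetime, timedelta

# A difference of 1 day and 2 hours
delta = datetime(2021, 1, 2, 2, 0) - datetime(2021, 1, 1, 0, 0)

print(delta.days)             # 1: whole days only
print(delta.seconds)          # 7200: leftover seconds within the partial day
print(delta.total_seconds())  # 93600.0: the whole duration in seconds
print(delta == timedelta(days=1, hours=2))  # True
```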

## timedelta.max: the largest timedelta that can be represented

In [6]:
deltadiff.max

datetime.timedelta(days=999999999, seconds=86399, microseconds=999999)

## Adding a timedelta to a datetime gives a new datetime. A bare integer argument, as in timedelta(3), is interpreted as days

In [4]:
from datetime import datetime, timedelta
start = datetime(2021, 10, 6)
start + timedelta(3)  # timedelta(3) means 3 days

datetime.datetime(2021, 10, 9, 0, 0)

In [8]:
start - 2 * timedelta(3)

datetime.datetime(2021, 9, 30, 0, 0)

## Types in the datetime module:
* date: Stores calendar date (year, month, day) using the Gregorian calendar
* time: Stores time of day as hours, minutes, seconds, and microseconds
* datetime: Stores both date and time
* timedelta: Represents the difference between two datetime values (as days, seconds, and microseconds)
* tzinfo: Base type for storing time zone information
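A short sketch of these types working together; the specific date and time are arbitrary. `timezone.utc` is the standard library's ready-made fixed-offset tzinfo subclass.

```python
from datetime import date, time, datetime, timedelta, timezone

d = date(2021, 10, 6)             # calendar date only
t = time(14, 30, 15)              # time of day only
dt = datetime.combine(d, t)       # date and time together
later = dt + timedelta(hours=10)  # crosses midnight into the next day

# Attach a UTC tzinfo to make the datetime time-zone aware
utc_dt = dt.replace(tzinfo=timezone.utc)

print(dt)                  # 2021-10-06 14:30:15
print(later)               # 2021-10-07 00:30:15
print(utc_dt.isoformat())  # 2021-10-06T14:30:15+00:00
```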

### Conversion Between String and Datetime
* See Table 11-2, Datetime format specification (ISO C89 compatible)


In [9]:
initial = datetime(2021, 10, 6)
str(initial)


'2021-10-06 00:00:00'

## strftime: format a datetime as a string in a particular format

In [11]:
initial.strftime('%Y-%m-%d')

'2021-10-06'

## Shortcut and name codes for dates (see Table 11-3, Locale-specific date formatting)
* %F: full date, shortcut for %Y-%m-%d (e.g., 2012-04-18)
* %D: shortcut for %m/%d/%y (e.g., 04/18/12)
* %A: full weekday name (e.g., Wednesday)
* %B: full month name (e.g., October)


In [10]:
initial.strftime('%D')

'10/06/21'

In [11]:
initial.strftime('%A of  %B, %Y')

'Wednesday of  October, 2021'

In [12]:
initial.strftime('%F')

'2021-10-06'

## %w: weekday as a number (Sunday is 0, Saturday is 6)

In [13]:
initial.strftime('%w')

'3'

## strptime: parse a string with a known format into a datetime

In [14]:
value = '2021-10-06'
datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2021, 10, 6, 0, 0)
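strftime and strptime are inverses when they share the same format string. A round-trip sketch; the format string and date are arbitrary choices:

```python
from datetime import datetime

fmt = "%d %B %Y, %H:%M"  # an example format, e.g. '06 October 2021, 14:05'
original = datetime(2021, 10, 6, 14, 5)

text = original.strftime(fmt)          # datetime -> string
parsed = datetime.strptime(text, fmt)  # string -> datetime
print(text)
print(parsed == original)  # True: nothing is lost at minute precision

# strptime raises ValueError when the string does not match the format
try:
    datetime.strptime("2021/10/06", "%Y-%m-%d")
except ValueError as exc:
    print("format mismatch:", exc)
```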

## Parsing several date strings at once with a list comprehension

In [15]:
datestrs = ['06/10/2021', '9/10/2021']
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]

[datetime.datetime(2021, 6, 10, 0, 0), datetime.datetime(2021, 9, 10, 0, 0)]

## datetime.strptime needs an explicit format. The third-party dateutil package removes this requirement: dateutil.parser.parse infers many common formats

In [16]:
from dateutil.parser import parse
parse('2011-1-03')

datetime.datetime(2011, 1, 3, 0, 0)

### In many international locales the day comes before the month, so pass dayfirst=True to parse such strings

In [17]:
D=parse('15/12/2011',dayfirst=True)
D.strftime('%d')

'15'

In [18]:
parse('6/14/2011')

datetime.datetime(2011, 6, 14, 0, 0)

In [19]:
parse('Jan 31, 1997 10:45 PM')

datetime.datetime(1997, 1, 31, 22, 45)
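To see what dayfirst changes, a sketch with a string that is valid both ways (the date is arbitrary):

```python
from dateutil.parser import parse

ambiguous = "01/02/2011"
month_first = parse(ambiguous)               # default: month first -> 2 January 2011
day_first = parse(ambiguous, dayfirst=True)  # day first -> 1 February 2011
print(month_first, day_first)
```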

## dateutil.parser is a useful but imperfect tool
* It may recognize strings as dates that you would not consider dates; for example, '42' is parsed as the year 2042 with today's calendar date.

In [22]:
parse('20')

datetime.datetime(2022, 10, 20, 0, 0)

# pandas.to_datetime: parse many date strings at once into a DatetimeIndex that can be used as an index

In [23]:
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00']
pd.to_datetime(datestrs)

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)

## Missing values: None becomes NaT (Not a Time)
* NaT is the datetime counterpart of NaN (Not a Number)

In [24]:
idx = pd.to_datetime(datestrs + [None])
idx

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)

In [25]:
pd.isnull(idx)

array([False, False,  True])

## Elements of a DatetimeIndex are accessed by position, as with a Python list

In [26]:
idx[2]

NaT
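pd.to_datetime raises an error on strings it cannot parse unless told otherwise. A sketch with one made-up invalid entry: errors='coerce' turns it into NaT, and an explicit format removes month/day guessing for non-ISO strings.

```python
import pandas as pd

raw = ["2011-07-06", "2011-08-06", "not a date"]

# errors="coerce" converts unparseable strings to NaT instead of raising
coerced = pd.to_datetime(raw, errors="coerce")
print(coerced)
print(coerced.isna())  # [False False  True]

# An explicit day/month/year format: no guessing about the field order
eu = pd.to_datetime(["06/07/2011", "13/07/2011"], format="%d/%m/%Y")
print(eu)  # 2011-07-06 and 2011-07-13
```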

## Time Series Basics

In [27]:
#from datetime import datetime
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]
dates


[datetime.datetime(2011, 1, 2, 0, 0),
 datetime.datetime(2011, 1, 5, 0, 0),
 datetime.datetime(2011, 1, 7, 0, 0),
 datetime.datetime(2011, 1, 8, 0, 0),
 datetime.datetime(2011, 1, 10, 0, 0),
 datetime.datetime(2011, 1, 12, 0, 0)]

In [8]:
dates1=[datetime(2021, 10, x) for x in range(1,15)]
dates1

[datetime.datetime(2021, 10, 1, 0, 0),
 datetime.datetime(2021, 10, 2, 0, 0),
 datetime.datetime(2021, 10, 3, 0, 0),
 datetime.datetime(2021, 10, 4, 0, 0),
 datetime.datetime(2021, 10, 5, 0, 0),
 datetime.datetime(2021, 10, 6, 0, 0),
 datetime.datetime(2021, 10, 7, 0, 0),
 datetime.datetime(2021, 10, 8, 0, 0),
 datetime.datetime(2021, 10, 9, 0, 0),
 datetime.datetime(2021, 10, 10, 0, 0),
 datetime.datetime(2021, 10, 11, 0, 0),
 datetime.datetime(2021, 10, 12, 0, 0),
 datetime.datetime(2021, 10, 13, 0, 0),
 datetime.datetime(2021, 10, 14, 0, 0)]

## Generating a time series indexed by date

In [9]:
timeseries1 = pd.Series(np.arange(2,29,2), index=dates1)
timeseries1

2021-10-01     2
2021-10-02     4
2021-10-03     6
2021-10-04     8
2021-10-05    10
2021-10-06    12
2021-10-07    14
2021-10-08    16
2021-10-09    18
2021-10-10    20
2021-10-11    22
2021-10-12    24
2021-10-13    26
2021-10-14    28
dtype: int32

In [41]:
timeseries1.index

DatetimeIndex(['2021-10-01', '2021-10-02', '2021-10-03', '2021-10-04',
               '2021-10-05', '2021-10-06', '2021-10-07', '2021-10-08',
               '2021-10-09', '2021-10-10', '2021-10-11', '2021-10-12',
               '2021-10-13', '2021-10-14'],
              dtype='datetime64[ns]', freq=None)

In [30]:
timeseries1[::2]

2021-10-01     2
2021-10-03     6
2021-10-05    10
2021-10-07    14
2021-10-09    18
2021-10-11    22
2021-10-13    26
dtype: int32

## Adding two time series: as with any Series, values are aligned on the date index automatically, and dates missing from either side give NaN

In [32]:
timeseries1 + timeseries1[::2]

2021-10-01     4.0
2021-10-02     NaN
2021-10-03    12.0
2021-10-04     NaN
2021-10-05    20.0
2021-10-06     NaN
2021-10-07    28.0
2021-10-08     NaN
2021-10-09    36.0
2021-10-10     NaN
2021-10-11    44.0
2021-10-12     NaN
2021-10-13    52.0
2021-10-14     NaN
dtype: float64
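If the NaN entries are unwanted, Series.add accepts fill_value, which stands in for a value missing on one side. A small sketch on a shorter series built the same way as timeseries1:

```python
import numpy as np
import pandas as pd

days = pd.date_range("2021-10-01", periods=6)
full = pd.Series(np.arange(2, 13, 2), index=days)  # values 2, 4, ..., 12
every_other = full[::2]                            # 10-01, 10-03, 10-05 only

# "+" aligns on the index and gives NaN where a date is missing on one side
print(full + every_other)

# .add with fill_value=0 treats a missing entry as 0 instead
print(full.add(every_other, fill_value=0))
```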

In [152]:
timeseries1.index[0]

Timestamp('2021-10-01 00:00:00')

# pandas stores timestamps using NumPy’s datetime64 data type at the nanosecond resolution

In [33]:
timeseries1.index.dtype

dtype('<M8[ns]')

# Indexing, Selection, Subsetting

In [34]:
timeseries1['10/10/2021']


20

In [38]:
timeseries1['13/10/2021']

26

## Alternative shortcut: a 'yyyymmdd' string

In [36]:
timeseries1['20211013']

26
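A string like '13/10/2021' only works because 13 cannot be a month; for '10/10/2021' the order makes no difference. Recent pandas versions may also warn when they have to guess. A sketch of unambiguous alternatives on a series equivalent to timeseries1:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(2, 29, 2), index=pd.date_range("2021-10-01", periods=14))

# ISO 8601 strings (year-month-day) are unambiguous
print(s["2021-10-13"])

# An explicit Timestamp avoids string parsing altogether
print(s[pd.Timestamp(2021, 10, 13)])

# A .loc date slice includes both endpoints
print(s.loc["2021-10-12":"2021-10-14"])
```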

# pd.date_range: create a DatetimeIndex from a start date (here written month/day/year) and a number of periods

In [53]:
pd.date_range('1/10/2021', periods=100)

DatetimeIndex(['2021-01-10', '2021-01-11', '2021-01-12', '2021-01-13',
               '2021-01-14', '2021-01-15', '2021-01-16', '2021-01-17',
               '2021-01-18', '2021-01-19', '2021-01-20', '2021-01-21',
               '2021-01-22', '2021-01-23', '2021-01-24', '2021-01-25',
               '2021-01-26', '2021-01-27', '2021-01-28', '2021-01-29',
               '2021-01-30', '2021-01-31', '2021-02-01', '2021-02-02',
               '2021-02-03', '2021-02-04', '2021-02-05', '2021-02-06',
               '2021-02-07', '2021-02-08', '2021-02-09', '2021-02-10',
               '2021-02-11', '2021-02-12', '2021-02-13', '2021-02-14',
               '2021-02-15', '2021-02-16', '2021-02-17', '2021-02-18',
               '2021-02-19', '2021-02-20', '2021-02-21', '2021-02-22',
               '2021-02-23', '2021-02-24', '2021-02-25', '2021-02-26',
               '2021-02-27', '2021-02-28', '2021-03-01', '2021-03-02',
               '2021-03-03', '2021-03-04', '2021-03-05', '2021-03-06',
      

## Creating a longer time series with random values

In [10]:
longerseries = pd.Series(np.random.randn(100),
                      index=pd.date_range('10-1-2021', periods=100))
longerseries

2021-10-01   -0.906660
2021-10-02    0.007293
2021-10-03   -0.388947
2021-10-04    1.014034
2021-10-05   -1.888008
                ...   
2022-01-04   -0.029469
2022-01-05   -0.597811
2022-01-06    0.711364
2022-01-07   -1.681690
2022-01-08   -0.246059
Freq: D, Length: 100, dtype: float64

In [56]:
longerseries.iloc[2]  # position-based access; plain longerseries[2] is deprecated in recent pandas

0.5235862849928344

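In [None]:
# different ways of selection

## Select the data for October 2021 with a partial date string
* The random values from here on come from a different run of the notebook, so they do not match the listing above.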

In [38]:
longerseries['2021-10']

2021-10-01   -1.794590
2021-10-02   -1.747455
2021-10-03    0.040517
2021-10-04   -0.470010
2021-10-05    0.107898
2021-10-06   -0.614794
2021-10-07    1.159246
2021-10-08   -1.092107
2021-10-09    0.586668
2021-10-10   -0.210097
2021-10-11    0.666187
2021-10-12    0.687940
2021-10-13    0.536408
2021-10-14   -0.642232
2021-10-15   -1.932763
2021-10-16    0.183778
2021-10-17    0.328996
2021-10-18    0.632272
2021-10-19    0.558433
2021-10-20    0.699429
2021-10-21   -0.238702
2021-10-22   -1.588229
2021-10-23   -0.799408
2021-10-24    0.068958
2021-10-25   -0.704994
2021-10-26    0.875011
2021-10-27   -1.856802
2021-10-28    0.368878
2021-10-29    0.004835
2021-10-30    0.924057
2021-10-31    0.167276
Freq: D, dtype: float64

In [39]:
longerseries['2021-10-30':]

2021-10-30    0.924057
2021-10-31    0.167276
2021-11-01    0.135411
2021-11-02    0.693408
2021-11-03   -0.022707
                ...   
2022-01-04   -1.071086
2022-01-05   -0.755306
2022-01-06    0.318058
2022-01-07    1.616983
2022-01-08    0.271103
Freq: D, Length: 71, dtype: float64

In [40]:
longerseries[datetime(2021, 10, 17):]

2021-10-17    0.328996
2021-10-18    0.632272
2021-10-19    0.558433
2021-10-20    0.699429
2021-10-21   -0.238702
                ...   
2022-01-04   -1.071086
2022-01-05   -0.755306
2022-01-06    0.318058
2022-01-07    1.616983
2022-01-08    0.271103
Freq: D, Length: 84, dtype: float64

In [41]:
len(longerseries)

100

In [42]:
longerseries.truncate(after='11/01/2021')

2021-10-01   -1.794590
2021-10-02   -1.747455
2021-10-03    0.040517
2021-10-04   -0.470010
2021-10-05    0.107898
2021-10-06   -0.614794
2021-10-07    1.159246
2021-10-08   -1.092107
2021-10-09    0.586668
2021-10-10   -0.210097
2021-10-11    0.666187
2021-10-12    0.687940
2021-10-13    0.536408
2021-10-14   -0.642232
2021-10-15   -1.932763
2021-10-16    0.183778
2021-10-17    0.328996
2021-10-18    0.632272
2021-10-19    0.558433
2021-10-20    0.699429
2021-10-21   -0.238702
2021-10-22   -1.588229
2021-10-23   -0.799408
2021-10-24    0.068958
2021-10-25   -0.704994
2021-10-26    0.875011
2021-10-27   -1.856802
2021-10-28    0.368878
2021-10-29    0.004835
2021-10-30    0.924057
2021-10-31    0.167276
2021-11-01    0.135411
Freq: D, dtype: float64

In [43]:
longerseries.truncate(before='11/01/2021')

2021-11-01    0.135411
2021-11-02    0.693408
2021-11-03   -0.022707
2021-11-04   -0.663868
2021-11-05    0.932833
                ...   
2022-01-04   -1.071086
2022-01-05   -0.755306
2022-01-06    0.318058
2022-01-07    1.616983
2022-01-08    0.271103
Freq: D, Length: 69, dtype: float64

In [44]:
longerseries.truncate(after='11/15/2021',before='11/01/2021')

2021-11-01    0.135411
2021-11-02    0.693408
2021-11-03   -0.022707
2021-11-04   -0.663868
2021-11-05    0.932833
2021-11-06   -0.776645
2021-11-07    0.798507
2021-11-08   -0.009898
2021-11-09   -1.288432
2021-11-10   -0.679322
2021-11-11   -1.409580
2021-11-12    2.527516
2021-11-13    0.288334
2021-11-14   -0.282935
2021-11-15   -0.587119
Freq: D, dtype: float64
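truncate keeps the rows between before and after, both inclusive, so on a sorted DatetimeIndex it matches a label-based .loc slice. A sketch on a stand-in series with the same index as longerseries (the values are just 0 to 99):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100.0), index=pd.date_range("2021-10-01", periods=100))

# truncate: keep rows from before to after, inclusive
by_truncate = s.truncate(before="2021-11-01", after="2021-11-15")

# The equivalent label-based slice
by_loc = s.loc["2021-11-01":"2021-11-15"]

print(len(by_truncate), by_truncate.equals(by_loc))  # 15 True
```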

# Creating a DataFrame indexed by dates

## freq sets the step between dates; 'W-WED' means weekly, on Wednesdays

In [13]:
dates = pd.date_range('22/10/2022', periods=40, freq='W-WED')

In [14]:
dates

DatetimeIndex(['2022-10-26', '2022-11-02', '2022-11-09', '2022-11-16',
               '2022-11-23', '2022-11-30', '2022-12-07', '2022-12-14',
               '2022-12-21', '2022-12-28', '2023-01-04', '2023-01-11',
               '2023-01-18', '2023-01-25', '2023-02-01', '2023-02-08',
               '2023-02-15', '2023-02-22', '2023-03-01', '2023-03-08',
               '2023-03-15', '2023-03-22', '2023-03-29', '2023-04-05',
               '2023-04-12', '2023-04-19', '2023-04-26', '2023-05-03',
               '2023-05-10', '2023-05-17', '2023-05-24', '2023-05-31',
               '2023-06-07', '2023-06-14', '2023-06-21', '2023-06-28',
               '2023-07-05', '2023-07-12', '2023-07-19', '2023-07-26'],
              dtype='datetime64[ns]', freq='W-WED')

In [5]:
dates_mon = pd.date_range('19/10/2022', periods=40, freq='W-mon')  # separate name, so `dates` keeps the W-WED index used below

In [6]:
dates_mon

DatetimeIndex(['2022-10-24', '2022-10-31', '2022-11-07', '2022-11-14',
               '2022-11-21', '2022-11-28', '2022-12-05', '2022-12-12',
               '2022-12-19', '2022-12-26', '2023-01-02', '2023-01-09',
               '2023-01-16', '2023-01-23', '2023-01-30', '2023-02-06',
               '2023-02-13', '2023-02-20', '2023-02-27', '2023-03-06',
               '2023-03-13', '2023-03-20', '2023-03-27', '2023-04-03',
               '2023-04-10', '2023-04-17', '2023-04-24', '2023-05-01',
               '2023-05-08', '2023-05-15', '2023-05-22', '2023-05-29',
               '2023-06-05', '2023-06-12', '2023-06-19', '2023-06-26',
               '2023-07-03', '2023-07-10', '2023-07-17', '2023-07-24'],
              dtype='datetime64[ns]', freq='W-MON')
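An anchored weekly frequency rolls the start forward to the first matching weekday, which is why both ranges above begin a few days after their start dates. A sketch comparing it with a plain 7-day step, using the ISO form of the first start date:

```python
import pandas as pd

# 2022-10-22 is a Saturday; "W-WED" rolls forward to the first Wednesday after it
weds = pd.date_range("2022-10-22", periods=4, freq="W-WED")
print(weds)
print(weds.day_name().unique())  # only 'Wednesday'

# A fixed 7-day step from the same start keeps the Saturday
sats = pd.date_range("2022-10-22", periods=4, freq="7D")
print(sats.day_name().unique())  # only 'Saturday'
```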

In [15]:

long_df = pd.DataFrame(np.random.randn(40, 4),
                       index=dates,
                       columns=['Colorado', 'Texas',
                                'New York', 'Ohio'])
long_df

            Colorado     Texas  New York      Ohio
2022-10-26 -0.621967  3.148048  1.474458  0.555478
2022-11-02 -0.443369 -0.845934  0.111496  1.077539
2022-11-09 -0.055310  1.201419 -0.512756 -0.496965
2022-11-16 -1.387125  0.223110 -1.557064 -1.922861
2022-11-23  0.245437 -0.714604 -0.613313 -0.543063
2022-11-30  0.616470 -0.458057  0.909920 -0.060158
2022-12-07  1.681563  0.722299 -0.319287 -0.889556
2022-12-14  0.550959 -0.898689  1.865912  1.133899
2022-12-21  0.418629 -1.363252  0.577185 -0.053065
2022-12-28  1.747177  0.966339  0.846287 -1.149911
...


In [11]:
long_df.loc['12-2022']

            Colorado     Texas  New York      Ohio
2022-12-07 -1.068217 -1.265754 -0.396535 -0.852878
2022-12-14 -0.748791 -1.511717  0.852222  0.450033
2022-12-21 -2.406444 -1.578876  0.234502 -2.278881
2022-12-28  0.317278  0.468394  0.164861 -1.714436


### Time Series with Duplicate Indices

In [19]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000',
                          '3/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-03-02    3
2000-01-03    4
dtype: int32

In [17]:
dup_ts.index.is_unique

False

In [20]:
grouped = dup_ts.groupby(level=0)
grouped.mean()


2000-01-01    0.0
2000-01-02    1.5
2000-01-03    4.0
2000-03-02    3.0
dtype: float64

In [21]:
grouped.count()

2000-01-01    1
2000-01-02    2
2000-01-03    1
2000-03-02    1
dtype: int64

In [22]:
grouped.std()

2000-01-01         NaN
2000-01-02    0.707107
2000-01-03         NaN
2000-03-02         NaN
dtype: float64
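Besides aggregating with groupby(level=0), duplicates can simply be dropped. A sketch on a small made-up series; `Index.duplicated` marks every repeat after the first occurrence:

```python
import numpy as np
import pandas as pd

dup_dates = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02", "2000-01-03"])
dup = pd.Series(np.arange(4), index=dup_dates)

# Keep only the first observation for each timestamp
first_only = dup[~dup.index.duplicated(keep="first")]

# Or keep the last one per timestamp via groupby (the result is sorted by date)
last_value = dup.groupby(level=0).last()

print(first_only)
print(last_value)
```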

## Date Ranges, Frequencies, and Shifting

In [23]:
timeseries1

2021-10-01     2
2021-10-02     4
2021-10-03     6
2021-10-04     8
2021-10-05    10
2021-10-06    12
2021-10-07    14
2021-10-08    16
2021-10-09    18
2021-10-10    20
2021-10-11    22
2021-10-12    24
2021-10-13    26
2021-10-14    28
dtype: int32

In [24]:
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-03-02    3
2000-01-03    4
dtype: int32

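## resample: convenient method for frequency conversion and resampling of time series. It returns a groupby-like Resampler object; bins with no data give 0 in sum() and count() and NaN in mean()
* It groups the index values into bins of the given frequency
* In pandas 2.2 and later, the aliases 'M' and 'Y' used below are deprecated in favour of 'ME' and 'YE'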

In [25]:
dup_ts.resample('D').sum()

2000-01-01    0
2000-01-02    3
2000-01-03    4
2000-01-04    0
2000-01-05    0
             ..
2000-02-27    0
2000-02-28    0
2000-02-29    0
2000-03-01    0
2000-03-02    3
Freq: D, Length: 62, dtype: int32

In [27]:
dup_ts.resample('M').count()

2000-01-31    4
2000-02-29    0
2000-03-31    1
Freq: M, dtype: int64
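The value an empty bin receives depends on the aggregation. A sketch on a tiny made-up series with no data on 2000-01-02:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 4.0],
              index=pd.to_datetime(["2000-01-01", "2000-01-01", "2000-01-03"]))
daily = s.resample("D")

print(daily.sum())    # 3.0, 0.0, 4.0: the empty bin sums to 0
print(daily.count())  # 2, 0, 1
print(daily.mean())   # 1.5, NaN, 4.0: no values to average
```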

In [30]:
resampler = dup_ts.resample('D')
resampler.indices

defaultdict(list,
            {Timestamp('2000-01-01 00:00:00', freq='D'): [0],
             Timestamp('2000-01-02 00:00:00', freq='D'): [1, 2],
             Timestamp('2000-01-03 00:00:00', freq='D'): [3],
             Timestamp('2000-03-02 00:00:00', freq='D'): [4]})

In [31]:
type(resampler)

pandas.core.resample.DatetimeIndexResampler

In [31]:
for i,e in resampler:
    print(i,e)

2000-01-01 00:00:00 2000-01-01    0
dtype: int32
2000-01-02 00:00:00 2000-01-02    1
2000-01-02    2
dtype: int32
2000-01-03 00:00:00 2000-01-03    4
dtype: int32
2000-01-04 00:00:00 Series([], dtype: int32)
2000-01-05 00:00:00 Series([], dtype: int32)
2000-01-06 00:00:00 Series([], dtype: int32)
2000-01-07 00:00:00 Series([], dtype: int32)
2000-01-08 00:00:00 Series([], dtype: int32)
2000-01-09 00:00:00 Series([], dtype: int32)
2000-01-10 00:00:00 Series([], dtype: int32)
2000-01-11 00:00:00 Series([], dtype: int32)
2000-01-12 00:00:00 Series([], dtype: int32)
2000-01-13 00:00:00 Series([], dtype: int32)
2000-01-14 00:00:00 Series([], dtype: int32)
2000-01-15 00:00:00 Series([], dtype: int32)
2000-01-16 00:00:00 Series([], dtype: int32)
2000-01-17 00:00:00 Series([], dtype: int32)
2000-01-18 00:00:00 Series([], dtype: int32)
2000-01-19 00:00:00 Series([], dtype: int32)
2000-01-20 00:00:00 Series([], dtype: int32)
2000-01-21 00:00:00 Series([], dtype: int32)
...

In [32]:
dup_ts.drop(labels=pd.to_datetime('2000-01-03'), inplace=True)  # remove the 2000-01-03 entry

In [33]:
dup_ts[pd.to_datetime('2000-01-06')] = 5  # add a new entry for 2000-01-06

In [34]:
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-03-02    3
2000-01-06    5
dtype: int64

In [35]:
I= dup_ts.resample('Y')
I.indices

defaultdict(list,
            {Timestamp('2000-12-31 00:00:00', freq='A-DEC'): [0, 1, 2, 3, 4]})

## Groups covering every 3 days

In [36]:
I= dup_ts.resample('3D')
I.indices

defaultdict(list,
            {Timestamp('2000-01-01 00:00:00', freq='3D'): [0, 1, 2],
             Timestamp('2000-01-04 00:00:00', freq='3D'): [3],
             Timestamp('2000-03-01 00:00:00', freq='3D'): [4]})

In [37]:
I.sum()

2000-01-01    3
2000-01-04    5
2000-01-07    0
2000-01-10    0
2000-01-13    0
2000-01-16    0
2000-01-19    0
2000-01-22    0
2000-01-25    0
2000-01-28    0
2000-01-31    0
2000-02-03    0
2000-02-06    0
2000-02-09    0
2000-02-12    0
2000-02-15    0
2000-02-18    0
2000-02-21    0
2000-02-24    0
2000-02-27    0
2000-03-01    3
Freq: 3D, dtype: int64

In [48]:
dup_ts[pd.to_datetime('2000-02-01')]=15

In [49]:
resampler = dup_ts.resample('M')

In [50]:
resampler.sum()

2000-01-31     8
2000-02-29    15
2000-03-31     3
Freq: M, dtype: int64

### Generating Date Ranges (see Table 11-4 for more frequency options)

In [51]:
pd.date_range(start='2012-04-01', periods=20)


DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20'],
              dtype='datetime64[ns]', freq='D')

In [52]:
pd.date_range(end='2012-06-01', periods=20,freq='3D')

DatetimeIndex(['2012-04-05', '2012-04-08', '2012-04-11', '2012-04-14',
               '2012-04-17', '2012-04-20', '2012-04-23', '2012-04-26',
               '2012-04-29', '2012-05-02', '2012-05-05', '2012-05-08',
               '2012-05-11', '2012-05-14', '2012-05-17', '2012-05-20',
               '2012-05-23', '2012-05-26', '2012-05-29', '2012-06-01'],
              dtype='datetime64[ns]', freq='3D')

In [53]:
pd.date_range(start='2012-06-01', periods=20,freq='3h')

DatetimeIndex(['2012-06-01 00:00:00', '2012-06-01 03:00:00',
               '2012-06-01 06:00:00', '2012-06-01 09:00:00',
               '2012-06-01 12:00:00', '2012-06-01 15:00:00',
               '2012-06-01 18:00:00', '2012-06-01 21:00:00',
               '2012-06-02 00:00:00', '2012-06-02 03:00:00',
               '2012-06-02 06:00:00', '2012-06-02 09:00:00',
               '2012-06-02 12:00:00', '2012-06-02 15:00:00',
               '2012-06-02 18:00:00', '2012-06-02 21:00:00',
               '2012-06-03 00:00:00', '2012-06-03 03:00:00',
               '2012-06-03 06:00:00', '2012-06-03 09:00:00'],
              dtype='datetime64[ns]', freq='3H')

## 'Y': year end (shown as 'A-DEC', i.e. annual anchored on December, in this pandas version)

In [54]:
pd.date_range(start='2012-06-01', periods=20,freq='Y')

DatetimeIndex(['2012-12-31', '2013-12-31', '2014-12-31', '2015-12-31',
               '2016-12-31', '2017-12-31', '2018-12-31', '2019-12-31',
               '2020-12-31', '2021-12-31', '2022-12-31', '2023-12-31',
               '2024-12-31', '2025-12-31', '2026-12-31', '2027-12-31',
               '2028-12-31', '2029-12-31', '2030-12-31', '2031-12-31'],
              dtype='datetime64[ns]', freq='A-DEC')

In [55]:
pd.date_range(start='2012-06-01', periods=20,freq='Q-Jan')

DatetimeIndex(['2012-07-31', '2012-10-31', '2013-01-31', '2013-04-30',
               '2013-07-31', '2013-10-31', '2014-01-31', '2014-04-30',
               '2014-07-31', '2014-10-31', '2015-01-31', '2015-04-30',
               '2015-07-31', '2015-10-31', '2016-01-31', '2016-04-30',
               '2016-07-31', '2016-10-31', '2017-01-31', '2017-04-30'],
              dtype='datetime64[ns]', freq='Q-JAN')

## date_range keeps the time of day of the start timestamp

In [188]:
pd.date_range('2021-10-07 12:56:31', periods=6)

DatetimeIndex(['2021-10-07 12:56:31', '2021-10-08 12:56:31',
               '2021-10-09 12:56:31', '2021-10-10 12:56:31',
               '2021-10-11 12:56:31', '2021-10-12 12:56:31'],
              dtype='datetime64[ns]', freq='D')

## To generate timestamps normalized to midnight, as is conventional, set normalize=True

In [38]:
pd.date_range('2021-10-07 12:56:31', periods=6,normalize=True)

DatetimeIndex(['2021-10-07', '2021-10-08', '2021-10-09', '2021-10-10',
               '2021-10-11', '2021-10-12'],
              dtype='datetime64[ns]', freq='D')

### Frequencies and Date Offsets: creating an offset object such as 'three hours'

In [39]:
from pandas.tseries.offsets import Hour, Minute
hour = Hour()
hour

<Hour>

In [41]:
thrhours=Hour(3)

In [192]:
thrhours

<3 * Hours>

## Alternatively, multiply an existing offset by an integer

In [42]:
threehours=hour*3

In [43]:
threehours

<3 * Hours>

In [44]:
pd.date_range('2000-01-01', '2000-01-03 23:59', freq=threehours)

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 03:00:00',
               '2000-01-01 06:00:00', '2000-01-01 09:00:00',
               '2000-01-01 12:00:00', '2000-01-01 15:00:00',
               '2000-01-01 18:00:00', '2000-01-01 21:00:00',
               '2000-01-02 00:00:00', '2000-01-02 03:00:00',
               '2000-01-02 06:00:00', '2000-01-02 09:00:00',
               '2000-01-02 12:00:00', '2000-01-02 15:00:00',
               '2000-01-02 18:00:00', '2000-01-02 21:00:00',
               '2000-01-03 00:00:00', '2000-01-03 03:00:00',
               '2000-01-03 06:00:00', '2000-01-03 09:00:00',
               '2000-01-03 12:00:00', '2000-01-03 15:00:00',
               '2000-01-03 18:00:00', '2000-01-03 21:00:00'],
              dtype='datetime64[ns]', freq='3H')

## Offsets can be combined; Hour(2) + Minute(30) is a single 2 h 30 min offset

In [45]:
off=Hour(2) + Minute(30)

In [46]:
pd.date_range('2021-01-01', '2021-01-03 23:59', freq=threehours)  # still threehours; pass freq=off to use the 2 h 30 min offset

DatetimeIndex(['2021-01-01 00:00:00', '2021-01-01 03:00:00',
               '2021-01-01 06:00:00', '2021-01-01 09:00:00',
               '2021-01-01 12:00:00', '2021-01-01 15:00:00',
               '2021-01-01 18:00:00', '2021-01-01 21:00:00',
               '2021-01-02 00:00:00', '2021-01-02 03:00:00',
               '2021-01-02 06:00:00', '2021-01-02 09:00:00',
               '2021-01-02 12:00:00', '2021-01-02 15:00:00',
               '2021-01-02 18:00:00', '2021-01-02 21:00:00',
               '2021-01-03 00:00:00', '2021-01-03 03:00:00',
               '2021-01-03 06:00:00', '2021-01-03 09:00:00',
               '2021-01-03 12:00:00', '2021-01-03 15:00:00',
               '2021-01-03 18:00:00', '2021-01-03 21:00:00'],
              dtype='datetime64[ns]', freq='3H')
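The combined offset `off` can be passed as freq in the same way. A sketch over the same date span:

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

off = Hour(2) + Minute(30)  # merged into one 150-minute offset
print(off)                  # <150 * Minutes>

rng = pd.date_range("2021-01-01", "2021-01-03 23:59", freq=off)
print(rng[:3])   # 00:00, 02:30 and 05:00 on 2021-01-01
print(len(rng))  # 29 timestamps, the last at 2021-01-03 22:00
```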

In [47]:
pd.date_range('2021-01-01', '2021-03-03 23:59', freq='WOM-3Fri')

DatetimeIndex(['2021-01-15', '2021-02-19'], dtype='datetime64[ns]', freq='WOM-3FRI')

### Shifting (Leading and Lagging) Data

In [48]:
timeseries1

2021-10-01     2
2021-10-02     4
2021-10-03     6
2021-10-04     8
2021-10-05    10
2021-10-06    12
2021-10-07    14
2021-10-08    16
2021-10-09    18
2021-10-10    20
2021-10-11    22
2021-10-12    24
2021-10-13    26
2021-10-14    28
dtype: int32

## shift(n) moves the values forward by n positions: the last n values are dropped and the first n become NaN

In [49]:
timeseries1.shift(2)

2021-10-01     NaN
2021-10-02     NaN
2021-10-03     2.0
2021-10-04     4.0
2021-10-05     6.0
2021-10-06     8.0
2021-10-07    10.0
2021-10-08    12.0
2021-10-09    14.0
2021-10-10    16.0
2021-10-11    18.0
2021-10-12    20.0
2021-10-13    22.0
2021-10-14    24.0
dtype: float64

## A negative shift moves values backward: the first n values are dropped and the last n become NaN

In [50]:
timeseries1.shift(-2)

2021-10-01     6.0
2021-10-02     8.0
2021-10-03    10.0
2021-10-04    12.0
2021-10-05    14.0
2021-10-06    16.0
2021-10-07    18.0
2021-10-08    20.0
2021-10-09    22.0
2021-10-10    24.0
2021-10-11    26.0
2021-10-12    28.0
2021-10-13     NaN
2021-10-14     NaN
dtype: float64

## A plain shift moves the data and leaves the DatetimeIndex unchanged. To shift the index instead, pass freq

## With freq, each date moves by periods * freq: shift(2, freq='2D') moves every date 4 days later

In [52]:
timeseries1.shift(2,freq='2D').head(5)

2021-10-05     2
2021-10-06     4
2021-10-07     6
2021-10-08     8
2021-10-09    10
dtype: int32

In [68]:
timeseries1-timeseries1.shift(2)

2021-10-01    NaN
2021-10-02    NaN
2021-10-03    4.0
2021-10-04    4.0
2021-10-05    4.0
2021-10-06    4.0
2021-10-07    4.0
2021-10-08    4.0
2021-10-09    4.0
2021-10-10    4.0
2021-10-11    4.0
2021-10-12    4.0
2021-10-13    4.0
2021-10-14    4.0
dtype: float64
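Subtracting or dividing by a shifted copy is the usual way to compute changes between consecutive observations. A sketch with made-up prices, checked against Series.pct_change:

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0, 108.9],
                   index=pd.date_range("2021-10-01", periods=4))

# Relative change from the previous observation, built with shift
manual = prices / prices.shift(1) - 1

# The built-in equivalent
builtin = prices.pct_change()

print(manual)  # NaN, 0.1, -0.1, 0.1 (up to floating-point rounding)
print(np.allclose(manual.dropna(), builtin.dropna()))  # True
```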

# Date arithmetic with date offsets

In [53]:
from pandas.tseries.offsets import Day, MonthEnd, BMonthBegin, BMonthEnd
now = datetime(2021, 10, 7)
now + 3 * Day()

Timestamp('2021-10-10 00:00:00')

In [54]:
now + 1 * BMonthEnd()  # BMonthEnd: last business day of the month (BMonthBegin: first business day)

Timestamp('2021-10-29 00:00:00')

In [72]:
now + 1 * MonthEnd()

Timestamp('2021-10-31 00:00:00')

In [73]:
now + 1 *BMonthBegin()

Timestamp('2021-11-01 00:00:00')

In [208]:
x=now + 1 *BMonthEnd()
type(x)

pandas._libs.tslibs.timestamps.Timestamp

In [209]:
x.day

29

In [210]:
x.weekday()

4

In [211]:
x.day_name()

'Friday'

## Using an offset with a timestamp: scaling, rollforward and rollback

In [69]:
from pandas.tseries.offsets import MonthEnd
offset = MonthEnd()
offset *= 2  # each step now moves two month ends

In [217]:
now+offset

Timestamp('2021-11-30 00:00:00')

In [218]:
offset.rollforward(now)

Timestamp('2021-10-31 00:00:00')

In [215]:
offset.rollback(now)

Timestamp('2021-09-30 00:00:00')
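rollforward and rollback only move a date that is not already on the offset, whereas adding the offset always moves. A sketch with a plain MonthEnd (a new variable, so `offset` above is untouched):

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

month_end_offset = MonthEnd()
mid_month = pd.Timestamp("2021-10-07")
month_end = pd.Timestamp("2021-10-31")

# A date between month ends rolls to the next or previous one
print(month_end_offset.rollforward(mid_month))  # 2021-10-31
print(month_end_offset.rollback(mid_month))     # 2021-09-30

# A date already on a month end is returned unchanged by both
print(month_end_offset.rollforward(month_end), month_end_offset.rollback(month_end))

# Adding the offset always advances to the next month end
print(month_end + month_end_offset)  # 2021-11-30
```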

# Using rollforward as a groupby key

In [219]:
ts = pd.Series(np.arange(10)*100,
               index=pd.date_range('1/15/2000', periods=10, freq='4d'))
ts


2000-01-15      0
2000-01-19    100
2000-01-23    200
2000-01-27    300
2000-01-31    400
2000-02-04    500
2000-02-08    600
2000-02-12    700
2000-02-16    800
2000-02-20    900
Freq: 4D, dtype: int32

## Mean of each month's values, labelled by the month end

In [220]:
ts.resample('M').mean()

2000-01-31    200
2000-02-29    700
Freq: M, dtype: int32

## The same result using groupby with offset.rollforward

In [221]:
ts.groupby(offset.rollforward).mean()

2000-01-31    200
2000-02-29    700
dtype: int32

## Group by offset.rollback: each date maps to the latest month end on or before it

In [222]:
ts.groupby(offset.rollback).sum()

1999-12-31     600
2000-01-31    3900
dtype: int32