In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 


---||
# Time Series

A time series is data in which every observation is tied to a point in time. That is, a set of values is measured at intervals in time, which may or may not be regular.

Categories of time series:

- fixed frequency (or regular): data points occur at regular time intervals

- irregular: data points occur with no fixed unit of time between them.

Depending on the application, time series data are recorded using one of the following **time measurements**:

- timestamps: specific instants in time

- fixed periods: a whole span of time, such as a given month or a full year

- intervals of time: starting and ending timestamps.
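The three kinds of time measurement listed above can be sketched with pandas objects. This is only an illustration: the variable names and the dates are assumptions chosen for the example.

```python
import pandas as pd

# A timestamp: one specific instant in time
instant = pd.Timestamp("2024-01-20 16:43")

# A fixed period: the whole of January 2024
period = pd.Period("2024-01", freq="M")

# An interval: explicit starting and ending timestamps
interval = pd.Interval(pd.Timestamp("2024-01-01"), pd.Timestamp("2024-01-31"))

print(instant)
print(period.start_time, period.end_time)
print(instant in interval)
```

The period exposes its boundaries through `start_time` and `end_time`, while the interval stores the two boundary timestamps directly.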

---
---

### Date and Time Data Types

Python date and time libraries:

- **datetime**

- **time**

- **calendar**

In [2]:
# import datetime library 

from datetime import datetime

In [None]:
# get the present date, time (hour, minutes, and seconds)

now = datetime.now()
now

In [None]:
# get the date component -> year, month and day

now.date()

In [None]:
# get the year

now.year

In [None]:
# get the month

now.month

In [None]:
# get the time component - hour, minutes, second and microseconds

now.time()

In [None]:
# get the day

now.day

In [None]:
# get the hour

now.hour

In [None]:
# get the minutes

now.minute

In [None]:
# get the seconds

now.second


---||
#### Time-Delta

This is a measure of the time difference between two **datetime** objects.

In [12]:
# get the new_time

new_now = datetime.now()


In [None]:
# note: the year, month and day are unchanged; only the hour, minutes and seconds may differ

new_now.hour, new_now.minute, new_now.second

In [None]:
# get the time-delta between previous now and the new now

# still same hour
new_now - now

In [None]:
# get timedelta between different datetime object

delta = datetime(2024, 1, 20, 16, 43) - datetime(2015, 9, 29)

delta

In [None]:
# days elapsed

delta.days

In [None]:
# seconds component of the delta: the remainder after whole days, not the total seconds elapsed

delta.seconds
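Note that `days` and `seconds` are separate components of the same duration, so `seconds` alone does not give the total elapsed time. A minimal sketch using the same two dates as above; the variable names are illustrative:

```python
from datetime import datetime

delta = datetime(2024, 1, 20, 16, 43) - datetime(2015, 9, 29)

# whole days in the duration
whole_days = delta.days

# what is left over after the whole days: 16 h 43 min, in seconds
leftover_seconds = delta.seconds

# total_seconds() combines both components into one number
total = delta.total_seconds()

print(whole_days, leftover_seconds, total)
```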

A timedelta can be added to, or subtracted from, a datetime object.

In [32]:
# working with the timedelta object directly

from datetime import timedelta

start = datetime(2017, 11, 13)

In [None]:
delta = timedelta(days=12, minutes=32, seconds=1234)

delta

In [None]:
start + delta

A datetime object can be converted to a string, and a string back to a datetime:

- use the **str** function for the default string representation

- use **strftime** to choose the format of the resulting string

In [5]:
stamp = datetime(2013, 12, 30)

In [None]:
# convert to a string representation

str(stamp)

Time Format:


- **%D**: Date, a shortcut for %m/%d/%y (e.g. '12/30/13')
- **%d**: Day

- **%M**: Minutes
- **%m**: Month

- **%Y**: Four digit year (e.g. '2013')
- **%y**: Two-digit year (e.g. '13')

- **%H**: 24-hour format
- **%I**: 12-hour format


- **%S**: Second
- **%w**: weekday [ (0 -> Sunday) to (6 -> Saturday) ]

> Any combination of these may be used to format a datetime object; the separator should be specified if necessary
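As a sketch of how the codes combine, the example below formats one datetime in several ways. The timestamp and variable names are made up for illustration, and `%p` (the AM/PM marker) is an extra code that is not in the list above.

```python
from datetime import datetime

stamp = datetime(2013, 12, 30, 21, 5, 9)

date_slash = stamp.strftime("%d/%m/%Y")   # day/month/four-digit year
time_24h = stamp.strftime("%H:%M:%S")     # 24-hour clock
time_12h = stamp.strftime("%I:%M %p")     # 12-hour clock with AM/PM marker
weekday_no = stamp.strftime("%w")         # weekday number, 0 = Sunday

print(date_slash, time_24h, time_12h, weekday_no)
```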

In [None]:
# change the format from (year, month, day, time) -> (day, month, year)

stamp.strftime('%d-%m-%y')

In [None]:
stamp.strftime("%m-%y")

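**strptime** parses a date string whose format is *known*; when the format is not known in advance, the **parse** function from **dateutil** can infer it for most common date representations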
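A minimal sketch of `datetime.strptime` with an explicit format; the date strings are illustrative only:

```python
from datetime import datetime

value = "2011-01-03"

# the format string must describe the input exactly
parsed = datetime.strptime(value, "%Y-%m-%d")

# the same idea applied to a list of strings sharing one format
datestrs = ["7/6/2011", "8/6/2011"]
parsed_many = [datetime.strptime(s, "%m/%d/%Y") for s in datestrs]

print(parsed)
print(parsed_many)
```

Formatting the result with the same format string gives back the original text, which is a quick way to check that the format was the right one.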

In [3]:
from dateutil.parser import parse

In [None]:
parse('2011-01-03')

In [None]:
parse('Sept 30, 2024 11:01 AM')

The pandas **to_datetime** function converts datetime-like objects to pandas objects; a list of them becomes a **DatetimeIndex**

In [None]:
data = [parse('May 25, 2013 9 PM'), parse('Aug 20, 2024, 5 AM')]

data

In [None]:
pd.to_datetime(data)

A missing datetime value is represented as **NaT (Not a Time)**
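A small sketch of how a missing entry shows up after conversion; the input list is made up for the example:

```python
import pandas as pd

# the second entry is missing
idx = pd.to_datetime(["2013-05-25 21:00", None])

print(idx)

# isna() flags the NaT entries
print(idx.isna())
```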

---
---||
### Time Series Basics

In [10]:
from datetime import datetime

In [None]:
# get a time series indexed by timestamps

dates = [
  datetime(2024, 1, 3),
  datetime(2024, 2, 4),
  datetime(2024, 3, 6),
  datetime(2024, 4, 9),
  datetime(2024, 5, 11),
  datetime(2024, 6, 12),
  datetime(2024, 7, 17),
  datetime(2024, 8, 20),
  datetime(2024, 9, 26),
  datetime(2024, 10, 30),
]

ts = pd.Series(np.random.randn(10), index=dates)

ts

In [None]:
# check the index of ts

ts.index

In [None]:
# datatype of the index of ts

ts.index.dtype

In [None]:
# get a timestamp object

stamp = ts.index[3]

stamp

In [None]:
ts['7/2024']

The **date_range** function generates a DatetimeIndex; here it takes a starting date and the number of periods to generate

In [34]:
index = pd.date_range('05-06-2010', end=None, periods=500)
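`date_range` can also be driven from the end date. The sketch below, with illustrative dates and variable names, builds the same five-day daily range in three ways:

```python
import pandas as pd

# a start date plus a number of periods
a = pd.date_range(start="2010-05-06", periods=5)

# a start date plus an end date
b = pd.date_range(start="2010-05-06", end="2010-05-10")

# an end date plus a number of periods
c = pd.date_range(end="2010-05-10", periods=5)

print(a.equals(b), b.equals(c))
```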

In [None]:
ts2 = pd.Series(np.random.randn(len(index)), index=index)

ts2

In [None]:
ts2['2011']

In [None]:
ts2['2010-05']

A range query may start and end with values that are not themselves in the series

In [None]:
# slices

# note the starting value is not in the series
ts2['2010-01-01': '2010-06-03']

**truncate** slices away the data **before** and/or **after** the given value(s)

In [None]:
ts.head()

In [None]:
# truncate all the data points before the specified value

ts.truncate(before='2024-04')

In [None]:
# truncate all the data points after the specified value

ts.truncate(after='2024-09')

In [None]:
# similar to slicing between two values

ts.truncate(before='2024-04', after='2024-09')
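To make the comparison with slicing concrete, here is a minimal sketch on a small daily series. The data and names are assumptions for the example, and full dates are used so that both calls refer to exactly the same bounds.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=10, freq="D")
s = pd.Series(np.arange(10), index=idx)

# keep only the rows between the two dates (both ends included)
truncated = s.truncate(before="2024-01-03", after="2024-01-06")

# the same selection written as a label slice
sliced = s["2024-01-03":"2024-01-06"]

print(truncated.equals(sliced))
```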

In [None]:
ts = pd.DataFrame(
  data=np.random.randn(600).reshape(100, 6),
  index=pd.date_range(
    start='2024-09-12', 
    periods=100, 
    freq='2M' # every two (2) months (month end) from the starting date
  ),
  columns=list('ABCDEF') # a flat list of labels; a nested list would create a MultiIndex
)

ts.head()

In [None]:
# select data indexed by year 2025

ts.loc['2025']

In [None]:
# select data after 2025-05, and before 2025-10

ts.truncate(before='2025-05', after='2025-10')


---||
#### Date Ranges, Frequencies and Shifting



In [None]:
# date range generation

# the base frequency is daily
index= pd.date_range(start='2022-01-01', end='2024-10-01')

index

In [None]:
# date range generation

# the base frequency is weekly
index= pd.date_range(start='2022-01-01', end='2024-10-01', freq='W')

index

In [None]:
# date range generation

# the base frequency is monthly (month end); pandas >= 2.2 prefers the alias 'ME'
index= pd.date_range(start='2022-01-01', end='2024-10-01', freq='M')

index

##### An integer multiple of a frequency can be used.

e.g. 3W ==> every three (3) weeks (W)


In [None]:
# date range generation

# every three weeks
index= pd.date_range(start='2022-01-01', end='2024-10-01', freq='3W')

index

In [None]:
# date range generation

# every three months
index= pd.date_range(start='2022-01-01', end='2024-10-01', freq='3M')

index

In [None]:
# periods specifies the number of dates to generate

# the Monday of every third week
index = pd.date_range(start='2024-06', periods=22, freq='3W-MON')

index

In [None]:
# last business day of the month, every three months (not necessarily a Friday)

index= pd.date_range(start='2022-01-01', end='2024-10-01', freq='3BM')

index

In [None]:
# date range generation

# every four hours
index= pd.date_range(start='2022-01-01', end='2024-10-01', freq='4H')

index

In [None]:
# date range generation

# every 3 minutes
index= pd.date_range(start='2022-01-01', end='2024-10-01', freq='3T')

index

In [None]:
# date range generation

# every 10 seconds
index= pd.date_range(start='2022-01-01', end='2024-10-01', freq='10S')

index

In [None]:
# date range generation

# every business day
index= pd.date_range(start='2022-01-01', end='2024-10-01', freq='B')

index


---||
##### Offsets

In [None]:
from pandas.tseries.offsets import Hour, BMonthBegin, BusinessHour

bh = BusinessHour(4)

bh

In [None]:
# date range generation

# use the offset object directly as the freq
index= pd.date_range(start='2022-01-01', end='2024-10-01', freq=bh)

index

In [None]:
# have a combination of offsets
from pandas.tseries.offsets import Day

hour = Hour(2)
day = Day(2)
offset = day + hour

offset

In [None]:
# use the offset as the frequency

# date range generation

index= pd.date_range(start='2022-01-01', end='2024-10-01', freq=offset)

index

In [None]:
# passing frequency strings

# date range generation

index= pd.date_range(start='2022-01-01', end='2024-10-01', freq='3h20min10s')

index

In [None]:
# the second Wednesday of each month

pd.date_range('2024-05', periods=20, freq='WOM-2WED')
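One way to check what `WOM-2WED` produces is to inspect the weekday and the day of the month of each generated date. A small sketch; the number of periods is arbitrary:

```python
import pandas as pd

idx = pd.date_range("2024-05", periods=6, freq="WOM-2WED")

# dayofweek uses Monday = 0, so Wednesday is 2
all_wednesdays = bool((idx.dayofweek == 2).all())

# the second Wednesday of a month always falls on day 8 to 14
in_second_week = bool(((idx.day >= 8) & (idx.day <= 14)).all())

print(idx)
print(all_wednesdays, in_second_week)
```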


---||
#### Shifting

This refers to moving data backward or forward through time. This is achieved using the **shift** method

In [None]:
ts = pd.DataFrame(
  data=np.random.randn(24).reshape(6, 4),
  index=pd.date_range(
    start='2024-02',
    periods=6,
    freq='3M'
  ),
  columns=list('ABCD')
)

ts

In [None]:
# shift the data forward, without modifying index

ts.shift(4)

In [None]:
# shift the data backward, without modifying index

ts.shift(-4)

In [None]:
# shift both the data and timestamp using the underlying frequency, if known

# shift each timestamp by two 3-month steps, i.e. six months
ts.shift(2, freq='3M')
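A common use of `shift` is computing the change from one observation to the next. The sketch below uses a made-up daily price series (the values and names are assumptions) and contrasts shifting the data with shifting the index:

```python
import pandas as pd

prices = pd.Series(
    [100.0, 110.0, 99.0, 108.9],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# shifting the data by one row lines up each value with the previous one,
# so the ratio gives the relative change; the first entry has no predecessor
change = prices / prices.shift(1) - 1

# shifting with a freq moves the timestamps instead, so no values are lost
moved = prices.shift(1, freq="D")

print(change)
print(moved)
```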


---||
### Time Zone Handling

- **UTC**: Coordinated Universal Time
- **GMT**: Greenwich Mean Time

In [118]:
import pytz

In [None]:
# get the common time zones

pytz.common_timezones

In [120]:
# timezone object

pytz.timezone('Africa/Lagos')

<DstTzInfo 'Africa/Lagos' LMT+0:14:00 STD>


---||
#### Timezone Localization and Conversion



In [130]:
# add a time zone
index1 = pd.date_range(start='2024-04 7:45', periods=5, freq='WOM-3WED', tz='GMT')

# no time zone added
index2 = pd.date_range(start='2024-04 7:45', periods=5, freq='WOM-3WED')

ts_tz = pd.DataFrame(np.random.randn(10).reshape(5, 2), index=index1, columns=['A', 'B'])
ts_ntz = pd.DataFrame(np.random.randn(10).reshape(5, 2), index=index2, columns=['A', 'B'])

ts_tz

| | A | B |
|---|---|---|
| 2024-04-17 07:45:00+00:00 | -1.514635 | 0.77608 |
| 2024-05-15 07:45:00+00:00 | -0.129114 | 1.364 |
| 2024-06-19 07:45:00+00:00 | -0.274739 | 1.503706 |
| 2024-07-17 07:45:00+00:00 | -1.390302 | -0.681389 |
| 2024-08-21 07:45:00+00:00 | 0.059201 | 0.593623 |


In [131]:
ts_ntz

| | A | B |
|---|---|---|
| 2024-04-17 07:45:00 | 0.512234 | 0.271564 |
| 2024-05-15 07:45:00 | 0.377059 | -1.190629 |
| 2024-06-19 07:45:00 | -0.022247 | -0.262382 |
| 2024-07-17 07:45:00 | -0.704565 | -1.150591 |
| 2024-08-21 07:45:00 | -0.543382 | 0.241842 |


In [133]:
# get the timezone

ts_tz.index.tz, ts_ntz.index.tz

(<StaticTzInfo 'GMT'>, None)

Localization adds a timezone to a datetime that is timezone-naive (i.e. has no timezone).

In [140]:
# add a timezone
ts_ntz.tz_localize('UTC').index

DatetimeIndex(['2024-04-17 07:45:00+00:00', '2024-05-15 07:45:00+00:00',
               '2024-06-19 07:45:00+00:00', '2024-07-17 07:45:00+00:00',
               '2024-08-21 07:45:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='WOM-3WED')

In [139]:
ts_ntz.index

DatetimeIndex(['2024-04-17 07:45:00', '2024-05-15 07:45:00',
               '2024-06-19 07:45:00', '2024-07-17 07:45:00',
               '2024-08-21 07:45:00'],
              dtype='datetime64[ns]', freq='WOM-3WED')

In [141]:
# change from one timezone to another

ts_tz.tz_convert('UTC')

| | A | B |
|---|---|---|
| 2024-04-17 07:45:00+00:00 | -1.514635 | 0.77608 |
| 2024-05-15 07:45:00+00:00 | -0.129114 | 1.364 |
| 2024-06-19 07:45:00+00:00 | -0.274739 | 1.503706 |
| 2024-07-17 07:45:00+00:00 | -1.390302 | -0.681389 |
| 2024-08-21 07:45:00+00:00 | 0.059201 | 0.593623 |
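Converting from GMT to UTC leaves the clock times unchanged, so the effect of a conversion is easier to see with a zone that has an offset. A minimal sketch, assuming a small naive daily index and using 'Africa/Lagos' (one hour ahead of UTC) as the target:

```python
import pandas as pd

# timezone-naive index
naive = pd.date_range("2024-04-17 07:45", periods=3, freq="D")

# localization: declare that the naive times are in UTC
utc = naive.tz_localize("UTC")

# conversion: express the same instants in another timezone
lagos = utc.tz_convert("Africa/Lagos")

print(utc)
print(lagos)
```

The clock time changes from 07:45 to 08:45, but each entry still refers to the same instant, so comparing the two indexes element by element gives all True.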



---||
#### Periods and Period Arithmetic

