In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 


---
# Time Series

Time series data are measurements recorded at successive points in time: each value (or set of values) is associated with the time at which it was observed, whether the observations occur at regular or irregular intervals.

Categories of time series:

- fixed frequency (or regular): data points occur at regular time intervals, e.g. once per day

- irregular: there is no fixed unit of time between data points
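In pandas, this distinction shows up in a DatetimeIndex's frequency. A minimal sketch with made-up dates: `date_range` produces a regular index, while an index built from arbitrary dates has no frequency, which `pd.infer_freq` confirms:

```python
import pandas as pd

# fixed frequency: one entry per day
regular = pd.date_range('2024-01-01', periods=5, freq='D')

# irregular: gaps of 2 and 7 days between entries
irregular = pd.DatetimeIndex(['2024-01-01', '2024-01-03', '2024-01-10'])

print(regular.freqstr)            # D
print(pd.infer_freq(regular))     # D
print(irregular.freq)             # None
print(pd.infer_freq(irregular))   # None (no single frequency fits)
```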

Depending on the application, time series data are indexed by different kinds of **time measurement**:

- timestamps: specific instants in time

- fixed periods: a whole span of time, such as the month of January 2024 or the full year 2024

- intervals of time: spans indicated by a starting and an ending timestamp
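pandas has a dedicated type for each of these kinds of time measurement. A short sketch with illustrative values (the specific dates are arbitrary):

```python
import pandas as pd

# timestamp: a specific instant
stamp = pd.Timestamp('2024-09-29 19:13:16')

# fixed period: the whole month of January 2024
period = pd.Period('2024-01', freq='M')

# interval: explicit start and end timestamps (both ends included here)
interval = pd.Interval(pd.Timestamp('2024-01-01'), pd.Timestamp('2024-01-31'), closed='both')

print(period.start_time)                       # 2024-01-01 00:00:00
print(pd.Timestamp('2024-01-15') in interval)  # True
```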

---

### Date and Time Data Types

Python's standard library provides several modules for working with dates and times:

- **datetime**

- **time**

- **calendar**
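A quick taste of each module; this is only a sketch, and the rest of this notebook relies mainly on **datetime**:

```python
import calendar
import time
from datetime import date

# datetime: date and time objects and arithmetic
d = date(2024, 9, 29)
print(d.isoformat())                 # 2024-09-29

# time: low-level clock functions, e.g. seconds since the Unix epoch
seconds = time.time()

# calendar: calendar-related helpers
print(calendar.isleap(2024))         # True
print(calendar.monthrange(2024, 2))  # (3, 29): Feb 1 is a Thursday (Monday=0), 29 days
```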

In [2]:
# import the datetime type from the datetime module

from datetime import datetime

In [4]:
# get the present date, time (hour, minutes, and seconds)

now = datetime.now()
now

datetime.datetime(2024, 9, 29, 19, 13, 16, 420003)

In [38]:
# get the date component -> year, month and day

now.date()

datetime.date(2024, 9, 29)

In [5]:
# get the year

now.year

2024

In [6]:
# get the month

now.month

9

In [37]:
# get the time component - hour, minutes, second and microseconds

now.time()

datetime.time(19, 13, 16, 420003)

In [7]:
# get the day

now.day

29

In [8]:
# get the hour

now.hour

19

In [9]:
# get the minutes

now.minute

13

In [10]:
# get the seconds

now.second

16


---
#### Time-Delta

A **timedelta** measures the difference between two **datetime** objects: subtracting one datetime from another returns a `datetime.timedelta`.
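Internally, a timedelta stores only days, seconds and microseconds, and it normalizes whatever units you pass into those three fields. A small sketch:

```python
from datetime import timedelta

# 25 hours and 90 minutes -> 1 day plus 2.5 hours (9000 seconds)
d = timedelta(hours=25, minutes=90)
print(d.days, d.seconds)       # 1 9000

# negative durations borrow from the days field
neg = timedelta(minutes=-1)
print(neg.days, neg.seconds)   # -1 86340
print(neg.total_seconds())     # -60.0
```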

In [12]:
# record a second 'now' a few minutes later

new_now = datetime.now()


In [13]:
# note: still the same year, month and day; only the hour, minute and second differ

new_now.hour, new_now.minute, new_now.second

(19, 19, 7)

In [15]:
# get the time-delta between previous now and the new now

# the difference is under a day, so it is expressed in seconds and microseconds
new_now - now

datetime.timedelta(seconds=350, microseconds=776828)

In [29]:
# get the timedelta between two different datetime objects

delta = datetime(2024, 1, 20, 16, 43) - datetime(2015, 9, 29)

delta

datetime.timedelta(days=3035, seconds=60180)

In [31]:
# days elapsed

delta.days

3035

In [30]:
# seconds component only (the part left over after whole days), not the total elapsed seconds

delta.seconds

60180
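Because `.seconds` holds only the remainder after whole days, the full duration needs `total_seconds()`. A sketch reusing the same two datetimes:

```python
from datetime import datetime

delta = datetime(2024, 1, 20, 16, 43) - datetime(2015, 9, 29)

print(delta.days)             # 3035
print(delta.seconds)          # 60180 (16 h 43 min)
print(delta.total_seconds())  # 262284180.0 = 3035 * 86400 + 60180
```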

A timedelta can be added to (or subtracted from) a datetime object, shifting it by that amount of time.

In [32]:
# working with the timedelta object directly

from datetime import timedelta

start = datetime(2017, 11, 13)

In [33]:
delta = timedelta(days=12, minutes=32, seconds=1234)

delta

datetime.timedelta(days=12, seconds=3154)

In [34]:
start + delta

datetime.datetime(2017, 11, 25, 0, 52, 34)
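The cells above only add a timedelta; subtracting one moves backwards in time, and timedeltas can also be scaled by integers. A minimal sketch with arbitrary dates:

```python
from datetime import datetime, timedelta

start = datetime(2024, 3, 1)

# subtracting a timedelta moves backwards; 2024 is a leap year
print(start - timedelta(days=1))   # 2024-02-29 00:00:00

# timedeltas can be multiplied by an integer
step = timedelta(hours=6)
print(start + 3 * step)            # 2024-03-01 18:00:00
```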

A datetime object can be converted to a string (and back again):

- **str** gives the default string representation

- **strftime** formats the datetime according to a format string

In [5]:
stamp = datetime(2013, 12, 30)

In [6]:
# convert to a string representation

str(stamp)

'2013-12-30 00:00:00'

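Common format codes:

- **%Y**: four-digit year (e.g. '2013')
- **%y**: two-digit year (e.g. '13')

- **%m**: two-digit month [01, 12]
- **%d**: two-digit day of the month [01, 31]

- **%H**: hour, 24-hour clock [00, 23]
- **%I**: hour, 12-hour clock [01, 12]

- **%M**: two-digit minute [00, 59]
- **%S**: two-digit second [00, 59]

- **%w**: weekday as an integer, from 0 (Sunday) to 6 (Saturday)
- **%D**: shortcut for `%m/%d/%y` (e.g. '12/30/13'); provided by the platform's C library, so it is not guaranteed on every system

> These codes can be combined in any order to format a datetime object; separators (such as `-`, `/` or `:`) are written literally in the format string.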
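A sketch combining several of these codes with literal separators (the timestamp is arbitrary; `%D` is left out because it is platform-dependent):

```python
from datetime import datetime

stamp = datetime(2013, 12, 30, 15, 5, 9)

print(stamp.strftime('%Y/%m/%d %H:%M:%S'))  # 2013/12/30 15:05:09
print(stamp.strftime('%d.%m.%y %I:%M'))     # 30.12.13 03:05
print(stamp.strftime('%w'))                 # 1 (30 Dec 2013 was a Monday)
```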

In [7]:
# change the format from (year, month, day, time) -> (day, month, year)

stamp.strftime('%d-%m-%y')

'30-12-13'

In [8]:
stamp.strftime("%m-%y")

'12-13'

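**datetime.strptime** parses a string into a datetime when its format is *known*. For many common date formats, **dateutil**'s `parser.parse` (installed alongside pandas) can infer the format instead: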
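A minimal `strptime` sketch: the format string uses the same codes as `strftime` and must match the input exactly (the input strings are illustrative):

```python
from datetime import datetime

value = datetime.strptime('2011-01-03', '%Y-%m-%d')
print(value)   # 2011-01-03 00:00:00

other = datetime.strptime('30/12/13 21:15', '%d/%m/%y %H:%M')
print(other)   # 2013-12-30 21:15:00

# a string that does not match the format raises ValueError
try:
    datetime.strptime('2011/01/03', '%Y-%m-%d')
except ValueError as err:
    print('could not parse:', err)
```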

In [3]:
from dateutil.parser import parse

In [4]:
parse('2011-01-03')

datetime.datetime(2011, 1, 3, 0, 0)

In [10]:
parse('Sept 30, 2024 11:01 AM')

datetime.datetime(2024, 9, 30, 11, 1)
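Numeric dates such as `6/12/2011` are ambiguous. `parse` reads them month first by default; its `dayfirst` argument switches the order. A short sketch:

```python
from dateutil.parser import parse

# default: month first
print(parse('6/12/2011'))                 # 2011-06-12 00:00:00

# dayfirst=True: day first
print(parse('6/12/2011', dayfirst=True))  # 2011-12-06 00:00:00
```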

Pandas' **to_datetime** function converts datetime-like values (strings, datetime objects, or lists of them) to pandas types; a list of values becomes a **DatetimeIndex**:

In [13]:
data = [parse('May 25, 2013 9 PM'), parse('Aug 20, 2024, 5 AM')]

data

[datetime.datetime(2013, 5, 25, 21, 0), datetime.datetime(2024, 8, 20, 5, 0)]

In [14]:
pd.to_datetime(data)

DatetimeIndex(['2013-05-25 21:00:00', '2024-08-20 05:00:00'], dtype='datetime64[ns]', freq=None)
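`to_datetime` also accepts strings directly. When they use a known, non-ISO layout, passing `format` (same codes as `strftime`) avoids ambiguity. A sketch with illustrative strings:

```python
import pandas as pd

idx = pd.to_datetime(['25/05/2013 21:00', '20/08/2024 05:00'], format='%d/%m/%Y %H:%M')
print(idx)
```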

In pandas, a missing datetime value is represented as **NaT** (Not a Time), the datetime counterpart of NaN.
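A sketch of where NaT appears: missing inputs become NaT, and `errors='coerce'` turns unparseable strings into NaT instead of raising (the input values are made up):

```python
import pandas as pd

# None becomes NaT in the resulting DatetimeIndex
idx = pd.to_datetime(['2011-07-06 12:00', None])
print(idx.isna())             # [False  True]

# errors='coerce' replaces strings that cannot be parsed with NaT
coerced = pd.to_datetime(['2024-01-01', 'not a date'], errors='coerce')
print(coerced[1] is pd.NaT)   # True
```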

---
### Time Series Basics

In [10]:
from datetime import datetime

In [11]:
# create a time series indexed by timestamps (the values are random, so they change on every run)

dates = [
  datetime(2024, 1, 3),
  datetime(2024, 2, 4),
  datetime(2024, 3, 6),
  datetime(2024, 4, 9),
  datetime(2024, 5, 11),
  datetime(2024, 6, 12),
  datetime(2024, 7, 17),
  datetime(2024, 8, 20),
  datetime(2024, 9, 26),
  datetime(2024, 10, 30),
]

ts = pd.Series(np.random.randn(10), index=dates)

ts

2024-01-03    1.156774
2024-02-04   -1.189575
2024-03-06   -0.897435
2024-04-09    1.123229
2024-05-11   -1.369784
2024-06-12    0.187728
2024-07-17   -1.092136
2024-08-20    0.626905
2024-09-26   -1.585301
2024-10-30   -0.065751
dtype: float64

In [18]:
# check the index of ts

ts.index

DatetimeIndex(['2024-01-03', '2024-02-04', '2024-03-06', '2024-04-09',
               '2024-05-11', '2024-06-12', '2024-07-17', '2024-08-20',
               '2024-09-26', '2024-10-30'],
              dtype='datetime64[ns]', freq=None)

In [19]:
# data type of the index ('<M8[ns]' is NumPy's code for datetime64[ns])

ts.index.dtype

dtype('<M8[ns]')

In [20]:
# a single element of a DatetimeIndex is a pandas Timestamp object

stamp = ts.index[3]

stamp

Timestamp('2024-04-09 00:00:00')

In [29]:
# select all values in July 2024 using a month/year string

ts['7/2024']

2024-07-17    1.106071
dtype: float64
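A value can be looked up with a Timestamp taken from the index or with an equivalent date string, while a coarser string (such as a year) selects a range. A sketch with fixed, made-up values instead of random ones:

```python
import pandas as pd

dates = pd.to_datetime(['2024-01-03', '2024-04-09', '2024-07-17'])
ts = pd.Series([1.0, 2.0, 3.0], index=dates)  # fixed values for illustration

stamp = ts.index[1]
print(ts[stamp])          # 2.0 (lookup by Timestamp)
print(ts['2024-04-09'])   # 2.0 (same lookup with an ISO date string)
print(ts['2024'])         # all three rows: a year string selects a range
```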

The **date_range** function generates a fixed-frequency **DatetimeIndex** (daily by default). Given a start date and a number of `periods`, it returns that many consecutive dates:

In [34]:
index = pd.date_range('2010-05-06', periods=500)  # ISO format avoids month/day ambiguity

In [35]:
ts2 = pd.Series(np.random.randn(len(index)), index=index)

ts2

2010-05-06   -0.486220
2010-05-07    1.900111
2010-05-08   -0.140583
2010-05-09    0.058644
2010-05-10   -0.010430
                ...   
2011-09-13    0.911424
2011-09-14   -0.450865
2011-09-15   -0.207405
2011-09-16   -0.057241
2011-09-17    1.201797
Freq: D, Length: 500, dtype: float64
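`date_range` can also be driven by start and end dates, by an end date plus a count, or by a different frequency. A sketch with arbitrary dates (`'MS'` is the month-start frequency alias):

```python
import pandas as pd

# start and end (both inclusive), default daily frequency
a = pd.date_range('2024-04-01', '2024-04-05')

# end date plus a number of periods, counting backwards
b = pd.date_range(end='2024-06-01', periods=3)

# first day of each month
c = pd.date_range('2024-01-01', periods=3, freq='MS')

print(len(a))              # 5
print(b[0])                # 2024-05-30 00:00:00
print(c.month.tolist())    # [1, 2, 3]
```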

In [36]:
ts2['2011']

2011-01-01    0.312138
2011-01-02    0.291174
2011-01-03   -1.853702
2011-01-04   -0.542990
2011-01-05    0.022637
                ...   
2011-09-13    0.911424
2011-09-14   -0.450865
2011-09-15   -0.207405
2011-09-16   -0.057241
2011-09-17    1.201797
Freq: D, Length: 260, dtype: float64

In [37]:
ts2['2010-05']

2010-05-06   -0.486220
2010-05-07    1.900111
2010-05-08   -0.140583
2010-05-09    0.058644
2010-05-10   -0.010430
2010-05-11    0.552440
2010-05-12   -0.906776
2010-05-13    0.223298
2010-05-14   -0.080754
2010-05-15   -0.612495
2010-05-16    0.024339
2010-05-17   -1.405387
2010-05-18   -0.113842
2010-05-19    0.266988
2010-05-20   -0.111341
2010-05-21    0.742994
2010-05-22    0.131965
2010-05-23    0.469956
2010-05-24   -1.561097
2010-05-25    0.903390
2010-05-26   -0.900696
2010-05-27   -0.559421
2010-05-28    0.232470
2010-05-29    0.831692
2010-05-30    2.187910
2010-05-31    1.209643
Freq: D, dtype: float64

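Slicing with date strings works even when the start or end date is not in the index: pandas returns every value that falls within the range (both ends inclusive). This relies on the index being sorted.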

In [39]:
# slices

# note: the start date (2010-01-01) is earlier than the first date in the series
ts2['2010-01-01': '2010-06-03']

2010-05-06   -0.486220
2010-05-07    1.900111
2010-05-08   -0.140583
2010-05-09    0.058644
2010-05-10   -0.010430
2010-05-11    0.552440
2010-05-12   -0.906776
2010-05-13    0.223298
2010-05-14   -0.080754
2010-05-15   -0.612495
2010-05-16    0.024339
2010-05-17   -1.405387
2010-05-18   -0.113842
2010-05-19    0.266988
2010-05-20   -0.111341
2010-05-21    0.742994
2010-05-22    0.131965
2010-05-23    0.469956
2010-05-24   -1.561097
2010-05-25    0.903390
2010-05-26   -0.900696
2010-05-27   -0.559421
2010-05-28    0.232470
2010-05-29    0.831692
2010-05-30    2.187910
2010-05-31    1.209643
2010-06-01   -0.334824
2010-06-02    0.757810
2010-06-03    0.338071
Freq: D, dtype: float64
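The same works at the other end of the series. A compact sketch with a 10-day series holding the values 0 to 9 (made-up data):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10), index=pd.date_range('2010-05-06', periods=10))

# start precedes the index: the slice begins at the first available date
head = s.loc['2010-01-01':'2010-05-08']

# end follows the index (which ends 2010-05-15): the slice stops at the last date
tail = s.loc['2010-05-14':'2011-01-01']

print(head.tolist())   # [0, 1, 2]
print(tail.tolist())   # [8, 9]
```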

**truncate** slices a series (or DataFrame) between two index values: it drops all rows **before** and/or **after** the given dates.

In [40]:
ts.head()

2024-01-03   -2.030289
2024-02-04   -0.033228
2024-03-06   -0.337515
2024-04-09    1.887770
2024-05-11    0.484812
dtype: float64

In [48]:
# drop all data points before the specified date ('2024-04' is read as 2024-04-01)

ts.truncate(before='2024-04')

2024-04-09    1.887770
2024-05-11    0.484812
2024-06-12    1.093502
2024-07-17    1.106071
2024-08-20    0.274829
2024-09-26    1.213179
2024-10-30    0.138284
dtype: float64

In [49]:
# drop all data points after the specified date; '2024-09' is read as 2024-09-01,
# so 2024-09-26 is dropped as well

ts.truncate(after='2024-09')

2024-01-03   -2.030289
2024-02-04   -0.033228
2024-03-06   -0.337515
2024-04-09    1.887770
2024-05-11    0.484812
2024-06-12    1.093502
2024-07-17    1.106071
2024-08-20    0.274829
dtype: float64

In [50]:
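# keep only the rows between the two dates (inclusive); partial dates are read
# as the first day of the month, so the range ends at 2024-09-01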

ts.truncate(before='2024-04', after='2024-09')

2024-04-09    1.887770
2024-05-11    0.484812
2024-06-12    1.093502
2024-07-17    1.106071
2024-08-20    0.274829
dtype: float64
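This makes `truncate` subtly different from partial-string slicing with `.loc`, which treats a month string as the whole month. A sketch on three made-up points:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=pd.to_datetime(['2024-08-20', '2024-09-26', '2024-10-30']))

# truncate converts '2024-09' to the instant 2024-09-01 00:00
print(s.truncate(after='2024-09').tolist())   # [1]

# a partial-string slice includes all of September
print(s.loc[:'2024-09'].tolist())             # [1, 2]
```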

In [30]:
ts = pd.DataFrame(
  data=np.random.randn(100, 6),
  index=pd.date_range(
    start='2024-09-12', 
    periods=100, 
    freq=pd.offsets.MonthEnd(2) # every second month end (the '2M' alias; '2ME' in pandas >= 2.2)
  ),
  columns=list('ABCDEF')
)

ts.head()

                   A         B         C         D         E         F
2024-09-30 -0.053654 -0.898007 -0.352058 -0.432450  0.519633  1.452864
2024-11-30  0.490116  0.746653 -0.254472 -0.616039 -1.662282  1.460309
2025-01-31 -0.986334 -0.413577  0.202971  0.638971  1.078456  0.202300
2025-03-31  0.022737  0.339843 -0.334861  0.597315 -1.753840  0.181129
2025-05-31  0.909638 -0.889156  0.560633  0.429995  0.254626 -1.225863
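Because 2024-09-12 is not a month end, the generated index starts at the next month end (2024-09-30) and then steps two month ends at a time. A small check of the first three dates, using the `MonthEnd` offset object, which does not depend on the alias spelling that changed in pandas 2.2:

```python
import pandas as pd

idx = pd.date_range(start='2024-09-12', periods=3, freq=pd.offsets.MonthEnd(2))
print(idx.strftime('%Y-%m-%d').tolist())   # ['2024-09-30', '2024-11-30', '2025-01-31']
```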


In [31]:
# select data indexed by year 2025

ts.loc['2025']

                   A         B         C         D         E         F
2025-01-31 -0.986334 -0.413577  0.202971  0.638971  1.078456  0.202300
2025-03-31  0.022737  0.339843 -0.334861  0.597315 -1.753840  0.181129
2025-05-31  0.909638 -0.889156  0.560633  0.429995  0.254626 -1.225863
2025-07-31  0.782683  0.401639  0.742606 -0.387254  1.130218  0.058935
2025-09-30 -1.532775 -0.298403 -1.399780 -0.737208  0.466735 -1.576423
2025-11-30  0.975054 -0.381182 -0.235731 -1.402067 -0.457854 -0.835175
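Month-level strings also work as row-slice bounds on a DataFrame via `.loc`, and the end month is included in full. A sketch on a small made-up frame with the same two-month-end spacing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape(6, 2),
    index=pd.date_range('2025-01-31', periods=6, freq=pd.offsets.MonthEnd(2)),
    columns=['A', 'B'],
)

# rows from March through the end of July 2025
sub = df.loc['2025-03':'2025-07']
print(sub.index.strftime('%Y-%m-%d').tolist())  # ['2025-03-31', '2025-05-31', '2025-07-31']
```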


In [33]:
# keep rows from 2025-05-01 through 2025-10-01 (inclusive)

ts.truncate(before='2025-05', after='2025-10')

                   A         B         C         D         E         F
2025-05-31  0.909638 -0.889156  0.560633  0.429995  0.254626 -1.225863
2025-07-31  0.782683  0.401639  0.742606 -0.387254  1.130218  0.058935
2025-09-30 -1.532775 -0.298403 -1.399780 -0.737208  0.466735 -1.576423
