# Loading data for the examples

The examples in this project use an open Kaggle dataset, the [Delhi 5-Minute Electricity Demand for Forecasting dataset](https://www.kaggle.com/datasets/yug201/delhi-5-minute-electricity-demand-for-forecasting), which provides electricity demand for Delhi at 5-minute resolution together with weather observations. Its high sampling rate makes it well suited to demonstrating time series analysis, forecasting, and anomaly detection. By applying several modeling approaches, including PCA and machine learning, we explore patterns in electricity consumption and show how TimeScape extracts insights from temporal data.

In [1]:
import pandas as pd
import kagglehub

# Download latest version
path = kagglehub.dataset_download("yug201/delhi-5-minute-electricity-demand-for-forecasting")

print("Path to dataset files:", path)

Path to dataset files: C:\Users\spg_mlie\.cache\kagglehub\datasets\yug201\delhi-5-minute-electricity-demand-for-forecasting\versions\1
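
Before loading, it can help to confirm which CSV files the download directory actually contains. The following is a minimal sketch using only the standard library; `list_csv_files` is a hypothetical helper introduced here for illustration and is not part of `kagglehub` or TimeScape.

```python
from pathlib import Path


def list_csv_files(directory):
    """Return the names of all CSV files directly inside `directory`, sorted."""
    return sorted(p.name for p in Path(directory).glob("*.csv"))


# Usage, assuming `path` is the directory returned by kagglehub.dataset_download:
# print(list_csv_files(path))
```

Using `pathlib` keeps the check independent of the operating system's path separator.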


## Dataset
Let's open the CSV file and load it into a DataFrame.

In [2]:
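import os

# Build the file path with os.path.join so the separator is not read as a string escape
data = pd.read_csv(os.path.join(path, "powerdemand_5min_2021_to_2024_with weather.csv"))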
data.head()

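| | Unnamed: 0 | datetime | Power demand | temp | dwpt | rhum | wdir | wspd | pres | year | month | day | hour | minute | moving_avg_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2021-01-01 00:30:00 | 2014.0 | 8.0 | 6.9 | 93.0 | 0.0 | 0.0 | 1017.0 | 2021 | 1 | 1 | 0 | 30 | NaN |
| 1 | 1 | 2021-01-01 00:35:00 | 2005.63 | 8.0 | 6.9 | 93.0 | 0.0 | 0.0 | 1017.0 | 2021 | 1 | 1 | 0 | 35 | NaN |
| 2 | 2 | 2021-01-01 00:40:00 | 1977.6 | 8.0 | 6.9 | 93.0 | 0.0 | 0.0 | 1017.0 | 2021 | 1 | 1 | 0 | 40 | 1999.076667 |
| 3 | 3 | 2021-01-01 00:45:00 | 1976.44 | 8.0 | 6.9 | 93.0 | 0.0 | 0.0 | 1017.0 | 2021 | 1 | 1 | 0 | 45 | 1986.556667 |
| 4 | 4 | 2021-01-01 00:50:00 | 1954.37 | 8.0 | 6.9 | 93.0 | 0.0 | 0.0 | 1017.0 | 2021 | 1 | 1 | 0 | 50 | 1969.47 |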


The dataset consists of the following columns:

* datetime: Timestamp of the observation.
* Power demand: Electricity demand recorded every 5 minutes (likely in MW rather than kW, given values between roughly 1,300 and 8,600).
* temp: Temperature (°C).
* dwpt: Dew point temperature (°C).
* rhum: Relative humidity (%).
* wdir: Wind direction (degrees).
* wspd: Wind speed (m/s).
* pres: Atmospheric pressure (hPa).
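* year, month, day, hour, minute: Breakdown of the timestamp for easy time-series analysis.
* moving_avg_3: Mean of the current and two preceding Power demand values (empty for the first two rows).
* Unnamed: 0: Row counter left over from the CSV export; it carries no information.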
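
The `moving_avg_3` column in the preview appears to be a trailing three-sample mean of `Power demand`. The sketch below checks that reading against the five previewed rows only; it is an assumption about how the column was built, not a statement from the dataset authors.

```python
import pandas as pd

# First five "Power demand" values and the matching moving_avg_3 values,
# copied from the preview of the DataFrame.
demand = pd.Series([2014.0, 2005.63, 1977.6, 1976.44, 1954.37])
reported = pd.Series([float("nan"), float("nan"), 1999.076667, 1986.556667, 1969.47])

# Trailing mean over the current sample and the two before it.
recomputed = demand.rolling(window=3).mean()

# Compare only where the reported column is defined (the first two rows are empty).
mask = reported.notna()
max_abs_diff = (recomputed[mask] - reported[mask]).abs().max()
print(max_abs_diff)
```

The difference stays at the level of the rounding in the printed values, which is consistent with a window of three samples (15 minutes).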

In [3]:
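# Parse the timestamps and use them as the index
data["datetime"] = pd.to_datetime(data["datetime"])
data = data.set_index("datetime", drop=True)
# Drop the leftover row counter and the minute column
data = data.drop(columns=["Unnamed: 0", "minute"])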
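
With a datetime index in place, one quick sanity check is whether the samples really are 5 minutes apart. The following is a minimal sketch on a small made-up index; `spacing_counts` is a hypothetical helper, and the timestamps are illustrative rather than taken from the dataset.

```python
import pandas as pd


def spacing_counts(index):
    """Count how often each gap between consecutive timestamps occurs."""
    return index.to_series().diff().dropna().value_counts()


# Small illustrative index (not from the dataset): the 00:45 sample is missing.
idx = pd.DatetimeIndex(
    ["2021-01-01 00:30", "2021-01-01 00:35", "2021-01-01 00:40", "2021-01-01 00:50"]
)
counts = spacing_counts(idx)
print(counts)
```

Calling `spacing_counts(data.index)` on the loaded DataFrame would show whether any gap other than 5 minutes occurs.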

The summary below shows that the dataset contains 393,440 rows, a solid base for showcasing the performance of the library functions.

In [4]:
data.describe()

| | Power demand | temp | dwpt | rhum | wdir | wspd | pres | year | month | day | hour | moving_avg_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 393440.0 | 393440.0 | 393440.0 | 393440.0 | 392900.0 | 393440.0 | 393440.0 | 393440.0 | 393440.0 | 393440.0 | 393440.0 | 393438.0 |
| mean | 3960.736469 | 25.527913 | 16.338046 | 63.435767 | 163.777081 | 7.85455 | 1008.932695 | 2022.487129 | 6.446991 | 15.745171 | 11.184371 | 3960.746273 |
| std | 1300.473773 | 7.981563 | 7.37744 | 24.850663 | 116.888397 | 5.664314 | 6.902759 | 1.110587 | 3.412654 | 8.783091 | 6.797501 | 1299.545642 |
| min | 1302.08 | 4.0 | -8.6 | 5.0 | 0.0 | 0.0 | 989.6 | 2021.0 | 1.0 | 1.0 | 0.0 | 1307.68 |
| 25% | 3074.9 | 20.0 | 10.4 | 44.0 | 50.0 | 5.4 | 1003.0 | 2021.0 | 3.0 | 8.0 | 5.0 | 3075.861667 |
| 50% | 3832.32 | 27.0 | 15.4 | 67.0 | 160.0 | 7.6 | 1009.0 | 2022.0 | 6.0 | 16.0 | 11.0 | 3831.728333 |
| 75% | 4870.465 | 31.0 | 23.9 | 84.0 | 270.0 | 11.2 | 1015.0 | 2023.0 | 9.0 | 23.0 | 17.0 | 4869.906667 |
| max | 8631.53 | 46.4 | 30.3 | 100.0 | 360.0 | 63.0 | 1027.0 | 2024.0 | 12.0 | 31.0 | 23.0 | 8598.126667 |
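
The `count` row is not the same for every column: `wdir` has 392,900 non-null values against 393,440 for most columns, so 540 wind-direction readings are missing, and `moving_avg_3` lacks the first 2. The sketch below shows how such counts can be derived, using a tiny illustrative frame rather than the real dataset.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame, not the real dataset.
sample = pd.DataFrame(
    {
        "Power demand": [2014.0, 2005.63, 1977.6],
        "wdir": [0.0, np.nan, 0.0],
        "moving_avg_3": [np.nan, np.nan, 1999.076667],
    }
)

# Missing values per column: total rows minus the non-null count from describe().
missing = len(sample) - sample.describe().loc["count"]
print(missing.astype(int))

# The same numbers, computed directly.
print(sample.isna().sum())
```

On the full DataFrame, `data.isna().sum()` gives the per-column totals in one call.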


Now we can store the dataset for convenience and later retrieval:

In [5]:
data.to_parquet("dehli.parquet")