# Time Series Decomposition

In this tutorial, we explore time series decomposition using the `normet` package.
The goal of decomposition is to split an observed signal into interpretable
components. Two decompositions are covered:

- **Weather component**: meteorological contributions to the time series, ranked by model
    feature importance.
- **Emission component**: emission-related components of the time series, obtained by progressively
    freezing time-related variables during resampling.
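
The idea of "progressively freezing" variables during resampling can be illustrated with a toy example. The sketch below is not `normet`'s implementation: `toy_model`, `normalise`, and the synthetic columns are hypothetical stand-ins, and the "model" is a known linear function, not a trained learner. It only shows how the difference between two resampling passes can be attributed to the variable that was frozen in between.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 200

# Synthetic data: two time-related columns and one weather-like column.
df = pd.DataFrame(
    {
        "hour": rng.integers(0, 24, n_rows).astype(float),
        "weekday": rng.integers(0, 7, n_rows).astype(float),
        "wind": rng.normal(size=n_rows),
    }
)


def toy_model(frame):
    """Hypothetical stand-in for a fitted model: a known linear response."""
    return 10.0 + 0.5 * frame["hour"] - 1.0 * frame["weekday"] + 2.0 * frame["wind"]


def normalise(frame, resample_cols, n_samples=50):
    """Average predictions while `resample_cols` are drawn at random from other rows."""
    total = np.zeros(len(frame))
    for _ in range(n_samples):
        draw = rng.integers(0, len(frame), len(frame))
        sampled = frame.copy()
        sampled[resample_cols] = frame[resample_cols].to_numpy()[draw]
        total += toy_model(sampled).to_numpy()
    return total / n_samples


# Pass 1: resample everything. Pass 2: freeze `hour`. Pass 3: also freeze `weekday`.
base = normalise(df, ["hour", "weekday", "wind"])
hour_frozen = normalise(df, ["weekday", "wind"])
both_frozen = normalise(df, ["wind"])

# Each difference is attributed to the variable frozen in that step.
hour_part = hour_frozen - base
weekday_part = both_frozen - hour_frozen

print(np.corrcoef(hour_part, df["hour"])[0, 1])
print(np.corrcoef(weekday_part, df["weekday"])[0, 1])
```

Because the toy response is linear, the recovered `hour_part` tracks `hour` positively and `weekday_part` tracks `weekday` negatively, up to resampling noise. With a real model the parts would not be this clean.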


# 1. Load dataset and model

In [1]:
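import pandas as pd
import normet as nm

# Load the prepared dataset, indexed by timestamp.
df_pre = pd.read_csv("data_df_prep.csv", parse_dates=["date"], index_col="date")

# Load the trained FLAML and H2O AutoML models from the current folder.
model_flaml = nm.load_model(folder_path=".", filename="automl.joblib")
model_h2o = nm.load_model(folder_path=".", backend="h2o", filename="automl")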

Checking whether there is an H2O instance running at http://localhost:54321. connected.
Please download and install the latest version from: https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html


property,value
H2O_cluster_uptime,11 mins 04 secs
H2O_cluster_timezone,Europe/London
H2O_data_parsing_timezone,UTC
H2O_cluster_version,3.46.0.7
H2O_cluster_version_age,5 months and 21 days
H2O_cluster_name,H2O_from_python_n94921cs_um8ihg
H2O_cluster_total_nodes,1
H2O_cluster_free_memory,6.960 Gb
H2O_cluster_total_cores,8
H2O_cluster_allowed_cores,7
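
The decomposition calls in the following sections pass the same `feature_names` list every time. A minimal sketch of defining that list once and checking it against a frame's columns is given here. The helper `missing_features` and the stand-in frame `toy` are hypothetical and used only for illustration; in practice the check would be run on `df_pre`.

```python
import pandas as pd

# Feature names copied from the decomposition calls in this tutorial.
met_features = ["u10", "v10", "d2m", "t2m", "blh", "sp", "ssrd", "tcc", "tp", "rh2m"]
time_features = ["date_unix", "day_julian", "weekday", "hour"]
feature_names = met_features + time_features


def missing_features(df, names):
    """Return the requested feature names that are absent from `df`."""
    return [name for name in names if name not in df.columns]


# Hypothetical stand-in frame that lacks the last feature ("hour").
toy = pd.DataFrame(columns=["value", *feature_names[:-1]])
print(missing_features(toy, feature_names))  # ['hour']
```

An empty list means every requested feature is present.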


# 2. Emission-related component

In [2]:
df_emi = nm.decom_emi(
    df_pre, value="value", model=model_flaml,
    feature_names=["u10", "v10", "d2m", "t2m", "blh", "sp", "ssrd", "tcc", "tp", "rh2m",
                   "date_unix", "day_julian", "weekday", "hour"],
    n_samples=300,
)

In [3]:
df_emi

date,observed,date_unix,day_julian,weekday,hour,emi_total,emi_noise,emi_base
2020-01-01 00:00:00,58.1,12.585427,-7.999555,-4.195440,-0.836412,9.257922,0.520629,9.183272
2020-01-01 01:00:00,43.2,12.960784,-8.373226,-4.209904,-1.250548,8.208625,-0.101752,9.183272
2020-01-01 02:00:00,43.0,13.597747,-8.630524,-4.634238,-1.327028,7.295590,-0.893639,9.183272
2020-01-01 03:00:00,42.8,13.200170,-8.598249,-4.211034,-1.263878,7.758605,-0.551678,9.183272
2020-01-01 04:00:00,36.8,13.097771,-8.515065,-4.263471,-1.059132,8.721947,0.278571,9.183272
...,...,...,...,...,...,...,...,...
2020-12-31 19:00:00,11.7,2.244018,1.468774,-3.495459,0.246548,10.098802,0.451650,9.183272
2020-12-31 20:00:00,11.0,2.507577,1.201683,-3.439797,-0.083176,9.348444,-0.021117,9.183272
2020-12-31 21:00:00,15.3,2.682733,1.006918,-3.572377,0.071585,8.857440,-0.514690,9.183272
2020-12-31 22:00:00,17.1,2.583330,1.281499,-3.677945,-0.188740,9.297286,0.115869,9.183272
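
As a quick sanity check on the table above, the sketch below copies the first two displayed rows by hand and tests whether `emi_total` equals `emi_base` plus the four time-related columns plus `emi_noise`. This additive relation is inferred from the displayed numbers, not from the `normet` documentation, so treat it as an observation about this output.

```python
import pandas as pd

# First two rows of the FLAML `df_emi` output shown above, copied by hand.
rows = pd.DataFrame(
    {
        "date_unix": [12.585427, 12.960784],
        "day_julian": [-7.999555, -8.373226],
        "weekday": [-4.195440, -4.209904],
        "hour": [-0.836412, -1.250548],
        "emi_total": [9.257922, 8.208625],
        "emi_noise": [0.520629, -0.101752],
        "emi_base": [9.183272, 9.183272],
    },
    index=pd.to_datetime(["2020-01-01 00:00:00", "2020-01-01 01:00:00"]),
)

time_cols = ["date_unix", "day_julian", "weekday", "hour"]

# Rebuild emi_total from its parts and measure the largest mismatch.
reconstructed = rows["emi_base"] + rows[time_cols].sum(axis=1) + rows["emi_noise"]
residual = (reconstructed - rows["emi_total"]).abs().max()
print(residual < 1e-5)  # True
```

The same check can be run on the full `df_emi` frame by replacing `rows` with `df_emi`.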


In [7]:
df_emi_h2o = nm.decom_emi(
    df_pre, value="value", model=model_h2o,
    feature_names=["u10", "v10", "d2m", "t2m", "blh", "sp", "ssrd", "tcc", "tp", "rh2m",
                   "date_unix", "day_julian", "weekday", "hour"],
    n_samples=300,
)

Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
gbm prediction progress: |███████████████████████████████████████████████████████| (done) 100%
Export File progress: |██████████████████████████████████████████████████████████| (done) 100%

In [8]:
df_emi_h2o

date,observed,date_unix,day_julian,weekday,hour,emi_total,emi_noise,emi_base
2020-01-01 00:00:00,58.1,13.678577,-10.294772,-3.130409,-0.864267,8.949349,0.396984,9.163236
2020-01-01 01:00:00,43.2,13.972299,-10.405251,-3.322040,-0.854242,8.388789,-0.165212,9.163236
2020-01-01 02:00:00,43.0,14.228233,-10.662444,-3.338710,-1.081548,7.430981,-0.877785,9.163236
2020-01-01 03:00:00,42.8,13.993944,-10.523189,-3.129671,-1.137501,7.857510,-0.509310,9.163236
2020-01-01 04:00:00,36.8,14.197713,-10.774653,-3.301511,-0.748802,8.802969,0.266986,9.163236
...,...,...,...,...,...,...,...,...
2020-12-31 19:00:00,11.7,1.492961,-0.443572,-0.760436,0.167839,10.088772,0.468744,9.163236
2020-12-31 20:00:00,11.0,1.617913,-0.488165,-0.830937,-0.196620,9.165863,-0.099564,9.163236
2020-12-31 21:00:00,15.3,1.661922,-0.622656,-0.811997,0.002998,8.875606,-0.517897,9.163236
2020-12-31 22:00:00,17.1,1.606996,-0.550163,-0.841701,-0.191866,9.398885,0.212382,9.163236


# 3. Meteorological contributions

In [9]:
df_met = nm.decom_met(
    df_pre, value="value", model=model_flaml,
    feature_names=["u10", "v10", "d2m", "t2m", "blh", "sp", "ssrd", "tcc", "tp", "rh2m",
                   "date_unix", "day_julian", "weekday", "hour"],
    n_samples=300,
)

In [10]:
df_met

date,observed,emi_total,blh,u10,d2m,sp,v10,t2m,tcc,tp,rh2m,ssrd,met_total,met_base,met_noise
2020-01-01 00:00:00,58.1,27.182170,1.747127,6.907205,0.108777,2.129622,8.177831,1.944083,3.038886,1.826078,2.476466,-28.356075,30.917830,-0.272951,31.190781
2020-01-01 01:00:00,43.2,25.958245,1.497383,6.372982,-0.276071,0.440097,5.814759,0.712032,2.204222,0.114941,0.023268,-16.903613,17.241755,-0.272951,17.514706
2020-01-01 02:00:00,43.0,25.749106,6.905823,5.234658,-0.190116,0.155462,5.421741,-1.009258,3.574900,0.354570,1.541427,-21.989207,17.250894,-0.272951,17.523845
2020-01-01 03:00:00,42.8,25.697841,9.543428,1.365803,-0.675506,1.298016,3.183256,-1.885136,2.670369,0.466045,0.658054,-16.624328,17.102159,-0.272951,17.375110
2020-01-01 04:00:00,36.8,26.751151,6.387102,2.001910,-0.681866,-1.452728,1.890110,-1.359555,2.780033,0.519516,1.602139,-11.686662,10.048849,-0.272951,10.321800
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-12-31 19:00:00,11.7,13.923514,-0.988478,-0.599763,-0.507729,0.301568,0.641462,-0.569125,-0.140906,0.341183,-0.514308,2.036095,-2.223514,-0.272951,-1.950563
2020-12-31 20:00:00,11.0,13.513001,-0.774380,-0.397388,-0.491752,0.404927,0.662255,-0.593093,-0.601991,0.090385,-0.617948,2.318986,-2.513001,-0.272951,-2.240050
2020-12-31 21:00:00,15.3,12.921053,-0.668264,-0.682246,-0.602764,0.326296,0.791646,-0.617332,-0.568020,0.091811,-0.365413,2.294285,2.378947,-0.272951,2.651898
2020-12-31 22:00:00,17.1,13.443768,7.103325,-3.660732,-0.875603,1.148461,0.254076,-0.235992,-1.162650,0.809815,0.425367,-3.806067,3.656232,-0.272951,3.929183
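
The table above can be checked in the same spirit. The sketch below copies the first two displayed rows by hand and tests two relations that hold in those rows: `observed = emi_total + met_total` and `met_total = met_base + met_noise`. It also sums the ten per-feature columns. These relations are read off the displayed numbers and are not documented guarantees of `decom_met`.

```python
import pandas as pd

# First two rows of the FLAML `df_met` output shown above, copied by hand.
rows = pd.DataFrame(
    {
        "observed": [58.1, 43.2],
        "emi_total": [27.182170, 25.958245],
        "blh": [1.747127, 1.497383],
        "u10": [6.907205, 6.372982],
        "d2m": [0.108777, -0.276071],
        "sp": [2.129622, 0.440097],
        "v10": [8.177831, 5.814759],
        "t2m": [1.944083, 0.712032],
        "tcc": [3.038886, 2.204222],
        "tp": [1.826078, 0.114941],
        "rh2m": [2.476466, 0.023268],
        "ssrd": [-28.356075, -16.903613],
        "met_total": [30.917830, 17.241755],
        "met_base": [-0.272951, -0.272951],
        "met_noise": [31.190781, 17.514706],
    }
)

met_cols = ["blh", "u10", "d2m", "sp", "v10", "t2m", "tcc", "tp", "rh2m", "ssrd"]

# Largest mismatch for each candidate identity.
gap_observed = (rows["observed"] - rows["emi_total"] - rows["met_total"]).abs().max()
gap_met = (rows["met_total"] - rows["met_base"] - rows["met_noise"]).abs().max()
feature_sum = rows[met_cols].sum(axis=1)

print(gap_observed < 1e-5, gap_met < 1e-5)  # True True
print(feature_sum.abs().max() < 1e-5)       # True
```

In these two rows the ten per-feature columns sum to roughly zero, so on their own they do not add up to `met_total`. Whether that is intended behaviour is not stated in this tutorial, so it is worth checking against the full frame before interpreting individual columns.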


In [11]:
df_met_h2o = nm.decom_met(
    df_pre, value="value", model=model_h2o,
    feature_names=["u10", "v10", "d2m", "t2m", "blh", "sp", "ssrd", "tcc", "tp", "rh2m",
                   "date_unix", "day_julian", "weekday", "hour"],
    n_samples=300,
)

Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
gbm prediction progress: |███████████████████████████████████████████████████████| (done) 100%
Export File progress: |██████████████████████████████████████████████████████████| (done) 100%

In [12]:
df_met_h2o

date,observed,emi_total,blh,u10,d2m,sp,v10,t2m,rh2m,tcc,ssrd,tp,met_total,met_base,met_noise
2020-01-01 00:00:00,58.1,26.747566,-0.685692,8.237067,0.040652,1.740875,2.553671,0.926107,1.013603,1.423810,-0.100472,-15.149623,31.352434,-0.103159,31.455594
2020-01-01 01:00:00,43.2,25.892533,-0.083725,7.048417,-0.067193,2.309767,2.660501,0.447486,0.570279,1.215169,-0.172690,-13.928011,17.307467,-0.103159,17.410627
2020-01-01 02:00:00,43.0,24.744767,1.297865,5.169713,0.362393,4.471958,2.530851,1.116375,-0.095434,1.607591,-0.662551,-15.798761,18.255233,-0.103159,18.358392
2020-01-01 03:00:00,42.8,24.888955,3.669335,3.190447,0.599329,4.121629,1.456098,0.088975,0.432077,1.816695,-0.556616,-14.817971,17.911045,-0.103159,18.014204
2020-01-01 04:00:00,36.8,25.965658,4.084822,1.406836,0.337906,3.766984,1.123482,-0.524787,0.359523,1.651506,-0.867593,-11.338678,10.834342,-0.103159,10.937501
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-12-31 19:00:00,11.7,12.371624,-0.873613,-0.036832,-0.135024,0.817879,0.193373,0.183870,0.283234,-0.017805,-0.771720,0.356639,-0.671624,-0.103159,-0.568465
2020-12-31 20:00:00,11.0,11.586944,-0.614335,0.174963,-0.117066,0.692587,0.064921,0.786918,0.188087,-0.652703,-0.781100,0.257728,-0.586944,-0.103159,-0.483785
2020-12-31 21:00:00,15.3,11.283063,-1.170900,0.002810,0.389001,0.738250,-0.084617,0.593092,0.077460,-0.677495,-0.444446,0.576846,4.016937,-0.103159,4.120096
2020-12-31 22:00:00,17.1,11.803681,6.814033,-2.796184,0.218716,1.475771,-0.177330,0.232567,-1.330318,1.251049,-0.163856,-5.524449,5.296319,-0.103159,5.399478


In [13]:
df_met.describe()

statistic,observed,emi_total,blh,u10,d2m,sp,v10,t2m,tcc,tp,rh2m,ssrd,met_total,met_base,met_noise
count,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0
mean,9.134238,9.407189,0.515026,-0.649679,-0.076692,0.222034,0.064016,0.020321,-0.044616,-0.01109,-0.01427,-0.02505,-0.272951,-0.272951,-2.854212e-16
std,8.137577,3.013772,3.457504,2.528226,1.970255,1.192345,1.384666,1.145522,0.702251,1.033316,0.90901,6.912117,7.172175,0.0,7.172175
min,-4.3,4.100656,-4.560284,-6.915521,-4.869755,-4.049281,-8.993979,-8.745112,-4.072201,-5.964634,-4.24265,-55.18382,-15.978659,-0.272951,-15.70571
25%,4.4,6.710641,-1.438063,-2.059543,-0.757609,-0.263181,-0.673448,-0.479548,-0.366363,-0.376042,-0.475898,-1.746218,-4.354035,-0.272951,-4.081084
50%,6.8,9.134899,-0.813969,-0.551139,-0.218718,0.10609,0.209349,-0.060397,-0.050963,-0.051476,-0.032461,1.366799,-1.669269,-0.272951,-1.396319
75%,10.7,11.584774,0.681705,0.275663,0.160154,0.504321,0.829351,0.372073,0.264032,0.294284,0.388029,3.902386,1.579971,-0.272951,1.852922
max,72.3,28.142333,17.351036,12.508581,19.970023,11.452135,8.177831,6.150083,5.785555,9.381517,11.233589,14.033802,56.435173,-0.272951,56.70812


In [14]:
df_met_h2o.describe()

statistic,observed,emi_total,blh,u10,d2m,sp,v10,t2m,rh2m,tcc,ssrd,tp,met_total,met_base,met_noise
count,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0,6373.0
mean,9.134238,9.237398,0.390789,-0.402747,0.005667,0.10585,0.061589,0.00575,0.006032,-0.001564,-0.223517,0.052151,-0.103159,-0.103159,0.0
std,8.137577,2.59308,3.439664,2.365962,1.977472,1.185992,1.13286,1.345394,0.724454,0.791825,0.771984,7.027719,7.608861,0.0,7.608861
min,-4.3,4.869018,-4.622713,-6.485211,-3.936577,-5.15688,-7.008476,-6.454626,-4.91205,-5.139478,-3.910919,-56.621772,-15.850152,-0.103159,-15.746993
25%,4.4,7.030869,-1.607884,-1.914659,-0.625523,-0.32825,-0.500316,-0.524802,-0.32349,-0.32959,-0.604914,-1.409352,-4.484597,-0.103159,-4.381437
50%,6.8,8.881836,-0.974273,-0.416046,-0.165078,0.014033,0.175632,-0.137384,-0.020453,-0.046717,-0.219056,1.545758,-1.715005,-0.103159,-1.611846
75%,10.7,10.871789,0.79489,0.540123,0.130103,0.334134,0.639533,0.276487,0.326336,0.263396,0.128641,4.023408,1.704289,-0.103159,1.807448
max,72.3,27.584487,21.657635,15.807835,23.273812,9.590096,6.38661,9.038982,5.410638,7.309356,6.221995,12.631847,58.373305,-0.103159,58.476464
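
The two summaries can be compared side by side. The sketch below uses only values copied from the `describe()` outputs above and assumes that the column order of each result reflects the feature-importance ranking mentioned in the introduction. It reports how many leading features the two backends order identically, and checks that the mean of `observed` matches the sum of the `emi_total` and `met_total` means for each backend.

```python
import pandas as pd

# Feature column order as displayed for each backend.
flaml_order = ["blh", "u10", "d2m", "sp", "v10", "t2m", "tcc", "tp", "rh2m", "ssrd"]
h2o_order = ["blh", "u10", "d2m", "sp", "v10", "t2m", "rh2m", "tcc", "ssrd", "tp"]

# Collect the leading features that appear in the same position for both backends.
shared_prefix = []
for flaml_name, h2o_name in zip(flaml_order, h2o_order):
    if flaml_name != h2o_name:
        break
    shared_prefix.append(flaml_name)

# Mean values copied from the `mean` rows of the two describe() tables.
means = pd.DataFrame(
    {
        "observed": [9.134238, 9.134238],
        "emi_total": [9.407189, 9.237398],
        "met_total": [-0.272951, -0.103159],
    },
    index=["flaml", "h2o"],
)
means["emi_plus_met"] = means["emi_total"] + means["met_total"]

print(shared_prefix)  # ['blh', 'u10', 'd2m', 'sp', 'v10', 't2m']
print(means)
```

In this output the two backends agree on the first six features and differ only in the order of the last four.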
