
<div style="text-align: center;">
  <img width="420" height="420" src="https://www.naterscreations.com/imputegap/logo_imputegab.png" />
</div>

<h1>ImputeGAP: Explore Notebook</h1>

In [None]:
%pip install imputegap
%pip install -U ipywidgets

<h1>Benchmark</h1>

ImputeGAP can serve as a common test-bed for comparing the effectiveness and efficiency of time series imputation algorithms. Users have full control over the benchmark through its parameters: the algorithms to compare, the optimizer used to fine-tune them, the datasets to evaluate, the missingness patterns, the range of missing rates, and the performance metrics.

The benchmarking module can be utilized as follows:

In [1]:
from imputegap.recovery.benchmark import Benchmark

my_algorithms = ["SoftImpute", "KNNImpute"]

my_opt = ["default_params"]

my_datasets = ["eeg-alcohol"]

my_patterns = ["mcar"]

my_x_axis = [0.05, 0.1, 0.2, 0.4, 0.6, 0.8]  # missing rates to evaluate

my_metrics = ["*"]

# launch the evaluation
list_results, sum_scores = Benchmark().eval(algorithms=my_algorithms, datasets=my_datasets, patterns=my_patterns, x_axis=my_x_axis, metrics=my_metrics, optimizers=my_opt)


(SYS) The time series have been loaded from /tmp/pycharm_project_615/imputegap/dataset/eeg-alcohol.txt

pattern : mcar, algorithm SoftImpute, started at 2025-04-03 15:45:19.
done!


pattern : mcar, algorithm KNNImpute, started at 2025-04-03 15:45:20.
done!


the results of the analysis has been saved in :  ./imputegap_assets/benchmark 


> logs: benchmark - Execution Time: 4.6457 seconds


Dataset: eegalcohol, pattern: mcar, metric: RMSE, parms=default_params

 Rate       KNNImpute           SoftImpute     

 0.05      0.2410259540        0.4359915238    
  0.1      0.2889085181        0.3665001858    
  0.2      0.3252384253        0.3983300622    
  0.4      0.3382458476        0.4355910162    
  0.6      0.3656435952        0.4500113662    
  0.8      0.4985193265        0.4655442240    



Dataset: eegalcohol, pattern: mcar, metric: runtime_linear_scale, parms=default_params

 Rate       KNNImpute           SoftImpute     

 0.05      0.2473430634        0.0275871754    
  0.1    

In [2]:
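# alternative to "default_params": fine-tune the algorithms with the ray_tune optimizer
# (pass my_opt as optimizers=my_opt when calling Benchmark().eval)
opt = {"optimizer": "ray_tune", "options": {"n_calls": 1, "max_concurrent_trials": 1}}
my_opt = [opt]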
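
Once a benchmark run has finished, the per-rate tables can also be compared programmatically. The sketch below is a minimal illustration that uses only the RMSE values copied from the benchmark output printed earlier. The helper names `best_per_rate` and `mean_score` are hypothetical and are not part of ImputeGAP.

```python
# RMSE values copied from the benchmark output above
# (eeg-alcohol, MCAR pattern, default parameters).
rates = [0.05, 0.1, 0.2, 0.4, 0.6, 0.8]
rmse = {
    "KNNImpute": [0.2410259540, 0.2889085181, 0.3252384253,
                  0.3382458476, 0.3656435952, 0.4985193265],
    "SoftImpute": [0.4359915238, 0.3665001858, 0.3983300622,
                   0.4355910162, 0.4500113662, 0.4655442240],
}


def best_per_rate(rates, scores):
    """Return {rate: algorithm with the lowest score at that rate}."""
    winners = {}
    for i, rate in enumerate(rates):
        winners[rate] = min(scores, key=lambda name: scores[name][i])
    return winners


def mean_score(scores):
    """Return {algorithm: score averaged over all rates}."""
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


winners = best_per_rate(rates, rmse)
averages = mean_score(rmse)

for rate in rates:
    print(f"rate {rate:<4}: lowest RMSE -> {winners[rate]}")
for name, avg in averages.items():
    print(f"{name}: mean RMSE = {avg:.4f}")
```

On this run, KNNImpute has the lower RMSE for missing rates up to 0.6, while SoftImpute is slightly better at 0.8.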

<h1>Downstream</h1>

ImputeGAP includes a dedicated module for systematically evaluating the impact of data imputation on downstream tasks. Currently, forecasting is the primary supported task, with plans to expand to additional tasks in the future.

In [3]:
from imputegap.recovery.imputation import Imputation
from imputegap.recovery.manager import TimeSeries
from imputegap.tools import utils

# initialize the time series object
ts = TimeSeries()

# load and normalize the timeseries
ts.load_series(utils.search_path("forecast-economy"))
ts.normalize()

# contaminate the time series
ts_m = ts.Contamination.aligned(ts.data, rate_series=0.8)

# define and impute the contaminated series
imputer = Imputation.MatrixCompletion.CDRec(ts_m)
imputer.impute()

# compute and print the downstream results
downstream_config = {"task": "forecast", "model": "hw-add", "comparator": "ZeroImpute"}
imputer.score(ts.data, imputer.recov_data, downstream=downstream_config)
ts.print_results(imputer.downstream_metrics, algorithm=imputer.algorithm)


(SYS) The time series have been loaded from /tmp/pycharm_project_615/imputegap/dataset/forecast-economy.txt

> logs: normalization (z_score) of the data - runtime: 0.0004 seconds

(CONT) missigness pattern: ALIGNED
	rate series impacted: 20.0%
	missing rate per series: 80.0%
	starting position: 93
	index impacted : 93 -> 837

(IMPUTATION) CDRec: (16,931) for rank 3, epsilon 1e-06, and iterations 100.
> logs: imputation cdrec - Execution Time: 0.7270 seconds.

(DOWNSTREAM) Default parameters of the downstream model loaded.

(DOWNSTREAM) Analysis launched !
task: forecast
model: hw-add
params: {'sp': 7, 'trend': 'add', 'seasonal': 'additive'}
imputation algorithm: cdrec
comparator: ZeroImpute


plots saved in  ./imputegap_assets/downstream/25_04_03_15_47_05_forecast_hw-add_downstream.jpg

Results of the analysis (cdrec) :
MSE_original         = 0.5663304997036217
MSE_CDREC            = 0.8353031889744237
MSE_ZEROIMPUTE       = 0.6242557383168621
sMAPE_original       = 89.37515160605443
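
The downstream report compares forecasts made on the original data, on the imputed data, and on the comparator's output, using metrics such as MSE and sMAPE. As a rough guide to how these two metrics behave, here is a minimal sketch in plain Python. The sMAPE formula shown is one common definition (in percent, ranging from 0 to 200) and is an assumption; ImputeGAP's exact definition may differ. The sample values are hypothetical.

```python
def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 to 200).

    Assumed definition: mean of 2*|a - p| / (|a| + |p|), times 100.
    Points where both values are zero contribute 0 to the sum.
    """
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        if denom > 0:
            total += 2.0 * abs(a - p) / denom
    return 100.0 * total / len(actual)


# Hypothetical held-out values and one forecast, for illustration only.
actual = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 5.0]

print(f"MSE   = {mse(actual, forecast):.4f}")
print(f"sMAPE = {smape(actual, forecast):.4f}")
```

For both metrics, lower is better, so an imputation algorithm helps the downstream task when its score is closer to the score obtained on the original data than the comparator's is.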


All downstream models developed in ImputeGAP are available in the **ts.forecasting_models** module, which can be listed as follows:

In [4]:
ts.forecasting_models

['arima',
 'bats',
 'croston',
 'deepar',
 'ets',
 'exp-smoothing',
 'hw-add',
 'lightgbm',
 'lstm',
 'naive',
 'nbeats',
 'prophet',
 'sf-arima',
 'theta',
 'transformer',
 'unobs',
 'xgboost']

<h1>Explainer</h1>

The library provides insights into an algorithm’s behavior by identifying the features that most affect the imputation results. It trains a regression model to predict imputation results across various methods and uses SHapley Additive exPlanations (SHAP) to reveal how different time series features influence the model’s predictions.
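
SHAP attributions are built on Shapley values, which assign each feature its average marginal contribution to a prediction. The sketch below computes exact Shapley values for a small toy model by enumerating every feature ordering. It is only an illustration of the idea: the SHAP library uses faster approximations, and `toy_model`, its numbers, and the choice of feature groups are hypothetical rather than taken from ImputeGAP.

```python
from itertools import permutations


def shapley_values(features, value_fn):
    """Exact Shapley values: average each feature's marginal contribution
    over all orderings. Cost grows factorially, so toy sizes only."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            after = value_fn(frozenset(present))
            totals[f] += after - before
    return {f: t / len(orderings) for f, t in totals.items()}


def toy_model(known):
    """Hypothetical surrogate: predicted RMSE given the known feature groups."""
    score = 0.50  # baseline prediction with no feature information
    if "trend" in known:
        score -= 0.10
    if "correlation" in known:
        score -= 0.20
    if "trend" in known and "correlation" in known:
        score -= 0.06  # interaction effect, shared by both features
    if "geometry" in known:
        score += 0.02
    return score


features = ["geometry", "correlation", "trend"]
phi = shapley_values(features, toy_model)

for name, value in sorted(phi.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:<12} {value:+.3f}")
```

The attributions sum to the difference between the full prediction and the baseline, and the interaction term is split equally between the two features involved.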

Let’s illustrate the explainer using the CDRec algorithm and the MCAR missingness pattern:

In [5]:
from imputegap.recovery.manager import TimeSeries
from imputegap.recovery.explainer import Explainer
from imputegap.tools import utils

# initialize the time series object
ts = TimeSeries()

# load and normalize the timeseries
ts.load_series(utils.search_path("eeg-alcohol"))
ts.normalize(normalizer="z_score")

# configure the explanation
shap_values, shap_details = Explainer.shap_explainer(input_data=ts.data, extractor="pycatch", pattern="mcar", file_name=ts.name, algorithm="CDRec")

# print the impact of each feature
Explainer.print(shap_values, shap_details)


(SYS) The time series have been loaded from /tmp/pycharm_project_615/imputegap/dataset/eeg-alcohol.txt

> logs: normalization (z_score) of the data - runtime: 0.0009 seconds

explainer launched
	extractor: pycatch 
	imputation algorithm: CDRec 
	params: None 
	missigness pattern: mcar
	missing rate: 40.0%
	nbr of series training set: 38
	nbr of series testing set: 26


Generation  0 / 64 ( 0 %)________________________________________________________
	Contamination  0 ...
	pycatch22 : features extracted successfully___22 features
	Imputation  0 ...


Generation  1 / 64 ( 1 %)________________________________________________________
	Contamination  1 ...
	pycatch22 : features extracted successfully___22 features
	Imputation  1 ...


Generation  2 / 64 ( 3 %)________________________________________________________
	Contamination  2 ...
	pycatch22 : features extracted successfully___22 features
	Imputation  2 ...


Generation  3 / 64 ( 4 %)__________________________________________________

  0%|          | 0/26 [00:00<?, ?it/s]

  0%|          | 0/38 [00:00<?, ?it/s]



	plot has been saved :  ./imputegap_assets/shap/eeg-alcohol_CDRec_pycatch_shap_all.png


	plot has been saved :  ./imputegap_assets/shap/analysis_grouped/eeg-alcohol_CDRec_pycatch_shap_reverse.png


	plot has been saved :  ./imputegap_assets/shap/analysis_grouped/eeg-alcohol_CDRec_pycatch_DTL_Waterfall.png


	plot has been saved :  ./imputegap_assets/shap/analysis_grouped/eeg-alcohol_CDRec_pycatch_DTL_Beeswarm.png


	plot has been saved :  ./imputegap_assets/shap/analysis_per_cat/eeg-alcohol_CDRec_pycatch_shap_geometry.png


	plot has been saved :  ./imputegap_assets/shap/analysis_per_cat/eeg-alcohol_CDRec_pycatch_shap_transformation.png


	plot has been saved :  ./imputegap_assets/shap/analysis_per_cat/eeg-alcohol_CDRec_pycatch_shap_correlation.png


	plot has been saved :  ./imputegap_assets/shap/analysis_per_cat/eeg-alcohol_CDRec_pycatch_shap_trend.png


	plot has been saved :  ./imputegap_assets/shap/eeg-alcohol_CDRec_pycatch_shap_cat.png


	plot has been saved :  ./imputegap_ass

All feature extractors developed in ImputeGAP are available in the **ts.extractors** module, which can be listed as follows:

In [6]:
ts.extractors

['pycatch', 'tsfel', 'tsfresh']