vsf-time-series

Code for the paper "Multi-Variate Time Series Forecasting on Variable Subsets", accepted at the KDD 2022 Research Track.

Running the model

Datasets: METR-LA, SOLAR, TRAFFIC, ECG. This code provides a running example, with all components, built on the MTGNN model (we acknowledge the authors of that work).

Standard Training

python train_multi_step.py --data ./data/{0} --model_name {1} --device cuda:0 --expid {2} --epochs 100 --batch_size 64 --runs 10 --random_node_idx_split_runs 100 --lower_limit_random_node_selections 15 --upper_limit_random_node_selections 15 --step_size1 {3} --mask_remaining {4}

Here,
{0} - refers to the dataset directory: ./data/{ECG/TRAFFIC/METR-LA/SOLAR}
{1} - refers to the model name
{2} - refers to the manually assigned "ID" of the experiment
{3} - step_size1 is 2500 for METR-LA and SOLAR, 400 for ECG, 1000 for TRAFFIC
{4} - whether to run inference in the partial setting after training; set to true or false. Note: mask_remaining is the alias for the "Partial" setting in the paper

  • random_node_idx_split_runs - the number of randomly sampled subsets per trained model run
  • lower_limit_random_node_selections and upper_limit_random_node_selections - the percentage of variables in the subset S.
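For example, a concrete instantiation of the above on METR-LA (the dataset directory matches the one created in Data Preparation; the model name mtgnn is an assumption for the MTGNN example in this repository, so adjust it to whatever your checkout actually accepts):

python train_multi_step.py --data ./data/METR-LA --model_name mtgnn --device cuda:0 --expid 1 --epochs 100 --batch_size 64 --runs 10 --random_node_idx_split_runs 100 --lower_limit_random_node_selections 15 --upper_limit_random_node_selections 15 --step_size1 2500 --mask_remaining false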

Training with a predefined subset S (the "S apriori" setting)

python train_multi_step.py --data ./data/{0} --model_name {1} --device cuda:0 --expid {2} --epochs 100 --batch_size 64 --runs 50 --predefined_S --random_node_idx_split_runs 1 --lower_limit_random_node_selections 100 --upper_limit_random_node_selections 100 --step_size1 {3}
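For example, a sketch of the S apriori run on ECG (same directory and model-name assumptions as above; step_size1 is 400 for ECG):

python train_multi_step.py --data ./data/ECG --model_name mtgnn --device cuda:0 --expid 2 --epochs 100 --batch_size 64 --runs 50 --predefined_S --random_node_idx_split_runs 1 --lower_limit_random_node_selections 100 --upper_limit_random_node_selections 100 --step_size1 400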

Training the model with Identity matrix as Adjacency

python train_multi_step.py --data ./data/{0} --model_name {1} --device cuda:0 --expid {2} --epochs 100 --batch_size 64 --runs 10 --adj_identity_train_test --random_node_idx_split_runs 100 --lower_limit_random_node_selections 100 --upper_limit_random_node_selections 100 --step_size1 {3}
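For example, on TRAFFIC (step_size1 is 1000 for TRAFFIC; directory and model name as assumed above):

python train_multi_step.py --data ./data/TRAFFIC --model_name mtgnn --device cuda:0 --expid 3 --epochs 100 --batch_size 64 --runs 10 --adj_identity_train_test --random_node_idx_split_runs 100 --lower_limit_random_node_selections 100 --upper_limit_random_node_selections 100 --step_size1 1000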

Inference

Partial setting inference

python train_multi_step.py --data ./data/{0} --model_name {1} --device cuda:0 --expid {2} --epochs 0 --batch_size 64 --runs 10 --random_node_idx_split_runs 100 --lower_limit_random_node_selections 15 --upper_limit_random_node_selections 15 --mask_remaining True
  • Note that epochs are set to 0 and mask_remaining (alias of "Partial" setting in the paper) to True
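For example, a partial-setting inference run on METR-LA, assuming a model was already trained with --expid 1 as in the standard-training example so that the artifacts saved under that experiment ID can be reused:

python train_multi_step.py --data ./data/METR-LA --model_name mtgnn --device cuda:0 --expid 1 --epochs 0 --batch_size 64 --runs 10 --random_node_idx_split_runs 100 --lower_limit_random_node_selections 15 --upper_limit_random_node_selections 15 --mask_remaining True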

Oracle setting inference

python train_multi_step.py --data ./data/{0} --model_name {1} --device cuda:0 --expid {2} --epochs 0 --batch_size 64 --runs 10 --random_node_idx_split_runs 100 --lower_limit_random_node_selections 100 --upper_limit_random_node_selections 100 --do_full_set_oracle true --full_set_oracle_lower_limit 15 --full_set_oracle_upper_limit 15
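For example, an oracle-setting inference run on METR-LA under the same assumptions:

python train_multi_step.py --data ./data/METR-LA --model_name mtgnn --device cuda:0 --expid 1 --epochs 0 --batch_size 64 --runs 10 --random_node_idx_split_runs 100 --lower_limit_random_node_selections 100 --upper_limit_random_node_selections 100 --do_full_set_oracle true --full_set_oracle_lower_limit 15 --full_set_oracle_upper_limit 15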

Our Wrapper Technique

python train_multi_step.py --data ./data/{0} --model_name {1} --device cuda:0 --expid {2} --epochs 0 --batch_size 64 --runs 10 --random_node_idx_split_runs 100 --lower_limit_random_node_selections 15 --upper_limit_random_node_selections 15 --borrow_from_train_data true --num_neighbors_borrow 5 --dist_exp_value 0.5 --neighbor_temp 0.1 --use_ewp True
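For example, running the wrapper on METR-LA with the hyper-parameter values shown above (same directory, model-name, and trained-checkpoint assumptions as the other inference examples):

python train_multi_step.py --data ./data/METR-LA --model_name mtgnn --device cuda:0 --expid 1 --epochs 0 --batch_size 64 --runs 10 --random_node_idx_split_runs 100 --lower_limit_random_node_selections 15 --upper_limit_random_node_selections 15 --borrow_from_train_data true --num_neighbors_borrow 5 --dist_exp_value 0.5 --neighbor_temp 0.1 --use_ewp True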

Requirements

The model is implemented in Python 3, with dependencies specified in requirements.txt.
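For example, assuming a recent Python 3 installation:

# create and activate an isolated environment, then install the pinned dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt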

Data Preparation

Multivariate time series datasets

Download the Solar and Traffic datasets from https://github.com/laiguokun/multivariate-time-series-data, uncompress them, and move them to the data folder (see the sketch below).
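A sketch of that download, assuming the file layout of that repository at the time of writing (solar-energy/solar_AL.txt.gz and traffic/traffic.txt.gz); rename the uncompressed files to match the filenames expected by the preparation step below:

mkdir -p data
wget https://github.com/laiguokun/multivariate-time-series-data/raw/master/solar-energy/solar_AL.txt.gz
wget https://github.com/laiguokun/multivariate-time-series-data/raw/master/traffic/traffic.txt.gz
gunzip solar_AL.txt.gz traffic.txt.gz
mv solar_AL.txt data/solar.txt
mv traffic.txt data/traffic.txt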

Download the METR-LA dataset from the Google Drive or Baidu Yun links provided by Li et al., and move it into the data folder. (Optionally, download the adjacency matrix for METR-LA and place it at ./data/sensor_graph/adj_mx.pkl, as shown below):

wget https://github.com/nnzhan/MTGNN/raw/master/data/sensor_graph/adj_mx.pkl
mkdir -p data/sensor_graph
mv adj_mx.pkl data/sensor_graph/

Download the ECG5000 dataset from the UCR/UEA time series classification archive (timeseriesclassification.com).


# Create data directories
mkdir -p data/{METR-LA,SOLAR,TRAFFIC,ECG}

# for any dataset, run the following command
python generate_training_data.py --ds_name {0} --output_dir data/{1} --dataset_filename data/{2}

Here,
{0} is the dataset name: metr-la, solar, traffic, ECG
{1} is the directory in which to save the train, valid, and test splits (created by the mkdir command above)
{2} is the raw data filename (the downloaded file), such as ECG_data.csv, metr-la.h5, solar.txt, or traffic.txt
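For example, for METR-LA (assuming the downloaded file was saved as data/metr-la.h5):

python generate_training_data.py --ds_name metr-la --output_dir data/METR-LA --dataset_filename data/metr-la.h5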

Citation

