TensorCodec: Compact Lossy Compression of Tensors without Strong Data Assumptions

This repository is the official implementation of TensorCodec: Compact Lossy Compression of Tensors without Strong Data Assumptions, by Taehyung Kwon, Jihoon Ko, Jinghong Jung, and Kijung Shin (ICDM 2023).

Requirements

To run the provided code, you need to install PyTorch. Because the installation command depends on your environment, please see https://pytorch.org/get-started/locally/ for installation instructions.
Run the code from the directory (./) that contains the TensorCodec folder. Dataset files should be located in ./input.
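
The layout described above can be checked before launching anything. The following is a minimal sketch; the helper name missing_paths is hypothetical and not part of the repository, and it only looks for the two folders named in this section.

```python
from pathlib import Path


def missing_paths(root="."):
    """Return the expected sub-folders that are absent under `root`.

    Hypothetical helper, not part of the repository: it only checks the
    two locations named in this README (the TensorCodec folder and ./input).
    """
    root = Path(root)
    expected = ["TensorCodec", "input"]
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Layout looks fine.")
```

An empty list means the working directory matches the expected layout.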

Initializing the orders

Initialization of the orders of the tensor is implemented in init_order.py.

Arguments

-lp, --load_path: path for loading the original tensor.
-sp, --save_path: path for saving the reordered tensor.

Example commands and results

  python TensorCodec/init_order.py -lp input/action_orig.npy -sp input/action.npy

  order: 0, loss before: 2387.244534244066, loss after: 2387.244534244066
  order: 1, loss before: 46618.97891356207, loss after: 7292.559081679066
  order: 2, loss before: 20461.291436824664, loss after: 14315.363870162731
  Total elapsed time: 8.323100328445435
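
To read the output above at a glance, the per-order loss values can be turned into relative reductions. This is a small sketch that parses the printed lines with a regular expression; the function name loss_reductions is an assumption for illustration, and the log text is copied from the example run.

```python
import re

LOG = """\
order: 0, loss before: 2387.244534244066, loss after: 2387.244534244066
order: 1, loss before: 46618.97891356207, loss after: 7292.559081679066
order: 2, loss before: 20461.291436824664, loss after: 14315.363870162731
"""

LINE = re.compile(
    r"order: (\d+), loss before: ([0-9.eE+-]+), loss after: ([0-9.eE+-]+)"
)


def loss_reductions(log_text):
    """Map each order index to its relative loss reduction, 1 - after / before."""
    result = {}
    for match in LINE.finditer(log_text):
        order = int(match.group(1))
        before = float(match.group(2))
        after = float(match.group(3))
        result[order] = 1.0 - after / before
    return result


if __name__ == "__main__":
    for order, reduction in loss_reductions(LOG).items():
        print(f"order {order}: loss reduced by {reduction:.1%}")
```

For the run shown, order 0 is unchanged, while the loss drops by roughly 84% for order 1 and roughly 30% for order 2.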

Running TensorCodec

The training (compression) and evaluation (decompression) processes are implemented in main.py.

Positional argument

action: train to compress the tensor, or test to check the reconstruction loss of the trained model.
-d, --dataset: dataset to be compressed.

Optional arguments (common)

-de, --device: GPU id(s) for execution.
-rk, --rank: rank of TT cores.
-hs, --hidden_size: size of the hidden dimension.
-m, --model: type of the model (gru, lstm, mha). The default is lstm.
-nb, --num_batch: the number of mini-batches for training.
-b, --batch_size: the number of tensor entries that are processed simultaneously on the GPUs.

Optional arguments for training

-lr, --lr: learning rate.
-e, --epoch: maximum number of epochs.
-sp, --save_path: path for saving the parameters of the trained model and the new orders of the tensor's indices (a bijective mapping from the indices of the reordered tensor to the indices of the original tensor).
-tol, --tol: tolerance for training.
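
The index orders mentioned under --save_path can be pictured as one permutation per mode. The sketch below illustrates the idea with NumPy on a toy tensor: it reorders every mode and then inverts the mapping. It does not reflect the actual file format written by main.py, which is not described here; the variable names are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
original = rng.random((4, 3, 5))  # toy 3-order tensor

# One permutation per mode: orders[k][i] is the original index that ends up
# at position i of mode k in the reordered tensor.
orders = [rng.permutation(n) for n in original.shape]

# Apply the reordering to all modes at once.
reordered = original[np.ix_(*orders)]

# Invert each permutation: the argsort of a permutation is its inverse.
inverse = [np.argsort(p) for p in orders]
restored = reordered[np.ix_(*inverse)]

assert np.array_equal(restored, original)
```

Because each per-mode mapping is a bijection, the original tensor can always be recovered from the reordered one.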

Optional arguments for evaluating

-lp, --load_path: path for loading the parameters of the trained model and the orders of the indices of the tensor.
-sp, --save_path: path for saving the reconstructed tensor.

Example commands

  # Training
  python TensorCodec/main.py train -d action -de 0 1 2 3 -rk 6 -hs 8 -sp output/action_r6_h8 -e 5000 -lr 1 -m lstm -nb 100 -t 100 -b 2097152

  # Evaluating
  python TensorCodec/main.py test -lp output/action_r6_h8 -d action -de 0 1 2 3 -rk 6 -hs 8 -sp output/action_recon.npy
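
When several rank and hidden-size settings are to be tried, the training command can be generated programmatically. The following is a sketch only: train_command is a hypothetical helper, the output naming pattern copies the example above, the grid values are arbitrary, and any flag not covered (for example the tolerance) can be passed through extra_args.

```python
import itertools
import shlex


def train_command(dataset, devices, rank, hidden_size,
                  epochs=5000, lr=1, extra_args=()):
    """Build a `main.py train` argument list from flags listed in this README.

    Hypothetical helper: the save path mirrors the example command
    (output/<dataset>_r<rank>_h<hidden_size>).
    """
    save_path = f"output/{dataset}_r{rank}_h{hidden_size}"
    return [
        "python", "TensorCodec/main.py", "train",
        "-d", dataset,
        "-de", *[str(d) for d in devices],
        "-rk", str(rank),
        "-hs", str(hidden_size),
        "-sp", save_path,
        "-e", str(epochs),
        "-lr", str(lr),
        *extra_args,
    ]


if __name__ == "__main__":
    # Illustrative grid; these values are not recommendations.
    extra = ["-m", "lstm", "-nb", "100", "-b", "2097152"]
    for rank, hidden in itertools.product([6, 8], [8, 10]):
        cmd = train_command("action", [0, 1, 2, 3], rank, hidden,
                            extra_args=extra)
        print(shlex.join(cmd))
        # To launch for real: subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell-quoting mistakes if it is later passed to subprocess.run.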

Evaluating the trained model

Command

We uploaded trained models for the three smallest 3-order tensors and the smallest 4-order tensor to the folder 'trained model'.
For each of these datasets, the hyperparameters (rank and hidden dimension) correspond to the model with the fewest parameters shown in Figure 3 of the main paper.
You can run the code with the following commands. Note that the device option should be changed depending on the available GPUs.

  python TensorCodec/main.py test -d action -lp 'TensorCodec/trained model/action_r6_h8.pt' -de 0 1 2 3 -rk 6 -hs 8 -sp action_recon.npy
  python TensorCodec/main.py test -d airquality -lp 'TensorCodec/trained model/airquality_r7_h11.pt' -de 0 1 2 3 -rk 7 -hs 11 -sp airquality_recon.npy
  python TensorCodec/main.py test -d uber -lp 'TensorCodec/trained model/uber_r8_h7.pt' -de 0 1 2 3 -rk 8 -hs 7 -sp uber_recon.npy
  python TensorCodec/main.py test -d nyc -lp 'TensorCodec/trained model/nyc_r2_h5.pt' -de 0 1 2 3 -rk 2 -hs 5 -sp nyc_recon.npy

Expected results

Metric	action	airquality	uber	nyc
Fitness	0.65	0.648	0.669	0.558
Compressed Size (bytes)	11686	26031	11870	5227
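
This README does not define fitness. The sketch below assumes the common definition 1 - ||X - X_hat||_F / ||X||_F, where X is the original tensor and X_hat the reconstruction; treat both the definition and the function name as assumptions rather than the repository's own code. The toy arrays stand in for tensors that would normally be loaded with np.load.

```python
import numpy as np


def fitness(original, reconstructed):
    """Return 1 - ||X - X_hat||_F / ||X||_F (assumed definition)."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    error = np.linalg.norm((original - reconstructed).ravel())
    return 1.0 - error / np.linalg.norm(original.ravel())


if __name__ == "__main__":
    # Toy data standing in for the original and reconstructed tensors.
    rng = np.random.default_rng(0)
    x = rng.random((10, 12, 8))
    x_hat = x + 0.05 * rng.standard_normal(x.shape)
    print(f"fitness: {fitness(x, x_hat):.3f}")
```

A value of 1 means a perfect reconstruction. Before comparing real files, check that the saved reconstruction and the reference tensor use the same index order.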

Real-world datasets we used

Name	Shape	Density	Source	Link
Uber	183 x 24 x 1,140	0.138	FROSTT	Link
Air Quality	5,600 x 362 x 6	0.917	Air Korea	Link
Action	100 x 570 x 567	0.393	Multivariate LSTM-FCNs	Link
PEMS-SF	963 x 144 x 440	0.999	The UEA & UCR Time Series Classification Repository	Link
Activity	337 x 570 x 320	0.569	Multivariate LSTM-FCNs	Link
Stock	1,317 x 88 x 916	0.816	Zoom-Tucker	Link
NYC	265 x 265 x 28 x 35	0.118	New York City Government	Link
Absorb	192 x 228 x 30 x 120	1.000	Climate Data at the National Center for Atmospheric Research	Link
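
To put the compressed sizes under "Expected results" in context, they can be compared with the size of each tensor stored densely. The sketch below combines the shapes from the table above with those sizes. It assumes 8 bytes per entry (64-bit floats), which this README does not state, so the resulting ratios are indicative only.

```python
import math

BYTES_PER_ENTRY = 8  # assumption: entries stored as 64-bit floats

# Shapes from the dataset table and compressed sizes from "Expected results".
DATASETS = {
    "action": ((100, 570, 567), 11686),
    "airquality": ((5600, 362, 6), 26031),
    "uber": ((183, 24, 1140), 11870),
    "nyc": ((265, 265, 28, 35), 5227),
}


def dense_bytes(shape, bytes_per_entry=BYTES_PER_ENTRY):
    """Size of the uncompressed dense tensor in bytes."""
    return math.prod(shape) * bytes_per_entry


if __name__ == "__main__":
    for name, (shape, compressed) in DATASETS.items():
        ratio = dense_bytes(shape) / compressed
        print(f"{name}: {dense_bytes(shape)} bytes dense, "
              f"about {ratio:.0f}x larger than the compressed model")
```

Pass bytes_per_entry=4 to dense_bytes if the tensors are stored in single precision.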
