
Some difficulties reproducing the results in paper. #36

Closed
Guanyunlph opened this issue Feb 17, 2023 · 4 comments
Guanyunlph commented Feb 17, 2023

Thank you for your significant contributions to the field of time series, which have given me the opportunity to build on your work in downstream tasks. However, I am facing some challenges in reproducing the results of your paper. I used the unsupervised pre-training mode to do regression on the AppliancesEnergy dataset, with the specific hyperparameter values given in your paper (Table 14).

The specific pre-training code is as follows.
```shell
python src/main.py --output_dir experiments --comment "pretraining through imputation" --name pretrained --records_file Imputation_records.xls --data_dir "/AppliancesEnergy/" --data_class tsra --pattern TRAIN --val_ratio 0.2 --epochs 700 --lr 0.001 --optimizer RAdam --pos_encoding learnable --num_layers 3 --num_heads 16 --d_model 128 --dim_feedforward 512 --batch_size 128
```

The specific fine-tune code is as follows.
```shell
python src/main.py --output_dir experiments --comment "finetune for regression" --name finetuned --records_file Regression_records.xls --data_dir /AppliancesEnergy/ --data_class tsra --pattern TRAIN --val_pattern TEST --epochs 200 --lr 0.001 --optimizer RAdam --pos_encoding learnable --load_model /pretrained_2023-02-17_21-13-58_dtF/checkpoints/model_best.pth --task regression --change_output --num_layers 3 --num_heads 16 --d_model 128 --dim_feedforward 512 --batch_size 128
```

The specific test code is as follows.
```shell
python src/main.py --output_dir experiments --comment "test" --name test --data_dir /AppliancesEnergy/ --data_class tsra --load_model /finetuned_2023-02-17_21-28-36_l5O/checkpoints/model_best.pth --pattern TEST --test_only testset --num_layers 3 --num_heads 16 --d_model 128 --dim_feedforward 512 --batch_size 128 --task regression
```

I have run these three steps (pre-training, fine-tuning, and testing) several times, but each time the test loss differs substantially from the value in the paper. I saw a similar situation in issue 19 (Test result in multivariate dataset without pretrain #19), where the poster said they solved it by searching for the best epoch and then training the model for that number of epochs on the whole training set. However, I did not really understand what they meant. Would you be able to provide a detailed explanation, or some other advice for a beginner?

The RMSE result for the pre-trained model on AppliancesEnergy is 2.375 (Table 4), but I got these test results:

```
Test Summary: loss: 11.078025
Test Summary: loss: 11.406574
Test Summary: loss: 10.409667
```

gzerveas (Owner) commented Feb 20, 2023

Okay, so the first thing to keep in mind is that the values reported by the code (as mentioned in the README) are always MSE, not RMSE, so you should take the square root at the end. That means the last loss in your experiments corresponds to an RMSE of about sqrt(10.41) = 3.226.
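Since the code reports MSE while Table 4 reports RMSE, converting the three test losses from the original post is a one-liner (a minimal sketch using those loss values):

```python
import math

# Losses printed by the code are MSE; the paper's Table 4 reports RMSE.
mse_losses = [11.078025, 11.406574, 10.409667]
rmse = [math.sqrt(m) for m in mse_losses]
print([round(r, 3) for r in rmse])  # last value is about 3.226
```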

Secondly, and very importantly, you should allow fine-tuning to run for a sufficient number of epochs (e.g. 700 for this dataset, not 200; you can monitor how the loss evolves). On this dataset, I got the best results after more than 600 epochs of fine-tuning.

Finally, specifically for this dataset, the result for the pre-trained / fine-tuned transformer was achieved with a batch size of 64, not 128; the rest of the hyperparameters are correct. A couple of datasets, including this one, performed better with a batch size other than 128 (typically 64 or 32), and for this reason the batch size should probably have been listed as a separate hyperparameter, but unfortunately it never was.
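Putting the second and third points together, the fine-tuning command from the original post would change in only two flags (the `--load_model` path below is the poster's own checkpoint path and will differ for each run):

```shell
# Same command as above, with --epochs 700 and --batch_size 64 per the author's
# advice; all other hyperparameters stay as given in Table 14 of the paper.
python src/main.py --output_dir experiments --comment "finetune for regression" --name finetuned \
  --records_file Regression_records.xls --data_dir /AppliancesEnergy/ --data_class tsra \
  --pattern TRAIN --val_pattern TEST --epochs 700 --lr 0.001 --optimizer RAdam \
  --pos_encoding learnable --load_model /pretrained_2023-02-17_21-13-58_dtF/checkpoints/model_best.pth \
  --task regression --change_output --num_layers 3 --num_heads 16 --d_model 128 \
  --dim_feedforward 512 --batch_size 64
```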

If you take care of these 3 points, I believe you will get the performance you desire.

Guanyunlph (Author) commented Feb 21, 2023

Thank you very much for your patient guidance. Following your advice, I was able to reproduce the results of the paper perfectly. I was lucky to encounter such a patient and friendly author and such a beautiful piece of work. Hyperparameters are something one can both love and hate; do you have any tips for tuning them, or could you recommend some tuning tools? Thank you again.

gzerveas (Owner) commented

Hi @Guanyunlph , thank you for your kind words. I am glad you could perfectly reproduce the results.
I think your last question is a completely different topic, so could you please remove it from this thread and open a new issue? (It helps with discoverability, visibility, etc.) I will try to answer with a few thoughts.

Guanyunlph (Author) commented

Well, I'd be happy to do that.
