Some difficulties reproducing the results in the paper #36
Okay, so the first thing to keep in mind is that the values reported by the code (as mentioned in the README) are always MSE, not RMSE, so you should take the square root at the end; the last test loss in your experiments therefore corresponds to an RMSE of about sqrt(10.41) = 3.226. Secondly, and very importantly, you should allow fine-tuning for a sufficient number of epochs (e.g. 700 for this dataset, not 200 - you can monitor how the loss evolves). On this dataset, I got the best results after more than 600 epochs of fine-tuning. Finally, specifically for this dataset, the result for the pre-trained / fine-tuned transformer was achieved with a batch size of 64, not 128; the rest of the hyperparameters are correct. A couple of datasets, including this one, performed better with a batch size different from 128 (typically 64 or 32), so the batch size should probably have been listed as a separate hyperparameter - but unfortunately it never was. If you take care of these 3 points, I believe you will get the performance you desire.
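The MSE-to-RMSE conversion described above can be checked with a quick snippet (the loss value is the best "Test Summary" figure reported later in this thread):

```python
import math

# The training script reports MSE; the paper's tables report RMSE,
# so take the square root of the reported test loss.
reported_mse = 10.409667  # best "Test Summary" loss from the runs in this thread
rmse = math.sqrt(reported_mse)
print(f"RMSE = {rmse:.3f}")  # ≈ 3.226
```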
Thank you very much for your patient guidance. Following your advice, I was able to reproduce the results of the paper perfectly. I was lucky to have encountered such a patient and friendly author and such a beautiful piece of work. Hyperparameters can be both loved and hated - do you have any tips for tuning them, or could you recommend some tuning tools? Thank you again.
Hi @Guanyunlph , thank you for your kind words. I am glad you could perfectly reproduce the results.
Well, I'd be happy to do that.
Thank you for your significant contributions to the field of time series, which have given me the opportunity to build on your work in downstream tasks. However, I am facing some challenges in reproducing the results of your paper. I have used the unsupervised pre-training mode to complete the regression on the AppliancesEnergy data, and the hyperparameters are the specific values given in your paper (Table 14).
The specific pre-training code is as follows.
python src/main.py --output_dir experiments --comment "pretraining through imputation" --name pretrained --records_file Imputation_records.xls --data_dir "/AppliancesEnergy/" --data_class tsra --pattern TRAIN --val_ratio 0.2 --epochs 700 --lr 0.001 --optimizer RAdam --pos_encoding learnable --num_layers 3 --num_heads 16 --d_model 128 --dim_feedforward 512 --batch_size 128
The specific fine-tune code is as follows.
python src/main.py --output_dir experiments --comment "finetune for regression" --name finetuned --records_file Regression_records.xls --data_dir /AppliancesEnergy/ --data_class tsra --pattern TRAIN --val_pattern TEST --epochs 200 --lr 0.001 --optimizer RAdam --pos_encoding learnable --load_model /pretrained_2023-02-17_21-13-58_dtF/checkpoints/model_best.pth --task regression --change_output --num_layers 3 --num_heads 16 --d_model 128 --dim_feedforward 512 --batch_size 128
The specific test code is as follows.
python src/main.py --output_dir experiments --comment "test" --name test --data_dir /AppliancesEnergy/ --data_class tsra --load_model /finetuned_2023-02-17_21-28-36_l5O/checkpoints/model_best.pth --pattern TEST --test_only testset --num_layers 3 --num_heads 16 --d_model 128 --dim_feedforward 512 --batch_size 128 --task regression
I have run these three steps (pre-training, fine-tuning, and testing) several times, but each time the test loss differs considerably from the result in the paper. I saw a similar situation in issue 19 (Test result in multivariate dataset without pretrain #19), where the poster said he solved it by searching for the best epoch and then training the model for that many epochs on the whole training set. However, I did not fully understand what he meant. Would you be able to provide a detailed explanation, or some other advice for a beginner?
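The procedure mentioned in issue #19 amounts to: use the validation split only to find the epoch with the lowest validation loss, then retrain on the entire training set for exactly that many epochs. A minimal sketch of the idea (the loss values and helper name here are hypothetical, not from the repo):

```python
def best_epoch(val_losses):
    """Return the 1-based index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

# Hypothetical per-epoch validation losses logged during fine-tuning
val_losses = [12.9, 11.4, 10.8, 11.1, 10.6, 10.9]
n_epochs = best_epoch(val_losses)  # epoch 5 has the lowest loss
# Next step: retrain from scratch on train + validation data for n_epochs,
# then evaluate once on the held-out TEST split.
```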
The pretrained RMSE result for AppliancesEnergy is 2.375 (Table 4), but I got these test results:
Test Summary: loss: 11.078025
Test Summary: loss: 11.406574
Test Summary: loss: 10.409667