Testing procedure #11

Open
Woutah opened this issue Mar 30, 2022 · 11 comments

Comments

Woutah commented Mar 30, 2022

First of all, thank you for providing such a complete implementation of your code.
In the paper you mention that "After fixing the hyperparameters, the entire training set was used to train the model again, which was finally evaluated on the official test set." Could you explain how this final training run (on the entire training set) was carried out?

Was the model trained for a predefined number of epochs and then evaluated on the test set, or was the test set used as a validation set?

Thanks in advance.

gzerveas (Owner) commented Mar 31, 2022

Yes, you can consider the number of epochs a "hyperparameter". Once you find out what it should be for each dataset, based on the original validation split, you use this predesignated number to train the model on the entire training set. After training, you can use the --test_only option to evaluate on the test set.
However, in practice it can be more convenient (i.e. it spares you a run) if, for this last training session, you define the test set as a validation set (using e.g. --val_pattern TEST) and simply read out the evaluation performance for this "validation set".
This can also be interesting if you want to look into robustness: even if you allow training to progress for longer, you can check what the performance was at the predesignated number of epochs and see whether a substantially better performance on the test set was actually recorded earlier or later during training. For most datasets, I think this probably wouldn't be the case.
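For concreteness, here is a rough sketch of what the two variants might look like on the command line. The dataset paths and checkpoint path are placeholders, the value passed to --test_only is my assumption, and the exact set of additional flags each run needs depends on your setup; src/options.py in the repository is the authoritative reference.

```bash
# Variant A (assumed flags): train on the full TRAIN split for the predesignated
# number of epochs, then evaluate the saved checkpoint on the TEST split.
python src/main.py --data_dir path/to/Dataset --pattern TRAIN \
    --task classification --epochs 100          # 100 = the predesignated number
python src/main.py --data_dir path/to/Dataset --test_pattern TEST \
    --load_model path/to/experiment/checkpoints/model_best.pth \
    --test_only testset                         # the value "testset" is my assumption

# Variant B (spares one run): declare the TEST split as the "validation set" and
# read off the metrics logged at the predesignated epoch.
python src/main.py --data_dir path/to/Dataset --pattern TRAIN --val_pattern TEST \
    --task classification --epochs 100
```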

Woutah (Author) commented Mar 31, 2022

Thank you for your quick response. Do the accuracies reported in the paper correspond to the maximum performance on the test set (in this case the validation set) in this last training session?

gzerveas (Owner) commented

No, as I wrote above, they should correspond to the predesignated number of epochs, and the hope is that this is in any case close to the maximum performance.

Woutah (Author) commented Apr 4, 2022

I am having some trouble training on the multivariate classification datasets, which is why I asked just to be sure. Would it be possible to provide the hyperparameters that were used during training? In particular, the learning rates and batch sizes used would probably help me out a lot, as I am experiencing some instability during training.

gzerveas (Owner) commented Apr 5, 2022

Sure, these tables with hyperparameters are from the KDD paper:

[images: hyperparameter tables from the KDD paper]

Regarding the learning rate, as far as I remember it was always set to 0.001 (the main reason for using RAdam was to make training insensitive to the learning rate). The batch size for most datasets was 128, and for some I believe 64 and 32.
Are you interested in a particular dataset? I can try to find the configuration file, which contains the full set of hyperparameters.
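For reference, this is roughly how those defaults would be spelled out on the command line. The flag names are the ones I recall from the repository README; the dataset path, epoch count and validation ratio are placeholders, so treat this as a sketch rather than the exact configuration used in the paper.

```bash
# Assumed default hyperparameters: RAdam, lr 0.001, batch size 128 (64 or 32 for some datasets).
python src/main.py --data_dir path/to/Dataset --pattern TRAIN --val_ratio 0.2 \
    --task classification --key_metric accuracy \
    --optimizer RAdam --lr 0.001 --batch_size 128 --epochs 400
```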

donghucey commented

I am trying to train on the AppliancesEnergy dataset; could you share the configuration file for it?

Woutah (Author) commented Feb 6, 2023

Sorry for my late response. I am still working on this project and am currently running some experiments again; I think my earlier problem had to do with my batch sizes being too small.

Do you maybe have a list of the (approximate) batch sizes and epoch counts used in the experiments for the supervised multivariate classification datasets/task? I'd like to reproduce all classification-dataset experiments as closely as possible to the paper.

Woutah (Author) commented Feb 14, 2023

I am currently struggling with the configuration for SCP2. I tried batch sizes of 32, 64 and 96, but I am unable to get stable training that reaches the accuracy mentioned in the paper. Any help would be greatly appreciated.

gzerveas (Owner) commented

On this dataset I got the best results (in the self-supervised-pretraining-followed-by-finetuning case) when using a sub-sampling factor of 3 (via the option --subsample_factor), which could potentially make a big difference, and a batch size of 32.
But I think you may be right that this dataset in particular shows instability during training: the evaluation metric on the validation set fluctuated around 0.6, and for certain hyperparameter configurations, or even individual runs, this level was reached very early on, while for others only after ~600 epochs. In the end I chose an intermediate number of epochs for training, something like 100.
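Putting those numbers together, here is a hedged sketch of what the SCP2 (SelfRegulationSCP2) finetuning run might look like. The pretrained-checkpoint path is a placeholder, and --load_model / --change_output are the finetuning flags I believe the README uses, so please double-check them against src/options.py before relying on this.

```bash
# Assumed finetuning command for SelfRegulationSCP2: sub-sample by a factor of 3,
# batch size 32, stop around the intermediate value of ~100 epochs, starting from
# a self-supervised (unsupervised pretraining) checkpoint.
python src/main.py --data_dir path/to/SelfRegulationSCP2 --pattern TRAIN --val_pattern TEST \
    --task classification --change_output \
    --load_model path/to/pretraining_experiment/checkpoints/model_best.pth \
    --subsample_factor 3 --batch_size 32 --epochs 100 \
    --optimizer RAdam --lr 0.001
```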

Woutah (Author) commented Mar 3, 2023

Thanks for your help. I'll try that and report back. Have you used subsampling on any other classification datasets that you know of?

There were a couple of (multivariate classification) datasets for which I did not get the same performance when using the default parameters. If, perchance, you have some list giving a general overview of which parameters were used for which datasets, that would probably make it a lot easier to reproduce the results.

One last question: how was Table 9 with the standard deviations constructed? Are these the test accuracies from x training runs using the same configuration?

Woutah (Author) commented Mar 7, 2023

I attached an image here of several runs using the settings from above, with the validation set set to the test set (so the "optimal" situation), but I'm still not reaching an accuracy of ~0.6. Can you think of any other changes to the configuration?
