Testing procedure #11

Open
Woutah opened this issue Mar 30, 2022 · 11 comments

Comments

Woutah commented Mar 30, 2022

First of all, thank you for providing such a complete implementation of your code.
In the paper you mention that "After fixing the hyperparameters, the entire training set was used to train the model again, which was finally evaluated on the official test set." Could you explain how this final training run (on the entire training set) was carried out?

Was the model trained for a predefined number of epochs and then evaluated on the test set, or was the test set used as a validation set?

Thanks in advance.

gzerveas (Owner) commented Mar 31, 2022

Yes, you can consider the number of epochs a "hyperparameter". Once you find out what it should be for each dataset, based on the original validation split, you use this predesignated number to train the model on the entire training set. After training, you can use the --test_only option to evaluate on the test set.
However, in practice it can be more convenient (i.e. it spares you a run) if, for this last training session, you define the test set as a validation set (using e.g. --val_pattern TEST) and simply read out the evaluation performance for this "validation set".
This can also be interesting if you want to look into robustness: even if you allow training to progress for longer, you can check what the performance was at the predesignated number of epochs and see whether a substantially better performance on the test set was actually recorded earlier or later during training. For most datasets, I think this probably wouldn't be the case.
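For concreteness, here is a rough sketch of what the two variants might look like on the command line. The dataset paths and checkpoint path are placeholders, the value passed to --test_only is my assumption, and the exact set of additional flags each run needs depends on your setup; src/options.py in the repository is the authoritative reference.

```bash
# Variant A (assumed flags): train on the full TRAIN split for the predesignated
# number of epochs, then evaluate the saved checkpoint on the TEST split.
python src/main.py --data_dir path/to/Dataset --pattern TRAIN \
    --task classification --epochs 100          # 100 = the predesignated number
python src/main.py --data_dir path/to/Dataset --test_pattern TEST \
    --load_model path/to/experiment/checkpoints/model_best.pth \
    --test_only testset                         # the value "testset" is my assumption

# Variant B (spares one run): declare the TEST split as the "validation set" and
# read off the metrics logged at the predesignated epoch.
python src/main.py --data_dir path/to/Dataset --pattern TRAIN --val_pattern TEST \
    --task classification --epochs 100
```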

Woutah (Author) commented Mar 31, 2022

Thank you for your quick response. Do the accuracies reported in the paper correspond to the maximum performance on the test set (in this case the validation set) in this last training session?

gzerveas (Owner) commented

No, as I wrote above, they should correspond to the predesignated number of epochs, and the hope is that this is in any case close to the maximum performance.

Woutah (Author) commented Apr 4, 2022

I am having some trouble training on the multivariate classification datasets, which is why I asked just to be sure. Would it be possible to provide the hyperparameters that were used during training? In particular, the learning rates and batch sizes used would probably help me out a lot, as I am experiencing some instability during training.

gzerveas (Owner) commented Apr 5, 2022

Sure, these tables with hyperparameters are from the KDD paper:

[images: hyperparameter tables from the KDD paper]

Regarding the learning rate, as far as I remember it was always set to 0.001 (the main reason for using RAdam was to make training insensitive to the learning rate). The batch size for most datasets was 128, and for some I believe 64 and 32.
Are you interested in a particular dataset? I can try to find the configuration file, which contains the full set of hyperparameters.
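For reference, this is roughly how those defaults would be spelled out on the command line. The flag names are the ones I recall from the repository README; the dataset path, epoch count and validation ratio are placeholders, so treat this as a sketch rather than the exact configuration used in the paper.

```bash
# Assumed default hyperparameters: RAdam, lr 0.001, batch size 128 (64 or 32 for some datasets).
python src/main.py --data_dir path/to/Dataset --pattern TRAIN --val_ratio 0.2 \
    --task classification --key_metric accuracy \
    --optimizer RAdam --lr 0.001 --batch_size 128 --epochs 400
```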

donghucey commented

I am trying to train on the AppliancesEnergy dataset; could you share the configuration file for it?

Woutah (Author) commented Feb 6, 2023

Sorry for my late response. I am still working on this project and am currently running some experiments again; I think my earlier problem had to do with my batch sizes being too small.

Do you maybe have a list of the (approximate) batch sizes and epoch counts used in the experiments for the supervised multivariate classification datasets/task? I'd like to reproduce all classification-dataset experiments as closely as possible to the paper.

Woutah (Author) commented Feb 14, 2023

I am currently struggling with the configuration for SCP2. I tried batch sizes of 32, 64 and 96, but I am unable to get stable training that reaches the accuracy mentioned in the paper. Any help would be greatly appreciated.

gzerveas (Owner) commented

On this dataset I got the best results (in the self-supervised-pretraining-followed-by-finetuning case) when using a sub-sampling factor of 3 (via the option --subsample_factor), which could potentially make a big difference, and a batch size of 32.
But I think you may be right that this dataset in particular shows instability during training: the evaluation metric on the validation set fluctuated around 0.6, and for certain hyperparameter configurations, or even individual runs, this level was reached very early on, while for others only after ~600 epochs. In the end I chose an intermediate number of epochs for training, something like 100.
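Putting those numbers together, here is a hedged sketch of what the SCP2 (SelfRegulationSCP2) finetuning run might look like. The pretrained-checkpoint path is a placeholder, and --load_model / --change_output are the finetuning flags I believe the README uses, so please double-check them against src/options.py before relying on this.

```bash
# Assumed finetuning command for SelfRegulationSCP2: sub-sample by a factor of 3,
# batch size 32, stop around the intermediate value of ~100 epochs, starting from
# a self-supervised (unsupervised pretraining) checkpoint.
python src/main.py --data_dir path/to/SelfRegulationSCP2 --pattern TRAIN --val_pattern TEST \
    --task classification --change_output \
    --load_model path/to/pretraining_experiment/checkpoints/model_best.pth \
    --subsample_factor 3 --batch_size 32 --epochs 100 \
    --optimizer RAdam --lr 0.001
```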

Woutah (Author) commented Mar 3, 2023

Thanks for your help. I'll try that and report back. Have you used subsampling on any other classification datasets that you know of?

There were a couple of (multivariate classification) datasets for which I did not get the same performance when using the default parameters. If, perchance, you have some list giving a general overview of which parameters were used for which datasets, that would probably make it a lot easier to reproduce the results.

One last question: how was Table 9 with the standard deviations constructed? Are these the test accuracies from x training runs using the same configuration?

Woutah (Author) commented Mar 7, 2023

I attached an image here of several runs using the settings from above, with the validation set set to the test set (so the "optimal" situation), but I'm still not reaching an accuracy of ~0.6. Can you think of any other changes to the configuration?
