Hello, Hongzimao,

While training, the average reward value in TensorBoard (or in /sim/results/log_test) fluctuates a lot. Is it supposed to be like this? Below is the test result of the models at every 100 epochs. (I used 0.5 as the initial entropy weight and decayed it by a factor of 0.99998.)

Second, how did you select the best model? Was it based on how high the test result was?

Lastly, how did you conclude that the model generalizes well in the real world? I'm concerned that on a new real-world test set, the test result could decrease.

Thank you!
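The decay schedule described above (initial entropy weight 0.5, multiplied by 0.99998 each step) can be sketched as follows. This is a minimal illustration of a multiplicative decay, not code from the repo; the constant and function names are mine.

```python
INIT_ENTROPY_WEIGHT = 0.5   # initial coefficient on the entropy bonus
DECAY = 0.99998             # per-step multiplicative decay factor

def entropy_weight(step):
    """Entropy coefficient after `step` training updates."""
    return INIT_ENTROPY_WEIGHT * DECAY ** step

# With this schedule the exploration bonus shrinks slowly:
# entropy_weight(100_000) ~= 0.5 * exp(-2) ~= 0.068
```

One consequence of such a slow decay is that the policy keeps exploring for a long time, which by itself produces noisy per-epoch test rewards.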
quito418 changed the title from "Why training /test reward value fluctuate a lot" to "Why training/test reward value fluctuate a lot" on Jan 16, 2018.
In our experience the learning curve climbs more steadily. You might already have converged to a good policy based on this figure; can you compare against our pre-trained model?
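Since the raw per-epoch test reward is noisy, one simple way to compare curves against a pre-trained baseline is to smooth them first. A minimal sketch of a trailing moving average (the function name and window size are illustrative, not from the repo):

```python
def smoothed(rewards, window=100):
    """Trailing moving average of a reward series.

    Compare the smoothed curve (or its final values) against the
    baseline, rather than eyeballing the raw fluctuating series.
    """
    out = []
    running_sum = 0.0
    for i, r in enumerate(rewards):
        running_sum += r
        if i >= window:                      # drop the value leaving the window
            running_sum -= rewards[i - window]
        out.append(running_sum / min(i + 1, window))
    return out
```

This is essentially what the TensorBoard smoothing slider does (TensorBoard uses an exponential moving average instead of a fixed window).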
We select the model based on a validation set. You can divide the training traces into a training set and a validation set.
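The selection procedure can be sketched as picking, among the saved checkpoints, the one with the highest mean reward on the held-out validation traces. A hedged sketch; `checkpoints` and `evaluate` are hypothetical placeholders (e.g. `evaluate` would load a checkpoint and run it over the validation traces, returning the mean reward), not helpers from the repo:

```python
def select_best(checkpoints, evaluate):
    """Return the checkpoint with the highest validation score.

    checkpoints: iterable of checkpoint identifiers (e.g. epoch numbers).
    evaluate:    callable mapping a checkpoint id to its mean reward
                 on the validation traces.
    """
    return max(checkpoints, key=evaluate)
```

The point of the held-out split is that the chosen checkpoint is not the one that merely happened to spike on the test traces you report.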
Please refer to Section 5.3 of the paper for generalization. A synthetic trace is also provided. But as you suggested, if the network characteristics change drastically from training to actual deployment, the performance could degrade. It would also be interesting to do (refined) training online.