Rollout accuracy in the test script is lower than the test accuracy in the train script #6

Closed
albzni opened this issue Apr 27, 2018 · 4 comments


albzni commented Apr 27, 2018

Hello!

I have a small question. Does the rollout accuracy indicate the success rate? If so, why is it lower than the prediction accuracy? In Aviv's implementation, the success rate on the 8x8 grid world was as high as 99.6%. Why is the success rate in your experiment relatively low?

Thanks!

@kentsommer
Owner

Hi @albzni,

Rollout accuracy does indicate the success rate. I'm not sure what numbers you are getting, but the success rates measured over a large sample are reported at the bottom of the README. Allowing for some randomness, these numbers match Aviv's original implementation.

Author

albzni commented May 2, 2018

Hi @kentsommer! Thank you for your comments.
After reading the README, I still have some questions. In the test script, n_domains=100, yet in your results the success rate is as high as 99.69%. Did you average the results of multiple tests? If not, how can the accuracy fall between 99% and 100% with 100 domains? And does increasing the number of domains reduce the randomness of the results?

Thank you!

Owner

kentsommer commented May 2, 2018

@albzni

The success rate is measured over 5000 randomly generated environments, as noted in the README.

The reason for increasing the number of test domains is that it gives a larger sample size and therefore a better indication of actual performance. The higher the number of samples from the full distribution of all possible random environments, the better you can estimate the true performance of the policy.
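
To put rough numbers on the sample-size point above, here is a minimal sketch. It is not code from this repository. It assumes each rollout can be treated as an independent success/failure trial, and it borrows the 99.69% figure from this thread purely as an illustrative "true" rate. The function name `success_rate_std_error` is hypothetical.

```python
import math


def success_rate_std_error(p: float, n: int) -> float:
    """Standard error of a success rate estimated from n independent rollouts.

    Assumes each rollout is a Bernoulli trial with success probability p.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    assumed_rate = 0.9969  # illustrative value taken from the discussion
    for n in (100, 5000):
        se = success_rate_std_error(assumed_rate, n)
        print(f"n={n}: standard error of about {100 * se:.3f} percentage points")
```

Under these assumptions the standard error shrinks in proportion to 1/sqrt(n), so going from 100 to 5000 samples cuts it by a factor of sqrt(50), roughly 7. This is only a rule of thumb: for rates this close to 100% and small n, the simple formula is a crude approximation.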

Author

albzni commented May 2, 2018

Thank you so much!

albzni closed this as completed May 2, 2018