Question: How can I manage the reproducibility of an experiment? #35
Another thing: I just noticed that the parameter value is updated on each step. My current idea is to write a class and call it as a step callback.

    class EpisodicDecay:
        def __init__(self, parameter):
            self.parameter = parameter
            # Snapshot of the update counts, advanced once per episode.
            self._init_table = parameter._n_updates.table.copy()

        def __call__(self, dataset):
            if dataset[-1][-1]:  # episode has ended
                self._init_table += 1
            # Overwrite the per-step counts so the decay only follows episodes.
            self.parameter._n_updates.table = self._init_table.copy()
You also need to set the NumPy seed, as it influences the policy. In general, we cannot provide a single method to set the seed, since different libraries may use different random generators. For the parameter decay, the approach you propose is the only supported way to achieve that behavior. That's exactly one of the use cases of callbacks.
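As a minimal sketch of the seeding advice above: the helper name `set_global_seeds` and the stand-in `sample_run` function are hypothetical and not part of the library. It assumes the script only relies on Python's `random` module and NumPy's global generator; any other library with its own generator would need its own seeding call.

```python
import random

import numpy as np


def set_global_seeds(seed):
    """Seed the generators this script relies on (hypothetical helper)."""
    random.seed(seed)     # Python's built-in generator
    np.random.seed(seed)  # NumPy's global generator, which influences the policy
    # The environment keeps its own generator and still has to be seeded
    # separately through its own seed method, as done in the question.


def sample_run(seed, n=5):
    """Stand-in for one experiment run: seed, then draw a few numbers."""
    set_global_seeds(seed)
    return np.random.rand(n).tolist(), [random.random() for _ in range(n)]


# Two runs with the same seed produce identical draws.
assert sample_run(42) == sample_run(42)
```

Calling the helper once at the top of the script, before the environment, policy, and agent are built, keeps every later draw from these two generators on the same sequence across runs.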
Ooh, right! As for the episodic decay parameter, that's OK, I can handle that. Thanks!
I get
See my answer to #78.
Hi,
I'm currently setting a seed value for the environment using its seed method, but when I run the experiment multiple times I get very different results.
I know that many variables can cause the results to diverge, but I was wondering whether there is any other parameter I can set to reduce this divergence.
I'm using your advanced experiment as a base to run a Gym.CartPole environment with LinearDecay epsilon and learning rate parameters, just to learn how your library works. I previously coded this problem from scratch with Q-learning and got successful results.
My code is something like this
It is currently not learning, but that is not the issue here; my question is only about the different results obtained across runs.
Thanks,