
Question: How can I manage the reproducibility of an experiment? #35

Closed
angel-ayala opened this issue Aug 14, 2020 · 5 comments

Comments

@angel-ayala

Hi,
I'm currently setting a seed value for the environment using its seed method, but when I run the experiment multiple times I get very different results.
I know that many variables are involved and results may diverge, but I was wondering if there is any other parameter I should set in order to reduce this divergence.

I'm using your advanced experiment as a base to run a Gym CartPole environment with linearly decaying epsilon and learning rate parameters, just to learn how your library works; I previously coded this problem from scratch with Q-learning with successful results.

My code is something like this:

# Imports as of mushroom_rl 1.x (module paths may differ in newer versions)
import numpy as np

from mushroom_rl.algorithms.value import SARSALambdaContinuous
from mushroom_rl.approximators.parametric import LinearApproximator
from mushroom_rl.environments import Gym
from mushroom_rl.features import Features
from mushroom_rl.features.tiles import Tiles
from mushroom_rl.policy import EpsGreedy
from mushroom_rl.utils.parameters import LinearParameter

seed_val = 0      # example value
episodes = 1000   # example value

# Environment
environment = Gym(name='CartPole-v0', horizon=np.inf, gamma=1.)
environment.seed(seed_val)

# Policy with linearly decaying epsilon
linear_epsilon = LinearParameter(0.9, 0.1, n=episodes//2)
pi = EpsGreedy(epsilon=linear_epsilon)

# State codification with tile coding
n_tilings = 1
tilings = Tiles.generate(n_tilings, [1, 1, 6, 3],
                         environment.info.observation_space.low,
                         environment.info.observation_space.high)
features = Features(tilings=tilings)

approximator_params = dict(input_shape=(features.size,),
                           output_shape=(environment.info.action_space.n,),
                           n_actions=environment.info.action_space.n)

# Agent with linearly decaying learning rate
linear_alpha = LinearParameter(0.5, 0.1, n=episodes//2)
agent = SARSALambdaContinuous(environment.info, pi, LinearApproximator,
                              approximator_params=approximator_params,
                              learning_rate=linear_alpha,
                              lambda_coeff=.9, features=features)

Currently the agent is not learning, but that is not the issue; my concern is just the different results I obtain across runs.

Thanks,

@angel-ayala
Author

Another thing: I just noticed that the parameter value is updated on each step.
Is there any way to do this only when the episode ends?

My current plan is to write a class and call it in a step callback.

class EpisodicDecay:
    """Step callback that makes a Parameter decay per episode instead of per step."""
    def __init__(self, parameter):
        self.parameter = parameter
        # Snapshot of the parameter's internal update counter
        self._init_table = parameter._n_updates.table.copy()

    def __call__(self, dataset):
        if dataset[-1][-1]:  # last flag of the last sample: the episode has ended
            self._init_table += 1
        # Overwrite the per-step counter so the decay only advances once per episode
        self.parameter._n_updates.table = self._init_table.copy()
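
For reference, a minimal sketch of how such a callback could be wired into a training loop; the Core keyword (callback_step) and the learn arguments are assumptions that may differ between mushroom_rl versions:

from mushroom_rl.core import Core

# Assumed wiring: register EpisodicDecay as a step callback
# (the exact keyword, callback_step vs. callbacks, depends on the version).
episodic_decay = EpisodicDecay(linear_epsilon)
core = Core(agent, environment, callback_step=episodic_decay)
core.learn(n_episodes=episodes, n_steps_per_fit=1)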

@boris-il-forte
Collaborator

boris-il-forte commented Aug 14, 2020

You also need to set the numpy seed, as it influences the policy.
If you add torch, you should set that seed too.

In general, we cannot write a single method to set every seed, as different libraries may use different random generators.
E.g. the environment's seed method only applies to Gym environments; all the others use the default numpy generator.
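
For reference, a minimal sketch of seeding the relevant generators; seed_val, the environment object from the snippet above, and the torch call are assumptions that depend on which components your experiment actually uses:

import random
import numpy as np
import torch

seed_val = 0                 # example value
np.random.seed(seed_val)     # numpy generator, used e.g. by the epsilon-greedy policy
random.seed(seed_val)        # Python's random, in case any component relies on it
torch.manual_seed(seed_val)  # only needed if torch-based approximators are involved
environment.seed(seed_val)   # Gym environment seed, as in the snippet above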

For the parameter decay, the approach you propose is the only supported way to achieve that behavior. That's exactly one of the use cases of callbacks.

@angel-ayala
Author

Oh right!
Yes, I know how to do that, and it makes sense. Thanks!

And about the episodic decay parameter, that's OK, I can handle it.

Thanks!

@NishanthVAnand

I get a NotImplementedError when I try to set the seed on some of the environments. Any thoughts on how to fix it?

env = PuddleWorld()
env.seed(seed)
 File  "/python3.8/site-packages/mushroom_rl/core/environment.py", line 137, in seed
     raise NotImplementedError
 NotImplementedError
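
Based on the earlier comment that non-Gym environments rely on the default numpy generator, a minimal sketch (an assumption, not the library's prescribed approach) would be to seed numpy directly and guard the env.seed call:

import numpy as np

np.random.seed(seed)
try:
    env.seed(seed)
except NotImplementedError:
    # e.g. PuddleWorld: its randomness comes from the global numpy generator
    pass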

@boris-il-forte
Collaborator

see my answer to #78
