
I can't get deterministic results... even after setting the seed before each run. #280

Closed
spott opened this issue Jul 12, 2018 · 12 comments

@spott
Contributor

spott commented Jul 12, 2018

I created a callback to reset the seed before each training run, to try to nail down reproducibility:

import random

import numpy as np
import torch

from skorch.callbacks import Callback


class FixRandomSeed(Callback):

    def __init__(self, seed=42):
        self.seed = seed

    def on_train_begin(self, net, **kwargs):
        print("setting random seed to:", self.seed)
        # seed every RNG the training loop might draw from
        random.seed(self.seed)
        np.random.seed(self.seed)
        torch.manual_seed(self.seed)
        torch.cuda.manual_seed(self.seed)
        # have cuDNN pick deterministic algorithms
        torch.backends.cudnn.deterministic = True

However, even when I add this to my model (and it runs before training starts), I get non-deterministic results.

I assume the reason is that the data loader is seeded before the training loop gets around to calling the FixRandomSeed callback. Is there an idiomatic way to fix this, or do I have to fix all the seeds before I call net.fit?

@spott
Contributor Author

spott commented Jul 12, 2018

Figured it out: I need to put the seed-setting code in the initialize method, rather than in on_train_begin:

import random

import numpy as np
import torch

from skorch.callbacks import Callback


class FixRandomSeed(Callback):

    def __init__(self, seed=42):
        self.seed = seed

    def initialize(self):
        print("setting random seed to:", self.seed)
        # seed every RNG before modules and data loaders are set up
        random.seed(self.seed)
        np.random.seed(self.seed)
        torch.manual_seed(self.seed)
        torch.cuda.manual_seed(self.seed)
        # have cuDNN pick deterministic algorithms
        torch.backends.cudnn.deterministic = True
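
For reference, a minimal usage sketch; MyModule, X and y are placeholders, not from this thread:

from skorch import NeuralNetClassifier

net = NeuralNetClassifier(
    MyModule,
    max_epochs=10,
    callbacks=[FixRandomSeed(seed=42)],  # seeds are set during net.initialize()
)
net.fit(X, y)  # repeated fits now start from the same RNG state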

@taketwo
Contributor

taketwo commented Jul 12, 2018

Would it make sense to add this callback to skorch?

@spott
Copy link
Contributor Author

spott commented Jul 12, 2018

I think so, deterministic behavior can be nice to have.

I don't know if the way I'm doing it is the "right" way (I basically just found all the "seed" parameters I could and set them), and I think you can run into issues if your Dataset uses a random number generator other than the PyTorch RNG, because the DataLoader workers will then all share the same numpy RNG state, for example.
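
To illustrate that worker concern, here is a minimal sketch (not part of the callback above; dataset is a placeholder) of how each DataLoader worker could be reseeded so they don't all end up with the same numpy state:

import random
import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id):
    # derive a distinct per-worker seed from the base seed PyTorch hands out
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

loader = DataLoader(dataset, num_workers=4, worker_init_fn=seed_worker)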

Let me know if you want it and I can create a pull request.

@BenjaminBossan
Collaborator

Great that it worked.

Usually, when I fix seeds at the start of my script, I have no trouble getting deterministic results, so I can't really tell why this is necessary here (though I'm not sure that determinism can be guaranteed in every case when using parallelism and/or CUDA).
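
For context, a sketch of what "fixing seeds at the start of the script" could look like; which calls are needed depends on the libraries in use:

import random
import numpy as np
import torch

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)
torch.backends.cudnn.deterministic = True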

@BenjaminBossan
Collaborator

Also, I could imagine that fixing the seed at the start of training (as the callback would do) is not always desirable. If you're not careful, you might run an experiment X times to average the results but end up with the exact same outcome each time.

@spott
Copy link
Contributor Author

spott commented Jul 12, 2018

This is more for when playing around with things in Jupyter (where you don't always run things in order), or conceivably when doing a grid search (if you want to actually compare like vs. like, and not seed vs. seed).

If you're not careful, you might run an experiment X times to average the results but end up with the exact same outcome each time.

There are definitely times when you don't want this, I completely agree.
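
For concreteness, a hypothetical sketch of the grid-search case, with the callback keeping every candidate on the same seed; MyModule, X and y are placeholders:

from sklearn.model_selection import GridSearchCV
from skorch import NeuralNetClassifier

net = NeuralNetClassifier(MyModule, callbacks=[FixRandomSeed(seed=42)])
params = {'lr': [0.01, 0.1], 'max_epochs': [10, 20]}
gs = GridSearchCV(net, params, cv=3)
gs.fit(X, y)  # each hyperparameter combination starts from the same RNG state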

@benjamin-work
Contributor

I can see that it can sometimes be practical to do as you suggest. But I'm not sure it helps to avoid a "seed vs. seed" comparison in a grid search: it could well be that a particular seed happens to work better with one set of hyperparameters than with another, when a different seed wouldn't.

Regarding the question whether to include this in skorch, here are my reasons why I would avoid it:

  • I would rather not have a skorch-specific solution for fixing seeds. Better to leave this job to numpy, pytorch, etc., in keeping with the way users already do it.
  • The callback could introduce hard to spot bugs (as mentioned earlier).
  • It introduces coupling between components (e.g. if we ever need to move initialize_callbacks after initialize_module, this callback would no longer do its job).

On the other hand, I see no problem with using "hacky" solutions like the one proposed above during quick experimentation. Our goal with skorch was to make such things possible.

@taketwo
Contributor

taketwo commented Jul 13, 2018

I see your point. Indeed, this is perhaps too "hacky" to be promoted to an "official" callback. But as you said, it's fine to use during quick experimentation. I just thought that this snippet could be quite helpful and was wondering what we can do to increase its discoverability.

@benjamin-work
Contributor

I just thought that this snippet could be quite helpful and was wondering what we can do to increase its discoverability

I see two possibilities here, with decreasing "officiality":

  • add the callback to helpers.py and strongly document when to use it and when not
  • add the code to the FAQ

Still, I might have to see a complete example where this is really needed first.

@taketwo
Contributor

taketwo commented Jul 13, 2018

Okay, I'll keep this thread in mind. If I ever need to use a fixed seed and find this useful, I'll send a PR to add it to the FAQ.

@benjamin-work
Contributor

@spott Not sure if this is related to your issue, but there was a bugfix with regards to random seeds in pytorch 0.4.1 which might help you: pytorch/pytorch#7886

@BenjaminBossan
Collaborator

This discussion seems to be finished. Please re-open if there is still a need to discuss.
