
How to fix the order of data in iterator during training step? #828

Open
seewoo5 opened this issue Jun 17, 2020 · 5 comments
seewoo5 commented Jun 17, 2020

❓ Questions and Help

Description

Currently, I'm running experiments with several datasets in torchtext, and I just found that I can't reproduce my experiments, even though I eliminated every source of randomness I could find, as follows:

import random

import numpy as np
import torch

torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
random.seed(seed)
np.random.seed(seed)
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

I found that when the Iterator class is initialized, the RandomShuffler() defined in torchtext.data.utils is set as self.random_shuffler, and this is what shuffles the training data. However, although one can set the random state of a RandomShuffler by passing it as a constructor argument, the line self.random_shuffler = RandomShuffler() doesn't let us set that state manually. Am I right? Is there a way to fix the order of the data during the training step? Something like the sketch below is what I'm after.
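For context, a rough sketch of what I mean (I'm assuming the legacy torchtext API here; train_dataset and the batch size are placeholders). Since RandomShuffler(random_state) does accept an explicit state, one could in principle overwrite the iterator's shuffler after construction:

import random

from torchtext.data import Iterator
from torchtext.data.utils import RandomShuffler

# train_dataset is a placeholder for a torchtext Dataset built elsewhere.
train_iter = Iterator(train_dataset, batch_size=32, train=True)

# Overwrite the shuffler with one built from a fixed, seeded state so that
# the shuffle order is identical on every run.
random.seed(1)
train_iter.random_shuffler = RandomShuffler(random.getstate())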

zhangguanheng66 (Contributor) commented

@bencwallace mentioned this temporary fix. #522 (comment)

In the end, we will switch to torch.utils.data.DataLoader, which supports deterministic sampling. Could you tell me which datasets you are using now? I may be able to help you with our experimental datasets.
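For illustration, a minimal sketch of deterministic shuffling with DataLoader, assuming a PyTorch version where DataLoader accepts a generator argument; the TensorDataset is just a stand-in:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; any map-style Dataset works the same way.
dataset = TensorDataset(torch.arange(100))

# A manually seeded generator pins the shuffle order across runs.
g = torch.Generator()
g.manual_seed(1)
loader = DataLoader(dataset, batch_size=8, shuffle=True, generator=g)

for batch in loader:
    ...  # identical batch order on every run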

seewoo5 (Author) commented Jun 18, 2020

Thanks! I'm using Iterator instead of BucketIterator with the SST dataset (fine-grained classification). I just tried the workaround you mentioned, but it still doesn't work. Before calling Iterator.splits, I added the following three lines of code:

import random

random.seed(ARGS.random_seed)
rand_st = random.getstate()
random.setstate(rand_st)

where ARGS.random_seed is a fixed integer (1). I thought that fixing the random seed would also fix the random state, but this doesn't seem to be the right way. Could you help me further?

bencwallace commented

Strange; I'd expect the workaround to work for Iterator as well. You're calling Iterator.splits, right?

By the way, I'm pretty sure you're right that seeding and then calling getstate and setstate should have the same effect as just seeding. You should be able to fix the random state of the iterator just by calling random.seed. Maybe try a small, self-contained test to make sure there isn't some other source of randomness.
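For example, something along these lines (assuming the legacy torchtext API; the list of integers is just a stand-in for the examples being shuffled):

import random

from torchtext.data.utils import RandomShuffler

def shuffle_once(seed):
    # RandomShuffler() captures the current `random` state at construction,
    # so seeding beforehand should pin the shuffle order.
    random.seed(seed)
    return RandomShuffler()(list(range(10)))

# If seeding alone is enough, two runs with the same seed must agree.
assert shuffle_once(1) == shuffle_once(1)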

seewoo5 (Author) commented Jun 18, 2020

Yes. Here's a plot of train accuracy and train loss for the SST-5 data (obtained with wandb): the two experiments were run with exactly the same script. They are "almost" the same, but not identical.
[Screenshot, 2020-06-18: wandb plots of train accuracy and train loss for the two runs]

bencwallace commented

That's somewhat disconcerting. Please let me know if you track down the problem! I'm trying to maintain reproducibility in a project of my own.
