DataBlock.dataloaders does not support the advertised "shuffle" argument #3133

juliangilbey · 2021-01-07T17:27:37Z

Please confirm you have the latest versions of fastai, fastcore, and nbdev prior to reporting a bug (delete one): YES
fastai 2.2.2, fastcore 1.3.16, nbdev 1.1.6

Describe the bug
The DataBlock documentation describes the https://docs.fast.ai/data.block.html#DataBlock.dataloaders function and explicitly says that there is an optional kwarg shuffle. However, using this results in the error:

TypeError: fastai.data.core.TfmdDL() got multiple values for keyword argument 'shuffle'

To Reproduce
Take the notebook for Chapter 2 of the fastai book, and change the cell which reads dls = bears.dataloaders(path) to read dls = bears.dataloaders(path, shuffle=True) instead.

Expected behavior
It should perform shuffling as requested.

Error with full stack trace

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-cb46ba7faa34> in <module>
----> 1 dls = bears.dataloaders(path, shuffle=True)

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/fastai/data/block.py in dataloaders(self, source, path, verbose, **kwargs)
    113         dsets = self.datasets(source, verbose=verbose)
    114         kwargs = {**self.dls_kwargs, **kwargs, 'verbose': verbose}
--> 115         return dsets.dataloaders(path=path, after_item=self.item_tfms, after_batch=self.batch_tfms, **kwargs)
    116 
    117     _docs = dict(new="Create a new `DataBlock` with other `item_tfms` and `batch_tfms`",

~/fast.ai.course/fastai-venv/lib/python3.9/site-packages/fastai/data/core.py in dataloaders(self, bs, val_bs, shuffle_train, n, path, dl_type, dl_kwargs, device, **kwargs)
    211         if dl_type is None: dl_type = self._dl_type
    212         drop_last = kwargs.pop('drop_last', shuffle_train)
--> 213         dl = dl_type(self.subset(0), bs=bs, shuffle=shuffle_train, drop_last=drop_last, n=n, device=device,
    214                      **merge(kwargs, dl_kwargs[0]))
    215         dls = [dl] + [dl.new(self.subset(i), bs=(bs if val_bs is None else val_bs), shuffle=False, drop_last=False,

TypeError: fastai.data.core.TfmdDL() got multiple values for keyword argument 'shuffle'

Additional context
The cause of the error is clear, though the best way to solve it is not as obvious. The DataBlock.dataloaders() method:

fastai/fastai/data/block.py

Lines 112 to 115 in 373050b

    
           def dataloaders(self, source, path='.', verbose=False, **kwargs): 
        
               dsets = self.datasets(source, verbose=verbose) 
        
               kwargs = {**self.dls_kwargs, **kwargs, 'verbose': verbose} 
        
               return dsets.dataloaders(path=path, after_item=self.item_tfms, after_batch=self.batch_tfms, **kwargs)

just passes most of the keyword arguments through to its Datasets object, which in turn is a subclass of FilteredBase, so inherits the dataloaders() method from there. This method is as follows:

fastai/fastai/data/core.py

Lines 207 to 217 in d1cf81c

    
           def dataloaders(self, bs=64, val_bs=None, shuffle_train=True, n=None, path='.', dl_type=None, dl_kwargs=None, 
        
                           device=None, **kwargs): 
        
               if device is None: device=default_device() 
        
               if dl_kwargs is None: dl_kwargs = [{}] * self.n_subsets 
        
               if dl_type is None: dl_type = self._dl_type 
        
               drop_last = kwargs.pop('drop_last', shuffle_train) 
        
               dl = dl_type(self.subset(0), bs=bs, shuffle=shuffle_train, drop_last=drop_last, n=n, device=device, 
        
                            **merge(kwargs, dl_kwargs[0])) 
        
               dls = [dl] + [dl.new(self.subset(i), bs=(bs if val_bs is None else val_bs), shuffle=False, drop_last=False, 
        
                                    n=None, **dl_kwargs[i]) for i in range(1, self.n_subsets)] 
        
               return self._dbunch_type(*dls, path=path, device=device)

This has the keyword argument shuffle_train (but not shuffle), but on lines 213-214, calls the dl_type function with an explicit shuffle keyword and then the kwargs passed through (merged with dl_kwargs[0]). So if kwargs includes shuffle, there will be a repeated keyword argument.

I haven't checked whether the same would apply to the other keyword arguments listed on line 213.

One simple way would be to rewrite lines 213-214 as:

        dl = dl_type(self.subset(0),
                     **merge({'bs': bs, 'shuffle': shuffle_train, 'drop_last': drop_last, 'n': n, 'device': device},
                     kwargs, dl_kwargs[0]))

The text was updated successfully, but these errors were encountered:

juliangilbey · 2021-01-07T17:51:24Z

Actually, I see the same issue appears on lines 215-216 as well, if shuffle appears in dl_kwargs[i].

marii-moe · 2021-01-08T04:08:17Z

Was able to reproduce this and taking a look at it now.

tcapelle · 2021-01-08T08:48:27Z

just to understand, what are you trying to shuffle?

juliangilbey · 2021-01-08T11:26:47Z

I was trying to shuffle the validation set (I had a good reason to), but realised that even if the shuffle argument did as it said, it would still not shuffle the validation set, only the training set. So I found a different (perhaps clunky) way to do what I wanted to do. But if someone didn't want to shuffle the training set for some reason, shuffle=False would not work either, so the bug definitely remains, though it's a pretty small one.

marii-moe · 2021-01-12T01:16:42Z

@juliangilbey Just to help with writing this, why are you trying to shuffle the validation set? I wouldn't think we would want to drop the last batch on the validation set so making sure.

marii-moe · 2021-01-12T02:27:13Z

@jph00 For this one I think it might be good to assume keyword arguments apply to train, and can be overridden for the validation set by starting with 'val_'. The key idea below:

default_kwargs = {'bs':bs,'n':None}
def create_dl(): #place holder for creating dataloaders
val_kwargs={k[4:]:v for k,v in kwargs.items() if k.startswith('val_')} #implmenting this is the key idea
#below validation set only
dls = [ create_dl(**merge(kwargs,default_kwargs,val_kwargs,dl_kwargs[i]) ) for i in range(num_dls) ]

Here is a more complete gist for my idea: https://gist.github.com/marii-moe/7374ee9d59400f85b17dc54b7a2c4af0

Here is the original:

fastai/fastai/data/core.py

Line 207 in dded785

    
           def dataloaders(self, bs=64, val_bs=None, shuffle_train=True, n=None, path='.', dl_type=None, dl_kwargs=None,

This would mean removing 'shuffle_train' which has been part of the api as far back as I could track in fastai_dev/fastai2. kwargs was originally included in creating the validation dataloader, but more recently was removed, so less sure about including that one as part of the merge. Unable to find info on why kwargs is not included as part of creating validation dataloaders, but it could have been to simplfy things. Here is the commit that changed it though: a4e2456#diff-b881fa2851a7d60fe30b52d6cd1bc25cd510f0b9ac30de516cd01c3de05af1cbL166

It is important to put all in a merge, as this was we don't end up passing two of the same keyword argument.

juliangilbey · 2021-01-12T12:06:22Z

@juliangilbey Just to help with writing this, why are you trying to shuffle the validation set? I wouldn't think we would want to drop the last batch on the validation set so making sure.

Hi @marii-moe I wanted to run dls.valid.show_batch() to check that things looked OK in the validation set. Unfortunately, the original data is very nicely ordered, so I only see data instances from one class. Being able to shuffle the validation set would help for this.

jph00 · 2021-01-12T18:47:12Z

For this one I think it might be good to assume keyword arguments apply to train, and can be overridden for the validation set by starting with 'val_'.

I like it. Let's leave shuffle_train as well for now, but have it raise a deprecation warning if not None, and have it just set shuffle. How does that sound?

marii-moe self-assigned this Jan 8, 2021

marii-moe added the bug label Jan 8, 2021

jph00 closed this as completed in #3178 Feb 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DataBlock.dataloaders does not support the advertised "shuffle" argument #3133

DataBlock.dataloaders does not support the advertised "shuffle" argument #3133

juliangilbey commented Jan 7, 2021

juliangilbey commented Jan 7, 2021

marii-moe commented Jan 8, 2021

tcapelle commented Jan 8, 2021

juliangilbey commented Jan 8, 2021

marii-moe commented Jan 12, 2021

marii-moe commented Jan 12, 2021 •

edited

Loading

juliangilbey commented Jan 12, 2021

jph00 commented Jan 12, 2021

DataBlock.dataloaders does not support the advertised "shuffle" argument #3133

DataBlock.dataloaders does not support the advertised "shuffle" argument #3133

Comments

juliangilbey commented Jan 7, 2021

juliangilbey commented Jan 7, 2021

marii-moe commented Jan 8, 2021

tcapelle commented Jan 8, 2021

juliangilbey commented Jan 8, 2021

marii-moe commented Jan 12, 2021

marii-moe commented Jan 12, 2021 • edited Loading

juliangilbey commented Jan 12, 2021

jph00 commented Jan 12, 2021

marii-moe commented Jan 12, 2021 •

edited

Loading