
UCF101 dataset with DataLoader returns error when stacking examples in mini-batches #2265

Closed
AndreaCossu opened this issue May 27, 2020 · 8 comments

@AndreaCossu

🐛 Bug

The UCF101 dataset returns a RuntimeError when combined with the standard DataLoader class. The error is raised when the default collate function tries to stack the sample tensors into a mini-batch.

To Reproduce

import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

tfs = transforms.Compose([
            transforms.Lambda(lambda x: x / 255.), # scale in [0, 1]
            transforms.Lambda(lambda x: x.permute(0, 3, 1, 2))  # reshape into (T, C, H, W)
    ])


# root, root_labels are the directories containing data and labels
d = datasets.UCF101(root, root_labels, frames_per_clip=25, step_between_clips=25, train=False, transform=tfs)
dataset = DataLoader(d, batch_size=7, shuffle=True, drop_last=True)

for i, (v, a, l) in enumerate(dataset):  # <- RuntimeError occurs here
    pass
RuntimeError                              Traceback (most recent call last)
      1 print(len(dataset))
----> 2 for i, (v, a, l) in enumerate(dataset):
      3     pass

~/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    343 
    344     def __next__(self):
--> 345         data = self._next_data()
    346         self._num_yielded += 1
    347         if self._dataset_kind == _DatasetKind.Iterable and \

~/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    383     def _next_data(self):
    384         index = self._next_index()  # may raise StopIteration
--> 385         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    386         if self._pin_memory:
    387             data = _utils.pin_memory.pin_memory(data)

~/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     45         else:
     46             data = self.dataset[possibly_batched_index]
---> 47         return self.collate_fn(data)

~/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
     77     elif isinstance(elem, container_abcs.Sequence):
     78         transposed = zip(*batch)
---> 79         return [default_collate(samples) for samples in transposed]
     80 
     81     raise TypeError(default_collate_err_msg_format.format(elem_type))

~/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py in <listcomp>(.0)
     77     elif isinstance(elem, container_abcs.Sequence):
     78         transposed = zip(*batch)
---> 79         return [default_collate(samples) for samples in transposed]
     80 
     81     raise TypeError(default_collate_err_msg_format.format(elem_type))

~/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
     53             storage = elem.storage()._new_shared(numel)
     54             out = elem.new(storage)
---> 55         return torch.stack(batch, 0, out=out)
     56     elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
     57             and elem_type.__name__ != 'string_':

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 2 and 1 in dimension 1 at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/TH/generic/THTensor.cpp:612

Expected behavior

Iterating over the DataLoader should return a video tensor of size (B, T, C, H, W), where B is the batch size, T is the number of frames per clip, C is the number of image channels, and H and W are the frame height and width.
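
In code, a minimal sketch of the iteration that currently fails (assuming batch_size=7 and frames_per_clip=25 from the snippet above; H and W depend on the source videos):

v, a, l = next(iter(dataset))
print(v.shape)  # expected: torch.Size([7, 25, 3, H, W])
print(l.shape)  # expected: torch.Size([7])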

Environment

  • PyTorch / torchvision Version: 1.4.0 / 0.5.0
  • OS (e.g., Linux): CentOS Linux 7
  • How you installed PyTorch / torchvision (conda, pip, source): conda
  • Python version: 3.7.3
@fmassa
Member

fmassa commented May 29, 2020

Thanks for the bug report!

I see that you are using torchvision 0.5.0; could you check whether the error still reproduces with the latest stable release (0.6.0)?

@AndreaCossu
Author

AndreaCossu commented May 30, 2020

Yes, of course:

  • PyTorch version: 1.5.0
  • Torchvision version: 0.6.0a0+82fd1c8

The exact same code as above returns:

/home/cossu/miniconda3/lib/python3.7/site-packages/torchvision/io/video.py:104: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  "The pts_unit 'pts' gives wrong results and will be removed in a "
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-6-db8cb3bb2399> in <module>
      1 print(len(dataset))
----> 2 for i, (v, a, l) in enumerate(dataset):
      3     pass

~/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    343 
    344     def __next__(self):
--> 345         data = self._next_data()
    346         self._num_yielded += 1
    347         if self._dataset_kind == _DatasetKind.Iterable and \

~/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    383     def _next_data(self):
    384         index = self._next_index()  # may raise StopIteration
--> 385         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    386         if self._pin_memory:
    387             data = _utils.pin_memory.pin_memory(data)

~/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     45         else:
     46             data = self.dataset[possibly_batched_index]
---> 47         return self.collate_fn(data)

~/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
     77     elif isinstance(elem, container_abcs.Sequence):
     78         transposed = zip(*batch)
---> 79         return [default_collate(samples) for samples in transposed]
     80 
     81     raise TypeError(default_collate_err_msg_format.format(elem_type))

~/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py in <listcomp>(.0)
     77     elif isinstance(elem, container_abcs.Sequence):
     78         transposed = zip(*batch)
---> 79         return [default_collate(samples) for samples in transposed]
     80 
     81     raise TypeError(default_collate_err_msg_format.format(elem_type))

~/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
     53             storage = elem.storage()._new_shared(numel)
     54             out = elem.new(storage)
---> 55         return torch.stack(batch, 0, out=out)
     56     elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
     57             and elem_type.__name__ != 'string_':

RuntimeError: stack expects each tensor to be equal size, but got [2, 28800] at entry 0 and [1, 28800] at entry 6

On a side note, I tried to set _video_min_dimension=3 in the UCF101 constructor in order to prevent clips with very few frames from being taken into consideration. However, when looping over the final dataloader I get the error: pyav backend doesn't support _video_min_dimension != 0 (which explains why the parameter is preceded by an underscore and not listed in the documentation, of course :) )

Thank you for any help you may provide!

@fmassa
Member

fmassa commented Jun 5, 2020

Did you try rescaling the videos so that they all have the same size?
In the video-classification reference examples we have some example transforms (which will soon be merged into torchvision master): https://github.com/pytorch/vision/blob/master/references/video_classification/transforms.py. For video classification they are used like this, and it works fine:

transform_train = torchvision.transforms.Compose([
    T.ToFloatTensorInZeroOne(),
    T.Resize((128, 171)),
    T.RandomHorizontalFlip(),
    normalize,
    T.RandomCrop((112, 112))
])

Maybe that's the issue you are facing?
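
(A minimal end-to-end sketch of this pipeline, assuming the reference transforms.py has been copied next to the script and imported as T; the mean/std values are the Kinetics-400 statistics used in the reference training script, so treat them as an assumption here:)

import torchvision
import transforms as T  # copied from references/video_classification/ (assumption)

# Kinetics-400 normalization statistics from the reference script (assumption)
normalize = T.Normalize(mean=[0.43216, 0.394666, 0.37645],
                        std=[0.22803, 0.22145, 0.216989])

transform_train = torchvision.transforms.Compose([
    T.ToFloatTensorInZeroOne(),  # uint8 (T, H, W, C) -> float (C, T, H, W) in [0, 1]
    T.Resize((128, 171)),        # rescale every frame to a common size
    T.RandomHorizontalFlip(),
    normalize,
    T.RandomCrop((112, 112)),
])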

@AndreaCossu
Author

Thanks for the tip. I tried resizing the videos, without success.
However, looking more closely at the error message, the RuntimeError is raised on tensors of size (2, 28800) vs. (1, 28800). That is exactly the size of the audio sample returned by UCF101 as the second element of the 3-tuple (video, audio, label).
Maybe the problem is with the audio, not the video. Is there a way to make UCF101 return only a (video, label) pair? Or to apply a transformation to the audio alone, so that all samples become (1, 28800)?
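
(On the second option, a minimal sketch of a collate_fn that downmixes stereo audio to mono instead of dropping it; it assumes every clip has the same number of audio samples and differs only in channel count, and mono_collate is a hypothetical name, not part of torchvision:)

import torch
from torch.utils.data.dataloader import default_collate

def mono_collate(batch):
    # Average over the channel dimension so every audio tensor becomes (1, num_samples);
    # .float() guards against integer audio dtypes, for which mean() is undefined.
    fixed = [(video, audio.float().mean(dim=0, keepdim=True), label)
             for video, audio, label in batch]
    return default_collate(fixed)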

@fmassa
Member

fmassa commented Jun 5, 2020

Oh, I see, yes, this is an issue with the audio signal.

I would propose writing a custom collate_fn to pass to your DataLoader which removes the audio, maybe something like

def custom_collate(batch):
    filtered_batch = []
    for video, _, label in batch:
        filtered_batch.append((video, label))
    torch.utils.data.dataloader.default_collate(filtered_batch)

@AndreaCossu
Author

AndreaCossu commented Jun 5, 2020

Perfect, that works! I just added a return to your last line:

def custom_collate(batch):
    filtered_batch = []
    for video, _, label in batch:
        filtered_batch.append((video, label))
    return torch.utils.data.dataloader.default_collate(filtered_batch)

Final working code

import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

tfs = transforms.Compose([
             # scale in [0, 1]
             transforms.Lambda(lambda x: x / 255.),
             # reshape into (T, C, H, W)
             transforms.Lambda(lambda x: x.permute(0, 3, 1, 2))
    ])


# root, root_labels are the directories containing data and labels
d = datasets.UCF101(root, root_labels, frames_per_clip=25, 
        step_between_clips=25, train=False, transform=tfs)
dataset = DataLoader(d, batch_size=7, shuffle=True, 
        drop_last=True, collate_fn=custom_collate)

for i, (v, l) in enumerate(dataset): 
    print(v.size())
    print(l)
    break

Thank you very much!!

@pevogam

pevogam commented Aug 1, 2020

Hi @fmassa, could we somehow integrate this collate suggestion into the UCF101 dataset/loader classes, or at least give the user the option to drop the audio from the returned tuple? Some UCF101 videos have no audio while others have one or two channels, resulting in shapes such as [1, 0], [1, 11520], or [2, 11520] (from what I have gathered so far), so this will always be a property of the UCF101 dataset, and everybody using these classes will have to dig up this issue and copy the custom collate implementation. Could a default workaround become part of these dataset classes?
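
(In the meantime, a minimal user-side sketch that bakes the workaround into the dataset itself by dropping the audio in __getitem__, so no custom collate_fn is needed; UCF101VideoOnly is a hypothetical name:)

import torchvision.datasets as datasets

class UCF101VideoOnly(datasets.UCF101):
    # Return (video, label) pairs instead of (video, audio, label) tuples.
    def __getitem__(self, idx):
        video, _, label = super().__getitem__(idx)
        return video, label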

@pevogam

pevogam commented Aug 1, 2020

Probably even just mentioning this in the UCF101 dataset class documentation would be useful.
