Error running video classification tutorial #640

Closed
ChristianEschen opened this issue Nov 4, 2020 · 8 comments
ChristianEschen commented Nov 4, 2020

Running step 5 in the video classification tutorial:
import time
import os

from classy_vision.trainer import LocalTrainer
from classy_vision.hooks import CheckpointHook
from classy_vision.hooks import LossLrMeterLoggingHook

hooks = [LossLrMeterLoggingHook(log_freq=4)]

checkpoint_dir = f"/tmp/classy_checkpoint_{time.time()}"
os.mkdir(checkpoint_dir)
hooks.append(CheckpointHook(checkpoint_dir, input_args={}))

task = task.set_hooks(hooks)

trainer = LocalTrainer()
trainer.train(task)

gives me the following error:

RuntimeError Traceback (most recent call last)
in ()
15
16 trainer = LocalTrainer()
---> 17 trainer.train(task)

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/trainer/local_trainer.py in train(self, task)
25 set_cpu_device()
26
---> 27 super().train(task)

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/trainer/classy_trainer.py in train(self, task)
43 task.on_start()
44 while not task.done_training():
---> 45 task.on_phase_start()
46 while True:
47 try:

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py in on_phase_start(self)
943 self.phase_start_time_total = time.perf_counter()
944
--> 945 self.advance_phase()
946
947 for hook in self.hooks:

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py in advance_phase(self)
845 # Re-build dataloader & re-create iterator anytime membership changes.
846 self._recreate_data_loader_from_dataset()
--> 847 self.create_data_iterator()
848 # Set up pytorch module in train vs eval mode, update optimizer.
849 self._set_model_train_mode()

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py in create_data_iterator(self)
898 # are cleaned up.
899 del self.data_iterator
--> 900 self.data_iterator = iter(self.dataloaders[self.phase_type])
901
902 def _set_model_train_mode(self):

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __iter__(self)
350 return self._iterator
351 else:
--> 352 return self._get_iterator()
353
354 @property

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _get_iterator(self)
292 return _SingleProcessDataLoaderIter(self)
293 else:
--> 294 return _MultiProcessingDataLoaderIter(self)
295
296 @property

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __init__(self, loader)
825 _utils.signal_handling._set_SIGCHLD_handler()
826 self._worker_pids_set = True
--> 827 self._reset(loader, first_iter=True)
828
829 def _reset(self, loader, first_iter=False):

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _reset(self, loader, first_iter)
855 # prime the prefetch loop
856 for _ in range(self._prefetch_factor * self._num_workers):
--> 857 self._try_put_index()
858
859 def _try_get_data(self, timeout=_utils.MP_STATUS_CHECK_INTERVAL):

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _try_put_index(self)
1089
1090 try:
---> 1091 index = self._next_index()
1092 except StopIteration:
1093 return

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _next_index(self)
425
426 def _next_index(self):
--> 427 return next(self._sampler_iter) # may raise StopIteration
428
429 def _next_data(self):

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/sampler.py in __iter__(self)
225 def __iter__(self):
226 batch = []
--> 227 for idx in self.sampler:
228 batch.append(idx)
229 if len(batch) == self.batch_size:

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torchvision/datasets/samplers/clip_sampler.py in __iter__(self)
94
95 if isinstance(self.dataset, Sampler):
---> 96 orig_indices = list(iter(self.dataset))
97 indices = [orig_indices[i] for i in indices]
98

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/dataset/classy_video_dataset.py in __iter__(self)
45 num_samples = len(self)
46 n = 0
---> 47 for clip in self.clip_sampler:
48 if n < num_samples:
49 yield clip

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torchvision/datasets/samplers/clip_sampler.py in __iter__(self)
173 s += length
174 idxs.append(sampled)
--> 175 idxs_ = torch.cat(idxs)
176 # shuffle all clips randomly
177 perm = torch.randperm(len(idxs_))

RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /pytorch/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
CUDA: registered at /pytorch/build/aten/src/ATen/CUDAType.cpp:2983 [kernel]
QuantizedCPU: registered at /pytorch/build/aten/src/ATen/QuantizedCPUType.cpp:297 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCUDA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradXLA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse1: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse2: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse3: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
Tracer: registered at /pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:9654 [kernel]
Autocast: registered at /pytorch/aten/src/ATen/autocast_mode.cpp:258 [kernel]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

My setup is the following:

  • PyTorch Version (e.g., 1.0): 1.7.0
  • OS (e.g., Linux): Ubuntu 18.04
  • How you installed PyTorch (conda, pip, source): conda
  • Build command you used (if compiling from source):
  • Python version: 3.6.11
  • CUDA/cuDNN version: 11.0
  • GPU models and configuration: 1x RTX 2080 TI
  • Any other relevant information: Classy_vision is installed using pip
@mannatsingh
Contributor

Hi @ChristianEschen that's a weird error which I haven't seen before. Can you print the output of the following lines -

for phase in ["train", "test"]:
    iterator = datasets[phase].iterator()
    count = 0
    for _ in iterator:
        count += 1
        if count >= 10:
            break
    print(phase)
    print(count)

Also, which exact version of Python are you using (like 3.6.2) and how did you install classy?

@ChristianEschen
Author

I get the same error as presented above.
I use Python 3.6.11.
It is installed using pip install classy_vision.

@mannatsingh
Contributor

Ah, I just noticed, your CUDA version is 11.0 - that isn't supported by Classy Vision yet. Can you try downgrading to CUDA 10.2 and running this?

cc @vreis , @jackhamburger since you guys had worked with CUDA 11.0, do you think this could be related?

@ChristianEschen
Author

Hi again,

I figured out that my UCF-101 dataset was not in the correct format.
This means I had a "flattened" data structure.

So it was an error 40, indicating the error was 40 centimeters from the device...
Thanks anyway.
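For future readers hitting the same symptom: as a rough illustrative sketch (this is not Classy Vision API; the function name and heuristic are my own), a quick check like the following can catch a flattened layout before training, since torchvision's UCF-101 reader expects one subdirectory per action class rather than a flat folder of video files:

```python
import os

def looks_like_ucf101_layout(root):
    """Heuristic sanity check (illustrative only): the expected layout is
        root/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi
    A "flattened" folder with .avi files directly under root yields zero
    clips and can surface as the empty torch.cat RuntimeError above."""
    entries = [os.path.join(root, name) for name in os.listdir(root)]
    has_class_dirs = any(os.path.isdir(p) for p in entries)
    has_loose_videos = any(p.endswith(".avi") for p in entries)
    return has_class_dirs and not has_loose_videos
```

Running this on the dataset root before building the metadata would have flagged the problem early.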

@mannatsingh
Contributor

Got it, I had figured that the dataset would throw an exception during initialization if there was a data error. Do you mind mentioning what the exact issue was and how you fixed it, for future users? :)

@failable

failable commented Dec 7, 2020

I got the same issue.

The test snippet does not work for me @mannatsingh

Traceback (most recent call last):
  File "video_classification.py", line 120, in <module>
    for _ in iterator:
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 352, in __iter__
    return self._get_iterator()
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 294, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 827, in __init__
    self._reset(loader, first_iter=True)
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 857, in _reset
    self._try_put_index()
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1091, in _try_put_index
    index = self._next_index()
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 427, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 227, in __iter__
    for idx in self.sampler:
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torchvision/datasets/samplers/clip_sampler.py", line 87, in __iter__
    orig_indices = list(iter(self.dataset))
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/classy_vision/dataset/classy_video_dataset.py", line 47, in __iter__
    for clip in self.clip_sampler:
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torchvision/datasets/samplers/clip_sampler.py", line 167, in __iter__
    idxs = torch.cat(idxs)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /pytorch/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
CUDA: registered at /pytorch/build/aten/src/ATen/CUDAType.cpp:2983 [kernel]
QuantizedCPU: registered at /pytorch/build/aten/src/ATen/QuantizedCPUType.cpp:297 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCUDA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradXLA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse1: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse2: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse3: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
Tracer: registered at /pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:9654 [kernel]
Autocast: registered at /pytorch/aten/src/ATen/autocast_mode.cpp:258 [kernel]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

I got a 24 MB ucf101_metadata.pt, so I assume my dataset format is correct?

>>> import torch
>>> a = torch.load('ucf101_metadata.pt')
>>> a.keys()
dict_keys(['video_paths', 'video_pts', 'video_fps'])
>>> len(a['video_paths'])
13320
>>> len(a['video_pts'])
13320
>>> len(a['video_fps'])
13320
>>> 

BTW, I came from this issue, and have

for phase in ["train", "test"]:
    task.set_dataset(datasets[phase], phase)
    task.set_dataloader_mp_context('fork')

in the video classification tutorial, following the suggestion in the mentioned issue. Setting the option to fork, spawn, or forkserver, or setting num_workers to 0, all caused the same issue.

@Yevgnen

Yevgnen commented Dec 7, 2020

I encountered the same issue. It is probably related to torchvision upstream and is fixed in this commit. If one sets the data directory with a trailing slash, like

# set it to the folder where video files are saved
video_dir = "/path/to/ucf101/"

then the indices will become [] before this commit and cause RuntimeError: There were no tensor arguments to this function. It's a bit unfriendly that torchvision itself does not print any warning or raise an error.
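As a workaround on older torchvision versions, normalizing the directory path before passing it in avoids the trailing-slash mismatch (a minimal sketch; the path here is a placeholder, not the actual dataset location):

```python
import os

# Placeholder path; the trailing slash is what made older torchvision
# versions produce an empty clip index list.
video_dir = "/path/to/ucf101/"

# os.path.normpath drops the trailing slash, so the dataset's video
# paths match the expected prefix again.
video_dir = os.path.normpath(video_dir)
print(video_dir)  # -> /path/to/ucf101
```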

Note that any unexpected dataset format may also cause the issue. Updating torchvision fixed my issue.

@mannatsingh
Copy link
Contributor

Thanks so much @Yevgnen for the suggestion!

@liebkne I've verified that your metadata file looks correct - can you try @Yevgnen 's suggestion and see if that works for you?
