Error running video classification tutorial #640

Closed
ChristianEschen opened this issue Nov 4, 2020 · 8 comments
ChristianEschen commented Nov 4, 2020

Running step 5 in the video classification tutorial:
import time
import os

from classy_vision.trainer import LocalTrainer
from classy_vision.hooks import CheckpointHook
from classy_vision.hooks import LossLrMeterLoggingHook

hooks = [LossLrMeterLoggingHook(log_freq=4)]

checkpoint_dir = f"/tmp/classy_checkpoint_{time.time()}"
os.mkdir(checkpoint_dir)
hooks.append(CheckpointHook(checkpoint_dir, input_args={}))

task = task.set_hooks(hooks)

trainer = LocalTrainer()
trainer.train(task)

gives me the following error:

RuntimeError Traceback (most recent call last)
in ()
15
16 trainer = LocalTrainer()
---> 17 trainer.train(task)

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/trainer/local_trainer.py in train(self, task)
25 set_cpu_device()
26
---> 27 super().train(task)

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/trainer/classy_trainer.py in train(self, task)
43 task.on_start()
44 while not task.done_training():
---> 45 task.on_phase_start()
46 while True:
47 try:

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py in on_phase_start(self)
943 self.phase_start_time_total = time.perf_counter()
944
--> 945 self.advance_phase()
946
947 for hook in self.hooks:

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py in advance_phase(self)
845 # Re-build dataloader & re-create iterator anytime membership changes.
846 self._recreate_data_loader_from_dataset()
--> 847 self.create_data_iterator()
848 # Set up pytorch module in train vs eval mode, update optimizer.
849 self._set_model_train_mode()

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py in create_data_iterator(self)
898 # are cleaned up.
899 del self.data_iterator
--> 900 self.data_iterator = iter(self.dataloaders[self.phase_type])
901
902 def _set_model_train_mode(self):

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __iter__(self)
350 return self._iterator
351 else:
--> 352 return self._get_iterator()
353
354 @property

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _get_iterator(self)
292 return _SingleProcessDataLoaderIter(self)
293 else:
--> 294 return _MultiProcessingDataLoaderIter(self)
295
296 @property

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __init__(self, loader)
825 _utils.signal_handling._set_SIGCHLD_handler()
826 self._worker_pids_set = True
--> 827 self._reset(loader, first_iter=True)
828
829 def _reset(self, loader, first_iter=False):

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _reset(self, loader, first_iter)
855 # prime the prefetch loop
856 for _ in range(self._prefetch_factor * self._num_workers):
--> 857 self._try_put_index()
858
859 def _try_get_data(self, timeout=_utils.MP_STATUS_CHECK_INTERVAL):

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _try_put_index(self)
1089
1090 try:
---> 1091 index = self._next_index()
1092 except StopIteration:
1093 return

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _next_index(self)
425
426 def _next_index(self):
--> 427 return next(self._sampler_iter) # may raise StopIteration
428
429 def _next_data(self):

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/sampler.py in __iter__(self)
225 def __iter__(self):
226 batch = []
--> 227 for idx in self.sampler:
228 batch.append(idx)
229 if len(batch) == self.batch_size:

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torchvision/datasets/samplers/clip_sampler.py in __iter__(self)
94
95 if isinstance(self.dataset, Sampler):
---> 96 orig_indices = list(iter(self.dataset))
97 indices = [orig_indices[i] for i in indices]
98

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/dataset/classy_video_dataset.py in __iter__(self)
45 num_samples = len(self)
46 n = 0
---> 47 for clip in self.clip_sampler:
48 if n < num_samples:
49 yield clip

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torchvision/datasets/samplers/clip_sampler.py in __iter__(self)
173 s += length
174 idxs.append(sampled)
--> 175 idxs_ = torch.cat(idxs)
176 # shuffle all clips randomly
177 perm = torch.randperm(len(idxs_))

RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /pytorch/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
CUDA: registered at /pytorch/build/aten/src/ATen/CUDAType.cpp:2983 [kernel]
QuantizedCPU: registered at /pytorch/build/aten/src/ATen/QuantizedCPUType.cpp:297 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCUDA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradXLA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse1: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse2: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse3: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
Tracer: registered at /pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:9654 [kernel]
Autocast: registered at /pytorch/aten/src/ATen/autocast_mode.cpp:258 [kernel]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

My setup is the following:

  • PyTorch Version (e.g., 1.0): 1.7.0
  • OS (e.g., Linux): Ubuntu 18.04
  • How you installed PyTorch (conda, pip, source): conda
  • Build command you used (if compiling from source):
  • Python version: 3.6.11
  • CUDA/cuDNN version: 11.0
  • GPU models and configuration: 1x RTX 2080 TI
  • Any other relevant information: Classy_vision is installed using pip
@mannatsingh
Contributor

Hi @ChristianEschen that's a weird error which I haven't seen before. Can you print the output of the following lines -

for phase in ["train", "test"]:
    iterator = datasets[phase].iterator()
    count = 0
    for _ in iterator:
        count += 1
        if count >= 10:
            break
    print(phase)
    print(count)

Also, which exact version of Python are you using (like 3.6.2) and how did you install classy?

@ChristianEschen
Author

I get the same error as presented above.
I use Python 3.6.11.
It is installed using pip install classy_vision.

@mannatsingh
Contributor

Ah, I just noticed, your CUDA version is 11.0 - that isn't supported by Classy Vision yet. Can you try downgrading to CUDA 10.2 and running this?

cc @vreis , @jackhamburger since you guys had worked with CUDA 11.0, do you think this could be related?

@ChristianEschen
Author

Hi again,

I figured out that my UCF-101 dataset was not in the correct format.
This means I had a "flattened" data structure.

So it was an error 40, indicating the error was 40 centimeters from the device...
Thanks anyway.
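For future readers hitting the same symptom: as a rough illustrative sketch (this is not Classy Vision API; the function name and heuristic are my own), a quick check like the following can catch a flattened layout before training, since torchvision's UCF-101 reader expects one subdirectory per action class rather than a flat folder of video files:

```python
import os

def looks_like_ucf101_layout(root):
    """Heuristic sanity check (illustrative only): the expected layout is
        root/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi
    A "flattened" folder with .avi files directly under root yields zero
    clips and can surface as the empty torch.cat RuntimeError above."""
    entries = [os.path.join(root, name) for name in os.listdir(root)]
    has_class_dirs = any(os.path.isdir(p) for p in entries)
    has_loose_videos = any(p.endswith(".avi") for p in entries)
    return has_class_dirs and not has_loose_videos
```

Running this on the dataset root before building the metadata would have flagged the problem early.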

@mannatsingh
Contributor

Got it, I had figured that the dataset would throw an exception during initialization if there was a data error. Do you mind mentioning what the exact issue was and how you fixed it, for future users? :)

@failable

failable commented Dec 7, 2020

I got the same issue.

The test snippet does not work for me @mannatsingh

Traceback (most recent call last):
  File "video_classification.py", line 120, in <module>
    for _ in iterator:
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 352, in __iter__
    return self._get_iterator()
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 294, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 827, in __init__
    self._reset(loader, first_iter=True)
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 857, in _reset
    self._try_put_index()
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1091, in _try_put_index
    index = self._next_index()
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 427, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 227, in __iter__
    for idx in self.sampler:
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torchvision/datasets/samplers/clip_sampler.py", line 87, in __iter__
    orig_indices = list(iter(self.dataset))
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/classy_vision/dataset/classy_video_dataset.py", line 47, in __iter__
    for clip in self.clip_sampler:
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torchvision/datasets/samplers/clip_sampler.py", line 167, in __iter__
    idxs = torch.cat(idxs)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /pytorch/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
CUDA: registered at /pytorch/build/aten/src/ATen/CUDAType.cpp:2983 [kernel]
QuantizedCPU: registered at /pytorch/build/aten/src/ATen/QuantizedCPUType.cpp:297 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCUDA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradXLA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse1: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse2: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse3: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
Tracer: registered at /pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:9654 [kernel]
Autocast: registered at /pytorch/aten/src/ATen/autocast_mode.cpp:258 [kernel]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

I got a 24 MB ucf101_metadata.pt, so I assume my dataset format is correct?

>>> import torch
>>> a = torch.load('ucf101_metadata.pt')
>>> a.keys()
dict_keys(['video_paths', 'video_pts', 'video_fps'])
>>> len(a['video_paths'])
13320
>>> len(a['video_pts'])
13320
>>> len(a['video_fps'])
13320
>>> 

BTW, I came from this issue, and have

for phase in ["train", "test"]:
    task.set_dataset(datasets[phase], phase)
    task.set_dataloader_mp_context('fork')

in the video classification tutorial, following the suggestion in the mentioned issue. Setting the option to fork, spawn, or forkserver, or setting num_workers to 0, all caused the same issue.

@Yevgnen

Yevgnen commented Dec 7, 2020

I encountered the same issue. It is probably related to torchvision upstream and is fixed in this commit. If one sets the data directory with a trailing slash, like

# set it to the folder where video files are saved
video_dir = "/path/to/ucf101/"

then the indices will become [] before this commit and cause RuntimeError: There were no tensor arguments to this function. It's a bit unfriendly that torchvision itself does not print any warning or raise an error.
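As a workaround on older torchvision versions, normalizing the directory path before passing it in avoids the trailing-slash mismatch (a minimal sketch; the path here is a placeholder, not the actual dataset location):

```python
import os

# Placeholder path; the trailing slash is what made older torchvision
# versions produce an empty clip index list.
video_dir = "/path/to/ucf101/"

# os.path.normpath drops the trailing slash, so the dataset's video
# paths match the expected prefix again.
video_dir = os.path.normpath(video_dir)
print(video_dir)  # -> /path/to/ucf101
```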

Note that any unexpected dataset format may also cause the issue. Updating torchvision fixed my issue.

@mannatsingh
Copy link
Contributor

Thanks so much @Yevgnen for the suggestion!

@liebkne I've verified that your metadata file looks correct - can you try @Yevgnen 's suggestion and see if that works for you?
