ValueError: bad value(s) in fds_to_keep #597

Closed
siyangbing opened this issue Aug 11, 2020 · 23 comments

@siyangbing

siyangbing commented Aug 11, 2020

I am using the UCF-101 example but get this error:

Traceback (most recent call last):
  File "/home/sucom/hdd_1T/project/video_rec/my_video_rec/self_video_train.py", line 160, in <module>
    trainer.train(task)
  File "/home/sucom/.conda/envs/classy_vision/lib/python3.6/site-packages/classy_vision/trainer/local_trainer.py", line 27, in train
    super().train(task)
  File "/home/sucom/.conda/envs/classy_vision/lib/python3.6/site-packages/classy_vision/trainer/classy_trainer.py", line 45, in train
    task.on_phase_start()
  File "/home/sucom/.conda/envs/classy_vision/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 945, in on_phase_start
    self.advance_phase()
  File "/home/sucom/.conda/envs/classy_vision/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 847, in advance_phase
    self.create_data_iterator()
  File "/home/sucom/.conda/envs/classy_vision/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 900, in create_data_iterator
    self.data_iterator = iter(self.dataloaders[self.phase_type])
  File "/home/sucom/.conda/envs/classy_vision/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/sucom/.conda/envs/classy_vision/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 719, in __init__
    w.start()
  File "/home/sucom/.conda/envs/classy_vision/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/sucom/.conda/envs/classy_vision/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/sucom/.conda/envs/classy_vision/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/sucom/.conda/envs/classy_vision/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/sucom/.conda/envs/classy_vision/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 59, in _launch
    cmd, self._fds)
  File "/home/sucom/.conda/envs/classy_vision/lib/python3.6/multiprocessing/util.py", line 417, in spawnv_passfds
    False, False, None)
ValueError: bad value(s) in fds_to_keep
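
For context, the traceback ends in the low-level call that launches the worker process. The sketch below restates in plain Python the validation that call appears to apply to the file descriptors it is asked to keep open: non-negative integers in strictly increasing order, so a negative or duplicated descriptor is rejected. The rule is an assumption based on reading CPython's source, not a documented API, and the name `fds_look_valid` is hypothetical.

```python
def fds_look_valid(fds):
    """Return True if fds is a strictly increasing run of non-negative ints.

    Hypothetical helper: it mirrors the rule assumed above, not a public API.
    """
    previous = -1
    for fd in fds:
        # Reject non-integers, negatives, duplicates, and unsorted entries.
        if not isinstance(fd, int) or fd < 0 or fd <= previous:
            return False
        previous = fd
    return True


if __name__ == "__main__":
    print(fds_look_valid((3, 4, 7)))  # sorted and unique
    print(fds_look_valid((3, 3, 7)))  # duplicated descriptor
    print(fds_look_valid((-1, 4)))    # negative descriptor
```

If this reading is right, the error points at the set of descriptors handed to the child process rather than at the video data itself.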
@siyangbing
Author

siyangbing commented Aug 11, 2020

help me, please @mannatsingh

@siyangbing
Author

siyangbing commented Aug 11, 2020

Please help me! Classy Vision is wonderful and I cannot wait to try it! @vreis

@mannatsingh
Contributor

Hi @siyangbing, it looks like an issue with multiprocessing inside the dataloader. Can you share a minimal repro with us?
Also, some additional information will help -

  • What is the code you're getting this error with?
  • Are you using distributed training?
  • Are you using any custom components (like a different model)?
  • If this is simply following the tutorial, can you share what steps you ran to get this error?
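
One way to gather the kind of background asked for above is a short environment report. This is only a sketch: the function name `environment_report` and the choice of fields are assumptions, and `torch` is reported only when it can be imported.

```python
import multiprocessing
import platform
import sys


def environment_report():
    """Collect a few facts that help others reproduce a dataloader problem."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # Start methods this interpreter supports, e.g. fork or spawn.
        "start_methods": multiprocessing.get_all_start_methods(),
    }
    try:
        import torch  # optional: only reported when installed
        report["torch"] = torch.__version__
    except ImportError:
        report["torch"] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting this output next to the full script and traceback usually answers most of the questions in one go.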

@siyangbing
Author

siyangbing commented Aug 12, 2020

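@siyangbing It looks like this is an issue with multiprocessing inside the dataloader. Can you share a minimal repro with us?
Also, some additional information will help -

  • What is the code you're getting this error with?
  • Are you using distributed training?
  • Are you using any custom components (like a different model)?
  • If this is simply following the tutorial, can you share what steps you ran to get this error?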

My system is Ubuntu 18.04 with a 1080 Ti graphics card. I followed the video classification example at every step, and Classy Vision is installed with conda. The dataset is the downloaded UCF-101: I downloaded only the videos, and splits_dir contains the trainlist and testlist files. metadata_file points to a file that does not exist yet. The error occurred when I ran the example. Can you help me solve it? I have looked at many video classification projects and this one is the most mature, but I am stuck. Thank you for your help!

@siyangbing
Author

siyangbing commented Aug 12, 2020

Hi @siyangbing looks like it's an issue with multiprocessing inside the dataloader. Can you share a minimal repro with us?
Also, some additional information will help -

  • What is the code you're getting this error with?
  • Are you using distributed training?
  • Are you using any custom components (like a different model)?
  • If this is simply following the tutorial, can you share what steps you ran to get this error?

@siyangbing
Author

Please help me! Classy Vision is wonderful and I cannot wait to try it! @vreis

This is making me upset and crazy. I have tried following the guide many times, but it always fails like this, and I cannot find anything useful on Google. Please help me.

@siyangbing
Author

siyangbing commented Aug 12, 2020

help me, please @mannatsingh

from classy_vision.dataset import build_dataset
from classy_vision.models import build_model
from classy_vision.heads import build_head
from collections import defaultdict
from classy_vision.meters import build_meters, AccuracyMeter, VideoAccuracyMeter
from classy_vision.tasks import ClassificationTask
from classy_vision.optim import build_optimizer
from classy_vision.losses import build_loss
import time
import os

from classy_vision.trainer import LocalTrainer
from classy_vision.hooks import CheckpointHook
from classy_vision.hooks import LossLrMeterLoggingHook



video_dir = "/home/sucom/hdd_1T/project/video_rec/UCF-101"
splits_dir = "/home/sucom/hdd_1T/project/video_rec/ucfTrainTestlist"
metadata_file = "./ucf101_metadata.pt"


datasets = {}
datasets["train"] = build_dataset({
    "name": "ucf101",
    "split": "train",
    "batchsize_per_replica": 8,
    "use_shuffle": True,
    "num_samples": 64,
    "clips_per_video": 1,
    "frames_per_clip": 8,
    "video_dir": video_dir,
    "splits_dir": splits_dir,
    "metadata_file": metadata_file,
    "fold": 1,
    "transforms": {
        "video": [
            {
                "name": "video_default_augment",
                "crop_size": 112,
                "size_range": [128, 160]
            }
        ]
    }
})
datasets["test"] = build_dataset({
    "name": "ucf101",
    "split": "test",
    "batchsize_per_replica": 10,
    "use_shuffle": False,
    "num_samples": 80,
    "clips_per_video": 10,
    "frames_per_clip": 8,
    "video_dir": video_dir,
    "splits_dir": splits_dir,
    "metadata_file": metadata_file,
    "fold": 1,
    "transforms": {
        "video": [
            {
                "name": "video_default_no_augment",
                "size": 128
            }
        ]
    }
})


model = build_model({
    "name": "resnext3d",
    "frames_per_clip": 8,
    "input_planes": 3,
    "clip_crop_size": 112,
    "skip_transformation_type": "postactivated_shortcut",
    "residual_transformation_type": "basic_transformation",
    "num_blocks": [2, 2, 2, 2],
    "input_key": "video",
    "stage_planes": 64,
    "num_classes": 101
})


unique_id = "default_head"
head = build_head({
    "name": "fully_convolutional_linear",
    "unique_id": unique_id,
    "pool_size": [1, 7, 7],
    "num_classes": 101,
    "in_plane": 512
})

fork_block = "pathway0-stage4-block1"
heads = defaultdict(list)
heads[fork_block].append(head)
model.set_heads(heads)



meters = build_meters({
    "accuracy": {
        "topk": [1, 5]
    },
    "video_accuracy": {
        "topk": [1, 5],
        "clips_per_video_train": 1,
        "clips_per_video_test": 10
    }
})



loss = build_loss({"name": "CrossEntropyLoss"})

optimizer = build_optimizer({
    "name": "sgd",
    "param_schedulers": {
        "lr": {
            "name": "multistep",
            "values": [0.005, 0.0005],
            "milestones": [1]
        }
    },
    "num_epochs": 2,
    "weight_decay": 0.0001,
    "momentum": 0.9
})


num_epochs = 2
task = (
    ClassificationTask()
    .set_num_epochs(num_epochs)
    .set_loss(loss)
    .set_model(model)
    .set_optimizer(optimizer)
    .set_meters(meters)
)


for phase in ["train", "test"]:
    task.set_dataset(datasets[phase], phase)

hooks = [LossLrMeterLoggingHook(log_freq=4)]

checkpoint_dir = f"/tmp/classy_checkpoint_{time.time()}"
os.mkdir(checkpoint_dir)
hooks.append(CheckpointHook(checkpoint_dir, input_args={}))

task = task.set_hooks(hooks)

trainer = LocalTrainer()
trainer.train(task)

@mannatsingh
Contributor

@siyangbing I am not sure what is causing the issue, but let's try to figure it out.

Let's first make sure that your data loader works as expected -

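for phase in ["train", "test"]:
    iterator = datasets[phase].iterator()
    count = 0
    # Pull at most 10 batches from this phase's dataloader.
    for _ in iterator:
        count += 1
        # The check must sit inside the inner loop to stop iterating early.
        if count >= 10:
            break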

If it doesn't work, can you try passing num_workers=0 to the iterator() above?

@siyangbing
Author

siyangbing commented Aug 13, 2020

@siyangbing I am not sure what is causing the issue, but let's try to figure it out.

Let's first make sure that your data loader works as expected -

for phase in ["train", "test"]:
    iterator = datasets[phase].iterator()
    count = 0
    # Pull at most 10 batches from this phase's dataloader.
    for _ in iterator:
        count += 1
        # The check must sit inside the inner loop to stop iterating early.
        if count >= 10:
            break

If it doesn't work, can you try passing num_workers=0 to the iterator() above?

I am not sure whether the data loader works as expected. Can you help me, @mannatsingh? Thank you.

@mannatsingh
Contributor

mannatsingh commented Aug 13, 2020

@siyangbing Your message is scrambled and not clear to me. It looks like the dataloader is working, but can you print just what you get after running this -

for phase in ["train", "test"]:
    iterator = datasets[phase].iterator()
    count = 0
    for _ in iterator:
        count += 1
        if count >= 10:
            break
    print(phase)
    print(count)

Also, can you format your output using https://guides.github.com/features/mastering-markdown/ so that it is easier to understand?
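
The counting loop above can also be written as a small reusable helper. This is a minimal sketch using `itertools.islice`; the name `count_batches` is hypothetical, and the `range` objects stand in for `datasets[phase].iterator()`.

```python
from itertools import islice


def count_batches(batches, limit=10):
    """Count how many items an iterable yields, stopping after `limit`."""
    return sum(1 for _ in islice(batches, limit))


if __name__ == "__main__":
    # Stand-in iterables; in the thread this would be datasets[phase].iterator().
    print(count_batches(range(100)))
    print(count_batches(range(8)))
```

A count below the limit is not necessarily a failure. If num_samples caps the train split at 64 clips and batchsize_per_replica is 8, as in the config posted earlier, the loader would be expected to yield 64 / 8 = 8 batches.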

@siyangbing
Author

siyangbing commented Aug 13, 2020

@siyangbing Your message is scrambled and not clear to me. It looks like the dataloader is working, but can you print just what you get after running this -

for phase in ["train", "test"]:
    iterator = datasets[phase].iterator()
    count = 0
    for _ in iterator:
        count += 1
        if count >= 10:
            break
    print(phase)
    print(count)

Also, can you format your output using https://guides.github.com/features/mastering-markdown/ so that it is easier to understand?

OK, I have formatted my code and the error, and printed the count. Please help me, @mannatsingh. Thank you, I cannot wait to solve this problem!
I tried your code and the result is:

100%|██████████| 833/833 [00:47<00:00, 17.64it/s]
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torchvision/io/video.py:106: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  + "follow-up version. Please use pts_unit 'sec'."
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torchvision/io/video.py:106: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  + "follow-up version. Please use pts_unit 'sec'."
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torchvision/io/video.py:106: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  + "follow-up version. Please use pts_unit 'sec'."
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torchvision/io/video.py:106: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  + "follow-up version. Please use pts_unit 'sec'."
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
train
8
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torchvision/io/video.py:106: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  + "follow-up version. Please use pts_unit 'sec'."
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torchvision/io/video.py:106: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  + "follow-up version. Please use pts_unit 'sec'."
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torchvision/io/video.py:106: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  + "follow-up version. Please use pts_unit 'sec'."
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torchvision/io/video.py:106: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  + "follow-up version. Please use pts_unit 'sec'."
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
test
8
Traceback (most recent call last):
  File "/home/sucom/hdd_1T/project/video_rec/my_video_rec/self_video_train.py", line 161, in <module>
    trainer.train(task)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/trainer/local_trainer.py", line 27, in train
    super().train(task)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/trainer/classy_trainer.py", line 45, in train
    task.on_phase_start()
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 945, in on_phase_start
    self.advance_phase()
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 847, in advance_phase
    self.create_data_iterator()
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 900, in create_data_iterator
    self.data_iterator = iter(self.dataloaders[self.phase_type])
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 291, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 737, in __init__
    w.start()
  File "/home/sucom/.conda/envs/v4/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 59, in _launch
    cmd, self._fds)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/multiprocessing/util.py", line 417, in spawnv_passfds
    False, False, None)
ValueError: bad value(s) in fds_to_keep

Thank you for your help, @mannatsingh.

@mannatsingh
Contributor

Got it. So it looks like the dataloaders work independently but there are some issues while starting training.

Can you try another couple of things -

After the following lines -

task = (
    ClassificationTask()
    .set_num_epochs(num_epochs)
    .set_loss(loss)
    .set_model(model)
    .set_optimizer(optimizer)
    .set_meters(meters)
)

Can you try adding the following line, setting mp_context to "fork" and then to "spawn", and tell us what the output is in each case -

task.set_dataloader_mp_context(mp_context)
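
As background on the two values: in Python's multiprocessing module, "fork" creates the worker as a copy of the parent process, while "spawn" starts a fresh interpreter and hands it only the resources it needs, including a list of file descriptors. The traceback passes through popen_spawn_posix.py, which suggests the spawn path was in use. The sketch below only queries the standard library and starts no processes; the helper name `available_start_method` is hypothetical.

```python
import multiprocessing


def available_start_method(preferred):
    """Return `preferred` if this platform supports it, else the default."""
    if preferred in multiprocessing.get_all_start_methods():
        return preferred
    return multiprocessing.get_start_method()


if __name__ == "__main__":
    for name in ("fork", "spawn"):
        method = available_start_method(name)
        # get_context returns a context object bound to one start method.
        context = multiprocessing.get_context(method)
        print(name, "->", context.get_start_method())
```

The fallback matters mainly on platforms without "fork", where asking for it directly would raise an error.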

And independent of the above step, can you set num_workers to 0 in the dataloader configs and send us the output -

datasets["train"] = build_dataset({
    "name": "ucf101",
    "split": "train",
    "num_workers": 0,
    "batchsize_per_replica": 8,
    "use_shuffle": True,
    "num_samples": 64,
    "clips_per_video": 1,
    "frames_per_clip": 8,
    "video_dir": video_dir,
    "splits_dir": splits_dir,
    "metadata_file": metadata_file,
    "fold": 1,
    "transforms": {
        "video": [
            {
                "name": "video_default_augment",
                "crop_size": 112,
                "size_range": [128, 160]
            }
        ]
    }
})
datasets["test"] = build_dataset({
    "name": "ucf101",
    "split": "test",
    "num_workers": 0,
    "batchsize_per_replica": 10,
    "use_shuffle": False,
    "num_samples": 80,
    "clips_per_video": 10,
    "frames_per_clip": 8,
    "video_dir": video_dir,
    "splits_dir": splits_dir,
    "metadata_file": metadata_file,
    "fold": 1,
    "transforms": {
        "video": [
            {
                "name": "video_default_no_augment",
                "size": 128
            }
        ]
    }
})
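
Rather than retyping both configs, the single key can be overridden on a copy. This is a minimal sketch with a hypothetical helper name, `with_num_workers`; it assumes the dataset configs are plain dictionaries, as in the snippets above, and the trimmed config in the demo is a stand-in.

```python
import copy


def with_num_workers(config, num_workers=0):
    """Return a copy of a dataset config with num_workers overridden."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    updated["num_workers"] = num_workers
    return updated


if __name__ == "__main__":
    # Trimmed stand-in for the train config shown above.
    train_config = {"name": "ucf101", "split": "train", "batchsize_per_replica": 8}
    print(with_num_workers(train_config))
```

The returned dictionary can then be passed to build_dataset in place of the original config.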

@mannatsingh
Contributor

Also, were you able to get the Getting Started tutorial to work @siyangbing ?

@siyangbing
Author

siyangbing commented Aug 14, 2020

Also, were you able to get the Getting Started tutorial to work @siyangbing ?

I executed this:
./classy_train.py --config configs/template_config.json
It ran in seconds and printed the following, @mannatsingh:

/home/sucom/.conda/envs/v4/bin/python /home/sucom/hdd_1T/project/video_rec/my_video_rec/test.py
INFO:root:Classy Vision's default training script.
INFO:root:AMP disabled
INFO:root:mixup disabled
INFO:root:Synchronized Batch Normalization is disabled
INFO:root:Logging outputs to ./output_2020-08-14T08:34:46.676177
INFO:root:Logging checkpoints to ./output_2020-08-14T08:34:46.676177/checkpoints
WARNING:root:tensorboardX not installed, skipping tensorboard hooks
INFO:root:Starting training on rank 0 worker. World size is 1
INFO:root:Using GPU, CUDA device index: 0
INFO:root:Starting training. Task: <classy_vision.tasks.classification_task.ClassificationTask object at 0x7f13cb616828> initialized with config:
{
    "name": "classification_task",
    "num_epochs": 2,
    "loss": {
        "name": "my_loss"
    },
    "dataset": {
        "train": {
            "name": "my_dataset",
            "crop_size": 224,
            "class_ratio": 0.5,
            "num_samples": 320,
            "seed": 0,
            "batchsize_per_replica": 32,
            "use_shuffle": true,
            "transforms": [
                {
                    "name": "generic_image_transform",
                    "transforms": [
                        {
                            "name": "RandomResizedCrop",
                            "size": 224
                        },
                        {
                            "name": "RandomHorizontalFlip"
                        },
                        {
                            "name": "ToTensor"
                        },
                        {
                            "name": "Normalize",
                            "mean": [
                                0.485,
                                0.456,
                                0.406
                            ],
                            "std": [
                                0.229,
                                0.224,
                                0.225
                            ]
                        }
                    ]
                }
            ]
        },
        "test": {
            "name": "my_dataset",
            "crop_size": 224,
            "class_ratio": 0.5,
            "num_samples": 100,
            "seed": 1,
            "batchsize_per_replica": 32,
            "use_shuffle": false,
            "transforms": [
                {
                    "name": "generic_image_transform",
                    "transforms": [
                        {
                            "name": "Resize",
                            "size": 256
                        },
                        {
                            "name": "CenterCrop",
                            "size": 224
                        },
                        {
                            "name": "ToTensor"
                        },
                        {
                            "name": "Normalize",
                            "mean": [
                                0.485,
                                0.456,
                                0.406
                            ],
                            "std": [
                                0.229,
                                0.224,
                                0.225
                            ]
                        }
                    ]
                }
            ]
        }
    },
    "meters": {
        "accuracy": {
            "topk": [
                1
            ]
        }
    },
    "model": {
        "name": "my_model"
    },
    "optimizer": {
        "name": "sgd",
        "param_schedulers": {
            "lr": {
                "name": "step",
                "values": [
                    0.1,
                    0.01
                ]
            }
        },
        "weight_decay": 0.0001,
        "momentum": 0.9,
        "num_epochs": 2,
        "lr": 0.1,
        "nesterov": false,
        "use_larc": false,
        "larc_config": {
            "clip": true,
            "eps": 1e-08,
            "trust_coefficient": 0.02
        }
    }
}
INFO:root:Number of parameters in model: 2402
WARNING:root:Model contains unsupported modules, could not compute FLOPs for model forward pass.
INFO:root:Model does not implement input_shape. Skipping activation calculation.
INFO:root:Approximate meters: [0] train phase 0 (50.00% done), loss: 0.1368, meters: [accuracy_meter(top_1=0.918750)]
INFO:root:Approximate meters: [0] train phase 0 (100.00% done), loss: 0.0684, meters: [accuracy_meter(top_1=0.959375)]
INFO:root:Synced meters: [0] train phase 0 (100.00% done), loss: 0.0684, meters: [accuracy_meter(top_1=0.959375)]
INFO:root:Saving checkpoint to './output_2020-08-14T08:34:46.676177/checkpoints'...
INFO:root:Synced meters: [0] test phase 0 (100.00% done), loss: 0.0000, meters: [accuracy_meter(top_1=1.000000)]
INFO:root:Approximate meters: [0] train phase 1 (50.00% done), loss: 0.0000, meters: [accuracy_meter(top_1=1.000000)]
INFO:root:Approximate meters: [0] train phase 1 (100.00% done), loss: 0.0000, meters: [accuracy_meter(top_1=1.000000)]
INFO:root:Synced meters: [0] train phase 1 (100.00% done), loss: 0.0000, meters: [accuracy_meter(top_1=1.000000)]
INFO:root:Saving checkpoint to './output_2020-08-14T08:34:46.676177/checkpoints'...
INFO:root:Synced meters: [0] test phase 1 (100.00% done), loss: 0.0000, meters: [accuracy_meter(top_1=1.000000)]
INFO:root:Training successful!
INFO:root:Results of this training run are available at: "/home/sucom/hdd_1T/project/video_rec/my_video_rec/output_2020-08-14T08:34:46.676177/checkpoints"

@siyangbing
Author

siyangbing commented Aug 14, 2020

Got it. So it looks like the dataloaders work independently but there are some issues while starting training.
Can you try another couple of things -
After the following lines -

task = (
    ClassificationTask()
    .set_num_epochs(num_epochs)
    .set_loss(loss)
    .set_model(model)
    .set_optimizer(optimizer)
    .set_meters(meters)
)

Can you try adding the following lines and set mp_context to "fork" and then "spawn" and tell us what the output was in each situation -

task.set_dataloader_mp_context(mp_context)

And independent of the above step, can you set num_workers to 0 in the dataloader configs and send us the output -

datasets["train"] = build_dataset({
    "name": "ucf101",
    "split": "train",
    "num_workers": 0,
    "batchsize_per_replica": 8,
    "use_shuffle": True,
    "num_samples": 64,
    "clips_per_video": 1,
    "frames_per_clip": 8,
    "video_dir": video_dir,
    "splits_dir": splits_dir,
    "metadata_file": metadata_file,
    "fold": 1,
    "transforms": {
        "video": [
            {
                "name": "video_default_augment",
                "crop_size": 112,
                "size_range": [128, 160]
            }
        ]
    }
})
datasets["test"] = build_dataset({
    "name": "ucf101",
    "split": "test",
    "num_workers": 0,
    "batchsize_per_replica": 10,
    "use_shuffle": False,
    "num_samples": 80,
    "clips_per_video": 10,
    "frames_per_clip": 8,
    "video_dir": video_dir,
    "splits_dir": splits_dir,
    "metadata_file": metadata_file,
    "fold": 1,
    "transforms": {
        "video": [
            {
                "name": "video_default_no_augment",
                "size": 128
            }
        ]
    }
})

I tried this:
task.set_dataloader_mp_context("spawn")
and got this:

100%|██████████| 833/833 [00:47<00:00, 17.62it/s]
Traceback (most recent call last):
  File "/home/sucom/hdd_1T/project/video_rec/my_video_rec/self_video_train.py", line 157, in <module>
    trainer.train(task)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/trainer/local_trainer.py", line 27, in train
    super().train(task)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/trainer/classy_trainer.py", line 45, in train
    task.on_phase_start()
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 945, in on_phase_start
    self.advance_phase()
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 847, in advance_phase
    self.create_data_iterator()
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 900, in create_data_iterator
    self.data_iterator = iter(self.dataloaders[self.phase_type])
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 291, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 737, in __init__
    w.start()
  File "/home/sucom/.conda/envs/v4/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 59, in _launch
    cmd, self._fds)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/multiprocessing/util.py", line 417, in spawnv_passfds
    False, False, None)
ValueError: bad value(s) in fds_to_keep

I tried this:
task.set_dataloader_mp_context("fork")
and got this:

100%|██████████| 833/833 [00:47<00:00, 17.60it/s]
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torchvision/io/video.py:106: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  + "follow-up version. Please use pts_unit 'sec'."
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torchvision/io/video.py:106: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  + "follow-up version. Please use pts_unit 'sec'."
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torchvision/io/video.py:106: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  + "follow-up version. Please use pts_unit 'sec'."
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torchvision/io/video.py:106: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  + "follow-up version. Please use pts_unit 'sec'."
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))

Process finished with exit code 0

Got it. So it looks like the dataloaders work independently but there are some issues while starting training.
Can you try another couple of things -
After the following lines -

task = (
    ClassificationTask()
    .set_num_epochs(num_epochs)
    .set_loss(loss)
    .set_model(model)
    .set_optimizer(optimizer)
    .set_meters(meters)
)

Can you try adding the following lines and set mp_context to "fork" and then "spawn" and tell us what the output was in each situation -

task.set_dataloader_mp_context(mp_context)
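
For context, the traceback in this thread shows the task passing this string to `mp.get_context(...)`. Below is a minimal standard-library sketch of what the two values resolve to. The helper name `pick_context` is hypothetical and is not part of Classy Vision.

```python
import multiprocessing as mp


def pick_context(method):
    """Return the multiprocessing context for a start method such as "fork" or "spawn".

    Hypothetical helper for illustration; it only wraps mp.get_context().
    """
    available = mp.get_all_start_methods()
    if method not in available:
        # "fork" is not offered on every platform (for example, Windows).
        raise ValueError(f"start method {method!r} not available; choose from {available}")
    return mp.get_context(method)


if __name__ == "__main__":
    for method in ("fork", "spawn"):
        if method in mp.get_all_start_methods():
            ctx = pick_context(method)
            print(method, "->", type(ctx).__name__)
```

With "fork" the worker processes start as copies of the parent process, while "spawn" starts a fresh interpreter that has to re-import the main module, so the two can behave differently for the same dataloader.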

And independent of the above step, can you set num_workers to 0 in the dataloader configs and send us the output -

datasets["train"] = build_dataset({
    "name": "ucf101",
    "split": "train",
    "num_workers": 0,
    "batchsize_per_replica": 8,
    "use_shuffle": True,
    "num_samples": 64,
    "clips_per_video": 1,
    "frames_per_clip": 8,
    "video_dir": video_dir,
    "splits_dir": splits_dir,
    "metadata_file": metadata_file,
    "fold": 1,
    "transforms": {
        "video": [
            {
                "name": "video_default_augment",
                "crop_size": 112,
                "size_range": [128, 160]
            }
        ]
    }
})
datasets["test"] = build_dataset({
    "name": "ucf101",
    "split": "test",
    "num_workers": 0,
    "batchsize_per_replica": 10,
    "use_shuffle": False,
    "num_samples": 80,
    "clips_per_video": 10,
    "frames_per_clip": 8,
    "video_dir": video_dir,
    "splits_dir": splits_dir,
    "metadata_file": metadata_file,
    "fold": 1,
    "transforms": {
        "video": [
            {
                "name": "video_default_no_augment",
                "size": 128
            }
        ]
    }
})

I tried "num_workers": 0 and got:

100%|██████████| 833/833 [00:46<00:00, 17.85it/s]
Traceback (most recent call last):
  File "/home/sucom/hdd_1T/project/video_rec/my_video_rec/self_video_train.py", line 157, in <module>
    trainer.train(task)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/trainer/local_trainer.py", line 27, in train
    super().train(task)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/trainer/classy_trainer.py", line 36, in train
    task.prepare()
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 555, in prepare
    multiprocessing_context=mp.get_context(self.dataloader_mp_context),
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 543, in build_dataloaders
    for phase_type in self.datasets.keys()
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 543, in <dictcomp>
    for phase_type in self.datasets.keys()
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 528, in build_dataloader
    return self.datasets[phase_type].iterator(pin_memory=pin_memory, **kwargs)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/dataset/classy_video_dataset.py", line 277, in iterator
    return super(ClassyVideoDataset, self).iterator(*args, **kwargs)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/dataset/classy_dataset.py", line 178, in iterator
    sampler=self._get_sampler(epoch=offset_epoch),
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 150, in __init__
    self.multiprocessing_context = multiprocessing_context
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 285, in __setattr__
    super(DataLoader, self).__setattr__(attr, val)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 276, in multiprocessing_context
    'num_workers={}').format(self.num_workers))
ValueError: multiprocessing_context can only be used with multi-process loading (num_workers > 0), but got num_workers=0

@mannatsingh
Contributor

Looks like it worked with task.set_dataloader_mp_context("fork")? I don't see any errors - can you confirm that everything works properly?

The "num_workers": 0 change should be done without calling task.set_dataloader_mp_context()

@siyangbing
Author

Looks like it worked with task.set_dataloader_mp_context("fork")? I don't see any errors - can you confirm that everything works properly?

The "num_workers": 0 change should be done without calling task.set_dataloader_mp_context()

I am sure I only set "num_workers": 0 and did not call task.set_dataloader_mp_context().
The code looks like this:

from classy_vision.dataset import build_dataset
from classy_vision.models import build_model
from classy_vision.heads import build_head
from collections import defaultdict
from classy_vision.meters import build_meters, AccuracyMeter, VideoAccuracyMeter
from classy_vision.tasks import ClassificationTask
from classy_vision.optim import build_optimizer
from classy_vision.losses import build_loss
import time
import os

from classy_vision.trainer import LocalTrainer
from classy_vision.hooks import CheckpointHook
from classy_vision.hooks import LossLrMeterLoggingHook



video_dir = "/home/sucom/hdd_1T/project/video_rec/UCF-101"
splits_dir = "/home/sucom/hdd_1T/project/video_rec/ucfTrainTestlist"
metadata_file = "./ucf101_metadata.pt"


datasets = {}
datasets["train"] = build_dataset({
    "name": "ucf101",
    "split": "train",
    "num_workers": 0,
    "batchsize_per_replica": 8,
    "use_shuffle": True,
    "num_samples": 64,
    "clips_per_video": 1,
    "frames_per_clip": 8,
    "video_dir": video_dir,
    "splits_dir": splits_dir,
    "metadata_file": metadata_file,
    "fold": 1,
    "transforms": {
        "video": [
            {
                "name": "video_default_augment",
                "crop_size": 112,
                "size_range": [128, 160]
            }
        ]
    }
})
datasets["test"] = build_dataset({
    "name": "ucf101",
    "split": "test",
    "num_workers": 0,
    "batchsize_per_replica": 10,
    "use_shuffle": False,
    "num_samples": 80,
    "clips_per_video": 10,
    "frames_per_clip": 8,
    "video_dir": video_dir,
    "splits_dir": splits_dir,
    "metadata_file": metadata_file,
    "fold": 1,
    "transforms": {
        "video": [
            {
                "name": "video_default_no_augment",
                "size": 128
            }
        ]
    }
})


model = build_model({
    "name": "resnext3d",
    "frames_per_clip": 8,
    "input_planes": 3,
    "clip_crop_size": 112,
    "skip_transformation_type": "postactivated_shortcut",
    "residual_transformation_type": "basic_transformation",
    "num_blocks": [2, 2, 2, 2],
    "input_key": "video",
    "stage_planes": 64,
    "num_classes": 101
})


unique_id = "default_head"
head = build_head({
    "name": "fully_convolutional_linear",
    "unique_id": unique_id,
    "pool_size": [1, 7, 7],
    "num_classes": 101,
    "in_plane": 512
})

fork_block = "pathway0-stage4-block1"
heads = defaultdict(list)
heads[fork_block].append(head)
model.set_heads(heads)



meters = build_meters({
    "accuracy": {
        "topk": [1, 5]
    },
    "video_accuracy": {
        "topk": [1, 5],
        "clips_per_video_train": 1,
        "clips_per_video_test": 10
    }
})



loss = build_loss({"name": "CrossEntropyLoss"})

optimizer = build_optimizer({
    "name": "sgd",
    "param_schedulers": {
        "lr": {
            "name": "multistep",
            "values": [0.005, 0.0005],
            "milestones": [1]
        }
    },
    "num_epochs": 2,
    "weight_decay": 0.0001,
    "momentum": 0.9
})


num_epochs = 2
task = (
    ClassificationTask()
    .set_num_epochs(num_epochs)
    .set_loss(loss)
    .set_model(model)
    .set_optimizer(optimizer)
    .set_meters(meters)
)

# task.set_dataloader_mp_context("spawn")


for phase in ["train", "test"]:

    task.set_dataset(datasets[phase], phase)

hooks = [LossLrMeterLoggingHook(log_freq=4)]

checkpoint_dir = f"/tmp/classy_checkpoint_{time.time()}"
os.mkdir(checkpoint_dir)
hooks.append(CheckpointHook(checkpoint_dir, input_args={}))

task = task.set_hooks(hooks)

trainer = LocalTrainer()
trainer.train(task)

and the result looks like this:

100%|██████████| 833/833 [00:46<00:00, 17.84it/s]
Traceback (most recent call last):
  File "/home/sucom/hdd_1T/project/video_rec/my_video_rec/self_video_train.py", line 157, in <module>
    trainer.train(task)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/trainer/local_trainer.py", line 27, in train
    super().train(task)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/trainer/classy_trainer.py", line 36, in train
    task.prepare()
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 555, in prepare
    multiprocessing_context=mp.get_context(self.dataloader_mp_context),
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 543, in build_dataloaders
    for phase_type in self.datasets.keys()
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 543, in <dictcomp>
    for phase_type in self.datasets.keys()
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py", line 528, in build_dataloader
    return self.datasets[phase_type].iterator(pin_memory=pin_memory, **kwargs)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/dataset/classy_video_dataset.py", line 277, in iterator
    return super(ClassyVideoDataset, self).iterator(*args, **kwargs)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/classy_vision/dataset/classy_dataset.py", line 178, in iterator
    sampler=self._get_sampler(epoch=offset_epoch),
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 150, in __init__
    self.multiprocessing_context = multiprocessing_context
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 285, in __setattr__
    super(DataLoader, self).__setattr__(attr, val)
  File "/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 276, in multiprocessing_context
    'num_workers={}').format(self.num_workers))
ValueError: multiprocessing_context can only be used with multi-process loading (num_workers > 0), but got num_workers=0

Process finished with exit code 1

Can you help me, @mannatsingh?

@mannatsingh
Contributor

mannatsingh commented Aug 18, 2020

My bad, I just looked at our code and we don't support disabling the multiprocessing_context, which means you cannot set num_workers=0. That was anyway something I just wanted to try out for debugging. I will try to make it so we do support this scenario.
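
One way such a scenario could be handled is sketched below, as an assumption rather than the actual Classy Vision change: pass `multiprocessing_context` to the DataLoader only when `num_workers > 0`, since the ValueError above says the two cannot be combined otherwise. The helper name `build_loader_kwargs` is hypothetical; only the `num_workers` and `multiprocessing_context` keyword names come from the traceback.

```python
import multiprocessing as mp


def build_loader_kwargs(num_workers, mp_context_name=None):
    """Build DataLoader keyword arguments, skipping the context for in-process loading.

    Hypothetical helper: not Classy Vision's API, just a sketch of the guard.
    """
    kwargs = {"num_workers": num_workers}
    # A multiprocessing context only makes sense when worker processes exist.
    if num_workers > 0 and mp_context_name is not None:
        kwargs["multiprocessing_context"] = mp.get_context(mp_context_name)
    return kwargs


if __name__ == "__main__":
    print(build_loader_kwargs(0, "spawn"))  # no multiprocessing_context key
    print(build_loader_kwargs(4, "spawn"))  # includes a spawn context
```

The resulting dictionary could then be unpacked into the loader call, for example `DataLoader(dataset, **build_loader_kwargs(0, "spawn"))`.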

It seems like using task.set_dataloader_mp_context("fork") doesn't give you the error you had reported, right? Does everything work properly for you after setting this?

@siyangbing
Author

siyangbing commented Aug 24, 2020

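My bad, I just looked at our code and we don't support disabling the multiprocessing_context, which means you cannot set num_workers=0. That was anyway something I just wanted to try out for debugging. I will try to make it so we do support this scenario.

It seems like using task.set_dataloader_mp_context("fork") doesn't give you the error you had reported, right? Does everything work properly for you after setting this?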

I used this:

from classy_vision.dataset import build_dataset
from classy_vision.models import build_model
from classy_vision.heads import build_head
from collections import defaultdict
from classy_vision.meters import build_meters, AccuracyMeter, VideoAccuracyMeter
from classy_vision.tasks import ClassificationTask
from classy_vision.optim import build_optimizer
from classy_vision.losses import build_loss



# set it to the folder where video files are saved
# video_dir = "/home/sucom/hdd_1T/project/video_rec/datasets101/UCF-101"
video_dir = "../UCF-101"
# set it to the folder where dataset splitting files are saved
# splits_dir = "/home/sucom/hdd_1T/project/video_rec/datasets101/ucfTrainTestlist"
splits_dir = "../ucfTrainTestlist"
# set it to the file path for saving the metadata
metadata_file = "./ucf101_metadata.pt"


datasets = {}
datasets["train"] = build_dataset({
    "name": "ucf101",
    "split": "train",
    "batchsize_per_replica": 8,  # For training, we use 8 clips in a minibatch in each model replica
    "use_shuffle": True,         # We shuffle the clips in the training split
    "num_samples": 64,           # We train on 64 clips in one training epoch
    "clips_per_video": 1,        # For training, we randomly sample 1 clip from each video
    "frames_per_clip": 8,        # The video clip contains 8 frames
    "video_dir": video_dir,
    "splits_dir": splits_dir,
    "metadata_file": metadata_file,
    "fold": 1,
    "transforms": {
        "video": [
            {
                "name": "video_default_augment",
                "crop_size": 112,
                "size_range": [128, 160]
            }
        ]
    }
})
datasets["test"] = build_dataset({
    "name": "ucf101",
    "split": "test",
    "batchsize_per_replica": 10,  # For testing, we take one video at a time and sample 10 clips per video
    "use_shuffle": False,         # We do not shuffle clips in the testing split
    "num_samples": 80,            # We test on 80 clips in one testing epoch
    "clips_per_video": 10,        # We sample 10 clips per video
    "frames_per_clip": 8,
    "video_dir": video_dir,
    "splits_dir": splits_dir,
    "metadata_file": metadata_file,
    "fold": 1,
    "transforms": {
        "video": [
            {
                "name": "video_default_no_augment",
                "size": 128
            }
        ]
    }
})


model = build_model({
    "name": "resnext3d",
    "frames_per_clip": 8,        # The number of frames we have in each video clip
    "input_planes": 3,           # We use RGB video frames. So the input planes is 3
    "clip_crop_size": 112,       # We take croppings of size 112 x 112 from the video frames
    "skip_transformation_type": "postactivated_shortcut",    # The type of skip connection in residual unit
    "residual_transformation_type": "basic_transformation",  # The type of residual connection in residual unit
    "num_blocks": [2, 2, 2, 2],  # The number of residual blocks in each of the 4 stages
    "input_key": "video",        # The key used to index into the model input of dict type
    "stage_planes": 64,
    # "num_classes": 2           # the number of classes
    "num_classes": 101           # the number of classes
})


unique_id = "default_head"
head = build_head({
    "name": "fully_convolutional_linear",
    "unique_id": unique_id,
    "pool_size": [1, 7, 7],
    # "num_classes": 2,
    "num_classes": 101,
    "in_plane": 512
})
# In Classy Vision, the head can be attached to any residual block in the trunk.
# Here we attach the head to the last block as in the standard ResNet model
fork_block = "pathway0-stage4-block1"
heads = defaultdict(list)
heads[fork_block].append(head)
model.set_heads(heads)



meters = build_meters({
    "accuracy": {
        "topk": [1, 5],
        # "topk": [1]
    },
    "video_accuracy": {
        "topk": [1, 5],
        # "topk": [1],
        "clips_per_video_train": 1,
        "clips_per_video_test": 10
    }
})



loss = build_loss({"name": "CrossEntropyLoss"})

optimizer = build_optimizer({
    "name": "sgd",
    "param_schedulers": {
        "lr": {
            "name": "multistep",
            "values": [0.005, 0.0005],
            "milestones": [1]
        }
    },
    "num_epochs": 2,
    "weight_decay": 0.0001,
    "momentum": 0.9
})


num_epochs = 2
task = (
    ClassificationTask()
    .set_num_epochs(num_epochs)
    .set_loss(loss)
    .set_model(model)
    .set_optimizer(optimizer)
    .set_meters(meters)
)

# task.set_dataloader_mp_context("fork")
task.set_dataloader_mp_context("fork")

if __name__=="__main__":
    # import multiprocessing as mp
    # mp.set_start_method('spawn')
    for phase in ["train", "test"]:
        task.set_dataset(datasets[phase], phase)



    import time
    import os

    from classy_vision.trainer import LocalTrainer
    from classy_vision.hooks import CheckpointHook
    from classy_vision.hooks import LossLrMeterLoggingHook


    hooks = [LossLrMeterLoggingHook(log_freq=4)]

    checkpoint_dir = f"/tmp/classy_checkpoint_{time.time()}"
    os.mkdir(checkpoint_dir)
    hooks.append(CheckpointHook(checkpoint_dir, input_args={}))

    task = task.set_hooks(hooks)

    trainer = LocalTrainer()
    trainer.train(task)

and got

100%|██████████| 833/833 [00:48<00:00, 17.29it/s]
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torchvision/io/video.py:106: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
  + "follow-up version. Please use pts_unit 'sec'."
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/sucom/.conda/envs/v4/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))

@mannatsingh could you give me any help?

@mannatsingh
Contributor

mannatsingh commented Aug 25, 2020

@siyangbing it looks like your training is in fact working now :)

The only reason you're not seeing any output is that logging isn't set up. Before running the training code, run the following lines and you should see the progress printed:

import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.info("Let's do this")

This should print the following -

INFO:root:Let's do this

After this, running the same code should print something like:

INFO:root:Using CPU
INFO:root:Starting training. Task: <classy_vision.tasks.classification_task.ClassificationTask object at 0x7fdc4080d748>
INFO:root:Approximate meters: [0] train phase 0 (50.00% done), loss: 6.9378, meters: [accuracy_meter(top_1=0.000000,top_5=0.031250), video_accuracy_meter(top_1=0.000000,top_5=0.031250)]
INFO:root:Approximate meters: [0] train phase 0 (100.00% done), loss: 7.2338, meters: [accuracy_meter(top_1=0.000000,top_5=0.015625), video_accuracy_meter(top_1=0.000000,top_5=0.015625)]
INFO:root:Synced meters: [0] train phase 0 (100.00% done), loss: 7.2338, meters: [accuracy_meter(top_1=0.000000,top_5=0.015625), video_accuracy_meter(top_1=0.000000,top_5=0.015625)], processed batches: 8
INFO:root:Saving checkpoint to '/tmp/classy_checkpoint_1598322786.4107509'...
INFO:root:Synced meters: [0] test phase 0 (100.00% done), loss: 4.6212, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 1 (50.00% done), loss: 5.6589, meters: [accuracy_meter(top_1=0.000000,top_5=0.093750), video_accuracy_meter(top_1=0.000000,top_5=0.093750)]
INFO:root:Approximate meters: [0] train phase 1 (100.00% done), loss: 5.2275, meters: [accuracy_meter(top_1=0.000000,top_5=0.046875), video_accuracy_meter(top_1=0.000000,top_5=0.046875)]
INFO:root:Synced meters: [0] train phase 1 (100.00% done), loss: 5.2275, meters: [accuracy_meter(top_1=0.000000,top_5=0.046875), video_accuracy_meter(top_1=0.000000,top_5=0.046875)], processed batches: 8
INFO:root:Saving checkpoint to '/tmp/classy_checkpoint_1598322786.4107509'...
INFO:root:Synced meters: [0] test phase 1 (100.00% done), loss: 4.6172, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8

I will update the tutorials to mention that logging needs to be set up and that, for dataloader issues, users can try changing the multiprocessing context.
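
As a rough illustration of the last point, the sketch below picks a multiprocessing start method using Python's standard `multiprocessing` module. The helper name `get_mp_context` is hypothetical and not part of Classy Vision; the commented-out `DataLoader` call assumes PyTorch's `multiprocessing_context` argument and a `dataset` object that is not defined here.

```python
import multiprocessing as mp


def get_mp_context(start_method):
    """Return a multiprocessing context for `start_method` (hypothetical helper).

    Raises ValueError if the start method is not supported on this platform,
    for example "fork" on Windows.
    """
    available = mp.get_all_start_methods()
    if start_method not in available:
        raise ValueError(
            f"start method {start_method!r} not available; choose from {available}"
        )
    return mp.get_context(start_method)


# "spawn" is available on every platform; "fork" and "forkserver" are Unix-only.
ctx = get_mp_context("spawn")
print(ctx.get_start_method())

# Assumed usage with PyTorch (not run here):
# loader = torch.utils.data.DataLoader(
#     dataset, num_workers=2, multiprocessing_context=ctx
# )
```

Which start method avoids a given dataloader error depends on the environment, so it may be worth trying each available method in turn.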

mannatsingh added a commit to mannatsingh/ClassyVision that referenced this issue Aug 25, 2020
Summary:
Classy's default dataloader currently doesn't work when `num_workers` is set to 0. Setting `num_workers` to 0 is extremely useful for debugging dataloader issues like those in facebookresearch#597 and facebookresearch#607.

Note that calling `set_dataloader_mp_context(None)` doesn't work since that just sets the mp context to the default value for the environment.

If `num_workers` is set to 0, the default call to `DataLoader` now sets `multiprocessing_context` to `None` so that PyTorch doesn't raise an exception.

Differential Revision: D23310512

fbshipit-source-id: 8a66a51d7c05c781783f73eda1ee97aa9398e6c9
facebook-github-bot pushed a commit that referenced this issue Aug 25, 2020
Summary:
Pull Request resolved: #608

Classy's default dataloader currently doesn't work when `num_workers` is set to 0. Setting `num_workers` to 0 is extremely useful for debugging dataloader issues like those in #597 and #607.

Note that calling `set_dataloader_mp_context(None)` doesn't work since that just sets the mp context to the default value for the environment.

If `num_workers` is set to 0, the default call to `DataLoader` now sets `multiprocessing_context` to `None` so that PyTorch doesn't raise an exception.

Reviewed By: vreis

Differential Revision: D23310512

fbshipit-source-id: f4fa6766855446d2c14db7b7054f0e6bc6233bbe
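
The rule described in this commit can be sketched roughly as follows. This is not the actual Classy Vision change; `dataloader_kwargs` is a hypothetical helper that only builds the keyword arguments a `DataLoader` call would receive, using the standard `multiprocessing` module.

```python
import multiprocessing as mp
from typing import Any, Dict, Optional


def dataloader_kwargs(num_workers: int, mp_context: Optional[str]) -> Dict[str, Any]:
    """Build DataLoader keyword arguments (hypothetical helper).

    With num_workers == 0 there are no worker processes, so no
    multiprocessing context is passed at all.
    """
    if num_workers == 0:
        context = None
    else:
        # mp.get_context(None) returns the platform default context, which is
        # why passing None as the start method does not disable multiprocessing.
        context = mp.get_context(mp_context)
    return {"num_workers": num_workers, "multiprocessing_context": context}


print(dataloader_kwargs(0, "spawn"))
print(dataloader_kwargs(2, "spawn")["multiprocessing_context"].get_start_method())
```

With `num_workers=0` the data is loaded in the main process, so exceptions raised by the dataset surface directly instead of inside a worker process.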
@siyangbing
Author

siyangbing commented Aug 25, 2020

@mannatsingh it works, but training is slow (I use a single 1080 Ti) and the top-1 accuracy is low. Is there anything I can do to fix this?

The output looks like this:

/home/sucom/.conda/envs/classy_vision/bin/python /home/sucom/Downloads/ttttt/my_video_rec/self_video_train.py
INFO:root:Let's do this
INFO:root:Using GPU, CUDA device index: 0
INFO:root:Starting training. Task: <classy_vision.tasks.classification_task.ClassificationTask object at 0x7f194bf2f128>
INFO:root:Approximate meters: [0] train phase 0 (100.00% done), loss: 8.8057, meters: [accuracy_meter(top_1=0.015625,top_5=0.078125), video_accuracy_meter(top_1=0.015625,top_5=0.078125)]
INFO:root:Synced meters: [0] train phase 0 (100.00% done), loss: 8.8057, meters: [accuracy_meter(top_1=0.015625,top_5=0.078125), video_accuracy_meter(top_1=0.015625,top_5=0.078125)], processed batches: 8
INFO:root:Synced meters: [0] test phase 0 (100.00% done), loss: 4.6320, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 1 (100.00% done), loss: 7.8447, meters: [accuracy_meter(top_1=0.015625,top_5=0.078125), video_accuracy_meter(top_1=0.015625,top_5=0.078125)]
INFO:root:Synced meters: [0] train phase 1 (100.00% done), loss: 7.8447, meters: [accuracy_meter(top_1=0.015625,top_5=0.078125), video_accuracy_meter(top_1=0.015625,top_5=0.078125)], processed batches: 8
INFO:root:Synced meters: [0] test phase 1 (100.00% done), loss: 4.6320, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 2 (100.00% done), loss: 6.3085, meters: [accuracy_meter(top_1=0.015625,top_5=0.093750), video_accuracy_meter(top_1=0.015625,top_5=0.093750)]
INFO:root:Synced meters: [0] train phase 2 (100.00% done), loss: 6.3085, meters: [accuracy_meter(top_1=0.015625,top_5=0.093750), video_accuracy_meter(top_1=0.015625,top_5=0.093750)], processed batches: 8
INFO:root:Synced meters: [0] test phase 2 (100.00% done), loss: 4.6309, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 3 (100.00% done), loss: 5.0880, meters: [accuracy_meter(top_1=0.031250,top_5=0.031250), video_accuracy_meter(top_1=0.031250,top_5=0.031250)]
INFO:root:Synced meters: [0] train phase 3 (100.00% done), loss: 5.0880, meters: [accuracy_meter(top_1=0.031250,top_5=0.031250), video_accuracy_meter(top_1=0.031250,top_5=0.031250)], processed batches: 8
INFO:root:Synced meters: [0] test phase 3 (100.00% done), loss: 4.6320, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 4 (100.00% done), loss: 4.6559, meters: [accuracy_meter(top_1=0.046875,top_5=0.093750), video_accuracy_meter(top_1=0.046875,top_5=0.093750)]
INFO:root:Synced meters: [0] train phase 4 (100.00% done), loss: 4.6559, meters: [accuracy_meter(top_1=0.046875,top_5=0.093750), video_accuracy_meter(top_1=0.046875,top_5=0.093750)], processed batches: 8
INFO:root:Synced meters: [0] test phase 4 (100.00% done), loss: 4.6272, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 5 (100.00% done), loss: 4.6569, meters: [accuracy_meter(top_1=0.000000,top_5=0.062500), video_accuracy_meter(top_1=0.000000,top_5=0.062500)]
INFO:root:Synced meters: [0] train phase 5 (100.00% done), loss: 4.6569, meters: [accuracy_meter(top_1=0.000000,top_5=0.062500), video_accuracy_meter(top_1=0.000000,top_5=0.062500)], processed batches: 8
INFO:root:Synced meters: [0] test phase 5 (100.00% done), loss: 4.6199, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 6 (100.00% done), loss: 4.6670, meters: [accuracy_meter(top_1=0.000000,top_5=0.031250), video_accuracy_meter(top_1=0.000000,top_5=0.031250)]
INFO:root:Synced meters: [0] train phase 6 (100.00% done), loss: 4.6670, meters: [accuracy_meter(top_1=0.000000,top_5=0.031250), video_accuracy_meter(top_1=0.000000,top_5=0.031250)], processed batches: 8
INFO:root:Synced meters: [0] test phase 6 (100.00% done), loss: 4.6167, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 7 (100.00% done), loss: 4.6760, meters: [accuracy_meter(top_1=0.000000,top_5=0.015625), video_accuracy_meter(top_1=0.000000,top_5=0.015625)]
INFO:root:Synced meters: [0] train phase 7 (100.00% done), loss: 4.6760, meters: [accuracy_meter(top_1=0.000000,top_5=0.015625), video_accuracy_meter(top_1=0.000000,top_5=0.015625)], processed batches: 8
INFO:root:Synced meters: [0] test phase 7 (100.00% done), loss: 4.6156, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 8 (100.00% done), loss: 4.6598, meters: [accuracy_meter(top_1=0.000000,top_5=0.015625), video_accuracy_meter(top_1=0.000000,top_5=0.015625)]
INFO:root:Synced meters: [0] train phase 8 (100.00% done), loss: 4.6598, meters: [accuracy_meter(top_1=0.000000,top_5=0.015625), video_accuracy_meter(top_1=0.000000,top_5=0.015625)], processed batches: 8
INFO:root:Synced meters: [0] test phase 8 (100.00% done), loss: 4.6141, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 9 (100.00% done), loss: 4.6271, meters: [accuracy_meter(top_1=0.031250,top_5=0.125000), video_accuracy_meter(top_1=0.031250,top_5=0.125000)]
INFO:root:Synced meters: [0] train phase 9 (100.00% done), loss: 4.6271, meters: [accuracy_meter(top_1=0.031250,top_5=0.125000), video_accuracy_meter(top_1=0.031250,top_5=0.125000)], processed batches: 8
INFO:root:Synced meters: [0] test phase 9 (100.00% done), loss: 4.6134, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 10 (100.00% done), loss: 4.6121, meters: [accuracy_meter(top_1=0.031250,top_5=0.062500), video_accuracy_meter(top_1=0.031250,top_5=0.062500)]
INFO:root:Synced meters: [0] train phase 10 (100.00% done), loss: 4.6121, meters: [accuracy_meter(top_1=0.031250,top_5=0.062500), video_accuracy_meter(top_1=0.031250,top_5=0.062500)], processed batches: 8
INFO:root:Synced meters: [0] test phase 10 (100.00% done), loss: 4.6116, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 11 (100.00% done), loss: 4.6113, meters: [accuracy_meter(top_1=0.015625,top_5=0.062500), video_accuracy_meter(top_1=0.015625,top_5=0.062500)]
INFO:root:Synced meters: [0] train phase 11 (100.00% done), loss: 4.6113, meters: [accuracy_meter(top_1=0.015625,top_5=0.062500), video_accuracy_meter(top_1=0.015625,top_5=0.062500)], processed batches: 8
INFO:root:Synced meters: [0] test phase 11 (100.00% done), loss: 4.6109, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 12 (100.00% done), loss: 4.6622, meters: [accuracy_meter(top_1=0.000000,top_5=0.015625), video_accuracy_meter(top_1=0.000000,top_5=0.015625)]
INFO:root:Synced meters: [0] train phase 12 (100.00% done), loss: 4.6622, meters: [accuracy_meter(top_1=0.000000,top_5=0.015625), video_accuracy_meter(top_1=0.000000,top_5=0.015625)], processed batches: 8
INFO:root:Synced meters: [0] test phase 12 (100.00% done), loss: 4.6107, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 13 (100.00% done), loss: 4.6285, meters: [accuracy_meter(top_1=0.031250,top_5=0.078125), video_accuracy_meter(top_1=0.031250,top_5=0.078125)]
INFO:root:Synced meters: [0] train phase 13 (100.00% done), loss: 4.6285, meters: [accuracy_meter(top_1=0.031250,top_5=0.078125), video_accuracy_meter(top_1=0.031250,top_5=0.078125)], processed batches: 8
INFO:root:Synced meters: [0] test phase 13 (100.00% done), loss: 4.6108, meters: [accuracy_meter(top_1=0.000000,top_5=0.625000), video_accuracy_meter(top_1=0.000000,top_5=0.750000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 14 (100.00% done), loss: 4.5942, meters: [accuracy_meter(top_1=0.031250,top_5=0.031250), video_accuracy_meter(top_1=0.031250,top_5=0.031250)]
INFO:root:Synced meters: [0] train phase 14 (100.00% done), loss: 4.5942, meters: [accuracy_meter(top_1=0.031250,top_5=0.031250), video_accuracy_meter(top_1=0.031250,top_5=0.031250)], processed batches: 8
INFO:root:Synced meters: [0] test phase 14 (100.00% done), loss: 4.6105, meters: [accuracy_meter(top_1=0.000000,top_5=0.337500), video_accuracy_meter(top_1=0.000000,top_5=0.375000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 15 (100.00% done), loss: 4.5640, meters: [accuracy_meter(top_1=0.015625,top_5=0.046875), video_accuracy_meter(top_1=0.015625,top_5=0.046875)]
INFO:root:Synced meters: [0] train phase 15 (100.00% done), loss: 4.5640, meters: [accuracy_meter(top_1=0.015625,top_5=0.046875), video_accuracy_meter(top_1=0.015625,top_5=0.046875)], processed batches: 8
INFO:root:Synced meters: [0] test phase 15 (100.00% done), loss: 4.6107, meters: [accuracy_meter(top_1=0.000000,top_5=0.012500), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 16 (100.00% done), loss: 4.5715, meters: [accuracy_meter(top_1=0.015625,top_5=0.046875), video_accuracy_meter(top_1=0.015625,top_5=0.046875)]
INFO:root:Synced meters: [0] train phase 16 (100.00% done), loss: 4.5715, meters: [accuracy_meter(top_1=0.015625,top_5=0.046875), video_accuracy_meter(top_1=0.015625,top_5=0.046875)], processed batches: 8
INFO:root:Synced meters: [0] test phase 16 (100.00% done), loss: 4.6122, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 17 (100.00% done), loss: 4.6740, meters: [accuracy_meter(top_1=0.015625,top_5=0.078125), video_accuracy_meter(top_1=0.015625,top_5=0.078125)]
INFO:root:Synced meters: [0] train phase 17 (100.00% done), loss: 4.6740, meters: [accuracy_meter(top_1=0.015625,top_5=0.078125), video_accuracy_meter(top_1=0.015625,top_5=0.078125)], processed batches: 8
INFO:root:Synced meters: [0] test phase 17 (100.00% done), loss: 4.6133, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 18 (100.00% done), loss: 4.6807, meters: [accuracy_meter(top_1=0.000000,top_5=0.046875), video_accuracy_meter(top_1=0.000000,top_5=0.046875)]
INFO:root:Synced meters: [0] train phase 18 (100.00% done), loss: 4.6807, meters: [accuracy_meter(top_1=0.000000,top_5=0.046875), video_accuracy_meter(top_1=0.000000,top_5=0.046875)], processed batches: 8
INFO:root:Synced meters: [0] test phase 18 (100.00% done), loss: 4.6138, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 19 (100.00% done), loss: 4.6103, meters: [accuracy_meter(top_1=0.015625,top_5=0.078125), video_accuracy_meter(top_1=0.015625,top_5=0.078125)]
INFO:root:Synced meters: [0] train phase 19 (100.00% done), loss: 4.6103, meters: [accuracy_meter(top_1=0.015625,top_5=0.078125), video_accuracy_meter(top_1=0.015625,top_5=0.078125)], processed batches: 8
INFO:root:Synced meters: [0] test phase 19 (100.00% done), loss: 4.6135, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 20 (100.00% done), loss: 4.6466, meters: [accuracy_meter(top_1=0.015625,top_5=0.109375), video_accuracy_meter(top_1=0.015625,top_5=0.109375)]
INFO:root:Synced meters: [0] train phase 20 (100.00% done), loss: 4.6466, meters: [accuracy_meter(top_1=0.015625,top_5=0.109375), video_accuracy_meter(top_1=0.015625,top_5=0.109375)], processed batches: 8
INFO:root:Synced meters: [0] test phase 20 (100.00% done), loss: 4.6139, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 21 (100.00% done), loss: 4.6310, meters: [accuracy_meter(top_1=0.015625,top_5=0.031250), video_accuracy_meter(top_1=0.015625,top_5=0.031250)]
INFO:root:Synced meters: [0] train phase 21 (100.00% done), loss: 4.6310, meters: [accuracy_meter(top_1=0.015625,top_5=0.031250), video_accuracy_meter(top_1=0.015625,top_5=0.031250)], processed batches: 8
INFO:root:Synced meters: [0] test phase 21 (100.00% done), loss: 4.6147, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 22 (100.00% done), loss: 4.6132, meters: [accuracy_meter(top_1=0.031250,top_5=0.078125), video_accuracy_meter(top_1=0.031250,top_5=0.078125)]
INFO:root:Synced meters: [0] train phase 22 (100.00% done), loss: 4.6132, meters: [accuracy_meter(top_1=0.031250,top_5=0.078125), video_accuracy_meter(top_1=0.031250,top_5=0.078125)], processed batches: 8
INFO:root:Synced meters: [0] test phase 22 (100.00% done), loss: 4.6155, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 23 (100.00% done), loss: 4.6904, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)]
INFO:root:Synced meters: [0] train phase 23 (100.00% done), loss: 4.6904, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Synced meters: [0] test phase 23 (100.00% done), loss: 4.6163, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 24 (100.00% done), loss: 4.5634, meters: [accuracy_meter(top_1=0.015625,top_5=0.062500), video_accuracy_meter(top_1=0.015625,top_5=0.062500)]
INFO:root:Synced meters: [0] train phase 24 (100.00% done), loss: 4.5634, meters: [accuracy_meter(top_1=0.015625,top_5=0.062500), video_accuracy_meter(top_1=0.015625,top_5=0.062500)], processed batches: 8
INFO:root:Synced meters: [0] test phase 24 (100.00% done), loss: 4.6170, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 25 (100.00% done), loss: 4.6000, meters: [accuracy_meter(top_1=0.000000,top_5=0.109375), video_accuracy_meter(top_1=0.000000,top_5=0.109375)]
INFO:root:Synced meters: [0] train phase 25 (100.00% done), loss: 4.6000, meters: [accuracy_meter(top_1=0.000000,top_5=0.109375), video_accuracy_meter(top_1=0.000000,top_5=0.109375)], processed batches: 8
INFO:root:Synced meters: [0] test phase 25 (100.00% done), loss: 4.6170, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 26 (100.00% done), loss: 4.6351, meters: [accuracy_meter(top_1=0.031250,top_5=0.046875), video_accuracy_meter(top_1=0.031250,top_5=0.046875)]
INFO:root:Synced meters: [0] train phase 26 (100.00% done), loss: 4.6351, meters: [accuracy_meter(top_1=0.031250,top_5=0.046875), video_accuracy_meter(top_1=0.031250,top_5=0.046875)], processed batches: 8
INFO:root:Synced meters: [0] test phase 26 (100.00% done), loss: 4.6163, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 27 (100.00% done), loss: 4.6078, meters: [accuracy_meter(top_1=0.031250,top_5=0.093750), video_accuracy_meter(top_1=0.031250,top_5=0.093750)]
INFO:root:Synced meters: [0] train phase 27 (100.00% done), loss: 4.6078, meters: [accuracy_meter(top_1=0.031250,top_5=0.093750), video_accuracy_meter(top_1=0.031250,top_5=0.093750)], processed batches: 8
INFO:root:Synced meters: [0] test phase 27 (100.00% done), loss: 4.6161, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 28 (100.00% done), loss: 4.5891, meters: [accuracy_meter(top_1=0.015625,top_5=0.062500), video_accuracy_meter(top_1=0.015625,top_5=0.062500)]
INFO:root:Synced meters: [0] train phase 28 (100.00% done), loss: 4.5891, meters: [accuracy_meter(top_1=0.015625,top_5=0.062500), video_accuracy_meter(top_1=0.015625,top_5=0.062500)], processed batches: 8
INFO:root:Synced meters: [0] test phase 28 (100.00% done), loss: 4.6164, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 29 (100.00% done), loss: 4.5992, meters: [accuracy_meter(top_1=0.062500,top_5=0.062500), video_accuracy_meter(top_1=0.062500,top_5=0.062500)]
INFO:root:Synced meters: [0] train phase 29 (100.00% done), loss: 4.5992, meters: [accuracy_meter(top_1=0.062500,top_5=0.062500), video_accuracy_meter(top_1=0.062500,top_5=0.062500)], processed batches: 8
INFO:root:Saving checkpoint to './classy_checkpoint_1598331898.8856053'...
INFO:root:Synced meters: [0] test phase 29 (100.00% done), loss: 4.6164, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 30 (100.00% done), loss: 4.6318, meters: [accuracy_meter(top_1=0.031250,top_5=0.046875), video_accuracy_meter(top_1=0.031250,top_5=0.046875)]
INFO:root:Synced meters: [0] train phase 30 (100.00% done), loss: 4.6318, meters: [accuracy_meter(top_1=0.031250,top_5=0.046875), video_accuracy_meter(top_1=0.031250,top_5=0.046875)], processed batches: 8
INFO:root:Synced meters: [0] test phase 30 (100.00% done), loss: 4.6162, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 31 (100.00% done), loss: 4.6468, meters: [accuracy_meter(top_1=0.046875,top_5=0.078125), video_accuracy_meter(top_1=0.046875,top_5=0.078125)]
INFO:root:Synced meters: [0] train phase 31 (100.00% done), loss: 4.6468, meters: [accuracy_meter(top_1=0.046875,top_5=0.078125), video_accuracy_meter(top_1=0.046875,top_5=0.078125)], processed batches: 8
INFO:root:Synced meters: [0] test phase 31 (100.00% done), loss: 4.6159, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 32 (100.00% done), loss: 4.6292, meters: [accuracy_meter(top_1=0.015625,top_5=0.078125), video_accuracy_meter(top_1=0.015625,top_5=0.078125)]
INFO:root:Synced meters: [0] train phase 32 (100.00% done), loss: 4.6292, meters: [accuracy_meter(top_1=0.015625,top_5=0.078125), video_accuracy_meter(top_1=0.015625,top_5=0.078125)], processed batches: 8
INFO:root:Synced meters: [0] test phase 32 (100.00% done), loss: 4.6155, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 33 (100.00% done), loss: 4.6669, meters: [accuracy_meter(top_1=0.015625,top_5=0.078125), video_accuracy_meter(top_1=0.015625,top_5=0.078125)]
INFO:root:Synced meters: [0] train phase 33 (100.00% done), loss: 4.6669, meters: [accuracy_meter(top_1=0.015625,top_5=0.078125), video_accuracy_meter(top_1=0.015625,top_5=0.078125)], processed batches: 8
INFO:root:Synced meters: [0] test phase 33 (100.00% done), loss: 4.6144, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 34 (100.00% done), loss: 4.6660, meters: [accuracy_meter(top_1=0.015625,top_5=0.062500), video_accuracy_meter(top_1=0.015625,top_5=0.062500)]
INFO:root:Synced meters: [0] train phase 34 (100.00% done), loss: 4.6660, meters: [accuracy_meter(top_1=0.015625,top_5=0.062500), video_accuracy_meter(top_1=0.015625,top_5=0.062500)], processed batches: 8
INFO:root:Synced meters: [0] test phase 34 (100.00% done), loss: 4.6128, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 35 (100.00% done), loss: 4.6745, meters: [accuracy_meter(top_1=0.015625,top_5=0.031250), video_accuracy_meter(top_1=0.015625,top_5=0.031250)]
INFO:root:Synced meters: [0] train phase 35 (100.00% done), loss: 4.6745, meters: [accuracy_meter(top_1=0.015625,top_5=0.031250), video_accuracy_meter(top_1=0.015625,top_5=0.031250)], processed batches: 8
INFO:root:Synced meters: [0] test phase 35 (100.00% done), loss: 4.6116, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 36 (100.00% done), loss: 4.6619, meters: [accuracy_meter(top_1=0.031250,top_5=0.062500), video_accuracy_meter(top_1=0.031250,top_5=0.062500)]
INFO:root:Synced meters: [0] train phase 36 (100.00% done), loss: 4.6619, meters: [accuracy_meter(top_1=0.031250,top_5=0.062500), video_accuracy_meter(top_1=0.031250,top_5=0.062500)], processed batches: 8
INFO:root:Synced meters: [0] test phase 36 (100.00% done), loss: 4.6106, meters: [accuracy_meter(top_1=0.000000,top_5=0.237500), video_accuracy_meter(top_1=0.000000,top_5=0.375000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 37 (100.00% done), loss: 4.6336, meters: [accuracy_meter(top_1=0.015625,top_5=0.093750), video_accuracy_meter(top_1=0.015625,top_5=0.093750)]
INFO:root:Synced meters: [0] train phase 37 (100.00% done), loss: 4.6336, meters: [accuracy_meter(top_1=0.015625,top_5=0.093750), video_accuracy_meter(top_1=0.015625,top_5=0.093750)], processed batches: 8
INFO:root:Synced meters: [0] test phase 37 (100.00% done), loss: 4.6113, meters: [accuracy_meter(top_1=0.000000,top_5=0.312500), video_accuracy_meter(top_1=0.000000,top_5=0.375000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 38 (100.00% done), loss: 4.6530, meters: [accuracy_meter(top_1=0.000000,top_5=0.015625), video_accuracy_meter(top_1=0.000000,top_5=0.015625)]
INFO:root:Synced meters: [0] train phase 38 (100.00% done), loss: 4.6530, meters: [accuracy_meter(top_1=0.000000,top_5=0.015625), video_accuracy_meter(top_1=0.000000,top_5=0.015625)], processed batches: 8
INFO:root:Synced meters: [0] test phase 38 (100.00% done), loss: 4.6128, meters: [accuracy_meter(top_1=0.000000,top_5=0.100000), video_accuracy_meter(top_1=0.000000,top_5=0.125000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 39 (100.00% done), loss: 4.6425, meters: [accuracy_meter(top_1=0.000000,top_5=0.031250), video_accuracy_meter(top_1=0.000000,top_5=0.031250)]
INFO:root:Synced meters: [0] train phase 39 (100.00% done), loss: 4.6425, meters: [accuracy_meter(top_1=0.000000,top_5=0.031250), video_accuracy_meter(top_1=0.000000,top_5=0.031250)], processed batches: 8
INFO:root:Synced meters: [0] test phase 39 (100.00% done), loss: 4.6141, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 40 (100.00% done), loss: 4.6247, meters: [accuracy_meter(top_1=0.015625,top_5=0.093750), video_accuracy_meter(top_1=0.015625,top_5=0.093750)]
INFO:root:Synced meters: [0] train phase 40 (100.00% done), loss: 4.6247, meters: [accuracy_meter(top_1=0.015625,top_5=0.093750), video_accuracy_meter(top_1=0.015625,top_5=0.093750)], processed batches: 8
INFO:root:Synced meters: [0] test phase 40 (100.00% done), loss: 4.6147, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 41 (100.00% done), loss: 4.6118, meters: [accuracy_meter(top_1=0.000000,top_5=0.078125), video_accuracy_meter(top_1=0.000000,top_5=0.078125)]
INFO:root:Synced meters: [0] train phase 41 (100.00% done), loss: 4.6118, meters: [accuracy_meter(top_1=0.000000,top_5=0.078125), video_accuracy_meter(top_1=0.000000,top_5=0.078125)], processed batches: 8
INFO:root:Synced meters: [0] test phase 41 (100.00% done), loss: 4.6151, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 42 (100.00% done), loss: 4.6388, meters: [accuracy_meter(top_1=0.000000,top_5=0.046875), video_accuracy_meter(top_1=0.000000,top_5=0.046875)]
INFO:root:Synced meters: [0] train phase 42 (100.00% done), loss: 4.6388, meters: [accuracy_meter(top_1=0.000000,top_5=0.046875), video_accuracy_meter(top_1=0.000000,top_5=0.046875)], processed batches: 8
INFO:root:Synced meters: [0] test phase 42 (100.00% done), loss: 4.6151, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 43 (100.00% done), loss: 4.5942, meters: [accuracy_meter(top_1=0.015625,top_5=0.062500), video_accuracy_meter(top_1=0.015625,top_5=0.062500)]
INFO:root:Synced meters: [0] train phase 43 (100.00% done), loss: 4.5942, meters: [accuracy_meter(top_1=0.015625,top_5=0.062500), video_accuracy_meter(top_1=0.015625,top_5=0.062500)], processed batches: 8
INFO:root:Synced meters: [0] test phase 43 (100.00% done), loss: 4.6154, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 44 (100.00% done), loss: 4.6118, meters: [accuracy_meter(top_1=0.031250,top_5=0.093750), video_accuracy_meter(top_1=0.031250,top_5=0.093750)]
INFO:root:Synced meters: [0] train phase 44 (100.00% done), loss: 4.6118, meters: [accuracy_meter(top_1=0.031250,top_5=0.093750), video_accuracy_meter(top_1=0.031250,top_5=0.093750)], processed batches: 8
INFO:root:Synced meters: [0] test phase 44 (100.00% done), loss: 4.6154, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 45 (100.00% done), loss: 4.6076, meters: [accuracy_meter(top_1=0.000000,top_5=0.031250), video_accuracy_meter(top_1=0.000000,top_5=0.031250)]
INFO:root:Synced meters: [0] train phase 45 (100.00% done), loss: 4.6076, meters: [accuracy_meter(top_1=0.000000,top_5=0.031250), video_accuracy_meter(top_1=0.000000,top_5=0.031250)], processed batches: 8
INFO:root:Synced meters: [0] test phase 45 (100.00% done), loss: 4.6164, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 46 (100.00% done), loss: 4.6467, meters: [accuracy_meter(top_1=0.000000,top_5=0.062500), video_accuracy_meter(top_1=0.000000,top_5=0.062500)]
INFO:root:Synced meters: [0] train phase 46 (100.00% done), loss: 4.6467, meters: [accuracy_meter(top_1=0.000000,top_5=0.062500), video_accuracy_meter(top_1=0.000000,top_5=0.062500)], processed batches: 8
INFO:root:Synced meters: [0] test phase 46 (100.00% done), loss: 4.6157, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 47 (100.00% done), loss: 4.5626, meters: [accuracy_meter(top_1=0.046875,top_5=0.125000), video_accuracy_meter(top_1=0.046875,top_5=0.125000)]
INFO:root:Synced meters: [0] train phase 47 (100.00% done), loss: 4.5626, meters: [accuracy_meter(top_1=0.046875,top_5=0.125000), video_accuracy_meter(top_1=0.046875,top_5=0.125000)], processed batches: 8
INFO:root:Synced meters: [0] test phase 47 (100.00% done), loss: 4.6125, meters: [accuracy_meter(top_1=0.000000,top_5=0.025000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 48 (100.00% done), loss: 4.6453, meters: [accuracy_meter(top_1=0.000000,top_5=0.015625), video_accuracy_meter(top_1=0.000000,top_5=0.015625)]
INFO:root:Synced meters: [0] train phase 48 (100.00% done), loss: 4.6453, meters: [accuracy_meter(top_1=0.000000,top_5=0.015625), video_accuracy_meter(top_1=0.000000,top_5=0.015625)], processed batches: 8
INFO:root:Synced meters: [0] test phase 48 (100.00% done), loss: 4.6134, meters: [accuracy_meter(top_1=0.000000,top_5=0.000000), video_accuracy_meter(top_1=0.000000,top_5=0.000000)], processed batches: 8
INFO:root:Approximate meters: [0] train phase 49 (100.00% done), loss: 4.5754, meters: [accuracy_meter(top_1=0.000000,top_5=0.078125), video_accuracy_meter(top_1=0.000000,top_5=0.078125)]
INFO:root:Synced meters: [0] train phase 49 (100.00% done), loss: 4.5754, meters: [accuracy_meter(top_1=0.000000,top_5=0.078125), video_accuracy_meter(top_1=0.000000,top_5=0.078125)], processed batches: 8
INFO:root:Synced meters: [0] test phase 49 (100.00% done), loss: 4.6107, meters: [accuracy_meter(top_1=0.100000,top_5=0.100000), video_accuracy_meter(top_1=0.125000,top_5=0.250000)], processed batches: 8

@mannatsingh
Contributor

@siyangbing Unfortunately, there is no straightforward way to simply speed up training. As for the low top-1 accuracy, I would recommend using the setup in https://github.com/facebookresearch/ClassyVision/blob/master/classy_vision/configs/ucf101/r3d34.json as a reference.

@mannatsingh
Contributor

Closing the issue since the problem has been resolved. The `spawn` multiprocessing start method doesn't work in certain environments, and switching to `fork` solved the issue.
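For anyone landing here later, a minimal sketch of what selecting the `fork` start method looks like with Python's standard `multiprocessing` module. This is not the exact change made in this thread; `pick_context` and `demo` are hypothetical helper names used only for illustration, and the fallback to another start method is an assumption for platforms where `fork` is unavailable (e.g. Windows).

```python
import multiprocessing as mp


def pick_context(preferred="fork"):
    """Return a multiprocessing context, preferring `preferred` if the platform supports it."""
    methods = mp.get_all_start_methods()
    # Assumption: fall back to the first supported method when `preferred` is missing.
    method = preferred if preferred in methods else methods[0]
    return mp.get_context(method)


def demo(preferred="fork"):
    """Run a tiny worker pool under the chosen start method."""
    ctx = pick_context(preferred)
    # abs is a builtin, so it pickles cleanly under any start method.
    with ctx.Pool(processes=2) as pool:
        return ctx.get_start_method(), pool.map(abs, [-1, -2, 3])


if __name__ == "__main__":
    method, values = demo()
    print(method, values)
```

PyTorch's `torch.utils.data.DataLoader` accepts a `multiprocessing_context` argument that can take such a context, so the same choice can be applied to data loading workers. How Classy Vision exposes this setting is not shown in this part of the thread, so check its documentation for the exact option.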
