
Problem in training with custom dataset #34

Closed
shurmanov opened this issue Nov 4, 2020 · 4 comments
shurmanov commented Nov 4, 2020

I'm trying to train the AlphAction model with a custom dataset.
To train, I'm running the following code:

```shell
python train_net.py --config-file "path/to/config/file.yaml" \
  --transfer --no-head --use-tfboard \
  SOLVER.BASE_LR 0.000125 \
  SOLVER.STEPS '(560000, 720000)' \
  SOLVER.MAX_ITER 880000 \
  SOLVER.VIDEOS_PER_BATCH 2 \
  TEST.VIDEOS_PER_BATCH 2
```

I'm getting the error:

```
loading annotations into memory...
Done (t=0.00s)
Loading box file into memory...
Done (t=0.00s)
loading annotations into memory...
Done (t=0.00s)
Loading box file into memory...
Done (t=0.00s)
Loading box file into memory...
Done (t=0.00s)
2020-11-04 19:54:09,398 alphaction.trainer INFO: Start training
Traceback (most recent call last):
  File "./AlphAction/train_net.py", line 245, in <module>
    main()
  File "./AlphAction/train_net.py", line 234, in main
    model = train(cfg, args.local_rank, args.distributed, tblogger, args.transfer_weight, args.adjust_lr, args.skip_val,
  File "./AlphAction/train_net.py", line 84, in train
    do_train(
  File "./AlphAction/alphaction/engine/trainer.py", line 40, in do_train
    for iteration, (slow_video, fast_video, boxes, objects, extras, _) in enumerate(data_loader, start_iter):
  File "./AlphAction/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "./AlphAction/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "./AlphAction/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "./AlphAction/venv/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
  File "av/utils.pyx", line 27, in av.utils.AVError.__init__
TypeError: __init__() takes at least 3 positional arguments (2 given)
```

I've made ~130 action records in my dataset (not many, I know), annotated custom videos, and gone through all the steps described in Data.md.

My dataset directory structure is shown below. It is pretty much the same as AVA's, so that I don't have to change the code written for AVA.

```
data/AVA
├── annotations
│   ├── ava_action_list_v2.2_for_activitynet_2019.pbtxt
│   ├── ava_action_list_v2.2.pbtxt
│   ├── ava_file_names_trainval_v2.1.txt
│   ├── ava_include_timestamps_v2.2.txt
│   ├── ava_train_excluded_timestamps_v2.2.csv
│   ├── ava_train_v2.2.csv
│   ├── ava_train_v2.2.json
│   ├── ava_train_v2.2_min.json
│   ├── ava_val_excluded_timestamps_v2.2.csv
│   ├── ava_val_v2.2.csv
│   ├── ava_val_v2.2.json
│   └── ava_val_v2.2_min.json
├── boxes
│   ├── ava_train_det_object_bbox.json
│   ├── ava_val_det_object_bbox.json
│   └── ava_val_det_person_bbox.json
├── clips
│   └── trainval_old
│       ├── conv_1-1-56-576 [46 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-24_12-23-17 [108 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-25_15-34-08 [92 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-25_15-37-09 [88 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-25_15-39-56 [91 entries exceeds filelimit, not opening dir]
│       └── conv_cam1_2020-10-25_15-41-48 [124 entries exceeds filelimit, not opening dir]
├── keyframes
│   └── trainval
│       ├── conv_1-1-56-576 [46 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-24_12-23-17 [108 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-25_15-34-08 [92 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-25_15-37-09 [88 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-25_15-39-56 [91 entries exceeds filelimit, not opening dir]
│       └── conv_cam1_2020-10-25_15-41-48 [124 entries exceeds filelimit, not opening dir]
└── movies
    └── trainval
        ├── conv_1-1-56-576.mp4
        ├── conv_cam1_2020-10-24_12-23-17.mp4
        ├── conv_cam1_2020-10-25_15-34-08.mp4
        ├── conv_cam1_2020-10-25_15-37-09.mp4
        ├── conv_cam1_2020-10-25_15-39-56.mp4
        └── conv_cam1_2020-10-25_15-41-48.mp4
```

Thanks beforehand for the reply.

yelantf (Collaborator) commented Nov 5, 2020

The problem seems to be related to your data. According to the traceback, PyAV failed to decode some video and tried to raise an exception, but PyTorch then failed to reconstruct the PyAV exception in the main process (see PyAV-Org/PyAV#485). I'd recommend checking your data to find which video cannot be successfully decoded, or wrapping the decoding in a try...except block so that such videos are caught and skipped.
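The suggested check can be done once, up front, instead of waiting for the data loader to crash. The sketch below is my own (the ten-frame probe depth and directory layout are assumptions, not part of AlphAction): it scans a movie directory with PyAV and reports every file that fails to decode.

```python
import os


def probe_video(path, max_frames=10):
    """Try to decode the first few frames with PyAV.

    Returns None on success, or the error message on failure.
    """
    try:
        import av  # PyAV; pip install av
        with av.open(path) as container:
            for i, _frame in enumerate(container.decode(video=0)):
                if i >= max_frames:  # a few frames is enough to catch broken files
                    break
        return None
    except Exception as exc:  # av.AVError cannot always be re-raised across processes
        return f"{type(exc).__name__}: {exc}"


def find_bad_videos(movie_dir, probe=probe_video):
    """Return {path: error} for every .mp4 in movie_dir that fails to decode."""
    bad = {}
    for name in sorted(os.listdir(movie_dir)):
        if not name.endswith(".mp4"):
            continue
        path = os.path.join(movie_dir, name)
        err = probe(path)
        if err is not None:
            bad[path] = err
    return bad


if __name__ == "__main__":
    # Path follows the directory tree above; adjust for your setup.
    for path, err in find_bad_videos("data/AVA/movies/trainval").items():
        print(f"{path}: {err}")
```

Running this once over `movies/trainval` (and the extracted `clips/`) pinpoints the offending file without a single training iteration.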

shurmanov (Author) commented
Thanks for the suggestions! 🙏
I finally managed to start training with my custom dataset.
You were right: the problem was indeed with my dataset.

P.S.: How much time did it take you to train on the AVA dataset?
In our case, the ETA shows 7 days on Colab Pro with a 16 GB Tesla V100:

```
2020-11-05 15:57:54,437 alphaction.trainer INFO: eta: 7 days, 5:56:30 iter: 7500 loss_pose_action: 0.0000 (0.0330) loss_object_interaction: 0.0000 (0.0003) loss_person_interaction: 0.0000 (0.0003) total_loss: 0.0000 (0.0624) accuracy_pose_action: 1.0000 (0.9737) accuracy_object_interaction: 1.0000 (0.9983) accuracy_person_interaction: 1.0000 (0.9982) time: 0.7186 (0.7177) data: 0.0260 (0.0255) lr: 0.000125 max mem: 5387
```

yelantf (Collaborator) commented Nov 6, 2020

For the AVA dataset, training with 8 GPUs takes around one day. Since you are training on a single GPU, 7 days is reasonable. I think your original problem is solved, so I will close this issue now. Feel free to reopen it if you have any further questions.

yelantf closed this as completed Nov 6, 2020
lawkane commented Nov 23, 2020

Hi, I have a question about the custom dataset. I saw your comment "I have annotated custom videos and gone through all the steps described in Data.md". What I'm confused about: did you need to change the three bbox .json files under `boxes` in the AVA directory structure when building your custom dataset?
Thanks in advance for your reply.
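One way to answer this for yourself is to load one of the original AVA box files and mirror its structure for your own detections. A minimal sketch (the path in the usage comment is just an example from the tree above; this makes no claim about the exact schema, it simply prints whatever is there):

```python
import json


def summarize_box_file(path, preview=400):
    """Load an AlphAction box .json file and print its top-level shape,
    so a custom dataset can copy the same structure."""
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, dict):
        print("dict with keys:", sorted(data))
    elif isinstance(data, list):
        print(f"list of {len(data)} entries; first entry:")
        print(json.dumps(data[0], indent=2)[:preview])
    else:
        print("unexpected top-level type:", type(data).__name__)
    return data


# Usage, e.g.:
# summarize_box_file("data/AVA/boxes/ava_val_det_person_bbox.json")
```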
