
Problem in training with custom dataset #34

Closed
shurmanov opened this issue Nov 4, 2020 · 4 comments
shurmanov commented Nov 4, 2020

I'm trying to train the AlphAction model with a custom dataset.
To train, I'm running the following code:

```shell
python train_net.py --config-file "path/to/config/file.yaml" \
  --transfer --no-head --use-tfboard \
  SOLVER.BASE_LR 0.000125 \
  SOLVER.STEPS '(560000, 720000)' \
  SOLVER.MAX_ITER 880000 \
  SOLVER.VIDEOS_PER_BATCH 2 \
  TEST.VIDEOS_PER_BATCH 2
```

I'm getting the error:

```
loading annotations into memory...
Done (t=0.00s)
Loading box file into memory...
Done (t=0.00s)
loading annotations into memory...
Done (t=0.00s)
Loading box file into memory...
Done (t=0.00s)
Loading box file into memory...
Done (t=0.00s)
2020-11-04 19:54:09,398 alphaction.trainer INFO: Start training
Traceback (most recent call last):
  File "./AlphAction/train_net.py", line 245, in <module>
    main()
  File "./AlphAction/train_net.py", line 234, in main
    model = train(cfg, args.local_rank, args.distributed, tblogger, args.transfer_weight, args.adjust_lr, args.skip_val,
  File "./AlphAction/train_net.py", line 84, in train
    do_train(
  File "./AlphAction/alphaction/engine/trainer.py", line 40, in do_train
    for iteration, (slow_video, fast_video, boxes, objects, extras, _) in enumerate(data_loader, start_iter):
  File "./AlphAction/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "./AlphAction/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "./AlphAction/venv/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "./AlphAction/venv/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
  File "av/utils.pyx", line 27, in av.utils.AVError.__init__
TypeError: __init__() takes at least 3 positional arguments (2 given)
```

I've made ~130 action records in my dataset (not many, I know), annotated custom videos, and gone through all the steps described in Data.md.

My dataset directory structure is shown below. It is pretty much the same as AVA's, so that I don't have to change the code written for AVA.

```
data/AVA
├── annotations
│   ├── ava_action_list_v2.2_for_activitynet_2019.pbtxt
│   ├── ava_action_list_v2.2.pbtxt
│   ├── ava_file_names_trainval_v2.1.txt
│   ├── ava_include_timestamps_v2.2.txt
│   ├── ava_train_excluded_timestamps_v2.2.csv
│   ├── ava_train_v2.2.csv
│   ├── ava_train_v2.2.json
│   ├── ava_train_v2.2_min.json
│   ├── ava_val_excluded_timestamps_v2.2.csv
│   ├── ava_val_v2.2.csv
│   ├── ava_val_v2.2.json
│   └── ava_val_v2.2_min.json
├── boxes
│   ├── ava_train_det_object_bbox.json
│   ├── ava_val_det_object_bbox.json
│   └── ava_val_det_person_bbox.json
├── clips
│   └── trainval_old
│       ├── conv_1-1-56-576 [46 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-24_12-23-17 [108 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-25_15-34-08 [92 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-25_15-37-09 [88 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-25_15-39-56 [91 entries exceeds filelimit, not opening dir]
│       └── conv_cam1_2020-10-25_15-41-48 [124 entries exceeds filelimit, not opening dir]
├── keyframes
│   └── trainval
│       ├── conv_1-1-56-576 [46 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-24_12-23-17 [108 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-25_15-34-08 [92 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-25_15-37-09 [88 entries exceeds filelimit, not opening dir]
│       ├── conv_cam1_2020-10-25_15-39-56 [91 entries exceeds filelimit, not opening dir]
│       └── conv_cam1_2020-10-25_15-41-48 [124 entries exceeds filelimit, not opening dir]
└── movies
    └── trainval
        ├── conv_1-1-56-576.mp4
        ├── conv_cam1_2020-10-24_12-23-17.mp4
        ├── conv_cam1_2020-10-25_15-34-08.mp4
        ├── conv_cam1_2020-10-25_15-37-09.mp4
        ├── conv_cam1_2020-10-25_15-39-56.mp4
        └── conv_cam1_2020-10-25_15-41-48.mp4
```

Thanks beforehand for the reply.

yelantf (Collaborator) commented Nov 5, 2020

The problem seems to be related to your data. According to the traceback, PyAV failed to decode some video and tried to raise an exception, but PyTorch then failed to reconstruct the PyAV exception in the main process (see PyAV-Org/PyAV#485). I'd recommend checking your data to find which video cannot be successfully decoded, or wrapping the decoding in a try...except block so that such videos are caught and skipped.
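The suggested check can be done once, up front, instead of waiting for the data loader to crash. The sketch below is my own (the ten-frame probe depth and directory layout are assumptions, not part of AlphAction): it scans a movie directory with PyAV and reports every file that fails to decode.

```python
import os


def probe_video(path, max_frames=10):
    """Try to decode the first few frames with PyAV.

    Returns None on success, or the error message on failure.
    """
    try:
        import av  # PyAV; pip install av
        with av.open(path) as container:
            for i, _frame in enumerate(container.decode(video=0)):
                if i >= max_frames:  # a few frames is enough to catch broken files
                    break
        return None
    except Exception as exc:  # av.AVError cannot always be re-raised across processes
        return f"{type(exc).__name__}: {exc}"


def find_bad_videos(movie_dir, probe=probe_video):
    """Return {path: error} for every .mp4 in movie_dir that fails to decode."""
    bad = {}
    for name in sorted(os.listdir(movie_dir)):
        if not name.endswith(".mp4"):
            continue
        path = os.path.join(movie_dir, name)
        err = probe(path)
        if err is not None:
            bad[path] = err
    return bad


if __name__ == "__main__":
    # Path follows the directory tree above; adjust for your setup.
    for path, err in find_bad_videos("data/AVA/movies/trainval").items():
        print(f"{path}: {err}")
```

Running this once over `movies/trainval` (and the extracted `clips/`) pinpoints the offending file without a single training iteration.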

shurmanov (Author) commented
Thanks for the suggestions! 🙏
I finally managed to start training with my custom dataset.
You were right: the problem was indeed with my dataset.

P.S.: How much time did it take you to train on the AVA dataset?
In our case, the ETA shows 7 days on Colab Pro with a 16 GB Tesla V100:

```
2020-11-05 15:57:54,437 alphaction.trainer INFO: eta: 7 days, 5:56:30 iter: 7500 loss_pose_action: 0.0000 (0.0330) loss_object_interaction: 0.0000 (0.0003) loss_person_interaction: 0.0000 (0.0003) total_loss: 0.0000 (0.0624) accuracy_pose_action: 1.0000 (0.9737) accuracy_object_interaction: 1.0000 (0.9983) accuracy_person_interaction: 1.0000 (0.9982) time: 0.7186 (0.7177) data: 0.0260 (0.0255) lr: 0.000125 max mem: 5387
```

yelantf (Collaborator) commented Nov 6, 2020

For the AVA dataset, training with 8 GPUs takes around one day. Since you are training on a single GPU, 7 days is reasonable. I think your original problem is solved, so I will close this issue now. Feel free to reopen it if you have any further questions.

yelantf closed this as completed Nov 6, 2020
lawkane commented Nov 23, 2020

Hi, I have a question about the custom dataset. I saw your comment "I have annotated custom videos and gone through all the steps described in Data.md". What I'm confused about: did you need to change the three bbox .json files under `boxes` in the AVA directory structure when building your custom dataset?
Thanks in advance for your reply.
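One way to answer this for yourself is to load one of the original AVA box files and mirror its structure for your own detections. A minimal sketch (the path in the usage comment is just an example from the tree above; this makes no claim about the exact schema, it simply prints whatever is there):

```python
import json


def summarize_box_file(path, preview=400):
    """Load an AlphAction box .json file and print its top-level shape,
    so a custom dataset can copy the same structure."""
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, dict):
        print("dict with keys:", sorted(data))
    elif isinstance(data, list):
        print(f"list of {len(data)} entries; first entry:")
        print(json.dumps(data[0], indent=2)[:preview])
    else:
        print("unexpected top-level type:", type(data).__name__)
    return data


# Usage, e.g.:
# summarize_box_file("data/AVA/boxes/ava_val_det_person_bbox.json")
```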
