Error when training model #6

CheungBH · 2021-07-01T08:59:36Z

Hello. Thanks for your work.
I am trying to train a GCN model using the command
python3 run_baseline.py --note pretrain --dropout 0 --lr 2e-2 --epochs 100 --posenet_name 'gcn' --checkpoint './checkpoint/pretrain_baseline' --keypoints gt,
but an error occurs.
Traceback (most recent call last):
  File "run_baseline.py", line 102, in <module>
    main(args)
  File "run_baseline.py", line 65, in main
    glob_step, args.lr_decay, args.lr_gamma, max_norm=args.max_norm)
  File "/home/hkuit155/Documents/PoseAug/function_baseline/model_pos_train.py", line 41, in train
    outputs_3d = model_pos(inputs_2d)
  File "/home/hkuit155/anaconda3/envs/poseaug/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hkuit155/Documents/PoseAug/models_baseline/gcn/sem_gcn.py", line 104, in forward
    out = self.gconv_input(x)
  File "/home/hkuit155/anaconda3/envs/poseaug/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hkuit155/anaconda3/envs/poseaug/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/hkuit155/anaconda3/envs/poseaug/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hkuit155/Documents/PoseAug/models_baseline/gcn/sem_gcn.py", line 28, in forward
    x = self.gconv(x).transpose(1, 2)
  File "/home/hkuit155/anaconda3/envs/poseaug/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hkuit155/Documents/PoseAug/models_baseline/gcn/sem_graph_conv.py", line 43, in forward
    output = torch.matmul(adj * M, h0) + torch.matmul(adj * (1 - M), h1)
RuntimeError: invalid argument 6: wrong matrix size at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:492

The same error also occurs when training ST-GCN:

Traceback (most recent call last):
  File "run_baseline.py", line 102, in <module>
    main(args)
  File "run_baseline.py", line 65, in main
    glob_step, args.lr_decay, args.lr_gamma, max_norm=args.max_norm)
  File "/home/hkuit155/Documents/PoseAug/function_baseline/model_pos_train.py", line 41, in train
    outputs_3d = model_pos(inputs_2d)
  File "/home/hkuit155/anaconda3/envs/poseaug/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hkuit155/Documents/PoseAug/models_baseline/models_st_gcn/st_gcn_single_frame_test.py", line 461, in forward
    x = torch.matmul(x, C) # nx2x17
RuntimeError: invalid argument 6: wrong matrix size at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:492
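Both tracebacks fail inside a `torch.matmul` whose operands disagree on the joint dimension. The sketch below is a hypothetical pure-Python helper, `matmul_shape` (not part of PoseAug or PyTorch), that mimics the shape rule `torch.matmul` applies when both operands have at least two dimensions. The shapes in the usage lines are assumptions: a 16x16 adjacency matrix for a 16-joint skeleton, a hypothetical hidden size of 128, and, for ST-GCN, a hypothetical 16x17 matrix `C` inferred only from the `# nx2x17` comment.

```python
from itertools import zip_longest


def matmul_shape(a_shape, b_shape):
    """Result shape of a @ b for operands with at least 2 dims each.

    Mimics the rule used by torch.matmul / numpy.matmul: the last dim of `a`
    must equal the second-to-last dim of `b`, and leading (batch) dims broadcast.
    """
    if len(a_shape) < 2 or len(b_shape) < 2:
        raise ValueError("this sketch only handles operands with >= 2 dims")
    *a_batch, m, k = a_shape
    *b_batch, k_b, n = b_shape
    if k != k_b:
        raise ValueError(
            f"inner dims differ: {tuple(a_shape)} @ {tuple(b_shape)} ({k} != {k_b})"
        )
    batch = []
    # Align batch dims from the right; a missing dim behaves like size 1.
    for x, y in zip_longest(reversed(a_batch), reversed(b_batch), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"batch dims {x} and {y} do not broadcast")
        batch.append(y if x == 1 else x)
    return tuple(reversed(batch)) + (m, n)


# SemGCN-style graph conv: adj (J, J) @ h (N, J, C); J=16 and C=128 are assumptions.
print(matmul_shape((16, 16), (1024, 16, 128)))  # (1024, 16, 128)
# ST-GCN-style: x (N, 2, J) @ C; C assumed to be (16, 17) from the "# nx2x17" comment.
print(matmul_shape((1024, 2, 16), (16, 17)))  # (1024, 2, 17)

# Feeding 17-joint 2D data into either layer breaks the inner-dimension check.
for a, b in [((16, 16), (1024, 17, 128)), ((1024, 2, 17), (16, 17))]:
    try:
        matmul_shape(a, b)
    except ValueError as err:
        print(err)
```

Under these assumptions, a 16-joint layer receiving 17-joint input fails the inner-dimension check, which is presumably what the CUDA BLAS backend reports as "wrong matrix size".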

CheungBH · 2021-07-01T09:00:45Z

Also, when I try to train VideoPose3D, I get a different error:

==> Using settings Namespace(action_wise=True, actions='*', batch_size=1024, checkpoint='./checkpoint/pretrain_baseline', dataset='h36m', downsample=1, dropout=0.25, epochs=50, evaluate='', keypoints='gt', lr=0.001, lr_decay=100000, lr_gamma=0.96, max_norm=True, note='pretrain', num_workers=2, posenet_name='videopose', pretrain=False, random_seed=0, s1only=False, snapshot=25, stages=4)
==> Loading dataset...
==> Preparing data...
==> Loading 2D detections...
Generating 1559752 poses...
Generating 543344 poses...
Generating 2929 poses...
==> Creating PoseNet model...
create model: videopose
==> Total parameters for model videopose: 8.49M
==> Prepare optimizer...
==> Making checkpoint dir: ./checkpoint/pretrain_baseline/videopose/gt/0701165527_pretrain

Epoch: 1 | LR: 0.00100000
Traceback (most recent call last):
  File "run_baseline.py", line 102, in <module>
    main(args)
  File "run_baseline.py", line 65, in main
    glob_step, args.lr_decay, args.lr_gamma, max_norm=args.max_norm)
  File "/home/hkuit155/Documents/PoseAug/function_baseline/model_pos_train.py", line 41, in train
    outputs_3d = model_pos(inputs_2d)
  File "/home/hkuit155/anaconda3/envs/poseaug/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hkuit155/Documents/PoseAug/models_baseline/videopose/model_VideoPose3D.py", line 81, in forward
    x = x.view(x.shape[0], 1, 16, 2) # 0924
RuntimeError: shape '[1024, 1, 16, 2]' is invalid for input of size 34816
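The numbers in this message already point to the joint count: `view` requires the element count to stay the same, and 34816 = 1024 x 17 x 2, while the target shape [1024, 1, 16, 2] holds 1024 x 16 x 2 = 32768 elements. A minimal check, using hypothetical helper names and assuming the 2D batch has shape (1024, 17, 2) (a flattened (1024, 34) batch has the same element count):

```python
from functools import reduce
from operator import mul


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    return reduce(mul, shape, 1)


def can_view(input_shape, target_shape):
    """True if a tensor of input_shape could be .view()-ed to target_shape.

    This sketch ignores contiguity and does not handle the -1 wildcard.
    """
    return numel(input_shape) == numel(target_shape)


batch = 1024
print(numel((batch, 17, 2)))                        # 34816, the size in the error
print(numel((batch, 1, 16, 2)))                     # 32768, what view() expects
print(can_view((batch, 17, 2), (batch, 1, 16, 2)))  # False
print(can_view((batch, 16, 2), (batch, 1, 16, 2)))  # True
```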

How can I fix this bug? Thank you.

CheungBH · 2021-07-01T09:03:05Z

My environment is:
Ubuntu 16.04
python 3.6.9
torch 1.0.1.post2
torchvision 0.2.2
cudatoolkit 10.1.243

CheungBH · 2021-07-01T09:22:57Z

I found that the 2D and 3D datasets contain 17 and 16 joints, respectively. Is that expected?

Garfield-kh · 2021-07-01T10:24:19Z

The 2D and 3D datasets should both use the 16-joint definition.
Did you generate the 2D pose data data_2d_h36m_gt.npz with ./data/prepare_data_h36m.py, or copy it from somewhere else?
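One way to see which joint definition a 2D file uses is to count joints per frame. The sketch below is hypothetical: `load_positions_2d` and `joint_counts` are illustrative names, and it assumes the .npz stores a pickled `positions_2d` dict laid out as subject -> action -> list of per-camera arrays shaped (frames, joints, 2), as in VideoPose3D's data format. PoseAug's files may use different keys, so inspect the archive first if loading fails.

```python
def load_positions_2d(path):
    """Load the nested 2D-pose dict; assumes a pickled 'positions_2d' entry."""
    import numpy as np  # only needed for loading; joint_counts() is pure Python

    with np.load(path, allow_pickle=True) as data:
        return data["positions_2d"].item()


def joint_counts(positions_2d):
    """Set of joint counts in {subject: {action: [seq, ...]}}.

    Each seq is indexed [frame][joint][coord]; works for NumPy arrays or lists.
    """
    counts = set()
    for actions in positions_2d.values():
        for per_camera in actions.values():
            for seq in per_camera:
                if len(seq) > 0:
                    counts.add(len(seq[0]))
    return counts


# Toy data: one subject, one action, one camera, 3 frames of 17 joints.
toy = {"S1": {"Walking": [[[[0.0, 0.0]] * 17] * 3]}}
print(joint_counts(toy))  # {17}

# On a real file (adjust the path to where the .npz lives):
# print(joint_counts(load_positions_2d("data_2d_h36m_gt.npz")))
```

Per the maintainer's reply, a file produced for PoseAug should report only 16; a result of {17} would indicate the 17-joint layout.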

CheungBH · 2021-07-01T12:25:15Z

Oh, I copied it from the VideoPose3D repo, which I had run before. Is the preprocessing code different?

CheungBH · 2021-07-01T12:38:24Z


Thank you, problem solved. Maybe you could add a note about this, since the preprocessing code is slightly different from VideoPose3D's.

Garfield-kh · 2021-07-02T03:50:23Z

Sure, thank you for the suggestion.

CheungBH closed this as completed Jul 1, 2021

Garfield-kh mentioned this issue on Jul 29, 2021 in "pretrain the baseline model #11" (Closed).
