Error when training model #6

Closed
CheungBH opened this issue Jul 1, 2021 · 7 comments

CheungBH commented Jul 1, 2021

Hello, and thanks for your work.
I am trying to train a GCN model with the command
python3 run_baseline.py --note pretrain --dropout 0 --lr 2e-2 --epochs 100 --posenet_name 'gcn' --checkpoint './checkpoint/pretrain_baseline' --keypoints gt
but it fails with the following error:
Traceback (most recent call last):
  File "run_baseline.py", line 102, in <module>
    main(args)
  File "run_baseline.py", line 65, in main
    glob_step, args.lr_decay, args.lr_gamma, max_norm=args.max_norm)
  File "/home/hkuit155/Documents/PoseAug/function_baseline/model_pos_train.py", line 41, in train
    outputs_3d = model_pos(inputs_2d)
  File "/home/hkuit155/anaconda3/envs/poseaug/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hkuit155/Documents/PoseAug/models_baseline/gcn/sem_gcn.py", line 104, in forward
    out = self.gconv_input(x)
  File "/home/hkuit155/anaconda3/envs/poseaug/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hkuit155/anaconda3/envs/poseaug/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/hkuit155/anaconda3/envs/poseaug/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hkuit155/Documents/PoseAug/models_baseline/gcn/sem_gcn.py", line 28, in forward
    x = self.gconv(x).transpose(1, 2)
  File "/home/hkuit155/anaconda3/envs/poseaug/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hkuit155/Documents/PoseAug/models_baseline/gcn/sem_graph_conv.py", line 43, in forward
    output = torch.matmul(adj * M, h0) + torch.matmul(adj * (1 - M), h1)
RuntimeError: invalid argument 6: wrong matrix size at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:492

The same error also occurs when training ST-GCN:

Traceback (most recent call last):
  File "run_baseline.py", line 102, in <module>
    main(args)
  File "run_baseline.py", line 65, in main
    glob_step, args.lr_decay, args.lr_gamma, max_norm=args.max_norm)
  File "/home/hkuit155/Documents/PoseAug/function_baseline/model_pos_train.py", line 41, in train
    outputs_3d = model_pos(inputs_2d)
  File "/home/hkuit155/anaconda3/envs/poseaug/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hkuit155/Documents/PoseAug/models_baseline/models_st_gcn/st_gcn_single_frame_test.py", line 461, in forward
    x = torch.matmul(x, C) # nx2x17
RuntimeError: invalid argument 6: wrong matrix size at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:492

CheungBH commented Jul 1, 2021

Also, when I try to train VideoPose3D, I get a different error:

==> Using settings Namespace(action_wise=True, actions='*', batch_size=1024, checkpoint='./checkpoint/pretrain_baseline', dataset='h36m', downsample=1, dropout=0.25, epochs=50, evaluate='', keypoints='gt', lr=0.001, lr_decay=100000, lr_gamma=0.96, max_norm=True, note='pretrain', num_workers=2, posenet_name='videopose', pretrain=False, random_seed=0, s1only=False, snapshot=25, stages=4)
==> Loading dataset...
==> Preparing data...
==> Loading 2D detections...
Generating 1559752 poses...
Generating 543344 poses...
Generating 2929 poses...
==> Creating PoseNet model...
create model: videopose
==> Total parameters for model videopose: 8.49M
==> Prepare optimizer...
==> Making checkpoint dir: ./checkpoint/pretrain_baseline/videopose/gt/0701165527_pretrain

Epoch: 1 | LR: 0.00100000
Traceback (most recent call last):
  File "run_baseline.py", line 102, in <module>
    main(args)
  File "run_baseline.py", line 65, in main
    glob_step, args.lr_decay, args.lr_gamma, max_norm=args.max_norm)
  File "/home/hkuit155/Documents/PoseAug/function_baseline/model_pos_train.py", line 41, in train
    outputs_3d = model_pos(inputs_2d)
  File "/home/hkuit155/anaconda3/envs/poseaug/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hkuit155/Documents/PoseAug/models_baseline/videopose/model_VideoPose3D.py", line 81, in forward
    x = x.view(x.shape[0], 1, 16, 2) # 0924
RuntimeError: shape '[1024, 1, 16, 2]' is invalid for input of size 34816

How can I fix this? Thank you.

CheungBH commented Jul 1, 2021

My environment is:
Ubuntu 16.04
python 3.6.9
torch 1.0.1.post2
torchvision 0.2.2
cudatoolkit 10.1.243

CheungBH commented Jul 1, 2021

I found that the 2D and 3D datasets contain 17 and 16 joints respectively. Is that expected?
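
The 17 on the 2D side is consistent with the VideoPose3D error above: the failing `view` expects 1024 x 1 x 16 x 2 = 32768 values, but the batch holds 34816. A minimal sketch of that arithmetic follows; the helper name `joints_from_flat_size` is hypothetical and not part of the repository.

```python
def joints_from_flat_size(total_elements, batch_size, coords_per_joint=2):
    """Return the joint count implied by a flat tensor size.

    Returns None when the size does not divide evenly, which would point
    to a problem other than a joint-count mismatch.
    """
    per_sample, remainder = divmod(total_elements, batch_size)
    if remainder:
        return None
    joints, remainder = divmod(per_sample, coords_per_joint)
    return None if remainder else joints


# Numbers taken from the error message: batch size 1024, input size 34816.
actual_joints = joints_from_flat_size(34816, 1024)
expected_joints = 16  # the value hard-coded in x.view(x.shape[0], 1, 16, 2)
print("joints in batch:", actual_joints, "| joints expected by model:", expected_joints)
```

With the values from the traceback the batch works out to 17 joints per pose, one more than the 16 the model reshapes to.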

Garfield-kh (Collaborator) commented

The 2D and 3D datasets should both use the 16-joint definition.
Did you generate the 2D pose data data_2d_h36m_gt.npz with ./data/prepare_data_h36m.py, or did you copy it from somewhere else?

CheungBH commented Jul 1, 2021

Oh, I copied it from the VideoPose3D repo, which I had run before. Is the preprocessing code different?

CheungBH commented Jul 1, 2021

> The 2D and 3D datasets should both use the 16-joint definition.
> Did you generate the 2D pose data data_2d_h36m_gt.npz with ./data/prepare_data_h36m.py, or did you copy it from somewhere else?

Thank you, problem solved. It may be worth adding a note about this, since the preprocessing code differs slightly from VideoPose3D's.
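
For anyone hitting the same errors, a quick check of the joint count stored in the 2D file before training might look like the sketch below. It assumes the archive keeps its poses under a `positions_2d` key as a nested dict of arrays shaped (frames, joints, 2), which is the layout VideoPose3D uses; whether the file produced by ./data/prepare_data_h36m.py follows the same layout is an assumption. Both function names are hypothetical.

```python
def collect_joint_counts(node):
    """Walk nested dicts/lists and return the set of joint counts found.

    Any object with a .shape of at least two dimensions is treated as a
    pose array of shape (..., joints, coords).
    """
    if hasattr(node, "shape"):
        shape = tuple(node.shape)
        return {shape[-2]} if len(shape) >= 2 else set()
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, (list, tuple)):
        children = node
    else:
        return set()
    counts = set()
    for child in children:
        counts |= collect_joint_counts(child)
    return counts


def joint_counts_in_npz(path, key="positions_2d"):
    """Load an .npz archive and report the joint counts under `key`.

    The key name is an assumption based on VideoPose3D's file layout.
    """
    import numpy as np  # imported lazily so the helper above works without NumPy

    archive = np.load(path, allow_pickle=True)
    # The nested dict is stored as a 0-d object array; .item() unwraps it.
    return collect_joint_counts(archive[key].item())
```

Calling `joint_counts_in_npz` on data_2d_h36m_gt.npz and getting 17 would suggest a file copied from VideoPose3D, while 16 matches the joint definition the collaborator describes.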

CheungBH closed this as completed Jul 1, 2021

Garfield-kh (Collaborator) commented

Sure, thank you for the suggestion.
