Training on single dataset #72

Open
fmx789 opened this issue Sep 29, 2022 · 1 comment

fmx789 commented Sep 29, 2022

Hi,
Thank you for the great work! In the results section of your paper, you report results from training on mixed datasets for 200 epochs. I tried training from scratch on the 3DPW dataset alone, but got unexpected results (see the log below). I'd appreciate any advice on how to solve this problem.

Thanks in advance.

2022-09-27 10:27:33,273 METRO INFO: Using 1 GPUs
2022-09-27 10:27:37,447 METRO INFO: Update config parameter num_hidden_layers: 12 -> 4
2022-09-27 10:27:37,447 METRO INFO: Update config parameter hidden_size: 768 -> 1024
2022-09-27 10:27:37,447 METRO INFO: Update config parameter num_attention_heads: 12 -> 4
2022-09-27 10:27:38,310 METRO INFO: Init model from scratch.
2022-09-27 10:27:38,310 METRO INFO: Update config parameter num_hidden_layers: 12 -> 4
2022-09-27 10:27:38,310 METRO INFO: Update config parameter hidden_size: 768 -> 256
2022-09-27 10:27:38,310 METRO INFO: Update config parameter num_attention_heads: 12 -> 4
2022-09-27 10:27:38,486 METRO INFO: Init model from scratch.
2022-09-27 10:27:38,486 METRO INFO: Update config parameter num_hidden_layers: 12 -> 4
2022-09-27 10:27:38,486 METRO INFO: Update config parameter hidden_size: 768 -> 128
2022-09-27 10:27:38,486 METRO INFO: Update config parameter num_attention_heads: 12 -> 4
2022-09-27 10:27:38,569 METRO INFO: Init model from scratch.
2022-09-27 10:27:40,009 METRO INFO: => loading hrnet-v2-w64 model
2022-09-27 10:27:40,012 METRO INFO: Transformers total parameters: 102256646
2022-09-27 10:27:40,016 METRO INFO: Backbone total parameters: 128059944
2022-09-27 10:27:40,216 METRO INFO: Training parameters Namespace(data_dir='datasets', train_yaml='pw3d_tsv_reproduce/train.yaml', val_yaml='pw3d_tsv_reproduce/test.yaml', num_workers=4, img_scale_factor=1, model_name_or_path='metro/modeling/bert/bert-base-uncased/', resume_checkpoint=None, output_dir='output/', config_name='', per_gpu_train_batch_size=20, per_gpu_eval_batch_size=30, lr=0.0001, num_train_epochs=30, vertices_loss_weight=100.0, joints_loss_weight=1000.0, vloss_w_full=0.33, vloss_w_sub=0.33, vloss_w_sub2=0.33, drop_out=0.1, arch='hrnet-w64', num_hidden_layers=4, hidden_size=128, num_attention_heads=4, intermediate_size=-1, input_feat_dim='2051,512,128', hidden_feat_dim='1024,256,128', legacy_setting=True, run_eval_only=False, logging_steps=1000, device=device(type='cuda'), seed=88, local_rank=0, num_gpus=1, distributed=False)
2022-09-27 10:37:39,084 METRO INFO: eta: 5:30:01 epoch: 0 iter: 1000 max mem : 19359 loss: 43.8094, 2d joint loss: 0.0363, 3d joint loss: 0.0242, vertex loss: 0.1603, compute: 0.5986, data: 0.0054, lr: 0.000100
2022-09-27 10:44:41,439 METRO INFO: Validation epoch: 1 mPVE: 216.89, mPJPE: 163.97, PAmPJPE: 110.12, Data Count: 35515.00
2022-09-27 10:53:16,153 METRO INFO: eta: 6:50:32 epoch: 1 iter: 2000 max mem : 19359 loss: 32.0019, 2d joint loss: 0.0250, 3d joint loss: 0.0167, vertex loss: 0.1277, compute: 0.7678, data: 0.1754, lr: 0.000100
2022-09-27 11:01:39,414 METRO INFO: Validation epoch: 2 mPVE: 213.65, mPJPE: 161.82, PAmPJPE: 105.72, Data Count: 35515.00
2022-09-27 11:08:53,971 METRO INFO: eta: 7:07:05 epoch: 2 iter: 3000 max mem : 19359 loss: 26.3174, 2d joint loss: 0.0201, 3d joint loss: 0.0134, vertex loss: 0.1088, compute: 0.8245, data: 0.2321, lr: 0.000100
2022-09-27 11:18:37,952 METRO INFO: Validation epoch: 3 mPVE: 204.17, mPJPE: 154.88, PAmPJPE: 102.00, Data Count: 35515.00
2022-09-27 11:24:28,939 METRO INFO: eta: 7:07:11 epoch: 3 iter: 4000 max mem : 19359 loss: 22.8643, 2d joint loss: 0.0172, 3d joint loss: 0.0115, vertex loss: 0.0963, compute: 0.8521, data: 0.2601, lr: 0.000100
2022-09-27 11:36:15,641 METRO INFO: Validation epoch: 4 mPVE: 182.91, mPJPE: 147.08, PAmPJPE: 96.03, Data Count: 35515.00
2022-09-27 11:36:17,768 METRO INFO: Save checkpoint to output/checkpoint-4-4544
2022-09-27 11:41:03,895 METRO INFO: eta: 7:06:50 epoch: 4 iter: 5000 max mem : 19359 loss: 20.4471, 2d joint loss: 0.0152, 3d joint loss: 0.0102, vertex loss: 0.0874, compute: 0.8807, data: 0.2837, lr: 0.000100
......
2022-09-27 18:08:18,140 METRO INFO: Validation epoch: 27 mPVE: 156.67, mPJPE: 136.94, PAmPJPE: 89.08, Data Count: 35515.00
2022-09-27 18:08:20,040 METRO INFO: Save checkpoint to output/checkpoint-27-30672
2022-09-27 18:11:35,319 METRO INFO: eta: 0:46:05 epoch: 27 iter: 31000 max mem : 19359 loss: 7.0103, 2d joint loss: 0.0049, 3d joint loss: 0.0030, vertex loss: 0.0350, compute: 0.8979, data: 0.3039, lr: 0.000010
2022-09-27 18:25:21,996 METRO INFO: Validation epoch: 28 mPVE: 157.29, mPJPE: 137.61, PAmPJPE: 88.35, Data Count: 35515.00
2022-09-27 18:25:23,883 METRO INFO: Save checkpoint to output/checkpoint-28-31808
2022-09-27 18:27:18,918 METRO INFO: eta: 0:31:10 epoch: 28 iter: 32000 max mem : 19359 loss: 6.8592, 2d joint loss: 0.0048, 3d joint loss: 0.0029, vertex loss: 0.0344, compute: 0.8993, data: 0.3053, lr: 0.000010
2022-09-27 18:42:27,939 METRO INFO: Validation epoch: 29 mPVE: 158.00, mPJPE: 137.04, PAmPJPE: 88.93, Data Count: 35515.00
2022-09-27 18:43:01,725 METRO INFO: eta: 0:16:12 epoch: 29 iter: 33000 max mem : 19359 loss: 6.7176, 2d joint loss: 0.0047, 3d joint loss: 0.0029, vertex loss: 0.0338, compute: 0.9006, data: 0.3065, lr: 0.000010
2022-09-27 18:53:01,465 METRO INFO: eta: 0:01:11 epoch: 29 iter: 34000 max mem : 19359 loss: 6.5830, 2d joint loss: 0.0046, 3d joint loss: 0.0028, vertex loss: 0.0333, compute: 0.8918, data: 0.2977, lr: 0.000010
2022-09-27 18:53:49,660 METRO INFO: eta: 0:00:00 epoch: 30 iter: 34080 max mem : 19359 loss: 6.5728, 2d joint loss: 0.0046, 3d joint loss: 0.0028, vertex loss: 0.0333, compute: 0.8911, data: 0.2970, lr: 0.000001
2022-09-27 18:59:31,358 METRO INFO: Validation epoch: 30 mPVE: 158.38, mPJPE: 137.46, PAmPJPE: 88.50, Data Count: 35515.00

@imabackstabber

Looks like you are facing an overfitting problem. I also tried training on hand data using the FreiHAND dataset alone, and unfortunately it overfitted as well. I'm also wondering how to fix it.
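
One way to check the overfitting reading is to line up the training loss against the validation metrics in the log above. The following is only a minimal sketch, assuming the log line format quoted in this issue; `parse_log` and `best_epoch` are hypothetical helpers written for illustration and are not part of the METRO codebase.

```python
import re

# Regexes for the two kinds of lines in the log format quoted above.
VAL_RE = re.compile(
    r"Validation epoch:\s*(\d+)\s+mPVE:\s*([\d.]+),\s*mPJPE:\s*([\d.]+),"
    r"\s*PAmPJPE:\s*([\d.]+)"
)
# The non-greedy match stops at the first " loss:", i.e. the total loss.
TRAIN_RE = re.compile(r"epoch:\s*(\d+)\s+iter:\s*(\d+).*?\sloss:\s*([\d.]+)")


def parse_log(text):
    """Return (train, val) lists of dicts parsed from the log text."""
    train, val = [], []
    for line in text.splitlines():
        m = VAL_RE.search(line)
        if m:
            val.append({
                "epoch": int(m.group(1)),
                "mPVE": float(m.group(2)),
                "mPJPE": float(m.group(3)),
                "PAmPJPE": float(m.group(4)),
            })
            continue
        m = TRAIN_RE.search(line)
        if m:
            train.append({
                "epoch": int(m.group(1)),
                "iter": int(m.group(2)),
                "loss": float(m.group(3)),
            })
    return train, val


def best_epoch(val, metric="PAmPJPE"):
    """Return the validation entry with the lowest error for `metric`."""
    return min(val, key=lambda row: row[metric])


# A few lines copied from the log in this issue.
SAMPLE = """\
2022-09-27 10:37:39,084 METRO INFO: eta: 5:30:01 epoch: 0 iter: 1000 max mem : 19359 loss: 43.8094, 2d joint loss: 0.0363, 3d joint loss: 0.0242, vertex loss: 0.1603, compute: 0.5986, data: 0.0054, lr: 0.000100
2022-09-27 10:44:41,439 METRO INFO: Validation epoch: 1 mPVE: 216.89, mPJPE: 163.97, PAmPJPE: 110.12, Data Count: 35515.00
2022-09-27 18:08:18,140 METRO INFO: Validation epoch: 27 mPVE: 156.67, mPJPE: 136.94, PAmPJPE: 89.08, Data Count: 35515.00
2022-09-27 18:25:21,996 METRO INFO: Validation epoch: 28 mPVE: 157.29, mPJPE: 137.61, PAmPJPE: 88.35, Data Count: 35515.00
2022-09-27 18:53:49,660 METRO INFO: eta: 0:00:00 epoch: 30 iter: 34080 max mem : 19359 loss: 6.5728, 2d joint loss: 0.0046, 3d joint loss: 0.0028, vertex loss: 0.0333, compute: 0.8911, data: 0.2970, lr: 0.000001
2022-09-27 18:59:31,358 METRO INFO: Validation epoch: 30 mPVE: 158.38, mPJPE: 137.46, PAmPJPE: 88.50, Data Count: 35515.00
"""

train, val = parse_log(SAMPLE)
print("train loss:", train[0]["loss"], "->", train[-1]["loss"])
print("best PAmPJPE:", best_epoch(val))
print("best mPJPE:", best_epoch(val, "mPJPE"))
```

In the log above, the logged training loss falls from 43.8094 to 6.5728, while validation PAmPJPE only moves between roughly 88 and 89 over epochs 27 to 30. That pattern is consistent with overfitting, though the log alone does not rule out other causes.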
