Errors when trying to train the model #43

Open
Wincioor11 opened this issue Jul 18, 2022 · 2 comments

Comments

@Wincioor11

Hi, I downloaded the datasets and organized them as the instructions say.
I ran into several issues when trying to reproduce your steps:

  1. Your pretrained model reaches only 28% MOTA when I run configs/r50_motr_eval.sh on it.
  2. I tried training the model myself following your instructions, but errors occurred.
    I downloaded the pretrained DETR from https://github.com/fundamentalvision/Deformable-DETR#main-results.
    When I run configs/r50_motr_train.sh, I get errors about mismatched parameter shapes in the pretrained model.
/MOTR$ sh configs/r50_motr_train.sh
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your app
*****************************************
| distributed init (rank 2): env://
| distributed init (rank 1): env://
| distributed init (rank 0): env://
| distributed init (rank 3): env://
git:
  sha: 8690da3392159635ca37c31975126acf40220724, status: has uncommited changes, branch: main

Namespace(accurate_ratio=False, aux_loss=True, backbone='resnet50', batch_size=1, bbox_loss_coef=5, cache_mode=False, cj=False, clip_max_norm=0.1, cls_loss_coef=2, coco_panoptic_path=None, oco/', crop=False, data_txt_path_train='./datasets/data_path/joint.train', data_txt_path_val='./datasets/data_path/mot17.train', dataset_file='e2e_joint', dec_layers=6, dec_n_points=4, decooef=1, dilation=False, dim_feedforward=1024, dist_backend='nccl', dist_url='env://', distributed=True, dropout=0.0, enable_fpn=False, enc_layers=6, enc_n_points=4, epochs=200, eval=False, eignore=False, focal_alpha=0.25, fp_ratio=0.3, frozen_weights=None, giou_loss_coef=2, gpu=0, gt_file_train=None, gt_file_val=None, hidden_dim=256, img_path='data/valid/JPEGImages/', input_vi.0002, lr_backbone=2e-05, lr_backbone_names=['backbone.0'], lr_drop=100, lr_drop_epochs=None, lr_linear_proj_mult=0.1, lr_linear_proj_names=['reference_points', 'sampling_offsets'], mask_lonk_len=4, memory_bank_score_thresh=0.0, memory_bank_type=None, memory_bank_with_self_attn=False, merger_dropout=0.0, meta_arch='motr', mix_match=False, mot_path='./data', nheads=8, num_anchm_workers=2, output_dir='exps/e2e_motr_r50_joint', position_embedding='sine', position_embedding_scale=6.283185307179586, pretrained='coco_model_final.pth', query_interaction_layer='QIM', rresume='', sample_interval=10, sample_mode='random_interval', sampler_lengths=[2, 3, 4, 5], sampler_steps=[50, 90, 150], save_period=50, seed=42, set_cost_bbox=5, set_cost_class=2, set_costoch=0, two_stage=False, update_query_pos=True, use_checkpoint=True, val_width=800, vis=False, weight_decay=0.0001, with_box_refine=True, world_size=4)
Training with Extra Self Attention in Every Decoder.
Training with Self-Cross Attention.
number of params: 43912992
register 1-th video: data/crowdhuman/labels_with_ids/val
register 2-th video: data/MOT17/labels_with_ids/train/MOT17-02-SDP/img1
register 3-th video: data/MOT17/labels_with_ids/train/MOT17-04-SDP/img1
register 4-th video: data/MOT17/labels_with_ids/train/MOT17-05-SDP/img1
register 5-th video: data/MOT17/labels_with_ids/train/MOT17-09-SDP/img1
register 6-th video: data/MOT17/labels_with_ids/train/MOT17-10-SDP/img1
register 7-th video: data/MOT17/labels_with_ids/train/MOT17-11-SDP/img1
register 8-th video: data/MOT17/labels_with_ids/train/MOT17-13-SDP/img1
sampler_steps=[50, 90, 150] lenghts=[2, 3, 4, 5]
register 1-th video: data/MOT17/labels_with_ids/train/MOT17-02-SDP/img1
register 2-th video: data/MOT17/labels_with_ids/train/MOT17-04-SDP/img1
register 3-th video: data/MOT17/labels_with_ids/train/MOT17-05-SDP/img1
register 4-th video: data/MOT17/labels_with_ids/train/MOT17-09-SDP/img1
register 5-th video: data/MOT17/labels_with_ids/train/MOT17-10-SDP/img1
register 6-th video: data/MOT17/labels_with_ids/train/MOT17-11-SDP/img1
register 7-th video: data/MOT17/labels_with_ids/train/MOT17-13-SDP/img1
sampler_steps=[50, 90, 150] lenghts=[2, 3, 4, 5]
loaded coco_model_final.pth
Skip loading parameter class_embed.0.weight, required shapetorch.Size([1, 256]), loaded shapetorch.Size([91, 256]). If you see this, your model does not fully load the pre-trained weight. Ps for your own dataset.
load class_embed: class_embed.0.weight shape=torch.Size([91, 256])
Skip loading parameter class_embed.0.bias, required shapetorch.Size([1]), loaded shapetorch.Size([91]). If you see this, your model does not fully load the pre-trained weight. Please make swn dataset.
load class_embed: class_embed.0.bias shape=torch.Size([91])
Skip loading parameter class_embed.1.weight, required shapetorch.Size([1, 256]), loaded shapetorch.Size([91, 256]). If you see this, your model does not fully load the pre-trained weight. Ps for your own dataset.
load class_embed: class_embed.1.weight shape=torch.Size([91, 256])
Skip loading parameter class_embed.1.bias, required shapetorch.Size([1]), loaded shapetorch.Size([91]). If you see this, your model does not fully load the pre-trained weight. Please make swn dataset.
load class_embed: class_embed.1.bias shape=torch.Size([91])
Skip loading parameter class_embed.2.weight, required shapetorch.Size([1, 256]), loaded shapetorch.Size([91, 256]). If you see this, your model does not fully load the pre-trained weight. Ps for your own dataset.
load class_embed: class_embed.2.weight shape=torch.Size([91, 256])
Skip loading parameter class_embed.2.bias, required shapetorch.Size([1]), loaded shapetorch.Size([91]). If you see this, your model does not fully load the pre-trained weight. Please make swn dataset.
load class_embed: class_embed.2.bias shape=torch.Size([91])
Skip loading parameter class_embed.3.weight, required shapetorch.Size([1, 256]), loaded shapetorch.Size([91, 256]). If you see this, your model does not fully load the pre-trained weight. Ps for your own dataset.
load class_embed: class_embed.3.weight shape=torch.Size([91, 256])
Skip loading parameter class_embed.3.bias, required shapetorch.Size([1]), loaded shapetorch.Size([91]). If you see this, your model does not fully load the pre-trained weight. Please make swn dataset.
load class_embed: class_embed.3.bias shape=torch.Size([91])
Skip loading parameter class_embed.4.weight, required shapetorch.Size([1, 256]), loaded shapetorch.Size([91, 256]). If you see this, your model does not fully load the pre-trained weight. Ps for your own dataset.
load class_embed: class_embed.4.weight shape=torch.Size([91, 256])
Skip loading parameter class_embed.4.bias, required shapetorch.Size([1]), loaded shapetorch.Size([91]). If you see this, your model does not fully load the pre-trained weight. Please make swn dataset.
load class_embed: class_embed.4.bias shape=torch.Size([91])
Skip loading parameter class_embed.5.weight, required shapetorch.Size([1, 256]), loaded shapetorch.Size([91, 256]). If you see this, your model does not fully load the pre-trained weight. Ps for your own dataset.
load class_embed: class_embed.5.weight shape=torch.Size([91, 256])
Skip loading parameter class_embed.5.bias, required shapetorch.Size([1]), loaded shapetorch.Size([91]). If you see this, your model does not fully load the pre-trained weight. Please make swn dataset.
load class_embed: class_embed.5.bias shape=torch.Size([91])
No param track_embed.self_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset
No param track_embed.self_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.self_attn.out_proj.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own datase
No param track_embed.self_attn.out_proj.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.linear1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.linear1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.linear2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.linear2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.linear_pos1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.linear_pos1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.linear_pos2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.linear_pos2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.norm_pos.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.norm_pos.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.linear_feat1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.linear_feat1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.linear_feat2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.linear_feat2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.norm_feat.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.norm_feat.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.norm1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.norm1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.norm2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.norm2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.norm3.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param track_embed.norm3.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own dataset.
No param transformer.decoder.layers.0.update_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes f
No param transformer.decoder.layers.0.update_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for
No param transformer.decoder.layers.0.update_attn.out_proj.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes
No param transformer.decoder.layers.0.update_attn.out_proj.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes fo
No param transformer.decoder.layers.0.norm4.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own da
No param transformer.decoder.layers.0.norm4.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own data
No param transformer.decoder.layers.1.update_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes f
No param transformer.decoder.layers.1.update_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for
No param transformer.decoder.layers.1.update_attn.out_proj.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes
No param transformer.decoder.layers.1.update_attn.out_proj.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes fo
No param transformer.decoder.layers.1.norm4.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own da
No param transformer.decoder.layers.1.norm4.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own data
No param transformer.decoder.layers.2.update_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes f
No param transformer.decoder.layers.2.update_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for
No param transformer.decoder.layers.2.update_attn.out_proj.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes
No param transformer.decoder.layers.2.update_attn.out_proj.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes fo
No param transformer.decoder.layers.2.norm4.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own da
No param transformer.decoder.layers.2.norm4.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own data
No param transformer.decoder.layers.3.update_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes f
No param transformer.decoder.layers.3.update_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for
No param transformer.decoder.layers.3.update_attn.out_proj.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes
No param transformer.decoder.layers.3.update_attn.out_proj.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes fo
No param transformer.decoder.layers.3.norm4.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own da
No param transformer.decoder.layers.3.norm4.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own data
No param transformer.decoder.layers.4.update_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes f
No param transformer.decoder.layers.4.update_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for
No param transformer.decoder.layers.4.update_attn.out_proj.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes
No param transformer.decoder.layers.4.update_attn.out_proj.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes fo
No param transformer.decoder.layers.4.norm4.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own da
No param transformer.decoder.layers.4.norm4.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own data
No param transformer.decoder.layers.5.update_attn.in_proj_weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes f
No param transformer.decoder.layers.5.update_attn.in_proj_bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for
No param transformer.decoder.layers.5.update_attn.out_proj.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes
No param transformer.decoder.layers.5.update_attn.out_proj.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes fo
No param transformer.decoder.layers.5.norm4.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own da
No param transformer.decoder.layers.5.norm4.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your own data
No param transformer.decoder.bbox_embed.0.layers.0.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.0.layers.0.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.0.layers.1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.0.layers.1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.0.layers.2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.0.layers.2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.1.layers.0.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.1.layers.0.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.1.layers.1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.1.layers.1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.1.layers.2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.1.layers.2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.2.layers.0.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.2.layers.0.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.2.layers.1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.2.layers.1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.2.layers.2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.2.layers.2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.3.layers.0.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.3.layers.0.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.3.layers.1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.3.layers.1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.3.layers.2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.3.layers.2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.4.layers.0.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.4.layers.0.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.4.layers.1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.4.layers.1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.4.layers.2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.4.layers.2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.5.layers.0.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.5.layers.0.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.5.layers.1.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.5.layers.1.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
No param transformer.decoder.bbox_embed.5.layers.2.weight.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your
No param transformer.decoder.bbox_embed.5.layers.2.bias.If you see this, your model does not fully load the pre-trained weight. Please make sure you set the correct --num_classes for your o
Start training

I tried all of the provided pretrained DETR models, but none of them worked. Can you help me?

  3. I deleted the line in train.sh that points to the pretrained DETR model and tried training from scratch. Training stopped after the 3rd epoch, and I don't see any log entry that explains why.
@zyayoung
Collaborator

  1. The pretrained model is for testing on the MOT17 test split.
  2. We use the pretrained Deformable DETR + iterative bounding box refinement weights from Deformable-DETR. You may try that weight. The "Skip loading parameter class_embed" messages are expected, since the number of classes has changed.
  3. We haven't yet run into the issue of training stopping without any error log. If no process was killed for running out of memory, you may try again and see whether the issue still occurs.
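
To illustrate why the class_embed messages in point 2 are harmless, here is a minimal sketch of shape-aware checkpoint loading. It is not MOTR's actual loading code: the function name `partition_checkpoint` is hypothetical, plain tuples stand in for tensor shapes, and the `backbone.example.weight` and `track_embed.linear1.weight` shapes are made up for illustration. Only the class_embed shapes are taken from the log above.

```python
# Hypothetical sketch: tuples stand in for tensor shapes; this is not MOTR's loader.
def partition_checkpoint(model_shapes, ckpt_shapes):
    """Split the model's parameter names into loadable, mismatched, and missing."""
    loadable, mismatched, missing = [], [], []
    for name, required in model_shapes.items():
        if name not in ckpt_shapes:
            missing.append(name)      # reported as "No param ..." in the log
        elif ckpt_shapes[name] != required:
            mismatched.append(name)   # reported as "Skip loading parameter ..."
        else:
            loadable.append(name)     # copied from the checkpoint as-is
    return loadable, mismatched, missing


# Shapes the tracking model expects (class_embed values come from the log;
# the other two entries are illustrative assumptions).
model_shapes = {
    "backbone.example.weight": (64, 3, 7, 7),
    "class_embed.0.weight": (1, 256),
    "class_embed.0.bias": (1,),
    "track_embed.linear1.weight": (1024, 256),
}
# Shapes found in a 91-class COCO detection checkpoint.
ckpt_shapes = {
    "backbone.example.weight": (64, 3, 7, 7),
    "class_embed.0.weight": (91, 256),
    "class_embed.0.bias": (91,),
}

loadable, mismatched, missing = partition_checkpoint(model_shapes, ckpt_shapes)
print("loaded:    ", loadable)
print("mismatched:", mismatched)
print("missing:   ", missing)
```

Under these assumptions, the class_embed entries land in the mismatched group because the checkpoint has 91 classes while the tracker has 1, and tracking-only modules such as track_embed land in the missing group because a detection checkpoint has no such parameters. Both groups are trained from their initial values.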

@Wincioor11
Author

Thanks for the quick answer :)
OK, now I understand 2). As for 3), it was probably just a killed process. I have now started training on 2 GPUs, and after 4 days I cannot see any new checkpoints (besides the first one) or results. Is there a better way to run a full training in the background and track its progress?

I don't understand 1). What does it mean that the model is for testing on the MOT17 test split? I used the r50_motr_eval.sh script. How can I evaluate it properly?
