How to adjust parameters when training on a custom dataset with GeneralRecognitionV2_PPLCNetV2_base.yaml #3093

Open
yaphet266 opened this issue Feb 27, 2024 · 3 comments

@yaphet266

Is there any documentation that explains what each option in this YAML file means?

# global configs
Global:
  checkpoints: null
  pretrained_model: null
  output_dir: ./output
  device: gpu
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
  epochs: 100
  print_batch_step: 20
  use_visualdl: False
  eval_mode: retrieval
  retrieval_feature_from: features # 'backbone' or 'features'
  re_ranking: False
  use_dali: False
  # used for static mode and model export
  image_shape: [3, 224, 224]
  save_inference_dir: ./inference

AMP:
  scale_loss: 65536
  use_dynamic_loss_scaling: True
  # O1: mixed fp16
  level: O1

# model architecture
Arch:
  name: RecModel
  infer_output_key: features
  infer_add_softmax: False

  Backbone:
    name: PPLCNetV2_base_ShiTu
    pretrained: True
    use_ssld: True
    class_expand: &feat_dim 512
  BackboneStopLayer:
    name: flatten
  Neck:
    name: BNNeck
    num_features: *feat_dim
    weight_attr:
      initializer:
        name: Constant
        value: 1.0
    bias_attr:
      initializer:
        name: Constant
        value: 0.0
      learning_rate: 1.0e-20 # NOTE: Temporarily set lr small enough to freeze the bias to zero
  Head:
    name: FC
    embedding_size: *feat_dim
    class_num: 192612
    weight_attr:
      initializer:
        name: Normal
        std: 0.001
    bias_attr: False

# loss function config for training/eval process
Loss:
  Train:
    - CELoss:
        weight: 1.0
        epsilon: 0.1
    - TripletAngularMarginLoss:
        weight: 1.0
        feature_from: features
        margin: 0.5
        reduction: mean
        add_absolute: True
        absolute_loss_weight: 0.1
        normalize_feature: True
        ap_value: 0.8
        an_value: 0.4
  Eval:
    - CELoss:
        weight: 1.0

Optimizer:
  name: Momentum
  momentum: 0.9
  lr:
    name: Cosine
    learning_rate: 0.06 # for 8gpu x 256bs
    warmup_epoch: 5
  regularizer:
    name: L2
    coeff: 0.00001

# data loader for train and eval
DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/
      cls_label_path: ./dataset/train_reg_all_data_v2.txt
      relabel: True
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            size: [224, 224]
            return_numpy: False
            interpolation: bilinear
            backend: cv2
        - RandFlipImage:
            flip_code: 1
        - Pad:
            padding: 10
            backend: cv2
        - RandCropImageV2:
            size: [224, 224]
        - RandomRotation:
            prob: 0.5
            degrees: 90
            interpolation: bilinear
        - ResizeImage:
            size: [224, 224]
            return_numpy: False
            interpolation: bilinear
            backend: cv2
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: hwc
    sampler:
      name: PKSampler
      batch_size: 256
      sample_per_id: 4
      drop_last: False
      shuffle: True
      sample_method: "id_avg_prob"
      id_list: [50030, 80700, 92019, 96015] # be careful when set relabel=True
      ratio: [4, 4]
    loader:
      num_workers: 4
      use_shared_memory: True

  Eval:
    Query:
      dataset:
        name: VeriWild
        image_root: ./dataset/Aliproduct/
        cls_label_path: ./dataset/Aliproduct/val_list.txt
        transform_ops:
          - DecodeImage:
              to_rgb: True
              channel_first: False
          - ResizeImage:
              size: [224, 224]
              return_numpy: False
              interpolation: bilinear
              backend: cv2
          - NormalizeImage:
              scale: 1.0/255.0
              mean: [0.485, 0.456, 0.406]
              std: [0.229, 0.224, 0.225]
              order: hwc
      sampler:
        name: DistributedBatchSampler
        batch_size: 64
        drop_last: False
        shuffle: False
      loader:
        num_workers: 4
        use_shared_memory: True

    Gallery:
      dataset:
        name: VeriWild
        image_root: ./dataset/Aliproduct/
        cls_label_path: ./dataset/Aliproduct/val_list.txt
        transform_ops:
          - DecodeImage:
              to_rgb: True
              channel_first: False
          - ResizeImage:
              size: [224, 224]
              return_numpy: False
              interpolation: bilinear
              backend: cv2
          - NormalizeImage:
              scale: 1.0/255.0
              mean: [0.485, 0.456, 0.406]
              std: [0.229, 0.224, 0.225]
              order: hwc
      sampler:
        name: DistributedBatchSampler
        batch_size: 64
        drop_last: False
        shuffle: False
      loader:
        num_workers: 4
        use_shared_memory: True

Metric:
  Eval:
    - Recallk:
        topk: [1, 5]
    - mAP: {}

@TingquanGao
Collaborator

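After changing the dataset paths (image_root: ./dataset/ and cls_label_path: ./dataset/train_reg_all_data_v2.txt), try a training run first. Check whether the loss decreases, how training finally converges, and what accuracy it reaches, then adjust the learning rate as needed.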
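
The keys that this advice touches can be sketched as a config fragment. This is only an illustration under stated assumptions: the `./dataset/my_data/` paths, the class count of 1000, and the 1 GPU x 64 batch setup are hypothetical placeholders. The learning rate is derived with the common linear scaling heuristic from the file's own comment (`0.06` for 8 GPUs x 256), which is a starting point and not something this thread prescribes. All other keys would stay as in the original file.

```yaml
# Fragment, not a full config. Key names come from the config posted above;
# every value marked "hypothetical" is a placeholder to replace.
Arch:
  Head:
    class_num: 1000            # hypothetical: number of classes in your training list

Optimizer:
  lr:
    name: Cosine
    # Linear scaling assumption: 0.06 * (1 * 64) / (8 * 256) = 0.001875
    learning_rate: 0.001875
    warmup_epoch: 5

DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/my_data/                    # hypothetical path
      cls_label_path: ./dataset/my_data/train_list.txt  # hypothetical path
      relabel: True
    sampler:
      name: PKSampler
      batch_size: 64           # hypothetical batch size per GPU
      sample_per_id: 4
      drop_last: False
      shuffle: True
      # sample_method, id_list and ratio from the original are left out here on
      # the assumption that they refer to label IDs of the original training set
      # and do not carry over to a new dataset.
```

If the loss does not fall with this starting value, the learning rate is the first setting to revisit, as the reply suggests.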

@yaphet266
Author

@TingquanGao If I want to switch the backbone to ResNet50, what do I need to change? Is there a YAML file I can use as a reference?

@changdazhou
Collaborator

Just modify the Arch: section.
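
As a rough sketch of what such a change to `Arch:` might look like, the fragment below swaps the backbone name and widens the neck and head to 2048, the feature width ResNet50 produces after global pooling. It has not been checked against the PaddleClas code, so the backbone name `ResNet50`, the stop-layer name `flatten`, and the removal of `use_ssld` and `class_expand` (which may not apply to this backbone) are all assumptions to verify before training.

```yaml
# Sketch only: adapted from the Arch section posted above, not a verified config.
Arch:
  name: RecModel
  infer_output_key: features
  infer_add_softmax: False

  Backbone:
    name: ResNet50                  # assumption: backbone name as registered in PaddleClas
    pretrained: True
  BackboneStopLayer:
    name: flatten                   # assumption: ResNet50 exposes a layer with this name
  Neck:
    name: BNNeck
    num_features: &feat_dim 2048    # ResNet50's pooled feature width
    weight_attr:
      initializer:
        name: Constant
        value: 1.0
    bias_attr:
      initializer:
        name: Constant
        value: 0.0
      learning_rate: 1.0e-20        # kept from the original to freeze the bias at zero
  Head:
    name: FC
    embedding_size: *feat_dim       # must match the neck output width
    class_num: 192612               # set to the number of classes in your training list
    weight_attr:
      initializer:
        name: Normal
        std: 0.001
    bias_attr: False
```

The `&feat_dim` anchor moves from `class_expand` to `num_features` so that the neck and head stay in sync. ResNet50 is considerably heavier than PP-LCNetV2, so batch size and learning rate may need retuning as well.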


4 participants