nan in my custom dataset training #239

Closed
Mason1992-Git opened this issue Mar 14, 2022 · 13 comments
Comments

@Mason1992-Git

All values are nan during training.
Training set label format:
image
train_batch0
image
train_batch1
image
train_batch2
image
Result file hyp.yaml:
lr0: 0.001
lrf: 0.2
momentum: 0.937
weight_decay: 0.0005
warmup_epochs: 3.0
warmup_momentum: 0.8
warmup_bias_lr: 0.1
box: 0.05
cls: 0.5
cls_pw: 1.0
theta: 0.5
theta_pw: 1.0
obj: 1.0
obj_pw: 1.0
iou_t: 0.2
anchor_t: 4.0
fl_gamma: 0.0
hsv_h: 0.015
hsv_s: 0.7
hsv_v: 0.4
degrees: 180.0
translate: 0.1
scale: 0.5
shear: 0.0
perspective: 0.0
flipud: 0.5
fliplr: 0.5
mosaic: 0.95
mixup: 0.1
copy_paste: 0.0
cls_theta: 180
csl_radius: 2.0
Result file opt.yaml:
weights: weights\yolov5m.pt
cfg: ''
data: data\yolov5obb_demo.yaml
hyp: data\hyps\obb\hyp.finetune_dota.yaml
epochs: 10
batch_size: 1
imgsz: 1024
rect: false
resume: false
nosave: false
noval: false
noautoanchor: false
evolve: null
bucket: ''
cache: null
image_weights: false
device: '0'
multi_scale: false
single_cls: false
adam: false
sync_bn: false
workers: 8
project: runs\train
name: exp
exist_ok: false
quad: false
linear_lr: false
label_smoothing: 0.0
patience: 100
freeze:
- 0
save_period: -1
local_rank: -1
entity: null
upload_dataset: false
bbox_interval: -1
artifact_alias: latest
save_dir: runs\train\exp

@hukaixuan19970627
Owner

image
Please gather the files marked above and send screenshots of them, and describe the dataset you are using in more detail.

@Mason1992-Git
Author

Mason1992-Git commented Mar 15, 2022 via email

@hukaixuan19970627 hukaixuan19970627 changed the title from 训练一直全是nan ("Training is always all nan") to nan in my custom dataset training on Mar 15, 2022
@Mason1992-Git
Author

Mason1992-Git commented Mar 15, 2022

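image Please gather the files marked above and send screenshots of them, and describe the dataset you are using in more detail.
hyp.yaml screenshot
image
opt.yaml screenshot
image
results screenshot
image
train_batch0 screenshot
image
train_batch1 screenshot
image
train_batch2 screenshot
image
val_batch0_labels screenshot
image
val_batch0_pred screenshot
image
labels.jpg was not generated.

My dataset:
image
images:
image
txt:
image
image
The dataset contains 50 images in total. I picked only 50 because the goal was just to test whether the code runs on a different scenario.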

@hukaixuan19970627
Owner

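  1. The Baidu Netdisk share provides the DOTAv1.5 training weights and all the training parameter files, so do not change the parameters yourself unless you know exactly what each one means and how it affects training.
  2. The class name in the label format file is a string.
  3. Files missing after training mean something went wrong during training; check for that.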
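
Point 2 can be checked mechanically before training. The sketch below assumes a DOTA-style text layout with eight polygon coordinates, a class name, and a difficulty flag on each line. That layout and the helper name check_label_line are assumptions for illustration, not code from this repository.

```python
def _is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_label_line(line):
    """Return a list of problems found in one label line (empty list = looks fine).

    Assumed layout (DOTA style): x1 y1 x2 y2 x3 y3 x4 y4 class_name difficulty
    """
    fields = line.split()
    if len(fields) != 10:
        return [f"expected 10 fields, got {len(fields)}"]

    problems = []
    # The first eight fields are the polygon corner coordinates.
    for value in fields[:8]:
        if not _is_number(value):
            problems.append(f"coordinate {value!r} is not a number")
    # The class name should be a string such as "plane", not a numeric id.
    if _is_number(fields[8]):
        problems.append(f"class name {fields[8]!r} looks like a numeric id")
    # The difficulty flag is expected to be a non-negative integer.
    if not fields[9].isdigit():
        problems.append(f"difficulty {fields[9]!r} is not an integer")
    return problems


if __name__ == "__main__":
    print(check_label_line("10 10 50 10 50 40 10 40 plane 0"))
    print(check_label_line("10 10 50 10 50 40 10 40 0 0"))
```

Running it over every line of every txt file in the label folder would show quickly whether numeric class ids slipped into the annotations.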

@wjmicheal

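I used the dataset from the demo (P0032), and the splitting step worked fine, but training gives all nan in the same way. I don't know the cause, and I suspect something went wrong when compiling nms_rotated. While compiling nms_rotated, poly_nms_cuda.cu raised this error: identifier 'eps' is undefined in device code. Since 'eps' is defined in that file as a constant, I replaced every use of eps with the constant value directly, and the compilation then succeeded.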

@wjmicheal

I traced train.py to line 324 and added print(pred) to print pred. It turns out pred is all nan.
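
A count is often easier to read than a raw print(pred). The helper below is a plain-Python sketch of that idea. The name nan_fraction is hypothetical, and it assumes the values were first converted to nested lists (for a PyTorch tensor, pred.tolist() does this).

```python
import math


def nan_fraction(values):
    """Return the fraction of NaN entries in an arbitrarily nested list of numbers."""
    total = 0
    nans = 0
    stack = [values]
    while stack:
        item = stack.pop()
        if isinstance(item, (list, tuple)):
            # Descend into nested containers instead of recursing.
            stack.extend(item)
        else:
            total += 1
            if isinstance(item, float) and math.isnan(item):
                nans += 1
    # An empty input has no NaN entries by definition here.
    return nans / total if total else 0.0


if __name__ == "__main__":
    sample = [[1.0, float("nan")], [float("nan"), float("nan")]]
    print(f"{nan_fraction(sample):.0%} of the values are NaN")
```

A fraction of 1.0 on the very first batch points at the forward pass or the environment rather than at a loss that diverged gradually.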

@hukaixuan19970627
Owner

#224

@hukaixuan19970627
Owner

If you are on Windows + CUDA 11, you can refer to this issue: ultralytics/yolov5#5815

@Mason1992-Git
Author

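If you are on Windows + CUDA 11, you can refer to this issue: ultralytics/yolov5#5815

Thanks to the author! The problem is solved. It was a GPU and PyTorch version issue. After switching to a 30-series card, training works.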
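
Since the cause turned out to be the GPU and PyTorch version combination, it helps to state those versions when reporting a similar problem. The sketch below collects them. The function name environment_report is hypothetical, and it returns empty fields if PyTorch is not installed.

```python
def environment_report():
    """Collect the PyTorch version, its CUDA build version, and the GPU name."""
    report = {"torch": None, "cuda_build": None, "gpu": None}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is nothing more to report.
        return report

    report["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    report["cuda_build"] = torch.version.cuda
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```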

@LUO77123

image
image
image
image
image
image

@Mason1992-Git
Author

Mason1992-Git commented May 13, 2022 via email

@LUO77123

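image Please gather the files marked above and send screenshots of them, and describe the dataset you are using in more detail.

The dataset is just DATA, split into 1024*1024 tiles with gap=512.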
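
To make the split settings concrete, the sketch below computes where tiles start along one image axis. It assumes that gap means the overlap between neighbouring tiles (so the stride is 1024 - 512 = 512) and that the last tile is shifted back to end at the image border. This is a common convention in DOTA-style splitting tools, not something confirmed in this thread. The name tile_starts is hypothetical.

```python
def tile_starts(length, subsize=1024, gap=512):
    """Return the start offsets of tiles along one axis of an image.

    Assumption: gap is the overlap between adjacent tiles, so the stride is
    subsize - gap, and the final tile is aligned to the image border.
    """
    if not 0 <= gap < subsize:
        raise ValueError("gap must satisfy 0 <= gap < subsize")
    slide = subsize - gap
    starts = []
    pos = 0
    while True:
        if pos + subsize >= length:
            # Last tile: pull it back so it ends at the border (or start at 0
            # when the image is smaller than one tile).
            starts.append(max(length - subsize, 0))
            break
        starts.append(pos)
        pos += slide
    return starts


if __name__ == "__main__":
    print(tile_starts(2048))
    print(tile_starts(1500))
```

Applying the same function to the width and the height gives the full grid of tile origins for an image.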

@Mason1992-Git
Author

Mason1992-Git commented Oct 11, 2022 via email


4 participants