Training time about ResNet101V1c #622

Closed
xiaoachen98 opened this issue Dec 29, 2021 · 12 comments
Labels
help wanted Extra attention is needed

Comments

@xiaoachen98

It is recommended to use the English General question template so that your question can help more people.

First confirm the following

  • I have searched the related issues, but did not find the help I need.
  • I have read the relevant documentation, but still do not know how to solve this.

Describe your problem

Hello!
I need to train, on ImageNet-1K, a backbone customized from ResNet101V1c: a residual output passed through a convolution is added at the second convolution of each bottleneck, for use in downstream segmentation tasks. I have two questions.
(screenshot attached: 微信图片_20211229121049)

  1. Our lab has eight GTX 3090s. Could you tell me how long training would take on 8 GPUs?
  2. Should I train entirely from random weights? Or should I load pre-trained weights for the main branch, randomly initialize the newly added residual module in each block, and use a zero-initialized norm layer at the end of the residual term to stabilize training? If I load pre-trained weights, should the learning rate of the new residual modules be the same as that of the main branch?
    Many thanks!

Related information

  1. Output of the command pip list | grep "mmcv\|mmcls\|^torch"
    [Fill in here]
  2. If you modified a config file or used a new one, please describe it here
    [Fill in here]
  3. If you encountered the problem during training, please provide the full training log and error messages
    [Fill in here]
  4. If you made other related modifications to the code under the mmcls folder, please describe them here
    [Fill in here]
@xiaoachen98 xiaoachen98 added the help wanted Extra attention is needed label Dec 29, 2021
@Ezra-Yu
Collaborator

Ezra-Yu commented Dec 29, 2021

  1. Many environmental factors affect training time, and we have not trained a model on a 3090. There is an experiment training ResNetV1D; refer to the log. It takes about 2 days (0.25 s per iter, 5000 iters per epoch, 100 epochs in total) using 8 V100 GPUs and Memcached.

  2. We always align with the original authors' initialization method; for details, refer to the code of ResNet, ResNeXt, and SEResNet. For CNN networks, there is no need to load pre-trained models obtained from ImageNet-1K. The initialization method you mentioned is the classic ResNet initialization. In my view, the lr should be set the same.
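The quoted timing figures can be sanity-checked with simple arithmetic (values taken from the reply above; the gap to the quoted ~2 days is validation, checkpointing, and data-loading overhead):

```python
# Back-of-the-envelope training-time estimate from the figures above.
sec_per_iter = 0.25      # reported time per iteration
iters_per_epoch = 5000   # reported iterations per epoch
epochs = 100             # total epochs

total_seconds = sec_per_iter * iters_per_epoch * epochs
total_hours = total_seconds / 3600
print(f"{total_hours:.1f} hours (~{total_hours / 24:.1f} days)")
# -> 34.7 hours (~1.4 days) of pure compute
```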
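The zero-init trick asked about above (zero-initializing the last BN of a residual branch so the branch starts as an identity mapping, as in the classic ResNet "zero-gamma" recipe) can be sketched in PyTorch; `ExtraResidualBranch` is an illustrative module, not mmcls code:

```python
import torch
import torch.nn as nn

class ExtraResidualBranch(nn.Module):
    """Illustrative extra residual branch: a conv followed by a BN whose
    scale (gamma) is zero-initialized, so the branch starts as identity."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        nn.init.zeros_(self.bn.weight)  # zero gamma: branch outputs 0 at start

    def forward(self, x):
        return x + self.bn(self.conv(x))

branch = ExtraResidualBranch(8)
x = torch.randn(2, 8, 16, 16)
out = branch(x)
# At initialization the branch contributes nothing, so out == x.
print(torch.allclose(out, x))
```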

@xiaoachen98
Author

Thank you very much for your detailed answer.
If I load ImageNet-1K pre-trained weights for the main branch, use zero-initialized BN for the custom residual branch, and then train directly on the downstream segmentation task, do you have any training suggestions?
This would make it easier for me to run ablation experiments verifying the effect of where the custom residual branch is inserted, without retraining on ImageNet every time.
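Loading pre-trained weights only into the matching main-branch parameters, while leaving the new residual branch at its own (zero-BN) initialization, is commonly done with `strict=False`; a minimal PyTorch sketch with toy stand-in modules (all names illustrative):

```python
import torch.nn as nn

# Toy stand-ins: a "pretrained" backbone and a customized model with an extra branch.
pretrained = nn.Sequential(nn.Conv2d(3, 8, 3))  # pretend these are ImageNet-1K weights

class CustomModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 8, 3))  # same shape as pretrained
        self.extra = nn.Conv2d(8, 8, 3)                    # newly added residual branch

model = CustomModel()
state = {f"backbone.{k}": v for k, v in pretrained.state_dict().items()}
# strict=False copies matching keys and reports what was skipped.
result = model.load_state_dict(state, strict=False)
print(result.missing_keys)  # params of the new branch, left at their initialization
```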

@Ezra-Yu
Collaborator

Ezra-Yu commented Dec 29, 2021

Train a small model first, to debug and check effectiveness. Once it works on a small model, train the large one.

Actually, if you want to demonstrate usefulness, you can skip ImageNet-1K pre-training entirely and train everything from scratch. If that proves effective, you can add the pre-training experiments at the end.

@xiaoachen98
Author

Hello, regarding the 8 V100 GPUs and Memcached mentioned in your earlier reply: is Memcached enabled by default?

@Ezra-Yu
Collaborator

Ezra-Yu commented Dec 30, 2021

It is a tool for accelerating data loading on clusters: https://memcached.org/

@xiaoachen98
Author

So a single machine with 8 GPUs should not need it?

@Ezra-Yu
Collaborator

Ezra-Yu commented Dec 30, 2021

Right, read speed on a single machine is already fast.

@xiaoachen98
Author

I am now using 2 GPUs with a batch_size of 128 per GPU. Is this OK for training?
However, I notice the estimated remaining training time gradually increases as training proceeds, rather than decreasing. Do you have any suggestions?
(screenshot attached)
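For reference, 2 GPUs at 128 images each gives the same effective batch size as the common 256-image ResNet baseline, so under the linear scaling rule the base learning rate would not need adjusting (a sketch; the 0.1 baseline lr is an assumed typical ResNet recipe, not from this thread):

```python
# Linear LR scaling rule: lr is scaled proportionally to the total batch size.
base_lr, base_batch = 0.1, 256   # assumed reference recipe
gpus, per_gpu_batch = 2, 128     # setup described above

total_batch = gpus * per_gpu_batch
scaled_lr = base_lr * total_batch / base_batch
print(total_batch, scaled_lr)  # -> 256 0.1, so no change needed here
```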

@Ezra-Yu
Collaborator

Ezra-Yu commented Dec 31, 2021

Other programs running on the machine make timing unstable. Normally the per-iteration time decreases during the first epochs; the CPU load may be too high.

@xiaoachen98
Author

Is there a training log and config for ResNet101v1c?
The pretrained dict is 'open-mmlab://resnet101_v1c'.
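In an OpenMMLab downstream config, that checkpoint string is typically referenced roughly like this (a sketch following the mmseg config convention; exact field names vary across versions):

```python
# Sketch of an OpenMMLab-style config fragment (not a complete config).
model = dict(
    pretrained='open-mmlab://resnet101_v1c',  # the checkpoint string from this thread
    backbone=dict(
        type='ResNetV1c',
        depth=101,
    ),
)
```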

@Ezra-Yu
Collaborator

Ezra-Yu commented Feb 15, 2022

We have provided resnet101_v1c in #692.

@Ezra-Yu
Collaborator

Ezra-Yu commented Mar 2, 2022

#692 has been merged.

@Ezra-Yu Ezra-Yu closed this as completed Mar 2, 2022