Training time about ResNet101V1c #622

Closed
xiaoachen98 opened this issue Dec 29, 2021 · 12 comments
Labels
help wanted Extra attention is needed

Comments

@xiaoachen98

It is recommended to use the English General question template so that your question can help more people.

First confirm the following

  • I have searched the related issues, but did not find the help I need.
  • I have read the relevant documentation, but still do not know how to solve this.

Describe your problem

Hello!
I need to train, on ImageNet-1K, a backbone customized from ResNet101V1c: a residual output passed through a convolution is added at the second convolution of each bottleneck, for use in downstream segmentation tasks. I have two questions.
(screenshot attached: 微信图片_20211229121049)

  1. Our lab has eight GTX 3090s. Could you tell me how long training would take on 8 GPUs?
  2. Should I train entirely from random weights? Or should I load pre-trained weights for the main branch, randomly initialize the newly added residual module in each block, and use a zero-initialized norm layer at the end of the residual term to stabilize training? If I load pre-trained weights, should the learning rate of the new residual modules be the same as that of the main branch?
    Many thanks!

Related information

  1. Output of the command pip list | grep "mmcv\|mmcls\|^torch"
    [Fill in here]
  2. If you modified a config file or used a new one, please describe it here
    [Fill in here]
  3. If you encountered the problem during training, please provide the full training log and error messages
    [Fill in here]
  4. If you made other related modifications to the code under the mmcls folder, please describe them here
    [Fill in here]
@xiaoachen98 xiaoachen98 added the help wanted Extra attention is needed label Dec 29, 2021
@Ezra-Yu
Collaborator

Ezra-Yu commented Dec 29, 2021

  1. Many environmental factors affect training time, and we have not trained a model on a 3090. There is an experiment training ResNetV1D; refer to the log. It takes about 2 days (0.25 s per iter, 5000 iters per epoch, 100 epochs in total) using 8 V100 GPUs and Memcached.

  2. We always align with the original authors' initialization method; for details, refer to the code of ResNet, ResNeXt, and SEResNet. For CNN networks, there is no need to load pre-trained models obtained from ImageNet-1K. The initialization method you mentioned is the classic ResNet initialization. In my view, the lr should be set the same.
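The quoted timing figures can be sanity-checked with simple arithmetic (values taken from the reply above; the gap to the quoted ~2 days is validation, checkpointing, and data-loading overhead):

```python
# Back-of-the-envelope training-time estimate from the figures above.
sec_per_iter = 0.25      # reported time per iteration
iters_per_epoch = 5000   # reported iterations per epoch
epochs = 100             # total epochs

total_seconds = sec_per_iter * iters_per_epoch * epochs
total_hours = total_seconds / 3600
print(f"{total_hours:.1f} hours (~{total_hours / 24:.1f} days)")
# -> 34.7 hours (~1.4 days) of pure compute
```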
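The zero-init trick asked about above (zero-initializing the last BN of a residual branch so the branch starts as an identity mapping, as in the classic ResNet "zero-gamma" recipe) can be sketched in PyTorch; `ExtraResidualBranch` is an illustrative module, not mmcls code:

```python
import torch
import torch.nn as nn

class ExtraResidualBranch(nn.Module):
    """Illustrative extra residual branch: a conv followed by a BN whose
    scale (gamma) is zero-initialized, so the branch starts as identity."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        nn.init.zeros_(self.bn.weight)  # zero gamma: branch outputs 0 at start

    def forward(self, x):
        return x + self.bn(self.conv(x))

branch = ExtraResidualBranch(8)
x = torch.randn(2, 8, 16, 16)
out = branch(x)
# At initialization the branch contributes nothing, so out == x.
print(torch.allclose(out, x))
```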

@xiaoachen98
Author

Thank you very much for your detailed answer.
If I load ImageNet-1K pre-trained weights for the main branch, use zero-initialized BN for the custom residual branch, and then train directly on the downstream segmentation task, do you have any training suggestions?
This would make it easier for me to run ablation experiments verifying the effect of where the custom residual branch is inserted, without retraining on ImageNet every time.
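Loading pre-trained weights only into the matching main-branch parameters, while leaving the new residual branch at its own (zero-BN) initialization, is commonly done with `strict=False`; a minimal PyTorch sketch with toy stand-in modules (all names illustrative):

```python
import torch.nn as nn

# Toy stand-ins: a "pretrained" backbone and a customized model with an extra branch.
pretrained = nn.Sequential(nn.Conv2d(3, 8, 3))  # pretend these are ImageNet-1K weights

class CustomModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 8, 3))  # same shape as pretrained
        self.extra = nn.Conv2d(8, 8, 3)                    # newly added residual branch

model = CustomModel()
state = {f"backbone.{k}": v for k, v in pretrained.state_dict().items()}
# strict=False copies matching keys and reports what was skipped.
result = model.load_state_dict(state, strict=False)
print(result.missing_keys)  # params of the new branch, left at their initialization
```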

@Ezra-Yu
Collaborator

Ezra-Yu commented Dec 29, 2021

Train a small model first, to debug and check effectiveness. Once it works on a small model, train the large one.

Actually, if you want to demonstrate usefulness, you can skip ImageNet-1K pre-training entirely and train everything from scratch. If that proves effective, you can add the pre-training experiments at the end.

@xiaoachen98
Author

Hello, regarding the 8 V100 GPUs and Memcached mentioned in your earlier reply: is Memcached enabled by default?

@Ezra-Yu
Collaborator

Ezra-Yu commented Dec 30, 2021

It is a tool for accelerating data loading on clusters: https://memcached.org/

@xiaoachen98
Author

So a single machine with 8 GPUs should not need it?

@Ezra-Yu
Collaborator

Ezra-Yu commented Dec 30, 2021

Right, read speed on a single machine is already fast.

@xiaoachen98
Author

I am now using 2 GPUs with a batch_size of 128 per GPU. Is this OK for training?
However, I notice the estimated remaining training time gradually increases as training proceeds, rather than decreasing. Do you have any suggestions?
(screenshot attached)
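For reference, 2 GPUs at 128 images each gives the same effective batch size as the common 256-image ResNet baseline, so under the linear scaling rule the base learning rate would not need adjusting (a sketch; the 0.1 baseline lr is an assumed typical ResNet recipe, not from this thread):

```python
# Linear LR scaling rule: lr is scaled proportionally to the total batch size.
base_lr, base_batch = 0.1, 256   # assumed reference recipe
gpus, per_gpu_batch = 2, 128     # setup described above

total_batch = gpus * per_gpu_batch
scaled_lr = base_lr * total_batch / base_batch
print(total_batch, scaled_lr)  # -> 256 0.1, so no change needed here
```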

@Ezra-Yu
Collaborator

Ezra-Yu commented Dec 31, 2021

Other programs running on the machine make timing unstable. Normally the per-iteration time decreases during the first epochs; the CPU load may be too high.

@xiaoachen98
Author

Is there a training log and config for ResNet101v1c?
The pretrained dict is 'open-mmlab://resnet101_v1c'.
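In an OpenMMLab downstream config, that checkpoint string is typically referenced roughly like this (a sketch following the mmseg config convention; exact field names vary across versions):

```python
# Sketch of an OpenMMLab-style config fragment (not a complete config).
model = dict(
    pretrained='open-mmlab://resnet101_v1c',  # the checkpoint string from this thread
    backbone=dict(
        type='ResNetV1c',
        depth=101,
    ),
)
```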

@Ezra-Yu
Collaborator

Ezra-Yu commented Feb 15, 2022

We have provided resnet101_v1c in #692.

@Ezra-Yu
Collaborator

Ezra-Yu commented Mar 2, 2022

#692 has been merged.

@Ezra-Yu Ezra-Yu closed this as completed Mar 2, 2022