[Feature] add DPT head #605

Merged
48 commits, merged on Aug 30, 2021

Commits (48)
46dce78
add DPT head
Jun 17, 2021
7b80fd0
[fix] fix init error
Jun 17, 2021
01b3da2
use mmcv function
Jun 18, 2021
e9df435
delete code
Jun 19, 2021
93635c0
merge upstream
Jun 19, 2021
b21ea15
remove transpose class
Jun 19, 2021
2efb2eb
support NLC output shape
Jun 19, 2021
685644a
Merge branch 'add_vit_output_type' into dpt
Jun 19, 2021
5f877e1
Delete post_process_layer.py
Jun 22, 2021
5ce02d3
add unittest and docstring
Jun 22, 2021
7f7e4a4
Merge branch 'dpt' of https://github.com/xiexinch/mmsegmentation into…
Jun 22, 2021
de5b3a2
merge conflict
Jun 22, 2021
adbfb60
merge upstream master
Jul 5, 2021
31c42bd
rename variables
Jul 5, 2021
bf900b6
fix project error and add unittest
Jul 5, 2021
716863b
match dpt weights
Jul 6, 2021
94bf935
add configs
Jul 6, 2021
d4cd924
fix vit pos_embed bug and dpt feature fusion bug
Jul 7, 2021
ded2834
merge master
Jul 20, 2021
f147aa9
match vit output
Jul 20, 2021
0e4fb4f
fix gelu
Jul 20, 2021
6073dfa
minor change
Jul 20, 2021
1ebb558
update unittest
Jul 20, 2021
b3903ca
fix configs error
Jul 20, 2021
ef87aa5
inference test
Jul 22, 2021
9669d54
remove auxiliary
Jul 22, 2021
0363746
use local pretrain
Jul 29, 2021
e1ecf6a
update training results
Aug 11, 2021
0126c24
Merge branch 'master' of https://github.com/open-mmlab/mmsegmentation…
Aug 11, 2021
7726d2b
update yml
Aug 11, 2021
c5593af
update fps and memory test
Aug 12, 2021
30aabc4
update doc
Aug 19, 2021
64e6f64
update readme
Aug 19, 2021
b749507
merge master
Aug 19, 2021
96ce175
add yml
Aug 19, 2021
fa61339
update doc
Aug 19, 2021
55bcd74
remove with_cp
Aug 19, 2021
4b33f6f
update config
Aug 19, 2021
76344cd
update docstring
Aug 19, 2021
94fb8d4
remove dpt-l
Aug 25, 2021
5e56d1b
add init_cfg and modify readme.md
Aug 25, 2021
f4ad2fa
Update dpt_vit-b16.py
Junjun2016 Aug 25, 2021
161d494
zh-n README
Aug 25, 2021
6b506ba
Merge branch 'dpt' of github.com:xiexinch/mmsegmentation into dpt
Aug 25, 2021
dca6387
solve conflict
Aug 30, 2021
a41ce05
use constructor instead of build function
Aug 30, 2021
78b56b1
prevent tensor being modified by ConvModule
Aug 30, 2021
522cdff
fix unittest
Aug 30, 2021
1 change: 1 addition & 0 deletions README.md
@@ -92,6 +92,7 @@ Supported methods:
- [x] [PointRend (CVPR'2020)](configs/point_rend)
- [x] [CGNet (TIP'2020)](configs/cgnet)
- [x] [SETR (CVPR'2021)](configs/setr)
- [x] [DPT (ArXiv'2021)](configs/dpt)

## Installation

20 changes: 14 additions & 6 deletions configs/dpt/README.md
@@ -20,15 +20,23 @@
}
```

## How to use ViT pretrained weights
## Usage

We convert the backbone weights from the pytorch-image-models repository (https://github.com/rwightman/pytorch-image-models) with `tools/model_converters/vit_convert.py`.
To use pre-trained models from other repositories, you first need to convert their keys to the MMSegmentation format.

You may follow the steps below to prepare for DPT training:
We provide a script [`vit2mmseg.py`](../../tools/model_converters/vit2mmseg.py) in the tools directory to convert the keys of models from [timm](https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py) to MMSegmentation style.

1. Download the ViT pretrained weights (we suggest putting them in `pretrain/`);
2. Run the convert script on the official pretrained weights: `python tools/model_converters/vit_convert.py pretrain/vit-timm.pth pretrain/vit-mmseg.pth`;
3. Set `pretrained` in the VisionTransformer model config; for example, in `dpt_vit-b16.py`, set `pretrained` to `pretrain/vit-mmseg.pth`.
```shell
python tools/model_converters/vit2mmseg.py ${PRETRAIN_PATH} ${STORE_PATH}
```

E.g.

```shell
python tools/model_converters/vit2mmseg.py https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_base_p16_224-80ecf9dd.pth pretrain/jx_vit_base_p16_224-80ecf9dd.pth
```

This script converts the model from `PRETRAIN_PATH` and stores the converted model in `STORE_PATH`.
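
After conversion, point the config's `pretrained` field at the converted checkpoint. The snippet below is a minimal, illustrative sketch only; the exact layout of `configs/dpt/dpt_vit-b16.py` may differ, and the path and type names shown are assumptions based on the steps above.

```python
# Illustrative excerpt of a DPT config: load the converted ViT weights.
model = dict(
    type='EncoderDecoder',
    pretrained='pretrain/jx_vit_base_p16_224-80ecf9dd.pth',  # produced by vit2mmseg.py
    backbone=dict(type='VisionTransformer'),
    decode_head=dict(type='DPTHead'),
)
```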

## Results and models

12 changes: 8 additions & 4 deletions mmseg/models/decode_heads/dpt_head.py
@@ -21,14 +21,16 @@ class ReassembleBlocks(BaseModule):
Default: [96, 192, 384, 768].
readout_type (str): Type of readout operation. Default: 'ignore'.
patch_size (int): The patch size. Default: 16.
init_cfg (dict, optional): Initialization config dict. Default: None.
"""

def __init__(self,
in_channels=768,
out_channels=[96, 192, 384, 768],
readout_type='ignore',
patch_size=16):
super(ReassembleBlocks, self).__init__()
patch_size=16,
init_cfg=None):
super(ReassembleBlocks, self).__init__(init_cfg)

assert readout_type in ['ignore', 'add', 'project']
self.readout_type = readout_type
@@ -170,15 +172,17 @@ class FeatureFusionBlock(BaseModule):
Default: False.
align_corners (bool): align_corner setting for bilinear upsample.
Default: True.
init_cfg (dict, optional): Initialization config dict. Default: None.
"""

def __init__(self,
in_channels,
act_cfg,
norm_cfg,
expand=False,
align_corners=True):
super(FeatureFusionBlock, self).__init__()
align_corners=True,
init_cfg=None):
super(FeatureFusionBlock, self).__init__(init_cfg)

self.in_channels = in_channels
self.expand = expand
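
For reference, a minimal usage sketch of the two blocks with the constructor signatures shown in this diff; the argument values are illustrative, and the import path assumes both classes remain module-level in `mmseg/models/decode_heads/dpt_head.py`.

```python
from mmseg.models.decode_heads.dpt_head import FeatureFusionBlock, ReassembleBlocks

# ReassembleBlocks now forwards `init_cfg` to BaseModule instead of dropping it.
reassemble = ReassembleBlocks(
    in_channels=768,
    out_channels=[96, 192, 384, 768],
    readout_type='ignore',
    patch_size=16,
    init_cfg=None,  # argument added in this change
)

# FeatureFusionBlock gets the same treatment; act_cfg/norm_cfg values are illustrative.
fusion = FeatureFusionBlock(
    in_channels=256,
    act_cfg=dict(type='ReLU'),
    norm_cfg=dict(type='BN'),
    expand=False,
    align_corners=True,
    init_cfg=None,  # argument added in this change
)
```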