[Docs] Enhance docs (#2555)

open-mmlab · Jul 20, 2023 · 5a9487d
1 parent f0311df
commit 5a9487d

Showing 6 changed files with 499 additions and 44 deletions.
diff --git a/docs/en/installation.md b/docs/en/installation.md
@@ -68,6 +68,15 @@ Note that some of the demo scripts in MMPose require [MMDetection](https://githu
 mim install "mmdet>=3.1.0"
 ```
 
+```{note}
+Here are the version correspondences between mmdet, mmpose and mmcv:
+
+- mmdet 2.x <=> mmpose 0.x <=> mmcv 1.x
+- mmdet 3.x <=> mmpose 1.x <=> mmcv 2.x
+
+If you encounter version incompatibility issues, please check the correspondence using `pip list | grep mm` and upgrade or downgrade the dependencies accordingly.
+```
+
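The correspondence table in the note above can be checked programmatically. The following is a minimal sketch, not part of MMPose: the names `COMPATIBLE_MAJORS`, `major` and `is_compatible` are hypothetical, and the check only compares major versions, exactly as the note lists them.

```python
# Major-version pairings taken from the note above: (mmdet, mmpose, mmcv).
# These names are illustrative helpers, not part of any OpenMMLab package.
COMPATIBLE_MAJORS = {(2, 0, 1), (3, 1, 2)}


def major(version: str) -> int:
    """Return the leading integer of a version string such as '3.1.0'."""
    return int(version.split('.')[0])


def is_compatible(mmdet: str, mmpose: str, mmcv: str) -> bool:
    """Check whether the three major versions form a documented pairing."""
    return (major(mmdet), major(mmpose), major(mmcv)) in COMPATIBLE_MAJORS


print(is_compatible('3.1.0', '1.1.0', '2.0.1'))
print(is_compatible('3.1.0', '0.29.0', '1.7.1'))
```

The version strings passed in would come from the output of `pip list | grep mm`.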
 ## Best Practices
 
 ### Build MMPose from source
@@ -141,6 +150,15 @@ The `demo.jpg` can be downloaded from [Github](https://raw.githubusercontent.com
 
 The inference results will be a list of `PoseDataSample`, and the predictions are in the `pred_instances`, indicating the detected keypoint locations and scores.
 
+```{note}
+The MMCV version must strictly match the PyTorch version. If you encounter either of the following errors:
+
+- No module named 'mmcv.ops'
+- No module named 'mmcv._ext'
+
+It means that the current PyTorch build does not match the CUDA version. Check the CUDA version with `nvidia-smi`; it should match the `+cu1xx` suffix of the PyTorch version shown by `pip list | grep torch`. If it does not, uninstall PyTorch and reinstall it, then reinstall MMCV (this installation order **cannot** be swapped).
+```
+
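To compare the `+cu1xx` suffix mentioned in the note with the CUDA version, it can help to turn the suffix into a dotted version number. This is a minimal sketch under the assumption that the suffix encodes the minor version in its last digit (for example `cu117` for CUDA 11.7); `cuda_tag_to_version` is a hypothetical helper, not a PyTorch or MMCV function.

```python
import re
from typing import Optional


def cuda_tag_to_version(torch_version: str) -> Optional[str]:
    """Extract the CUDA version from a PyTorch version such as '2.0.1+cu117'.

    Returns None for builds without a '+cuXXX' suffix (e.g. CPU-only wheels).
    """
    match = re.search(r'\+cu(\d+)$', torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumption: the last digit is the minor version, 'cu117' -> '11.7'.
    return f'{digits[:-1]}.{digits[-1]}'


print(cuda_tag_to_version('2.0.1+cu117'))
print(cuda_tag_to_version('2.0.1+cpu'))
```

The input string is the version that `pip list | grep torch` prints for the `torch` package.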
 ## Customize Installation
 
 ### CUDA versions

diff --git a/docs/en/user_guides/configs.md b/docs/en/user_guides/configs.md
@@ -114,11 +114,22 @@ Here is the description of General configuration:
 # General
 default_scope = 'mmpose'
 default_hooks = dict(
-    timer=dict(type='IterTimerHook'), # time the data processing and model inference
-    logger=dict(type='LoggerHook', interval=50), # interval to print logs
-    param_scheduler=dict(type='ParamSchedulerHook'), # update lr
+    # time the data processing and model inference
+    timer=dict(type='IterTimerHook'),
+    # interval to print logs, 50 iters by default
+    logger=dict(type='LoggerHook', interval=50),
+    # update lr according to the lr scheduler
+    param_scheduler=dict(type='ParamSchedulerHook'),
     checkpoint=dict(
-        type='CheckpointHook', interval=1, save_best='coco/AP', # interval to save ckpt
+        # interval to save ckpt and the metric used to select the best ckpt,
+        # e.g.
+        # save_best='coco/AP' saves the best ckpt according to coco/AP of CocoMetric
+        # save_best='PCK' saves the best ckpt according to PCK of PCKAccuracy
+        type='CheckpointHook', interval=1, save_best='coco/AP',
+
+        # rule to judge the metric
+        # 'greater' means the larger the better
+        # 'less' means the smaller the better
         rule='greater'), # rule to judge the metric
     sampler_seed=dict(type='DistSamplerSeedHook')) # set the distributed seed
 env_cfg = dict(
@@ -135,23 +146,16 @@ log_processor = dict( # Format, interval to log
 log_level = 'INFO' # The level of logging
 ```
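
As an illustration of the `CheckpointHook` options, the fragment below is a sketch of an override that tracks `PCK` instead of `coco/AP`. It reuses the one-line example from the earlier version of this page, including `max_keep_ckpts` (the maximum number of checkpoints to keep; `-1`, the default, means unlimited). Only the `checkpoint` entry is shown; the other hooks are assumed to come from the inherited base config.

```python
# Sketch of a config override: keep the best checkpoint according to PCK
# (reported by PCKAccuracy) and retain at most one regular checkpoint.
default_hooks = dict(
    checkpoint=dict(
        type='CheckpointHook',
        interval=1,          # save a checkpoint every epoch
        save_best='PCK',     # metric used to select the best checkpoint
        rule='greater',      # a larger PCK is better
        max_keep_ckpts=1))   # -1 (the default) would keep all checkpoints
```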
 
+```{note}
+Two visualizer backends are currently supported: `LocalVisBackend` for local visualization and `TensorboardVisBackend` for TensorBoard visualization. Choose whichever suits your needs. See [Train and Test](./train_and_test.md) for details.
+```
+
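The two backends named in the note can also be listed together. The fragment below is a sketch of how that might look in a config, assuming the `PoseLocalVisualizer` settings from the general configuration shown earlier; whether both backends are wanted at once depends on your setup.

```python
# Sketch: write visualization results locally and to TensorBoard.
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='PoseLocalVisualizer',
    vis_backends=vis_backends,
    name='visualizer')
```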
 General configuration is stored alone in the `$MMPOSE/configs/_base_`, and inherited by doing:
 
 ```Python
 _base_ = ['../../../_base_/default_runtime.py'] # take the config file as the starting point of the relative path
 ```
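
Because the `_base_` path is relative to the config file itself, it can be resolved by hand to confirm that it points into `configs/_base_`. The sketch below uses a hypothetical config location (`my_config.py` and its directory are chosen only for illustration) and plain path arithmetic, not the MMEngine config loader.

```python
import posixpath

# Hypothetical config location, chosen only for illustration.
config_file = 'configs/body_2d_keypoint/topdown_heatmap/coco/my_config.py'
base_entry = '../../../_base_/default_runtime.py'

# Resolve the _base_ entry against the directory of the config file.
resolved = posixpath.normpath(
    posixpath.join(posixpath.dirname(config_file), base_entry))
print(resolved)  # configs/_base_/default_runtime.py
```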
 
-```{note}
-CheckpointHook:
-
-- save_best: `'coco/AP'` for `CocoMetric`, `'PCK'` for `PCKAccuracy`
-- max_keep_ckpts: the maximum checkpoints to keep. Defaults to -1, which means unlimited.
-
-Example:
-
-`default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater', max_keep_ckpts=1))`
-```
-
 ### Data
 
 Data configuration refers to the data processing related settings, mainly including:

diff --git a/docs/en/user_guides/train_and_test.md b/docs/en/user_guides/train_and_test.md
@@ -14,7 +14,6 @@ python tools/train.py ${CONFIG_FILE} [ARGS]
 
 ```{note}
 By default, MMPose prefers GPU to CPU. If you want to train a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU invisible to the program.
-
 ```
 
 ```shell
@@ -214,6 +213,31 @@ python ./tools/train.py \
 
 - `randomness.deterministic=True`, set the deterministic option for `cuDNN` backend, i.e., set `torch.backends.cudnn.deterministic` to `True` and `torch.backends.cudnn.benchmark` to `False`. Defaults to `False`. See [Pytorch Randomness](https://pytorch.org/docs/stable/notes/randomness.html) for more details.
 
+## Training Log
+
+During training, the training log will be printed in the console as follows:
+
+```shell
+07/14 08:26:50 - mmengine - INFO - Epoch(train) [38][ 6/38]  base_lr: 5.148343e-04 lr: 5.148343e-04  eta: 0:15:34  time: 0.540754  data_time: 0.394292  memory: 3141  loss: 0.006220  loss_kpt: 0.006220  acc_pose: 1.000000
+```
+
+The training log contains the following information:
+
+- `07/14 08:26:50`: The current time.
+- `mmengine`: The name of the logger.
+- `INFO` or `WARNING`: The log level.
+- `Epoch(train)`: The current stage. `train` means the training stage, `val` means the validation stage.
+- `[38][ 6/38]`: The current epoch (38) and the current iteration within that epoch (6 of 38).
+- `base_lr`: The base learning rate.
+- `lr`: The current (real) learning rate.
+- `eta`: The estimated remaining training time.
+- `time`: The time (in seconds) taken by one iteration.
+- `data_time`: The time (in seconds) spent on data processing (I/O and transforms).
+- `memory`: The GPU memory (MB) allocated by the program.
+- `loss`: The total loss value of the current iteration.
+- `loss_kpt`: The loss value returned by the head module.
+- `acc_pose`: The accuracy value returned by the head module.
+
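The fields listed above can be pulled out of a log line with a regular expression, for example to plot the loss outside of TensorBoard. This is a minimal sketch that assumes the exact layout of the sample line above; `parse_train_log` is a hypothetical helper, not an MMEngine API, and other log formats (iteration-based training, validation lines) would need a different pattern.

```python
import re

# The sample line from the section above.
LOG_LINE = (
    '07/14 08:26:50 - mmengine - INFO - Epoch(train) [38][ 6/38]  '
    'base_lr: 5.148343e-04 lr: 5.148343e-04  eta: 0:15:34  '
    'time: 0.540754  data_time: 0.394292  memory: 3141  '
    'loss: 0.006220  loss_kpt: 0.006220  acc_pose: 1.000000')


def parse_train_log(line: str) -> dict:
    """Split one training log line into header fields and metrics."""
    header = re.search(
        r'- (\w+) - (\w+) - Epoch\((\w+)\)\s*\[(\d+)\]\[\s*(\d+)/(\d+)\]',
        line)
    if header is None:
        raise ValueError('not an epoch-based training log line')
    logger, level, stage, epoch, cur_iter, total_iters = header.groups()
    # Remaining 'key: value' pairs are numeric, except 'eta' (h:mm:ss),
    # which is kept as a string.
    metrics = {}
    for key, value in re.findall(r'(\w+): (\S+)', line[header.end():]):
        metrics[key] = value if key == 'eta' else float(value)
    return dict(logger=logger, level=level, stage=stage, epoch=int(epoch),
                iter=int(cur_iter), total_iters=int(total_iters), **metrics)


record = parse_train_log(LOG_LINE)
print(record['epoch'], record['iter'], record['loss'])
```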
 ## Visualize training process
 
 Monitoring the training process is essential for understanding the performance of your model and making necessary adjustments. In this section, we will introduce two methods to visualize the training process of your MMPose model: TensorBoard and the MMEngine Visualizer.
@@ -261,7 +285,6 @@ python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]
 
 ```{note}
 By default, MMPose prefers GPU to CPU. If you want to test a model on CPU, please empty `CUDA_VISIBLE_DEVICES` or set it to -1 to make GPU invisible to the program.
-
 ```
 
 ```shell

diff --git a/docs/zh_cn/installation.md b/docs/zh_cn/installation.md
@@ -66,6 +66,15 @@ mim install "mmcv>=2.0.1"
 mim install "mmdet>=3.1.0"
 ```
 
+```{note}
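+The version correspondences between old and new releases of mmpose, mmdet and mmcv are:
+
+- mmdet 2.x <=> mmpose 0.x <=> mmcv 1.x
+- mmdet 3.x <=> mmpose 1.x <=> mmcv 2.x
+
+If you encounter version incompatibility issues, check the correspondence using `pip list | grep mm`, then upgrade or downgrade the relevant dependencies.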
+```
+
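 ## Best Practices
 
 Depending on your needs, we support two installation modes: installing from source (recommended) and installing as a Python package.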
@@ -141,6 +150,15 @@ results = inference_topdown(model, 'demo.jpg')
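 The example image `demo.jpg` can be downloaded from [Github](https://raw.githubusercontent.com/open-mmlab/mmpose/main/tests/data/coco/000000000785.jpg).
 The inference results are a list of `PoseDataSample`; the predictions are stored in `pred_instances` and include the detected keypoint locations and confidence scores.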
 
+```{note}
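+The MMCV version must strictly match the PyTorch version. If you encounter either of the following errors:
+
+- No module named 'mmcv.ops'
+- No module named 'mmcv._ext'
+
+It means that the PyTorch build in the current environment does not match the CUDA version. Check the CUDA version with `nvidia-smi`; it should match the `+cu1xx` suffix of the PyTorch version shown by `pip list | grep torch`. If it does not, first uninstall PyTorch and reinstall it, then reinstall MMCV (this installation order **cannot** be swapped).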
+```
+
 ## Customize Installation
 
 ### CUDA versions

diff --git a/docs/zh_cn/user_guides/configs.md b/docs/zh_cn/user_guides/configs.md
@@ -1,4 +1,4 @@
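-# Configs
+# How to Read Config Files
 
 MMPose uses Python files as config files and combines modular design and inheritance in the config system, which makes it convenient to run various experiments.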
 
@@ -119,42 +119,61 @@ python tools/analysis/print_config.py /PATH/TO/CONFIG
 # General
 default_scope = 'mmpose'
 default_hooks = dict(
-    timer=dict(type='IterTimerHook'), # time each iteration, including data time and model time
-    logger=dict(type='LoggerHook', interval=50), # interval to print logs
-    param_scheduler=dict(type='ParamSchedulerHook'), # schedules learning rate updates
+    # time each iteration, including data time and model time
+    timer=dict(type='IterTimerHook'),
+
+    # interval to print logs, once every 50 iters by default
+    logger=dict(type='LoggerHook', interval=50),
+
+    # hook that schedules learning rate updates
+    param_scheduler=dict(type='ParamSchedulerHook'),
+
     checkpoint=dict(
-        type='CheckpointHook', interval=1, save_best='coco/AP', # interval to save ckpt, metric for the best ckpt
-        rule='greater'), # rule to judge the best-ckpt metric
-    sampler_seed=dict(type='DistSamplerSeedHook')) # set the distributed random seed
+        # interval to save ckpt, and the metric for the best ckpt.
+        # For example:
+        # save_best='coco/AP' uses coco/AP as the best-ckpt metric, i.e. the AP metric of the CocoMetric evaluator
+        # save_best='PCK' uses PCK as the best-ckpt metric, i.e. the PCK metric of the PCKAccuracy evaluator
+        # for more metrics, see mmpose/evaluation/metrics/
+        type='CheckpointHook', interval=1, save_best='coco/AP',
+
+        # rule for keeping the best ckpt: 'greater' means the larger the better, 'less' means the smaller the better
+        rule='greater'),
+
+    # hook that sets the distributed random seed
+    sampler_seed=dict(type='DistSamplerSeedHook'))
 env_cfg = dict(
-    cudnn_benchmark=False, # cudnn benchmark switch
-    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0), # opencv multi-threading config
-    dist_cfg=dict(backend='nccl')) # distributed training backend
-vis_backends = [dict(type='LocalVisBackend')] # visualizer backend
-visualizer = dict( # visualizer
+    # cudnn benchmark switch; speeds up training but increases GPU memory usage
+    cudnn_benchmark=False,
+
+    # opencv multi-threading config; speeds up data loading but increases GPU memory usage
+    # defaults to 0, which means single-threaded
+    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
+
+    # distributed training backend; nccl and gloo are supported
+    dist_cfg=dict(backend='nccl'))
+
+# visualizer backend; local visualization by default
+vis_backends = [dict(type='LocalVisBackend')]
+
+# visualizer
+visualizer = dict(
     type='PoseLocalVisualizer',
     vis_backends=[dict(type='LocalVisBackend')],
     name='visualizer')
 log_processor = dict( # format and interval of the training log
     type='LogProcessor', window_size=50, by_epoch=True, num_digits=6)
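-log_level = 'INFO' # The level of logging
-```
-
-General configuration is usually stored separately in the `$MMPOSE/configs/_base_` directory and inherited as follows:
-
-```Python
-_base_ = ['../../../_base_/default_runtime.py'] # relative paths start from the location of the config file being run
+# logging level: INFO records the training log, WARNING records warnings only, ERROR records errors only
+log_level = 'INFO'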
 ```
 
 ```{note}
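-CheckpointHook:
-
-- save_best: `'coco/AP'` for `CocoMetric`, `'PCK'` for `PCKAccuracy`
-- max_keep_ckpts: the maximum number of checkpoints to keep. Defaults to -1, which means unlimited.
+Two visualizer backends are supported: `LocalVisBackend` for local visualization and `TensorboardVisBackend` for TensorBoard visualization. Choose whichever suits your needs. See the "Visualize training process" section of [Train and Test](./train_and_test.md) for details.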
+```
 
-Example:
+General configuration is usually stored separately in the `$MMPOSE/configs/_base_` directory and inherited as follows:
 
-`default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater', max_keep_ckpts=1))`
+```Python
+_base_ = ['../../../_base_/default_runtime.py'] # relative paths start from the location of the config file being run
 ```
 
 ### Data