[Docs] Enhance docs (#2555)
Tau-J committed Jul 20, 2023
1 parent f0311df commit 5a9487d
Showing 6 changed files with 499 additions and 44 deletions.
18 changes: 18 additions & 0 deletions docs/en/installation.md
@@ -68,6 +68,15 @@ Note that some of the demo scripts in MMPose require [MMDetection](https://githu
mim install "mmdet>=3.1.0"
```

```{note}
Here are the version correspondences between mmdet, mmpose and mmcv:
- mmdet 2.x <=> mmpose 0.x <=> mmcv 1.x
- mmdet 3.x <=> mmpose 1.x <=> mmcv 2.x
If you encounter version incompatibility issues, please check the correspondence using `pip list | grep mm` and upgrade or downgrade the dependencies accordingly.
```
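The correspondence in the note above can also be checked programmatically. The following is a minimal sketch, not part of MMPose: the helper names (`major`, `is_compatible`, `installed_version`) are hypothetical, and it assumes the packages are installed under the distribution names `mmdet`, `mmpose` and `mmcv` (some environments use a differently named MMCV distribution).

```python
from importlib import metadata

# Major-version triples taken from the note above: (mmdet, mmpose, mmcv).
COMPATIBLE_MAJORS = {(2, 0, 1), (3, 1, 2)}


def major(version: str) -> int:
    """Return the leading integer of a version string such as '3.1.0'."""
    return int(version.split(".")[0])


def is_compatible(mmdet: str, mmpose: str, mmcv: str) -> bool:
    """Check whether the three major versions form a known-good triple."""
    return (major(mmdet), major(mmpose), major(mmcv)) in COMPATIBLE_MAJORS


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print what is installed in the current environment.
    for name in ("mmdet", "mmpose", "mmcv"):
        print(name, installed_version(name))
```

Only the major version is compared here, which mirrors the `2.x`/`0.x`/`1.x` granularity of the note; finer-grained constraints such as `mmdet>=3.1.0` are not checked.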

## Best Practices

### Build MMPose from source
@@ -141,6 +150,15 @@ The `demo.jpg` can be downloaded from [Github](https://raw.githubusercontent.com

The inference result is a list of `PoseDataSample` objects; the predictions are stored in `pred_instances`, which holds the detected keypoint locations and scores.

```{note}
The MMCV version must strictly match the PyTorch version. If you encounter the following issues:
- No module named 'mmcv.ops'
- No module named 'mmcv._ext'
this means that the current PyTorch version does not match the CUDA version. You can check the CUDA version using `nvidia-smi`; it should match the `+cu1xx` suffix of the PyTorch version shown by `pip list | grep torch`. Otherwise, you need to uninstall PyTorch and reinstall it, then reinstall MMCV (the installation order **CANNOT** be swapped).
```
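To make the `+cu1xx` comparison concrete, here is a small sketch that extracts the CUDA version from a PyTorch version string. The function name `cuda_tag` is hypothetical, and the example strings are illustrative rather than taken from a real environment; in practice you would pass in the version shown by `pip list | grep torch`.

```python
import re
from typing import Optional


def cuda_tag(torch_version: str) -> Optional[str]:
    """Extract the CUDA version from a PyTorch version string.

    For example, '2.0.1+cu118' gives '11.8'. Returns None when the string
    has no '+cuXXX' suffix (for instance a CPU-only build).
    """
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version: cu118 -> 11.8, cu102 -> 10.2.
    return f"{digits[:-1]}.{digits[-1]}"


if __name__ == "__main__":
    for version in ("2.0.1+cu118", "1.13.1+cu117", "2.0.1+cpu"):
        print(version, "->", cuda_tag(version))
```

The returned string can then be compared by eye with the CUDA version reported by `nvidia-smi`.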

## Customize Installation

### CUDA versions
34 changes: 19 additions & 15 deletions docs/en/user_guides/configs.md
@@ -114,11 +114,22 @@ Here is the description of General configuration:
# General
default_scope = 'mmpose'
default_hooks = dict(
timer=dict(type='IterTimerHook'), # time the data processing and model inference
logger=dict(type='LoggerHook', interval=50), # interval to print logs
param_scheduler=dict(type='ParamSchedulerHook'), # update lr
# time the data processing and model inference
timer=dict(type='IterTimerHook'),
# interval to print logs, 50 iters by default
logger=dict(type='LoggerHook', interval=50),
# update lr according to the lr scheduler
param_scheduler=dict(type='ParamSchedulerHook'),
checkpoint=dict(
type='CheckpointHook', interval=1, save_best='coco/AP', # interval to save ckpt
# interval to save ckpt
# e.g.
# save_best='coco/AP' means save the best ckpt according to coco/AP of CocoMetric
# save_best='PCK' means save the best ckpt according to PCK of PCKAccuracy
type='CheckpointHook', interval=1, save_best='coco/AP',

# rule to judge the metric
# 'greater' means the larger the better
# 'less' means the smaller the better
rule='greater'), # rule to judge the metric
sampler_seed=dict(type='DistSamplerSeedHook')) # set the distributed seed
env_cfg = dict(
@@ -135,23 +146,16 @@ log_processor = dict( # Format, interval to log
log_level = 'INFO' # The level of logging
```
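The meaning of `rule` in the checkpoint settings above can be illustrated with a short sketch. This is not MMEngine's `CheckpointHook` implementation; the helper names (`is_new_best`, `best_epoch`) and the score lists are hypothetical and only show how `'greater'` and `'less'` change which checkpoint counts as the best.

```python
import operator

# Comparison implied by each value of `rule`, as described in the comments above.
RULES = {"greater": operator.gt, "less": operator.lt}


def is_new_best(rule: str, new_score: float, best_score: float) -> bool:
    """Return True if `new_score` beats `best_score` under `rule`."""
    return RULES[rule](new_score, best_score)


def best_epoch(rule: str, scores: list) -> int:
    """Return the 1-based epoch whose score is best under `rule`."""
    best_idx = 0
    for idx in range(1, len(scores)):
        if is_new_best(rule, scores[idx], scores[best_idx]):
            best_idx = idx
    return best_idx + 1


if __name__ == "__main__":
    # Hypothetical per-epoch values of an accuracy-like metric.
    print(best_epoch("greater", [0.61, 0.70, 0.68]))
    # Hypothetical per-epoch values of an error-like metric.
    print(best_epoch("less", [5.2, 4.1, 4.6]))
```

An accuracy-like metric such as `coco/AP` or `PCK` pairs with `rule='greater'`, while an error-like metric would pair with `rule='less'`.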

```{note}
Two visualizer backends are currently supported: LocalVisBackend for local visualization and TensorboardVisBackend for Tensorboard visualization. Choose whichever fits your needs. See [Train and Test](./train_and_test.md) for details.
```

General configuration is stored separately under `$MMPOSE/configs/_base_` and is inherited as follows:

```Python
_base_ = ['../../../_base_/default_runtime.py'] # take the config file as the starting point of the relative path
```
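Conceptually, inheriting from `_base_` means the child config only lists what it changes, and everything else comes from the base file. The sketch below is a simplified model of that idea using plain dictionaries. It is an assumption for illustration, not MMEngine's actual config loader (which handles more cases); `merge_config` and the sample dictionaries are hypothetical.

```python
import copy


def merge_config(base: dict, child: dict) -> dict:
    """Recursively overlay `child` on `base` without mutating either."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested dicts are merged key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists and new keys simply replace the base value.
            merged[key] = copy.deepcopy(value)
    return merged


# A cut-down stand-in for the general configuration shown earlier.
base = dict(
    default_hooks=dict(
        checkpoint=dict(
            type='CheckpointHook', interval=1,
            save_best='coco/AP', rule='greater')),
    log_level='INFO')

# The child config overrides only the fields it cares about.
child = dict(
    default_hooks=dict(
        checkpoint=dict(save_best='PCK', max_keep_ckpts=1)))

merged = merge_config(base, child)

if __name__ == "__main__":
    print(merged['default_hooks']['checkpoint'])
```

In this model the merged checkpoint settings keep `type`, `interval` and `rule` from the base while taking `save_best` and `max_keep_ckpts` from the child.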

```{note}
CheckpointHook:
- save_best: `'coco/AP'` for `CocoMetric`, `'PCK'` for `PCKAccuracy`
- max_keep_ckpts: the maximum number of checkpoints to keep. Defaults to -1, which means unlimited.
Example:
`default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater', max_keep_ckpts=1))`
```

### Data

Data configuration refers to the data processing related settings, mainly including:
27 changes: 25 additions & 2 deletions docs/en/user_guides/train_and_test.md
@@ -14,7 +14,6 @@ python tools/train.py ${CONFIG_FILE} [ARGS]

```{note}
By default, MMPose prefers GPU to CPU. To train a model on CPU, set `CUDA_VISIBLE_DEVICES` to an empty value or to -1 so that the GPUs are invisible to the program.
```

```shell
@@ -214,6 +213,31 @@ python ./tools/train.py \

- `randomness.deterministic=True`, set the deterministic option for `cuDNN` backend, i.e., set `torch.backends.cudnn.deterministic` to `True` and `torch.backends.cudnn.benchmark` to `False`. Defaults to `False`. See [Pytorch Randomness](https://pytorch.org/docs/stable/notes/randomness.html) for more details.

## Training Log

During training, the training log will be printed in the console as follows:

```shell
07/14 08:26:50 - mmengine - INFO - Epoch(train) [38][ 6/38] base_lr: 5.148343e-04 lr: 5.148343e-04 eta: 0:15:34 time: 0.540754 data_time: 0.394292 memory: 3141 loss: 0.006220 loss_kpt: 0.006220 acc_pose: 1.000000
```

The training log contains the following information:

- `07/14 08:26:50`: The current time.
- `mmengine`: The name of the logger.
- `INFO` or `WARNING`: The log level.
- `Epoch(train)`: The current training stage. `train` means the training stage, `val` means the validation stage.
- `[38][ 6/38]`: The current epoch, followed by the current iteration out of the total iterations in the epoch.
- `base_lr`: The base learning rate.
- `lr`: The current (real) learning rate.
- `eta`: The estimated time remaining until training finishes.
- `time`: The elapsed time (seconds) of the current iteration.
- `data_time`: The elapsed time (seconds) of data processing (i/o and transforms).
- `memory`: The GPU memory (MB) allocated by the program.
- `loss`: The total loss value of the current iteration.
- `loss_kpt`: The keypoint loss value returned by the head module.
- `acc_pose`: The pose accuracy value returned by the head module.
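
The fields listed above can be pulled out of a log line with a couple of regular expressions. This is a minimal sketch based only on the sample line shown earlier; `parse_log_line` is a hypothetical helper, not an MMEngine API, and other log formats (for example validation lines) may need a different pattern.

```python
import re

# The sample line from the section above.
LOG_LINE = (
    "07/14 08:26:50 - mmengine - INFO - Epoch(train) [38][ 6/38] "
    "base_lr: 5.148343e-04 lr: 5.148343e-04 eta: 0:15:34 time: 0.540754 "
    "data_time: 0.394292 memory: 3141 loss: 0.006220 loss_kpt: 0.006220 "
    "acc_pose: 1.000000"
)

# Timestamp, logger name, level, stage, epoch and iteration counters.
HEADER = re.compile(
    r"^(?P<timestamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2}) - (?P<logger>\S+) - "
    r"(?P<level>\w+) - Epoch\((?P<stage>\w+)\)\s+"
    r"\[(?P<epoch>\d+)\]\[\s*(?P<iter>\d+)/(?P<iters_per_epoch>\d+)\]\s+"
)
# Remaining "name: value" pairs such as "lr: 5.148343e-04".
FIELD = re.compile(r"(\w+):\s+(\S+)")


def parse_log_line(line: str) -> dict:
    """Split one training log line into a dict of string fields."""
    header = HEADER.match(line)
    if header is None:
        raise ValueError("unrecognized log line")
    record = header.groupdict()
    for key, value in FIELD.findall(line[header.end():]):
        record[key] = value
    return record


if __name__ == "__main__":
    record = parse_log_line(LOG_LINE)
    print(record["epoch"], record["iter"], record["eta"], float(record["loss"]))
```

All values are kept as strings so that fields like `eta` stay intact; convert numeric fields with `float()` or `int()` as needed.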

## Visualize training process

Monitoring the training process is essential for understanding the performance of your model and making necessary adjustments. In this section, we will introduce two methods to visualize the training process of your MMPose model: TensorBoard and the MMEngine Visualizer.
@@ -261,7 +285,6 @@ python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [ARGS]

```{note}
By default, MMPose prefers GPU to CPU. To test a model on CPU, set `CUDA_VISIBLE_DEVICES` to an empty value or to -1 so that the GPUs are invisible to the program.
```

```shell
18 changes: 18 additions & 0 deletions docs/zh_cn/installation.md
@@ -66,6 +66,15 @@ mim install "mmcv>=2.0.1"
mim install "mmdet>=3.1.0"
```

```{note}
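The version correspondences between old and new releases of mmpose, mmdet and mmcv are:
- mmdet 2.x <=> mmpose 0.x <=> mmcv 1.x
- mmdet 3.x <=> mmpose 1.x <=> mmcv 2.x
If you encounter version incompatibility issues, check the correspondence using `pip list | grep mm`, then upgrade or downgrade the relevant dependencies.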
```

## Best Practices

Depending on your needs, we support two installation modes: building from source (recommended) and installing as a Python package.
@@ -141,6 +150,15 @@ results = inference_topdown(model, 'demo.jpg')
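The example image `demo.jpg` can be downloaded from [Github](https://raw.githubusercontent.com/open-mmlab/mmpose/main/tests/data/coco/000000000785.jpg).
The inference result is a list of `PoseDataSample` objects; the predictions are stored in `pred_instances`, including the detected keypoint locations and confidence scores.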

```{note}
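The MMCV version must strictly match the PyTorch version. If you encounter the following issues:
- No module named 'mmcv.ops'
- No module named 'mmcv._ext'
this means that the PyTorch version in the current environment does not match the CUDA version. You can check the CUDA version using `nvidia-smi`; it should match the `+cu1xx` suffix of the PyTorch version shown by `pip list | grep torch`. Otherwise, you need to uninstall PyTorch and reinstall it first, then reinstall MMCV (the installation order **CANNOT** be swapped).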
```

## Customize Installation

### CUDA versions
69 changes: 44 additions & 25 deletions docs/zh_cn/user_guides/configs.md
@@ -1,4 +1,4 @@
# Configs
# How to Read Config Files

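MMPose uses Python files as config files and combines modular design with inheritance in its config system, which makes it convenient to run various experiments.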

@@ -119,42 +119,61 @@ python tools/analysis/print_config.py /PATH/TO/CONFIG
# General
default_scope = 'mmpose'
default_hooks = dict(
timer=dict(type='IterTimerHook'), # iteration timing, including data time and model time
logger=dict(type='LoggerHook', interval=50), # interval to print logs
param_scheduler=dict(type='ParamSchedulerHook'), # schedules learning rate updates
# iteration timing, including data time and model time
timer=dict(type='IterTimerHook'),

# interval to print logs, every 50 iters by default
logger=dict(type='LoggerHook', interval=50),

# hook that schedules learning rate updates
param_scheduler=dict(type='ParamSchedulerHook'),

checkpoint=dict(
type='CheckpointHook', interval=1, save_best='coco/AP', # interval to save ckpt, metric used to select the best ckpt
rule='greater'), # rule to judge the best-ckpt metric
sampler_seed=dict(type='DistSamplerSeedHook')) # set the distributed random seed
# interval to save ckpt, and the metric used to select the best ckpt.
# e.g.
# save_best='coco/AP' uses coco/AP as the best-ckpt metric, i.e. the AP metric of the CocoMetric evaluator
# save_best='PCK' uses PCK as the best-ckpt metric, i.e. the PCK metric of the PCKAccuracy evaluator
# for more metrics, see mmpose/evaluation/metrics/
type='CheckpointHook', interval=1, save_best='coco/AP',

# rule for keeping the best ckpt: 'greater' means the larger the better, 'less' means the smaller the better
rule='greater'),

# hook that sets the distributed random seed
sampler_seed=dict(type='DistSamplerSeedHook'))
env_cfg = dict(
cudnn_benchmark=False, # cudnn benchmark switch
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0), # opencv multi-threading config
dist_cfg=dict(backend='nccl')) # distributed training backend
vis_backends = [dict(type='LocalVisBackend')] # visualizer backend settings
visualizer = dict( # visualizer settings
# cudnn benchmark switch; speeds up training but increases GPU memory usage
cudnn_benchmark=False,

# opencv multi-threading config; speeds up data loading but increases GPU memory usage
# defaults to 0, which means single-threaded
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),

# distributed training backend; nccl and gloo are supported
dist_cfg=dict(backend='nccl'))

# visualizer backend settings; local visualization by default
vis_backends = [dict(type='LocalVisBackend')]

# visualizer settings
visualizer = dict(
type='PoseLocalVisualizer',
vis_backends=[dict(type='LocalVisBackend')],
name='visualizer')
log_processor = dict( # format and interval of the training log
type='LogProcessor', window_size=50, by_epoch=True, num_digits=6)
log_level = 'INFO' # The level of logging
```

General configuration is usually stored separately under `$MMPOSE/configs/_base_` and is inherited as follows:

```Python
_base_ = ['../../../_base_/default_runtime.py'] # relative paths start from the location of the config file being run
# logging level: INFO records the training log, WARNING records only warnings, ERROR records only errors
log_level = 'INFO'
```

```{note}
CheckpointHook:
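- save_best: `'coco/AP'` for `CocoMetric`, `'PCK'` for `PCKAccuracy`
- max_keep_ckpts: the maximum number of checkpoints to keep. Defaults to -1, which means unlimited.
The visualizer backend setting supports LocalVisBackend and TensorboardVisBackend. The former is for local visualization and the latter is for Tensorboard visualization; choose according to your needs. See the "Visualize training process" section of [Train and Test](./train_and_test.md) for details.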
```

Example:
General configuration is usually stored separately under `$MMPOSE/configs/_base_` and is inherited as follows:

`default_hooks = dict(checkpoint=dict(save_best='PCK', rule='greater', max_keep_ckpts=1))`
```Python
_base_ = ['../../../_base_/default_runtime.py'] # relative paths start from the location of the config file being run
```

### Data