[Docs] Add a document about debug tricks (#938)

* fix typo * [Docs] Add debug skills * minor fix * refine * rename debug_skills to debug_tricks * refine * Update docs/en/common_usage/debug_tricks.md
open-mmlab · Feb 21, 2023 · 67acdbe · 67acdbe
1 parent 4861f03
commit 67acdbe
Show file tree

Hide file tree

Showing 5 changed files with 59 additions and 3 deletions.
diff --git a/docs/en/common_usage/debug_tricks.md b/docs/en/common_usage/debug_tricks.md
@@ -0,0 +1,3 @@
+# Debug Tricks
+
+Coming soon. Please refer to [chinese documentation](https://mmengine.readthedocs.io/zh_CN/latest/common_usage/debug_tricks.html).
diff --git a/docs/en/index.rst b/docs/en/index.rst
@@ -23,6 +23,7 @@ You can switch between Chinese and English documents in the lower-left corner of
    common_usage/resume_training.md
    common_usage/speed_up_training.md
    common_usage/save_gpu_memory.md
+   common_usage/debug_tricks.md
    common_usage/epoch_to_iter.md
 
 .. toctree::

diff --git a/docs/zh_cn/common_usage/debug_tricks.md b/docs/zh_cn/common_usage/debug_tricks.md
@@ -0,0 +1,51 @@
+# 调试技巧
+
+## 设置数据集的长度
+
+在调试代码的过程中，有时需要训练几个 epoch，例如调试验证过程或者权重的保存是否符合期望。然而如果数据集太大，需要花费较长时间才能训完一个 epoch，这种情况下可以设置数据集的长度。注意，只有继承自 [BaseDataset](mmengine.dataset.BaseDataset) 的 Dataset 才支持这个功能，`BaseDataset` 的用法可阅读 [数据集基类（BASEDATASET）](../advanced_tutorials/basedataset.md)。
+
+以 `MMClassification` 为例（参考[文档](https://mmclassification.readthedocs.io/zh_CN/dev-1.x/get_started.html#id2)安装 MMClassification）。
+
+启动训练命令
+
+```bash
+python tools/train.py configs/resnet/resnet18_8xb16_cifar10.py
+```
+
+下面是训练的部分日志，其中 `3125` 表示需要迭代的次数。
+
+```
+02/20 14:43:11 - mmengine - INFO - Epoch(train)   [1][ 100/3125]  lr: 1.0000e-01  eta: 6:12:01  time: 0.0149  data_time: 0.0003  memory: 214  loss: 2.0611
+02/20 14:43:13 - mmengine - INFO - Epoch(train)   [1][ 200/3125]  lr: 1.0000e-01  eta: 4:23:08  time: 0.0154  data_time: 0.0003  memory: 214  loss: 2.0963
+02/20 14:43:14 - mmengine - INFO - Epoch(train)   [1][ 300/3125]  lr: 1.0000e-01  eta: 3:46:27  time: 0.0146  data_time: 0.0003  memory: 214  loss: 1.9858
+```
+
+关掉训练，然后修改 [configs/_base_/datasets/cifar10_bs16.py](https://github.com/open-mmlab/mmclassification/blob/dev-1.x/configs/_base_/datasets/cifar10_bs16.py) 中的 `dataset` 字段，设置 `indices=5000`。
+
+```python
+train_dataloader = dict(
+    batch_size=16,
+    num_workers=2,
+    dataset=dict(
+        type=dataset_type,
+        data_prefix='data/cifar10',
+        test_mode=False,
+        indices=5000,  # 设置 indices=5000，表示每个 epoch 只迭代 5000 个样本
+        pipeline=train_pipeline),
+    sampler=dict(type='DefaultSampler', shuffle=True),
+)
+```
+
+重新启动训练
+
+```bash
+python tools/train.py configs/resnet/resnet18_8xb16_cifar10.py
+```
+
+可以看到，迭代次数变成了 `313`，相比原先，这样能够更快跑完一个 epoch。
+
+```
+02/20 14:44:58 - mmengine - INFO - Epoch(train)   [1][100/313]  lr: 1.0000e-01  eta: 0:31:09  time: 0.0154  data_time: 0.0004  memory: 214  loss: 2.1852
+02/20 14:44:59 - mmengine - INFO - Epoch(train)   [1][200/313]  lr: 1.0000e-01  eta: 0:23:18  time: 0.0143  data_time: 0.0002  memory: 214  loss: 2.0424
+02/20 14:45:01 - mmengine - INFO - Epoch(train)   [1][300/313]  lr: 1.0000e-01  eta: 0:20:39  time: 0.0143  data_time: 0.0003  memory: 214  loss: 1.814
+```
diff --git a/docs/zh_cn/index.rst b/docs/zh_cn/index.rst
@@ -24,6 +24,7 @@
    common_usage/speed_up_training.md
    common_usage/save_gpu_memory.md
    common_usage/set_random_seed.md
+   common_usage/debug_tricks.md
    common_usage/model_analysis.md
    common_usage/set_interval.md
    common_usage/epoch_to_iter.md

diff --git a/mmengine/dataset/base_dataset.py b/mmengine/dataset/base_dataset.py
@@ -714,9 +714,9 @@ def _get_unserialized_subset(self, indices: Union[Sequence[int],
         Args:
             indices (int or Sequence[int]): If type of indices is int,
                 indices represents the first or last few data of data
-                information. If  indices of indices is Sequence, indices
-                represents the target data information index which consist
-                of subset data information.
+                information. If type of indices is Sequence, indices represents
+                the target data information index which consist of subset data
+                information.
 
         Returns:
             Tuple[np.ndarray, np.ndarray]: subset of data information.