doc(pu): add logs and config documentations (#220)
puyuan1996 committed May 4, 2024
1 parent 3b62403 commit 0210a7f
Showing 6 changed files with 424 additions and 7 deletions.
8 changes: 5 additions & 3 deletions README.md
@@ -211,10 +211,12 @@ python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py

## Customization Documentation

-For those looking to tailor environments and algorithms, we offer comprehensive guides:
+For those interested in customizing environments and algorithms, we provide relevant guides:

-- **Environments:** [Customize Environments](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/envs/customize_envs.md)
-- **Algorithms:** [Customize Algorithms](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/algos/customize_algos.md)
+- [Customize Environments](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/envs/customize_envs.md)
+- [Customize Algorithms](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/algos/customize_algos.md)
+- [How to Set Configuration Files?](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/config/config.md)
+- [Logging and Monitoring System](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/logs/logs.md)

Should you have any questions, feel free to contact us for support.

10 changes: 6 additions & 4 deletions README.zh.md
@@ -194,12 +194,14 @@ python3 -u zoo/atari/config/atari_muzero_config.py
cd LightZero
python3 -u zoo/board_games/tictactoe/config/tictactoe_muzero_bot_mode_config.py
```
-## Customization Documentation
+## Documentation

-For users who wish to customize environments and algorithms, we provide comprehensive guides:
+For users who wish to customize environments and algorithms, we provide the corresponding guides:

-- **Environment customization:** [Customize Environments](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/envs/customize_envs_zh.md)
-- **Algorithm customization:** [Customize Algorithms](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/algos/customize_algos_zh.md)
+- [How to Customize Environments?](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/envs/customize_envs_zh.md)
+- [How to Customize Algorithms?](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/algos/customize_algos_zh.md)
+- [How to Set Configuration Files?](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/config/config_zh.md)
+- [Logging System](https://github.com/opendilab/LightZero/blob/main/docs/source/tutorials/logs/logs_zh.md)

If you have any questions, feel free to contact us for help.

106 changes: 106 additions & 0 deletions docs/source/tutorials/config/config.md
@@ -0,0 +1,106 @@
# How to Set Configuration Files in LightZero

In the LightZero framework, running a specific algorithm in a specific environment requires setting up the corresponding configuration file. The configuration file consists of two main parts: `main_config` and `create_config`. `main_config` defines the main parameters for running the algorithm, such as the environment and policy settings, while `create_config` specifies the concrete environment class and policy class to be used and their import paths.

To run a specific algorithm in a custom environment, start from the default config file for an existing environment `<env>` and algorithm `<algo>`, located at `zoo/<env>/config/<env>_<algo>_config.py`. On that basis, you mainly modify the `env` part and then debug and tune.

Below, we use [atari_muzero_config.py](https://github.com/opendilab/LightZero/blob/main/zoo/atari/config/atari_muzero_config.py) as an example to explain the configuration file settings in detail.

## 1. `main_config`

The `main_config` dictionary contains the main parameter settings for running the algorithm, which are mainly divided into two parts: `env` and `policy`.

### 1.1 Main Parameters in the `env` Part

- `env_id`: Specifies the environment to be used.
- `obs_shape`: The dimension of the environment observation.
- `collector_env_num`: The number of parallel environments used by the collector to gather data.
- `evaluator_env_num`: The number of parallel environments used to evaluate policy performance in the evaluator.
- `n_evaluator_episode`: The number of episodes run by each environment in the evaluator.
- `manager`: Specifies the type of environment manager, mainly used to control the parallelization mode of the environment.

### 1.2 Main Parameters in the `policy` Part

- `model`: Specifies the neural network model used by the policy, including the model's input dimension, the number of stacked frames, the action space dimension of the model output, whether to use downsampling, whether to use a self-supervised learning auxiliary loss, the action encoding type, the normalization mode used in the network, etc.
- `cuda`: Specifies whether to migrate the model to the GPU for training.
- `reanalyze_noise`: Whether to introduce noise during MCTS reanalysis, which can increase exploration.
- `env_type`: Marks the type of environment the MuZero algorithm faces; MuZero handles some details differently depending on the environment type.
- `game_segment_length`: The length of the sequence (game segment) used for self-play.
- `random_collect_episode_num`: The number of randomly collected episodes, providing initial data for exploration.
- `eps`: Exploration control parameters, including whether to use the epsilon-greedy method for control, the update method of control parameters, the starting value, the termination value, the decay rate, etc.
- `use_augmentation`: Whether to use data augmentation.
- `update_per_collect`: The number of updates after each data collection.
- `batch_size`: The batch size sampled during the update.
- `optim_type`: Optimizer type.
- `lr_piecewise_constant_decay`: Whether to use piecewise constant learning rate decay.
- `learning_rate`: Initial learning rate.
- `num_simulations`: The number of simulations used in the MCTS algorithm.
- `reanalyze_ratio`: Reanalysis coefficient, controlling the probability of reanalysis.
- `ssl_loss_weight`: The weight of the self-supervised learning loss function.
- `n_episode`: The number of episodes run by each environment in the parallel collector.
- `eval_freq`: Policy evaluation frequency (measured by training steps).
- `replay_buffer_size`: The capacity of the experience replay buffer.
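
To make the structure of these fields concrete, here is a minimal sketch of how they are typically laid out inside `main_config`; all values below are illustrative placeholders rather than the actual settings from `atari_muzero_config.py`:

```python
# Illustrative sketch of main_config; values are placeholders, not recommended settings.
from easydict import EasyDict

main_config = dict(
    env=dict(
        env_id='PongNoFrameskip-v4',
        obs_shape=(4, 96, 96),
        collector_env_num=8,
        evaluator_env_num=3,
        n_evaluator_episode=3,
        manager=dict(shared_memory=False),
    ),
    policy=dict(
        model=dict(
            observation_shape=(4, 96, 96),
            action_space_size=6,
            downsample=True,
        ),
        cuda=True,
        game_segment_length=400,
        use_augmentation=True,
        update_per_collect=1000,
        batch_size=256,
        optim_type='SGD',
        learning_rate=0.2,
        num_simulations=50,
        reanalyze_ratio=0.,
        n_episode=8,
        eval_freq=int(2e3),
        replay_buffer_size=int(1e6),
    ),
)
# LightZero config files typically wrap the dict in EasyDict for attribute-style access.
main_config = EasyDict(main_config)
```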

Two frequently changed parameter areas are also specially marked with comments:

```python
# ==============================================================
# begin of the most frequently changed config specified by the user
# ==============================================================
# These are parameters that need to be adjusted frequently based on the actual situation
# ==============================================================
# end of the most frequently changed config specified by the user
# ==============================================================
```

These comments remind users that parameters such as `collector_env_num`, `num_simulations`, `update_per_collect`, `batch_size`, and `max_env_step` often need to be adjusted. Tuning these parameters can improve algorithm performance and speed up training.
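
In the actual config files these frequently changed values are usually defined as plain variables at the top of the file and then referenced inside `main_config`. A sketch of such a block, with purely illustrative values, might look like this:

```python
# ==============================================================
# begin of the most frequently changed config specified by the user
# ==============================================================
# Illustrative values only; suitable defaults depend on the environment and algorithm.
collector_env_num = 8
n_episode = 8
evaluator_env_num = 3
num_simulations = 50
update_per_collect = 1000
batch_size = 256
max_env_step = int(1e6)
reanalyze_ratio = 0.
# ==============================================================
# end of the most frequently changed config specified by the user
# ==============================================================
```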

## 2. `create_config`

The `create_config` dictionary specifies the specific environment class and policy class to be used and their reference paths, mainly containing two parts: `env` and `policy`.

### 2.1 Settings in the `env` Part

```python
env=dict(
    type='atari_lightzero',
    import_names=['zoo.atari.envs.atari_lightzero_env'],
),
```

Here, `type` specifies the name of the environment to use, and `import_names` specifies the import path of the environment class. The predefined `atari_lightzero_env` is used here. If you want to use a custom environment class, change `type` to the custom environment's registered name and modify the `import_names` parameter accordingly.
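
For illustration, suppose you have implemented a custom environment class registered under the hypothetical name `my_custom_env` in `zoo/my_custom_env/envs/my_custom_env.py` (both the name and the path are assumptions, not existing LightZero modules); the `env` part would then be changed to something like:

```python
env=dict(
    # 'my_custom_env' must match the name under which the custom environment
    # class is registered in the framework's environment registry.
    type='my_custom_env',
    import_names=['zoo.my_custom_env.envs.my_custom_env'],
),
```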

### 2.2 Settings in the `policy` Part

```python
policy=dict(
type='muzero',
import_names=['lzero.policy.muzero'],
),
```

Here, `type` specifies the name of the policy to use, and `import_names` specifies the import path of the policy class. The MuZero algorithm predefined in LightZero is used here. If you want to use a custom policy class, change `type` to the custom policy's registered name and modify `import_names` to the import path of the custom policy.
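
Putting the two parts together, a complete `create_config` for the hypothetical custom environment `my_custom_env` and a hypothetical custom policy `my_policy` might look like the sketch below (all names and import paths are illustrative placeholders):

```python
from easydict import EasyDict

create_config = dict(
    env=dict(
        type='my_custom_env',                                   # hypothetical registered environment name
        import_names=['zoo.my_custom_env.envs.my_custom_env'],  # hypothetical import path
    ),
    # env_manager controls how the parallel environments are launched (e.g. in subprocesses).
    env_manager=dict(type='subprocess'),
    policy=dict(
        type='my_policy',                                        # hypothetical registered policy name
        import_names=['lzero.policy.my_policy'],                 # hypothetical import path
    ),
)
# Like main_config, create_config is typically wrapped in EasyDict.
create_config = EasyDict(create_config)
```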

## 3. Running the Algorithm

After completing the configuration, call the training entry under the `if __name__ == "__main__":` guard at the bottom of the config file:

```python
if __name__ == "__main__":
    from lzero.entry import train_muzero
    train_muzero([main_config, create_config], seed=0, max_env_step=max_env_step)
```

This will run the MuZero algorithm on the configured environment for training. `[main_config, create_config]` specifies the configuration used for training, `seed` specifies the random number seed, and `max_env_step` specifies the maximum number of environment interaction steps.

## 4. Notes

The above briefly introduces how to configure an algorithm for a custom environment in the LightZero framework; we hope it is helpful. Please keep the following points in mind during configuration:

- When using a custom environment, be sure to write the environment class according to the environment interface standards defined by the LightZero framework, otherwise errors may occur.
- Different algorithms and environments require different configuration parameters. Before configuring, you need to thoroughly understand the principles of the algorithm and the characteristics of the environment, and you can refer to relevant academic papers to set parameters reasonably.
- If you want to run an algorithm supported by LightZero on a custom environment, you can first use the default policy configuration of the algorithm, and then optimize and adjust according to the actual training situation.
- When configuring the number of parallel environments, the number should be set reasonably according to your computing resources to avoid problems of insufficient memory due to too many parallel environments.
- You can use tools such as tensorboard to monitor the training situation and solve problems in time. For details, please refer to the [Log System Documentation](https://github.com/opendilab/LightZero/tree/main/docs/source/tutorials/logs/logs.md).

Wish you a smooth experience with the LightZero framework!
109 changes: 109 additions & 0 deletions docs/source/tutorials/config/config_zh.md
@@ -0,0 +1,109 @@
# How to Set Configuration Files in LightZero?

In the LightZero framework, running a specific algorithm in a specific environment requires setting up the corresponding configuration file.
The configuration file consists of two main parts: `main_config` and `create_config`.
`main_config` defines the main parameters for running the algorithm, such as the environment and policy settings, while `create_config` specifies the concrete environment class and policy class to be used and their import paths.
To run a specific algorithm on a custom environment, you can find the default config file for an existing environment `<env>` and algorithm `<algo>` under `zoo/<env>/config/<env>_<algo>_config.py`, mainly modify its `env` part on that basis, and then debug and tune.
Below we use [atari_muzero_config.py](https://github.com/opendilab/LightZero/blob/main/zoo/atari/config/atari_muzero_config.py) as an example to explain the configuration file settings in detail.

## 1. `main_config`

The `main_config` dictionary contains the main parameter settings for running the algorithm, and is divided into two parts: `env` and `policy`.

### 1.1 Main Parameters in the `env` Part

- `env_id`: Specifies the environment to be used.
- `obs_shape`: The dimension of the environment observation.
- `collector_env_num`: The number of parallel environments used by the collector to gather data.
- `evaluator_env_num`: The number of parallel environments used by the evaluator to evaluate policy performance.
- `n_evaluator_episode`: The number of episodes run by each environment in the evaluator.
- `manager`: Specifies the type of environment manager, which mainly controls how the environments are parallelized.

### 1.2 Main Parameters in the `policy` Part

- `model`: Specifies the neural network model used by the policy, including the model's input dimension, the number of stacked frames, the action space dimension of the model output, whether to use downsampling, whether to use a self-supervised learning auxiliary loss, the action encoding type, the normalization mode used in the network, etc.
- `cuda`: Whether to move the model to the GPU for training.
- `reanalyze_noise`: Whether to introduce noise during MCTS reanalysis, which can increase exploration.
- `env_type`: Marks the type of environment the MuZero algorithm faces; MuZero handles some details differently depending on the environment type.
- `game_segment_length`: The length of the sequence (game segment) used for self-play.
- `random_collect_episode_num`: The number of randomly collected episodes, which provides initial data for exploration.
- `eps`: Exploration control parameters, including whether to use the epsilon-greedy method, and the update schedule, start value, end value, and decay speed of the control parameter.
- `use_augmentation`: Whether to use data augmentation.
- `update_per_collect`: The number of updates after each data collection.
- `batch_size`: The batch size sampled during updates.
- `optim_type`: The optimizer type.
- `lr_piecewise_constant_decay`: Whether to use piecewise constant learning rate decay.
- `learning_rate`: The initial learning rate.
- `num_simulations`: The number of simulations used in the MCTS algorithm.
- `reanalyze_ratio`: The reanalysis coefficient, which controls the probability of reanalysis.
- `ssl_loss_weight`: The weight of the self-supervised learning loss.
- `n_episode`: The number of episodes run by each environment in the parallel collector.
- `eval_freq`: The policy evaluation frequency (measured in training steps).
- `replay_buffer_size`: The capacity of the experience replay buffer.
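
For reference, a compact sketch of how these fields are typically laid out in `main_config` is shown below; all values are illustrative placeholders:

```python
# Compact, illustrative sketch of main_config; values are placeholders.
main_config = dict(
    env=dict(
        env_id='PongNoFrameskip-v4',
        collector_env_num=8,
        evaluator_env_num=3,
        n_evaluator_episode=3,
        manager=dict(shared_memory=False),
    ),
    policy=dict(
        model=dict(observation_shape=(4, 96, 96), action_space_size=6),
        cuda=True,
        num_simulations=50,
        update_per_collect=1000,
        batch_size=256,
        n_episode=8,
        eval_freq=int(2e3),
        replay_buffer_size=int(1e6),
    ),
)
```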

Two frequently changed parameter areas are also specially marked with comments:

```python
# ==============================================================
# begin of the most frequently changed config specified by the user
# ==============================================================
# These are the parameters that usually need to be adjusted to the actual situation
# ==============================================================
# end of the most frequently changed config specified by the user
# ==============================================================
```

This reminds users that parameters such as `collector_env_num`, `num_simulations`, `update_per_collect`, `batch_size`, and `max_env_step` often need to be adjusted. Tuning these parameters can improve algorithm performance and speed up training.
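
In the actual config files these values are usually defined as plain variables at the top of the file and then referenced inside `main_config`; a sketch with purely illustrative values:

```python
# Illustrative values only; suitable defaults depend on the environment and algorithm.
collector_env_num = 8
n_episode = 8
evaluator_env_num = 3
num_simulations = 50
update_per_collect = 1000
batch_size = 256
max_env_step = int(1e6)
```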

## 2. `create_config`

The `create_config` dictionary specifies the concrete environment class and policy class to be used and their import paths, and contains two main parts: `env` and `policy`.

### 2.1 Settings in the `env` Part

```python
env=dict(
    type='atari_lightzero',
    import_names=['zoo.atari.envs.atari_lightzero_env'],
),
```

Here, `type` specifies the name of the environment to use, and `import_names` specifies the import path of the environment class. The predefined `atari_lightzero_env` is used here. If you want to use a custom environment class, change `type` to the custom environment's registered name and modify the `import_names` parameter accordingly.
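
For example, for a custom environment class registered under the hypothetical name `my_env` and implemented in `zoo/my_env/envs/my_env.py` (both name and path are assumptions for illustration), the `env` part would become:

```python
env=dict(
    type='my_env',                             # hypothetical registered environment name
    import_names=['zoo.my_env.envs.my_env'],   # hypothetical import path
),
```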

### 2.2 Settings in the `policy` Part

```python
policy=dict(
    type='muzero',
    import_names=['lzero.policy.muzero'],
),
```

Here, `type` specifies the name of the policy to use, and `import_names` specifies the import path of the policy class. The MuZero algorithm predefined in LightZero is used here. If you want to use a custom policy class, change `type` to the custom policy's registered name and modify `import_names` to the import path of the custom policy.
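
Likewise, for a custom policy registered under the hypothetical name `my_policy` and implemented in `lzero/policy/my_policy.py`, the `policy` part would become:

```python
policy=dict(
    type='my_policy',                          # hypothetical registered policy name
    import_names=['lzero.policy.my_policy'],   # hypothetical import path
),
```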

## 3. Running the Algorithm

After the configuration is complete, call the training entry under the `if __name__ == "__main__":` guard at the bottom of the config file:

```python
if __name__ == "__main__":
    from lzero.entry import train_muzero
    train_muzero([main_config, create_config], seed=0, max_env_step=max_env_step)
```

This trains the MuZero algorithm on the configured environment. `[main_config, create_config]` specifies the configuration used for training, `seed` specifies the random seed, and `max_env_step` specifies the maximum number of environment interaction steps.

## 4. Notes

The above briefly introduces how to configure an algorithm for a custom environment in the LightZero framework; we hope it is helpful. Please keep the following points in mind during configuration:

- When using a custom environment, be sure to write the environment class according to the environment interface standard defined by the LightZero framework, otherwise errors may occur.
- Different algorithms and environments require different configuration parameters. Before configuring, you should understand the principles of the algorithm and the characteristics of the environment in detail, and you can refer to relevant papers to set the parameters reasonably.
- If you want to run an algorithm supported by LightZero on a custom environment, you can first use the algorithm's default `policy` configuration and then optimize and adjust it according to the actual training results.
- When configuring the number of parallel environments, set it according to your computing resources to avoid running out of GPU memory because of too many parallel environments.
- You can use tools such as TensorBoard to monitor training and find and fix problems in time. For details, see the [logging system documentation](https://github.com/opendilab/LightZero/tree/main/docs/source/tutorials/logs/logs_zh.md).

Wish you a smooth experience with the LightZero framework!

