FAQ

Frequently Asked Questions


Q1: Import packages warning

A1:

Regarding the warnings about importing linklink, ceph, memcache, redis, and other optional packages shown on the command line when running DI-engine: users can generally ignore them, as DI-engine automatically searches for corresponding alternative libraries or code implementations during import.

Q2: Cannot use DI-engine command line tool (CLI) after installation

A2:

  • Installing with pip's -e flag can sometimes make the CLI unavailable. Non-developers generally do not need the -e flag; removing it and reinstalling is sufficient.
  • Some operating environments install the CLI into the user directory, so you need to verify that the CLI installation directory is included in the user's environment variables (such as $PATH on Linux).

Q3: "No permission" error occurred during installation

A3:

Due to the lack of corresponding permissions in some operating environments, "Permission denied" may appear during pip installation. The possible causes and solutions are as follows:
  • Install with pip's --user flag so that the package is placed in the user's directory.
  • Move the .git folder out of the root directory, execute the pip installation command, and then move it back. For the specific reason, see pypa/pip#4525.

Q4: How to set the relevant operating parameters of SyncSubprocessEnvManager

A4:

Add a manager field to the env field in the config file; there you can specify whether to use shared_memory as well as the multiprocessing start context. The following code provides a simple example. For detailed parameter information, please refer to SyncSubprocessEnvManager.

config = dict(
    env=dict(
        # Options of the env manager, e.g. whether observations are passed
        # between subprocesses via shared memory.
        manager=dict(shared_memory=False)
    )
)
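
Note that the options above only tune the behavior of the env manager; which manager class is used is normally selected in the create_config part of the entry. The following is a minimal sketch, assuming the type names ('subprocess', 'base') used in common DI-engine example configs under dizoo; please check the examples for your version.

# Hypothetical create_config fragment: selects the subprocess-based env manager,
# whose runtime options are then tuned by the env.manager field shown above.
create_config = dict(
    env_manager=dict(type='subprocess'),  # type='base' would use the plain serial manager
)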

Q5: How to adjust the learning rate

A5:

Add an lr_scheduler module in the entry file. You can adjust the learning rate by using torch.optim.lr_scheduler (see https://pytorch.org/docs/stable/optim.html) and calling scheduler.step() after the optimizer's update. The following code provides a simple example. For more detail, see the demo: https://github.com/opendilab/DI-engine/commit/9cad6575e5c00036aba6419f95cdce0e7342630f.

from torch.optim.lr_scheduler import LambdaLR

...

# Set up RL Policy
policy = DDPGPolicy(cfg.policy, model=model)
# Set up lr_scheduler; the optimizer attribute differs between policies.
# For example, in DDPGPolicy the attribute is 'optimizer_actor', while in DQNPolicy it is 'optimizer'.
lr_scheduler = LambdaLR(
    policy.learn_mode.get_attribute('optimizer_actor'), lr_lambda=lambda iters: min(1.0, 0.5 + 0.5 * iters / 1000)
)

...

# Train
for i in range(cfg.policy.learn.update_per_collect):
    ...
    learner.train(train_data, collector.envstep)
    lr_scheduler.step()

The resulting learning rate curve is shown in the figure below.

[figure: learning rate curve over training iterations]
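
To see the shape of this schedule without running training, you can evaluate the lambda from the example above directly (plain Python, no DI-engine required):

# The LambdaLR above multiplies the base learning rate by this factor:
# a linear warmup from 0.5x at iteration 0 to 1.0x at iteration 1000, then constant.
for iters in (0, 250, 500, 1000, 2000):
    print(iters, min(1.0, 0.5 + 0.5 * iters / 1000))
# -> 0.5, 0.625, 0.75, 1.0, 1.0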

Q6: How to understand the printed [EVALUATOR] information

A6:

We print the evaluator's information in interaction_serial_evaluator.py, including env, final reward, and current episode. These represent, respectively, the index (env_id) of the eval_env whose episode has just completed (i.e. timestep.done=True), the final reward of that completed episode, and how many episodes the evaluator has evaluated so far. A typical log is shown in the figure below:

[figure: example evaluator log output]

In some cases, different evaluator environments (abbreviated as eval_env) may collect episodes of different lengths. For example, suppose we collect 16 episodes with only 5 eval_envs, i.e. we set n_evaluator_episode=16 and evaluator_env_num=5 in the config. If we do not limit the number of evaluation episodes per eval_env, we are likely to collect many short episodes. As a result, the average reward obtained in this evaluation phase will be biased and cannot fully reflect the performance of the current policy (it mainly reflects performance on shorter episodes).

To address this issue, we use the VectorEvalMonitor class. This class divides the number of episodes each eval_env needs to evaluate as evenly as possible: e.g. if n_evaluator_episode=16 and evaluator_env_num=8, then only 2 episodes from each eval_env are added to the statistics. For the specific meaning of each method of VectorEvalMonitor, please refer to the annotations in the class; the idea of the even split is sketched below.
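
The even split can be illustrated with a toy sketch (hypothetical helper, not the actual VectorEvalMonitor implementation):

# Divide n_evaluator_episode episodes as evenly as possible over the eval_envs.
def allocate_episodes(n_evaluator_episode: int, evaluator_env_num: int) -> list:
    base, remainder = divmod(n_evaluator_episode, evaluator_env_num)
    # Each env gets `base` episodes; the first `remainder` envs get one extra.
    return [base + (1 if i < remainder else 0) for i in range(evaluator_env_num)]

print(allocate_episodes(16, 8))  # [2, 2, 2, 2, 2, 2, 2, 2]
print(allocate_episodes(16, 5))  # [4, 3, 3, 3, 3]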

It is worth noting that when a certain eval_env has completed its each_env_episode[i] episodes, it keeps running until the whole evaluation phase exits, because resetting eval_envs is handled automatically by the env_manager. VectorEvalMonitor controls when the evaluation phase terminates: only when eval_monitor.is_finished() is True, i.e. the evaluator has completed all evaluation episodes (n_evaluator_episode in the config), does the evaluator exit the evaluation phase. Consequently, the log lines of an eval_env may keep appearing even after it has finished its each_env_episode[i] episodes. These repeated logs do not affect the evaluation results, so users do not need to worry about them.
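
The exit rule described above can be summarized with a small toy example (hypothetical code, not taken from DI-engine):

# Each eval_env has a quota; episodes finished after the quota is reached are ignored.
each_env_episode = [4, 3, 3, 3, 3]   # e.g. 16 episodes spread over 5 eval_envs
counted = [0] * len(each_env_episode)

def on_episode_done(env_id: int) -> None:
    # Count the episode only if this env has not yet reached its quota.
    if counted[env_id] < each_env_episode[env_id]:
        counted[env_id] += 1

def is_finished() -> bool:
    # The evaluation phase exits only when all quotas are filled.
    return sum(counted) >= sum(each_env_episode)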

Q7: Is there any documentation for the config file in DI-engine? How can I set the fields in the config file to control training termination?

A7:

For a detailed introduction to the configuration file system in DI-engine, please refer to the Configuration File System Documentation. In DI-engine, there are generally three types of termination settings:

  • stop_value in the env config: training terminates once the evaluation reward reaches this threshold.
  • max_env_step: training terminates after a given number of environment interaction steps.
  • max_train_iter: training terminates after a given number of training iterations.
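
A minimal sketch of where these settings usually appear is given below; the field names follow common DI-engine example configs and the serial_pipeline entry function, so please verify them against your version.

# stop_value lives in the env config; step/iteration budgets are usually
# passed to the entry function rather than stored in the config.
main_config = dict(
    env=dict(
        stop_value=195,  # stop once the average evaluation reward reaches 195
    ),
)

# Example entry call (commented out because it requires a complete config):
# serial_pipeline((main_config, create_config), seed=0,
#                 max_env_step=int(1e6), max_train_iter=int(1e5))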

Furthermore, for the meaning of specific fields in the config file, you can refer to the comments in the default config section of each class. For example, you can inspect a policy's default config as shown below.
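
A small sketch of inspecting a policy's default config (assuming the default_config() classmethod of the base Policy class; double-check the interface in your version):

from ding.policy import DQNPolicy

# Print the default config of DQNPolicy; the comments next to the `config`
# attribute in the class source describe the meaning of each field.
print(DQNPolicy.default_config())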

Reinforcement learning-related configuration files can be complex. If there are any details you don't understand, please feel free to ask!

  • Q8-1 Can DI-engine be installed using pip?

    A8-1:

    Yes, you can directly use the command pip install DI-engine for installation. For more details, please refer to the Installation Guide - Installing Released Versions.

  • Q8-2 Does the installation of DI-engine automatically install PyTorch? What happens if PyTorch is already installed on the computer?

    A8-2:

    If PyTorch is already installed in the current environment and its version meets DI-engine's requirements, DI-engine will only install the other required packages. If PyTorch has not been installed beforehand, DI-engine will install the CPU version of PyTorch by default, so it is recommended to install the appropriate version of PyTorch (e.g. a GPU build) in advance. For specific installation steps, please refer to the Installation Guide.

  • Q8-3 Which version of gym does DI-engine correspond to? Does it not support the latest gym version?

    A8-3:

    Currently (as of 2023.5.5), DI-engine corresponds to gym version 0.25.1. For the compatibility of DI-engine with its various dependencies, you can refer to the setup.py file.

  • Q8-4 How can I install DI-engine from source?

    A8-4:

    You can clone the repository from GitHub and navigate to the corresponding folder, then run pip install . (add the -e flag for an editable, development-mode install) as follows:

    git clone https://github.com/opendilab/DI-engine.git
    cd DI-engine
    pip install .

Q9: What does "episode" refer to in DI-engine?

A9:

An episode is not a concept original to reinforcement learning; it comes from gaming. It is similar to the concept of a "level" and refers to the process from the moment an agent starts playing until the game is over or successfully completed. In essence, it represents one complete cycle of interaction with the environment, such as one game session or one game of Go.

Q10: Does DI-engine support the self-play mechanism?

A10:

Yes, it is supported. The simplest examples can be found in the league demo and slime volleyball in the dizoo directory.