Bugs fix and new feature request for gfootball #335

Closed
7 of 11 tasks
ErlebnisW opened this issue May 23, 2022 · 6 comments
Labels
enhancement New feature or request P1 Issue that should be fixed within a few weeks

Comments


ErlebnisW commented May 23, 2022

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)

There are many bugs in the current version of the DI-engine (v0.3.1) gfootball environment. I have tried to fix some of them, but some problems still exist that are beyond my ability, so I guess it needs systematic maintenance and updates. As far as I have tested, only the files in dizoo/gfootball/envs/tests work well (after some bug fixes), and the fundamental features mentioned in the doc (play against the built-in AI & self-play) are basically unusable.

Besides, since gfootball is an environment with great potential in both academia and practice, I strongly recommend adding the following features:

  • Battles between customized models
  • Multi-agent support (5 vs 5, 11 vs 11)
  • League training support
  • More imitation learning algorithms (particularly GAIL, MAGAIL)

Thanks. I think DI-engine is an excellent framework with great potential; I hope it keeps getting better.

@ErlebnisW ErlebnisW changed the title Bugs fix and new feature request in gfootball Bugs fix and new feature request for gfootball May 23, 2022
@PaParaZz1 PaParaZz1 added enhancement New feature or request P1 Issue that should be fixed within a few weeks labels May 23, 2022

zxzzz0 commented May 23, 2022

> I have tried to fix some of those.

@ErlebnisW What are these exactly? Had there been bugs in the code, the tests wouldn't have passed.


ErlebnisW commented May 23, 2022

> > I have tried to fix some of those.
>
> @ErlebnisW What are these exactly? Had there been bugs in the code, the tests wouldn't have passed.

Operating system: Ubuntu 18.04.6 LTS, Python 3.8

  1. After installation, run DI-engine/dizoo/gfootball/envs/tests/test_env_gfootball_academy.py; the terminal shows:

/home/usr/anaconda3/envs/di-engine/lib/python3.8/site-packages/dizoo/gfootball/envs/__init__.py:6: UserWarning: not found gfootball env, please install it

When I checked this path (i.e. anaconda3/envs/di-engine/lib/python3.8/site-packages/dizoo/gfootball/envs/), I found that the folders "action", "obs" and "reward" don't exist. This matters because gfootball_env.py has the following lines:
from .action.gfootball_action_runner import GfootballRawActionRunner
from .obs.gfootball_obs_runner import GfootballObsRunner
from .reward.gfootball_reward_runner import GfootballRewardRunner

So I copied those folders in to solve this (roughly the steps sketched below).
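For reference, a one-off Python sketch of what I did; both paths are specific to my machine and are assumptions to adapt:

import shutil
from pathlib import Path

# source: a git checkout of DI-engine; destination: the installed package
src = Path("DI-engine/dizoo/gfootball/envs")
dst = Path("/home/usr/anaconda3/envs/di-engine/lib/python3.8/site-packages/dizoo/gfootball/envs")

# copy the three sub-packages that the installed copy is missing
for folder in ("action", "obs", "reward"):
    shutil.copytree(src / folder, dst / folder, dirs_exist_ok=True)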

  2. Run DI-engine/dizoo/gfootball/entry/parallel/gfootball_ppo_parallel_config.py; the terminal shows:

ImportError: cannot import name 'BaseEnvInfo' from 'ding.envs'

I solved this by deleting BaseEnvInfo from the import and removing the lines that use it (see the sketch below).
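The change was along these lines (the exact import list in that config file may differ):

# before (fails on DI-engine v0.3.1, where BaseEnvInfo no longer exists):
from ding.envs import BaseEnv, BaseEnvInfo

# after: drop BaseEnvInfo from the import and delete the code that used it
from ding.envs import BaseEnv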

  3. There is also another bug involving "exp_name" which I forget how to reproduce; I fixed it by specifying exp_name in one file (see the sketch below).
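The fix was something like the following, assuming the file's config object is named main_config (the experiment name itself is arbitrary):

# DI-engine configs carry a top-level exp_name; the error goes away
# once it is set explicitly instead of being left for the framework to infer.
main_config.exp_name = 'gfootball_ppo_parallel'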

The following bugs I could not fix:

  1. run "Di-engine/dizoo/gfootball/entry/parallel/gfootball_ppo_parallel_config.py", terminal shows:

File "/home/vcis5/anaconda3/envs/di/lib/python3.8/site-packages/ding/envs/env_manager/base_env_manager.py", line 109, in init
self._reward_space = self._env_ref.reward_space
AttributeError: 'GfootballEnv' object has no attribute 'reward_space'

I can't fix this (an untested idea is sketched below).
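From the traceback, base_env_manager.py dereferences env.reward_space, so maybe GfootballEnv just needs to expose one; an untested sketch of that idea, where the bounds and shape are guesses:

import numpy as np
import gym

# Untested sketch: give the env the reward_space attribute that
# BaseEnvManager dereferences; a scalar unbounded reward is assumed.
class GfootballEnvPatched:

    @property
    def reward_space(self) -> gym.spaces.Box:
        return gym.spaces.Box(low=float('-inf'), high=float('inf'), shape=(1, ), dtype=np.float32)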

  1. run "Di-engine/dizoo/gfootball/entry/parallel/gfootball_il_parallel_config.py", terminal shows:

Traceback (most recent call last):
File "/home/vcis5/Userlist/Wangmingzhi/Di-engine/dizoo/gfootball/entry/parallel/gfootball_il_parallel_config.py", line 123, in
main_config = parallel_transform(main_config)
File "/home/vcis5/anaconda3/envs/di/lib/python3.8/site-packages/ding/config/utils.py", line 205, in parallel_transform
cfg.system = set_system_cfg(cfg)
File "/home/vcis5/anaconda3/envs/di/lib/python3.8/site-packages/ding/config/utils.py", line 158, in set_system_cfg
learner_num = cfg.main.policy.learn.learner.learner_num
AttributeError: 'EasyDict' object has no attribute 'main'

I can't fix this (a guess based only on the traceback is sketched below).
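The only thing the traceback tells me is that set_system_cfg dereferences cfg.main.policy.learn.learner.learner_num, so the config passed to parallel_transform apparently needs a top-level 'main' section; an untested guess at that shape:

from easydict import EasyDict

# Untested guess, derived only from the traceback: parallel_transform
# seems to expect the usual config nested under a 'main' key.
cfg = EasyDict(dict(
    main=dict(policy=dict(learn=dict(learner=dict(learner_num=1)))),
))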

  3. Run the ppo_lstm demo from the doc (https://di-engine-docs.readthedocs.io/zh_CN/latest/env_tutorial/gfootball_zh.html); the terminal shows:

File "/home/vcis5/anaconda3/envs/di/lib/python3.8/site-packages/ding/torch_utils/checkpoint_helper.py", line 327, in wrapper
return func(*args, **kwargs)
File "/home/vcis5/anaconda3/envs/di/lib/python3.8/site-packages/ding/worker/learner/base_learner.py", line 260, in start
data = self._next_data()
File "/home/vcis5/anaconda3/envs/di/lib/python3.8/site-packages/ding/worker/learner/base_learner.py", line 164, in wrapper
with self._wrapper_timer:
File "/home/vcis5/anaconda3/envs/di/lib/python3.8/site-packages/ding/utils/time_helper.py", line 84, in enter
self._timer.start_time()
File "/home/vcis5/anaconda3/envs/di/lib/python3.8/site-packages/ding/utils/time_helper_cuda.py", line 43, in start_time
torch.cuda.synchronize()
File "/home/vcis5/anaconda3/envs/di/lib/python3.8/site-packages/torch/cuda/init.py", line 491, in synchronize
_lazy_init()
File "/home/vcis5/anaconda3/envs/di/lib/python3.8/site-packages/torch/cuda/init.py", line 204, in _lazy_init
raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

I can't fix this (the error message suggests the common workaround sketched below, which I have not verified here).
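The standard workaround for this class of error is to switch the multiprocessing start method before anything touches CUDA; a minimal sketch, untested in this pipeline:

import torch.multiprocessing as mp

# Standard fix for "Cannot re-initialize CUDA in forked subprocess":
# choose the 'spawn' start method before any worker processes are created.
if __name__ == '__main__':
    mp.set_start_method('spawn', force=True)
    # ... then launch the training entry as usual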

puyuan1996 (Collaborator) commented:

Hello, thanks for your questions and suggestions.

Now, in this PR:

  1. We have fixed some bugs in the gfootball env, and we have a naive DQN RL demo that plays against the built-in AI, but its performance is still poor. The initial results are:

[image: naive DQN training results]

We speculate that this may be because the pure DQN algorithm lacks the ability to model long sequence dependencies and an efficient exploration mechanism. You could try adapting more advanced algorithms like NGU to the gfootball environment.

  2. We have an imitation learning demo that learns from a rule_based_model dataset, but its performance is not yet good:

[image: imitation learning results]

Here are the statistics of our training dataset (100 episodes): the mean, max, and min return are -0.12, 4, and -3 respectively, which suggests that we should improve the quality of the dataset. The accuracy on the training dataset (100 episodes) and the validation dataset (50 episodes) is 0.9452 and 0.8009 respectively. We also found that the accuracy for some actions in the training dataset is below 0.5, which reflects a class imbalance problem (one standard mitigation is sketched below).
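One standard mitigation for this kind of imbalance is class-weighted cross-entropy; a minimal sketch, where the 19-action default gfootball action set and the inverse-frequency weighting are illustrative assumptions:

import torch
import torch.nn.functional as F

# Minimal sketch: inverse-frequency class weights for the imitation loss.
# action_counts holds how often each of the 19 default gfootball actions
# appears in the demonstration dataset.
def weighted_il_loss(logits: torch.Tensor, actions: torch.Tensor, action_counts: torch.Tensor) -> torch.Tensor:
    counts = action_counts.clamp(min=1).float()
    weights = counts.sum() / (counts.numel() * counts)  # mean weight is ~1
    return F.cross_entropy(logits, actions, weight=weights)

# usage: logits is (B, 19), actions is (B,), action_counts is (19,)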

In order to obtain good imitation learning performance, we are taking steps to address the above problems. Thank you for your patience.

Thanks a lot.

ErlebnisW (Author) commented:

Thanks

puyuan1996 (Collaborator) commented:

Hello,

We have reduced the difficulty of the built-in AI in the environment, changing the env_id from 11_vs_11_stochastic (medium opponent bot) to 11_vs_11_easy_stochastic (easy opponent bot). The performance of the imitation learning demo that learns from the rule_based_model dataset is now as follows (you can reproduce the result by running dizoo/gfootball/entry/gfootball_il_rule_lt0_main.py in the PR):
[image: imitation learning results after the difficulty change]
As you can see, the performance is better than in #335 (comment).
We suspect this is because the level of rule_based_model is inherently low: in 11_vs_11_stochastic, the average return of the collected dataset is below 0. Although we only select trajectories with return > 0 for imitation learning, such data cover only a small part of the state space, resulting in over-fitting and poor performance. (The difficulty switch itself is shown below at the gfootball API level.)
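For reference, the same difficulty switch expressed directly against the gfootball API; this is illustrative, since DI-engine's GfootballEnv wrapper passes env_id through its own config:

import gfootball.env as football_env

# Illustrative: select the easier built-in opponent via env_name.
env = football_env.create_environment(
    env_name='11_vs_11_easy_stochastic',  # was '11_vs_11_stochastic'
    representation='raw',
)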

Regarding the reinforcement learning demo, we are currently trying the R2D2 algorithm (the development version is dizoo/gfootball/entry/gfootball_r2d2_main.py in the PR), and we will let you know as soon as we have results. Thank you for your patience and attention.

Thanks a lot.

ErlebnisW (Author) commented:

Thanks a lot! I'll give it a try.
