GoBigger-Explore

en/中文

GoBigger-Explore is a collection of baselines for the 2021 GoBigger Multi-Agent Decision Intelligence Challenge. The baseline is built on OpenDILab and aims to provide a simple, entry-level method that participants can extend to build their own agents. OpenDILab's modular structure makes it easy to get started, and it provides a wide range of reinforcement learning algorithms for participants to use. This baseline is a good starting point, especially for entry-level researchers who are familiar with multi-agent decision AI problems. We will also add more advanced algorithms to this repo.

Outline

🚀 Release Version

The latest version is 0.3.0.

  1. What needs to be optimized in the future
    • Application of advanced algorithms.
    • Design and study of advanced actions.
  2. Supervised Learning
    • Using bots to generate data for supervised learning.
    • The supervised learning model can be used as a competition model or as a pre-train for reinforcement learning.
    • For details, see SL.
  3. GoBigger with Go-Explore
    • Train GoBigger with the Go-Explore algorithm.
    • Speed up network training by loading endgame matches.
    • For details, see go-explore.
  4. Version-0.3.0
    • Adopt in-place algorithms and a gradient accumulation strategy to save GPU memory.
    • Encode the relation features from Version-0.2.0 more efficiently.
    • Simplify the network model and streamline the training process.
  5. Version-0.2.0
    • Version-0.2.0 link
    • Fix the ckpt bug to improve the accuracy of the evaluator.
    • Fix the replay_buffer bug.
    • The replay_buffer now stores variable-length features to improve data utilization and training speed.
  6. Version-0.1.0
  7. Feature Engineering
    • Brand new feature engineering to improve convergence speed.
      • Scalar Encoder

        • By default, the upper-left corner is the origin of the coordinate system.
        • The red rectangle in the figure is the global field of view, and the green rectangle is the local field of view.
      • Food Encoder

        • For convenience, the area of a ball is represented by the square of its radius, omitting the constant factor.
        • The food map divides the local field of view into h*w grid cells, each 16*16 in size.
        • food map[0,:,:] is the sum of the areas of all food in each grid cell.
        • food map[1,:,:] is the sum of the areas of the clone balls of a given player id in each grid cell.
        • The food grid divides the local field of view into h*w grid cells, each 8*8 in size.
        • For each grid cell, the food grid stores the offset of the food relative to the upper-left corner of the cell and the radius of the food.
        • The food relation has dimension [len(clone),7*7+1,3], where [:,7*7,3] holds the food information in the 7*7 grid neighborhood of each clone ball, including the offset and the summed food area in each cell. Because a cell rarely contains more than one food ball, an approximation is made here: when several food balls fall into the same cell, only the position of the last one is kept. [len(clone):,1,3] holds the offset and area of each clone ball.
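The two-channel food map described above can be sketched in plain Python. This is only an illustration of the binning idea, not the baseline's implementation: the function name `area_map`, the `(x, y, radius)` tuple layout, and the example view size are all assumptions.

```python
from typing import List, Tuple

Ball = Tuple[float, float, float]  # (x, y, radius) in local-view coordinates (assumed layout)


def area_map(balls: List[Ball], view_w: int, view_h: int, cell: int = 16) -> List[List[float]]:
    """Sum r^2 (area without the constant factor) into the grid cell holding each ball center.

    The origin is the upper-left corner, so the row index comes from y and the column from x.
    """
    w, h = int(view_w // cell), int(view_h // cell)
    grid = [[0.0] * w for _ in range(h)]
    for x, y, r in balls:
        col, row = int(x // cell), int(y // cell)
        if 0 <= row < h and 0 <= col < w:  # ignore balls outside the local view
            grid[row][col] += r * r
    return grid


# Hypothetical example: a 64x32 local view gives a 2x4 grid of 16x16 cells.
food = [(3.0, 5.0, 2.0), (10.0, 12.0, 2.0), (40.0, 20.0, 1.0)]
clones = [(20.0, 20.0, 5.0)]  # clone balls of one player id
food_map = [area_map(food, 64, 32), area_map(clones, 64, 32)]  # channels 0 and 1
```

Here the two food balls near the origin share a cell, so their squared radii add up in channel 0, while channel 1 holds the squared radius of the single clone ball.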
      • Clone Encoder

        • Encode the clone ball, including the position, radius, one-hot encoding of the player name, and the speed encoding of the clone ball.
      • Relation Encoder

        • The relative position of ball_1 and ball_2, (x1-x2, y1-y2).
        • The distance between ball_1 and ball_2.
        • Whether ball_1 and ball_2 collide, where a collision means that the center of one ball lies inside the other.
        • Whether ball_1 and ball_2 can collide, expressed as the distance between the edge of one ball and the center of the other.
        • Whether ball_1 and ball_2 can collide after splitting, expressed as the distance between the edge of the farthest split ball and the center of the other ball.
        • Whether one ball can eat the other, expressed as the relationship between the radii of the two balls.
        • Whether one ball can eat the other after splitting, expressed as the relationship between the radii of the two balls after splitting.
        • The relationship between the radii of the two balls. ball_1 and ball_2 are mapped to an m*n r1 and an m*n r2 respectively, where m is the number of clone balls of ball_1 and n is the number of clone balls of ball_2.
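A few of the pairwise features listed above can be sketched as an m*n table. This is a minimal illustration under assumptions: the function name, the dictionary keys, and the `(x, y, radius)` tuple layout are hypothetical, and the after-splitting features are left out because their exact definition depends on the game's split rules.

```python
import math
from typing import Dict, List, Tuple

Ball = Tuple[float, float, float]  # (x, y, radius), assumed layout


def pairwise_relation(balls_1: List[Ball], balls_2: List[Ball]) -> List[List[Dict[str, object]]]:
    """Build an m x n table of relation features between two groups of clone balls."""
    table = []
    for x1, y1, r1 in balls_1:
        row = []
        for x2, y2, r2 in balls_2:
            dx, dy = x1 - x2, y1 - y2
            dist = math.hypot(dx, dy)
            row.append({
                "offset": (dx, dy),               # relative position (x1-x2, y1-y2)
                "distance": dist,                 # center-to-center distance
                "collide": dist < max(r1, r2),    # center of one ball lies inside the other
                "edge_to_center": dist - r1,      # edge of ball_1 to center of ball_2
                "radius_diff": r1 - r2,           # positive when ball_1 is the larger ball
            })
        table.append(row)
    return table


# Hypothetical example with m = 2 and n = 1.
rel = pairwise_relation([(0.0, 0.0, 6.0), (10.0, 0.0, 2.0)], [(3.0, 4.0, 1.0)])
```

In the example, the first pair is 5 units apart and the larger ball has radius 6, so the smaller ball's center lies inside it and `collide` is true; the second pair is too far apart to collide.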
      • Model

        • The mask records which entries hold valid information after padding. It is best understood by reading it alongside the code.
        • The model design in the baseline is not the best possible; feel free to experiment with it.
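The role of a padding mask can be illustrated with a small, generic sketch. It is not taken from the baseline's model code; the helper names `pad_with_mask` and `masked_mean` are hypothetical and only show the general idea of padding variable-length features and ignoring the padded slots later.

```python
from typing import List, Tuple


def pad_with_mask(rows: List[List[float]], max_len: int,
                  pad_value: float = 0.0) -> Tuple[List[List[float]], List[List[int]]]:
    """Pad (or truncate) each row to max_len and return a 1/0 mask of valid entries."""
    padded, mask = [], []
    for row in rows:
        row = list(row[:max_len])
        n_pad = max_len - len(row)
        padded.append(row + [pad_value] * n_pad)
        mask.append([1] * len(row) + [0] * n_pad)
    return padded, mask


def masked_mean(values: List[float], mask: List[int]) -> float:
    """Average only the valid entries so that padding does not bias the result."""
    count = sum(mask)
    return sum(v * m for v, m in zip(values, mask)) / count if count else 0.0


# Hypothetical example: two players with 3 and 1 clone balls, padded to length 3.
padded, mask = pad_with_mask([[1.0, 2.0, 3.0], [4.0]], max_len=3)
```

Without the mask, averaging the second row would give 4/3 instead of 4, which is the kind of error the mask is there to prevent.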
  8. Win Rate VS Bot
    • Version-0.3.0 vs. the rule-based bot in GoBigger.
  9. Version comparison
    • Version-0.3.0 vs. Version-0.2.0
      • v0.3.0 is more lightweight, and its network design and feature encoding are easier to use.
      • v0.3.0 reward and Q-value curves

👇 Getting Started

System environment

  • CPU: 8 cores
  • GPU: 1080Ti (11G) or 1060 (6G)
  • Memory: 40G

Baseline Config

  • The default config is the one used in this experiment. Participants can modify it to match their own system environment.
  • Set replay_buffer_size according to the amount of available RAM.
  • Set batch_size according to the amount of available GPU memory.
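As a rough illustration of how these two settings might be sized, the sketch below derives a buffer size from a RAM budget and scales a batch size to a smaller GPU. Everything here is an assumption for illustration: the helper names, the per-sample size, the reference batch size of 128, and the linear scaling rule are not taken from the baseline, and the actual key locations should be checked in the config file shipped with the repo.

```python
def buffer_size_for_ram(ram_gb: float, bytes_per_sample: int, fraction: float = 0.5) -> int:
    """Number of samples that fit in a given fraction of RAM (rough estimate)."""
    budget = ram_gb * 1024 ** 3 * fraction
    return int(budget // bytes_per_sample)


def batch_size_for_gpu(base_batch: int, base_gpu_gb: float, gpu_gb: float) -> int:
    """Scale a reference batch size linearly with GPU memory (a heuristic, not a rule)."""
    return max(1, int(base_batch * gpu_gb / base_gpu_gb))


# Hypothetical numbers: 40G of RAM, about 200 KB per stored sample,
# and a reference batch size of 128 on an 11G card scaled down to a 6G card.
overrides = {
    "replay_buffer_size": buffer_size_for_ram(40, 200 * 1024),
    "batch_size": batch_size_for_gpu(128, 11, 6),
}
```

The resulting values are starting points only; watch actual memory usage during the first training iterations and adjust.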

Install the necessary packages

# Install DI-engine
git clone https://github.com/opendilab/DI-engine.git
cd YOUR_PATH/DI-engine/
pip install -e . --user

# Install Env Gobigger
git clone https://github.com/opendilab/GoBigger.git
cd YOUR_PATH/GoBigger/
pip install -e . --user

Start training

# Download baseline
git clone https://github.com/opendilab/Gobigger-Explore.git
cd Gobigger-Explore/my_submission/entry/
python gobigger_vsbot_baseline_simple_main.py

Evaluate and save game videos

cd my_submission/entry/
python gobigger_vsbot_baseline_simple_eval.py --ckpt YOUR_CKPT_PATH
# If you do not need to save the video, uncomment line 258 of gobigger_env.py
python gobigger_vsbot_baseline_simple_quick_eval.py --ckpt YOUR_CKPT_PATH

SL Training

cd my_submission/sl/
python generate_data_opensource.py  # generate data for training
python train.py -c ./exp/sample/config.yaml  # change the data directory in the config first

Go-Explore

cd my_submission/go-explore/
python gobigger_vsbot_explore_main.py

🎯 Result

We have released training logs, checkpoints, and evaluation videos. The download links are below.

  • Version 0.3.0
    • Baidu Netdisk Link
      • Extraction code: 95el
  • Version 0.2.0
    • Baidu Netdisk Link
      • Extraction code: u4i6

😍 Resources

⭐ Join and Contribute

Welcome to the OpenDILab GoBigger community! Scan the QR code and add us on WeChat:

QR code

You can also contact us via Slack or email (opendilab@pjlab.org.cn).

🍸 License

GoBigger-Explore is released under the Apache 2.0 license.