<a href="https://colab.research.google.com/github/AI4Finance-LLC/ElegantRL/blob/master/BipedalWalker_Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **BipedalWalker-v3 Example in ElegantRL**






# **Part 1: Testing Task Description**

[BipedalWalker-v3](https://gym.openai.com/envs/BipedalWalker-v2/) is a classic robotics task because it exercises one of the most fundamental skills: locomotion. The goal is to make a 2D bipedal walker cross rough terrain. BipedalWalker is a difficult task with a continuous action space, and only a few RL implementations reach the target reward.

In [1]:
from IPython.display import HTML
HTML(f"""<video src={"https://gym.openai.com/videos/2019-10-21--mqt8Qj1mwo/BipedalWalker-v2/original.mp4"} width=500 controls/>""") # the random demonstration of the task from OpenAI Gym

# **Part 2: Install ElegantRL**

In [2]:
# install elegantrl library
!pip install git+https://github.com/AI4Finance-LLC/ElegantRL.git



# **Part 3: Import Packages**


*   **elegantrl**
*   **OpenAI Gym**: a toolkit for developing and comparing reinforcement learning algorithms.
*   **PyBullet Gym**: an open-source implementation of the OpenAI Gym MuJoCo environments.



In [None]:
from elegantrl.run import *
from elegantrl.agent import AgentDQN
from elegantrl.env import PreprocessEnv
import gym
gym.logger.set_level(40)  # 40 = ERROR level, which hides gym warnings
env = mpe_make_env('simple_spread')

# **Part 4: Specify Agent and Environment**

*   **args.agent**: selects the DRL algorithm to use; any agent defined in agent.py can be chosen.
*   **args.env**: creates and preprocesses the environment; you can either supply a custom environment or preprocess an OpenAI Gym or PyBullet Gym environment with env.py.


> Before finishing initialization of **args**, please see Arguments() in run.py for more details about adjustable hyper-parameters.
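
For the custom-environment route, a minimal sketch is given here. It assumes the Gym-style `reset()` / `step()` interface and exposes the attributes that appear in this notebook's environment summary (`env_name`, `state_dim`, `action_dim`, `action_max`, `max_step`, `if_discrete`, `target_return`). The class `ToyCorridorEnv`, its dynamics, and the helper `run_episode` are hypothetical; which attributes ElegantRL actually requires should be checked against env.py.

```python
class ToyCorridorEnv:
    """Hypothetical 1-D corridor: start at position 0, walk right to `goal`."""

    def __init__(self, goal=5, max_step=20):
        self.env_name = "ToyCorridor-v0"  # hypothetical name
        self.state_dim = 1                # observation: current position
        self.action_dim = 2               # discrete actions: 0 = left, 1 = right
        self.action_max = 1
        self.if_discrete = True
        self.max_step = max_step          # episode is cut off after this many steps
        self.target_return = 1.0          # return of an episode that reaches the goal
        self.goal = goal
        self._pos = 0
        self._t = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self._pos, self._t = 0, 0
        # A real environment would typically return a NumPy array here.
        return [float(self._pos)]

    def step(self, action):
        """Apply one action; return (observation, reward, done, info)."""
        self._pos += 1 if action == 1 else -1
        self._t += 1
        reached = self._pos >= self.goal
        done = reached or self._t >= self.max_step
        reward = 1.0 if reached else 0.0
        return [float(self._pos)], reward, done, {}


def run_episode(env, policy):
    """Roll out one episode with `policy(state) -> action`; return the total reward."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        state, reward, done, _ = env.step(policy(state))
        total += reward
    return total
```

Calling `run_episode(ToyCorridorEnv(), lambda state: 1)` walks right on every step, so the episode ends as soon as the goal is reached.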




In [None]:
args = Arguments(if_on_policy=False)  # DQN is an off-policy algorithm
args.agent = AgentDQN()  # discrete actions; AgentSAC(), AgentTD3(), AgentDDPG() are for continuous actions
args.env = PreprocessEnv(env)
args.reward_scale = 2 ** -1  # RewardRange: -200 < -150 < 300 < 334
args.gamma = 0.95  # discount factor
args.rollout_num = 2  # the number of rollout workers (larger is not always faster)


| env_name:  simple_spread, action if_discrete: True
| state_dim:   18, action_dim: 5, action_max: 1
| max_step:  1024, target_return: 65536
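
To make the roles of `reward_scale` and `gamma` concrete, the sketch below computes a discounted return from a hypothetical list of per-step rewards. It assumes each reward is multiplied by `reward_scale` before discounting; the helper `discounted_return` is illustrative and not part of ElegantRL.

```python
def discounted_return(rewards, gamma=0.95, reward_scale=2 ** -1):
    """Return the sum of reward_scale * r_t * gamma**t over one episode."""
    total = 0.0
    # Work backwards so each step folds in the discounted future return.
    for reward in reversed(rewards):
        total = reward_scale * reward + gamma * total
    return total


rewards = [1.0, 1.0, 1.0]  # hypothetical per-step rewards
print(discounted_return(rewards))
```

A discount of 0.95 corresponds to an effective horizon of roughly 1 / (1 - 0.95) = 20 steps, so rewards much further in the future contribute little to the return.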


# **Part 5: Train and Evaluate the Agent**

> Training and evaluation both run inside the function **train_and_evaluate_mp()**, which takes **args** as its only parameter (the cell below calls **train_and_evaluate()** with the same argument). **args** holds the fundamental objects in DRL:

*   agent,
*   environment.

> And it also includes the parameters for training-control:

*   batch_size,
*   target_step,
*   reward_scale,
*   gamma, etc.

> The parameters for evaluation-control:

*   break_step,
*   random_seed, etc.
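
How the stopping criteria interact can be sketched with a toy loop. It assumes training stops when either the evaluated return reaches `target_return` or the step count reaches `break_step`; `toy_train` and its fake learning curve are hypothetical stand-ins, not ElegantRL's implementation.

```python
import random


def toy_train(target_return, break_step, target_step=1024, random_seed=0):
    """Toy loop: stop at the target return or once break_step steps have run."""
    rng = random.Random(random_seed)  # a fixed seed makes the run repeatable
    total_step, max_return = 0, float("-inf")
    while total_step < break_step:
        total_step += target_step  # pretend to collect target_step transitions
        # Fake evaluation: the return grows with training steps, plus noise.
        avg_return = total_step / 1000.0 + rng.uniform(-1.0, 1.0)
        max_return = max(max_return, avg_return)
        if max_return >= target_return:
            return total_step, max_return, "reached target_return"
    return total_step, max_return, "reached break_step"


print(toy_train(target_return=5.0, break_step=100_000))
print(toy_train(target_return=1e9, break_step=5_000))
```

The first call ends because the return crosses the target; the second can never reach its target and is cut off by `break_step` instead.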






In [1]:
from elegantrl.run import *
from elegantrl.agent import AgentDQN
from elegantrl.env import PreprocessEnv
import gym
gym.logger.set_level(40)  # 40 = ERROR level, which hides gym warnings
env = mpe_make_env('simple_spread')
args = Arguments(if_on_policy=False)  # DQN is an off-policy algorithm
args.agent = AgentDQN()  # discrete actions; AgentSAC(), AgentTD3(), AgentDDPG() are for continuous actions
args.env = PreprocessEnv(env)
args.reward_scale = 2 ** -1  # RewardRange: -200 < -150 < 300 < 334
args.gamma = 0.95  # discount factor
args.rollout_num = 2  # the number of rollout workers (larger is not always faster)
train_and_evaluate(args) # the training process will terminate once it reaches the target reward.


| env_name:  simple_spread, action if_discrete: True
| state_dim:   18, action_dim: 5, action_max: 1
| max_step:  1024, target_return: 65536
*****************
| Remove cwd: ./AgentDQN_simple_spread_0
*****************
*****************
################################################################################
ID     Step    maxR |    avgR   stdR   avgS  stdS |    expR   objC   etc.
*****************
*****************


AttributeError: 'list' object has no attribute 'shape'
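
The `AttributeError` above means some code asked a Python `list` for `.shape`, an attribute that NumPy arrays have and lists do not. One possible cause, stated here as an assumption because the traceback is not shown, is that a multi-agent environment returns one observation per agent in a list while the code expects a single array. The sketch below reproduces the error message and uses a hypothetical helper, `nested_shape`, to inspect a nested list without NumPy.

```python
def nested_shape(obj):
    """Return the shape of a rectangular nested list (or tuple) as a tuple."""
    shape = []
    while isinstance(obj, (list, tuple)):
        shape.append(len(obj))
        if not obj:
            break
        obj = obj[0]  # descend into the first element of each level
    return tuple(shape)


# Hypothetical multi-agent observation: 3 agents with 18 values each.
observations = [[0.0] * 18 for _ in range(3)]

try:
    observations.shape  # lists have no .shape attribute
except AttributeError as err:
    message = str(err)

print(message)
print(nested_shape(observations))
```

If the real cause matches this assumption, converting the per-agent list to an array (or handling each agent's observation separately) before it reaches shape-dependent code would be the place to look.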


Understanding the above results:
*   **Step**: the total number of training steps.
*   **maxR**: the maximum reward.
*   **avgR**: the average of the rewards.
*   **stdR**: the standard deviation of the rewards.
*   **objA**: the objective function value of the Actor Network (Policy Network).
*   **objC**: the objective function value (Q-value) of the Critic Network (Value Network).
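
As a concrete reading of **avgR** and **stdR**, the sketch below computes both from a hypothetical list of evaluation episode returns. It assumes the population standard deviation (the default convention of NumPy's `std`); whether ElegantRL's evaluator uses exactly this convention is an assumption.

```python
from statistics import mean, pstdev

episode_returns = [10.0, 12.0, 14.0]  # hypothetical evaluation returns

avg_r = mean(episode_returns)    # avgR: mean return over evaluation episodes
std_r = pstdev(episode_returns)  # stdR: population standard deviation
print(f"avgR {avg_r:.2f}  stdR {std_r:.2f}")
```

A small **stdR** relative to **avgR** suggests the policy performs consistently across evaluation episodes rather than succeeding only occasionally.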