Box2D integration: Add LunarLander (#111)
* add lunar_lander
* setup structure
* can run with: bazel build //envpool/box2d:box2d_env --config=release
* sync
* finish LunarLanderEnv::ResetBox2d
* sync
* add CreateParticle
* finish LunarLanderEnv
* can make, need test
* sync
* setup tests
* add box2d-py
* switch to box2d
* add req
* fix
* fail correctness test
* fix lint
* fix 2 bugs
* fix action shape
* pass
* update readme
* test release
* revert
* polish
* static constexpr -> const
* change test metric

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
1 parent a87a8fd, commit 1de8e2c
Showing 22 changed files with 1,143 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,105 @@ | ||
Box2D | ||
===== | ||
|
||
We use ``box2d==2.4.1`` and ``gym==0.23.1`` as the codebase. See | ||
https://github.com/erincatto/box2d/tree/v2.4.1 and | ||
https://github.com/openai/gym/tree/v0.23.1/gym/envs/box2d | ||
|
||
|
||
CarRacing-v1 | ||
------------ | ||
|
||
The easiest control task to learn from pixels - a top-down racing environment. | ||
The generated track is random every episode. | ||
|
||
Action Space | ||
~~~~~~~~~~~~ | ||
|
||
There are 3 actions: steering (-1 for full left, 1 for full right), gas | ||
(0 ~ 1), and braking (0 ~ 1). | ||
|
||
Observation Space | ||
~~~~~~~~~~~~~~~~~ | ||
|
||
The state is a 96x96 image with 3 channels. | ||
|
||
Rewards | ||
~~~~~~~ | ||
|
||
The reward is -0.1 every frame and +1000/N for every track tile visited, where | ||
N is the total number of tiles in the track. For example, if you visit every | ||
tile and finish in 732 frames, your reward is 1000 - 0.1\*732 = 926.8 points. | ||
|
||
Starting State | ||
~~~~~~~~~~~~~~ | ||
|
||
The car starts at rest in the center of the road. | ||
|
||
Episode Termination | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
The episode finishes when all of the tiles are visited. The car can also go | ||
outside of the playfield - that is, far off the track, in which case it will | ||
receive -100 reward and die. | ||
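Review note: the reward rule in this section can be checked with a few lines of arithmetic. The sketch below is not part of the commit; the function name `car_racing_return` and the 300-tile track are hypothetical, chosen only to reproduce the documented 926.8 figure. It also ignores the -100 penalty for leaving the playfield.

```python
def car_racing_return(frames: int, tiles_visited: int, total_tiles: int) -> float:
    """Episode return under the documented rule.

    -0.1 per frame, +1000/N per visited tile, where N is the number of
    tiles in the track. The off-playfield penalty (-100) is not modeled.
    """
    per_tile = 1000.0 / total_tiles
    return tiles_visited * per_tile - 0.1 * frames


# Hypothetical track with 300 tiles, all visited in 732 frames.
# When every tile is visited, the tile count cancels out of the total.
print(round(car_racing_return(732, 300, 300), 1))
```

Visiting only part of the track scales the positive term: half the tiles in the same 732 frames would give about 500 - 73.2 points.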
|
||
LunarLander-v2, LunarLanderContinuous-v2 | ||
---------------------------------------- | ||
|
||
This environment is a classic rocket trajectory optimization problem. | ||
According to Pontryagin's maximum principle, it is optimal to fire the | ||
engine at full throttle or turn it off. This is the reason why this | ||
environment has discrete actions: engine on or off. | ||
|
||
There are two environment versions: discrete or continuous. The landing pad is | ||
always at coordinates (0,0). The coordinates are the first two numbers in the | ||
state vector. Landing outside of the landing pad is possible. Fuel is | ||
infinite, so an agent can learn to fly and then land on its first attempt. | ||
|
||
Action Space | ||
~~~~~~~~~~~~ | ||
|
||
There are four discrete actions available: do nothing, fire left orientation | ||
engine, fire main engine, fire right orientation engine. | ||
|
||
Observation Space | ||
~~~~~~~~~~~~~~~~~ | ||
|
||
The state has 8 components: the coordinates of the lander in ``x`` and ``y``, its | ||
linear velocities in ``x`` and ``y``, its angle, its angular velocity, and two | ||
booleans that represent whether each leg is in contact with the ground or not. | ||
|
||
Rewards | ||
~~~~~~~ | ||
|
||
Reward for moving from the top of the screen to the landing pad and coming to | ||
rest is about 100-140 points. If the lander moves away from the landing pad, | ||
it loses reward. If the lander crashes, it receives an additional -100 points. | ||
If it comes to rest, it receives an additional +100 points. Each leg with | ||
ground contact is +10 points. Firing the main engine is -0.3 points each | ||
frame. Firing the side engine is -0.03 points each frame. The environment is | ||
considered solved at 200 points. | ||
|
||
Starting State | ||
~~~~~~~~~~~~~~ | ||
|
||
The lander starts at the top center of the viewport with a random initial | ||
force applied to its center of mass. | ||
|
||
Episode Termination | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
The episode finishes if: | ||
|
||
1. the lander crashes (the lander body gets in contact with the moon); | ||
2. the lander gets outside of the viewport (the absolute value of the ``x`` | ||
coordinate is at least 1); | ||
3. the lander is not awake. From the `Box2D docs | ||
<https://box2d.org/documentation/md__d_1__git_hub_box2d_docs_dynamics.html#autotoc_md61>`_, | ||
a body which is not awake is a body which doesn't move and doesn't collide | ||
with any other body: | ||
|
||
When Box2D determines that a body (or group of bodies) has come to rest, | ||
the body enters a sleep state which has very little CPU overhead. If a | ||
body is awake and collides with a sleeping body, then the sleeping body | ||
wakes up. Bodies will also wake up if a joint or contact attached to | ||
them is destroyed. |
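Review note: the three termination conditions listed in this file can be summarized as one predicate. This is a minimal sketch, not the code added by the commit. `LanderStatus` and `episode_done` are hypothetical names, and the viewport check is assumed to be symmetric in `x` (absolute value at least 1), which is an interpretation of the viewport rule rather than something the text spells out.

```python
from dataclasses import dataclass


@dataclass
class LanderStatus:
    """Hypothetical snapshot of the quantities the termination rule reads."""

    body_contact: bool  # the lander body touched the moon (a crash)
    x: float  # normalized x coordinate, first entry of the state vector
    awake: bool  # Box2D awake flag of the lander body


def episode_done(status: LanderStatus) -> bool:
    """Return True if any of the three documented conditions holds."""
    crashed = status.body_contact
    out_of_view = abs(status.x) >= 1.0  # assumption: symmetric bound
    at_rest = not status.awake
    return crashed or out_of_view or at_rest


print(episode_done(LanderStatus(body_contact=False, x=0.2, awake=True)))
print(episode_done(LanderStatus(body_contact=False, x=0.0, awake=False)))
```

The first call describes a lander still in flight, the second a lander that Box2D has put to sleep after it came to rest.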
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -57,3 +57,7 @@ Minigrid | |
Garena | ||
Tianshou | ||
namedtuple | ||
playfield | ||
Pontryagin | ||
booleans | ||
viewport |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
load("@pip_requirements//:requirements.bzl", "requirement") | ||
load("@pybind11_bazel//:build_defs.bzl", "pybind_extension") | ||
|
||
package(default_visibility = ["//visibility:public"]) | ||
|
||
cc_library( | ||
name = "box2d_env", | ||
srcs = ["lunar_lander.cc"], | ||
hdrs = [ | ||
"lunar_lander.h", | ||
"lunar_lander_continuous.h", | ||
"lunar_lander_discrete.h", | ||
], | ||
deps = [ | ||
"//envpool/core:async_envpool", | ||
"@box2d", | ||
], | ||
) | ||
|
||
pybind_extension( | ||
name = "box2d_envpool", | ||
srcs = ["box2d_envpool.cc"], | ||
deps = [ | ||
":box2d_env", | ||
"//envpool/core:py_envpool", | ||
], | ||
) | ||
|
||
py_library( | ||
name = "box2d", | ||
srcs = ["__init__.py"], | ||
data = [":box2d_envpool.so"], | ||
deps = ["//envpool/python:api"], | ||
) | ||
|
||
py_test( | ||
name = "box2d_deterministic_test", | ||
size = "enormous", | ||
srcs = ["box2d_deterministic_test.py"], | ||
deps = [ | ||
":box2d", | ||
requirement("absl-py"), | ||
requirement("numpy"), | ||
], | ||
) | ||
|
||
py_test( | ||
name = "box2d_correctness_test", | ||
size = "enormous", | ||
srcs = ["box2d_correctness_test.py"], | ||
deps = [ | ||
":box2d", | ||
requirement("absl-py"), | ||
requirement("gym"), | ||
requirement("box2d"), | ||
requirement("pygame"), | ||
requirement("numpy"), | ||
], | ||
) | ||
|
||
py_library( | ||
name = "box2d_registration", | ||
srcs = ["registration.py"], | ||
deps = [ | ||
"//envpool:registration", | ||
], | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# Copyright 2022 Garena Online Private Limited | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
"""Box2D env in EnvPool.""" | ||
|
||
from envpool.python.api import py_env | ||
|
||
from .box2d_envpool import ( | ||
_LunarLanderContinuousEnvPool, | ||
_LunarLanderContinuousEnvSpec, | ||
_LunarLanderDiscreteEnvPool, | ||
_LunarLanderDiscreteEnvSpec, | ||
) | ||
|
||
( | ||
LunarLanderContinuousEnvSpec, | ||
LunarLanderContinuousDMEnvPool, | ||
LunarLanderContinuousGymEnvPool, | ||
) = py_env(_LunarLanderContinuousEnvSpec, _LunarLanderContinuousEnvPool) | ||
|
||
( | ||
LunarLanderDiscreteEnvSpec, | ||
LunarLanderDiscreteDMEnvPool, | ||
LunarLanderDiscreteGymEnvPool, | ||
) = py_env(_LunarLanderDiscreteEnvSpec, _LunarLanderDiscreteEnvPool) | ||
|
||
__all__ = [ | ||
"LunarLanderContinuousEnvSpec", | ||
"LunarLanderContinuousDMEnvPool", | ||
"LunarLanderContinuousGymEnvPool", | ||
"LunarLanderDiscreteEnvSpec", | ||
"LunarLanderDiscreteDMEnvPool", | ||
"LunarLanderDiscreteGymEnvPool", | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,127 @@ | ||
# Copyright 2022 Garena Online Private Limited | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
"""Unit tests for box2d environments correctness check.""" | ||
|
||
from typing import Any, no_type_check | ||
|
||
import gym | ||
import numpy as np | ||
from absl import logging | ||
from absl.testing import absltest | ||
|
||
from envpool.box2d import ( | ||
LunarLanderContinuousEnvSpec, | ||
LunarLanderContinuousGymEnvPool, | ||
LunarLanderDiscreteEnvSpec, | ||
LunarLanderDiscreteGymEnvPool, | ||
) | ||
|
||
|
||
class _Box2dEnvPoolCorrectnessTest(absltest.TestCase): | ||
|
||
  @no_type_check | ||
  def run_space_check(self, env0: gym.Env, env1: Any) -> None: | ||
    """Check observation_space and action_space.""" | ||
    obs0, obs1 = env0.observation_space, env1.observation_space | ||
    np.testing.assert_allclose(obs0.shape, obs1.shape) | ||
    act0, act1 = env0.action_space, env1.action_space | ||
    if isinstance(act0, gym.spaces.Box): | ||
      np.testing.assert_allclose(act0.low, act1.low) | ||
      np.testing.assert_allclose(act0.high, act1.high) | ||
    elif isinstance(act0, gym.spaces.Discrete): | ||
      np.testing.assert_allclose(act0.n, act1.n) | ||
|
||
  def test_lunar_lander_space(self) -> None: | ||
    env0 = gym.make("LunarLander-v2") | ||
    env1 = LunarLanderDiscreteGymEnvPool( | ||
      LunarLanderDiscreteEnvSpec(LunarLanderDiscreteEnvSpec.gen_config()) | ||
    ) | ||
    self.run_space_check(env0, env1) | ||
|
||
    env0 = gym.make("LunarLanderContinuous-v2") | ||
    env1 = LunarLanderContinuousGymEnvPool( | ||
      LunarLanderContinuousEnvSpec(LunarLanderContinuousEnvSpec.gen_config()) | ||
    ) | ||
    self.run_space_check(env0, env1) | ||
|
||
  def heuristic_lunar_lander_policy( | ||
    self, s: np.ndarray, continuous: bool | ||
  ) -> np.ndarray: | ||
    angle_targ = np.clip(s[0] * 0.5 + s[2] * 1.0, -0.4, 0.4) | ||
    hover_targ = 0.55 * np.abs(s[0]) | ||
    angle_todo = (angle_targ - s[4]) * 0.5 - s[5] * 1.0 | ||
    hover_todo = (hover_targ - s[1]) * 0.5 - s[3] * 0.5 | ||
|
||
    if s[6] or s[7]: | ||
      angle_todo = 0 | ||
      hover_todo = -(s[3]) * 0.5 | ||
|
||
    if continuous: | ||
      a = np.array([hover_todo * 20 - 1, -angle_todo * 20]) | ||
      a = np.clip(a, -1, 1) | ||
    else: | ||
      a = 0 | ||
      if hover_todo > np.abs(angle_todo) and hover_todo > 0.05: | ||
        a = 2 | ||
      elif angle_todo < -0.05: | ||
        a = 3 | ||
      elif angle_todo > 0.05: | ||
        a = 1 | ||
    return a | ||
|
||
  def solve_lunar_lander(self, num_envs: int, continuous: bool) -> None: | ||
    if continuous: | ||
      env = LunarLanderContinuousGymEnvPool( | ||
        LunarLanderContinuousEnvSpec( | ||
          LunarLanderContinuousEnvSpec.gen_config(num_envs=num_envs) | ||
        ) | ||
      ) | ||
    else: | ||
      env = LunarLanderDiscreteGymEnvPool( | ||
        LunarLanderDiscreteEnvSpec( | ||
          LunarLanderDiscreteEnvSpec.gen_config(num_envs=num_envs) | ||
        ) | ||
      ) | ||
    # each env runs two episodes | ||
    for _ in range(2): | ||
      env_id = np.arange(num_envs) | ||
      done = np.array([False] * num_envs) | ||
      obs = env.reset(env_id) | ||
      rewards = np.zeros(num_envs) | ||
      while not np.all(done): | ||
        action = np.array( | ||
          [self.heuristic_lunar_lander_policy(s, continuous) for s in obs] | ||
        ) | ||
        obs, rew, done, info = env.step(action, env_id) | ||
        env_id = info["env_id"] | ||
        rewards[env_id] += rew | ||
        obs = obs[~done] | ||
        env_id = env_id[~done] | ||
      mean_reward = np.mean(rewards) | ||
      logging.info( | ||
        f"{continuous}, {mean_reward:.6f} ± {np.std(rewards):.6f}" | ||
      ) | ||
      # the following numbers are from gym's 1000-episode mean reward | ||
      if continuous:  # 283.872619 ± 18.881830 | ||
        self.assertTrue(abs(mean_reward - 284) < 10, (continuous, mean_reward)) | ||
      else:  # 236.898334 ± 105.832610 | ||
        self.assertTrue(abs(mean_reward - 237) < 20, (continuous, mean_reward)) | ||
|
||
  def test_lunar_lander_correctness(self, num_envs: int = 30) -> None: | ||
    self.solve_lunar_lander(num_envs, True) | ||
    self.solve_lunar_lander(num_envs, False) | ||
|
||
|
||
if __name__ == "__main__": | ||
  absltest.main() |
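Review note: the discrete branch of `heuristic_lunar_lander_policy` can be exercised on hand-written states without gym or EnvPool. The sketch below restates that branch in pure Python. The names `clamp` and `discrete_heuristic` are hypothetical and not part of the commit, and the example states are made up for illustration. The state layout follows the docs in this commit: x, y, vx, vy, angle, angular velocity, left leg contact, right leg contact.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Pure-Python stand-in for np.clip on a scalar."""
    return max(low, min(high, value))


def discrete_heuristic(s: list) -> int:
    """Discrete branch of the test's heuristic policy.

    Returns 0 (do nothing), 1 (fire left orientation engine),
    2 (fire main engine), or 3 (fire right orientation engine).
    """
    angle_targ = clamp(s[0] * 0.5 + s[2] * 1.0, -0.4, 0.4)
    hover_targ = 0.55 * abs(s[0])
    angle_todo = (angle_targ - s[4]) * 0.5 - s[5] * 1.0
    hover_todo = (hover_targ - s[1]) * 0.5 - s[3] * 0.5

    if s[6] or s[7]:  # a leg touches the ground: only damp vertical speed
        angle_todo = 0
        hover_todo = -s[3] * 0.5

    if hover_todo > abs(angle_todo) and hover_todo > 0.05:
        return 2
    if angle_todo < -0.05:
        return 3
    if angle_todo > 0.05:
        return 1
    return 0


# Made-up state: centered over the pad and falling fast, so the main engine fires.
print(discrete_heuristic([0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0, 0]))
```

Checking single states like this can help separate a bug in the policy from a mismatch between the EnvPool and gym dynamics when the mean-reward assertion fails.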