Box2D integration: Add LunarLander (#111)

* add lunar_lander
* setup structure
* can run with: bazel build //envpool/box2d:box2d_env --config=release
* sync
* finish LunarLanderEnv::ResetBox2d
* sync
* add CreateParticle
* finish LunarLanderEnv
* can make, need test
* sync
* setup tests
* add box2d-py
* switch to box2d
* add req
* fix
* fail correctness test
* fix lint
* fix 2 bugs
* fix action shape
* pass
* update readme
* test release
* revert
* polish
* static constexpr -> const
* change test metric

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
sail-sg · May 18, 2022 · 1de8e2c
1 parent a87a8fd
Showing 22 changed files with 1,143 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -15,7 +15,7 @@
 - [x] [Toy text RL envs](https://envpool.readthedocs.io/en/latest/api/toy_text.html): Catch, FrozenLake, Taxi, NChain, CliffWalking, Blackjack
 - [x] [ViZDoom single player](https://envpool.readthedocs.io/en/latest/api/vizdoom.html)
 - [ ] [DeepMind Control Suite](https://envpool.readthedocs.io/en/latest/api/dm_control.html)
-- [ ] Box2D
+- [ ] [Box2D](https://envpool.readthedocs.io/en/latest/api/box2d.html)
 - [ ] Procgen
 - [ ] Minigrid
 

diff --git a/docs/api/box2d.rst b/docs/api/box2d.rst
@@ -0,0 +1,105 @@
+Box2D
+=====
+
+We use ``box2d==2.4.1`` and ``gym==0.23.1`` as the codebase. See
+https://github.com/erincatto/box2d/tree/v2.4.1 and
+https://github.com/openai/gym/tree/v0.23.1/gym/envs/box2d
+
+
+CarRacing-v1
+------------
+
+The easiest control task to learn from pixels - a top-down racing environment.
+The generated track is random every episode.
+
+Action Space
+~~~~~~~~~~~~
+
+There are 3 actions: steering (-1 for full left, 1 for full right), gas
+(0 ~ 1), and braking (0 ~ 1).
+
+Observation Space
+~~~~~~~~~~~~~~~~~
+
+The state is a 96x96 pixel image with 3 channels.
+
+Rewards
+~~~~~~~
+
+The reward is -0.1 every frame and +1000/N for every track tile visited, where
+N is the total number of tiles in the track. For example, if you visit every
+tile and finish in 732 frames, your reward is 1000 - 0.1\*732 = 926.8 points.
+
+Starting State
+~~~~~~~~~~~~~~
+
+The car starts at rest in the center of the road.
+
+Episode Termination
+~~~~~~~~~~~~~~~~~~~
+
+The episode finishes when all of the tiles are visited. The car can also go
+outside of the playfield - that is, far off the track, in which case it will
+receive -100 reward and die.
+
+LunarLander-v2, LunarLanderContinuous-v2
+----------------------------------------
+
+This environment is a classic rocket trajectory optimization problem.
+According to Pontryagin's maximum principle, it is optimal to fire the
+engine at full throttle or turn it off. This is the reason why this
+environment has discrete actions: engine on or off.
+
+There are two environment versions: discrete or continuous. The landing pad is
+always at coordinates (0,0). The coordinates are the first two numbers in the
+state vector. Landing outside of the landing pad is possible. Fuel is
+infinite, so an agent can learn to fly and then land on its first attempt.
+
+Action Space
+~~~~~~~~~~~~
+
+There are four discrete actions available: do nothing, fire left orientation
+engine, fire main engine, fire right orientation engine.
+
+Observation Space
+~~~~~~~~~~~~~~~~~
+
+The state has 8 components: the lander's ``x`` and ``y`` coordinates, its
+linear velocities in ``x`` and ``y``, its angle, its angular velocity, and two
+booleans that represent whether each leg is in contact with the ground or not.
+
+Rewards
+~~~~~~~
+
+Reward for moving from the top of the screen to the landing pad and coming to
+rest is about 100-140 points. If the lander moves away from the landing pad,
+it loses reward. If the lander crashes, it receives an additional -100 points.
+If it comes to rest, it receives an additional +100 points. Each leg with
+ground contact is +10 points. Firing the main engine is -0.3 points each
+frame. Firing the side engine is -0.03 points each frame. The environment
+is considered solved at 200 points.
+
+Starting State
+~~~~~~~~~~~~~~
+
+The lander starts at the top center of the viewport with a random initial
+force applied to its center of mass.
+
+Episode Termination
+~~~~~~~~~~~~~~~~~~~
+
+The episode finishes if:
+
+1. the lander crashes (the lander body gets in contact with the moon);
+2. the lander gets outside of the viewport (the absolute value of the ``x``
+   coordinate is at least 1);
+3. the lander is not awake. From the `Box2D docs
+   <https://box2d.org/documentation/md__d_1__git_hub_box2d_docs_dynamics.html#autotoc_md61>`_,
+   a body which is not awake is a body which doesn't move and doesn't collide
+   with any other body:
+
+   When Box2D determines that a body (or group of bodies) has come to rest,
+   the body enters a sleep state which has very little CPU overhead. If a
+   body is awake and collides with a sleeping body, then the sleeping body
+   wakes up. Bodies will also wake up if a joint or contact attached to
+   them is destroyed.
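
As a rough illustration of the rules documented in docs/api/box2d.rst above, the following sketch restates the CarRacing return arithmetic and the three LunarLander termination conditions in plain Python. The function and parameter names (`car_racing_return`, `lunar_lander_done`, `crashed`, `awake`) and the track length of 300 tiles are hypothetical and are not part of EnvPool or gym. The symmetric viewport check `abs(x) >= 1.0` is taken from a reading of gym v0.23.1 and should be treated as an assumption. The -100 penalty for leaving the CarRacing playfield is deliberately left out.

```python
def car_racing_return(frames: int, tiles_visited: int, total_tiles: int) -> float:
    """Undiscounted return: -0.1 per frame, +1000/N per visited tile."""
    return 1000.0 * tiles_visited / total_tiles - 0.1 * frames


def lunar_lander_done(crashed: bool, x: float, awake: bool) -> bool:
    """True if any of the three documented termination conditions holds."""
    # Assumption: the viewport bound is symmetric around the landing pad.
    out_of_viewport = abs(x) >= 1.0
    return crashed or out_of_viewport or not awake


if __name__ == "__main__":
    # Worked example from the docs: every tile visited, finished in 732 frames.
    # The total of 300 tiles is an arbitrary illustrative value.
    print(round(car_racing_return(732, 300, 300), 1))
    # A lander that is still flying inside the viewport is not done.
    print(lunar_lander_done(crashed=False, x=0.2, awake=True))
```

Keeping the termination check as a pure function of three inputs makes each documented condition easy to exercise in isolation, without a physics world.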
diff --git a/docs/index.rst b/docs/index.rst
@@ -73,9 +73,10 @@ stable version through `envpool.readthedocs.io/en/stable/
    :caption: Environments
 
    api/atari
+   api/box2d
    api/classic
-   api/mujoco
    api/dm_control
+   api/mujoco
    api/toy_text
    api/vizdoom
 

diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
@@ -57,3 +57,7 @@ Minigrid
 Garena
 Tianshou
 namedtuple
+playfield
+Pontryagin
+booleans
+viewport
diff --git a/envpool/BUILD b/envpool/BUILD
@@ -17,6 +17,7 @@ py_library(
     srcs = ["entry.py"],
     deps = [
         "//envpool/atari:atari_registration",
+        "//envpool/box2d:box2d_registration",
         "//envpool/classic_control:classic_control_registration",
         "//envpool/mujoco:mujoco_registration",
         "//envpool/toy_text:toy_text_registration",
@@ -31,6 +32,7 @@ py_library(
         ":entry",
         ":registration",
         "//envpool/atari",
+        "//envpool/box2d",
         "//envpool/classic_control",
         "//envpool/mujoco",
         "//envpool/python",

diff --git a/envpool/box2d/BUILD b/envpool/box2d/BUILD
@@ -0,0 +1,67 @@
+load("@pip_requirements//:requirements.bzl", "requirement")
+load("@pybind11_bazel//:build_defs.bzl", "pybind_extension")
+
+package(default_visibility = ["//visibility:public"])
+
+cc_library(
+    name = "box2d_env",
+    srcs = ["lunar_lander.cc"],
+    hdrs = [
+        "lunar_lander.h",
+        "lunar_lander_continuous.h",
+        "lunar_lander_discrete.h",
+    ],
+    deps = [
+        "//envpool/core:async_envpool",
+        "@box2d",
+    ],
+)
+
+pybind_extension(
+    name = "box2d_envpool",
+    srcs = ["box2d_envpool.cc"],
+    deps = [
+        ":box2d_env",
+        "//envpool/core:py_envpool",
+    ],
+)
+
+py_library(
+    name = "box2d",
+    srcs = ["__init__.py"],
+    data = [":box2d_envpool.so"],
+    deps = ["//envpool/python:api"],
+)
+
+py_test(
+    name = "box2d_deterministic_test",
+    size = "enormous",
+    srcs = ["box2d_deterministic_test.py"],
+    deps = [
+        ":box2d",
+        requirement("absl-py"),
+        requirement("numpy"),
+    ],
+)
+
+py_test(
+    name = "box2d_correctness_test",
+    size = "enormous",
+    srcs = ["box2d_correctness_test.py"],
+    deps = [
+        ":box2d",
+        requirement("absl-py"),
+        requirement("gym"),
+        requirement("box2d"),
+        requirement("pygame"),
+        requirement("numpy"),
+    ],
+)
+
+py_library(
+    name = "box2d_registration",
+    srcs = ["registration.py"],
+    deps = [
+        "//envpool:registration",
+    ],
+)
diff --git a/envpool/box2d/__init__.py b/envpool/box2d/__init__.py
@@ -0,0 +1,44 @@
+# Copyright 2022 Garena Online Private Limited
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Box2D env in EnvPool."""
+
+from envpool.python.api import py_env
+
+from .box2d_envpool import (
+  _LunarLanderContinuousEnvPool,
+  _LunarLanderContinuousEnvSpec,
+  _LunarLanderDiscreteEnvPool,
+  _LunarLanderDiscreteEnvSpec,
+)
+
+(
+  LunarLanderContinuousEnvSpec,
+  LunarLanderContinuousDMEnvPool,
+  LunarLanderContinuousGymEnvPool,
+) = py_env(_LunarLanderContinuousEnvSpec, _LunarLanderContinuousEnvPool)
+
+(
+  LunarLanderDiscreteEnvSpec,
+  LunarLanderDiscreteDMEnvPool,
+  LunarLanderDiscreteGymEnvPool,
+) = py_env(_LunarLanderDiscreteEnvSpec, _LunarLanderDiscreteEnvPool)
+
+__all__ = [
+  "LunarLanderContinuousEnvSpec",
+  "LunarLanderContinuousDMEnvPool",
+  "LunarLanderContinuousGymEnvPool",
+  "LunarLanderDiscreteEnvSpec",
+  "LunarLanderDiscreteDMEnvPool",
+  "LunarLanderDiscreteGymEnvPool",
+]
diff --git a/envpool/box2d/box2d_correctness_test.py b/envpool/box2d/box2d_correctness_test.py
@@ -0,0 +1,127 @@
+# Copyright 2022 Garena Online Private Limited
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Unit tests for box2d environments correctness check."""
+
+from typing import Any, no_type_check
+
+import gym
+import numpy as np
+from absl import logging
+from absl.testing import absltest
+
+from envpool.box2d import (
+  LunarLanderContinuousEnvSpec,
+  LunarLanderContinuousGymEnvPool,
+  LunarLanderDiscreteEnvSpec,
+  LunarLanderDiscreteGymEnvPool,
+)
+
+
+class _Box2dEnvPoolCorrectnessTest(absltest.TestCase):
+
+  @no_type_check
+  def run_space_check(self, env0: gym.Env, env1: Any) -> None:
+    """Check observation_space and action_space."""
+    obs0, obs1 = env0.observation_space, env1.observation_space
+    np.testing.assert_allclose(obs0.shape, obs1.shape)
+    act0, act1 = env0.action_space, env1.action_space
+    if isinstance(act0, gym.spaces.Box):
+      np.testing.assert_allclose(act0.low, act1.low)
+      np.testing.assert_allclose(act0.high, act1.high)
+    elif isinstance(act0, gym.spaces.Discrete):
+      np.testing.assert_allclose(act0.n, act1.n)
+
+  def test_lunar_lander_space(self) -> None:
+    env0 = gym.make("LunarLander-v2")
+    env1 = LunarLanderDiscreteGymEnvPool(
+      LunarLanderDiscreteEnvSpec(LunarLanderDiscreteEnvSpec.gen_config())
+    )
+    self.run_space_check(env0, env1)
+
+    env0 = gym.make("LunarLanderContinuous-v2")
+    env1 = LunarLanderContinuousGymEnvPool(
+      LunarLanderContinuousEnvSpec(LunarLanderContinuousEnvSpec.gen_config())
+    )
+    self.run_space_check(env0, env1)
+
+  def heuristic_lunar_lander_policy(
+    self, s: np.ndarray, continuous: bool
+  ) -> np.ndarray:
+    angle_targ = np.clip(s[0] * 0.5 + s[2] * 1.0, -0.4, 0.4)
+    hover_targ = 0.55 * np.abs(s[0])
+    angle_todo = (angle_targ - s[4]) * 0.5 - s[5] * 1.0
+    hover_todo = (hover_targ - s[1]) * 0.5 - s[3] * 0.5
+
+    if s[6] or s[7]:
+      angle_todo = 0
+      hover_todo = -(s[3]) * 0.5
+
+    if continuous:
+      a = np.array([hover_todo * 20 - 1, -angle_todo * 20])
+      a = np.clip(a, -1, 1)
+    else:
+      a = 0
+      if hover_todo > np.abs(angle_todo) and hover_todo > 0.05:
+        a = 2
+      elif angle_todo < -0.05:
+        a = 3
+      elif angle_todo > 0.05:
+        a = 1
+    return a
+
+  def solve_lunar_lander(self, num_envs: int, continuous: bool) -> None:
+    if continuous:
+      env = LunarLanderContinuousGymEnvPool(
+        LunarLanderContinuousEnvSpec(
+          LunarLanderContinuousEnvSpec.gen_config(num_envs=num_envs)
+        )
+      )
+    else:
+      env = LunarLanderDiscreteGymEnvPool(
+        LunarLanderDiscreteEnvSpec(
+          LunarLanderDiscreteEnvSpec.gen_config(num_envs=num_envs)
+        )
+      )
+    # each env runs two episodes
+    for _ in range(2):
+      env_id = np.arange(num_envs)
+      done = np.array([False] * num_envs)
+      obs = env.reset(env_id)
+      rewards = np.zeros(num_envs)
+      while not np.all(done):
+        action = np.array(
+          [self.heuristic_lunar_lander_policy(s, continuous) for s in obs]
+        )
+        obs, rew, done, info = env.step(action, env_id)
+        env_id = info["env_id"]
+        rewards[env_id] += rew
+        obs = obs[~done]
+        env_id = env_id[~done]
+      mean_reward = np.mean(rewards)
+      logging.info(
+        f"{continuous}, {np.mean(rewards):.6f} ± {np.std(rewards):.6f}"
+      )
+      # the following numbers are gym's mean reward over 1000 episodes
+      if continuous:  # 283.872619 ± 18.881830
+        self.assertTrue(abs(mean_reward - 284) < 10, (continuous, mean_reward))
+      else:  # 236.898334 ± 105.832610
+        self.assertTrue(abs(mean_reward - 237) < 20, (continuous, mean_reward))
+
+  def test_lunar_lander_correctness(self, num_envs: int = 30) -> None:
+    self.solve_lunar_lander(num_envs, True)
+    self.solve_lunar_lander(num_envs, False)
+
+
+if __name__ == "__main__":
+  absltest.main()
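
The tolerances in `solve_lunar_lander` can be put in context with a back-of-the-envelope check. The sketch below takes the gym statistics quoted in the test comments (mean ± std over 1000 episodes), computes the standard error of a 30-episode mean, and estimates how often `abs(mean_reward - target) < tol` would fail by chance. It assumes independent episodes, an EnvPool reward distribution identical to gym's, and a normal approximation for the sample mean. None of these are claims made by the commit, and `failure_probability` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def failure_probability(
    std: float, num_envs: int, tol: float, offset: float = 0.0
) -> float:
    """Chance that |sample_mean - target| >= tol under a normal approximation.

    offset is (true mean - target), e.g. 236.898334 - 237 for the discrete case.
    """
    se = std / math.sqrt(num_envs)  # standard error of the mean
    dist = NormalDist(mu=offset, sigma=se)
    return 1.0 - (dist.cdf(tol) - dist.cdf(-tol))


if __name__ == "__main__":
    # Statistics copied from the comments in the test above; 30 envs per check.
    for name, mean, std, target, tol in [
        ("continuous", 283.872619, 18.881830, 284, 10),
        ("discrete", 236.898334, 105.832610, 237, 20),
    ]:
        p = failure_probability(std, 30, tol, offset=mean - target)
        print(f"{name}: se={std / math.sqrt(30):.2f}, p_fail~{p:.3f}")
```

Under these assumptions the discrete tolerance of 20 is roughly one standard error of the 30-episode mean, while the continuous tolerance of 10 is close to three. The estimate is an order-of-magnitude guide only, since LunarLander returns are unlikely to be normally distributed.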