Box2D integration: Add LunarLander (#111)
* add lunar_lander

* setup structure

* can run with: bazel build //envpool/box2d:box2d_env --config=release

* sync

* finish LunarLanderEnv::ResetBox2d

* sync

* add CreateParticle

* finish LunarLanderEnv

* can make, need test

* sync

* setup tests

* add box2d-py

* switch to box2d

* add req

* fix

* fail correctness test

* fix lint

* fix 2 bugs

* fix action shape

* pass

* update readme

* test release

* revert

* polish

* static constexpr -> const

* change test metric

Co-authored-by: Jiayi Weng <trinkle23897@gmail.com>
Alicia1529 and Trinkle23897 committed May 18, 2022
1 parent a87a8fd commit 1de8e2c
Showing 22 changed files with 1,143 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -15,7 +15,7 @@
- [x] [Toy text RL envs](https://envpool.readthedocs.io/en/latest/api/toy_text.html): Catch, FrozenLake, Taxi, NChain, CliffWalking, Blackjack
- [x] [ViZDoom single player](https://envpool.readthedocs.io/en/latest/api/vizdoom.html)
- [ ] [DeepMind Control Suite](https://envpool.readthedocs.io/en/latest/api/dm_control.html)
- [ ] Box2D
- [ ] [Box2D](https://envpool.readthedocs.io/en/latest/api/box2d.html)
- [ ] Procgen
- [ ] Minigrid

105 changes: 105 additions & 0 deletions docs/api/box2d.rst
@@ -0,0 +1,105 @@
Box2D
=====

We use ``box2d==2.4.1`` and ``gym==0.23.1`` as the codebase. See
https://github.com/erincatto/box2d/tree/v2.4.1 and
https://github.com/openai/gym/tree/v0.23.1/gym/envs/box2d


CarRacing-v1
------------

The easiest control task to learn from pixels - a top-down racing environment.
The generated track is random every episode.

Action Space
~~~~~~~~~~~~

There are 3 actions: steering (-1 for full left, 1 for full right), gas
(0 ~ 1), and braking (0 ~ 1).

Observation Space
~~~~~~~~~~~~~~~~~

The state is a 96x96 image with 3 color channels.

Rewards
~~~~~~~

The reward is -0.1 every frame and +1000/N for every track tile visited, where
N is the total number of tiles in the track. For example, if you visit every
tile and finish in 732 frames, your reward is 1000 - 0.1\*732 = 926.8 points.
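
The arithmetic above can be checked with a minimal sketch. It assumes N is the
total number of tiles in the track; the function name and the 300-tile track
are hypothetical and chosen only for illustration.

```python
def car_racing_return(frames: int, tiles_visited: int, total_tiles: int) -> float:
    """Episode return under the stated rule: -0.1 per frame, +1000/N per tile."""
    per_tile = 1000.0 / total_tiles  # N = total_tiles (assumption)
    return per_tile * tiles_visited - 0.1 * frames


# Hypothetical track with 300 tiles, all visited in 732 frames.
total = car_racing_return(frames=732, tiles_visited=300, total_tiles=300)
print(f"{total:.1f}")
```

Visiting only part of the track scales the positive term by
``tiles_visited / total_tiles``, while the per-frame penalty is unchanged.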

Starting State
~~~~~~~~~~~~~~

The car starts at rest in the center of the road.

Episode Termination
~~~~~~~~~~~~~~~~~~~

The episode finishes when all of the tiles are visited. The car can also go
outside of the playfield - that is, far off the track, in which case it will
receive -100 reward and die.

LunarLander-v2, LunarLanderContinuous-v2
----------------------------------------

This environment is a classic rocket trajectory optimization problem.
According to Pontryagin's maximum principle, it is optimal to fire the
engine at full throttle or turn it off. This is the reason why this
environment has discrete actions: engine on or off.

There are two environment versions: discrete or continuous. The landing pad is
always at coordinates (0,0). The coordinates are the first two numbers in the
state vector. Landing outside of the landing pad is possible. Fuel is
infinite, so an agent can learn to fly and then land on its first attempt.

Action Space
~~~~~~~~~~~~

There are four discrete actions available: do nothing, fire left orientation
engine, fire main engine, fire right orientation engine.
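
As a small illustration, the four discrete actions can be given readable
names. The enum below is a hypothetical helper, not part of EnvPool or gym;
the integer values assume the actions are indexed in the order listed above,
which is consistent with the heuristic policy in the correctness test firing
the main engine with action 2.

```python
from enum import IntEnum


class LunarLanderAction(IntEnum):
    """Hypothetical names for the discrete actions, in the documented order."""
    NOOP = 0        # do nothing
    FIRE_LEFT = 1   # fire left orientation engine
    FIRE_MAIN = 2   # fire main engine
    FIRE_RIGHT = 3  # fire right orientation engine


# IntEnum members compare equal to plain integers, so they can be passed
# wherever an integer action is expected.
print(int(LunarLanderAction.FIRE_MAIN), len(LunarLanderAction))
```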

Observation Space
~~~~~~~~~~~~~~~~~

There are 8 states: the coordinates of the lander in ``x`` and ``y``, its
linear velocities in ``x`` and ``y``, its angle, its angular velocity, and two
booleans that represent whether each leg is in contact with the ground or not.
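
To make the layout of the 8-dimensional state concrete, here is a sketch that
unpacks one observation into named fields. ``LanderState`` and
``unpack_state`` are hypothetical helpers; the field order follows the
description above, and which leg comes first is not specified here, so the
two contact flags are simply numbered.

```python
from typing import NamedTuple, Sequence


class LanderState(NamedTuple):
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    leg_1_contact: bool
    leg_2_contact: bool


def unpack_state(obs: Sequence[float]) -> LanderState:
    """Split one 8-element observation into named fields."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    floats = [float(v) for v in obs[:6]]
    return LanderState(*floats, bool(obs[6]), bool(obs[7]))


state = unpack_state([0.1, 1.4, 0.0, -0.5, 0.02, 0.0, 1.0, 0.0])
print(state.y, state.leg_1_contact, state.leg_2_contact)
```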

Rewards
~~~~~~~

Reward for moving from the top of the screen to the landing pad and coming to
rest is about 100-140 points. If the lander moves away from the landing pad,
it loses reward. If the lander crashes, it receives an additional -100 points.
If it comes to rest, it receives an additional +100 points. Each leg with
ground contact is +10 points. Firing the main engine is -0.3 points each
frame. Firing the side engine is -0.03 points each frame. Solved is 200
points.
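
The per-step reward can be sketched as a difference of a shaping potential
minus fuel costs. This is an illustration modeled on the reference gym
implementation as I understand it, not code from this commit: the
coefficients of 100 on distance, speed, and angle are assumptions, while the
+10 per leg and the 0.3 / 0.03 engine costs come from the text above. The
terminal -100 / +100 adjustments are left out.

```python
import math
from typing import Optional, Sequence, Tuple


def shaping(state: Sequence[float]) -> float:
    """Potential: closer, slower, and more upright is better (assumed weights)."""
    x, y, vx, vy, angle, _ang_vel, leg1, leg2 = state
    return (
        -100.0 * math.hypot(x, y)     # distance from the pad at (0, 0)
        - 100.0 * math.hypot(vx, vy)  # speed
        - 100.0 * abs(angle)          # tilt
        + 10.0 * leg1                 # +10 per leg in contact
        + 10.0 * leg2
    )


def step_reward(
    state: Sequence[float],
    prev_shaping: Optional[float],
    main_power: float,
    side_power: float,
) -> Tuple[float, float]:
    """Return (reward, new_shaping) for one non-terminal step."""
    cur = shaping(state)
    reward = 0.0 if prev_shaping is None else cur - prev_shaping
    reward -= 0.30 * main_power  # main engine cost per frame
    reward -= 0.03 * side_power  # side engine cost per frame
    return reward, cur


_, s0 = step_reward([0, 1.0, 0, 0, 0, 0, 0, 0], None, 0.0, 0.0)
r1, _ = step_reward([0, 0.5, 0, 0, 0, 0, 0, 0], s0, 1.0, 0.0)
print(f"{r1:.2f}")
```

Because the reward is a difference of potentials, moving away from the pad
yields a negative reward, which matches the statement that the lander loses
reward when it moves away.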

Starting State
~~~~~~~~~~~~~~

The lander starts at the top center of the viewport with a random initial
force applied to its center of mass.

Episode Termination
~~~~~~~~~~~~~~~~~~~

The episode finishes if:

1. the lander crashes (the lander body gets in contact with the moon);
2. the lander gets outside of the viewport (the absolute value of the ``x``
   coordinate is at least 1);
3. the lander is not awake. From the `Box2D docs
   <https://box2d.org/documentation/md__d_1__git_hub_box2d_docs_dynamics.html#autotoc_md61>`_,
   a body which is not awake is a body which doesn't move and doesn't collide
   with any other body:

     When Box2D determines that a body (or group of bodies) has come to rest,
     the body enters a sleep state which has very little CPU overhead. If a
     body is awake and collides with a sleeping body, then the sleeping body
     wakes up. Bodies will also wake up if a joint or contact attached to
     them is destroyed.
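
The three termination conditions can be summarized in a short predicate. This
is a hypothetical helper for illustration only; the ``abs(x) >= 1.0``
threshold is an assumption based on my reading of the reference gym
implementation, and ``crashed`` and ``awake`` stand in for the contact
listener flag and the Box2D body's awake state.

```python
def is_terminal(x: float, crashed: bool, awake: bool) -> bool:
    """True if any of the three documented termination conditions holds."""
    out_of_viewport = abs(x) >= 1.0  # assumed threshold on the x coordinate
    return crashed or out_of_viewport or not awake


print(is_terminal(x=0.2, crashed=False, awake=True))   # still flying
print(is_terminal(x=-1.3, crashed=False, awake=True))  # left the viewport
print(is_terminal(x=0.0, crashed=False, awake=False))  # came to rest
```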
3 changes: 2 additions & 1 deletion docs/index.rst
@@ -73,9 +73,10 @@ stable version through `envpool.readthedocs.io/en/stable/
:caption: Environments

api/atari
api/box2d
api/classic
api/mujoco
api/dm_control
api/mujoco
api/toy_text
api/vizdoom

4 changes: 4 additions & 0 deletions docs/spelling_wordlist.txt
@@ -57,3 +57,7 @@ Minigrid
Garena
Tianshou
namedtuple
playfield
Pontryagin
booleans
viewport
2 changes: 2 additions & 0 deletions envpool/BUILD
@@ -17,6 +17,7 @@ py_library(
srcs = ["entry.py"],
deps = [
"//envpool/atari:atari_registration",
"//envpool/box2d:box2d_registration",
"//envpool/classic_control:classic_control_registration",
"//envpool/mujoco:mujoco_registration",
"//envpool/toy_text:toy_text_registration",
@@ -31,6 +32,7 @@ py_library(
":entry",
":registration",
"//envpool/atari",
"//envpool/box2d",
"//envpool/classic_control",
"//envpool/mujoco",
"//envpool/python",
67 changes: 67 additions & 0 deletions envpool/box2d/BUILD
@@ -0,0 +1,67 @@
load("@pip_requirements//:requirements.bzl", "requirement")
load("@pybind11_bazel//:build_defs.bzl", "pybind_extension")

package(default_visibility = ["//visibility:public"])

cc_library(
    name = "box2d_env",
    srcs = ["lunar_lander.cc"],
    hdrs = [
        "lunar_lander.h",
        "lunar_lander_continuous.h",
        "lunar_lander_discrete.h",
    ],
    deps = [
        "//envpool/core:async_envpool",
        "@box2d",
    ],
)

pybind_extension(
    name = "box2d_envpool",
    srcs = ["box2d_envpool.cc"],
    deps = [
        ":box2d_env",
        "//envpool/core:py_envpool",
    ],
)

py_library(
    name = "box2d",
    srcs = ["__init__.py"],
    data = [":box2d_envpool.so"],
    deps = ["//envpool/python:api"],
)

py_test(
    name = "box2d_deterministic_test",
    size = "enormous",
    srcs = ["box2d_deterministic_test.py"],
    deps = [
        ":box2d",
        requirement("absl-py"),
        requirement("numpy"),
    ],
)

py_test(
    name = "box2d_correctness_test",
    size = "enormous",
    srcs = ["box2d_correctness_test.py"],
    deps = [
        ":box2d",
        requirement("absl-py"),
        requirement("gym"),
        requirement("box2d"),
        requirement("pygame"),
        requirement("numpy"),
    ],
)

py_library(
    name = "box2d_registration",
    srcs = ["registration.py"],
    deps = [
        "//envpool:registration",
    ],
)
44 changes: 44 additions & 0 deletions envpool/box2d/__init__.py
@@ -0,0 +1,44 @@
# Copyright 2022 Garena Online Private Limited
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Box2D env in EnvPool."""

from envpool.python.api import py_env

from .box2d_envpool import (
  _LunarLanderContinuousEnvPool,
  _LunarLanderContinuousEnvSpec,
  _LunarLanderDiscreteEnvPool,
  _LunarLanderDiscreteEnvSpec,
)

(
  LunarLanderContinuousEnvSpec,
  LunarLanderContinuousDMEnvPool,
  LunarLanderContinuousGymEnvPool,
) = py_env(_LunarLanderContinuousEnvSpec, _LunarLanderContinuousEnvPool)

(
  LunarLanderDiscreteEnvSpec,
  LunarLanderDiscreteDMEnvPool,
  LunarLanderDiscreteGymEnvPool,
) = py_env(_LunarLanderDiscreteEnvSpec, _LunarLanderDiscreteEnvPool)

__all__ = [
  "LunarLanderContinuousEnvSpec",
  "LunarLanderContinuousDMEnvPool",
  "LunarLanderContinuousGymEnvPool",
  "LunarLanderDiscreteEnvSpec",
  "LunarLanderDiscreteDMEnvPool",
  "LunarLanderDiscreteGymEnvPool",
]
127 changes: 127 additions & 0 deletions envpool/box2d/box2d_correctness_test.py
@@ -0,0 +1,127 @@
# Copyright 2022 Garena Online Private Limited
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Unit tests for box2d environments correctness check."""

from typing import Any, no_type_check

import gym
import numpy as np
from absl import logging
from absl.testing import absltest

from envpool.box2d import (
  LunarLanderContinuousEnvSpec,
  LunarLanderContinuousGymEnvPool,
  LunarLanderDiscreteEnvSpec,
  LunarLanderDiscreteGymEnvPool,
)


class _Box2dEnvPoolCorrectnessTest(absltest.TestCase):

  @no_type_check
  def run_space_check(self, env0: gym.Env, env1: Any) -> None:
    """Check observation_space and action space."""
    obs0, obs1 = env0.observation_space, env1.observation_space
    np.testing.assert_allclose(obs0.shape, obs1.shape)
    act0, act1 = env0.action_space, env1.action_space
    if isinstance(act0, gym.spaces.Box):
      np.testing.assert_allclose(act0.low, act1.low)
      np.testing.assert_allclose(act0.high, act1.high)
    elif isinstance(act0, gym.spaces.Discrete):
      np.testing.assert_allclose(act0.n, act1.n)

  def test_lunar_lander_space(self) -> None:
    env0 = gym.make("LunarLander-v2")
    env1 = LunarLanderDiscreteGymEnvPool(
      LunarLanderDiscreteEnvSpec(LunarLanderDiscreteEnvSpec.gen_config())
    )
    self.run_space_check(env0, env1)

    env0 = gym.make("LunarLanderContinuous-v2")
    env1 = LunarLanderContinuousGymEnvPool(
      LunarLanderContinuousEnvSpec(LunarLanderContinuousEnvSpec.gen_config())
    )
    self.run_space_check(env0, env1)

  def heuristic_lunar_lander_policy(
    self, s: np.ndarray, continuous: bool
  ) -> np.ndarray:
    angle_targ = np.clip(s[0] * 0.5 + s[2] * 1.0, -0.4, 0.4)
    hover_targ = 0.55 * np.abs(s[0])
    angle_todo = (angle_targ - s[4]) * 0.5 - s[5] * 1.0
    hover_todo = (hover_targ - s[1]) * 0.5 - s[3] * 0.5

    if s[6] or s[7]:
      angle_todo = 0
      hover_todo = -(s[3]) * 0.5

    if continuous:
      a = np.array([hover_todo * 20 - 1, -angle_todo * 20])
      a = np.clip(a, -1, 1)
    else:
      a = 0
      if hover_todo > np.abs(angle_todo) and hover_todo > 0.05:
        a = 2
      elif angle_todo < -0.05:
        a = 3
      elif angle_todo > 0.05:
        a = 1
    return a

  def solve_lunar_lander(self, num_envs: int, continuous: bool) -> None:
    if continuous:
      env = LunarLanderContinuousGymEnvPool(
        LunarLanderContinuousEnvSpec(
          LunarLanderContinuousEnvSpec.gen_config(num_envs=num_envs)
        )
      )
    else:
      env = LunarLanderDiscreteGymEnvPool(
        LunarLanderDiscreteEnvSpec(
          LunarLanderDiscreteEnvSpec.gen_config(num_envs=num_envs)
        )
      )
    # each env run two episodes
    for _ in range(2):
      env_id = np.arange(num_envs)
      done = np.array([False] * num_envs)
      obs = env.reset(env_id)
      rewards = np.zeros(num_envs)
      while not np.all(done):
        action = np.array(
          [self.heuristic_lunar_lander_policy(s, continuous) for s in obs]
        )
        obs, rew, done, info = env.step(action, env_id)
        env_id = info["env_id"]
        rewards[env_id] += rew
        obs = obs[~done]
        env_id = env_id[~done]
      mean_reward = np.mean(rewards)
      logging.info(
        f"{continuous}, {np.mean(rewards):.6f} ± {np.std(rewards):.6f}"
      )
      # the following number is from gym's 1000 episode mean reward
      if continuous:  # 283.872619 ± 18.881830
        self.assertTrue(abs(mean_reward - 284) < 10, (continuous, mean_reward))
      else:  # 236.898334 ± 105.832610
        self.assertTrue(abs(mean_reward - 237) < 20, (continuous, mean_reward))

  def test_lunar_lander_correctness(self, num_envs: int = 30) -> None:
    self.solve_lunar_lander(num_envs, True)
    self.solve_lunar_lander(num_envs, False)


if __name__ == "__main__":
  absltest.main()
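
The bookkeeping in ``solve_lunar_lander`` (stepping only the environments that
are still running and accumulating rewards by ``env_id``) can be tried without
building EnvPool. The sketch below uses ``ToyVectorEnv``, a hypothetical
stand-in that only mimics the ``reset(env_id)`` / ``step(action, env_id)``
call shape used in the test; it is not the EnvPool API, and its dynamics
(environment ``i`` ends after ``i + 1`` steps with reward 1 per step) are
made up for illustration.

```python
import numpy as np


class ToyVectorEnv:
    """Hypothetical stand-in: env i terminates after i + 1 steps."""

    def __init__(self, num_envs: int) -> None:
        self.steps = np.zeros(num_envs, dtype=int)

    def reset(self, env_id: np.ndarray) -> np.ndarray:
        self.steps[env_id] = 0
        return np.zeros((len(env_id), 8))

    def step(self, action: np.ndarray, env_id: np.ndarray):
        self.steps[env_id] += 1
        done = self.steps[env_id] >= env_id + 1
        rew = np.ones(len(env_id))
        obs = np.zeros((len(env_id), 8))
        return obs, rew, done, {"env_id": env_id}


def run_episode(env: ToyVectorEnv, num_envs: int) -> np.ndarray:
    """Run one episode per env, dropping finished envs from the batch."""
    env_id = np.arange(num_envs)
    done = np.zeros(num_envs, dtype=bool)
    obs = env.reset(env_id)
    returns = np.zeros(num_envs)
    while not np.all(done):
        action = np.zeros(len(env_id), dtype=int)  # placeholder policy
        obs, rew, done, info = env.step(action, env_id)
        env_id = info["env_id"]
        returns[env_id] += rew    # accumulate per-env return
        obs = obs[~done]          # keep only unfinished envs
        env_id = env_id[~done]
    return returns


print(run_episode(ToyVectorEnv(3), 3))
```

Filtering ``obs`` and ``env_id`` with ``~done`` means each environment
contributes exactly one episode to ``returns``, which is what makes the mean
comparable to a per-episode mean from gym.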