Box2D integration: Add LunarLander #111

Merged
merged 33 commits into from
May 18, 2022
4182930
add lunar_lander
Alicia1529 May 12, 2022
7040274
setup structure
Trinkle23897 May 12, 2022
0d365a5
can run with: bazel build //envpool/box2d:box2d_env --config=release
Trinkle23897 May 12, 2022
674893f
sync
Trinkle23897 May 13, 2022
5df0bb0
Merge branch 'master' into lunar_lander
Trinkle23897 May 15, 2022
9c030d0
finish LunarLanderEnv::ResetBox2d
Trinkle23897 May 15, 2022
ddcbcfb
sync
Trinkle23897 May 15, 2022
664427b
Merge branch 'master' into lunar_lander
Trinkle23897 May 15, 2022
2232d55
add CreateParticle
Trinkle23897 May 15, 2022
eba9693
Merge branch 'master' into lunar_lander
Trinkle23897 May 15, 2022
9d38e3d
finish LunarLanderEnv
Trinkle23897 May 15, 2022
e94ab1a
can make, need test
Trinkle23897 May 15, 2022
e8027f2
sync
Trinkle23897 May 16, 2022
6c17f38
Merge branch 'master' into lunar_lander
Trinkle23897 May 16, 2022
622dee8
setup tests
Trinkle23897 May 16, 2022
8c175da
add box2d-py
Trinkle23897 May 16, 2022
26ac92b
switch to box2d
Trinkle23897 May 16, 2022
726d6f1
add req
Trinkle23897 May 16, 2022
31761dd
fix
Trinkle23897 May 16, 2022
b3443b9
fail correctness test
Trinkle23897 May 16, 2022
a1938db
fix lint
Trinkle23897 May 16, 2022
007e6f7
Merge branch 'master' into lunar_lander
Trinkle23897 May 17, 2022
08d119a
fix 2 bugs
Trinkle23897 May 17, 2022
e0bbee9
Merge branch 'master' into lunar_lander
Trinkle23897 May 17, 2022
e857a30
fix action shape
Trinkle23897 May 17, 2022
08eae0d
pass
Trinkle23897 May 17, 2022
55f78af
update readme
Trinkle23897 May 17, 2022
b974bcd
test release
Trinkle23897 May 18, 2022
e7987d1
revert
Trinkle23897 May 18, 2022
a7a69a9
polish
Trinkle23897 May 18, 2022
a7b0be1
static constexpr -> const
Trinkle23897 May 18, 2022
20553c0
Merge branch 'master' into lunar_lander
Trinkle23897 May 18, 2022
3696981
change test metric
Trinkle23897 May 18, 2022
2 changes: 1 addition & 1 deletion README.md
@@ -15,7 +15,7 @@
- [x] [Toy text RL envs](https://envpool.readthedocs.io/en/latest/api/toy_text.html): Catch, FrozenLake, Taxi, NChain, CliffWalking, Blackjack
- [x] [ViZDoom single player](https://envpool.readthedocs.io/en/latest/api/vizdoom.html)
- [ ] [DeepMind Control Suite](https://envpool.readthedocs.io/en/latest/api/dm_control.html)
-- [ ] Box2D
+- [ ] [Box2D](https://envpool.readthedocs.io/en/latest/api/box2d.html)
- [ ] Procgen
- [ ] Minigrid

105 changes: 105 additions & 0 deletions docs/api/box2d.rst
@@ -0,0 +1,105 @@
Box2D
=====

We use ``box2d==2.4.1`` and ``gym==0.23.1`` as the codebase. See
https://github.com/erincatto/box2d/tree/v2.4.1 and
https://github.com/openai/gym/tree/v0.23.1/gym/envs/box2d for the upstream
sources.


CarRacing-v1
------------

The easiest control task to learn from pixels - a top-down racing environment.
The generated track is random every episode.

Action Space
~~~~~~~~~~~~

There are 3 actions: steering (-1 for full left, 1 for full right), gas
(0 ~ 1), and braking (0 ~ 1).

Observation Space
~~~~~~~~~~~~~~~~~

The state is a 96x96 RGB image (3 channels).

Rewards
~~~~~~~

The reward is -0.1 every frame and +1000/N for every track tile visited, where
N is the total number of tiles in the track. For example, if you visit every
tile and finish in 732 frames, your reward is 1000 - 0.1\*732 = 926.8 points.
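
The arithmetic in this rule can be checked with a few lines of Python. This is a minimal sketch rather than EnvPool's implementation: ``car_racing_return`` and its parameters are hypothetical names, and the track length of 300 tiles is an arbitrary value chosen only for illustration.

```python
def car_racing_return(frames: int, tiles_visited: int, total_tiles: int) -> float:
    """Episode return: -0.1 per frame plus 1000/N per visited tile.

    N (total_tiles) is the number of tiles in the track; all names here are
    hypothetical and exist only for this illustration.
    """
    per_tile = 1000.0 / total_tiles
    return per_tile * tiles_visited - 0.1 * frames


# Visiting every tile of an assumed 300-tile track in 732 frames.
print(round(car_racing_return(732, 300, 300), 1))
```

Visiting only part of the track scales the positive term proportionally, while the frame penalty keeps accumulating.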

Starting State
~~~~~~~~~~~~~~

The car starts at rest in the center of the road.

Episode Termination
~~~~~~~~~~~~~~~~~~~

The episode finishes when all of the tiles are visited. The car can also go
outside of the playfield - that is, far off the track, in which case it will
receive -100 reward and die.

LunarLander-v2, LunarLanderContinuous-v2
----------------------------------------

This environment is a classic rocket trajectory optimization problem.
According to Pontryagin's maximum principle, it is optimal to fire the
engine at full throttle or turn it off. This is the reason why this
environment has discrete actions: engine on or off.

There are two environment versions: discrete or continuous. The landing pad is
always at coordinates (0,0). The coordinates are the first two numbers in the
state vector. Landing outside of the landing pad is possible. Fuel is
infinite, so an agent can learn to fly and then land on its first attempt.

Action Space
~~~~~~~~~~~~

There are four discrete actions available: do nothing, fire left orientation
engine, fire main engine, fire right orientation engine.
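
As an illustration, the four discrete actions could be named with an enum. This sketch assumes the integer indices follow the order in which the actions are listed above (0 to 3); the class and member names are hypothetical and are not part of EnvPool's API.

```python
from enum import IntEnum


class LunarLanderAction(IntEnum):
    """Discrete action indices, assuming they follow the documented order."""

    NOOP = 0
    FIRE_LEFT = 1
    FIRE_MAIN = 2
    FIRE_RIGHT = 3


def describe(action: int) -> str:
    """Return a readable name for a discrete action index."""
    return LunarLanderAction(action).name.lower()


print([describe(a) for a in range(4)])
```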

Observation Space
~~~~~~~~~~~~~~~~~

The state is an 8-dimensional vector: the coordinates of the lander in ``x``
and ``y``, its linear velocities in ``x`` and ``y``, its angle, its angular
velocity, and two booleans that represent whether each leg is in contact with
the ground or not.
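
The layout of the state vector can be made explicit by unpacking it into named fields. The following is a sketch under the assumption that the entries appear in the order described above; ``LanderState`` and ``unpack_state`` are hypothetical helpers, and the two leg flags are deliberately not labelled left or right.

```python
from typing import NamedTuple, Sequence


class LanderState(NamedTuple):
    """Named view of the 8 observation entries (hypothetical helper)."""

    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    leg1_contact: bool
    leg2_contact: bool


def unpack_state(obs: Sequence[float]) -> LanderState:
    """Convert a raw 8-element observation into a LanderState."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 entries, got {len(obs)}")
    *floats, leg1, leg2 = obs
    return LanderState(*map(float, floats), bool(leg1), bool(leg2))


state = unpack_state([0.1, 1.4, 0.0, -0.5, 0.05, 0.0, 0.0, 1.0])
print(state.y, state.leg2_contact)
```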

Rewards
~~~~~~~

Reward for moving from the top of the screen to the landing pad and coming to
rest is about 100-140 points. If the lander moves away from the landing pad,
it loses reward. If the lander crashes, it receives an additional -100 points.
If it comes to rest, it receives an additional +100 points. Each leg with
ground contact is +10 points. Firing the main engine is -0.3 points each
frame. Firing the side engine is -0.03 points each frame. The environment is
considered solved at 200 points.
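
To make the engine penalties concrete, the sketch below totals the fuel cost of a sequence of discrete actions. It covers only the per-frame engine terms, not the shaping or terminal rewards, and it assumes that action 2 is the main engine and actions 1 and 3 are the side engines; ``fuel_penalty`` is a hypothetical name.

```python
from typing import Iterable

MAIN_ENGINE_COST = 0.3   # points per frame, from the text above
SIDE_ENGINE_COST = 0.03  # points per frame, from the text above


def fuel_penalty(actions: Iterable[int]) -> float:
    """Sum the engine penalties for a sequence of discrete actions.

    Assumes 0 = do nothing, 1 and 3 = side engines, 2 = main engine.
    """
    total = 0.0
    for action in actions:
        if action == 2:
            total -= MAIN_ENGINE_COST
        elif action in (1, 3):
            total -= SIDE_ENGINE_COST
    return total


print(fuel_penalty([0, 2, 2, 1, 3]))
```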

Starting State
~~~~~~~~~~~~~~

The lander starts at the top center of the viewport with a random initial
force applied to its center of mass.

Episode Termination
~~~~~~~~~~~~~~~~~~~

The episode finishes if:

1. the lander crashes (the lander body gets in contact with the moon);
2. the lander gets outside of the viewport (the absolute value of its ``x``
   coordinate is at least 1);
3. the lander is not awake. From the `Box2D docs
   <https://box2d.org/documentation/md__d_1__git_hub_box2d_docs_dynamics.html#autotoc_md61>`_,
   a body which is not awake is a body which doesn't move and doesn't collide
   with any other body:

     When Box2D determines that a body (or group of bodies) has come to rest,
     the body enters a sleep state which has very little CPU overhead. If a
     body is awake and collides with a sleeping body, then the sleeping body
     wakes up. Bodies will also wake up if a joint or contact attached to
     them is destroyed.
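
The three termination conditions can be summarised as a single predicate. This is only a sketch: ``episode_finished`` is a hypothetical function, and the viewport check ``abs(x) >= 1.0`` is an assumption modelled on gym's reference implementation rather than something taken from this diff.

```python
def episode_finished(crashed: bool, x: float, awake: bool) -> bool:
    """Return True if any of the three termination conditions holds.

    crashed: the lander body touched the moon.
    x:       normalized horizontal coordinate (first state entry).
    awake:   whether Box2D still considers the lander body awake.
    """
    out_of_viewport = abs(x) >= 1.0  # assumed threshold, see lead-in
    return crashed or out_of_viewport or not awake


print(episode_finished(crashed=False, x=0.2, awake=True))
```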
3 changes: 2 additions & 1 deletion docs/index.rst
@@ -73,9 +73,10 @@ stable version through `envpool.readthedocs.io/en/stable/
:caption: Environments

api/atari
+api/box2d
api/classic
-api/mujoco
api/dm_control
+api/mujoco
api/toy_text
api/vizdoom

4 changes: 4 additions & 0 deletions docs/spelling_wordlist.txt
@@ -57,3 +57,7 @@ Minigrid
Garena
Tianshou
namedtuple
playfield
Pontryagin
booleans
viewport
2 changes: 2 additions & 0 deletions envpool/BUILD
@@ -17,6 +17,7 @@ py_library(
srcs = ["entry.py"],
deps = [
"//envpool/atari:atari_registration",
"//envpool/box2d:box2d_registration",
"//envpool/classic_control:classic_control_registration",
"//envpool/mujoco:mujoco_registration",
"//envpool/toy_text:toy_text_registration",
@@ -31,6 +32,7 @@ py_library(
":entry",
":registration",
"//envpool/atari",
"//envpool/box2d",
"//envpool/classic_control",
"//envpool/mujoco",
"//envpool/python",
67 changes: 67 additions & 0 deletions envpool/box2d/BUILD
@@ -0,0 +1,67 @@
load("@pip_requirements//:requirements.bzl", "requirement")
load("@pybind11_bazel//:build_defs.bzl", "pybind_extension")

package(default_visibility = ["//visibility:public"])

cc_library(
    name = "box2d_env",
    srcs = ["lunar_lander.cc"],
    hdrs = [
        "lunar_lander.h",
        "lunar_lander_continuous.h",
        "lunar_lander_discrete.h",
    ],
    deps = [
        "//envpool/core:async_envpool",
        "@box2d",
    ],
)

pybind_extension(
    name = "box2d_envpool",
    srcs = ["box2d_envpool.cc"],
    deps = [
        ":box2d_env",
        "//envpool/core:py_envpool",
    ],
)

py_library(
    name = "box2d",
    srcs = ["__init__.py"],
    data = [":box2d_envpool.so"],
    deps = ["//envpool/python:api"],
)

py_test(
    name = "box2d_deterministic_test",
    size = "enormous",
    srcs = ["box2d_deterministic_test.py"],
    deps = [
        ":box2d",
        requirement("absl-py"),
        requirement("numpy"),
    ],
)

py_test(
    name = "box2d_correctness_test",
    size = "enormous",
    srcs = ["box2d_correctness_test.py"],
    deps = [
        ":box2d",
        requirement("absl-py"),
        requirement("gym"),
        requirement("box2d"),
        requirement("pygame"),
        requirement("numpy"),
    ],
)

py_library(
    name = "box2d_registration",
    srcs = ["registration.py"],
    deps = [
        "//envpool:registration",
    ],
)
44 changes: 44 additions & 0 deletions envpool/box2d/__init__.py
@@ -0,0 +1,44 @@
# Copyright 2022 Garena Online Private Limited
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Box2D env in EnvPool."""

from envpool.python.api import py_env

from .box2d_envpool import (
  _LunarLanderContinuousEnvPool,
  _LunarLanderContinuousEnvSpec,
  _LunarLanderDiscreteEnvPool,
  _LunarLanderDiscreteEnvSpec,
)

(
  LunarLanderContinuousEnvSpec,
  LunarLanderContinuousDMEnvPool,
  LunarLanderContinuousGymEnvPool,
) = py_env(_LunarLanderContinuousEnvSpec, _LunarLanderContinuousEnvPool)

(
  LunarLanderDiscreteEnvSpec,
  LunarLanderDiscreteDMEnvPool,
  LunarLanderDiscreteGymEnvPool,
) = py_env(_LunarLanderDiscreteEnvSpec, _LunarLanderDiscreteEnvPool)

__all__ = [
  "LunarLanderContinuousEnvSpec",
  "LunarLanderContinuousDMEnvPool",
  "LunarLanderContinuousGymEnvPool",
  "LunarLanderDiscreteEnvSpec",
  "LunarLanderDiscreteDMEnvPool",
  "LunarLanderDiscreteGymEnvPool",
]
127 changes: 127 additions & 0 deletions envpool/box2d/box2d_correctness_test.py
@@ -0,0 +1,127 @@
# Copyright 2022 Garena Online Private Limited
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Unit tests for box2d environments correctness check."""

from typing import Any, no_type_check

import gym
import numpy as np
from absl import logging
from absl.testing import absltest

from envpool.box2d import (
LunarLanderContinuousEnvSpec,
LunarLanderContinuousGymEnvPool,
LunarLanderDiscreteEnvSpec,
LunarLanderDiscreteGymEnvPool,
)


class _Box2dEnvPoolCorrectnessTest(absltest.TestCase):

  @no_type_check
  def run_space_check(self, env0: gym.Env, env1: Any) -> None:
    """Check observation_space and action space."""
    obs0, obs1 = env0.observation_space, env1.observation_space
    np.testing.assert_allclose(obs0.shape, obs1.shape)
    act0, act1 = env0.action_space, env1.action_space
    if isinstance(act0, gym.spaces.Box):
      np.testing.assert_allclose(act0.low, act1.low)
      np.testing.assert_allclose(act0.high, act1.high)
    elif isinstance(act0, gym.spaces.Discrete):
      np.testing.assert_allclose(act0.n, act1.n)

  def test_lunar_lander_space(self) -> None:
    env0 = gym.make("LunarLander-v2")
    env1 = LunarLanderDiscreteGymEnvPool(
      LunarLanderDiscreteEnvSpec(LunarLanderDiscreteEnvSpec.gen_config())
    )
    self.run_space_check(env0, env1)

    env0 = gym.make("LunarLanderContinuous-v2")
    env1 = LunarLanderContinuousGymEnvPool(
      LunarLanderContinuousEnvSpec(LunarLanderContinuousEnvSpec.gen_config())
    )
    self.run_space_check(env0, env1)

  def heuristic_lunar_lander_policy(
    self, s: np.ndarray, continuous: bool
  ) -> np.ndarray:
    angle_targ = np.clip(s[0] * 0.5 + s[2] * 1.0, -0.4, 0.4)
    hover_targ = 0.55 * np.abs(s[0])
    angle_todo = (angle_targ - s[4]) * 0.5 - s[5] * 1.0
    hover_todo = (hover_targ - s[1]) * 0.5 - s[3] * 0.5

    if s[6] or s[7]:
      angle_todo = 0
      hover_todo = -(s[3]) * 0.5

    if continuous:
      a = np.array([hover_todo * 20 - 1, -angle_todo * 20])
      a = np.clip(a, -1, 1)
    else:
      a = 0
      if hover_todo > np.abs(angle_todo) and hover_todo > 0.05:
        a = 2
      elif angle_todo < -0.05:
        a = 3
      elif angle_todo > 0.05:
        a = 1
    return a

  def solve_lunar_lander(self, num_envs: int, continuous: bool) -> None:
    if continuous:
      env = LunarLanderContinuousGymEnvPool(
        LunarLanderContinuousEnvSpec(
          LunarLanderContinuousEnvSpec.gen_config(num_envs=num_envs)
        )
      )
    else:
      env = LunarLanderDiscreteGymEnvPool(
        LunarLanderDiscreteEnvSpec(
          LunarLanderDiscreteEnvSpec.gen_config(num_envs=num_envs)
        )
      )
    # each env run two episodes
    for _ in range(2):
      env_id = np.arange(num_envs)
      done = np.array([False] * num_envs)
      obs = env.reset(env_id)
      rewards = np.zeros(num_envs)
      while not np.all(done):
        action = np.array(
          [self.heuristic_lunar_lander_policy(s, continuous) for s in obs]
        )
        obs, rew, done, info = env.step(action, env_id)
        env_id = info["env_id"]
        rewards[env_id] += rew
        obs = obs[~done]
        env_id = env_id[~done]
      mean_reward = np.mean(rewards)
      logging.info(
        f"{continuous}, {np.mean(rewards):.6f} ± {np.std(rewards):.6f}"
      )
      # the following number is from gym's 1000 episode mean reward
      if continuous:  # 283.872619 ± 18.881830
        self.assertTrue(abs(mean_reward - 284) < 10, (continuous, mean_reward))
      else:  # 236.898334 ± 105.832610
        self.assertTrue(abs(mean_reward - 237) < 20, (continuous, mean_reward))

  def test_lunar_lander_correctness(self, num_envs: int = 30) -> None:
    self.solve_lunar_lander(num_envs, True)
    self.solve_lunar_lander(num_envs, False)


if __name__ == "__main__":
  absltest.main()
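
The rollout loop in `solve_lunar_lander` steps only the environments that are still running and accumulates each reward under its `env_id`. The sketch below reproduces that bookkeeping against a stand-in environment so it can run without EnvPool. `FakeVecEnv` is entirely hypothetical (environment `i` ends after `i + 1` steps and pays a reward of 1 per step); only the loop structure mirrors the test, and NumPy is assumed to be installed.

```python
import numpy as np


class FakeVecEnv:
    """Hypothetical stand-in: env i terminates after i + 1 steps."""

    def __init__(self, num_envs: int) -> None:
        self.num_envs = num_envs
        self.steps = np.zeros(num_envs, dtype=int)

    def reset(self, env_id: np.ndarray) -> np.ndarray:
        self.steps[env_id] = 0
        return np.zeros((len(env_id), 8))

    def step(self, action: np.ndarray, env_id: np.ndarray):
        self.steps[env_id] += 1
        done = self.steps[env_id] >= env_id + 1
        obs = np.zeros((len(env_id), 8))
        rew = np.ones(len(env_id))
        return obs, rew, done, {"env_id": env_id}


def rollout(env: FakeVecEnv) -> np.ndarray:
    """Run one episode per env, stepping only the envs that are not done."""
    env_id = np.arange(env.num_envs)
    done = np.zeros(env.num_envs, dtype=bool)
    obs = env.reset(env_id)
    rewards = np.zeros(env.num_envs)
    while not np.all(done):
        action = np.zeros(len(obs))  # placeholder policy
        obs, rew, done, info = env.step(action, env_id)
        env_id = info["env_id"]
        rewards[env_id] += rew       # credit rewards to the right envs
        obs = obs[~done]             # drop finished envs from the batch
        env_id = env_id[~done]
    return rewards


print(rollout(FakeVecEnv(4)))
```

Filtering `obs` and `env_id` with `~done` after each step is what keeps finished episodes from contributing further reward.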