# Lunar Lander Simulation and Runtime Monitor

## Action Space

The agent has four discrete actions available, each with a corresponding numerical value:

* Do nothing (0).
* Fire right engine (1).
* Fire main engine (2).
* Fire left engine (3).
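
As a small illustration of the mapping above, the four action values can be given readable names. This is only a sketch: the `Action` enum and the `describe` helper are hypothetical names introduced here, not part of `sim.lander`.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical names for the four discrete action values listed above."""
    NOTHING = 0
    FIRE_RIGHT = 1
    FIRE_MAIN = 2
    FIRE_LEFT = 3


def describe(action_id: int) -> str:
    """Return a readable label for an action value; raises ValueError if unknown."""
    return Action(action_id).name.replace("_", " ").lower()
```

Because `IntEnum` members compare equal to plain integers, `Action.FIRE_MAIN` can be used anywhere the raw value `2` is expected.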

## State / Observation Space

The agent's observation space consists of a state vector with 8 variables:

* Its $(x,y)$ coordinates. The landing pad is always at coordinates $(0,0)$.
* Its linear velocities $(\dot x,\dot y)$.
* Its angle $\theta$.
* Its angular velocity $\dot \theta$.
* Two booleans, $l$ and $r$, indicating whether the left and right legs, respectively, are in contact with the ground.
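
To make the layout of the state vector concrete, the sketch below unpacks an 8-element observation into named fields. It assumes the components arrive in the same order as the list above; `LanderState` and `unpack_state` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Sequence


class LanderState(NamedTuple):
    """Named view of the 8-variable state vector (field order is an assumption)."""
    x: float
    y: float
    x_dot: float
    y_dot: float
    theta: float
    theta_dot: float
    left_contact: bool
    right_contact: bool


def unpack_state(obs: Sequence[float]) -> LanderState:
    """Convert a raw 8-element observation into a LanderState."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 state variables, got {len(obs)}")
    x, y, x_dot, y_dot, theta, theta_dot, left, right = obs
    return LanderState(
        float(x), float(y), float(x_dot), float(y_dot),
        float(theta), float(theta_dot),
        bool(left), bool(right),  # contact flags are typically stored as 0.0 / 1.0
    )
```

For example, `unpack_state(obs).right_contact` reads more clearly than `obs[7]` when inspecting individual transitions.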

In [1]:
from sim.lander import Lander

In [2]:
lander = Lander()   # using default property values
print(lander)       # quick little sanity check

LUNAR LANDER PROPERTIES:

	 Replay Buffer Size: 				100000
	 Steps Per Update: 					4
	 Mini-batch Size: 					64
	 Target Score to Finish: 			200.0

	 Learning Rate (Alpha): 			0.001
	 Q-func Discount Factor (Gamma): 	0.995
	 Soft Update Trade-off Param (Tau): 0.001

	 Epsilon - Min Value: 				0.01
	 Epsilon - Decay Rate: 				0.995

	 Env. Num Avail. Actions: 			4
	 Env. State Size: 					(8,)
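
The epsilon and tau values printed above are usually applied as sketched below. This is a guess at the conventional formulas (multiplicative epsilon decay with a floor, and a soft update of the target network's weights), not a copy of what `Lander` does internally. The function names are hypothetical, and the weights are plain lists of floats to keep the example dependency-free.

```python
from typing import List


def decay_epsilon(epsilon: float, decay_rate: float = 0.995,
                  min_value: float = 0.01) -> float:
    """Shrink epsilon by a constant factor, never dropping below min_value."""
    return max(min_value, decay_rate * epsilon)


def soft_update(target_weights: List[float], online_weights: List[float],
                tau: float = 0.001) -> List[float]:
    """Move each target weight a fraction tau of the way toward the online weight."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_weights, online_weights)]
```

With a small tau such as 0.001, the target network changes slowly, which is the usual motivation for using a soft update instead of copying the weights outright.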


In [3]:
# Train using the default parameters:
#       num_episodes=2000
#       num_time_steps=1000
#       num_points=100
#       epsilon=1.0

# NOTE - Epsilon decays as training proceeds so that, as the model improves, the agent
# gradually shifts away from exploration and toward exploitation.  The downside is that
# later episodes take longer to complete because the Bellman update runs more often.

lander.train_agent()

Episode 100 | Total point average of the last 100 episodes: -146.57
Episode 200 | Total point average of the last 100 episodes: -87.970
Episode 300 | Total point average of the last 100 episodes: -34.47
Episode 400 | Total point average of the last 100 episodes: 53.936
Episode 494 | Total point average of the last 100 episodes: 200.53

Environment solved in 494 episodes!
Buffer component states saved to file ./output/latest_buffer_states.npy with shape (100000, 8)
Buffer component actions saved to file ./output/latest_buffer_actions.npy with shape (100000,)
Buffer component rewards saved to file ./output/latest_buffer_rewards.npy with shape (100000,)
Buffer component next_states saved to file ./output/latest_buffer_next_states.npy with shape (100000, 8)
Buffer component done_flags saved to file ./output/latest_buffer_done_flags.npy with shape (100000,)

Total Runtime: 466.00 s (7.77 min)
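
The saved buffer components can be read back with NumPy for later inspection. The sketch below assumes the file names follow the pattern shown in the log above (`latest_buffer_<component>.npy`); `load_buffer` is a hypothetical helper, not part of `sim.lander`.

```python
from pathlib import Path
from typing import Dict

import numpy as np

# Component names taken from the save log above.
COMPONENTS = ("states", "actions", "rewards", "next_states", "done_flags")


def load_buffer(output_dir: str = "./output",
                prefix: str = "latest_buffer_") -> Dict[str, np.ndarray]:
    """Load every saved buffer component and check they have the same length."""
    out = Path(output_dir)
    buffer = {name: np.load(out / f"{prefix}{name}.npy") for name in COMPONENTS}
    n = len(buffer["states"])
    for name, arr in buffer.items():
        if len(arr) != n:
            raise ValueError(f"{name} has {len(arr)} rows, expected {n}")
    return buffer
```

After a run like the one above, `load_buffer()["states"].shape` should match the `(100000, 8)` shape reported in the log.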
