# Lunar Lander Environment

Adapted from the Reinforcement Learning Specialization on Coursera, this challenging project involves creating a realistic simulator, selecting an appropriate reinforcement learning algorithm, implementing that algorithm, and tuning its hyperparameters. This notebook takes the first step: developing a lunar lander environment, a realistic lunar landing simulator suitable for training an agent for real-world deployment.

## Creating an Environment

The essential functions to facilitate the development of the lunar lander environment are:

- **get_velocity**: Returns an array representing the x and y velocities of the lander, each within the range $[0, 60]$.
- **get_angle**: Returns a scalar representing the angle of the lander, ranging from $[0, 359]$ degrees.
- **get_position**: Returns an array with the x and y positions of the lander, each ranging from $[0, 100]$.
- **get_landing_zone**: Returns an array with the x and y coordinates of the landing zone, each ranging from $[1, 100]$.
- **get_fuel**: Returns a scalar indicating the remaining amount of fuel, starting at $100$ and within the range $[0, 100]$.

These functions are provided as placeholders for this notebook.
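If the course's `utils` module is not available, the placeholders can be approximated with stubs that return values from the documented ranges. The sketch below is hypothetical: the `stub_*` names, uniform random sampling, and an integer-valued angle are all assumptions for experimenting with the reward logic, not how `utils` actually computes these values.

```python
import random

# Hypothetical stand-ins for the placeholder functions in `utils`.
# Each draws a value uniformly from the range documented above;
# the `action` argument is accepted but ignored.

def stub_get_velocity(action=None):
    # x and y velocities, each in [0, 60]
    return random.uniform(0, 60), random.uniform(0, 60)

def stub_get_angle(action=None):
    # angle in whole degrees, in [0, 359] (integer values are an assumption)
    return random.randint(0, 359)

def stub_get_position(action=None):
    # x and y positions, each in [0, 100]
    return random.uniform(0, 100), random.uniform(0, 100)

def stub_get_landing_zone():
    # landing-zone x and y coordinates, each in [1, 100]
    return random.uniform(1, 100), random.uniform(1, 100)

def stub_get_fuel(action=None):
    # remaining fuel in [0, 100]
    return random.uniform(0, 100)
```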

![Lunar Lander](lunar_landar.png)

In this notebook, the provided functions will be used to **structure the reward signal** based on the following criteria:

1. **Crash Condition**: The lander will crash if it touches the ground with a ``y_velocity < -3`` (downward speed greater than 3).

2. **Crash Condition**: The lander will crash if it touches the ground with an ``x_velocity < -10`` or ``x_velocity > 10`` (horizontal speed exceeding 10).

3. **Crash Condition**: The lander will crash if it touches the ground and the angle is such that ``5 < angle < 355`` (the angle deviates more than 5 degrees from vertical).

4. **Crash Condition**: The lander will crash if it has not yet landed and ``fuel <= 0`` (the fuel runs out).

5. **Fuel Efficiency**: Minimizing fuel consumption is preferred, as MST aims to save money on fuel.

6. **Landing Zone**: The lander must land within the designated landing zone. It will crash if it touches the ground and the ``x_position`` is not within the ``landing_zone`` (the lander lands outside the designated zone).
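Criteria 1, 2, 3, and 6 all apply at the moment of touchdown, so they can be collected into a single predicate. The helper below is a hypothetical sketch (`touchdown_is_safe` is not part of the notebook's API) that uses the thresholds listed above and, like the environment code, treats the landing zone as the single x-coordinate returned by `get_landing_zone`. Criterion 4 (running out of fuel) applies while the lander is still airborne, so it is handled separately.

```python
def touchdown_is_safe(vel_x, vel_y, angle, pos_x, land_x):
    """Return True if touching the ground in this state counts as a landing."""
    too_fast_down = vel_y < -3        # criterion 1: downward speed above 3
    too_fast_side = abs(vel_x) > 10   # criterion 2: horizontal speed above 10
    tilted = 5 < angle < 355          # criterion 3: more than 5 degrees off vertical
    off_target = pos_x != land_x      # criterion 6: outside the landing zone
    return not (too_fast_down or too_fast_side or tilted or off_target)

print(touchdown_is_safe(0, -2, 0, 50, 50))   # True
print(touchdown_is_safe(0, -2, 45, 50, 50))  # False: tilted
```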

Complete the methods below to develop the lunar lander environment based on these criteria.

```python
import environment
import numpy as np
from utils import get_landing_zone, get_angle, get_velocity, get_position, get_fuel, tests

# Lunar Lander Environment
class LunarLanderEnvironment(environment.BaseEnvironment):
    def __init__(self):
        self.current_state = None
        self.count = 0

    def env_init(self, env_info):
        # users set this up
        # state: (vel_x, vel_y, angle, pos_x, pos_y, land_x, land_y, fuel)
        self.state = np.zeros(8)

    def env_start(self):
        land_x, land_y = get_landing_zone()  # x, y coordinates of the landing zone
        # At the start, the agent is placed in the top left-hand corner (100, 20)
        # with zero velocity in every direction and an angle of 0.
        # The landing zone is retrieved and stored, and the lander starts with 100 units of fuel.
        # (vel_x, vel_y, angle, pos_x, pos_y, land_x, land_y, fuel)
        self.current_state = (0, 0, 0, 100, 20, land_x, land_y, 100)
        return self.current_state

    def env_step(self, action):
        land_x, land_y = get_landing_zone()  # x, y coordinates of the landing zone
        vel_x, vel_y = get_velocity(action)  # x, y velocity of the lander
        angle = get_angle(action)            # angle of the lander, in degrees
        pos_x, pos_y = get_position(action)  # x, y position of the lander
        fuel = get_fuel(action)              # amount of fuel remaining

        terminal = False
        reward = 0.0
        observation = (vel_x, vel_y, angle, pos_x, pos_y, land_x, land_y, fuel)

        # Use the observations above to decide the reward and whether the agent
        # is in a terminal state. If the agent crashes or lands, terminal must be True.
        if pos_y <= land_y:  # the lander has touched the ground
            terminal = True
            crashed = (
                vel_y < -3                     # descending too fast
                or vel_x < -10 or vel_x > 10   # moving sideways too fast
                or 5 < angle < 355             # tilted more than 5 degrees
                or pos_x != land_x             # outside the landing zone
            )
            # crash penalty, or reward a safe landing by the fuel remaining
            reward = -10000 if crashed else fuel
        elif fuel <= 0:  # still airborne but out of fuel
            terminal = True
            reward = -10000

        self.reward_obs_term = (reward, observation, terminal)
        return self.reward_obs_term

    def env_cleanup(self):
        return None

    def env_message(self):
        return None
```

## Evaluating the Reward Function

Designing an optimal reward function can be complex, and defining what constitutes the "best" reward function is often ambiguous. Instead of relying on quantitative metrics, evaluate the reward function qualitatively. We provide a series of test cases below, illustrating transitions and explaining the behavior of a sample reward function. 

As you review these cases, compare them with how your own reward function performs and assess whether it behaves as expected. Although the final stages of the capstone will use a standardized lunar lander environment implementation, this notebook focuses on gaining experience with environment and reward function design.

### Case 1: Uncertain Future

In this scenario, the lander is located in the top left corner of the screen, moving with a velocity of (12, 15) and has 10 units of fuel. The outcome of this landing attempt is still uncertain.

![Lunar Lander](lunar_landar_1.png)

```python
tests(LunarLanderEnvironment, 1)
```

```
Reward: 0.0, Terminal: False
```


In this scenario, the agent did not receive any reward since it neither accomplished the objective nor experienced a crash. One approach could be to assign a positive reward for progress toward the goal, or a negative reward for fuel consumption. How does your reward function handle this situation?

Additionally, verify that the `Terminal` status is set to `False`. The agent has not yet landed, crashed, or depleted its fuel, so the episode is still ongoing.
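One way to reward progress toward the goal is potential-based reward shaping, which adds `GAMMA * Phi(s') - Phi(s)` to each transition's reward and is known to leave the optimal policy unchanged. The sketch below assumes a potential equal to the negative straight-line distance to the landing zone, a discount of 1.0, and a per-unit fuel cost of 0.1; these values and the helper names are illustrative assumptions, not part of the notebook.

```python
import math

GAMMA = 1.0       # assumed discount factor
FUEL_COST = 0.1   # hypothetical penalty per unit of fuel burned

def potential(pos_x, pos_y, land_x, land_y):
    # Phi(s): negative straight-line distance to the landing zone
    return -math.hypot(pos_x - land_x, pos_y - land_y)

def shaped_step_reward(prev, curr, land_x, land_y):
    """Shaping term plus fuel penalty for one transition.

    `prev` and `curr` are (pos_x, pos_y, fuel) tuples for consecutive steps.
    """
    shaping = (GAMMA * potential(curr[0], curr[1], land_x, land_y)
               - potential(prev[0], prev[1], land_x, land_y))
    fuel_penalty = FUEL_COST * (prev[2] - curr[2])
    return shaping - fuel_penalty

# Moving from 50 to 40 units away from the zone while burning 2 units of fuel
print(shaped_step_reward((50, 0, 10), (40, 0, 8), 0, 0))  # 9.8
```

With a discount of 1.0, the shaping terms telescope over an episode to `Phi(final) - Phi(start)`, so they guide the agent step by step without changing the total it can earn along different paths.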

### Case 2: Imminent Crash!

The lander is situated in the target landing zone at a 45-degree angle, but its landing gear is designed to handle only a five-degree angular offset. As a result, it is on the verge of crashing!

![Lunar Lander](lunar_landar_2.png)

```python
tests(LunarLanderEnvironment, 2)
```

```
Reward: -10000, Terminal: True
```


A reward of -10,000 was assigned to penalize the agent for crashing.

The `Terminal` status is set to `True` since the agent has crashed and the episode has concluded.
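The condition `5 < angle < 355` encodes the five-degree tolerance because angles wrap around at 360: an angle of 357 degrees is only 3 degrees from vertical, while 45 degrees (this case) is well outside the tolerance. The hypothetical helper below makes the wrap-around explicit; for whole-degree angles in [0, 359] it agrees with the condition used in the environment.

```python
def deviation_from_vertical(angle):
    """Smallest angular distance, in degrees, between `angle` and upright (0)."""
    a = angle % 360
    return min(a, 360 - a)

MAX_TILT = 5  # landing-gear tolerance from the crash criteria

for angle in (0, 3, 45, 357, 180):
    print(angle, deviation_from_vertical(angle), deviation_from_vertical(angle) > MAX_TILT)
```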

### Case 3: Perfect Landing!

The lander is upright and situated within the target landing zone, with five units of fuel remaining. The landing is executed successfully!

![Lunar Lander](lunar_landar_3.png)

```python
tests(LunarLanderEnvironment, 3)
```

```
Reward: 5, Terminal: True
```


To encourage the agent to conserve as much fuel as possible, we reward successful landings in proportion to the fuel remaining. Here, the agent received a reward of five because it landed with five units of fuel left. How did you incentivize the agent to be fuel efficient?

The `Terminal` status is set to `True` since the agent has landed and the episode is over.
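Rewarding the fuel left at touchdown is one option; another is to charge the agent for fuel as it burns it. Starting from 100 units, the undiscounted per-step penalties of a successful episode sum to `-(100 - remaining)`, which is the terminal bonus shifted by a constant 100, so both schemes rank successful landings the same way. The sketch below compares them on a made-up burn sequence; the sequence and function names are hypothetical.

```python
START_FUEL = 100

def terminal_fuel_return(burns):
    # Notebook-style scheme: zero reward per step, remaining fuel paid at touchdown
    return START_FUEL - sum(burns)

def per_step_fuel_return(burns):
    # Alternative scheme: -1 per unit of fuel burned on each step, nothing at touchdown
    return sum(-b for b in burns)

burns = [30, 40, 25]                  # hypothetical fuel burned on each step
print(terminal_fuel_return(burns))    # 5
print(per_step_fuel_return(burns))    # -95
```

The per-step version gives the agent feedback on fuel use at every step, which can help learning, but with discounting the two schemes are no longer exactly equivalent.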

### Case 4: Fuel Depletion Alert!

The lander is directly above the target landing zone but has run out of fuel. The agent is in a critical situation: without fuel, it cannot avoid a crash!

![Lunar Lander](lunar_landar_4.png)

```python
tests(LunarLanderEnvironment, 4)
```

```
Reward: -10000, Terminal: True
```


We gave the agent a reward of -10000 to punish it for crashing.
As before, ``Terminal`` is set to ``True`` since the agent has crashed and the episode is over.
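In the environment code, ground contact is checked before the fuel supply, so running out of fuel ends the episode only while the lander is still in the air, which matches criterion 4. Here the lander is above the zone with no fuel, so the fuel branch fires. The condensed sketch below reproduces that branch order; `outcome` is a hypothetical helper, and `safe` stands in for the touchdown checks.

```python
CRASH_REWARD = -10000

def outcome(pos_y, land_y, fuel, safe):
    """Return (reward, terminal) following the branch order of env_step."""
    if pos_y <= land_y:              # touching the ground
        return (fuel if safe else CRASH_REWARD), True
    if fuel <= 0:                    # still airborne but out of fuel
        return CRASH_REWARD, True
    return 0.0, False                # episode continues

print(outcome(pos_y=30, land_y=10, fuel=0, safe=True))   # (-10000, True)
print(outcome(pos_y=10, land_y=10, fuel=5, safe=True))   # (5, True)
print(outcome(pos_y=30, land_y=10, fuel=10, safe=True))  # (0.0, False)
```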

### Case 5: Where's The Landing Zone?!

The lander is touching down upright with fuel to spare, but it is outside the landing zone and the surface there is uneven, so it is going to crash!

![Lunar Lander](lunar_landar_5.png)

```python
tests(LunarLanderEnvironment, 5)
```

```
Reward: -10000, Terminal: True
```


We gave the agent a reward of -10000 to punish it for crashing.
``Terminal`` is set to ``True`` since the agent has crashed and the episode is over.
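The environment code treats the landing zone as a single point (`pos_x != land_x`), which is very strict when positions are continuous. A hypothetical variant gives the zone a width; `ZONE_HALF_WIDTH` is an assumed parameter, not something `get_landing_zone` provides.

```python
ZONE_HALF_WIDTH = 5  # hypothetical half-width of the landing zone, in position units

def in_landing_zone(pos_x, land_x, half_width=ZONE_HALF_WIDTH):
    # True if the lander's x position is within half_width of the zone center
    return abs(pos_x - land_x) <= half_width

print(in_landing_zone(52, 50))                  # True
print(in_landing_zone(70, 50))                  # False
print(in_landing_zone(50.4, 50, half_width=0))  # False: exact-match behaviour
```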