In [2]:
import gymnasium as gym
import numpy as np
import random

# Lunar Lander

1. **Objective**: The main objective is to land the lunar module between two flags safely. Landing outside the flags or crashing the module results in reduced rewards or penalties.

2. **States**: The state (or observation) space has eight dimensions:
   - X and Y coordinates.
   - X and Y velocity.
   - Angle and angular velocity.
   - A boolean (0 or 1) for whether the left leg has made contact.
   - A boolean (0 or 1) for whether the right leg has made contact.
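
The eight state components listed above can be given names to make later code easier to read. The following is a minimal sketch, assuming the observation arrives as a flat sequence of eight numbers in the order listed; `LanderState` and `decode_observation` are hypothetical helper names, not part of Gymnasium, and the sample values are made up.

```python
from typing import NamedTuple, Sequence


class LanderState(NamedTuple):
    """Named view of the 8-dimensional observation (hypothetical helper)."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: bool
    right_leg_contact: bool


def decode_observation(obs: Sequence[float]) -> LanderState:
    """Convert a flat 8-element observation into a LanderState."""
    if len(obs) != 8:
        raise ValueError(f"expected 8 values, got {len(obs)}")
    x, y, vx, vy, angle, ang_vel, left, right = (float(v) for v in obs)
    # The two leg flags are stored as 0.0 / 1.0, so convert them to booleans.
    return LanderState(x, y, vx, vy, angle, ang_vel, bool(left), bool(right))


# Made-up values for illustration only, not taken from a real episode.
sample = [0.0, 1.4, 0.1, -0.5, 0.02, 0.0, 0.0, 1.0]
state = decode_observation(sample)
print(state)
```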


3. **Actions**: The agent can take one of four actions at each time step:
   - Do nothing (action `0`).
   - Fire the main engine to slow the descent (action `2`).
   - Fire the left orientation engine to push the lander to the right (action `1`).
   - Fire the right orientation engine to push the lander to the left (action `3`).
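
In the discrete environment each action is an integer from 0 to 3. The sketch below names them with an enum, assuming the index mapping given in the Gymnasium documentation (0 = do nothing, 1 = left orientation engine, 2 = main engine, 3 = right orientation engine). Note that this index order differs from the order of the bullets above. `LanderAction` is a hypothetical name introduced for illustration.

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Discrete action indices (mapping assumed from the Gymnasium docs)."""
    NOOP = 0        # do nothing
    FIRE_LEFT = 1   # fire the left orientation engine
    FIRE_MAIN = 2   # fire the main engine
    FIRE_RIGHT = 3  # fire the right orientation engine


# Print the index and name of every action.
for action in LanderAction:
    print(int(action), action.name)
```

Because `IntEnum` members are integers, a member such as `LanderAction.FIRE_MAIN` can be passed wherever an integer action is expected.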


4. **Rewards**: 
   - Firing the main engine costs 0.3 points per frame, and firing a side engine costs 0.03 points per frame.
   - If the lander crashes (e.g., comes in too fast or lands at a bad angle), it gets a penalty of -100 points.
   - If the lander lands successfully and safely, it receives +100 points.
   - Each leg in contact with the ground adds 10 points. The reward also increases as the lander moves closer to the landing pad and slows down, and decreases as it moves away, speeds up, or tilts.
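
To make the arithmetic concrete, here is a simplified sketch of the fuel and terminal terms only. It assumes the per-frame costs given in the Gymnasium documentation (0.3 for the main engine, 0.03 for a side engine) and the action indices 1 and 3 for the side engines and 2 for the main engine. It deliberately ignores the shaping terms for position, velocity, angle, and leg contact, so it is not the environment's full reward. `fuel_penalty` and `fuel_and_terminal_score` are hypothetical names.

```python
MAIN_ENGINE_COST = 0.3   # points per frame while the main engine fires
SIDE_ENGINE_COST = 0.03  # points per frame while a side engine fires


def fuel_penalty(action: int) -> float:
    """Per-frame fuel penalty for a discrete action (assumed index mapping)."""
    if action == 2:
        return -MAIN_ENGINE_COST
    if action in (1, 3):
        return -SIDE_ENGINE_COST
    return 0.0


def fuel_and_terminal_score(actions, outcome=None) -> float:
    """Sum the fuel penalties and add the terminal bonus or penalty.

    outcome: "landed" (+100), "crashed" (-100), or None for no terminal term.
    """
    total = sum(fuel_penalty(a) for a in actions)
    if outcome == "landed":
        total += 100.0
    elif outcome == "crashed":
        total -= 100.0
    return total


# Two main-engine frames, one side-engine frame, one idle frame, then a safe landing.
example = fuel_and_terminal_score([0, 2, 2, 1], outcome="landed")
print(round(example, 2))
```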


5. **Versions**:
   - **LunarLander-v2**: This is the standard version, with the four discrete actions listed above.
   - **LunarLanderContinuous-v2**: In this version, instead of a discrete action, the agent chooses two continuous values, each in the range [-1, 1]. The first value sets the main engine's thrust (off for negative values, throttle rising from 50% to 100% as the value goes from 0 to 1). The second value controls the orientation engines (left engine below -0.5, right engine above 0.5, off in between).
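
The continuous action can be read as two throttle commands. The following is a minimal sketch, assuming the thresholds described in the Gymnasium documentation: the main engine is off for negative values and scales from 50% to 100% throttle between 0 and 1, and the side engines fire only when the second value has magnitude above 0.5. `decode_continuous_action` is a hypothetical helper, and the handling of values exactly on a threshold is an assumption.

```python
def decode_continuous_action(main: float, lateral: float):
    """Return (main_throttle, side, side_throttle) for a continuous action.

    Thresholds are assumed from the Gymnasium documentation; behavior exactly
    at a threshold value is an assumption of this sketch.
    """
    # Clamp both inputs to the valid action range [-1, 1].
    main = max(-1.0, min(1.0, main))
    lateral = max(-1.0, min(1.0, lateral))

    # Main engine: off below 0, otherwise 50%..100% throttle.
    main_throttle = 0.0 if main < 0 else 0.5 + 0.5 * main

    # Side engines: left below -0.5, right above 0.5, otherwise off.
    if lateral < -0.5:
        side = "left"
    elif lateral > 0.5:
        side = "right"
    else:
        side = "off"
    side_throttle = abs(lateral) if side != "off" else 0.0

    return main_throttle, side, side_throttle


print(decode_continuous_action(1.0, 0.0))
print(decode_continuous_action(-0.3, -1.0))
```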


## Random action

In [3]:
env = gym.make("LunarLander-v2", render_mode="rgb_array")

In [4]:
episodes = 10
for episode in range(1, episodes + 1):

    # reset the environment for each episode; reset() returns (observation, info)
    observation, info = env.reset()
    score = 0
    done = False
    
    while not done:

        # sample one of the four discrete actions uniformly at random
        action = env.action_space.sample()  
        observation, reward, terminated, truncated, info = env.step(action)
        score += reward
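        env.render()  # in "rgb_array" mode this returns the frame as an array; nothing is displayed
        print(f"Episode: {episode} Score: {score}", end="\r")

        # the episode ends when it terminates (landing or crash) or is truncated (time limit)
        done = terminated or truncated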

env.close()

Episode: 10 Score: -123.547990523588428
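
The loop above can be wrapped in a reusable function that returns the score of one episode, which makes it easy to average over several episodes. This is a minimal sketch, assuming the Gymnasium API in which `reset()` returns `(observation, info)` and `step()` returns five values. `StubEnv` is a hypothetical stand-in with made-up rewards so the block runs without Box2D installed; in the notebook, `run_episode` would be called with the real `env` and a policy such as `lambda obs: env.action_space.sample()`.

```python
import random


class StubEnv:
    """Hypothetical stand-in that mimics the Gymnasium reset/step signatures."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, seed=None):
        self.steps = 0
        return [0.0] * 8, {}

    def step(self, action):
        self.steps += 1
        reward = -0.3 if action == 2 else 0.0  # made-up reward: main engine costs fuel
        terminated = False
        truncated = self.steps >= self.max_steps  # stop after a fixed number of steps
        return [0.0] * 8, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode with the given policy and return the total reward."""
    observation, info = env.reset()
    score = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        score += reward
        done = terminated or truncated
    return score


rng = random.Random(0)  # seeded so the random policy is reproducible
scores = [run_episode(StubEnv(), lambda obs: rng.randrange(4)) for _ in range(10)]
mean_score = sum(scores) / len(scores)
print(f"Mean score over {len(scores)} episodes: {mean_score:.2f}")
```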