# Lunar Lander using A2C

Let's learn how to implement A2C with Stable Baselines for the lunar landing task. In the
lunar lander environment, our agent controls the space vehicle, and its goal is to land
correctly on the landing pad. If the agent (lander) lands away from the landing pad, it
loses reward, and the episode terminates if the lander crashes or comes to rest. The action
space of the environment consists of four discrete actions: do nothing, fire the left
orientation engine, fire the main engine, and fire the right orientation engine.
Now, let's see how to train the agent using A2C to land correctly on the landing pad.
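
To keep the four discrete actions straight while reading the code that follows, here is a small sketch that gives each action index a readable name. The `LanderAction` enum and the `describe` helper are hypothetical names introduced only for illustration, and the index order is assumed to follow the order in which the actions are listed above:

```python
from enum import IntEnum


class LanderAction(IntEnum):
    """Hypothetical helper: readable names for the four discrete action indices.

    Assumption: indices follow the order in which the actions are listed in the text.
    """
    DO_NOTHING = 0
    FIRE_LEFT_ORIENTATION_ENGINE = 1
    FIRE_MAIN_ENGINE = 2
    FIRE_RIGHT_ORIENTATION_ENGINE = 3


def describe(action):
    """Turn an integer action index into a lowercase, human-readable label."""
    return LanderAction(int(action)).name.replace("_", " ").lower()


# Print every action index next to its label
for action in LanderAction:
    print(int(action), describe(action))
```

A helper like this can be handy when logging what the trained agent chooses at each step.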

First, let's import the necessary libraries:

In [1]:
import warnings
warnings.filterwarnings('ignore')

import gym
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines.common.evaluation import evaluate_policy
from stable_baselines import A2C


Create the lunar lander environment using gym:

In [2]:
env = gym.make('LunarLander-v2')

Let's use the dummy vectorized environment. As we learned, in the dummy vectorized
environment, we run each environment in the same process:

In [3]:
env = DummyVecEnv([lambda: env])
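
To make the "same process" idea concrete, here is a minimal, simplified sketch of what a dummy vectorized environment does conceptually. It is not the library's implementation: `ToyEnv` and `TinyDummyVecEnv` are hypothetical classes written only for illustration, and the real `DummyVecEnv` returns NumPy arrays rather than Python lists:

```python
class ToyEnv:
    """Hypothetical stand-in environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = float(action)          # toy reward: just echo the action
        done = self.t >= self.horizon
        return self.t, reward, done, {}


class TinyDummyVecEnv:
    """Simplified illustration of a dummy vectorized environment (not the library class)."""

    def __init__(self, env_fns):
        # Each entry is a zero-argument callable that builds one environment
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones, infos = [], [], [], []
        # The environments are stepped one after another in the current process
        for env, action in zip(self.envs, actions):
            o, r, d, info = env.step(action)
            if d:
                # Start a new episode automatically once one finishes
                o = env.reset()
            obs.append(o)
            rewards.append(r)
            dones.append(d)
            infos.append(info)
        return obs, rewards, dones, infos


vec_env = TinyDummyVecEnv([lambda: ToyEnv(horizon=2), lambda: ToyEnv(horizon=3)])
first_obs = vec_env.reset()
step1 = vec_env.step([1, 0])
step2 = vec_env.step([1, 0])
print(first_obs, step1, step2)
```

Because nothing runs in parallel here, this style of wrapper mainly provides the batched interface that the agent expects, which is why wrapping a single environment as above is enough.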

Create the agent:

In [None]:
agent = A2C(MlpPolicy, env, ent_coef=0.1, verbose=0)

Train the agent:

In [5]:
agent.learn(total_timesteps=25000)

<stable_baselines.a2c.a2c.A2C at 0x7fbb000ca518>

After training, we can evaluate our agent by looking at the mean reward per episode:

In [6]:
mean_reward, n_steps = evaluate_policy(agent, agent.get_env(),
                                       n_eval_episodes=10)
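
To illustrate what "mean reward" means here, the following sketch runs a policy for a fixed number of episodes and averages the total reward per episode. It is only a rough picture of the idea, not the library's `evaluate_policy`; `CountdownEnv`, `always_zero`, and `mean_episode_reward` are hypothetical names, and the toy environment simply gives a reward of 1.0 per step:

```python
from statistics import mean


class CountdownEnv:
    """Hypothetical toy environment: each episode lasts `horizon` steps, reward 1.0 per step."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}


def always_zero(state):
    """Hypothetical policy: ignores the state and always picks action 0."""
    return 0


def mean_episode_reward(policy, env, n_eval_episodes=10):
    """Run `n_eval_episodes` episodes and return (mean total reward, per-episode totals)."""
    episode_rewards = []
    for _ in range(n_eval_episodes):
        state = env.reset()
        done = False
        total = 0.0
        while not done:
            action = policy(state)
            state, reward, done, _info = env.step(action)
            total += reward          # accumulate the reward of this episode
        episode_rewards.append(total)
    return mean(episode_rewards), episode_rewards


avg, per_episode = mean_episode_reward(always_zero, CountdownEnv(horizon=5))
print(avg, per_episode)
```

Averaging over several episodes matters because a single landing attempt can be unusually lucky or unlucky.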

We can also have a look at how our trained agent performs in the environment:


In [None]:
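state = env.reset()
# Runs until interrupted; the vectorized environment resets itself when an episode ends
while True:
    # Ask the trained agent for an action in the current state
    action, _states = agent.predict(state)
    next_state, reward, done, info = env.step(action)
    state = next_state
    env.render()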