## Creating a custom network
In the previous section, we learned how to create an A2C agent using Stable Baselines. Instead of
using the default network, can we customize the network architecture? Yes! Stable Baselines
also lets us plug in our own custom architecture. Let's see how to do that.

In [1]:
import warnings
warnings.filterwarnings('ignore')

import gym
from stable_baselines.common.policies import FeedForwardPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines.common.evaluation import evaluate_policy
from stable_baselines import A2C

Create the Lunar Lander environment using Gym:

In [2]:
env = gym.make('LunarLander-v2')

Let's wrap it in a dummy vectorized environment. As we learned earlier, a dummy vectorized
environment runs every environment in the same process:

In [3]:
env = DummyVecEnv([lambda: env])
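To make "the same process" concrete, here is a minimal pure-Python sketch of what a dummy vectorized environment does: it builds each environment from a factory function and then steps the environments one after another in an ordinary loop, resetting any environment whose episode has ended. `CountdownEnv` and `SimpleDummyVecEnv` are hypothetical stand-ins written only for illustration; the real `DummyVecEnv` returns NumPy arrays and does extra bookkeeping.

```python
class CountdownEnv:
    """Hypothetical toy environment: each episode ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}


class SimpleDummyVecEnv:
    """Hypothetical sketch: all environments live and step in this one process."""

    def __init__(self, env_fns):
        # Each factory function is called once, right here in the constructor.
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones, infos = [], [], [], []
        # Step the environments sequentially; no subprocesses are involved.
        for env, action in zip(self.envs, actions):
            ob, reward, done, info = env.step(action)
            if done:
                # Vectorized environments reset finished episodes automatically.
                ob = env.reset()
            obs.append(ob)
            rewards.append(reward)
            dones.append(done)
            infos.append(info)
        return obs, rewards, dones, infos


vec_env = SimpleDummyVecEnv([lambda: CountdownEnv(2), lambda: CountdownEnv(3)])
print(vec_env.reset())       # [0, 0]
print(vec_env.step([0, 0]))  # ([1, 1], [1.0, 1.0], [False, False], [{}, {}])
print(vec_env.step([0, 0]))  # ([0, 2], [1.0, 1.0], [True, False], [{}, {}])
```

Note how the first environment reports `done=True` and immediately returns the first observation of a fresh episode; this automatic reset is why the agent code never calls `reset()` between episodes.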

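Now, we can define our custom policy (custom network) as shown below. In the code, we pass
`net_arch=[dict(pi=[128, 128, 128], vf=[128, 128, 128])]`, which specifies our network
architecture: `pi` gives the layers of the policy network and `vf` gives the layers of the
value network. Each network therefore has three hidden layers of 128 units, and since no
integers appear before the dict, the two networks share no layers. We also set
`feature_extraction="mlp"` because the Lunar Lander observations are a feature vector, not an image: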

In [4]:
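class CustomPolicy(FeedForwardPolicy):
    def __init__(self, *args, **kwargs):
        # Forward the arguments supplied by A2C, but fix the architecture:
        # three 128-unit layers for the policy (pi) and for the value network (vf)
        super(CustomPolicy, self).__init__(*args, **kwargs,
                                           net_arch=[dict(pi=[128, 128, 128], vf=[128, 128, 128])],
                                           feature_extraction="mlp")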
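To see what this `net_arch` specification describes, the sketch below reads it the way the text explains it (leading integers as shared layers, the trailing dict as separate `pi` and `vf` layers) and lists the resulting fully connected layer shapes and parameter counts. `describe_net_arch` and `n_params` are hypothetical helpers, not part of Stable Baselines; the output heads (one logit per discrete action, one scalar value) and Lunar Lander's 8-dimensional observation with 4 discrete actions are the assumptions used here.

```python
def describe_net_arch(net_arch, obs_dim, n_actions):
    """Hypothetical helper: turn a net_arch list into (in, out) layer shapes."""
    shared_sizes, pi_sizes, vf_sizes = [], [], []
    for layer in net_arch:
        if isinstance(layer, int):
            shared_sizes.append(layer)  # layer shared by both networks
        else:
            pi_sizes = layer.get("pi", [])
            vf_sizes = layer.get("vf", [])
            break  # the dict is the last entry

    def dense_shapes(in_dim, sizes):
        shapes = []
        for size in sizes:
            shapes.append((in_dim, size))
            in_dim = size
        return shapes, in_dim

    shared_shapes, latent = dense_shapes(obs_dim, shared_sizes)
    pi_shapes, pi_latent = dense_shapes(latent, pi_sizes)
    vf_shapes, vf_latent = dense_shapes(latent, vf_sizes)
    # Output heads: action logits for the policy, a single state value for vf.
    pi_shapes.append((pi_latent, n_actions))
    vf_shapes.append((vf_latent, 1))
    return shared_shapes, pi_shapes, vf_shapes


def n_params(shapes):
    # Weights plus biases of each fully connected layer.
    return sum(i * o + o for i, o in shapes)


net_arch = [dict(pi=[128, 128, 128], vf=[128, 128, 128])]
shared, pi, vf = describe_net_arch(net_arch, obs_dim=8, n_actions=4)
print(shared)                      # []
print(pi)                          # [(8, 128), (128, 128), (128, 128), (128, 4)]
print(vf)                          # [(8, 128), (128, 128), (128, 128), (128, 1)]
print(n_params(pi), n_params(vf))  # 34692 34305
```

Moving layers into the shared part, e.g. `[128, dict(pi=[128], vf=[128])]`, would let both networks reuse the first layer's features and reduce the parameter count.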

Create the agent:

In [None]:
agent = A2C(CustomPolicy, env, ent_coef=0.1, verbose=0)
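The `ent_coef=0.1` argument weights an entropy bonus in the A2C objective. As a sketch, assuming the loss form used by Stable Baselines' A2C (`loss = policy_loss - ent_coef * entropy + vf_coef * value_loss`, with `vf_coef` defaulting to 0.25 and `ent_coef` to 0.01), the snippet below computes the entropy of a categorical policy and shows how the bonus favors exploratory policies. The loss values passed in are made up purely for illustration, and `a2c_loss` is a hypothetical helper.

```python
import math


def categorical_entropy(probs):
    # H(p) = -sum_a p(a) * log p(a); zero-probability actions contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0)


def a2c_loss(policy_loss, value_loss, probs, ent_coef=0.1, vf_coef=0.25):
    # Subtracting the entropy term rewards policies that keep their options open.
    return policy_loss - ent_coef * categorical_entropy(probs) + vf_coef * value_loss


uniform = [0.25, 0.25, 0.25, 0.25]  # maximally exploratory over 4 actions
greedy = [0.97, 0.01, 0.01, 0.01]   # nearly deterministic

print(round(categorical_entropy(uniform), 3))  # 1.386
print(round(categorical_entropy(greedy), 3))
# With identical policy and value losses, the higher-entropy policy gets the lower loss.
print(a2c_loss(1.0, 2.0, uniform) < a2c_loss(1.0, 2.0, greedy))  # True
```

A larger `ent_coef`, such as the 0.1 used here instead of the 0.01 default, pushes the agent harder to keep exploring; too large a value can stop it from committing to good actions.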

Now, we can train the agent as usual:

In [6]:
agent.learn(total_timesteps=25000)

<stable_baselines.a2c.a2c.A2C at 0x7fb2cc0e0710>
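The `learn` call returns the agent itself, which is what the cell displays. To get a feel for what `total_timesteps=25000` means, here is a back-of-the-envelope sketch, assuming each A2C update consumes `n_steps` transitions from each of `n_envs` environments (Stable Baselines' A2C defaults to `n_steps=5`). `num_updates` is a hypothetical helper for the arithmetic only.

```python
def num_updates(total_timesteps, n_steps=5, n_envs=1):
    # One update uses a batch of n_steps transitions from every environment.
    batch_size = n_steps * n_envs
    return total_timesteps // batch_size


print(num_updates(25000))            # 5000
print(num_updates(25000, n_envs=4))  # 1250
```

With our single dummy environment, 25,000 timesteps correspond to roughly 5,000 small gradient updates.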

After training, we can evaluate our agent by looking at its mean reward over 10 evaluation episodes:

In [7]:
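# evaluate_policy returns the mean and standard deviation of the episode rewards
mean_reward, std_reward = evaluate_policy(agent, agent.get_env(),
                                          n_eval_episodes=10)
print(f"mean_reward: {mean_reward:.2f} +/- {std_reward:.2f}")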
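Conceptually, this evaluation runs several full episodes, sums the rewards of each, and reports their mean and (population) standard deviation. The sketch below mirrors that computation in plain Python; `ToyEnv`, `evaluate`, and the constant policy are hypothetical stand-ins, not the Stable Baselines implementation.

```python
import statistics


class ToyEnv:
    """Hypothetical environment: the n-th episode (counting from 1) lasts n steps, reward 1.0 each."""

    def __init__(self):
        self.episode = 0
        self.t = 0

    def reset(self):
        self.t = 0
        self.episode += 1
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode
        return self.t, 1.0, done, {}


def evaluate(policy, env, n_eval_episodes=10):
    episode_rewards = []
    for _ in range(n_eval_episodes):
        obs, done, total = env.reset(), False, 0.0
        # Play one complete episode and accumulate its reward.
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            total += reward
        episode_rewards.append(total)
    # pstdev is the population standard deviation, like NumPy's default np.std.
    return statistics.mean(episode_rewards), statistics.pstdev(episode_rewards)


mean_reward, std_reward = evaluate(lambda obs: 0, ToyEnv(), n_eval_episodes=3)
print(mean_reward, std_reward)  # mean 2.0, std about 0.816
```

Reporting the standard deviation alongside the mean shows how consistent the agent is across episodes, not just how well it does on average.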

We can also have a look at how our trained agent performs in the environment:


In [None]:
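state = env.reset()
while True:
    # Choose an action with the trained policy
    action, _states = agent.predict(state)
    next_state, reward, done, info = env.step(action)
    state = next_state
    # The vectorized env resets finished episodes automatically,
    # so this loop keeps rendering until the cell is interrupted
    env.render()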