# Creating our first agent with Stable Baselines

Note that currently, Stable Baselines works only with TensorFlow version 1.x. So,
make sure you are running the Stable Baselines experiment with TensorFlow 1.x.

Now, let's create our first deep reinforcement learning agent using Stable Baselines. We will
build a simple agent using a deep Q network (DQN) for the mountain car climbing task. Recall
that in the mountain car climbing task, a car is placed in the valley between two mountains, and
the goal of the agent is to drive up the mountain on the right.
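
To make the task concrete, here is a minimal pure-Python sketch of one step of the mountain car dynamics. The constants and the update rule are written from memory of Gym's `MountainCar-v0` and should be treated as assumptions; `mountain_car_step` is a hypothetical helper, not part of Gym or Stable Baselines.

```python
import math

# Constants assumed to match Gym's MountainCar-v0; verify against your Gym version.
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POSITION = 0.5
FORCE, GRAVITY = 0.001, 0.0025


def mountain_car_step(position, velocity, action):
    """Advance the car one step. action: 0 = push left, 1 = no push, 2 = push right."""
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POSITION, min(MAX_POSITION, position + velocity))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0  # the car stops when it hits the left wall
    done = position >= GOAL_POSITION
    reward = -1.0  # every step costs -1, so reaching the goal sooner scores higher
    return position, velocity, reward, done


# Push right once, starting at rest near the valley floor.
position, velocity, reward, done = mountain_car_step(-0.5, 0.0, action=2)
```

A single push barely moves the car, which is why the agent has to learn to rock back and forth to build momentum. In Gym, an episode of `MountainCar-v0` is also cut off after 200 steps.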

First, let's import `gym` and the `DQN` class from `stable_baselines`:

In [None]:
import warnings
warnings.filterwarnings('ignore')
                        
import gym
from stable_baselines import DQN

Create a mountain car environment:

In [2]:
env = gym.make('MountainCar-v0')

Now, let's instantiate our agent. In the code below, we pass `MlpPolicy` as the policy,
which means that our network is a multilayer perceptron:

In [None]:
agent = DQN('MlpPolicy', env, learning_rate=1e-3)
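
To illustrate what a multilayer perceptron policy computes, the following is a rough pure-Python sketch of a forward pass that maps a mountain car state to one Q value per action. The layer sizes (two hidden layers of 64 units), the ReLU activation, and the random initialization are assumptions made for illustration; this is not the Stable Baselines implementation, and all helper names are hypothetical.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (illustrative initialization only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def q_values(state, layers):
    """Forward pass: hidden layers use ReLU, the output layer is linear."""
    activations = state
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return dense(activations, weights, biases)


rng = random.Random(0)
sizes = [2, 64, 64, 3]  # state dimension, two hidden layers (assumed), three actions
layers = [init_layer(rng, n_in, n_out) for n_in, n_out in zip(sizes, sizes[1:])]

state = [-0.5, 0.0]  # (position, velocity)
q = q_values(state, layers)
greedy_action = max(range(len(q)), key=lambda a: q[a])  # action with the highest Q value
```

The network is untrained here, so `greedy_action` is arbitrary; training adjusts the weights so that the largest Q value corresponds to the best action.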

Let's train the agent by specifying the number of time steps we want to train it for:

In [4]:
agent.learn(total_timesteps=25000)

<stable_baselines.deepq.dqn.DQN at 0x7f4190078240>

That's it. Building a DQN agent and training it is that simple.
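
The choice of `total_timesteps` also affects how long the agent explores. DQN typically acts epsilon-greedily, with epsilon annealed over an initial fraction of training. The sketch below shows a linear schedule; the default values used here (`exploration_fraction=0.1`, initial epsilon 1.0, final epsilon 0.02) are assumptions about the library's defaults, and `linear_epsilon` is a hypothetical helper rather than a Stable Baselines function.

```python
def linear_epsilon(step, total_timesteps, exploration_fraction=0.1,
                   initial_eps=1.0, final_eps=0.02):
    """Linearly anneal epsilon over the first part of training, then hold it."""
    anneal_steps = exploration_fraction * total_timesteps
    progress = min(step / anneal_steps, 1.0)
    return initial_eps + progress * (final_eps - initial_eps)


total = 25000  # same number of time steps as in the training call above
schedule = {step: linear_epsilon(step, total) for step in (0, 1250, 2500, 25000)}
```

Under these assumptions, the agent acts almost entirely at random at the start, and from step 2,500 onward it takes a random action only about 2% of the time.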

## Evaluating the trained agent

We can also evaluate the trained agent by looking at its mean reward per episode using
`evaluate_policy`:

In [5]:
from stable_baselines.common.evaluation import evaluate_policy

In the code below, `agent` is the trained agent, `agent.get_env()` returns the environment
the agent was trained with, and `n_eval_episodes` is the number of episodes over which we
evaluate the agent:

In [6]:
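# note: in Stable Baselines 2.10 and later, the second return value is the standard deviation of the episode rewards, not a step count
mean_reward, n_steps = evaluate_policy(agent, agent.get_env(), n_eval_episodes=10)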
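
Conceptually, this kind of evaluation runs the policy for a fixed number of episodes and summarizes the total reward of each episode. The following is a minimal sketch of that idea, not the library's implementation; `FixedLengthEnv` and `evaluate` are hypothetical stand-ins that only mimic the `reset`/`step` interface.

```python
import statistics


class FixedLengthEnv:
    """Hypothetical stand-in: reward -1 per step, episode ends after max_steps."""

    def __init__(self, max_steps=200):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0.0  # dummy state

    def step(self, action):
        self.steps += 1
        done = self.steps >= self.max_steps
        return 0.0, -1.0, done, {}


def evaluate(policy, env, n_eval_episodes=10):
    """Run the policy for n_eval_episodes and summarize the episode returns."""
    returns = []
    for _ in range(n_eval_episodes):
        state, done, episode_return = env.reset(), False, 0.0
        while not done:
            state, reward, done, _ = env.step(policy(state))
            episode_return += reward
        returns.append(episode_return)
    return statistics.mean(returns), statistics.pstdev(returns)


# A policy that always picks action 0, evaluated on the stand-in environment.
mean_reward, std_reward = evaluate(lambda state: 0, FixedLengthEnv(), n_eval_episodes=10)
```

The stand-in mirrors the worst case on the mountain car task: an agent that never reaches the goal collects -1 on each of the 200 steps, so its mean reward is -200. A trained agent should score above that.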

## Storing and loading the trained agent

With Stable Baselines, we can also save our trained agent and load it later.
We can save the agent as:

In [7]:
agent.save("DQN_mountain_car_agent")

After saving, we can load the agent as:

In [None]:
agent = DQN.load("DQN_mountain_car_agent")

## Viewing the trained agent

After training, we can also have a look at how our trained agent performs in the
environment.

Initialize the state:

In [14]:
state = env.reset()

In [None]:
# for some 5000 steps:
for t in range(5000):

    # predict the action to perform in the given state using our trained agent
    action, _ = agent.predict(state)

    # perform the predicted action
    next_state, reward, done, info = env.step(action)

    # update next state to current state
    state = next_state

    # render the environment
    env.render()

    # start a new episode once the current one has ended
    if done:
        state = env.reset()

# close the rendering window
env.close()