# What are the terminal states in the `MountainCar-v0` environment?

In a previous exercise, we found that the reward in `MountainCar-v0` is -1 for each time step, irrespective of the environment state and the action taken. Since a negative reward is a *punishment*, the agent is basically punished at every time step. Poor agent!

However, the agent is not doomed to suffer punishments forever. That's because the `MountainCar-v0` environment has some terminal states. When the only reward is a constant negative one, the only way to maximize the total reward is to reach a terminal state as soon as possible, because there are no more punishments after you reach it.

So in the `MountainCar-v0` environment, maximizing rewards is better understood as minimizing punishments. The two are just different ways of saying the same thing.

`MountainCar-v0` has two terminal states. 

1. The first terminal state occurs if the agent is able to reach the flag on top of the mountain. You can see that in the image below.

![mountain_car_terminal_state_1.jpeg](attachment:mountain_car_terminal_state_1.jpeg)

2. The second terminal state is more like a time limit. Just like `CartPole-v0`, this terminal state occurs after the agent takes 200 steps.

This means that the worst total reward you can get in this environment is -200, i.e., -1 for each of the 200 time steps, after which the episode ends because of the second terminal state.

You can try to improve on this worst total reward by reaching the flag position (the first terminal state) in fewer than 200 time steps. The sooner you reach it, the better. If you reach it after 120 steps, then the total reward you get in the episode is -120, which is better than -200.
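To make the arithmetic concrete, here is a minimal sketch that computes the total reward of an episode from the step at which the flag is reached. The helper name `episode_return` and its parameters are made up for illustration and are not part of Gym. It only assumes what we have established so far: a reward of -1 per step and a 200-step limit.

```python
def episode_return(steps_to_flag=None, max_steps=200, reward_per_step=-1):
    """Total reward of one episode with a constant per-step reward.

    steps_to_flag: step at which the flag is reached, or None if the
    agent never reaches it before the step limit.
    (Hypothetical helper for illustration, not a Gym function.)
    """
    if steps_to_flag is None or steps_to_flag > max_steps:
        # The flag was not reached in time, so the step limit ends the episode.
        steps = max_steps
    else:
        steps = steps_to_flag
    return reward_per_step * steps


print(episode_return(120))   # flag reached after 120 steps
print(episode_return(None))  # flag never reached
```

The first call prints -120 and the second prints -200, matching the numbers above.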

The **learning goal** of the `MountainCar-v0` environment is to climb the hill and reach the flag as soon as possible. We once again find that this is equivalent to maximizing total rewards in an episode. Cool!

So that was a long but necessary prelude. And now for the actual exercise... 

I want you to confirm the second terminal state by writing code. If you are ready, here's what I want you to do.

1. Write a loop where the agent continues taking random actions until it encounters the terminal state. It encounters the terminal state when `done` (the third element of the return value of `env.step()`) is `True`.
2. Keep track of the step number and print out the step number in which the terminal state appears.
3. Run the cell a few times and verify that the terminal state always appears at step number 200.

Your code goes below the code where I have set up the environment and above the code where I close the environment. GO!

In [None]:
import gym

env = gym.make("MountainCar-v0")
observation = env.reset()
# Write a loop below this comment and above the env.close() statement to confirm the second terminal state 
env.close()

If you managed to confirm the second terminal state, congrats! 

By stopping exactly when the terminal state arrives, your agent no longer violates the rules of the game, either by taking actions after the game has ended or by stopping early. Instead, it respects the terminal states and stops only when the episode is over. That's what I call a rule-abiding citizen of Mountain country!
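If you want to compare notes, here is one way the loop pattern could look. To keep it runnable without Gym, this sketch uses a made-up stand-in class, `StubTimeLimitEnv`, which only imitates the `reset()` / `step()` shape used in this notebook and ends every episode after a fixed number of steps. It does not simulate the car or the flag, and `run_random_episode` is likewise a hypothetical helper name. The assumption of three possible actions matches `MountainCar-v0` (push left, no push, push right).

```python
import random


class StubTimeLimitEnv:
    """Hypothetical stand-in for an environment with a step limit.

    It is NOT the real MountainCar-v0: it returns placeholder observations
    and only models the time-limit terminal state.
    """

    def __init__(self, max_steps=200):
        self.max_steps = max_steps
        self.steps_taken = 0

    def reset(self):
        self.steps_taken = 0
        return (0.0, 0.0)  # placeholder observation

    def step(self, action):
        self.steps_taken += 1
        done = self.steps_taken >= self.max_steps
        # Same layout as in this notebook: observation, reward, done, info
        return (0.0, 0.0), -1.0, done, {}


def run_random_episode(env, n_actions=3):
    """Take random actions until `done` is True.

    Returns the step number at which the episode ended and the total reward.
    """
    env.reset()
    step_number = 0
    total_reward = 0.0
    done = False
    while not done:
        action = random.randrange(n_actions)  # pick a random action
        _, reward, done, _ = env.step(action)
        step_number += 1
        total_reward += reward
    return step_number, total_reward


steps, total = run_random_episode(StubTimeLimitEnv(max_steps=200))
print(f"Episode ended at step {steps} with total reward {total}")
```

With the stand-in, the loop always stops at step 200 with a total reward of -200.0. In your own cell, the same `while not done` structure applies, with `env` being the real environment and the action drawn from its action space.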