# Terminal states and episodes

<img src="slides/images/terminal/2.png" width="750"></img>

In [1]:
import gym

env = gym.make("CartPole-v1")
obs = env.reset()

In [2]:
env.step(0)

(array([ 0.01401658, -0.22265474, -0.03178046,  0.27844635]), 1.0, False, {})

## The third element of the return value of `env.step(action)` indicates whether we have reached a terminal state

- called `done` (`type: bool`)
- `True` indicates terminal state reached
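
The cells here use the older four-value return from `env.step`. Gym 0.26 and later, and Gymnasium, return five values instead, splitting `done` into `terminated` and `truncated`. Below is a minimal sketch of a helper that accepts either shape by checking only the tuple length. `unpack_step` is a hypothetical name, not part of Gym.

```python
def unpack_step(result):
    """Return (obs, reward, done, info) from a 4- or 5-element step result."""
    if len(result) == 5:
        # Newer API: the episode is over if it terminated or was truncated.
        obs, reward, terminated, truncated, info = result
        return obs, reward, bool(terminated or truncated), info
    # Older API: the third element is already the done flag.
    obs, reward, done, info = result
    return obs, reward, bool(done), info


# Hand-written tuples standing in for real step results.
print(unpack_step(([0.0], 1.0, False, {}))[2])        # False
print(unpack_step(([0.0], 1.0, False, True, {}))[2])  # True
```

With the four-value API used in this section, `obs, reward, done, _ = unpack_step(env.step(0))` would behave the same as unpacking `env.step(0)` directly.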

In [3]:
obs, reward, done, _ = env.step(0)
print(done)

False


## Demo of terminal states in `CartPole-v1`

<img src="slides/images/cp_terminal/6.png" width="750"></img>

In [4]:
import numpy as np

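obs = env.reset()
# Always push the cart left (action 0) so the pole tips over quickly.
# This loop does not check `done`, so it keeps stepping after the terminal state is reached.
for _ in range(30):
    # obs[2] is the pole angle in radians; convert it to degrees for printing
    print(f"Pole angle at step start: {np.degrees(obs[2])}", end=" ")
    obs, reward, done, _ = env.step(0)
    print(f"Pole angle at step end: {np.degrees(obs[2])}", end=" ")
    print(f"Reward in step: {reward}, done: {done}")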

Pole angle at step start: 2.5225715240513926 Pole angle at step end: 2.5628541577896797 Reward in step: 1.0, done: False
Pole angle at step start: 2.5628541577896797 Pole angle at step end: 2.9540647813957777 Reward in step: 1.0, done: False
Pole angle at step start: 2.9540647813957777 Pole angle at step end: 3.6964366026209015 Reward in step: 1.0, done: False
Pole angle at step start: 3.6964366026209015 Pole angle at step end: 4.792280374650069 Reward in step: 1.0, done: False
Pole angle at step start: 4.792280374650069 Pole angle at step end: 6.245918098510648 Reward in step: 1.0, done: False
Pole angle at step start: 6.245918098510648 Pole angle at step end: 8.063567923896485 Reward in step: 1.0, done: False
Pole angle at step start: 8.063567923896485 Pole angle at step end: 10.253173624787452 Reward in step: 1.0, done: False
Pole angle at step start: 10.253173624787452 Pole angle at step end: 12.824169097187262 Reward in step: 1.0, done: True
Pole angle at step start: 12.8241690971
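
In the run above, `done` becomes `True` on the step where the pole angle passes 12 degrees. Here is a rough sketch of that check, assuming the thresholds documented for CartPole: a pole angle beyond ±12 degrees or a cart position beyond ±2.4. `is_terminal` is a hypothetical helper, not a Gym function, and it ignores the step limit that can also end an episode.

```python
import math

ANGLE_LIMIT_RAD = math.radians(12)  # assumed pole-angle threshold
POSITION_LIMIT = 2.4                # assumed cart-position threshold


def is_terminal(obs):
    """Check the assumed CartPole failure conditions for one observation."""
    # Observation layout: cart position, cart velocity, pole angle, pole angular velocity
    cart_position, _, pole_angle, _ = obs
    return abs(cart_position) > POSITION_LIMIT or abs(pole_angle) > ANGLE_LIMIT_RAD


# Angles taken (rounded) from the last two complete steps printed above.
print(is_terminal([0.0, 0.0, math.radians(10.25), 0.0]))  # False
print(is_terminal([0.0, 0.0, math.radians(12.82), 0.0]))  # True
```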



## Doing multiple episodes

- Once `done` is `True`, the episode is over and we should not take any more actions
- Call `env.reset()` to restart the simulation from the beginning and start a new episode
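
One way to package this reset-then-step pattern is a small helper that runs a single episode and reports its total reward and length. The sketch below is illustrative only. `run_episode` and `CountdownEnv` are hypothetical names. `CountdownEnv` is a toy stand-in that follows the same four-value `step` convention, so the example runs without Gym installed.

```python
class CountdownEnv:
    """Hypothetical stand-in for a Gym environment (four-value step API)."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.steps_left = 0

    def reset(self):
        # Start a fresh episode and return the initial observation.
        self.steps_left = self.episode_length
        return self.steps_left

    def step(self, action):
        # Every step gives reward 1.0; the episode ends when the counter hits zero.
        self.steps_left -= 1
        done = self.steps_left <= 0
        return self.steps_left, 1.0, done, {}


def run_episode(env, action=0):
    """Reset env, repeat one action until done, return (total_reward, steps)."""
    env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        _, reward, done, _ = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


stub_env = CountdownEnv(episode_length=3)
results = [run_episode(stub_env) for _ in range(2)]
print(results)  # [(3.0, 3), (3.0, 3)]
```

Because `run_episode` calls `env.reset()` itself, it can be called repeatedly on the same environment object without stepping past a terminal state.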

In [5]:
for ep in range(5):
    print(f"Episode number is {ep+1}")
    # Start each episode from a fresh initial state
    obs = env.reset()
    while True:
        print(f"Pole angle at step start: {np.degrees(obs[2])}", end=" ")
        obs, reward, done, _ = env.step(0)
        print(f"Pole angle at step end: {np.degrees(obs[2])}", end=" ")
        print(f"Reward in step: {reward}, done: {done}")
        # Stop stepping as soon as the terminal state is reached
        if done:
            break

Episode number is 1
Pole angle at step start: 1.1183983506183337 Pole angle at step end: 1.1099812877349993 Reward in step: 1.0, done: False
Pole angle at step start: 1.1099812877349993 Pole angle at step end: 1.4439376330601614 Reward in step: 1.0, done: False
Pole angle at step start: 1.4439376330601614 Pole angle at step end: 2.120212628817913 Reward in step: 1.0, done: False
Pole angle at step start: 2.120212628817913 Pole angle at step end: 3.1408509750108773 Reward in step: 1.0, done: False
Pole angle at step start: 3.1408509750108773 Pole angle at step end: 4.509941568170456 Reward in step: 1.0, done: False
Pole angle at step start: 4.509941568170456 Pole angle at step end: 6.233519072945295 Reward in step: 1.0, done: False
Pole angle at step start: 6.233519072945295 Pole angle at step end: 8.319416182530764 Reward in step: 1.0, done: False
Pole angle at step start: 8.319416182530764 Pole angle at step end: 10.77705728546158 Reward in step: 1.0, done: False
Pole angle at step st