# Deep Reinforcement Learning

Reinforcement Learning (RL) is an approach in which an agent learns to make sequential decisions by interacting with an environment. The objective is for the agent to maximize the cumulative reward it receives over time.
The agent does this by repeatedly observing the consequences of its actions and favoring the actions that lead to better outcomes.

To do this, we will use Gym, a platform for developing and comparing reinforcement learning algorithms. Gym provides a common interface to many environments: it accepts actions from an agent, plays them out in the environment, and returns the resulting observations and rewards.
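
To make the agent-environment loop concrete, here is a minimal sketch that uses a hypothetical toy environment. `CorridorEnv` and `run_episode` are illustrative names, not part of Gym; the sketch only assumes the `reset()`/`step()` interface described above, with `step()` returning an observation, a reward, a done flag, and an info dictionary.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment (not part of Gym): walk right to reach a goal."""

    def __init__(self, goal=3, max_steps=20):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        # Start every episode at the left end of the corridor.
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # Action 1 moves right; any other action moves left (a wall at 0 blocks the move).
        self.position = max(0, self.position + (1 if action == 1 else -1))
        self.steps += 1
        reached_goal = self.position >= self.goal
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Play one episode and return (cumulative reward, number of steps)."""
    obs = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total_reward += reward
        steps += 1
    return total_reward, steps


def always_right(obs):
    return 1


def random_policy(obs):
    return random.choice([0, 1])


print(run_episode(CorridorEnv(), always_right))   # (1.0, 3)
print(run_episode(CorridorEnv(), random_policy))  # varies from run to run
```

The cumulative reward returned by `run_episode` is the quantity the agent tries to maximize; a learning algorithm replaces the fixed policies with one that improves from experience.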


## Environment

We will use the `CartPole` environment from Gym's library for this assignment. In this environment, a pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pole starts upright on the cart, and the goal is to keep it balanced by applying forces to the cart in the left and right directions.

You can use the code below to run an instance of a random agent in this environment and see the results.

In [1]:
from IPython.display import HTML
from base64 import b64encode

def show_video(path):
    # Read the video bytes and make sure the file handle is closed afterwards
    with open(path, 'rb') as f:
        mp4 = f.read()
    data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
    return HTML("""
    <video width=400 controls>
          <source src="%s" type="video/mp4">
    </video>
    """ % data_url)

In [2]:
!pip install gym[atari,accept-rom-license] -qq
!pip install imageio -qq


In [3]:
import cv2
import gym
import imageio
import numpy as np
from gym import spaces

We use `gym.make()` to create an instance of an environment. Before interacting with it, we reset the environment to its initial state with the `.reset()` method. We can then call the `.step()` method, which accepts an action as input and performs it.

In [10]:
env_name = 'CartPole-v1'

# Create an instance of the environment
env = gym.make(env_name)

env.reset()

frames = []

for _ in range(500):
    action = env.action_space.sample()

    obs, reward, done, _ = env.step(action)

    # Render this frame as an RGB array and add it to the list of frames
    render_output = env.render(mode='rgb_array')
    if render_output is not None:
        frames.append(render_output)

    if done:
        env.reset()

env.close()
imageio.mimsave('./cartpole.mp4', frames, fps=25) # Save the video file
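
The loop above uses the older Gym API, in which `reset()` returns only the observation and `step()` returns four values. From Gym 0.26 onward (and in Gymnasium), `reset()` returns `(obs, info)`, `step()` returns `(obs, reward, terminated, truncated, info)`, and the render mode is passed to `gym.make()` rather than to `render()`. The sketch below shows two hypothetical helpers (`unpack_step` and `unpack_reset` are illustrative names, not part of Gym) that would let the same loop handle either return format.

```python
def unpack_step(result):
    """Normalize the return value of env.step() to (obs, reward, done, info)."""
    if len(result) == 5:
        # Newer API: (obs, reward, terminated, truncated, info)
        obs, reward, terminated, truncated, info = result
        return obs, reward, bool(terminated or truncated), info
    # Older API: (obs, reward, done, info)
    obs, reward, done, info = result
    return obs, reward, bool(done), info


def unpack_reset(result):
    """Normalize the return value of env.reset() to just the observation."""
    # Newer API returns (obs, info) with info being a dict; older API returns obs directly.
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result


# Usage with plain tuples standing in for real environment outputs:
print(unpack_step(("obs", 1.0, False, True, {})))  # ('obs', 1.0, True, {})
print(unpack_step(("obs", 1.0, False, {})))        # ('obs', 1.0, False, {})
print(unpack_reset(([0.1, 0.2], {})))              # [0.1, 0.2]
```

With these helpers, the loop body would read `obs, reward, done, _ = unpack_step(env.step(action))`, which treats both termination and truncation as the end of an episode.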



In [11]:
show_video('./cartpole.mp4')

As you can see, the random agent fails to keep the pole balanced. In the next section we will train an agent to learn how to perform this task.

## Algorithm
We will use the A2C algorithm.

Advantage Actor-Critic (A2C) is a reinforcement learning algorithm.
It consists of an actor (which predicts the best action based on the current state) and a critic (which estimates the state's value function to measure expected future rewards).

We will implement this together step by step.




In [12]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.distributions.categorical import Categorical

import numpy as np
import gym
from collections import deque
from tqdm import tqdm

## Neural Network

Here we design a simple feed-forward model that embeds the observation from the environment into a hidden layer. Two fully connected layers on top of the hidden layer then predict the next action and estimate the value of the current state, so a single network acts as both actor and critic.


In [13]:
class ActorCritic(nn.Module):
    def __init__(self, input_size, hidden_size, num_outputs):
        super(ActorCritic, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc_actor = nn.Linear(hidden_size, num_outputs)
        self.fc_critic = nn.Linear(hidden_size, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        action_probs = F.softmax(self.fc_actor(x), dim=-1)
        value = self.fc_critic(x)
        return action_probs, value
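
The actor head ends in a softmax, so its output is a probability distribution over actions from which the agent samples. The following is a minimal pure-Python sketch of that step, assuming hypothetical logits for CartPole's two actions; `softmax` and `sample_action` are illustrative helpers and do not replace the PyTorch calls used in this notebook.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng=random):
    """Draw an action index with probability given by probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 0.0]  # hypothetical actor-head outputs for "push left" and "push right"
probs = softmax(logits)
print(probs)                                   # the first action is more likely
print(sample_action(probs, random.Random(0)))  # an index in {0, 1}
```

Sampling, rather than always taking the most probable action, keeps the agent exploring while the policy is still poor.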

## A2C

The A2C algorithm jointly trains the actor and the critic to improve the policy. It updates the parameters of the actor to increase the likelihood of good actions, and it updates the parameters of the critic to better estimate the value function.

In each iteration, A2C plays one episode until it ends. During this time it records the log probability of each action taken, the reward received, and the predicted value at each step. These values are used to update the model at the end of the trajectory.

The actor is updated using the objective below:

$$ L_{\text{actor}} = -\log \pi(a|s;\theta) \times A(s, a) $$
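where the advantage is calculated as:

$$ A(s, a) = Q(s, a) - V(s) $$

Here $Q(s, a)$ is the estimated value of taking action $a$ in state $s$, and $V(s)$ is the value predicted by our critic. In the implementation below, $Q(s, a)$ is estimated by the discounted return $R$ collected from that step onward.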

This loss function aims to increase the probability of actions that lead to higher returns than the critic expected.
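
As a worked example of the advantage, the sketch below computes discounted returns for a short episode and subtracts hypothetical critic predictions. It assumes that $Q(s, a)$ is approximated by the discounted return observed after each step; the rewards, the values, and the discount factor of 0.5 are made up to keep the arithmetic easy to follow.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each return can reuse the one that follows it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


rewards = [1.0, 1.0, 1.0]  # CartPole gives +1 for every step the pole stays up
values = [1.0, 1.0, 1.0]   # hypothetical critic predictions V(s_t)

returns = discounted_returns(rewards, gamma=0.5)
advantages = [g - v for g, v in zip(returns, values)]

print(returns)     # [1.75, 1.5, 1.0]
print(advantages)  # [0.75, 0.5, 0.0]
```

A positive advantage means the action turned out better than the critic predicted, so its log probability is pushed up; an advantage of zero leaves the policy unchanged for that step.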

As for the critic, the loss function is defined as a simple squared error between the actual return $R$ observed from a state and the value predicted for that state:

$$ L_{\text{critic}} = \frac{1}{2} ( R - V(s))^2 $$
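
The two losses can be checked by hand on a tiny trajectory. This sketch uses made-up log probabilities, returns, and critic values, and averages each loss over the steps of the episode. It keeps the $\frac{1}{2}$ factor from the formula above; a plain mean-squared error such as `nn.MSELoss` differs only by that constant factor.

```python
import math

log_probs = [math.log(0.5), math.log(0.5)]  # log pi(a_t | s_t), hypothetical
returns = [1.99, 1.0]                       # discounted returns R_t for rewards [1, 1], gamma = 0.99
values = [1.5, 0.5]                         # critic predictions V(s_t), hypothetical

advantages = [r - v for r, v in zip(returns, values)]

# Actor: the advantage acts as a constant weight on the log probability.
actor_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / len(log_probs)

# Critic: half the squared error between return and prediction, averaged over steps.
critic_loss = sum(0.5 * (r - v) ** 2 for r, v in zip(returns, values)) / len(returns)

total_loss = actor_loss + critic_loss
print(round(actor_loss, 4), round(critic_loss, 4), round(total_loss, 4))
```

In a gradient-based implementation, the advantage must be detached from the computation graph when it weights the actor loss, so that the actor's gradient does not flow into the critic.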

In [14]:
class A2CAgent:
    def __init__(self, env, num_episodes=1000, max_steps=500, gamma=0.99, lr=1e-3, hidden_size=256):
        self.env = env
        self.num_episodes = num_episodes
        self.max_steps = max_steps
        self.gamma = gamma
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # Get the size of the observation space and the number of actions
        input_size = env.observation_space.shape[0]
        num_outputs = env.action_space.n

        # Define your actor-critic network
        self.policy_net = ActorCritic(input_size, hidden_size, num_outputs).to(self.device)
        self.optimizer = optim.Adam(self.policy_net.parameters(), lr=lr)
        self.critic_loss = nn.MSELoss()

    def choose_action(self, state):
        state = torch.FloatTensor(state).unsqueeze(0).to(self.device)
        action_probs, _ = self.policy_net(state)
        action = torch.multinomial(action_probs, 1).item()
        return action

    def compute_returns(self, rewards):
        returns = []
        R = 0
        for r in reversed(rewards):
            R = r + self.gamma * R
            returns.insert(0, R)
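        # Use float32 explicitly so the returns match the dtype of the critic's outputs
        returns = torch.tensor(returns, dtype=torch.float32, device=self.device)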
        return returns

    def train(self):
        episode_rewards = []

        for episode in tqdm(range(self.num_episodes)):
            state = self.env.reset()
            if state is None or len(state) != self.env.observation_space.shape[0]:
                print(f"Invalid initial state: {state}")
                continue

            log_probs = []
            values = []
            rewards = []
            episode_reward = 0

            for step in range(self.max_steps):
                action = self.choose_action(state)
                next_state, reward, done, _ = self.env.step(action)

                if next_state is None or len(next_state) != self.env.observation_space.shape[0]:
                    print(f"Invalid next state at step {step}: {next_state}")
                    break

                state_tensor = torch.FloatTensor(state).unsqueeze(0).to(self.device)
                # One forward pass yields both the action probabilities and the state value
                action_probs, value = self.policy_net(state_tensor)
                log_prob = torch.log(action_probs[0, action])

                log_probs.append(log_prob)
                values.append(value)
                rewards.append(reward)
                episode_reward += reward

                state = next_state
                if done:
                    break

            episode_rewards.append(episode_reward)

            # Calculate the discounted rewards
            returns = self.compute_returns(rewards)

            # Convert lists to tensors
            log_probs = torch.stack(log_probs)
            values = torch.stack(values).view(-1)  # shape (T,), also for one-step episodes

            # Calculate advantage
            advantage = returns - values

            # Compute actor and critic loss
            actor_loss = -(log_probs * advantage.detach()).mean()
            critic_loss = self.critic_loss(values, returns)
            loss = actor_loss + critic_loss

            # Perform backpropagation
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()

            print(f"Episode {episode + 1}, Reward: {episode_reward}")

        self.env.close()
        return episode_rewards

Define the model and set hyperparameters.

In [15]:
env = gym.make('CartPole-v1')
num_episodes = 1000
max_steps = 500
lr = 1e-3
hidden_size = 256

a2c_model = A2CAgent(env, num_episodes=num_episodes, max_steps=max_steps, lr=lr, hidden_size=hidden_size)



Train the model.

In [16]:
rewards = a2c_model.train()

  1%|          | 6/1000 [00:01<02:53,  5.74it/s]

Episode 1, Reward: 14.0
Episode 2, Reward: 10.0
Episode 3, Reward: 10.0
Episode 4, Reward: 16.0
Episode 5, Reward: 14.0
Episode 6, Reward: 14.0
Episode 7, Reward: 13.0
Episode 8, Reward: 9.0
Episode 9, Reward: 13.0


  2%|▏         | 16/1000 [00:01<00:56, 17.44it/s]

Episode 10, Reward: 23.0
Episode 11, Reward: 11.0
Episode 12, Reward: 11.0
Episode 13, Reward: 10.0
Episode 14, Reward: 15.0
Episode 15, Reward: 12.0
Episode 16, Reward: 11.0
Episode 17, Reward: 13.0
Episode 18, Reward: 10.0
Episode 19, Reward: 9.0
Episode 20, Reward: 15.0


  3%|▎         | 28/1000 [00:01<00:31, 30.91it/s]

Episode 21, Reward: 11.0
Episode 22, Reward: 11.0
Episode 23, Reward: 13.0
Episode 24, Reward: 10.0
Episode 25, Reward: 12.0
Episode 26, Reward: 10.0
Episode 27, Reward: 11.0
Episode 28, Reward: 10.0
Episode 29, Reward: 15.0
Episode 30, Reward: 8.0
Episode 31, Reward: 12.0
Episode 32, Reward: 10.0


  4%|▍         | 41/1000 [00:02<00:22, 42.67it/s]

Episode 33, Reward: 9.0
Episode 34, Reward: 8.0
Episode 35, Reward: 10.0
Episode 36, Reward: 12.0
Episode 37, Reward: 12.0
Episode 38, Reward: 11.0
Episode 39, Reward: 16.0
Episode 40, Reward: 11.0
Episode 41, Reward: 9.0
Episode 42, Reward: 11.0
Episode 43, Reward: 10.0
Episode 44, Reward: 9.0
Episode 45, Reward: 9.0


  6%|▌         | 55/1000 [00:02<00:17, 52.87it/s]

Episode 46, Reward: 12.0
Episode 47, Reward: 11.0
Episode 48, Reward: 8.0
Episode 49, Reward: 11.0
Episode 50, Reward: 11.0
Episode 51, Reward: 8.0
Episode 52, Reward: 8.0
Episode 53, Reward: 8.0
Episode 54, Reward: 9.0
Episode 55, Reward: 12.0
Episode 56, Reward: 10.0
Episode 57, Reward: 9.0
Episode 58, Reward: 10.0


  7%|▋         | 68/1000 [00:02<00:16, 56.92it/s]

Episode 59, Reward: 11.0
Episode 60, Reward: 9.0
Episode 61, Reward: 9.0
Episode 62, Reward: 8.0
Episode 63, Reward: 9.0
Episode 64, Reward: 11.0
Episode 65, Reward: 8.0
Episode 66, Reward: 9.0
Episode 67, Reward: 10.0
Episode 68, Reward: 9.0
Episode 69, Reward: 10.0
Episode 70, Reward: 10.0
Episode 71, Reward: 9.0
Episode 72, Reward: 8.0


  8%|▊         | 82/1000 [00:02<00:15, 57.54it/s]

Episode 73, Reward: 10.0
Episode 74, Reward: 10.0
Episode 75, Reward: 18.0
Episode 76, Reward: 8.0
Episode 77, Reward: 10.0
Episode 78, Reward: 8.0
Episode 79, Reward: 10.0
Episode 80, Reward: 9.0
Episode 81, Reward: 12.0
Episode 82, Reward: 9.0
Episode 83, Reward: 13.0


 10%|▉         | 95/1000 [00:02<00:15, 59.33it/s]

Episode 84, Reward: 10.0
Episode 85, Reward: 9.0
Episode 86, Reward: 10.0
Episode 87, Reward: 10.0
Episode 88, Reward: 14.0
Episode 89, Reward: 11.0
Episode 90, Reward: 12.0
Episode 91, Reward: 9.0
Episode 92, Reward: 8.0
Episode 93, Reward: 9.0
Episode 94, Reward: 10.0
Episode 95, Reward: 8.0
Episode 96, Reward: 9.0


 10%|█         | 102/1000 [00:03<00:15, 59.53it/s]

Episode 97, Reward: 9.0
Episode 98, Reward: 16.0
Episode 99, Reward: 10.0
Episode 100, Reward: 10.0
Episode 101, Reward: 9.0
Episode 102, Reward: 10.0
Episode 103, Reward: 15.0
Episode 104, Reward: 10.0
Episode 105, Reward: 10.0
Episode 106, Reward: 10.0
Episode 107, Reward: 9.0
Episode 108, Reward: 11.0


 12%|█▏        | 116/1000 [00:03<00:14, 60.76it/s]

Episode 109, Reward: 9.0
Episode 110, Reward: 9.0
Episode 111, Reward: 10.0
Episode 112, Reward: 9.0
Episode 113, Reward: 10.0
Episode 114, Reward: 9.0
Episode 115, Reward: 10.0
Episode 116, Reward: 10.0
Episode 117, Reward: 13.0
Episode 118, Reward: 10.0
Episode 119, Reward: 9.0
Episode 120, Reward: 9.0
Episode 121, Reward: 10.0


 13%|█▎        | 130/1000 [00:03<00:14, 60.87it/s]

Episode 122, Reward: 9.0
Episode 123, Reward: 10.0
Episode 124, Reward: 8.0
Episode 125, Reward: 9.0
Episode 126, Reward: 9.0
Episode 127, Reward: 9.0
Episode 128, Reward: 9.0
Episode 129, Reward: 8.0
Episode 130, Reward: 10.0
Episode 131, Reward: 11.0
Episode 132, Reward: 9.0
Episode 133, Reward: 9.0
Episode 134, Reward: 9.0
Episode 135, Reward: 8.0


 14%|█▍        | 144/1000 [00:03<00:14, 58.16it/s]

Episode 136, Reward: 10.0
Episode 137, Reward: 15.0
Episode 138, Reward: 9.0
Episode 139, Reward: 10.0
Episode 140, Reward: 12.0
Episode 141, Reward: 9.0
Episode 142, Reward: 10.0
Episode 143, Reward: 10.0
Episode 144, Reward: 10.0
Episode 145, Reward: 9.0
Episode 146, Reward: 11.0


 16%|█▌        | 158/1000 [00:03<00:13, 60.55it/s]

Episode 147, Reward: 10.0
Episode 148, Reward: 9.0
Episode 149, Reward: 9.0
Episode 150, Reward: 10.0
Episode 151, Reward: 10.0
Episode 152, Reward: 9.0
Episode 153, Reward: 10.0
Episode 154, Reward: 8.0
Episode 155, Reward: 9.0
Episode 156, Reward: 10.0
Episode 157, Reward: 9.0
Episode 158, Reward: 10.0
Episode 159, Reward: 11.0


 17%|█▋        | 172/1000 [00:04<00:13, 62.71it/s]

Episode 160, Reward: 11.0
Episode 161, Reward: 10.0
Episode 162, Reward: 10.0
Episode 163, Reward: 9.0
Episode 164, Reward: 11.0
Episode 165, Reward: 8.0
Episode 166, Reward: 9.0
Episode 167, Reward: 9.0
Episode 168, Reward: 10.0
Episode 169, Reward: 8.0
Episode 170, Reward: 10.0
Episode 171, Reward: 10.0
Episode 172, Reward: 9.0
Episode 173, Reward: 10.0


 19%|█▊        | 186/1000 [00:04<00:12, 63.00it/s]

Episode 174, Reward: 10.0
Episode 175, Reward: 9.0
Episode 176, Reward: 10.0
Episode 177, Reward: 10.0
Episode 178, Reward: 12.0
Episode 179, Reward: 10.0
Episode 180, Reward: 8.0
Episode 181, Reward: 10.0
Episode 182, Reward: 12.0
Episode 183, Reward: 10.0
Episode 184, Reward: 9.0
Episode 185, Reward: 9.0
Episode 186, Reward: 10.0


 19%|█▉        | 193/1000 [00:04<00:13, 61.08it/s]

Episode 187, Reward: 10.0
Episode 188, Reward: 9.0
Episode 189, Reward: 9.0
Episode 190, Reward: 10.0
Episode 191, Reward: 8.0
Episode 192, Reward: 10.0
Episode 193, Reward: 10.0
Episode 194, Reward: 9.0
Episode 195, Reward: 10.0
Episode 196, Reward: 9.0
Episode 197, Reward: 12.0
Episode 198, Reward: 10.0


 21%|██        | 206/1000 [00:04<00:15, 52.55it/s]

Episode 199, Reward: 10.0
Episode 200, Reward: 12.0
Episode 201, Reward: 8.0
Episode 202, Reward: 9.0
Episode 203, Reward: 12.0
Episode 204, Reward: 9.0
Episode 205, Reward: 10.0
Episode 206, Reward: 9.0


 21%|██        | 212/1000 [00:04<00:16, 48.82it/s]

Episode 207, Reward: 13.0
Episode 208, Reward: 11.0
Episode 209, Reward: 12.0
Episode 210, Reward: 12.0
Episode 211, Reward: 9.0
Episode 212, Reward: 9.0
Episode 213, Reward: 9.0
Episode 214, Reward: 9.0
Episode 215, Reward: 10.0
Episode 216, Reward: 9.0


 22%|██▏       | 222/1000 [00:05<00:15, 48.83it/s]

Episode 217, Reward: 11.0
Episode 218, Reward: 8.0
Episode 219, Reward: 11.0
Episode 220, Reward: 12.0
Episode 221, Reward: 10.0
Episode 222, Reward: 10.0
Episode 223, Reward: 9.0
Episode 224, Reward: 10.0
Episode 225, Reward: 9.0
Episode 226, Reward: 12.0


 23%|██▎       | 232/1000 [00:05<00:16, 47.04it/s]

Episode 227, Reward: 13.0
Episode 228, Reward: 14.0
Episode 229, Reward: 8.0
Episode 230, Reward: 12.0
Episode 231, Reward: 10.0
Episode 232, Reward: 8.0
Episode 233, Reward: 11.0
Episode 234, Reward: 11.0
Episode 235, Reward: 9.0
Episode 236, Reward: 9.0
Episode 237, Reward: 9.0


 24%|██▍       | 243/1000 [00:05<00:16, 45.86it/s]

Episode 238, Reward: 12.0
Episode 239, Reward: 16.0
Episode 240, Reward: 12.0
Episode 241, Reward: 13.0
Episode 242, Reward: 10.0
Episode 243, Reward: 11.0
Episode 244, Reward: 13.0
Episode 245, Reward: 11.0


 25%|██▍       | 248/1000 [00:05<00:17, 42.00it/s]

Episode 246, Reward: 11.0
Episode 247, Reward: 12.0
Episode 248, Reward: 13.0
Episode 249, Reward: 15.0
Episode 250, Reward: 13.0
Episode 251, Reward: 16.0


 25%|██▌       | 253/1000 [00:06<00:24, 30.90it/s]

Episode 252, Reward: 14.0
Episode 253, Reward: 20.0
Episode 254, Reward: 12.0


 26%|██▌       | 257/1000 [00:06<00:30, 24.16it/s]

Episode 255, Reward: 20.0
Episode 256, Reward: 12.0
Episode 257, Reward: 20.0


 26%|██▌       | 260/1000 [00:06<00:35, 21.03it/s]

Episode 258, Reward: 16.0
Episode 259, Reward: 13.0
Episode 260, Reward: 15.0
Episode 261, Reward: 23.0
Episode 262, Reward: 23.0


 27%|██▋       | 266/1000 [00:07<00:50, 14.49it/s]

Episode 263, Reward: 50.0
Episode 264, Reward: 19.0
Episode 265, Reward: 18.0
Episode 266, Reward: 26.0


 27%|██▋       | 268/1000 [00:07<00:53, 13.63it/s]

Episode 267, Reward: 21.0
Episode 268, Reward: 20.0
Episode 269, Reward: 24.0


 27%|██▋       | 270/1000 [00:07<00:51, 14.08it/s]

Episode 270, Reward: 12.0
Episode 271, Reward: 36.0


 27%|██▋       | 272/1000 [00:07<01:00, 12.08it/s]

Episode 272, Reward: 45.0
Episode 273, Reward: 53.0


 27%|██▋       | 274/1000 [00:08<01:13,  9.91it/s]

Episode 274, Reward: 33.0
Episode 275, Reward: 64.0


 28%|██▊       | 276/1000 [00:08<01:14,  9.68it/s]

Episode 276, Reward: 29.0
Episode 277, Reward: 19.0


 28%|██▊       | 281/1000 [00:08<01:07, 10.72it/s]

Episode 278, Reward: 56.0
Episode 279, Reward: 21.0
Episode 280, Reward: 20.0
Episode 281, Reward: 45.0
Episode 282, Reward: 27.0


 28%|██▊       | 284/1000 [00:08<00:58, 12.30it/s]

Episode 283, Reward: 33.0
Episode 284, Reward: 58.0
Episode 285, Reward: 22.0
Episode 286, Reward: 35.0


 29%|██▉       | 290/1000 [00:09<00:41, 16.97it/s]

Episode 287, Reward: 28.0
Episode 288, Reward: 20.0
Episode 289, Reward: 29.0
Episode 290, Reward: 26.0
Episode 291, Reward: 27.0


 29%|██▉       | 294/1000 [00:09<00:55, 12.77it/s]

Episode 292, Reward: 103.0
Episode 293, Reward: 62.0
Episode 294, Reward: 30.0
Episode 295, Reward: 26.0


 30%|██▉       | 298/1000 [00:09<00:53, 13.03it/s]

Episode 296, Reward: 64.0
Episode 297, Reward: 49.0
Episode 298, Reward: 43.0


 30%|███       | 300/1000 [00:10<00:51, 13.48it/s]

Episode 299, Reward: 42.0
Episode 300, Reward: 49.0
Episode 301, Reward: 82.0


 30%|███       | 305/1000 [00:10<00:44, 15.71it/s]

Episode 302, Reward: 19.0
Episode 303, Reward: 19.0
Episode 304, Reward: 21.0
Episode 305, Reward: 48.0


 31%|███       | 307/1000 [00:10<00:58, 11.92it/s]

Episode 306, Reward: 128.0
Episode 307, Reward: 66.0
Episode 308, Reward: 37.0


 31%|███       | 311/1000 [00:10<00:53, 12.85it/s]

Episode 309, Reward: 82.0
Episode 310, Reward: 39.0
Episode 311, Reward: 27.0
Episode 312, Reward: 16.0
Episode 313, Reward: 24.0


 32%|███▏      | 317/1000 [00:11<00:39, 17.27it/s]

Episode 314, Reward: 41.0
Episode 315, Reward: 15.0
Episode 316, Reward: 46.0
Episode 317, Reward: 32.0


 32%|███▏      | 319/1000 [00:11<00:44, 15.26it/s]

Episode 318, Reward: 89.0
Episode 319, Reward: 33.0
Episode 320, Reward: 100.0


 32%|███▏      | 323/1000 [00:11<01:04, 10.56it/s]

Episode 321, Reward: 144.0
Episode 322, Reward: 55.0
Episode 323, Reward: 69.0


 32%|███▎      | 325/1000 [00:12<01:07,  9.98it/s]

Episode 324, Reward: 87.0
Episode 325, Reward: 75.0
Episode 326, Reward: 34.0
Episode 327, Reward: 21.0


 33%|███▎      | 330/1000 [00:12<00:55, 12.01it/s]

Episode 328, Reward: 50.0
Episode 329, Reward: 82.0
Episode 330, Reward: 36.0


 33%|███▎      | 332/1000 [00:12<00:57, 11.65it/s]

Episode 331, Reward: 68.0
Episode 332, Reward: 59.0
Episode 333, Reward: 98.0


 33%|███▎      | 334/1000 [00:13<01:18,  8.46it/s]

Episode 334, Reward: 167.0


 34%|███▎      | 336/1000 [00:13<01:22,  8.04it/s]

Episode 335, Reward: 144.0
Episode 336, Reward: 50.0
Episode 337, Reward: 44.0


 34%|███▍      | 340/1000 [00:13<01:06,  9.89it/s]

Episode 338, Reward: 105.0
Episode 339, Reward: 39.0
Episode 340, Reward: 30.0


 34%|███▍      | 342/1000 [00:14<01:28,  7.47it/s]

Episode 341, Reward: 170.0
Episode 342, Reward: 112.0


 34%|███▍      | 344/1000 [00:14<01:35,  6.84it/s]

Episode 343, Reward: 153.0
Episode 344, Reward: 85.0


 35%|███▍      | 346/1000 [00:14<01:33,  6.97it/s]

Episode 345, Reward: 87.0
Episode 346, Reward: 101.0
Episode 347, Reward: 29.0


 35%|███▍      | 349/1000 [00:14<01:19,  8.14it/s]

Episode 348, Reward: 82.0
Episode 349, Reward: 94.0


 35%|███▌      | 351/1000 [00:15<01:21,  7.96it/s]

Episode 350, Reward: 93.0
Episode 351, Reward: 90.0


 35%|███▌      | 353/1000 [00:15<01:36,  6.72it/s]

Episode 352, Reward: 97.0
Episode 353, Reward: 142.0


 36%|███▌      | 355/1000 [00:15<01:35,  6.79it/s]

Episode 354, Reward: 117.0
Episode 355, Reward: 74.0
Episode 356, Reward: 54.0


 36%|███▌      | 358/1000 [00:16<01:19,  8.10it/s]

Episode 357, Reward: 93.0
Episode 358, Reward: 76.0
Episode 359, Reward: 29.0


 36%|███▌      | 361/1000 [00:16<01:17,  8.21it/s]

Episode 360, Reward: 116.0
Episode 361, Reward: 101.0


 36%|███▋      | 363/1000 [00:16<01:30,  7.02it/s]

Episode 362, Reward: 125.0
Episode 363, Reward: 97.0


 36%|███▋      | 365/1000 [00:17<01:07,  9.35it/s]

Episode 364, Reward: 37.0
Episode 365, Reward: 34.0


 37%|███▋      | 367/1000 [00:17<01:04,  9.78it/s]

Episode 366, Reward: 68.0
Episode 367, Reward: 16.0
Episode 368, Reward: 77.0


 37%|███▋      | 370/1000 [00:17<01:38,  6.40it/s]

Episode 369, Reward: 154.0
Episode 370, Reward: 108.0


 37%|███▋      | 371/1000 [00:18<01:38,  6.40it/s]

Episode 371, Reward: 79.0


 37%|███▋      | 372/1000 [00:18<02:22,  4.41it/s]

Episode 372, Reward: 212.0


 37%|███▋      | 373/1000 [00:18<02:21,  4.43it/s]

Episode 373, Reward: 95.0


 38%|███▊      | 375/1000 [00:19<02:02,  5.11it/s]

Episode 374, Reward: 105.0
Episode 375, Reward: 45.0
Episode 376, Reward: 28.0


 38%|███▊      | 379/1000 [00:19<01:10,  8.75it/s]

Episode 377, Reward: 43.0
Episode 378, Reward: 68.0
Episode 379, Reward: 43.0


 38%|███▊      | 381/1000 [00:19<01:20,  7.68it/s]

Episode 380, Reward: 101.0
Episode 381, Reward: 114.0
Episode 382, Reward: 45.0


 38%|███▊      | 383/1000 [00:20<01:32,  6.70it/s]

Episode 383, Reward: 202.0


 38%|███▊      | 384/1000 [00:20<01:40,  6.11it/s]

Episode 384, Reward: 156.0


 38%|███▊      | 385/1000 [00:20<01:57,  5.22it/s]

Episode 385, Reward: 203.0


 39%|███▊      | 386/1000 [00:20<02:10,  4.71it/s]

Episode 386, Reward: 188.0


 39%|███▉      | 388/1000 [00:21<02:10,  4.67it/s]

Episode 387, Reward: 167.0
Episode 388, Reward: 135.0


 39%|███▉      | 390/1000 [00:21<01:41,  6.00it/s]

Episode 389, Reward: 57.0
Episode 390, Reward: 80.0


 39%|███▉      | 392/1000 [00:21<01:38,  6.14it/s]

Episode 391, Reward: 152.0
Episode 392, Reward: 71.0


 39%|███▉      | 393/1000 [00:21<01:30,  6.69it/s]

Episode 393, Reward: 76.0


 40%|███▉      | 395/1000 [00:22<02:04,  4.86it/s]

Episode 394, Reward: 278.0
Episode 395, Reward: 115.0


 40%|███▉      | 396/1000 [00:22<02:08,  4.69it/s]

Episode 396, Reward: 133.0


 40%|███▉      | 398/1000 [00:23<02:06,  4.78it/s]

Episode 397, Reward: 175.0
Episode 398, Reward: 112.0


 40%|████      | 401/1000 [00:23<01:13,  8.16it/s]

Episode 399, Reward: 70.0
Episode 400, Reward: 42.0
Episode 401, Reward: 30.0
Episode 402, Reward: 26.0


 40%|████      | 403/1000 [00:23<00:59, 10.00it/s]

Episode 403, Reward: 59.0
Episode 404, Reward: 34.0
Episode 405, Reward: 22.0


 41%|████      | 406/1000 [00:23<00:54, 10.99it/s]

Episode 406, Reward: 99.0
Episode 407, Reward: 209.0


 41%|████      | 409/1000 [00:24<01:44,  5.65it/s]

Episode 408, Reward: 304.0
Episode 409, Reward: 120.0


 41%|████      | 411/1000 [00:24<01:34,  6.25it/s]

Episode 410, Reward: 114.0
Episode 411, Reward: 75.0


 41%|████▏     | 414/1000 [00:25<01:07,  8.64it/s]

Episode 412, Reward: 105.0
Episode 413, Reward: 27.0
Episode 414, Reward: 43.0
Episode 415, Reward: 33.0


 42%|████▏     | 418/1000 [00:25<00:48, 11.93it/s]

Episode 416, Reward: 37.0
Episode 417, Reward: 45.0
Episode 418, Reward: 48.0


 42%|████▏     | 420/1000 [00:25<00:48, 11.95it/s]

Episode 419, Reward: 54.0
Episode 420, Reward: 58.0
Episode 421, Reward: 32.0


 42%|████▏     | 422/1000 [00:25<00:46, 12.34it/s]

Episode 422, Reward: 57.0
Episode 423, Reward: 120.0


 42%|████▏     | 424/1000 [00:26<01:08,  8.36it/s]

Episode 424, Reward: 163.0
Episode 425, Reward: 106.0


 43%|████▎     | 427/1000 [00:26<01:10,  8.17it/s]

Episode 426, Reward: 65.0
Episode 427, Reward: 81.0


 43%|████▎     | 428/1000 [00:26<01:35,  6.00it/s]

Episode 428, Reward: 227.0


 43%|████▎     | 431/1000 [00:27<01:23,  6.79it/s]

Episode 429, Reward: 192.0
Episode 430, Reward: 18.0
Episode 431, Reward: 77.0


 43%|████▎     | 432/1000 [00:27<01:51,  5.10it/s]

Episode 432, Reward: 259.0


 43%|████▎     | 434/1000 [00:28<02:04,  4.55it/s]

Episode 433, Reward: 212.0
Episode 434, Reward: 130.0


 44%|████▎     | 435/1000 [00:28<02:02,  4.60it/s]

Episode 435, Reward: 144.0


 44%|████▎     | 436/1000 [00:28<02:00,  4.68it/s]

Episode 436, Reward: 135.0
Episode 437, Reward: 19.0


 44%|████▍     | 438/1000 [00:28<01:33,  6.00it/s]

Episode 438, Reward: 121.0
Episode 439, Reward: 40.0


 44%|████▍     | 440/1000 [00:29<01:45,  5.33it/s]

Episode 440, Reward: 223.0


 44%|████▍     | 442/1000 [00:29<01:50,  5.07it/s]

Episode 441, Reward: 185.0
Episode 442, Reward: 53.0


 44%|████▍     | 443/1000 [00:30<02:00,  4.63it/s]

Episode 443, Reward: 155.0


 44%|████▍     | 444/1000 [00:30<02:36,  3.55it/s]

Episode 444, Reward: 221.0


 44%|████▍     | 445/1000 [00:30<02:54,  3.17it/s]

Episode 445, Reward: 185.0


 45%|████▍     | 446/1000 [00:31<03:18,  2.79it/s]

Episode 446, Reward: 221.0


 45%|████▍     | 448/1000 [00:32<03:07,  2.94it/s]

Episode 447, Reward: 345.0
Episode 448, Reward: 144.0


 45%|████▍     | 449/1000 [00:32<02:38,  3.48it/s]

Episode 449, Reward: 112.0


 45%|████▌     | 452/1000 [00:32<01:42,  5.37it/s]

Episode 450, Reward: 177.0
Episode 451, Reward: 55.0
Episode 452, Reward: 36.0


 45%|████▌     | 454/1000 [00:32<01:26,  6.31it/s]

Episode 453, Reward: 48.0
Episode 454, Reward: 111.0


 46%|████▌     | 456/1000 [00:33<01:39,  5.45it/s]

Episode 455, Reward: 269.0
Episode 456, Reward: 63.0


 46%|████▌     | 457/1000 [00:33<01:32,  5.85it/s]

Episode 457, Reward: 91.0


 46%|████▌     | 458/1000 [00:33<01:38,  5.51it/s]

Episode 458, Reward: 142.0


 46%|████▌     | 459/1000 [00:34<01:58,  4.56it/s]

Episode 459, Reward: 229.0


 46%|████▌     | 460/1000 [00:34<02:07,  4.23it/s]

Episode 460, Reward: 192.0


 46%|████▌     | 462/1000 [00:34<02:25,  3.71it/s]

Episode 461, Reward: 361.0
Episode 462, Reward: 104.0


 46%|████▋     | 463/1000 [00:35<02:43,  3.28it/s]

Episode 463, Reward: 284.0


 46%|████▋     | 464/1000 [00:35<02:39,  3.37it/s]

Episode 464, Reward: 182.0


 46%|████▋     | 465/1000 [00:35<02:36,  3.42it/s]

Episode 465, Reward: 196.0


 47%|████▋     | 466/1000 [00:36<03:02,  2.92it/s]

Episode 466, Reward: 325.0


 47%|████▋     | 467/1000 [00:36<02:58,  2.98it/s]

Episode 467, Reward: 207.0
Episode 468, Reward: 64.0


 47%|████▋     | 469/1000 [00:37<02:33,  3.47it/s]

Episode 469, Reward: 265.0


 47%|████▋     | 470/1000 [00:37<02:49,  3.12it/s]

Episode 470, Reward: 291.0
Episode 471, Reward: 50.0


 47%|████▋     | 472/1000 [00:37<02:14,  3.92it/s]

Episode 472, Reward: 177.0


 47%|████▋     | 473/1000 [00:38<02:29,  3.52it/s]

Episode 473, Reward: 280.0


 47%|████▋     | 474/1000 [00:38<03:13,  2.72it/s]

Episode 474, Reward: 421.0


 48%|████▊     | 475/1000 [00:39<02:53,  3.03it/s]

Episode 475, Reward: 143.0


 48%|████▊     | 476/1000 [00:39<02:45,  3.17it/s]

Episode 476, Reward: 193.0


 48%|████▊     | 477/1000 [00:39<02:40,  3.26it/s]

Episode 477, Reward: 184.0


 48%|████▊     | 478/1000 [00:39<02:35,  3.35it/s]

Episode 478, Reward: 192.0


 48%|████▊     | 479/1000 [00:40<02:38,  3.28it/s]

Episode 479, Reward: 224.0


 48%|████▊     | 480/1000 [00:40<03:08,  2.76it/s]

Episode 480, Reward: 336.0


 48%|████▊     | 482/1000 [00:41<02:23,  3.62it/s]

Episode 481, Reward: 188.0
Episode 482, Reward: 86.0


 48%|████▊     | 483/1000 [00:41<02:42,  3.19it/s]

Episode 483, Reward: 206.0


 48%|████▊     | 484/1000 [00:41<02:50,  3.03it/s]

Episode 484, Reward: 197.0


 48%|████▊     | 485/1000 [00:42<03:12,  2.68it/s]

Episode 485, Reward: 254.0


 49%|████▊     | 487/1000 [00:43<03:52,  2.20it/s]

Episode 486, Reward: 488.0
Episode 487, Reward: 104.0


 49%|████▉     | 488/1000 [00:44<03:56,  2.17it/s]

Episode 488, Reward: 337.0


 49%|████▉     | 489/1000 [00:44<03:48,  2.24it/s]

Episode 489, Reward: 280.0


 49%|████▉     | 490/1000 [00:44<03:33,  2.39it/s]

Episode 490, Reward: 229.0


 49%|████▉     | 491/1000 [00:45<03:04,  2.76it/s]

Episode 491, Reward: 155.0


 49%|████▉     | 492/1000 [00:45<02:41,  3.14it/s]

Episode 492, Reward: 145.0


 49%|████▉     | 493/1000 [00:45<02:47,  3.02it/s]

Episode 493, Reward: 230.0


 49%|████▉     | 494/1000 [00:46<02:45,  3.06it/s]

Episode 494, Reward: 220.0


 50%|████▉     | 495/1000 [00:46<02:36,  3.24it/s]

Episode 495, Reward: 185.0


 50%|████▉     | 496/1000 [00:46<02:27,  3.41it/s]

Episode 496, Reward: 176.0


 50%|████▉     | 497/1000 [00:46<02:24,  3.48it/s]

Episode 497, Reward: 181.0


 50%|████▉     | 498/1000 [00:47<02:24,  3.48it/s]

Episode 498, Reward: 206.0


 50%|█████     | 500/1000 [00:47<02:05,  3.97it/s]

Episode 499, Reward: 195.0
Episode 500, Reward: 120.0


 50%|█████     | 501/1000 [00:47<02:16,  3.67it/s]

Episode 501, Reward: 206.0


 50%|█████     | 502/1000 [00:48<02:32,  3.27it/s]

Episode 502, Reward: 271.0


 50%|█████     | 503/1000 [00:48<02:35,  3.19it/s]

Episode 503, Reward: 226.0


 50%|█████     | 504/1000 [00:48<02:30,  3.30it/s]

Episode 504, Reward: 180.0


 50%|█████     | 505/1000 [00:49<02:42,  3.04it/s]

Episode 505, Reward: 273.0


 51%|█████     | 507/1000 [00:49<02:10,  3.78it/s]

Episode 506, Reward: 177.0
Episode 507, Reward: 104.0


 51%|█████     | 508/1000 [00:50<02:22,  3.46it/s]

Episode 508, Reward: 228.0


 51%|█████     | 510/1000 [00:50<01:53,  4.32it/s]

Episode 509, Reward: 169.0
Episode 510, Reward: 87.0


 51%|█████     | 511/1000 [00:50<01:52,  4.35it/s]

Episode 511, Reward: 149.0


 51%|█████     | 512/1000 [00:51<02:20,  3.46it/s]

Episode 512, Reward: 292.0


 51%|█████▏    | 513/1000 [00:51<02:22,  3.43it/s]

Episode 513, Reward: 207.0


 51%|█████▏    | 514/1000 [00:51<02:29,  3.24it/s]

Episode 514, Reward: 235.0


 52%|█████▏    | 516/1000 [00:52<01:52,  4.32it/s]

Episode 515, Reward: 156.0
Episode 516, Reward: 66.0


 52%|█████▏    | 517/1000 [00:52<01:51,  4.32it/s]

Episode 517, Reward: 162.0


 52%|█████▏    | 518/1000 [00:52<01:54,  4.22it/s]

Episode 518, Reward: 176.0


 52%|█████▏    | 519/1000 [00:52<02:21,  3.41it/s]

Episode 519, Reward: 264.0


 52%|█████▏    | 520/1000 [00:53<02:27,  3.25it/s]

Episode 520, Reward: 237.0


 52%|█████▏    | 521/1000 [00:53<02:28,  3.22it/s]

Episode 521, Reward: 185.0


 52%|█████▏    | 522/1000 [00:54<03:35,  2.21it/s]

Episode 522, Reward: 413.0


 52%|█████▏    | 523/1000 [00:54<03:31,  2.26it/s]

Episode 523, Reward: 202.0


 52%|█████▏    | 524/1000 [00:55<04:21,  1.82it/s]

Episode 524, Reward: 378.0


 52%|█████▎    | 525/1000 [00:56<04:02,  1.96it/s]

Episode 525, Reward: 284.0


 53%|█████▎    | 526/1000 [00:56<03:44,  2.11it/s]

Episode 526, Reward: 271.0


 53%|█████▎    | 527/1000 [00:56<03:07,  2.53it/s]

Episode 527, Reward: 135.0


 53%|█████▎    | 528/1000 [00:57<03:25,  2.29it/s]

Episode 528, Reward: 257.0


 53%|█████▎    | 529/1000 [00:58<04:28,  1.75it/s]

Episode 529, Reward: 500.0


 53%|█████▎    | 530/1000 [00:58<03:46,  2.07it/s]

Episode 530, Reward: 183.0


 53%|█████▎    | 532/1000 [00:58<02:59,  2.61it/s]

Episode 531, Reward: 356.0
Episode 532, Reward: 91.0


 53%|█████▎    | 533/1000 [00:59<03:02,  2.56it/s]

Episode 533, Reward: 281.0


 53%|█████▎    | 534/1000 [00:59<03:00,  2.58it/s]

Episode 534, Reward: 258.0


 54%|█████▎    | 535/1000 [01:00<02:52,  2.70it/s]

Episode 535, Reward: 224.0


 54%|█████▎    | 536/1000 [01:00<02:42,  2.85it/s]

Episode 536, Reward: 208.0


 54%|█████▎    | 537/1000 [01:00<02:41,  2.87it/s]

Episode 537, Reward: 220.0


 54%|█████▍    | 538/1000 [01:01<02:29,  3.08it/s]

Episode 538, Reward: 184.0


 54%|█████▍    | 539/1000 [01:01<02:43,  2.82it/s]

Episode 539, Reward: 297.0


 54%|█████▍    | 540/1000 [01:01<02:35,  2.95it/s]

Episode 540, Reward: 201.0


 54%|█████▍    | 541/1000 [01:02<02:51,  2.68it/s]

Episode 541, Reward: 318.0


 54%|█████▍    | 542/1000 [01:02<03:01,  2.52it/s]

Episode 542, Reward: 312.0


 54%|█████▍    | 543/1000 [01:03<02:58,  2.56it/s]

Episode 543, Reward: 265.0


 54%|█████▍    | 544/1000 [01:03<03:25,  2.22it/s]

Episode 544, Reward: 411.0


 55%|█████▍    | 545/1000 [01:04<03:41,  2.06it/s]

Episode 545, Reward: 392.0


 55%|█████▍    | 546/1000 [01:04<03:11,  2.37it/s]

Episode 546, Reward: 185.0


 55%|█████▍    | 547/1000 [01:04<03:10,  2.37it/s]

Episode 547, Reward: 290.0


 55%|█████▍    | 548/1000 [01:05<02:45,  2.74it/s]

Episode 548, Reward: 161.0


 55%|█████▍    | 549/1000 [01:05<02:26,  3.09it/s]

Episode 549, Reward: 150.0


 55%|█████▌    | 550/1000 [01:05<03:00,  2.49it/s]

Episode 550, Reward: 321.0


 55%|█████▌    | 551/1000 [01:06<03:40,  2.04it/s]

Episode 551, Reward: 378.0


 55%|█████▌    | 552/1000 [01:07<04:59,  1.50it/s]

Episode 552, Reward: 500.0


 55%|█████▌    | 553/1000 [01:08<05:06,  1.46it/s]

Episode 553, Reward: 500.0


 55%|█████▌    | 554/1000 [01:08<04:25,  1.68it/s]

Episode 554, Reward: 255.0


 56%|█████▌    | 555/1000 [01:09<04:41,  1.58it/s]

Episode 555, Reward: 500.0


 56%|█████▌    | 556/1000 [01:09<04:19,  1.71it/s]

Episode 556, Reward: 323.0


 56%|█████▌    | 557/1000 [01:10<04:04,  1.81it/s]

Episode 557, Reward: 321.0


 56%|█████▌    | 558/1000 [01:11<04:02,  1.83it/s]

Episode 558, Reward: 361.0


 56%|█████▌    | 559/1000 [01:11<03:43,  1.97it/s]

Episode 559, Reward: 271.0


 56%|█████▌    | 560/1000 [01:11<03:34,  2.05it/s]

Episode 560, Reward: 301.0


 56%|█████▌    | 561/1000 [01:12<03:55,  1.86it/s]

Episode 561, Reward: 450.0


 56%|█████▌    | 562/1000 [01:12<03:42,  1.97it/s]

Episode 562, Reward: 293.0


 56%|█████▋    | 563/1000 [01:13<03:45,  1.94it/s]

Episode 563, Reward: 360.0


 56%|█████▋    | 564/1000 [01:13<03:40,  1.98it/s]

Episode 564, Reward: 328.0


 56%|█████▋    | 565/1000 [01:14<03:09,  2.30it/s]

Episode 565, Reward: 180.0


 57%|█████▋    | 566/1000 [01:14<03:27,  2.10it/s]

Episode 566, Reward: 387.0


 57%|█████▋    | 567/1000 [01:15<03:19,  2.17it/s]

Episode 567, Reward: 279.0


 57%|█████▋    | 568/1000 [01:15<03:20,  2.16it/s]

Episode 568, Reward: 316.0


 57%|█████▋    | 569/1000 [01:16<03:12,  2.24it/s]

Episode 569, Reward: 281.0


 57%|█████▋    | 570/1000 [01:16<02:56,  2.43it/s]

Episode 570, Reward: 224.0


 57%|█████▋    | 571/1000 [01:16<02:36,  2.74it/s]

Episode 571, Reward: 165.0


 57%|█████▋    | 572/1000 [01:17<02:34,  2.76it/s]

Episode 572, Reward: 248.0


 57%|█████▋    | 573/1000 [01:17<02:15,  3.15it/s]

Episode 573, Reward: 142.0


 57%|█████▋    | 574/1000 [01:17<02:15,  3.15it/s]

Episode 574, Reward: 216.0


 57%|█████▊    | 575/1000 [01:17<02:10,  3.25it/s]

Episode 575, Reward: 132.0


 58%|█████▊    | 576/1000 [01:18<02:25,  2.91it/s]

Episode 576, Reward: 233.0


 58%|█████▊    | 577/1000 [01:18<02:35,  2.72it/s]

Episode 577, Reward: 211.0


 58%|█████▊    | 578/1000 [01:19<02:48,  2.51it/s]

Episode 578, Reward: 220.0


 58%|█████▊    | 580/1000 [01:19<02:23,  2.93it/s]

Episode 579, Reward: 203.0
Episode 580, Reward: 119.0


 58%|█████▊    | 581/1000 [01:20<02:06,  3.30it/s]

Episode 581, Reward: 144.0


 58%|█████▊    | 582/1000 [01:20<02:04,  3.36it/s]

Episode 582, Reward: 196.0


 58%|█████▊    | 584/1000 [01:20<01:52,  3.68it/s]

Episode 583, Reward: 293.0
Episode 584, Reward: 72.0


 58%|█████▊    | 585/1000 [01:21<02:04,  3.33it/s]

Episode 585, Reward: 260.0


 59%|█████▊    | 586/1000 [01:21<02:27,  2.80it/s]

Episode 586, Reward: 317.0


 59%|█████▊    | 587/1000 [01:22<02:29,  2.76it/s]

Episode 587, Reward: 261.0


 59%|█████▉    | 588/1000 [01:22<02:38,  2.60it/s]

Episode 588, Reward: 305.0


 59%|█████▉    | 589/1000 [01:22<02:30,  2.73it/s]

Episode 589, Reward: 211.0


 59%|█████▉    | 591/1000 [01:23<02:28,  2.76it/s]

Episode 590, Reward: 436.0
Episode 591, Reward: 103.0


 59%|█████▉    | 593/1000 [01:24<02:32,  2.67it/s]

Episode 592, Reward: 484.0
Episode 593, Reward: 113.0


 59%|█████▉    | 594/1000 [01:24<02:27,  2.75it/s]

Episode 594, Reward: 232.0


 60%|█████▉    | 596/1000 [01:25<01:54,  3.54it/s]

Episode 595, Reward: 199.0
Episode 596, Reward: 103.0


 60%|█████▉    | 597/1000 [01:25<02:39,  2.52it/s]

Episode 597, Reward: 463.0


 60%|█████▉    | 598/1000 [01:26<03:15,  2.05it/s]

Episode 598, Reward: 500.0


 60%|█████▉    | 599/1000 [01:27<02:58,  2.24it/s]

Episode 599, Reward: 231.0


 60%|██████    | 600/1000 [01:27<02:39,  2.51it/s]

Episode 600, Reward: 197.0


 60%|██████    | 602/1000 [01:28<02:29,  2.65it/s]

Episode 601, Reward: 405.0
Episode 602, Reward: 125.0


 60%|██████    | 603/1000 [01:28<02:19,  2.84it/s]

Episode 603, Reward: 206.0


 60%|██████    | 605/1000 [01:28<01:40,  3.93it/s]

Episode 604, Reward: 165.0
Episode 605, Reward: 64.0


 61%|██████    | 606/1000 [01:29<01:45,  3.72it/s]

Episode 606, Reward: 203.0


 61%|██████    | 607/1000 [01:29<02:12,  2.96it/s]

Episode 607, Reward: 349.0


 61%|██████    | 608/1000 [01:29<02:20,  2.80it/s]

Episode 608, Reward: 189.0


 61%|██████    | 609/1000 [01:30<03:29,  1.87it/s]

Episode 609, Reward: 500.0


 61%|██████    | 610/1000 [01:31<03:51,  1.68it/s]

Episode 610, Reward: 326.0


 61%|██████    | 611/1000 [01:32<03:36,  1.79it/s]

Episode 611, Reward: 316.0


 61%|██████    | 612/1000 [01:32<03:56,  1.64it/s]

Episode 612, Reward: 500.0


 61%|██████▏   | 613/1000 [01:33<03:31,  1.83it/s]

Episode 613, Reward: 274.0


 61%|██████▏   | 614/1000 [01:33<03:40,  1.75it/s]

Episode 614, Reward: 438.0


 62%|██████▏   | 615/1000 [01:34<03:02,  2.11it/s]

Episode 615, Reward: 158.0


 62%|██████▏   | 616/1000 [01:34<02:38,  2.43it/s]

Episode 616, Reward: 185.0


 62%|██████▏   | 617/1000 [01:34<02:24,  2.65it/s]

Episode 617, Reward: 193.0


 62%|██████▏   | 618/1000 [01:35<02:22,  2.68it/s]

Episode 618, Reward: 248.0


 62%|██████▏   | 619/1000 [01:35<02:17,  2.77it/s]

Episode 619, Reward: 233.0


 62%|██████▏   | 620/1000 [01:35<02:13,  2.85it/s]

Episode 620, Reward: 224.0


 62%|██████▏   | 622/1000 [01:36<01:48,  3.47it/s]

Episode 621, Reward: 186.0
Episode 622, Reward: 129.0


 62%|██████▏   | 624/1000 [01:36<01:25,  4.38it/s]

Episode 623, Reward: 130.0
Episode 624, Reward: 107.0


 62%|██████▎   | 625/1000 [01:36<01:23,  4.50it/s]

Episode 625, Reward: 134.0


 63%|██████▎   | 626/1000 [01:36<01:26,  4.32it/s]

Episode 626, Reward: 173.0
Episode 627, Reward: 45.0


 63%|██████▎   | 628/1000 [01:37<01:22,  4.52it/s]

Episode 628, Reward: 247.0


 63%|██████▎   | 629/1000 [01:37<01:20,  4.61it/s]

Episode 629, Reward: 141.0


 63%|██████▎   | 630/1000 [01:37<01:35,  3.89it/s]

Episode 630, Reward: 250.0


 63%|██████▎   | 631/1000 [01:38<01:36,  3.82it/s]

Episode 631, Reward: 182.0


 63%|██████▎   | 632/1000 [01:38<02:24,  2.56it/s]

Episode 632, Reward: 500.0


 63%|██████▎   | 633/1000 [01:39<02:57,  2.07it/s]

Episode 633, Reward: 500.0


 63%|██████▎   | 634/1000 [01:39<02:36,  2.35it/s]

Episode 634, Reward: 184.0


 64%|██████▎   | 635/1000 [01:40<02:31,  2.40it/s]

Episode 635, Reward: 266.0


 64%|██████▎   | 636/1000 [01:40<02:17,  2.65it/s]

Episode 636, Reward: 190.0


 64%|██████▎   | 637/1000 [01:40<02:15,  2.68it/s]

Episode 637, Reward: 245.0


 64%|██████▍   | 638/1000 [01:41<02:23,  2.53it/s]

Episode 638, Reward: 305.0


 64%|██████▍   | 639/1000 [01:41<02:21,  2.55it/s]

Episode 639, Reward: 210.0


 64%|██████▍   | 640/1000 [01:42<02:33,  2.34it/s]

Episode 640, Reward: 260.0


 64%|██████▍   | 641/1000 [01:42<02:22,  2.51it/s]

Episode 641, Reward: 175.0


 64%|██████▍   | 642/1000 [01:43<02:20,  2.54it/s]

Episode 642, Reward: 171.0


 64%|██████▍   | 643/1000 [01:43<02:18,  2.58it/s]

Episode 643, Reward: 166.0


 64%|██████▍   | 644/1000 [01:43<02:27,  2.41it/s]

Episode 644, Reward: 240.0


 64%|██████▍   | 645/1000 [01:44<02:09,  2.75it/s]

Episode 645, Reward: 167.0


 65%|██████▍   | 646/1000 [01:44<02:06,  2.79it/s]

Episode 646, Reward: 235.0


 65%|██████▍   | 647/1000 [01:44<01:56,  3.04it/s]

Episode 647, Reward: 172.0


 65%|██████▍   | 648/1000 [01:44<01:43,  3.40it/s]

Episode 648, Reward: 136.0


 65%|██████▍   | 649/1000 [01:45<01:44,  3.37it/s]

Episode 649, Reward: 210.0


 65%|██████▌   | 650/1000 [01:45<01:54,  3.06it/s]

Episode 650, Reward: 259.0


 65%|██████▌   | 651/1000 [01:46<02:10,  2.68it/s]

Episode 651, Reward: 320.0


 65%|██████▌   | 652/1000 [01:46<02:08,  2.70it/s]

Episode 652, Reward: 244.0


 65%|██████▌   | 653/1000 [01:46<02:17,  2.52it/s]

Episode 653, Reward: 308.0


 65%|██████▌   | 654/1000 [01:47<02:22,  2.42it/s]

Episode 654, Reward: 300.0


 66%|██████▌   | 655/1000 [01:48<02:44,  2.09it/s]

Episode 655, Reward: 442.0


 66%|██████▌   | 656/1000 [01:48<03:10,  1.81it/s]

Episode 656, Reward: 500.0


 66%|██████▌   | 657/1000 [01:49<03:08,  1.82it/s]

Episode 657, Reward: 366.0


 66%|██████▌   | 658/1000 [01:49<02:45,  2.06it/s]

Episode 658, Reward: 218.0


 66%|██████▌   | 659/1000 [01:49<02:24,  2.35it/s]

Episode 659, Reward: 191.0


 66%|██████▌   | 660/1000 [01:50<02:19,  2.43it/s]

Episode 660, Reward: 263.0


 66%|██████▌   | 661/1000 [01:50<02:15,  2.51it/s]

Episode 661, Reward: 246.0


 66%|██████▌   | 662/1000 [01:51<02:34,  2.18it/s]

Episode 662, Reward: 422.0


 66%|██████▋   | 663/1000 [01:51<02:28,  2.28it/s]

Episode 663, Reward: 262.0


 66%|██████▋   | 664/1000 [01:52<02:33,  2.19it/s]

Episode 664, Reward: 350.0


 66%|██████▋   | 665/1000 [01:52<02:38,  2.11it/s]

Episode 665, Reward: 342.0


 67%|██████▋   | 666/1000 [01:53<03:00,  1.85it/s]

Episode 666, Reward: 464.0


 67%|██████▋   | 667/1000 [01:53<02:50,  1.95it/s]

Episode 667, Reward: 267.0


 67%|██████▋   | 668/1000 [01:54<02:41,  2.06it/s]

Episode 668, Reward: 203.0


 67%|██████▋   | 669/1000 [01:54<02:55,  1.88it/s]

Episode 669, Reward: 316.0


 67%|██████▋   | 670/1000 [01:55<02:56,  1.87it/s]

Episode 670, Reward: 246.0


 67%|██████▋   | 671/1000 [01:56<03:08,  1.74it/s]

Episode 671, Reward: 346.0


 67%|██████▋   | 672/1000 [01:56<03:04,  1.78it/s]

Episode 672, Reward: 356.0


 67%|██████▋   | 673/1000 [01:57<03:01,  1.80it/s]

Episode 673, Reward: 371.0


 67%|██████▋   | 674/1000 [01:57<03:18,  1.64it/s]

Episode 674, Reward: 500.0


 68%|██████▊   | 675/1000 [01:58<03:16,  1.65it/s]

Episode 675, Reward: 395.0


 68%|██████▊   | 676/1000 [01:58<02:56,  1.84it/s]

Episode 676, Reward: 269.0


 68%|██████▊   | 677/1000 [01:59<02:54,  1.85it/s]

Episode 677, Reward: 366.0


 68%|██████▊   | 678/1000 [02:00<02:58,  1.80it/s]

Episode 678, Reward: 386.0


 68%|██████▊   | 679/1000 [02:00<02:46,  1.93it/s]

Episode 679, Reward: 284.0


 68%|██████▊   | 680/1000 [02:01<02:58,  1.79it/s]

Episode 680, Reward: 445.0


 68%|██████▊   | 681/1000 [02:01<02:51,  1.86it/s]

Episode 681, Reward: 328.0


 68%|██████▊   | 682/1000 [02:02<02:40,  1.99it/s]

Episode 682, Reward: 272.0
Episode 683, Reward: 30.0


 68%|██████▊   | 684/1000 [02:02<01:52,  2.80it/s]

Episode 684, Reward: 219.0


 69%|██████▊   | 686/1000 [02:03<01:50,  2.85it/s]

Episode 685, Reward: 393.0
Episode 686, Reward: 124.0


 69%|██████▊   | 687/1000 [02:03<01:48,  2.88it/s]

Episode 687, Reward: 228.0


 69%|██████▉   | 688/1000 [02:03<01:49,  2.85it/s]

Episode 688, Reward: 225.0
Episode 689, Reward: 37.0


 69%|██████▉   | 690/1000 [02:04<01:17,  4.02it/s]

Episode 690, Reward: 122.0


 69%|██████▉   | 691/1000 [02:04<01:15,  4.09it/s]

Episode 691, Reward: 155.0


 69%|██████▉   | 692/1000 [02:04<01:25,  3.62it/s]

Episode 692, Reward: 226.0


 69%|██████▉   | 693/1000 [02:05<01:28,  3.47it/s]

Episode 693, Reward: 204.0


 69%|██████▉   | 694/1000 [02:05<01:34,  3.25it/s]

Episode 694, Reward: 245.0


 70%|██████▉   | 695/1000 [02:05<01:41,  3.01it/s]

Episode 695, Reward: 268.0


 70%|██████▉   | 696/1000 [02:06<01:57,  2.59it/s]

Episode 696, Reward: 254.0


 70%|██████▉   | 697/1000 [02:06<02:19,  2.17it/s]

Episode 697, Reward: 331.0


 70%|██████▉   | 698/1000 [02:07<02:23,  2.10it/s]

Episode 698, Reward: 234.0


 70%|██████▉   | 699/1000 [02:07<02:27,  2.04it/s]

Episode 699, Reward: 232.0


 70%|███████   | 700/1000 [02:08<02:28,  2.01it/s]

Episode 700, Reward: 351.0


 70%|███████   | 701/1000 [02:08<02:14,  2.23it/s]

Episode 701, Reward: 222.0


 70%|███████   | 702/1000 [02:09<02:13,  2.23it/s]

Episode 702, Reward: 298.0


 70%|███████   | 703/1000 [02:09<02:21,  2.10it/s]

Episode 703, Reward: 367.0


 70%|███████   | 704/1000 [02:10<02:14,  2.20it/s]

Episode 704, Reward: 259.0


 70%|███████   | 705/1000 [02:10<02:37,  1.87it/s]

Episode 705, Reward: 500.0


 71%|███████   | 706/1000 [02:11<02:15,  2.16it/s]

Episode 706, Reward: 193.0


 71%|███████   | 707/1000 [02:11<02:07,  2.29it/s]

Episode 707, Reward: 254.0


 71%|███████   | 708/1000 [02:12<02:33,  1.90it/s]

Episode 708, Reward: 500.0


 71%|███████   | 709/1000 [02:12<02:45,  1.76it/s]

Episode 709, Reward: 461.0


 71%|███████   | 710/1000 [02:13<03:00,  1.60it/s]

Episode 710, Reward: 500.0


 71%|███████   | 711/1000 [02:14<02:51,  1.68it/s]

Episode 711, Reward: 354.0


 71%|███████   | 712/1000 [02:14<03:02,  1.58it/s]

Episode 712, Reward: 500.0


 71%|███████▏  | 713/1000 [02:15<03:13,  1.48it/s]

Episode 713, Reward: 500.0


 71%|███████▏  | 714/1000 [02:16<03:18,  1.44it/s]

Episode 714, Reward: 500.0


 72%|███████▏  | 715/1000 [02:17<03:16,  1.45it/s]

Episode 715, Reward: 451.0


 72%|███████▏  | 716/1000 [02:17<03:14,  1.46it/s]

Episode 716, Reward: 458.0


 72%|███████▏  | 717/1000 [02:18<03:36,  1.31it/s]

Episode 717, Reward: 500.0


 72%|███████▏  | 718/1000 [02:19<03:09,  1.49it/s]

Episode 718, Reward: 207.0


 72%|███████▏  | 719/1000 [02:19<02:47,  1.68it/s]

Episode 719, Reward: 191.0


 72%|███████▏  | 720/1000 [02:20<02:58,  1.57it/s]

Episode 720, Reward: 420.0


 72%|███████▏  | 721/1000 [02:21<03:04,  1.51it/s]

Episode 721, Reward: 500.0


 72%|███████▏  | 722/1000 [02:21<03:09,  1.47it/s]

Episode 722, Reward: 500.0


 72%|███████▏  | 723/1000 [02:22<03:13,  1.43it/s]

Episode 723, Reward: 500.0


 72%|███████▏  | 724/1000 [02:23<02:54,  1.58it/s]

Episode 724, Reward: 318.0


 72%|███████▎  | 725/1000 [02:23<03:04,  1.49it/s]

Episode 725, Reward: 500.0


 73%|███████▎  | 726/1000 [02:24<03:08,  1.45it/s]

Episode 726, Reward: 500.0


 73%|███████▎  | 727/1000 [02:25<03:11,  1.43it/s]

Episode 727, Reward: 500.0


 73%|███████▎  | 728/1000 [02:25<02:52,  1.58it/s]

Episode 728, Reward: 311.0


 73%|███████▎  | 729/1000 [02:26<02:34,  1.75it/s]

Episode 729, Reward: 281.0


 73%|███████▎  | 730/1000 [02:26<02:47,  1.61it/s]

Episode 730, Reward: 500.0


 73%|███████▎  | 731/1000 [02:27<02:30,  1.79it/s]

Episode 731, Reward: 284.0


 73%|███████▎  | 732/1000 [02:27<02:23,  1.87it/s]

Episode 732, Reward: 312.0


 73%|███████▎  | 733/1000 [02:28<02:50,  1.56it/s]

Episode 733, Reward: 500.0


 73%|███████▎  | 734/1000 [02:29<02:58,  1.49it/s]

Episode 734, Reward: 500.0


 74%|███████▎  | 735/1000 [02:30<02:50,  1.56it/s]

Episode 735, Reward: 392.0


 74%|███████▎  | 736/1000 [02:30<02:21,  1.86it/s]

Episode 736, Reward: 130.0


 74%|███████▎  | 737/1000 [02:30<02:24,  1.82it/s]

Episode 737, Reward: 299.0


 74%|███████▍  | 738/1000 [02:31<02:45,  1.59it/s]

Episode 738, Reward: 366.0


 74%|███████▍  | 739/1000 [02:31<02:12,  1.98it/s]

Episode 739, Reward: 89.0


 74%|███████▍  | 740/1000 [02:32<02:10,  1.99it/s]

Episode 740, Reward: 300.0


 74%|███████▍  | 741/1000 [02:32<02:01,  2.13it/s]

Episode 741, Reward: 255.0


 74%|███████▍  | 742/1000 [02:33<01:54,  2.24it/s]

Episode 742, Reward: 267.0


 74%|███████▍  | 743/1000 [02:33<01:40,  2.55it/s]

Episode 743, Reward: 180.0


 74%|███████▍  | 744/1000 [02:33<01:33,  2.74it/s]

Episode 744, Reward: 196.0


 74%|███████▍  | 745/1000 [02:34<01:25,  2.98it/s]

Episode 745, Reward: 183.0


 75%|███████▍  | 746/1000 [02:34<01:28,  2.88it/s]

Episode 746, Reward: 251.0


 75%|███████▍  | 747/1000 [02:34<01:38,  2.56it/s]

Episode 747, Reward: 324.0


 75%|███████▍  | 749/1000 [02:35<01:18,  3.20it/s]

Episode 748, Reward: 218.0
Episode 749, Reward: 121.0


 75%|███████▌  | 750/1000 [02:35<01:20,  3.12it/s]

Episode 750, Reward: 224.0


 75%|███████▌  | 751/1000 [02:36<01:24,  2.96it/s]

Episode 751, Reward: 256.0


 75%|███████▌  | 752/1000 [02:36<01:31,  2.70it/s]

Episode 752, Reward: 297.0


 75%|███████▌  | 753/1000 [02:36<01:27,  2.82it/s]

Episode 753, Reward: 201.0


 75%|███████▌  | 754/1000 [02:37<01:29,  2.76it/s]

Episode 754, Reward: 260.0


 76%|███████▌  | 755/1000 [02:37<01:40,  2.43it/s]

Episode 755, Reward: 355.0


 76%|███████▌  | 756/1000 [02:38<02:02,  1.99it/s]

Episode 756, Reward: 496.0


 76%|███████▌  | 757/1000 [02:38<01:51,  2.18it/s]

Episode 757, Reward: 211.0


 76%|███████▌  | 758/1000 [02:39<02:10,  1.86it/s]

Episode 758, Reward: 500.0


 76%|███████▌  | 759/1000 [02:40<02:24,  1.66it/s]

Episode 759, Reward: 500.0


 76%|███████▌  | 760/1000 [02:41<02:33,  1.57it/s]

Episode 760, Reward: 500.0


 76%|███████▌  | 761/1000 [02:41<02:32,  1.57it/s]

Episode 761, Reward: 422.0


 76%|███████▌  | 762/1000 [02:42<02:49,  1.40it/s]

Episode 762, Reward: 500.0


 76%|███████▋  | 763/1000 [02:43<03:09,  1.25it/s]

Episode 763, Reward: 484.0


 76%|███████▋  | 764/1000 [02:44<03:16,  1.20it/s]

Episode 764, Reward: 500.0


 76%|███████▋  | 765/1000 [02:44<02:50,  1.38it/s]

Episode 765, Reward: 312.0


 77%|███████▋  | 766/1000 [02:45<02:46,  1.40it/s]

Episode 766, Reward: 473.0


 77%|███████▋  | 767/1000 [02:46<02:30,  1.54it/s]

Episode 767, Reward: 328.0


 77%|███████▋  | 768/1000 [02:46<02:12,  1.75it/s]

Episode 768, Reward: 265.0


 77%|███████▋  | 769/1000 [02:47<02:24,  1.60it/s]

Episode 769, Reward: 500.0


 77%|███████▋  | 770/1000 [02:47<02:10,  1.76it/s]

Episode 770, Reward: 282.0


 77%|███████▋  | 771/1000 [02:48<02:05,  1.82it/s]

Episode 771, Reward: 340.0


 77%|███████▋  | 772/1000 [02:48<01:53,  2.01it/s]

Episode 772, Reward: 254.0


 77%|███████▋  | 773/1000 [02:49<01:45,  2.15it/s]

Episode 773, Reward: 267.0


 77%|███████▋  | 774/1000 [02:49<01:47,  2.10it/s]

Episode 774, Reward: 340.0


 78%|███████▊  | 775/1000 [02:49<01:39,  2.25it/s]

Episode 775, Reward: 246.0


 78%|███████▊  | 776/1000 [02:50<01:39,  2.26it/s]

Episode 776, Reward: 292.0


 78%|███████▊  | 778/1000 [02:51<01:29,  2.49it/s]

Episode 777, Reward: 419.0
Episode 778, Reward: 115.0


 78%|███████▊  | 779/1000 [02:51<01:50,  2.00it/s]

Episode 779, Reward: 500.0


 78%|███████▊  | 780/1000 [02:52<02:05,  1.75it/s]

Episode 780, Reward: 500.0


 78%|███████▊  | 781/1000 [02:53<02:14,  1.62it/s]

Episode 781, Reward: 500.0


 78%|███████▊  | 782/1000 [02:53<02:08,  1.69it/s]

Episode 782, Reward: 338.0


 78%|███████▊  | 783/1000 [02:54<01:54,  1.90it/s]

Episode 783, Reward: 226.0


 78%|███████▊  | 784/1000 [02:55<02:21,  1.53it/s]

Episode 784, Reward: 498.0


 78%|███████▊  | 785/1000 [02:56<02:46,  1.29it/s]

Episode 785, Reward: 500.0


 79%|███████▊  | 786/1000 [02:56<02:38,  1.35it/s]

Episode 786, Reward: 463.0


 79%|███████▊  | 787/1000 [02:57<02:37,  1.36it/s]

Episode 787, Reward: 500.0


 79%|███████▉  | 788/1000 [02:58<02:29,  1.42it/s]

Episode 788, Reward: 428.0


 79%|███████▉  | 789/1000 [02:58<02:28,  1.42it/s]

Episode 789, Reward: 471.0


 79%|███████▉  | 790/1000 [02:59<02:02,  1.71it/s]

Episode 790, Reward: 203.0


 79%|███████▉  | 791/1000 [02:59<02:03,  1.70it/s]

Episode 791, Reward: 409.0


 79%|███████▉  | 792/1000 [03:00<02:10,  1.60it/s]

Episode 792, Reward: 500.0


 79%|███████▉  | 793/1000 [03:01<02:13,  1.56it/s]

Episode 793, Reward: 475.0


 79%|███████▉  | 794/1000 [03:01<02:16,  1.50it/s]

Episode 794, Reward: 486.0


 80%|███████▉  | 795/1000 [03:02<02:06,  1.62it/s]

Episode 795, Reward: 347.0


 80%|███████▉  | 796/1000 [03:02<01:59,  1.71it/s]

Episode 796, Reward: 343.0


 80%|███████▉  | 797/1000 [03:03<01:49,  1.85it/s]

Episode 797, Reward: 293.0


 80%|███████▉  | 798/1000 [03:03<01:40,  2.02it/s]

Episode 798, Reward: 265.0


 80%|███████▉  | 799/1000 [03:04<01:41,  1.99it/s]

Episode 799, Reward: 362.0


 80%|████████  | 800/1000 [03:04<01:31,  2.18it/s]

Episode 800, Reward: 235.0


 80%|████████  | 801/1000 [03:04<01:16,  2.59it/s]

Episode 801, Reward: 152.0


 80%|████████  | 802/1000 [03:05<01:18,  2.51it/s]

Episode 802, Reward: 306.0


 80%|████████  | 803/1000 [03:05<01:31,  2.16it/s]

Episode 803, Reward: 428.0


 80%|████████  | 804/1000 [03:06<01:57,  1.67it/s]

Episode 804, Reward: 500.0


 81%|████████  | 806/1000 [03:08<01:51,  1.74it/s]

Episode 805, Reward: 500.0
Episode 806, Reward: 99.0


 81%|████████  | 807/1000 [03:08<01:58,  1.62it/s]

Episode 807, Reward: 500.0


 81%|████████  | 808/1000 [03:09<02:04,  1.54it/s]

Episode 808, Reward: 500.0


 81%|████████  | 809/1000 [03:09<01:48,  1.76it/s]

Episode 809, Reward: 249.0


 81%|████████  | 810/1000 [03:10<01:56,  1.63it/s]

Episode 810, Reward: 494.0


 81%|████████  | 811/1000 [03:11<01:47,  1.76it/s]

Episode 811, Reward: 308.0


 81%|████████  | 812/1000 [03:11<01:56,  1.61it/s]

Episode 812, Reward: 500.0


 81%|████████▏ | 813/1000 [03:12<02:01,  1.54it/s]

Episode 813, Reward: 483.0


 81%|████████▏ | 814/1000 [03:13<02:06,  1.47it/s]

Episode 814, Reward: 500.0


 82%|████████▏ | 815/1000 [03:13<01:40,  1.84it/s]

Episode 815, Reward: 133.0


 82%|████████▏ | 816/1000 [03:14<01:55,  1.59it/s]

Episode 816, Reward: 429.0


 82%|████████▏ | 817/1000 [03:15<02:17,  1.33it/s]

Episode 817, Reward: 500.0


 82%|████████▏ | 818/1000 [03:16<02:14,  1.35it/s]

Episode 818, Reward: 436.0


 82%|████████▏ | 819/1000 [03:16<02:13,  1.36it/s]

Episode 819, Reward: 500.0


 82%|████████▏ | 820/1000 [03:17<01:52,  1.61it/s]

Episode 820, Reward: 233.0


 82%|████████▏ | 821/1000 [03:17<01:57,  1.53it/s]

Episode 821, Reward: 500.0


 82%|████████▏ | 822/1000 [03:18<01:55,  1.55it/s]

Episode 822, Reward: 323.0


 82%|████████▏ | 823/1000 [03:19<02:14,  1.32it/s]

Episode 823, Reward: 500.0


 82%|████████▏ | 824/1000 [03:20<02:18,  1.27it/s]

Episode 824, Reward: 500.0


 82%|████████▎ | 825/1000 [03:21<02:15,  1.30it/s]

Episode 825, Reward: 500.0


 83%|████████▎ | 826/1000 [03:21<01:53,  1.53it/s]

Episode 826, Reward: 247.0


 83%|████████▎ | 827/1000 [03:21<01:35,  1.82it/s]

Episode 827, Reward: 209.0


 83%|████████▎ | 828/1000 [03:22<01:43,  1.65it/s]

Episode 828, Reward: 500.0


 83%|████████▎ | 829/1000 [03:22<01:30,  1.88it/s]

Episode 829, Reward: 240.0


 83%|████████▎ | 830/1000 [03:23<01:33,  1.82it/s]

Episode 830, Reward: 398.0


 83%|████████▎ | 831/1000 [03:24<01:43,  1.63it/s]

Episode 831, Reward: 500.0


 83%|████████▎ | 832/1000 [03:25<01:45,  1.58it/s]

Episode 832, Reward: 464.0


 83%|████████▎ | 833/1000 [03:25<01:51,  1.50it/s]

Episode 833, Reward: 500.0


 83%|████████▎ | 834/1000 [03:26<01:49,  1.51it/s]

Episode 834, Reward: 442.0


 84%|████████▎ | 835/1000 [03:26<01:42,  1.62it/s]

Episode 835, Reward: 356.0


 84%|████████▎ | 836/1000 [03:27<01:33,  1.75it/s]

Episode 836, Reward: 313.0


 84%|████████▎ | 837/1000 [03:27<01:27,  1.86it/s]

Episode 837, Reward: 302.0


 84%|████████▍ | 838/1000 [03:28<01:32,  1.75it/s]

Episode 838, Reward: 451.0


 84%|████████▍ | 839/1000 [03:29<01:40,  1.61it/s]

Episode 839, Reward: 500.0


 84%|████████▍ | 840/1000 [03:29<01:26,  1.85it/s]

Episode 840, Reward: 238.0


 84%|████████▍ | 841/1000 [03:30<01:24,  1.88it/s]

Episode 841, Reward: 312.0


 84%|████████▍ | 842/1000 [03:30<01:30,  1.75it/s]

Episode 842, Reward: 334.0


 84%|████████▍ | 843/1000 [03:31<01:47,  1.46it/s]

Episode 843, Reward: 447.0


 84%|████████▍ | 844/1000 [03:32<01:31,  1.71it/s]

Episode 844, Reward: 157.0


 84%|████████▍ | 845/1000 [03:32<01:37,  1.59it/s]

Episode 845, Reward: 500.0


 85%|████████▍ | 846/1000 [03:33<01:28,  1.74it/s]

Episode 846, Reward: 300.0


 85%|████████▍ | 847/1000 [03:33<01:35,  1.60it/s]

Episode 847, Reward: 500.0


 85%|████████▍ | 848/1000 [03:34<01:40,  1.52it/s]

Episode 848, Reward: 500.0


 85%|████████▍ | 849/1000 [03:35<01:43,  1.46it/s]

Episode 849, Reward: 500.0


 85%|████████▌ | 850/1000 [03:36<01:43,  1.45it/s]

Episode 850, Reward: 483.0


 85%|████████▌ | 851/1000 [03:36<01:33,  1.60it/s]

Episode 851, Reward: 310.0


 85%|████████▌ | 852/1000 [03:37<01:36,  1.54it/s]

Episode 852, Reward: 500.0


 85%|████████▌ | 853/1000 [03:38<01:39,  1.48it/s]

Episode 853, Reward: 500.0


 85%|████████▌ | 854/1000 [03:38<01:34,  1.54it/s]

Episode 854, Reward: 384.0


 86%|████████▌ | 855/1000 [03:39<01:37,  1.49it/s]

Episode 855, Reward: 500.0


 86%|████████▌ | 856/1000 [03:40<01:39,  1.44it/s]

Episode 856, Reward: 500.0


 86%|████████▌ | 857/1000 [03:40<01:41,  1.41it/s]

Episode 857, Reward: 500.0


 86%|████████▌ | 858/1000 [03:41<01:26,  1.64it/s]

Episode 858, Reward: 254.0


 86%|████████▌ | 859/1000 [03:41<01:31,  1.54it/s]

Episode 859, Reward: 500.0


 86%|████████▌ | 860/1000 [03:42<01:39,  1.41it/s]

Episode 860, Reward: 423.0


 86%|████████▌ | 861/1000 [03:43<01:32,  1.50it/s]

Episode 861, Reward: 260.0


 86%|████████▌ | 862/1000 [03:44<01:43,  1.34it/s]

Episode 862, Reward: 500.0


 86%|████████▋ | 863/1000 [03:44<01:35,  1.43it/s]

Episode 863, Reward: 405.0


 86%|████████▋ | 865/1000 [03:45<01:11,  1.88it/s]

Episode 864, Reward: 480.0
Episode 865, Reward: 93.0


 87%|████████▋ | 866/1000 [03:46<01:18,  1.72it/s]

Episode 866, Reward: 500.0


 87%|████████▋ | 867/1000 [03:47<01:23,  1.60it/s]

Episode 867, Reward: 500.0


 87%|████████▋ | 868/1000 [03:47<01:18,  1.68it/s]

Episode 868, Reward: 359.0


 87%|████████▋ | 869/1000 [03:48<01:13,  1.78it/s]

Episode 869, Reward: 311.0


 87%|████████▋ | 870/1000 [03:48<01:19,  1.63it/s]

Episode 870, Reward: 500.0


 87%|████████▋ | 871/1000 [03:49<01:12,  1.77it/s]

Episode 871, Reward: 306.0


 87%|████████▋ | 872/1000 [03:50<01:18,  1.63it/s]

Episode 872, Reward: 500.0


 87%|████████▋ | 873/1000 [03:50<01:23,  1.52it/s]

Episode 873, Reward: 500.0


 87%|████████▋ | 874/1000 [03:51<01:24,  1.50it/s]

Episode 874, Reward: 466.0


 88%|████████▊ | 875/1000 [03:51<01:09,  1.79it/s]

Episode 875, Reward: 190.0


 88%|████████▊ | 876/1000 [03:52<01:10,  1.77it/s]

Episode 876, Reward: 396.0


 88%|████████▊ | 877/1000 [03:53<01:12,  1.70it/s]

Episode 877, Reward: 423.0


 88%|████████▊ | 878/1000 [03:53<01:02,  1.94it/s]

Episode 878, Reward: 225.0


 88%|████████▊ | 879/1000 [03:54<01:04,  1.88it/s]

Episode 879, Reward: 350.0


 88%|████████▊ | 880/1000 [03:54<01:05,  1.84it/s]

Episode 880, Reward: 278.0


 88%|████████▊ | 881/1000 [03:54<00:54,  2.17it/s]

Episode 881, Reward: 131.0


 88%|████████▊ | 882/1000 [03:55<00:48,  2.42it/s]

Episode 882, Reward: 146.0


 88%|████████▊ | 883/1000 [03:55<00:54,  2.13it/s]

Episode 883, Reward: 282.0


 88%|████████▊ | 884/1000 [03:56<00:56,  2.07it/s]

Episode 884, Reward: 273.0


 88%|████████▊ | 885/1000 [03:56<00:53,  2.14it/s]

Episode 885, Reward: 291.0


 89%|████████▊ | 886/1000 [03:57<00:53,  2.11it/s]

Episode 886, Reward: 334.0


 89%|████████▊ | 887/1000 [03:57<00:59,  1.90it/s]

Episode 887, Reward: 433.0


 89%|████████▉ | 888/1000 [03:58<00:52,  2.13it/s]

Episode 888, Reward: 222.0


 89%|████████▉ | 889/1000 [03:58<00:46,  2.38it/s]

Episode 889, Reward: 207.0
Episode 890, Reward: 37.0


 89%|████████▉ | 891/1000 [03:59<00:44,  2.47it/s]

Episode 891, Reward: 500.0


 89%|████████▉ | 892/1000 [03:59<00:52,  2.07it/s]

Episode 892, Reward: 500.0


 89%|████████▉ | 893/1000 [04:00<00:50,  2.12it/s]

Episode 893, Reward: 296.0


 90%|████████▉ | 895/1000 [04:00<00:38,  2.73it/s]

Episode 894, Reward: 261.0
Episode 895, Reward: 109.0


 90%|████████▉ | 897/1000 [04:01<00:31,  3.30it/s]

Episode 896, Reward: 224.0
Episode 897, Reward: 109.0


 90%|████████▉ | 899/1000 [04:01<00:24,  4.06it/s]

Episode 898, Reward: 134.0
Episode 899, Reward: 116.0


 90%|█████████ | 900/1000 [04:01<00:21,  4.61it/s]

Episode 900, Reward: 97.0
Episode 901, Reward: 34.0


 90%|█████████ | 902/1000 [04:02<00:16,  5.91it/s]

Episode 902, Reward: 111.0
Episode 903, Reward: 24.0


 90%|█████████ | 904/1000 [04:02<00:15,  6.26it/s]

Episode 904, Reward: 177.0


 90%|█████████ | 905/1000 [04:02<00:17,  5.59it/s]

Episode 905, Reward: 160.0


 91%|█████████ | 906/1000 [04:03<00:21,  4.28it/s]

Episode 906, Reward: 274.0


 91%|█████████ | 907/1000 [04:03<00:25,  3.61it/s]

Episode 907, Reward: 281.0


 91%|█████████ | 908/1000 [04:03<00:25,  3.56it/s]

Episode 908, Reward: 189.0


 91%|█████████ | 909/1000 [04:04<00:36,  2.52it/s]

Episode 909, Reward: 500.0


 91%|█████████ | 910/1000 [04:05<00:44,  2.04it/s]

Episode 910, Reward: 500.0


 91%|█████████ | 911/1000 [04:06<00:49,  1.79it/s]

Episode 911, Reward: 500.0


 91%|█████████ | 912/1000 [04:06<00:49,  1.78it/s]

Episode 912, Reward: 278.0


 91%|█████████▏| 913/1000 [04:07<00:53,  1.63it/s]

Episode 913, Reward: 371.0


 91%|█████████▏| 914/1000 [04:07<00:49,  1.73it/s]

Episode 914, Reward: 217.0


 92%|█████████▏| 915/1000 [04:08<00:56,  1.51it/s]

Episode 915, Reward: 500.0


 92%|█████████▏| 916/1000 [04:09<00:47,  1.75it/s]

Episode 916, Reward: 239.0


 92%|█████████▏| 917/1000 [04:09<00:43,  1.92it/s]

Episode 917, Reward: 269.0


 92%|█████████▏| 918/1000 [04:09<00:40,  2.05it/s]

Episode 918, Reward: 280.0


 92%|█████████▏| 919/1000 [04:10<00:42,  1.90it/s]

Episode 919, Reward: 422.0


 92%|█████████▏| 920/1000 [04:10<00:39,  2.02it/s]

Episode 920, Reward: 283.0


 92%|█████████▏| 921/1000 [04:11<00:39,  2.00it/s]

Episode 921, Reward: 340.0


 92%|█████████▏| 922/1000 [04:11<00:36,  2.11it/s]

Episode 922, Reward: 274.0


 92%|█████████▏| 923/1000 [04:12<00:32,  2.36it/s]

Episode 923, Reward: 205.0


 92%|█████████▏| 924/1000 [04:12<00:31,  2.44it/s]

Episode 924, Reward: 246.0


 92%|█████████▎| 925/1000 [04:12<00:29,  2.54it/s]

Episode 925, Reward: 239.0


 93%|█████████▎| 926/1000 [04:13<00:29,  2.49it/s]

Episode 926, Reward: 295.0


 93%|█████████▎| 927/1000 [04:13<00:36,  2.02it/s]

Episode 927, Reward: 480.0


 93%|█████████▎| 928/1000 [04:14<00:29,  2.42it/s]

Episode 928, Reward: 146.0


 93%|█████████▎| 929/1000 [04:14<00:30,  2.34it/s]

Episode 929, Reward: 311.0


 93%|█████████▎| 930/1000 [04:15<00:32,  2.13it/s]

Episode 930, Reward: 397.0


 93%|█████████▎| 931/1000 [04:15<00:32,  2.11it/s]

Episode 931, Reward: 320.0


 93%|█████████▎| 932/1000 [04:16<00:31,  2.19it/s]

Episode 932, Reward: 294.0


 93%|█████████▎| 933/1000 [04:16<00:30,  2.22it/s]

Episode 933, Reward: 292.0


 93%|█████████▎| 934/1000 [04:17<00:34,  1.91it/s]

Episode 934, Reward: 477.0


 94%|█████████▎| 935/1000 [04:17<00:31,  2.03it/s]

Episode 935, Reward: 275.0


 94%|█████████▎| 936/1000 [04:18<00:38,  1.67it/s]

Episode 936, Reward: 500.0


 94%|█████████▎| 937/1000 [04:19<00:44,  1.42it/s]

Episode 937, Reward: 500.0


 94%|█████████▍| 938/1000 [04:20<00:47,  1.30it/s]

Episode 938, Reward: 462.0


 94%|█████████▍| 939/1000 [04:21<00:46,  1.32it/s]

Episode 939, Reward: 500.0


 94%|█████████▍| 940/1000 [04:21<00:45,  1.33it/s]

Episode 940, Reward: 500.0


 94%|█████████▍| 941/1000 [04:22<00:38,  1.52it/s]

Episode 941, Reward: 297.0


 94%|█████████▍| 942/1000 [04:22<00:38,  1.52it/s]

Episode 942, Reward: 435.0


 94%|█████████▍| 943/1000 [04:23<00:40,  1.39it/s]

Episode 943, Reward: 500.0


 94%|█████████▍| 944/1000 [04:24<00:34,  1.64it/s]

Episode 944, Reward: 247.0


 94%|█████████▍| 945/1000 [04:24<00:36,  1.52it/s]

Episode 945, Reward: 500.0


 95%|█████████▍| 946/1000 [04:25<00:34,  1.55it/s]

Episode 946, Reward: 427.0


 95%|█████████▍| 947/1000 [04:26<00:35,  1.49it/s]

Episode 947, Reward: 500.0


 95%|█████████▍| 948/1000 [04:26<00:32,  1.58it/s]

Episode 948, Reward: 355.0


 95%|█████████▍| 949/1000 [04:27<00:29,  1.73it/s]

Episode 949, Reward: 306.0


 95%|█████████▌| 950/1000 [04:27<00:27,  1.80it/s]

Episode 950, Reward: 328.0


 95%|█████████▌| 951/1000 [04:28<00:25,  1.89it/s]

Episode 951, Reward: 317.0


 95%|█████████▌| 952/1000 [04:28<00:24,  1.95it/s]

Episode 952, Reward: 320.0


 95%|█████████▌| 953/1000 [04:29<00:25,  1.82it/s]

Episode 953, Reward: 437.0


 95%|█████████▌| 954/1000 [04:30<00:27,  1.66it/s]

Episode 954, Reward: 500.0


 96%|█████████▌| 955/1000 [04:30<00:29,  1.50it/s]

Episode 955, Reward: 418.0


 96%|█████████▌| 956/1000 [04:31<00:25,  1.69it/s]

Episode 956, Reward: 197.0


 96%|█████████▌| 957/1000 [04:32<00:31,  1.37it/s]

Episode 957, Reward: 500.0


 96%|█████████▌| 959/1000 [04:33<00:20,  1.95it/s]

Episode 958, Reward: 311.0
Episode 959, Reward: 116.0


 96%|█████████▌| 960/1000 [04:33<00:22,  1.74it/s]

Episode 960, Reward: 500.0


 96%|█████████▌| 961/1000 [04:34<00:24,  1.62it/s]

Episode 961, Reward: 500.0


 96%|█████████▌| 962/1000 [04:35<00:25,  1.52it/s]

Episode 962, Reward: 500.0


 96%|█████████▋| 963/1000 [04:35<00:25,  1.47it/s]

Episode 963, Reward: 500.0


 96%|█████████▋| 964/1000 [04:36<00:25,  1.43it/s]

Episode 964, Reward: 500.0


 96%|█████████▋| 965/1000 [04:37<00:21,  1.62it/s]

Episode 965, Reward: 291.0


 97%|█████████▋| 966/1000 [04:37<00:22,  1.54it/s]

Episode 966, Reward: 500.0


 97%|█████████▋| 967/1000 [04:38<00:20,  1.63it/s]

Episode 967, Reward: 366.0


 97%|█████████▋| 968/1000 [04:39<00:20,  1.53it/s]

Episode 968, Reward: 500.0


 97%|█████████▋| 969/1000 [04:39<00:18,  1.68it/s]

Episode 969, Reward: 311.0


 97%|█████████▋| 970/1000 [04:40<00:18,  1.61it/s]

Episode 970, Reward: 460.0


 97%|█████████▋| 971/1000 [04:40<00:17,  1.70it/s]

Episode 971, Reward: 344.0


 97%|█████████▋| 972/1000 [04:41<00:17,  1.62it/s]

Episode 972, Reward: 473.0


 97%|█████████▋| 973/1000 [04:42<00:16,  1.61it/s]

Episode 973, Reward: 435.0


 97%|█████████▋| 974/1000 [04:42<00:16,  1.53it/s]

Episode 974, Reward: 383.0


 98%|█████████▊| 975/1000 [04:43<00:15,  1.59it/s]

Episode 975, Reward: 292.0


 98%|█████████▊| 976/1000 [04:44<00:15,  1.53it/s]

Episode 976, Reward: 321.0


 98%|█████████▊| 977/1000 [04:44<00:16,  1.42it/s]

Episode 977, Reward: 500.0


 98%|█████████▊| 978/1000 [04:45<00:14,  1.50it/s]

Episode 978, Reward: 402.0


 98%|█████████▊| 979/1000 [04:46<00:13,  1.59it/s]

Episode 979, Reward: 355.0


 98%|█████████▊| 980/1000 [04:46<00:13,  1.52it/s]

Episode 980, Reward: 500.0


 98%|█████████▊| 981/1000 [04:47<00:11,  1.66it/s]

Episode 981, Reward: 315.0


 98%|█████████▊| 982/1000 [04:47<00:11,  1.61it/s]

Episode 982, Reward: 448.0


 98%|█████████▊| 983/1000 [04:48<00:11,  1.54it/s]

Episode 983, Reward: 490.0


 98%|█████████▊| 984/1000 [04:49<00:10,  1.47it/s]

Episode 984, Reward: 500.0


 98%|█████████▊| 985/1000 [04:50<00:10,  1.44it/s]

Episode 985, Reward: 490.0


 99%|█████████▊| 986/1000 [04:50<00:09,  1.46it/s]

Episode 986, Reward: 442.0


 99%|█████████▊| 987/1000 [04:51<00:08,  1.48it/s]

Episode 987, Reward: 459.0


 99%|█████████▉| 988/1000 [04:52<00:08,  1.48it/s]

Episode 988, Reward: 451.0


 99%|█████████▉| 989/1000 [04:52<00:07,  1.44it/s]

Episode 989, Reward: 500.0


 99%|█████████▉| 990/1000 [04:53<00:07,  1.41it/s]

Episode 990, Reward: 500.0


 99%|█████████▉| 991/1000 [04:54<00:06,  1.39it/s]

Episode 991, Reward: 500.0


 99%|█████████▉| 992/1000 [04:55<00:06,  1.26it/s]

Episode 992, Reward: 500.0


 99%|█████████▉| 993/1000 [04:55<00:05,  1.31it/s]

Episode 993, Reward: 314.0


 99%|█████████▉| 994/1000 [04:56<00:04,  1.24it/s]

Episode 994, Reward: 500.0


100%|█████████▉| 995/1000 [04:57<00:03,  1.28it/s]

Episode 995, Reward: 500.0


100%|█████████▉| 996/1000 [04:58<00:02,  1.37it/s]

Episode 996, Reward: 400.0


100%|█████████▉| 997/1000 [04:58<00:02,  1.36it/s]

Episode 997, Reward: 500.0


100%|█████████▉| 998/1000 [04:59<00:01,  1.36it/s]

Episode 998, Reward: 500.0


100%|█████████▉| 999/1000 [05:00<00:00,  1.37it/s]

Episode 999, Reward: 500.0


100%|██████████| 1000/1000 [05:01<00:00,  3.32it/s]

Episode 1000, Reward: 482.0
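
Per-episode rewards in the log above swing widely (episodes 909 to 912, for example, read 500, 500, 500, 278), so a moving average is usually easier to read than the raw values. The sketch below shows one way to smooth such a series. The helper name `moving_average` and the window sizes are illustrative choices, not part of the original notebook; the sample values are the last five episode rewards printed above.

```python
def moving_average(rewards, window=5):
    """Trailing mean over `window` consecutive episodes (hypothetical helper)."""
    if window < 1 or window > len(rewards):
        raise ValueError("window must be between 1 and len(rewards)")
    # Keep a running sum so each step costs O(1) instead of re-summing the window
    running = sum(rewards[:window])
    averages = [running / window]
    for i in range(window, len(rewards)):
        running += rewards[i] - rewards[i - window]
        averages.append(running / window)
    return averages

# Rewards of episodes 996-1000, copied from the training log
last_rewards = [400.0, 500.0, 500.0, 500.0, 482.0]
print(moving_average(last_rewards, window=5))
print(moving_average(last_rewards, window=2))
```

A larger window gives a smoother curve but reacts more slowly to real changes in the policy, so the window size is a trade-off to tune by eye.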





## Evaluation

Use the `choose_action` method of the trained agent to evaluate its performance.

In [19]:
env_name = 'CartPole-v1'
env = gym.make(env_name, render_mode='rgb_array')

num_episodes = 10
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Assuming a2c_model is already trained and available
model = a2c_model

frames = []
episode_rewards = []

for i in range(num_episodes):
    state = env.reset()
    if isinstance(state, tuple):  # Handle different Gym API versions
        state = state[0]

    episode_reward = 0
    done = False

    while not done:
        # Add the current frame to the list if it's the first episode
        if i == 0:
            frame = env.render()

            # Check if frame is a list and handle accordingly
            if isinstance(frame, list):
                frame = np.array(frame[0])  # Assuming the first element of the list is the frame

            # Check if the frame has the expected number of channels
            if frame.shape[-1] not in (1, 2, 3, 4):
                # Convert the frame to RGB if it has a different number of channels
                frame = frame[..., :3]  # Assuming the first 3 channels are RGB
            frames.append(frame)

        # Debugging information
        print(f"State shape: {state.shape}, State type: {type(state)}")

        state = torch.FloatTensor(state).unsqueeze(0).to(device)
        with torch.no_grad():
            action_probs, _ = model.policy_net(state)
        action = torch.multinomial(action_probs, 1).item()

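        # Handle both Gym step APIs: the older 4-tuple (obs, reward, done, info)
        # and the newer 5-tuple (obs, reward, terminated, truncated, info)
        step_result = env.step(action)
        if len(step_result) == 5:
            next_state, reward, terminated, truncated, _ = step_result
            done = terminated or truncated
        else:
            next_state, reward, done, _ = step_result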

        episode_reward += reward
        state = next_state

    episode_rewards.append(episode_reward)
    print(f"Episode {i + 1} Reward: {episode_reward}")

env.close()

episode_rewards = np.array(episode_rewards)
print(f"Average Reward over {num_episodes} episodes: {np.mean(episode_rewards)}")
imageio.mimsave('./test.mp4', frames, fps=25)



State shape: (4,), State type: <class 'numpy.ndarray'>
Episode 10 Reward: 500.0
Average Reward over 10 episodes: 476.0
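
The evaluation loop above samples each action from the policy's probabilities with `torch.multinomial`, so the reported rewards include some randomness from the policy itself. A common alternative at evaluation time is to act greedily and always take the most probable action. The sketch below contrasts the two in plain Python; `sample_action`, `greedy_action`, and the probability values are hypothetical and stand in for the output of `model.policy_net`.

```python
import random

def sample_action(action_probs, rng=random):
    """Draw an action index in proportion to its probability."""
    indices = range(len(action_probs))
    return rng.choices(indices, weights=action_probs, k=1)[0]

def greedy_action(action_probs):
    """Return the index of the most probable action."""
    return max(range(len(action_probs)), key=lambda i: action_probs[i])

# Hypothetical policy output for CartPole's two actions (push left, push right)
probs = [0.3, 0.7]
print("greedy:", greedy_action(probs))
print("sampled:", sample_action(probs, random.Random(0)))
```

Greedy selection removes sampling noise from the evaluation, while sampling measures the stochastic policy exactly as it was trained; which one to report depends on what the evaluation is meant to show.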


In [20]:
show_video('./test.mp4')