##### Copyright 2020 The TensorFlow Authors.

In [1]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Playing CartPole with the Actor-Critic Method


<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/tutorials/reinforcement_learning/actor_critic">
    <img src="https://www.tensorflow.org/images/tf_logo_32px.png" />
    View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/reinforcement_learning/actor_critic.ipynb">
    <img src="https://www.tensorflow.org/images/colab_logo_32px.png" />
    Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/docs/blob/master/site/en/tutorials/reinforcement_learning/actor_critic.ipynb">
    <img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />
    View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/docs/site/en/tutorials/reinforcement_learning/actor_critic.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

This tutorial demonstrates how to implement the [Actor-Critic](https://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf) method using TensorFlow to train an agent on the [OpenAI Gym](https://gym.openai.com/) CartPole-v0 environment.
The reader is assumed to have some familiarity with [policy gradient methods](https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf) of reinforcement learning. 


**Actor-Critic methods**

Actor-Critic methods are [temporal difference (TD) learning](https://en.wikipedia.org/wiki/Temporal_difference_learning) methods that represent the policy function independently of the value function.

A policy function (or policy) returns a probability distribution over actions that the agent can take based on the given state.
A value function determines the expected return for an agent starting at a given state and acting according to a particular policy forever after.

In the Actor-Critic method, the policy is referred to as the *actor* that proposes a set of possible actions given a state, and the estimated value function is referred to as the *critic*, which evaluates actions taken by the *actor* based on the given policy.

In this tutorial, both the *Actor* and *Critic* will be represented using one neural network with two outputs.


**CartPole-v0**

In the [CartPole-v0 environment](https://gym.openai.com/envs/CartPole-v0), a pole is attached to a cart moving along a frictionless track. 
The pole starts upright and the goal of the agent is to prevent it from falling over by applying a force of -1 or +1 to the cart. 
A reward of +1 is given for every time step the pole remains upright.
An episode ends when (1) the pole is more than 15 degrees from vertical or (2) the cart moves more than 2.4 units from the center.

<center>
  <figure>
    <img src="images/cartpole-v0.gif">
    <figcaption>
      Trained actor-critic model in CartPole-v0 environment
    </figcaption>
  </figure>
</center>


The problem is considered "solved" when the average total reward per episode reaches 195 over 100 consecutive trials.
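As a minimal sketch of this criterion, the check can be written in plain Python. The helper name `is_solved` and its default arguments are illustrative assumptions; they are not part of Gym or of this tutorial's code.

```python
from statistics import mean
from typing import Sequence


def is_solved(episode_rewards: Sequence[float],
              threshold: float = 195.0,
              window: int = 100) -> bool:
  """Checks whether the mean of the last `window` rewards reaches `threshold`."""
  if len(episode_rewards) < window:
    # Not enough episodes yet to evaluate the criterion.
    return False
  return mean(episode_rewards[-window:]) >= threshold


print(is_solved([200.0] * 100))                # True
print(is_solved([200.0] * 99))                 # False: fewer than 100 episodes
print(is_solved([100.0] * 50 + [200.0] * 50))  # False: the mean is 150
```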

## Setup

Import necessary packages and configure global settings.


In [2]:
!pip install gym
!pip install pyglet

Collecting gym
  Downloading gym-0.21.0.tar.gz (1.5 MB)
  Preparing metadata (setup.py) ... done
Collecting cloudpickle>=1.2.0
  Using cached cloudpickle-2.0.0-py3-none-any.whl (25 kB)
Building wheels for collected packages: gym
  Building wheel for gym (setup.py) ... done
  Created wheel for gym: filename=gym-0.21.0-py3-none-any.whl size=1616821 sha256=dde587224296e01de6bea20f20c31103c8a6b928d8d62d366df7d8a4fdbb2598
  Stored in directory: /home/kbuilder/.cache/pip/wheels/76/ee/9c/36bfe3e079df99acf5ae57f4e3464ff2771b34447d6d2f2148
Successfully built gym
Installing collected packages: cloudpickle, gym
Successfully installed cloudpickle-2.0.0 gym-0.21.0
Collecting pyglet
  Downloading pyglet-1.5.21-py3-none-any.whl (1.1 MB)
Installing collected packages: pyglet
Successfully installed pyglet-1.5.21


In [3]:
%%bash
# Install additional packages for visualization
sudo apt-get install -y xvfb python-opengl > /dev/null 2>&1
pip install pyvirtualdisplay > /dev/null 2>&1
pip install git+https://github.com/tensorflow/docs > /dev/null 2>&1

In [4]:
import collections
import gym
import numpy as np
import statistics
import tensorflow as tf
import tqdm

from matplotlib import pyplot as plt
from tensorflow.keras import layers
from typing import Any, List, Sequence, Tuple


# Create the environment
env = gym.make("CartPole-v0")

# Set seed for experiment reproducibility
seed = 42
env.seed(seed)
tf.random.set_seed(seed)
np.random.seed(seed)

# Small epsilon value for stabilizing division operations
eps = np.finfo(np.float32).eps.item()

## Model

The *Actor* and *Critic* will be modeled using one neural network that generates the action probabilities and critic value respectively. This tutorial uses model subclassing to define the model. 

During the forward pass, the model will take in the state as the input and will output both action probabilities and critic value $V$, which models the state-dependent [value function](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#value-functions). The goal is to train a model that chooses actions based on a policy $\pi$ that maximizes expected [return](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#reward-and-return).

For CartPole-v0, four values represent the state: cart position, cart velocity, pole angle, and pole angular velocity. The agent can take one of two actions: push the cart left (0) or right (1).

Refer to [Barto, Sutton, and Anderson (1983)](http://www.derongliu.org/adp/adp-cdrom/Barto1983.pdf), the paper describing the cart-pole problem on which this environment is based, for more information.


In [5]:
class ActorCritic(tf.keras.Model):
  """Combined actor-critic network."""

  def __init__(
      self, 
      num_actions: int, 
      num_hidden_units: int):
    """Initialize."""
    super().__init__()

    self.common = layers.Dense(num_hidden_units, activation="relu")
    self.actor = layers.Dense(num_actions)
    self.critic = layers.Dense(1)

  def call(self, inputs: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
    x = self.common(inputs)
    return self.actor(x), self.critic(x)

In [6]:
num_actions = env.action_space.n  # 2
num_hidden_units = 128

model = ActorCritic(num_actions, num_hidden_units)

## Training

To train the agent, you will follow these steps:

1. Run the agent on the environment to collect training data per episode.
2. Compute expected return at each time step.
3. Compute the loss for the combined actor-critic model.
4. Compute gradients and update network parameters.
5. Repeat 1-4 until either success criterion or max episodes has been reached.


### 1. Collecting training data

As in supervised learning, in order to train the actor-critic model, you need
to have training data. However, in order to collect such data, the model would
need to be "run" in the environment.

Training data is collected for each episode. At each time step of an episode, the model's forward pass is run on the environment's state to generate action probabilities and the critic value based on the current policy, which is parameterized by the model's weights.

The next action will be sampled from the action probabilities generated by the model, which would then be applied to the environment, causing the next state and reward to be generated.
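The sampling step can be illustrated without TensorFlow. The following is a plain-Python sketch, not the tutorial's implementation: the helper names `softmax` and `sample_action` and the example logits are assumptions made for illustration.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
  """Converts unnormalized logits into probabilities that sum to 1."""
  max_logit = max(logits)  # Subtract the max for numerical stability
  exps = [math.exp(x - max_logit) for x in logits]
  total = sum(exps)
  return [e / total for e in exps]


def sample_action(probs: List[float], rng: random.Random) -> int:
  """Draws an action index with probability given by `probs`."""
  return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(42)
probs = softmax([0.0, 1.0])  # Hypothetical logits for "left" and "right"
counts = [0, 0]
for _ in range(10_000):
  counts[sample_action(probs, rng)] += 1

print([round(p, 3) for p in probs])  # [0.269, 0.731]
print(counts)  # Roughly 27% and 73% of the samples
```

Because actions are sampled rather than chosen greedily, the less likely action is still tried from time to time, which gives the agent a chance to explore.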

This process is implemented in the `run_episode` function, which uses TensorFlow operations so that it can later be compiled into a TensorFlow graph for faster training. Note that `tf.TensorArray`s were used to support Tensor iteration on variable length arrays.

In [7]:
# Wrap OpenAI Gym's `env.step` call as an operation in a TensorFlow function.
# This would allow it to be included in a callable TensorFlow graph.

def env_step(action: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
  """Returns state, reward and done flag given an action."""

  state, reward, done, _ = env.step(action)
  return (state.astype(np.float32), 
          np.array(reward, np.int32), 
          np.array(done, np.int32))


def tf_env_step(action: tf.Tensor) -> List[tf.Tensor]:
  return tf.numpy_function(env_step, [action], 
                           [tf.float32, tf.int32, tf.int32])

In [8]:
def run_episode(
    initial_state: tf.Tensor,  
    model: tf.keras.Model, 
    max_steps: int) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor]:
  """Runs a single episode to collect training data."""

  action_probs = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
  values = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
  rewards = tf.TensorArray(dtype=tf.int32, size=0, dynamic_size=True)

  initial_state_shape = initial_state.shape
  state = initial_state

  for t in tf.range(max_steps):
    # Convert state into a batched tensor (batch size = 1)
    state = tf.expand_dims(state, 0)
  
    # Run the model to get action probabilities and critic value
    action_logits_t, value = model(state)
  
    # Sample next action from the action probability distribution
    action = tf.random.categorical(action_logits_t, 1)[0, 0]
    action_probs_t = tf.nn.softmax(action_logits_t)

    # Store critic values
    values = values.write(t, tf.squeeze(value))

    # Store probability of the action chosen (the log is taken in the loss)
    action_probs = action_probs.write(t, action_probs_t[0, action])
  
    # Apply action to the environment to get next state and reward
    state, reward, done = tf_env_step(action)
    state.set_shape(initial_state_shape)
  
    # Store reward
    rewards = rewards.write(t, reward)

    if tf.cast(done, tf.bool):
      break

  action_probs = action_probs.stack()
  values = values.stack()
  rewards = rewards.stack()
  
  return action_probs, values, rewards

### 2. Computing expected returns

The sequence of rewards for each timestep $t$, $\{r_{t}\}^{T}_{t=1}$ collected during one episode is converted into a sequence of expected returns $\{G_{t}\}^{T}_{t=1}$ in which the sum of rewards is taken from the current timestep $t$ to $T$ and each reward is multiplied with an exponentially decaying discount factor $\gamma$:

$$G_{t} = \sum^{T}_{t'=t} \gamma^{t'-t}r_{t'}$$

Since $\gamma\in(0,1)$, rewards further out from the current timestep are given less weight.

Intuitively, discounting expresses that rewards now are better than rewards later. Mathematically, it ensures that the sum of the rewards converges.

To stabilize training, the resulting sequence of returns is also standardized (i.e. to have zero mean and unit standard deviation).
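A small worked example may help make the formula concrete. This is a plain-Python sketch under stated assumptions: the helper names `discounted_returns` and `standardize`, the example rewards, and the `eps` value of `1e-8` are all chosen for illustration.

```python
from statistics import mean, pstdev
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
  """Computes G_t = sum over t' >= t of gamma^(t'-t) * r_t' for every t."""
  returns = []
  discounted_sum = 0.0
  # Walk backwards so each return reuses the one that follows it:
  # G_t = r_t + gamma * G_{t+1}
  for reward in reversed(rewards):
    discounted_sum = reward + gamma * discounted_sum
    returns.append(discounted_sum)
  returns.reverse()
  return returns


def standardize(values: List[float], eps: float = 1e-8) -> List[float]:
  """Shifts and scales `values` to zero mean and unit standard deviation."""
  mu = mean(values)
  sigma = pstdev(values)  # Population standard deviation
  return [(v - mu) / (sigma + eps) for v in values]


returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
print(returns)  # [1.75, 1.5, 1.0]
print(standardize(returns))
```

With $\gamma = 0.5$, the last step is worth only its own reward (1.0), while the first step also collects half of the next reward and a quarter of the one after that (1 + 0.5 + 0.25 = 1.75).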


In [9]:
def get_expected_return(
    rewards: tf.Tensor, 
    gamma: float, 
    standardize: bool = True) -> tf.Tensor:
  """Compute expected returns per timestep."""

  n = tf.shape(rewards)[0]
  returns = tf.TensorArray(dtype=tf.float32, size=n)

  # Start from the end of `rewards` and accumulate reward sums
  # into the `returns` array
  rewards = tf.cast(rewards[::-1], dtype=tf.float32)
  discounted_sum = tf.constant(0.0)
  discounted_sum_shape = discounted_sum.shape
  for i in tf.range(n):
    reward = rewards[i]
    discounted_sum = reward + gamma * discounted_sum
    discounted_sum.set_shape(discounted_sum_shape)
    returns = returns.write(i, discounted_sum)
  returns = returns.stack()[::-1]

  if standardize:
    returns = ((returns - tf.math.reduce_mean(returns)) / 
               (tf.math.reduce_std(returns) + eps))

  return returns

### 3. The actor-critic loss

Since a hybrid actor-critic model is used, the chosen loss function is a combination of actor and critic losses for training, as shown below:

$$L = L_{actor} + L_{critic}$$

#### Actor loss

The actor loss is based on [policy gradients with the critic as a state dependent baseline](https://www.youtube.com/watch?v=EKqxumCuAAY&t=62m23s) and computed with single-sample (per-episode) estimates.

$$L_{actor} = -\sum^{T}_{t=1} \log\pi_{\theta}(a_{t} | s_{t})[G(s_{t}, a_{t})  - V^{\pi}_{\theta}(s_{t})]$$

where:
- $T$: the number of timesteps per episode, which can vary per episode
- $s_{t}$: the state at timestep $t$
- $a_{t}$: chosen action at timestep $t$ given state $s_{t}$
- $\pi_{\theta}$: is the policy (actor) parameterized by $\theta$
- $V^{\pi}_{\theta}$: is the value function (critic) also parameterized by $\theta$
- $G = G_{t}$: the expected return for a given state, action pair at timestep $t$

A negative sign is added to the sum since the idea is to maximize the probabilities of actions yielding higher rewards by minimizing the combined loss.

<br>

##### Advantage

The $G - V$ term in our $L_{actor}$ formulation is called the [advantage](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#advantage-functions), which indicates how much better an action is given a particular state over a random action selected according to the policy $\pi$ for that state.

While it's possible to exclude a baseline, this may result in high variance during training. The nice thing about choosing the critic $V$ as a baseline is that it is trained to be as close as possible to $G$, leading to a lower variance.

In addition, without the critic, the algorithm would try to increase probabilities for actions taken on a particular state based on expected return, which may not make much of a difference if the relative probabilities between actions remain the same.

For instance, suppose that two actions for a given state would yield the same expected return. Without the critic, the algorithm would try to raise the probability of these actions based on the objective $J$. With the critic, it may turn out that there's no advantage ($G - V = 0$) and thus no benefit gained in increasing the actions' probabilities and the algorithm would set the gradients to zero.
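The effect of the baseline can be checked numerically. The sketch below evaluates the actor loss formula in plain Python for a hypothetical three-step episode; the function name `actor_loss` and all of the numbers are illustrative assumptions.

```python
import math
from typing import List


def actor_loss(action_probs: List[float],
               returns: List[float],
               values: List[float]) -> float:
  """Computes -sum_t log(pi(a_t|s_t)) * (G_t - V(s_t))."""
  return sum(
      -math.log(p) * (g - v)
      for p, g, v in zip(action_probs, returns, values))


# Hypothetical three-step episode
action_probs = [0.5, 0.5, 0.5]
returns = [2.0, 1.0, 0.5]

# Critic that predicts the returns exactly: every advantage is zero, so no
# log-probability receives any weight.
with_critic = actor_loss(action_probs, returns, values=[2.0, 1.0, 0.5])
print(with_critic)  # 0.0

# No baseline (V = 0): every log-probability is weighted by the raw return.
without_baseline = actor_loss(action_probs, returns, values=[0.0, 0.0, 0.0])
print(round(without_baseline, 3))  # 2.426
```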

<br>

#### Critic loss

Training $V$ to be as close as possible to $G$ can be set up as a regression problem with the following loss function:

$$L_{critic} = L_{\delta}(G, V^{\pi}_{\theta})$$

where $L_{\delta}$ is the [Huber loss](https://en.wikipedia.org/wiki/Huber_loss), which is less sensitive to outliers in data than squared-error loss.
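To see why the Huber loss is less sensitive to outliers, it can be compared with half the squared error on a few prediction errors. This is a plain-Python sketch; the function name `huber` is illustrative, and a threshold of `delta = 1.0` is assumed.

```python
def huber(error: float, delta: float = 1.0) -> float:
  """Huber loss for a single prediction error."""
  abs_error = abs(error)
  if abs_error <= delta:
    return 0.5 * error ** 2                 # Quadratic near zero
  return delta * (abs_error - 0.5 * delta)  # Linear for large errors


# Columns: error, Huber loss, half squared error
for error in [0.5, 1.0, 3.0]:
  print(error, huber(error), 0.5 * error ** 2)
# 0.5 0.125 0.125
# 1.0 0.5 0.5
# 3.0 2.5 4.5
```

The two losses agree for small errors, but for an error of 3.0 the Huber loss grows linearly (2.5) instead of quadratically (4.5).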


In [10]:
huber_loss = tf.keras.losses.Huber(reduction=tf.keras.losses.Reduction.SUM)

def compute_loss(
    action_probs: tf.Tensor,  
    values: tf.Tensor,  
    returns: tf.Tensor) -> tf.Tensor:
  """Computes the combined actor-critic loss."""

  advantage = returns - values

  action_log_probs = tf.math.log(action_probs)
  actor_loss = -tf.math.reduce_sum(action_log_probs * advantage)

  critic_loss = huber_loss(values, returns)

  return actor_loss + critic_loss

### 4. Defining the training step to update parameters

All of the steps above are combined into a training step that is run every episode. All steps leading up to the loss function are executed within the `tf.GradientTape` context to enable automatic differentiation.

This tutorial uses the Adam optimizer to apply the gradients to the model parameters.

The sum of the undiscounted rewards, `episode_reward`, is also computed in this step. This value will be used later on to evaluate if the success criterion is met.

The `tf.function` decorator is applied to the `train_step` function so that it can be compiled into a callable TensorFlow graph, which can speed up training by as much as 10x.


In [11]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)


@tf.function
def train_step(
    initial_state: tf.Tensor, 
    model: tf.keras.Model, 
    optimizer: tf.keras.optimizers.Optimizer, 
    gamma: float, 
    max_steps_per_episode: int) -> tf.Tensor:
  """Runs a model training step."""

  with tf.GradientTape() as tape:

    # Run the model for one episode to collect training data
    action_probs, values, rewards = run_episode(
        initial_state, model, max_steps_per_episode) 

    # Calculate expected returns
    returns = get_expected_return(rewards, gamma)

    # Convert training data to appropriate TF tensor shapes
    action_probs, values, returns = [
        tf.expand_dims(x, 1) for x in [action_probs, values, returns]] 

    # Calculating loss values to update our network
    loss = compute_loss(action_probs, values, returns)

  # Compute the gradients from the loss
  grads = tape.gradient(loss, model.trainable_variables)

  # Apply the gradients to the model's parameters
  optimizer.apply_gradients(zip(grads, model.trainable_variables))

  episode_reward = tf.math.reduce_sum(rewards)

  return episode_reward

### 5. Run the training loop

Training is executed by running the training step until either the success criterion or maximum number of episodes is reached.  

A running record of episode rewards is kept in a queue. Once 100 trials are reached, the oldest reward is removed from the left (tail) end of the queue and the newest one is added at the right (head) end. The running average reward is recomputed from this queue after every episode.
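The queue behavior can be seen in isolation with a bounded `collections.deque`. In this sketch a window of 3 (instead of 100) and the reward values are assumptions chosen to keep the output short.

```python
import collections
import statistics

# A window of 3 is used here only to keep the example short.
recent_rewards = collections.deque(maxlen=3)

for episode_reward in [10, 20, 30, 40]:
  # Once the deque is full, appending drops the oldest entry automatically.
  recent_rewards.append(episode_reward)
  running_reward = statistics.mean(recent_rewards)
  print(list(recent_rewards), running_reward)
# [10] 10
# [10, 20] 15
# [10, 20, 30] 20
# [20, 30, 40] 30
```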

Depending on your runtime, training can finish in less than a minute.

In [12]:
%%time

min_episodes_criterion = 100
max_episodes = 10000
max_steps_per_episode = 1000

# CartPole-v0 is considered solved if average reward is >= 195 over 100 
# consecutive trials
reward_threshold = 195
running_reward = 0

# Discount factor for future rewards
gamma = 0.99

# Keep the rewards of the most recent episodes
episodes_reward: collections.deque = collections.deque(maxlen=min_episodes_criterion)

with tqdm.trange(max_episodes) as t:
  for i in t:
    initial_state = tf.constant(env.reset(), dtype=tf.float32)
    episode_reward = int(train_step(
        initial_state, model, optimizer, gamma, max_steps_per_episode))
    
    episodes_reward.append(episode_reward)
    running_reward = statistics.mean(episodes_reward)
  
    t.set_description(f'Episode {i}')
    t.set_postfix(
        episode_reward=episode_reward, running_reward=running_reward)
  
    # Show average episode reward every 10 episodes
    if i % 10 == 0:
      pass # print(f'Episode {i}: average reward: {running_reward}')
  
    if running_reward > reward_threshold and i >= min_episodes_criterion:  
      break

print(f'\nSolved at episode {i}: average reward: {running_reward:.2f}!')

  0%|          | 0/10000 [00:00<?, ?it/s]

Episode 0:   0%|          | 0/10000 [00:01<?, ?it/s]

Episode 0:   0%|          | 0/10000 [00:01<?, ?it/s, episode_reward=40, running_reward=40]

Episode 0:   0%|          | 1/10000 [00:01<5:33:17,  2.00s/it, episode_reward=40, running_reward=40]

Episode 1:   0%|          | 1/10000 [00:02<5:33:17,  2.00s/it, episode_reward=40, running_reward=40]

Episode 1:   0%|          | 1/10000 [00:02<5:33:17,  2.00s/it, episode_reward=25, running_reward=32.5]

Episode 2:   0%|          | 1/10000 [00:02<5:33:17,  2.00s/it, episode_reward=25, running_reward=32.5]

Episode 2:   0%|          | 1/10000 [00:02<5:33:17,  2.00s/it, episode_reward=59, running_reward=41.3]

Episode 2:   0%|          | 3/10000 [00:02<1:35:03,  1.75it/s, episode_reward=59, running_reward=41.3]

Episode 3:   0%|          | 3/10000 [00:02<1:35:03,  1.75it/s, episode_reward=59, running_reward=41.3]

Episode 3:   0%|          | 3/10000 [00:02<1:35:03,  1.75it/s, episode_reward=24, running_reward=37]  

Episode 4:   0%|          | 3/10000 [00:02<1:35:03,  1.75it/s, episode_reward=24, running_reward=37]

Episode 4:   0%|          | 3/10000 [00:02<1:35:03,  1.75it/s, episode_reward=25, running_reward=34.6]

Episode 5:   0%|          | 3/10000 [00:02<1:35:03,  1.75it/s, episode_reward=25, running_reward=34.6]

Episode 5:   0%|          | 3/10000 [00:02<1:35:03,  1.75it/s, episode_reward=54, running_reward=37.8]

Episode 5:   0%|          | 6/10000 [00:02<42:30,  3.92it/s, episode_reward=54, running_reward=37.8]  

Episode 6:   0%|          | 6/10000 [00:02<42:30,  3.92it/s, episode_reward=54, running_reward=37.8]

Episode 6:   0%|          | 6/10000 [00:02<42:30,  3.92it/s, episode_reward=29, running_reward=36.6]

Episode 7:   0%|          | 6/10000 [00:02<42:30,  3.92it/s, episode_reward=29, running_reward=36.6]

Episode 7:   0%|          | 6/10000 [00:02<42:30,  3.92it/s, episode_reward=20, running_reward=34.5]

Episode 8:   0%|          | 6/10000 [00:02<42:30,  3.92it/s, episode_reward=20, running_reward=34.5]

Episode 8:   0%|          | 6/10000 [00:02<42:30,  3.92it/s, episode_reward=14, running_reward=32.2]

Episode 8:   0%|          | 9/10000 [00:02<25:27,  6.54it/s, episode_reward=14, running_reward=32.2]

Episode 9:   0%|          | 9/10000 [00:02<25:27,  6.54it/s, episode_reward=14, running_reward=32.2]

Episode 9:   0%|          | 9/10000 [00:02<25:27,  6.54it/s, episode_reward=44, running_reward=33.4]

Episode 10:   0%|          | 9/10000 [00:02<25:27,  6.54it/s, episode_reward=44, running_reward=33.4]

Episode 10:   0%|          | 9/10000 [00:02<25:27,  6.54it/s, episode_reward=42, running_reward=34.2]

Episode 10:   0%|          | 11/10000 [00:02<21:45,  7.65it/s, episode_reward=42, running_reward=34.2]

Episode 11:   0%|          | 11/10000 [00:02<21:45,  7.65it/s, episode_reward=42, running_reward=34.2]

Episode 11:   0%|          | 11/10000 [00:02<21:45,  7.65it/s, episode_reward=74, running_reward=37.5]

Episode 12:   0%|          | 11/10000 [00:02<21:45,  7.65it/s, episode_reward=74, running_reward=37.5]

Episode 12:   0%|          | 11/10000 [00:02<21:45,  7.65it/s, episode_reward=27, running_reward=36.7]

Episode 12:   0%|          | 13/10000 [00:02<19:24,  8.58it/s, episode_reward=27, running_reward=36.7]

Episode 13:   0%|          | 13/10000 [00:02<19:24,  8.58it/s, episode_reward=27, running_reward=36.7]

Episode 13:   0%|          | 13/10000 [00:02<19:24,  8.58it/s, episode_reward=120, running_reward=42.6]

Episode 14:   0%|          | 13/10000 [00:02<19:24,  8.58it/s, episode_reward=120, running_reward=42.6]

Episode 14:   0%|          | 13/10000 [00:02<19:24,  8.58it/s, episode_reward=29, running_reward=41.7] 

Episode 14:   0%|          | 15/10000 [00:02<19:25,  8.57it/s, episode_reward=29, running_reward=41.7]

Episode 15:   0%|          | 15/10000 [00:03<19:25,  8.57it/s, episode_reward=29, running_reward=41.7]

Episode 15:   0%|          | 15/10000 [00:03<19:25,  8.57it/s, episode_reward=13, running_reward=39.9]

Episode 16:   0%|          | 15/10000 [00:03<19:25,  8.57it/s, episode_reward=13, running_reward=39.9]

Episode 16:   0%|          | 15/10000 [00:03<19:25,  8.57it/s, episode_reward=45, running_reward=40.2]

Episode 17:   0%|          | 15/10000 [00:03<19:25,  8.57it/s, episode_reward=45, running_reward=40.2]

Episode 17:   0%|          | 15/10000 [00:03<19:25,  8.57it/s, episode_reward=41, running_reward=40.3]

Episode 17:   0%|          | 18/10000 [00:03<15:16, 10.89it/s, episode_reward=41, running_reward=40.3]

Episode 18:   0%|          | 18/10000 [00:03<15:16, 10.89it/s, episode_reward=41, running_reward=40.3]

Episode 18:   0%|          | 18/10000 [00:03<15:16, 10.89it/s, episode_reward=19, running_reward=39.2]

Episode 19:   0%|          | 18/10000 [00:03<15:16, 10.89it/s, episode_reward=19, running_reward=39.2]

Episode 19:   0%|          | 18/10000 [00:03<15:16, 10.89it/s, episode_reward=34, running_reward=38.9]

Episode 20:   0%|          | 18/10000 [00:03<15:16, 10.89it/s, episode_reward=34, running_reward=38.9]

Episode 20:   0%|          | 18/10000 [00:03<15:16, 10.89it/s, episode_reward=43, running_reward=39.1]

Episode 20:   0%|          | 21/10000 [00:03<13:01, 12.78it/s, episode_reward=43, running_reward=39.1]

Episode 21:   0%|          | 21/10000 [00:03<13:01, 12.78it/s, episode_reward=43, running_reward=39.1]

Episode 21:   0%|          | 21/10000 [00:03<13:01, 12.78it/s, episode_reward=22, running_reward=38.3]

Episode 22:   0%|          | 21/10000 [00:03<13:01, 12.78it/s, episode_reward=22, running_reward=38.3]

Episode 22:   0%|          | 21/10000 [00:03<13:01, 12.78it/s, episode_reward=23, running_reward=37.7]

Episode 23:   0%|          | 21/10000 [00:03<13:01, 12.78it/s, episode_reward=23, running_reward=37.7]

Episode 23:   0%|          | 21/10000 [00:03<13:01, 12.78it/s, episode_reward=21, running_reward=37]  

Episode 23:   0%|          | 24/10000 [00:03<10:49, 15.35it/s, episode_reward=21, running_reward=37]

Episode 24:   0%|          | 24/10000 [00:03<10:49, 15.35it/s, episode_reward=21, running_reward=37]

Episode 24:   0%|          | 24/10000 [00:03<10:49, 15.35it/s, episode_reward=35, running_reward=36.9]

Episode 25:   0%|          | 24/10000 [00:03<10:49, 15.35it/s, episode_reward=35, running_reward=36.9]

Episode 25:   0%|          | 24/10000 [00:03<10:49, 15.35it/s, episode_reward=28, running_reward=36.5]

Episode 25:   0%|          | 26/10000 [00:03<10:36, 15.67it/s, episode_reward=28, running_reward=36.5]

Episode 26:   0%|          | 26/10000 [00:03<10:36, 15.67it/s, episode_reward=28, running_reward=36.5]

Episode 26:   0%|          | 26/10000 [00:03<10:36, 15.67it/s, episode_reward=81, running_reward=38.2]

Episode 27:   0%|          | 26/10000 [00:03<10:36, 15.67it/s, episode_reward=81, running_reward=38.2]

Episode 27:   0%|          | 26/10000 [00:03<10:36, 15.67it/s, episode_reward=112, running_reward=40.8]

Episode 27:   0%|          | 28/10000 [00:03<14:43, 11.29it/s, episode_reward=112, running_reward=40.8]

Episode 28:   0%|          | 28/10000 [00:03<14:43, 11.29it/s, episode_reward=112, running_reward=40.8]

Episode 28:   0%|          | 28/10000 [00:03<14:43, 11.29it/s, episode_reward=34, running_reward=40.6] 

Episode 29:   0%|          | 28/10000 [00:04<14:43, 11.29it/s, episode_reward=34, running_reward=40.6]

Episode 29:   0%|          | 28/10000 [00:04<14:43, 11.29it/s, episode_reward=58, running_reward=41.2]

Episode 29:   0%|          | 30/10000 [00:04<14:10, 11.72it/s, episode_reward=58, running_reward=41.2]

Episode 30:   0%|          | 30/10000 [00:04<14:10, 11.72it/s, episode_reward=58, running_reward=41.2]

Episode 30:   0%|          | 30/10000 [00:04<14:10, 11.72it/s, episode_reward=51, running_reward=41.5]

Episode 31:   0%|          | 30/10000 [00:04<14:10, 11.72it/s, episode_reward=51, running_reward=41.5]

Episode 31:   0%|          | 30/10000 [00:04<14:10, 11.72it/s, episode_reward=34, running_reward=41.2]

Episode 31:   0%|          | 32/10000 [00:04<13:33, 12.25it/s, episode_reward=34, running_reward=41.2]

Episode 32:   0%|          | 32/10000 [00:04<13:33, 12.25it/s, episode_reward=34, running_reward=41.2]

Episode 32:   0%|          | 32/10000 [00:04<13:33, 12.25it/s, episode_reward=26, running_reward=40.8]

Episode 33:   0%|          | 32/10000 [00:04<13:33, 12.25it/s, episode_reward=26, running_reward=40.8]

Episode 33:   0%|          | 32/10000 [00:04<13:33, 12.25it/s, episode_reward=24, running_reward=40.3]

Episode 34:   0%|          | 32/10000 [00:04<13:33, 12.25it/s, episode_reward=24, running_reward=40.3]

Episode 34:   0%|          | 32/10000 [00:04<13:33, 12.25it/s, episode_reward=15, running_reward=39.6]

Episode 34:   0%|          | 35/10000 [00:04<10:46, 15.41it/s, episode_reward=15, running_reward=39.6]

Episode 35:   0%|          | 35/10000 [00:04<10:46, 15.41it/s, episode_reward=15, running_reward=39.6]

Episode 35:   0%|          | 35/10000 [00:04<10:46, 15.41it/s, episode_reward=82, running_reward=40.8]

Episode 36:   0%|          | 35/10000 [00:04<10:46, 15.41it/s, episode_reward=82, running_reward=40.8]

Episode 36:   0%|          | 35/10000 [00:04<10:46, 15.41it/s, episode_reward=68, running_reward=41.5]

Episode 36:   0%|          | 37/10000 [00:04<13:21, 12.43it/s, episode_reward=68, running_reward=41.5]

Episode 37:   0%|          | 37/10000 [00:04<13:21, 12.43it/s, episode_reward=68, running_reward=41.5]

Episode 37:   0%|          | 37/10000 [00:04<13:21, 12.43it/s, episode_reward=24, running_reward=41]  

Episode 38:   0%|          | 37/10000 [00:04<13:21, 12.43it/s, episode_reward=24, running_reward=41]

Episode 38:   0%|          | 37/10000 [00:04<13:21, 12.43it/s, episode_reward=25, running_reward=40.6]

Episode 39:   0%|          | 37/10000 [00:04<13:21, 12.43it/s, episode_reward=25, running_reward=40.6]

Episode 39:   0%|          | 37/10000 [00:04<13:21, 12.43it/s, episode_reward=64, running_reward=41.2]

Episode 39:   0%|          | 40/10000 [00:04<12:26, 13.34it/s, episode_reward=64, running_reward=41.2]

Episode 40:   0%|          | 40/10000 [00:04<12:26, 13.34it/s, episode_reward=64, running_reward=41.2]

Episode 40:   0%|          | 40/10000 [00:04<12:26, 13.34it/s, episode_reward=72, running_reward=42]  

Episode 41:   0%|          | 40/10000 [00:04<12:26, 13.34it/s, episode_reward=72, running_reward=42]

Episode 41:   0%|          | 40/10000 [00:04<12:26, 13.34it/s, episode_reward=60, running_reward=42.4]

Episode 41:   0%|          | 42/10000 [00:04<13:55, 11.92it/s, episode_reward=60, running_reward=42.4]

Episode 42:   0%|          | 42/10000 [00:05<13:55, 11.92it/s, episode_reward=60, running_reward=42.4]

Episode 42:   0%|          | 42/10000 [00:05<13:55, 11.92it/s, episode_reward=61, running_reward=42.8]

Episode 43:   0%|          | 42/10000 [00:05<13:55, 11.92it/s, episode_reward=61, running_reward=42.8]

Episode 43:   0%|          | 42/10000 [00:05<13:55, 11.92it/s, episode_reward=72, running_reward=43.5]

Episode 43:   0%|          | 44/10000 [00:05<15:04, 11.00it/s, episode_reward=72, running_reward=43.5]

Episode 44:   0%|          | 44/10000 [00:05<15:04, 11.00it/s, episode_reward=72, running_reward=43.5]

Episode 44:   0%|          | 44/10000 [00:05<15:04, 11.00it/s, episode_reward=47, running_reward=43.6]

Episode 45:   0%|          | 44/10000 [00:05<15:04, 11.00it/s, episode_reward=47, running_reward=43.6]

Episode 45:   0%|          | 44/10000 [00:05<15:04, 11.00it/s, episode_reward=39, running_reward=43.5]

Episode 45:   0%|          | 46/10000 [00:05<14:14, 11.66it/s, episode_reward=39, running_reward=43.5]

Episode 46:   0%|          | 46/10000 [00:05<14:14, 11.66it/s, episode_reward=39, running_reward=43.5]

Episode 46:   0%|          | 46/10000 [00:05<14:14, 11.66it/s, episode_reward=53, running_reward=43.7]

Episode 47:   0%|          | 46/10000 [00:05<14:14, 11.66it/s, episode_reward=53, running_reward=43.7]

Episode 47:   0%|          | 46/10000 [00:05<14:14, 11.66it/s, episode_reward=31, running_reward=43.4]

Episode 47:   0%|          | 48/10000 [00:05<13:30, 12.28it/s, episode_reward=31, running_reward=43.4]

Episode 48:   0%|          | 48/10000 [00:05<13:30, 12.28it/s, episode_reward=31, running_reward=43.4]

Episode 48:   0%|          | 48/10000 [00:05<13:30, 12.28it/s, episode_reward=69, running_reward=43.9]

Episode 49:   0%|          | 48/10000 [00:05<13:30, 12.28it/s, episode_reward=69, running_reward=43.9]

Episode 49:   0%|          | 48/10000 [00:05<13:30, 12.28it/s, episode_reward=18, running_reward=43.4]

Episode 49:   0%|          | 50/10000 [00:05<13:06, 12.65it/s, episode_reward=18, running_reward=43.4]

Episode 75:   1%|          | 76/10000 [00:08<15:28, 10.69it/s, episode_reward=41, running_reward=53.7]

Episode 99:   1%|          | 100/10000 [00:14<25:37,  6.44it/s, episode_reward=30, running_reward=77.2]

Episode 125:   1%|▏         | 126/10000 [00:16<14:03, 11.71it/s, episode_reward=66, running_reward=79.2]

Episode 149:   2%|▏         | 150/10000 [00:19<26:36,  6.17it/s, episode_reward=116, running_reward=89.7]

Episode 168:   2%|▏         | 168/10000 [00:22<29:41,  5.52it/s, episode_reward=200, running_reward=93.8]

Episode 168:   2%|▏         | 169/10000 [00:22<35:19,  4.64it/s, episode_reward=200, running_reward=93.8]

Episode 169:   2%|▏         | 169/10000 [00:23<35:19,  4.64it/s, episode_reward=200, running_reward=93.8]

Episode 169:   2%|▏         | 169/10000 [00:23<35:19,  4.64it/s, episode_reward=186, running_reward=94.3]

Episode 169:   2%|▏         | 170/10000 [00:23<38:45,  4.23it/s, episode_reward=186, running_reward=94.3]

Episode 170:   2%|▏         | 170/10000 [00:23<38:45,  4.23it/s, episode_reward=186, running_reward=94.3]

Episode 170:   2%|▏         | 170/10000 [00:23<38:45,  4.23it/s, episode_reward=200, running_reward=95.9]

Episode 170:   2%|▏         | 171/10000 [00:23<42:07,  3.89it/s, episode_reward=200, running_reward=95.9]

Episode 171:   2%|▏         | 171/10000 [00:23<42:07,  3.89it/s, episode_reward=200, running_reward=95.9]

Episode 171:   2%|▏         | 171/10000 [00:23<42:07,  3.89it/s, episode_reward=144, running_reward=96.3]

Episode 171:   2%|▏         | 172/10000 [00:23<40:34,  4.04it/s, episode_reward=144, running_reward=96.3]

Episode 172:   2%|▏         | 172/10000 [00:24<40:34,  4.04it/s, episode_reward=144, running_reward=96.3]

Episode 172:   2%|▏         | 172/10000 [00:24<40:34,  4.04it/s, episode_reward=138, running_reward=97.3]

Episode 172:   2%|▏         | 173/10000 [00:24<39:10,  4.18it/s, episode_reward=138, running_reward=97.3]

Episode 173:   2%|▏         | 173/10000 [00:24<39:10,  4.18it/s, episode_reward=138, running_reward=97.3]

Episode 173:   2%|▏         | 173/10000 [00:24<39:10,  4.18it/s, episode_reward=119, running_reward=98.1]

Episode 173:   2%|▏         | 174/10000 [00:24<36:45,  4.45it/s, episode_reward=119, running_reward=98.1]

Episode 174:   2%|▏         | 174/10000 [00:24<36:45,  4.45it/s, episode_reward=119, running_reward=98.1]

Episode 174:   2%|▏         | 174/10000 [00:24<36:45,  4.45it/s, episode_reward=135, running_reward=99.2]

Episode 174:   2%|▏         | 175/10000 [00:24<36:00,  4.55it/s, episode_reward=135, running_reward=99.2]

Episode 175:   2%|▏         | 175/10000 [00:24<36:00,  4.55it/s, episode_reward=135, running_reward=99.2]

Episode 175:   2%|▏         | 175/10000 [00:24<36:00,  4.55it/s, episode_reward=113, running_reward=99.9]

Episode 175:   2%|▏         | 176/10000 [00:24<34:12,  4.79it/s, episode_reward=113, running_reward=99.9]

Episode 176:   2%|▏         | 176/10000 [00:24<34:12,  4.79it/s, episode_reward=113, running_reward=99.9]

Episode 176:   2%|▏         | 176/10000 [00:24<34:12,  4.79it/s, episode_reward=106, running_reward=101] 

Episode 176:   2%|▏         | 177/10000 [00:24<32:33,  5.03it/s, episode_reward=106, running_reward=101]

Episode 177:   2%|▏         | 177/10000 [00:24<32:33,  5.03it/s, episode_reward=106, running_reward=101]

Episode 177:   2%|▏         | 177/10000 [00:24<32:33,  5.03it/s, episode_reward=85, running_reward=99.4]

Episode 177:   2%|▏         | 178/10000 [00:24<29:34,  5.54it/s, episode_reward=85, running_reward=99.4]

Episode 178:   2%|▏         | 178/10000 [00:25<29:34,  5.54it/s, episode_reward=85, running_reward=99.4]

Episode 178:   2%|▏         | 178/10000 [00:25<29:34,  5.54it/s, episode_reward=98, running_reward=98.7]

Episode 178:   2%|▏         | 179/10000 [00:25<28:25,  5.76it/s, episode_reward=98, running_reward=98.7]

Episode 179:   2%|▏         | 179/10000 [00:25<28:25,  5.76it/s, episode_reward=98, running_reward=98.7]

Episode 179:   2%|▏         | 179/10000 [00:25<28:25,  5.76it/s, episode_reward=57, running_reward=99]  

Episode 180:   2%|▏         | 179/10000 [00:25<28:25,  5.76it/s, episode_reward=57, running_reward=99]

Episode 180:   2%|▏         | 179/10000 [00:25<28:25,  5.76it/s, episode_reward=62, running_reward=97.6]

Episode 180:   2%|▏         | 181/10000 [00:25<22:52,  7.16it/s, episode_reward=62, running_reward=97.6]

Episode 181:   2%|▏         | 181/10000 [00:25<22:52,  7.16it/s, episode_reward=62, running_reward=97.6]

Episode 181:   2%|▏         | 181/10000 [00:25<22:52,  7.16it/s, episode_reward=51, running_reward=97.1]

Episode 182:   2%|▏         | 181/10000 [00:25<22:52,  7.16it/s, episode_reward=51, running_reward=97.1]

Episode 182:   2%|▏         | 181/10000 [00:25<22:52,  7.16it/s, episode_reward=92, running_reward=96.2]

Episode 182:   2%|▏         | 183/10000 [00:25<21:44,  7.53it/s, episode_reward=92, running_reward=96.2]

Episode 183:   2%|▏         | 183/10000 [00:25<21:44,  7.53it/s, episode_reward=92, running_reward=96.2]

Episode 183:   2%|▏         | 183/10000 [00:25<21:44,  7.53it/s, episode_reward=101, running_reward=95.2]

Episode 183:   2%|▏         | 184/10000 [00:25<22:56,  7.13it/s, episode_reward=101, running_reward=95.2]

Episode 184:   2%|▏         | 184/10000 [00:25<22:56,  7.13it/s, episode_reward=101, running_reward=95.2]

Episode 184:   2%|▏         | 184/10000 [00:25<22:56,  7.13it/s, episode_reward=51, running_reward=93.7] 

Episode 185:   2%|▏         | 184/10000 [00:25<22:56,  7.13it/s, episode_reward=51, running_reward=93.7]

Episode 185:   2%|▏         | 184/10000 [00:25<22:56,  7.13it/s, episode_reward=61, running_reward=92.3]

Episode 185:   2%|▏         | 186/10000 [00:25<19:56,  8.20it/s, episode_reward=61, running_reward=92.3]

Episode 186:   2%|▏         | 186/10000 [00:26<19:56,  8.20it/s, episode_reward=61, running_reward=92.3]

Episode 186:   2%|▏         | 186/10000 [00:26<19:56,  8.20it/s, episode_reward=99, running_reward=91.3]

Episode 186:   2%|▏         | 187/10000 [00:26<21:20,  7.66it/s, episode_reward=99, running_reward=91.3]

Episode 187:   2%|▏         | 187/10000 [00:26<21:20,  7.66it/s, episode_reward=99, running_reward=91.3]

Episode 187:   2%|▏         | 187/10000 [00:26<21:20,  7.66it/s, episode_reward=87, running_reward=90.2]

Episode 187:   2%|▏         | 188/10000 [00:26<22:14,  7.35it/s, episode_reward=87, running_reward=90.2]

Episode 188:   2%|▏         | 188/10000 [00:26<22:14,  7.35it/s, episode_reward=87, running_reward=90.2]

Episode 188:   2%|▏         | 188/10000 [00:26<22:14,  7.35it/s, episode_reward=75, running_reward=88.9]

Episode 188:   2%|▏         | 189/10000 [00:26<21:50,  7.49it/s, episode_reward=75, running_reward=88.9]

Episode 189:   2%|▏         | 189/10000 [00:26<21:50,  7.49it/s, episode_reward=75, running_reward=88.9]

Episode 189:   2%|▏         | 189/10000 [00:26<21:50,  7.49it/s, episode_reward=75, running_reward=88.4]

Episode 189:   2%|▏         | 190/10000 [00:26<21:49,  7.49it/s, episode_reward=75, running_reward=88.4]

Episode 190:   2%|▏         | 190/10000 [00:26<21:49,  7.49it/s, episode_reward=75, running_reward=88.4]

Episode 190:   2%|▏         | 190/10000 [00:26<21:49,  7.49it/s, episode_reward=99, running_reward=87.5]

Episode 190:   2%|▏         | 191/10000 [00:26<22:57,  7.12it/s, episode_reward=99, running_reward=87.5]

Episode 191:   2%|▏         | 191/10000 [00:26<22:57,  7.12it/s, episode_reward=99, running_reward=87.5]

Episode 191:   2%|▏         | 191/10000 [00:26<22:57,  7.12it/s, episode_reward=94, running_reward=86.5]

Episode 191:   2%|▏         | 192/10000 [00:26<23:31,  6.95it/s, episode_reward=94, running_reward=86.5]

Episode 192:   2%|▏         | 192/10000 [00:26<23:31,  6.95it/s, episode_reward=94, running_reward=86.5]

Episode 192:   2%|▏         | 192/10000 [00:26<23:31,  6.95it/s, episode_reward=95, running_reward=85.5]

Episode 192:   2%|▏         | 193/10000 [00:26<23:40,  6.90it/s, episode_reward=95, running_reward=85.5]

Episode 193:   2%|▏         | 193/10000 [00:27<23:40,  6.90it/s, episode_reward=95, running_reward=85.5]

Episode 193:   2%|▏         | 193/10000 [00:27<23:40,  6.90it/s, episode_reward=92, running_reward=85]  

Episode 193:   2%|▏         | 194/10000 [00:27<23:50,  6.86it/s, episode_reward=92, running_reward=85]

Episode 194:   2%|▏         | 194/10000 [00:27<23:50,  6.86it/s, episode_reward=92, running_reward=85]

Episode 194:   2%|▏         | 194/10000 [00:27<23:50,  6.86it/s, episode_reward=106, running_reward=84.1]

Episode 194:   2%|▏         | 195/10000 [00:27<24:48,  6.59it/s, episode_reward=106, running_reward=84.1]

Episode 195:   2%|▏         | 195/10000 [00:27<24:48,  6.59it/s, episode_reward=106, running_reward=84.1]

Episode 195:   2%|▏         | 195/10000 [00:27<24:48,  6.59it/s, episode_reward=115, running_reward=84]  

Episode 195:   2%|▏         | 196/10000 [00:27<26:08,  6.25it/s, episode_reward=115, running_reward=84]

Episode 196:   2%|▏         | 196/10000 [00:27<26:08,  6.25it/s, episode_reward=115, running_reward=84]

Episode 196:   2%|▏         | 196/10000 [00:27<26:08,  6.25it/s, episode_reward=123, running_reward=83.7]

Episode 196:   2%|▏         | 197/10000 [00:27<27:35,  5.92it/s, episode_reward=123, running_reward=83.7]

Episode 197:   2%|▏         | 197/10000 [00:27<27:35,  5.92it/s, episode_reward=123, running_reward=83.7]

Episode 197:   2%|▏         | 197/10000 [00:27<27:35,  5.92it/s, episode_reward=133, running_reward=83.7]

Episode 197:   2%|▏         | 198/10000 [00:27<29:31,  5.53it/s, episode_reward=133, running_reward=83.7]

Episode 198:   2%|▏         | 198/10000 [00:27<29:31,  5.53it/s, episode_reward=133, running_reward=83.7]

Episode 198:   2%|▏         | 198/10000 [00:27<29:31,  5.53it/s, episode_reward=75, running_reward=84.1] 

Episode 198:   2%|▏         | 199/10000 [00:27<26:40,  6.12it/s, episode_reward=75, running_reward=84.1]

Episode 199:   2%|▏         | 199/10000 [00:28<26:40,  6.12it/s, episode_reward=75, running_reward=84.1]

Episode 199:   2%|▏         | 199/10000 [00:28<26:40,  6.12it/s, episode_reward=122, running_reward=85] 

Episode 199:   2%|▏         | 200/10000 [00:28<28:04,  5.82it/s, episode_reward=122, running_reward=85]

Episode 200:   2%|▏         | 200/10000 [00:28<28:04,  5.82it/s, episode_reward=122, running_reward=85]

Episode 200:   2%|▏         | 200/10000 [00:28<28:04,  5.82it/s, episode_reward=110, running_reward=85.3]

Episode 200:   2%|▏         | 201/10000 [00:28<28:14,  5.78it/s, episode_reward=110, running_reward=85.3]

Episode 201:   2%|▏         | 201/10000 [00:28<28:14,  5.78it/s, episode_reward=110, running_reward=85.3]

Episode 201:   2%|▏         | 201/10000 [00:28<28:14,  5.78it/s, episode_reward=92, running_reward=85.9] 

Episode 201:   2%|▏         | 202/10000 [00:28<27:27,  5.95it/s, episode_reward=92, running_reward=85.9]

Episode 202:   2%|▏         | 202/10000 [00:28<27:27,  5.95it/s, episode_reward=92, running_reward=85.9]

Episode 202:   2%|▏         | 202/10000 [00:28<27:27,  5.95it/s, episode_reward=147, running_reward=86.7]

Episode 202:   2%|▏         | 203/10000 [00:28<30:25,  5.37it/s, episode_reward=147, running_reward=86.7]

Episode 203:   2%|▏         | 203/10000 [00:28<30:25,  5.37it/s, episode_reward=147, running_reward=86.7]

Episode 203:   2%|▏         | 203/10000 [00:28<30:25,  5.37it/s, episode_reward=190, running_reward=88.3]

Episode 203:   2%|▏         | 204/10000 [00:28<35:42,  4.57it/s, episode_reward=190, running_reward=88.3]

Episode 204:   2%|▏         | 204/10000 [00:29<35:42,  4.57it/s, episode_reward=190, running_reward=88.3]

Episode 204:   2%|▏         | 204/10000 [00:29<35:42,  4.57it/s, episode_reward=200, running_reward=90]  

Episode 204:   2%|▏         | 205/10000 [00:29<40:16,  4.05it/s, episode_reward=200, running_reward=90]

Episode 205:   2%|▏         | 205/10000 [00:29<40:16,  4.05it/s, episode_reward=200, running_reward=90]

Episode 205:   2%|▏         | 205/10000 [00:29<40:16,  4.05it/s, episode_reward=200, running_reward=91.6]

Episode 205:   2%|▏         | 206/10000 [00:29<43:26,  3.76it/s, episode_reward=200, running_reward=91.6]

Episode 206:   2%|▏         | 206/10000 [00:29<43:26,  3.76it/s, episode_reward=200, running_reward=91.6]

Episode 206:   2%|▏         | 206/10000 [00:29<43:26,  3.76it/s, episode_reward=148, running_reward=92.8]

Episode 206:   2%|▏         | 207/10000 [00:29<41:29,  3.93it/s, episode_reward=148, running_reward=92.8]

Episode 207:   2%|▏         | 207/10000 [00:30<41:29,  3.93it/s, episode_reward=148, running_reward=92.8]

Episode 207:   2%|▏         | 207/10000 [00:30<41:29,  3.93it/s, episode_reward=154, running_reward=94]  

Episode 207:   2%|▏         | 208/10000 [00:30<40:39,  4.01it/s, episode_reward=154, running_reward=94]

Episode 208:   2%|▏         | 208/10000 [00:30<40:39,  4.01it/s, episode_reward=154, running_reward=94]

Episode 208:   2%|▏         | 208/10000 [00:30<40:39,  4.01it/s, episode_reward=168, running_reward=95.4]

Episode 208:   2%|▏         | 209/10000 [00:30<41:21,  3.95it/s, episode_reward=168, running_reward=95.4]

Episode 209:   2%|▏         | 209/10000 [00:30<41:21,  3.95it/s, episode_reward=168, running_reward=95.4]

Episode 209:   2%|▏         | 209/10000 [00:30<41:21,  3.95it/s, episode_reward=192, running_reward=96.9]

Episode 209:   2%|▏         | 210/10000 [00:30<43:33,  3.75it/s, episode_reward=192, running_reward=96.9]

Episode 210:   2%|▏         | 210/10000 [00:30<43:33,  3.75it/s, episode_reward=192, running_reward=96.9]

Episode 210:   2%|▏         | 210/10000 [00:30<43:33,  3.75it/s, episode_reward=200, running_reward=98.5]

Episode 210:   2%|▏         | 211/10000 [00:30<45:37,  3.58it/s, episode_reward=200, running_reward=98.5]

Episode 211:   2%|▏         | 211/10000 [00:31<45:37,  3.58it/s, episode_reward=200, running_reward=98.5]

Episode 211:   2%|▏         | 211/10000 [00:31<45:37,  3.58it/s, episode_reward=200, running_reward=100] 

Episode 211:   2%|▏         | 212/10000 [00:31<46:50,  3.48it/s, episode_reward=200, running_reward=100]

Episode 212:   2%|▏         | 212/10000 [00:31<46:50,  3.48it/s, episode_reward=200, running_reward=100]

Episode 212:   2%|▏         | 212/10000 [00:31<46:50,  3.48it/s, episode_reward=200, running_reward=102]

Episode 212:   2%|▏         | 213/10000 [00:31<48:19,  3.38it/s, episode_reward=200, running_reward=102]

Episode 213:   2%|▏         | 213/10000 [00:31<48:19,  3.38it/s, episode_reward=200, running_reward=102]

Episode 213:   2%|▏         | 213/10000 [00:31<48:19,  3.38it/s, episode_reward=200, running_reward=104]

Episode 213:   2%|▏         | 214/10000 [00:31<49:08,  3.32it/s, episode_reward=200, running_reward=104]

Episode 214:   2%|▏         | 214/10000 [00:32<49:08,  3.32it/s, episode_reward=200, running_reward=104]

Episode 214:   2%|▏         | 214/10000 [00:32<49:08,  3.32it/s, episode_reward=200, running_reward=105]

Episode 214:   2%|▏         | 215/10000 [00:32<49:15,  3.31it/s, episode_reward=200, running_reward=105]

Episode 215:   2%|▏         | 215/10000 [00:32<49:15,  3.31it/s, episode_reward=200, running_reward=105]

Episode 215:   2%|▏         | 215/10000 [00:32<49:15,  3.31it/s, episode_reward=189, running_reward=106]

Episode 215:   2%|▏         | 216/10000 [00:32<48:48,  3.34it/s, episode_reward=189, running_reward=106]

Episode 216:   2%|▏         | 216/10000 [00:32<48:48,  3.34it/s, episode_reward=189, running_reward=106]

Episode 216:   2%|▏         | 216/10000 [00:32<48:48,  3.34it/s, episode_reward=197, running_reward=108]

Episode 216:   2%|▏         | 217/10000 [00:32<49:02,  3.32it/s, episode_reward=197, running_reward=108]

Episode 217:   2%|▏         | 217/10000 [00:33<49:02,  3.32it/s, episode_reward=197, running_reward=108]

Episode 217:   2%|▏         | 217/10000 [00:33<49:02,  3.32it/s, episode_reward=200, running_reward=110]

Episode 217:   2%|▏         | 218/10000 [00:33<49:28,  3.30it/s, episode_reward=200, running_reward=110]

Episode 218:   2%|▏         | 218/10000 [00:33<49:28,  3.30it/s, episode_reward=200, running_reward=110]

Episode 218:   2%|▏         | 218/10000 [00:33<49:28,  3.30it/s, episode_reward=200, running_reward=111]

Episode 218:   2%|▏         | 219/10000 [00:33<49:47,  3.27it/s, episode_reward=200, running_reward=111]

Episode 219:   2%|▏         | 219/10000 [00:33<49:47,  3.27it/s, episode_reward=200, running_reward=111]

Episode 219:   2%|▏         | 219/10000 [00:33<49:47,  3.27it/s, episode_reward=200, running_reward=113]

Episode 219:   2%|▏         | 220/10000 [00:33<49:56,  3.26it/s, episode_reward=200, running_reward=113]

Episode 220:   2%|▏         | 220/10000 [00:33<49:56,  3.26it/s, episode_reward=200, running_reward=113]

Episode 220:   2%|▏         | 220/10000 [00:33<49:56,  3.26it/s, episode_reward=200, running_reward=114]

Episode 220:   2%|▏         | 221/10000 [00:33<49:47,  3.27it/s, episode_reward=200, running_reward=114]

Episode 221:   2%|▏         | 221/10000 [00:34<49:47,  3.27it/s, episode_reward=200, running_reward=114]

Episode 221:   2%|▏         | 221/10000 [00:34<49:47,  3.27it/s, episode_reward=200, running_reward=116]

Episode 221:   2%|▏         | 222/10000 [00:34<49:41,  3.28it/s, episode_reward=200, running_reward=116]

Episode 222:   2%|▏         | 222/10000 [00:34<49:41,  3.28it/s, episode_reward=200, running_reward=116]

Episode 222:   2%|▏         | 222/10000 [00:34<49:41,  3.28it/s, episode_reward=200, running_reward=117]

Episode 222:   2%|▏         | 223/10000 [00:34<49:56,  3.26it/s, episode_reward=200, running_reward=117]

Episode 223:   2%|▏         | 223/10000 [00:34<49:56,  3.26it/s, episode_reward=200, running_reward=117]

Episode 223:   2%|▏         | 223/10000 [00:34<49:56,  3.26it/s, episode_reward=200, running_reward=119]

Episode 223:   2%|▏         | 224/10000 [00:34<50:16,  3.24it/s, episode_reward=200, running_reward=119]

Episode 224:   2%|▏         | 224/10000 [00:35<50:16,  3.24it/s, episode_reward=200, running_reward=119]

Episode 224:   2%|▏         | 224/10000 [00:35<50:16,  3.24it/s, episode_reward=200, running_reward=120]

Episode 224:   2%|▏         | 225/10000 [00:35<49:58,  3.26it/s, episode_reward=200, running_reward=120]

Episode 225:   2%|▏         | 225/10000 [00:35<49:58,  3.26it/s, episode_reward=200, running_reward=120]

Episode 225:   2%|▏         | 225/10000 [00:35<49:58,  3.26it/s, episode_reward=200, running_reward=121]

Episode 225:   2%|▏         | 226/10000 [00:35<50:24,  3.23it/s, episode_reward=200, running_reward=121]

Episode 226:   2%|▏         | 226/10000 [00:35<50:24,  3.23it/s, episode_reward=200, running_reward=121]

Episode 226:   2%|▏         | 226/10000 [00:35<50:24,  3.23it/s, episode_reward=200, running_reward=122]

Episode 226:   2%|▏         | 227/10000 [00:35<50:24,  3.23it/s, episode_reward=200, running_reward=122]

Episode 227:   2%|▏         | 227/10000 [00:36<50:24,  3.23it/s, episode_reward=200, running_reward=122]

Episode 227:   2%|▏         | 227/10000 [00:36<50:24,  3.23it/s, episode_reward=200, running_reward=124]

Episode 227:   2%|▏         | 228/10000 [00:36<50:10,  3.25it/s, episode_reward=200, running_reward=124]

Episode 228:   2%|▏         | 228/10000 [00:36<50:10,  3.25it/s, episode_reward=200, running_reward=124]

Episode 228:   2%|▏         | 228/10000 [00:36<50:10,  3.25it/s, episode_reward=200, running_reward=125]

Episode 228:   2%|▏         | 229/10000 [00:36<50:05,  3.25it/s, episode_reward=200, running_reward=125]

Episode 229:   2%|▏         | 229/10000 [00:36<50:05,  3.25it/s, episode_reward=200, running_reward=125]

Episode 229:   2%|▏         | 229/10000 [00:36<50:05,  3.25it/s, episode_reward=200, running_reward=126]

Episode 229:   2%|▏         | 230/10000 [00:36<50:20,  3.23it/s, episode_reward=200, running_reward=126]

Episode 230:   2%|▏         | 230/10000 [00:37<50:20,  3.23it/s, episode_reward=200, running_reward=126]

Episode 230:   2%|▏         | 230/10000 [00:37<50:20,  3.23it/s, episode_reward=183, running_reward=128]

Episode 230:   2%|▏         | 231/10000 [00:37<49:10,  3.31it/s, episode_reward=183, running_reward=128]

Episode 231:   2%|▏         | 231/10000 [00:37<49:10,  3.31it/s, episode_reward=183, running_reward=128]

Episode 231:   2%|▏         | 231/10000 [00:37<49:10,  3.31it/s, episode_reward=167, running_reward=128]

Episode 231:   2%|▏         | 232/10000 [00:37<47:04,  3.46it/s, episode_reward=167, running_reward=128]

Episode 232:   2%|▏         | 232/10000 [00:37<47:04,  3.46it/s, episode_reward=167, running_reward=128]

Episode 232:   2%|▏         | 232/10000 [00:37<47:04,  3.46it/s, episode_reward=134, running_reward=128]

Episode 232:   2%|▏         | 233/10000 [00:37<43:21,  3.75it/s, episode_reward=134, running_reward=128]

Episode 233:   2%|▏         | 233/10000 [00:37<43:21,  3.75it/s, episode_reward=134, running_reward=128]

Episode 233:   2%|▏         | 233/10000 [00:37<43:21,  3.75it/s, episode_reward=141, running_reward=129]

Episode 233:   2%|▏         | 234/10000 [00:37<41:28,  3.92it/s, episode_reward=141, running_reward=129]

Episode 234:   2%|▏         | 234/10000 [00:37<41:28,  3.92it/s, episode_reward=141, running_reward=129]

Episode 234:   2%|▏         | 234/10000 [00:37<41:28,  3.92it/s, episode_reward=125, running_reward=128]

Episode 234:   2%|▏         | 235/10000 [00:37<38:39,  4.21it/s, episode_reward=125, running_reward=128]

Episode 235:   2%|▏         | 235/10000 [00:38<38:39,  4.21it/s, episode_reward=125, running_reward=128]

Episode 235:   2%|▏         | 235/10000 [00:38<38:39,  4.21it/s, episode_reward=153, running_reward=129]

Episode 235:   2%|▏         | 236/10000 [00:38<38:37,  4.21it/s, episode_reward=153, running_reward=129]

Episode 236:   2%|▏         | 236/10000 [00:38<38:37,  4.21it/s, episode_reward=153, running_reward=129]

Episode 236:   2%|▏         | 236/10000 [00:38<38:37,  4.21it/s, episode_reward=124, running_reward=129]

Episode 236:   2%|▏         | 237/10000 [00:38<36:23,  4.47it/s, episode_reward=124, running_reward=129]

Episode 237:   2%|▏         | 237/10000 [00:38<36:23,  4.47it/s, episode_reward=124, running_reward=129]

Episode 237:   2%|▏         | 237/10000 [00:38<36:23,  4.47it/s, episode_reward=113, running_reward=129]

Episode 237:   2%|▏         | 238/10000 [00:38<34:15,  4.75it/s, episode_reward=113, running_reward=129]

Episode 238:   2%|▏         | 238/10000 [00:38<34:15,  4.75it/s, episode_reward=113, running_reward=129]

Episode 238:   2%|▏         | 238/10000 [00:38<34:15,  4.75it/s, episode_reward=125, running_reward=129]

Episode 238:   2%|▏         | 239/10000 [00:38<33:43,  4.82it/s, episode_reward=125, running_reward=129]

Episode 239:   2%|▏         | 239/10000 [00:38<33:43,  4.82it/s, episode_reward=125, running_reward=129]

Episode 239:   2%|▏         | 239/10000 [00:38<33:43,  4.82it/s, episode_reward=112, running_reward=130]

Episode 239:   2%|▏         | 240/10000 [00:38<32:21,  5.03it/s, episode_reward=112, running_reward=130]

Episode 240:   2%|▏         | 240/10000 [00:39<32:21,  5.03it/s, episode_reward=112, running_reward=130]

Episode 240:   2%|▏         | 240/10000 [00:39<32:21,  5.03it/s, episode_reward=118, running_reward=130]

Episode 240:   2%|▏         | 241/10000 [00:39<31:35,  5.15it/s, episode_reward=118, running_reward=130]

Episode 241:   2%|▏         | 241/10000 [00:39<31:35,  5.15it/s, episode_reward=118, running_reward=130]

Episode 241:   2%|▏         | 241/10000 [00:39<31:35,  5.15it/s, episode_reward=124, running_reward=130]

Episode 241:   2%|▏         | 242/10000 [00:39<32:13,  5.05it/s, episode_reward=124, running_reward=130]

Episode 242:   2%|▏         | 242/10000 [00:39<32:13,  5.05it/s, episode_reward=124, running_reward=130]

Episode 242:   2%|▏         | 242/10000 [00:39<32:13,  5.05it/s, episode_reward=119, running_reward=131]

Episode 242:   2%|▏         | 243/10000 [00:39<32:22,  5.02it/s, episode_reward=119, running_reward=131]

Episode 243:   2%|▏         | 243/10000 [00:39<32:22,  5.02it/s, episode_reward=119, running_reward=131]

Episode 243:   2%|▏         | 243/10000 [00:39<32:22,  5.02it/s, episode_reward=113, running_reward=131]

Episode 243:   2%|▏         | 244/10000 [00:39<31:47,  5.11it/s, episode_reward=113, running_reward=131]

Episode 244:   2%|▏         | 244/10000 [00:39<31:47,  5.11it/s, episode_reward=113, running_reward=131]

Episode 244:   2%|▏         | 244/10000 [00:39<31:47,  5.11it/s, episode_reward=115, running_reward=131]

Episode 244:   2%|▏         | 245/10000 [00:39<31:04,  5.23it/s, episode_reward=115, running_reward=131]

Episode 245:   2%|▏         | 245/10000 [00:39<31:04,  5.23it/s, episode_reward=115, running_reward=131]

Episode 245:   2%|▏         | 245/10000 [00:39<31:04,  5.23it/s, episode_reward=23, running_reward=131] 

Episode 246:   2%|▏         | 245/10000 [00:39<31:04,  5.23it/s, episode_reward=23, running_reward=131]

Episode 246:   2%|▏         | 245/10000 [00:39<31:04,  5.23it/s, episode_reward=18, running_reward=130]

Episode 247:   2%|▏         | 245/10000 [00:40<31:04,  5.23it/s, episode_reward=18, running_reward=130]

Episode 247:   2%|▏         | 245/10000 [00:40<31:04,  5.23it/s, episode_reward=82, running_reward=130]

Episode 247:   2%|▏         | 248/10000 [00:40<19:41,  8.26it/s, episode_reward=82, running_reward=130]

Episode 248:   2%|▏         | 248/10000 [00:40<19:41,  8.26it/s, episode_reward=82, running_reward=130]

Episode 248:   2%|▏         | 248/10000 [00:40<19:41,  8.26it/s, episode_reward=117, running_reward=130]

Episode 248:   2%|▏         | 249/10000 [00:40<21:54,  7.42it/s, episode_reward=117, running_reward=130]

Episode 249:   2%|▏         | 249/10000 [00:40<21:54,  7.42it/s, episode_reward=117, running_reward=130]

Episode 249:   2%|▏         | 249/10000 [00:40<21:54,  7.42it/s, episode_reward=21, running_reward=129] 

Episode 250:   2%|▏         | 249/10000 [00:40<21:54,  7.42it/s, episode_reward=21, running_reward=129]

Episode 250:   2%|▏         | 249/10000 [00:40<21:54,  7.42it/s, episode_reward=128, running_reward=130]

Episode 250:   3%|▎         | 251/10000 [00:40<20:45,  7.83it/s, episode_reward=128, running_reward=130]

Episode 251:   3%|▎         | 251/10000 [00:40<20:45,  7.83it/s, episode_reward=128, running_reward=130]

Episode 251:   3%|▎         | 251/10000 [00:40<20:45,  7.83it/s, episode_reward=123, running_reward=130]

Episode 251:   3%|▎         | 252/10000 [00:40<23:09,  7.02it/s, episode_reward=123, running_reward=130]

Episode 252:   3%|▎         | 252/10000 [00:40<23:09,  7.02it/s, episode_reward=123, running_reward=130]

Episode 252:   3%|▎         | 252/10000 [00:40<23:09,  7.02it/s, episode_reward=133, running_reward=130]

Episode 252:   3%|▎         | 253/10000 [00:40<25:46,  6.30it/s, episode_reward=133, running_reward=130]

Episode 253:   3%|▎         | 253/10000 [00:41<25:46,  6.30it/s, episode_reward=133, running_reward=130]

Episode 253:   3%|▎         | 253/10000 [00:41<25:46,  6.30it/s, episode_reward=130, running_reward=130]

Episode 253:   3%|▎         | 254/10000 [00:41<27:35,  5.89it/s, episode_reward=130, running_reward=130]

Episode 254:   3%|▎         | 254/10000 [00:41<27:35,  5.89it/s, episode_reward=130, running_reward=130]

Episode 254:   3%|▎         | 254/10000 [00:41<27:35,  5.89it/s, episode_reward=145, running_reward=130]

Episode 254:   3%|▎         | 255/10000 [00:41<30:02,  5.41it/s, episode_reward=145, running_reward=130]

Episode 255:   3%|▎         | 255/10000 [00:41<30:02,  5.41it/s, episode_reward=145, running_reward=130]

Episode 255:   3%|▎         | 255/10000 [00:41<30:02,  5.41it/s, episode_reward=121, running_reward=131]

Episode 255:   3%|▎         | 256/10000 [00:41<30:16,  5.36it/s, episode_reward=121, running_reward=131]

Episode 256:   3%|▎         | 256/10000 [00:41<30:16,  5.36it/s, episode_reward=121, running_reward=131]

Episode 256:   3%|▎         | 256/10000 [00:41<30:16,  5.36it/s, episode_reward=128, running_reward=131]

Episode 256:   3%|▎         | 257/10000 [00:41<30:57,  5.24it/s, episode_reward=128, running_reward=131]

Episode 257:   3%|▎         | 257/10000 [00:41<30:57,  5.24it/s, episode_reward=128, running_reward=131]

Episode 257:   3%|▎         | 257/10000 [00:41<30:57,  5.24it/s, episode_reward=116, running_reward=132]

Episode 361:   4%|▎         | 361/10000 [01:13<32:33,  4.93it/s, episode_reward=182, running_reward=195]


Solved at episode 361: average reward: 195.14!
CPU times: user 2min 46s, sys: 35.4 s, total: 3min 21s
Wall time: 1min 13s





## Visualization

After training, it is useful to see how the model behaves in the environment. Run the cells below to generate a GIF animation of a single episode played by the trained model. Note that OpenAI Gym needs additional packages installed to render the environment's images correctly in Colab.

In [13]:
# Render an episode and save as a GIF file

from IPython import display as ipythondisplay
from PIL import Image
from pyvirtualdisplay import Display


display = Display(visible=0, size=(400, 300))
display.start()


def render_episode(env: gym.Env, model: tf.keras.Model, max_steps: int):
  # Reset before rendering so the first frame shows this episode's
  # starting state rather than whatever state the env was left in
  state = tf.constant(env.reset(), dtype=tf.float32)
  screen = env.render(mode='rgb_array')
  images = [Image.fromarray(screen)]

  for i in range(1, max_steps + 1):
    # Add a batch dimension: the model expects shape (1, num_observations)
    state = tf.expand_dims(state, 0)
    action_probs, _ = model(state)
    # Pick the highest-scoring action (greedy) instead of sampling
    action = np.argmax(np.squeeze(action_probs))

    state, _, done, _ = env.step(action)
    state = tf.constant(state, dtype=tf.float32)

    # Render screen every 10 steps
    if i % 10 == 0:
      screen = env.render(mode='rgb_array')
      images.append(Image.fromarray(screen))

    if done:
      break

  return images


# Save GIF image
images = render_episode(env, model, max_steps_per_episode)
image_file = 'cartpole-v0.gif'
# loop=0: loop forever, duration=1: play each frame for 1ms
images[0].save(
    image_file, save_all=True, append_images=images[1:], loop=0, duration=1)
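
The `render_episode` function picks the highest-scoring action at every step with `np.argmax`, whereas policy-gradient training usually samples actions from the policy's distribution. The following is a minimal, standard-library-only sketch of that difference. The logits are made-up values for a two-action environment, and the helper names `softmax`, `greedy_action`, and `sample_action` are illustrative assumptions, not part of the tutorial's code.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
  """Convert raw scores to probabilities (numerically stable)."""
  m = max(logits)
  exps = [math.exp(x - m) for x in logits]
  total = sum(exps)
  return [e / total for e in exps]


def greedy_action(scores: List[float]) -> int:
  """Index of the largest score, like np.argmax on a 1-D array."""
  return max(range(len(scores)), key=lambda i: scores[i])


def sample_action(logits: List[float], rng: random.Random) -> int:
  """Draw an action index in proportion to its softmax probability."""
  probs = softmax(logits)
  return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical actor output for one CartPole state (two actions).
logits = [0.2, 1.3]
probs = softmax(logits)

# Softmax preserves ordering, so the greedy choice is the same
# whether the model emits logits or probabilities.
assert greedy_action(logits) == greedy_action(probs)

rng = random.Random(0)
samples = [sample_action(logits, rng) for _ in range(1000)]
print("greedy action:", greedy_action(logits))
print("fraction of samples choosing action 1:", sum(samples) / len(samples))
```

Greedy selection makes the rendered episode deterministic for a given starting state, while sampling keeps the exploration behavior used during training.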

In [14]:
import tensorflow_docs.vis.embed as embed
embed.embed_file(image_file)

## Next steps

This tutorial demonstrated how to implement the actor-critic method using TensorFlow.

As a next step, you could try training a model on a different environment in OpenAI Gym. 
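
When switching environments, the quantities a policy network most directly depends on are the observation shape and the number of discrete actions. The sketch below assumes a Gym-style environment object that exposes `observation_space.shape` and `action_space.n`. The helper `describe_env` and the stand-in object are hypothetical; with Gym installed, you would pass the result of `gym.make(...)` instead.

```python
from types import SimpleNamespace
from typing import Any, Dict


def describe_env(env: Any) -> Dict[str, Any]:
  """Summarize the sizes a policy network needs from a Gym-style env.

  Assumes a discrete action space (one exposing `.n`); continuous
  action spaces would need a different actor output layer.
  """
  action_space = env.action_space
  if not hasattr(action_space, "n"):
    raise ValueError("Expected a discrete action space with attribute `n`.")
  return {
      "observation_shape": tuple(env.observation_space.shape),
      "num_actions": int(action_space.n),
  }


# Stand-in mimicking CartPole's spaces (4 observations, 2 actions) so the
# sketch runs without Gym installed.
fake_env = SimpleNamespace(
    observation_space=SimpleNamespace(shape=(4,)),
    action_space=SimpleNamespace(n=2),
)
info = describe_env(fake_env)
print(info)
```

The `num_actions` value would size the actor's output layer. The solved-reward threshold and maximum episode length are environment-specific and would likely need adjusting as well.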

For additional information regarding actor-critic methods and the CartPole-v0 problem, you may refer to the following resources:

- [Actor Critic Method](https://hal.inria.fr/hal-00840470/document)
- [Actor Critic Lecture (CAL)](https://www.youtube.com/watch?v=EKqxumCuAAY&list=PLkFD6_40KJIwhWJpGazJ9VSj9CFMkb79A&index=7&t=0s)
- [Cartpole learning control problem \[Barto, et al. 1983\]](http://www.derongliu.org/adp/adp-cdrom/Barto1983.pdf) 

For more reinforcement learning examples in TensorFlow, you can check the following resources:

- [Reinforcement learning code examples (keras.io)](https://keras.io/examples/rl/)
- [TF-Agents reinforcement learning library](https://www.tensorflow.org/agents)
