## Installing packages

`pip install gym[atari]`: the test environment itself, with a collection of games

`pip install tqdm`: a progress bar for Python

`pip install keras`: a deep learning library

In [1]:
%%bash

pip install gym gym[atari] tqdm keras

Collecting keras_applications>=1.0.6 (from keras)
  Downloading https://files.pythonhosted.org/packages/3f/c4/2ff40221029f7098d58f8d7fb99b97e8100f3293f9856f0fb5834bef100b/Keras_Applications-1.0.6-py2.py3-none-any.whl (44kB)
Collecting keras_preprocessing>=1.0.5 (from keras)
  Downloading https://files.pythonhosted.org/packages/fc/94/74e0fa783d3fc07e41715973435dd051ca89c550881b3454233c39c73e69/Keras_Preprocessing-1.0.5-py2.py3-none-any.whl
Installing collected packages: keras-applications, keras-preprocessing
  Found existing installation: Keras-Applications 1.0.4
    Uninstalling Keras-Applications-1.0.4:
      Successfully uninstalled Keras-Applications-1.0.4
  Found existing installation: Keras-Preprocessing 1.0.2
    Uninstalling Keras-Preprocessing-1.0.2:
      Successfully uninstalled Keras-Preprocessing-1.0.2
Successfully installed keras-applications-1.0.6 keras-preprocessing-1.0.5


In [2]:
import random
import gym

import numpy as np
from collections import deque
import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten, InputLayer
from keras.optimizers import Adam
from tqdm import tqdm, tqdm_notebook
import cv2
import time
from IPython.display import display, Image

import matplotlib.pyplot as plt

%matplotlib inline

Using TensorFlow backend.


In [3]:
(keras.__version__, gym.__version__)

('2.2.4', '0.10.9')

## Getting to know OpenAI Gym

# TODO

[OpenAI Gym](https://gym.openai.com/) is a framework with a collection of diverse test environments for reinforcement learning; it plays a role similar to that of the ImageNet dataset in computer vision.

The main idea is to standardize test environments so that results from research papers are easier to reproduce.

As the base environments for training our models, and for the competition that follows, we will use Atari game environments:

* [Breakout](https://gym.openai.com/envs/Breakout-v0/)

* [SpaceInvaders](https://gym.openai.com/envs/SpaceInvaders-v0)

* [MsPacman](https://gym.openai.com/envs/MsPacman-v0/)

Let's look at one of the games more closely. Because the test environments are standardized, exploring another game only requires changing the environment name :)

In [4]:
env_name = "Breakout-v0"
env = gym.make(env_name)

`env` is an instance of the environment we are running.

Let's see what we can get out of this object:

### Example of running an Atari test environment

Run the code below.

The process starts with a call to `env.reset()`, which returns the initial observation of the game (in these games the observation is an image whose parameters are described in `env.observation_space`).

`env.render()` opens a window that draws the current observation.

Remember to call `env.close()` to close the window.

In [5]:
try:
    for i_episode in range(20):
        observation = env.reset()
        print(i_episode)
        for t in range(500):
            time.sleep(1./30)
            env.render()
            action = env.action_space.sample()
            observation, reward, done, info = env.step(action)
            if done:
                print("Episode finished after {} timesteps".format(t+1))
                break
except KeyboardInterrupt:
    pass
env.close()

0
Episode finished after 259 timesteps
1
Episode finished after 386 timesteps
2
Episode finished after 165 timesteps
3
Episode finished after 185 timesteps
4
Episode finished after 273 timesteps
5
Episode finished after 233 timesteps
6
Episode finished after 246 timesteps
7
Episode finished after 185 timesteps
8
Episode finished after 206 timesteps
9
Episode finished after 278 timesteps
10
Episode finished after 171 timesteps
11
Episode finished after 180 timesteps
12
Episode finished after 170 timesteps
13
Episode finished after 300 timesteps
14
Episode finished after 179 timesteps
15
Episode finished after 190 timesteps
16
Episode finished after 234 timesteps
17
Episode finished after 180 timesteps
18
Episode finished after 236 timesteps
19
Episode finished after 170 timesteps


In essence, this is the code of a random agent. Its actions are elements of the game's action space, and each action is chosen with equal probability.

You can read more about environments and how to work with them in the [documentation on the official OpenAI Gym site](https://gym.openai.com/docs/).
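
To make the random-agent loop easier to experiment with outside of Atari, here is a minimal sketch that uses a stand-in environment. `ToyEnv` and `run_random_episode` are hypothetical names invented for this illustration; they are not part of OpenAI Gym and only mimic the `reset()` / `step()` call pattern used in the cell above.

```python
import random


class ToyEnv:
    """Hypothetical toy environment with a Gym-like reset/step interface.

    Not part of OpenAI Gym; it only mimics the call signatures used above.
    """

    n_actions = 4  # assumed size of the discrete action space

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the "observation" is just the step counter

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 0 else 0.0  # arbitrary reward rule
        done = self.t >= self.max_steps
        return self.t, reward, done, {}


def run_random_episode(env, rng):
    """Play one episode, choosing every action uniformly at random."""
    env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = rng.randrange(env.n_actions)  # each action equally likely
        _, reward, done, _ = env.step(action)
        total_reward += reward
        steps += 1
    return steps, total_reward


steps, total = run_random_episode(ToyEnv(max_steps=10), random.Random(0))
print(steps, total)
```

A random agent like this is a useful lower bound: a trained agent should collect clearly more reward per episode than it does.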

## Initializing the model

As a baseline we take the DQN algorithm. The network has one output per action available in the game: the full Atari joystick action set has 18 actions, but an individual game may expose only a subset (Breakout-v0, for instance, exposes 4), so the code below reads the actual number from `env.action_space.n`.

To make it easy to get started, a DQN agent class is provided (more advanced methods can be found [here](https://github.com/keon/deep-q-learning)).

Links for further reading:

* [an article about DQN on Towards Data Science](https://towardsdatascience.com/welcome-to-deep-reinforcement-learning-part-1-dqn-c3cab4d41b6b)

* [a framework with RL models in Keras](https://github.com/keras-rl/keras-rl)

* [an implementation of DQN in PyTorch](https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html)
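
The core of DQN is the one-step Q-learning target, `target = r + gamma * max_a' Q(s', a')`, which collapses to `target = r` when the episode has ended. The following is a minimal sketch of that computation on plain Python lists. The function name `q_target` and the sample Q-values are assumptions made for illustration; the default `gamma=0.95` mirrors the discount rate used in the agent class in this notebook.

```python
def q_target(reward, next_q_values, done, gamma=0.95):
    """One-step Q-learning target.

    Returns r for a terminal transition,
    otherwise r + gamma * max over a' of Q(s', a').
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Made-up Q-values for the next state, one entry per action.
next_q = [0.0, 2.0, 1.0, -1.0]

bootstrap_target = q_target(1.0, next_q, done=False)
terminal_target = q_target(1.0, next_q, done=True)
print(bootstrap_target, terminal_target)
```

Only the output that corresponds to the action actually taken is moved toward this target; the other outputs keep their current predictions, so they contribute no error to the loss.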


In [6]:
class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95    # discount rate
        self.epsilon = 0.05  # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.001
        self.model = self._build_model()

    def _build_model(self):
        # Neural Net for Deep-Q learning Model
        model = Sequential()
        model.add(InputLayer(input_shape=self.state_size))
        for _ in range(2):
            model.add(Conv2D(8, (3, 3), activation='relu'))
            model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Flatten())
        model.add(Dense(self.action_size, activation='linear'))
        model.compile(loss='mse',
                      optimizer=Adam(lr=self.learning_rate))
        return model

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        act_values = self.model.predict(state)
        return np.argmax(act_values[0])  # returns action

    def replay(self, batch_size):
        minibatch = random.sample(self.memory, batch_size)
        for state, action, reward, next_state, done in minibatch:
            target = reward
            if not done:
                target = (reward + self.gamma *
                          np.amax(self.model.predict(next_state)[0]))
            target_f = self.model.predict(state)
            target_f[0][action] = target
            self.model.fit(state, target_f, epochs=1, verbose=0)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay

    def load(self, name):
        self.model.load_weights(name)

    def save(self, name):
        self.model.save_weights(name)
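
In the class above the exploration rate is multiplied by `epsilon_decay` once per `replay` call for as long as it exceeds `epsilon_min`. As a rough sketch of what that schedule implies, the helper below (a hypothetical name, `decays_until_min`, not part of the notebook) counts how many `replay` calls it takes for the default values to fall to the floor.

```python
def decays_until_min(epsilon=0.05, epsilon_min=0.01, epsilon_decay=0.995):
    """Count multiplicative decay steps until epsilon is no longer above epsilon_min.

    Mirrors the `if self.epsilon > self.epsilon_min` check in DQNAgent.replay.
    """
    n = 0
    while epsilon > epsilon_min:
        epsilon *= epsilon_decay
        n += 1
    return n, epsilon


n_calls, final_eps = decays_until_min()
print(n_calls, final_eps)
```

Because the starting value is already small, exploration reaches its floor after a few hundred training steps, which is consistent with the training log reporting `e: 0.01` from the second episode onward. Starting from a larger value such as 1.0 would keep the agent exploring for longer.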

## Training the model
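
Before a frame is fed to the network it is downsampled and, optionally, converted to grayscale. The sketch below reproduces only the shape arithmetic from the training cell, assuming the standard 210x160 RGB Atari frame; `downsampled_shape` is a hypothetical helper name introduced for this illustration.

```python
def downsampled_shape(obs_shape, downsample=4, grayscale=True):
    """Shrink the spatial dimensions and optionally collapse the channels to 1.

    Dimensions of size 3 or less (the colour channels) are left alone.
    """
    shape = [d // downsample if d > 3 else d for d in obs_shape]
    if grayscale:
        shape[-1] = 1
    return tuple(shape)


# Assumed raw observation shape: (height, width, channels).
shape = downsampled_shape((210, 160, 3))

# cv2.resize expects (width, height), hence the reversed slice.
resize_arg = shape[1::-1]
print(shape, resize_arg)
```

The reversed slice matters because NumPy image arrays are indexed as (height, width), while OpenCV's `resize` takes its target size as (width, height).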

In [7]:
env_name = "Breakout-v0"
env = gym.make(env_name)

GRAYSCALE = True
observation = env.reset()
downsample = 4
new_shape = [i // downsample if i > 3 else i for i in observation.shape]
if GRAYSCALE:
    new_shape[-1] = 1
new_shape = tuple(new_shape)

action_size = env.action_space.n
agent = DQNAgent(new_shape, action_size)
##agent.load("./pong_2.h5")
done = False
batch_size = 32

def process_state(state, grayscale=GRAYSCALE):
    if grayscale:
        state = cv2.cvtColor(state, cv2.COLOR_RGB2GRAY)  # Gym returns RGB frames, not BGR
    state = cv2.resize(state, new_shape[1::-1])
    if grayscale:
        state = np.reshape(state, (1,) + state.shape + (1,)) / 255.
    else:
        state = np.reshape(state, (1,) + state.shape) / 255.
    return state

EPISODES = 10000

for e in range(EPISODES):
    state = env.reset()
    state = process_state(state)
    total_reward = 0
    for t in tqdm_notebook(range(1000)):  # `t`, not `time`, so the imported time module is not shadowed
        env.render()
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        total_reward += reward
        next_state = process_state(next_state)
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        if done:
            print("episode: {}/{}, time: {}, e: {:.2}"
                  .format(e, EPISODES, t, agent.epsilon))
            break
        if len(agent.memory) > batch_size:
            agent.replay(batch_size)
        
    print("epoch {}, total_reward = {}".format(e, total_reward))
    # if e % 10 == 0:
    #     agent.save("./save/cartpole-dqn.h5")
env.close()

HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 0/10000, time: 167, e: 0.025
epoch 0, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 1/10000, time: 211, e: 0.01
epoch 1, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 2/10000, time: 346, e: 0.01
epoch 2, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 3/10000, time: 401, e: 0.01
epoch 3, total_reward = 4.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 4/10000, time: 305, e: 0.01
epoch 4, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 5/10000, time: 336, e: 0.01
epoch 5, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 6/10000, time: 290, e: 0.01
epoch 6, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 7/10000, time: 242, e: 0.01
epoch 7, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 8/10000, time: 344, e: 0.01
epoch 8, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 9/10000, time: 574, e: 0.01
epoch 9, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

epoch 10, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 11/10000, time: 304, e: 0.01
epoch 11, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 12/10000, time: 322, e: 0.01
epoch 12, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 13/10000, time: 519, e: 0.01
epoch 13, total_reward = 4.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 14/10000, time: 513, e: 0.01
epoch 14, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 15/10000, time: 456, e: 0.01
epoch 15, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 16/10000, time: 486, e: 0.01
epoch 16, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 17/10000, time: 235, e: 0.01
epoch 17, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 18/10000, time: 306, e: 0.01
epoch 18, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 19/10000, time: 317, e: 0.01
epoch 19, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 20/10000, time: 630, e: 0.01
epoch 20, total_reward = 4.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 21/10000, time: 471, e: 0.01
epoch 21, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 22/10000, time: 553, e: 0.01
epoch 22, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 23/10000, time: 316, e: 0.01
epoch 23, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 24/10000, time: 478, e: 0.01
epoch 24, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 25/10000, time: 646, e: 0.01
epoch 25, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 26/10000, time: 555, e: 0.01
epoch 26, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 27/10000, time: 245, e: 0.01
epoch 27, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 28/10000, time: 421, e: 0.01
epoch 28, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 29/10000, time: 396, e: 0.01
epoch 29, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 30/10000, time: 436, e: 0.01
epoch 30, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 31/10000, time: 340, e: 0.01
epoch 31, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 32/10000, time: 278, e: 0.01
epoch 32, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 33/10000, time: 345, e: 0.01
epoch 33, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 34/10000, time: 375, e: 0.01
epoch 34, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 35/10000, time: 346, e: 0.01
epoch 35, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 36/10000, time: 662, e: 0.01
epoch 36, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 37/10000, time: 896, e: 0.01
epoch 37, total_reward = 4.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 38/10000, time: 360, e: 0.01
epoch 38, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 39/10000, time: 201, e: 0.01
epoch 39, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 40/10000, time: 379, e: 0.01
epoch 40, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 41/10000, time: 360, e: 0.01
epoch 41, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 42/10000, time: 375, e: 0.01
epoch 42, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 43/10000, time: 595, e: 0.01
epoch 43, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 44/10000, time: 299, e: 0.01
epoch 44, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 45/10000, time: 333, e: 0.01
epoch 45, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 46/10000, time: 248, e: 0.01
epoch 46, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 47/10000, time: 564, e: 0.01
epoch 47, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 48/10000, time: 391, e: 0.01
epoch 48, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 49/10000, time: 343, e: 0.01
epoch 49, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 50/10000, time: 471, e: 0.01
epoch 50, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 51/10000, time: 324, e: 0.01
epoch 51, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 52/10000, time: 382, e: 0.01
epoch 52, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 53/10000, time: 580, e: 0.01
epoch 53, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 54/10000, time: 368, e: 0.01
epoch 54, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 55/10000, time: 541, e: 0.01
epoch 55, total_reward = 7.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 56/10000, time: 741, e: 0.01
epoch 56, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 57/10000, time: 478, e: 0.01
epoch 57, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 58/10000, time: 367, e: 0.01
epoch 58, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 59/10000, time: 351, e: 0.01
epoch 59, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 60/10000, time: 239, e: 0.01
epoch 60, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 61/10000, time: 353, e: 0.01
epoch 61, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 62/10000, time: 361, e: 0.01
epoch 62, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 63/10000, time: 416, e: 0.01
epoch 63, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 64/10000, time: 259, e: 0.01
epoch 64, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 65/10000, time: 392, e: 0.01
epoch 65, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 66/10000, time: 333, e: 0.01
epoch 66, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 67/10000, time: 433, e: 0.01
epoch 67, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 68/10000, time: 342, e: 0.01
epoch 68, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 69/10000, time: 378, e: 0.01
epoch 69, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 70/10000, time: 290, e: 0.01
epoch 70, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 71/10000, time: 439, e: 0.01
epoch 71, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 72/10000, time: 377, e: 0.01
epoch 72, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 73/10000, time: 444, e: 0.01
epoch 73, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 74/10000, time: 398, e: 0.01
epoch 74, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 75/10000, time: 397, e: 0.01
epoch 75, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 76/10000, time: 400, e: 0.01
epoch 76, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

epoch 77, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 78/10000, time: 464, e: 0.01
epoch 78, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 79/10000, time: 246, e: 0.01
epoch 79, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 80/10000, time: 302, e: 0.01
epoch 80, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 81/10000, time: 322, e: 0.01
epoch 81, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 82/10000, time: 310, e: 0.01
epoch 82, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 83/10000, time: 294, e: 0.01
epoch 83, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 84/10000, time: 212, e: 0.01
epoch 84, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 85/10000, time: 323, e: 0.01
epoch 85, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 86/10000, time: 235, e: 0.01
epoch 86, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 87/10000, time: 269, e: 0.01
epoch 87, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 88/10000, time: 240, e: 0.01
epoch 88, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 89/10000, time: 203, e: 0.01
epoch 89, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 90/10000, time: 263, e: 0.01
epoch 90, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 91/10000, time: 196, e: 0.01
epoch 91, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 92/10000, time: 178, e: 0.01
epoch 92, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 93/10000, time: 204, e: 0.01
epoch 93, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 94/10000, time: 171, e: 0.01
epoch 94, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 95/10000, time: 193, e: 0.01
epoch 95, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 96/10000, time: 541, e: 0.01
epoch 96, total_reward = 7.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 97/10000, time: 277, e: 0.01
epoch 97, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 98/10000, time: 422, e: 0.01
epoch 98, total_reward = 4.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 99/10000, time: 332, e: 0.01
epoch 99, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 100/10000, time: 341, e: 0.01
epoch 100, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 101/10000, time: 271, e: 0.01
epoch 101, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 102/10000, time: 285, e: 0.01
epoch 102, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 103/10000, time: 321, e: 0.01
epoch 103, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 104/10000, time: 355, e: 0.01
epoch 104, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 105/10000, time: 281, e: 0.01
epoch 105, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 106/10000, time: 354, e: 0.01
epoch 106, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 107/10000, time: 341, e: 0.01
epoch 107, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 108/10000, time: 358, e: 0.01
epoch 108, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 109/10000, time: 339, e: 0.01
epoch 109, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 110/10000, time: 236, e: 0.01
epoch 110, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 111/10000, time: 239, e: 0.01
epoch 111, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 112/10000, time: 363, e: 0.01
epoch 112, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 113/10000, time: 344, e: 0.01
epoch 113, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 114/10000, time: 244, e: 0.01
epoch 114, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 115/10000, time: 302, e: 0.01
epoch 115, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 116/10000, time: 263, e: 0.01
epoch 116, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 117/10000, time: 292, e: 0.01
epoch 117, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 118/10000, time: 234, e: 0.01
epoch 118, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 119/10000, time: 359, e: 0.01
epoch 119, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 120/10000, time: 215, e: 0.01
epoch 120, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 121/10000, time: 163, e: 0.01
epoch 121, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 122/10000, time: 184, e: 0.01
epoch 122, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 123/10000, time: 350, e: 0.01
epoch 123, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 124/10000, time: 202, e: 0.01
epoch 124, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 125/10000, time: 213, e: 0.01
epoch 125, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 126/10000, time: 187, e: 0.01
epoch 126, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 127/10000, time: 338, e: 0.01
epoch 127, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 128/10000, time: 217, e: 0.01
epoch 128, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 129/10000, time: 269, e: 0.01
epoch 129, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 130/10000, time: 351, e: 0.01
epoch 130, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 131/10000, time: 232, e: 0.01
epoch 131, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 132/10000, time: 202, e: 0.01
epoch 132, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 133/10000, time: 243, e: 0.01
epoch 133, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 134/10000, time: 315, e: 0.01
epoch 134, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 135/10000, time: 266, e: 0.01
epoch 135, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 136/10000, time: 160, e: 0.01
epoch 136, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 137/10000, time: 246, e: 0.01
epoch 137, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 138/10000, time: 198, e: 0.01
epoch 138, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 139/10000, time: 198, e: 0.01
epoch 139, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 140/10000, time: 264, e: 0.01
epoch 140, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 141/10000, time: 260, e: 0.01
epoch 141, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 142/10000, time: 390, e: 0.01
epoch 142, total_reward = 4.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 143/10000, time: 206, e: 0.01
epoch 143, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 144/10000, time: 285, e: 0.01
epoch 144, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 145/10000, time: 268, e: 0.01
epoch 145, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 146/10000, time: 216, e: 0.01
epoch 146, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 147/10000, time: 242, e: 0.01
epoch 147, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 148/10000, time: 243, e: 0.01
epoch 148, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 149/10000, time: 240, e: 0.01
epoch 149, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 150/10000, time: 233, e: 0.01
epoch 150, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 151/10000, time: 268, e: 0.01
epoch 151, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 152/10000, time: 213, e: 0.01
epoch 152, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 153/10000, time: 254, e: 0.01
epoch 153, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 154/10000, time: 204, e: 0.01
epoch 154, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 155/10000, time: 174, e: 0.01
epoch 155, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 156/10000, time: 286, e: 0.01
epoch 156, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 157/10000, time: 217, e: 0.01
epoch 157, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 158/10000, time: 384, e: 0.01
epoch 158, total_reward = 4.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 159/10000, time: 337, e: 0.01
epoch 159, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 160/10000, time: 224, e: 0.01
epoch 160, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 161/10000, time: 249, e: 0.01
epoch 161, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 162/10000, time: 262, e: 0.01
epoch 162, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 163/10000, time: 471, e: 0.01
epoch 163, total_reward = 4.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 164/10000, time: 210, e: 0.01
epoch 164, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 165/10000, time: 231, e: 0.01
epoch 165, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 166/10000, time: 302, e: 0.01
epoch 166, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 167/10000, time: 315, e: 0.01
epoch 167, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 168/10000, time: 391, e: 0.01
epoch 168, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 169/10000, time: 274, e: 0.01
epoch 169, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 170/10000, time: 426, e: 0.01
epoch 170, total_reward = 5.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 171/10000, time: 356, e: 0.01
epoch 171, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 172/10000, time: 287, e: 0.01
epoch 172, total_reward = 2.0


episode: 173/10000, time: 309, e: 0.01
epoch 173, total_reward = 2.0


episode: 174/10000, time: 269, e: 0.01
epoch 174, total_reward = 2.0


episode: 175/10000, time: 172, e: 0.01
epoch 175, total_reward = 0.0


episode: 176/10000, time: 368, e: 0.01
epoch 176, total_reward = 3.0


episode: 177/10000, time: 218, e: 0.01
epoch 177, total_reward = 0.0


episode: 178/10000, time: 388, e: 0.01
epoch 178, total_reward = 4.0


episode: 179/10000, time: 271, e: 0.01
epoch 179, total_reward = 2.0


episode: 180/10000, time: 268, e: 0.01
epoch 180, total_reward = 2.0


episode: 181/10000, time: 216, e: 0.01
epoch 181, total_reward = 0.0


episode: 182/10000, time: 393, e: 0.01
epoch 182, total_reward = 3.0


episode: 183/10000, time: 263, e: 0.01
epoch 183, total_reward = 2.0


episode: 184/10000, time: 260, e: 0.01
epoch 184, total_reward = 2.0


episode: 185/10000, time: 158, e: 0.01
epoch 185, total_reward = 0.0


episode: 186/10000, time: 223, e: 0.01
epoch 186, total_reward = 1.0


episode: 187/10000, time: 337, e: 0.01
epoch 187, total_reward = 3.0


episode: 188/10000, time: 169, e: 0.01
epoch 188, total_reward = 0.0


episode: 189/10000, time: 334, e: 0.01
epoch 189, total_reward = 3.0


episode: 190/10000, time: 401, e: 0.01
epoch 190, total_reward = 4.0


episode: 191/10000, time: 373, e: 0.01
epoch 191, total_reward = 3.0


episode: 192/10000, time: 332, e: 0.01
epoch 192, total_reward = 2.0


episode: 193/10000, time: 355, e: 0.01
epoch 193, total_reward = 2.0


episode: 194/10000, time: 341, e: 0.01
epoch 194, total_reward = 3.0


episode: 195/10000, time: 284, e: 0.01
epoch 195, total_reward = 2.0


episode: 196/10000, time: 343, e: 0.01
epoch 196, total_reward = 3.0


episode: 197/10000, time: 320, e: 0.01
epoch 197, total_reward = 2.0


episode: 198/10000, time: 278, e: 0.01
epoch 198, total_reward = 2.0


episode: 199/10000, time: 168, e: 0.01
epoch 199, total_reward = 0.0


episode: 200/10000, time: 273, e: 0.01
epoch 200, total_reward = 2.0


episode: 201/10000, time: 289, e: 0.01
epoch 201, total_reward = 2.0


episode: 202/10000, time: 191, e: 0.01
epoch 202, total_reward = 0.0


episode: 203/10000, time: 278, e: 0.01
epoch 203, total_reward = 2.0


episode: 204/10000, time: 166, e: 0.01
epoch 204, total_reward = 0.0


episode: 205/10000, time: 171, e: 0.01
epoch 205, total_reward = 0.0


episode: 206/10000, time: 318, e: 0.01
epoch 206, total_reward = 2.0


episode: 207/10000, time: 282, e: 0.01
epoch 207, total_reward = 2.0


episode: 208/10000, time: 213, e: 0.01
epoch 208, total_reward = 0.0


episode: 209/10000, time: 308, e: 0.01
epoch 209, total_reward = 1.0


episode: 210/10000, time: 311, e: 0.01
epoch 210, total_reward = 0.0


episode: 211/10000, time: 378, e: 0.01
epoch 211, total_reward = 0.0


episode: 212/10000, time: 796, e: 0.01
epoch 212, total_reward = 4.0


episode: 213/10000, time: 561, e: 0.01
epoch 213, total_reward = 3.0


episode: 214/10000, time: 207, e: 0.01
epoch 214, total_reward = 0.0


episode: 215/10000, time: 398, e: 0.01
epoch 215, total_reward = 2.0


episode: 216/10000, time: 311, e: 0.01
epoch 216, total_reward = 0.0


episode: 217/10000, time: 488, e: 0.01
epoch 217, total_reward = 3.0


episode: 218/10000, time: 235, e: 0.01
epoch 218, total_reward = 1.0


episode: 219/10000, time: 287, e: 0.01
epoch 219, total_reward = 2.0


episode: 220/10000, time: 552, e: 0.01
epoch 220, total_reward = 2.0


episode: 221/10000, time: 251, e: 0.01
epoch 221, total_reward = 0.0


episode: 222/10000, time: 340, e: 0.01
epoch 222, total_reward = 0.0


episode: 223/10000, time: 225, e: 0.01
epoch 223, total_reward = 0.0


episode: 224/10000, time: 385, e: 0.01
epoch 224, total_reward = 0.0


episode: 225/10000, time: 218, e: 0.01
epoch 225, total_reward = 0.0


episode: 226/10000, time: 192, e: 0.01
epoch 226, total_reward = 0.0


episode: 227/10000, time: 171, e: 0.01
epoch 227, total_reward = 0.0


episode: 228/10000, time: 329, e: 0.01
epoch 228, total_reward = 3.0


episode: 229/10000, time: 350, e: 0.01
epoch 229, total_reward = 2.0


episode: 230/10000, time: 179, e: 0.01
epoch 230, total_reward = 0.0


episode: 231/10000, time: 310, e: 0.01
epoch 231, total_reward = 3.0


episode: 232/10000, time: 300, e: 0.01
epoch 232, total_reward = 2.0


episode: 233/10000, time: 250, e: 0.01
epoch 233, total_reward = 2.0


episode: 234/10000, time: 279, e: 0.01
epoch 234, total_reward = 2.0


episode: 235/10000, time: 241, e: 0.01
epoch 235, total_reward = 2.0


episode: 236/10000, time: 209, e: 0.01
epoch 236, total_reward = 1.0


episode: 237/10000, time: 170, e: 0.01
epoch 237, total_reward = 0.0


episode: 238/10000, time: 271, e: 0.01
epoch 238, total_reward = 3.0


episode: 239/10000, time: 265, e: 0.01
epoch 239, total_reward = 2.0


episode: 240/10000, time: 199, e: 0.01
epoch 240, total_reward = 0.0


episode: 241/10000, time: 222, e: 0.01
epoch 241, total_reward = 1.0


episode: 242/10000, time: 239, e: 0.01
epoch 242, total_reward = 2.0


episode: 243/10000, time: 185, e: 0.01
epoch 243, total_reward = 0.0


episode: 244/10000, time: 166, e: 0.01
epoch 244, total_reward = 0.0


episode: 245/10000, time: 243, e: 0.01
epoch 245, total_reward = 1.0


episode: 246/10000, time: 240, e: 0.01
epoch 246, total_reward = 2.0


episode: 247/10000, time: 275, e: 0.01
epoch 247, total_reward = 3.0


episode: 248/10000, time: 272, e: 0.01
epoch 248, total_reward = 2.0


episode: 249/10000, time: 248, e: 0.01
epoch 249, total_reward = 2.0


episode: 250/10000, time: 235, e: 0.01
epoch 250, total_reward = 1.0


episode: 251/10000, time: 267, e: 0.01
epoch 251, total_reward = 1.0


episode: 252/10000, time: 181, e: 0.01
epoch 252, total_reward = 0.0


episode: 253/10000, time: 255, e: 0.01
epoch 253, total_reward = 1.0


episode: 254/10000, time: 320, e: 0.01
epoch 254, total_reward = 3.0


episode: 255/10000, time: 273, e: 0.01
epoch 255, total_reward = 1.0


episode: 256/10000, time: 428, e: 0.01
epoch 256, total_reward = 4.0


episode: 257/10000, time: 303, e: 0.01
epoch 257, total_reward = 2.0


episode: 258/10000, time: 260, e: 0.01
epoch 258, total_reward = 1.0


episode: 259/10000, time: 329, e: 0.01
epoch 259, total_reward = 2.0


episode: 260/10000, time: 425, e: 0.01
epoch 260, total_reward = 2.0


epoch 261, total_reward = 1.0


episode: 262/10000, time: 601, e: 0.01
epoch 262, total_reward = 2.0


episode: 263/10000, time: 241, e: 0.01
epoch 263, total_reward = 0.0


episode: 264/10000, time: 316, e: 0.01
epoch 264, total_reward = 1.0


episode: 265/10000, time: 243, e: 0.01
epoch 265, total_reward = 0.0


episode: 266/10000, time: 362, e: 0.01
epoch 266, total_reward = 2.0


episode: 267/10000, time: 460, e: 0.01
epoch 267, total_reward = 0.0


episode: 268/10000, time: 544, e: 0.01
epoch 268, total_reward = 2.0


episode: 269/10000, time: 459, e: 0.01
epoch 269, total_reward = 2.0


episode: 270/10000, time: 258, e: 0.01
epoch 270, total_reward = 0.0


episode: 271/10000, time: 320, e: 0.01
epoch 271, total_reward = 2.0


episode: 272/10000, time: 292, e: 0.01
epoch 272, total_reward = 2.0


episode: 273/10000, time: 275, e: 0.01
epoch 273, total_reward = 2.0


episode: 274/10000, time: 263, e: 0.01
epoch 274, total_reward = 2.0


episode: 275/10000, time: 177, e: 0.01
epoch 275, total_reward = 0.0


episode: 276/10000, time: 174, e: 0.01
epoch 276, total_reward = 0.0


episode: 277/10000, time: 212, e: 0.01
epoch 277, total_reward = 1.0


episode: 278/10000, time: 280, e: 0.01
epoch 278, total_reward = 2.0


episode: 279/10000, time: 290, e: 0.01
epoch 279, total_reward = 2.0


episode: 280/10000, time: 378, e: 0.01
epoch 280, total_reward = 3.0


episode: 281/10000, time: 195, e: 0.01
epoch 281, total_reward = 0.0


episode: 282/10000, time: 381, e: 0.01
epoch 282, total_reward = 2.0


episode: 283/10000, time: 367, e: 0.01
epoch 283, total_reward = 2.0


episode: 284/10000, time: 440, e: 0.01
epoch 284, total_reward = 3.0


episode: 285/10000, time: 227, e: 0.01
epoch 285, total_reward = 0.0


episode: 286/10000, time: 349, e: 0.01
epoch 286, total_reward = 2.0


episode: 287/10000, time: 425, e: 0.01
epoch 287, total_reward = 2.0


episode: 288/10000, time: 395, e: 0.01
epoch 288, total_reward = 4.0


episode: 289/10000, time: 507, e: 0.01
epoch 289, total_reward = 5.0


episode: 290/10000, time: 190, e: 0.01
epoch 290, total_reward = 0.0


episode: 291/10000, time: 397, e: 0.01
epoch 291, total_reward = 3.0


episode: 292/10000, time: 516, e: 0.01
epoch 292, total_reward = 5.0


episode: 293/10000, time: 225, e: 0.01
epoch 293, total_reward = 1.0


episode: 294/10000, time: 259, e: 0.01
epoch 294, total_reward = 2.0


episode: 295/10000, time: 369, e: 0.01
epoch 295, total_reward = 2.0


episode: 296/10000, time: 352, e: 0.01
epoch 296, total_reward = 2.0


episode: 297/10000, time: 229, e: 0.01
epoch 297, total_reward = 0.0


episode: 298/10000, time: 323, e: 0.01
epoch 298, total_reward = 1.0


episode: 299/10000, time: 305, e: 0.01
epoch 299, total_reward = 2.0


episode: 300/10000, time: 342, e: 0.01
epoch 300, total_reward = 3.0


episode: 301/10000, time: 219, e: 0.01
epoch 301, total_reward = 1.0


episode: 302/10000, time: 271, e: 0.01
epoch 302, total_reward = 2.0


episode: 303/10000, time: 212, e: 0.01
epoch 303, total_reward = 1.0


episode: 304/10000, time: 205, e: 0.01
epoch 304, total_reward = 1.0


episode: 305/10000, time: 217, e: 0.01
epoch 305, total_reward = 1.0


episode: 306/10000, time: 176, e: 0.01
epoch 306, total_reward = 0.0


episode: 307/10000, time: 217, e: 0.01
epoch 307, total_reward = 1.0


episode: 308/10000, time: 204, e: 0.01
epoch 308, total_reward = 1.0


episode: 309/10000, time: 175, e: 0.01
epoch 309, total_reward = 0.0


episode: 310/10000, time: 237, e: 0.01
epoch 310, total_reward = 1.0


episode: 311/10000, time: 242, e: 0.01
epoch 311, total_reward = 2.0


episode: 312/10000, time: 290, e: 0.01
epoch 312, total_reward = 1.0


episode: 313/10000, time: 212, e: 0.01
epoch 313, total_reward = 1.0


episode: 314/10000, time: 234, e: 0.01
epoch 314, total_reward = 1.0


episode: 315/10000, time: 251, e: 0.01
epoch 315, total_reward = 2.0


episode: 316/10000, time: 231, e: 0.01
epoch 316, total_reward = 1.0


episode: 317/10000, time: 285, e: 0.01
epoch 317, total_reward = 1.0


episode: 318/10000, time: 215, e: 0.01
epoch 318, total_reward = 1.0


episode: 319/10000, time: 245, e: 0.01
epoch 319, total_reward = 2.0


episode: 320/10000, time: 218, e: 0.01
epoch 320, total_reward = 1.0


episode: 321/10000, time: 213, e: 0.01
epoch 321, total_reward = 1.0


episode: 322/10000, time: 197, e: 0.01
epoch 322, total_reward = 1.0


episode: 323/10000, time: 329, e: 0.01
epoch 323, total_reward = 3.0


episode: 324/10000, time: 206, e: 0.01
epoch 324, total_reward = 1.0


episode: 325/10000, time: 281, e: 0.01
epoch 325, total_reward = 3.0


episode: 326/10000, time: 203, e: 0.01
epoch 326, total_reward = 1.0


episode: 327/10000, time: 163, e: 0.01
epoch 327, total_reward = 0.0


episode: 328/10000, time: 247, e: 0.01
epoch 328, total_reward = 2.0


episode: 329/10000, time: 213, e: 0.01
epoch 329, total_reward = 1.0


episode: 330/10000, time: 199, e: 0.01
epoch 330, total_reward = 1.0


episode: 331/10000, time: 207, e: 0.01
epoch 331, total_reward = 1.0


episode: 332/10000, time: 213, e: 0.01
epoch 332, total_reward = 1.0


episode: 333/10000, time: 310, e: 0.01
epoch 333, total_reward = 3.0


episode: 334/10000, time: 229, e: 0.01
epoch 334, total_reward = 1.0


episode: 335/10000, time: 225, e: 0.01
epoch 335, total_reward = 1.0


episode: 336/10000, time: 210, e: 0.01
epoch 336, total_reward = 1.0


episode: 337/10000, time: 202, e: 0.01
epoch 337, total_reward = 1.0


episode: 338/10000, time: 265, e: 0.01
epoch 338, total_reward = 1.0


episode: 339/10000, time: 209, e: 0.01
epoch 339, total_reward = 1.0


episode: 340/10000, time: 241, e: 0.01
epoch 340, total_reward = 1.0


episode: 341/10000, time: 226, e: 0.01
epoch 341, total_reward = 1.0


episode: 342/10000, time: 283, e: 0.01
epoch 342, total_reward = 1.0


episode: 343/10000, time: 177, e: 0.01
epoch 343, total_reward = 0.0


episode: 344/10000, time: 178, e: 0.01
epoch 344, total_reward = 0.0


episode: 345/10000, time: 205, e: 0.01
epoch 345, total_reward = 1.0


episode: 346/10000, time: 208, e: 0.01
epoch 346, total_reward = 1.0


episode: 347/10000, time: 185, e: 0.01
epoch 347, total_reward = 0.0


episode: 348/10000, time: 183, e: 0.01
epoch 348, total_reward = 0.0


episode: 349/10000, time: 188, e: 0.01
epoch 349, total_reward = 0.0


episode: 350/10000, time: 165, e: 0.01
epoch 350, total_reward = 0.0


episode: 351/10000, time: 167, e: 0.01
epoch 351, total_reward = 0.0


episode: 352/10000, time: 315, e: 0.01
epoch 352, total_reward = 2.0


episode: 353/10000, time: 214, e: 0.01
epoch 353, total_reward = 0.0


episode: 354/10000, time: 312, e: 0.01
epoch 354, total_reward = 2.0


episode: 355/10000, time: 862, e: 0.01
epoch 355, total_reward = 3.0


episode: 356/10000, time: 765, e: 0.01
epoch 356, total_reward = 2.0


episode: 357/10000, time: 226, e: 0.01
epoch 357, total_reward = 0.0


episode: 358/10000, time: 337, e: 0.01
epoch 358, total_reward = 2.0


episode: 359/10000, time: 401, e: 0.01
epoch 359, total_reward = 3.0


episode: 360/10000, time: 205, e: 0.01
epoch 360, total_reward = 0.0


episode: 361/10000, time: 422, e: 0.01
epoch 361, total_reward = 3.0


episode: 362/10000, time: 467, e: 0.01
epoch 362, total_reward = 2.0


episode: 363/10000, time: 565, e: 0.01
epoch 363, total_reward = 1.0


episode: 364/10000, time: 481, e: 0.01
epoch 364, total_reward = 1.0


episode: 365/10000, time: 350, e: 0.01
epoch 365, total_reward = 1.0


episode: 366/10000, time: 254, e: 0.01
epoch 366, total_reward = 0.0


episode: 367/10000, time: 324, e: 0.01
epoch 367, total_reward = 1.0


episode: 368/10000, time: 308, e: 0.01
epoch 368, total_reward = 1.0


episode: 369/10000, time: 326, e: 0.01
epoch 369, total_reward = 1.0


episode: 370/10000, time: 482, e: 0.01
epoch 370, total_reward = 2.0


episode: 371/10000, time: 670, e: 0.01
epoch 371, total_reward = 1.0


episode: 372/10000, time: 458, e: 0.01
epoch 372, total_reward = 3.0


episode: 373/10000, time: 648, e: 0.01
epoch 373, total_reward = 3.0


episode: 374/10000, time: 435, e: 0.01
epoch 374, total_reward = 1.0


episode: 375/10000, time: 228, e: 0.01
epoch 375, total_reward = 0.0


episode: 376/10000, time: 539, e: 0.01
epoch 376, total_reward = 2.0


episode: 377/10000, time: 405, e: 0.01
epoch 377, total_reward = 2.0


episode: 378/10000, time: 346, e: 0.01
epoch 378, total_reward = 0.0


episode: 379/10000, time: 269, e: 0.01
epoch 379, total_reward = 2.0


episode: 380/10000, time: 278, e: 0.01
epoch 380, total_reward = 0.0


episode: 381/10000, time: 254, e: 0.01
epoch 381, total_reward = 0.0


episode: 382/10000, time: 389, e: 0.01
epoch 382, total_reward = 1.0


episode: 383/10000, time: 304, e: 0.01
epoch 383, total_reward = 2.0


episode: 384/10000, time: 244, e: 0.01
epoch 384, total_reward = 2.0


episode: 385/10000, time: 304, e: 0.01
epoch 385, total_reward = 3.0


episode: 386/10000, time: 300, e: 0.01
epoch 386, total_reward = 3.0


episode: 387/10000, time: 163, e: 0.01
epoch 387, total_reward = 0.0


episode: 388/10000, time: 203, e: 0.01
epoch 388, total_reward = 1.0


episode: 389/10000, time: 221, e: 0.01
epoch 389, total_reward = 1.0


episode: 390/10000, time: 209, e: 0.01
epoch 390, total_reward = 1.0


episode: 391/10000, time: 226, e: 0.01
epoch 391, total_reward = 1.0


episode: 392/10000, time: 211, e: 0.01
epoch 392, total_reward = 1.0


episode: 393/10000, time: 214, e: 0.01
epoch 393, total_reward = 0.0


episode: 394/10000, time: 237, e: 0.01
epoch 394, total_reward = 1.0


episode: 395/10000, time: 284, e: 0.01
epoch 395, total_reward = 2.0


episode: 396/10000, time: 244, e: 0.01
epoch 396, total_reward = 0.0


episode: 397/10000, time: 324, e: 0.01
epoch 397, total_reward = 2.0


episode: 398/10000, time: 558, e: 0.01
epoch 398, total_reward = 2.0


episode: 399/10000, time: 355, e: 0.01
epoch 399, total_reward = 3.0


episode: 400/10000, time: 257, e: 0.01
epoch 400, total_reward = 2.0


episode: 401/10000, time: 339, e: 0.01
epoch 401, total_reward = 2.0


episode: 402/10000, time: 278, e: 0.01
epoch 402, total_reward = 2.0


episode: 403/10000, time: 253, e: 0.01
epoch 403, total_reward = 1.0


episode: 404/10000, time: 424, e: 0.01
epoch 404, total_reward = 2.0


episode: 405/10000, time: 376, e: 0.01
epoch 405, total_reward = 3.0


episode: 406/10000, time: 454, e: 0.01
epoch 406, total_reward = 2.0


episode: 407/10000, time: 696, e: 0.01
epoch 407, total_reward = 2.0


episode: 408/10000, time: 875, e: 0.01
epoch 408, total_reward = 3.0


episode: 409/10000, time: 670, e: 0.01
epoch 409, total_reward = 2.0


episode: 410/10000, time: 463, e: 0.01
epoch 410, total_reward = 3.0


episode: 411/10000, time: 289, e: 0.01
epoch 411, total_reward = 2.0


episode: 412/10000, time: 428, e: 0.01
epoch 412, total_reward = 3.0


episode: 413/10000, time: 253, e: 0.01
epoch 413, total_reward = 1.0


episode: 414/10000, time: 349, e: 0.01
epoch 414, total_reward = 2.0


episode: 415/10000, time: 248, e: 0.01
epoch 415, total_reward = 2.0


episode: 416/10000, time: 245, e: 0.01
epoch 416, total_reward = 2.0


episode: 417/10000, time: 202, e: 0.01
epoch 417, total_reward = 1.0


episode: 418/10000, time: 183, e: 0.01
epoch 418, total_reward = 0.0


episode: 419/10000, time: 215, e: 0.01
epoch 419, total_reward = 1.0


episode: 420/10000, time: 243, e: 0.01
epoch 420, total_reward = 1.0


episode: 421/10000, time: 207, e: 0.01
epoch 421, total_reward = 0.0


episode: 422/10000, time: 228, e: 0.01
epoch 422, total_reward = 1.0


episode: 423/10000, time: 343, e: 0.01
epoch 423, total_reward = 2.0


episode: 424/10000, time: 279, e: 0.01
epoch 424, total_reward = 2.0


episode: 425/10000, time: 297, e: 0.01
epoch 425, total_reward = 2.0


episode: 426/10000, time: 331, e: 0.01
epoch 426, total_reward = 3.0


episode: 427/10000, time: 335, e: 0.01
epoch 427, total_reward = 3.0


episode: 428/10000, time: 328, e: 0.01
epoch 428, total_reward = 2.0


episode: 429/10000, time: 474, e: 0.01
epoch 429, total_reward = 4.0


episode: 430/10000, time: 262, e: 0.01
epoch 430, total_reward = 2.0


episode: 431/10000, time: 345, e: 0.01
epoch 431, total_reward = 3.0


episode: 432/10000, time: 340, e: 0.01
epoch 432, total_reward = 3.0


episode: 433/10000, time: 167, e: 0.01
epoch 433, total_reward = 0.0


episode: 434/10000, time: 266, e: 0.01
epoch 434, total_reward = 2.0


episode: 435/10000, time: 333, e: 0.01
epoch 435, total_reward = 3.0


episode: 436/10000, time: 268, e: 0.01
epoch 436, total_reward = 2.0


episode: 437/10000, time: 223, e: 0.01
epoch 437, total_reward = 1.0


episode: 438/10000, time: 264, e: 0.01
epoch 438, total_reward = 2.0


episode: 439/10000, time: 281, e: 0.01
epoch 439, total_reward = 2.0


episode: 440/10000, time: 447, e: 0.01
epoch 440, total_reward = 5.0


episode: 441/10000, time: 308, e: 0.01
epoch 441, total_reward = 2.0


episode: 442/10000, time: 338, e: 0.01
epoch 442, total_reward = 3.0


episode: 443/10000, time: 193, e: 0.01
epoch 443, total_reward = 0.0


episode: 444/10000, time: 179, e: 0.01
epoch 444, total_reward = 0.0


episode: 445/10000, time: 326, e: 0.01
epoch 445, total_reward = 3.0


episode: 446/10000, time: 365, e: 0.01
epoch 446, total_reward = 3.0


episode: 447/10000, time: 175, e: 0.01
epoch 447, total_reward = 0.0


episode: 448/10000, time: 255, e: 0.01
epoch 448, total_reward = 1.0


episode: 449/10000, time: 371, e: 0.01
epoch 449, total_reward = 3.0


episode: 450/10000, time: 347, e: 0.01
epoch 450, total_reward = 3.0


episode: 451/10000, time: 301, e: 0.01
epoch 451, total_reward = 1.0


episode: 452/10000, time: 239, e: 0.01
epoch 452, total_reward = 1.0


episode: 453/10000, time: 336, e: 0.01
epoch 453, total_reward = 3.0


episode: 454/10000, time: 446, e: 0.01
epoch 454, total_reward = 3.0


episode: 455/10000, time: 291, e: 0.01
epoch 455, total_reward = 2.0


episode: 456/10000, time: 210, e: 0.01
epoch 456, total_reward = 0.0


episode: 457/10000, time: 417, e: 0.01
epoch 457, total_reward = 3.0


episode: 458/10000, time: 352, e: 0.01
epoch 458, total_reward = 3.0


episode: 459/10000, time: 431, e: 0.01
epoch 459, total_reward = 4.0


episode: 460/10000, time: 271, e: 0.01
epoch 460, total_reward = 2.0


episode: 461/10000, time: 315, e: 0.01
epoch 461, total_reward = 2.0


episode: 462/10000, time: 169, e: 0.01
epoch 462, total_reward = 0.0


episode: 463/10000, time: 208, e: 0.01
epoch 463, total_reward = 1.0


episode: 464/10000, time: 298, e: 0.01
epoch 464, total_reward = 2.0


episode: 465/10000, time: 264, e: 0.01
epoch 465, total_reward = 2.0


episode: 466/10000, time: 347, e: 0.01
epoch 466, total_reward = 3.0


episode: 467/10000, time: 307, e: 0.01
epoch 467, total_reward = 2.0


episode: 468/10000, time: 465, e: 0.01
epoch 468, total_reward = 4.0


episode: 469/10000, time: 384, e: 0.01
epoch 469, total_reward = 3.0


episode: 470/10000, time: 396, e: 0.01
epoch 470, total_reward = 2.0


episode: 471/10000, time: 375, e: 0.01
epoch 471, total_reward = 2.0


episode: 472/10000, time: 559, e: 0.01
epoch 472, total_reward = 0.0


episode: 473/10000, time: 928, e: 0.01
epoch 473, total_reward = 3.0


episode: 474/10000, time: 624, e: 0.01
epoch 474, total_reward = 1.0


episode: 475/10000, time: 338, e: 0.01
epoch 475, total_reward = 1.0


episode: 476/10000, time: 557, e: 0.01
epoch 476, total_reward = 6.0


episode: 477/10000, time: 761, e: 0.01
epoch 477, total_reward = 1.0


episode: 478/10000, time: 308, e: 0.01
epoch 478, total_reward = 0.0


episode: 479/10000, time: 453, e: 0.01
epoch 479, total_reward = 3.0


episode: 480/10000, time: 282, e: 0.01
epoch 480, total_reward = 2.0


episode: 481/10000, time: 304, e: 0.01
epoch 481, total_reward = 2.0


episode: 482/10000, time: 345, e: 0.01
epoch 482, total_reward = 0.0


episode: 483/10000, time: 402, e: 0.01
epoch 483, total_reward = 1.0


episode: 484/10000, time: 437, e: 0.01
epoch 484, total_reward = 1.0


episode: 485/10000, time: 262, e: 0.01
epoch 485, total_reward = 0.0


episode: 486/10000, time: 288, e: 0.01
epoch 486, total_reward = 2.0


episode: 487/10000, time: 394, e: 0.01
epoch 487, total_reward = 2.0


episode: 488/10000, time: 418, e: 0.01
epoch 488, total_reward = 4.0


episode: 489/10000, time: 290, e: 0.01
epoch 489, total_reward = 1.0


episode: 490/10000, time: 313, e: 0.01
epoch 490, total_reward = 0.0


episode: 491/10000, time: 631, e: 0.01
epoch 491, total_reward = 2.0


episode: 492/10000, time: 264, e: 0.01
epoch 492, total_reward = 1.0


episode: 493/10000, time: 252, e: 0.01
epoch 493, total_reward = 1.0


episode: 494/10000, time: 259, e: 0.01
epoch 494, total_reward = 0.0


episode: 495/10000, time: 244, e: 0.01
epoch 495, total_reward = 0.0


episode: 496/10000, time: 247, e: 0.01
epoch 496, total_reward = 1.0


episode: 497/10000, time: 250, e: 0.01
epoch 497, total_reward = 1.0


episode: 498/10000, time: 291, e: 0.01
epoch 498, total_reward = 2.0


episode: 499/10000, time: 189, e: 0.01
epoch 499, total_reward = 0.0


episode: 500/10000, time: 268, e: 0.01
epoch 500, total_reward = 2.0


episode: 501/10000, time: 262, e: 0.01
epoch 501, total_reward = 2.0


episode: 502/10000, time: 164, e: 0.01
epoch 502, total_reward = 0.0


episode: 503/10000, time: 270, e: 0.01
epoch 503, total_reward = 2.0


episode: 504/10000, time: 203, e: 0.01
epoch 504, total_reward = 0.0


episode: 505/10000, time: 198, e: 0.01
epoch 505, total_reward = 1.0


episode: 506/10000, time: 346, e: 0.01
epoch 506, total_reward = 3.0


episode: 507/10000, time: 244, e: 0.01
epoch 507, total_reward = 1.0


episode: 508/10000, time: 214, e: 0.01
epoch 508, total_reward = 1.0


episode: 509/10000, time: 267, e: 0.01
epoch 509, total_reward = 2.0


episode: 510/10000, time: 355, e: 0.01
epoch 510, total_reward = 3.0


episode: 511/10000, time: 261, e: 0.01
epoch 511, total_reward = 1.0


episode: 512/10000, time: 241, e: 0.01
epoch 512, total_reward = 2.0


episode: 513/10000, time: 216, e: 0.01
epoch 513, total_reward = 1.0


episode: 514/10000, time: 264, e: 0.01
epoch 514, total_reward = 2.0


episode: 515/10000, time: 189, e: 0.01
epoch 515, total_reward = 0.0


episode: 516/10000, time: 357, e: 0.01
epoch 516, total_reward = 3.0


episode: 517/10000, time: 427, e: 0.01
epoch 517, total_reward = 3.0


episode: 518/10000, time: 303, e: 0.01
epoch 518, total_reward = 2.0


episode: 519/10000, time: 182, e: 0.01
epoch 519, total_reward = 0.0


episode: 520/10000, time: 326, e: 0.01
epoch 520, total_reward = 3.0


episode: 521/10000, time: 192, e: 0.01
epoch 521, total_reward = 0.0


episode: 522/10000, time: 270, e: 0.01
epoch 522, total_reward = 2.0


episode: 523/10000, time: 380, e: 0.01
epoch 523, total_reward = 3.0


episode: 524/10000, time: 196, e: 0.01
epoch 524, total_reward = 0.0


episode: 525/10000, time: 213, e: 0.01
epoch 525, total_reward = 0.0


episode: 526/10000, time: 256, e: 0.01
epoch 526, total_reward = 1.0


episode: 527/10000, time: 384, e: 0.01
epoch 527, total_reward = 2.0


episode: 528/10000, time: 291, e: 0.01
epoch 528, total_reward = 2.0


episode: 529/10000, time: 355, e: 0.01
epoch 529, total_reward = 2.0


episode: 530/10000, time: 221, e: 0.01
epoch 530, total_reward = 1.0


episode: 531/10000, time: 248, e: 0.01
epoch 531, total_reward = 1.0


episode: 532/10000, time: 360, e: 0.01
epoch 532, total_reward = 2.0


episode: 533/10000, time: 340, e: 0.01
epoch 533, total_reward = 3.0


episode: 534/10000, time: 382, e: 0.01
epoch 534, total_reward = 3.0


episode: 535/10000, time: 260, e: 0.01
epoch 535, total_reward = 1.0


episode: 536/10000, time: 407, e: 0.01
epoch 536, total_reward = 3.0


episode: 537/10000, time: 305, e: 0.01
epoch 537, total_reward = 2.0


episode: 538/10000, time: 258, e: 0.01
epoch 538, total_reward = 0.0


episode: 539/10000, time: 289, e: 0.01
epoch 539, total_reward = 2.0


episode: 540/10000, time: 265, e: 0.01
epoch 540, total_reward = 2.0


episode: 541/10000, time: 225, e: 0.01
epoch 541, total_reward = 1.0


episode: 542/10000, time: 229, e: 0.01
epoch 542, total_reward = 1.0


episode: 543/10000, time: 468, e: 0.01
epoch 543, total_reward = 4.0


episode: 544/10000, time: 484, e: 0.01
epoch 544, total_reward = 2.0


episode: 545/10000, time: 349, e: 0.01
epoch 545, total_reward = 2.0


episode: 546/10000, time: 356, e: 0.01
epoch 546, total_reward = 1.0


episode: 547/10000, time: 308, e: 0.01
epoch 547, total_reward = 1.0


episode: 548/10000, time: 297, e: 0.01
epoch 548, total_reward = 0.0


episode: 549/10000, time: 385, e: 0.01
epoch 549, total_reward = 0.0


episode: 550/10000, time: 548, e: 0.01
epoch 550, total_reward = 3.0


episode: 551/10000, time: 390, e: 0.01
epoch 551, total_reward = 1.0


episode: 552/10000, time: 394, e: 0.01
epoch 552, total_reward = 2.0


episode: 553/10000, time: 396, e: 0.01
epoch 553, total_reward = 1.0


episode: 554/10000, time: 244, e: 0.01
epoch 554, total_reward = 0.0


episode: 555/10000, time: 227, e: 0.01
epoch 555, total_reward = 0.0


episode: 556/10000, time: 250, e: 0.01
epoch 556, total_reward = 2.0


episode: 557/10000, time: 318, e: 0.01
epoch 557, total_reward = 3.0


episode: 558/10000, time: 394, e: 0.01
epoch 558, total_reward = 3.0


episode: 559/10000, time: 324, e: 0.01
epoch 559, total_reward = 3.0


episode: 560/10000, time: 302, e: 0.01
epoch 560, total_reward = 1.0


episode: 561/10000, time: 302, e: 0.01
epoch 561, total_reward = 2.0


episode: 562/10000, time: 236, e: 0.01
epoch 562, total_reward = 1.0


episode: 563/10000, time: 375, e: 0.01
epoch 563, total_reward = 2.0


episode: 564/10000, time: 422, e: 0.01
epoch 564, total_reward = 4.0


epoch 565, total_reward = 0.0


episode: 566/10000, time: 618, e: 0.01
epoch 566, total_reward = 2.0


episode: 567/10000, time: 355, e: 0.01
epoch 567, total_reward = 2.0


episode: 568/10000, time: 433, e: 0.01
epoch 568, total_reward = 4.0


episode: 569/10000, time: 550, e: 0.01
epoch 569, total_reward = 1.0


episode: 570/10000, time: 785, e: 0.01
epoch 570, total_reward = 2.0


episode: 571/10000, time: 261, e: 0.01
epoch 571, total_reward = 1.0


episode: 572/10000, time: 287, e: 0.01
epoch 572, total_reward = 1.0


episode: 573/10000, time: 275, e: 0.01
epoch 573, total_reward = 1.0


episode: 574/10000, time: 489, e: 0.01
epoch 574, total_reward = 2.0


episode: 575/10000, time: 328, e: 0.01
epoch 575, total_reward = 3.0


episode: 576/10000, time: 305, e: 0.01
epoch 576, total_reward = 0.0


episode: 577/10000, time: 327, e: 0.01
epoch 577, total_reward = 3.0


episode: 578/10000, time: 279, e: 0.01
epoch 578, total_reward = 2.0


episode: 579/10000, time: 213, e: 0.01
epoch 579, total_reward = 1.0


episode: 580/10000, time: 203, e: 0.01
epoch 580, total_reward = 1.0


episode: 581/10000, time: 257, e: 0.01
epoch 581, total_reward = 2.0


episode: 582/10000, time: 225, e: 0.01
epoch 582, total_reward = 1.0


episode: 583/10000, time: 207, e: 0.01
epoch 583, total_reward = 1.0


episode: 584/10000, time: 183, e: 0.01
epoch 584, total_reward = 0.0


episode: 585/10000, time: 264, e: 0.01
epoch 585, total_reward = 2.0


episode: 586/10000, time: 239, e: 0.01
epoch 586, total_reward = 1.0


episode: 587/10000, time: 258, e: 0.01
epoch 587, total_reward = 2.0


episode: 588/10000, time: 232, e: 0.01
epoch 588, total_reward = 1.0


episode: 589/10000, time: 267, e: 0.01
epoch 589, total_reward = 0.0


episode: 590/10000, time: 199, e: 0.01
epoch 590, total_reward = 0.0


episode: 591/10000, time: 265, e: 0.01
epoch 591, total_reward = 0.0


episode: 592/10000, time: 199, e: 0.01
epoch 592, total_reward = 1.0


episode: 593/10000, time: 174, e: 0.01
epoch 593, total_reward = 0.0


episode: 594/10000, time: 280, e: 0.01
epoch 594, total_reward = 1.0


episode: 595/10000, time: 266, e: 0.01
epoch 595, total_reward = 2.0


episode: 596/10000, time: 226, e: 0.01
epoch 596, total_reward = 1.0


episode: 597/10000, time: 207, e: 0.01
epoch 597, total_reward = 1.0


episode: 598/10000, time: 261, e: 0.01
epoch 598, total_reward = 1.0


episode: 599/10000, time: 201, e: 0.01
epoch 599, total_reward = 1.0


episode: 600/10000, time: 170, e: 0.01
epoch 600, total_reward = 0.0


episode: 601/10000, time: 245, e: 0.01
epoch 601, total_reward = 1.0


episode: 602/10000, time: 260, e: 0.01
epoch 602, total_reward = 2.0


episode: 603/10000, time: 242, e: 0.01
epoch 603, total_reward = 1.0


episode: 604/10000, time: 319, e: 0.01
epoch 604, total_reward = 1.0


episode: 605/10000, time: 253, e: 0.01
epoch 605, total_reward = 1.0


episode: 606/10000, time: 468, e: 0.01
epoch 606, total_reward = 4.0


episode: 607/10000, time: 490, e: 0.01
epoch 607, total_reward = 5.0


epoch 608, total_reward = 0.0


episode: 609/10000, time: 686, e: 0.01
epoch 609, total_reward = 2.0


episode: 610/10000, time: 620, e: 0.01
epoch 610, total_reward = 2.0


episode: 611/10000, time: 643, e: 0.01
epoch 611, total_reward = 2.0


episode: 612/10000, time: 626, e: 0.01
epoch 612, total_reward = 4.0


episode: 613/10000, time: 723, e: 0.01
epoch 613, total_reward = 4.0


episode: 614/10000, time: 828, e: 0.01
epoch 614, total_reward = 3.0


episode: 615/10000, time: 555, e: 0.01
epoch 615, total_reward = 3.0


episode: 616/10000, time: 652, e: 0.01
epoch 616, total_reward = 3.0


episode: 617/10000, time: 357, e: 0.01
epoch 617, total_reward = 0.0


episode: 618/10000, time: 308, e: 0.01
epoch 618, total_reward = 0.0


episode: 619/10000, time: 527, e: 0.01
epoch 619, total_reward = 3.0


episode: 620/10000, time: 614, e: 0.01
epoch 620, total_reward = 1.0


episode: 621/10000, time: 272, e: 0.01
epoch 621, total_reward = 0.0


episode: 622/10000, time: 449, e: 0.01
epoch 622, total_reward = 3.0


episode: 623/10000, time: 280, e: 0.01
epoch 623, total_reward = 0.0


episode: 624/10000, time: 457, e: 0.01
epoch 624, total_reward = 3.0


episode: 625/10000, time: 561, e: 0.01
epoch 625, total_reward = 2.0


episode: 626/10000, time: 397, e: 0.01
epoch 626, total_reward = 0.0


episode: 627/10000, time: 506, e: 0.01
epoch 627, total_reward = 3.0


episode: 628/10000, time: 526, e: 0.01
epoch 628, total_reward = 2.0


episode: 629/10000, time: 295, e: 0.01
epoch 629, total_reward = 2.0


episode: 630/10000, time: 203, e: 0.01
epoch 630, total_reward = 1.0


episode: 631/10000, time: 175, e: 0.01
epoch 631, total_reward = 0.0


episode: 632/10000, time: 260, e: 0.01
epoch 632, total_reward = 1.0


episode: 633/10000, time: 232, e: 0.01
epoch 633, total_reward = 1.0


episode: 634/10000, time: 168, e: 0.01
epoch 634, total_reward = 0.0


episode: 635/10000, time: 241, e: 0.01
epoch 635, total_reward = 0.0


episode: 636/10000, time: 208, e: 0.01
epoch 636, total_reward = 1.0


episode: 637/10000, time: 303, e: 0.01
epoch 637, total_reward = 2.0


episode: 638/10000, time: 241, e: 0.01
epoch 638, total_reward = 1.0


episode: 639/10000, time: 260, e: 0.01
epoch 639, total_reward = 1.0


episode: 640/10000, time: 227, e: 0.01
epoch 640, total_reward = 0.0


episode: 641/10000, time: 260, e: 0.01
epoch 641, total_reward = 1.0


episode: 642/10000, time: 240, e: 0.01
epoch 642, total_reward = 1.0


episode: 643/10000, time: 405, e: 0.01
epoch 643, total_reward = 1.0


episode: 644/10000, time: 248, e: 0.01
epoch 644, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 645/10000, time: 240, e: 0.01
epoch 645, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 646/10000, time: 185, e: 0.01
epoch 646, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 647/10000, time: 213, e: 0.01
epoch 647, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 648/10000, time: 216, e: 0.01
epoch 648, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 649/10000, time: 238, e: 0.01
epoch 649, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 650/10000, time: 265, e: 0.01
epoch 650, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 651/10000, time: 272, e: 0.01
epoch 651, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 652/10000, time: 160, e: 0.01
epoch 652, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 653/10000, time: 245, e: 0.01
epoch 653, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 654/10000, time: 299, e: 0.01
epoch 654, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 655/10000, time: 331, e: 0.01
epoch 655, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 656/10000, time: 245, e: 0.01
epoch 656, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 657/10000, time: 312, e: 0.01
epoch 657, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 658/10000, time: 303, e: 0.01
epoch 658, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 659/10000, time: 323, e: 0.01
epoch 659, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 660/10000, time: 280, e: 0.01
epoch 660, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 661/10000, time: 350, e: 0.01
epoch 661, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 662/10000, time: 280, e: 0.01
epoch 662, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 663/10000, time: 398, e: 0.01
epoch 663, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 664/10000, time: 164, e: 0.01
epoch 664, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 665/10000, time: 241, e: 0.01
epoch 665, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 666/10000, time: 342, e: 0.01
epoch 666, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 667/10000, time: 433, e: 0.01
epoch 667, total_reward = 5.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 668/10000, time: 182, e: 0.01
epoch 668, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 669/10000, time: 318, e: 0.01
epoch 669, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 670/10000, time: 207, e: 0.01
epoch 670, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 671/10000, time: 322, e: 0.01
epoch 671, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 672/10000, time: 243, e: 0.01
epoch 672, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 673/10000, time: 220, e: 0.01
epoch 673, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 674/10000, time: 197, e: 0.01
epoch 674, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 675/10000, time: 269, e: 0.01
epoch 675, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 676/10000, time: 269, e: 0.01
epoch 676, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 677/10000, time: 264, e: 0.01
epoch 677, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 678/10000, time: 205, e: 0.01
epoch 678, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 679/10000, time: 183, e: 0.01
epoch 679, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 680/10000, time: 275, e: 0.01
epoch 680, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 681/10000, time: 215, e: 0.01
epoch 681, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 682/10000, time: 219, e: 0.01
epoch 682, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 683/10000, time: 214, e: 0.01
epoch 683, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 684/10000, time: 395, e: 0.01
epoch 684, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 685/10000, time: 220, e: 0.01
epoch 685, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 686/10000, time: 190, e: 0.01
epoch 686, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 687/10000, time: 298, e: 0.01
epoch 687, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 688/10000, time: 287, e: 0.01
epoch 688, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 689/10000, time: 221, e: 0.01
epoch 689, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 690/10000, time: 327, e: 0.01
epoch 690, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 691/10000, time: 174, e: 0.01
epoch 691, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 692/10000, time: 386, e: 0.01
epoch 692, total_reward = 4.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 693/10000, time: 236, e: 0.01
epoch 693, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 694/10000, time: 251, e: 0.01
epoch 694, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 695/10000, time: 211, e: 0.01
epoch 695, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 696/10000, time: 257, e: 0.01
epoch 696, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 697/10000, time: 294, e: 0.01
epoch 697, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 698/10000, time: 221, e: 0.01
epoch 698, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 699/10000, time: 336, e: 0.01
epoch 699, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 700/10000, time: 377, e: 0.01
epoch 700, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 701/10000, time: 270, e: 0.01
epoch 701, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 702/10000, time: 294, e: 0.01
epoch 702, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 703/10000, time: 228, e: 0.01
epoch 703, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

epoch 704, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 705/10000, time: 785, e: 0.01
epoch 705, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 706/10000, time: 296, e: 0.01
epoch 706, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 707/10000, time: 257, e: 0.01
epoch 707, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 708/10000, time: 666, e: 0.01
epoch 708, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 709/10000, time: 653, e: 0.01
epoch 709, total_reward = 4.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 710/10000, time: 322, e: 0.01
epoch 710, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 711/10000, time: 385, e: 0.01
epoch 711, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 712/10000, time: 380, e: 0.01
epoch 712, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 713/10000, time: 367, e: 0.01
epoch 713, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 714/10000, time: 509, e: 0.01
epoch 714, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 715/10000, time: 334, e: 0.01
epoch 715, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 716/10000, time: 439, e: 0.01
epoch 716, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 717/10000, time: 454, e: 0.01
epoch 717, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 718/10000, time: 276, e: 0.01
epoch 718, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 719/10000, time: 418, e: 0.01
epoch 719, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 720/10000, time: 322, e: 0.01
epoch 720, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 721/10000, time: 226, e: 0.01
epoch 721, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 722/10000, time: 368, e: 0.01
epoch 722, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 723/10000, time: 492, e: 0.01
epoch 723, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 724/10000, time: 262, e: 0.01
epoch 724, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 725/10000, time: 345, e: 0.01
epoch 725, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 726/10000, time: 252, e: 0.01
epoch 726, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 727/10000, time: 376, e: 0.01
epoch 727, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 728/10000, time: 345, e: 0.01
epoch 728, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 729/10000, time: 449, e: 0.01
epoch 729, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 730/10000, time: 317, e: 0.01
epoch 730, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 731/10000, time: 367, e: 0.01
epoch 731, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 732/10000, time: 391, e: 0.01
epoch 732, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 733/10000, time: 337, e: 0.01
epoch 733, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 734/10000, time: 639, e: 0.01
epoch 734, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 735/10000, time: 469, e: 0.01
epoch 735, total_reward = 4.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 736/10000, time: 321, e: 0.01
epoch 736, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 737/10000, time: 329, e: 0.01
epoch 737, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 738/10000, time: 265, e: 0.01
epoch 738, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 739/10000, time: 267, e: 0.01
epoch 739, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 740/10000, time: 272, e: 0.01
epoch 740, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 741/10000, time: 187, e: 0.01
epoch 741, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 742/10000, time: 290, e: 0.01
epoch 742, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 743/10000, time: 343, e: 0.01
epoch 743, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 744/10000, time: 395, e: 0.01
epoch 744, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 745/10000, time: 346, e: 0.01
epoch 745, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 746/10000, time: 279, e: 0.01
epoch 746, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 747/10000, time: 312, e: 0.01
epoch 747, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 748/10000, time: 200, e: 0.01
epoch 748, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 749/10000, time: 323, e: 0.01
epoch 749, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 750/10000, time: 409, e: 0.01
epoch 750, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 751/10000, time: 498, e: 0.01
epoch 751, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 752/10000, time: 592, e: 0.01
epoch 752, total_reward = 8.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 753/10000, time: 367, e: 0.01
epoch 753, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

epoch 754, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 755/10000, time: 615, e: 0.01
epoch 755, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 756/10000, time: 266, e: 0.01
epoch 756, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 757/10000, time: 328, e: 0.01
epoch 757, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 758/10000, time: 218, e: 0.01
epoch 758, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 759/10000, time: 373, e: 0.01
epoch 759, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 760/10000, time: 344, e: 0.01
epoch 760, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 761/10000, time: 463, e: 0.01
epoch 761, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 762/10000, time: 749, e: 0.01
epoch 762, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 763/10000, time: 485, e: 0.01
epoch 763, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 764/10000, time: 480, e: 0.01
epoch 764, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 765/10000, time: 253, e: 0.01
epoch 765, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 766/10000, time: 346, e: 0.01
epoch 766, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 767/10000, time: 369, e: 0.01
epoch 767, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 768/10000, time: 323, e: 0.01
epoch 768, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 769/10000, time: 255, e: 0.01
epoch 769, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 770/10000, time: 398, e: 0.01
epoch 770, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 771/10000, time: 216, e: 0.01
epoch 771, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 772/10000, time: 202, e: 0.01
epoch 772, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 773/10000, time: 300, e: 0.01
epoch 773, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 774/10000, time: 308, e: 0.01
epoch 774, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 775/10000, time: 293, e: 0.01
epoch 775, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 776/10000, time: 190, e: 0.01
epoch 776, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 777/10000, time: 297, e: 0.01
epoch 777, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 778/10000, time: 326, e: 0.01
epoch 778, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 779/10000, time: 288, e: 0.01
epoch 779, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 780/10000, time: 251, e: 0.01
epoch 780, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 781/10000, time: 535, e: 0.01
epoch 781, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 782/10000, time: 372, e: 0.01
epoch 782, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 783/10000, time: 271, e: 0.01
epoch 783, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 784/10000, time: 260, e: 0.01
epoch 784, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 785/10000, time: 433, e: 0.01
epoch 785, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 786/10000, time: 264, e: 0.01
epoch 786, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 787/10000, time: 289, e: 0.01
epoch 787, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 788/10000, time: 189, e: 0.01
epoch 788, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 789/10000, time: 231, e: 0.01
epoch 789, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 790/10000, time: 454, e: 0.01
epoch 790, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 791/10000, time: 297, e: 0.01
epoch 791, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 792/10000, time: 347, e: 0.01
epoch 792, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 793/10000, time: 273, e: 0.01
epoch 793, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 794/10000, time: 260, e: 0.01
epoch 794, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 795/10000, time: 200, e: 0.01
epoch 795, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 796/10000, time: 170, e: 0.01
epoch 796, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 797/10000, time: 196, e: 0.01
epoch 797, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 798/10000, time: 171, e: 0.01
epoch 798, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 799/10000, time: 276, e: 0.01
epoch 799, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 800/10000, time: 306, e: 0.01
epoch 800, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 801/10000, time: 861, e: 0.01
epoch 801, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

epoch 802, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 803/10000, time: 334, e: 0.01
epoch 803, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 804/10000, time: 173, e: 0.01
epoch 804, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 805/10000, time: 274, e: 0.01
epoch 805, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 806/10000, time: 372, e: 0.01
epoch 806, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 807/10000, time: 260, e: 0.01
epoch 807, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 808/10000, time: 288, e: 0.01
epoch 808, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 809/10000, time: 175, e: 0.01
epoch 809, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 810/10000, time: 339, e: 0.01
epoch 810, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 811/10000, time: 289, e: 0.01
epoch 811, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 812/10000, time: 234, e: 0.01
epoch 812, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 813/10000, time: 317, e: 0.01
epoch 813, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 814/10000, time: 295, e: 0.01
epoch 814, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 815/10000, time: 251, e: 0.01
epoch 815, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 816/10000, time: 301, e: 0.01
epoch 816, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 817/10000, time: 259, e: 0.01
epoch 817, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 818/10000, time: 358, e: 0.01
epoch 818, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 819/10000, time: 372, e: 0.01
epoch 819, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 820/10000, time: 372, e: 0.01
epoch 820, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 821/10000, time: 367, e: 0.01
epoch 821, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 822/10000, time: 324, e: 0.01
epoch 822, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 823/10000, time: 237, e: 0.01
epoch 823, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 824/10000, time: 209, e: 0.01
epoch 824, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 825/10000, time: 192, e: 0.01
epoch 825, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 826/10000, time: 254, e: 0.01
epoch 826, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 827/10000, time: 737, e: 0.01
epoch 827, total_reward = 6.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 828/10000, time: 601, e: 0.01
epoch 828, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 829/10000, time: 259, e: 0.01
epoch 829, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 830/10000, time: 171, e: 0.01
epoch 830, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 831/10000, time: 231, e: 0.01
epoch 831, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 832/10000, time: 207, e: 0.01
epoch 832, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 833/10000, time: 237, e: 0.01
epoch 833, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 834/10000, time: 327, e: 0.01
epoch 834, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 835/10000, time: 198, e: 0.01
epoch 835, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 836/10000, time: 158, e: 0.01
epoch 836, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 837/10000, time: 296, e: 0.01
epoch 837, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 838/10000, time: 304, e: 0.01
epoch 838, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 839/10000, time: 262, e: 0.01
epoch 839, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 840/10000, time: 165, e: 0.01
epoch 840, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 841/10000, time: 264, e: 0.01
epoch 841, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 842/10000, time: 265, e: 0.01
epoch 842, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 843/10000, time: 157, e: 0.01
epoch 843, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 844/10000, time: 246, e: 0.01
epoch 844, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 845/10000, time: 169, e: 0.01
epoch 845, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 846/10000, time: 215, e: 0.01
epoch 846, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 847/10000, time: 176, e: 0.01
epoch 847, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 848/10000, time: 254, e: 0.01
epoch 848, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 849/10000, time: 182, e: 0.01
epoch 849, total_reward = 0.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 850/10000, time: 285, e: 0.01
epoch 850, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 851/10000, time: 394, e: 0.01
epoch 851, total_reward = 4.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 852/10000, time: 244, e: 0.01
epoch 852, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 853/10000, time: 204, e: 0.01
epoch 853, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 854/10000, time: 361, e: 0.01
epoch 854, total_reward = 3.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 855/10000, time: 334, e: 0.01
epoch 855, total_reward = 2.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

episode: 856/10000, time: 236, e: 0.01
epoch 856, total_reward = 1.0


HBox(children=(IntProgress(value=0, max=1000), HTML(value='')))

KeyboardInterrupt: 
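
The per-episode rewards printed above are noisy, so the trend is hard to read line by line. Below is a minimal sketch for summarizing them, assuming the cell output has been captured as plain text (for example, copied from the notebook). The helpers `parse_rewards` and `moving_average` are hypothetical names introduced here for illustration and are not part of the notebook; the sample text reuses the first four episodes of the log above.

```python
import re

# Matches lines such as "epoch 630, total_reward = 1.0" from the training output.
REWARD_LINE = re.compile(r"epoch (\d+), total_reward = ([-+]?\d+(?:\.\d+)?)")


def parse_rewards(log_text):
    """Return a list of (epoch, total_reward) pairs found in the log text."""
    return [(int(m.group(1)), float(m.group(2)))
            for m in REWARD_LINE.finditer(log_text)]


def moving_average(values, window):
    """Average of each consecutive `window`-sized slice of `values`."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Sample copied from the first four episodes of the output above.
sample_log = """
episode: 630/10000, time: 203, e: 0.01
epoch 630, total_reward = 1.0
episode: 631/10000, time: 175, e: 0.01
epoch 631, total_reward = 0.0
episode: 632/10000, time: 260, e: 0.01
epoch 632, total_reward = 1.0
episode: 633/10000, time: 232, e: 0.01
epoch 633, total_reward = 1.0
"""

pairs = parse_rewards(sample_log)
rewards = [reward for _, reward in pairs]
smoothed = moving_average(rewards, window=2)
print(pairs)
print(smoothed)
```

With a larger window (say 50 episodes) applied to the full log, the smoothed series could be passed to `plt.plot` to see whether the average reward is still rising when training was interrupted.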

In [None]:
agent.save("./Breakout-v0_1.h5")

In [15]:
env.step()

TypeError: step() missing 1 required positional argument: 'action'

In [17]:
next_state, reward, done, info = env.step(action)
next_state

array([[[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       ...,

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]], dtype=uint8)

In [22]:
action

0

## Testing the models (computer vs. model)

In [0]:
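# Preprocess the first frame: grayscale, downscale by half, flatten to shape (1, state_size).
state = env.reset()
state = cv2.resize(cv2.cvtColor(state, cv2.COLOR_RGB2GRAY), None, fx=0.5, fy=0.5)
state = np.reshape(state, [1, state_size])
score = 0

# The loop variable is named `step` so it does not shadow the imported `time` module.
for step in range(1000):
    env.render()
    action = agent.act(state)
    next_state, reward, done, _ = env.step(action)
    score += reward
    if done:
        # Stop once the episode is over instead of stepping a finished environment.
        break
    next_state = cv2.resize(cv2.cvtColor(next_state, cv2.COLOR_RGB2GRAY), None, fx=0.5, fy=0.5)
    state = np.reshape(next_state, [1, state_size])
env.close()
print("Your score: {}".format(score))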
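
A single episode gives a very noisy estimate of how well the agent plays, as the spread of rewards in the training log suggests. One possible way to reuse the evaluation loop above over several episodes is sketched below. It assumes the classic Gym interface used in this notebook, where `step(action)` returns `(observation, reward, done, info)`. The function `run_episode` and the stand-in `CountdownEnv` are hypothetical and exist only to make the sketch runnable without Atari; with the real environment, the policy would wrap the frame preprocessing and `agent.act`.

```python
def run_episode(env, policy, max_steps=1000):
    """Play one episode and return the accumulated reward.

    Assumes the classic Gym API: reset() -> observation and
    step(action) -> (observation, reward, done, info).
    """
    observation = env.reset()
    score = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        score += reward
        if done:
            # The episode has ended; further steps would be undefined.
            break
    return score


class CountdownEnv:
    """Hypothetical stand-in environment: pays 1.0 per step, ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.steps

    def step(self, action):
        self.steps += 1
        done = self.steps >= self.length
        return self.steps, 1.0, done, {}


# Average the score over a few episodes; the policy here always picks action 0.
scores = [run_episode(CountdownEnv(length=5), policy=lambda obs: 0) for _ in range(3)]
print(sum(scores) / len(scores))
```

Passing the policy as a callable keeps the loop independent of how observations are preprocessed, so the same function could serve both a trained agent and a random baseline.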