# Textworld starting kit notebook

Model: *LSTM-DQN*

When running first: 
 1. Run the first 2 code cells(with pip installations)
 2. Restart runtime
 3. Continue with the next cells

This is done, because there is a problem with dependencies of **textworld** and **colab**, requiring different versions of **prompt-toolkit**

## Todo
### RL:
* [x] Prioritized Replay Memory
* [x] [N-step DQN](https://www.groundai.com/project/understanding-multi-step-deep-reinforcement-learning-a-systematic-study-of-the-dqn-target/)
* [x] [Fixed Q-targets](https://www.freecodecamp.org/news/improvements-in-deep-q-learning-dueling-double-dqn-prioritized-experience-replay-and-fixed-58b130cc5682/)
* [x] [Double DQN](https://towardsdatascience.com/deep-double-q-learning-7fca410b193a)
* [ ] Dueling DQN
* [ ] Multiple inputs (description, inventory, quests, etc)
* [x] Replay memory sample, when not having any alpha/beta priority samples, should take the whole sample of the opposite priority.
* [ ] Noisy nets
* [ ] DRQN ?
* [ ] [Rainbow Paper](https://arxiv.org/pdf/1710.02298.pdf) ?

### NLP:
* [x] Normalize tokens
* [x] Add l2 regularization
* [ ] Remove apostrophes from descriptions

### Game Env:
* [x] Train with simple games in starting kit
* [ ] Check game generation and complexity
* [ ] Make many simple(easy) games, each of which teaches different skill.
* [ ] Train first with simple games, and progressively train on more complex games.

### Debugging:
* [x] Extended log for checking steps and scores on every epoch
* [ ] Graphs with speed, reward, epsilon movement, loss function, episode length (use Tensorboard or similar)
* [ ] Model comparisson

### Colab:
* [x] Export model parameters to Drive


## Setup

In [0]:
!pip install textworld

Collecting textworld
[?25l  Downloading https://files.pythonhosted.org/packages/e6/ed/698f68c284aa6f45013c9cf42b26c09aebc1096226e9d004464eb33a75cf/textworld-1.1.1-cp36-cp36m-manylinux1_x86_64.whl (7.0MB)
[K     |████████████████████████████████| 7.0MB 2.8MB/s 
[?25hCollecting gym==0.10.4 (from textworld)
[?25l  Downloading https://files.pythonhosted.org/packages/3d/e5/4dae1de6534221f74895c8a95ae4eedc816a5fa003db1d4d608cbdc28b35/gym-0.10.4.tar.gz (1.5MB)
[K     |████████████████████████████████| 1.5MB 36.3MB/s 
Collecting jericho>=1.1.5 (from textworld)
[?25l  Downloading https://files.pythonhosted.org/packages/6c/29/162c0b34722bed00d533a657422d98396c9c4f29fc8a35c1787dc0120678/jericho-1.2.3.tar.gz (1.0MB)
[K     |████████████████████████████████| 1.0MB 38.8MB/s 
Collecting pybars3>=0.9.3 (from textworld)
  Downloading https://files.pythonhosted.org/packages/cf/28/bf14035877a989f64081a44337ea2f5858c72b598a01f93a74be541666fb/pybars3-0.9.6.tar.gz
Collecting hashids>=1.2.0 (from text

In [0]:
!pip install prompt-toolkit==1.0.16



In [0]:
!pip install -U -q PyDrive

[?25l[K     |▎                               | 10kB 15.5MB/s eta 0:00:01[K     |▋                               | 20kB 1.8MB/s eta 0:00:01[K     |█                               | 30kB 2.6MB/s eta 0:00:01[K     |█▎                              | 40kB 1.7MB/s eta 0:00:01[K     |█▋                              | 51kB 2.1MB/s eta 0:00:01[K     |██                              | 61kB 2.5MB/s eta 0:00:01[K     |██▎                             | 71kB 2.9MB/s eta 0:00:01[K     |██▋                             | 81kB 3.3MB/s eta 0:00:01[K     |███                             | 92kB 3.6MB/s eta 0:00:01[K     |███▎                            | 102kB 2.8MB/s eta 0:00:01[K     |███▋                            | 112kB 2.8MB/s eta 0:00:01[K     |████                            | 122kB 2.8MB/s eta 0:00:01[K     |████▎                           | 133kB 2.8MB/s eta 0:00:01[K     |████▋                           | 143kB 2.8MB/s eta 0:00:01[K     |█████                     

In [0]:
!pip install pytorch_pretrained_bert

Collecting pytorch_pretrained_bert
[?25l  Downloading https://files.pythonhosted.org/packages/d7/e0/c08d5553b89973d9a240605b9c12404bcf8227590de62bae27acbcfe076b/pytorch_pretrained_bert-0.6.2-py3-none-any.whl (123kB)
[K     |████████████████████████████████| 133kB 2.8MB/s 
Collecting regex (from pytorch_pretrained_bert)
[?25l  Downloading https://files.pythonhosted.org/packages/6f/4e/1b178c38c9a1a184288f72065a65ca01f3154df43c6ad898624149b8b4e0/regex-2019.06.08.tar.gz (651kB)
[K     |████████████████████████████████| 655kB 37.1MB/s 
Building wheels for collected packages: regex
  Building wheel for regex (setup.py) ... [?25l[?25hdone
  Stored in directory: /root/.cache/pip/wheels/35/e4/80/abf3b33ba89cf65cd262af8a22a5a999cc28fbfabea6b38473
Successfully built regex
Installing collected packages: regex, pytorch-pretrained-bert
Successfully installed pytorch-pretrained-bert-0.6.2 regex-2019.6.8


In [0]:
import os
import random
import logging
import yaml
import copy
import spacy
import numpy as np
import glob

from tqdm import tqdm
from typing import List, Dict, Any
from collections import namedtuple
import pandas as pd

import torch
import torch.nn.functional as F

import gym
import textworld.gym
from textworld import EnvInfos
import random

from pytorch_pretrained_bert import BertTokenizer, BertModel

torch.cuda.is_available()

True

## Generic functions

In [0]:
def to_np(x):
    if isinstance(x, np.ndarray):
        return x
    return x.data.cpu().numpy()


def to_pt(np_matrix, enable_cuda=False, type='long'):
    if type == 'long':
        if enable_cuda:
            return torch.autograd.Variable(torch.from_numpy(np_matrix).type(torch.LongTensor).cuda())
        else:
            return torch.autograd.Variable(torch.from_numpy(np_matrix).type(torch.LongTensor))
    elif type == 'float':
        if enable_cuda:
            return torch.autograd.Variable(torch.from_numpy(np_matrix).type(torch.FloatTensor).cuda())
        else:
            return torch.autograd.Variable(torch.from_numpy(np_matrix).type(torch.FloatTensor))


def _words_to_ids(words, word2id):
    ids = []
    for word in words:
        try:
            ids.append(word2id[word])
        except KeyError:
            ids.append(1)
    return ids


def preproc(s, str_type='None', tokenizer=None, lower_case=True):
    if s is None:
        return ["nothing"]
    s = s.replace("\n", ' ')
    if s.strip() == "":
        return ["nothing"]
    if str_type == 'feedback':
        if "$$$$$$$" in s:
            s = ""
        if "-=" in s:
            s = s.split("-=")[0]
    s = s.strip()
    if len(s) == 0:
        return ["nothing"]
    tokens = [t.text for t in tokenizer(s)]
    # NORMALIZE WORDS
    #tokens = [t.norm_ for t in tokenizer(s)]
    if lower_case:
        tokens = [t.lower() for t in tokens]
    return tokens


def max_len(list_of_list):
    return max(map(len, list_of_list))


def pad_sequences(sequences, maxlen=None, dtype='int32', value=0.):
    '''
    Partially borrowed from Keras
    # Arguments
        sequences: list of lists where each element is a sequence
        maxlen: int, maximum length
        dtype: type to cast the resulting sequence.
        value: float, value to pad the sequences to the desired value.
    # Returns
        x: numpy array with dimensions (number_of_sequences, maxlen)
    '''
    lengths = [len(s) for s in sequences]
    nb_samples = len(sequences)
    if maxlen is None:
        maxlen = np.max(lengths)
    # take the sample shape from the first non empty sequence
    # checking for consistency in the main loop below.
    sample_shape = tuple()
    for s in sequences:
        if len(s) > 0:
            sample_shape = np.asarray(s).shape[1:]
            break
    x = (np.ones((nb_samples, maxlen) + sample_shape) * value).astype(dtype)
    for idx, s in enumerate(sequences):
        if len(s) == 0:
            continue  # empty list was found
        # pre truncating
        trunc = s[-maxlen:]
        # check `trunc` has expected shape
        trunc = np.asarray(trunc, dtype=dtype)
        if trunc.shape[1:] != sample_shape:
            raise ValueError('Shape of sample %s of sequence at position %s is different from expected shape %s' %
                             (trunc.shape[1:], idx, sample_shape))
        # post padding
        x[idx, :len(trunc)] = trunc
    return x
    
    
def freeze_layer(layer):
    for param in layer.parameters():
        param.requires_grad = False
        
def preproc_example(s):
  s = s.replace('$', '')
  s = s.replace('#', '')
  s = s.replace('\n', ' ')
  s = s.replace('  ', ' ')
  s = s.replace('_', '')
  s = s.replace('|', '')
  s = s.replace('\\', '')
  s = s.replace('/', '')
  s = s.replace('-', '')
  s = s.replace('=', '')
  return s

def convert_examples_to_features(sequences, tokenizer):
  """Loads a data file into a list of `InputFeature`s."""
  batch_tokens = []
  batch_input_ids = []
  batch_input_masks = []
  for example in sequences:
      _example = preproc_example(example)      
#       print(_example)
      tokens = tokenizer.tokenize(_example)
      batch_tokens.append(tokens)
      del _example
      del tokens

  max_length = max([len(x) for x in batch_tokens])
#   print('bert_max_seqence', max_length)
  for tokens in batch_tokens:
      input_ids = tokenizer.convert_tokens_to_ids(tokens)

      # The mask has 1 for real tokens and 0 for padding tokens. Only real
      # tokens are attended to.
      input_mask = [1] * len(input_ids)

      # Zero-pad up to the sequence length.
      while len(input_ids) < max_length:
          input_ids.append(0)
          input_mask.append(0)
           
      batch_input_ids.append(input_ids)
      batch_input_masks.append(input_mask)
      del input_ids
      del input_mask
  
  return batch_tokens, batch_input_ids, batch_input_masks



## Layers

In [0]:
def masked_mean(x, m=None, dim=-1):
    """
        mean pooling when there're paddings
        input:  tensor: batch x time x h
                mask:   batch x time
        output: tensor: batch x h
    """
    if m is None:
        return torch.mean(x, dim=dim)
    mask_sum = torch.sum(m, dim=-1)  # batch
    res = torch.sum(x, dim=1)  # batch x h
    res = res / (mask_sum.unsqueeze(-1) + 1e-6)
    return res 

class FastUniLSTM(torch.nn.Module):
    """
    Adapted from https://github.com/facebookresearch/DrQA/
    now supports:   different rnn size for each layer
                    all zero rows in batch (from time distributed layer, by reshaping certain dimension)
    """

    def __init__(self, ninp, nhids, bidir=False, dropout_between_rnn_layers=0.):
        super(FastUniLSTM, self).__init__()
        self.ninp = ninp
        self.nhids = nhids
        self.nlayers = len(self.nhids)
        self.dropout_between_rnn_layers = dropout_between_rnn_layers
        self.stack_rnns(bidir)

    def stack_rnns(self, bidir):
        rnns = [torch.nn.LSTM(self.ninp if i == 0 else self.nhids[i - 1],
                              self.nhids[i],
                              num_layers=1,
                              bidirectional=bidir) for i in range(self.nlayers)]
            
        self.rnns = torch.nn.ModuleList(rnns)

    def forward(self, x, mask):

        def pad_(tensor, n):
            if n > 0:
                zero_pad = torch.autograd.Variable(torch.zeros((n,) + tensor.size()[1:]))
                if x.is_cuda:
                    zero_pad = zero_pad.cuda()
                tensor = torch.cat([tensor, zero_pad])
            return tensor

        """
        inputs: x:          batch x time x inp
                mask:       batch x time
        output: encoding:   batch x time x hidden[-1]
        """
        # Compute sorted sequence lengths
        batch_size = x.size(0)
        lengths = mask.data.eq(1).long().sum(1)  # .squeeze()
        _, idx_sort = torch.sort(lengths, dim=0, descending=True)
        _, idx_unsort = torch.sort(idx_sort, dim=0)

        lengths = list(lengths[idx_sort])
        idx_sort = torch.autograd.Variable(idx_sort)
        idx_unsort = torch.autograd.Variable(idx_unsort)

        # Sort x
        x = x.index_select(0, idx_sort)

        # remove non-zero rows, and remember how many zeros
        n_nonzero = np.count_nonzero(lengths)
        n_zero = batch_size - n_nonzero
        if n_zero != 0:
            lengths = lengths[:n_nonzero]
            x = x[:n_nonzero]

        # Transpose batch and sequence dims
        x = x.transpose(0, 1)

        # Pack it up
        rnn_input = torch.nn.utils.rnn.pack_padded_sequence(x, lengths)

        # Encode all layers
        outputs = [rnn_input]
        for i in range(self.nlayers):
            rnn_input = outputs[-1]

            # dropout between rnn layers
            if self.dropout_between_rnn_layers > 0:
                dropout_input = F.dropout(rnn_input.data,
                                          p=self.dropout_between_rnn_layers,
                                          training=self.training)
                rnn_input = torch.nn.utils.rnn.PackedSequence(dropout_input,
                                                              rnn_input.batch_sizes)
            seq, last = self.rnns[i](rnn_input)
            outputs.append(seq)
            if i == self.nlayers - 1:
                # last layer
                last_state = last[0]  # (num_layers * num_directions, batch, hidden_size)
                last_state = last_state[0]  # batch x hidden_size

        # Unpack everything
        for i, o in enumerate(outputs[1:], 1):
            outputs[i] = torch.nn.utils.rnn.pad_packed_sequence(o)[0]
        output = outputs[-1]

        # Transpose and unsort
        output = output.transpose(0, 1)  # batch x time x enc

        # re-padding
        output = pad_(output, n_zero)
        last_state = pad_(last_state, n_zero)

        output = output.index_select(0, idx_unsort)
        last_state = last_state.index_select(0, idx_unsort)

        # Pad up to original batch sequence length
        if output.size(1) != mask.size(1):
            padding = torch.zeros(output.size(0),
                                  mask.size(1) - output.size(1),
                                  output.size(2)).type(output.data.type())
            output = torch.cat([output, torch.autograd.Variable(padding)], 1)

        output = output.contiguous() * mask.unsqueeze(-1)
        return output, last_state, mask

## Model

In [0]:
class LSTM_DQN(torch.nn.Module):
    model_name = 'lstm_dqn'

    def __init__(self, model_config, embedding_weights, word_vocab, generate_length=5, enable_cuda=False):
        super(LSTM_DQN, self).__init__()
        self.model_config = model_config
        self.enable_cuda = enable_cuda
        self.word_vocab_size = len(word_vocab)
        self.id2word = word_vocab
        self.generate_length = generate_length
        self.read_config()
        self._def_layers(embedding_weights)
        self.init_weights()
        self.print_parameters()

    def print_parameters(self):
#         print(self)
        amount = 0
        for p in self.parameters():
            amount += np.prod(p.size())
        print("Total number of parameters: {}".format(amount))
        parameters = filter(lambda p: p.requires_grad, self.parameters())
        amount = 0
        for p in parameters:
            amount += np.prod(p.size())
        print("Number of trainable parameters: {}".format(amount))

    def read_config(self):
        # model config
        self.freeze_embedding = self.model_config['freeze_embedding']
        self.embedding_size = self.model_config['embedding_size']
        self.encoder_rnn_hidden_size = self.model_config['encoder_rnn_hidden_size']
        self.action_scorer_hidden_dim = self.model_config['action_scorer_hidden_dim']
        self.dropout_between_rnn_layers = self.model_config['dropout_between_rnn_layers']
        self.bidirectional_lstm = self.model_config['bidirectional_lstm']

    def _def_layers(self, embedding_weights=None):
        # word embeddings
        self.word_embedding = Embedding(embedding_size=self.embedding_size,
                                        vocab_size=self.word_vocab_size,
                                        enable_cuda=self.enable_cuda)
        if not(embedding_weights is None):
            self.word_embedding.set_weights(embedding_weights)
            print("Embedding imported!")
            if self.freeze_embedding:
                freeze_layer(self.word_embedding.embedding_layer)
                print("Embedding freezed!")
            
        # lstm encoder
        self.encoder = FastUniLSTM(ninp=self.embedding_size,
                                   nhids=self.encoder_rnn_hidden_size,
                                   bidir=self.bidirectional_lstm,
                                   dropout_between_rnn_layers=self.dropout_between_rnn_layers)
        
        self.action_scorer_shared = torch.nn.Linear(self.encoder_rnn_hidden_size[-1], self.action_scorer_hidden_dim)

        action_scorers = []
        for _ in range(self.generate_length):
            action_scorers.append(torch.nn.Linear(self.action_scorer_hidden_dim, self.word_vocab_size, bias=False))
        self.action_scorers = torch.nn.ModuleList(action_scorers)
        self.fake_recurrent_mask = None

    def init_weights(self):
        torch.nn.init.xavier_uniform_(self.action_scorer_shared.weight.data)
        for i in range(len(self.action_scorers)):
            torch.nn.init.xavier_uniform_(self.action_scorers[i].weight.data)
        self.action_scorer_shared.bias.data.fill_(0)

    def representation_generator(self, _input_words):
        embeddings, mask = self.word_embedding.forward(_input_words)  # batch x time x emb
        encoding_sequence, _, _ = self.encoder.forward(embeddings, mask)  # batch x time x h
        mean_encoding = masked_mean(encoding_sequence, mask)  # batch x h
        return mean_encoding

    def action_scorer(self, state_representation):
        hidden = self.action_scorer_shared.forward(state_representation)  # batch x hid
        hidden = F.relu(hidden)  # batch x hid
        action_ranks = []
        for i in range(len(self.action_scorers)):
            action_ranks.append(self.action_scorers[i].forward(hidden))  # batch x n_vocab
        return action_ranks

In [0]:
logger = logging.getLogger(__name__)

class Bert_DQN(torch.nn.Module):
    model_name = 'bert_dqn'

    def __init__(self, model_config, word_vocab, generate_length=5, enable_cuda=False):
        super(Bert_DQN, self).__init__()
        self.model_config = model_config
        self.enable_cuda = enable_cuda
        self.word_vocab_size = len(word_vocab)
        self.id2word = word_vocab
        self.generate_length = generate_length
        self.read_config()
#         print(enable_cuda)
        self.device = torch.device("cuda" if enable_cuda else "cpu")
        self.tokenizer = BertTokenizer.from_pretrained(self.bert_model, do_lower_case=True)
        self._def_layers()
        self.init_weights()
        

    def print_parameters(self):
#       print(self)
      amount = 0
      for p in self.parameters():
          amount += np.prod(p.size())
      print("total number of parameters: %s" % (amount))
      parameters = filter(lambda p: p.requires_grad, self.parameters())
      amount = 0
      for p in parameters:
          amount += np.prod(p.size())
      print("number of trainable parameters: %s" % (amount))

    def read_config(self):
        # model config
#         self.embedding_size = self.model_config['embedding_size']
#         self.encoder_rnn_hidden_size = self.model_config['encoder_rnn_hidden_size']
#         self.action_scorer_hidden_dim = self.model_config['action_scorer_hidden_dim']
#         self.dropout_between_rnn_layers = self.model_config['dropout_between_rnn_layers']
        self.bert_model = self.model_config['bert_model']
        self.action_scorer_hidden_dim = self.model_config['action_scorer_hidden_dim']
        self.train_bert = self.model_config['train_bert']
        
    def _def_layers(self):

        # word embeddings
#         self.word_embedding = Embedding(embedding_size=self.embedding_size,
#                                         vocab_size=self.word_vocab_size,
#                                         enable_cuda=self.enable_cuda)

#         # lstm encoder
#         self.encoder = FastUniLSTM(ninp=self.embedding_size,
#                                    nhids=self.encoder_rnn_hidden_size,
#                                    dropout_between_rnn_layers=self.dropout_
        self.encoder = BertModel.from_pretrained(self.bert_model).to(self.device)
        if not self.train_bert:
          freeze_layer(self.encoder)
        # only for base models
        # for large models is 
        bert_embeddings = 768

        self.action_scorer_shared = torch.nn.Linear(bert_embeddings, self.action_scorer_hidden_dim)
        action_scorers = []
        for _ in range(self.generate_length):
            action_scorers.append(torch.nn.Linear(self.action_scorer_hidden_dim, self.word_vocab_size, bias=False))
        self.action_scorers = torch.nn.ModuleList(action_scorers)
        self.fake_recurrent_mask = None

    def init_weights(self):
        torch.nn.init.xavier_uniform_(self.action_scorer_shared.weight.data)
        for i in range(len(self.action_scorers)):
            torch.nn.init.xavier_uniform_(self.action_scorers[i].weight.data)
        self.action_scorer_shared.bias.data.fill_(0)

    def representation_generator(self, ids, mask):
        ids = ids.to(self.device)
        mask = mask.to(self.device)
        
        layers, _ = self.encoder(ids, attention_mask=mask)
#         encoding_sequence = layers[self.layer_index]
#         print('layer length: ', len(layers))
        encoding_sequence = layers[-2].type(torch.FloatTensor)
        encoding_sequence = encoding_sequence.to(self.device)
    
#         print('encoding_sequence: ', type(encoding_sequence))
#         print('encoding_sequence: ', encoding_sequence)
        mask = mask.type(torch.FloatTensor).to(self.device)
#         print('mask: ', type(mask))
#         print('mask: ', mask)
        
#         embeddings, mask = self.word_embedding.forward(_input_words)  # batch x time x emb
#         encoding_sequence, _, _ = self.encoder.forward(embeddings, mask)  # batch x time x h
        res_mean = masked_mean(encoding_sequence, mask)  # batch x h
        del layers
        del encoding_sequence
        del mask
        
        return res_mean


    def action_scorer(self, state_representation):
        hidden = self.action_scorer_shared.forward(state_representation)  # batch x hid
        hidden = F.relu(hidden)  # batch x hid
        action_ranks = []
        for i in range(len(self.action_scorers)):
            action_ranks.append(self.action_scorers[i].forward(hidden))  # batch x n_vocab
        del hidden
        return action_ranks

Cache score

In [0]:
class HistoryScoreCache(object):

    def __init__(self, capacity=1):
        self.capacity = capacity
        self.reset()

    def push(self, stuff):
        """stuff is float."""
        if len(self.memory) < self.capacity:
            self.memory.append(stuff)
        else:
            self.memory = self.memory[1:] + [stuff]

    def get_avg(self):
        return np.mean(np.array(self.memory))

    def reset(self):
        self.memory = []

    def __len__(self):
        return len(self.memory)

## Memory

In [0]:
# a snapshot of state to be stored in replay memory
Transition = namedtuple('Transition', ('bert_ids', 'bert_masks',
                                       'word_indices',
                                       'reward', 'mask', 'done',
                                       'next_bert_ids', 'next_bert_masks',
                                       'next_word_masks'))


In [0]:
class PrioritizedReplayMemory(object):

    def __init__(self, capacity=100000, priority_fraction=0.0):
        # prioritized replay memory
        self.priority_fraction = priority_fraction
        self.alpha_capacity = int(capacity * priority_fraction)
        self.beta_capacity = capacity - self.alpha_capacity
        self.alpha_memory, self.beta_memory = [], []
        self.alpha_position, self.beta_position = 0, 0

    def push(self, is_prior, transition):
        """Saves a transition."""
        if self.priority_fraction == 0.0:
            is_prior = False
        if is_prior:
            if len(self.alpha_memory) < self.alpha_capacity:
                self.alpha_memory.append(None)
            self.alpha_memory[self.alpha_position] = transition
            self.alpha_position = (self.alpha_position + 1) % self.alpha_capacity
        else:
            if len(self.beta_memory) < self.beta_capacity:
                self.beta_memory.append(None)
            self.beta_memory[self.beta_position] = transition
            self.beta_position = (self.beta_position + 1) % self.beta_capacity

    def sample(self, batch_size):
        if self.priority_fraction == 0.0 or len(self.alpha_memory) == 0:
            from_beta = min(batch_size, len(self.beta_memory))
            res = random.sample(self.beta_memory, from_beta)
        elif len(self.beta_memory) == 0:
            from_alpha = min(batch_size, len(self.alpha_memory))
            res = random.sample(self.alpha_memory, from_alpha)
        else:
            priority_batch = int(self.priority_fraction * batch_size)
            from_alpha = min(priority_batch, len(self.alpha_memory))
            from_beta = min(batch_size - priority_batch, len(self.beta_memory))
            res = random.sample(self.alpha_memory, from_alpha) + random.sample(self.beta_memory, from_beta)
        random.shuffle(res)
        return res

    def __len__(self):
        return len(self.alpha_memory) + len(self.beta_memory)


## Agent

In [0]:
class CustomAgent:
    def __init__(self):
        global embedding_weights
        """
        Arguments:
            word_vocab: List of words supported.
        """
        self.mode = "train"
        with open("./vocab.txt") as f:
            self.word_vocab = f.read().split("\n")
        with open("config.yaml") as reader:
            self.config = yaml.safe_load(reader)
        self.word2id = {}
        self.last_wid = 0
        for w in self.word_vocab:
            self.word2id[w] = self.last_wid
            self.last_wid+=1
        self.EOS_id = self.word2id["</S>"]

        self.batch_size = self.config['training']['batch_size']
        self.max_nb_steps_per_episode = self.config['training']['max_nb_steps_per_episode']
        self.nb_epochs = self.config['training']['nb_epochs']

        # Set the random seed manually for reproducibility.
        np.random.seed(self.config['general']['random_seed'])
        torch.manual_seed(self.config['general']['random_seed'])
        if torch.cuda.is_available():
            if not self.config['general']['use_cuda']:
                logging.warning("WARNING: CUDA device detected but 'use_cuda: false' found in config.yaml")
                self.use_cuda = False
            else:
                torch.backends.cudnn.deterministic = True
                torch.cuda.manual_seed(self.config['general']['random_seed'])
                self.use_cuda = True
        else:
            self.use_cuda = False
        
        
        print("Creating Q-Network")
        self.model = Bert_DQN(model_config=self.config["model"],
                              word_vocab=self.word_vocab,
                              enable_cuda=self.use_cuda)
        print("Creating Target Network")
        self.target_model = Bert_DQN(model_config=self.config["model"],
                              word_vocab=self.word_vocab,
                              enable_cuda=self.use_cuda)
        
        self.target_model.print_parameters()
        
        self.update_target_model_count = 0
        self.target_model_update_frequency = self.config['training']['target_model_update_frequency']

        self.experiment_tag = self.config['checkpoint']['experiment_tag']
        self.model_checkpoint_dir = self.config['checkpoint']['model_checkpoint_dir']
        self.save_frequency = self.config['checkpoint']['save_frequency']

        if self.config['checkpoint']['load_pretrained']:
            self.load_pretrained_model(self.model_checkpoint_dir)
        if self.use_cuda:
            self.model.cuda()
            self.target_model.cuda()

        self.replay_batch_size = self.config['general']['replay_batch_size']
        self.replay_memory = PrioritizedReplayMemory(self.config['general']['replay_memory_capacity'],
                                                     priority_fraction=self.config['general']['replay_memory_priority_fraction'])
        self.wt_index = 0

        # optimizer
        parameters = filter(lambda p: p.requires_grad, self.model.parameters())
        self.optimizer = torch.optim.Adam(parameters, lr=self.config['training']['optimizer']['learning_rate'])
        
        # n-step
        self.nsteps = self.config['general']['nsteps']
        self.nstep_buffer = []

        # epsilon greedy
        self.epsilon_anneal_episodes = self.config['general']['epsilon_anneal_episodes']
        self.epsilon_anneal_from = self.config['general']['epsilon_anneal_from']
        self.epsilon_anneal_to = self.config['general']['epsilon_anneal_to']
        self.epsilon = self.epsilon_anneal_from
        self.update_per_k_game_steps = self.config['general']['update_per_k_game_steps']
        self.clip_grad_norm = self.config['training']['optimizer']['clip_grad_norm']

        self.nlp = spacy.load('en', disable=['ner', 'parser', 'tagger'])
        self.preposition_map = {"take": "from",
                                "chop": "with",
                                "slice": "with",
                                "dice": "with",
                                "cook": "with",
                                "insert": "into",
                                "put": "on"}
        self.single_word_verbs = set(["inventory", "look"])
        self.discount_gamma = self.config['general']['discount_gamma']
        self.current_episode = 0
        self.current_step = 0
        self._epsiode_has_started = False
        self.history_avg_scores = HistoryScoreCache(capacity=1000)
        self.best_avg_score_so_far = 0.0
        self.loss = []

    def train(self, imitate = False):
        """
        Tell the agent that it's training phase.
        """
        self.mode = "train"
        self.imitate = imitate
        self.wt_index = 0
#         print(self.wt_index)
        self.model.train()

    def eval(self):
        """
        Tell the agent that it's evaluation phase.
        """
        self.mode = "eval"
        self.model.eval()

    def _start_episode(self, obs: List[str], infos: Dict[str, List[Any]]) -> None:
        """
        Prepare the agent for the upcoming episode.

        Arguments:
            obs: Initial feedback for each game.
            infos: Additional information for each game.
        """
        self.init(obs, infos)
        self._epsiode_has_started = True

    def _end_episode(self, obs: List[str], scores: List[int], infos: Dict[str, List[Any]]) -> None:
        """
        Tell the agent the episode has terminated.

        Arguments:
            obs: Previous command's feedback for each game.
            score: The score obtained so far for each game.
            infos: Additional information for each game.
        """
        self.finish()
        self._epsiode_has_started = False

    def load_pretrained_model(self, load_from_dir):
        """
        Load the pretrained model's last checkpoint from a directory

        Arguments:
            load_from_dir: Directory with save model parameters
        """
        
        checkpoints_glob = glob.glob(os.path.join(load_from_dir, '*.pt'))
        if len(checkpoints_glob) == 0:
            print("No checkpoints to load from: " + load_from_dir)
            return
        
        load_from = max(checkpoints_glob)
        print("loading model from %s\n" % (load_from))
        
        try:
            if self.use_cuda:
                state_dict = torch.load(load_from)
            else:
                state_dict = torch.load(load_from, map_location='cpu')
            self.model.load_state_dict(state_dict)
        except:
            print("Failed to load checkpoint...")

    def select_additional_infos(self) -> EnvInfos:
        """
        Returns what additional information should be made available at each game step.

        Requested information will be included within the `infos` dictionary
        passed to `CustomAgent.act()`. To request specific information, create a
        :py:class:`textworld.EnvInfos <textworld.envs.wrappers.filter.EnvInfos>`
        and set the appropriate attributes to `True`. The possible choices are:

        * `description`: text description of the current room, i.e. output of the `look` command;
        * `inventory`: text listing of the player's inventory, i.e. output of the `inventory` command;
        * `max_score`: maximum reachable score of the game;
        * `objective`: objective of the game described in text;
        * `entities`: names of all entities in the game;
        * `verbs`: verbs understood by the the game;
        * `command_templates`: templates for commands understood by the the game;
        * `admissible_commands`: all commands relevant to the current state;

        In addition to the standard information, game specific information
        can be requested by appending corresponding strings to the `extras`
        attribute. For this competition, the possible extras are:

        * `'recipe'`: description of the cookbook;
        * `'walkthrough'`: one possible solution to the game (not guaranteed to be optimal);

        Example:
            Here is an example of how to request information and retrieve it.

            >>> from textworld import EnvInfos
            >>> request_infos = EnvInfos(description=True, inventory=True, extras=["recipe"])
            ...
            >>> env = gym.make(env_id)
            >>> ob, infos = env.reset()
            >>> print(infos["description"])
            >>> print(infos["inventory"])
            >>> print(infos["extra.recipe"])

        Notes:
            The following information *won't* be available at test time:

            * 'walkthrough'
        """
        request_infos = EnvInfos()
        request_infos.description = True
        request_infos.inventory = True
        request_infos.entities = True
        request_infos.verbs = True
        request_infos.extras = ["recipe", "walkthrough"]
        return request_infos

    def init(self, obs: List[str], infos: Dict[str, List[Any]]):
        """
        Prepare the agent for the upcoming games.

        Arguments:
            obs: Previous command's feedback for each game.
            infos: Additional information for each game.
        """
        # reset agent, get vocabulary masks for verbs / adjectives / nouns
        self.scores = []
        self.dones = []
        self.prev_actions = ["" for _ in range(len(obs))]
        # get word masks
#         print(infos['verbs'])

        batch_size = len(infos["verbs"])
#         print("VERBS SIZE:")
#         print(batch_size)
        verbs_word_list = infos["verbs"]
        noun_word_list, adj_word_list = [], []
        for entities in infos["entities"]:
            tmp_nouns, tmp_adjs = [], []
            for name in entities:
                split = name.split()
                tmp_nouns.append(split[-1])
                if len(split) > 1:
                    tmp_adjs.append(" ".join(split[:-1]))
            noun_word_list.append(list(set(tmp_nouns)))
            adj_word_list.append(list(set(tmp_adjs)))

        verb_mask = np.zeros((batch_size, len(self.word_vocab)), dtype="float32")
        noun_mask = np.zeros((batch_size, len(self.word_vocab)), dtype="float32")
        adj_mask = np.zeros((batch_size, len(self.word_vocab)), dtype="float32")
        for i in range(batch_size):
            for w in verbs_word_list[i]:
                if w in self.word2id:
                    verb_mask[i][self.word2id[w]] = 1.0
            for w in noun_word_list[i]:
                if w in self.word2id:
                    noun_mask[i][self.word2id[w]] = 1.0
            for w in adj_word_list[i]:
                if w in self.word2id:
                    adj_mask[i][self.word2id[w]] = 1.0
#                 else:
#                     self.word2id[w] = self.last_wid
#                     self.last_wid += 1
#                     adj_mask[i][self.word2id[w]] = 1.0
#                     self.word_vocab.append(w)
                    
        second_noun_mask = copy.copy(noun_mask)
        second_adj_mask = copy.copy(adj_mask)
        second_noun_mask[:, self.EOS_id] = 1.0
        adj_mask[:, self.EOS_id] = 1.0
        second_adj_mask[:, self.EOS_id] = 1.0
        self.word_masks_np = [verb_mask, adj_mask, noun_mask, second_adj_mask, second_noun_mask]

        self.cache_bert_ids = None
        self.cache_bert_mask = None
        self.cache_chosen_indices = None
        self.current_step = 0
        
    def append_to_replay(self, is_prior, transition):
        self.nstep_buffer.append((is_prior, transition))

        if len(self.nstep_buffer) < self.nsteps:
            return
        
        R = sum([self.nstep_buffer[i][1].reward * (self.discount_gamma**i) for i in range(self.nsteps)])
        prior, transition = self.nstep_buffer.pop(0)

        self.replay_memory.push(prior, transition._replace(reward=R))


    def get_game_step_info(self, obs: List[str], infos: Dict[str, List[Any]]):
        """
        Get all the available information, and concat them together to be tensor for
        a neural model. we use post padding here, all information are tokenized here.

        Arguments:
            obs: Previous command's feedback for each game.
            infos: Additional information for each game.
        """
      
        sep = ' [SEP] '
        description_text_list = [_d + sep + _i + sep + _q + sep + _f + sep + _pa for (_d, _i, _q, _f, _pa) 
                                  in zip(infos['description'], infos['inventory'], infos['extra.recipe'], obs, self.prev_actions)]

        _, bert_ids, bert_masks  = convert_examples_to_features(description_text_list, self.model.tokenizer)

        del description_text_list
        
        return bert_ids, bert_masks

    def word_ids_to_commands(self, verb, adj, noun, adj_2, noun_2):
        """
        Turn the 5 indices into actual command strings.

        Arguments:
            verb: Index of the guessing verb in vocabulary
            adj: Index of the guessing adjective in vocabulary
            noun: Index of the guessing noun in vocabulary
            adj_2: Index of the second guessing adjective in vocabulary
            noun_2: Index of the second guessing noun in vocabulary
        """
        # turns 5 indices into actual command strings
        if self.word_vocab[verb] in self.single_word_verbs:
            return self.word_vocab[verb]
        if adj == self.EOS_id:
            res = self.word_vocab[verb] + " " + self.word_vocab[noun]
        else:
            res = self.word_vocab[verb] + " " + self.word_vocab[adj] + " " + self.word_vocab[noun]
        if self.word_vocab[verb] not in self.preposition_map:
            return res
        if noun_2 == self.EOS_id:
            return res
        prep = self.preposition_map[self.word_vocab[verb]]
        if adj_2 == self.EOS_id:
            res = res + " " + prep + " " + self.word_vocab[noun_2]
        else:
            res =  res + " " + prep + " " + self.word_vocab[adj_2] + " " + self.word_vocab[noun_2]
        return res
    
    def get_wordid_from_vocab(self, word):
      if word in self.word2id.keys():
        return self.word2id[word]
      else:
        return self.EOS_id
    
    def command_to_word_ids(self, cmd, batch_size):
      verb_id=self.EOS_id
      first_adj=self.EOS_id
      first_noun=self.EOS_id
      second_adj=self.EOS_id
      second_noun=self.EOS_id
      
#       print('cmd_to_ids')
#       print(cmd.split())
      ids = _words_to_ids(cmd.split(), self.word2id)
#       print(ids)
      for ind, i in enumerate(ids):
        if self.word_masks_np[0][0][i]==1.0:
          verb = ind
          verb_id = i
      nouns=[]
      for ind, i in enumerate(ids):
        if self.word_masks_np[2][0][i]==1.0:
          nouns.append((ind,i))
      if len(nouns) > 0:
        if nouns[0][0] != verb - 1:
          adj_ids = ids[verb + 1: nouns[0][0]]
          adj=''
          adj= ' '.join([self.word_vocab[x] for x in adj_ids]) 
#           print(adj)
          first_adj=self.get_wordid_from_vocab(adj)
#         print(nouns)
        first_noun=nouns[0][1]
      
      if len(nouns) > 1:
        if nouns[1][0] != nouns[0][0] - 1:
          adj_ids = ids[nouns[0][0]: nouns[1][0]]
          adj= ' '.join([self.word_vocab[x] for x in adj_ids]) 
          second_adj=self.get_wordid_from_vocab(adj)
        second_noun=nouns[1][1]
        
       
      list_ids = [verb_id, first_adj, first_noun, second_adj, second_noun]
      return [to_pt(np.array([[x]]*batch_size), self.use_cuda) for x in list_ids]
          
       
          
    def get_chosen_strings(self, chosen_indices):
        """
        Turns list of word indices into actual command strings.

        Arguments:
            chosen_indices: Word indices chosen by model.
        """
        chosen_indices_np = [to_np(item)[:, 0] for item in chosen_indices]
        res_str = []
        batch_size = chosen_indices_np[0].shape[0]
        for i in range(batch_size):
            verb, adj, noun, adj_2, noun_2 = chosen_indices_np[0][i],\
                                             chosen_indices_np[1][i],\
                                             chosen_indices_np[2][i],\
                                             chosen_indices_np[3][i],\
                                             chosen_indices_np[4][i]
            res_str.append(self.word_ids_to_commands(verb, adj, noun, adj_2, noun_2))
            del verb
            del adj
            del noun
            del adj_2
            del noun_2
            
        del chosen_indices_np
        return res_str

    def choose_random_command(self, word_ranks, word_masks_np):
        """
        Generate a command randomly, for epsilon greedy.

        Arguments:
            word_ranks: Q values for each word by model.action_scorer.
            word_masks_np: Vocabulary masks for words depending on their type (verb, adj, noun).
        """
        batch_size = word_ranks[0].size(0)
        word_ranks_np = [to_np(item) for item in word_ranks]  # list of batch x n_vocab
        word_ranks_np = [r * m for r, m in zip(word_ranks_np, word_masks_np)]  # list of batch x n_vocab
        word_indices = []
        for i in range(len(word_ranks_np)):
            indices = []
            for j in range(batch_size):
                msk = word_masks_np[i][j]  # vocab
                indices.append(np.random.choice(len(msk), p=msk / np.sum(msk, -1)))
                del msk
#             print('random: ', indices)
            
            word_indices.append(np.array(indices))
            del indices
        # word_indices: list of batch
        word_qvalues = [[] for _ in word_masks_np]
        for i in range(batch_size):
            for j in range(len(word_qvalues)):
                word_qvalues[j].append(word_ranks[j][i][word_indices[j][i]])
        word_qvalues = [torch.stack(item) for item in word_qvalues]
        word_indices = [to_pt(item, self.use_cuda) for item in word_indices]
        word_indices = [item.unsqueeze(-1) for item in word_indices]  # list of batch x 1
        
        del word_ranks_np
        
        return word_qvalues, word_indices

    def choose_maxQ_command(self, word_ranks, word_masks_np):
        """
        Generate a command by maximum q values, for epsilon greedy.

        Arguments:
            word_ranks: Q values for each word by model.action_scorer.
            word_masks_np: Vocabulary masks for words depending on their type (verb, adj, noun).
        """
        batch_size = word_ranks[0].size(0)
        word_ranks_np = [to_np(item) for item in word_ranks]  # list of batch x n_vocab
        word_ranks_np = [r - np.min(r) for r in word_ranks_np] # minus the min value, so that all values are non-negative
        word_ranks_np = [r * m for r, m in zip(word_ranks_np, word_masks_np)]  # list of batch x n_vocab
        word_indices = [np.argmax(item, -1) for item in word_ranks_np]  # list of batch
        word_qvalues = [[] for _ in word_masks_np]

        for i in range(batch_size):
            for j in range(len(word_qvalues)):
                word_qvalues[j].append(word_ranks[j][i][word_indices[j][i]])

        word_qvalues = [torch.stack(item) for item in word_qvalues]
        word_indices = [to_pt(item, self.use_cuda) for item in word_indices]
        word_indices = [item.unsqueeze(-1) for item in word_indices]  # list of batch x 1
        
        del word_ranks_np
        
        return word_qvalues, word_indices

    def get_ranks(self, model, bert_ids, bert_masks):
        """
        Given input description tensor, call model forward, to get Q values of words.

        Arguments:
            input_description: Input tensors, which include all the information chosen in
            select_additional_infos() concatenated together.
        """
        
        bert_ids = torch.tensor([x for x in bert_ids], dtype=torch.long)
        bert_masks = torch.tensor([x for x in bert_masks], dtype=torch.long)
        state_representation = model.representation_generator(bert_ids, bert_masks)
        del bert_ids
        del bert_masks
        
        word_ranks = model.action_scorer(state_representation)  # each element in list has batch x n_vocab size
        del state_representation
        return word_ranks
    
    def act_eval(self, obs: List[str], scores: List[int], dones: List[bool], infos: Dict[str, List[Any]]) -> List[str]:
        """
        Acts upon the current list of observations, during evaluation.

        One text command must be returned for each observation.

        Arguments:
            obs: Previous command's feedback for each game.
            score: The score obtained so far for each game (at previous step).
            done: Whether a game is finished (at previous step).
            infos: Additional information for each game.

        Returns:
            Text commands to be performed (one per observation).

        Notes:
            Commands returned for games marked as `done` have no effect.
            The states for finished games are simply copy over until all
            games are done, in which case `CustomAgent.finish()` is called
            instead.
        """

        if self.current_step > 0:
            # append scores / dones from previous step into memory
            self.scores.append(scores)
            self.dones.append(dones)

        if all(dones):
            self._end_episode(obs, scores, infos)
            return  # Nothing to return.

        bert_ids, bert_masks = self.get_game_step_info(obs, infos)
        word_ranks = self.get_ranks(self.model, bert_ids, bert_masks)  # list of batch x vocab
        
        del bert_ids
        del bert_masks
        
        _, word_indices_maxq = self.choose_maxQ_command(word_ranks, self.word_masks_np)

        chosen_indices = word_indices_maxq
        chosen_indices = [item.detach() for item in chosen_indices]
        chosen_strings = self.get_chosen_strings(chosen_indices)
        self.prev_actions = chosen_strings
        self.current_step += 1

        del word_indices_max_q
        
        return chosen_strings

    def act(self, obs: List[str], scores: List[int], dones: List[bool], infos: Dict[str, List[Any]]) -> List[str]:
        """
        Acts upon the current list of observations.

        One text command must be returned for each observation.

        Arguments:
            obs: Previous command's feedback for each game.
            score: The score obtained so far for each game (at previous step).
            done: Whether a game is finished (at previous step).
            infos: Additional information for each game.

        Returns:
            Text commands to be performed (one per observation).

        Notes:
            Commands returned for games marked as `done` have no effect.
            The states for finished games are simply copy over until all
            games are done, in which case `CustomAgent.finish()` is called
            instead.
        """
        if not self._epsiode_has_started:
            self._start_episode(obs, infos)

        if self.mode == "eval":
            return self.act_eval(obs, scores, dones, infos)

        if self.current_step > 0:
            # append scores / dones from previous step into memory
            self.scores.append(scores)
            self.dones.append(dones)
            # compute previous step's rewards and masks
            rewards_np, rewards, mask_np, mask = self.compute_reward()

        
        bert_ids, bert_masks = self.get_game_step_info(obs, infos)
        # generate commands for one game step, epsilon greedy is applied, i.e.,
        # there is epsilon of chance to generate random commands
        
        if self.imitate:
          print('imitate')
          correct_cmd=infos['extra.walkthrough'][0][self.wt_index]
          print(correct_cmd)
          if self.wt_index != len(infos['extra.walkthrough'][0]) - 1:
            self.wt_index+=1
          chosen_indices = self.command_to_word_ids(correct_cmd, len(bert_ids))
        else:
          word_ranks = self.get_ranks(self.model, bert_ids, bert_masks)  # list of batch x vocab

          _, word_indices_maxq = self.choose_maxQ_command(word_ranks, self.word_masks_np)
          _, word_indices_random = self.choose_random_command(word_ranks, self.word_masks_np)
          # random number for epsilon greedyupdate
          rand_num = np.random.uniform(low=0.0, high=1.0, size=(len(bert_ids), 1))
          less_than_epsilon = (rand_num < self.epsilon).astype("float32")  # batch
          greater_than_epsilon = 1.0 - less_than_epsilon
          less_than_epsilon = to_pt(less_than_epsilon, self.use_cuda, type='float')
          greater_than_epsilon = to_pt(greater_than_epsilon, self.use_cuda, type='float')
          less_than_epsilon, greater_than_epsilon = less_than_epsilon.long(), greater_than_epsilon.long()

#           print('Random_step: ',less_than_epsilon.tolist())

          chosen_indices = [
              less_than_epsilon * idx_random + greater_than_epsilon * idx_maxq 
              for idx_random, idx_maxq in zip(word_indices_random, word_indices_maxq)
          ]
        
        chosen_indices = [item.detach() for item in chosen_indices]
        chosen_strings = self.get_chosen_strings(chosen_indices)
#         print(chosen_strings)
        self.prev_actions = chosen_strings

        # push info from previous game step into replay memory
        if self.current_step > 0:
            for b in range(len(obs)):
                if mask_np[b] == 0:
                    continue
                is_prior = rewards_np[b] > 0.0
                t = Transition(self.cache_bert_ids[b],
                               self.cache_bert_masks[b],
                               [ item[b] for item in self.cache_chosen_indices], 
                               rewards[b], 
                               mask[b], 
                               dones[b], 
                               bert_ids[b],
                               bert_masks[b], 
                               [item[b] for item in self.word_masks_np])
               # print("ACT: {}".format(t.observation_id_list))
                self.append_to_replay(is_prior, t)

        # cache new info in current game step into caches
        self.cache_bert_ids = bert_ids
        self.cache_bert_masks = bert_masks
        self.cache_chosen_indices = chosen_indices

        # update neural model by replaying snapshots in replay memory
        if self.current_step > 0 and self.current_step % self.update_per_k_game_steps == 0:
            loss = self.update()
            
            if loss is not None:
                self.loss.append(to_np(loss).mean())
                # Backpropagate
                self.optimizer.zero_grad()
                loss.backward(retain_graph=True)
                # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
                torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.clip_grad_norm)
                self.optimizer.step()  # apply gradients

        self.current_step += 1

        if all(dones):
            self._end_episode(obs, scores, infos)
            return  # Nothing to return.
        return chosen_strings

    def compute_reward(self):
        """
        Compute rewards by agent. Note this is different from what the training/evaluation
        scripts do. Agent keeps track of scores and other game information for training purpose.
        """
        # mask = 1 if game is not finished or just finished at current step
        if len(self.dones) == 1:
            # it's not possible to finish a game at 0th step
            mask = [1.0 for _ in self.dones[-1]]
        else:
            assert len(self.dones) > 1
            mask = [1.0 if not self.dones[-2][i] else 0.0 for i in range(len(self.dones[-1]))]
        mask = np.array(mask, dtype='float32')
        mask_pt = to_pt(mask, self.use_cuda, type='float')
        # rewards returned by game engine are always accumulated value the
        # agent have recieved. so the reward it gets in the current game step
        # is the new value minus values at previous step.
        rewards = np.array(self.scores[-1], dtype='float32')  # batch
        if len(self.scores) > 1:
            prev_rewards = np.array(self.scores[-2], dtype='float32')
            rewards = rewards - prev_rewards
        rewards_pt = to_pt(rewards, self.use_cuda, type='float')

        return rewards, rewards_pt, mask, mask_pt
    
    def update_target_model(self):
        self.update_target_model_count = (self.update_target_model_count + 1) % self.target_model_update_frequency
        if self.update_target_model_count == 0:
            self.target_model.load_state_dict(self.model.state_dict())

    def update(self):
        """
        Update neural model in agent. In this example we follow algorithm
        of updating model in dqn with replay memory.
        """
        if len(self.replay_memory) < self.replay_batch_size:
            return None
        
        self.update_target_model()
        
        transitions = self.replay_memory.sample(self.replay_batch_size)
        batch = Transition(*zip(*transitions))
        
        del transitions

        bert_ids = pad_sequences(batch.bert_ids, maxlen=max_len(batch.bert_ids)).astype('int32')
        bert_masks = pad_sequences(batch.bert_masks, maxlen=max_len(batch.bert_masks)).astype('int32')

        next_bert_ids = pad_sequences(batch.next_bert_ids, maxlen=max_len(batch.next_bert_ids)).astype('int32')
        next_bert_masks = pad_sequences(batch.next_bert_masks, maxlen=max_len(batch.next_bert_masks)).astype('int32')

        chosen_indices = list(list(zip(*batch.word_indices)))
        chosen_indices = [torch.stack(item, 0) for item in chosen_indices]  # list of batch x 1
        
        word_ranks = self.get_ranks(self.model, bert_ids, bert_masks)  # list of batch x vocab
        
        del bert_ids
        del bert_masks
        
        word_qvalues = [w_rank.gather(1, idx).squeeze(-1) for w_rank, idx in zip(word_ranks, chosen_indices)]  # list of batch
        
        del chosen_indices
        del word_ranks
        
        q_value = torch.mean(torch.stack(word_qvalues, -1), -1)  # batch
        del word_qvalues

        # Action selection, using q-network
        next_word_ranks = self.get_ranks(self.model, next_bert_ids, next_bert_masks) # batch x n_verb, batch x n_noun, batch x n_second_noun
        next_word_masks = list(list(zip(*batch.next_word_masks)))
        next_word_masks = [np.stack(item, 0) for item in next_word_masks]

        _, next_word_indexes = self.choose_maxQ_command(next_word_ranks, next_word_masks)
        
        del next_word_masks
        del next_word_ranks
        
        # Action evaluation, using target network
        eval_next_word_ranks = self.get_ranks(self.target_model, next_bert_ids, next_bert_masks)
        next_word_qvalues = [
            rank.gather(1, idx.detach()).squeeze(-1)
            for rank, idx in zip(eval_next_word_ranks, next_word_indexes)
        ]
        
        del next_word_indexes
        del eval_next_word_ranks
        del next_bert_ids
        del next_bert_masks
        
        next_q_value = torch.mean(torch.stack(next_word_qvalues, -1), -1)  # batch
        next_q_value = next_q_value.detach()

        rewards = torch.stack(batch.reward)  # batch
        not_done = 1.0 - np.array(batch.done, dtype='float32')  # batch
        not_done = to_pt(not_done, self.use_cuda, type='float')
        # NB: Should not_done be used?
        rewards = rewards + not_done * next_q_value * (self.discount_gamma**self.nsteps)  # batch
        #rewards = rewards + next_q_value * (self.discount_gamma**self.nsteps)  # batch
        mask = torch.stack(batch.mask)  # batch
        loss = F.smooth_l1_loss(q_value * mask, rewards * mask)
        
        del q_value
        del mask
        del rewards
        del batch
        
        return loss

    def finish(self) -> None:
        """
        All games in the batch are finished. One can choose to save checkpoints,
        evaluate on validation set, or do parameter annealing here.
        """
        # Game has finished (either win, lose, or exhausted all the given steps).
        self.final_rewards = np.array(self.scores[-1], dtype='float32')  # batch
        dones = []
        for d in self.dones:
            d = np.array([float(dd) for dd in d], dtype='float32')
            dones.append(d)
        dones = np.array(dones)
        step_used = 1.0 - dones
        self.step_used_before_done = np.sum(step_used, 0)  # batch

        self.history_avg_scores.push(np.mean(self.final_rewards))
        # save checkpoint
        if self.mode == "train" and self.current_episode % self.save_frequency == 0:
            avg_score = self.history_avg_scores.get_avg()
            if avg_score > self.best_avg_score_so_far:
                self.best_avg_score_so_far = avg_score

                save_to = os.path.join(self.model_checkpoint_dir, self.experiment_tag + "_episode_" + str(self.current_episode) + ".pt")
                if not os.path.isdir(self.model_checkpoint_dir):
                    os.mkdir(self.model_checkpoint_dir)
                torch.save(self.model.state_dict(), save_to)
                print("\n========= saved checkpoint =========")

        self.current_episode += 1
        # annealing
        if self.current_episode < self.epsilon_anneal_episodes:
            self.epsilon -= (self.epsilon_anneal_from - self.epsilon_anneal_to) / float(self.epsilon_anneal_episodes)
            
    def get_mean_loss(self):
        mean_loss = 0.
        if len(self.loss) != 0:   
            mean_loss = sum(self.loss) / len(self.loss)
        self.loss = []
        return mean_loss

## Configs and environments

### Vocab
Upload vocab.txt file`

In [0]:
from google.colab import files
                                                                                                                                                                                                                                                                                                                            
if not os.path.isfile('./vocab.txt'):
    uploaded = files.upload()
    # Upload vocab.txt
    for fn in uploaded.keys():
        print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))
else:
    print("Vocab already uploaded!")

Vocab already uploaded!


In [0]:
!head vocab.txt

!
"
#
$
%
&
'
'a
'd
'll


### Configuration

In [0]:
with open('./config.yaml', 'w') as config:
    config.write("""
general:
  discount_gamma: 0.7
  random_seed: 42
  use_cuda: True  # disable this when running on machine without cuda

  # replay memory
  replay_memory_capacity: 1000000  # adjust this depending on your RAM size
  replay_memory_priority_fraction: 0.5
  update_per_k_game_steps: 8
  replay_batch_size: 32
  nsteps: 3

  # epsilon greedy
  epsilon_anneal_episodes: 50  # -1 if not annealing
  epsilon_anneal_from: 0.2
  epsilon_anneal_to: 0.2

checkpoint:
  experiment_tag: 'starting-kit'
  model_checkpoint_dir: '/gdrive/My Drive/Masters/TextWorld/models'
  load_pretrained: False  # during test, enable this so that the agent load your pretrained model
  #pretrained_experiment_dir: 'starting-kit'
  save_frequency: 100

training:
  batch_size: 16  # Parallel games played at once
  nb_epochs: 100
  max_nb_steps_per_episode: 100  # after this many steps, a game is terminated
  target_model_update_frequency: 16 # update target model after that number of backprops
  optimizer:
    step_rule: 'adam'  # adam
    learning_rate: 0.001
    clip_grad_norm: 5

model:
  embedding_size: 50
  freeze_embedding: False
  encoder_rnn_hidden_size: [192]
  bidirectional_lstm: False
  action_scorer_hidden_dim: 63
  dropout_between_rnn_layers: 0.0
  bert_model: 'bert-base-uncased'
  train_bert: False
""")

### Mount drive to load games

Notebook takes sample games from google drive(requires authentication).

To train the agent with games, upload archive with them in google drive and fix the path to the archive inside drive below.



In [0]:
from google.colab import drive
drive.mount('/gdrive')

Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).


In [0]:
home_dir = '/gdrive/My Drive/Masters/TextWorld/'
path_to_sample_games = home_dir + 'sample_games'

## Train

In [0]:
# List of additional information available during evaluation.
AVAILABLE_INFORMATION = EnvInfos(
    description=True, inventory=True,
    max_score=True, objective=True, entities=True, verbs=True,
    command_templates=True, admissible_commands=True,
    has_won=True, has_lost=True,
    extras=["recipe"]
)

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

def _validate_requested_infos(infos: EnvInfos):
    msg = "The following information cannot be requested: {}"
    for key in infos.basics:
        if not getattr(AVAILABLE_INFORMATION, key):
            raise ValueError(msg.format(key))

    for key in infos.extras:
        if key not in AVAILABLE_INFORMATION.extras:
            raise ValueError(msg.format(key))
            
def get_index(game_no, stats):
    return "{}_{}".format(game_no, stats)
            
def print_epoch_stats(epoch_no, stats):
    print("\n\nEpoch: {:3d}".format(epoch_no))
    steps, scores, loss = stats["steps"], stats["scores"], stats["loss"]
    games_cnt, parallel_cnt = len(steps), len(steps[0])
    columns = [ get_index(col, st) for col in range(games_cnt) for st in ['st', 'sc']]
    stats_df = pd.DataFrame(index=list(range(parallel_cnt)) + ["avr", "loss"], columns=columns)
        
    for col in range(games_cnt):
        for row in range(parallel_cnt):
            stats_df[get_index(col, 'st')][row] = steps[col][row]
            stats_df[get_index(col, 'sc')][row] = scores[col][row]
        stats_df[get_index(col, 'sc')]['avr'] = stats_df[get_index(col, 'sc')].mean()
        stats_df[get_index(col, 'st')]['avr'] = stats_df[get_index(col, 'st')].mean()
        stats_df[get_index(col, 'sc')]['loss'] = "{:.5f}".format(loss[col])
    print(stats_df)

def train(game_files):
    print("Agent starting...")
    agent = CustomAgent()
    print("Agent started")
    requested_infos = agent.select_additional_infos()
#     _validate_requested_infos(requested_infos)

    env_id = textworld.gym.register_games(game_files, requested_infos,
                                          max_episode_steps=agent.max_nb_steps_per_episode,
                                          name="training")
    env_id = textworld.gym.make_batch(env_id, batch_size=agent.batch_size, parallel=True)
    print("Making {} parallel environments to train on them\n".format(agent.batch_size))
    env = gym.make(env_id)
    max_score = -1
    game_range = range(len(game_files))
    for epoch_no in range(1, agent.nb_epochs + 1):
        stats = {
            "scores": [],
            "steps": [],
            "loss": [],
        }
        
        for game_no in tqdm(game_range):
            obs, infos = env.reset()
            imitate = random.random() > 0.7
            agent.train(imitate)

            scores = [0] * len(obs) 
            dones = [False] * len(obs)
            steps = [0] * len(obs)
            while not all(dones):
                # Increase step counts.
                steps = [step + int(not done) for step, done in zip(steps, dones)]
                commands = agent.act(obs, scores, dones, infos)
                obs, scores, dones, infos = env.step(commands)

            # Let the agent knows the game is done.
            agent.act(obs, scores, dones, infos)

            stats["scores"].append(scores)
            stats["steps"].append(steps)
            stats["loss"].append( agent.get_mean_loss())
        
        print_epoch_stats(epoch_no, stats)
    #torch.save(agent.model, './agent_model.pt')
    return

In [0]:
%%time

game_dir = path_to_sample_games
games = []
if os.path.isdir(game_dir):
    games += glob.glob(os.path.join(game_dir, "*.ulx"))
print("{} games found for training.".format(len(games)))

if len(games) != 0:
    train(games)

6 games found for training.
Agent starting...
Creating Q-Network
Creating Target Network
total number of parameters: 115896207
number of trainable parameters: 6413967
Agent started
Making 16 parallel environments to train on them



  result = entry_point.load(False)
100%|██████████| 6/6 [14:46<00:00, 151.80s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:   1
     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc   4_st     4_sc 5_st     5_sc
0     100        0  100        0  100        0  100        0    100        0  100        0
1     100        0  100        0  100        0  100        0    100        0  100        0
2     100        0  100        0  100        0  100        0    100        0  100        0
3     100        0  100        0  100        0  100        0    100        0  100        0
4     100        0  100        0  100        0  100        0    100        0  100        0
5     100        0  100        0  100        0  100        0    100        0  100        0
6     100        0  100        0  100        0  100        0    100        0  100        0
7     100        0  100        0  100        0  100        0    100        0  100        0
8     100        0  100        0  100        0  100        0    100        0  100        0
9     100        0  100        0  100        0  100        0    100        0 

 17%|█▋        | 1/6 [00:01<00:08,  1.69s/it]

imitate
eat meal
imitate
eat meal


 33%|███▎      | 2/6 [01:50<02:15, 33.78s/it]

imitate
drop yellow bell pepper
imitate
drop purple potato
imitate
cook yellow potato with stove
imitate
take knife from counter
imitate
chop block of cheese with knife
imitate
drop knife
imitate
take knife
imitate
dice parsley with knife
imitate
drop knife
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
prepare meal


 50%|█████     | 3/6 [02:02<01:22, 27.34s/it]

imitate
eat meal
imitate
eat meal


100%|██████████| 6/6 [09:26<00:00, 106.89s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:   2
     0_st     0_sc     1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0       3        3       82        0   14        6  100        0  100        0  100        0
1       3        3      100        0   14        6  100        0  100        0  100        0
2       3        3       82        0   14        6  100        0  100        0  100        0
3       3        3       82        0   14        6  100        0  100        0  100        0
4       3        3       82        0   14        6  100        0  100        0  100        0
5       3        3       90        0   14        6  100        0  100        0  100        0
6       3        3      100        0   14        6  100        0  100        0  100        0
7       3        3       82        0   14        6  100        0  100        0  100        0
8       3        3       82        0   14        6  100        0  100        0  100        0
9       3        3       99        0   14        6  100  

100%|██████████| 6/6 [11:47<00:00, 122.42s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:   3
     0_st     0_sc 1_st     1_sc 2_st     2_sc    3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0  100        0  100        0       1        0  100        0  100        0
1     100        0  100        0  100        0       1        0  100        0  100        0
2     100        0  100        0  100        0       2        0  100        0  100        0
3     100        0  100        0  100        0       1        0  100        0  100        0
4     100        0  100        0  100        0       1        0  100        0  100        0
5     100        0  100        0  100        0       1        0  100        0  100        0
6     100        0  100        0  100        0       1        0  100        0  100        0
7     100        0  100        0  100        0       1        0  100        0  100        0
8     100        0  100        0  100        0       1        0  100        0  100        0
9     100        0  100        0  100        0       1        0  10

 17%|█▋        | 1/6 [00:18<01:30, 18.18s/it]

imitate
eat meal
imitate
eat meal


100%|██████████| 6/6 [12:05<00:00, 115.40s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:   4
     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0      22        7  100        0  100        0  100        0  100        0  100        0
1      22        7  100        0  100        0  100        0  100        0  100        0
2      22        7  100        0  100        0  100        0  100        0  100        0
3      22        7  100        0  100        0  100        0  100        0  100        0
4      22        7  100        0  100        0  100        0  100        0  100        0
5      22        7  100        0  100        0  100        0  100        0  100        0
6      22        7  100        0  100        0  100        0  100        0  100        0
7      22        7  100        0  100        0  100        0  100        0  100        0
8      22        7  100        0  100        0  100        0  100        0  100        0
9      22        7  100        0  100        0  100        0  100        0  100        0
10     2

100%|██████████| 6/6 [14:06<00:00, 143.91s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:   5
     0_st     0_sc 1_st     1_sc 2_st     2_sc  3_st     3_sc     4_st     4_sc 5_st     5_sc
0     100        0  100        0  100        0   100        0      100        0  100        0
1     100        0  100        0  100        0   100        0      100        0  100        0
2     100        0  100        0  100        0   100        0      100        0  100        0
3     100        0  100        0  100        0   100        0      100        0  100        0
4     100        0  100        0  100        0   100        0      100        0  100        0
5     100        0  100        0  100        0   100        0       47        1  100        0
6     100        0  100        0  100        0   100        0      100        0  100        0
7     100        0  100        0  100        0   100        0      100        0  100        0
8     100        0  100        0  100        0   100        0      100        0  100        0
9     100        0  100        0  100        0 

 33%|███▎      | 2/6 [04:04<08:39, 129.87s/it]

imitate
open fridge
imitate
take yellow bell pepper from fridge
imitate
take yellow potato from counter
imitate
take knife from table
imitate
slice yellow bell pepper with knife
imitate
drop knife
imitate
take knife
imitate
dice yellow potato with knife
imitate
drop knife
imitate
prepare meal
imitate
eat meal


 50%|█████     | 3/6 [04:13<04:41, 93.71s/it] 

imitate
eat meal


100%|██████████| 6/6 [09:03<00:00, 97.42s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:   6
     0_st     0_sc 1_st     1_sc 2_st     2_sc    3_st     3_sc   4_st     4_sc 5_st     5_sc
0     100        0  100        0   11        6     100        0      2        0  100        0
1     100        0  100        0   11        6      90        0      1        0  100        0
2     100        0  100        0   11        6     100        0      1        0  100        0
3     100        0  100        0   11        6     100        0      1        0  100        0
4     100        0  100        0   11        6     100        0      1        0  100        0
5     100        0  100        0   11        6     100        0      1        0  100        0
6     100        0  100        0   11        6     100        0      1        0  100        0
7     100        0  100        0   11        6     100        0      1        0  100        0
8     100        0  100        0   11        6     100        0      1        0  100        0
9     100        0  100        0   11        6 

 83%|████████▎ | 5/6 [11:26<02:20, 140.57s/it]

imitate
drop yellow bell pepper
imitate
drop yellow potato
imitate
drop red hot pepper
imitate
cook orange bell pepper with oven
imitate
cook purple potato with stove
imitate
drop purple potato
imitate
take knife from table
imitate
dice orange bell pepper with knife
imitate
drop knife
imitate
take purple potato
imitate
drop red apple
imitate
take knife
imitate
dice purple potato with knife
imitate
drop knife
imitate
take red apple
imitate
drop orange bell pepper
imitate
take knife
imitate
slice red apple with knife
imitate
drop knife
imitate
take orange bell pepper
imitate
prepare meal


100%|██████████| 6/6 [11:43<00:00, 103.61s/it]

imitate
eat meal
imitate
eat meal


Epoch:   7



  0%|          | 0/6 [00:00<?, ?it/s]

     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0  100        0  100        0  100        0  100        0   22        7
1     100        0  100        0  100        0  100        0  100        0   22        7
2     100        0  100        0  100        0  100        0  100        0   22        7
3     100        0  100        0  100        0  100        0  100        0   22        7
4     100        0  100        0  100        0  100        0  100        0   22        7
5     100        0  100        0  100        0  100        0  100        0   22        7
6     100        0  100        0  100        0  100        0  100        0   22        7
7     100        0  100        0  100        0  100        0  100        0   22        7
8     100        0  100        0  100        0  100        0  100        0   22        7
9     100        0  100        0  100        0  100        0  100        0   22        7
10    100        0  1

 17%|█▋        | 1/6 [02:31<12:35, 151.15s/it]

imitate
drop yellow bell pepper
imitate
drop yellow potato
imitate
drop red hot pepper
imitate
cook orange bell pepper with oven
imitate
cook purple potato with stove
imitate
drop purple potato
imitate
take knife from table
imitate
dice orange bell pepper with knife
imitate
drop knife
imitate
take purple potato
imitate
drop red apple
imitate
take knife
imitate
dice purple potato with knife
imitate
drop knife
imitate
take red apple
imitate
drop orange bell pepper
imitate
take knife
imitate
slice red apple with knife
imitate
drop knife
imitate
take orange bell pepper
imitate
prepare meal


 33%|███▎      | 2/6 [02:48<07:24, 111.13s/it]

imitate
eat meal
imitate
eat meal


 83%|████████▎ | 5/6 [10:04<02:09, 129.64s/it]

imitate
take red hot pepper from counter
imitate
prepare meal


100%|██████████| 6/6 [10:06<00:00, 91.27s/it] 

imitate
eat meal
imitate
eat meal


Epoch:   8



  0%|          | 0/6 [00:00<?, ?it/s]

         0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0         100        0   22        7  100        0  100        0  100        0    3        3
1          52        0   22        7  100        0  100        0  100        0    3        3
2         100        0   22        7  100        0  100        0  100        0    3        3
3         100        0   22        7  100        0  100        0  100        0    3        3
4         100        0   22        7  100        0  100        0  100        0    3        3
5           6        0   22        7  100        0  100        0  100        0    3        3
6          63        0   22        7  100        0  100        0  100        0    3        3
7         100        0   22        7  100        0  100        0  100        0    3        3
8         100        0   22        7  100        0  100        0  100        0    3        3
9         100        0   22        7  100        0  100        0  100 

 17%|█▋        | 1/6 [00:17<01:26, 17.21s/it]

imitate
eat meal
imitate
eat meal


 83%|████████▎ | 5/6 [10:33<01:56, 116.78s/it]

imitate
take red apple from counter
imitate
take red onion from fridge
imitate
take yellow potato from counter
imitate
cook red apple with oven
imitate
cook yellow potato with stove
imitate
drop red onion
imitate
take knife from counter
imitate
slice red apple with knife
imitate
drop knife
imitate
take red onion
imitate
drop yellow potato
imitate
take knife
imitate
slice red onion with knife
imitate
drop knife
imitate
take yellow potato
imitate
drop red apple
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red apple
imitate
prepare meal


100%|██████████| 6/6 [10:51<00:00, 87.03s/it] 

imitate
eat meal
imitate
eat meal


Epoch:   9



  0%|          | 0/6 [00:00<?, ?it/s]

     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc     4_st     4_sc 5_st     5_sc
0      22        7  100        0  100        0  100        0      100        0   22       10
1      22        7  100        0  100        0  100        0      100        0   22       10
2      22        7  100        0  100        0  100        0       31        0   22       10
3      22        7  100        0  100        0  100        0      100        0   22       10
4      22        7  100        0  100        0  100        0      100        0   22       10
5      22        7  100        0  100        0  100        0      100        0   22       10
6      22        7  100        0  100        0  100        0      100        0   22       10
7      22        7  100        0  100        0  100        0      100        0   22       10
8      22        7  100        0  100        0  100        0      100        0   22       10
9      22        7  100        0  100        0  100        0      100 

 17%|█▋        | 1/6 [02:55<14:37, 175.54s/it]

imitate
take red apple from counter
imitate
take red onion from fridge
imitate
take yellow potato from counter
imitate
cook red apple with oven
imitate
cook yellow potato with stove
imitate
drop red onion
imitate
take knife from counter
imitate
slice red apple with knife
imitate
drop knife
imitate
take red onion
imitate
drop yellow potato
imitate
take knife
imitate
slice red onion with knife
imitate
drop knife
imitate
take yellow potato
imitate
drop red apple
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red apple
imitate
prepare meal


 33%|███▎      | 2/6 [03:13<08:32, 128.13s/it]

imitate
eat meal
imitate
eat meal


 50%|█████     | 3/6 [06:26<07:23, 147.80s/it]

imitate
drop yellow bell pepper
imitate
drop yellow potato
imitate
drop red hot pepper
imitate
cook orange bell pepper with oven
imitate
cook purple potato with stove
imitate
drop purple potato
imitate
take knife from table
imitate
dice orange bell pepper with knife
imitate
drop knife
imitate
take purple potato
imitate
drop red apple
imitate
take knife
imitate
dice purple potato with knife
imitate
drop knife
imitate
take red apple
imitate
drop orange bell pepper
imitate
take knife
imitate
slice red apple with knife
imitate
drop knife
imitate
take orange bell pepper
imitate
prepare meal


 67%|██████▋   | 4/6 [06:44<03:37, 108.71s/it]

imitate
eat meal
imitate
eat meal
imitate
drop red onion
imitate
drop yellow bell pepper
imitate
drop red potato
imitate
go south
imitate
open sliding patio door
imitate
go south
imitate
go south
imitate
go east
imitate
go west
imitate
go north
imitate
open sliding patio door
imitate
go north
imitate
cook yellow potato with BBQ
imitate
open sliding patio door
imitate
go south
imitate
go south
imitate
go east
imitate
drop salt
imitate
take knife from table
imitate
slice red hot pepper with knife
imitate
drop knife
imitate
take salt
imitate
drop red hot pepper
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red hot pepper
imitate
prepare meal
imitate
eat meal


 83%|████████▎ | 5/6 [07:08<01:23, 83.43s/it] 

imitate
eat meal


100%|██████████| 6/6 [09:38<00:00, 103.46s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  10
     0_st     0_sc 1_st     1_sc    2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0   22       10     100        1   22        7   29        5    3        1
1     100        0   22       10      80        1   22        7   29        5    5        1
2     100        0   22       10     100        1   22        7   29        5   16        1
3     100        0   22       10     100        1   22        7   29        5   16        1
4     100        0   22       10     100        1   22        7   29        5    4        1
5     100        0   22       10     100        1   22        7   29        5   16        1
6     100        0   22       10      80        1   22        7   29        5    5        1
7     100        0   22       10      78        1   22        7   29        5   17        1
8     100        0   22       10     100        1   22        7   29        5    5        1
9     100        0   22       10     100        1   22        7   2

 67%|██████▋   | 4/6 [10:25<05:11, 155.59s/it]

imitate
take red apple from counter
imitate
take red onion from fridge
imitate
take yellow potato from counter
imitate
cook red apple with oven
imitate
cook yellow potato with stove
imitate
drop red onion
imitate
take knife from counter
imitate
slice red apple with knife
imitate
drop knife
imitate
take red onion
imitate
drop yellow potato
imitate
take knife
imitate
slice red onion with knife
imitate
drop knife
imitate
take yellow potato
imitate
drop red apple
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red apple
imitate
prepare meal


 83%|████████▎ | 5/6 [10:43<01:54, 114.21s/it]

imitate
eat meal
imitate
eat meal
imitate
drop yellow bell pepper
imitate
drop purple potato
imitate
cook yellow potato with stove
imitate
take knife from counter
imitate
chop block of cheese with knife
imitate
drop knife
imitate
take knife
imitate
dice parsley with knife
imitate
drop knife
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
prepare meal


100%|██████████| 6/6 [10:53<00:00, 83.02s/it] 

imitate
eat meal
imitate
eat meal


Epoch:  11



  0%|          | 0/6 [00:00<?, ?it/s]

     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0  100        0  100        0  100        0   22       10   14        6
1     100        0  100        0  100        0  100        0   22       10   14        6
2     100        0  100        0  100        0  100        0   22       10   14        6
3     100        0  100        0  100        0  100        0   22       10   14        6
4     100        0  100        0  100        0  100        0   22       10   14        6
5     100        0  100        0  100        0  100        0   22       10   14        6
6     100        0  100        0  100        0  100        0   22       10   14        6
7     100        0  100        0  100        0  100        0   22       10   14        6
8     100        0  100        0  100        0  100        0   22       10   14        6
9     100        0  100        0  100        0  100        0   22       10   14        6
10    100        0  1

 83%|████████▎ | 5/6 [12:58<02:36, 156.97s/it]

imitate
open fridge
imitate
take yellow bell pepper from fridge
imitate
take yellow potato from counter
imitate
take knife from table
imitate
slice yellow bell pepper with knife
imitate
drop knife
imitate
take knife
imitate
dice yellow potato with knife
imitate
drop knife
imitate
prepare meal
imitate
eat meal


100%|██████████| 6/6 [13:08<00:00, 112.72s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

imitate
eat meal


Epoch:  12
     0_st     0_sc     1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0      100        0  100        0  100        0  100        0   11        6
1     100        0      100        0  100        0  100        0  100        0   11        6
2     100        0      100        0  100        0  100        0  100        0   11        6
3     100        0       35        0  100        0  100        0  100        0   11        6
4     100        0      100        0  100        0  100        0  100        0   11        6
5     100        0      100        0  100        0  100        0  100        0   11        6
6     100        0      100        0  100        0  100        0  100        0   11        6
7     100        0      100        0  100        0  100        0  100        0   11        6
8     100        0       14        0  100        0  100        0  100        0   11        6
9     100        0      100        0  10

 17%|█▋        | 1/6 [03:03<15:19, 183.84s/it]

imitate
take red hot pepper from counter
imitate
prepare meal


 33%|███▎      | 2/6 [03:05<08:36, 129.18s/it]

imitate
eat meal
imitate
eat meal


 67%|██████▋   | 4/6 [08:49<05:06, 153.09s/it]

imitate
drop yellow bell pepper
imitate
drop yellow potato
imitate
drop red hot pepper
imitate
cook orange bell pepper with oven
imitate
cook purple potato with stove
imitate
drop purple potato
imitate
take knife from table
imitate
dice orange bell pepper with knife
imitate
drop knife
imitate
take purple potato
imitate
drop red apple
imitate
take knife
imitate
dice purple potato with knife
imitate
drop knife
imitate
take red apple
imitate
drop orange bell pepper
imitate
take knife
imitate
slice red apple with knife
imitate
drop knife
imitate
take orange bell pepper
imitate
prepare meal


 83%|████████▎ | 5/6 [09:07<01:52, 112.53s/it]

imitate
eat meal
imitate
eat meal
imitate
drop red onion
imitate
drop yellow bell pepper
imitate
drop red potato
imitate
go south
imitate
open sliding patio door
imitate
go south
imitate
go south
imitate
go east
imitate
go west
imitate
go north
imitate
open sliding patio door
imitate
go north
imitate
cook yellow potato with BBQ
imitate
open sliding patio door
imitate
go south
imitate
go south
imitate
go east
imitate
drop salt
imitate
take knife from table
imitate
slice red hot pepper with knife
imitate
drop knife
imitate
take salt
imitate
drop red hot pepper
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red hot pepper
imitate
prepare meal
imitate
eat meal


100%|██████████| 6/6 [09:31<00:00, 86.10s/it] 
  0%|          | 0/6 [00:00<?, ?it/s]

imitate
eat meal


Epoch:  13
     0_st     0_sc 1_st     1_sc 2_st     2_sc    3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0    3        3  100        0     100        1   22        7   29        5
1     100        0    3        3  100        0     100        1   22        7   29        5
2     100        0    3        3  100        0     100        1   22        7   29        5
3     100        0    3        3  100        0     100        1   22        7   29        5
4     100        0    3        3  100        0     100        1   22        7   29        5
5     100        0    3        3  100        0     100        1   22        7   29        5
6     100        0    3        3  100        0     100        1   22        7   29        5
7     100        0    3        3  100        0     100        1   22        7   29        5
8     100        0    3        3  100        0     100        1   22        7   29        5
9     100        0    3        3  100        0    

 67%|██████▋   | 4/6 [10:14<05:20, 160.04s/it]

imitate
drop yellow bell pepper
imitate
drop yellow potato
imitate
drop red hot pepper
imitate
cook orange bell pepper with oven
imitate
cook purple potato with stove
imitate
drop purple potato
imitate
take knife from table
imitate
dice orange bell pepper with knife
imitate
drop knife
imitate
take purple potato
imitate
drop red apple
imitate
take knife
imitate
dice purple potato with knife
imitate
drop knife
imitate
take red apple
imitate
drop orange bell pepper
imitate
take knife
imitate
slice red apple with knife
imitate
drop knife
imitate
take orange bell pepper
imitate
prepare meal


 83%|████████▎ | 5/6 [10:31<01:57, 117.30s/it]

imitate
eat meal
imitate
eat meal


100%|██████████| 6/6 [13:02<00:00, 127.53s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  14
     0_st     0_sc   1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0    100        0  100        0  100        0   22        7  100        0
1     100        0    100        0  100        0  100        0   22        7  100        0
2     100        0    100        0  100        0  100        0   22        7  100        0
3     100        0    100        0  100        0  100        0   22        7  100        0
4     100        0    100        0  100        0  100        0   22        7  100        0
5     100        0    100        0  100        0  100        0   22        7  100        0
6     100        0    100        0  100        0  100        0   22        7  100        0
7     100        0    100        0  100        0  100        0   22        7  100        0
8     100        0    100        0  100        0  100        0   22        7  100        0
9     100        0    100        0  100        0  100        0   22        7 

 67%|██████▋   | 4/6 [10:19<04:52, 146.48s/it]

imitate
take red hot pepper from counter
imitate
prepare meal


 83%|████████▎ | 5/6 [10:21<01:43, 103.06s/it]

imitate
eat meal
imitate
eat meal
imitate
take red apple from counter
imitate
take red onion from fridge
imitate
take yellow potato from counter
imitate
cook red apple with oven
imitate
cook yellow potato with stove
imitate
drop red onion
imitate
take knife from counter
imitate
slice red apple with knife
imitate
drop knife
imitate
take red onion
imitate
drop yellow potato
imitate
take knife
imitate
slice red onion with knife
imitate
drop knife
imitate
take yellow potato
imitate
drop red apple
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red apple
imitate
prepare meal


100%|██████████| 6/6 [10:38<00:00, 77.46s/it] 

imitate
eat meal
imitate
eat meal


Epoch:  15



  0%|          | 0/6 [00:00<?, ?it/s]

     0_st     0_sc 1_st     1_sc 2_st     2_sc     3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0  100        0  100        0      100        0    3        3   22       10
1     100        0  100        0  100        0      100        0    3        3   22       10
2     100        0  100        0  100        0      100        0    3        3   22       10
3     100        0  100        0  100        0      100        0    3        3   22       10
4     100        0  100        0  100        0      100        0    3        3   22       10
5     100        0  100        0  100        0      100        0    3        3   22       10
6     100        0  100        0  100        0      100        0    3        3   22       10
7     100        0  100        0  100        0      100        0    3        3   22       10
8     100        0  100        0  100        0      100        0    3        3   22       10
9     100        0  100        0  100        0      100        0    3 

 17%|█▋        | 1/6 [02:33<12:49, 153.82s/it]

imitate
take red apple from counter
imitate
take red onion from fridge
imitate
take yellow potato from counter
imitate
cook red apple with oven
imitate
cook yellow potato with stove
imitate
drop red onion
imitate
take knife from counter
imitate
slice red apple with knife
imitate
drop knife
imitate
take red onion
imitate
drop yellow potato
imitate
take knife
imitate
slice red onion with knife
imitate
drop knife
imitate
take yellow potato
imitate
drop red apple
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red apple
imitate
prepare meal


 33%|███▎      | 2/6 [02:51<07:31, 112.91s/it]

imitate
eat meal
imitate
eat meal


100%|██████████| 6/6 [13:24<00:00, 145.58s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  16
         0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0         100        0   22       10  100        0  100        0  100        0  100        1
1         100        0   22       10  100        0  100        0  100        0  100        1
2         100        0   22       10  100        0  100        0  100        0  100        1
3         100        0   22       10  100        0  100        0  100        0  100        2
4         100        0   22       10  100        0  100        0  100        0  100        1
5         100        0   22       10  100        0  100        0  100        0  100        1
6         100        0   22       10  100        0  100        0  100        0  100        1
7         100        0   22       10  100        0  100        0  100        0  100        1
8         100        0   22       10  100        0  100        0  100        0  100        1
9         100        0   22       10  100        0  100  

 17%|█▋        | 1/6 [02:45<13:45, 165.11s/it]

imitate
open fridge
imitate
take yellow bell pepper from fridge
imitate
take yellow potato from counter
imitate
take knife from table
imitate
slice yellow bell pepper with knife
imitate
drop knife
imitate
take knife
imitate
dice yellow potato with knife
imitate
drop knife
imitate
prepare meal
imitate
eat meal


 33%|███▎      | 2/6 [02:54<07:53, 118.28s/it]

imitate
eat meal


 67%|██████▋   | 4/6 [08:38<04:52, 146.10s/it]

imitate
drop yellow bell pepper
imitate
drop purple potato
imitate
cook yellow potato with stove
imitate
take knife from counter
imitate
chop block of cheese with knife
imitate
drop knife
imitate
take knife
imitate
dice parsley with knife
imitate
drop knife
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
prepare meal
imitate
eat meal
imitate
eat meal


 83%|████████▎ | 5/6 [08:55<01:47, 107.34s/it]




100%|██████████| 6/6 [10:42<00:00, 107.18s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  17
     0_st     0_sc 1_st     1_sc 2_st     2_sc     3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0   11        6  100        0       39        1   14        6  100        0
1     100        0   11        6  100        0      100        1   14        6  100        0
2     100        0   11        6  100        0      100        1   14        6  100        0
3     100        0   11        6  100        0      100        1   14        6  100        0
4     100        0   11        6  100        0      100        1   14        6  100        0
5     100        0   11        6  100        0      100        1   14        6  100        0
6     100        0   11        6  100        0      100        1   14        6  100        0
7     100        0   11        6  100        0      100        1   14        6  100        0
8     100        0   11        6  100        0      100        1   14        6  100        0
9     100        0   11        6  100        0      100  

 83%|████████▎ | 5/6 [13:29<02:42, 162.86s/it]

imitate
drop yellow bell pepper
imitate
drop yellow potato
imitate
drop red hot pepper
imitate
cook orange bell pepper with oven
imitate
cook purple potato with stove
imitate
drop purple potato
imitate
take knife from table
imitate
dice orange bell pepper with knife
imitate
drop knife
imitate
take purple potato
imitate
drop red apple
imitate
take knife
imitate
dice purple potato with knife
imitate
drop knife
imitate
take red apple
imitate
drop orange bell pepper
imitate
take knife
imitate
slice red apple with knife
imitate
drop knife
imitate
take orange bell pepper
imitate
prepare meal


100%|██████████| 6/6 [13:47<00:00, 119.39s/it]

imitate
eat meal
imitate
eat meal


Epoch:  18



  0%|          | 0/6 [00:00<?, ?it/s]

     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0  100        0  100        0  100        0  100        0   22        7
1     100        0  100        0  100        0  100        0  100        0   22        7
2     100        0  100        0  100        0  100        0  100        0   22        7
3     100        0  100        0  100        0  100        0  100        0   22        7
4     100        0  100        0  100        0  100        0  100        0   22        7
5     100        0  100        0  100        0  100        0  100        0   22        7
6     100        0  100        0  100        0  100        0  100        0   22        7
7     100        0  100        0  100        0  100        0  100        0   22        7
8     100        0  100        0  100        0  100        0  100        0   22        7
9     100        0  100        0  100        0  100        0  100        0   22        7
10    100        0  1

 17%|█▋        | 1/6 [02:53<14:29, 173.86s/it]

imitate
open fridge
imitate
take yellow bell pepper from fridge
imitate
take yellow potato from counter
imitate
take knife from table
imitate
slice yellow bell pepper with knife
imitate
drop knife
imitate
take knife
imitate
dice yellow potato with knife
imitate
drop knife
imitate
prepare meal
imitate
eat meal


 33%|███▎      | 2/6 [03:03<08:18, 124.61s/it]

imitate
eat meal
imitate
take red apple from counter
imitate
take red onion from fridge
imitate
take yellow potato from counter
imitate
cook red apple with oven
imitate
cook yellow potato with stove
imitate
drop red onion
imitate
take knife from counter
imitate
slice red apple with knife
imitate
drop knife
imitate
take red onion
imitate
drop yellow potato
imitate
take knife
imitate
slice red onion with knife
imitate
drop knife
imitate
take yellow potato
imitate
drop red apple
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red apple
imitate
prepare meal


 50%|█████     | 3/6 [03:20<04:37, 92.44s/it] 

imitate
eat meal
imitate
eat meal


 67%|██████▋   | 4/6 [06:09<03:50, 115.27s/it]

imitate
drop yellow bell pepper
imitate
drop purple potato
imitate
cook yellow potato with stove
imitate
take knife from counter
imitate
chop block of cheese with knife
imitate
drop knife
imitate
take knife
imitate
dice parsley with knife
imitate
drop knife
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
prepare meal


 83%|████████▎ | 5/6 [06:19<01:23, 83.75s/it] 

imitate
eat meal
imitate
eat meal


100%|██████████| 6/6 [08:07<00:00, 91.01s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  19
     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc     5_st     5_sc
0     100        0   11        6   22       10  100        0   14        6      100        0
1     100        0   11        6   22       10  100        0   14        6      100        0
2     100        0   11        6   22       10  100        0   14        6      100        0
3     100        0   11        6   22       10  100        0   14        6      100        0
4     100        0   11        6   22       10  100        0   14        6      100        0
5     100        0   11        6   22       10  100        0   14        6      100        0
6     100        0   11        6   22       10  100        0   14        6      100        0
7     100        0   11        6   22       10  100        0   14        6      100        0
8     100        0   11        6   22       10  100        0   14        6      100        0
9     100        0   11        6   22       10  100      

 67%|██████▋   | 4/6 [10:45<05:37, 168.91s/it]

imitate
open fridge
imitate
take yellow bell pepper from fridge
imitate
take yellow potato from counter
imitate
take knife from table
imitate
slice yellow bell pepper with knife
imitate
drop knife
imitate
take knife
imitate
dice yellow potato with knife
imitate
drop knife
imitate
prepare meal
imitate
eat meal


 83%|████████▎ | 5/6 [10:55<02:01, 121.12s/it]

imitate
eat meal


100%|██████████| 6/6 [13:50<00:00, 137.43s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  20
     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0  100        0  100        0  100        1   11        6  100        1
1     100        0  100        0  100        0  100        1   11        6  100        0
2     100        0  100        0  100        0  100        1   11        6  100        0
3     100        0  100        0  100        0  100        1   11        6  100        1
4     100        0  100        0  100        0  100        1   11        6  100        0
5     100        0  100        0  100        0  100        1   11        6  100        0
6     100        0  100        0  100        0  100        1   11        6  100        0
7     100        0  100        0  100        0  100        1   11        6  100        0
8     100        0  100        0  100        0  100        1   11        6  100        0
9     100        0  100        0  100        0  100        1   11        6  100        0
10    10

 17%|█▋        | 1/6 [02:43<13:39, 163.82s/it]

imitate
open fridge
imitate
take yellow bell pepper from fridge
imitate
take yellow potato from counter
imitate
take knife from table
imitate
slice yellow bell pepper with knife
imitate
drop knife
imitate
take knife
imitate
dice yellow potato with knife
imitate
drop knife
imitate
prepare meal
imitate
eat meal


 33%|███▎      | 2/6 [02:53<07:50, 117.53s/it]

imitate
eat meal


100%|██████████| 6/6 [13:39<00:00, 143.12s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  21
     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc    4_st     4_sc 5_st     5_sc
0     100        0   11        6  100        1  100        0     100        0  100        0
1     100        0   11        6  100        0  100        0     100        0  100        0
2     100        0   11        6  100        1  100        0     100        0  100        0
3     100        0   11        6  100        0  100        0     100        0  100        0
4     100        0   11        6  100        1  100        0     100        0  100        0
5     100        0   11        6  100        0  100        0     100        0  100        0
6     100        0   11        6  100        0  100        0     100        0  100        0
7     100        0   11        6  100        1  100        0     100        0  100        0
8     100        0   11        6  100        0  100        0     100        0  100        0
9     100        0   11        6  100        0  100        0      3

 17%|█▋        | 1/6 [01:43<08:37, 103.59s/it]

imitate
take red hot pepper from counter
imitate
prepare meal


 33%|███▎      | 2/6 [01:45<04:52, 73.01s/it] 

imitate
eat meal
imitate
eat meal


 67%|██████▋   | 4/6 [07:27<04:06, 123.14s/it]

imitate
open fridge
imitate
take yellow bell pepper from fridge
imitate
take yellow potato from counter
imitate
take knife from table
imitate
slice yellow bell pepper with knife
imitate
drop knife
imitate
take knife
imitate
dice yellow potato with knife
imitate
drop knife
imitate
prepare meal
imitate
eat meal


 83%|████████▎ | 5/6 [07:37<01:29, 89.18s/it] 

imitate
eat meal


100%|██████████| 6/6 [10:30<00:00, 114.45s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  22
     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0    3        3  100        0  100        0   11        6  100        0
1     100        0    3        3  100        0  100        0   11        6  100        0
2     100        0    3        3  100        0  100        0   11        6  100        0
3     100        0    3        3  100        0  100        0   11        6  100        0
4     100        0    3        3  100        0  100        0   11        6  100        0
5     100        0    3        3  100        0  100        0   11        6  100        0
6     100        0    3        3  100        0  100        0   11        6  100        0
7     100        0    3        3  100        0  100        0   11        6  100        0
8     100        0    3        3  100        0  100        0   11        6  100        0
9     100        0    3        3  100        0  100        0   11        6  100        0
10    10

 50%|█████     | 3/6 [07:05<06:41, 133.69s/it]

imitate
take red hot pepper from counter
imitate
prepare meal


 67%|██████▋   | 4/6 [07:07<03:08, 94.09s/it] 

imitate
eat meal
imitate
eat meal


 83%|████████▎ | 5/6 [09:39<01:51, 111.62s/it]

imitate
drop yellow bell pepper
imitate
drop purple potato
imitate
cook yellow potato with stove
imitate
take knife from counter
imitate
chop block of cheese with knife
imitate
drop knife
imitate
take knife
imitate
dice parsley with knife
imitate
drop knife
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
prepare meal


100%|██████████| 6/6 [09:49<00:00, 81.06s/it] 

imitate
eat meal
imitate
eat meal



 67%|██████▋   | 4/6 [10:03<05:01, 150.56s/it]



Epoch:  23
     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0  100        0  100        0    3        3  100        0   14        6
1     100        0  100        0  100        0    3        3  100        0   14        6
2     100        0  100        0  100        0    3        3  100        0   14        6
3     100        0  100        0  100        0    3        3  100        0   14        6
4     100        0  100        0  100        0    3        3  100        0   14        6
5     100        0  100        0  100        0    3        3  100        0   14        6
6     100        0  100        0  100        0    3        3  100        0   14        6
7     100        0  100        0  100        0    3        3  100        0   14        6
8     100        0  100        0  100        0    3        3  100        0   14        6
9     100        0  100        0  100        0    3        3  100        0   14        6
10    10

 83%|████████▎ | 5/6 [10:04<01:45, 105.93s/it]

imitate
eat meal
imitate
eat meal


100%|██████████| 6/6 [12:55<00:00, 125.22s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  24
     0_st     0_sc 1_st     1_sc 2_st     2_sc    3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0  100        0  100        0     100        0    3        3  100        0
1     100        0  100        0  100        0     100        0    3        3  100        0
2     100        0  100        0  100        0     100        0    3        3  100        0
3     100        0  100        0  100        0     100        0    3        3  100        0
4     100        0  100        0  100        0     100        0    3        3  100        0
5     100        0  100        0  100        0      70        0    3        3  100        0
6     100        0  100        0  100        0     100        0    3        3  100        0
7     100        0  100        0  100        0     100        0    3        3  100        0
8     100        0  100        0  100        0     100        0    3        3  100        0
9     100        0  100        0  100        0     100        0    

 50%|█████     | 3/6 [08:18<08:24, 168.15s/it]

imitate
drop red onion
imitate
drop yellow bell pepper
imitate
drop red potato
imitate
go south
imitate
open sliding patio door
imitate
go south
imitate
go south
imitate
go east
imitate
go west
imitate
go north
imitate
open sliding patio door
imitate
go north
imitate
cook yellow potato with BBQ
imitate
open sliding patio door
imitate
go south
imitate
go south
imitate
go east
imitate
drop salt
imitate
take knife from table
imitate
slice red hot pepper with knife
imitate
drop knife
imitate
take salt
imitate
drop red hot pepper
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red hot pepper
imitate
prepare meal
imitate
eat meal


 67%|██████▋   | 4/6 [08:43<04:10, 125.09s/it]

imitate
eat meal
imitate
drop yellow bell pepper
imitate
drop yellow potato
imitate
drop red hot pepper
imitate
cook orange bell pepper with oven
imitate
cook purple potato with stove
imitate
drop purple potato
imitate
take knife from table
imitate
dice orange bell pepper with knife
imitate
drop knife
imitate
take purple potato
imitate
drop red apple
imitate
take knife
imitate
dice purple potato with knife
imitate
drop knife
imitate
take red apple
imitate
drop orange bell pepper
imitate
take knife
imitate
slice red apple with knife
imitate
drop knife
imitate
take orange bell pepper
imitate
prepare meal


 83%|████████▎ | 5/6 [09:00<01:32, 92.69s/it] 

imitate
eat meal
imitate
eat meal
imitate
drop yellow bell pepper
imitate
drop purple potato
imitate
cook yellow potato with stove
imitate
take knife from counter
imitate
chop block of cheese with knife
imitate
drop knife
imitate
take knife
imitate
dice parsley with knife
imitate
drop knife
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
prepare meal


100%|██████████| 6/6 [09:10<00:00, 67.90s/it]

imitate
eat meal
imitate
eat meal



 17%|█▋        | 1/6 [02:44<13:40, 164.06s/it]



Epoch:  25
     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0  100        0  100        0   29        5   22        7   14        6
1     100        0  100        1  100        0   29        5   22        7   14        6
2     100        0  100        0  100        0   29        5   22        7   14        6
3     100        0  100        0  100        0   29        5   22        7   14        6
4     100        0  100        0  100        0   29        5   22        7   14        6
5     100        0  100        0  100        0   29        5   22        7   14        6
6     100        0  100        0  100        0   29        5   22        7   14        6
7     100        0  100        0  100        0   29        5   22        7   14        6
8     100        0  100        0  100        0   29        5   22        7   14        6
9     100        0  100        0  100        0   29        5   22        7   14        6
10    10

 33%|███▎      | 2/6 [02:54<07:51, 117.89s/it]

imitate
eat meal
imitate
eat meal
imitate
open fridge
imitate
take yellow bell pepper from fridge
imitate
take yellow potato from counter
imitate
take knife from table
imitate
slice yellow bell pepper with knife
imitate
drop knife
imitate
take knife
imitate
dice yellow potato with knife
imitate
drop knife
imitate
prepare meal
imitate
eat meal


 50%|█████     | 3/6 [03:04<04:16, 85.46s/it] 

imitate
eat meal


100%|██████████| 6/6 [10:47<00:00, 124.37s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  26
     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        1   14        6   11        6  100        0  100        0  100        0
1     100        1   14        6   11        6  100        0  100        0  100        0
2     100        2   14        6   11        6  100        0  100        0  100        0
3     100        1   14        6   11        6  100        0  100        0  100        0
4     100        1   14        6   11        6  100        0  100        0  100        0
5     100        1   14        6   11        6  100        0  100        0  100        0
6     100        1   14        6   11        6  100        0  100        0  100        0
7     100        1   14        6   11        6  100        0  100        0  100        0
8     100        1   14        6   11        6  100        0  100        0  100        0
9     100        1   14        6   11        6  100        0  100        0  100        0
10    10

 17%|█▋        | 1/6 [00:09<00:47,  9.41s/it]

imitate
eat meal


100%|██████████| 6/6 [12:29<00:00, 125.59s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  27
     0_st     0_sc 1_st     1_sc     2_st     2_sc 3_st     3_sc    4_st     4_sc 5_st     5_sc
0      11        6  100        0      100        0  100        0     100        1  100        0
1      11        6  100        0      100        0  100        0     100        1  100        0
2      11        6  100        0      100        0  100        0     100        1  100        0
3      11        6  100        0      100        0  100        0      78        1  100        0
4      11        6  100        0      100        0  100        0     100        1  100        0
5      11        6  100        0      100        0  100        0     100        2  100        0
6      11        6  100        0      100        0  100        0     100        1  100        0
7      11        6  100        0      100        0  100        0     100        1  100        0
8      11        6  100        0      100        0  100        0     100        1  100        0
9      11        6  100    

 17%|█▋        | 1/6 [00:24<02:01, 24.38s/it]

imitate
eat meal
imitate
drop yellow bell pepper
imitate
drop yellow potato
imitate
drop red hot pepper
imitate
cook orange bell pepper with oven
imitate
cook purple potato with stove
imitate
drop purple potato
imitate
take knife from table
imitate
dice orange bell pepper with knife
imitate
drop knife
imitate
take purple potato
imitate
drop red apple
imitate
take knife
imitate
dice purple potato with knife
imitate
drop knife
imitate
take red apple
imitate
drop orange bell pepper
imitate
take knife
imitate
slice red apple with knife
imitate
drop knife
imitate
take orange bell pepper
imitate
prepare meal


 33%|███▎      | 2/6 [00:42<01:29, 22.49s/it]

imitate
eat meal
imitate
eat meal
imitate
take red hot pepper from counter
imitate
prepare meal


 50%|█████     | 3/6 [00:44<00:48, 16.22s/it]

imitate
eat meal
imitate
eat meal
imitate
drop yellow bell pepper
imitate
drop purple potato
imitate
cook yellow potato with stove
imitate
take knife from counter
imitate
chop block of cheese with knife
imitate
drop knife
imitate
take knife
imitate
dice parsley with knife
imitate
drop knife
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
prepare meal


 67%|██████▋   | 4/6 [00:54<00:29, 14.60s/it]

imitate
eat meal
imitate
eat meal
imitate
open fridge
imitate
take yellow bell pepper from fridge
imitate
take yellow potato from counter
imitate
take knife from table
imitate
slice yellow bell pepper with knife
imitate
drop knife
imitate
take knife
imitate
dice yellow potato with knife
imitate
drop knife
imitate
prepare meal
imitate
eat meal


 83%|████████▎ | 5/6 [01:04<00:13, 13.03s/it]

imitate
eat meal


100%|██████████| 6/6 [04:12<00:00, 65.70s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  28
     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0      29        5   22        7    3        3   14        6   11        6  100        0
1      29        5   22        7    3        3   14        6   11        6  100        0
2      29        5   22        7    3        3   14        6   11        6  100        0
3      29        5   22        7    3        3   14        6   11        6  100        0
4      29        5   22        7    3        3   14        6   11        6  100        0
5      29        5   22        7    3        3   14        6   11        6  100        0
6      29        5   22        7    3        3   14        6   11        6  100        0
7      29        5   22        7    3        3   14        6   11        6  100        0
8      29        5   22        7    3        3   14        6   11        6  100        0
9      29        5   22        7    3        3   14        6   11        6  100        0
10     2

 17%|█▋        | 1/6 [00:17<01:26, 17.22s/it]

imitate
eat meal
imitate
eat meal
imitate
drop red onion
imitate
drop yellow bell pepper
imitate
drop red potato
imitate
go south
imitate
open sliding patio door
imitate
go south
imitate
go south
imitate
go east
imitate
go west
imitate
go north
imitate
open sliding patio door
imitate
go north
imitate
cook yellow potato with BBQ
imitate
open sliding patio door
imitate
go south
imitate
go south
imitate
go east
imitate
drop salt
imitate
take knife from table
imitate
slice red hot pepper with knife
imitate
drop knife
imitate
take salt
imitate
drop red hot pepper
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red hot pepper
imitate
prepare meal
imitate
eat meal


 33%|███▎      | 2/6 [00:41<01:17, 19.38s/it]

imitate
eat meal
imitate
open fridge
imitate
take yellow bell pepper from fridge
imitate
take yellow potato from counter
imitate
take knife from table
imitate
slice yellow bell pepper with knife
imitate
drop knife
imitate
take knife
imitate
dice yellow potato with knife
imitate
drop knife
imitate
prepare meal
imitate
eat meal


 50%|█████     | 3/6 [00:50<00:48, 16.31s/it]

imitate
eat meal


 83%|████████▎ | 5/6 [06:32<01:35, 95.54s/it]

imitate
take red hot pepper from counter
imitate
prepare meal


100%|██████████| 6/6 [06:33<00:00, 67.35s/it]

imitate
eat meal
imitate
eat meal



 17%|█▋        | 1/6 [02:28<12:21, 148.35s/it]



Epoch:  29
     0_st     0_sc 1_st     1_sc 2_st     2_sc     3_st     3_sc 4_st     4_sc 5_st     5_sc
0      22        7   29        5   11        6       24        1  100        0    3        3
1      22        7   29        5   11        6      100        1  100        0    3        3
2      22        7   29        5   11        6      100        0  100        1    3        3
3      22        7   29        5   11        6      100        0  100        0    3        3
4      22        7   29        5   11        6       63        1  100        0    3        3
5      22        7   29        5   11        6      100        0  100        0    3        3
6      22        7   29        5   11        6       51        1  100        0    3        3
7      22        7   29        5   11        6       17        1  100        0    3        3
8      22        7   29        5   11        6      100        1  100        0    3        3
9      22        7   29        5   11        6       95  

 33%|███▎      | 2/6 [02:38<07:07, 106.80s/it]

imitate
eat meal
imitate
take red hot pepper from counter
imitate
prepare meal


 50%|█████     | 3/6 [02:39<03:45, 75.29s/it] 

imitate
eat meal
imitate
eat meal


 83%|████████▎ | 5/6 [08:22<02:05, 125.59s/it]

imitate
drop red onion
imitate
drop yellow bell pepper
imitate
drop red potato
imitate
go south
imitate
open sliding patio door
imitate
go south
imitate
go south
imitate
go east
imitate
go west
imitate
go north
imitate
open sliding patio door
imitate
go north
imitate
cook yellow potato with BBQ
imitate
open sliding patio door
imitate
go south
imitate
go south
imitate
go east
imitate
drop salt
imitate
take knife from table
imitate
slice red hot pepper with knife
imitate
drop knife
imitate
take salt
imitate
drop red hot pepper
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red hot pepper
imitate
prepare meal
imitate
eat meal


100%|██████████| 6/6 [08:46<00:00, 95.16s/it] 
  0%|          | 0/6 [00:00<?, ?it/s]

imitate
eat meal


Epoch:  30
     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0   11        6    3        3  100        0  100        0   29        5
1     100        0   11        6    3        3  100        0  100        0   29        5
2     100        0   11        6    3        3  100        0  100        0   29        5
3     100        0   11        6    3        3  100        0  100        0   29        5
4     100        0   11        6    3        3  100        0  100        0   29        5
5     100        0   11        6    3        3  100        0  100        0   29        5
6     100        0   11        6    3        3  100        0  100        0   29        5
7     100        0   11        6    3        3  100        0  100        0   29        5
8     100        0   11        6    3        3  100        0  100        0   29        5
9     100        0   11        6    3        3  100        0  100        0   29 

100%|██████████| 6/6 [14:52<00:00, 146.54s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  31
     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0  100        0  100        0  100        0  100        0  100        1
1     100        0  100        0  100        0  100        0  100        0  100        1
2     100        0  100        0  100        0  100        0  100        0  100        1
3     100        0  100        0  100        0  100        0  100        0  100        1
4     100        0  100        0  100        0  100        0  100        0  100        1
5     100        0  100        0  100        0  100        0  100        0  100        1
6     100        0  100        0  100        0  100        0  100        0  100        1
7     100        0  100        0  100        0  100        0  100        0  100        1
8     100        0  100        0  100        0  100        1  100        0  100        1
9     100        0  100        0  100        0  100        0  100        0  100        1
10    10

 17%|█▋        | 1/6 [02:33<12:47, 153.44s/it]

imitate
drop red onion
imitate
drop yellow bell pepper
imitate
drop red potato
imitate
go south
imitate
open sliding patio door
imitate
go south
imitate
go south
imitate
go east
imitate
go west
imitate
go north
imitate
open sliding patio door
imitate
go north
imitate
cook yellow potato with BBQ
imitate
open sliding patio door
imitate
go south
imitate
go south
imitate
go east
imitate
drop salt
imitate
take knife from table
imitate
slice red hot pepper with knife
imitate
drop knife
imitate
take salt
imitate
drop red hot pepper
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red hot pepper
imitate
prepare meal
imitate
eat meal


 33%|███▎      | 2/6 [02:57<07:38, 114.55s/it]

imitate
eat meal


 50%|█████     | 3/6 [05:27<06:15, 125.24s/it]

imitate
take red apple from counter
imitate
take red onion from fridge
imitate
take yellow potato from counter
imitate
cook red apple with oven
imitate
cook yellow potato with stove
imitate
drop red onion
imitate
take knife from counter
imitate
slice red apple with knife
imitate
drop knife
imitate
take red onion
imitate
drop yellow potato
imitate
take knife
imitate
slice red onion with knife
imitate
drop knife
imitate
take yellow potato
imitate
drop red apple
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red apple
imitate
prepare meal


 67%|██████▋   | 4/6 [05:45<03:05, 92.99s/it] 

imitate
eat meal
imitate
eat meal
imitate
open fridge
imitate
take yellow bell pepper from fridge
imitate
take yellow potato from counter
imitate
take knife from table
imitate
slice yellow bell pepper with knife
imitate
drop knife
imitate
take knife
imitate
dice yellow potato with knife
imitate
drop knife
imitate
prepare meal
imitate
eat meal


 83%|████████▎ | 5/6 [05:54<01:07, 67.84s/it]

imitate
eat meal


100%|██████████| 6/6 [08:30<00:00, 94.19s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  32
     0_st     0_sc 1_st     1_sc     2_st     2_sc 3_st     3_sc 4_st     4_sc     5_st     5_sc
0     100        1   29        5      100        0   22       10   11        6       44        0
1     100        1   29        5      100        1   22       10   11        6      100        1
2     100        1   29        5      100        0   22       10   11        6      100        0
3     100        1   29        5      100        0   22       10   11        6       92        0
4     100        1   29        5      100        0   22       10   11        6       75        1
5     100        1   29        5       76        1   22       10   11        6      100        1
6     100        1   29        5       35        1   22       10   11        6       55        1
7     100        1   29        5      100        1   22       10   11        6       94        1
8     100        1   29        5      100        0   22       10   11        6      100        0
9     100        

 17%|█▋        | 1/6 [03:05<15:25, 185.13s/it]

imitate
drop yellow bell pepper
imitate
drop purple potato
imitate
cook yellow potato with stove
imitate
take knife from counter
imitate
chop block of cheese with knife
imitate
drop knife
imitate
take knife
imitate
dice parsley with knife
imitate
drop knife
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
prepare meal


 33%|███▎      | 2/6 [03:15<08:50, 132.56s/it]

imitate
eat meal
imitate
eat meal
imitate
drop yellow bell pepper
imitate
drop yellow potato
imitate
drop red hot pepper
imitate
cook orange bell pepper with oven
imitate
cook purple potato with stove
imitate
drop purple potato
imitate
take knife from table
imitate
dice orange bell pepper with knife
imitate
drop knife
imitate
take purple potato
imitate
drop red apple
imitate
take knife
imitate
dice purple potato with knife
imitate
drop knife
imitate
take red apple
imitate
drop orange bell pepper
imitate
take knife
imitate
slice red apple with knife
imitate
drop knife
imitate
take orange bell pepper
imitate
prepare meal


 50%|█████     | 3/6 [03:31<04:53, 97.82s/it] 

imitate
eat meal
imitate
eat meal


 83%|████████▎ | 5/6 [08:42<02:07, 127.09s/it]

imitate
drop red onion
imitate
drop yellow bell pepper
imitate
drop red potato
imitate
go south
imitate
open sliding patio door
imitate
go south
imitate
go south
imitate
go east
imitate
go west
imitate
go north
imitate
open sliding patio door
imitate
go north
imitate
cook yellow potato with BBQ
imitate
open sliding patio door
imitate
go south
imitate
go south
imitate
go east
imitate
drop salt
imitate
take knife from table
imitate
slice red hot pepper with knife
imitate
drop knife
imitate
take salt
imitate
drop red hot pepper
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red hot pepper
imitate
prepare meal
imitate
eat meal


100%|██████████| 6/6 [09:06<00:00, 96.13s/it] 
  0%|          | 0/6 [00:00<?, ?it/s]

imitate
eat meal


Epoch:  33
     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0   14        6   22        7  100        0  100        1   29        5
1     100        0   14        6   22        7  100        0  100        1   29        5
2     100        0   14        6   22        7  100        0  100        1   29        5
3     100        0   14        6   22        7  100        0  100        1   29        5
4     100        0   14        6   22        7  100        0  100        1   29        5
5     100        0   14        6   22        7  100        0  100        1   29        5
6     100        0   14        6   22        7  100        0  100        1   29        5
7     100        0   14        6   22        7  100        0  100        1   29        5
8     100        0   14        6   22        7  100        0  100        1   29        5
9     100        0   14        6   22        7  100        0  100        1   29 

 17%|█▋        | 1/6 [02:31<12:39, 151.82s/it]

imitate
drop yellow bell pepper
imitate
drop yellow potato
imitate
drop red hot pepper
imitate
cook orange bell pepper with oven
imitate
cook purple potato with stove
imitate
drop purple potato
imitate
take knife from table
imitate
dice orange bell pepper with knife
imitate
drop knife
imitate
take purple potato
imitate
drop red apple
imitate
take knife
imitate
dice purple potato with knife
imitate
drop knife
imitate
take red apple
imitate
drop orange bell pepper
imitate
take knife
imitate
slice red apple with knife
imitate
drop knife
imitate
take orange bell pepper
imitate
prepare meal


 33%|███▎      | 2/6 [02:48<07:25, 111.33s/it]

imitate
eat meal
imitate
eat meal


 50%|█████     | 3/6 [05:19<06:09, 123.10s/it]




100%|██████████| 6/6 [12:36<00:00, 140.71s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  34
     0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0   22        7  100        0  100        0  100        0  100        1
1     100        0   22        7  100        0  100        0  100        0  100        1
2     100        0   22        7  100        0  100        0  100        0  100        1
3     100        0   22        7  100        0  100        0  100        0  100        1
4     100        0   22        7  100        0  100        0  100        0  100        1
5     100        0   22        7  100        0  100        0  100        0  100        1
6     100        0   22        7  100        0  100        0  100        0  100        1
7     100        0   22        7  100        0  100        0  100        0  100        1
8     100        0   22        7  100        0  100        0  100        0  100        1
9     100        0   22        7  100        0  100        0  100        0  100        1
10    10

 17%|█▋        | 1/6 [02:34<12:51, 154.40s/it]

imitate
take red hot pepper from counter
imitate
prepare meal


 33%|███▎      | 2/6 [02:35<07:14, 108.56s/it]

imitate
eat meal
imitate
eat meal


100%|██████████| 6/6 [12:53<00:00, 142.55s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  35
         0_st     0_sc 1_st     1_sc 2_st     2_sc     3_st     3_sc 4_st     4_sc 5_st     5_sc
0         100        0    3        3  100        1      100        0  100        0  100        0
1         100        0    3        3  100        1      100        0  100        0  100        0
2         100        0    3        3  100        1      100        0  100        0  100        0
3         100        0    3        3  100        1      100        0  100        0  100        0
4         100        0    3        3  100        1      100        0  100        0  100        0
5         100        0    3        3  100        1      100        0  100        0  100        0
6         100        0    3        3  100        1      100        0  100        0  100        0
7         100        0    3        3  100        1       37        0  100        0  100        0
8         100        0    3        3  100        1      100        0  100        0  100        0
9         100    

100%|██████████| 6/6 [15:00<00:00, 150.51s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  36
        0_st     0_sc 1_st     1_sc 2_st     2_sc 3_st     3_sc     4_st     4_sc 5_st     5_sc
0        100        0  100        0  100        0  100        0      100        0  100        1
1        100        0  100        1  100        0  100        0       54        0  100        1
2        100        0  100        1  100        0  100        0      100        0  100        1
3        100        0  100        1  100        0  100        0      100        0  100        1
4         22        0  100        0  100        0  100        0      100        0  100        1
5        100        0  100        1  100        0  100        0      100        0  100        1
6        100        0  100        1  100        0  100        0      100        0  100        1
7        100        0  100        1  100        0  100        0      100        0  100        1
8        100        0  100        1  100        0  100        0      100        0  100        1
9        100        0  100 

 50%|█████     | 3/6 [07:20<07:38, 152.82s/it]

imitate
take red apple from counter
imitate
take red onion from fridge
imitate
take yellow potato from counter
imitate
cook red apple with oven
imitate
cook yellow potato with stove
imitate
drop red onion
imitate
take knife from counter
imitate
slice red apple with knife
imitate
drop knife
imitate
take red onion
imitate
drop yellow potato
imitate
take knife
imitate
slice red onion with knife
imitate
drop knife
imitate
take yellow potato
imitate
drop red apple
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red apple
imitate
prepare meal


 67%|██████▋   | 4/6 [07:37<03:43, 111.99s/it]

imitate
eat meal
imitate
eat meal
imitate
take red hot pepper from counter
imitate
prepare meal


 83%|████████▎ | 5/6 [07:38<01:18, 78.80s/it] 

imitate
eat meal
imitate
eat meal


100%|██████████| 6/6 [10:29<00:00, 106.45s/it]
  0%|          | 0/6 [00:00<?, ?it/s]



Epoch:  37
     0_st     0_sc 1_st     1_sc   2_st     2_sc 3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        0  100        0    100        0   22       10    3        3  100        0
1     100        0  100        0    100        1   22       10    3        3  100        0
2     100        0  100        0    100        0   22       10    3        3  100        0
3     100        1  100        0    100        0   22       10    3        3  100        0
4     100        0  100        0    100        0   22       10    3        3  100        0
5     100        0  100        0    100        1   22       10    3        3  100        0
6     100        0  100        0     87        1   22       10    3        3  100        0
7     100        0  100        0     70        1   22       10    3        3  100        0
8     100        0  100        0    100        1   22       10    3        3  100        0
9     100        0  100        0    100        0   22       10    3        3 

 83%|████████▎ | 5/6 [13:15<02:34, 154.04s/it]

imitate
take red apple from counter
imitate
take red onion from fridge
imitate
take yellow potato from counter
imitate
cook red apple with oven
imitate
cook yellow potato with stove
imitate
drop red onion
imitate
take knife from counter
imitate
slice red apple with knife
imitate
drop knife
imitate
take red onion
imitate
drop yellow potato
imitate
take knife
imitate
slice red onion with knife
imitate
drop knife
imitate
take yellow potato
imitate
drop red apple
imitate
take knife
imitate
slice yellow potato with knife
imitate
drop knife
imitate
take red apple
imitate
prepare meal


100%|██████████| 6/6 [13:32<00:00, 112.88s/it]

imitate
eat meal
imitate
eat meal



  0%|          | 0/6 [00:00<?, ?it/s]

tw-training-v0 closed
tw-training-v0 closed
tw-training-v0 closed
tw-training-v0 closed
tw-training-v0 closed
tw-training-v0 closed
tw-training-v0 closed


Process Process-3:


tw-training-v0 closed
tw-training-v0 closed
tw-training-v0 closed
tw-training-v0 closed
tw-training-v0 closed
tw-training-v0 closed


Process Process-1:
Process Process-16:


tw-training-v0 closed
tw-training-v0 closed


Process Process-13:


tw-training-v0 closed


Process Process-10:
Process Process-4:
Process Process-9:
Process Process-14:
Process Process-12:
Process Process-5:
Process Process-7:
Process Process-15:
Process Process-8:
Process Process-2:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
Traceback (most recent call last):
Traceback (most recent call last):
Process Process-6:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Process Process-11:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93,



Epoch:  38
     0_st     0_sc 1_st     1_sc 2_st     2_sc   3_st     3_sc 4_st     4_sc 5_st     5_sc
0     100        1  100        1  100        0     11        1  100        0   22       10
1     100        1  100        1  100        0      7        1  100        0   22       10
2     100        1  100        1  100        0     14        1  100        0   22       10
3     100        1  100        0  100        0      9        1  100        0   22       10
4     100        1  100        1  100        0     10        1  100        0   22       10
5     100        1  100        0  100        0     21        1  100        0   22       10
6     100        1  100        0  100        0      7        1  100        0   22       10
7     100        1  100        0  100        0      5        1  100        0   22       10
8     100        1  100        0  100        0      6        1  100        0   22       10
9     100        1  100        1  100        0      9        1  100        0 

KeyboardInterrupt: ignored

## Save models

In [0]:
!ls -lh '/gdrive/My Drive/saved_models'