### DeepPavlov sequence-to-sequence tutorial

In this tutorial we are going to implement a sequence-to-sequence model [[original paper]](https://arxiv.org/abs/1409.3215) in DeepPavlov.

Sequence-to-sequence is the concept of mapping an input sequence to a target sequence. Sequence-to-sequence models consist of two main components: an encoder and a decoder. The encoder encodes the input sequence into a dense representation, and the decoder uses this dense representation to generate the target sequence.

![sequence-to-sequence](img/seq2seq.png)

Here, the input sequence is ABC, and the special token <EOS\> (end of sequence) signals the decoder to start generating the target sequence WXYZ.
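
The control flow in the figure can be sketched in plain Python. This is only a toy illustration: `encode`, `greedy_decode`, `toy_step` and `TABLE` are hypothetical names, and a lookup table stands in for the trained networks.

```python
from typing import Callable, List, Tuple

EOS = "<EOS>"


def encode(tokens: List[str]) -> Tuple[str, ...]:
    # Stand-in for the encoder: the "dense representation" is just the input as a tuple.
    return tuple(tokens)


def greedy_decode(state, step: Callable, max_length: int = 10) -> List[str]:
    """Generate tokens one at a time, feeding each prediction back as the next input."""
    output = []
    prev = EOS  # as in the figure, <EOS> after the input starts the decoding
    for _ in range(max_length):
        prev, state = step(prev, state)
        if prev == EOS:  # the decoder signals the end of the target sequence
            break
        output.append(prev)
    return output


# Hypothetical "trained" decoder: previous token -> next token.
TABLE = {EOS: "W", "W": "X", "X": "Y", "Y": "Z", "Z": EOS}


def toy_step(prev: str, state):
    # A real decoder cell would also update the state; here it is passed through.
    return TABLE[prev], state


decoded = greedy_decode(encode(["A", "B", "C"]), toy_step)
print(decoded)
```

The real model replaces `toy_step` with a recurrent cell and a projection onto the vocabulary, but the feed-the-prediction-back loop stays the same.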

To build this model in DeepPavlov we have to implement several DeepPavlov abstractions:
* **DatasetReader** to read the data
* **DatasetIterator** to generate batches
* **Vocabulary** to convert words to indexes
* **Model** to train it and then use it
* and some other components for pre- and postprocessing

In [1]:
%load_ext autoreload
%autoreload 2

import deeppavlov
import json
import numpy as np
import tensorflow as tf

from itertools import chain
from pathlib import Path

### Download & extract dataset

In [2]:
from deeppavlov.core.data.utils import download_decompress
download_decompress('http://files.deeppavlov.ai/datasets/personachat_v2.tar.gz', './personachat')

2018-10-16 18:40:50.275 DEBUG in 'urllib3.connectionpool'['connectionpool'] at line 205: Starting new HTTP connection (1): files.deeppavlov.ai:80
2018-10-16 18:40:51.615 DEBUG in 'urllib3.connectionpool'['connectionpool'] at line 393: http://files.deeppavlov.ai:80 "GET /datasets/personachat_v2.tar.gz HTTP/1.1" 200 223217972
2018-10-16 18:40:51.615 INFO in 'deeppavlov.core.data.utils'['utils'] at line 62: Downloading from http://files.deeppavlov.ai/datasets/personachat_v2.tar.gz to personachat\personachat_v2.tar.gz
100%|████████████████████████████████████████████████████████████████████████████████| 223M/223M [04:03<00:00, 916kB/s]
2018-10-16 18:44:55.312 INFO in 'deeppavlov.core.data.utils'['utils'] at line 200: Extracting personachat\personachat_v2.tar.gz archive into personachat


```python
def download_decompress(url: str, download_path: [Path, str], extract_paths=None):
    """
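    Download an archive (.tar.gz/.gz/.zip) and extract it. The extracted files can be copied into several folders.
    Args:
        url: URL to download from
        download_path: path where the downloaded file is stored
        extract_paths: path (or list of paths) where the extracted files are stored
    """
    file_name = Path(urlparse(url).path).name  # archive file name taken from the URL
    download_path = Path(download_path)  # normalize to a Path object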

    if extract_paths is None:
        extract_paths = [download_path]
    elif isinstance(extract_paths, list):  # a list of paths was given
        extract_paths = [Path(path) for path in extract_paths]
    else:
        extract_paths = [Path(extract_paths)]

    cache_dir = os.getenv('DP_CACHE_DIR') 
    extracted = False
    if cache_dir:
        cache_dir = Path(cache_dir)
        url_hash = md5(url.encode('utf8')).hexdigest()[:15]
        arch_file_path = cache_dir / url_hash
        extracted_path = cache_dir / (url_hash + '_extracted')
        extracted = extracted_path.exists()
        if not extracted and not arch_file_path.exists():
            simple_download(url, arch_file_path)  # download into the cache
    else:
        arch_file_path = download_path / file_name
        simple_download(url, arch_file_path)  # download into the target folder
        extracted_path = extract_paths.pop()

    if not extracted:
        log.info('Extracting {} archive into {}'.format(arch_file_path, extracted_path)) 
        extracted_path.mkdir(parents=True, exist_ok=True) 

        
        # choose the extraction method by archive type
        if file_name.endswith('.tar.gz'):
            untar(arch_file_path, extracted_path)
        elif file_name.endswith('.gz'):
            ungzip(arch_file_path, extracted_path / Path(file_name).with_suffix('').name)
        elif file_name.endswith('.zip'):
            with zipfile.ZipFile(arch_file_path, 'r') as zip_ref: 
                zip_ref.extractall(extracted_path)
        else:
            raise RuntimeError(f'Trying to extract an unknown type of archive {file_name}')

        if not cache_dir:
            arch_file_path.unlink()

    for extract_path in extract_paths:
        for src in extracted_path.iterdir():
            dest = extract_path / src.name
            if src.is_dir():
                copytree(src, dest)
            else:
                extract_path.mkdir(parents=True, exist_ok=True)
                shutil.copy(str(src), str(dest))
```
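
Two details of the listing above are easy to miss: the cache file name is derived from the URL, and the suffix checks must run in a fixed order because `.tar.gz` also ends with `.gz`. A minimal sketch of both, where `cache_paths` and `archive_kind` are hypothetical helper names and not part of DeepPavlov:

```python
from hashlib import md5
from pathlib import Path
from urllib.parse import urlparse


def cache_paths(url: str, cache_dir: str):
    """Derive the cached archive path and the extraction path from the URL, as the listing does."""
    url_hash = md5(url.encode('utf8')).hexdigest()[:15]  # first 15 hex digits of the MD5 digest
    cache = Path(cache_dir)
    return cache / url_hash, cache / (url_hash + '_extracted')


def archive_kind(url: str) -> str:
    """Return the archive suffix that decides how the file is extracted."""
    name = Path(urlparse(url).path).name
    for suffix in ('.tar.gz', '.gz', '.zip'):  # '.tar.gz' must be checked before '.gz'
        if name.endswith(suffix):
            return suffix
    raise RuntimeError(f'Trying to extract an unknown type of archive {name}')


url = 'http://files.deeppavlov.ai/datasets/personachat_v2.tar.gz'
arch_path, extracted_path = cache_paths(url, './dp_cache')  # './dp_cache' is an example folder
print(arch_path.name, extracted_path.name, archive_kind(url))
```

Because the hash depends only on the URL, a second call with the same URL and the same `DP_CACHE_DIR` finds the existing archive or extracted folder and skips the download.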

### DatasetReader

DatasetReader is used to read and parse data from files. Here, we define new PersonaChatDatasetReader which reads [PersonaChat dataset](https://arxiv.org/abs/1801.07243). PersonaChat dataset consists of dialogs and user personalities.

User personality is described by four sentences, e.g.:

    i like to remodel homes.
    i like to go hunting.
    i like to shoot a bow.
    my favorite holiday is halloween.

In [3]:
from deeppavlov.core.commands.train import build_model_from_config
from deeppavlov.core.data.dataset_reader import DatasetReader
from deeppavlov.core.data.utils import download_decompress
from deeppavlov.core.common.registry import register

@register('personachat_dataset_reader')  # decorator that registers the class under this name
class PersonaChatDatasetReader(DatasetReader):  # the base class is the abstract dataset reader
    """
    Reads and parses the downloaded PersonaChat data.

    The parsed data is a list of dictionaries with the following keys:
    [{
        'persona': [list of persona sentences],
        'x': input utterance,
        'y': output utterance,
        'dialog_history': list of previous utterances
        'candidates': [list of candidate utterances]
        'y_idx': index of y utt in candidates list
      },
       ...
    ]
    """
    def read(self, dir_path: str, mode='self_original'):
        dir_path = Path(dir_path)
        dataset = {}
        for dt in ['train', 'valid', 'test']:
            dataset[dt] = self._parse_data(dir_path / '{}_{}.txt'.format(dt, mode))

        return dataset

    @staticmethod
    def _parse_data(filename):
        examples = []
        print(filename)
        curr_persona = []
        curr_dialog_history = []
        persona_done = False
        with filename.open('r') as fin:
            for line in fin:
                line = ' '.join(line.strip().split(' ')[1:])
                your_persona_pref = 'your persona: '
                if line[:len(your_persona_pref)] == your_persona_pref and persona_done:
                    curr_persona = [line[len(your_persona_pref):]]
                    curr_dialog_history = []
                    persona_done = False
                elif line[:len(your_persona_pref)] == your_persona_pref:
                    curr_persona.append(line[len(your_persona_pref):])
                else:
                    persona_done = True
                    x, y, _, candidates = line.split('\t')
                    candidates = candidates.split('|')
                    example = {
                        'persona': curr_persona,
                        'x': x,
                        'y': y,
                        'dialog_history': curr_dialog_history[:],
                        'candidates': candidates,
                        'y_idx': candidates.index(y)
                    }
                    curr_dialog_history.extend([x, y])
                    examples.append(example)

        return examples

```python
def register(name: str = None) -> type:
    """
    Stores the decorated class in the registry under the given name.
    """
    def decorate(model_cls: type, reg_name: str = None) -> type:
        model_name = reg_name or short_name(model_cls)
        global _REGISTRY
        cls_name = model_cls.__module__ + ':' + model_cls.__name__
        if model_name in _REGISTRY and _REGISTRY[model_name] != cls_name:
            logger.warning('Registry name "{}" has been already registered and will be overwritten.'.format(model_name))
        _REGISTRY[model_name] = cls_name
        return model_cls

    return lambda model_cls_name: decorate(model_cls_name, name)
```

---

```python
class DatasetReader:
    """An abstract class for reading datasets."""

    def read(self, data_path: str, *args, **kwargs) -> Dict[str, List[Tuple[Any, Any]]]:
        """Reads a file from a path and returns data as a list of tuples of inputs and correct outputs
         for every data type in ``train``, ``valid`` and ``test``.
        """
        raise NotImplementedError
```

In [4]:
data = PersonaChatDatasetReader().read('./personachat')  # read and parse the data

personachat\train_self_original.txt
personachat\valid_self_original.txt
personachat\test_self_original.txt


#### Let's check dataset size

In [16]:
for k in data:
    print(k, len(data[k]))

train 65719
valid 7801
test 7512


In [33]:
data['train'][0]

{'persona': ['i like to remodel homes.',
  'i like to go hunting.',
  'i like to shoot a bow.',
  'my favorite holiday is halloween.'],
 'x': 'hi , how are you doing ? i am getting ready to do some cheetah chasing to stay in shape .',
 'y': 'you must be very fast . hunting is one of my favorite hobbies .',
 'dialog_history': [],
 'candidates': ['my mom was single with 3 boys , so we never left the projects .',
  'i try to wear all black every day . it makes me feel comfortable .',
  'well nursing stresses you out so i wish luck with sister',
  'yeah just want to pick up nba nfl getting old',
  'i really like celine dion . what about you ?',
  'no . i live near farms .',
  'i wish i had a daughter , i am a boy mom . they are beautiful boys though still lucky',
  'yeah when i get bored i play gone with the wind my favorite movie .',
  'hi how are you ? i am eating dinner with my hubby and 2 kids .',
  'were you married to your high school sweetheart ? i was .',
  'that is great to hear !

In [21]:
data['train'][0].keys()

dict_keys(['persona', 'x', 'y', 'dialog_history', 'candidates', 'y_idx'])
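
To make the parsing rules of `_parse_data` concrete, the sketch below applies the same logic to a few made-up lines. The sample lines are hypothetical and only follow the layout that the reader code implies: a leading line number, persona lines prefixed with `your persona: `, and dialog lines with four tab-separated fields whose last field lists candidates separated by `|`.

```python
from typing import Dict, List

PERSONA_PREFIX = 'your persona: '


def parse_lines(lines: List[str]) -> List[Dict]:
    examples, persona, history = [], [], []
    persona_done = False
    for raw in lines:
        line = raw.strip().split(' ', 1)[1]  # drop the leading line number
        if line.startswith(PERSONA_PREFIX):
            if persona_done:  # a persona line after dialog lines starts a new dialog
                persona, history, persona_done = [], [], False
            persona.append(line[len(PERSONA_PREFIX):])
        else:
            persona_done = True
            x, y, _, candidates = line.split('\t')
            candidates = candidates.split('|')
            examples.append({
                'persona': list(persona),
                'x': x,
                'y': y,
                'dialog_history': list(history),  # utterances before this turn
                'candidates': candidates,
                'y_idx': candidates.index(y),
            })
            history.extend([x, y])
    return examples


sample = [
    "1 your persona: i like to go hunting.",
    "2 your persona: i like to shoot a bow.",
    "3 hi , how are you ?\ti am fine .\t\tno . i live near farms .|i am fine .",
    "4 what do you do ?\ti hunt .\t\ti hunt .|that is great !",
    "1 your persona: i love cats.",
    "2 hello\thi there\t\thi there|bye",
]
parsed = parse_lines(sample)
print(len(parsed), parsed[1]['dialog_history'])
```

Note that `dialog_history` grows by two utterances per turn and is reset whenever a new persona block begins.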

### Dataset iterator

Dataset iterator is used to generate batches from parsed dataset (DatasetReader). Let's extract only *x* and *y* from parsed dataset and use them to predict sentence *y* by sentence *x*.

In [34]:
from deeppavlov.core.data.data_learning_iterator import DataLearningIterator

@register('personachat_iterator')
class PersonaChatIterator(DataLearningIterator):  # the base class makes the data iterable (and batchable)
    def split(self, *args, **kwargs):
        for dt in ['train', 'valid', 'test']:
            setattr(self, dt, self._to_tuple(getattr(self, dt)))
            # replace the 'train', 'valid' and 'test' attributes with lists of (x, y) pairs

    @staticmethod
    def _to_tuple(data):
        """
        Convert each example into an (x, y) tuple
        """
        return list(map(lambda x: (x['x'], x['y']), data))

```python
@register('data_learning_iterator')
class DataLearningIterator:
    """Dataset iterator that holds the train, valid and test splits and generates batches from them.

    Args:
        data: list of (x, y) pairs for every data type in ``'train'``, ``'valid'`` and ``'test'``
        seed: random seed for data shuffling
        shuffle: whether to shuffle data during batching

    Attributes:
        shuffle: whether to shuffle data during batching
        random: instance of ``Random`` initialized with a seed
    """
    def split(self, *args, **kwargs):
        pass

    def __init__(self, data: Dict[str, List[Tuple[Any, Any]]], seed: int = None, shuffle: bool = True,
                 *args, **kwargs) -> None:
        self.shuffle = shuffle

        self.random = Random(seed)

        self.train = data.get('train', [])
        self.valid = data.get('valid', [])
        self.test = data.get('test', [])
        self.split(*args, **kwargs)  # call split() so that subclasses can transform the data
        self.data = {
            'train': self.train,
            'valid': self.valid,
            'test': self.test,
            'all': self.train + self.test + self.valid
        }

    def gen_batches(self, batch_size: int, data_type: str = 'train',
                    shuffle: bool = None) -> Iterator[Tuple[tuple, tuple]]:
        """Generate batches of inputs and expected output to train neural networks

        Args:
            batch_size: number of samples in batch
            data_type: can be either 'train', 'test', or 'valid'
            shuffle: whether to shuffle dataset before batching

        Yields:
             a tuple of a batch of inputs and a batch of expected outputs
        """
        if shuffle is None:
            shuffle = self.shuffle

        data = self.data[data_type]  # pick the train, valid or test split
        data_len = len(data)  # number of examples

        if data_len == 0:
            return

        order = list(range(data_len))  # indices of all examples
        if shuffle:
            self.random.shuffle(order)  # shuffle the index order

        if batch_size < 0:
            batch_size = data_len  # a negative batch size means one batch with all the data

        for i in range((data_len - 1) // batch_size + 1):  # number of batches, rounded up
            # transpose the list of (x, y) pairs into a (batch of x, batch of y) tuple
            yield tuple(zip(*[data[o] for o in order[i * batch_size:(i + 1) * batch_size]]))

    def get_instances(self, data_type: str = 'train') -> Tuple[tuple, tuple]:
        """Get all data for a selected data type

        Args:
            data_type (str): can be either ``'train'``, ``'test'``, ``'valid'`` or ``'all'``

        Returns:
             a tuple of all inputs for a data type and all expected outputs for a data type
        """
        data = self.data[data_type]
        return tuple(zip(*data))

```
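
The two tricks in `gen_batches` are the rounded-up batch count and the `zip(*...)` transposition. A minimal standalone sketch of just that logic, using a free function and a fixed seed as illustrative choices:

```python
from random import Random
from typing import Iterator, List, Tuple


def gen_batches(data: List[Tuple[str, str]], batch_size: int,
                shuffle: bool = True, seed: int = 42) -> Iterator[Tuple[tuple, tuple]]:
    order = list(range(len(data)))
    if shuffle:
        Random(seed).shuffle(order)
    n_batches = (len(data) - 1) // batch_size + 1  # ceiling division; 0 for empty data
    for i in range(n_batches):
        chunk = [data[o] for o in order[i * batch_size:(i + 1) * batch_size]]
        yield tuple(zip(*chunk))  # [(x0, y0), (x1, y1)] -> ((x0, x1), (y0, y1))


pairs = [(f'x{i}', f'y{i}') for i in range(7)]
batches = list(gen_batches(pairs, batch_size=3, shuffle=False))
print(batches)
```

With 7 examples and a batch size of 3, the last batch holds the single remaining example.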

Let's look at the data in batches:

In [49]:
data['train'][0]['x']

'hi , how are you doing ? i am getting ready to do some cheetah chasing to stay in shape .'

In [45]:
iterator = PersonaChatIterator(data)
batch = [el for el in iterator.gen_batches(5, 'train')][0]  # first batch of size 5 (shuffling is on by default)
for x, y in zip(*batch):
    print('x:', x)
    print('y:', y)
    print('----------')

x: can you stenotype as fast as you can talk ?
y: faster . the keyboard layout is easier in my opinion .
----------
x: he must be really good at gaming .
y: it does not take much to be better than me , but yea he loves it .
----------
x: hello there ! how are you ?
y: i am looking for love , i will never stop . was the youngest of eight kids did not get enough love .
----------
x: oops , going . are you talking about the hurricane ?
y: yeh . i am a police officer n on duty at midnight . where do you live ?
----------
x: i do not have any yet
y: well i am told 45 to 50 . how old are you
----------


### Tokenizer

Tokenizer is used to extract tokens from utterance.

In [50]:
from deeppavlov.models.tokenizers.lazy_tokenizer import LazyTokenizer
tokenizer = LazyTokenizer()
tokenizer(['Hello my friend'])

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\JungHyun\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\JungHyun\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package perluniprops to
[nltk_data]     C:\Users\JungHyun\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping misc\perluniprops.zip.
[nltk_data] Downloading package nonbreaking_prefixes to
[nltk_data]     C:\Users\JungHyun\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\nonbreaking_prefixes.zip.


[['Hello', 'my', 'friend']]

```python
@register('lazy_tokenizer')
class LazyTokenizer(Component):
    """Tokenizes utterances with the NLTK tokenizer."""
    def __init__(self, **kwargs):
        pass

    @overrides
    def __call__(self, batch, *args, **kwargs):
        if len(batch) > 0 and isinstance(batch[0], str):
            batch = [word_tokenize(utt) for utt in batch]  # word_tokenize from NLTK
        return batch
```

### Vocabulary

Vocabulary prepares mapping from tokens to token indexes. It uses train data to build this mapping.

We will implement DialogVocab (inherited from SimpleVocabulary), which adds all tokens from the *x* and *y* utterances to the vocabulary.

In [54]:
from deeppavlov.core.data.simple_vocab import SimpleVocabulary

@register('dialog_vocab')
class DialogVocab(SimpleVocabulary):
    def fit(self, *args):
        tokens = chain(*args)
        super().fit(tokens)

    def __call__(self, batch, **kwargs):
        indices_batch = []
        for utt in batch:
            tokens = [self[token] for token in utt]
            indices_batch.append(tokens)
        return indices_batch



```python
@register('simple_vocab')
class SimpleVocabulary(Estimator):  # Estimator is an abstract class (with a fit method) derived from Component and Serializable
    """Builds a vocabulary."""
    def __init__(self, 
                 special_tokens=tuple(), 
                 default_token=None,
                 max_tokens=2**30,
                 min_freq=0,
                 pad_with_zeros=False,
                 unk_token=None,
                 *args,
                 **kwargs):
        
        super().__init__(**kwargs)
        self.special_tokens = special_tokens # PAD, BOS, EOS, UNK
        self.default_token = default_token
        self._max_tokens = max_tokens
        self._min_freq = min_freq  # minimal token frequency
        self._pad_with_zeros = pad_with_zeros  # True or False
        self.unk_token = unk_token  # UNK
        self.reset()
        if self.load_path:
            self.load()
    
    def fit(self, *args):
        # build the vocabulary from tokens
        self.reset()
        tokens = chain(*args)
        # filter(None, <>) -- to filter empty tokens
        self.freqs = Counter(filter(None, chain(*tokens)))
        # register special tokens in the vocab (t2i/i2t) first
        for special_token in self.special_tokens:
            self._t2i[special_token] = self.count # token to index
            self._i2t.append(special_token) # index to token
            self.count += 1
        # then register regular tokens
        for token, freq in self.freqs.most_common()[:self._max_tokens]:
            if freq >= self._min_freq:
                self._t2i[token] = self.count
                self._i2t.append(token)
                self.count += 1

    def _add_tokens_with_freqs(self, tokens, freqs):
        self.freqs = Counter()
        self.freqs.update(dict(zip(tokens, freqs)))
        # skip tokens rarer than min_freq and build t2i and i2t
        for token, freq in zip(tokens, freqs):
            if freq >= self._min_freq or token in self.special_tokens:
                self._t2i[token] = self.count
                self._i2t.append(token)
                self.count += 1

    def __call__(self, batch, **kwargs):
        indices_batch = []
        for sample in batch:
            indices_batch.append([self[token] for token in sample])
        if self._pad_with_zeros and self.is_str_batch(batch):
            indices_batch = zero_pad(indices_batch)
        return indices_batch

    def save(self):
        # save every token together with its count
        log.info("[saving vocabulary to {}]".format(self.save_path))
        with self.save_path.open('wt', encoding='utf8') as f:
            for n in range(len(self)):
                token = self._i2t[n]
                cnt = self.freqs[token]
                f.write('{}\t{:d}\n'.format(token, cnt))

    def load(self):
        self.reset()
        # load the saved vocabulary
        if self.load_path:
            if self.load_path.is_file():
                log.info("[loading vocabulary from {}]".format(self.load_path))
                tokens, counts = [], []
                for ln in self.load_path.open('r', encoding='utf8'):
                    token, cnt = ln.split('\t', 1)
                    tokens.append(token)
                    counts.append(int(cnt))
                self._add_tokens_with_freqs(tokens, counts)
            elif isinstance(self.load_path, Path):
                if not self.load_path.parent.is_dir():
                    raise ConfigError("Provided `load_path` for {} doesn't exist!".format(
                        self.__class__.__name__))
        else:
            raise ConfigError("`load_path` for {} is not provided!".format(self))

    @property
    def len(self):
        return len(self)

    def keys(self):
        return (self[n] for n in range(self.len))

    def values(self):
        return list(range(self.len))

    def items(self):
        return zip(self.keys(), self.values())

    def __getitem__(self, key):
        if isinstance(key, (int, np.integer)):
            return self._i2t[key]
        elif isinstance(key, str):
            return self._t2i[key]
        else:
            raise NotImplementedError("not implemented for type `{}`".format(type(key)))

    def __contains__(self, item):
        return item in self._t2i

    def __len__(self):
        return len(self._i2t)

    def is_str_batch(self, batch):
        # check whether the items of the batch are strings
        if not self.is_empty(batch):
            non_empty = [item for item in batch if len(item) > 0]
            if isinstance(non_empty[0], str) or isinstance(non_empty[0][0], str):
                return True
            elif isinstance(non_empty[0][0], (int, np.integer)):
                return False
            else:
                raise RuntimeError(f'The elements passed to the vocab are not strings '
                                   f'or integers! But they are {type(non_empty[0][0])}')
        else:
            return False

    def reset(self):
        # default index is the position of default_token
        if self.default_token is not None:
            default_ind = self.special_tokens.index(self.default_token)
        else:
            default_ind = 0
        self.freqs = None
        unk_index = 0
        if self.unk_token in self.special_tokens:
            unk_index = self.special_tokens.index(self.unk_token)
        self._t2i = defaultdict(lambda: unk_index)
        self._i2t = []
        self.count = 0

    @staticmethod
    def is_empty(batch):
        non_empty = [item for item in batch if len(item) > 0]
        return len(non_empty) == 0

```

---

```python
class Estimator(Component, Serializable):
    """An abstract class for components that can be fitted."""
    @abstractmethod
    def fit(self, *args, **kwargs):
        pass

```

Let's create an instance of DialogVocab. We define the save and load paths, the minimal frequency of tokens that are added to the vocabulary, and the set of special tokens.

Special tokens are:
* <PAD\> - padding
* <BOS\> - beginning of sequence
* <EOS\> - end of sequence
* <UNK\> - unknown token, i.e. a token that is not present in the vocabulary

Then we fit it on tokens from *x* and *y*.

In [55]:
vocab = DialogVocab(
    save_path='./vocab.dict',
    load_path='./vocab.dict',
    min_freq=2,
    special_tokens=('<PAD>','<BOS>', '<EOS>', '<UNK>',),
    unk_token='<UNK>'
)

vocab.fit(tokenizer(iterator.get_instances(data_type='train')[0]), tokenizer(iterator.get_instances(data_type='train')[1]))
vocab.save()

2018-10-16 22:52:40.802 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 86: [saving vocabulary to C:\Users\JungHyun\Anaconda3\envs\tensorflow\Lib\site-packages\download\vocab.dict]


Top 10 most frequent tokens in train dataset:

In [56]:
vocab.freqs.most_common(10)  # freqs is a Counter object

[('i', 103487),
 ('.', 101599),
 ('you', 48296),
 ('?', 43771),
 (',', 39500),
 ('a', 34214),
 ('to', 32105),
 ('do', 30574),
 ('is', 28579),
 ('my', 26953)]

Number of tokens in vocabulary:

In [57]:
len(vocab)

11595

Let's use the built vocabulary to encode a tokenized sentence.

In [64]:
vocab([['<BOS>', 'hello', 'my', 'friend', 'there_is_no_such_word_in_dataset', 'and_this', '<EOS>', '<PAD>']])

[[1, 70, 13, 240, 3, 3, 2, 0]]
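
The same behavior can be reproduced in miniature without DeepPavlov. The sketch below is a simplified stand-in and not the library class: `ToyVocab` is a hypothetical name, it uses a plain dict with `.get` instead of a `defaultdict`, and it has no saving or loading.

```python
from collections import Counter
from itertools import chain
from typing import Dict, List, Sequence


class ToyVocab:
    def __init__(self, special_tokens: Sequence[str], unk_token: str, min_freq: int = 0):
        self.special_tokens = tuple(special_tokens)
        self.unk_index = self.special_tokens.index(unk_token)
        self.min_freq = min_freq
        self.t2i: Dict[str, int] = {}
        self.i2t: List[str] = []

    def _add(self, token: str) -> None:
        self.t2i[token] = len(self.i2t)
        self.i2t.append(token)

    def fit(self, *token_batches) -> None:
        # Each argument is a list of tokenized utterances (here: all x and all y).
        freqs = Counter(chain.from_iterable(chain(*token_batches)))
        for token in self.special_tokens:  # special tokens get the smallest indexes
            self._add(token)
        for token, freq in freqs.most_common():
            if freq >= self.min_freq and token not in self.t2i:
                self._add(token)

    def encode(self, batch: List[List[str]]) -> List[List[int]]:
        # Unknown tokens fall back to the index of the UNK token.
        return [[self.t2i.get(tok, self.unk_index) for tok in utt] for utt in batch]

    def __len__(self) -> int:
        return len(self.i2t)


toy_vocab = ToyVocab(('<PAD>', '<BOS>', '<EOS>', '<UNK>'), unk_token='<UNK>', min_freq=2)
x_tokens = [['hello', 'my', 'friend'], ['hello', 'there']]
y_tokens = [['hi', 'my', 'friend'], ['hello']]
toy_vocab.fit(x_tokens, y_tokens)
encoded = toy_vocab.encode([['<BOS>', 'hello', 'unseen', '<EOS>']])
print(len(toy_vocab), encoded)
```

Tokens seen fewer than `min_freq` times (`there` and `hi` here) never enter the vocabulary, so at encoding time they are treated like any other unknown word.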

### Padding

To feed sequences of token indexes to a neural model we should make their lengths equal. If a sequence is too short, we add <PAD\> symbols to its end. If a sequence is too long, we just cut it.

SentencePadder implements this behavior; it also adds <BOS\> and <EOS\> tokens.

In [51]:
from deeppavlov.core.models.component import Component

@register('sentence_padder')
class SentencePadder(Component):
    def __init__(self, length_limit, pad_token_id=0, start_token_id=1, end_token_id=2, *args, **kwargs):
        self.length_limit = length_limit
        self.pad_token_id = pad_token_id
        self.start_token_id = start_token_id
        self.end_token_id = end_token_id

    def __call__(self, batch):
        for i in range(len(batch)):  # iterate over the sequences in the batch
            batch[i] = batch[i][:self.length_limit]  # cut each sequence to at most length_limit tokens
            batch[i] = [self.start_token_id] + batch[i] + [self.end_token_id]  # add start and end tokens
            batch[i] += [self.pad_token_id] * (self.length_limit + 2 - len(batch[i]))  # pad shorter sequences up to length_limit + 2
        return batch

```python
class Component(metaclass=ABCMeta):
    """An abstract class for pipeline components."""
    @abstractmethod
    def __call__(self, *args, **kwargs):
        pass

    def reset(self):
        pass

    def destroy(self):
        pass
```

In [65]:
padder = SentencePadder(length_limit=6)  # create a padder with a maximum length of 6
vocab(padder(vocab([['hello', 'my', 'friend', 'there_is_no_such_word_in_dataset', 'and_this']])))  # an example of length 5

[['<BOS>', 'hello', 'my', 'friend', '<UNK>', '<UNK>', '<EOS>', '<PAD>']]
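
Every padded sequence has length `length_limit + 2`, because the start and end tokens are added after cutting. A standalone sketch of the same arithmetic on token ids, where `pad_batch` is a hypothetical helper that, unlike `SentencePadder`, does not modify its input:

```python
from typing import List


def pad_batch(batch: List[List[int]], length_limit: int,
              pad_id: int = 0, start_id: int = 1, end_id: int = 2) -> List[List[int]]:
    padded = []
    for seq in batch:
        seq = [start_id] + list(seq[:length_limit]) + [end_id]  # cut, then wrap in <BOS>/<EOS>
        seq += [pad_id] * (length_limit + 2 - len(seq))  # fill up to length_limit + 2
        padded.append(seq)
    return padded


short = [70, 13, 240]
long = [5, 6, 7, 8, 9, 10, 11, 12]
out = pad_batch([short, long], length_limit=6)
print(out)
```

The short sequence gets three padding ids after <EOS\>, while the long one loses its last two tokens and needs no padding.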

### Seq2Seq Model
The model consists of two main components: an encoder and a decoder. We can implement them independently and then put them together in one Seq2Seq model.

#### Encoder
The encoder builds a hidden representation of the input sequence.

In [66]:
def encoder(inputs, inputs_len, embedding_matrix, cell_size, keep_prob=1.0):
    # inputs: tf.int32 tensor, [batch size x seq_len], each value is a token ID
    # inputs_len: tf.int32 tensor, [batch size]
    # embedding_matrix: tf.float32 tensor, [vocab_size x vocab_dim]
    # cell_size: hidden size of the cell (dimension of the hidden state)
    # keep_prob: dropout keep probability
    with tf.variable_scope('encoder'):
        # first of all we should embed every token in input sequence (use tf.nn.embedding_lookup, don't forget about dropout)
        x_emb = tf.nn.dropout(tf.nn.embedding_lookup(embedding_matrix, inputs), keep_prob=keep_prob)
        
        # a single GRU cell (an LSTM cell could be used instead)
        encoder_cell = tf.nn.rnn_cell.GRUCell(
                            num_units=cell_size,
                            kernel_initializer=tf.contrib.layers.xavier_initializer(),
                            name='encoder_cell')
        
        # use tf.nn.dynamic_rnn to encode input sequence, use actual length of input sequence
        encoder_outputs, encoder_state = tf.nn.dynamic_rnn(cell=encoder_cell, inputs=x_emb, sequence_length=inputs_len, dtype=tf.float32)
    return encoder_outputs, encoder_state

Check your encoder implementation. The outputs of the next cell should have shapes:

    32 x 10 x 100
    32 x 100

In [67]:
# This cell does not use real data: it builds 32 random sequences of maximum length 10
# (vocabulary of 100 tokens); the hidden state of the cell has dimension 100.

tf.reset_default_graph()
vocab_size = 100
hidden_dim = 100
inputs = tf.cast(tf.random_uniform(shape=[32, 10]) * vocab_size, tf.int32)  # [batch size x seq_len], batch: 32, seq_len: 10
# multiplying by vocab_size maps values from [0, 1) to [0, 100); then they are cast to integers
mask = tf.cast(tf.random_uniform(shape=[32, 10]) * 2, tf.int32)  # [batch size x seq_len]
# each value is 0 or 1
inputs_len = tf.reduce_sum(mask, axis=1)  # summing each row gives a [32] tensor of random lengths
embedding_matrix = tf.random_uniform(shape=[vocab_size, hidden_dim])  # embedding matrix

encoder(inputs, inputs_len, embedding_matrix, hidden_dim)

(<tf.Tensor 'encoder/rnn/transpose_1:0' shape=(32, 10, 100) dtype=float32>,
 <tf.Tensor 'encoder/rnn/while/Exit_3:0' shape=(32, 100) dtype=float32>)
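
Passing `sequence_length` to `tf.nn.dynamic_rnn` zeroes the outputs past each sequence's length and copies the last valid state through. The NumPy sketch below imitates this behavior. It is a toy under stated assumptions: a plain tanh RNN cell with random weights stands in for the GRU cell, and `toy_encoder` is a hypothetical name.

```python
import numpy as np


def toy_encoder(inputs, inputs_len, embedding_matrix, cell_size, seed=0):
    rng = np.random.default_rng(seed)
    batch_size, seq_len = inputs.shape
    emb_dim = embedding_matrix.shape[1]
    # Random weights of a plain tanh RNN cell (a stand-in for the trained GRU cell).
    w_x = rng.normal(scale=0.1, size=(emb_dim, cell_size))
    w_h = rng.normal(scale=0.1, size=(cell_size, cell_size))

    x_emb = embedding_matrix[inputs]  # embedding lookup: [batch, seq_len, emb_dim]
    state = np.zeros((batch_size, cell_size))
    outputs = np.zeros((batch_size, seq_len, cell_size))
    for t in range(seq_len):
        new_state = np.tanh(x_emb[:, t] @ w_x + state @ w_h)
        active = (t < inputs_len)[:, None]  # [batch, 1], True while inside the sequence
        state = np.where(active, new_state, state)  # finished sequences keep their state
        outputs[:, t] = np.where(active, new_state, 0.0)  # outputs past the length are zero
    return outputs, state


data_rng = np.random.default_rng(1)
toy_inputs = data_rng.integers(0, 100, size=(32, 10))
toy_lens = data_rng.integers(0, 11, size=32)
toy_embeddings = data_rng.random((100, 100))
toy_outputs, toy_state = toy_encoder(toy_inputs, toy_lens, toy_embeddings, cell_size=100)
print(toy_outputs.shape, toy_state.shape)
```

The returned state therefore summarizes only the real tokens of each sequence, not the padding.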

#### Decoder
The decoder uses the encoder outputs and the encoder state to produce the output sequence.

Here, you should:
* define your decoder_cell (GRU or LSTM)

This will be your baseline seq2seq model.


And, to improve the model:
* add Teacher Forcing
* add Attention Mechanism
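
The decoder code calls `dot_attention`, which is not defined in this part of the tutorial. As a rough idea of what such a function computes, here is a NumPy sketch of masked dot-product attention. It is an assumption, not the tutorial's implementation, which takes a `scope` argument and may therefore contain trainable projections.

```python
import numpy as np


def dot_attention_sketch(encoder_outputs, decoder_state, mask):
    # encoder_outputs: [batch, seq_len, hidden], decoder_state: [batch, hidden]
    # mask: [batch, seq_len], 1 for real tokens and 0 for padding
    scores = np.einsum('bsh,bh->bs', encoder_outputs, decoder_state)  # dot product per position
    scores = np.where(mask > 0, scores, -1e9)  # padded positions get a very low score
    scores = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over positions
    context = np.einsum('bs,bsh->bh', weights, encoder_outputs)  # weighted average of outputs
    return context, weights


att_rng = np.random.default_rng(0)
enc = att_rng.normal(size=(4, 5, 8))
dec = att_rng.normal(size=(4, 8))
att_mask = np.array([[1, 1, 1, 0, 0],
                     [1, 0, 0, 0, 0],
                     [1, 1, 1, 1, 1],
                     [1, 1, 0, 0, 0]])
context, weights = dot_attention_sketch(enc, dec, att_mask)
print(context.shape, weights.shape)
```

The context vector has the encoder's hidden size, which is why the decoder can concatenate it with the input token embedding.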

In [68]:
def decoder(encoder_outputs, encoder_state, embedding_matrix, mask,
            cell_size, max_length, y_ph,
            start_token_id=1, keep_prob=1.0,
            teacher_forcing_rate_ph=None,
            use_attention=False, is_train=True):
    # decoder
    # encoder_outputs: tf.float32 tensor, [batch size x seq_len x encoder_cell_size]
    # encoder_state: tf.float32 tensor, [batch size x encoder_cell_size]
    # embedding_matrix: tf.float32 tensor, [vocab_size x vocab_dim]
    # mask: tf.int32 tensor, [batch size x seq_len], masked positions are 0
    # cell_size: dimension of the hidden state
    # max_length: maximum length of the output
    # y_ph: tf.int32 tensor with target token IDs, [batch size x max_length], used for teacher forcing
    # start_token_id: id of the start token <BOS> in the vocab
    # keep_prob: dropout keep probability
    # teacher_forcing_rate_ph: probability of using teacher forcing at each step
    # use_attention: whether to use attention
    # is_train: whether the model is training; teacher forcing is not used at inference
    with tf.variable_scope('decoder'):
        # define decoder recurrent cell
        decoder_cell = tf.nn.rnn_cell.GRUCell(
                            num_units=cell_size,
                            kernel_initializer=tf.contrib.layers.xavier_initializer(),
                            name='decoder_cell')
        
        # initial value of output_token on previous step is start_token
        output_token = tf.ones(shape=(tf.shape(encoder_outputs)[0],), dtype=tf.int32) * start_token_id
        # shape [batch size], every value is start_token_id, so every example starts with the start token

        # the initial decoder state is the final encoder state
        decoder_state = encoder_state

        pred_tokens = []
        logits = []

        # use for loop to sequentially call recurrent cell
        for i in range(max_length):
            """
            TEACHER FORCING
            # here you can try to implement teacher forcing for your model
            # details about teacher forcing are explained further in tutorial
            
            # pseudo code:
            NOTE THAT FOLLOWING CONDITIONS SHOULD BE EVALUATED AT GRAPH RUNTIME
            use tf.cond and tf.logical operations instead of python if
            
            if i > 0 and is_train and random_value < teacher_forcing_rate_ph:
                input_token = y_ph[:, i-1] # feed the true previous token, not the predicted one, as the next input
            else:
                input_token = output_token

            input_token_emb = tf.nn.embedding_lookup(embedding_matrix, input_token)
            
            """
            if i > 0:
                input_token_emb = tf.cond(
                                      tf.logical_and(
                                          is_train,
                                          tf.random_uniform(shape=(), maxval=1) <= teacher_forcing_rate_ph 
                                          # use the true token when the random value does not exceed the teacher forcing rate
                                      ),
                                      lambda: tf.nn.embedding_lookup(embedding_matrix, y_ph[:, i-1]), # teacher forcing
                                      lambda: tf.nn.embedding_lookup(embedding_matrix, output_token)
                                      )
            else:
                input_token_emb = tf.nn.embedding_lookup(embedding_matrix, output_token) # at the first step, use the embedding of the start token

            """
            ATTENTION MECHANISM
            # here you can add attention to your model
            # you can find details about attention further in tutorial
            """            
            if use_attention: # use attention
                # compute attention and concat attention vector to input_token_emb
                att = dot_attention(encoder_outputs, decoder_state, mask, scope='att')
                # dot attention between the decoder state and the encoder outputs

                input_token_emb = tf.concat([input_token_emb, att], axis=-1) # concatenate the attention vector to the input embedding


            input_token_emb = tf.nn.dropout(input_token_emb, keep_prob=keep_prob) # apply dropout
            # call recurrent cell
            decoder_outputs, decoder_state = decoder_cell(input_token_emb, decoder_state)
            decoder_outputs = tf.nn.dropout(decoder_outputs, keep_prob=keep_prob)
            # project decoder output to embeddings dimension
            embeddings_dim = embedding_matrix.get_shape()[1]
            output_proj = tf.layers.dense(decoder_outputs, embeddings_dim, activation=tf.nn.tanh,
                                          kernel_initializer=tf.contrib.layers.xavier_initializer(),
                                          name='proj', reuse=tf.AUTO_REUSE)
            # compute logits
            output_logits = tf.matmul(output_proj, embedding_matrix, transpose_b=True) # output_proj x (embedding_matrix)^T

            logits.append(output_logits) 
            output_probs = tf.nn.softmax(output_logits)
            output_token = tf.argmax(output_probs, axis=-1)
            pred_tokens.append(output_token)

        y_pred_tokens = tf.transpose(tf.stack(pred_tokens, axis=0), [1, 0])
        y_logits = tf.transpose(tf.stack(logits, axis=0), [1, 0, 2])
    return y_pred_tokens, y_logits

The output of the next cell should have the shapes:

    32 x 10
    32 x 10 x 100

In [69]:
tf.reset_default_graph()
vocab_size = 100
hidden_dim = 100
inputs = tf.cast(tf.random_uniform(shape=[32, 10]) * vocab_size, tf.int32) # bs x seq_len
mask = tf.cast(tf.random_uniform(shape=[32, 10]) * 2, tf.int32) # bs x seq_len
inputs_len = tf.reduce_sum(mask, axis=1)
embedding_matrix = tf.random_uniform(shape=[vocab_size, hidden_dim])

teacher_forcing_rate = tf.random_uniform(shape=())
y = tf.cast(tf.random_uniform(shape=[32, 10]) * vocab_size, tf.int32)

encoder_outputs, encoder_state = encoder(inputs, inputs_len, embedding_matrix, hidden_dim)
decoder(encoder_outputs, encoder_state, embedding_matrix, mask, hidden_dim, max_length=10,
        y_ph=y, teacher_forcing_rate_ph=teacher_forcing_rate)

(<tf.Tensor 'decoder/transpose:0' shape=(32, 10) dtype=int64>,
 <tf.Tensor 'decoder/transpose_1:0' shape=(32, 10, 100) dtype=float32>)
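
The last step of the decoder loop reuses the embedding matrix as the output layer: the decoder output is projected to the embedding dimension and then compared with every token embedding. A minimal NumPy sketch of that step is given below. The projection weights `w_proj` and `b_proj` are random stand-ins for the learned `proj` dense layer, and all sizes are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, cell_size, emb_dim, vocab_size = 4, 6, 5, 11  # illustrative sizes

embedding_matrix = rng.uniform(-0.1, 0.1, size=(vocab_size, emb_dim))
decoder_output = rng.normal(size=(batch_size, cell_size))

# stand-in for the 'proj' dense layer (random here, learned in the real model)
w_proj = rng.normal(size=(cell_size, emb_dim))
b_proj = np.zeros(emb_dim)
output_proj = np.tanh(decoder_output @ w_proj + b_proj)  # bs x emb_dim

# dot product of the projected output with every token embedding
output_logits = output_proj @ embedding_matrix.T         # bs x vocab_size
# softmax is monotonic, so argmax over logits equals argmax over probabilities
output_token = output_logits.argmax(axis=-1)             # bs

print(output_logits.shape, output_token.shape)
```

Sharing the matrix this way means the output layer adds no separate vocabulary-sized weight matrix; only the small projection is extra.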

#### Model

The Seq2Seq model should inherit from the TFModel class and implement the following methods:
* train_on_batch - called during the training phase
* \_\_call\_\_ - called to make predictions

In [71]:
from deeppavlov.core.models.tf_model import TFModel

@register('seq2seq')
class Seq2Seq(TFModel):
    def __init__(self, **kwargs):
        # hyperparameters
        
        # dimension of word embeddings
        self.embeddings_dim = kwargs.get('embeddings_dim', 100)
        # size of recurrent cell in encoder and decoder
        self.cell_size = kwargs.get('cell_size', 200)
        # dropout keep_probability
        self.keep_prob = kwargs.get('keep_prob', 0.8)
        # learning rate
        self.learning_rate = kwargs.get('learning_rate', 3e-04)
        # max length of output sequence
        self.max_length = kwargs.get('max_length', 20)
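        # gradient clipping norm
        self.grad_clip = kwargs.get('grad_clip', 5.0)
        # vocabulary index of the token that starts decoding
        self.start_token_id = kwargs.get('start_token_id', 1)
        # number of tokens in the vocabulary
        self.vocab_size = kwargs.get('vocab_size', 11595)
        # probability of feeding the ground truth token to the decoder during training
        self.teacher_forcing_rate = kwargs.get('teacher_forcing_rate', 0.0)
        # whether the decoder uses attention over the encoder outputs
        self.use_attention = kwargs.get('use_attention', False)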
        
        # create tensorflow session to run computational graph in it
        self.sess_config = tf.ConfigProto(allow_soft_placement=True)
        self.sess_config.gpu_options.allow_growth = True
        self.sess = tf.Session(config=self.sess_config)
        
        self.init_graph()
        
        # define train op
        self.train_op = self.get_train_op(self.loss, self.lr_ph,
                                          optimizer=tf.train.AdamOptimizer,
                                          clip_norm=self.grad_clip) 
        # initialize graph variables
        self.sess.run(tf.global_variables_initializer())
        
        super().__init__(**kwargs)
        # load saved model if there is one
        if self.load_path is not None:
            self.load()
        
    def init_graph(self):
        # create placeholders
        self.init_placeholders()

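        # 0/1 masks over non-padding positions (assumes <PAD> has index 0 in the vocabulary)
        self.x_mask = tf.cast(tf.not_equal(self.x_ph, 0), tf.int32)
        self.y_mask = tf.cast(tf.not_equal(self.y_ph, 0), tf.int32)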
        
        self.x_len = tf.reduce_sum(self.x_mask, axis=1)  # length of each input sequence
        
        # create embeddings matrix for tokens
        self.embeddings = tf.Variable(tf.random_uniform((self.vocab_size, self.embeddings_dim), -0.1, 0.1, name='embeddings'), dtype=tf.float32)

        # encoder
        encoder_outputs, encoder_state = encoder(self.x_ph, self.x_len, self.embeddings, self.cell_size, self.keep_prob_ph)

        # decoder
        self.y_pred_tokens, y_logits = decoder(encoder_outputs, encoder_state, self.embeddings, self.x_mask,
                                                      self.cell_size, self.max_length,
                                                      self.y_ph, self.start_token_id, self.keep_prob_ph,
                                                      self.teacher_forcing_rate_ph, self.use_attention, self.is_train_ph)
        
        # loss
        self.y_ohe = tf.one_hot(self.y_ph, depth=self.vocab_size)
        self.y_mask = tf.cast(self.y_mask, tf.float32)  # cast to float for the arithmetic below
        self.loss = tf.nn.softmax_cross_entropy_with_logits(labels=self.y_ohe, logits=y_logits) * self.y_mask
        self.loss = tf.reduce_sum(self.loss) / tf.reduce_sum(self.y_mask)  # mask-weighted mean of the loss
    
    def init_placeholders(self):
        # placeholders for inputs
        self.x_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='x_ph')
        # y_ph stays in the computational graph at inference time because of teacher forcing,
        # so it gets a dummy default value that is never actually used for prediction
        self.y_ph = tf.placeholder_with_default(tf.zeros_like(self.x_ph), shape=(None,None), name='y_ph')

        # placeholders for model parameters
        self.lr_ph = tf.placeholder(dtype=tf.float32, shape=[], name='lr_ph')
        self.keep_prob_ph = tf.placeholder_with_default(1.0, shape=[], name='keep_prob_ph')
        self.is_train_ph = tf.placeholder_with_default(False, shape=[], name='is_train_ph')
        self.teacher_forcing_rate_ph = tf.placeholder_with_default(0.0, shape=[], name='teacher_forcing_rate_ph')
            
    def _build_feed_dict(self, x, y=None):
        feed_dict = {
            self.x_ph: x,
        }
        if y is not None:
            feed_dict.update({
                self.y_ph: y,
                self.lr_ph: self.learning_rate,
                self.keep_prob_ph: self.keep_prob,
                self.is_train_ph: True,
                self.teacher_forcing_rate_ph: self.teacher_forcing_rate,
            })
        return feed_dict
    
    def train_on_batch(self, x, y):
        feed_dict = self._build_feed_dict(x, y)
        loss, _ = self.sess.run([self.loss, self.train_op], feed_dict=feed_dict)
        return loss
    
    def __call__(self, x):
        feed_dict = self._build_feed_dict(x)
        y_pred = self.sess.run(self.y_pred_tokens, feed_dict=feed_dict)
        return y_pred
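
The loss in `init_graph` is a per-position cross-entropy multiplied by a target mask and divided by the mask sum. The NumPy sketch below shows the same computation with a 0/1 padding mask. It assumes that `<PAD>` has index 0, and the function name `masked_cross_entropy` is hypothetical, not part of DeepPavlov.

```python
import numpy as np

def masked_cross_entropy(logits, targets, pad_id=0):
    """Mean cross-entropy over non-padding target positions.

    logits: bs x seq_len x vocab_size, targets: bs x seq_len (token ids).
    """
    # numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # negative log-likelihood of the target token at every position
    nll = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    # 1.0 for real tokens, 0.0 for padding (pad_id is an assumption)
    mask = (targets != pad_id).astype(np.float64)
    return (nll * mask).sum() / mask.sum()

logits = np.zeros((2, 3, 5))                # uniform prediction over 5 tokens
targets = np.array([[2, 4, 0], [3, 0, 0]])  # 0 is assumed to be <PAD>
loss = masked_cross_entropy(logits, targets)
print(loss)  # log(5), about 1.609
```

With uniform logits every real position contributes log(5), so the padded positions visibly have no effect on the average.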

```python
class TFModel(NNModel, metaclass=TfModelMeta):
    """Parent class for all components using TensorFlow."""
    def __init__(self, *args, **kwargs) -> None:
        if not hasattr(self, 'sess'):
            raise RuntimeError('Your TensorFlow model {} must'
                               ' have sess attribute!'.format(self.__class__.__name__))
        super().__init__(*args, **kwargs)

    def load(self, exclude_scopes: Optional[Iterable] = ('Optimizer',)) -> None:
        """Load model parameters from self.load_path"""
        path = str(self.load_path.resolve())
        # Check presence of the model files
        if tf.train.checkpoint_exists(path):
            log.info('[loading model from {}]'.format(path))
            # Exclude optimizer variables from saved variables
            var_list = self._get_saveable_variables(exclude_scopes)
            saver = tf.train.Saver(var_list)
            saver.restore(self.sess, path)

    def save(self, exclude_scopes: Optional[Iterable] = ('Optimizer',)) -> None:
        """Save model parameters to self.save_path"""
        path = str(self.save_path.resolve())
        log.info('[saving model to {}]'.format(path))
        var_list = self._get_saveable_variables(exclude_scopes)
        saver = tf.train.Saver(var_list)
        saver.save(self.sess, path)

    @staticmethod
    def _get_saveable_variables(exclude_scopes=tuple()):
        all_vars = variables._all_saveable_objects()
        vars_to_train = [var for var in all_vars if all(sc not in var.name for sc in exclude_scopes)]
        return vars_to_train

    @staticmethod
    def _get_trainable_variables(exclude_scopes=tuple()):
        all_vars = tf.global_variables()
        vars_to_train = [var for var in all_vars if all(sc not in var.name for sc in exclude_scopes)]
        return vars_to_train

    def get_train_op(self,
                     loss,
                     learning_rate,
                     optimizer=None,
                     clip_norm=None,
                     learnable_scopes=None,
                     optimizer_scope_name=None):
        """ Get train operation for given loss

        Args:
            loss: loss, tf tensor or scalar
            learning_rate: scalar or placeholder
            clip_norm: clip gradients norm by clip_norm
            learnable_scopes: which scopes are trainable (None for all)
            optimizer: instance of tf.train.Optimizer, default Adam

        Returns:
            train_op
        """
        if optimizer_scope_name is None:
            opt_scope = tf.variable_scope('Optimizer')
        else:
            opt_scope = tf.variable_scope(optimizer_scope_name)
        with opt_scope:
            if learnable_scopes is None:
                variables_to_train = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
            else:
                variables_to_train = []
                for scope_name in learnable_scopes:
                    variables_to_train.extend(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=scope_name))

            if optimizer is None:
                optimizer = tf.train.AdamOptimizer

            # For batch norm it is necessary to update running averages
            extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
            with tf.control_dependencies(extra_update_ops):

                def clip_if_not_none(grad):
                    if grad is not None:
                        return tf.clip_by_norm(grad, clip_norm)

                opt = optimizer(learning_rate)
                grads_and_vars = opt.compute_gradients(loss, var_list=variables_to_train)
                if clip_norm is not None:
                    grads_and_vars = [(clip_if_not_none(grad), var)
                                      for grad, var in grads_and_vars]
                train_op = opt.apply_gradients(grads_and_vars)
        return train_op

    @staticmethod
    def print_number_of_parameters():
        """
        Print number of *trainable* parameters in the network
        """
        log.info('Number of parameters: ')
        variables = tf.trainable_variables()
        blocks = defaultdict(int)
        for var in variables:
            # Get the top level scope name of variable
            block_name = var.name.split('/')[0]
            number_of_parameters = np.prod(var.get_shape().as_list())
            blocks[block_name] += number_of_parameters
        for block_name, cnt in blocks.items():
            log.info("{} - {}.".format(block_name, cnt))
        total_num_parameters = np.sum(list(blocks.values()))
        log.info('Total number of parameters equal {}'.format(total_num_parameters))
```
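
In `get_train_op`, each gradient tensor is clipped on its own with `tf.clip_by_norm`, which rescales a tensor only when its L2 norm exceeds `clip_norm`. A minimal NumPy sketch of that rule follows; it illustrates the formula and is not the TensorFlow implementation.

```python
import numpy as np

def clip_by_norm(grad, clip_norm):
    """Rescale grad so that its L2 norm is at most clip_norm."""
    norm = np.sqrt(np.sum(grad ** 2))
    # when norm <= clip_norm the factor is 1 and grad is returned unchanged
    return grad * clip_norm / max(norm, clip_norm)

large = clip_by_norm(np.array([3.0, 4.0]), clip_norm=1.0)  # norm 5 -> rescaled
small = clip_by_norm(np.array([0.3, 0.4]), clip_norm=1.0)  # norm 0.5 -> unchanged
print(large, small)
```

Because clipping is applied per tensor, the relative scale between gradients of different variables can change; clipping by a global norm over all gradients is the alternative when that matters.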

Let's create a model with random weights and default parameters. Change the path to the model; otherwise it will be stored in the deeppavlov/download folder:

In [72]:
s2s = Seq2Seq(
    save_path='PATH_TO_YOUR_WORKING_DIR/model',
    load_path='PATH_TO_YOUR_WORKING_DIR/model'
)

Using TensorFlow backend.


Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.






Here, we first run all preprocessing steps and call the seq2seq model, and then convert token indexes back to tokens. As a result, we should get a random sequence of words.

In [74]:
vocab(s2s(padder(vocab([['today','is','so','hot']]))))

[['nintendo',
  'uni',
  'roomage',
  'cricket',
  'gamer',
  'gnc',
  'suffer',
  'loosing',
  'fished',
  'galleries',
  'galleries',
  'close',
  'owe',
  'guesses',
  'bowl',
  'cali',
  'energetic',
  'frightening',
  'bad',
  'cali']]

In [73]:
vocab(s2s(padder(vocab([['hello', 'my', 'friend', 'there_is_no_such_word_in_dataset', 'and_this']]))))

[['music',
  'music',
  'trout',
  'cooper',
  'settle',
  'successful',
  'caesar',
  'agriculture',
  'agriculture',
  'seas',
  'give',
  'ahahah',
  'starved',
  'uses',
  'spaniel',
  'cum',
  'f',
  'cum',
  'especially',
  'f']]

#### Attention mechanism
The attention mechanism [[paper](https://arxiv.org/abs/1409.0473)] aggregates information from a "memory" according to the current state. Here, aggregation means a weighted sum of the "memory" items, where the weight of each item depends on the current state.

Without attention, the decoder can use only the last hidden state of the encoder. The attention mechanism gives it access to all encoder states during decoding.

![attention](img/attention.png)

One of the simplest ways to compute attention weights (*a_ij*) is to take the dot product between each memory item and the state and then apply the softmax function. Other ways of computing *multiplicative* attention can be found in this [paper](https://arxiv.org/abs/1508.04025).

We also need a mask to skip some sequence elements such as <PAD\>. To make the weights of undesired memory items close to zero, we can add a large negative value to their logits (the results of the dot product) before applying softmax.

In [None]:
def softmax_mask(values, mask):
    # adds big negative to masked values
    INF = 1e30
    return -INF * (1 - tf.cast(mask, tf.float32)) + values

In [None]:
def dot_attention(memory, state, mask, scope="dot_attention"):
    # memory: bs x seq_len x hidden_dim
    # state: bs x hidden_dim
    # mask: bs x seq_len
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        # dot product between each item in memory and state
        logits = tf.matmul(memory, tf.expand_dims(state, axis=1), transpose_b=True)
        logits = tf.squeeze(logits, [2])
        
        # apply mask to logits
        logits = softmax_mask(logits, mask)
        
        # apply softmax to logits
        att_weights = tf.expand_dims(tf.nn.softmax(logits), axis=2)
        
        # compute weighted sum of items in memory
        att = tf.reduce_sum(att_weights * memory, axis=1)
        return att

Check your implementation: the output should have shape 32 x 100.

In [None]:
tf.reset_default_graph()
memory = tf.random_normal(shape=[32, 10, 100]) # bs x seq_len x hidden_dim
state = tf.random_normal(shape=[32, 100]) # bs x hidden_dim
mask = tf.cast(tf.random_uniform(shape=[32, 10]) * 2, tf.int32) # bs x seq_len, random 0/1 mask
dot_attention(memory, state, mask)
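
The shape check does not show what the mask does to the weights. The NumPy sketch below mirrors the dot-product attention described above on a tiny example, so the weights can be inspected directly. The name `dot_attention_np` and the sizes are assumptions made for illustration.

```python
import numpy as np

def dot_attention_np(memory, state, mask):
    # memory: bs x seq_len x hidden_dim, state: bs x hidden_dim, mask: bs x seq_len
    logits = np.einsum('bsh,bh->bs', memory, state)   # dot product per memory item
    logits = logits - 1e30 * (1 - mask)               # push masked items to -inf
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilise the softmax
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    att = (weights[:, :, None] * memory).sum(axis=1)  # weighted sum of memory items
    return att, weights

rng = np.random.default_rng(0)
memory = rng.normal(size=(2, 4, 3))
state = rng.normal(size=(2, 3))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # 0 marks padding positions

att, weights = dot_attention_np(memory, state, mask)
print(att.shape)          # (2, 3)
print(weights.round(3))   # masked positions get weight 0
```

Each row of `weights` sums to one over the unmasked positions, so padding never contributes to the attention vector.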

#### Teacher forcing

We have implemented a decoder that takes its own output as input during both training and inference. But at early stages of training it can be hard for the model to produce long sequences when it depends on its own close-to-random output. Teacher forcing can help with this: instead of feeding the model's output, we can feed ground truth tokens. This helps the model during training, but at inference time it can still rely only on its own output.


Using model's output:

<img src="img/sampling.png" alt="sampling" width=50%/>

Teacher forcing:

<img src="img/teacher_forcing.png" alt="teacher_forcing" width=50%/>

It is not necessary to feed ground truth tokens at every time step: at each step we can randomly choose, with some rate, between the ground truth token and the token predicted by the model.
The *teacher_forcing_rate* parameter of the seq2seq model controls this behavior.

More details about teacher forcing can be found in the Deep Learning Book, [Section 10.2.1](http://www.deeplearningbook.org/contents/rnn.html).
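
The selection rule can be sketched in NumPy as follows. One random draw is made per time step and shared by the whole batch, and the input for step `i` is either the ground truth token or the predicted token from step `i-1`. The function name is hypothetical, and the step-0 input (the start token) is left out. In a real decoder the predictions are produced step by step, so this vectorised form only illustrates the choice itself.

```python
import numpy as np

def choose_decoder_inputs(targets, predictions, teacher_forcing_rate, rng):
    """Pick the token fed to the decoder at steps 1..T-1.

    targets, predictions: bs x T arrays of token ids.
    Returns a bs x (T-1) array; column i-1 holds the input for step i.
    """
    # one draw per time step, shared by all examples in the batch
    use_truth = rng.uniform(size=targets.shape[1] - 1) <= teacher_forcing_rate
    return np.where(use_truth[None, :], targets[:, :-1], predictions[:, :-1])

rng = np.random.default_rng(0)
targets = np.array([[5, 6, 7, 8], [9, 10, 11, 12]])
predictions = np.array([[50, 60, 70, 80], [90, 100, 110, 120]])

always_truth = choose_decoder_inputs(targets, predictions, 1.0, rng)
mixed = choose_decoder_inputs(targets, predictions, 0.5, rng)
print(always_truth)
print(mixed)
```

A rate of 1.0 always feeds the ground truth, while a rate of 0.0 falls back to the model's own output, which is what the decoder does at inference time.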


### Postprocessing

In the postprocessing step we are going to remove all <PAD\>, <BOS\>, <EOS\> tokens.

In [75]:
@register('postprocessing')
class SentencePostprocessor(Component):
    def __init__(self, pad_token='<PAD>', start_token='<BOS>', end_token='<EOS>', *args, **kwargs):
        self.pad_token = pad_token
        self.start_token = start_token
        self.end_token = end_token

    def __call__(self, batch):
        for i in range(len(batch)):
            batch[i] = ' '.join(self._postproc(batch[i]))
        return batch
    
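    def _postproc(self, utt):
        # cut the utterance at the first end token
        if self.end_token in utt:
            utt = utt[:utt.index(self.end_token)]
        # drop any remaining padding and start tokens
        return [token for token in utt if token not in (self.pad_token, self.start_token)]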

In [76]:
postprocess = SentencePostprocessor()

In [78]:
postprocess(vocab(s2s(padder(vocab([['who', 'are', 'you']])))))

['clever clever clever clever ponder ponder whipped royalty lax desire duramax iran dope tuck teaches envious envious coordination parrots parrots']

In [77]:
postprocess(vocab(s2s(padder(vocab([['hello', 'my', 'friend', 'there_is_no_such_word_in_dataset', 'and_this']])))))

['music music trout cooper settle successful caesar agriculture agriculture seas give ahahah starved uses spaniel cum f cum especially f']
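
The random outputs above contain no special tokens, so the effect of postprocessing is not visible there. Below is a minimal standalone sketch of the behaviour described at the start of this section, applied to hand-written token lists. The function name `strip_special_tokens` is hypothetical.

```python
def strip_special_tokens(tokens, pad_token='<PAD>', start_token='<BOS>', end_token='<EOS>'):
    """Cut at the first end token, then drop padding and start tokens."""
    if end_token in tokens:
        tokens = tokens[:tokens.index(end_token)]
    return [t for t in tokens if t not in (pad_token, start_token)]

batch = [['<BOS>', 'i', 'am', 'fine', '<EOS>', '<PAD>', '<PAD>'],
         ['hello', 'there']]
sentences = [' '.join(strip_special_tokens(utt)) for utt in batch]
print(sentences)  # ['i am fine', 'hello there']
```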

### Create config file
Let's put it all together in one config file.

In [None]:
config = {
  "dataset_reader": {
    "name": "personachat_dataset_reader",
    "data_path": "YOUR_PATH_TO_FOLDER_WITH_PERSONACHAT_DATASET"
  },
  "dataset_iterator": {
    "name": "personachat_iterator",
    "seed": 1337,
    "shuffle": True
  },
  "chainer": {
    "in": ["x"],
    "in_y": ["y"],
    "pipe": [
      {
        "name": "lazy_tokenizer",
        "id": "tokenizer",
        "in": ["x"],
        "out": ["x_tokens"]
      },
      {
        "name": "lazy_tokenizer",
        "id": "tokenizer",
        "in": ["y"],
        "out": ["y_tokens"]
      },
      {
        "name": "dialog_vocab",
        "id": "vocab",
        "save_path": "YOUR_PATH_TO_WORKING_DIR/vocab.dict",
        "load_path": "YOUR_PATH_TO_WORKING_DIR/vocab.dict",
        "min_freq": 2,
        "special_tokens": ["<PAD>","<BOS>", "<EOS>", "<UNK>"],
        "unk_token": "<UNK>",
        "fit_on": ["x_tokens", "y_tokens"],
        "in": ["x_tokens"],
        "out": ["x_tokens_ids"]
      },
      {
        "ref": "vocab",
        "in": ["y_tokens"],
        "out": ["y_tokens_ids"]
      },
      {
        "name": "sentence_padder",
        "id": "padder",
        "length_limit": 20,
        "in": ["x_tokens_ids"],
        "out": ["x_tokens_ids"]
      },
      {
        "ref": "padder",
        "in": ["y_tokens_ids"],
        "out": ["y_tokens_ids"]
      },
      {
        "name": "seq2seq",
        "id": "s2s",
        "max_length": "#padder.length_limit+2",
        "cell_size": 250,
        "embeddings_dim": 50,
        "vocab_size": 11595,
        "keep_prob": 0.8,
        "learning_rate": 3e-04,
        "teacher_forcing_rate": 0.0,
        "use_attention": False,
        "save_path": "YOUR_PATH_TO_WORKING_DIR/model",
        "load_path": "YOUR_PATH_TO_WORKING_DIR/model",
        "in": ["x_tokens_ids"],
        "in_y": ["y_tokens_ids"],
        "out": ["y_predicted_tokens_ids"],
      },
      {
        "ref": "vocab",
        "in": ["y_predicted_tokens_ids"],
        "out": ["y_predicted_tokens"]
      },
      {
        "name": "postprocessing",
        "in": ["y_predicted_tokens"],
        "out": ["y_predicted_tokens"]
      }
    ],
    "out": ["y_predicted_tokens"]
  },
  "train": {
    "log_every_n_batches": 100,
    "val_every_n_epochs":0,
    "batch_size": 64,
    "validation_patience": 0,
    "epochs": 20,
    "metrics": ["bleu"],
  }
}
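
The `"in"` and `"out"` lists in the pipe are what connect the components: each component reads its arguments by name and stores its results under new names. The toy sketch below models only that name-based wiring. It is not DeepPavlov's chainer implementation, and `run_pipe`, `toy_vocab` and the two lambda components are hypothetical stand-ins.

```python
def run_pipe(pipe, inputs):
    """Run components in order, passing values by the names in "in" and "out"."""
    memory = dict(inputs)
    for step in pipe:
        args = [memory[name] for name in step["in"]]
        result = step["component"](*args)
        if len(step["out"]) == 1:
            result = (result,)
        memory.update(zip(step["out"], result))
    return memory

toy_vocab = {"<PAD>": 0, "<BOS>": 1, "<EOS>": 2, "<UNK>": 3, "hi": 4, "there": 5}

pipe = [
    # stand-in tokenizer: lowercase and split on whitespace
    {"component": lambda batch: [utt.lower().split() for utt in batch],
     "in": ["x"], "out": ["x_tokens"]},
    # stand-in vocabulary: map tokens to ids, unknown words to <UNK>
    {"component": lambda batch: [[toy_vocab.get(t, toy_vocab["<UNK>"]) for t in utt]
                                 for utt in batch],
     "in": ["x_tokens"], "out": ["x_tokens_ids"]},
]

memory = run_pipe(pipe, {"x": ["Hi there", "hi you"]})
print(memory["x_tokens_ids"])  # [[4, 5], [4, 3]]
```

Reading the config this way makes it easier to see why the vocabulary and padder appear twice: the same component is applied once to the `x` names and once to the `y` names.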

### Interact with model using config

In [None]:
from deeppavlov.core.commands.infer import build_model_from_config
model = build_model_from_config(config)

In [None]:
model(['Hi, how are you?', 'Any ideas my dear friend?'])

### Train model


Run experiments with and without attention, with teacher forcing and without.
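
One way to organise these experiments is to derive a separate config per combination of settings. The sketch below does this for a minimal stand-in config. The helper `make_variants`, the variant names and the teacher forcing rate of 0.5 are assumptions chosen for illustration.

```python
import copy
import itertools

def make_variants(base_config, attention_options, teacher_forcing_rates):
    """Yield (name, config) pairs, one per combination of the two settings."""
    for use_att, tf_rate in itertools.product(attention_options, teacher_forcing_rates):
        cfg = copy.deepcopy(base_config)  # keep the base config untouched
        for component in cfg["chainer"]["pipe"]:
            if component.get("name") == "seq2seq":
                component["use_attention"] = use_att
                component["teacher_forcing_rate"] = tf_rate
        yield "att{}_tf{}".format(int(use_att), tf_rate), cfg

# minimal stand-in for the full config defined earlier
base = {"chainer": {"pipe": [{"name": "seq2seq", "use_attention": False,
                              "teacher_forcing_rate": 0.0}]}}
variants = dict(make_variants(base, [False, True], [0.0, 0.5]))
print(sorted(variants))
```

When applying this to the real config, each variant should also get its own `save_path` and `load_path` so that the runs do not overwrite each other's checkpoints.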

In [None]:
from deeppavlov.core.commands.train import train_evaluate_model_from_config

In [None]:
with open('seq2seq.json', 'w') as f:
    json.dump(config, f)

In [None]:
train_evaluate_model_from_config('seq2seq.json')

In [None]:
model = build_model_from_config(config)
model(['hi, how are you?', 'any ideas my dear friend?', 'okay, i agree with you', 'good bye!'])

C:\Users\CAU\AppData\Roaming\Python\Python36\site-packages\download\YOUR_PATH_TO_FOLDER_WITH_PERSONACHAT_DATASET\train_self_original.txt
C:\Users\CAU\AppData\Roaming\Python\Python36\site-packages\download\YOUR_PATH_TO_FOLDER_WITH_PERSONACHAT_DATASET\valid_self_original.txt
C:\Users\CAU\AppData\Roaming\Python\Python36\site-packages\download\YOUR_PATH_TO_FOLDER_WITH_PERSONACHAT_DATASET\test_self_original.txt
2018-10-17 01:41:44.927 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 86: [saving vocabulary to C:\Users\CAU\AppData\Roaming\Python\Python36\site-packages\download\YOUR_PATH_TO_WORKING_DIR\vocab.dict]
C:\Users\CAU\Anaconda3\lib\site-packages\nltk\translate\bleu_score.py:490: UserWarning: 
Corpus/Sentence contains 0 counts of 4-gram overlaps.
BLEU scores might be undesirable; use SmoothingFunction().
  warnings.warn(_msg)
{"train": {"epochs_done": 0, "batches_seen": 100, "examples_seen": 6400, "metrics": {"bleu": 0.001}, "time_spent": "0:01:50", "loss": 9.326328592300415}}
{"train": {"epochs_done": 0, "batches_seen": 200, "examples_seen": 12800, "metrics": {"bleu": 0.0009}, "time_spent": "0:03:35", "loss": 9.246723451614379}}
C:\Users\CAU\Anaconda3\lib\site-packages\nltk\translate\bleu_score.py:490: UserWarning: 
Corpus/Sentence contains 0 counts of 3-gram overlaps.
BLEU scores might be undesirable; use SmoothingFunction().
  warnings.warn(_msg)
{"train": {"epochs_done": 0, "batches_seen": 300, "examples_seen": 19200, "metrics": {"bleu": 0.0349}, "time_spent": "0:05:20", "loss": 9.236984586715698}}
{"train": {"epochs_done": 0, "batches_seen": 400, "examples_seen": 25600, "metrics": {"bleu": 0.0068}, "time_spent": "0:07:05", "loss": 9.234621047973633}}
{"train": {"epochs_done": 0, "batches_seen": 500, "examples_seen": 32000, "metrics": {"bleu": 0.0276}, "time_spent": "0:08:51", "loss": 9.217944164276123}}
{"train": {"epochs_done": 0, "batches_seen": 600, "examples_seen": 38400, "metrics": {"bleu": 0.006}, "time_spent": "0:10:36", "loss": 9.231796932220458}}
{"train": {"epochs_done": 0, "batches_seen": 700, "examples_seen": 44800, "metrics": {"bleu": 0.0266}, "time_spent": "0:12:21", "loss": 9.233966999053955}}
{"train": {"epochs_done": 0, "batches_seen": 800, "examples_seen": 51200, "metrics": {"bleu": 0.005}, "time_spent": "0:14:06", "loss": 9.238256750106812}}
{"train": {"epochs_done": 0, "batches_seen": 900, "examples_seen": 57600, "metrics": {"bleu": 0.0407}, "time_spent": "0:15:51", "loss": 9.232742547988892}}
{"train": {"epochs_done": 0, "batches_seen": 1000, "examples_seen": 64000, "metrics": {"bleu": 0.0228}, "time_spent": "0:17:36", "loss": 9.2308828830719}}
{"train": {"epochs_done": 1, "batches_seen": 1100, "examples_seen": 70391, "metrics": {"bleu": 0.0332}, "time_spent": "0:19:21", "loss": 9.146511001586914}}
{"train": {"epochs_done": 1, "batches_seen": 1200, "examples_seen": 76791, "metrics": {"bleu": 0.0449}, "time_spent": "0:21:07", "loss": 9.085340995788574}}
{"train": {"epochs_done": 1, "batches_seen": 1300, "examples_seen": 83191, "metrics": {"bleu": 0.003}, "time_spent": "0:22:52", "loss": 9.08676646232605}}
{"train": {"epochs_done": 1, "batches_seen": 1400, "examples_seen": 89591, "metrics": {"bleu": 0.0016}, "time_spent": "0:24:37", "loss": 9.07125020980835}}
{"train": {"epochs_done": 1, "batches_seen": 1500, "examples_seen": 95991, "metrics": {"bleu": 0.0021}, "time_spent": "0:26:22", "loss": 9.087638883590698}}
{"train": {"epochs_done": 1, "batches_seen": 1600, "examples_seen": 102391, "metrics": {"bleu": 0.0102}, "time_spent": "0:28:07", "loss": 9.072702112197875}}
{"train": {"epochs_done": 1, "batches_seen": 1700, "examples_seen": 108791, "metrics": {"bleu": 0.0026}, "time_spent": "0:29:52", "loss": 9.079061880111695}}
{"train": {"epochs_done": 1, "batches_seen": 1800, "examples_seen": 115191, "metrics": {"bleu": 0.0024}, "time_spent": "0:31:37", "loss": 9.08630181312561}}
{"train": {"epochs_done": 1, "batches_seen": 1900, "examples_seen": 121591, "metrics": {"bleu": 0.003}, "time_spent": "0:33:21", "loss": 9.082614107131958}}
{"train": {"epochs_done": 1, "batches_seen": 2000, "examples_seen": 127991, "metrics": {"bleu": 0.0011}, "time_spent": "0:35:06", "loss": 9.078346729278564}}
{"train": {"epochs_done": 2, "batches_seen": 2100, "examples_seen": 134382, "metrics": {"bleu": 0.0014}, "time_spent": "0:36:51", "loss": 9.027002954483033}}
{"train": {"epochs_done": 2, "batches_seen": 2200, "examples_seen": 140782, "metrics": {"bleu": 0.0082}, "time_spent": "0:38:37", "loss": 8.999157638549805}}
{"train": {"epochs_done": 2, "batches_seen": 2300, "examples_seen": 147182, "metrics": {"bleu": 0.0012}, "time_spent": "0:40:22", "loss": 8.996476573944092}}
{"train": {"epochs_done": 2, "batches_seen": 2400, "examples_seen": 153582, "metrics": {"bleu": 0.0139}, "time_spent": "0:42:07", "loss": 9.002782011032105}}
{"train": {"epochs_done": 2, "batches_seen": 2500, "examples_seen": 159982, "metrics": {"bleu": 0.0011}, "time_spent": "0:43:52", "loss": 9.012301778793335}}
{"train": {"epochs_done": 2, "batches_seen": 2600, "examples_seen": 166382, "metrics": {"bleu": 0.0015}, "time_spent": "0:45:37", "loss": 9.025073194503785}}
{"train": {"epochs_done": 2, "batches_seen": 2700, "examples_seen": 172782, "metrics": {"bleu": 0.0166}, "time_spent": "0:47:22", "loss": 9.030844554901122}}
{"train": {"epochs_done": 2, "batches_seen": 2800, "examples_seen": 179182, "metrics": {"bleu": 0.0143}, "time_spent": "0:49:07", "loss": 9.023800868988037}}
{"train": {"epochs_done": 2, "batches_seen": 2900, "examples_seen": 185582, "metrics": {"bleu": 0.0029}, "time_spent": "0:50:52", "loss": 9.013167266845704}}
{"train": {"epochs_done": 2, "batches_seen": 3000, "examples_seen": 191982, "metrics": {"bleu": 0.0025}, "time_spent": "0:52:37", "loss": 9.024880704879761}}
{"train": {"epochs_done": 3, "batches_seen": 3100, "examples_seen": 198373, "metrics": {"bleu": 0.0028}, "time_spent": "0:54:21", "loss": 9.006502389907837}}
{"train": {"epochs_done": 3, "batches_seen": 3200, "examples_seen": 204773, "metrics": {"bleu": 0.0012}, "time_spent": "0:56:06", "loss": 8.927226390838623}}
{"train": {"epochs_done": 3, "batches_seen": 3300, "examples_seen": 211173, "metrics": {"bleu": 0.0021}, "time_spent": "0:57:51", "loss": 8.935716943740845}}
{"train": {"epochs_done": 3, "batches_seen": 3400, "examples_seen": 217573, "metrics": {"bleu": 0.0034}, "time_spent": "0:59:36", "loss": 8.934136743545531}}
{"train": {"epochs_done": 3, "batches_seen": 3500, "examples_seen": 223973, "metrics": {"bleu": 0.0023}, "time_spent": "1:01:21", "loss": 8.969236526489258}}
{"train": {"epochs_done": 3, "batches_seen": 3600, "examples_seen": 230373, "metrics": {"bleu": 0.0035}, "time_spent": "1:03:06", "loss": 8.942802066802978}}
{"train": {"epochs_done": 3, "batches_seen": 3700, "examples_seen": 236773, "metrics": {"bleu": 0.0332}, "time_spent": "1:04:51", "loss": 8.945540342330933}}
{"train": {"epochs_done": 3, "batches_seen": 3800, "examples_seen": 243173, "metrics": {"bleu": 0.0021}, "time_spent": "1:06:35", "loss": 8.977229433059692}}
{"train": {"epochs_done": 3, "batches_seen": 3900, "examples_seen": 249573, "metrics": {"bleu": 0.0034}, "time_spent": "1:08:20", "loss": 8.971212635040283}}
{"train": {"epochs_done": 3, "batches_seen": 4000, "examples_seen": 255973, "metrics": {"bleu": 0.002}, "time_spent": "1:10:05", "loss": 9.000960874557496}}
{"train": {"epochs_done": 3, "batches_seen": 4100, "examples_seen": 262373, "metrics": {"bleu": 0.0017}, "time_spent": "1:11:50", "loss": 8.987378797531129}}
{"train": {"epochs_done": 4, "batches_seen": 4200, "examples_seen": 268764, "metrics": {"bleu": 0.0023}, "time_spent": "1:13:35", "loss": 8.901102352142335}}
{"train": {"epochs_done": 4, "batches_seen": 4300, "examples_seen": 275164, "metrics": {"bleu": 0.0029}, "time_spent": "1:15:20", "loss": 8.87007791519165}}
{"train": {"epochs_done": 4, "batches_seen": 4400, "examples_seen": 281564, "metrics": {"bleu": 0.0022}, "time_spent": "1:17:05", "loss": 8.888577156066894}}
{"train": {"epochs_done": 4, "batches_seen": 4500, "examples_seen": 287964, "metrics": {"bleu": 0.0031}, "time_spent": "1:18:50", "loss": 8.908924293518066}}
{"train": {"epochs_done": 4, "batches_seen": 4600, "examples_seen": 294364, "metrics": {"bleu": 0.0034}, "time_spent": "1:20:34", "loss": 8.91369794845581}}
{"train": {"epochs_done": 4, "batches_seen": 4700, "examples_seen": 300764, "metrics": {"bleu": 0.0033}, "time_spent": "1:22:19", "loss": 8.909589967727662}}
{"train": {"epochs_done": 4, "batches_seen": 4800, "examples_seen": 307164, "metrics": {"bleu": 0.0025}, "time_spent": "1:24:06", "loss": 8.922259683609008}}
{"train": {"epochs_done": 4, "batches_seen": 4900, "examples_seen": 313564, "metrics": {"bleu": 0.0022}, "time_spent": "1:25:51", "loss": 8.931581058502196}}
{"train": {"epochs_done": 4, "batches_seen": 5000, "examples_seen": 319964, "metrics": {"bleu": 0.002}, "time_spent": "1:27:36", "loss": 8.932306070327758}}
{"train": {"epochs_done": 4, "batches_seen": 5100, "examples_seen": 326364, "metrics": {"bleu": 0.0025}, "time_spent": "1:29:21", "loss": 8.942867860794067}}
{"train": {"epochs_done": 5, "batches_seen": 5200, "examples_seen": 332755, "metrics": {"bleu": 0.0022}, "time_spent": "1:31:06", "loss": 8.8621005153656}}
{"train": {"epochs_done": 5, "batches_seen": 5300, "examples_seen": 339155, "metrics": {"bleu": 0.0029}, "time_spent": "1:32:51", "loss": 8.82677529335022}}
{"train": {"epochs_done": 5, "batches_seen": 5400, "examples_seen": 345555, "metrics": {"bleu": 0.0033}, "time_spent": "1:34:36", "loss": 8.839066534042358}}
{"train": {"epochs_done": 5, "batches_seen": 5500, "examples_seen": 351955, "metrics": {"bleu": 0.0022}, "time_spent": "1:36:21", "loss": 8.857021369934081}}
{"train": {"epochs_done": 5, "batches_seen": 5600, "examples_seen": 358355, "metrics": {"bleu": 0.0019}, "time_spent": "1:38:06", "loss": 8.864920358657837}}
{"train": {"epochs_done": 5, "batches_seen": 5700, "examples_seen": 364755, "metrics": {"bleu": 0.0025}, "time_spent": "1:39:51", "loss": 8.870533008575439}}
{"train": {"epochs_done": 5, "batches_seen": 5800, "examples_seen": 371155, "metrics": {"bleu": 0.002}, "time_spent": "1:41:37", "loss": 8.867363786697387}}
{"train": {"epochs_done": 5, "batches_seen": 5900, "examples_seen": 377555, "metrics": {"bleu": 0.0027}, "time_spent": "1:43:21", "loss": 8.874911956787109}}
{"train": {"epochs_done": 5, "batches_seen": 6000, "examples_seen": 383955, "metrics": {"bleu": 0.0022}, "time_spent": "1:45:07", "loss": 8.878385524749756}}
{"train": {"epochs_done": 5, "batches_seen": 6100, "examples_seen": 390355, "metrics": {"bleu": 0.0017}, "time_spent": "1:46:52", "loss": 8.888730669021607}}
{"train": {"epochs_done": 6, "batches_seen": 6200, "examples_seen": 396746, "metrics": {"bleu": 0.0023}, "time_spent": "1:48:37", "loss": 8.841088657379151}}
{"train": {"epochs_done": 6, "batches_seen": 6300, "examples_seen": 403146, "metrics": {"bleu": 0.0022}, "time_spent": "1:50:22", "loss": 8.77040545463562}}
{"train": {"epochs_done": 6, "batches_seen": 6400, "examples_seen": 409546, "metrics": {"bleu": 0.0023}, "time_spent": "1:52:07", "loss": 8.800858402252198}}
{"train": {"epochs_done": 6, "batches_seen": 6500, "examples_seen": 415946, "metrics": {"bleu": 0.0022}, "time_spent": "1:53:52", "loss": 8.817020311355591}}
{"train": {"epochs_done": 6, "batches_seen": 6600, "examples_seen": 422346, "metrics": {"bleu": 0.0026}, "time_spent": "1:55:37", "loss": 8.814838724136353}}
{"train": {"epochs_done": 6, "batches_seen": 6700, "examples_seen": 428746, "metrics": {"bleu": 0.0025}, "time_spent": "1:57:22", "loss": 8.813114614486695}}
{"train": {"epochs_done": 6, "batches_seen": 6800, "examples_seen": 435146, "metrics": {"bleu": 0.0022}, "time_spent": "1:59:07", "loss": 8.814752836227417}}
{"train": {"epochs_done": 6, "batches_seen": 6900, "examples_seen": 441546, "metrics": {"bleu": 0.0021}, "time_spent": "2:00:52", "loss": 8.815455055236816}}
{"train": {"epochs_done": 6, "batches_seen": 7000, "examples_seen": 447946, "metrics": {"bleu": 0.0021}, "time_spent": "2:02:37", "loss": 8.831798906326293}}
{"train": {"epochs_done": 6, "batches_seen": 7100, "examples_seen": 454346, "metrics": {"bleu": 0.0026}, "time_spent": "2:04:21", "loss": 8.847474431991577}}
{"train": {"epochs_done": 7, "batches_seen": 7200, "examples_seen": 460737, "metrics": {"bleu": 0.0029}, "time_spent": "2:06:06", "loss": 8.836204404830932}}
{"train": {"epochs_done": 7, "batches_seen": 7300, "examples_seen": 467137, "metrics": {"bleu": 0.0026}, "time_spent": "2:07:51", "loss": 8.733682498931884}}
{"train": {"epochs_done": 7, "batches_seen": 7400, "examples_seen": 473537, "metrics": {"bleu": 0.0033}, "time_spent": "2:09:36", "loss": 8.745048847198486}}
{"train": {"epochs_done": 7, "batches_seen": 7500, "examples_seen": 479937, "metrics": {"bleu": 0.0025}, "time_spent": "2:11:21", "loss": 8.755669250488282}}
{"train": {"epochs_done": 7, "batches_seen": 7600, "examples_seen": 486337, "metrics": {"bleu": 0.002}, "time_spent": "2:13:06", "loss": 8.758534488677979}}
{"train": {"epochs_done": 7, "batches_seen": 7700, "examples_seen": 492737, "metrics": {"bleu": 0.0026}, "time_spent": "2:14:51", "loss": 8.764473743438721}}
{"train": {"epochs_done": 7, "batches_seen": 7800, "examples_seen": 499137, "metrics": {"bleu": 0.0028}, "time_spent": "2:16:36", "loss": 8.772283163070679}}
{"train": {"epochs_done": 7, "batches_seen": 7900, "examples_seen": 505537, "metrics": {"bleu": 0.0031}, "time_spent": "2:18:21", "loss": 8.785279798507691}}
{"train": {"epochs_done": 7, "batches_seen": 8000, "examples_seen": 511937, "metrics": {"bleu": 0.0024}, "time_spent": "2:20:06", "loss": 8.807142934799195}}
{"train": {"epochs_done": 7, "batches_seen": 8100, "examples_seen": 518337, "metrics": {"bleu": 0.0019}, "time_spent": "2:21:51", "loss": 8.815911846160889}}
{"train": {"epochs_done": 7, "batches_seen": 8200, "examples_seen": 524737, "metrics": {"bleu": 0.0022}, "time_spent": "2:23:35", "loss": 8.79825382232666}}
{"train": {"epochs_done": 8, "batches_seen": 8300, "examples_seen": 531128, "metrics": {"bleu": 0.0024}, "time_spent": "2:25:21", "loss": 8.713072605133057}}
{"train": {"epochs_done": 8, "batches_seen": 8400, "examples_seen": 537528, "metrics": {"bleu": 0.0032}, "time_spent": "2:27:06", "loss": 8.701477375030517}}
{"train": {"epochs_done": 8, "batches_seen": 8500, "examples_seen": 543928, "metrics": {"bleu": 0.0031}, "time_spent": "2:28:50", "loss": 8.704274587631225}}
{"train": {"epochs_done": 8, "batches_seen": 8600, "examples_seen": 550328, "metrics": {"bleu": 0.0027}, "time_spent": "2:30:35", "loss": 8.730870761871337}}
{"train": {"epochs_done": 8, "batches_seen": 8700, "examples_seen": 556728, "metrics": {"bleu": 0.0029}, "time_spent": "2:32:20", "loss": 8.733579845428467}}
{"train": {"epochs_done": 8, "batches_seen": 8800, "examples_seen": 563128, "metrics": {"bleu": 0.0029}, "time_spent": "2:34:05", "loss": 8.742090883255004}}
{"train": {"epochs_done": 8, "batches_seen": 8900, "examples_seen": 569528, "metrics": {"bleu": 0.0024}, "time_spent": "2:35:50", "loss": 8.742289047241211}}
{"train": {"epochs_done": 8, "batches_seen": 9000, "examples_seen": 575928, "metrics": {"bleu": 0.003}, "time_spent": "2:37:35", "loss": 8.750054235458373}}
{"train": {"epochs_done": 8, "batches_seen": 9100, "examples_seen": 582328, "metrics": {"bleu": 0.0025}, "time_spent": "2:39:20", "loss": 8.765758790969848}}
{"train": {"epochs_done": 8, "batches_seen": 9200, "examples_seen": 588728, "metrics": {"bleu": 0.0029}, "time_spent": "2:41:05", "loss": 8.763631620407104}}
{"train": {"epochs_done": 9, "batches_seen": 9300, "examples_seen": 595119, "metrics": {"bleu": 0.0029}, "time_spent": "2:42:50", "loss": 8.696784067153931}}
{"train": {"epochs_done": 9, "batches_seen": 9400, "examples_seen": 601519, "metrics": {"bleu": 0.0027}, "time_spent": "2:44:35", "loss": 8.648366346359253}}
{"train": {"epochs_done": 9, "batches_seen": 9500, "examples_seen": 607919, "metrics": {"bleu": 0.0019}, "time_spent": "2:46:20", "loss": 8.66939359664917}}
{"train": {"epochs_done": 9, "batches_seen": 9600, "examples_seen": 614319, "metrics": {"bleu": 0.0023}, "time_spent": "2:48:05", "loss": 8.681710691452027}}
{"train": {"epochs_done": 9, "batches_seen": 9700, "examples_seen": 620719, "metrics": {"bleu": 0.0032}, "time_spent": "2:49:52", "loss": 8.667869997024535}}
{"train": {"epochs_done": 9, "batches_seen": 9800, "examples_seen": 627119, "metrics": {"bleu": 0.0029}, "time_spent": "2:51:40", "loss": 8.697077379226684}}
{"train": {"epochs_done": 9, "batches_seen": 9900, "examples_seen": 633519, "metrics": {"bleu": 0.0029}, "time_spent": "2:53:26", "loss": 8.709511318206786}}
{"train": {"epochs_done": 9, "batches_seen": 10000, "examples_seen": 639919, "metrics": {"bleu": 0.0032}, "time_spent": "2:55:11", "loss": 8.728211965560913}}
{"train": {"epochs_done": 9, "batches_seen": 10100, "examples_seen": 646319, "metrics": {"bleu": 0.0032}, "time_spent": "2:56:56", "loss": 8.715833854675292}}
{"train": {"epochs_done": 9, "batches_seen": 10200, "examples_seen": 652719, "metrics": {"bleu": 0.0028}, "time_spent": "2:58:41", "loss": 8.737465934753418}}
{"train": {"epochs_done": 10, "batches_seen": 10300, "examples_seen": 659110, "metrics": {"bleu": 0.0031}, "time_spent": "3:00:25", "loss": 8.711427669525147}}
{"train": {"epochs_done": 10, "batches_seen": 10400, "examples_seen": 665510, "metrics": {"bleu": 0.0025}, "time_spent": "3:02:10", "loss": 8.620444145202637}}
{"train": {"epochs_done": 10, "batches_seen": 10500, "examples_seen": 671910, "metrics": {"bleu": 0.0032}, "time_spent": "3:03:55", "loss": 8.619233655929566}}
{"train": {"epochs_done": 10, "batches_seen": 10600, "examples_seen": 678310, "metrics": {"bleu": 0.003}, "time_spent": "3:05:40", "loss": 8.641641664505006}}
{"train": {"epochs_done": 10, "batches_seen": 10700, "examples_seen": 684710, "metrics": {"bleu": 0.0025}, "time_spent": "3:07:25", "loss": 8.634737930297852}}
{"train": {"epochs_done": 10, "batches_seen": 10800, "examples_seen": 691110, "metrics": {"bleu": 0.0024}, "time_spent": "3:09:10", "loss": 8.645018320083619}}
{"train": {"epochs_done": 10, "batches_seen": 10900, "examples_seen": 697510, "metrics": {"bleu": 0.0033}, "time_spent": "3:10:55", "loss": 8.681361026763916}}
{"train": {"epochs_done": 10, "batches_seen": 11000, "examples_seen": 703910, "metrics": {"bleu": 0.0027}, "time_spent": "3:12:40", "loss": 8.698081283569335}}
{"train": {"epochs_done": 10, "batches_seen": 11100, "examples_seen": 710310, "metrics": {"bleu": 0.0029}, "time_spent": "3:14:25", "loss": 8.686662731170655}}
{"train": {"epochs_done": 10, "batches_seen": 11200, "examples_seen": 716710, "metrics": {"bleu": 0.0034}, "time_spent": "3:16:10", "loss": 8.69352520942688}}
{"train": {"epochs_done": 11, "batches_seen": 11300, "examples_seen": 723101, "metrics": {"bleu": 0.003}, "time_spent": "3:17:55", "loss": 8.706138372421265}}
{"train": {"epochs_done": 11, "batches_seen": 11400, "examples_seen": 729501, "metrics": {"bleu": 0.0029}, "time_spent": "3:19:40", "loss": 8.564194202423096}}
{"train": {"epochs_done": 11, "batches_seen": 11500, "examples_seen": 735901, "metrics": {"bleu": 0.0031}, "time_spent": "3:21:25", "loss": 8.581658983230591}}
{"train": {"epochs_done": 11, "batches_seen": 11600, "examples_seen": 742301, "metrics": {"bleu": 0.0027}, "time_spent": "3:23:10", "loss": 8.605223598480224}}
{"train": {"epochs_done": 11, "batches_seen": 11700, "examples_seen": 748701, "metrics": {"bleu": 0.0031}, "time_spent": "3:24:54", "loss": 8.62191987991333}}
{"train": {"epochs_done": 11, "batches_seen": 11800, "examples_seen": 755101, "metrics": {"bleu": 0.002}, "time_spent": "3:26:39", "loss": 8.631333122253418}}
{"train": {"epochs_done": 11, "batches_seen": 11900, "examples_seen": 761501, "metrics": {"bleu": 0.0028}, "time_spent": "3:28:24", "loss": 8.63178150177002}}
{"train": {"epochs_done": 11, "batches_seen": 12000, "examples_seen": 767901, "metrics": {"bleu": 0.0026}, "time_spent": "3:30:09", "loss": 8.648593225479125}}
{"train": {"epochs_done": 11, "batches_seen": 12100, "examples_seen": 774301, "metrics": {"bleu": 0.0026}, "time_spent": "3:31:54", "loss": 8.66093267440796}}
{"train": {"epochs_done": 11, "batches_seen": 12200, "examples_seen": 780701, "metrics": {"bleu": 0.0028}, "time_spent": "3:33:39", "loss": 8.659905166625977}}
{"train": {"epochs_done": 11, "batches_seen": 12300, "examples_seen": 787101, "metrics": {"bleu": 0.0023}, "time_spent": "3:35:24", "loss": 8.659852457046508}}
{"train": {"epochs_done": 12, "batches_seen": 12400, "examples_seen": 793492, "metrics": {"bleu": 0.0029}, "time_spent": "3:37:09", "loss": 8.568598175048828}}
{"train": {"epochs_done": 12, "batches_seen": 12500, "examples_seen": 799892, "metrics": {"bleu": 0.0025}, "time_spent": "3:38:54", "loss": 8.555956754684448}}
{"train": {"epochs_done": 12, "batches_seen": 12600, "examples_seen": 806292, "metrics": {"bleu": 0.0024}, "time_spent": "3:40:39", "loss": 8.560790424346925}}
{"train": {"epochs_done": 12, "batches_seen": 12700, "examples_seen": 812692, "metrics": {"bleu": 0.0021}, "time_spent": "3:42:24", "loss": 8.57647391319275}}
{"train": {"epochs_done": 12, "batches_seen": 12800, "examples_seen": 819092, "metrics": {"bleu": 0.0024}, "time_spent": "3:44:09", "loss": 8.59372670173645}}
{"train": {"epochs_done": 12, "batches_seen": 12900, "examples_seen": 825492, "metrics": {"bleu": 0.0027}, "time_spent": "3:45:55", "loss": 8.60395185470581}}
{"train": {"epochs_done": 12, "batches_seen": 13000, "examples_seen": 831892, "metrics": {"bleu": 0.0027}, "time_spent": "3:47:40", "loss": 8.603412675857545}}
{"train": {"epochs_done": 12, "batches_seen": 13100, "examples_seen": 838292, "metrics": {"bleu": 0.0027}, "time_spent": "3:49:25", "loss": 8.633497037887572}}
{"train": {"epochs_done": 12, "batches_seen": 13200, "examples_seen": 844692, "metrics": {"bleu": 0.0025}, "time_spent": "3:51:10", "loss": 8.618759517669679}}
{"train": {"epochs_done": 12, "batches_seen": 13300, "examples_seen": 851092, "metrics": {"bleu": 0.003}, "time_spent": "3:52:55", "loss": 8.625259075164795}}
{"train": {"epochs_done": 13, "batches_seen": 13400, "examples_seen": 857483, "metrics": {"bleu": 0.0022}, "time_spent": "3:54:40", "loss": 8.57796597480774}}
{"train": {"epochs_done": 13, "batches_seen": 13500, "examples_seen": 863883, "metrics": {"bleu": 0.0024}, "time_spent": "3:56:25", "loss": 8.518876609802247}}
{"train": {"epochs_done": 13, "batches_seen": 13600, "examples_seen": 870283, "metrics": {"bleu": 0.003}, "time_spent": "3:58:10", "loss": 8.530658340454101}}
{"train": {"epochs_done": 13, "batches_seen": 13700, "examples_seen": 876683, "metrics": {"bleu": 0.0028}, "time_spent": "3:59:55", "loss": 8.538879270553588}}
{"train": {"epochs_done": 13, "batches_seen": 13800, "examples_seen": 883083, "metrics": {"bleu": 0.002}, "time_spent": "4:01:40", "loss": 8.556218814849853}}
{"train": {"epochs_done": 13, "batches_seen": 13900, "examples_seen": 889483, "metrics": {"bleu": 0.0025}, "time_spent": "4:03:25", "loss": 8.577951164245606}}
{"train": {"epochs_done": 13, "batches_seen": 14000, "examples_seen": 895883, "metrics": {"bleu": 0.0027}, "time_spent": "4:05:10", "loss": 8.579949855804443}}
{"train": {"epochs_done": 13, "batches_seen": 14100, "examples_seen": 902283, "metrics": {"bleu": 0.0024}, "time_spent": "4:06:55", "loss": 8.574736528396606}}
{"train": {"epochs_done": 13, "batches_seen": 14200, "examples_seen": 908683, "metrics": {"bleu": 0.0019}, "time_spent": "4:08:40", "loss": 8.600758237838745}}
{"train": {"epochs_done": 13, "batches_seen": 14300, "examples_seen": 915083, "metrics": {"bleu": 0.0023}, "time_spent": "4:10:25", "loss": 8.592985925674439}}
{"train": {"epochs_done": 14, "batches_seen": 14400, "examples_seen": 921474, "metrics": {"bleu": 0.0024}, "time_spent": "4:12:10", "loss": 8.57361834526062}}
{"train": {"epochs_done": 14, "batches_seen": 14500, "examples_seen": 927874, "metrics": {"bleu": 0.0024}, "time_spent": "4:13:55", "loss": 8.482298250198363}}
{"train": {"epochs_done": 14, "batches_seen": 14600, "examples_seen": 934274, "metrics": {"bleu": 0.0028}, "time_spent": "4:15:40", "loss": 8.501545705795287}}
{"train": {"epochs_done": 14, "batches_seen": 14700, "examples_seen": 940674, "metrics": {"bleu": 0.0022}, "time_spent": "4:17:24", "loss": 8.518573694229126}}
{"train": {"epochs_done": 14, "batches_seen": 14800, "examples_seen": 947074, "metrics": {"bleu": 0.0019}, "time_spent": "4:19:09", "loss": 8.514998598098755}}
{"train": {"epochs_done": 14, "batches_seen": 14900, "examples_seen": 953474, "metrics": {"bleu": 0.0026}, "time_spent": "4:20:54", "loss": 8.53709023475647}}
{"train": {"epochs_done": 14, "batches_seen": 15000, "examples_seen": 959874, "metrics": {"bleu": 0.0021}, "time_spent": "4:22:39", "loss": 8.551319189071656}}
{"train": {"epochs_done": 14, "batches_seen": 15100, "examples_seen": 966274, "metrics": {"bleu": 0.0022}, "time_spent": "4:24:26", "loss": 8.549151849746703}}
{"train": {"epochs_done": 14, "batches_seen": 15200, "examples_seen": 972674, "metrics": {"bleu": 0.0027}, "time_spent": "4:26:11", "loss": 8.545798206329346}}
{"train": {"epochs_done": 14, "batches_seen": 15300, "examples_seen": 979074, "metrics": {"bleu": 0.0022}, "time_spent": "4:27:56", "loss": 8.577144899368285}}
{"train": {"epochs_done": 14, "batches_seen": 15400, "examples_seen": 985474, "metrics": {"bleu": 0.0023}, "time_spent": "4:29:41", "loss": 8.58063648223877}}
{"train": {"epochs_done": 15, "batches_seen": 15500, "examples_seen": 991865, "metrics": {"bleu": 0.0026}, "time_spent": "4:31:26", "loss": 8.466574754714966}}
{"train": {"epochs_done": 15, "batches_seen": 15600, "examples_seen": 998265, "metrics": {"bleu": 0.0024}, "time_spent": "4:33:11", "loss": 8.464899950027466}}
{"train": {"epochs_done": 15, "batches_seen": 15700, "examples_seen": 1004665, "metrics": {"bleu": 0.0023}, "time_spent": "4:34:56", "loss": 8.470347948074341}}
{"train": {"epochs_done": 15, "batches_seen": 15800, "examples_seen": 1011065, "metrics": {"bleu": 0.0021}, "time_spent": "4:36:41", "loss": 8.491881122589112}}
{"train": {"epochs_done": 15, "batches_seen": 15900, "examples_seen": 1017465, "metrics": {"bleu": 0.0026}, "time_spent": "4:38:26", "loss": 8.50128631591797}}
{"train": {"epochs_done": 15, "batches_seen": 16000, "examples_seen": 1023865, "metrics": {"bleu": 0.0021}, "time_spent": "4:40:11", "loss": 8.523517770767212}}
{"train": {"epochs_done": 15, "batches_seen": 16100, "examples_seen": 1030265, "metrics": {"bleu": 0.0028}, "time_spent": "4:41:56", "loss": 8.511568994522095}}
{"train": {"epochs_done": 15, "batches_seen": 16200, "examples_seen": 1036665, "metrics": {"bleu": 0.0021}, "time_spent": "4:43:41", "loss": 8.529848451614379}}
{"train": {"epochs_done": 15, "batches_seen": 16300, "examples_seen": 1043065, "metrics": {"bleu": 0.0026}, "time_spent": "4:45:27", "loss": 8.54656413078308}}
{"train": {"epochs_done": 15, "batches_seen": 16400, "examples_seen": 1049465, "metrics": {"bleu": 0.0017}, "time_spent": "4:47:12", "loss": 8.547988777160645}}
{"train": {"epochs_done": 16, "batches_seen": 16500, "examples_seen": 1055856, "metrics": {"bleu": 0.002}, "time_spent": "4:48:57", "loss": 8.462585935592651}}
{"train": {"epochs_done": 16, "batches_seen": 16600, "examples_seen": 1062256, "metrics": {"bleu": 0.0022}, "time_spent": "4:50:42", "loss": 8.430262384414673}}
{"train": {"epochs_done": 16, "batches_seen": 16700, "examples_seen": 1068656, "metrics": {"bleu": 0.0028}, "time_spent": "4:52:27", "loss": 8.462433471679688}}
{"train": {"epochs_done": 16, "batches_seen": 16800, "examples_seen": 1075056, "metrics": {"bleu": 0.0022}, "time_spent": "4:54:12", "loss": 8.467537298202515}}
{"train": {"epochs_done": 16, "batches_seen": 16900, "examples_seen": 1081456, "metrics": {"bleu": 0.0018}, "time_spent": "4:55:57", "loss": 8.469555521011353}}
{"train": {"epochs_done": 16, "batches_seen": 17000, "examples_seen": 1087856, "metrics": {"bleu": 0.0018}, "time_spent": "4:57:42", "loss": 8.487376527786255}}
{"train": {"epochs_done": 16, "batches_seen": 17100, "examples_seen": 1094256, "metrics": {"bleu": 0.0016}, "time_spent": "4:59:27", "loss": 8.498280239105224}}
{"train": {"epochs_done": 16, "batches_seen": 17200, "examples_seen": 1100656, "metrics": {"bleu": 0.002}, "time_spent": "5:01:12", "loss": 8.490436048507691}}
{"train": {"epochs_done": 16, "batches_seen": 17300, "examples_seen": 1107056, "metrics": {"bleu": 0.0022}, "time_spent": "5:02:57", "loss": 8.512251224517822}}
{"train": {"epochs_done": 16, "batches_seen": 17400, "examples_seen": 1113456, "metrics": {"bleu": 0.0025}, "time_spent": "5:04:43", "loss": 8.517051076889038}}
{"train": {"epochs_done": 17, "batches_seen": 17500, "examples_seen": 1119847, "metrics": {"bleu": 0.002}, "time_spent": "5:06:27", "loss": 8.47082347869873}}
{"train": {"epochs_done": 17, "batches_seen": 17600, "examples_seen": 1126247, "metrics": {"bleu": 0.002}, "time_spent": "5:08:13", "loss": 8.405154647827148}}
{"train": {"epochs_done": 17, "batches_seen": 17700, "examples_seen": 1132647, "metrics": {"bleu": 0.0017}, "time_spent": "5:09:58", "loss": 8.430734634399414}}
{"train": {"epochs_done": 17, "batches_seen": 17800, "examples_seen": 1139047, "metrics": {"bleu": 0.0017}, "time_spent": "5:11:43", "loss": 8.442477540969849}}
{"train": {"epochs_done": 17, "batches_seen": 17900, "examples_seen": 1145447, "metrics": {"bleu": 0.0019}, "time_spent": "5:13:28", "loss": 8.437131967544556}}
{"train": {"epochs_done": 17, "batches_seen": 18000, "examples_seen": 1151847, "metrics": {"bleu": 0.0019}, "time_spent": "5:15:13", "loss": 8.441809358596801}}
{"train": {"epochs_done": 17, "batches_seen": 18100, "examples_seen": 1158247, "metrics": {"bleu": 0.0016}, "time_spent": "5:16:58", "loss": 8.461752138137818}}
{"train": {"epochs_done": 17, "batches_seen": 18200, "examples_seen": 1164647, "metrics": {"bleu": 0.0018}, "time_spent": "5:18:43", "loss": 8.462452878952027}}
{"train": {"epochs_done": 17, "batches_seen": 18300, "examples_seen": 1171047, "metrics": {"bleu": 0.0019}, "time_spent": "5:20:28", "loss": 8.489460515975953}}
{"train": {"epochs_done": 17, "batches_seen": 18400, "examples_seen": 1177447, "metrics": {"bleu": 0.0017}, "time_spent": "5:22:14", "loss": 8.478007984161376}}
{"train": {"epochs_done": 18, "batches_seen": 18500, "examples_seen": 1183838, "metrics": {"bleu": 0.0016}, "time_spent": "5:23:59", "loss": 8.477941026687622}}
{"train": {"epochs_done": 18, "batches_seen": 18600, "examples_seen": 1190238, "metrics": {"bleu": 0.0015}, "time_spent": "5:25:44", "loss": 8.385605058670045}}
{"train": {"epochs_done": 18, "batches_seen": 18700, "examples_seen": 1196638, "metrics": {"bleu": 0.0017}, "time_spent": "5:27:29", "loss": 8.38108395576477}}
{"train": {"epochs_done": 18, "batches_seen": 18800, "examples_seen": 1203038, "metrics": {"bleu": 0.0019}, "time_spent": "5:29:14", "loss": 8.403089227676391}}
{"train": {"epochs_done": 18, "batches_seen": 18900, "examples_seen": 1209438, "metrics": {"bleu": 0.0014}, "time_spent": "5:30:59", "loss": 8.416984882354736}}
{"train": {"epochs_done": 18, "batches_seen": 19000, "examples_seen": 1215838, "metrics": {"bleu": 0.002}, "time_spent": "5:32:44", "loss": 8.441798295974731}}
{"train": {"epochs_done": 18, "batches_seen": 19100, "examples_seen": 1222238, "metrics": {"bleu": 0.0016}, "time_spent": "5:34:29", "loss": 8.427223596572876}}
{"train": {"epochs_done": 18, "batches_seen": 19200, "examples_seen": 1228638, "metrics": {"bleu": 0.0023}, "time_spent": "5:36:15", "loss": 8.445520982742309}}
{"train": {"epochs_done": 18, "batches_seen": 19300, "examples_seen": 1235038, "metrics": {"bleu": 0.0022}, "time_spent": "5:38:00", "loss": 8.456453838348388}}
{"train": {"epochs_done": 18, "batches_seen": 19400, "examples_seen": 1241438, "metrics": {"bleu": 0.0021}, "time_spent": "5:39:45", "loss": 8.449989824295043}}
{"train": {"epochs_done": 18, "batches_seen": 19500, "examples_seen": 1247838, "metrics": {"bleu": 0.0018}, "time_spent": "5:41:30", "loss": 8.461289339065551}}
{"train": {"epochs_done": 19, "batches_seen": 19600, "examples_seen": 1254229, "metrics": {"bleu": 0.0022}, "time_spent": "5:43:15", "loss": 8.371300048828125}}
{"train": {"epochs_done": 19, "batches_seen": 19700, "examples_seen": 1260629, "metrics": {"bleu": 0.0021}, "time_spent": "5:45:01", "loss": 8.360338172912599}}
{"train": {"epochs_done": 19, "batches_seen": 19800, "examples_seen": 1267029, "metrics": {"bleu": 0.0018}, "time_spent": "5:46:46", "loss": 8.369845895767211}}
{"train": {"epochs_done": 19, "batches_seen": 19900, "examples_seen": 1273429, "metrics": {"bleu": 0.0018}, "time_spent": "5:48:31", "loss": 8.383654594421387}}
{"train": {"epochs_done": 19, "batches_seen": 20000, "examples_seen": 1279829, "metrics": {"bleu": 0.0014}, "time_spent": "5:50:16", "loss": 8.37829418182373}}
{"train": {"epochs_done": 19, "batches_seen": 20100, "examples_seen": 1286229, "metrics": {"bleu": 0.0012}, "time_spent": "5:52:01", "loss": 8.411508951187134}}
{"train": {"epochs_done": 19, "batches_seen": 20200, "examples_seen": 1292629, "metrics": {"bleu": 0.0017}, "time_spent": "5:53:46", "loss": 8.399707021713256}}
{"train": {"epochs_done": 19, "batches_seen": 20300, "examples_seen": 1299029, "metrics": {"bleu": 0.0015}, "time_spent": "5:55:31", "loss": 8.432512350082398}}
{"train": {"epochs_done": 19, "batches_seen": 20400, "examples_seen": 1305429, "metrics": {"bleu": 0.0017}, "time_spent": "5:57:16", "loss": 8.444122066497803}}
{"train": {"epochs_done": 19, "batches_seen": 20500, "examples_seen": 1311829, "metrics": {"bleu": 0.002}, "time_spent": "5:59:01", "loss": 8.445026168823242}}
2018-10-17 07:41:32.493 INFO in 'deeppavlov.core.commands.train'['train'] at line 511: Saving model
2018-10-17 07:41:32.497 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 52: [saving model to C:\Users\CAU\AppData\Roaming\Python\Python36\site-packages\download\YOUR_PATH_TO_WORKING_DIR\model]
2018-10-17 07:41:33.131 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 97: [loading vocabulary from C:\Users\CAU\AppData\Roaming\Python\Python36\site-packages\download\YOUR_PATH_TO_WORKING_DIR\vocab.dict]
2018-10-17 07:41:37.957 INFO in 'deeppavlov.core.models.tf_model'['tf_model'] at line 43: [loading model from C:\Users\CAU\AppData\Roaming\Python\Python36\site-packages\download\YOUR_PATH_TO_WORKING_DIR\model]
INFO:tensorflow:Restoring parameters from C:\Users\CAU\AppData\Roaming\Python\Python36\site-packages\download\YOUR_PATH_TO_WORKING_DIR\model
2018-10-17 07:41:37.976 INFO in 'tensorflow'['tf_logging'] at line 115: Restoring parameters from C:\Users\CAU\AppData\Roaming\Python\Python36\site-packages\download\YOUR_PATH_TO_WORKING_DIR\model
2018-10-17 07:41:38.96 INFO in 'deeppavlov.core.commands.train'['train'] at line 195: Testing the best saved model
{"valid": {"eval_examples_count": 7801, "metrics": {"bleu": 0.0011}, "time_spent": "0:00:20"}}
{"test": {"eval_examples_count": 7512, "metrics": {"bleu": 0.0012}, "time_spent": "0:00:18"}}
{'test': OrderedDict([('bleu', 0.0012)]),
 'valid': OrderedDict([('bleu', 0.0011)])}

To improve the model, you can try a multilayer encoder and decoder (for example, built with `MultiRNNCell`), or replace the dot-product scoring function with an attention mechanism that has trainable parameters.
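
As a rough illustration of the attention suggestion, here is a minimal NumPy sketch that contrasts dot-product scoring with an additive scoring function of the form `v^T tanh(W_enc h + W_dec s)`. It only shows the forward computation and is not the tutorial's TensorFlow implementation. The names `W_enc`, `W_dec`, `v` and the attention size are assumptions made for this example; in a real model they would be trainable variables learned together with the encoder and decoder.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def dot_product_attention(encoder_states, decoder_state):
    # encoder_states: [batch, time, hidden], decoder_state: [batch, hidden]
    # Scoring has no parameters: score_t = <h_t, s>.
    scores = np.einsum('bth,bh->bt', encoder_states, decoder_state)
    weights = softmax(scores, axis=-1)
    context = np.einsum('bt,bth->bh', weights, encoder_states)
    return context, weights


def additive_attention(encoder_states, decoder_state, W_enc, W_dec, v):
    # Scoring with parameters (hypothetical names):
    # score_t = v^T tanh(W_enc h_t + W_dec s)
    # W_enc, W_dec: [hidden, attn_size], v: [attn_size]
    projected = np.tanh(
        encoder_states @ W_enc + (decoder_state @ W_dec)[:, None, :]
    )                                    # [batch, time, attn_size]
    scores = projected @ v               # [batch, time]
    weights = softmax(scores, axis=-1)
    context = np.einsum('bt,bth->bh', weights, encoder_states)
    return context, weights


# Toy shapes, chosen only for the demonstration.
rng = np.random.default_rng(0)
batch_size, time_steps, hidden_size, attn_size = 2, 5, 8, 4
encoder_states = rng.normal(size=(batch_size, time_steps, hidden_size))
decoder_state = rng.normal(size=(batch_size, hidden_size))
W_enc = rng.normal(size=(hidden_size, attn_size))
W_dec = rng.normal(size=(hidden_size, attn_size))
v = rng.normal(size=(attn_size,))

context, weights = additive_attention(
    encoder_states, decoder_state, W_enc, W_dec, v
)
print(context.shape, weights.shape)
```

Both functions return a context vector per batch item and the attention weights over encoder time steps. The only difference is how the scores are computed, so the additive variant could replace the dot-product one without changing the rest of the decoder.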