# shorturl.at/pzBR7
# TG: @YallenGusev
# E-mail: ilya.gusev@phystech.edu


Based on: https://github.com/DanAnastasyev/DeepNLP-Course Week 12

In [None]:
!git clone https://github.com/MiuLab/SlotGated-SLU.git
!wget -qq https://raw.githubusercontent.com/yandexdataschool/nlp_course/master/week08_multitask/conlleval.py



In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
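# Use the GPU when available, otherwise fall back to the CPU
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')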

# Dialogue systems

Dialogue systems fall into two types: *goal-oriented* and *general conversation*.

**General conversation** systems are chit-chat bots that talk about any topic:  
<img src="https://i.ibb.co/bFwwGpc/alice.jpg" width="200"/>

Today we will focus on **goal-oriented** systems instead:

<img src="https://hsto.org/webt/gj/3y/xl/gj3yxlqbr7ujuqr9r2akacxmkee.jpeg" width="600"/>

*From [Как устроена Алиса (How Alice works)](https://habr.com/company/yandex/blog/349372/)*

The user says something, and the speech is recognized. From the recognized text, the system determines what the user wanted, along with where and when. Next, the dialogue engine decides whether it is really clear what the user wanted to ask for. The system then queries its sources for the information the user (apparently) requested. Based on all of this, it generates a response:

<img src="https://i.ibb.co/8XcdpJ7/goal-orientied.png" width="600"/>

*From [Как устроена Алиса (How Alice works)](https://habr.com/company/yandex/blog/349372/)*

We will train the part in the middle: the intent classifier and the slot tagger. Everything else is usually heuristics and hard-coded responses.
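Here is a minimal rule-based sketch of what could sit after the NLU models, to make "hard-coded responses" concrete. All intents, slot names, prompts and templates are hypothetical and chosen to look like ATIS. The function takes the predicted intent and the extracted slots. It asks for the first missing required slot, or otherwise fills a response template.

```python
from typing import Dict

# Hypothetical dialogue-manager configuration, for illustration only
REQUIRED_SLOTS = {"atis_flight": ["fromloc.city_name", "toloc.city_name"]}
PROMPTS = {
    "fromloc.city_name": "Where are you flying from?",
    "toloc.city_name": "Where are you flying to?",
}
TEMPLATES = {"atis_flight": "Looking for flights from {fromloc.city_name} to {toloc.city_name}."}


def respond(intent: str, slots: Dict[str, str]) -> str:
    """Pick a reply from the predicted intent and the slots found by the tagger."""
    if intent not in REQUIRED_SLOTS:
        return "Sorry, I can't help with that yet."
    # Ask for the first required slot the tagger did not find
    for slot in REQUIRED_SLOTS[intent]:
        if slot not in slots:
            return PROMPTS[slot]
    # Plain string replacement: str.format would treat the dots in slot names
    # as attribute access
    reply = TEMPLATES[intent]
    for slot, value in slots.items():
        reply = reply.replace("{" + slot + "}", value)
    return reply


print(respond("atis_flight", {"fromloc.city_name": "denver", "toloc.city_name": "san francisco"}))
print(respond("atis_flight", {"fromloc.city_name": "denver"}))
```

The NLU models only produce the `(intent, slots)` pair. The decision logic after them can stay this simple.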

## Data

There is a more or less standard dataset, ATIS, which is actually embarrassingly small.

We can also add the SNIPS dataset, which is larger and more diverse.

We take both datasets from the repository of the paper [Slot-Gated Modeling for Joint Slot Filling and Intent Prediction](http://aclweb.org/anthology/N18-2118).

Let's start with ATIS.

In [None]:
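import os

def read_dataset(path):
    # Each split directory holds three line-aligned files:
    # seq.in (tokens), seq.out (BIO slot tags) and label (one intent per line)
    with open(os.path.join(path, 'seq.in')) as f_words, \
            open(os.path.join(path, 'seq.out')) as f_tags, \
            open(os.path.join(path, 'label')) as f_intents:

        # One (tokens, tags, intent) triple per sentence
        return [
            (words.strip().split(), tags.strip().split(), intent.strip())
            for words, tags, intent in zip(f_words, f_tags, f_intents)
        ]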
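`read_dataset` zips three files line by line, and `zip` silently stops at the shortest one. A broken file therefore would not raise an error. A cheap sanity check is to verify that every sentence has exactly one tag per token. `find_misaligned` below is a hypothetical helper, not part of the original notebook. In the notebook it could be run as `assert not find_misaligned(train_data)`.

```python
from typing import List, Tuple

# (tokens, BIO tags, intent): the triple produced by read_dataset
Sample = Tuple[List[str], List[str], str]


def find_misaligned(data: List[Sample]) -> List[int]:
    """Return the indices of samples whose token and tag counts differ."""
    return [i for i, (words, tags, _) in enumerate(data) if len(words) != len(tags)]


toy_data = [
    (["list", "flights", "to", "boston"], ["O", "O", "O", "B-toloc.city_name"], "atis_flight"),
    (["flights", "to", "new", "york"], ["O", "O", "B-toloc.city_name"], "atis_flight"),  # one tag missing
]
print(find_misaligned(toy_data))  # [1]
```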

In [None]:
train_data = read_dataset('SlotGated-SLU/data/atis/train/')
val_data = read_dataset('SlotGated-SLU/data/atis/valid/')
test_data = read_dataset('SlotGated-SLU/data/atis/test/')

In [None]:
intent_to_example = {example[2]: example for example in train_data}
for example in intent_to_example.values():
    print('Intent:\t', example[2])
    print('Text:\t', '\t'.join(example[0]))
    print('Tags:\t', '\t'.join(example[1]))
    print()

Intent:	 atis_flight
Text:	 is	there	a	delta	flight	from	denver	to	san	francisco
Tags:	 O	O	O	B-airline_name	O	O	B-fromloc.city_name	O	B-toloc.city_name	I-toloc.city_name

Intent:	 atis_airfare
Text:	 what	is	the	most	expensive	one	way	fare	from	boston	to	atlanta	on	american	airlines
Tags:	 O	O	O	B-cost_relative	I-cost_relative	B-round_trip	I-round_trip	O	O	B-fromloc.city_name	O	B-toloc.city_name	O	B-airline_name	I-airline_name

Intent:	 atis_airline
Text:	 list	airlines	serving	between	denver	and	san	francisco
Tags:	 O	O	O	O	B-fromloc.city_name	O	B-toloc.city_name	I-toloc.city_name

Intent:	 atis_ground_service
Text:	 tell	me	about	ground	transportation	between	orlando	international	and	orlando
Tags:	 O	O	O	O	O	O	B-fromloc.airport_name	I-fromloc.airport_name	O	B-toloc.city_name

Intent:	 atis_quantity
Text:	 how	many	airlines	have	flights	with	service	class	yn
Tags:	 O	O	O	O	O	O	O	O	B-fare_basis_code

Intent:	 atis_city
Text:	 where	is	lester	pearson	airport
Tags:	 O	O	B-airport_name	
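Slot values use the BIO scheme:

* `B-x` begins a span of slot `x`.
* `I-x` continues that span.
* `O` marks tokens outside any slot.

The small decoder below turns a tag sequence back into slot/value pairs. It is a hypothetical helper, not used elsewhere in the notebook, and here it runs on the first ATIS example above. One convention is assumed: a stray `I-` tag that does not continue a span of the same type starts a new span.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (slot_name, slot_value) pairs."""
    spans = []
    current_type, current_words = None, []
    for token, tag in zip(tokens, tags):
        # I- of the same type extends the open span
        if tag.startswith("I-") and current_type == tag[2:]:
            current_words.append(token)
            continue
        # Anything else closes the open span, if there is one
        if current_type is not None:
            spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
        # B- (or a stray I-) opens a new span
        if tag.startswith(("B-", "I-")):
            current_type, current_words = tag[2:], [token]
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


tokens = "is there a delta flight from denver to san francisco".split()
tags = ("O O O B-airline_name O O B-fromloc.city_name O "
        "B-toloc.city_name I-toloc.city_name").split()
print(bio_to_spans(tokens, tags))
# [('airline_name', 'delta'), ('fromloc.city_name', 'denver'), ('toloc.city_name', 'san francisco')]
```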

In [None]:
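# Legacy torchtext API: in torchtext 0.9-0.11 these classes live in torchtext.legacy.data,
# and later releases removed them
from torchtext.data import Field, LabelField, Example, Dataset, BucketIterator

tokens_field = Field()
# No '<unk>' for tags, so '<pad>' gets index 0 (the code below relies on this)
tags_field = Field(unk_token=None)
# Intents are single labels without special tokens
intent_field = LabelField()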

fields = [('tokens', tokens_field), ('tags', tags_field), ('intent', intent_field)]

train_dataset = Dataset([Example.fromlist(example, fields) for example in train_data], fields)
val_dataset = Dataset([Example.fromlist(example, fields) for example in val_data], fields)
test_dataset = Dataset([Example.fromlist(example, fields) for example in test_data], fields)

tokens_field.build_vocab(train_dataset)
tags_field.build_vocab(train_dataset)
intent_field.build_vocab(train_dataset)

print('Vocab size =', len(tokens_field.vocab))
print('Tags count =', len(tags_field.vocab))
print('Intents count =', len(intent_field.vocab))

# Batches are time-major by default: batch.tokens has shape (seq_len, batch_size)
train_iter, val_iter, test_iter = BucketIterator.splits(
    datasets=(train_dataset, val_dataset, test_dataset), batch_sizes=(32, 128, 128), 
    shuffle=True, device=DEVICE, sort=False
)

Vocab size = 869
Tags count = 121
Intents count = 21


The same for SNIPS:

In [None]:
snips_train_data = read_dataset('SlotGated-SLU/data/snips/train/')
snips_val_data = read_dataset('SlotGated-SLU/data/snips/valid/')
snips_test_data = read_dataset('SlotGated-SLU/data/snips/test/')
snips_intent_to_example = {example[2]: example for example in snips_train_data}
for example in snips_intent_to_example.values():
    print('Intent:\t', example[2])
    print('Text:\t', '\t'.join(example[0]))
    print('Tags:\t', '\t'.join(example[1]))
    print()

Intent:	 PlayMusic
Text:	 play	funky	heavy	bluesy
Tags:	 O	B-playlist	I-playlist	I-playlist

Intent:	 AddToPlaylist
Text:	 add	gabrial	mcnair	to	my	love	in	paris	list
Tags:	 O	B-artist	I-artist	O	B-playlist_owner	B-playlist	I-playlist	I-playlist	O

Intent:	 RateBook
Text:	 rate	richard	carvel	4	out	of	6
Tags:	 O	B-object_name	I-object_name	B-rating_value	O	O	B-best_rating

Intent:	 SearchScreeningEvent
Text:	 can	i	get	the	movie	schedule	for	loews	cineplex	entertainment
Tags:	 O	O	O	O	B-object_type	I-object_type	O	B-location_name	I-location_name	I-location_name

Intent:	 BookRestaurant
Text:	 i	want	to	eat	choucroute	at	a	brasserie	for	8
Tags:	 O	O	O	O	B-served_dish	O	O	B-restaurant_type	O	B-party_size_number

Intent:	 GetWeather
Text:	 tell	me	when	it	ll	be	chillier	in	cavalero	corner	id
Tags:	 O	O	O	O	O	O	B-condition_temperature	O	B-city	I-city	B-state

Intent:	 SearchCreativeWork
Text:	 go	to	the	photograph	the	inflated	tear
Tags:	 O	O	O	B-object_type	B-object_name	I-object_name	I-o

In [None]:
from torchtext.data import Field, LabelField, Example, Dataset, BucketIterator

snips_tokens_field = Field()
snips_tags_field = Field(unk_token=None)
snips_intent_field = LabelField()

fields = [('tokens', snips_tokens_field), ('tags', snips_tags_field), ('intent', snips_intent_field)]

snips_train_dataset = Dataset([Example.fromlist(example, fields) for example in snips_train_data], fields)
snips_val_dataset = Dataset([Example.fromlist(example, fields) for example in snips_val_data], fields)
snips_test_dataset = Dataset([Example.fromlist(example, fields) for example in snips_test_data], fields)

snips_tokens_field.build_vocab(snips_train_dataset)
snips_tags_field.build_vocab(snips_train_dataset)
snips_intent_field.build_vocab(snips_train_dataset)

print('Vocab size =', len(snips_tokens_field.vocab))
print('Tags count =', len(snips_tags_field.vocab))
print('Intents count =', len(snips_intent_field.vocab))

snips_train_iter, snips_val_iter, snips_test_iter = BucketIterator.splits(
    datasets=(snips_train_dataset, snips_val_dataset, snips_test_dataset), batch_sizes=(32, 128, 128), 
    shuffle=True, device=DEVICE, sort=False
)

Vocab size = 11420
Tags count = 73
Intents count = 7


## Intent classifier

Let's start with the classifier, which decides which intent a given query belongs to.

Nothing fancy: we take an RNN and train it to predict intent labels.
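The classifier turns a whole sentence into one vector by concatenating the final hidden states of a bidirectional LSTM. For `nn.LSTM`, `h_n` has shape `(num_layers * 2, batch, hidden)`. Layers are ordered bottom to top, and within each layer the forward direction comes before the backward one. The top layer's states are therefore `h_n[-2]` (forward) and `h_n[-1]` (backward). Indices `0` and `1` coincide with them only when `num_layers=1`. The check below runs on random, unpadded input with hypothetical sizes. The forward final state equals the last time step of `output`, and the backward final state equals the first one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, emb_dim, hidden, num_layers = 3, 5, 8, 4, 2
lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True, num_layers=num_layers)
x = torch.randn(batch, seq_len, emb_dim)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([3, 5, 8]): (batch, seq_len, 2 * hidden), top layer only
print(h_n.shape)     # torch.Size([4, 3, 4]): (num_layers * 2, batch, hidden)

# Final states of the top layer
top_forward, top_backward = h_n[-2], h_n[-1]
# The forward direction ends at the last step, the backward direction at the first
assert torch.allclose(top_forward, output[:, -1, :hidden], atol=1e-6)
assert torch.allclose(top_backward, output[:, 0, hidden:], atol=1e-6)

# Sentence vector for a classifier head: (batch, 2 * hidden)
sentence_repr = torch.cat((top_forward, top_backward), dim=1)
```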

In [None]:
class IntentClassifierModel(nn.Module):
    def __init__(self, vocab_size, intents_count, emb_dim=64,
                 lstm_hidden_dim=128, num_layers=1, dropout_p=0.2):
        super().__init__()

        self.embeddings_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.lstm_layer = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True,
                                  bidirectional=True, num_layers=num_layers)
        self.out_layer = nn.Linear(lstm_hidden_dim * 2, intents_count)

    def forward(self, inputs):
        # inputs: (batch, seq_len) -> projections: (batch, seq_len, emb_dim)
        projections = self.embeddings_layer(inputs)
        # final_hidden_state: (num_layers * 2, batch, lstm_hidden_dim)
        _, (final_hidden_state, _) = self.lstm_layer(projections)
        # Concatenate the forward and backward final states of the top layer
        hidden = torch.cat((final_hidden_state[-2], final_hidden_state[-1]), dim=1)
        return self.out_layer(self.dropout(hidden))
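The models in this notebook feed padded batches straight into the LSTM. For shorter sentences, the forward direction keeps reading `<pad>` positions before its final state is taken. Packing the batch with `pack_padded_sequence` stops each direction at the real sentence length. In the notebook this would need the lengths, for example from `Field(include_lengths=True)`, which makes `batch.tokens` a `(tensor, lengths)` pair. That change is not made here. The sketch below uses random data and hypothetical sizes. It checks that packed final states match running the short sentence alone.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
emb_dim, hidden = 6, 4
lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

# Two "sentences" padded to length 5; the second has 3 real tokens and 2 pad positions
x = torch.randn(2, 5, emb_dim)
x[1, 3:] = 0.0  # stand-in for padding embeddings
lengths = torch.tensor([5, 3])  # lengths must live on the CPU

# Padded input: the forward direction of sentence 2 also reads the two pad steps
_, (h_padded, _) = lstm(x)

# Packed input: each direction stops at the sentence's real length;
# with enforce_sorted=False the hidden states come back in the original batch order
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
_, (h_packed, _) = lstm(packed)

# Reference: sentence 2 alone, without padding
_, (h_ref, _) = lstm(x[1:2, :3])

print(torch.allclose(h_packed[:, 1], h_ref[:, 0], atol=1e-5))  # True
print(torch.allclose(h_padded[0, 1], h_ref[0, 0], atol=1e-5))  # False: the forward state saw the pads
```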

In [None]:
class ModelTrainer():
    def __init__(self, model, criterion, optimizer):
        self.model = model
        self.criterion = criterion
        self.optimizer = optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.correct_count, self.total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self):
        return '{:>5s} Loss = {:.5f}, Accuracy = {:.2%}'.format(
            self.name, self.epoch_loss / self.batches_count, self.correct_count / self.total_count
        )
        
    def on_batch(self, batch):
        logits = self.model(batch.tokens.transpose(0, 1))
        loss = self.criterion(logits, batch.intent)
        predicted_intent = torch.max(logits, axis=1)[1]
        self.total_count += predicted_intent.size(0)
        self.correct_count += torch.sum(predicted_intent == batch.intent).item()
        if self.is_train:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()
        self.epoch_loss += loss.item()

In [None]:
from tqdm import tqdm
tqdm.get_lock().locks = []


def do_epoch(trainer, data_iter, is_train, name=None):
    trainer.on_epoch_begin(is_train, name, batches_count=len(data_iter))

    # Gradients are only needed during training
    with torch.autograd.set_grad_enabled(is_train):
        with tqdm(total=trainer.batches_count, desc=name) as progress_bar:
            for batch in data_iter:
                # on_batch updates the trainer's running counters and returns nothing
                trainer.on_batch(batch)
                progress_bar.update()

            # Replace the description with the epoch summary (loss, accuracy)
            progress_bar.set_description(trainer.on_epoch_end())
            progress_bar.refresh()


def fit(trainer, train_iter, epochs_count=1, val_iter=None):
    for epoch in range(epochs_count):
        name_prefix = '[{} / {}] '.format(epoch + 1, epochs_count)
        do_epoch(trainer, train_iter, is_train=True, name=name_prefix + 'Train:')

        if val_iter is not None:
            do_epoch(trainer, val_iter, is_train=False, name=name_prefix + '  Val:')
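`fit` trains for a fixed number of epochs. In the classifier training log that follows, the validation loss stops improving after epoch 4 (0.19172, then 0.19198). A common remedy is early stopping: keep the weights with the best validation loss and stop after `patience` epochs without improvement. The helper below is a hypothetical sketch. In the notebook it could be called right after the validation `do_epoch` as `stopper.step(trainer.epoch_loss / trainer.batches_count, model.state_dict())`. The best weights could then be restored with `model.load_state_dict(stopper.best_state)`.

```python
import copy
from typing import Optional


class EarlyStopping:
    """Track the best validation loss and signal when to stop (hypothetical helper)."""

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best_loss: Optional[float] = None
        self.best_state = None
        self.bad_epochs = 0

    def step(self, val_loss: float, model_state=None) -> bool:
        """Record one epoch; return True when training should stop."""
        if self.best_loss is None or val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
            # deepcopy freezes a snapshot (e.g. of model.state_dict())
            self.best_state = copy.deepcopy(model_state)
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=1)
stopped_epoch = None
# Validation losses of the intent classifier for epochs 1-5, copied from the log
for epoch, val_loss in enumerate([0.58326, 0.34589, 0.25908, 0.19172, 0.19198], start=1):
    if stopper.step(val_loss):
        stopped_epoch = epoch
        break
print(stopped_epoch, stopper.best_loss)  # 5 0.19172
```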

In [None]:
model = IntentClassifierModel(vocab_size=len(tokens_field.vocab), intents_count=len(intent_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
optimizer = optim.Adam(model.parameters())
trainer = ModelTrainer(model, criterion, optimizer)
fit(trainer, train_iter, epochs_count=30, val_iter=val_iter)

[1 / 30] Train: Loss = 0.87404, Accuracy = 79.19%: 100%|██████████| 140/140 [00:01<00:00, 80.54it/s]
[1 / 30]   Val: Loss = 0.58326, Accuracy = 84.80%: 100%|██████████| 4/4 [00:00<00:00, 126.56it/s]
[2 / 30] Train: Loss = 0.35541, Accuracy = 91.07%: 100%|██████████| 140/140 [00:01<00:00, 112.27it/s]
[2 / 30]   Val: Loss = 0.34589, Accuracy = 90.40%: 100%|██████████| 4/4 [00:00<00:00, 127.95it/s]
[3 / 30] Train: Loss = 0.22426, Accuracy = 94.31%: 100%|██████████| 140/140 [00:01<00:00, 105.39it/s]
[3 / 30]   Val: Loss = 0.25908, Accuracy = 92.80%: 100%|██████████| 4/4 [00:00<00:00, 98.32it/s] 
[4 / 30] Train: Loss = 0.14888, Accuracy = 96.45%: 100%|██████████| 140/140 [00:01<00:00, 112.06it/s]
[4 / 30]   Val: Loss = 0.19172, Accuracy = 95.60%: 100%|██████████| 4/4 [00:00<00:00, 130.37it/s]
[5 / 30] Train: Loss = 0.10044, Accuracy = 97.68%: 100%|██████████| 140/140 [00:01<00:00, 105.45it/s]
[5 / 30]   Val: Loss = 0.19198, Accuracy = 95.60%: 100%|██████████| 4/4 [00:00<00:00, 105.77it/s]

In [None]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.33216, Accuracy = 95.07%: 100%|██████████| 7/7 [00:00<00:00, 129.41it/s]


## Tagger

![](https://commons.bmstu.wiki/images/0/00/NER1.png)  
*From [NER](https://ru.bmstu.wiki/NER_(Named-Entity_Recognition))*

#### **Task 1.1**
Write a simple tagger.

In [None]:
class TokenTaggerModel(nn.Module):
    def __init__(self, vocab_size, tags_count, emb_dim=64,
                 lstm_hidden_dim=128, num_layers=1, dropout_p=0.2):
        super().__init__()

        self.embeddings_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.lstm_layer = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True,
                                  bidirectional=True, num_layers=num_layers)
        self.out_layer = nn.Linear(lstm_hidden_dim * 2, tags_count)

    def forward(self, inputs):
        # inputs: (batch, seq_len) -> projections: (batch, seq_len, emb_dim)
        projections = self.embeddings_layer(inputs)
        # output: (batch, seq_len, 2 * lstm_hidden_dim), one state per token
        output, _ = self.lstm_layer(projections)
        # Per-token tag logits: (batch, seq_len, tags_count)
        return self.out_layer(self.dropout(output))

#### **Task 1.2**
Update `ModelTrainer`: we still need the loss and accuracy, but now they are computed slightly differently, per token and ignoring padding.
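Token accuracy should only count real tokens. A tempting shortcut is to count all matches and subtract the number of padding positions. That is only right if every pad position is predicted as `<pad>`. The toy batch below is hypothetical. It compares that shortcut with an explicit mask and shows `ignore_index`, which excludes pad positions from the loss. Do not reuse a criterion with `ignore_index=0` for intents. `LabelField` has no special tokens, so intent index 0 is a real (the most frequent) intent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

PAD_IDX = 0  # '<pad>' index in a tags Field built with unk_token=None

# Toy batch: 2 sentences x 4 positions, 3 tag classes; sentence 2 ends with 2 pads
true_tags = torch.tensor([[1, 2, 1, 2],
                          [2, 1, PAD_IDX, PAD_IDX]])
predicted = torch.tensor([[1, 2, 2, 2],
                          [2, 1, PAD_IDX, 1]])  # one real error, one pad predicted as a tag
logits = F.one_hot(predicted, num_classes=3).float() * 5.0  # (batch, seq_len, n_tags)

mask = true_tags != PAD_IDX
matches = logits.argmax(dim=2) == true_tags

# Masked accuracy: 5 correct real tokens out of 6
masked_acc = (matches & mask).sum().item() / mask.sum().item()
# Shortcut "all matches minus number of pads": wrong when a pad is mispredicted
shortcut_acc = (matches.sum().item() - (~mask).sum().item()) / mask.sum().item()
print(masked_acc, shortcut_acc)  # 0.8333333333333334 0.6666666666666666

# Loss that ignores pad positions; CrossEntropyLoss wants (batch, n_classes, seq_len)
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
loss = criterion(logits.transpose(1, 2), true_tags)
# Same value as averaging the per-token losses over real tokens only
per_token = nn.CrossEntropyLoss(reduction='none')(logits.transpose(1, 2), true_tags)
assert torch.allclose(loss, per_token[mask].mean())
```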

In [None]:
class TagModelTrainer():
    def __init__(self, model, criterion, optimizer):
        self.model = model
        self.criterion = criterion
        self.optimizer = optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.correct_count, self.total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self):
        return '{:>5s} Loss = {:.5f}, Accuracy = {:.2%}'.format(
            self.name, self.epoch_loss / self.batches_count, self.correct_count / self.total_count
        )
        
    def on_batch(self, batch):
        # BucketIterator yields (seq_len, batch); the model expects (batch, seq_len)
        logits = self.model(batch.tokens.transpose(0, 1))
        true_tags = batch.tags.transpose(0, 1)
        # CrossEntropyLoss expects the class dimension second: (batch, tags_count, seq_len)
        loss = self.criterion(logits.transpose(1, 2), true_tags)
        predicted_tags = logits.argmax(dim=2)
        # Count only real tokens: index 0 is '<pad>' in tags_field
        mask = true_tags != 0
        self.correct_count += ((predicted_tags == true_tags) & mask).sum().item()
        self.total_count += mask.sum().item()
        if self.is_train:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()
        self.epoch_loss += loss.item()

In [None]:
model = TokenTaggerModel(vocab_size=len(tokens_field.vocab), tags_count=len(tags_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
optimizer = optim.Adam(model.parameters())
trainer = TagModelTrainer(model, criterion, optimizer)
fit(trainer, train_iter, epochs_count=30, val_iter=val_iter)

[1 / 30] Train: Loss = 1.03919, Accuracy = 64.03%: 100%|██████████| 140/140 [00:01<00:00, 106.05it/s]
[1 / 30]   Val: Loss = 0.34103, Accuracy = 80.67%: 100%|██████████| 4/4 [00:00<00:00, 99.23it/s] 
[2 / 30] Train: Loss = 0.27908, Accuracy = 88.28%: 100%|██████████| 140/140 [00:01<00:00, 109.38it/s]
[2 / 30]   Val: Loss = 0.14839, Accuracy = 91.39%: 100%|██████████| 4/4 [00:00<00:00, 123.02it/s]
[3 / 30] Train: Loss = 0.13520, Accuracy = 94.30%: 100%|██████████| 140/140 [00:01<00:00, 108.23it/s]
[3 / 30]   Val: Loss = 0.09235, Accuracy = 94.56%: 100%|██████████| 4/4 [00:00<00:00, 106.35it/s]
[4 / 30] Train: Loss = 0.08534, Accuracy = 96.41%: 100%|██████████| 140/140 [00:01<00:00, 112.10it/s]
[4 / 30]   Val: Loss = 0.07241, Accuracy = 96.05%: 100%|██████████| 4/4 [00:00<00:00, 114.98it/s]
[5 / 30] Train: Loss = 0.05922, Accuracy = 97.66%: 100%|██████████| 140/140 [00:01<00:00, 109.40it/s]
[5 / 30]   Val: Loss = 0.05807, Accuracy = 96.90%: 100%|██████████| 4/4 [00:00<00:00, 107.36it/s]


In [None]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.06674, Accuracy = 97.49%: 100%|██████████| 7/7 [00:00<00:00, 122.93it/s]


In [None]:
from conlleval import evaluate

def eval_tagger(model, test_iter):
    true_seqs, pred_seqs = [], []
    itos = tags_field.vocab.itos

    model.eval()
    with torch.no_grad():
        for batch in test_iter:
            pred = model(batch.tokens.transpose(0, 1)).argmax(dim=2).cpu().tolist()
            true = batch.tags.transpose(0, 1).cpu().tolist()
            for pred_row, true_row in zip(pred, true):
                # Keep only real tokens (gold tag != '<pad>', index 0) so both sequences
                # stay aligned; a predicted '<pad>' on a real token counts as 'O'
                pairs = [(t, p) for t, p in zip(true_row, pred_row) if t != 0]
                true_seqs.append(" ".join(itos[t] for t, _ in pairs))
                pred_seqs.append(" ".join(itos[p] if p != 0 else 'O' for _, p in pairs))

    print('Precision = {:.2f}%, Recall = {:.2f}%, F1 = {:.2f}%'.format(*evaluate(true_seqs, pred_seqs, verbose=False)))

eval_tagger(model, test_iter)

Precision = 93.65%, Recall = 94.28%, F1 = 93.97%
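Token accuracy and the F1 reported by `conlleval` measure different things. F1 is computed over whole chunks (slot spans), so a span counts only if its type and both boundaries match. The simplified re-implementation below makes the difference concrete. It is hypothetical and does not reproduce every edge-case rule of the CoNLL script: treating a stray `I-` tag as the start of a chunk is a convention chosen here. In the example, one missing `I-` tag costs 20% token accuracy but halves precision and recall.

```python
from typing import List, Set, Tuple


def bio_chunks(tags: List[str]) -> Set[Tuple[str, int, int]]:
    """Return (type, start, end) chunks of a BIO sequence; end is exclusive."""
    chunks, start, ctype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel 'O' closes the last chunk
        continues = tag.startswith("I-") and tag[2:] == ctype
        if ctype is not None and not continues:
            chunks.add((ctype, start, i))
            ctype = None
        if tag.startswith(("B-", "I-")) and not continues:
            ctype, start = tag[2:], i
    return chunks


def chunk_prf(true_seqs: List[List[str]], pred_seqs: List[List[str]]):
    """Micro-averaged chunk precision, recall and F1."""
    tp = n_true = n_pred = 0
    for true_tags, pred_tags in zip(true_seqs, pred_seqs):
        t, p = bio_chunks(true_tags), bio_chunks(pred_tags)
        tp += len(t & p)
        n_true += len(t)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


true = ["O", "B-fromloc.city_name", "O", "B-toloc.city_name", "I-toloc.city_name"]
pred = ["O", "B-fromloc.city_name", "O", "B-toloc.city_name", "O"]  # 4 of 5 tokens right
print(chunk_prf([true], [pred]))  # (0.5, 0.5, 0.5)
```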


## Multi-task learning

Let's implement a model that predicts both tags and intents at once. The idea is that the two tasks share information that should help each of them. Knowing the intent tells you which slots can appear at all, and knowing the slots helps you guess the intent.
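The joint objective optimized by the shared model below can be written compactly as follows. The notation:

* $x_b$ is the $b$-th sentence in a batch of size $B$ padded to length $T$.
* $y_{b,t}$ is its tag at position $t$, and $z_b$ is its intent.
* $\theta$ covers the shared embeddings and BiLSTM plus both output heads.

$\lambda$ is a hypothetical weighting knob. `SharedModelTrainer` effectively uses $\lambda = 1$ by simply adding the two losses. Because `nn.CrossEntropyLoss` is used without `ignore_index`, the tag term averages over all $B \cdot T$ positions, padding included.

$$
\mathcal{L}(\theta) = \mathcal{L}_{\text{tags}}(\theta) + \lambda\,\mathcal{L}_{\text{intent}}(\theta),\qquad
\mathcal{L}_{\text{tags}} = -\frac{1}{BT}\sum_{b=1}^{B}\sum_{t=1}^{T}\log p_\theta(y_{b,t}\mid x_b),\qquad
\mathcal{L}_{\text{intent}} = -\frac{1}{B}\sum_{b=1}^{B}\log p_\theta(z_b\mid x_b)
$$

Both terms backpropagate into the same embeddings and BiLSTM, which is how the two tasks share information.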

#### **Task 2.1**
Implement the joint model.

In [None]:
class SharedModel(nn.Module):
    def __init__(self, vocab_size, intents_count, tags_count, emb_dim=300,
                 lstm_hidden_dim=256, num_layers=2, dropout_p=0.3):
        super().__init__()

        self.embeddings_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.lstm_layer = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True,
                                  bidirectional=True, num_layers=num_layers)
        self.out_layer_intent = nn.Linear(lstm_hidden_dim * 2, intents_count)
        self.out_layer_tags = nn.Linear(lstm_hidden_dim * 2, tags_count)

    def forward(self, inputs):
        projections = self.embeddings_layer(inputs)
        output, (hidden, _) = self.lstm_layer(projections)
        # hidden: (num_layers * 2, batch, lstm_hidden_dim); the top layer's forward and
        # backward states are hidden[-2] and hidden[-1] (hidden[0], hidden[1] belong to layer 1)
        hidden = torch.cat((hidden[-2], hidden[-1]), dim=1)

        # Token-level states feed the tagger, the sentence vector feeds the intent head
        output = self.dropout(output)
        hidden = self.dropout(hidden)
        intent_output = self.out_layer_intent(hidden)
        tags_output = self.out_layer_tags(output)
        return tags_output, intent_output

#### **Task 2.2**
Complete `SharedModelTrainer`.

In [None]:
class SharedModelTrainer():
    def __init__(self, model, criterion, optimizer):
        self.model = model
        self.criterion = criterion
        self.optimizer = optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.tags_correct_count, self.tags_total_count = 0, 0
        self.intent_correct_count, self.intent_total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self):
        return '{:>5s} Loss = {:.5f}, Tags accuracy = {:.2%}, Intents accuracy = {:.2%}'.format(
            self.name, self.epoch_loss / self.batches_count, self.tags_correct_count / self.tags_total_count, 
            self.intent_correct_count / self.intent_total_count
        )
        
    def on_batch(self, batch):
        tags_logits, intent_logits = self.model(batch.tokens.transpose(0, 1))
        true_tags = batch.tags.transpose(0, 1)
        true_intent = batch.intent
        tags_loss = self.criterion(tags_logits.transpose(1, 2), true_tags)
        intent_loss = self.criterion(intent_logits, true_intent)
        loss = tags_loss + intent_loss
        predicted_tags = tags_logits.argmax(dim=2)
        predicted_intent = intent_logits.argmax(dim=1)
        # Count only real tokens: index 0 is '<pad>' in tags_field
        mask = true_tags != 0
        self.tags_correct_count += ((predicted_tags == true_tags) & mask).sum().item()
        self.tags_total_count += mask.sum().item()
        self.intent_correct_count += (predicted_intent == true_intent).sum().item()
        self.intent_total_count += true_intent.size(0)
        if self.is_train:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()
        self.epoch_loss += loss.item()

In [None]:
model = SharedModel(vocab_size=len(tokens_field.vocab), intents_count=len(intent_field.vocab),
                    tags_count=len(tags_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
optimizer = optim.Adam(model.parameters())
trainer = SharedModelTrainer(model, criterion, optimizer)
fit(trainer, train_iter, epochs_count=30, val_iter=val_iter)

[1 / 30] Train: Loss = 1.23851, Tags accuracy = 74.96%, Intents accuracy = 85.98%: 100%|██████████| 140/140 [00:02<00:00, 67.39it/s]
[1 / 30]   Val: Loss = 0.43134, Tags accuracy = 91.27%, Intents accuracy = 92.60%: 100%|██████████| 4/4 [00:00<00:00, 79.37it/s]
[2 / 30] Train: Loss = 0.27633, Tags accuracy = 95.00%, Intents accuracy = 96.34%: 100%|██████████| 140/140 [00:02<00:00, 69.24it/s]
[2 / 30]   Val: Loss = 0.24991, Tags accuracy = 96.58%, Intents accuracy = 94.80%: 100%|██████████| 4/4 [00:00<00:00, 78.76it/s]
[3 / 30] Train: Loss = 0.11113, Tags accuracy = 97.91%, Intents accuracy = 98.84%: 100%|██████████| 140/140 [00:01<00:00, 71.04it/s]
[3 / 30]   Val: Loss = 0.13813, Tags accuracy = 97.79%, Intents accuracy = 97.60%: 100%|██████████| 4/4 [00:00<00:00, 72.59it/s]
[4 / 30] Train: Loss = 0.05905, Tags accuracy = 98.95%, Intents accuracy = 99.15%: 100%|██████████| 140/140 [00:01<00:00, 71.40it/s]
[4 / 30]   Val: Loss = 0.14756, Tags accuracy = 98.12%, Intents accuracy = 97.40%

In [None]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.26782, Tags accuracy = 97.58%, Intents accuracy = 96.75%: 100%|██████████| 7/7 [00:00<00:00, 64.24it/s]


In [None]:
from conlleval import evaluate

def eval_tagger(model, test_iter):
    true_seqs, pred_seqs = [], []
    itos = tags_field.vocab.itos

    model.eval()
    with torch.no_grad():
        for batch in test_iter:
            # The joint models return (tags_logits, intent_logits)
            tags_logits = model(batch.tokens.transpose(0, 1))[0]
            pred = tags_logits.argmax(dim=2).cpu().tolist()
            true = batch.tags.transpose(0, 1).cpu().tolist()
            for pred_row, true_row in zip(pred, true):
                # Keep only real tokens (gold tag != '<pad>', index 0) so both sequences
                # stay aligned; a predicted '<pad>' on a real token counts as 'O'
                pairs = [(t, p) for t, p in zip(true_row, pred_row) if t != 0]
                true_seqs.append(" ".join(itos[t] for t, _ in pairs))
                pred_seqs.append(" ".join(itos[p] if p != 0 else 'O' for _, p in pairs))

    print('Precision = {:.2f}%, Recall = {:.2f}%, F1 = {:.2f}%'.format(*evaluate(true_seqs, pred_seqs, verbose=False)))

eval_tagger(model, test_iter)

Precision = 94.59%, Recall = 94.42%, F1 = 94.51%


## Asynchronous training

The idea is described in the paper [A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling](http://aclweb.org/anthology/N18-2050).

<img src="https://i.ibb.co/qrgVSqF/2018-11-27-2-11-17.png" width="600"/>

The main difference from what we have already implemented is the order in which everything is optimized. Instead of training all layers jointly, the tagger network and the classifier network are trained separately.

At each training step, two sequences of hidden states are produced: $h^1$ for the classifier and $h^2$ for the tagger.

Then the intent prediction loss is computed first and the optimizer takes a step. After that, the tag prediction loss is computed, followed by another optimizer step.
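Here is a toy illustration of this update order, with hypothetical modules and random data. Two optimizers each own one branch plus a shared layer. Step 1 updates only what the first optimizer owns, and step 2 recomputes the forward pass before the second update. Zeroing all gradients before each backward pass keeps the first loss's gradients out of the second step. Recomputing the forward pass means the second step sees the weights as they are after the first update. Newer PyTorch versions typically refuse to backpropagate through a graph whose weights an optimizer has already modified in place.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
shared = nn.Linear(4, 4)  # stands in for layers both branches use (e.g. embeddings)
head_a = nn.Linear(4, 2)  # "intent" branch
head_b = nn.Linear(4, 3)  # "tags" branch
all_modules = nn.ModuleList([shared, head_a, head_b])

# Each optimizer owns its branch plus the shared layer
opt_a = optim.SGD(list(shared.parameters()) + list(head_a.parameters()), lr=0.1)
opt_b = optim.SGD(list(shared.parameters()) + list(head_b.parameters()), lr=0.1)
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 4)
y_a = torch.randint(0, 2, (8,))
y_b = torch.randint(0, 3, (8,))
head_b_before = head_b.weight.detach().clone()
shared_before = shared.weight.detach().clone()

# Step 1: loss A, applied by opt_a only
all_modules.zero_grad()  # an optimizer's zero_grad clears only its own parameters
loss_a = criterion(head_a(torch.tanh(shared(x))), y_a)
loss_a.backward()
opt_a.step()
b_untouched_after_step_1 = torch.equal(head_b.weight, head_b_before)
shared_moved_after_step_1 = not torch.equal(shared.weight, shared_before)

# Step 2: fresh forward pass with the updated shared layer, applied by opt_b
all_modules.zero_grad()
loss_b = criterion(head_b(torch.tanh(shared(x))), y_b)
loss_b.backward()
opt_b.step()

print(b_untouched_after_step_1, shared_moved_after_step_1)  # True True
```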

#### **Task 3.1**
Implement asynchronous training of the joint model.

In [None]:
class AsyncSharedModel(nn.Module):
    def __init__(self, vocab_size, intents_count, tags_count, emb_dim=300,
                 lstm_hidden_dim=256, num_layers=2, dropout_p=0.3):
        super().__init__()

        self.embeddings_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.inner_lstm_layer_tags = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True,
                                            bidirectional=True, num_layers=num_layers)
        self.inner_lstm_layer_intent = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True,
                                               bidirectional=True, num_layers=num_layers)
        self.outer_lstm_layer_tags = nn.LSTM(lstm_hidden_dim * 4, lstm_hidden_dim, batch_first=True,
                                             num_layers=1)
        self.outer_lstm_layer_intent = nn.LSTM(lstm_hidden_dim * 4, lstm_hidden_dim, batch_first=True,
                                               num_layers=1)
        self.out_layer_intent = nn.Linear(lstm_hidden_dim, intents_count)
        self.out_layer_tags = nn.Linear(lstm_hidden_dim, tags_count)

    def forward(self, inputs):
        projections = self.embeddings_layer(inputs)
        # Two separate BiLSTM encoders: h^1 for the intent branch, h^2 for the tags branch
        h_intent, _ = self.inner_lstm_layer_intent(projections)
        h_tags, _ = self.inner_lstm_layer_tags(projections)
        # Each branch's outer LSTM reads both hidden-state sequences
        h = torch.cat((h_intent, h_tags), dim=2)
        tags_output, _ = self.outer_lstm_layer_tags(h)
        _, (hidden, _) = self.outer_lstm_layer_intent(h)
        # Final state of the (unidirectional) outer intent LSTM
        intent_output = hidden[-1]
        tags_output = self.dropout(tags_output)
        intent_output = self.dropout(intent_output)
        intent_output = self.out_layer_intent(intent_output)
        tags_output = self.out_layer_tags(tags_output)
        return tags_output, intent_output

In [None]:
class AsyncSharedModelTrainer():
    def __init__(self, model, criterion, tags_optimizer, intent_optimizer):
        self.model = model
        self.criterion = criterion
        self.tags_optimizer = tags_optimizer
        self.intent_optimizer = intent_optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.tags_correct_count, self.tags_total_count = 0, 0
        self.intent_correct_count, self.intent_total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self):
        return '{:>5s} Loss = {:.5f}, Tags accuracy = {:.2%}, Intents accuracy = {:.2%}'.format(
            self.name, self.epoch_loss / self.batches_count, self.tags_correct_count / self.tags_total_count, 
            self.intent_correct_count / self.intent_total_count
        )
        
    def on_batch(self, batch):
        tokens = batch.tokens.transpose(0, 1)
        tags_logits, intent_logits = self.model(tokens)
        true_tags = batch.tags.transpose(0, 1)
        true_intent = batch.intent
        tags_loss = self.criterion(tags_logits.transpose(1, 2), true_tags)
        intent_loss = self.criterion(intent_logits, true_intent)
        predicted_tags = tags_logits.argmax(dim=2)
        predicted_intent = intent_logits.argmax(dim=1)
        # Count only real tokens: index 0 is '<pad>' in tags_field
        mask = true_tags != 0
        self.tags_correct_count += ((predicted_tags == true_tags) & mask).sum().item()
        self.tags_total_count += mask.sum().item()
        self.intent_correct_count += (predicted_intent == true_intent).sum().item()
        self.intent_total_count += true_intent.size(0)
        if self.is_train:
            # Step 1: update the intent network. Zero all gradients first, because each
            # optimizer only clears the parameters it owns
            self.model.zero_grad()
            intent_loss.backward()
            self.intent_optimizer.step()
            # Step 2: recompute the forward pass with the updated weights, then update
            # the tagger (newer PyTorch versions reject backpropagating through a graph
            # whose weights an optimizer has already changed in place)
            self.model.zero_grad()
            new_tags_logits, _ = self.model(tokens)
            self.criterion(new_tags_logits.transpose(1, 2), true_tags).backward()
            self.tags_optimizer.step()
        self.epoch_loss += tags_loss.item() + intent_loss.item()

The parameters of the two sub-networks then have to be passed to separate optimizers and trained separately.

*The `retain_graph` parameter of `backward()` may also come in handy.*
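Suppose two losses are computed from one forward pass and backpropagated separately. Then the first `backward()` call must use `retain_graph=True`. Otherwise autograd frees the saved intermediate tensors and the second call raises a `RuntimeError`. Gradients from consecutive `backward()` calls also add up in `.grad` until they are zeroed. The minimal demonstration below uses a toy tensor.

```python
import torch

x = torch.tensor([0.0, 1.0], requires_grad=True)

# One forward pass, two backward passes over the same graph
y = x.exp().sum()
y.backward(retain_graph=True)  # keep the saved tensors alive for a second pass
y.backward()                   # gradients accumulate in x.grad
grad_after_two_passes = x.grad.clone()  # equals 2 * exp(x)

# Without retain_graph the saved tensors are freed after the first backward
x.grad = None
z = x.exp().sum()
z.backward()
try:
    z.backward()
    second_pass_failed = False
except RuntimeError:
    second_pass_failed = True

print(grad_after_two_passes, second_pass_failed)
```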

In [None]:
model = AsyncSharedModel(vocab_size=len(tokens_field.vocab), intents_count=len(intent_field.vocab),
                         tags_count=len(tags_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
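# Tagger optimizer: every parameter whose name does not mention 'intent' (shared embeddings included)
tags_parameters = [param for name, param in model.named_parameters() if 'intent' not in name]
# Intent optimizer: every parameter whose name does not mention 'tags' (shared embeddings included)
intent_parameters = [param for name, param in model.named_parameters() if 'tags' not in name]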
tags_optimizer = optim.Adam(tags_parameters)
intent_optimizer = optim.Adam(intent_parameters)
trainer = AsyncSharedModelTrainer(model, criterion, tags_optimizer, intent_optimizer)
fit(trainer, train_iter, epochs_count=30, val_iter=val_iter)

[1 / 30] Train: Loss = 1.99057, Tags accuracy = 68.31%, Intents accuracy = 73.29%: 100%|██████████| 140/140 [00:04<00:00, 30.20it/s]
[1 / 30]   Val: Loss = 1.35171, Tags accuracy = 84.18%, Intents accuracy = 71.60%: 100%|██████████| 4/4 [00:00<00:00, 41.19it/s]
[2 / 30] Train: Loss = 1.04530, Tags accuracy = 89.63%, Intents accuracy = 77.45%: 100%|██████████| 140/140 [00:04<00:00, 31.17it/s]
[2 / 30]   Val: Loss = 0.88679, Tags accuracy = 92.65%, Intents accuracy = 79.80%: 100%|██████████| 4/4 [00:00<00:00, 38.66it/s]
[3 / 30] Train: Loss = 0.66037, Tags accuracy = 94.26%, Intents accuracy = 85.57%: 100%|██████████| 140/140 [00:04<00:00, 30.66it/s]
[3 / 30]   Val: Loss = 0.62762, Tags accuracy = 94.81%, Intents accuracy = 83.80%: 100%|██████████| 4/4 [00:00<00:00, 40.35it/s]
[4 / 30] Train: Loss = 0.43881, Tags accuracy = 96.42%, Intents accuracy = 91.05%: 100%|██████████| 140/140 [00:04<00:00, 28.60it/s]
[4 / 30]   Val: Loss = 0.45855, Tags accuracy = 96.25%, Intents accuracy = 89.40%

In [None]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.27244, Tags accuracy = 97.25%, Intents accuracy = 95.74%: 100%|██████████| 7/7 [00:00<00:00, 27.52it/s]


In [None]:
eval_tagger(model, test_iter)

#### **Task 3.2**
Look at the hyperparameters in the paper and try to reach similar quality.

#### **Task 4**
Check the results on SNIPS.

## Async Multi-task Learning for POS Tagging

Another paper: [Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings](https://arxiv.org/pdf/1805.08237.pdf)

Its architecture looks like this:

<img src="https://i.ibb.co/0nSX6CC/2018-11-27-9-26-15.png" width="400"/>

The multi-task part here is training separate lower-level classifiers (over characters and over words) to predict tags, each with its own optimizer.
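Below is a minimal PyTorch sketch of this idea. It is not the paper's implementation, and the sizes, the random batch and the class name `MetaBiLSTMSketch` are hypothetical. Two assumptions simplify it:

* The character model encodes each word's characters separately and then runs a sentence-level BiLSTM. This is simpler than the paper's character encoder.
* The meta model's input is detached, so each of the three losses updates only its own sub-model through its own optimizer. This is our reading of the separate-optimizer setup.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class MetaBiLSTMSketch(nn.Module):
    """Character-based and word-based BiLSTM taggers plus a meta BiLSTM over
    their concatenated states; each part has its own softmax head."""

    def __init__(self, n_chars, n_words, n_tags, char_dim=16, word_dim=32, hidden=32):
        super().__init__()
        # Character model (simplified: each word's characters are encoded separately)
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_word_lstm = nn.LSTM(char_dim, hidden, batch_first=True, bidirectional=True)
        self.char_sent_lstm = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.char_out = nn.Linear(2 * hidden, n_tags)
        # Word model
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.word_lstm = nn.LSTM(word_dim, hidden, batch_first=True, bidirectional=True)
        self.word_out = nn.Linear(2 * hidden, n_tags)
        # Meta model over both encodings
        self.meta_lstm = nn.LSTM(4 * hidden, hidden, batch_first=True, bidirectional=True)
        self.meta_out = nn.Linear(2 * hidden, n_tags)

    def forward(self, chars, words):
        # chars: (batch, seq_len, word_len), words: (batch, seq_len)
        batch, seq_len, word_len = chars.shape
        char_emb = self.char_emb(chars.reshape(batch * seq_len, word_len))
        _, (h, _) = self.char_word_lstm(char_emb)  # h: (2, batch * seq_len, hidden)
        char_words = torch.cat((h[0], h[1]), dim=1).reshape(batch, seq_len, -1)
        char_states, _ = self.char_sent_lstm(char_words)
        word_states, _ = self.word_lstm(self.word_emb(words))
        # detach(): the meta loss trains only the meta model (assumption, see above)
        meta_in = torch.cat((char_states, word_states), dim=2).detach()
        meta_states, _ = self.meta_lstm(meta_in)
        return {
            "char": self.char_out(char_states),
            "word": self.word_out(word_states),
            "meta": self.meta_out(meta_states),
        }


torch.manual_seed(0)
model = MetaBiLSTMSketch(n_chars=30, n_words=50, n_tags=5)
parts = {
    "char": [model.char_emb, model.char_word_lstm, model.char_sent_lstm, model.char_out],
    "word": [model.word_emb, model.word_lstm, model.word_out],
    "meta": [model.meta_lstm, model.meta_out],
}
# One optimizer per sub-model
optimizers = {
    name: optim.Adam([p for module in modules for p in module.parameters()])
    for name, modules in parts.items()
}
criterion = nn.CrossEntropyLoss()

# Hypothetical random batch: 2 sentences, 7 words, 6 characters per word
chars = torch.randint(0, 30, (2, 7, 6))
words = torch.randint(0, 50, (2, 7))
tags = torch.randint(0, 5, (2, 7))

model.zero_grad()
logits = model(chars, words)
losses = {}
for name, optimizer in optimizers.items():
    # The three graphs are disjoint (thanks to detach), so each backward is independent
    loss = criterion(logits[name].transpose(1, 2), tags)
    loss.backward()
    optimizer.step()
    losses[name] = loss.item()
print(losses)
```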

## DeepPavlov go_bot

http://docs.deeppavlov.ai/en/master/features/skills/go_bot.html

In [None]:
!pip install deeppavlov
!python -m deeppavlov install gobot_dstc2

Collecting deeppavlov
  Downloading https://files.pythonhosted.org/packages/d7/9d/453d101981b293441be889ba91d6983abdeeb2b3abc070b47a4044f6a64b/deeppavlov-0.7.1-py3-none-any.whl (735kB)

email-validator not installed, email fields will be treated as str.
To install, run: pip install email-validator
2020-02-18 11:25:24.30 INFO in 'deeppavlov.core.common.file'['file'] at line 30: Interpreting 'gobot_dstc2' as '/usr/local/lib/python3.6/dist-packages/deeppavlov/configs/go_bot/gobot_dstc2.json'
Collecting tensorflow==1.14.0
  Downloading https://files.pythonhosted.org/packages/de/f0/96fb2e0412ae9692dbf400e5b04432885f677ad6241c088ccc5fe7724d69/tensorflow-1.14.0-cp36-cp36m-manylinux1_x86_64.whl (109.2MB)
Collecting tensorboard<1.15.0,>=1.14.0
  Downloading https://files.pythonhosted.org/packages/91/2d/2ed263449a078cd9c8a9ba50ebd50123adf1f8cfbea1492f9084169b89d9/tensorboard-1.14.0-py3-none-any.whl (3.1MB)
Collecting tensorflow-estimator<1.15.0rc0,>=1.14.0rc0
  Downloading https://files.pythonhosted.org/packages/3c/d5/21860a5b11caf0678fb

In [None]:
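import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # hide TensorFlow C++ INFO/WARNING messages
from deeppavlov import build_model, configs

# download=True fetches the pretrained models and data on the first run
bot1 = build_model(configs.go_bot.gobot_dstc2, download=True)

# the bot is stateful: the second call continues the same dialogue
bot1(['hi, i want restaurant in the cheap pricerange'])
bot1(['bye'])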

2020-02-18 11:29:59.974 INFO in 'deeppavlov.models.go_bot.network'['network'] at line 489: Made api_call with {'pricerange': 'cheap'}, got 22 results.


['You are welcome!']

In [None]:
bot1.reset()
bot1(['hi, i want restaurant in the cheap pricerange'])

2020-02-18 11:31:22.791 INFO in 'deeppavlov.models.go_bot.network'['network'] at line 489: Made api_call with {'pricerange': 'cheap'}, got 22 results.


['The lucky star is a nice place in the south of town and the prices are cheap.']

In [None]:
bot1.reset()
bot1(['any restaurants with italian food in the menu?'])

2020-02-18 11:38:26.580 INFO in 'deeppavlov.models.go_bot.network'['network'] at line 489: Made api_call with {'food': 'italian'}, got 13 results.


['Da vinci pizzeria is a nice place in the north of town serving tasty italian food.']
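The cells above show that `bot1` keeps dialogue state between calls. After the "cheap" request, `'bye'` is answered with `'You are welcome!'`, and `bot1.reset()` starts a fresh dialogue. Below is a small helper that plays a whole dialogue from a clean state, one utterance per call as in the cells above. The name `run_dialog` is hypothetical. It relies only on the two things used above: calling the bot on a one-element list returns a one-element list of replies, and `reset()` clears the state.

```python
def run_dialog(bot, utterances):
    """Play one dialogue from scratch and return the bot's replies.

    `bot` is assumed to behave like `bot1` above: callable on a list with one
    utterance, returning a list with one reply, and having a `reset()` method.
    """
    bot.reset()                               # forget state left from earlier dialogues
    replies = []
    for utterance in utterances:
        replies.append(bot([utterance])[0])   # state carries over between turns
    return replies
```

For example: `run_dialog(bot1, ['hi, i want restaurant in the cheap pricerange', 'bye'])`.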

In [None]:
!wget http://camdial.org/~mh521/dstc/downloads/dstc2_test.tar.gz

--2020-02-18 11:33:45--  http://camdial.org/~mh521/dstc/downloads/dstc2_test.tar.gz
Resolving camdial.org (camdial.org)... 178.79.137.90
Connecting to camdial.org (camdial.org)|178.79.137.90|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20180642 (19M) [application/x-gzip]
Saving to: ‘dstc2_test.tar.gz’


2020-02-18 11:33:45 (113 MB/s) - ‘dstc2_test.tar.gz’ saved [20180642/20180642]



In [None]:
!tar -xzvf dstc2_test.tar.gz

scripts/config/dstc2_test.flist
data/Mar13_S2A0/voip-00d76b791d-20130327_005342/
data/Mar13_S2A0/voip-00d76b791d-20130327_005342/log.json
data/Mar13_S2A0/voip-00d76b791d-20130327_005342/label.json
data/Mar13_S2A0/voip-00d76b791d-20130327_011305/
data/Mar13_S2A0/voip-00d76b791d-20130327_011305/label.json
data/Mar13_S2A0/voip-00d76b791d-20130327_011305/log.json
data/Mar13_S2A0/voip-00d76b791d-20130327_012544/
data/Mar13_S2A0/voip-00d76b791d-20130327_012544/log.json
data/Mar13_S2A0/voip-00d76b791d-20130327_012544/label.json
data/Mar13_S2A0/voip-0241bbae39-20130327_194449/
data/Mar13_S2A0/voip-0241bbae39-20130327_194449/label.json
data/Mar13_S2A0/voip-0241bbae39-20130327_194449/log.json
data/Mar13_S2A0/voip-0241bbae39-20130327_202609/
data/Mar13_S2A0/voip-0241bbae39-20130327_202609/log.json
data/Mar13_S2A0/voip-0241bbae39-20130327_202609/label.json
data/Mar13_S2A0/voip-03c2655d43-20130327_200228/
data/Mar13_S2A0/voip-03c2655d43-20130327_200228/log.json
data/Mar13_S2A0/voip-03c2655d43-20130

In [None]:
!head -n 100 data/Mar13_S2A1/voip-fe4b6ef58f-20130328_232533/log.json

{
    "session-id": "voip-fe4b6ef58f-20130328_232533", 
    "session-date": "2013-03-28", 
    "session-time": "23:25:33", 
    "caller-id": "fe4b6ef58f", 
    "turns": [
        {
            "output": {
                "transcript": "Hello , welcome to the Cambridge restaurant system? You can ask for restaurants by area , price range or food type . How may I help you?", 
                "end-time": 9.26, 
                "start-time": 0.000846, 
                "dialog-acts": [
                    {
                        "slots": [], 
                        "act": "welcomemsg"
                    }
                ], 
                "aborted": false
            }, 
            "turn-index": 0, 
            "input": {
                "live": {
                    "asr-hyps": [
                        {
                            "asr-hyp": "i am looking for a restaurant in the west part of town", 
                            "score": -0.445744
                        }, 
        

Detailed tutorials:

Simple: https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/gobot_tutorial.ipynb

Extended: https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/gobot_extended_tutorial.ipynb
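To work with the raw DSTC2 logs directly, below is a small reader for one `log.json` file. It pairs each system prompt with the best user ASR hypothesis of the same turn. The field names (`turns`, `output.transcript`, `input.live.asr-hyps`, `asr-hyp`, `score`) are taken from the `log.json` excerpt printed above. The function name `read_dstc2_log` is hypothetical. We assume a higher `score` is better, since the scores look like log-probabilities. The reference annotations in `label.json` are not used here.

```python
import json


def read_dstc2_log(path):
    """Return (system_prompt, best_user_asr_hypothesis) pairs for one DSTC2 call log."""
    with open(path) as f:
        log = json.load(f)
    pairs = []
    for turn in log["turns"]:
        system = turn["output"]["transcript"]
        hyps = turn["input"]["live"]["asr-hyps"]
        # assumption: scores are log-probability-like, so the maximum is the best hypothesis
        best = max(hyps, key=lambda h: h["score"])["asr-hyp"] if hyps else ""
        pairs.append((system, best))
    return pairs
```

For example: `read_dstc2_log('data/Mar13_S2A1/voip-fe4b6ef58f-20130328_232533/log.json')`.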

# Additional materials

## Papers
A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling, 2018 [[pdf]](http://aclweb.org/anthology/N18-2050)

Slot-Gated Modeling for Joint Slot Filling and Intent Prediction, 2018 [[pdf]](http://aclweb.org/anthology/N18-2118) 

Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings, 2018 [[pdf]](https://arxiv.org/pdf/1805.08237.pdf)

BERT for Joint Intent Classification and Slot Filling, 2019 [[pdf]](https://arxiv.org/pdf/1902.10909.pdf)

## Blogs
[How Alice Works](https://habr.com/company/yandex/blog/349372/) (in Russian)