# shorturl.at/iOY35

Основано на: https://github.com/DanAnastasyev/DeepNLP-Course Week 12

In [1]:
!pip install  torchtext==0.4.0 torch==1.13.1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torchtext==0.4.0
  Downloading torchtext-0.4.0-py3-none-any.whl (53 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m53.1/53.1 KB[0m [31m2.1 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: torchtext
  Attempting uninstall: torchtext
    Found existing installation: torchtext 0.14.1
    Uninstalling torchtext-0.14.1:
      Successfully uninstalled torchtext-0.14.1
Successfully installed torchtext-0.4.0


In [2]:
!git clone https://github.com/MiuLab/SlotGated-SLU.git
!wget -qq https://raw.githubusercontent.com/yandexdataschool/nlp_course/master/week08_multitask/conlleval.py

Cloning into 'SlotGated-SLU'...
remote: Enumerating objects: 51, done.[K
remote: Counting objects: 100% (7/7), done.[K
remote: Compressing objects: 100% (5/5), done.[K
remote: Total 51 (delta 4), reused 2 (delta 2), pack-reused 44[K
Unpacking objects: 100% (51/51), 426.19 KiB | 1.05 MiB/s, done.


In [3]:
import torch
import torch.nn as nn
import torch.optim as optim
DEVICE = torch.device('cpu')

# Диалоговые системы

Диалоговые системы делятся на два типа - *goal-orientied* и *general conversation*.

**General conversation** - это болталка, разговор на свободную тему:  
<img src="https://i.ibb.co/bFwwGpc/alice.jpg" width="200"/>

Сегодня будем говорить не про них, а про **goal-orientied** системы:

<img src="https://hsto.org/webt/gj/3y/xl/gj3yxlqbr7ujuqr9r2akacxmkee.jpeg" width="600"/>

*From [Как устроена Алиса](https://habr.com/company/yandex/blog/349372/)*

Пользователь говорит что-то, это что-то распознается. По распознанному определяется - что, где и когда он хотел. Дальше диалоговый движок решает, действительно ли пользователь знает, чего хотел попросить. Происходит поход в источники - узнать информацию, которую (кажется) запросил пользователь. Исходя из всего этого генерируется некоторый ответ:

<img src="https://i.ibb.co/8XcdpJ7/goal-orientied.png" width="600"/>

*From [Как устроена Алиса](https://habr.com/company/yandex/blog/349372/)*

Будем учить ту часть, которая посередине - классификатор и теггер. Всё остальное обычно - эвристики и захардкоженные ответы.

## Данные

Есть условно стандартный датасет - atis, который неприлично маленький, на самом деле.

К нему можно взять еще датасет snips - он больше и разнообразнее.

Оба датасета возьмем из репозитория статьи [Slot-Gated Modeling for Joint Slot Filling and Intent Prediction](http://aclweb.org/anthology/N18-2118).

Начнем с atis.

In [4]:
import os 

def read_dataset(path):
    with open(os.path.join(path, 'seq.in')) as f_words, \
            open(os.path.join(path, 'seq.out')) as f_tags, \
            open(os.path.join(path, 'label')) as f_intents:
        
        return [
            (words.strip().split(), tags.strip().split(), intent.strip())
            #(words.strip().split(), [tag.split('.')[0] for tag in tags.strip().split()], intent.strip().split('#')[0]) 
            for words, tags, intent in zip(f_words, f_tags, f_intents)
        ]

In [5]:
train_data = read_dataset('SlotGated-SLU/data/atis/train/')
val_data = read_dataset('SlotGated-SLU/data/atis/valid/')
test_data = read_dataset('SlotGated-SLU/data/atis/test/')

In [6]:
intent_to_example = {example[2]: example for example in test_data}
for example in intent_to_example.values():
    print('Intent:\t', example[2])
    print('Text:\t', '\t'.join(example[0]))
    print('Tags:\t', '\t'.join(example[1]))
    print()

Intent:	 atis_flight
Text:	 find	me	a	flight	that	flies	from	memphis	to	tacoma
Tags:	 O	O	O	O	O	O	O	B-fromloc.city_name	O	B-toloc.city_name

Intent:	 atis_airfare
Text:	 list	the	cheapest	fare	from	charlotte	to	las	vegas
Tags:	 O	O	B-cost_relative	O	O	B-fromloc.city_name	O	B-toloc.city_name	I-toloc.city_name

Intent:	 atis_flight#atis_airfare
Text:	 list	flights	and	fares	from	tacoma	to	orlando	round	trip	leaving	saturday	returning	next	saturday
Tags:	 O	O	O	O	O	B-fromloc.city_name	O	B-toloc.city_name	B-round_trip	I-round_trip	O	B-depart_date.day_name	O	B-return_date.date_relative	B-return_date.day_name

Intent:	 atis_ground_service
Text:	 is	there	taxi	service	at	the	ontario	airport
Tags:	 O	O	B-transport_type	I-transport_type	O	O	B-airport_name	I-airport_name

Intent:	 atis_day_name
Text:	 what	days	of	the	week	do	flights	from	san	jose	to	nashville	fly	on
Tags:	 O	O	O	O	O	O	O	O	B-fromloc.city_name	I-fromloc.city_name	O	B-toloc.city_name	O	O

Intent:	 atis_meal
Text:	 are	snacks	serve

In [7]:

from torchtext.data import Field, LabelField, Example, Dataset, BucketIterator

tokens_field = Field()
tags_field = Field(unk_token=None)
intent_field = LabelField()

fields = [('tokens', tokens_field), ('tags', tags_field), ('intent', intent_field)]

train_dataset = Dataset([Example.fromlist(example, fields) for example in train_data], fields)
val_dataset = Dataset([Example.fromlist(example, fields) for example in val_data], fields)
test_dataset = Dataset([Example.fromlist(example, fields) for example in test_data], fields)

tokens_field.build_vocab(train_dataset, val_dataset, test_dataset)
tags_field.build_vocab(train_dataset, val_dataset, test_dataset)
intent_field.build_vocab(train_dataset, val_dataset, test_dataset)

print('Vocab size =', len(tokens_field.vocab))
print('Tags count =', len(tags_field.vocab))
print('Intents count =', len(intent_field.vocab))

train_iter, val_iter, test_iter = BucketIterator.splits(
    datasets=(train_dataset, val_dataset, test_dataset), batch_sizes=(32, 128, 128), 
    shuffle=True, device=DEVICE, sort=False
)

Vocab size = 952
Tags count = 128
Intents count = 26


То же самое со snips

In [8]:
snips_train_data = read_dataset('SlotGated-SLU/data/snips/train/')
snips_val_data = read_dataset('SlotGated-SLU/data/snips/valid/')
snips_test_data = read_dataset('SlotGated-SLU/data/snips/test/')
snips_intent_to_example = {example[2]: example for example in snips_train_data}
for example in snips_intent_to_example.values():
    print('Intent:\t', example[2])
    print('Text:\t', '\t'.join(example[0]))
    print('Tags:\t', '\t'.join(example[1]))
    print()

Intent:	 PlayMusic
Text:	 play	funky	heavy	bluesy
Tags:	 O	B-playlist	I-playlist	I-playlist

Intent:	 AddToPlaylist
Text:	 add	gabrial	mcnair	to	my	love	in	paris	list
Tags:	 O	B-artist	I-artist	O	B-playlist_owner	B-playlist	I-playlist	I-playlist	O

Intent:	 RateBook
Text:	 rate	richard	carvel	4	out	of	6
Tags:	 O	B-object_name	I-object_name	B-rating_value	O	O	B-best_rating

Intent:	 SearchScreeningEvent
Text:	 can	i	get	the	movie	schedule	for	loews	cineplex	entertainment
Tags:	 O	O	O	O	B-object_type	I-object_type	O	B-location_name	I-location_name	I-location_name

Intent:	 BookRestaurant
Text:	 i	want	to	eat	choucroute	at	a	brasserie	for	8
Tags:	 O	O	O	O	B-served_dish	O	O	B-restaurant_type	O	B-party_size_number

Intent:	 GetWeather
Text:	 tell	me	when	it	ll	be	chillier	in	cavalero	corner	id
Tags:	 O	O	O	O	O	O	B-condition_temperature	O	B-city	I-city	B-state

Intent:	 SearchCreativeWork
Text:	 go	to	the	photograph	the	inflated	tear
Tags:	 O	O	O	B-object_type	B-object_name	I-object_name	I-o

In [9]:
from torchtext.data import Field, LabelField, Example, Dataset, BucketIterator

snips_tokens_field = Field()
snips_tags_field = Field(unk_token=None)
snips_intent_field = LabelField()

fields = [('tokens', snips_tokens_field), ('tags', snips_tags_field), ('intent', snips_intent_field)]

snips_train_dataset = Dataset([Example.fromlist(example, fields) for example in snips_train_data], fields)
snips_val_dataset = Dataset([Example.fromlist(example, fields) for example in snips_val_data], fields)
snips_test_dataset = Dataset([Example.fromlist(example, fields) for example in snips_test_data], fields)

snips_tokens_field.build_vocab(snips_train_dataset)
snips_tags_field.build_vocab(snips_train_dataset)
snips_intent_field.build_vocab(snips_train_dataset)

print('Vocab size =', len(snips_tokens_field.vocab))
print('Tags count =', len(snips_tags_field.vocab))
print('Intents count =', len(snips_intent_field.vocab))

snips_train_iter, snips_val_iter, snips_test_iter = BucketIterator.splits(
    datasets=(snips_train_dataset, snips_val_dataset, snips_test_dataset), batch_sizes=(32, 128, 128), 
    shuffle=True, device=DEVICE, sort=False
)

Vocab size = 11420
Tags count = 73
Intents count = 7


## Классификатор интентов

Начнем с классификатора: к какому интенту относится данный запрос.

Ничего умного - берём rnn'ку и учимся предсказывать метки-интенты.

In [142]:
class IntentClassifierModel(nn.Module):
    def __init__(self, vocab_size, intents_count, emb_dim=64,
                 lstm_hidden_dim=128, num_layers=1, dropout_p=0.2):
        super().__init__()

        self.embeddings_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.lstm_layer = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True,
                                  bidirectional=True, num_layers=num_layers)
        self.out_layer = nn.Linear(lstm_hidden_dim * 2, intents_count)

    def forward(self, inputs):
        projections = self.embeddings_layer.forward(inputs)
        projections = projections.reshape(projections.size(0), projections.size(1), -1)
        output, (final_hidden_state, _) = self.lstm_layer(projections)
        hidden = self.dropout(torch.cat((final_hidden_state[0], final_hidden_state[1]), dim=1))
        output = self.out_layer.forward(hidden)
        return output

In [11]:
class ModelTrainer():
    def __init__(self, model, criterion, optimizer):
        self.model = model
        self.criterion = criterion
        self.optimizer = optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.correct_count, self.total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self):
        return '{:>5s} Loss = {:.5f}, Accuracy = {:.2%}'.format(
            self.name, self.epoch_loss / self.batches_count, self.correct_count / self.total_count
        )
    
    def on_batch(self, batch):
        logits = self.model(batch.tokens.transpose(0, 1))
        loss = self.criterion(logits, batch.intent)
        predicted_intent = torch.max(logits, axis=1)[1]
        self.total_count += predicted_intent.size(0)
        self.correct_count += torch.sum(predicted_intent == batch.intent).item()
        if self.is_train:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()
        self.epoch_loss += loss.item()

In [10]:
import math
from tqdm import tqdm
tqdm.get_lock().locks = []


def do_epoch(trainer, data_iter, is_train, name=None):
    trainer.on_epoch_begin(is_train, name, batches_count=len(data_iter))
    
    with torch.autograd.set_grad_enabled(is_train):
        with tqdm(total=trainer.batches_count) as progress_bar:
            for i, batch in enumerate(data_iter):
                batch_progress = trainer.on_batch(batch)

                progress_bar.update()
                progress_bar.set_description(batch_progress)
                
            epoch_progress = trainer.on_epoch_end()
            progress_bar.set_description(epoch_progress)
            progress_bar.refresh()

            
def fit(trainer, train_iter, epochs_count=1, val_iter=None):
    best_val_loss = None
    for epoch in range(epochs_count):
        name_prefix = '[{} / {}] '.format(epoch + 1, epochs_count)
        do_epoch(trainer, train_iter, is_train=True, name=name_prefix + 'Train:')
        
        if not val_iter is None:
            do_epoch(trainer, val_iter, is_train=False, name=name_prefix + '  Val:')

In [13]:
model = IntentClassifierModel(vocab_size=len(tokens_field.vocab), intents_count=len(intent_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
optimizer = optim.Adam(model.parameters())
trainer = ModelTrainer(model, criterion, optimizer)
fit(trainer, train_iter, epochs_count=30, val_iter=val_iter)

[1 / 30] Train: Loss = 0.90673, Accuracy = 79.43%: 100%|██████████| 140/140 [00:07<00:00, 17.64it/s]
[1 / 30]   Val: Loss = 0.56168, Accuracy = 86.00%: 100%|██████████| 4/4 [00:00<00:00, 14.51it/s]
[2 / 30] Train: Loss = 0.36383, Accuracy = 90.93%: 100%|██████████| 140/140 [00:06<00:00, 21.59it/s]
[2 / 30]   Val: Loss = 0.36029, Accuracy = 89.60%: 100%|██████████| 4/4 [00:00<00:00, 14.64it/s]
[3 / 30] Train: Loss = 0.23188, Accuracy = 93.81%: 100%|██████████| 140/140 [00:07<00:00, 18.42it/s]
[3 / 30]   Val: Loss = 0.26407, Accuracy = 92.80%: 100%|██████████| 4/4 [00:00<00:00, 15.46it/s]
[4 / 30] Train: Loss = 0.15212, Accuracy = 96.03%: 100%|██████████| 140/140 [00:08<00:00, 17.43it/s]
[4 / 30]   Val: Loss = 0.21590, Accuracy = 95.80%: 100%|██████████| 4/4 [00:00<00:00, 11.82it/s]
[5 / 30] Train: Loss = 0.10406, Accuracy = 97.50%: 100%|██████████| 140/140 [00:10<00:00, 13.71it/s]
[5 / 30]   Val: Loss = 0.18187, Accuracy = 96.40%: 100%|██████████| 4/4 [00:00<00:00, 17.15it/s]
[6 / 30] T

In [14]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.36845, Accuracy = 94.29%: 100%|██████████| 7/7 [00:00<00:00, 17.62it/s]


## Теггер

![](https://commons.bmstu.wiki/images/0/00/NER1.png)  
*From [IOB](https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)*

#### **Задание 1.1**
Напишите простой теггер

In [66]:
class TokenTaggerModel(nn.Module):
    def __init__(self, vocab_size, tags_count, emb_dim=64,
                 lstm_hidden_dim=128, num_layers=1, dropout_p=0.2):
        super().__init__()

        self.embeddings_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.lstm_layer = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True,
                                  bidirectional=True, num_layers=num_layers)
        self.out_layer = nn.Linear(lstm_hidden_dim * 2, tags_count)

    def forward(self, inputs):
        projections = self.embeddings_layer.forward(inputs)
        projections = projections.reshape(projections.size(0), projections.size(1), -1)
        output, (final_hidden_state, _) = self.lstm_layer(projections)
        
        output = self.out_layer.forward(output)
        return output

#### **Задание 1.2**
Обновите `ModelTrainer`: считать нужно всё те же лосс и accuracy, только теперь немного по-другому.

In [70]:
class TagModelTrainer():
    def __init__(self, model, criterion, optimizer):
        self.model = model
        self.criterion = criterion
        self.optimizer = optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.correct_count, self.total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self):
        return '{:>5s} Loss = {:.5f}, Accuracy = {:.2%}'.format(
            self.name, self.epoch_loss / self.batches_count, self.correct_count / self.total_count
        )
        
    def on_batch(self, batch):
        logits = self.model(batch.tokens.transpose(0, 1))
        loss = self.criterion(logits.transpose(1, 2), batch.tags.transpose(0, 1))
        predicted_tag = torch.max(logits, axis=2)[1]
        self.total_count += predicted_tag.size(0)*predicted_tag.size(1)
        self.correct_count += torch.sum(predicted_tag == batch.tags.transpose(0, 1)).item()

        if self.is_train:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()
        self.epoch_loss += loss.item()

In [71]:
model = TokenTaggerModel(vocab_size=len(tokens_field.vocab), tags_count=len(tags_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
optimizer = optim.Adam(model.parameters())
trainer = TagModelTrainer(model, criterion, optimizer)
fit(trainer, train_iter, epochs_count=30, val_iter=val_iter)

[1 / 30] Train: Loss = 0.99382, Accuracy = 83.34%: 100%|██████████| 140/140 [00:09<00:00, 15.51it/s]
[1 / 30]   Val: Loss = 0.28589, Accuracy = 93.63%: 100%|██████████| 4/4 [00:00<00:00,  9.43it/s]
[2 / 30] Train: Loss = 0.24392, Accuracy = 95.20%: 100%|██████████| 140/140 [00:08<00:00, 17.44it/s]
[2 / 30]   Val: Loss = 0.13146, Accuracy = 97.22%: 100%|██████████| 4/4 [00:00<00:00, 13.44it/s]
[3 / 30] Train: Loss = 0.12790, Accuracy = 97.37%: 100%|██████████| 140/140 [00:09<00:00, 14.98it/s]
[3 / 30]   Val: Loss = 0.09464, Accuracy = 97.93%: 100%|██████████| 4/4 [00:00<00:00, 14.29it/s]
[4 / 30] Train: Loss = 0.08112, Accuracy = 98.37%: 100%|██████████| 140/140 [00:08<00:00, 15.99it/s]
[4 / 30]   Val: Loss = 0.06584, Accuracy = 98.68%: 100%|██████████| 4/4 [00:00<00:00,  9.19it/s]
[5 / 30] Train: Loss = 0.05579, Accuracy = 98.90%: 100%|██████████| 140/140 [00:08<00:00, 16.94it/s]
[5 / 30]   Val: Loss = 0.05688, Accuracy = 98.81%: 100%|██████████| 4/4 [00:00<00:00, 13.73it/s]
[6 / 30] T

In [72]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.07382, Accuracy = 98.85%: 100%|██████████| 7/7 [00:00<00:00, 17.15it/s]


In [73]:
from conlleval import evaluate

def eval_tagger(model, test_iter):
    true_seqs, pred_seqs = [], []

    model.eval()
    with torch.no_grad():
        for batch in test_iter:
            pred = model.forward(batch.tokens.transpose(0, 1)).transpose(1, 2).max(dim=1)[1].cpu().tolist()
            pred_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in pred])
            true_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in batch.tags.transpose(0, 1).cpu().tolist()])

    print('Precision = {:.2f}%, Recall = {:.2f}%, F1 = {:.2f}%'.format(*evaluate(true_seqs, pred_seqs, verbose=False)))

eval_tagger(model, test_iter)

Precision = 93.78%, Recall = 93.51%, F1 = 93.64%


## Multi-task learning

Реализуем модель, которая умеет сразу и предсказывать теги и интенты. Идея в том, что в этом всем есть общая информация, которая должна помочь как одной, так и другой задаче: зная интент, можно понять, какие слоты вообще могут быть, а зная слоты, можно угадать и интент.

#### **Задание 2.1**
Реализуйте объединенную модель.

In [88]:
class SharedModel(nn.Module):
    def __init__(self, vocab_size, intents_count, tags_count, emb_dim=64,
                 lstm_hidden_dim=128, num_layers=1, dropout_p=0.2):
        super().__init__()

        self.embeddings_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)
        self.lstm_layer = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True,
                                  bidirectional=True, num_layers=num_layers)
        self.intent_out_layer = nn.Linear(lstm_hidden_dim * 2, intents_count)
        self.tag_out_layer = nn.Linear(lstm_hidden_dim * 2, tags_count)

    def forward(self, inputs):
        projections = self.embeddings_layer.forward(inputs)
        projections = projections.reshape(projections.size(0), projections.size(1), -1)
        output, (final_hidden_state, _) = self.lstm_layer(projections)

        hidden = self.dropout(torch.cat((final_hidden_state[0], final_hidden_state[1]), dim=1))
        intent_output = self.intent_out_layer.forward(hidden)
        
        tag_output = self.tag_out_layer.forward(output)
        return tag_output, intent_output

#### **Задание 2.2**
Допишите SharedModelTrainer

In [89]:
class SharedModelTrainer():
    def __init__(self, model, criterion, optimizer):
        self.model = model
        self.criterion = criterion
        self.optimizer = optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.tag_correct_count, self.tag_total_count = 0, 0
        self.intent_correct_count, self.intent_total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self):
        return '{:>5s} Loss = {:.5f}, Tags accuracy = {:.2%}, Intents accuracy = {:.2%}'.format(
            self.name, self.epoch_loss / self.batches_count, self.tag_correct_count / self.tag_total_count, 
            self.intent_correct_count / self.intent_total_count
        )
        
    def on_batch(self, batch):

        tag_logits, intent_logits = self.model(batch.tokens.transpose(0, 1))

        tag_loss = self.criterion(tag_logits.transpose(1, 2), batch.tags.transpose(0, 1))
        intent_loss = self.criterion(intent_logits, batch.intent)
        loss = intent_loss + tag_loss

        predicted_intent = torch.max(intent_logits, axis=1)[1]
        self.intent_total_count += predicted_intent.size(0)
        self.intent_correct_count += torch.sum(predicted_intent == batch.intent).item()

        predicted_tag = torch.max(tag_logits, axis=2)[1]
        self.tag_total_count += predicted_tag.size(0)*predicted_tag.size(1)
        self.tag_correct_count += torch.sum(predicted_tag == batch.tags.transpose(0, 1)).item()

        if self.is_train:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()
        self.epoch_loss += loss.item()

In [90]:
model = SharedModel(vocab_size=len(tokens_field.vocab), intents_count=len(intent_field.vocab),
                    tags_count=len(tags_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
optimizer = optim.Adam(model.parameters())
trainer = SharedModelTrainer(model, criterion, optimizer)
fit(trainer, train_iter, epochs_count=30, val_iter=val_iter)

[1 / 30] Train: Loss = 2.17263, Tags accuracy = 79.06%, Intents accuracy = 76.66%: 100%|██████████| 140/140 [00:09<00:00, 14.32it/s]
[1 / 30]   Val: Loss = 1.13499, Tags accuracy = 92.06%, Intents accuracy = 81.20%: 100%|██████████| 4/4 [00:00<00:00, 10.81it/s]
[2 / 30] Train: Loss = 0.89137, Tags accuracy = 91.97%, Intents accuracy = 87.94%: 100%|██████████| 140/140 [00:09<00:00, 14.64it/s]
[2 / 30]   Val: Loss = 0.67359, Tags accuracy = 95.61%, Intents accuracy = 88.40%: 100%|██████████| 4/4 [00:00<00:00, 11.51it/s]
[3 / 30] Train: Loss = 0.53241, Tags accuracy = 95.73%, Intents accuracy = 92.12%: 100%|██████████| 140/140 [00:09<00:00, 15.42it/s]
[3 / 30]   Val: Loss = 0.52127, Tags accuracy = 97.11%, Intents accuracy = 90.40%: 100%|██████████| 4/4 [00:00<00:00,  9.34it/s]
[4 / 30] Train: Loss = 0.36883, Tags accuracy = 97.16%, Intents accuracy = 93.84%: 100%|██████████| 140/140 [00:08<00:00, 16.69it/s]
[4 / 30]   Val: Loss = 0.40818, Tags accuracy = 97.77%, Intents accuracy = 92.20%

In [91]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.45082, Tags accuracy = 98.74%, Intents accuracy = 94.74%: 100%|██████████| 7/7 [00:00<00:00, 16.52it/s]


In [93]:
from conlleval import evaluate

def eval_tagger(model, test_iter):
    true_seqs, pred_seqs = [], []

    model.eval()
    with torch.no_grad():
        for batch in test_iter:
            pred = model.forward(batch.tokens.transpose(0, 1))[0].transpose(1, 2).max(dim=1)[1].cpu().tolist()
            true = batch.tags.transpose(0, 1).cpu().tolist()
            pred_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in pred])
            true_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in true])

    print('Precision = {:.2f}%, Recall = {:.2f}%, F1 = {:.2f}%'.format(*evaluate(true_seqs, pred_seqs, verbose=False)))

eval_tagger(model, test_iter)

Precision = 93.37%, Recall = 92.28%, F1 = 92.82%


 ## Асинхронное обучение

Идея описана в статье [A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling](http://aclweb.org/anthology/N18-2050).

<img src="https://i.ibb.co/qrgVSqF/2018-11-27-2-11-17.png" width="600"/>

Основное отличие от того, что уже реализовали в том, в каком порядке все оптимизируется. Вместо объединенного обучения всех слоев, сети для теггера и для классификатора обучаются отдельно.

На каждом шаге обучения генерируются последовательности скрытых состояний $h^1$ и $h^2$ - для классификатора и для теггера.

Дальше сначала считаются потери от предсказания интента и делается шаг оптимизатора, а затем потери от предсказания теггов - и опять шаг оптимизатора.

#### **Задание 3.1**
Реализуйте асинхронное обучение совместной модели

In [11]:
class AsyncSharedModel(nn.Module):
    def __init__(self, vocab_size, intents_count, tags_count, emb_dim=65,
                 lstm_hidden_dim=128, num_layers=1, dropout_p=0.2):
        super().__init__()

        self.embeddings_layer = nn.Embedding(vocab_size, emb_dim)
        self.dropout = nn.Dropout(dropout_p)

        self.intent_lstm_layer = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True,
                                  bidirectional=True, num_layers=num_layers)
        self.tags_lstm_layer = nn.LSTM(emb_dim, lstm_hidden_dim, batch_first=True,
                                  bidirectional=True, num_layers=num_layers)
        self.intent_out_layer = nn.Linear(lstm_hidden_dim * 2, intents_count)
        self.tags_out_layer = nn.Linear(lstm_hidden_dim * 2, tags_count)

    def forward(self, inputs):
        projections = self.embeddings_layer.forward(inputs)
        projections = projections.reshape(projections.size(0), projections.size(1), -1)

        intent_output, (intent_final_hidden_state, _) = self.intent_lstm_layer(projections)
        tags_output, (tags_final_hidden_state, _) = self.tags_lstm_layer(projections)

        hidden = self.dropout(torch.cat((intent_final_hidden_state[0], intent_final_hidden_state[1]), dim=1))
        intent_output = self.intent_out_layer.forward(hidden)
        
        tag_output = self.tags_out_layer.forward(tags_output)

        return tag_output, intent_output

In [12]:
class AsyncSharedModelTrainer():
    def __init__(self, model, criterion, tags_optimizer, intent_optimizer):
        self.model = model
        self.criterion = criterion
        self.tags_optimizer = tags_optimizer
        self.intent_optimizer = intent_optimizer
        
    def on_epoch_begin(self, is_train, name, batches_count):
        self.epoch_loss = 0
        self.tag_correct_count, self.tag_total_count = 0, 0
        self.intent_correct_count, self.intent_total_count = 0, 0
        self.is_train = is_train
        self.name = name
        self.batches_count = batches_count
        self.model.train(is_train)
        
    def on_epoch_end(self):
        return '{:>5s} Loss = {:.5f}, Tags accuracy = {:.2%}, Intents accuracy = {:.2%}'.format(
            self.name, self.epoch_loss / self.batches_count, self.tag_correct_count / self.tag_total_count, 
            self.intent_correct_count / self.intent_total_count
        )
        
    def on_batch(self, batch):

        tag_logits, intent_logits = self.model(batch.tokens.transpose(0, 1))

        intent_loss = self.criterion(intent_logits, batch.intent)

        predicted_intent = torch.max(intent_logits, axis=1)[1]
        self.intent_total_count += predicted_intent.size(0)
        self.intent_correct_count += torch.sum(predicted_intent == batch.intent).item()


        tags_loss = self.criterion(tag_logits.transpose(1, 2), batch.tags.transpose(0, 1))

        predicted_tag = torch.max(tag_logits, axis=2)[1]
        self.tag_total_count += predicted_tag.size(0)*predicted_tag.size(1)
        self.tag_correct_count += torch.sum(predicted_tag == batch.tags.transpose(0, 1)).item()

        if self.is_train:
            intent_loss.backward(retain_graph=True)
            tags_loss.backward(retain_graph=True)

            self.intent_optimizer.step()
            self.intent_optimizer.zero_grad()

            self.tags_optimizer.step()
            self.tags_optimizer.zero_grad()

        loss = tags_loss + intent_loss
        self.epoch_loss += loss.item()

Затем их нужно передать в отдельные оптимизаторы и учить отдельно.

*Еще, может быть, пригодится retain_graph параметр метода backward()*.

In [121]:
model = AsyncSharedModel(vocab_size=len(tokens_field.vocab), intents_count=len(intent_field.vocab),
                         tags_count=len(tags_field.vocab)).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
tags_parameters = [param for name, param in model.named_parameters() if not 'intent' in name]
intent_parameters = [param for name, param in model.named_parameters() if not 'tags' in name]
tags_optimizer = optim.Adam(tags_parameters)
intent_optimizer = optim.Adam(intent_parameters)
trainer = AsyncSharedModelTrainer(model, criterion, tags_optimizer, intent_optimizer)
fit(trainer, train_iter, epochs_count=30, val_iter=val_iter)

[1 / 30] Train: Loss = 1.93072, Tags accuracy = 81.36%, Intents accuracy = 79.59%: 100%|██████████| 140/140 [00:16<00:00,  8.46it/s]
[1 / 30]   Val: Loss = 0.88582, Tags accuracy = 93.20%, Intents accuracy = 85.20%: 100%|██████████| 4/4 [00:00<00:00,  6.67it/s]
[2 / 30] Train: Loss = 0.63699, Tags accuracy = 94.16%, Intents accuracy = 91.42%: 100%|██████████| 140/140 [00:13<00:00, 10.11it/s]
[2 / 30]   Val: Loss = 0.51319, Tags accuracy = 96.75%, Intents accuracy = 90.80%: 100%|██████████| 4/4 [00:00<00:00,  7.55it/s]
[3 / 30] Train: Loss = 0.36821, Tags accuracy = 96.95%, Intents accuracy = 94.13%: 100%|██████████| 140/140 [00:13<00:00, 10.07it/s]
[3 / 30]   Val: Loss = 0.37578, Tags accuracy = 97.89%, Intents accuracy = 93.20%: 100%|██████████| 4/4 [00:00<00:00,  8.72it/s]
[4 / 30] Train: Loss = 0.23615, Tags accuracy = 98.12%, Intents accuracy = 96.14%: 100%|██████████| 140/140 [00:14<00:00,  9.81it/s]
[4 / 30]   Val: Loss = 0.33698, Tags accuracy = 98.35%, Intents accuracy = 93.40%

In [122]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.42689, Tags accuracy = 98.63%, Intents accuracy = 94.96%: 100%|██████████| 7/7 [00:00<00:00, 10.87it/s]


In [123]:
from conlleval import evaluate

def eval_tagger(model, test_iter):
    true_seqs, pred_seqs = [], []

    model.eval()
    with torch.no_grad():
        for batch in test_iter:
            pred = model.forward(batch.tokens.transpose(0, 1))[0].transpose(1, 2).max(dim=1)[1].cpu().tolist()
            pred_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in pred])
            true_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in batch.tags.transpose(0, 1).cpu().tolist()])

    print('Precision = {:.2f}%, Recall = {:.2f}%, F1 = {:.2f}%'.format(*evaluate(true_seqs, pred_seqs, verbose=False)))

eval_tagger(model, test_iter)

Precision = 93.37%, Recall = 91.89%, F1 = 92.63%


#### **Задание 3.2**
Посмотрите на параметры в статье и попробуйте добиться похожего качества.

#### **Задание 4**
Посмотрите результаты на SNIPS

Выставим гиперпараметры аналогичным в статье. Параметры оптимизатора оставим стандартными.

In [147]:
model = AsyncSharedModel(vocab_size=len(tokens_field.vocab), intents_count=len(intent_field.vocab),
                         tags_count=len(tags_field.vocab), emb_dim=300, lstm_hidden_dim=200, num_layers=2).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
tags_parameters = [param for name, param in model.named_parameters() if not 'intent' in name]
intent_parameters = [param for name, param in model.named_parameters() if not 'tags' in name]
tags_optimizer = optim.Adam(tags_parameters)
intent_optimizer = optim.Adam(intent_parameters)
trainer = AsyncSharedModelTrainer(model, criterion, tags_optimizer, intent_optimizer)
fit(trainer, train_iter, epochs_count=15, val_iter=val_iter)

[1 / 15] Train: Loss = 1.27488, Tags accuracy = 87.68%, Intents accuracy = 85.71%: 100%|██████████| 140/140 [01:11<00:00,  1.95it/s]
[1 / 15]   Val: Loss = 0.41936, Tags accuracy = 97.22%, Intents accuracy = 93.20%: 100%|██████████| 4/4 [00:02<00:00,  1.57it/s]
[2 / 15] Train: Loss = 0.27283, Tags accuracy = 97.99%, Intents accuracy = 95.71%: 100%|██████████| 140/140 [01:07<00:00,  2.08it/s]
[2 / 15]   Val: Loss = 0.20896, Tags accuracy = 98.93%, Intents accuracy = 97.60%: 100%|██████████| 4/4 [00:03<00:00,  1.22it/s]
[3 / 15] Train: Loss = 0.10867, Tags accuracy = 99.28%, Intents accuracy = 98.57%: 100%|██████████| 140/140 [01:06<00:00,  2.10it/s]
[3 / 15]   Val: Loss = 0.19554, Tags accuracy = 99.30%, Intents accuracy = 96.40%: 100%|██████████| 4/4 [00:02<00:00,  1.57it/s]
[4 / 15] Train: Loss = 0.05739, Tags accuracy = 99.65%, Intents accuracy = 99.29%: 100%|██████████| 140/140 [01:05<00:00,  2.15it/s]
[4 / 15]   Val: Loss = 0.15698, Tags accuracy = 99.35%, Intents accuracy = 97.00%

In [148]:
do_epoch(trainer, test_iter, is_train=False, name='Test:')

Test: Loss = 0.38188, Tags accuracy = 99.12%, Intents accuracy = 95.74%: 100%|██████████| 7/7 [00:04<00:00,  1.72it/s]


In [149]:
from conlleval import evaluate

def eval_tagger(model, test_iter):
    true_seqs, pred_seqs = [], []

    model.eval()
    with torch.no_grad():
        for batch in test_iter:
            pred = model.forward(batch.tokens.transpose(0, 1))[0].transpose(1, 2).max(dim=1)[1].cpu().tolist()
            pred_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in pred])
            true_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in batch.tags.transpose(0, 1).cpu().tolist()])

    print('Precision = {:.2f}%, Recall = {:.2f}%, F1 = {:.2f}%'.format(*evaluate(true_seqs, pred_seqs, verbose=False)))

eval_tagger(model, test_iter)

Precision = 95.31%, Recall = 94.50%, F1 = 94.90%


Результаты на датасете snips

In [13]:
model = AsyncSharedModel(vocab_size=len(snips_tokens_field.vocab), intents_count=len(snips_intent_field.vocab),
                         tags_count=len(snips_tags_field.vocab), emb_dim=300, lstm_hidden_dim=200, num_layers=2).to(DEVICE)
criterion = nn.CrossEntropyLoss().to(DEVICE)
tags_parameters = [param for name, param in model.named_parameters() if not 'intent' in name]
intent_parameters = [param for name, param in model.named_parameters() if not 'tags' in name]
tags_optimizer = optim.Adam(tags_parameters)
intent_optimizer = optim.Adam(intent_parameters)
trainer = AsyncSharedModelTrainer(model, criterion, tags_optimizer, intent_optimizer)
fit(trainer, snips_train_iter, epochs_count=15, val_iter=snips_val_iter)

[1 / 15] Train: Loss = 0.71879, Tags accuracy = 87.93%, Intents accuracy = 93.71%: 100%|██████████| 409/409 [02:54<00:00,  2.35it/s]
[1 / 15]   Val: Loss = 0.24719, Tags accuracy = 95.51%, Intents accuracy = 97.14%: 100%|██████████| 6/6 [00:02<00:00,  2.04it/s]
[2 / 15] Train: Loss = 0.13394, Tags accuracy = 97.01%, Intents accuracy = 99.03%: 100%|██████████| 409/409 [02:54<00:00,  2.34it/s]
[2 / 15]   Val: Loss = 0.15777, Tags accuracy = 96.73%, Intents accuracy = 97.86%: 100%|██████████| 6/6 [00:02<00:00,  2.30it/s]
[3 / 15] Train: Loss = 0.05607, Tags accuracy = 98.73%, Intents accuracy = 99.70%: 100%|██████████| 409/409 [02:58<00:00,  2.29it/s]
[3 / 15]   Val: Loss = 0.16742, Tags accuracy = 97.39%, Intents accuracy = 97.71%: 100%|██████████| 6/6 [00:02<00:00,  2.98it/s]
[4 / 15] Train: Loss = 0.02483, Tags accuracy = 99.55%, Intents accuracy = 99.85%: 100%|██████████| 409/409 [02:57<00:00,  2.30it/s]
[4 / 15]   Val: Loss = 0.21953, Tags accuracy = 97.46%, Intents accuracy = 96.71%

In [14]:
do_epoch(trainer, snips_test_iter, is_train=False, name='Test:')

Test: Loss = 0.25080, Tags accuracy = 98.19%, Intents accuracy = 97.29%: 100%|██████████| 6/6 [00:02<00:00,  2.62it/s]


In [15]:
from conlleval import evaluate

def eval_tagger(model, test_iter):
    true_seqs, pred_seqs = [], []

    model.eval()
    with torch.no_grad():
        for batch in test_iter:
            pred = model.forward(batch.tokens.transpose(0, 1))[0].transpose(1, 2).max(dim=1)[1].cpu().tolist()
            pred_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in pred])
            true_seqs.extend([" ".join([tags_field.vocab.itos[elem] for elem in l if elem != 0]) for l in batch.tags.transpose(0, 1).cpu().tolist()])

    print('Precision = {:.2f}%, Recall = {:.2f}%, F1 = {:.2f}%'.format(*evaluate(true_seqs, pred_seqs, verbose=False)))

eval_tagger(model, snips_test_iter)

Precision = 93.08%, Recall = 93.11%, F1 = 93.09%


## Async Multi-task Learning for POS Tagging

Ещё одна статья: [Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings](https://arxiv.org/pdf/1805.08237.pdf)

Архитектура там такая:

<img src="https://i.ibb.co/0nSX6CC/2018-11-27-9-26-15.png" width="400"/>

Multi-task задача - обучение отдельных классификаторов более низкого уровня (над символами и словами) для предсказания тегов отдельными оптимизаторами.

# Дополнительные материалы

## Статьи
A Bi-model based RNN Semantic Frame Parsing Model for Intent Detection and Slot Filling, 2018 [[pdf]](http://aclweb.org/anthology/N18-2050)

Slot-Gated Modeling for Joint Slot Filling and Intent Prediction, 2018 [[pdf]](http://aclweb.org/anthology/N18-2118) 

Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings, 2018 [[pdf]](https://arxiv.org/pdf/1805.08237.pdf)

BERT for Joint Intent Classification and Slot Filling
 [[pdf]](https://arxiv.org/pdf/1902.10909.pdf)

## Блоги
[Как устроена Алиса](https://habr.com/company/yandex/blog/349372/)  