# Homework for the "Attention Mechanism" lesson

Solve a translation task using the attention mechanism.

Take the English-Russian sentence pairs from [https://www.manythings.org/anki/](https://www.manythings.org/anki/).


Train a seq2seq model with attention on them, in two variants:
* Dot-product attention
* MLP-based (additive) attention


Evaluate the quality.
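
For reference, the two scoring functions named in the task can be written as below. This is a sketch in standard notation rather than a statement from the assignment: $q$ stands for the decoder hidden state, $k_i$ for the $i$-th encoder output, $d$ for the hidden size, and $W_q$, $W_k$, $v$ for the learnable weights of the MLP variant (bias terms are omitted for brevity).

$$
\mathrm{score}_{\mathrm{dot}}(q, k_i) = \frac{q^\top k_i}{\sqrt{d}},
\qquad
\mathrm{score}_{\mathrm{MLP}}(q, k_i) = v^\top \tanh\left(W_q q + W_k k_i\right)
$$

$$
\alpha_i = \frac{\exp\left(\mathrm{score}(q, k_i)\right)}{\sum_j \exp\left(\mathrm{score}(q, k_j)\right)},
\qquad
c = \sum_i \alpha_i k_i
$$

In both variants the context vector $c$ is a weighted sum of the encoder outputs; only the way the scores are computed differs.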

In [None]:
%matplotlib inline

from io import open
import unicodedata
import string
import re
import random

import torch
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
import pandas as pd

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

### 1. Loading and preparing the data

In [None]:
# !wget https://download.pytorch.org/tutorial/data.zip
!wget https://www.manythings.org/anki/rus-eng.zip
!unzip rus-eng.zip

--2023-09-06 14:32:28--  https://www.manythings.org/anki/rus-eng.zip
Resolving www.manythings.org (www.manythings.org)... 173.254.30.110
Connecting to www.manythings.org (www.manythings.org)|173.254.30.110|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 15824155 (15M) [application/zip]
Saving to: ‘rus-eng.zip’


2023-09-06 14:32:30 (8.23 MB/s) - ‘rus-eng.zip’ saved [15824155/15824155]

Archive:  rus-eng.zip
  inflating: rus.txt                 
  inflating: _about.txt              


In [None]:
!mv '/content/rus.txt' '/content/eng-rus.txt'


In [None]:
!tail /content/eng-rus.txt

We need to uphold laws against discrimination — in hiring, and in housing, and in education, and in the criminal justice system. That is what our Constitution and our highest ideals require.	Нам нужно отстаивать законы против дискриминации при найме на работу, в жилищной сфере, в сфере образования и правоохранительной системе. Этого требуют наша Конституция и высшие идеалы.	CC-BY 2.0 (France) Attribution: tatoeba.org #5762728 (BHO) & #6390439 (odexed)
I've heard that you should never date anyone who is less than half your age plus seven. Tom is now 30 years old and Mary is 17. How many years will Tom need to wait until he can start dating Mary?	Я слышал, что никогда не следует встречаться с кем-то вдвое младше вас плюс семь лет. Тому 30 лет, a Мэри 17. Сколько лет Тому нужно ждать до тех пор, пока он сможет начать встречаться с Мэри?	CC-BY 2.0 (France) Attribution: tatoeba.org #10068197 (CK) & #10644473 (notenoughsun)
I do have one final ask of you as your president, the same thing I a

#### Data preparation functions

In [None]:
sep = 'CC-BY'
with open('/content/eng-rus.txt') as file:
    for line in file:
        print(line.split(sep, 1)[0])
        break

Go.	Марш!	


In [None]:
# Build the vocabulary
SOS_token = 0
EOS_token = 1


class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1

In [None]:
# Text normalization
# Strip combining marks (accents) from a Unicode string, thanks to
# http://stackoverflow.com/a/518232/2809427
# Note: NFD also splits Cyrillic "й" into "и" + breve, so "й" becomes "и".
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

# Lowercase, trim, and remove non-letter characters

def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?а-яА-Я]+", r" ", s)  # keep only Latin and Cyrillic letters and . ! ?
    return s
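
A quick check of what the normalization does to real input may help here. The sketch below repeats the two functions above so that it runs on its own; the sample sentences are taken from the outputs shown later in this notebook.

```python
import re
import unicodedata


def unicodeToAscii(s):
    # Decompose characters (NFD) and drop combining marks (category 'Mn')
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )


def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)                # separate punctuation
    s = re.sub(r"[^a-zA-Z.!?а-яА-Я]+", r" ", s)      # everything else -> space
    return s


print(normalizeString("Он известен как великий поэт."))
# он известен как великии поэт .
print(normalizeString("I'm leaving it to you."))
# i m leaving it to you .
```

Two side effects are visible: "й" loses its breve and becomes "и", and apostrophes turn into spaces, which is why the English prefixes below are written as "i m ", "he s " and so on.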

In [None]:
# Read the sentence pairs

def readLangs(lang1, lang2, reverse=False):
    print("Reading lines...")

    # Read the file and split into lines
    with open('%s-%s.txt' % (lang1, lang2), encoding='utf-8') as f:
        lines = f.read().strip().split('\n')
    # Drop the attribution column: everything from '\tCC-BY' onward
    lines = [i.split('\tCC-BY', 1)[0] for i in lines]

    # Split every line into pairs and normalize
    pairs = [[normalizeString(s) for s in l.split('\t')] for l in lines]

    # Reverse pairs, make Lang instances
    if reverse:
        pairs = [list(reversed(p)) for p in pairs]
        input_lang = Lang(lang2)
        output_lang = Lang(lang1)
    else:
        input_lang = Lang(lang1)
        output_lang = Lang(lang2)

    return input_lang, output_lang, pairs

In [None]:
MAX_LENGTH = 10

eng_prefixes = (
    "i am ", "i m ",
    "he is", "he s ",
    "she is", "she s",
    "you are", "you re ",
    "we are", "we re ",
    "they are", "they re "
)


def filterPair(p):
    return len(p[0].split(' ')) < MAX_LENGTH and \
        len(p[1].split(' ')) < MAX_LENGTH and \
        p[1].startswith(eng_prefixes)


def filterPairs(pairs):
    return [pair for pair in pairs if filterPair(pair)]

In [None]:
def prepareData(lang1, lang2, reverse=False):
    input_lang, output_lang, pairs = readLangs(lang1, lang2, reverse)
    # print('input_lang:', input_lang )
    # print('output_lang:', output_lang)
    # print('pairs:', pairs)
    print("Read %s sentence pairs" % len(pairs))
    pairs = filterPairs(pairs)
    print("Trimmed to %s sentence pairs" % len(pairs))
    print("Counting words...")
    for pair in pairs:
        input_lang.addSentence(pair[0])
        output_lang.addSentence(pair[1])
    print("Counted words:")
    print(input_lang.name, input_lang.n_words)
    print(output_lang.name, output_lang.n_words)
    return input_lang, output_lang, pairs


input_lang, output_lang, pairs = prepareData('eng', 'rus', True)
print(random.choice(pairs))

Reading lines...
Read 479223 sentence pairs
Trimmed to 27825 sentence pairs
Counting words...
Counted words:
rus 10060
eng 4272
['предоставляю это вам .', 'i m leaving it to you .']


## Dot-product attention

### The Encoder





In [None]:
class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output = embedded
        output, hidden = self.gru(output, hidden)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

### The Decoder





In [None]:
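import math  # needed for math.sqrt below


class ScalAttention(nn.Module):
    """Scaled dot-product attention; it has no learnable parameters."""
    def __init__(self, hidden_size):
        super(ScalAttention, self).__init__()
        self.hidden_size = hidden_size

    def forward(self, query, key, value):
        # query: (1, 1, hidden); key and value: (max_length, hidden)
        d_k = query.size(-1)
        # (1, 1, hidden) x (hidden, max_length) -> (1, 1, max_length)
        score = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
        # squeeze(2) is a no-op unless max_length == 1; unsqueeze gives (1, 1, 1, max_length)
        score = score.squeeze(2).unsqueeze(1)
        p_attn = F.softmax(score, dim=-1)  # attention weights
        # weighted sum of the encoder outputs -> (1, 1, 1, hidden)
        context = torch.matmul(p_attn, value)
        return context, p_attn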
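
To make the computation concrete, here is a framework-free sketch of scaled dot-product attention for a single query, using plain Python lists. The function name `dot_attention` and the `valid_len` parameter are assumptions introduced for illustration. The notebook itself does not mask anything, so the zero rows that pad `encoder_outputs` up to `MAX_LENGTH` still receive a score of 0 and therefore some attention weight.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(query, keys, valid_len=None):
    """Scaled dot-product attention for one query vector.

    query: list of d floats; keys: list of vectors, also used as values.
    valid_len (hypothetical): if given, positions >= valid_len are masked out.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    if valid_len is not None:
        scores = [s if i < valid_len else float('-inf')
                  for i, s in enumerate(scores)]
    weights = softmax(scores)
    # Context is the weighted sum of the value vectors
    context = [sum(w * key[j] for w, key in zip(weights, keys))
               for j in range(d)]
    return context, weights


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # last row imitates zero padding

_, weights = dot_attention(query, keys)
_, masked = dot_attention(query, keys, valid_len=2)
print(weights)  # the padded position still gets a non-zero weight
print(masked)   # the padded position gets exactly 0.0
```

Whether masking changes the final loss on this dataset is not something the notebook measures; the sketch only shows where the unmasked weight goes.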

In [None]:
class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attention = ScalAttention(self.hidden_size)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)
        self.dropout = nn.Dropout(self.dropout_p)

    def forward(self, input, hidden, encoder_outputs):
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        embedded = embedded.squeeze(2)
        query = hidden.permute(1, 0, 2)
        context, attn_weights = self.attention(query, encoder_outputs, encoder_outputs)
        context = context.squeeze(2)

        output = torch.cat((embedded, context), dim=2)
        output = self.attn_combine(output).squeeze(2)

        output = F.relu(output)
        output, hidden = self.gru(output, hidden)
        output = F.log_softmax(self.out(output[0]), dim=1)

        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

In [None]:
def indexesFromSentence(lang, sentence):
    return [lang.word2index[word] for word in sentence.split(' ')]


def tensorFromSentence(lang, sentence):
    indexes = indexesFromSentence(lang, sentence)
    indexes.append(EOS_token)
    return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)


def tensorsFromPair(pair):
    input_tensor = tensorFromSentence(input_lang, pair[0])
    target_tensor = tensorFromSentence(output_lang, pair[1])
    return (input_tensor, target_tensor)
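
Note that `indexesFromSentence` raises `KeyError` for any word that was not seen while building the vocabulary, so `evaluate` only works on sentences made of known words. One possible workaround, sketched below with a hypothetical helper `known_indexes` that is not part of the notebook, is to drop unknown words and report them.

```python
def known_indexes(word2index, sentence):
    """Return (indexes of known words, list of unknown words).

    Hypothetical helper: unknown words are skipped instead of raising KeyError.
    """
    indexes, unknown = [], []
    for word in sentence.split(' '):
        if word in word2index:
            indexes.append(word2index[word])
        else:
            unknown.append(word)
    return indexes, unknown


# Toy vocabulary; indices 0 and 1 are reserved for SOS and EOS as in Lang
word2index = {'я': 2, 'ищу': 3, '.': 4}

indexes, unknown = known_indexes(word2index, 'я ищу брата .')
print(indexes)  # [2, 3, 4]
print(unknown)  # ['брата']
```

Silently dropping words changes the input sentence, so reserving a dedicated unknown-word token in the vocabulary would be a reasonable alternative.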

In [None]:
teacher_forcing_ratio = 0.5


def train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, max_length=MAX_LENGTH):
    encoder_hidden = encoder.initHidden()

    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    input_length = input_tensor.size(0)
    target_length = target_tensor.size(0)

    encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

    loss = 0

    for ei in range(input_length):
        encoder_output, encoder_hidden = encoder(
            input_tensor[ei], encoder_hidden)
        encoder_outputs[ei] = encoder_output[0, 0]

    decoder_input = torch.tensor([[SOS_token]], device=device)

    decoder_hidden = encoder_hidden

    use_teacher_forcing = random.random() < teacher_forcing_ratio

    if use_teacher_forcing:
        # Teacher forcing: Feed the target as the next input
        for di in range(target_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            loss += criterion(decoder_output, target_tensor[di])
            decoder_input = target_tensor[di]  # Teacher forcing

    else:
        # Without teacher forcing: use its own predictions as the next input
        for di in range(target_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            topv, topi = decoder_output.topk(1)
            decoder_input = topi.squeeze().detach()  # detach from history as input

            loss += criterion(decoder_output, target_tensor[di])
            if decoder_input.item() == EOS_token:
                break

    loss.backward()

    encoder_optimizer.step()
    decoder_optimizer.step()

    return loss.item() / target_length

In [None]:
import time
import math


def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)


def timeSince(since, percent):
    now = time.time()
    s = now - since
    es = s / (percent)
    rs = es - s
    return '%s (- %s)' % (asMinutes(s), asMinutes(rs))

In [None]:
def trainIters(encoder, decoder, n_iters, print_every=1000, plot_every=100, learning_rate=0.01):
    start = time.time()
    plot_losses = []
    print_loss_total = 0  # Reset every print_every
    plot_loss_total = 0  # Reset every plot_every

    encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)
    training_pairs = [tensorsFromPair(random.choice(pairs))
                      for i in range(n_iters)]
    criterion = nn.NLLLoss()

    for iter in range(1, n_iters + 1):
        training_pair = training_pairs[iter - 1]
        input_tensor = training_pair[0]
        target_tensor = training_pair[1]

        loss = train(input_tensor, target_tensor, encoder,
                     decoder, encoder_optimizer, decoder_optimizer, criterion)
        print_loss_total += loss
        plot_loss_total += loss

        if iter % print_every == 0:
            print_loss_avg = print_loss_total / print_every
            print_loss_total = 0
            print('%s (%d %d%%) %.4f' % (timeSince(start, iter / n_iters),
                                         iter, iter / n_iters * 100, print_loss_avg))

        if iter % plot_every == 0:
            plot_loss_avg = plot_loss_total / plot_every
            plot_losses.append(plot_loss_avg)
            plot_loss_total = 0

    showPlot(plot_losses)
    return plot_losses

In [None]:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np


def showPlot(points):
    # A single figure; the inline backend set by %matplotlib inline displays it
    fig, ax = plt.subplots()
    # this locator puts ticks at regular intervals
    loc = ticker.MultipleLocator(base=0.2)
    ax.yaxis.set_major_locator(loc)
    ax.plot(points)

In [None]:
def evaluate(encoder, decoder, sentence, max_length=MAX_LENGTH):
    with torch.no_grad():
        input_tensor = tensorFromSentence(input_lang, sentence)
        input_length = input_tensor.size()[0]
        encoder_hidden = encoder.initHidden()

        encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

        for ei in range(input_length):
            encoder_output, encoder_hidden = encoder(input_tensor[ei],
                                                     encoder_hidden)
            encoder_outputs[ei] += encoder_output[0, 0]

        decoder_input = torch.tensor([[SOS_token]], device=device)  # SOS

        decoder_hidden = encoder_hidden

        decoded_words = []
        decoder_attentions = torch.zeros(max_length, max_length)

        for di in range(max_length):
            decoder_output, decoder_hidden, decoder_attention = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            decoder_attentions[di] = decoder_attention.data
            topv, topi = decoder_output.data.topk(1)
            if topi.item() == EOS_token:
                decoded_words.append('<EOS>')
                break
            else:
                decoded_words.append(output_lang.index2word[topi.item()])

            decoder_input = topi.squeeze().detach()

        return decoded_words, decoder_attentions[:di + 1]

In [None]:
def evaluateRandomly(encoder, decoder, n=10):
    for i in range(n):
        pair = random.choice(pairs)
        print('>', pair[0])
        print('=', pair[1])
        output_words, attentions = evaluate(encoder, decoder, pair[0])
        output_sentence = ' '.join(output_words)
        print('<', output_sentence)
        print('')
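
`evaluateRandomly` only prints translations for visual inspection. For a rough numeric score, one option is to compute exact-match rate and clipped unigram precision over the printed pairs. The sketch below is an illustration added here, not the metric used in this notebook, and the function names are assumptions; the sample strings are copied from the outputs in this notebook.

```python
from collections import Counter


def strip_eos(tokens):
    # Remove the end-of-sentence marker appended by evaluate()
    return [t for t in tokens if t != '<EOS>']


def exact_match(hypothesis, reference):
    return strip_eos(hypothesis.split()) == reference.split()


def unigram_precision(hypothesis, reference):
    """Share of hypothesis tokens that also occur in the reference (clipped counts)."""
    hyp_tokens = strip_eos(hypothesis.split())
    if not hyp_tokens:
        return 0.0
    ref_counts = Counter(reference.split())
    hyp_counts = Counter(hyp_tokens)
    overlap = sum(min(count, ref_counts[token])
                  for token, count in hyp_counts.items())
    return overlap / len(hyp_tokens)


print(exact_match('you re vain . <EOS>', 'you re vain .'))            # True
print(unigram_precision('he is a . <EOS>', 'he s a biologist .'))     # 0.75
```

Averaging these values over a held-out set of pairs would give a more comparable number than the training loss alone; a standard BLEU implementation is the usual choice for a full evaluation.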

### Training

In [None]:
hidden_size = 256
encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1).to(device)

history = trainIters(encoder1, attn_decoder1, 75000, print_every=5000)

1m 30s (- 21m 5s) (5000 6%) 3.0875
2m 53s (- 18m 50s) (10000 13%) 2.5980
4m 18s (- 17m 13s) (15000 20%) 2.3072
5m 42s (- 15m 43s) (20000 26%) 2.1394
7m 8s (- 14m 16s) (25000 33%) 1.9782
8m 34s (- 12m 51s) (30000 40%) 1.8626
9m 59s (- 11m 25s) (35000 46%) 1.7197
11m 25s (- 9m 59s) (40000 53%) 1.6465
12m 53s (- 8m 35s) (45000 60%) 1.5602
14m 19s (- 7m 9s) (50000 66%) 1.4824
15m 46s (- 5m 44s) (55000 73%) 1.4049
17m 12s (- 4m 18s) (60000 80%) 1.3248
18m 38s (- 2m 52s) (65000 86%) 1.3027
20m 5s (- 1m 26s) (70000 93%) 1.2429
21m 33s (- 0m 0s) (75000 100%) 1.1958


In [None]:
# history holds the loss averaged over every 100 iterations; keep its minimum
common_history = {}
common_history['loss'] = {'scalar': 0, 'mlp': 0}
common_history['loss']['scalar'] = round(min(history), 4)

In [None]:
evaluateRandomly(encoder1, attn_decoder1)

> он известен как великии поэт .
= he is known as a great poet .
< he is as tall as he is . <EOS>

> ты не сумасшедшии .
= you aren t crazy .
< you aren t crazy . <EOS>

> вы слишком молоды .
= you re too young .
< you re too young . <EOS>

> он биолог .
= he s a biologist .
< he is a . <EOS>

> я ищу брата .
= i m looking for my brother .
< i m looking for an . <EOS>

> он очень хорош в игре на гитаре .
= he s very good at playing guitar .
< he is very good at playing . . <EOS>

> мы едем с тобои .
= we re going with you .
< we re going with you . <EOS>

> я скоро ухожу .
= i m about to leave .
< i m going to wait . <EOS>

> вы самодовольны .
= you re vain .
< you re vain . <EOS>

> ты с ума сошел !
= you re nuts !
< you re drunk ! <EOS>



## MLP

### The Encoder





In [None]:
class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output = embedded
        output, hidden = self.gru(output, hidden)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

### The Decoder





In [None]:
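class MultAttention(nn.Module):
    """Additive (MLP) attention: score = Wv(tanh(Wq(query) + Wk(key)))."""
    def __init__(self, hidden_size):
        super(MultAttention, self).__init__()
        self.Wq = nn.Linear(hidden_size, hidden_size)
        self.Wk = nn.Linear(hidden_size, hidden_size)
        self.Wv = nn.Linear(hidden_size, 1)

    def forward(self, query, key):
        # query: (1, 1, hidden); key: (max_length, hidden)
        # The sum broadcasts to (1, max_length, hidden); Wv maps it to (1, max_length, 1)
        score = self.Wv(torch.tanh(self.Wq(query) + self.Wk(key)))
        # -> (1, 1, max_length)
        score = score.squeeze(2).unsqueeze(1)

        p_attn = F.softmax(score, dim=-1)  # attention weights
        # weighted sum of the encoder outputs -> (1, 1, hidden)
        context = torch.matmul(p_attn, key)
        return context, p_attn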
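
The same scoring rule can be traced by hand in plain Python. The sketch below is a simplified illustration: it omits the bias terms that `nn.Linear` adds, and the names `matvec` and `additive_attention` as well as the toy weights are assumptions chosen for readability.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # Matrix-vector product for a list-of-rows matrix
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def additive_attention(query, keys, Wq, Wk, v):
    """score_i = v . tanh(Wq q + Wk k_i); context = sum_i weight_i * k_i."""
    q_proj = matvec(Wq, query)
    scores = []
    for key in keys:
        k_proj = matvec(Wk, key)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, k_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    context = [sum(w * key[j] for w, key in zip(weights, keys))
               for j in range(len(keys[0]))]
    return context, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]

context, weights = additive_attention(query, keys, identity, identity, [1.0, 1.0])
print(weights)
print(context)
```

With these toy weights the second key scores higher (2·tanh(1) against tanh(2)), even though the first key is the one aligned with the query. Unlike the dot product, the additive score is not a similarity measure until its weights are trained.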

In [None]:
class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attention = MultAttention(self.hidden_size)

        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)

        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)
        self.dropout = nn.Dropout(self.dropout_p)

    def forward(self, input, hidden, encoder_outputs):
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        embedded = embedded.squeeze(2)

        query = hidden.permute(1, 0, 2)
        context, attn_weights = self.attention(query, encoder_outputs)

        context = context.squeeze(2)
        input_gru = torch.cat((embedded, context), dim=2)
        output = self.attn_combine(input_gru).squeeze(2)

        output = F.relu(output)
        output, hidden = self.gru(output, hidden)
        output = F.log_softmax(self.out(output[0]), dim=1)

        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

### Training

In [None]:
hidden_size = 256
encoder1 = EncoderRNN(input_lang.n_words, hidden_size).to(device)
attn_decoder1 = AttnDecoderRNN(hidden_size, output_lang.n_words, dropout_p=0.1).to(device)

history = trainIters(encoder1, attn_decoder1, 75000, print_every=5000)

1m 39s (- 23m 17s) (5000 6%) 3.1209
3m 14s (- 21m 6s) (10000 13%) 2.5912
4m 54s (- 19m 37s) (15000 20%) 2.3438
6m 31s (- 17m 57s) (20000 26%) 2.1091
8m 8s (- 16m 17s) (25000 33%) 1.9918
9m 45s (- 14m 38s) (30000 40%) 1.8631
11m 22s (- 12m 59s) (35000 46%) 1.7425
12m 58s (- 11m 21s) (40000 53%) 1.6570
14m 34s (- 9m 42s) (45000 60%) 1.5507
16m 11s (- 8m 5s) (50000 66%) 1.4698
17m 48s (- 6m 28s) (55000 73%) 1.4179
19m 26s (- 4m 51s) (60000 80%) 1.3378
21m 3s (- 3m 14s) (65000 86%) 1.3075
22m 40s (- 1m 37s) (70000 93%) 1.2179
24m 17s (- 0m 0s) (75000 100%) 1.1884


In [None]:
common_history['loss']['mlp'] = round(min(history), 4)

In [None]:
evaluateRandomly(encoder1, attn_decoder1)

> сегодня у тебя получается намного лучше .
= you re doing that much better today .
< you re much better today today . <EOS>

> они ваяют статую из мрамора .
= they are chiseling a statue out of marble .
< they are a of of a . <EOS>

> я гражданское лицо .
= i m a civilian .
< i am a a . . <EOS>

> он смышленыи .
= he s smart .
< he is going . <EOS>

> ты по идее должен работать .
= you re supposed to be working .
< you re supposed to be working . <EOS>

> я предусмотрительныи .
= i m prudent .
< i m hungry . <EOS>

> он прекрасныи молодои человек .
= he s a fine young man .
< he s a man man . <EOS>

> мы все еще в опасности .
= we re still in danger .
< we re still in danger . <EOS>

> на самом деле вы правы .
= you re actually right .
< you re right right . . <EOS>

> ты сильнее тома .
= you re stronger than tom .
< you re stronger than tom . <EOS>



## Results

In [None]:
compare_result = pd.DataFrame.from_dict(common_history)
compare_result

          loss
mlp     0.9032
scalar  0.9355
