# End-to-End Memory Network

---

![](./figs/E2EMN.png)

## Single Layer

### Sentences: 

$$X = [x_1, x_2, \cdots, x_n]: n \times T_c$$

* $n$: number of sentences in context
* $T_c$: max length of a sentence in context

### Embedding Matrices:

$$\begin{aligned}
A &: d \times V \\
B &: d \times V \\
C &: d \times V
\end{aligned}$$

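$$\begin{aligned}
m_i &= \sum_j Ax_{ij}: 1 \times d \\ 
c_i &= \sum_j Cx_{ij}: 1 \times d \\
u &= \sum_j Bq_{j}: 1 \times d
\end{aligned}$$

Here $V$ is the vocabulary size, $d$ is the embedding dimension, and each word $x_{ij}$, $q_j$ is a one-hot vector of size $V$. Before the summation over the word index $j$, a sentence embedding has size $T_c \times d$ and the question embedding has size $T_q \times d$, so the whole context has size $n \times T_c \times d$. After the summation:

* $m_i$, $c_i$: sum of the $T_c$ word embeddings of sentence $i$ in the context, $1 \times d$
* $u$: sum of the $T_q$ word embeddings of the query (question), $1 \times d$
* $score_i = m_iu^T: (1 \times d) \cdot (d \times 1) = 1 \times 1$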
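
The embedding and scoring step above can be sketched in NumPy. This is only an illustration under assumptions: the sizes and random weights are toy values, NumPy stands in for PyTorch to keep it short, and the tables are stored as $V \times d$ (the transpose of $A$, $B$, $C$) so that a word index selects a row. It is not the code in `model/E2EMN_model.py`.

```python
import numpy as np

rng = np.random.default_rng(0)

V, d = 20, 4            # vocabulary size and embedding dimension (toy values)
n, T_c, T_q = 3, 5, 4   # sentences per story, words per sentence, words per question

# Embedding tables stored as V x d so that a word index selects a row
# (assumption: this is the transpose of the d x V matrices A, B, C).
A = rng.normal(size=(V, d))
B = rng.normal(size=(V, d))
C = rng.normal(size=(V, d))

X = rng.integers(1, V, size=(n, T_c))   # word indices of the context sentences
q = rng.integers(1, V, size=T_q)        # word indices of the question

m = A[X].sum(axis=1)    # (n, T_c, d) -> (n, d): input memories m_i
c = C[X].sum(axis=1)    # (n, T_c, d) -> (n, d): output memories c_i
u = B[q].sum(axis=0)    # (T_q, d) -> (d,): question embedding u
score = m @ u           # (n,): score_i = m_i u^T

print(m.shape, c.shape, u.shape, score.shape)
```

Looking up rows with `A[X]` is equivalent to multiplying $A$ by one-hot vectors, which is why no explicit one-hot encoding appears.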

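### Attention:
$$\begin{aligned}
p_i &= softmax(score)_i = \frac{\exp(score_i)}{\sum_k \exp(score_k)}: 1 \times 1 \\
o_i &= p_i c_i : 1 \times d \\
o &= \sum_i o_i : 1 \times d
\end{aligned}$$

The softmax is taken over the $n$ sentences, so the weights $p_i$ sum to 1.

### Sum the vectors and pass them to the linear layer:

$$\begin{aligned}
inputs &= (u + o)^T : d \times 1 \\
a &= softmax(W \cdot inputs) : (V \times d) \times (d \times 1) = V \times 1
\end{aligned}$$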
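
The attention and output equations can be tried on random vectors. The following is a rough NumPy sketch, not the notebook's model: `single_hop`, the toy sizes, and the random `m`, `c`, `u`, `W` are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def single_hop(m, c, u, W):
    """One memory read: m, c are (n, d); u is (d,); W is (V, d)."""
    p = softmax(m @ u)          # (n,): attention weight per sentence
    o = p @ c                   # (d,): weighted sum of output memories
    a = softmax(W @ (u + o))    # (V,): distribution over answer words
    return p, o, a

rng = np.random.default_rng(0)
n, d, V = 3, 4, 20              # toy sizes
m = rng.normal(size=(n, d))
c = rng.normal(size=(n, d))
u = rng.normal(size=d)
W = rng.normal(size=(V, d))

p, o, a = single_hop(m, c, u, W)
print(p.sum(), a.argmax())      # p sums to 1; argmax is the predicted word index
```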

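### Position encoding (PE):

For each story (sentence) memory $m_i, c_i$:

$$\begin{aligned}
m_i &= \sum_j l_j \otimes Ax_{ij}: 1 \times d \\ 
l_{jk} &= (1-\frac{j}{J}) - (\frac{k}{d})(1-\frac{2j}{J})
\end{aligned}$$

Here $\otimes$ is element-wise multiplication, and $l$ is a matrix of size $T_c \times d$ whose row $l_j$ weights the embedding of the $j$-th word. $c_i$ is computed the same way with $C$ in place of $A$.

* $J$: number of words in the sentence
* $j$: index of the word in the sentence (1-based)
* $d$: dimension of the embedding
* $k$: index of the embedding dimension (1-based)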
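
The weight matrix $l$ can be computed directly from the formula. A small NumPy sketch, assuming 1-based indices $j$ and $k$; the function name `position_encoding` and the random stand-in for the word embeddings are hypothetical:

```python
import numpy as np

def position_encoding(J, d):
    """Return l with shape (J, d), where l[j-1, k-1] = (1 - j/J) - (k/d)(1 - 2j/J)."""
    j = np.arange(1, J + 1)[:, None]   # word positions as a column, shape (J, 1)
    k = np.arange(1, d + 1)[None, :]   # embedding dimensions as a row, shape (1, d)
    return (1 - j / J) - (k / d) * (1 - 2 * j / J)

l = position_encoding(J=5, d=4)

# Stand-in for the J x d word embeddings A x_ij of one sentence.
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 4))

m_i = (l * E).sum(axis=0)   # element-wise weighting, then sum over words -> (d,)
print(l.shape, m_i.shape)
```

Because each word is weighted differently per dimension, swapping two words in a sentence changes $m_i$, unlike the plain bag-of-words sum.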

### Temporal encoding (TE):
* Example:
> Sam walks into the kitchen.
>
> Sam picks up an apple.
>
> Sam walks into the bedroom. 
>
> Sam drops the apple. 
>
> Q: Where is the apple? 
>
> A. Bedroom


Many of the QA tasks require some notion of temporal context, i.e. the model needs to understand that Sam is in the bedroom after he is in the kitchen. To enable our model to address them, we modify the memory vector.

$$m_i = \sum_j l_j \otimes Ax_{ij} + T_A(i)$$

* $T_A(i)$: the $i$-th row ($1 \times d$) of the temporal encoding matrix $T_A$, which has size $n \times d$ and is learned during training
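
Why the extra term is needed can be checked numerically: without it, the attention read is a weighted sum over an unordered set of memories, so shuffling the sentences changes nothing. The sketch below uses random toy values; `read`, `T_C` (the analogous matrix for the output memories $c_i$, as in the paper), and the random matrices standing in for learned ones are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def read(m, c, u):
    """Attention read: weights from m, weighted sum of c."""
    return softmax(m @ u) @ c

rng = np.random.default_rng(0)
n, d = 4, 3
m = rng.normal(size=(n, d))      # sentence memories m_i
c = rng.normal(size=(n, d))      # output memories c_i
u = rng.normal(size=d)           # question embedding
T_A = rng.normal(size=(n, d))    # one row per sentence position (random here, learned in practice)
T_C = rng.normal(size=(n, d))    # assumed counterpart of T_A for c_i

perm = [2, 0, 3, 1]              # shuffle the order of the sentences

o_plain = read(m, c, u)
o_shuffled = read(m[perm], c[perm], u)

o_temporal = read(m + T_A, c + T_C, u)
o_temporal_shuffled = read(m[perm] + T_A, c[perm] + T_C, u)

print(np.allclose(o_plain, o_shuffled))                 # order is invisible without T_A
print(np.allclose(o_temporal, o_temporal_shuffled))     # order matters with T_A
```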

## Reference

https://arxiv.org/pdf/1503.08895.pdf

https://github.com/nmhkahn/MemN2N-pytorch/blob/master/memn2n/model.py

## Load Packages

In [1]:
import os
import sys
sys.path.append(os.path.dirname(os.getcwd()))  # make the parent directory (which contains `model/`) importable
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import random
import numpy as np
from model.bAbI_utils_loader import bAbIDataset, bAbIDataLoader
from model.E2EMN_model import E2EMN

USE_CUDA = torch.cuda.is_available()
DEVICE = 0 if USE_CUDA else -1

# Model

## Settings: Train_loader & Parameters

In [2]:
path_train = '../data/QA_bAbI_tasks/en-10k/qa1_single-supporting-fact_train.txt'
bAbI_train = bAbIDataset(path_train, train=True, return_masks=True)
train_loader = bAbIDataLoader(dataset=bAbI_train, batch_size=32, shuffle=True, to_tensor=True)

In [3]:
VOCAB_SIZE = len(bAbI_train.word2idx)
EMBED_SIZE = 50
N_HOPS = 3
LR = 0.01
STEP = 100
MAX_STORY_LEN = bAbI_train.max_story_len
BATCH_SIZE = 32
EARLY_STOPPING = False
ENCODING_METHOD = 'basic'
TEMPORAL = False

In [4]:
def get_cuda(*args):
    """Move every tensor in `args` to the GPU and return them as a list."""
    return [x.cuda() for x in args]

## Settings: Loss Function & Optimizer

In [5]:
model = E2EMN(VOCAB_SIZE, EMBED_SIZE, n_hops=N_HOPS, encoding_method=ENCODING_METHOD, 
              temporal=TEMPORAL, use_cuda=USE_CUDA, max_story_len=MAX_STORY_LEN)

if USE_CUDA:
    model = model.cuda()
    
loss_function = nn.CrossEntropyLoss(ignore_index=0)
optimizer = optim.Adam(model.parameters(), lr=LR)
scheduler = optim.lr_scheduler.MultiStepLR(gamma=0.5, milestones=[25, 50, 75], optimizer=optimizer)

## Train

In [6]:
model.train()
for step in range(STEP):
    losses = []
    if EARLY_STOPPING:
        break
    for i, batch in enumerate(train_loader.load()):
        stories, stories_masks, questions, _, answers, _ = batch
        
        if USE_CUDA:
            stories, stories_masks, questions, answers = get_cuda(stories, stories_masks, questions, answers)

        model.zero_grad()
        
        preds = model(stories, questions, stories_masks=stories_masks)
        
        loss = loss_function(preds, answers.view(-1))
        losses.append(loss.item())  # .item() replaces the legacy loss.data[0], which fails on 0-dim tensors
        
        loss.backward()
        optimizer.step()
    
    if step % 5 == 0:
        # report the mean loss of this epoch and the learning rate it was trained with
        string = '[{}/{}] loss: {:.4f}, lr: {},'.format(step+1, STEP, np.mean(losses), scheduler.get_last_lr()[0])
        print(string)
        if np.mean(losses) < 0.01:
            EARLY_STOPPING = True
            print("Early Stopping!")
            break

    # step the scheduler once per epoch, after the optimizer updates
    scheduler.step()

[1/100] loss: 0.9052, lr: 0.01,
[6/100] loss: 0.6717, lr: 0.01,
[11/100] loss: 0.6718, lr: 0.01,
[16/100] loss: 0.6605, lr: 0.01,
[21/100] loss: 0.6549, lr: 0.01,
[26/100] loss: 0.6228, lr: 0.005,
[31/100] loss: 0.6199, lr: 0.005,
[36/100] loss: 0.6169, lr: 0.005,
[41/100] loss: 0.6195, lr: 0.005,
[46/100] loss: 0.6164, lr: 0.005,
[51/100] loss: 0.5980, lr: 0.0025,
[56/100] loss: 0.5926, lr: 0.0025,
[61/100] loss: 0.5950, lr: 0.0025,
[66/100] loss: 0.5932, lr: 0.0025,
[71/100] loss: 0.5930, lr: 0.0025,
[76/100] loss: 0.5840, lr: 0.00125,
[81/100] loss: 0.5809, lr: 0.00125,
[86/100] loss: 0.5800, lr: 0.00125,
[91/100] loss: 0.5790, lr: 0.00125,
[96/100] loss: 0.5792, lr: 0.00125,
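
The learning rates in the log above follow the `MultiStepLR` rule: the base rate is multiplied by `gamma` once for every milestone that has been reached. The pure-Python helper below reproduces the schedule for illustration; `multistep_lr` is a hypothetical function, not part of PyTorch.

```python
from bisect import bisect_right

def multistep_lr(base_lr, milestones, gamma, epoch):
    """Learning rate at a 0-based epoch: base_lr * gamma ** (milestones reached so far)."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)

# Same settings as the scheduler above: LR = 0.01, gamma = 0.5, milestones at 25, 50, 75.
for epoch in (0, 24, 25, 50, 75):
    print(epoch + 1, multistep_lr(0.01, [25, 50, 75], 0.5, epoch))
```

The log prints `step+1`, so the first halved rate appears on the line `[26/100]`, which corresponds to epoch index 25.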


In [7]:
model_path = '../model/E2EMN_basic.model'

In [8]:
torch.save(model.state_dict(), model_path)

### Load model

In [9]:
model = E2EMN(VOCAB_SIZE, EMBED_SIZE, n_hops=N_HOPS, encoding_method=ENCODING_METHOD, 
              temporal=TEMPORAL, use_cuda=USE_CUDA, max_story_len=MAX_STORY_LEN)
if USE_CUDA:
    model = model.cuda()
    model.load_state_dict(torch.load(model_path))
else:
    model.load_state_dict(torch.load(model_path, map_location=lambda storage, loc: storage))

## Test

In [10]:
path_test = '../data/QA_bAbI_tasks/en-10k/qa1_single-supporting-fact_test.txt'
bAbI_test = bAbIDataset(path_test, train=False, vocab=bAbI_train.word2idx, return_masks=True)
test_loader = bAbIDataLoader(dataset=bAbI_test, batch_size=32, shuffle=False, to_tensor=True)

In [11]:
model.eval()
accuracy = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for i, batch in enumerate(test_loader.load()):
        stories, stories_masks, questions, _, answers, _ = batch
        
        if USE_CUDA:
            # move items one by one; get_cuda returns a list, so get_cuda(x) would wrap each item in a list
            stories = [x.cuda() for x in stories]
            stories_masks = [x.cuda() for x in stories_masks]
            questions, answers = get_cuda(questions, answers)
        
        for story, mask, q, a in zip(stories, stories_masks, questions, answers):
            pred = model(story.unsqueeze(0), q.unsqueeze(0), stories_masks=mask.unsqueeze(0))
            # count a hit when the highest-scoring word index equals the answer index
            accuracy += torch.eq(torch.max(pred, 1)[1], a).sum().item()

print('Accuracy: {}'.format(accuracy/len(bAbI_test)))

Accuracy: 0.65


## Test: random print

In [12]:
idx2w = bAbI_test.idx2word
story, mask, q, _, a, _ = bAbI_test.pad_to_story([random.choice(bAbI_test.data)])
story, mask, q, a = [test_loader._to_tensor(x) for x in [story, mask, q, a]]
model.zero_grad()
pred = model(story, q, stories_masks=mask)
pred_a = torch.max(pred, 1)[1]

print("Facts : ")
print('-'*45)
print('\n'.join([' '.join(list(map(lambda x: idx2w[x], f))) for f in story[0].data.tolist()]))
print('-'*45)
print("Question : ",' '.join(list(map(lambda x: idx2w[x], q.data.tolist()[0]))))
print('-'*45)
print("Answer : ",' '.join(list(map(lambda x: idx2w[x], a.squeeze(1).data.tolist()))))
print("Prediction : ",' '.join(list(map(lambda x: idx2w[x], pred_a.data.tolist()))))

Facts : 
---------------------------------------------
john travelled to the kitchen <pad>
sandra moved to the bathroom <pad>
daniel moved to the kitchen <pad>
sandra moved to the kitchen <pad>
daniel went to the hallway <pad>
john went to the office <pad>
sandra went back to the garden
sandra went to the bedroom <pad>
---------------------------------------------
Question :  where is sandra ?
---------------------------------------------
Answer :  bedroom
Prediction :  bedroom
