Kamaneh Akhavan (gusakhka)

## Assignment 2: More Chinese-language experiments

In this assignment, you will work with the Demo 2.1 - Chinese word segmentation on Chinese word segmentation. The notebook trains a model that is very successful at determining word boundaries in Chinese text, where word boundaries are binary-encoded on a per-character basis with 1 being the first character of a word and 0 being the second character. The model uses character embeddings and an LSTM to model the word boundaries. You will copy and modify Demo 2.1's notebook as below and write up in the notebook Markdown what you did and how well it performed.

In [20]:
import sys
import os
import numpy as np
import torch

In [21]:
device = torch.device('cuda:3')

I added the start token as $ and end token as & ,the rest code is as what it was

In [22]:
def read_chinese_data(inputfilename):
    with open(inputfilename, "r") as inputfile:
        sentences, collection_words, collection_labels = [], ['$'], []
        for line in inputfile:
            if line[0] == '#':
                continue
            columns = line.split()
            #print(words)
            if columns == []: #when find the line is blank
                sentences.append((''.join(collection_words)+'&', collection_labels)) #append tuple of x and y
                collection_words = [] #reset collection words
                collection_labels = [] #reset the collection labels
                continue
            collection_words.append(columns[1])
            collection_labels += [1] + ([0] * (len(columns[1]) - 1)) #1 for the first character
            
    return sentences

In [23]:
train_sentences = read_chinese_data('/scratch/lt2316-h20-resources/zh_gsd-ud-train.conllu')


In [24]:
print(train_sentences[0])

('$看似簡單，只是二選一做決擇，但其實他們代表的是你周遭的親朋好友，試著給你不同的意見，但追根究底，最後決定的還是自己。&', [1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1])


In [25]:
test_sentences = read_chinese_data('/scratch/lt2316-h20-resources/zh_gsd-ud-test.conllu')

In [26]:
print(test_sentences[0])

('$然而，這樣的處理也衍生了一些問題。&', [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1])


In index_chars we are getting an index for each character in the set and getting all the characters and the return the character list 

In [27]:
def index_chars(sentences):
    megasentence = ''.join(sentences)
    char_list = set()
    for c in megasentence:
        char_list.add(c)
    char_list = [0] + list(char_list)
    return char_list, {char_list[x]:x for x in range(len(char_list))}

here we used the first item in train_sentences because the first function returns a tuple,so we needed just x[0]

In [28]:
int_index, char_index = index_chars([x[0] for x in train_sentences + test_sentences])

In [10]:
# all_values = char_index. values()
# max_value = max(all_values) 
# print(max_value)

In [11]:
#int_index

In [29]:
def convert_sentence(sentence, index):
    #every x in the sentence got a number as an index
    return [index[x] for x in sentence]

In [30]:
def pad_lengths(sentences, max_length, padding=0):
    return [x + ([padding] * (max_length - len(x))) for x in sentences]

In [31]:
def create_dataset(x, device="cpu"):
    converted = [(convert_sentence(x1[0], char_index), x1[1]) for x1 in x]
    X, y = zip(*converted)
    #here get the real lengths of the Xs (sentences)
    lengths = [len(x2) for x2 in X]
    padded_X = pad_lengths(X, max(lengths))
    #padding the sentences
    Xt = torch.LongTensor(padded_X).to(device)
    padded_y = pad_lengths(y, max(lengths), padding=-1) 
    # padding the labels with -1
    yt = torch.LongTensor(padded_y).to(device)
    lengths_t = torch.LongTensor(lengths).to(device)
    return Xt, lengths_t, yt

In [32]:
train_X_tensor, train_lengths_tensor, train_y_tensor = create_dataset(train_sentences, device)
test_X_tensor, test_lengths_tensor, test_y_tensor = create_dataset(test_sentences, device)

## Packing the sequences for RNN

In [35]:
testtensor = torch.randn((10,100,200))

In [36]:
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

In [37]:
testlengths = torch.randint(1, 100, (10,))

In [38]:
testlengths.size(), testlengths

(torch.Size([10]), tensor([52, 60, 35, 45, 99, 99,  1, 46, 40, 89]))

In [43]:
packed = pack_padded_sequence(testtensor, testlengths, batch_first=True, enforce_sorted=False)

In [45]:
#testtensor

In [47]:
#packed

In [48]:
len(packed.batch_sizes)

99

In [49]:
unpacked = pad_packed_sequence(packed, batch_first=True, total_length=100)

## Batching (based on 1.0, 1.1, 1.2)

In [50]:
class Batcher:
    def __init__(self, X, lengths, y, device, batch_size=50, max_iter=None):
        self.X = X
        self.lengths = lengths # We need the lengths to efficiently use the padding.
        self.y = y
        self.device = device
        self.batch_size=batch_size
        self.max_iter = max_iter
        self.curr_iter = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.curr_iter == self.max_iter:
            raise StopIteration
        permutation = torch.randperm(self.X.size()[0], device=self.device)
        permX = self.X[permutation]
        permlengths = self.lengths[permutation]
        permy = self.y[permutation]
        splitX = torch.split(permX, self.batch_size)
        splitlengths = torch.split(permlengths, self.batch_size)
        splity = torch.split(permy, self.batch_size)
        
        self.curr_iter += 1
        return zip(splitX, splitlengths, splity)

In [51]:
b = Batcher(train_X_tensor, train_lengths_tensor, train_y_tensor, torch.device('cuda:2'), max_iter=100)

In [52]:
import matplotlib.pyplot as plt

In [53]:
testbatching = next(b)

In [54]:
testbatching

<zip at 0x7f0a4c46b500>

## Modeling

In [64]:
import torch.nn as nn

In [65]:
import torch.optim as optim

In [66]:
emb = nn.Embedding(len(int_index), 200, 0).to("cuda:3")

In [104]:
testembs = emb(testX)

In [118]:
#testembs

In [105]:
testembs.size()

torch.Size([50, 183, 200])

In [106]:
testembs.device

device(type='cuda', index=3)

In [107]:
testlstm = nn.LSTM(200, 150, batch_first=True).to("cuda:3")

In [108]:
testembspadded = pack_padded_sequence(testembs, testlengths.to("cpu"), batch_first=True, enforce_sorted=False)

In [109]:
testoutput, teststate = testlstm(testembspadded)

In [112]:
testunpacked[0].size()

torch.Size([50, 85, 150])

In [132]:
testoutput2.size()

torch.Size([50, 125, 150])

In [144]:
testoutput4 = testsoft(testoutput2)

In [145]:
testoutput4

tensor([[[-5.0105, -5.0166, -4.9918,  ..., -4.9796, -5.0007, -4.9982],
         [-5.0201, -5.0191, -4.9448,  ..., -4.9892, -5.0050, -5.0103],
         [-4.9931, -4.9821, -4.9855,  ..., -4.9500, -5.0234, -5.0426],
         ...,
         [-5.0106, -5.0106, -5.0106,  ..., -5.0106, -5.0106, -5.0106],
         [-5.0106, -5.0106, -5.0106,  ..., -5.0106, -5.0106, -5.0106],
         [-5.0106, -5.0106, -5.0106,  ..., -5.0106, -5.0106, -5.0106]],

        [[-5.0638, -5.0391, -4.9492,  ..., -5.0267, -4.9888, -4.9855],
         [-4.9903, -5.0280, -4.9839,  ..., -5.0390, -4.9588, -5.0018],
         [-4.9858, -5.0270, -4.9923,  ..., -5.0228, -4.9568, -4.9571],
         ...,
         [-5.0106, -5.0106, -5.0106,  ..., -5.0106, -5.0106, -5.0106],
         [-5.0106, -5.0106, -5.0106,  ..., -5.0106, -5.0106, -5.0106],
         [-5.0106, -5.0106, -5.0106,  ..., -5.0106, -5.0106, -5.0106]],

        [[-4.9891, -5.0359, -5.0473,  ..., -5.0063, -4.9655, -5.0486],
         [-5.0418, -5.0263, -5.0751,  ..., -4

In [146]:
testy_short = testy[:, :max(testlengths)]

In [147]:
testy_short

tensor([[ 1,  1,  1,  ..., -1, -1, -1],
        [ 1,  0,  1,  ..., -1, -1, -1],
        [ 1,  0,  1,  ..., -1, -1, -1],
        ...,
        [ 1,  1,  1,  ..., -1, -1, -1],
        [ 1,  1,  1,  ..., -1, -1, -1],
        [ 1,  0,  1,  ..., -1, -1, -1]], device='cuda:3')

In [148]:
testy_short.size()

torch.Size([50, 125])

In [149]:
max(testlengths)

tensor(125, device='cuda:3')

In [150]:
testpermuted = testoutput4.permute(0, 2, 1)

In [68]:
#testpermuted

In [59]:
#nllloss(testpermuted, testy_short)

tensor(0.7527, device='cuda:2', grad_fn=<NllLoss2DBackward>)

## MODEL 1

In [69]:
device = torch.device('cuda:3')

In [36]:
class Segmenter(nn.Module):
    def __init__(self, vocab_size, emb_size):
        super().__init__()
        
        self.vocab_size = vocab_size
        self.emb_size = emb_size
        
        self.emb = nn.Embedding(self.vocab_size, self.emb_size, 0)
        self.lstm = nn.LSTM(self.emb_size, 150, batch_first=True)
        self.sig1 = nn.Sigmoid()
        self.lin = nn.Linear(150, 2)
        self.softmax = nn.LogSoftmax(2)
        
    def forward(self, x, lengths):
        embs = self.emb(x)
        packed = pack_padded_sequence(embs, lengths.to("cpu"), batch_first=True, enforce_sorted=False)
        output1, _ = self.lstm(packed)
        unpacked, _ = pad_packed_sequence(output1, batch_first=True)
        output2 = self.sig1(unpacked)
        output3 = self.lin(output2)
        return self.softmax(output3)
        

In [37]:
def train(X, lengths, y, vocab_size, emb_size, batch_size, epochs, device, model=None):
    b = Batcher(X, lengths, y, device, batch_size=batch_size, max_iter=epochs)
    if not model:
        m = Segmenter(vocab_size, emb_size).to(device)
    else:
        m = model
    loss = nn.NLLLoss(ignore_index=-1)
    optimizer = optim.Adam(m.parameters(), lr=0.005)
    epoch = 0
    
    for split in b:
        tot_loss = 0
        for batch in split:
            optimizer.zero_grad()
            o = m(batch[0], batch[1])
            l = loss(o.permute(0,2,1), batch[2][:, :max(batch[1])])
            tot_loss += l
            l.backward()
            optimizer.step()
        print("Total loss in epoch {} is {}.".format(epoch, tot_loss))
        epoch += 1
    return m

In [38]:
model1 = train(train_X_tensor, train_lengths_tensor, train_y_tensor, len(int_index), 200, 50, 100, device)

Total loss in epoch 0 is 32.16783142089844.
Total loss in epoch 1 is 17.90723419189453.
Total loss in epoch 2 is 13.259632110595703.
Total loss in epoch 3 is 10.057083129882812.
Total loss in epoch 4 is 7.719925880432129.
Total loss in epoch 5 is 5.9296698570251465.
Total loss in epoch 6 is 4.612331867218018.
Total loss in epoch 7 is 3.475146532058716.
Total loss in epoch 8 is 2.770770311355591.
Total loss in epoch 9 is 2.3439531326293945.
Total loss in epoch 10 is 2.1047866344451904.
Total loss in epoch 11 is 1.8222631216049194.
Total loss in epoch 12 is 1.6412441730499268.
Total loss in epoch 13 is 1.7783823013305664.
Total loss in epoch 14 is 1.807427167892456.
Total loss in epoch 15 is 1.665467381477356.
Total loss in epoch 16 is 1.2753825187683105.
Total loss in epoch 17 is 0.9465288519859314.
Total loss in epoch 18 is 0.6898490786552429.
Total loss in epoch 19 is 0.4261019229888916.
Total loss in epoch 20 is 0.2636813521385193.
Total loss in epoch 21 is 0.1775134652853012.
Total 

In [39]:
torch.save(model1, 'model1.pt')

## Evaluation MODEL1

In [40]:
model1.eval()

Segmenter(
  (emb): Embedding(3649, 200, padding_idx=0)
  (lstm): LSTM(200, 150, batch_first=True)
  (sig1): Sigmoid()
  (lin): Linear(in_features=150, out_features=2, bias=True)
  (softmax): LogSoftmax(dim=2)
)

In [41]:
with torch.no_grad():
    rawpredictions = model1(test_X_tensor, test_lengths_tensor)

In [42]:
rawpredictions.size()

torch.Size([500, 157, 2])

In [240]:
#rawpredictions

In [82]:
import math
math.log2(0.9), math.log2(0.8)

(-0.15200309344504995, -0.3219280948873623)

In [43]:
predictions = torch.argmax(rawpredictions, 2)

In [44]:
predictions

tensor([[1, 0, 1,  ..., 1, 1, 1],
        [1, 0, 1,  ..., 1, 1, 1],
        [1, 0, 1,  ..., 1, 1, 1],
        ...,
        [1, 0, 1,  ..., 1, 1, 1],
        [1, 1, 1,  ..., 1, 1, 1],
        [1, 1, 1,  ..., 1, 1, 1]], device='cuda:3')

In [45]:
predictions.size()

torch.Size([500, 157])

In [246]:
#predictions[0]

In [None]:
test_sentences[0]

In [None]:
test_y_tensor[0]

In [249]:
test_lengths_tensor[0]

tensor(29, device='cuda:3')

In [46]:
collectpreds = []
collecty = []

In [47]:
for i in range(test_X_tensor.size(0)):
    collectpreds.append(predictions[i][:test_lengths_tensor[i]])
    collecty.append(test_y_tensor[i][:test_lengths_tensor[i]])

In [48]:
collecty

[tensor([ 1,  0,  1,  1,  0,  1,  1,  0,  1,  1,  0,  1,  1,  0,  1,  0,  1, -1,
         -1], device='cuda:3'),
 tensor([ 1,  0,  1,  0,  0,  0,  1,  1,  0,  1,  1,  0,  1,  0,  1,  0,  1,  1,
          0,  1,  1,  0,  1,  1,  0,  1,  1,  1,  0,  1,  0,  1, -1],
        device='cuda:3'),
 tensor([ 1,  0,  0,  1,  1,  0,  1,  0,  1,  1,  0,  1,  0,  1,  1,  1,  0,  1,
          1,  1,  0,  1,  1,  0,  1,  0,  1,  1,  0,  1,  0,  0,  1,  1,  0,  1,
          0,  0,  0,  1, -1], device='cuda:3'),
 tensor([ 1,  0,  1,  0,  1,  0,  1,  0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  0,
          1,  1,  1,  0,  1,  1,  1,  1,  0,  1,  1,  1,  1,  1, -1],
        device='cuda:3'),
 tensor([ 1,  0,  1,  1,  0,  1,  1,  1,  1,  1,  0,  1,  1,  0,  1, -1],
        device='cuda:3'),
 tensor([ 1,  0,  1,  0,  1,  0,  1,  0,  1,  1,  0,  1,  0,  1,  1,  0,  1,  1,
          0,  1,  1,  0,  1,  0,  1,  1,  1,  1,  1,  1,  0,  0,  1,  1,  0,  1,
          1,  0,  1,  1,  0,  1,  1,  0,  1,  1,  0,  1,  1, 

In [49]:
allpreds = torch.cat(collectpreds)

In [254]:
allpreds.size()

torch.Size([21713])

In [50]:
classes = torch.cat(collecty)

In [51]:
allpreds, classes

(tensor([1, 0, 1,  ..., 0, 1, 1], device='cuda:3'),
 tensor([ 1,  0,  1,  ...,  0,  1, -1], device='cuda:3'))

In [52]:
classes.size()

torch.Size([19707])

In [53]:
classes = classes.float()
allpreds = allpreds.float()

In [54]:
tp = sum(classes * allpreds)
fp = sum(classes * (~allpreds.bool()).float())
tn = sum((~classes.bool()).float() * (~allpreds.bool()).float())
fn = sum((~classes.bool()).float() * allpreds)

tp, fp, tn, fn

(tensor(10900., device='cuda:3'),
 tensor(611., device='cuda:3'),
 tensor(6486., device='cuda:3'),
 tensor(708., device='cuda:3'))

In [55]:
accuracy = (tp + tn) / (tp + fp + tn + fn)
accuracy

tensor(0.9295, device='cuda:3')

In [56]:
recall = tp / (tp + fn)
recall

tensor(0.9390, device='cuda:3')

In [57]:
precision = tp / (tp + fp)
precision

tensor(0.9469, device='cuda:3')

In [58]:
f1 = (2 * recall * precision) / (recall + precision)
f1

tensor(0.9429, device='cuda:3')

## Model2 part 1
 Sentence generation (15 points).
Convert the model in Demo 2.1 into a character-based sentence generator. (Strip out the word segmentation objective.) The model should, given a start symbol, produce a variety of sentences that terminate with a stop symbol (you will have to add these to the data). The sentences that it generates should be of reasonable average length compared to the sentences in the training corpus (this needn't be precise).

In [89]:
class Segmenter1(nn.Module):
    def __init__(self, vocab_size, emb_size):
        super().__init__()
        
        self.vocab_size = vocab_size
        self.emb_size = emb_size
        
        self.emb = nn.Embedding(self.vocab_size, self.emb_size, 0)
        self.lstm = nn.LSTM(self.emb_size, 150, batch_first=True)
        #adding so many zeroes in the out put
        self.sig1 = nn.Sigmoid()
        #torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)
        #here I need change the out_features to vocab size ,to get probability for each character of the vocab. 
        self.lin = nn.Linear(150, self.vocab_size)
        self.softmax = nn.LogSoftmax(2)
        
    def forward(self, x, lengths):
        embs = self.emb(x)
        packed = pack_padded_sequence(embs, lengths.to("cpu"), batch_first=True, enforce_sorted=False)
        output1, _ = self.lstm(packed)
        unpacked, _ = pad_packed_sequence(output1, batch_first=True)
        output2 = self.sig1(unpacked)
        output3 = self.lin(output2)
        return self.softmax(output3)
        

In [91]:
def train1(X, lengths, y, vocab_size, emb_size, batch_size, epochs, device, model=None):
    b = Batcher(X, lengths, y, device, batch_size=batch_size, max_iter=epochs)
    if not model:
        m = Segmenter1(vocab_size, emb_size).to(device)
    else:
        m = model
    loss = nn.NLLLoss(ignore_index=-1)
    optimizer = optim.Adam(m.parameters(), lr=0.005)
    epoch = 0
    
    
    for split in b:
        tot_loss = 0
        for batch in split:
            optimizer.zero_grad()
            #batch[1] represent the sentence's length
            o = m(batch[0], batch[1])
            l = loss(o[:, :-1, :].permute(0,2,1), batch[0][:, 1:max(batch[1])])
            
            
            tot_loss += l
            
            l.backward()
            optimizer.step()
        print("Total loss in epoch {} is {}.".format(epoch, tot_loss))
        epoch += 1
    
    return m

In [92]:

model2 = train1(train_X_tensor, train_lengths_tensor, train_y_tensor, len(int_index), 200, 50, 100, device)


Total loss in epoch 0 is 257.7660217285156.
Total loss in epoch 1 is 207.2502899169922.
Total loss in epoch 2 is 202.6263885498047.
Total loss in epoch 3 is 196.0125274658203.
Total loss in epoch 4 is 196.49452209472656.
Total loss in epoch 5 is 191.0537109375.
Total loss in epoch 6 is 189.6737060546875.
Total loss in epoch 7 is 188.6050567626953.
Total loss in epoch 8 is 186.0637969970703.
Total loss in epoch 9 is 185.1094207763672.
Total loss in epoch 10 is 184.87437438964844.
Total loss in epoch 11 is 180.33682250976562.
Total loss in epoch 12 is 180.42459106445312.
Total loss in epoch 13 is 176.43946838378906.
Total loss in epoch 14 is 178.02694702148438.
Total loss in epoch 15 is 176.5436553955078.
Total loss in epoch 16 is 174.09523010253906.
Total loss in epoch 17 is 174.81236267089844.
Total loss in epoch 18 is 171.57899475097656.
Total loss in epoch 19 is 177.17396545410156.
Total loss in epoch 20 is 169.50892639160156.
Total loss in epoch 21 is 172.04469299316406.
Total loss 

In [21]:
torch.save(model2, 'model2.pt')

In [337]:
#model2 = torch.load('model2').to('cpu')

PART1 REPORT:
for getting the probability for each character in the vocabs, the  size of the output of the linear layer must be equal to the size of the vocabularies,So I changed it in the code. 
In train part:
we dont need the second dimension of the output in the model cause it would be the end token ,which we did it in the loss part and also we should get rid of the first element which is the start token that i added at first.


## Evaluation

In [22]:
model2.to(device)

Segmenter1(
  (emb): Embedding(3649, 200, padding_idx=0)
  (lstm): LSTM(200, 150, batch_first=True)
  (sig1): Sigmoid()
  (lin): Linear(in_features=150, out_features=3649, bias=True)
  (softmax): LogSoftmax(dim=2)
)

In [29]:
model2.eval()

Segmenter1(
  (emb): Embedding(3649, 200, padding_idx=0)
  (lstm): LSTM(200, 150, batch_first=True)
  (sig1): Sigmoid()
  (lin): Linear(in_features=150, out_features=3649, bias=True)
  (softmax): LogSoftmax(dim=2)
)

In [53]:
with torch.no_grad():
    rawpredictions = model2(test_X_tensor, test_lengths_tensor)

In [57]:

predictions = torch.argmax(rawpredictions, 2)



In [58]:
collectpreds = []
collecty = []



In [61]:
for i in range(test_X_tensor.size(0)):
    collectpreds.append(predictions[i][:test_lengths_tensor[i]])
    collecty.append(test_y_tensor[i][:test_lengths_tensor[i]])
    


In [62]:
allpreds = torch.cat(collectpreds).float()
classes = torch.cat(collecty).float()

In [63]:
tp2 = sum(classes * allpreds)
fp2 = sum(classes * (~allpreds.bool()).float())
tn2 = sum((~classes.bool()).float() * (~allpreds.bool()).float())
fn2 = sum((~classes.bool()).float() * allpreds)

In [64]:
accuracy2 = (tp2 + tn2) / (tp2 + fp2 + tn2 + fn2)
accuracy2

tensor(0.6000, device='cuda:3')

In [65]:
recall2 = tp2 / (tp2 + fn2)
recall2

tensor(0.6000, device='cuda:3')

In [66]:
precision2 = tp2 / (tp2 + fp2)
precision2

tensor(1.0000, device='cuda:3')

In [67]:
f1_2 = (2 * recall2 * precision2) / (recall2 + precision2)
f1_2

tensor(0.7500, device='cuda:3')

## Generating sentence
at first we get the list of the possible lengths of the sentences and the possible character in them and getting the random index to them and generating the orginal tensor and passing it through the models.
The function "generating_sentence" creates a random sentence given the start symbol and a random point (from possible first characters in the train sentences). The length of the sentence is a random choice from the train_sentences length, or shorter if the model produces a stop symbol.

In [70]:
b = Batcher(train_X_tensor, train_lengths_tensor, train_y_tensor, device, batch_size=50, max_iter=30)
lengths = []
    
    
for split in b:
    for batch in split:
        lengths.append(batch[1])

print(lengths[0]) 

# getting list of possible lengths of sentences
leng = []
for x in lengths:
    leng.extend(x.tolist())

tensor([43, 61, 27, 17, 31, 46, 28, 32, 32, 79, 67, 35, 23, 27, 21, 61, 35, 30,
        31, 23, 25, 63, 40, 22, 65, 82, 43, 25, 25, 70, 59, 22, 59, 30, 19, 43,
        37, 46, 40, 24, 46, 33, 48, 71, 25, 77, 18, 27, 30, 28],
       device='cuda:3')


In [72]:


sentences = train_sentences.extend(test_sentences)
# getting the possible part of the sentences
arg = []
for a, _ in train_sentences:
    arg.extend(a[1])

In [73]:


def generating_sentences(lengths, args, char_index, model):
  
    
    # getting random sentence length
    sent_len = torch.tensor(random.choice(lengths)).to(device).unsqueeze(0)
    sent_len = sent_len.int().to(device)
    
    # getting index of random seed
    point = random.choice(args)
    point_idx = char_index[point]

    # generating the original tensor
    original = torch.zeros(sent_len)
    original[0] = char_index['$']
    original[1] = point_idx
    original = original.long().unsqueeze(0).to(device)
    
    # passing it through the model
    for e in range(int(sent_len)-3):
        out = model(original, sent_len)
        m = torch.argmax(out, dim=2)
#         print(m)
        original[0][e+2] = m[0][e+2]
        
        if m[0][e+2] == char_index['&']:
            break
    
    original = original.squeeze(0)
    if char_index['&'] not in original:
        original[int(sent_len)-1] = char_index['&']
    
    inlist = original.tolist()
    text = ''
    for num in inlist:
        char = str(list(char_index.keys())[list(char_index.values()).index(num)])
        text += char
        if char == '&':
            break
        
    return text

In [32]:
 generating_sentences(leng, arg, char_index, model2)

'$組，此稱入了家。&'

In [74]:
 generating_sentences(leng, arg, char_index, model2)

'$裕到車，道上車上車上車為道，為道中通。&'

In [60]:
 generating_sentences(leng, arg, char_index, model1)

'$0瀑0瀑0瀑0瀑0瀑0瀑0瀑0瀑0&'

## Part 2 - Dual MODEL (10 points)
Copy the notebook from part 1 and augment the copy by adding back the word segmentation objective, as a second objective with its own loss. (You could also in theory do Part 1 and Part 2 in reverse, by adding sentence generation with dual objectives first and then stripping out the word segmentation objective; this is equivalent.)
Note that multiple losses can be combined by simple, possibly weighted addition -- backpropagation works entirely correctly on the combined loss.

In [70]:
class Segmenter_D(nn.Module):
    def __init__(self, vocab_size, emb_size):
        super().__init__()
        
        self.vocab_size = vocab_size
        self.emb_size = emb_size
        
        self.emb = nn.Embedding(self.vocab_size, self.emb_size, 0)
        self.lstm = nn.LSTM(self.emb_size, 150, batch_first=True)
       
        self.sig1 = nn.Sigmoid()
       #here Im ganna add another linear layer with an vocabulary size output.
        self.lin = nn.Linear(150, 2)
        self.lin1 = nn.Linear(150, self.vocab_size)
        self.softmax = nn.LogSoftmax(2)
        
    def forward(self, x, lengths):
        embs = self.emb(x)
        packed = pack_padded_sequence(embs, lengths.to("cpu"), batch_first=True, enforce_sorted=False)
        output1, _ = self.lstm(packed)
        unpacked, _ = pad_packed_sequence(output1, batch_first=True)
        output2 = self.sig1(unpacked)
         #here it should return two tensors,for segmentation and prediction objective.so I added another output ,one for pred and one for seg.
        output3 = self.lin(output2)
        output4 = self.lin1(unpacked)#it is unoacked just for prediction
       
        return (self.softmax(output3), self.softmax(output4))
        

In [71]:
def train_D(X, lengths, y, vocab_size, emb_size, batch_size, epochs, device, model=None):
    b = Batcher(X, lengths, y, device, batch_size=batch_size, max_iter=epochs)
    if not model:
        m = Segmenter_D(vocab_size, emb_size).to(device)
    else:
        m = model
    loss = nn.NLLLoss(ignore_index=-1)
    optimizer = optim.Adam(m.parameters(), lr=0.005)
    epoch = 0
    
    
    for split in b:
        tot_loss = 0
        for batch in split:
            optimizer.zero_grad()
            # batch[1] represent the length of the sentences
            o1, o2 = m(batch[0], batch[1])
            l_seg = loss(o1.permute(0,2,1), batch[2][:, :max(batch[1])])
            l_pred = loss(o2[:, :-1, :].permute(0,2,1), batch[0][:, 1:max(batch[1])])
            
            l = l_seg + l_pred
            tot_loss += l
            
            
            l.backward()
            optimizer.step()
        print("Total loss in epoch {} is {}.".format(epoch, tot_loss))
        epoch += 1
    
    return m

In [72]:
modelD = train_D(train_X_tensor, train_lengths_tensor, train_y_tensor, len(int_index), 200, 50, 100, device)

Total loss in epoch 0 is 619.8128051757812.
Total loss in epoch 1 is 536.9685668945312.
Total loss in epoch 2 is 482.8258361816406.
Total loss in epoch 3 is 435.83648681640625.
Total loss in epoch 4 is 392.4460144042969.
Total loss in epoch 5 is 351.130126953125.
Total loss in epoch 6 is 311.6767883300781.
Total loss in epoch 7 is 273.5425109863281.
Total loss in epoch 8 is 238.85426330566406.
Total loss in epoch 9 is 205.63714599609375.
Total loss in epoch 10 is 179.96754455566406.
Total loss in epoch 11 is 158.31637573242188.
Total loss in epoch 12 is 141.25906372070312.
Total loss in epoch 13 is 130.99331665039062.
Total loss in epoch 14 is 121.13825988769531.
Total loss in epoch 15 is 115.8875732421875.
Total loss in epoch 16 is 110.62325286865234.
Total loss in epoch 17 is 103.56726837158203.
Total loss in epoch 18 is 101.0179672241211.
Total loss in epoch 19 is 95.95426177978516.
Total loss in epoch 20 is 93.17222595214844.
Total loss in epoch 21 is 89.60919952392578.
Total loss 

In [80]:
torch.save(modelD, 'modelDual.pt')

By making a series of changes to the original model, I converted the model to dual, adding a change including a second linear layer with output size to the word size. Finally, it returns two tensors instead of a tensor, one for the purpose of segmentation and the other for the purpose of prediction. 

## Evaluation

In [73]:
modelD.to(device)

Segmenter_D(
  (emb): Embedding(3649, 200, padding_idx=0)
  (lstm): LSTM(200, 150, batch_first=True)
  (sig1): Sigmoid()
  (lin): Linear(in_features=150, out_features=2, bias=True)
  (lin1): Linear(in_features=150, out_features=3649, bias=True)
  (softmax): LogSoftmax(dim=2)
)

In [74]:
with torch.no_grad():
    rawpredictions, _ = modelD(test_X_tensor, test_lengths_tensor)

In [75]:

predictions = torch.argmax(rawpredictions, 2)


In [76]:
collectpreds = []
collecty = []


In [77]:
for i in range(test_X_tensor.size(0)):
    collectpreds.append(predictions[i][:test_lengths_tensor[i]])
    collecty.append(test_y_tensor[i][:test_lengths_tensor[i]])


In [78]:
allpreds = torch.cat(collectpreds).float()
classes = torch.cat(collecty).float()

In [79]:
tp = sum(classes * allpreds)
fp = sum(classes * (~allpreds.bool()).float())
tn = sum((~classes.bool()).float() * (~allpreds.bool()).float())
fn = sum((~classes.bool()).float() * allpreds)

In [80]:
accuracy_D = (tp + tn) / (tp + fp + tn + fn)
accuracy_D

tensor(0.9201, device='cuda:3')

In [81]:
recall_D = tp / (tp + fn)
recall_D

tensor(0.9260, device='cuda:3')

In [82]:
precision_D = tp / (tp + fp)
precision_D

tensor(0.9458, device='cuda:3')

In [83]:
f1_D = (2 * recall_D * precision_D) / (recall_D + precision_D)
f1_D

tensor(0.9358, device='cuda:3')

## Part 3 - Analysis (5 points)
You now have three models. The original word segmentation model, a sentence generation model, and a dual sentence-generation/word segmentation model.

Compare the performance on the test data of the original word segmentation model between the original objective and the dual objective model.

In how many iterations do the models converge?
What are their final F1 and accuracy scores once they've converged?
Are they any different? If so, why?
Make the same comparison between the sentence generation model and the dual-objective model, except the performance measure is the per-word perplexity on the text (test?) corpus.

word segmentation

Part 3.1 Report:
I did evaluations under each model.
DUAL MODEL:as can be seen ,the total loss and the prediction loss are almost the same, because the segmentation loss is very low.

Model	Accuracy	F1score
Model1  0.92950     0.9429
Model2	0.6000	    0.7500
ModelD	0.9201	    0.9358

The highest f1 score is for Asads model( word segmentation model) which is 94.29% and the second place is for dual model with 93.58% and the third place is for the second model about 75%.The most accurate one is also Asad's model(word segmentation model) with 92.95% and the second place is for the dual model with 92.01% and the less one for the model2.The dual models and word segmentation model are quite similar, probably because both models have the same layers and hyperparameters.


## Evaluation sentence generation

In [None]:
def get_perplexity(model,X, lengths, y, vocab_size, emb_size, batch_size , epochs, model_ID):

    b = Batcher(X, lengths, y, device, batch_size=batch_size, max_iter=epochs)
    
    model.to(device)
    loss_fn = nn.CrossEntropyLoss()
    b = Batcher(X, lengths, y, device, batch_size=batch_size, max_iter=epochs)
    for split in b:
            tot_loss = 0
            for batch in split:
                #FOR MODEL1 and 2 which are single model
                if model_ID == 1: 
                    out = model(batch[0], batch[1])
                    l = loss(out[:, :-1, :].permute(0,2,1), batch[0][:, 1:max(batch[1])])
                #FOR DUAL model
                elif model_ID == 2:
                    _, out = model(batch[0], batch[1])
                    l = loss(out[:, :-1, :].permute(0,2,1), batch[0][:, 1:max(batch[1])])
                    
                tot_loss += l
                
    perplexity  = torch.exp(tot_loss/50)
    
    print('Total perplexity:', perplexity.item())
    return perplexity.item()
    
        
       
           

In [94]:
get_perplexity(model2,test_X_tensor, test_lengths_tensor, test_y_tensor, len(int_index), 200, 50, 30, device, model_ID = 1)

Total perplexity: 1.4611154794692993


1.4611154794692993

In [87]:
get_perplexity(modelD, test_X_tensor, test_lengths_tensor, test_y_tensor, len(int_index), 200, 50, 30, device, model_ID = 2)

Total perplexity: 1.8955341577529907


1.8955341577529907

Part 3.2 Report:
 I got the loss is computed by iterating the test data over 1 epoch But I didn't really understand how to compute it.As we can see from the result the SentenceGenerator model has a much lower perplexity than that of the DualModel.