<a href="https://colab.research.google.com/github/kaiyuanmifen/BiasProject/blob/dhd/Glove_for_bias.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**[GloVe](https://nlp.stanford.edu/projects/glove/) (Global Vectors for Word Representation) in PyTorch**

# **Theory**

GloVe, coined from Global Vectors, is a model for distributed word representation. The model is an unsupervised learning algorithm for obtaining vector representations for words. The GloVe model learns word vectors by examining word co-occurrences within a text corpus.  

Let the matrix of word-word co-occurrence counts be denoted by $X$, whose entries $X_{ij}$ tabulate the number of times word $j$ occurs in the context of word $i$. Let $X_i = \sum_k X_{ik}$ be the number of times any word appears in the context of word $i$. Finally, let $P_{ij} = X_{ij}/X_i$ be the probability that word $j$ appears in the context of word $i$.
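
As a concrete illustration of these definitions, the sketch below computes $X_i$ and $P_{ij}$ from a small co-occurrence matrix. The three-word vocabulary and the counts are made up for the example and do not come from the corpus used later.

```python
import numpy as np

# Hypothetical 3-word vocabulary and co-occurrence counts (illustrative only).
words = ["ice", "solid", "water"]
X = np.array([
    [0.0, 4.0, 6.0],   # context counts for "ice"
    [4.0, 0.0, 1.0],   # context counts for "solid"
    [6.0, 1.0, 0.0],   # context counts for "water"
])

# X_i = sum_k X_ik: total number of context occurrences for word i.
X_i = X.sum(axis=1, keepdims=True)

# P_ij = X_ij / X_i: each row becomes a probability distribution over contexts.
P = X / X_i

print(P[0])  # distribution over context words for "ice"
```

Each row of `P` sums to one, which is what lets us read $P_{ij}$ as a conditional probability.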

The relationship between two words $i$ and $j$ can be examined by studying the ratio of their co-occurrence probabilities with various probe words, $k$. For words $k$ related to $i$ but not $j$, we expect the ratio $P_{ik}/P_{jk}$ will be large. Similarly, for words $k$ related to $j$ but not $i$, the ratio should be small. For words $k$ that are either related to both $i$ and $j$, or to neither, the ratio should be close to one. 

An example relating to thermodynamics is given in the original paper, with $i = \text{ice}$, $j = \text{steam}$ and $k \in \{\text{solid}, \text{gas}, \text{water}, \text{fashion}\}$.
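
To make the ice/steam example tangible, the sketch below recomputes the ratios from the co-occurrence probabilities quoted in Table 1 of the GloVe paper. The values are the rounded ones printed there, so the ratios obtained here differ slightly from the ratios the paper reports from unrounded data.

```python
# Co-occurrence probabilities as printed (rounded) in Table 1 of the GloVe paper.
p_ice = {"solid": 1.9e-4, "gas": 6.6e-5, "water": 3.0e-3, "fashion": 1.7e-5}
p_steam = {"solid": 2.2e-5, "gas": 7.8e-4, "water": 2.2e-3, "fashion": 1.8e-5}

# Ratio P(k | ice) / P(k | steam) for each probe word k.
ratios = {k: p_ice[k] / p_steam[k] for k in p_ice}

for k, r in ratios.items():
    print(f"P({k}|ice) / P({k}|steam) = {r:.3g}")
```

The ratio is large for *solid* (related to ice only), small for *gas* (related to steam only), and close to one for *water* (related to both) and *fashion* (related to neither).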

The above argument suggests that the appropriate starting point for word vector learning should be with ratios of co-occurrence probabilities rather than the probabilities themselves. Noting that the ratio $P_{ik}/P_{jk}$ depends on three words $i$, $j$, and $k$, the most general model takes the form $F(w_i, w_j, \tilde{w}_k) = P_{ik}/P_{jk}$ where $w \in \mathbb{R}^d$ are word vectors and $\tilde{w} \in \mathbb{R}^d$ are separate context word vectors.

To make $F$ encode the information present in the ratio $P_{ik}/P_{jk}$ in the word vector space, the authors restrict $F$ to depend only on the difference of the two target word vectors, $w_i - w_j$, since vector spaces are inherently linear structures.
To keep $F$ from obscuring the linear structure we are trying to capture when it maps vectors to scalars, the authors pass the dot product $(w_i - w_j)^T \tilde{w}_k$ as the argument of $F$, rather than $w_i - w_j$ and $\tilde{w}_k$ themselves.

$$F((w_i - w_j)^T \tilde{w}_k) = P_{ik}/P_{jk}$$ $\text{ then }$ $$F(w_i^T \tilde{w}_k) = P_{ik} = X_{ik}/X_i \text{ (1)}$$

The authors require that $F$ be a homomorphism between the groups $(\mathbb{R},+)$ and $(\mathbb{R}_{>0}, \times)$, i.e.,

$$F((w_i - w_j)^T \tilde{w}_k) = F(w_i^T \tilde{w}_k - w_j^T \tilde{w}_k) = \frac{F(w_i^T \tilde{w}_k)}{F(w_j^T \tilde{w}_k)}$$ $\text{ then }$ $$F = \exp \text{ (2)}$$

$$\text{(1) and (2)} \Rightarrow w_i^T \tilde{w}_k = \log(P_{ik}) = \log(X_{ik}) - \log(X_i)$$
 
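The term $\log(X_i)$ does not depend on $k$, so it can be absorbed into a bias $b_i$ for $w_i$; adding a second bias $\tilde{b}_k$ for $\tilde{w}_k$ restores the symmetry between words and context words. Renaming the context index $k$ to $j$, we then produce vectors under the soft constraint that, for each pair of words $i$ and $j$,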

$$w_i^T \tilde{w}_j + b_i + \tilde{b}_j = \log X_{ij}$$

where $b_i$ and $\tilde{b}_j$ are scalar bias terms associated with words $i$ and $j$, respectively. 

We’ll do this by minimizing an objective function $J$, which evaluates the sum of all squared errors based on the above equation, weighted with a function $f$:

$$J=\sum_{i=1}^{V} \sum_{j=1}^{V} f(X_{ij}) (w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij})^2$$

We choose an $f$ that helps prevent common word pairs (i.e., those with large $X_{ij}$ values) from skewing our objective too much:
$$
f(X_{ij}) = \left\{
    \begin{array}{ll}
        \bigg(\frac{X_{ij}}{x_{max}}\bigg)^{\alpha} & \mbox{if } X_{ij} \lt x_{max} \\
        1 & \mbox{otherwise}
    \end{array}
\right.
$$

When we encounter extremely common word pairs (where $X_{ij} \geq x_{max}$), this function cuts off its normal output and simply returns $1$. For all other word pairs with non-zero counts, it returns a weight in the range $(0,1)$, where the distribution of weights in this range is determined by $\alpha$.

The authors use $x_{max} = 100$ and $\alpha = 3/4$.
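
The weighting function and the objective $J$ can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the implementation used below: the function names `weight_fn` and `glove_loss`, the toy matrix and the random parameters are all made up for the example. Because $f(0) = 0$, the sum is taken over non-zero entries of $X$ only, which also avoids evaluating $\log 0$.

```python
import numpy as np

def weight_fn(x, x_max=100.0, alpha=0.75):
    """f(x) = (x / x_max)^alpha below x_max, and 1 from x_max upwards."""
    x = np.asarray(x, dtype=float)
    return np.minimum(x / x_max, 1.0) ** alpha

def glove_loss(W, W_tilde, b, b_tilde, X, x_max=100.0, alpha=0.75):
    """Weighted least-squares objective J over the non-zero entries of X."""
    i, j = np.nonzero(X)                       # log X_ij is undefined at zero
    pred = np.sum(W[i] * W_tilde[j], axis=1) + b[i] + b_tilde[j]
    err = pred - np.log(X[i, j])
    return np.sum(weight_fn(X[i, j], x_max, alpha) * err ** 2)

# Toy inputs (hypothetical sizes and random values, for illustration only).
rng = np.random.default_rng(0)
V, d = 4, 3
X = np.array([[0, 2, 150, 1],
              [2, 0, 10, 0],
              [150, 10, 0, 5],
              [1, 0, 5, 0]], dtype=float)
W, W_tilde = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, b_tilde = rng.normal(size=V), rng.normal(size=V)

print(weight_fn([1, 50, 100, 150]))   # weights grow with the count, then saturate at 1
print(glove_loss(W, W_tilde, b, b_tilde, X))
```

With $\alpha = 3/4$ the weight rises sub-linearly, so rare pairs are down-weighted without being ignored, while every pair at or above $x_{max}$ contributes with weight one.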

Before starting, we advise you to see this [notebook](https://github.com/spro/practical-pytorch/blob/master/glove-word-vectors/glove-word-vectors.ipynb) illustrating the use of pre-trained models with pytorch.

# **Implementation**

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

import numpy as np
from tqdm import tqdm

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

from nltk.tokenize import word_tokenize
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

Before we train the actual model, we need to construct a co-occurrence matrix $X$, where a cell $X_{ij}$ is a “strength” which represents how often the word $j$ appears in the context of the word $i$. In the code below, each co-occurrence contributes $1/d$ to its cell, where $d$ is the distance in tokens between the two words, so closer pairs count more. We run through our corpus just once to build the matrix $X$, and from then on use this co-occurrence data in place of the actual corpus. We will construct our model based only on the values collected in $X$.

In [None]:
def build_cooccur(corpus, vocab = None, context_size = 10, min_count = None) :
    """
    builds a dense co-occurrence matrix from a raw text corpus (str).
    Each co-occurrence is weighted by 1 / distance between the two tokens.
    Note: min_count is accepted but currently not used.
    """
    # Create tokenized text (list)
    token_text = word_tokenize(corpus)
    len_token_text = len(token_text)

    print("# of tokens: ", len_token_text, '\n', token_text[:10])

    # vocabulary (set of unique words), unless one is supplied
    if vocab is None :
        vocab = set(token_text)

    vocab_size = len(vocab)
    print("size of vocabulary: ", vocab_size)

    # dictionary mapping each word to its row/column index in the matrix
    word_to_ix = {word: i for i, word in enumerate(vocab)}

    # Construct co-occurrence matrix
    co_occ_mat = np.zeros((vocab_size, vocab_size))
    for i in range(len_token_text):
        ix = word_to_ix[token_text[i]]
        for dist in range(1, context_size + 1):
            # >= 0 so that the first token (index 0) is counted as a left neighbour
            if i - dist >= 0:
                left_ix = word_to_ix[token_text[i - dist]]
                co_occ_mat[ix, left_ix] += 1.0 / dist
            if i + dist < len_token_text:
                right_ix = word_to_ix[token_text[i + dist]]
                co_occ_mat[ix, right_ix] += 1.0 / dist

    print("shape of co-occurrence matrix:", co_occ_mat.shape)

    return co_occ_mat, vocab, len_token_text
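
To see what the distance-weighted window produces, here is a minimal sketch of the same counting scheme on a six-word sentence. It is a simplified stand-in, not the function above: the helper name `toy_cooccur` is hypothetical, tokenization is a plain whitespace split rather than NLTK, and counts are kept in a dictionary keyed by word pairs instead of a dense matrix.

```python
from collections import defaultdict

def toy_cooccur(tokens, context_size=2):
    """Accumulate 1/distance-weighted counts for a symmetric context window."""
    counts = defaultdict(float)
    for i, center in enumerate(tokens):
        for dist in range(1, context_size + 1):
            if i - dist >= 0:                      # left neighbour exists
                counts[(center, tokens[i - dist])] += 1.0 / dist
            if i + dist < len(tokens):             # right neighbour exists
                counts[(center, tokens[i + dist])] += 1.0 / dist
    return dict(counts)

tokens = "the cat sat on the mat".split()
counts = toy_cooccur(tokens, context_size=2)

print(counts[("cat", "sat")])   # adjacent pair: weight 1/1
print(counts[("cat", "on")])    # two positions apart: weight 1/2
print(counts[("sat", "the")])   # "the" occurs two positions away on both sides
```

Because the window is symmetric, the resulting counts are symmetric too: the entry for a pair and the entry for the reversed pair are equal.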

Once we’ve prepared $X$, our task is to decide vector values in continuous space for each word we observe in the corpus. Intuitively speaking, we want to build word vectors that retain some useful information about how every pair of words $i$ and $j$ co-occur.

In [None]:
class Glove(nn.Module):

    def __init__(self, vocab : set, len_token_text : int, comat, embedding_size, x_max, alpha, batch_size):
        super(Glove, self).__init__()
        
        self.len_token_text = len_token_text
        unk = "unk"
        # Keep the iteration order of the set that build_cooccur enumerated, so that
        # row/column i of comat and embedding i refer to the same word. This relies on
        # vocab being the very set object returned by build_cooccur. The unknown-word
        # token is appended at the end so that it does not shift the other indices.
        words = list(vocab)
        if unk not in vocab:
            words.append(unk)
        self.vocab = set(words)
        vocab_size = len(words)
        self.vocab_size = vocab_size
        self.word_to_ix = {word: i for i, word in enumerate(words)}
        self.ix_to_word = {i: word for i, word in enumerate(words)}
        self.unk_id = self.word_to_ix[unk]

        # embedding matrices
        self.embedding_V = nn.Embedding(vocab_size, embedding_size) # embedding matrix of center words
        self.embedding_U = nn.Embedding(vocab_size, embedding_size) # embedding matrix of context words

        # biases
        self.v_bias = nn.Embedding(vocab_size, 1)
        self.u_bias = nn.Embedding(vocab_size, 1)
        
        # initialize all params
        for params in self.parameters():
            nn.init.uniform_(params, a = -0.5, b = 0.5)
            
        # hyperparams
        self.x_max = x_max
        self.alpha = alpha
        self.comat = comat
        # Non-zero co-occurrences
        # https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.nonzero.html
        # returns a 2-D array, with a row (i, j) for each non-zero element
        self.co_occs = np.transpose(np.nonzero(comat))
        #print("non-zero co-occurrences:\n", self.co_occs)

        self.batch_size = batch_size
    
    def forward(self, center_word_lookup, context_word_lookup):
        # indexing into the embedding matrices
        center_embed = self.embedding_V(center_word_lookup)
        target_embed = self.embedding_U(context_word_lookup)

        center_bias = self.v_bias(center_word_lookup).squeeze(1)
        target_bias = self.u_bias(context_word_lookup).squeeze(1)

        # elements of the co-occurrence matrix, looked up in one vectorized call
        # and placed on the same device and dtype as the embeddings
        rows = center_word_lookup.cpu().numpy()
        cols = context_word_lookup.cpu().numpy()
        co_occurrences = torch.as_tensor(self.comat[rows, cols],
                                         dtype=center_embed.dtype, device=center_embed.device)
        
        # weight_fn applied to non-zero co-occurrences
        weights = self.weight_fn(co_occurrences)

        # the loss as described in the paper
        loss = torch.sum(torch.pow((torch.sum(center_embed * target_embed, dim=1)
            + center_bias + target_bias) - torch.log(co_occurrences), 2) * weights)
        
        return loss
        
    def weight_fn(self, x):
        # the proposed weighting fn: (x / x_max)^alpha below x_max, 1 otherwise
        return torch.clamp(x / self.x_max, max=1.0) ** self.alpha
        
    def embeddings(self):
        # "we choose to use the sum W + W_tilde as our word vectors"
        return self.embedding_V.weight.data + self.embedding_U.weight.data

    # Batch sampling function
    def gen_batch(self, device):
        """
        picks random indices for lookup in the embedding matrix
        "stochastically sampling non-zero elements from X [ie. the co-occurrence matrix]"
        """
        sample = np.random.choice(len(self.co_occs), size = self.batch_size, replace=False)
        # each row of co_occs is a (center index, context index) pair
        pairs = self.co_occs[sample]
        v_vecs_ix = torch.tensor(pairs[:, 0], dtype=torch.long)
        u_vecs_ix = torch.tensor(pairs[:, 1], dtype=torch.long)
            
        return v_vecs_ix.to(device), u_vecs_ix.to(device)

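    def train(self, num_epochs, optimizer, device, epoch_to_stop):
        """
        trains the model for at most num_epochs epochs, with the optimizer specified in the parameters.
        Training stops early once the epoch loss has risen by at least 1 in each of the
        last epoch_to_stop epochs.
        Note: this method shadows nn.Module.train(mode), so model.eval() and
        model.train(False) will not work on this class.
        """
        losses = []
        stop = False
        num_batches = int(self.len_token_text/self.batch_size)
        for epoch in range(num_epochs):
            if stop:
                break
            total_loss = 0
            print("Beginning epoch %d" %epoch)
            for batch in tqdm(range(num_batches)):
                self.zero_grad()
                data = self.gen_batch(device)
                loss = self(*data)
                loss.backward()
                optimizer.step()
                total_loss += loss.item()
            losses.append(total_loss)
            if epoch > epoch_to_stop:
                # flag becomes True if any of the last epoch_to_stop epochs
                # did not increase the loss by at least 1
                i = epoch
                flag = False
                while ((not flag) and i>epoch-epoch_to_stop):
                    if losses[i]-losses[i-1]<1:
                        flag = True
                    i -= 1
                stop = not flag
            print()
            # np.mean(losses) is the running mean of the summed per-epoch losses so far,
            # not the loss of the current epoch alone
            print('Epoch : %d, mean loss : %.02f' % (epoch, np.mean(losses)))
        return losses 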

    def predict(self, sentence) :
        """
        returns the embedding that belongs to the given sentence (str)
        """
        sentence = sentence.lower()
        tokens = word_tokenize(sentence)
        token_ids = [self.word_to_ix.get(word, self.unk_id) for word in tokens]
        return self.embeddings()[token_ids]
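
The stopping rule inside `train` is compact but easy to misread, so here it is restated as a standalone function. This is a sketch for illustration: the name `should_stop` and the `tol` parameter are introduced here and are not part of the class; `patience` plays the role of `epoch_to_stop`, and `tol=1.0` mirrors the hard-coded threshold of 1.

```python
def should_stop(losses, patience, tol=1.0):
    """Return True when the loss rose by at least `tol` in each of the
    last `patience` epochs, i.e. no recent epoch improved on its predecessor."""
    epoch = len(losses) - 1               # index of the epoch that just finished
    if epoch <= patience:
        return False                      # not enough history yet
    recent = range(epoch - patience + 1, epoch + 1)
    return all(losses[i] - losses[i - 1] >= tol for i in recent)

print(should_stop([10, 9, 8, 7], patience=2))       # loss still falling
print(should_stop([10, 9, 11, 13], patience=2))     # two consecutive increases
print(should_stop([10, 9, 11, 11.5], patience=2))   # last increase is below tol
```

Note that the comparison is made on summed epoch losses, so the threshold of 1 is an absolute value whose strictness depends on the batch size and the number of batches.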

**Useful functions**

In [None]:
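# Plot loss fn
def plot_loss_fn(losses, title):
    # open the figure first, so the curve is drawn on it rather than leaving an empty figure behind
    plt.figure()
    plt.plot(range(len(losses)), losses)
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title(title)
    plt.show()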

def get_word(model, word):
    """
    returns the embedding that belongs to the given word (str)
    """
    return model.predict(word).squeeze()

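def closest(model, vec, n = 10):
    """
    finds the n closest words (Euclidean distance) for a given vector
    """
    # distance from vec to every embedding at once, instead of re-tokenizing each word
    dists = torch.norm(model.embeddings() - vec, dim=1)
    all_dists = [(w, dists[i].item()) for w, i in model.word_to_ix.items()]
    return sorted(all_dists, key=lambda t: t[1])[:n]

# some helper fn
def print_tuples(tuples):
    # each entry is a (word, distance) pair
    for word, dist in tuples:
        print('(%.4f) %s' % (dist, word))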

# word analogies in the form w1 : w2 :: w3 : ?
def analogy(model, w1, w2, w3, n=5, filter_given=True):
    print('\n[%s : %s :: %s : ?]' % (w1, w2, w3))
   
    # w2 - w1 + w3 = w4
    closest_words = closest(model, get_word(model, w2) - get_word(model, w1) + get_word(model, w3))
    
    # Optionally filter out given words
    if filter_given:
        closest_words = [t for t in closest_words if t[0] not in [w1, w2, w3]]
        
    print_tuples(closest_words[:n])
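
The vector arithmetic behind `analogy` can be checked by hand on a toy example. The sketch below uses hand-made two-dimensional vectors, chosen so that the analogy holds exactly; the dictionary `vectors` and the helper `toy_analogy` are hypothetical and independent of the trained model.

```python
import numpy as np

# Hand-made 2-d vectors (hypothetical, chosen so the analogy holds exactly).
vectors = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([3.0, 0.0]),
    "queen": np.array([3.0, 1.0]),
    "apple": np.array([-2.0, 0.5]),
}

def toy_analogy(w1, w2, w3):
    """Solve w1 : w2 :: w3 : ? by nearest neighbour to w2 - w1 + w3."""
    target = vectors[w2] - vectors[w1] + vectors[w3]
    # the three given words are excluded, as with filter_given=True
    candidates = [w for w in vectors if w not in (w1, w2, w3)]
    return min(candidates, key=lambda w: np.linalg.norm(vectors[w] - target))

print(toy_analogy("man", "woman", "king"))
```

Here the second coordinate acts as a "gender" direction and the first as a "royalty" direction, so `woman - man + king` lands exactly on `queen`. Trained embeddings only approximate this behaviour.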

## **Experiments**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


**Data**

In [None]:
#txt_file
txt_file = "/content/drive/MyDrive/Double-Hard Debias/Data/bias_corpus3v1.txt"

# Open and read in text
with open(txt_file, 'r') as f :
    corpus = f.read().lower()

In [None]:
# the corpus file holds one text per line, so this gives a list of lines (not tokens)
lines = corpus.strip().split("\n")
lines[1]

"greatest threat to the black famiky is the white liberal' nonbiased"

In [None]:
print(len(lines))

39409


In [None]:
# The paper states: "Unless otherwise noted, we use a context of ten words to the left and ten words to the right."
# Here a smaller window of 6 words on each side is used instead.
CONTEXT_SIZE = 6

co_occ_mat, vocab, len_token_text = build_cooccur(corpus, context_size = CONTEXT_SIZE)

# of tokens:  895044 
 ['democrats', 'needed', 'someone', 'like', 'obama', 'half', 'white', 'and', 'half', 'black']
size of vocabulary:  27753
shape of co-occurrence matrix: (27753, 27753)
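
Since `np.zeros((vocab_size, vocab_size))` allocates a dense `float64` array, memory use grows quadratically with the vocabulary. The back-of-the-envelope estimate below uses the vocabulary size printed above; it is only an illustration of the cost, and a sparse structure (for example a dictionary keyed by index pairs) would be one possible way to reduce it, which this notebook does not do.

```python
import numpy as np

vocab_size = 27753                                  # vocabulary size printed above
itemsize = np.dtype(np.float64).itemsize            # np.zeros defaults to float64 (8 bytes)
dense_bytes = vocab_size * vocab_size * itemsize

print(f"dense matrix: {dense_bytes / 2**30:.2f} GiB")
```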


In [None]:
EMBEDDING_SIZE = 300

# "For all our experiments, we set x_max = 100, alpha = 3/4"
X_MAX = 100
ALPHA = 0.75

BATCH_SIZE = 4096
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = Glove(vocab, len_token_text, comat = co_occ_mat, embedding_size = EMBEDDING_SIZE, x_max = X_MAX, alpha = ALPHA, batch_size = BATCH_SIZE)
model = model.to(device)

In [None]:
# "[we] train the model using AdaGrad, [...] with initial learning rate of 0.05"
LEARNING_RATE = 0.05
optimizer = optim.Adagrad(model.parameters(), lr = LEARNING_RATE)

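In [None]:
# The paper reports: "we run 50 iterations for vectors smaller than 300 dimensions [...]"
# Here 500 is only an upper bound; training may end earlier through the stopping rule (epoch_to_stop).
EPOCHS = 500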
losses = model.train(num_epochs = EPOCHS, optimizer = optimizer, device=device, epoch_to_stop=2)

  0%|          | 0/218 [00:00<?, ?it/s]

Beginning epoch 0


100%|██████████| 218/218 [01:29<00:00,  2.44it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 0, mean loss : 125051.39
Beginning epoch 1


100%|██████████| 218/218 [01:26<00:00,  2.53it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 1, mean loss : 95844.68
Beginning epoch 2


100%|██████████| 218/218 [01:24<00:00,  2.59it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 2, mean loss : 79308.52
Beginning epoch 3


100%|██████████| 218/218 [01:22<00:00,  2.64it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 3, mean loss : 68511.25
Beginning epoch 4


100%|██████████| 218/218 [01:22<00:00,  2.63it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 4, mean loss : 60824.21
Beginning epoch 5


100%|██████████| 218/218 [01:24<00:00,  2.58it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 5, mean loss : 54988.67
Beginning epoch 6


100%|██████████| 218/218 [01:22<00:00,  2.63it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 6, mean loss : 50358.93
Beginning epoch 7


100%|██████████| 218/218 [01:22<00:00,  2.65it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 7, mean loss : 46599.52
Beginning epoch 8


100%|██████████| 218/218 [01:21<00:00,  2.67it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 8, mean loss : 43468.08
Beginning epoch 9


100%|██████████| 218/218 [01:21<00:00,  2.68it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 9, mean loss : 40798.63
Beginning epoch 10


100%|██████████| 218/218 [01:21<00:00,  2.66it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 10, mean loss : 38500.03
Beginning epoch 11


100%|██████████| 218/218 [01:21<00:00,  2.67it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 11, mean loss : 36490.71
Beginning epoch 12


100%|██████████| 218/218 [01:20<00:00,  2.71it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 12, mean loss : 34720.13
Beginning epoch 13


100%|██████████| 218/218 [01:19<00:00,  2.73it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 13, mean loss : 33151.80
Beginning epoch 14


100%|██████████| 218/218 [01:19<00:00,  2.74it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 14, mean loss : 31737.39
Beginning epoch 15


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 15, mean loss : 30465.78
Beginning epoch 16


100%|██████████| 218/218 [01:20<00:00,  2.72it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 16, mean loss : 29309.19
Beginning epoch 17


100%|██████████| 218/218 [01:20<00:00,  2.71it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 17, mean loss : 28257.17
Beginning epoch 18


100%|██████████| 218/218 [01:21<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 18, mean loss : 27292.29
Beginning epoch 19


100%|██████████| 218/218 [01:20<00:00,  2.71it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 19, mean loss : 26403.79
Beginning epoch 20


100%|██████████| 218/218 [01:21<00:00,  2.67it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 20, mean loss : 25583.72
Beginning epoch 21


100%|██████████| 218/218 [01:21<00:00,  2.68it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 21, mean loss : 24823.22
Beginning epoch 22


100%|██████████| 218/218 [01:21<00:00,  2.67it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 22, mean loss : 24118.91
Beginning epoch 23


100%|██████████| 218/218 [01:20<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 23, mean loss : 23457.09
Beginning epoch 24


100%|██████████| 218/218 [01:20<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 24, mean loss : 22839.39
Beginning epoch 25


100%|██████████| 218/218 [01:21<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 25, mean loss : 22258.59
Beginning epoch 26


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 26, mean loss : 21712.68
Beginning epoch 27


100%|██████████| 218/218 [01:21<00:00,  2.68it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 27, mean loss : 21198.25
Beginning epoch 28


100%|██████████| 218/218 [01:21<00:00,  2.68it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 28, mean loss : 20713.35
Beginning epoch 29


100%|██████████| 218/218 [01:21<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 29, mean loss : 20254.27
Beginning epoch 30


100%|██████████| 218/218 [01:21<00:00,  2.68it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 30, mean loss : 19817.86
Beginning epoch 31


100%|██████████| 218/218 [01:21<00:00,  2.68it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 31, mean loss : 19403.37
Beginning epoch 32


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 32, mean loss : 19010.51
Beginning epoch 33


100%|██████████| 218/218 [01:21<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 33, mean loss : 18636.07
Beginning epoch 34


100%|██████████| 218/218 [01:21<00:00,  2.67it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 34, mean loss : 18279.24
Beginning epoch 35


100%|██████████| 218/218 [01:21<00:00,  2.67it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 35, mean loss : 17938.72
Beginning epoch 36


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 36, mean loss : 17613.05
Beginning epoch 37


100%|██████████| 218/218 [01:20<00:00,  2.71it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 37, mean loss : 17301.53
Beginning epoch 38


100%|██████████| 218/218 [01:20<00:00,  2.71it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 38, mean loss : 17002.92
Beginning epoch 39


100%|██████████| 218/218 [01:20<00:00,  2.71it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 39, mean loss : 16715.87
Beginning epoch 40


100%|██████████| 218/218 [01:20<00:00,  2.72it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 40, mean loss : 16440.25
Beginning epoch 41


100%|██████████| 218/218 [01:19<00:00,  2.73it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 41, mean loss : 16175.61
Beginning epoch 42


100%|██████████| 218/218 [01:20<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 42, mean loss : 15921.47
Beginning epoch 43


100%|██████████| 218/218 [01:21<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 43, mean loss : 15676.54
Beginning epoch 44


100%|██████████| 218/218 [01:20<00:00,  2.71it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 44, mean loss : 15439.90
Beginning epoch 45


100%|██████████| 218/218 [01:20<00:00,  2.72it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 45, mean loss : 15211.70
Beginning epoch 46


100%|██████████| 218/218 [01:20<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 46, mean loss : 14990.98
Beginning epoch 47


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 47, mean loss : 14778.23
Beginning epoch 48


100%|██████████| 218/218 [01:20<00:00,  2.72it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 48, mean loss : 14573.45
Beginning epoch 49


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 49, mean loss : 14374.63
Beginning epoch 50


100%|██████████| 218/218 [01:21<00:00,  2.68it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 50, mean loss : 14181.85
Beginning epoch 51


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 51, mean loss : 13995.47
Beginning epoch 52


100%|██████████| 218/218 [01:19<00:00,  2.73it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 52, mean loss : 13815.21
Beginning epoch 53


100%|██████████| 218/218 [01:20<00:00,  2.71it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 53, mean loss : 13640.03
Beginning epoch 54


100%|██████████| 218/218 [01:21<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 54, mean loss : 13470.57
Beginning epoch 55


100%|██████████| 218/218 [01:20<00:00,  2.71it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 55, mean loss : 13305.71
Beginning epoch 56


100%|██████████| 218/218 [01:21<00:00,  2.67it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 56, mean loss : 13145.92
Beginning epoch 57


100%|██████████| 218/218 [01:20<00:00,  2.71it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 57, mean loss : 12990.60
Beginning epoch 58


100%|██████████| 218/218 [01:21<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 58, mean loss : 12839.17
Beginning epoch 59


100%|██████████| 218/218 [01:20<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 59, mean loss : 12692.12
Beginning epoch 60


100%|██████████| 218/218 [01:21<00:00,  2.66it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 60, mean loss : 12548.85
Beginning epoch 61


100%|██████████| 218/218 [01:22<00:00,  2.64it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 61, mean loss : 12409.59
Beginning epoch 62


100%|██████████| 218/218 [01:22<00:00,  2.65it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 62, mean loss : 12273.98
Beginning epoch 63


100%|██████████| 218/218 [01:21<00:00,  2.66it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 63, mean loss : 12141.91
Beginning epoch 64


100%|██████████| 218/218 [01:22<00:00,  2.66it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 64, mean loss : 12013.33
Beginning epoch 65


100%|██████████| 218/218 [01:22<00:00,  2.65it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 65, mean loss : 11887.95
Beginning epoch 66


100%|██████████| 218/218 [01:22<00:00,  2.63it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 66, mean loss : 11765.65
Beginning epoch 67


100%|██████████| 218/218 [01:21<00:00,  2.66it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 67, mean loss : 11646.14
Beginning epoch 68


100%|██████████| 218/218 [01:23<00:00,  2.60it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 68, mean loss : 11529.71
Beginning epoch 69


100%|██████████| 218/218 [01:25<00:00,  2.56it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 69, mean loss : 11415.96
Beginning epoch 70


100%|██████████| 218/218 [01:23<00:00,  2.60it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 70, mean loss : 11304.61
Beginning epoch 71


100%|██████████| 218/218 [01:21<00:00,  2.68it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 71, mean loss : 11195.88
Beginning epoch 72


100%|██████████| 218/218 [01:20<00:00,  2.71it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 72, mean loss : 11090.00
Beginning epoch 73


100%|██████████| 218/218 [01:23<00:00,  2.61it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 73, mean loss : 10986.45
Beginning epoch 74


100%|██████████| 218/218 [01:22<00:00,  2.66it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 74, mean loss : 10884.84
Beginning epoch 75


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 75, mean loss : 10785.54
Beginning epoch 76


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 76, mean loss : 10688.32
Beginning epoch 77


100%|██████████| 218/218 [01:20<00:00,  2.71it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 77, mean loss : 10593.33
Beginning epoch 78


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 78, mean loss : 10500.17
Beginning epoch 79


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 79, mean loss : 10409.09
Beginning epoch 80


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 80, mean loss : 10319.70
Beginning epoch 81


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 81, mean loss : 10232.36
Beginning epoch 82


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 82, mean loss : 10146.73
Beginning epoch 83


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 83, mean loss : 10062.77
Beginning epoch 84


100%|██████████| 218/218 [01:21<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 84, mean loss : 9980.46
Beginning epoch 85


100%|██████████| 218/218 [01:20<00:00,  2.71it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 85, mean loss : 9899.85
Beginning epoch 86


100%|██████████| 218/218 [01:21<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 86, mean loss : 9820.64
Beginning epoch 87


100%|██████████| 218/218 [01:21<00:00,  2.68it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 87, mean loss : 9742.77
Beginning epoch 88


100%|██████████| 218/218 [01:21<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 88, mean loss : 9666.59
Beginning epoch 89


100%|██████████| 218/218 [01:20<00:00,  2.71it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 89, mean loss : 9591.94
Beginning epoch 90


100%|██████████| 218/218 [01:21<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 90, mean loss : 9518.59
Beginning epoch 91


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 91, mean loss : 9446.54
Beginning epoch 92


100%|██████████| 218/218 [01:21<00:00,  2.68it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 92, mean loss : 9375.75
Beginning epoch 93


100%|██████████| 218/218 [01:20<00:00,  2.69it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 93, mean loss : 9306.11
Beginning epoch 94


100%|██████████| 218/218 [01:22<00:00,  2.65it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 94, mean loss : 9237.72
Beginning epoch 95


100%|██████████| 218/218 [01:21<00:00,  2.68it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 95, mean loss : 9170.60
Beginning epoch 96


100%|██████████| 218/218 [01:20<00:00,  2.70it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 96, mean loss : 9104.55
Beginning epoch 97


100%|██████████| 218/218 [01:21<00:00,  2.67it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 97, mean loss : 9039.68
Beginning epoch 98


100%|██████████| 218/218 [01:22<00:00,  2.65it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 98, mean loss : 8975.81
Beginning epoch 99


100%|██████████| 218/218 [01:21<00:00,  2.68it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 99, mean loss : 8913.22
Beginning epoch 100


100%|██████████| 218/218 [01:21<00:00,  2.67it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 100, mean loss : 8851.56
Beginning epoch 101


100%|██████████| 218/218 [01:22<00:00,  2.65it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 101, mean loss : 8790.83
Beginning epoch 102


100%|██████████| 218/218 [01:21<00:00,  2.67it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 102, mean loss : 8731.07
Beginning epoch 103


100%|██████████| 218/218 [01:21<00:00,  2.67it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 103, mean loss : 8672.37
Beginning epoch 104


100%|██████████| 218/218 [01:21<00:00,  2.68it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 104, mean loss : 8614.37
Beginning epoch 105

[Output for epochs 105 to 299 condensed: each epoch processes 218 batches in roughly 80 to 90 seconds, and the mean loss decreases at every epoch, from 8557.53 at epoch 105 to 4084.86 at epoch 299.]

Epoch : 300, mean loss : 4075.06
Beginning epoch 301


100%|██████████| 218/218 [01:19<00:00,  2.73it/s]
  0%|          | 0/218 [00:00<?, ?it/s]


Epoch : 301, mean loss : 4065.28
Beginning epoch 302


 78%|███████▊  | 171/218 [01:02<00:17,  2.76it/s]

KeyboardInterrupt: ignored
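
The log ends with a manual interrupt during epoch 302, while the mean loss is still decreasing. A quick way to judge how much training was still helping is to compare the relative loss decrease early and late in this log. The following is a minimal sketch, not part of the notebook: the helper name `relative_improvement` is hypothetical, and the four loss values are copied from the epoch lines logged for epochs 104, 105, 300 and 301.

```python
def relative_improvement(previous, current):
    """Fractional decrease in loss from one epoch to the next."""
    return (previous - current) / previous

# Mean losses copied from the training log.
early = relative_improvement(8614.37, 8557.53)  # epoch 104 -> 105
late = relative_improvement(4075.06, 4065.28)   # epoch 300 -> 301

print(f"early: {early:.2%}, late: {late:.2%}")  # early: 0.66%, late: 0.24%
```

The per-epoch gain has dropped to roughly a third of its earlier value but has not flattened out, so the run was stopped before the loss plateaued.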

In [None]:
# Write one line per vocabulary word: the word followed by its vector components.
vocab = model.ix_to_word.values()
with open('/content/drive/MyDrive/Double-Hard Debias/Data/word_vectors.txt', "w") as f:
  for word in vocab:
    # predict() returns a 2-D tensor for a single word; flatten it to a plain list of floats.
    vector = model.predict(word).detach().flatten().tolist()
    f.write(word + ' ' + ' '.join(str(x) for x in vector) + '\n')
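
A file that stores one word per line followed by its vector components can be read back with a few lines of plain Python. The following is a minimal sketch, not part of the notebook: the helper name `load_word_vectors` and the toy file contents are hypothetical, and it assumes each line holds a single-token word followed by whitespace-separated floats.

```python
import os
import tempfile

def load_word_vectors(path):
    """Parse lines of the form 'word v1 v2 ... vd' into a dict of float lists."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()  # tolerates repeated spaces between values
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

# Toy file standing in for word_vectors.txt (the values are made up).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("time 0.5 -0.25 1e-2\n")
    tmp.write("world  0.1   0.2 0.3\n")
    path = tmp.name

vectors = load_word_vectors(path)
os.remove(path)
print(vectors["time"])
```

Splitting on arbitrary whitespace keeps the loader tolerant of uneven spacing and scientific notation in the saved values.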

In [None]:
plot_loss_fn(losses, "GloVe loss function")

In [None]:
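# "unk1" and "unk2" are placeholder tokens; in the output below their displayed rows (first and last) are identical.
sentence = "unk1 Hello world unk2"
vec = model.predict(sentence)
print(vec.shape)
vec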

torch.Size([4, 300])


tensor([[ 0.1652,  0.4124, -0.0533,  ...,  0.7300,  0.5054,  0.0919],
        [-0.0903,  0.0365,  0.0598,  ..., -0.1756,  0.2160, -0.2083],
        [-0.1334, -0.1713,  0.3106,  ..., -0.4079, -0.3367, -0.2356],
        [ 0.1652,  0.4124, -0.0533,  ...,  0.7300,  0.5054,  0.0919]])

In [None]:
word = "time"
vector = get_word(model, word)
vector

tensor([ 5.2096e-01,  3.5153e-01, -5.3179e-01, -9.1568e-01, -3.5364e-01,
        -2.9144e-03, -3.0430e-01,  2.2622e-01, -2.9018e-01, -1.8524e-01,
        -1.2570e-01,  2.8921e-01,  2.0774e-01, -5.1324e-01,  6.9231e-01,
        -3.1850e-01, -1.7689e-02,  3.3147e-01, -1.9565e-01,  4.7917e-01,
         2.5964e-01,  4.4171e-01, -4.4207e-01, -5.9393e-01, -4.2564e-01,
         6.9123e-01,  7.2504e-01,  3.4526e-01, -2.9906e-01, -6.7620e-01,
         3.5773e-01, -5.0644e-01, -1.3552e-01, -3.0241e-01,  1.9029e-01,
         2.3181e-01, -1.6548e-01,  5.4032e-01,  2.5939e-01,  2.7413e-01,
         2.0225e-02,  3.0809e-01, -5.6601e-01,  1.1594e-01, -1.0005e-01,
        -8.8472e-02, -8.0537e-02,  2.6774e-01, -8.5681e-02,  5.6047e-01,
        -2.6510e-01,  4.8075e-01,  5.1526e-01,  7.4535e-02, -3.4520e-01,
         2.6178e-01,  6.5890e-01,  5.8222e-01, -6.2116e-01,  1.4590e-01,
         1.8506e-01, -3.8849e-01, -5.3514e-01, -1.3505e-01, -2.4475e-01,
         7.0455e-02, -5.8396e-02,  7.9866e-02,  5.6

In [None]:
closest(model, vector)

[('time', tensor(0.)),
 ('great', tensor(8.3682)),
 ('oooo', tensor(8.3820)),
 ('walked', tensor(8.4507)),
 ('lunacy', tensor(8.4614)),
 ('methodist', tensor(8.4732)),
 ('parent', tensor(8.4764)),
 ('darlin', tensor(8.4806)),
 ('nonsensical', tensor(8.5279)),
 ('listed', tensor(8.5558))]
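
The zero score for the query word itself suggests that `closest` ranks by a distance such as the Euclidean norm, which is sensitive to vector length. Cosine similarity is a common alternative for comparing word vectors because it looks only at direction. The following is a minimal sketch under that assumption, using made-up 3-dimensional vectors; `cosine` and `cosine_closest` are hypothetical helpers, not the notebook's `closest`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def cosine_closest(vectors, query, n=3):
    """Return the n (word, similarity) pairs most similar to `query`."""
    scored = [(word, cosine(vec, query)) for word, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

# Toy vectors, made up for illustration.
toy = {
    "time": [1.0, 0.0, 0.0],
    "hour": [3.0, 0.3, 0.0],  # nearly the same direction as "time", but longer
    "cat":  [0.0, 1.0, 0.0],  # closer to "time" in Euclidean distance than "hour"
}

neighbours = cosine_closest(toy, toy["time"], n=2)
print(neighbours)
```

In this toy set, Euclidean distance would rank `cat` ahead of `hour`, while cosine similarity ranks `hour` first after the query itself.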

In [None]:
analogy(model, "when", "time", "who")


[when : time :: who : ?]
(12.7428) nonsensical
(12.8563) uranium
(12.8811) hong
(12.8869) imbecilic
(12.9369) postings


* GloVe: Global Vectors for Word Representation \
Jeffrey Pennington, Richard Socher, Christopher Manning \
https://www.aclweb.org/anthology/D14-1162/
* http://www.foldl.me/2014/glove-python/
* https://github.com/balazs-vida/glove-pytorch
* https://github.com/spro/practical-pytorch/blob/master/glove-word-vectors/glove-word-vectors.ipynb