1.Submit a Google Colab notebook containing your completed code and experimentation results.

2.Include comments and explanations in your code to help understand the implemented logic.

**Additional Notes:**
*   Ensure that the notebook runs successfully in Google Colab.
*   Document any issues encountered during experimentation and how you addressed them.

**Grading:**
*   Each task will be graded out of the specified points.
*   Points will be awarded for correctness, clarity of code, thorough experimentation, and insightful analysis.

In [9]:
from google.colab import drive
# drive.mount('/content/drive')

# SOURCE_DIR = "/content/drive/MyDrive/Uni/NLP/HW2_NLP_4022/Q3_data.csv"
SOURCE_DIR = 'Q3_data.csv'

In [10]:
import torch
import re
from sklearn.preprocessing import OneHotEncoder
import numpy as np
import pandas as pd
import math
from gensim.models import Word2Vec

In [11]:
def delete_hashtag_usernames(text):
  try:
    result = []
    for word in text.split():
      if word[0] not in ['@', '#']:
        result.append(word)
    return ' '.join(result)
  except:
    return ''

def delete_url(text):
  text = re.sub(r'http\S+', '', text)
  return text

def delete_ex(text):
  text = re.sub(r'\u200c', '', text)
  return text

# 0. Data preprocessing

In [12]:
!pip install json-lines

Collecting json-lines
  Downloading json_lines-0.5.0-py2.py3-none-any.whl (6.8 kB)
Installing collected packages: json-lines
Successfully installed json-lines-0.5.0


In [13]:
import json_lines

In [15]:
# 1. extract all tweets from file and save them in memory
# 2. remove urls, hashtags and usernames. use the prepared functions

tweetsTable=pd.read_csv(SOURCE_DIR)
# rawTweets=tweetsTable['Text']
rawTweets=tweetsTable['PureText']
print(rawTweets[:5])
tweets=[]
for t in rawTweets:
  t=delete_hashtag_usernames(t)
  t=delete_url(t)
  t=delete_ex(t)
  tweets.append((t))
print(tweets[:3])

0             بنشین تا شود نقش فال ما نقش هم‌ فردا شدن
1    این گوزو رو کی گردن میگیره؟؟ دچار زوال عقل شده...
2                               برای ایران، برای مهسا.
3                                      مرگ بر دیکتاتور
4                       نذاریم خونشون پایمال شه.‌‌.‌‌.
Name: PureText, dtype: object
['بنشین تا شود نقش فال ما نقش هم فردا شدن', 'این گوزو رو کی گردن میگیره؟؟ دچار زوال عقل شده از بس پای منبر دستمال کشی کرده.', 'برای ایران، برای مهسا.']


# 1. Functions

## Cosine Similarity

To measure the similarity between two words, you need a way to measure the degree of similarity between two embedding vectors for the two words. Given two vectors $u$ and $v$, cosine similarity is defined as follows:

$$\text{CosineSimilarity(u, v)} = \frac {u \cdot v} {||u||_2 ||v||_2} = cos(\theta) \tag{1}$$

* $u \cdot v$ is the dot product (or inner product) of two vectors
* $||u||_2$ is the norm (or length) of the vector $u$
* $\theta$ is the angle between $u$ and $v$.
* The cosine similarity depends on the angle between $u$ and $v$.
    * If $u$ and $v$ are very similar, their cosine similarity will be close to 1.
    * If they are dissimilar, the cosine similarity will take a smaller value.

<img src="images/cosine_sim.png" style="width:800px;height:250px;">
<caption><center><font color='purple'><b>Figure 1</b>: The cosine of the angle between two vectors is a measure of their similarity.</font></center></caption>

Implement the function `cosine_similarity()` to evaluate the similarity between word vectors.

**Reminder**: The norm of $u$ is defined as $ ||u||_2 = \sqrt{\sum_{i=1}^{n} u_i^2}$

In [24]:
def cosine_similarity(u, v):
    """
    Cosine similarity reflects the degree of similarity between u and v

    Arguments:
        u -- a word vector of shape (n,)
        v -- a word vector of shape (n,)

    Returns:
        cosine_similarity -- the cosine similarity between u and v defined by the formula above.
    """

    dot_product = np.dot(u, v)
    norm_u = np.sqrt(np.sum(u**2))
    norm_v = np.sqrt(np.sum(v**2))
    cosine_similarity = dot_product / (norm_u * norm_v)
    return cosine_similarity

def cosine_similarity_tensor(u, v):
  # same as previous function for tensors
    dot_product = torch.dot(u, v)
    norm_vec1 = torch.norm(u)
    norm_vec2 = torch.norm(v)
    similarity = dot_product / (norm_vec1 * norm_vec2)
    return similarity

## find k nearest neighbors

In [25]:
def find_k_nearest_neighbors(word, embedding_dict, k):
  """
    implement a function to return the nearest words to an specific word based on the given dictionary

    Arguments:
        word           -- a word, string
        embedding_dict -- dictionary that maps words to their corresponding vectors
        k              -- the number of word that should be returned

    Returns:
        a list of size k consisting of the k most similar words to the given word

    Note: use the cosine_similarity function that you have implemented to calculate the similarity between words
    """

  # check if word exists in the embedding dictionary
  if word not in embedding_dict:
        return []
  words_cosine_similarity = dict()
  # calculate similarity for all other words
  for token in embedding_dict.keys():
    words_cosine_similarity[token] = cosine_similarity(embedding_dict[word], embedding_dict[token]).item()
  # sort similarities and put them in a list
  words_cosine_similarity = dict(sorted(words_cosine_similarity.items(), key=lambda item: item[1]))
  return list(words_cosine_similarity.keys())[-k:][::-1]

def find_k_nearest_neighbors_tensor(word, embedding_dict, k):
  # same as previous function for tensors
  word_vector = embedding_dict[word]
  similarities = {}
  for key, vector in embedding_dict.items():
      if key != word:
          similarity = cosine_similarity_tensor(word_vector, vector)
          similarities[key] = similarity
  sorted_similarities_list = sorted(similarities.items(), key=lambda x: x[1], reverse=True)
  neighbors = [item[0] for item in sorted_similarities_list[:k]]
  return neighbors


# 2. One hot encoding

In [None]:
# 1. find one hot encoding of each word

# at first I used the onehotencoding of the sklearn:
# # tokenizing tweets into words
# tokenized_tweets=[]
# for tweet in tweets:
#   tokenized_tweets.append(tweet.split())
# # print(tokenized_tweets[5])

# # Flattening the list to words
# words = [word for sublist in tokenized_tweets for word in sublist]
# # print(words[0])

# # Reshape the words to be a column vector
# words_column_vector = np.array(words).reshape(-1, 1)
# # print(words_column_vector[:3])

# # Create the encoder
# encoder = OneHotEncoder(sparse_output=False)

# # Fit and transform the words to one-hot encoded vectors
# one_hot_encoded = encoder.fit_transform(words_column_vector)
# embedding_dict = {word: encoding for word, encoding in zip(words, one_hot_encoded)}
# k = 10
# word1 = "آزادی"
# nearest_words1 = find_k_nearest_neighbors(word1, embedding_dict, k)
# print(nearest_words1)

# first lets find the unique words
distinct_words = list(set([word for snt in tweets for word in snt.split(' ')]))
n = len(distinct_words)

#creating the embedding for each word and mark the used words as one. others as zero
one_hot = dict()
for i, word in enumerate(distinct_words):
  array = np.zeros(n)
  array[i] = 1
  one_hot.update({word : torch.tensor(array)})


In [None]:
# 2. find 10 nearest words from "آزادی"

k = 10
word1 = "آزادی"
nearest_words1 = find_k_nearest_neighbors_tensor(word1, one_hot, k)
print(nearest_words1)

word2 = "مهسا"
nearest_words2 = find_k_nearest_neighbors_tensor(word2, one_hot, k)
print(nearest_words2)


['', 'ميچكد', 'اینفلوئنسرها', 'بندازن', 'موهایی', 'خوبه!؟', 'قشنگی😍🤝🏻', 'کیپاپرایم', 'ارشاد.', 'بازکرده.']
['', 'ميچكد', 'اینفلوئنسرها', 'بندازن', 'موهایی', 'خوبه!؟', 'قشنگی😍🤝🏻', 'کیپاپرایم', 'ارشاد.', 'بازکرده.']


In [None]:
# seems like most values of the embedding is zero. lets check for none zero ones.
print(word1)
nonzero_index=0
for value in one_hot[word1]:
  if value != 0.0:
    print(nonzero_index,value)
  nonzero_index += 1

print(word2)
nonzero_index=0
for value in one_hot[word2]:
  if value != 0.0:
    print(nonzero_index,value)
  nonzero_index += 1


آزادی
11319 tensor(1., dtype=torch.float64)
مهسا
17139 tensor(1., dtype=torch.float64)


Each vector in the one hot encoding is orthogonal to each other. So the cosine similarity as well as distance between any two vectors are same.
so the nearest words found are the same for the two words. The cosine similarity of each pair of words is 0.

##### Describe advantages and disadvantages of one-hot encoding

#Advantage:
* Simple and Intuitive: One-hot encoding is straightforward to understand and implement. It assigns a unique binary value (1 or 0) to each category, making it easy to interpret.

* Preservation of Information: Each category is represented by its own dimension in the encoded vector, ensuring that no information is lost during the encoding process.

* Compatibility with Machine Learning Algorithms: Many machine learning algorithms, especially those based on linear algebra (e.g., logistic regression, support vector machines), require numerical input data. One-hot encoding provides a way to represent categorical data in a numerical format that can be easily fed into these algorithms.

* Insensitivity to Magnitude: One-hot encoding does not introduce any ordinal relationship between categories. Each category is treated as equally different from others, which can be beneficial in situations where there is no inherent order among categories.
* Robustness to Out-of-Vocabulary (OOV) Words: One-hot encoding inherently handles out-of-vocabulary words or categories by assigning them a dedicated dimension with all zeros. This property makes it robust to unseen categories during training, as the model can still process them during inference without the need for retraining.

#Disadvantages:

* Sparsity: The encoded vectors are sparse, meaning that most elements are zeros. In datasets with many categories, most of the encoded vectors will contain mostly zeros, leading to inefficient memory usage (as I could not implement it by myself because of huge ram usage) and potentially slower computation.


* Lack of Similarity Information: One-hot encoding treats all categories as completely distinct and unrelated, even if some categories may have similarities. This can be problematic in tasks where understanding the relationships between categories is important, such as semantic analysis or recommendation systems.
like this example: the k nearest words where same for all words.

* Curse of Dimensionality: High-dimensional feature spaces can suffer from the curse of dimensionality, where the amount of data required to effectively cover the space grows exponentially with the number of dimensions. This can lead to overfitting and reduced generalization performance of machine learning models.

* Increased Model Complexity: In some cases, the high dimensionality introduced by one-hot encoding can lead to increased model complexity. Models trained on high-dimensional data may require more parameters to learn, leading to longer training times and potentially higher risk of overfitting, especially if the dataset is small. This can make the model less interpretable and harder to understand.


# 3. TF-IDF

In [None]:
# 1. Find the TF-IDF of all tweets
tf_all = []
idf = {}
# split to words and set idf to zero
for snt in tweets:
    for word in snt.split(' '):
        idf[word] = 0
# Loop through each tweet to calculate TF and update IDF based on reputations
for snt in tweets:
    tf = {}
    for word in snt.split(' '):
        if word not in tf:
            tf[word] = 0
        tf[word] += 1
        idf[word] += 1
    n = len(snt.split(' '))

    # Normalize TF values
    for word in tf:
        tf[word] = float(tf[word]) / n
    tf_all.append(tf)

unique = len(idf)
list_word = {}
for i, word in enumerate(idf):
    # Calculate IDF values
    idf[word] = math.log(unique / idf[word])
    list_word[word] = i

array = np.zeros((len(tweets), unique))
for i, snt in enumerate(tweets):
    for word in snt.split(' '):
        array[i, list_word[word]] = tf_all[i][word] * idf[word]

dict_tf_idf = {}
for i, snt in enumerate(tweets):
    dict_tf_idf[snt] = torch.tensor(array[i])


# 2. Choose one tweet randomly
random_tweet_index = np.random.randint(len(tweets))
random_tweet = tweets[random_tweet_index]
print("Chosen Tweet:", random_tweet)


# 3. Find 10 nearest tweets from the chosen tweet
neighbors = find_k_nearest_neighbors_tensor(random_tweet, dict_tf_idf, 10)
print("Nearest Tweets:", neighbors)

Chosen Tweet: بخاطر حنانه کیا دختر شهید نوشهری
Nearest Tweets: ['برای حنانه کیا', 'برای حنانه', 'برای مهسا برای حنانه برای دختر آینده م', 'بخاطر نیلوفر بخاطر حنانه بخاطر بردیا بخاطر هرخونی که ریختن و نفهمیدیم. برای ایران آزاد.', 'دختر ایران', 'بخاطر اون دختر ده ساله', 'بخاطر ایران', 'امثال سلیمانی ها شهید نیستند مهسا در راه آزادی شهید شد', 'با کیا شدیم ۸۰ میلیون نفر', 'شهید شهید نکنید . اینم ماموراشون که عربن']


##### Describe advantages and disadvantages of TF-IDF

#Advatages:
* Reflects word importance: TF-IDF reflects the importance of a term within a document relative to the entire corpus. This makes it effective in identifying key terms that differentiate one document from another.

* Handles common terms well: TF-IDF effectively penalizes common terms by reducing their weight. Common words like "the," "is," etc., usually appear in many documents and thus have low TF-IDF scores, allowing more focus on meaningful terms.

* Simple and efficient: The calculation of TF-IDF is straightforward and computationally efficient, making it suitable for large datasets and real-time applications.

* Language independent: TF-IDF is language-independent and can be applied to any language. It relies on word frequencies rather than linguistic features, making it versatile across different languages.

* Flexible: TF-IDF allows for flexibility in tuning parameters. For instance, you can adjust the weighting scheme, apply normalization, or set thresholds to suit specific needs or domain requirements.


#Disadvantages:

* Ignores word semantics: TF-IDF treats words as independent units and doesn't consider their semantic relationships. So words with similar meanings but different forms (e.g., "car" and "automobile") are treated as distinct terms, which leads to information loss.

* Sparse Matrix: In large document collections, the matrix representation of TF-IDF can be very sparse, which may increase memory and computational requirements for storage and processing.

* Sensitivity to document length: Longer documents may have higher overall term frequencies, potentially skewing the importance of terms. Techniques like normalization can mitigate this issue to some extent, but it remains a concern in certain contexts.

* Not suitable for capturing context: TF-IDF does not capture the context of words within documents. It treats each term independently, which may not be ideal for tasks where context is crucial, such as natural language understanding or sentiment analysis.

* Requires a representative corpus: TF-IDF relies on a representative corpus to compute document frequencies accurately. In domains where building such a corpus is challenging or where the corpus is not diverse enough, TF-IDF's effectiveness may be compromised.

# 4. Word2Vec

In [None]:
# 1. train a word2vec model base on all tweets

# Tokenize tweets into words
tokenized_tweets = [tweet.split() for tweet in tweets]
# Use a word2vec model
word2vec_model = Word2Vec(tokenized_tweets, vector_size=100, window=5, min_count=1, workers=4)

# 2. find 10 nearest words from "آزادی"

word="آزادی"
nearest_words = word2vec_model.wv.most_similar(word , topn=10)

# Print the nearest words
for word, similarity in nearest_words:
    print(word, ":", similarity)



##### Describe advantages and disadvantages of Word2Vec


#Advantages:

* Semantic similarity: Word2Vec captures semantic similarities between words effectively. Words with similar meanings are represented by vectors that are close together in the embedding space, enabling better understanding of relationships between words.

* Dimensionality reduction: Word2Vec reduces the dimensionality of word representations compared to traditional one-hot encoding. This dense representation allows for more efficient storage and computation.

* Generalization: Word2Vec embeddings are trained on large corpora, allowing them to capture general semantic relationships across different domains and languages. Pre-trained Word2Vec models can be transferred and fine-tuned for various downstream tasks.

* Speed and scalability: Training Word2Vec models can be computationally efficient, especially compared to more complex neural network architectures. With efficient algorithms like skip-gram and continuous bag-of-words (CBOW), Word2Vec can be trained on large datasets relatively quickly.

* Captures context: Word2Vec considers the context of words within a given window size, allowing it to capture syntactic and semantic relationships based on their co-occurrence patterns in text data.

#Disadvantages:

* Requires large corpus: Word2Vec models require a large corpus of text data to learn meaningful word embeddings. Training on smaller datasets may result in less accurate representations, especially for rare words or domain-specific terms.

* Context window limitation: Word2Vec uses a fixed context window size to capture word context, which may not always capture long-range dependencies or nuanced relationships between words.

* Out-of-vocabulary words: Handling out-of-vocabulary words requires additional techniques such as subword embeddings or handling unknown tokens explicitly.

* Limited to single word semantics: Word2Vec typically represents each word as a single vector, which may not capture multi-word expressions or phrases' semantics effectively. This limitation can affect tasks where understanding the meaning of phrases is crucial.

* Difficulty with polysemy: Word2Vec may struggle to differentiate between multiple meanings of polysemous words since it represents each word with a single vector. As a result, the embeddings may not capture all senses of a word accurately.


# 5. Contextualized embedding

In [None]:
!pip install transformers[sentencepiece]



In [4]:
# Load model and tokenizer

from transformers import BertModel, BertTokenizer

model_name = "HooshvareLab/bert-base-parsbert-uncased"


In [16]:
from torch.utils.data import DataLoader, Dataset
from transformers import BertForSequenceClassification, BertTokenizer, AdamW
# Read the CSV file with the sentiment data
data = pd.read_csv(SOURCE_DIR)
texts = data['Text']
labels = data['Sentiment']

label_to_number_mapping = {label: number for number, label in enumerate(set(labels))}
print(label_to_number_mapping)
labels = labels.map(label_to_number_mapping)

{'mixed': 0, 'no sentiment expressed': 1, 'positive': 2, 'very negative': 3, 'negative': 4, 'very positive': 5}


resources:
https://www.geeksforgeeks.org/one-hot-encoding-in-nlp/


In [17]:
# Load the BERT tokenizer
from transformers import BertTokenizer
from torch.utils.data import Dataset

# Load the BERT tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Define a custom dataset for sentiment classification
class CustomDataset(Dataset):
    def __init__(self, df, tokenizer):
        self.df = df
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        tweet = self.df.loc[idx, 'Text']
        label = self.df.loc[idx, 'Sentiment']

        # Convert label to number using label_to_number_mapping
        label = label_to_number_mapping[label]

        # Tokenize tweet
        encoding = self.tokenizer.encode_plus(
            tweet,
            truncation=True,
            padding='max_length',
            max_length=128,
            return_tensors='pt'
        )

        # Flatten input_ids and attention_mask tensors
        input_ids = encoding['input_ids'].flatten()
        attention_mask = encoding['attention_mask'].flatten()

        return {
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'labels': torch.tensor(label, dtype=torch.long)
        }

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

In [18]:

# Define the BERT model for sentiment classification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=6)

# Define the optimizer and learning rate scheduler
optimizer = AdamW(model.parameters(), lr=1e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)

model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [19]:
dataset = CustomDataset(data, tokenizer)
dataloader = DataLoader(dataset, batch_size=16)

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(device)

cuda


In [21]:
from tqdm import tqdm
import torch.nn as nn

model.to(device)
model.train()

for batch in tqdm(dataloader):
    batch = {k: v.to(device) for k, v in batch.items()}

    outputs = model(**batch)

    loss_function = nn.CrossEntropyLoss()
    loss = loss_function(outputs.logits, batch['labels'])

    print(f"Training Loss: {loss}")

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

print("Training complete!")

  0%|          | 0/1250 [00:00<?, ?it/s]

Training Loss: 1.7344597578048706


  0%|          | 2/1250 [00:01<16:03,  1.29it/s]

Training Loss: 1.8218648433685303


  0%|          | 3/1250 [00:02<11:36,  1.79it/s]

Training Loss: 1.7358427047729492


  0%|          | 4/1250 [00:02<09:30,  2.18it/s]

Training Loss: 1.7034474611282349


  0%|          | 5/1250 [00:02<08:20,  2.49it/s]

Training Loss: 1.7763053178787231


  0%|          | 6/1250 [00:02<07:44,  2.68it/s]

Training Loss: 1.794532060623169


  1%|          | 7/1250 [00:03<07:15,  2.85it/s]

Training Loss: 1.6735446453094482


  1%|          | 8/1250 [00:03<06:55,  2.99it/s]

Training Loss: 1.6339893341064453


  1%|          | 9/1250 [00:03<06:43,  3.08it/s]

Training Loss: 1.759379506111145


  1%|          | 10/1250 [00:04<06:34,  3.14it/s]

Training Loss: 1.6470777988433838


  1%|          | 11/1250 [00:04<06:30,  3.17it/s]

Training Loss: 1.609471321105957


  1%|          | 12/1250 [00:04<06:26,  3.20it/s]

Training Loss: 1.6880972385406494


  1%|          | 13/1250 [00:05<06:24,  3.22it/s]

Training Loss: 1.6071383953094482


  1%|          | 14/1250 [00:05<06:20,  3.25it/s]

Training Loss: 1.5651404857635498


  1%|          | 15/1250 [00:05<06:22,  3.23it/s]

Training Loss: 1.494553804397583


  1%|▏         | 16/1250 [00:06<06:21,  3.24it/s]

Training Loss: 1.6495730876922607


  1%|▏         | 17/1250 [00:06<06:24,  3.21it/s]

Training Loss: 1.4765807390213013


  1%|▏         | 18/1250 [00:06<06:25,  3.19it/s]

Training Loss: 1.4772052764892578


  2%|▏         | 19/1250 [00:06<06:30,  3.15it/s]

Training Loss: 1.5583288669586182


  2%|▏         | 20/1250 [00:07<06:30,  3.15it/s]

Training Loss: 1.5800495147705078


  2%|▏         | 21/1250 [00:07<06:27,  3.17it/s]

Training Loss: 1.5870996713638306


  2%|▏         | 22/1250 [00:07<06:23,  3.20it/s]

Training Loss: 1.2944098711013794


  2%|▏         | 23/1250 [00:08<06:21,  3.22it/s]

Training Loss: 1.6425402164459229


  2%|▏         | 24/1250 [00:08<06:18,  3.24it/s]

Training Loss: 1.2921720743179321


  2%|▏         | 25/1250 [00:08<06:17,  3.24it/s]

Training Loss: 1.5598243474960327


  2%|▏         | 26/1250 [00:09<06:15,  3.26it/s]

Training Loss: 1.7279775142669678


  2%|▏         | 27/1250 [00:09<06:15,  3.26it/s]

Training Loss: 1.5445563793182373


  2%|▏         | 28/1250 [00:09<06:14,  3.26it/s]

Training Loss: 1.4638752937316895


  2%|▏         | 29/1250 [00:10<06:13,  3.27it/s]

Training Loss: 1.3931443691253662


  2%|▏         | 30/1250 [00:10<06:19,  3.21it/s]

Training Loss: 1.4014503955841064


  2%|▏         | 31/1250 [00:10<06:17,  3.23it/s]

Training Loss: 1.432910680770874


  3%|▎         | 32/1250 [00:11<06:16,  3.24it/s]

Training Loss: 1.4600046873092651


  3%|▎         | 33/1250 [00:11<06:15,  3.24it/s]

Training Loss: 1.569703221321106


  3%|▎         | 34/1250 [00:11<06:15,  3.24it/s]

Training Loss: 1.436100721359253


  3%|▎         | 35/1250 [00:11<06:13,  3.25it/s]

Training Loss: 1.496515154838562


  3%|▎         | 36/1250 [00:12<06:13,  3.25it/s]

Training Loss: 1.4529643058776855


  3%|▎         | 37/1250 [00:12<06:13,  3.25it/s]

Training Loss: 1.5269116163253784


  3%|▎         | 38/1250 [00:12<06:13,  3.24it/s]

Training Loss: 1.6604175567626953


  3%|▎         | 39/1250 [00:13<06:12,  3.25it/s]

Training Loss: 1.4841666221618652


  3%|▎         | 40/1250 [00:13<06:12,  3.25it/s]

Training Loss: 1.3112502098083496


  3%|▎         | 41/1250 [00:13<06:11,  3.26it/s]

Training Loss: 1.5020272731781006


  3%|▎         | 42/1250 [00:14<06:12,  3.25it/s]

Training Loss: 1.4868783950805664


  3%|▎         | 43/1250 [00:14<06:11,  3.25it/s]

Training Loss: 1.4281532764434814


  4%|▎         | 44/1250 [00:14<06:10,  3.25it/s]

Training Loss: 1.3185539245605469


  4%|▎         | 45/1250 [00:15<06:11,  3.25it/s]

Training Loss: 1.3384947776794434


  4%|▎         | 46/1250 [00:15<06:12,  3.23it/s]

Training Loss: 1.4317340850830078


  4%|▍         | 47/1250 [00:15<06:11,  3.24it/s]

Training Loss: 1.4429831504821777


  4%|▍         | 48/1250 [00:15<06:10,  3.24it/s]

Training Loss: 1.372503399848938


  4%|▍         | 49/1250 [00:16<06:08,  3.26it/s]

Training Loss: 1.6291024684906006


  4%|▍         | 50/1250 [00:16<06:09,  3.25it/s]

Training Loss: 1.387541651725769


  4%|▍         | 51/1250 [00:16<06:10,  3.23it/s]

Training Loss: 1.5867960453033447


  4%|▍         | 52/1250 [00:17<06:10,  3.23it/s]

Training Loss: 1.289717197418213


  4%|▍         | 53/1250 [00:17<06:09,  3.24it/s]

Training Loss: 1.2484383583068848


  4%|▍         | 54/1250 [00:17<06:11,  3.22it/s]

Training Loss: 1.4860936403274536


  4%|▍         | 55/1250 [00:18<06:13,  3.20it/s]

Training Loss: 1.3487186431884766


  4%|▍         | 56/1250 [00:18<06:12,  3.21it/s]

Training Loss: 1.3201464414596558


  5%|▍         | 57/1250 [00:18<06:16,  3.17it/s]

Training Loss: 1.5700147151947021


  5%|▍         | 58/1250 [00:19<06:19,  3.14it/s]

Training Loss: 1.289605736732483


  5%|▍         | 59/1250 [00:19<06:20,  3.13it/s]

Training Loss: 1.3004997968673706


  5%|▍         | 60/1250 [00:19<06:19,  3.13it/s]

Training Loss: 1.1706146001815796


  5%|▍         | 61/1250 [00:20<06:17,  3.15it/s]

Training Loss: 1.3635965585708618


  5%|▍         | 62/1250 [00:20<06:12,  3.19it/s]

Training Loss: 1.6308979988098145


  5%|▌         | 63/1250 [00:20<06:10,  3.20it/s]

Training Loss: 1.2289732694625854


  5%|▌         | 64/1250 [00:20<06:08,  3.22it/s]

Training Loss: 1.6122709512710571


  5%|▌         | 65/1250 [00:21<06:07,  3.23it/s]

Training Loss: 1.360028624534607


  5%|▌         | 66/1250 [00:21<06:07,  3.22it/s]

Training Loss: 1.7521404027938843


  5%|▌         | 67/1250 [00:21<06:05,  3.24it/s]

Training Loss: 1.629397988319397


  5%|▌         | 68/1250 [00:22<06:04,  3.24it/s]

Training Loss: 1.3668395280838013


  6%|▌         | 69/1250 [00:22<06:06,  3.22it/s]

Training Loss: 1.6064053773880005


  6%|▌         | 70/1250 [00:22<06:04,  3.24it/s]

Training Loss: 1.5926634073257446


  6%|▌         | 71/1250 [00:23<06:04,  3.24it/s]

Training Loss: 1.5409563779830933


  6%|▌         | 72/1250 [00:23<06:03,  3.24it/s]

Training Loss: 1.5708686113357544


  6%|▌         | 73/1250 [00:23<06:02,  3.25it/s]

Training Loss: 1.3285695314407349


  6%|▌         | 74/1250 [00:24<06:03,  3.24it/s]

Training Loss: 1.3326148986816406


  6%|▌         | 75/1250 [00:24<06:03,  3.23it/s]

Training Loss: 1.7590670585632324


  6%|▌         | 76/1250 [00:24<06:03,  3.23it/s]

Training Loss: 1.7005043029785156


  6%|▌         | 77/1250 [00:24<06:05,  3.21it/s]

Training Loss: 1.4246703386306763


  6%|▌         | 78/1250 [00:25<06:03,  3.22it/s]

Training Loss: 1.3974769115447998


  6%|▋         | 79/1250 [00:25<06:05,  3.20it/s]

Training Loss: 1.7356007099151611


  6%|▋         | 80/1250 [00:25<06:05,  3.20it/s]

Training Loss: 1.343908667564392


  6%|▋         | 81/1250 [00:26<06:04,  3.21it/s]

Training Loss: 1.7320431470870972


  7%|▋         | 82/1250 [00:26<06:04,  3.21it/s]

Training Loss: 1.4826555252075195


  7%|▋         | 83/1250 [00:26<06:03,  3.21it/s]

Training Loss: 1.4758274555206299


  7%|▋         | 84/1250 [00:27<06:06,  3.18it/s]

Training Loss: 1.5274810791015625


  7%|▋         | 85/1250 [00:27<06:04,  3.19it/s]

Training Loss: 1.2548378705978394


  7%|▋         | 86/1250 [00:27<06:02,  3.21it/s]

Training Loss: 1.339328646659851


  7%|▋         | 87/1250 [00:28<06:02,  3.21it/s]

Training Loss: 1.4344382286071777


  7%|▋         | 88/1250 [00:28<06:02,  3.21it/s]

Training Loss: 1.5002392530441284


  7%|▋         | 89/1250 [00:28<06:00,  3.22it/s]

Training Loss: 1.1293119192123413


  7%|▋         | 90/1250 [00:29<06:00,  3.22it/s]

Training Loss: 1.5445142984390259


  7%|▋         | 91/1250 [00:29<06:00,  3.21it/s]

Training Loss: 1.3188698291778564


  7%|▋         | 92/1250 [00:29<06:03,  3.19it/s]

Training Loss: 1.4637812376022339


  7%|▋         | 93/1250 [00:29<06:06,  3.15it/s]

Training Loss: 1.375916600227356


  8%|▊         | 94/1250 [00:30<06:08,  3.14it/s]

Training Loss: 1.7080037593841553


  8%|▊         | 95/1250 [00:30<06:07,  3.14it/s]

Training Loss: 1.5715419054031372


  8%|▊         | 96/1250 [00:30<06:08,  3.13it/s]

Training Loss: 1.3493316173553467


  8%|▊         | 97/1250 [00:31<06:10,  3.11it/s]

Training Loss: 1.215606451034546


  8%|▊         | 98/1250 [00:31<06:11,  3.10it/s]

Training Loss: 1.471156358718872


  8%|▊         | 99/1250 [00:31<06:12,  3.09it/s]

Training Loss: 1.5981433391571045


  8%|▊         | 100/1250 [00:32<06:08,  3.12it/s]

Training Loss: 1.5311992168426514


  8%|▊         | 101/1250 [00:32<06:05,  3.14it/s]

Training Loss: 1.4830516576766968


  8%|▊         | 102/1250 [00:32<06:03,  3.16it/s]

Training Loss: 1.3944461345672607


  8%|▊         | 103/1250 [00:33<06:01,  3.17it/s]

Training Loss: 1.5846669673919678


  8%|▊         | 104/1250 [00:33<06:00,  3.18it/s]

Training Loss: 1.3164479732513428


  8%|▊         | 105/1250 [00:33<06:01,  3.16it/s]

Training Loss: 1.7271640300750732


  8%|▊         | 106/1250 [00:34<06:00,  3.18it/s]

Training Loss: 1.1860501766204834


  9%|▊         | 107/1250 [00:34<05:58,  3.19it/s]

Training Loss: 1.220859408378601


  9%|▊         | 108/1250 [00:34<05:59,  3.18it/s]

Training Loss: 1.2902530431747437


  9%|▊         | 109/1250 [00:35<05:59,  3.18it/s]

Training Loss: 1.2830967903137207


  9%|▉         | 110/1250 [00:35<05:59,  3.17it/s]

Training Loss: 1.1808764934539795


  9%|▉         | 111/1250 [00:35<05:58,  3.17it/s]

Training Loss: 1.2007685899734497


  9%|▉         | 112/1250 [00:35<05:57,  3.18it/s]

Training Loss: 1.3541908264160156


  9%|▉         | 113/1250 [00:36<05:57,  3.18it/s]

Training Loss: 1.6192209720611572


  9%|▉         | 114/1250 [00:36<05:57,  3.18it/s]

Training Loss: 1.5525264739990234


  9%|▉         | 115/1250 [00:36<05:58,  3.17it/s]

Training Loss: 1.4330034255981445


  9%|▉         | 116/1250 [00:37<05:58,  3.17it/s]

Training Loss: 1.3247919082641602


  9%|▉         | 117/1250 [00:37<05:58,  3.16it/s]

Training Loss: 1.626785159111023


  9%|▉         | 118/1250 [00:37<06:00,  3.14it/s]

Training Loss: 1.53643000125885


 10%|▉         | 119/1250 [00:38<05:59,  3.15it/s]

Training Loss: 1.412449598312378


 10%|▉         | 120/1250 [00:38<05:58,  3.15it/s]

Training Loss: 1.7149462699890137


 10%|▉         | 121/1250 [00:38<05:59,  3.14it/s]

Training Loss: 1.4677672386169434


 10%|▉         | 122/1250 [00:39<05:59,  3.14it/s]

Training Loss: 1.49590265750885


 10%|▉         | 123/1250 [00:39<05:56,  3.16it/s]

Training Loss: 1.3954715728759766


 10%|▉         | 124/1250 [00:39<05:56,  3.16it/s]

Training Loss: 1.502476453781128


 10%|█         | 125/1250 [00:40<05:56,  3.15it/s]

Training Loss: 1.7871344089508057


 10%|█         | 126/1250 [00:40<05:55,  3.16it/s]

Training Loss: 1.6312096118927002


 10%|█         | 127/1250 [00:40<05:54,  3.17it/s]

Training Loss: 1.4978681802749634


 10%|█         | 128/1250 [00:41<05:54,  3.16it/s]

Training Loss: 1.4827252626419067


 10%|█         | 129/1250 [00:41<05:55,  3.16it/s]

Training Loss: 1.669910192489624
Training Loss: 1.4910552501678467


 10%|█         | 131/1250 [00:42<05:58,  3.12it/s]

Training Loss: 1.6392699480056763


 11%|█         | 132/1250 [00:42<05:59,  3.11it/s]

Training Loss: 1.625590205192566


 11%|█         | 133/1250 [00:42<05:58,  3.11it/s]

Training Loss: 1.4920988082885742


 11%|█         | 134/1250 [00:42<05:59,  3.10it/s]

Training Loss: 1.467360258102417


 11%|█         | 135/1250 [00:43<05:59,  3.10it/s]

Training Loss: 1.3855087757110596


 11%|█         | 136/1250 [00:43<05:59,  3.10it/s]

Training Loss: 1.6105444431304932


 11%|█         | 137/1250 [00:43<06:02,  3.07it/s]

Training Loss: 1.4724576473236084


 11%|█         | 138/1250 [00:44<05:58,  3.10it/s]

Training Loss: 1.495155930519104


 11%|█         | 139/1250 [00:44<05:57,  3.11it/s]

Training Loss: 1.3761013746261597


 11%|█         | 140/1250 [00:44<05:56,  3.12it/s]

Training Loss: 1.5517356395721436


 11%|█▏        | 141/1250 [00:45<05:55,  3.12it/s]

Training Loss: 1.3567912578582764


 11%|█▏        | 142/1250 [00:45<05:53,  3.13it/s]

Training Loss: 1.6376440525054932


 11%|█▏        | 143/1250 [00:45<05:54,  3.12it/s]

Training Loss: 1.5439094305038452


 12%|█▏        | 144/1250 [00:46<05:51,  3.14it/s]

Training Loss: 1.3041090965270996


 12%|█▏        | 145/1250 [00:46<05:51,  3.15it/s]

Training Loss: 1.4465515613555908


 12%|█▏        | 146/1250 [00:46<05:51,  3.14it/s]

Training Loss: 1.4003976583480835


 12%|█▏        | 147/1250 [00:47<05:55,  3.10it/s]

Training Loss: 1.4377918243408203


 12%|█▏        | 148/1250 [00:47<05:53,  3.11it/s]

Training Loss: 1.2866445779800415


 12%|█▏        | 149/1250 [00:47<05:53,  3.11it/s]

Training Loss: 1.568477749824524


 12%|█▏        | 150/1250 [00:48<05:52,  3.12it/s]

Training Loss: 1.3835593461990356


 12%|█▏        | 151/1250 [00:48<05:47,  3.16it/s]

Training Loss: 1.286820650100708


 12%|█▏        | 152/1250 [00:48<05:47,  3.16it/s]

Training Loss: 1.259028673171997


 12%|█▏        | 153/1250 [00:49<05:46,  3.16it/s]

Training Loss: 1.3139081001281738


 12%|█▏        | 154/1250 [00:49<05:48,  3.15it/s]

Training Loss: 1.4436428546905518


 12%|█▏        | 155/1250 [00:49<05:49,  3.14it/s]

Training Loss: 1.1227821111679077


 12%|█▏        | 156/1250 [00:50<05:48,  3.14it/s]

Training Loss: 1.1036427021026611


 13%|█▎        | 157/1250 [00:50<05:49,  3.12it/s]

Training Loss: 1.5260475873947144


 13%|█▎        | 158/1250 [00:50<05:49,  3.13it/s]

Training Loss: 1.5670158863067627


 13%|█▎        | 159/1250 [00:50<05:50,  3.11it/s]

Training Loss: 2.0260300636291504


 13%|█▎        | 160/1250 [00:51<05:50,  3.11it/s]

Training Loss: 1.4574090242385864


 13%|█▎        | 161/1250 [00:51<05:51,  3.10it/s]

Training Loss: 1.3745683431625366


 13%|█▎        | 162/1250 [00:51<05:48,  3.12it/s]

Training Loss: 1.3375824689865112


 13%|█▎        | 163/1250 [00:52<05:48,  3.12it/s]

Training Loss: 1.5349286794662476


 13%|█▎        | 164/1250 [00:52<05:49,  3.11it/s]

Training Loss: 1.3152177333831787


 13%|█▎        | 165/1250 [00:52<05:49,  3.11it/s]

Training Loss: 1.5191211700439453


 13%|█▎        | 166/1250 [00:53<05:49,  3.10it/s]

Training Loss: 1.39347243309021


 13%|█▎        | 167/1250 [00:53<05:49,  3.10it/s]

Training Loss: 1.4674807786941528


 13%|█▎        | 168/1250 [00:53<05:48,  3.11it/s]

Training Loss: 1.3911813497543335


 14%|█▎        | 169/1250 [00:54<05:50,  3.09it/s]

Training Loss: 1.2543566226959229


 14%|█▎        | 170/1250 [00:54<05:50,  3.08it/s]

Training Loss: 1.2769296169281006


 14%|█▎        | 171/1250 [00:54<05:51,  3.07it/s]

Training Loss: 1.5341609716415405


 14%|█▍        | 172/1250 [00:55<05:51,  3.07it/s]

Training Loss: 1.2585604190826416
Training Loss: 1.373284101486206


 14%|█▍        | 174/1250 [00:55<05:54,  3.03it/s]

Training Loss: 1.4660398960113525
Training Loss: 1.5654712915420532


 14%|█▍        | 176/1250 [00:56<05:50,  3.06it/s]

Training Loss: 1.4449905157089233


 14%|█▍        | 177/1250 [00:56<05:48,  3.08it/s]

Training Loss: 1.4679551124572754


 14%|█▍        | 178/1250 [00:57<05:46,  3.09it/s]

Training Loss: 1.6900337934494019


 14%|█▍        | 179/1250 [00:57<05:45,  3.10it/s]

Training Loss: 1.3381845951080322


 14%|█▍        | 180/1250 [00:57<05:46,  3.08it/s]

Training Loss: 1.3277080059051514


 14%|█▍        | 181/1250 [00:58<05:45,  3.09it/s]

Training Loss: 1.5513098239898682


 15%|█▍        | 182/1250 [00:58<05:43,  3.11it/s]

Training Loss: 1.3672972917556763
Training Loss: 1.4244049787521362


 15%|█▍        | 184/1250 [00:59<05:43,  3.10it/s]

Training Loss: 1.4280729293823242


 15%|█▍        | 185/1250 [00:59<05:43,  3.10it/s]

Training Loss: 1.660723090171814


 15%|█▍        | 186/1250 [00:59<05:43,  3.10it/s]

Training Loss: 1.2940272092819214


 15%|█▍        | 187/1250 [01:00<05:43,  3.09it/s]

Training Loss: 1.6676831245422363
Training Loss: 1.585634708404541


 15%|█▌        | 188/1250 [01:00<05:45,  3.08it/s]

Training Loss: 1.4723458290100098


 15%|█▌        | 190/1250 [01:01<05:43,  3.09it/s]

Training Loss: 1.36161470413208


 15%|█▌        | 191/1250 [01:01<05:41,  3.10it/s]

Training Loss: 1.3467618227005005


 15%|█▌        | 192/1250 [01:01<05:39,  3.11it/s]

Training Loss: 1.279167652130127
Training Loss: 1.2575230598449707


 16%|█▌        | 194/1250 [01:02<05:40,  3.10it/s]

Training Loss: 1.7349843978881836


 16%|█▌        | 195/1250 [01:02<05:39,  3.11it/s]

Training Loss: 1.7614187002182007


 16%|█▌        | 196/1250 [01:02<05:39,  3.11it/s]

Training Loss: 1.5601059198379517


 16%|█▌        | 197/1250 [01:03<05:38,  3.11it/s]

Training Loss: 1.4293359518051147
Training Loss: 1.6944282054901123


 16%|█▌        | 198/1250 [01:03<05:40,  3.09it/s]

Training Loss: 1.510978102684021


 16%|█▌        | 200/1250 [01:04<05:39,  3.09it/s]

Training Loss: 1.6787883043289185


 16%|█▌        | 201/1250 [01:04<05:39,  3.09it/s]

Training Loss: 1.3074394464492798
Training Loss: 1.4990726709365845


 16%|█▌        | 203/1250 [01:05<05:40,  3.08it/s]

Training Loss: 1.2047661542892456


 16%|█▋        | 204/1250 [01:05<05:39,  3.08it/s]

Training Loss: 1.6340446472167969


 16%|█▋        | 205/1250 [01:05<05:39,  3.08it/s]

Training Loss: 1.486543893814087
Training Loss: 1.6185886859893799


 16%|█▋        | 206/1250 [01:06<05:44,  3.03it/s]

Training Loss: 1.1830040216445923


 17%|█▋        | 207/1250 [01:06<05:45,  3.02it/s]

Training Loss: 1.4413306713104248


 17%|█▋        | 208/1250 [01:06<05:47,  3.00it/s]

Training Loss: 1.6406712532043457


 17%|█▋        | 209/1250 [01:07<05:48,  2.98it/s]

Training Loss: 1.6212329864501953


 17%|█▋        | 210/1250 [01:07<05:51,  2.96it/s]

Training Loss: 1.4086394309997559


 17%|█▋        | 211/1250 [01:07<05:50,  2.97it/s]

Training Loss: 1.1856635808944702


 17%|█▋        | 212/1250 [01:08<05:49,  2.97it/s]

Training Loss: 1.4057230949401855


 17%|█▋        | 214/1250 [01:08<05:41,  3.03it/s]

Training Loss: 1.363115668296814


 17%|█▋        | 215/1250 [01:09<05:37,  3.07it/s]

Training Loss: 1.3132330179214478
Training Loss: 1.5665327310562134


 17%|█▋        | 216/1250 [01:09<05:37,  3.07it/s]

Training Loss: 1.5715153217315674


 17%|█▋        | 217/1250 [01:09<05:40,  3.03it/s]

Training Loss: 1.468619465827942


 17%|█▋        | 218/1250 [01:10<05:39,  3.04it/s]

Training Loss: 1.5977095365524292


 18%|█▊        | 219/1250 [01:10<05:39,  3.04it/s]

Training Loss: 1.7373610734939575


 18%|█▊        | 220/1250 [01:10<05:37,  3.05it/s]

Training Loss: 1.2611777782440186


 18%|█▊        | 221/1250 [01:11<05:39,  3.03it/s]

Training Loss: 1.2803661823272705


 18%|█▊        | 223/1250 [01:11<05:36,  3.05it/s]

Training Loss: 1.5591905117034912


 18%|█▊        | 224/1250 [01:12<05:35,  3.06it/s]

Training Loss: 1.2873327732086182
Training Loss: 1.6186212301254272


 18%|█▊        | 225/1250 [01:12<05:35,  3.06it/s]

Training Loss: 1.312718152999878


 18%|█▊        | 226/1250 [01:12<05:37,  3.03it/s]

Training Loss: 1.275750994682312


 18%|█▊        | 227/1250 [01:13<05:36,  3.04it/s]

Training Loss: 1.1738237142562866


 18%|█▊        | 228/1250 [01:13<05:36,  3.04it/s]

Training Loss: 1.4381994009017944


 18%|█▊        | 229/1250 [01:13<05:34,  3.05it/s]

Training Loss: 1.3982822895050049


 18%|█▊        | 230/1250 [01:14<05:34,  3.05it/s]

Training Loss: 1.5959925651550293


 19%|█▊        | 232/1250 [01:14<05:32,  3.06it/s]

Training Loss: 1.6548200845718384
Training Loss: 1.4196451902389526


 19%|█▊        | 234/1250 [01:15<05:32,  3.05it/s]

Training Loss: 1.3252246379852295
Training Loss: 1.2601231336593628


 19%|█▉        | 235/1250 [01:15<05:32,  3.05it/s]

Training Loss: 1.3000621795654297


 19%|█▉        | 236/1250 [01:16<05:33,  3.04it/s]

Training Loss: 1.3502634763717651


 19%|█▉        | 237/1250 [01:16<05:31,  3.06it/s]

Training Loss: 1.4434688091278076


 19%|█▉        | 238/1250 [01:16<05:33,  3.03it/s]

Training Loss: 1.280058741569519


 19%|█▉        | 239/1250 [01:17<05:32,  3.04it/s]

Training Loss: 1.3181134462356567


 19%|█▉        | 240/1250 [01:17<05:31,  3.04it/s]

Training Loss: 1.189072847366333


 19%|█▉        | 241/1250 [01:17<05:32,  3.04it/s]

Training Loss: 1.560369849205017


 19%|█▉        | 242/1250 [01:18<05:32,  3.03it/s]

Training Loss: 1.3474680185317993


 19%|█▉        | 243/1250 [01:18<05:33,  3.02it/s]

Training Loss: 1.2597203254699707


 20%|█▉        | 244/1250 [01:18<05:35,  3.00it/s]

Training Loss: 1.5327292680740356


 20%|█▉        | 245/1250 [01:19<05:36,  2.98it/s]

Training Loss: 1.3535797595977783


 20%|█▉        | 246/1250 [01:19<05:38,  2.97it/s]

Training Loss: 1.0582181215286255


 20%|█▉        | 247/1250 [01:19<05:38,  2.96it/s]

Training Loss: 1.550357699394226


 20%|█▉        | 248/1250 [01:20<05:39,  2.95it/s]

Training Loss: 1.3299510478973389


 20%|█▉        | 249/1250 [01:20<05:37,  2.97it/s]

Training Loss: 1.3499116897583008


 20%|██        | 250/1250 [01:20<05:34,  2.99it/s]

Training Loss: 1.2896875143051147


 20%|██        | 251/1250 [01:21<05:32,  3.00it/s]

Training Loss: 1.5082159042358398


 20%|██        | 252/1250 [01:21<05:31,  3.01it/s]

Training Loss: 1.4560483694076538


 20%|██        | 253/1250 [01:21<05:31,  3.01it/s]

Training Loss: 1.3603816032409668


 20%|██        | 254/1250 [01:22<05:31,  3.01it/s]

Training Loss: 1.5126851797103882


 20%|██        | 255/1250 [01:22<05:32,  3.00it/s]

Training Loss: 1.3542636632919312


 20%|██        | 256/1250 [01:22<05:31,  3.00it/s]

Training Loss: 1.1643412113189697


 21%|██        | 257/1250 [01:23<05:31,  2.99it/s]

Training Loss: 1.5126408338546753


 21%|██        | 258/1250 [01:23<05:30,  3.00it/s]

Training Loss: 1.4465000629425049


 21%|██        | 259/1250 [01:23<05:29,  3.01it/s]

Training Loss: 1.4415574073791504


 21%|██        | 261/1250 [01:24<05:27,  3.02it/s]

Training Loss: 1.5036464929580688
Training Loss: 1.6489850282669067


 21%|██        | 262/1250 [01:24<05:28,  3.00it/s]

Training Loss: 1.3048558235168457


 21%|██        | 263/1250 [01:25<05:27,  3.01it/s]

Training Loss: 1.1771659851074219


 21%|██        | 264/1250 [01:25<05:26,  3.02it/s]

Training Loss: 1.4709714651107788


 21%|██        | 265/1250 [01:25<05:28,  3.00it/s]

Training Loss: 1.5433976650238037


 21%|██▏       | 266/1250 [01:26<05:27,  3.00it/s]

Training Loss: 1.540065050125122


 21%|██▏       | 267/1250 [01:26<05:25,  3.02it/s]

Training Loss: 1.3112322092056274


 21%|██▏       | 268/1250 [01:26<05:28,  2.99it/s]

Training Loss: 1.246971845626831


 22%|██▏       | 269/1250 [01:27<05:28,  2.99it/s]

Training Loss: 1.530328631401062


 22%|██▏       | 270/1250 [01:27<05:25,  3.01it/s]

Training Loss: 1.99391770362854


 22%|██▏       | 271/1250 [01:27<05:26,  3.00it/s]

Training Loss: 1.4544036388397217


 22%|██▏       | 272/1250 [01:28<05:25,  3.01it/s]

Training Loss: 1.4364912509918213


 22%|██▏       | 273/1250 [01:28<05:25,  3.00it/s]

Training Loss: 1.4132503271102905


 22%|██▏       | 274/1250 [01:28<05:23,  3.01it/s]

Training Loss: 1.2763586044311523


 22%|██▏       | 275/1250 [01:29<05:24,  3.01it/s]

Training Loss: 1.2077789306640625


 22%|██▏       | 276/1250 [01:29<05:24,  3.00it/s]

Training Loss: 1.7153548002243042


 22%|██▏       | 277/1250 [01:29<05:23,  3.01it/s]

Training Loss: 1.680748462677002


 22%|██▏       | 279/1250 [01:30<05:23,  3.00it/s]

Training Loss: 1.1617281436920166
Training Loss: 1.564387321472168


 22%|██▏       | 280/1250 [01:30<05:26,  2.97it/s]

Training Loss: 1.3090225458145142


 22%|██▏       | 281/1250 [01:31<05:26,  2.97it/s]

Training Loss: 1.6519780158996582


 23%|██▎       | 282/1250 [01:31<05:26,  2.97it/s]

Training Loss: 1.5718797445297241


 23%|██▎       | 283/1250 [01:31<05:26,  2.96it/s]

Training Loss: 1.4489248991012573


 23%|██▎       | 284/1250 [01:32<05:26,  2.96it/s]

Training Loss: 1.4227943420410156


 23%|██▎       | 285/1250 [01:32<05:27,  2.95it/s]

Training Loss: 1.3299504518508911


 23%|██▎       | 286/1250 [01:32<05:25,  2.96it/s]

Training Loss: 1.3882579803466797


 23%|██▎       | 287/1250 [01:33<05:25,  2.96it/s]

Training Loss: 1.3095118999481201


 23%|██▎       | 288/1250 [01:33<05:24,  2.96it/s]

Training Loss: 1.255907416343689


 23%|██▎       | 289/1250 [01:33<05:23,  2.97it/s]

Training Loss: 1.3386924266815186


 23%|██▎       | 290/1250 [01:34<05:22,  2.97it/s]

Training Loss: 1.441144585609436


 23%|██▎       | 291/1250 [01:34<05:22,  2.97it/s]

Training Loss: 1.461229920387268


 23%|██▎       | 292/1250 [01:34<05:21,  2.98it/s]

Training Loss: 1.3913439512252808


 23%|██▎       | 293/1250 [01:35<05:21,  2.97it/s]

Training Loss: 1.2217941284179688


 24%|██▎       | 294/1250 [01:35<05:20,  2.98it/s]

Training Loss: 1.4180179834365845


 24%|██▎       | 295/1250 [01:35<05:20,  2.98it/s]

Training Loss: 1.1683658361434937


 24%|██▎       | 296/1250 [01:36<05:19,  2.98it/s]

Training Loss: 1.7006936073303223


 24%|██▍       | 297/1250 [01:36<05:22,  2.95it/s]

Training Loss: 1.6635462045669556


 24%|██▍       | 298/1250 [01:36<05:23,  2.94it/s]

Training Loss: 1.3209450244903564


 24%|██▍       | 299/1250 [01:37<05:22,  2.95it/s]

Training Loss: 1.3786035776138306


 24%|██▍       | 300/1250 [01:37<05:21,  2.95it/s]

Training Loss: 1.6488053798675537


 24%|██▍       | 301/1250 [01:37<05:20,  2.96it/s]

Training Loss: 1.7101070880889893


 24%|██▍       | 302/1250 [01:38<05:19,  2.97it/s]

Training Loss: 1.4062623977661133


 24%|██▍       | 303/1250 [01:38<05:20,  2.95it/s]

Training Loss: 1.2712948322296143


 24%|██▍       | 304/1250 [01:38<05:18,  2.97it/s]

Training Loss: 1.1590728759765625


 24%|██▍       | 305/1250 [01:39<05:18,  2.97it/s]

Training Loss: 1.3980828523635864


 24%|██▍       | 306/1250 [01:39<05:18,  2.96it/s]

Training Loss: 1.2373219728469849


 25%|██▍       | 307/1250 [01:39<05:17,  2.97it/s]

Training Loss: 1.2723292112350464


 25%|██▍       | 308/1250 [01:40<05:21,  2.93it/s]

Training Loss: 1.9026951789855957


 25%|██▍       | 309/1250 [01:40<05:19,  2.95it/s]

Training Loss: 1.6240568161010742


 25%|██▍       | 310/1250 [01:40<05:19,  2.94it/s]

Training Loss: 1.3625609874725342


 25%|██▍       | 311/1250 [01:41<05:18,  2.95it/s]

Training Loss: 1.5017991065979004


 25%|██▍       | 312/1250 [01:41<05:18,  2.94it/s]

Training Loss: 1.403831958770752


 25%|██▌       | 313/1250 [01:41<05:18,  2.94it/s]

Training Loss: 1.5208640098571777


 25%|██▌       | 314/1250 [01:42<05:17,  2.95it/s]

Training Loss: 1.7485098838806152


 25%|██▌       | 315/1250 [01:42<05:19,  2.93it/s]

Training Loss: 1.4098020792007446


 25%|██▌       | 316/1250 [01:42<05:21,  2.91it/s]

Training Loss: 1.3706746101379395


 25%|██▌       | 317/1250 [01:43<05:25,  2.86it/s]

Training Loss: 1.2917732000350952


 25%|██▌       | 318/1250 [01:43<05:24,  2.88it/s]

Training Loss: 1.2911419868469238


 26%|██▌       | 319/1250 [01:44<05:23,  2.88it/s]

Training Loss: 1.4420092105865479


 26%|██▌       | 320/1250 [01:44<05:23,  2.87it/s]

Training Loss: 1.5573973655700684


 26%|██▌       | 321/1250 [01:44<05:20,  2.90it/s]

Training Loss: 1.5406241416931152


 26%|██▌       | 322/1250 [01:45<05:20,  2.90it/s]

Training Loss: 1.2134497165679932


 26%|██▌       | 323/1250 [01:45<05:17,  2.92it/s]

Training Loss: 1.2364946603775024


 26%|██▌       | 324/1250 [01:45<05:16,  2.93it/s]

Training Loss: 1.36841881275177


 26%|██▌       | 325/1250 [01:46<05:15,  2.94it/s]

Training Loss: 1.2782645225524902


 26%|██▌       | 326/1250 [01:46<05:17,  2.91it/s]

Training Loss: 1.4486178159713745


 26%|██▌       | 327/1250 [01:46<05:15,  2.93it/s]

Training Loss: 1.3078138828277588


 26%|██▌       | 328/1250 [01:47<05:15,  2.93it/s]

Training Loss: 1.3524274826049805


 26%|██▋       | 329/1250 [01:47<05:14,  2.93it/s]

Training Loss: 1.243868350982666


 26%|██▋       | 330/1250 [01:47<05:14,  2.93it/s]

Training Loss: 1.5110067129135132


 26%|██▋       | 331/1250 [01:48<05:10,  2.96it/s]

Training Loss: 1.2507495880126953


 27%|██▋       | 332/1250 [01:48<05:11,  2.94it/s]

Training Loss: 1.2576543092727661


 27%|██▋       | 333/1250 [01:48<05:11,  2.94it/s]

Training Loss: 1.3866310119628906


 27%|██▋       | 334/1250 [01:49<05:10,  2.95it/s]

Training Loss: 1.3182185888290405


 27%|██▋       | 335/1250 [01:49<05:10,  2.94it/s]

Training Loss: 1.1368298530578613


 27%|██▋       | 336/1250 [01:49<05:10,  2.95it/s]

Training Loss: 1.2920173406600952


 27%|██▋       | 337/1250 [01:50<05:11,  2.93it/s]

Training Loss: 1.1541322469711304


 27%|██▋       | 338/1250 [01:50<05:10,  2.94it/s]

Training Loss: 1.5398437976837158


 27%|██▋       | 339/1250 [01:50<05:09,  2.94it/s]

Training Loss: 1.6914429664611816


 27%|██▋       | 340/1250 [01:51<05:10,  2.93it/s]

Training Loss: 1.7928593158721924


 27%|██▋       | 341/1250 [01:51<05:09,  2.93it/s]

Training Loss: 1.3681366443634033


 27%|██▋       | 342/1250 [01:51<05:10,  2.93it/s]

Training Loss: 1.0181524753570557


 27%|██▋       | 343/1250 [01:52<05:09,  2.93it/s]

Training Loss: 1.467092514038086


 28%|██▊       | 344/1250 [01:52<05:09,  2.93it/s]

Training Loss: 1.1686365604400635


 28%|██▊       | 345/1250 [01:52<05:09,  2.93it/s]

Training Loss: 1.1310659646987915


 28%|██▊       | 346/1250 [01:53<05:09,  2.92it/s]

Training Loss: 1.5071330070495605


 28%|██▊       | 347/1250 [01:53<05:09,  2.92it/s]

Training Loss: 1.338486909866333


 28%|██▊       | 348/1250 [01:53<05:09,  2.92it/s]

Training Loss: 1.5005548000335693


 28%|██▊       | 349/1250 [01:54<05:09,  2.91it/s]

Training Loss: 1.13346266746521


 28%|██▊       | 350/1250 [01:54<05:08,  2.91it/s]

Training Loss: 1.6291863918304443


 28%|██▊       | 351/1250 [01:54<05:10,  2.89it/s]

Training Loss: 1.677851915359497


 28%|██▊       | 352/1250 [01:55<05:11,  2.89it/s]

Training Loss: 1.3222503662109375


 28%|██▊       | 353/1250 [01:55<05:14,  2.86it/s]

Training Loss: 1.1136376857757568


 28%|██▊       | 354/1250 [01:56<05:14,  2.85it/s]

Training Loss: 1.3926069736480713


 28%|██▊       | 355/1250 [01:56<05:14,  2.85it/s]

Training Loss: 1.4597227573394775


 28%|██▊       | 356/1250 [01:56<05:13,  2.86it/s]

Training Loss: 1.1383501291275024


 29%|██▊       | 357/1250 [01:57<05:10,  2.88it/s]

Training Loss: 1.4302276372909546


 29%|██▊       | 358/1250 [01:57<05:09,  2.88it/s]

Training Loss: 1.489773154258728


 29%|██▊       | 359/1250 [01:57<05:07,  2.89it/s]

Training Loss: 1.534149169921875


 29%|██▉       | 360/1250 [01:58<05:07,  2.90it/s]

Training Loss: 1.2923307418823242


 29%|██▉       | 361/1250 [01:58<05:05,  2.91it/s]

Training Loss: 1.2829667329788208


 29%|██▉       | 362/1250 [01:58<05:04,  2.91it/s]

Training Loss: 1.5631237030029297


 29%|██▉       | 363/1250 [01:59<05:03,  2.93it/s]

Training Loss: 1.2907859086990356


 29%|██▉       | 364/1250 [01:59<05:02,  2.93it/s]

Training Loss: 1.7346173524856567


 29%|██▉       | 365/1250 [01:59<05:03,  2.92it/s]

Training Loss: 1.100643277168274


 29%|██▉       | 366/1250 [02:00<05:02,  2.92it/s]

Training Loss: 1.1217610836029053


 29%|██▉       | 367/1250 [02:00<05:01,  2.93it/s]

Training Loss: 1.4182448387145996


 29%|██▉       | 368/1250 [02:00<05:01,  2.93it/s]

Training Loss: 1.559200644493103


 30%|██▉       | 369/1250 [02:01<05:02,  2.91it/s]

Training Loss: 0.9398671388626099


 30%|██▉       | 370/1250 [02:01<05:01,  2.92it/s]

Training Loss: 1.540972352027893


 30%|██▉       | 371/1250 [02:01<05:00,  2.93it/s]

Training Loss: 1.2215032577514648


 30%|██▉       | 372/1250 [02:02<04:59,  2.93it/s]

Training Loss: 1.0288972854614258


 30%|██▉       | 373/1250 [02:02<05:00,  2.92it/s]

Training Loss: 1.4185810089111328


 30%|██▉       | 374/1250 [02:02<04:59,  2.92it/s]

Training Loss: 1.6183464527130127


 30%|███       | 375/1250 [02:03<05:00,  2.92it/s]

Training Loss: 1.2283172607421875


 30%|███       | 376/1250 [02:03<04:59,  2.92it/s]

Training Loss: 1.5160183906555176


 30%|███       | 377/1250 [02:03<04:59,  2.92it/s]

Training Loss: 1.3498579263687134


 30%|███       | 378/1250 [02:04<04:59,  2.91it/s]

Training Loss: 1.7671000957489014


 30%|███       | 379/1250 [02:04<05:00,  2.90it/s]

Training Loss: 1.3357638120651245


 30%|███       | 380/1250 [02:04<04:58,  2.91it/s]

Training Loss: 1.8826806545257568


 30%|███       | 381/1250 [02:05<04:59,  2.90it/s]

Training Loss: 1.2289328575134277


 31%|███       | 382/1250 [02:05<04:59,  2.90it/s]

Training Loss: 1.0951659679412842


 31%|███       | 383/1250 [02:05<04:59,  2.89it/s]

Training Loss: 1.3303184509277344


 31%|███       | 384/1250 [02:06<04:58,  2.90it/s]

Training Loss: 1.5740243196487427


 31%|███       | 385/1250 [02:06<04:57,  2.90it/s]

Training Loss: 1.3776389360427856


 31%|███       | 386/1250 [02:07<04:58,  2.90it/s]

Training Loss: 1.3098574876785278


 31%|███       | 387/1250 [02:07<05:00,  2.87it/s]

Training Loss: 1.3733762502670288


 31%|███       | 388/1250 [02:07<04:57,  2.89it/s]

Training Loss: 1.2631253004074097


 31%|███       | 389/1250 [02:08<05:02,  2.85it/s]

Training Loss: 1.3944871425628662


 31%|███       | 390/1250 [02:08<05:01,  2.85it/s]

Training Loss: 1.6583679914474487


 31%|███▏      | 391/1250 [02:08<05:01,  2.85it/s]

Training Loss: 1.339733362197876


 31%|███▏      | 392/1250 [02:09<04:59,  2.87it/s]

Training Loss: 1.382798671722412


 31%|███▏      | 393/1250 [02:09<04:57,  2.88it/s]

Training Loss: 1.0693875551223755


 32%|███▏      | 394/1250 [02:09<04:54,  2.91it/s]

Training Loss: 1.3550187349319458


 32%|███▏      | 395/1250 [02:10<04:54,  2.90it/s]

Training Loss: 1.4950604438781738


 32%|███▏      | 396/1250 [02:10<04:54,  2.90it/s]

Training Loss: 1.4783225059509277


 32%|███▏      | 397/1250 [02:10<04:52,  2.92it/s]

Training Loss: 1.3860396146774292


 32%|███▏      | 398/1250 [02:11<04:50,  2.94it/s]

Training Loss: 0.9808028340339661


 32%|███▏      | 399/1250 [02:11<04:50,  2.93it/s]

Training Loss: 1.680485486984253


 32%|███▏      | 400/1250 [02:11<04:50,  2.93it/s]

Training Loss: 1.3505440950393677


 32%|███▏      | 401/1250 [02:12<04:49,  2.93it/s]

Training Loss: 1.505273461341858


 32%|███▏      | 402/1250 [02:12<04:47,  2.95it/s]

Training Loss: 1.4501579999923706


 32%|███▏      | 403/1250 [02:12<04:47,  2.95it/s]

Training Loss: 1.5301584005355835


 32%|███▏      | 404/1250 [02:13<04:47,  2.94it/s]

Training Loss: 1.3341217041015625


 32%|███▏      | 405/1250 [02:13<04:47,  2.94it/s]

Training Loss: 1.1786233186721802


 32%|███▏      | 406/1250 [02:13<04:46,  2.94it/s]

Training Loss: 1.255989909172058


 33%|███▎      | 407/1250 [02:14<04:47,  2.93it/s]

Training Loss: 1.402673602104187


 33%|███▎      | 408/1250 [02:14<04:45,  2.95it/s]

Training Loss: 1.4863653182983398


 33%|███▎      | 409/1250 [02:14<04:47,  2.93it/s]

Training Loss: 1.5215648412704468


 33%|███▎      | 410/1250 [02:15<04:46,  2.93it/s]

Training Loss: 1.2927581071853638


 33%|███▎      | 411/1250 [02:15<04:47,  2.92it/s]

Training Loss: 1.2266929149627686


 33%|███▎      | 412/1250 [02:15<04:46,  2.92it/s]

Training Loss: 1.3369964361190796


 33%|███▎      | 413/1250 [02:16<04:46,  2.92it/s]

Training Loss: 1.6726880073547363


 33%|███▎      | 414/1250 [02:16<04:45,  2.93it/s]

Training Loss: 1.5978658199310303


 33%|███▎      | 415/1250 [02:16<04:44,  2.93it/s]

Training Loss: 1.2629458904266357


 33%|███▎      | 416/1250 [02:17<04:44,  2.93it/s]

Training Loss: 1.4811395406723022


 33%|███▎      | 417/1250 [02:17<04:44,  2.93it/s]

Training Loss: 1.5376492738723755


 33%|███▎      | 418/1250 [02:17<04:42,  2.95it/s]

Training Loss: 1.709107756614685


 34%|███▎      | 419/1250 [02:18<04:42,  2.94it/s]

Training Loss: 1.5284767150878906


 34%|███▎      | 420/1250 [02:18<04:41,  2.95it/s]

Training Loss: 1.255265474319458


 34%|███▎      | 421/1250 [02:19<04:41,  2.95it/s]

Training Loss: 1.2025330066680908


 34%|███▍      | 422/1250 [02:19<04:43,  2.92it/s]

Training Loss: 1.5064051151275635


 34%|███▍      | 423/1250 [02:19<04:45,  2.89it/s]

Training Loss: 1.2958862781524658


 34%|███▍      | 424/1250 [02:20<04:44,  2.90it/s]

Training Loss: 1.155190110206604


 34%|███▍      | 425/1250 [02:20<04:45,  2.89it/s]

Training Loss: 1.15151846408844


 34%|███▍      | 426/1250 [02:20<04:46,  2.88it/s]

Training Loss: 0.9692661166191101


 34%|███▍      | 427/1250 [02:21<04:44,  2.89it/s]

Training Loss: 1.1750361919403076


 34%|███▍      | 428/1250 [02:21<04:42,  2.91it/s]

Training Loss: 1.6546387672424316


 34%|███▍      | 429/1250 [02:21<04:39,  2.93it/s]

Training Loss: 1.4067810773849487


 34%|███▍      | 430/1250 [02:22<04:37,  2.95it/s]

Training Loss: 1.6365572214126587


 34%|███▍      | 431/1250 [02:22<04:37,  2.95it/s]

Training Loss: 1.4202513694763184


 35%|███▍      | 432/1250 [02:22<04:36,  2.96it/s]

Training Loss: 1.3789907693862915


 35%|███▍      | 433/1250 [02:23<04:35,  2.97it/s]

Training Loss: 1.1475105285644531


 35%|███▍      | 434/1250 [02:23<04:35,  2.96it/s]

Training Loss: 1.2534115314483643


 35%|███▍      | 435/1250 [02:23<04:35,  2.96it/s]

Training Loss: 1.4944720268249512


 35%|███▍      | 436/1250 [02:24<04:35,  2.95it/s]

Training Loss: 1.113552451133728


 35%|███▍      | 437/1250 [02:24<04:35,  2.95it/s]

Training Loss: 1.237625241279602


 35%|███▌      | 438/1250 [02:24<04:35,  2.95it/s]

Training Loss: 1.4140822887420654


 35%|███▌      | 439/1250 [02:25<04:34,  2.96it/s]

Training Loss: 1.2159537076950073


 35%|███▌      | 440/1250 [02:25<04:34,  2.95it/s]

Training Loss: 1.3869085311889648


 35%|███▌      | 441/1250 [02:25<04:33,  2.96it/s]

Training Loss: 1.2083996534347534


 35%|███▌      | 442/1250 [02:26<04:31,  2.98it/s]

Training Loss: 1.3159937858581543


 35%|███▌      | 443/1250 [02:26<04:30,  2.98it/s]

Training Loss: 1.2598499059677124


 36%|███▌      | 444/1250 [02:26<04:31,  2.97it/s]

Training Loss: 1.4753659963607788


 36%|███▌      | 445/1250 [02:27<04:31,  2.97it/s]

Training Loss: 1.4123753309249878


 36%|███▌      | 446/1250 [02:27<04:31,  2.96it/s]

Training Loss: 1.492209792137146


 36%|███▌      | 447/1250 [02:27<04:31,  2.96it/s]

Training Loss: 1.4928518533706665


 36%|███▌      | 448/1250 [02:28<04:30,  2.96it/s]

Training Loss: 1.349234700202942


 36%|███▌      | 449/1250 [02:28<04:31,  2.96it/s]

Training Loss: 1.6157835721969604


 36%|███▌      | 450/1250 [02:28<04:28,  2.98it/s]

Training Loss: 1.4731240272521973


 36%|███▌      | 451/1250 [02:29<04:29,  2.97it/s]

Training Loss: 1.3231390714645386


 36%|███▌      | 452/1250 [02:29<04:28,  2.97it/s]

Training Loss: 1.190682291984558


 36%|███▌      | 453/1250 [02:29<04:29,  2.96it/s]

Training Loss: 1.3833751678466797


 36%|███▋      | 454/1250 [02:30<04:27,  2.97it/s]

Training Loss: 1.1389671564102173


 36%|███▋      | 455/1250 [02:30<04:28,  2.96it/s]

Training Loss: 1.4132243394851685


 36%|███▋      | 456/1250 [02:30<04:27,  2.97it/s]

Training Loss: 1.4475414752960205


 37%|███▋      | 457/1250 [02:31<04:28,  2.96it/s]

Training Loss: 1.458741545677185


 37%|███▋      | 458/1250 [02:31<04:31,  2.92it/s]

Training Loss: 1.127978801727295


 37%|███▋      | 459/1250 [02:31<04:29,  2.94it/s]

Training Loss: 1.2981842756271362


 37%|███▋      | 460/1250 [02:32<04:29,  2.93it/s]

Training Loss: 1.4341251850128174


 37%|███▋      | 461/1250 [02:32<04:30,  2.92it/s]

Training Loss: 1.3307627439498901


 37%|███▋      | 462/1250 [02:32<04:29,  2.92it/s]

Training Loss: 1.3072980642318726


 37%|███▋      | 463/1250 [02:33<04:31,  2.90it/s]

Training Loss: 1.352942943572998


 37%|███▋      | 464/1250 [02:33<04:27,  2.93it/s]

Training Loss: 1.2580469846725464


 37%|███▋      | 465/1250 [02:33<04:27,  2.94it/s]

Training Loss: 1.41985023021698


 37%|███▋      | 466/1250 [02:34<04:24,  2.97it/s]

Training Loss: 1.4879875183105469


 37%|███▋      | 467/1250 [02:34<04:25,  2.95it/s]

Training Loss: 1.5031235218048096


 37%|███▋      | 468/1250 [02:34<04:23,  2.97it/s]

Training Loss: 0.9963787794113159


 38%|███▊      | 469/1250 [02:35<04:22,  2.97it/s]

Training Loss: 1.2999314069747925


 38%|███▊      | 470/1250 [02:35<04:24,  2.95it/s]

Training Loss: 1.5636121034622192


 38%|███▊      | 471/1250 [02:35<04:22,  2.97it/s]

Training Loss: 1.2987889051437378


 38%|███▊      | 472/1250 [02:36<04:22,  2.97it/s]

Training Loss: 1.3734532594680786


 38%|███▊      | 473/1250 [02:36<04:20,  2.98it/s]

Training Loss: 1.250468373298645


 38%|███▊      | 474/1250 [02:36<04:20,  2.98it/s]

Training Loss: 1.3482064008712769


 38%|███▊      | 475/1250 [02:37<04:20,  2.97it/s]

Training Loss: 1.2713865041732788


 38%|███▊      | 476/1250 [02:37<04:19,  2.98it/s]

Training Loss: 1.5291674137115479


 38%|███▊      | 477/1250 [02:37<04:19,  2.98it/s]

Training Loss: 1.1736747026443481


 38%|███▊      | 478/1250 [02:38<04:18,  2.98it/s]

Training Loss: 1.2713956832885742


 38%|███▊      | 479/1250 [02:38<04:18,  2.98it/s]

Training Loss: 1.36365807056427


 38%|███▊      | 480/1250 [02:38<04:17,  2.99it/s]

Training Loss: 1.389976143836975


 38%|███▊      | 481/1250 [02:39<04:18,  2.98it/s]

Training Loss: 1.3742724657058716


 39%|███▊      | 482/1250 [02:39<04:19,  2.96it/s]

Training Loss: 1.264846920967102


 39%|███▊      | 483/1250 [02:40<04:19,  2.96it/s]

Training Loss: 1.0859606266021729


 39%|███▊      | 484/1250 [02:40<04:18,  2.97it/s]

Training Loss: 1.1449180841445923


 39%|███▉      | 485/1250 [02:40<04:17,  2.97it/s]

Training Loss: 1.1451330184936523


 39%|███▉      | 486/1250 [02:41<04:17,  2.97it/s]

Training Loss: 1.6177725791931152


 39%|███▉      | 487/1250 [02:41<04:15,  2.98it/s]

Training Loss: 1.207184910774231


 39%|███▉      | 488/1250 [02:41<04:15,  2.99it/s]

Training Loss: 1.3125898838043213


 39%|███▉      | 489/1250 [02:42<04:14,  2.99it/s]

Training Loss: 1.339084506034851


 39%|███▉      | 490/1250 [02:42<04:13,  2.99it/s]

Training Loss: 1.5505565404891968


 39%|███▉      | 491/1250 [02:42<04:14,  2.98it/s]

Training Loss: 1.3289114236831665


 39%|███▉      | 492/1250 [02:43<04:15,  2.97it/s]

Training Loss: 1.5460753440856934


 39%|███▉      | 493/1250 [02:43<04:15,  2.96it/s]

Training Loss: 1.4371223449707031


 40%|███▉      | 494/1250 [02:43<04:17,  2.94it/s]

Training Loss: 1.506343960762024


 40%|███▉      | 495/1250 [02:44<04:18,  2.92it/s]

Training Loss: 1.1848844289779663


 40%|███▉      | 496/1250 [02:44<04:19,  2.91it/s]

Training Loss: 1.9142119884490967


 40%|███▉      | 497/1250 [02:44<04:19,  2.91it/s]

Training Loss: 1.2656848430633545


 40%|███▉      | 498/1250 [02:45<04:16,  2.93it/s]

Training Loss: 1.4757851362228394


 40%|███▉      | 499/1250 [02:45<04:16,  2.93it/s]

Training Loss: 1.5450423955917358


 40%|████      | 500/1250 [02:45<04:14,  2.94it/s]

Training Loss: 1.615387201309204


 40%|████      | 501/1250 [02:46<04:13,  2.96it/s]

Training Loss: 1.8288828134536743


 40%|████      | 502/1250 [02:46<04:12,  2.96it/s]

Training Loss: 1.4765704870224


 40%|████      | 503/1250 [02:46<04:11,  2.98it/s]

Training Loss: 1.5064949989318848


 40%|████      | 504/1250 [02:47<04:10,  2.98it/s]

Training Loss: 1.659300684928894


 40%|████      | 505/1250 [02:47<04:10,  2.97it/s]

Training Loss: 1.169251799583435


 40%|████      | 506/1250 [02:47<04:09,  2.98it/s]

Training Loss: 1.2211881875991821


 41%|████      | 507/1250 [02:48<04:08,  2.98it/s]

Training Loss: 1.1707990169525146


 41%|████      | 508/1250 [02:48<04:09,  2.98it/s]

Training Loss: 1.3638265132904053


 41%|████      | 509/1250 [02:48<04:09,  2.97it/s]

Training Loss: 1.434583306312561


 41%|████      | 510/1250 [02:49<04:08,  2.98it/s]

Training Loss: 1.449022650718689


 41%|████      | 511/1250 [02:49<04:08,  2.98it/s]

Training Loss: 1.2306939363479614


 41%|████      | 512/1250 [02:49<04:09,  2.96it/s]

Training Loss: 1.45150625705719


 41%|████      | 513/1250 [02:50<04:08,  2.96it/s]

Training Loss: 1.3001140356063843


 41%|████      | 514/1250 [02:50<04:08,  2.97it/s]

Training Loss: 1.5077314376831055


 41%|████      | 515/1250 [02:50<04:08,  2.96it/s]

Training Loss: 1.3405969142913818


 41%|████▏     | 516/1250 [02:51<04:07,  2.96it/s]

Training Loss: 1.6018491983413696


 41%|████▏     | 517/1250 [02:51<04:07,  2.97it/s]

Training Loss: 1.4310294389724731


 41%|████▏     | 518/1250 [02:51<04:08,  2.95it/s]

Training Loss: 1.522926926612854


 42%|████▏     | 519/1250 [02:52<04:07,  2.95it/s]

Training Loss: 1.3406342267990112


 42%|████▏     | 520/1250 [02:52<04:06,  2.96it/s]

Training Loss: 1.3965959548950195


 42%|████▏     | 521/1250 [02:52<04:04,  2.98it/s]

Training Loss: 1.1455152034759521


 42%|████▏     | 522/1250 [02:53<04:04,  2.98it/s]

Training Loss: 1.0851850509643555


 42%|████▏     | 523/1250 [02:53<04:05,  2.97it/s]

Training Loss: 1.2087712287902832


 42%|████▏     | 524/1250 [02:53<04:03,  2.98it/s]

Training Loss: 1.4145292043685913


 42%|████▏     | 525/1250 [02:54<04:03,  2.98it/s]

Training Loss: 1.3737800121307373


 42%|████▏     | 526/1250 [02:54<04:04,  2.97it/s]

Training Loss: 1.1422839164733887


 42%|████▏     | 527/1250 [02:54<04:03,  2.97it/s]

Training Loss: 1.5527927875518799


 42%|████▏     | 528/1250 [02:55<04:03,  2.97it/s]

Training Loss: 1.261257290840149


 42%|████▏     | 529/1250 [02:55<04:03,  2.96it/s]

Training Loss: 1.6004191637039185


 42%|████▏     | 530/1250 [02:55<04:05,  2.93it/s]

Training Loss: 1.1367160081863403


 42%|████▏     | 531/1250 [02:56<04:07,  2.91it/s]

Training Loss: 1.5522547960281372


 43%|████▎     | 532/1250 [02:56<04:06,  2.91it/s]

Training Loss: 1.3184561729431152


 43%|████▎     | 533/1250 [02:56<04:04,  2.93it/s]

Training Loss: 1.1404345035552979


 43%|████▎     | 534/1250 [02:57<04:04,  2.93it/s]

Training Loss: 1.1436432600021362


 43%|████▎     | 535/1250 [02:57<04:04,  2.92it/s]

Training Loss: 1.4224692583084106


 43%|████▎     | 536/1250 [02:57<04:03,  2.94it/s]

Training Loss: 1.5805584192276


 43%|████▎     | 537/1250 [02:58<04:02,  2.94it/s]

Training Loss: 1.2263364791870117


 43%|████▎     | 538/1250 [02:58<04:00,  2.96it/s]

Training Loss: 1.3912696838378906


 43%|████▎     | 539/1250 [02:58<03:59,  2.96it/s]

Training Loss: 1.5122548341751099


 43%|████▎     | 540/1250 [02:59<03:59,  2.96it/s]

Training Loss: 1.448560357093811


 43%|████▎     | 541/1250 [02:59<03:59,  2.96it/s]

Training Loss: 1.0381776094436646


 43%|████▎     | 542/1250 [02:59<03:59,  2.96it/s]

Training Loss: 1.3881471157073975


 43%|████▎     | 543/1250 [03:00<03:58,  2.96it/s]

Training Loss: 0.8935441970825195


 44%|████▎     | 544/1250 [03:00<03:57,  2.97it/s]

Training Loss: 1.6508469581604004


 44%|████▎     | 545/1250 [03:00<03:58,  2.96it/s]

Training Loss: 1.2250903844833374


 44%|████▎     | 546/1250 [03:01<03:57,  2.97it/s]

Training Loss: 1.1748257875442505


 44%|████▍     | 547/1250 [03:01<03:56,  2.97it/s]

Training Loss: 1.0384294986724854


 44%|████▍     | 548/1250 [03:01<03:55,  2.97it/s]

Training Loss: 1.1562355756759644


 44%|████▍     | 549/1250 [03:02<03:54,  2.98it/s]

Training Loss: 1.3681776523590088


 44%|████▍     | 550/1250 [03:02<03:54,  2.98it/s]

Training Loss: 1.3267645835876465


 44%|████▍     | 551/1250 [03:02<03:55,  2.97it/s]

Training Loss: 1.6892876625061035


 44%|████▍     | 552/1250 [03:03<03:55,  2.97it/s]

Training Loss: 1.4989346265792847


 44%|████▍     | 553/1250 [03:03<03:53,  2.98it/s]

Training Loss: 1.1604976654052734


 44%|████▍     | 554/1250 [03:03<03:53,  2.98it/s]

Training Loss: 1.2872408628463745


 44%|████▍     | 555/1250 [03:04<03:52,  2.99it/s]

Training Loss: 1.1736538410186768


 44%|████▍     | 556/1250 [03:04<03:53,  2.98it/s]

Training Loss: 1.6617441177368164


 45%|████▍     | 557/1250 [03:04<03:52,  2.98it/s]

Training Loss: 1.5290898084640503


 45%|████▍     | 558/1250 [03:05<03:52,  2.98it/s]

Training Loss: 1.5804064273834229


 45%|████▍     | 559/1250 [03:05<03:53,  2.96it/s]

Training Loss: 1.443381667137146


 45%|████▍     | 560/1250 [03:06<03:53,  2.95it/s]

Training Loss: 1.5193252563476562


 45%|████▍     | 561/1250 [03:06<03:53,  2.95it/s]

Training Loss: 1.1695947647094727


 45%|████▍     | 562/1250 [03:06<03:51,  2.97it/s]

Training Loss: 1.5890787839889526


 45%|████▌     | 563/1250 [03:07<03:52,  2.95it/s]

Training Loss: 1.5171337127685547


 45%|████▌     | 564/1250 [03:07<03:51,  2.96it/s]

Training Loss: 1.7697999477386475


 45%|████▌     | 565/1250 [03:07<03:52,  2.95it/s]

Training Loss: 1.2618680000305176


 45%|████▌     | 566/1250 [03:08<03:53,  2.92it/s]

Training Loss: 1.3493003845214844


 45%|████▌     | 567/1250 [03:08<03:55,  2.90it/s]

Training Loss: 1.4860478639602661


 45%|████▌     | 568/1250 [03:08<03:54,  2.91it/s]

Training Loss: 1.1676331758499146


 46%|████▌     | 569/1250 [03:09<03:54,  2.90it/s]

Training Loss: 1.4064805507659912


 46%|████▌     | 570/1250 [03:09<03:54,  2.90it/s]

Training Loss: 1.5007623434066772


 46%|████▌     | 571/1250 [03:09<03:54,  2.90it/s]

Training Loss: 1.2274090051651


 46%|████▌     | 572/1250 [03:10<03:52,  2.91it/s]

Training Loss: 1.1920626163482666


 46%|████▌     | 573/1250 [03:10<03:52,  2.92it/s]

Training Loss: 1.1596496105194092


 46%|████▌     | 574/1250 [03:10<03:50,  2.93it/s]

Training Loss: 1.4279664754867554


 46%|████▌     | 575/1250 [03:11<03:48,  2.95it/s]

Training Loss: 1.2344812154769897


 46%|████▌     | 576/1250 [03:11<03:48,  2.95it/s]

Training Loss: 1.4742951393127441


 46%|████▌     | 577/1250 [03:11<03:49,  2.94it/s]

Training Loss: 1.3426437377929688


 46%|████▌     | 578/1250 [03:12<03:47,  2.95it/s]

Training Loss: 1.5849559307098389


 46%|████▋     | 579/1250 [03:12<03:49,  2.92it/s]

Training Loss: 1.0988214015960693


 46%|████▋     | 580/1250 [03:12<03:48,  2.93it/s]

Training Loss: 1.236633539199829


 46%|████▋     | 581/1250 [03:13<03:47,  2.93it/s]

Training Loss: 1.0356054306030273


 47%|████▋     | 582/1250 [03:13<03:47,  2.93it/s]

Training Loss: 1.615141749382019


 47%|████▋     | 583/1250 [03:13<03:47,  2.93it/s]

Training Loss: 1.3290685415267944


 47%|████▋     | 584/1250 [03:14<03:46,  2.94it/s]

Training Loss: 1.3043357133865356


 47%|████▋     | 585/1250 [03:14<03:45,  2.95it/s]

Training Loss: 1.2032272815704346


 47%|████▋     | 586/1250 [03:14<03:45,  2.94it/s]

Training Loss: 1.4525281190872192


 47%|████▋     | 587/1250 [03:15<03:45,  2.94it/s]

Training Loss: 1.2606490850448608


 47%|████▋     | 588/1250 [03:15<03:45,  2.94it/s]

Training Loss: 1.584515929222107


 47%|████▋     | 589/1250 [03:15<03:44,  2.94it/s]

Training Loss: 1.4633758068084717


 47%|████▋     | 590/1250 [03:16<03:43,  2.96it/s]

Training Loss: 1.2000017166137695


 47%|████▋     | 591/1250 [03:16<03:42,  2.96it/s]

Training Loss: 1.1940759420394897


 47%|████▋     | 592/1250 [03:16<03:42,  2.95it/s]

Training Loss: 1.226290225982666


 47%|████▋     | 593/1250 [03:17<03:42,  2.95it/s]

Training Loss: 1.1150364875793457


 48%|████▊     | 594/1250 [03:17<03:43,  2.94it/s]

Training Loss: 1.42983877658844


 48%|████▊     | 595/1250 [03:17<03:42,  2.94it/s]

Training Loss: 1.2641663551330566


 48%|████▊     | 596/1250 [03:18<03:42,  2.94it/s]

Training Loss: 1.1514089107513428


 48%|████▊     | 597/1250 [03:18<03:42,  2.93it/s]

Training Loss: 1.7114330530166626


 48%|████▊     | 598/1250 [03:18<03:42,  2.94it/s]

Training Loss: 1.232789158821106


 48%|████▊     | 599/1250 [03:19<03:42,  2.92it/s]

Training Loss: 1.0669090747833252


 48%|████▊     | 600/1250 [03:19<03:42,  2.92it/s]

Training Loss: 1.0082300901412964


 48%|████▊     | 601/1250 [03:19<03:42,  2.91it/s]

Training Loss: 1.2406091690063477


 48%|████▊     | 602/1250 [03:20<03:42,  2.92it/s]

Training Loss: 1.1703240871429443


 48%|████▊     | 603/1250 [03:20<03:40,  2.93it/s]

Training Loss: 1.0941399335861206


 48%|████▊     | 604/1250 [03:21<03:40,  2.93it/s]

Training Loss: 1.3382617235183716


 48%|████▊     | 605/1250 [03:21<03:41,  2.91it/s]

Training Loss: 1.2774746417999268


 48%|████▊     | 606/1250 [03:21<03:42,  2.90it/s]

Training Loss: 0.9875978231430054


 49%|████▊     | 607/1250 [03:22<03:41,  2.91it/s]

Training Loss: 1.3262273073196411


 49%|████▊     | 608/1250 [03:22<03:41,  2.90it/s]

Training Loss: 1.5651017427444458


 49%|████▊     | 609/1250 [03:22<03:39,  2.92it/s]

Training Loss: 1.3154685497283936


 49%|████▉     | 610/1250 [03:23<03:39,  2.92it/s]

Training Loss: 1.3588677644729614


 49%|████▉     | 611/1250 [03:23<03:38,  2.93it/s]

Training Loss: 1.6145364046096802


 49%|████▉     | 612/1250 [03:23<03:37,  2.93it/s]

Training Loss: 1.34572434425354


 49%|████▉     | 613/1250 [03:24<03:37,  2.93it/s]

Training Loss: 1.4438526630401611


 49%|████▉     | 614/1250 [03:24<03:37,  2.92it/s]

Training Loss: 1.6363006830215454


 49%|████▉     | 615/1250 [03:24<03:36,  2.93it/s]

Training Loss: 1.7714715003967285


 49%|████▉     | 616/1250 [03:25<03:36,  2.93it/s]

Training Loss: 1.1854097843170166


 49%|████▉     | 617/1250 [03:25<03:36,  2.92it/s]

Training Loss: 1.6038588285446167


 49%|████▉     | 618/1250 [03:25<03:35,  2.93it/s]

Training Loss: 1.717522144317627


 50%|████▉     | 619/1250 [03:26<03:35,  2.92it/s]

Training Loss: 1.2988831996917725


 50%|████▉     | 620/1250 [03:26<03:35,  2.93it/s]

Training Loss: 1.397355556488037


 50%|████▉     | 621/1250 [03:26<03:34,  2.93it/s]

Training Loss: 1.2481470108032227


 50%|████▉     | 622/1250 [03:27<03:34,  2.93it/s]

Training Loss: 1.0877931118011475


 50%|████▉     | 623/1250 [03:27<03:33,  2.93it/s]

Training Loss: 1.3670068979263306


 50%|████▉     | 624/1250 [03:27<03:34,  2.92it/s]

Training Loss: 1.4950112104415894


 50%|█████     | 625/1250 [03:28<03:34,  2.92it/s]

Training Loss: 1.2841085195541382


 50%|█████     | 626/1250 [03:28<03:33,  2.92it/s]

Training Loss: 1.0630316734313965


 50%|█████     | 627/1250 [03:28<03:32,  2.93it/s]

Training Loss: 1.4935991764068604


 50%|█████     | 628/1250 [03:29<03:31,  2.93it/s]

Training Loss: 1.3839339017868042


 50%|█████     | 629/1250 [03:29<03:31,  2.94it/s]

Training Loss: 1.4850189685821533


 50%|█████     | 630/1250 [03:29<03:31,  2.93it/s]

Training Loss: 1.333345651626587


 50%|█████     | 631/1250 [03:30<03:31,  2.92it/s]

Training Loss: 1.244296908378601


 51%|█████     | 632/1250 [03:30<03:30,  2.93it/s]

Training Loss: 1.3765078783035278


 51%|█████     | 633/1250 [03:30<03:31,  2.91it/s]

Training Loss: 1.7240221500396729


 51%|█████     | 634/1250 [03:31<03:29,  2.93it/s]

Training Loss: 1.3753911256790161


 51%|█████     | 635/1250 [03:31<03:30,  2.92it/s]

Training Loss: 1.5420957803726196


 51%|█████     | 636/1250 [03:31<03:31,  2.90it/s]

Training Loss: 1.1859842538833618


 51%|█████     | 637/1250 [03:32<03:31,  2.90it/s]

Training Loss: 1.3984527587890625


 51%|█████     | 638/1250 [03:32<03:32,  2.88it/s]

Training Loss: 1.39764404296875


 51%|█████     | 639/1250 [03:33<03:31,  2.89it/s]

Training Loss: 1.5955090522766113


 51%|█████     | 640/1250 [03:33<03:30,  2.90it/s]

Training Loss: 1.2585283517837524


 51%|█████▏    | 641/1250 [03:33<03:31,  2.88it/s]

Training Loss: 1.3547762632369995


 51%|█████▏    | 642/1250 [03:34<03:30,  2.89it/s]

Training Loss: 1.3121495246887207


 51%|█████▏    | 643/1250 [03:34<03:28,  2.92it/s]

Training Loss: 1.3017208576202393


 52%|█████▏    | 644/1250 [03:34<03:27,  2.92it/s]

Training Loss: 1.5352818965911865


 52%|█████▏    | 645/1250 [03:35<03:26,  2.93it/s]

Training Loss: 1.372089147567749


 52%|█████▏    | 646/1250 [03:35<03:26,  2.93it/s]

Training Loss: 1.6097400188446045


 52%|█████▏    | 647/1250 [03:35<03:25,  2.93it/s]

Training Loss: 1.3775954246520996


 52%|█████▏    | 648/1250 [03:36<03:24,  2.94it/s]

Training Loss: 1.2138088941574097


 52%|█████▏    | 649/1250 [03:36<03:25,  2.92it/s]

Training Loss: 1.1983407735824585


 52%|█████▏    | 650/1250 [03:36<03:24,  2.93it/s]

Training Loss: 1.4452698230743408


 52%|█████▏    | 651/1250 [03:37<03:23,  2.94it/s]

Training Loss: 1.504093885421753


 52%|█████▏    | 652/1250 [03:37<03:23,  2.94it/s]

Training Loss: 0.9530170559883118


 52%|█████▏    | 653/1250 [03:37<03:22,  2.94it/s]

Training Loss: 1.3563921451568604


 52%|█████▏    | 654/1250 [03:38<03:22,  2.94it/s]

Training Loss: 1.4974535703659058


 52%|█████▏    | 655/1250 [03:38<03:22,  2.94it/s]

Training Loss: 1.4239929914474487


 52%|█████▏    | 656/1250 [03:38<03:22,  2.93it/s]

Training Loss: 1.4592121839523315


 53%|█████▎    | 657/1250 [03:39<03:20,  2.95it/s]

Training Loss: 1.191227912902832


 53%|█████▎    | 658/1250 [03:39<03:20,  2.95it/s]

Training Loss: 1.3246097564697266


 53%|█████▎    | 659/1250 [03:39<03:20,  2.94it/s]

Training Loss: 1.3594813346862793


 53%|█████▎    | 660/1250 [03:40<03:20,  2.94it/s]

Training Loss: 1.243390679359436


 53%|█████▎    | 661/1250 [03:40<03:20,  2.94it/s]

Training Loss: 1.6089338064193726


 53%|█████▎    | 662/1250 [03:40<03:18,  2.96it/s]

Training Loss: 1.4473427534103394


 53%|█████▎    | 663/1250 [03:41<03:20,  2.93it/s]

Training Loss: 1.382286787033081


 53%|█████▎    | 664/1250 [03:41<03:19,  2.94it/s]

Training Loss: 1.1267385482788086


 53%|█████▎    | 665/1250 [03:41<03:19,  2.93it/s]

Training Loss: 1.1223994493484497


 53%|█████▎    | 666/1250 [03:42<03:17,  2.95it/s]

Training Loss: 1.7328768968582153


 53%|█████▎    | 667/1250 [03:42<03:18,  2.94it/s]

Training Loss: 1.0852978229522705


 53%|█████▎    | 668/1250 [03:42<03:17,  2.94it/s]

Training Loss: 1.5577822923660278


 54%|█████▎    | 669/1250 [03:43<03:17,  2.94it/s]

Training Loss: 1.8910391330718994


 54%|█████▎    | 670/1250 [03:43<03:17,  2.93it/s]

Training Loss: 1.3584771156311035


 54%|█████▎    | 671/1250 [03:43<03:18,  2.92it/s]

Training Loss: 1.317110538482666


 54%|█████▍    | 672/1250 [03:44<03:17,  2.92it/s]

Training Loss: 1.1599782705307007


 54%|█████▍    | 673/1250 [03:44<03:16,  2.93it/s]

Training Loss: 1.4307317733764648


 54%|█████▍    | 674/1250 [03:44<03:16,  2.94it/s]

Training Loss: 1.243649959564209


 54%|█████▍    | 675/1250 [03:45<03:16,  2.93it/s]

Training Loss: 1.7486863136291504


 54%|█████▍    | 676/1250 [03:45<03:17,  2.90it/s]

Training Loss: 1.0522857904434204


 54%|█████▍    | 677/1250 [03:45<03:16,  2.91it/s]

Training Loss: 1.3352415561676025


 54%|█████▍    | 678/1250 [03:46<03:15,  2.92it/s]

Training Loss: 1.3066067695617676


 54%|█████▍    | 679/1250 [03:46<03:13,  2.95it/s]

Training Loss: 1.405945897102356


 54%|█████▍    | 680/1250 [03:46<03:14,  2.93it/s]

Training Loss: 1.2426133155822754


 54%|█████▍    | 681/1250 [03:47<03:13,  2.95it/s]

Training Loss: 1.459119439125061


 55%|█████▍    | 682/1250 [03:47<03:12,  2.94it/s]

Training Loss: 1.322572946548462


 55%|█████▍    | 683/1250 [03:47<03:12,  2.95it/s]

Training Loss: 1.3015642166137695


 55%|█████▍    | 684/1250 [03:48<03:13,  2.93it/s]

Training Loss: 1.3309664726257324


 55%|█████▍    | 685/1250 [03:48<03:13,  2.91it/s]

Training Loss: 1.152642846107483


 55%|█████▍    | 686/1250 [03:49<03:13,  2.91it/s]

Training Loss: 1.2779971361160278


 55%|█████▍    | 687/1250 [03:49<03:12,  2.92it/s]

Training Loss: 1.5474774837493896


 55%|█████▌    | 688/1250 [03:49<03:12,  2.92it/s]

Training Loss: 1.1216226816177368


 55%|█████▌    | 689/1250 [03:50<03:11,  2.94it/s]

Training Loss: 1.181626558303833


 55%|█████▌    | 690/1250 [03:50<03:10,  2.94it/s]

Training Loss: 1.0832587480545044


 55%|█████▌    | 691/1250 [03:50<03:09,  2.94it/s]

Training Loss: 1.4460481405258179


 55%|█████▌    | 692/1250 [03:51<03:09,  2.94it/s]

Training Loss: 1.250818133354187


 55%|█████▌    | 693/1250 [03:51<03:08,  2.95it/s]

Training Loss: 1.2337514162063599


 56%|█████▌    | 694/1250 [03:51<03:08,  2.95it/s]

Training Loss: 1.560415267944336


 56%|█████▌    | 695/1250 [03:52<03:08,  2.94it/s]

Training Loss: 1.2130612134933472


 56%|█████▌    | 696/1250 [03:52<03:08,  2.94it/s]

Training Loss: 1.2161240577697754


 56%|█████▌    | 697/1250 [03:52<03:08,  2.93it/s]

Training Loss: 1.3635669946670532


 56%|█████▌    | 698/1250 [03:53<03:07,  2.94it/s]

Training Loss: 1.4633333683013916


 56%|█████▌    | 699/1250 [03:53<03:08,  2.92it/s]

Training Loss: 1.3598036766052246


 56%|█████▌    | 700/1250 [03:53<03:07,  2.93it/s]

Training Loss: 1.5482362508773804


 56%|█████▌    | 701/1250 [03:54<03:07,  2.93it/s]

Training Loss: 1.193253755569458


 56%|█████▌    | 702/1250 [03:54<03:06,  2.94it/s]

Training Loss: 1.2059165239334106


 56%|█████▌    | 703/1250 [03:54<03:05,  2.94it/s]

Training Loss: 1.2240291833877563


 56%|█████▋    | 704/1250 [03:55<03:05,  2.95it/s]

Training Loss: 1.1181477308273315


 56%|█████▋    | 705/1250 [03:55<03:04,  2.95it/s]

Training Loss: 1.026980996131897


 56%|█████▋    | 706/1250 [03:55<03:07,  2.91it/s]

Training Loss: 1.0016144514083862


 57%|█████▋    | 707/1250 [03:56<03:06,  2.92it/s]

Training Loss: 1.219215989112854


 57%|█████▋    | 708/1250 [03:56<03:06,  2.91it/s]

Training Loss: 1.3080549240112305


 57%|█████▋    | 709/1250 [03:56<03:07,  2.89it/s]

Training Loss: 1.0929316282272339


 57%|█████▋    | 710/1250 [03:57<03:07,  2.89it/s]

Training Loss: 1.2977628707885742


 57%|█████▋    | 711/1250 [03:57<03:06,  2.89it/s]

Training Loss: 1.496664047241211


 57%|█████▋    | 712/1250 [03:57<03:04,  2.92it/s]

Training Loss: 1.1727862358093262


 57%|█████▋    | 713/1250 [03:58<03:03,  2.93it/s]

Training Loss: 1.2311049699783325


 57%|█████▋    | 714/1250 [03:58<03:02,  2.94it/s]

Training Loss: 1.2470799684524536


 57%|█████▋    | 715/1250 [03:58<03:01,  2.95it/s]

Training Loss: 0.8199684023857117


 57%|█████▋    | 716/1250 [03:59<03:01,  2.95it/s]

Training Loss: 0.9430028200149536


 57%|█████▋    | 717/1250 [03:59<03:00,  2.95it/s]

Training Loss: 1.499309778213501


 57%|█████▋    | 718/1250 [03:59<03:00,  2.95it/s]

Training Loss: 2.1000616550445557


 58%|█████▊    | 719/1250 [04:00<02:59,  2.95it/s]

Training Loss: 1.3094481229782104


 58%|█████▊    | 720/1250 [04:00<03:00,  2.94it/s]

Training Loss: 1.1533809900283813


 58%|█████▊    | 721/1250 [04:00<02:59,  2.95it/s]

Training Loss: 1.426474690437317


 58%|█████▊    | 722/1250 [04:01<02:58,  2.96it/s]

Training Loss: 1.159549593925476


 58%|█████▊    | 723/1250 [04:01<02:58,  2.94it/s]

Training Loss: 1.4840223789215088


 58%|█████▊    | 724/1250 [04:01<02:58,  2.95it/s]

Training Loss: 1.3182741403579712


 58%|█████▊    | 725/1250 [04:02<02:58,  2.95it/s]

Training Loss: 1.4584599733352661


 58%|█████▊    | 726/1250 [04:02<02:57,  2.95it/s]

Training Loss: 1.6157124042510986


 58%|█████▊    | 727/1250 [04:02<02:57,  2.94it/s]

Training Loss: 1.430781364440918


 58%|█████▊    | 728/1250 [04:03<02:56,  2.96it/s]

Training Loss: 1.4864730834960938


 58%|█████▊    | 729/1250 [04:03<02:56,  2.94it/s]

Training Loss: 1.294136881828308


 58%|█████▊    | 730/1250 [04:04<02:56,  2.94it/s]

Training Loss: 1.3294341564178467


 58%|█████▊    | 731/1250 [04:04<02:56,  2.93it/s]

Training Loss: 1.58720862865448


 59%|█████▊    | 732/1250 [04:04<02:56,  2.94it/s]

Training Loss: 1.280031681060791


 59%|█████▊    | 733/1250 [04:05<02:54,  2.96it/s]

Training Loss: 1.2659175395965576


 59%|█████▊    | 734/1250 [04:05<02:54,  2.95it/s]

Training Loss: 1.4870754480361938


 59%|█████▉    | 735/1250 [04:05<02:55,  2.94it/s]

Training Loss: 1.1812570095062256


 59%|█████▉    | 736/1250 [04:06<02:54,  2.95it/s]

Training Loss: 1.2295687198638916


 59%|█████▉    | 737/1250 [04:06<02:53,  2.96it/s]

Training Loss: 1.3258849382400513


 59%|█████▉    | 738/1250 [04:06<02:53,  2.95it/s]

Training Loss: 1.2987384796142578


 59%|█████▉    | 739/1250 [04:07<02:53,  2.94it/s]

Training Loss: 1.1136505603790283


 59%|█████▉    | 740/1250 [04:07<02:53,  2.94it/s]

Training Loss: 1.1583197116851807


 59%|█████▉    | 741/1250 [04:07<02:52,  2.96it/s]

Training Loss: 1.0881633758544922


 59%|█████▉    | 742/1250 [04:08<02:52,  2.94it/s]

Training Loss: 1.4883849620819092


 59%|█████▉    | 743/1250 [04:08<02:51,  2.95it/s]

Training Loss: 1.2834148406982422


 60%|█████▉    | 744/1250 [04:08<02:51,  2.95it/s]

Training Loss: 1.3800441026687622


 60%|█████▉    | 745/1250 [04:09<02:52,  2.93it/s]

Training Loss: 1.2045691013336182


 60%|█████▉    | 746/1250 [04:09<02:52,  2.92it/s]

Training Loss: 1.413897156715393


 60%|█████▉    | 747/1250 [04:09<02:52,  2.92it/s]

Training Loss: 1.2160062789916992


 60%|█████▉    | 748/1250 [04:10<02:52,  2.91it/s]

Training Loss: 1.1890498399734497


 60%|█████▉    | 749/1250 [04:10<02:51,  2.92it/s]

Training Loss: 1.8668224811553955


 60%|██████    | 750/1250 [04:10<02:50,  2.93it/s]

Training Loss: 0.912243127822876


 60%|██████    | 751/1250 [04:11<02:50,  2.92it/s]

Training Loss: 1.427725911140442


 60%|██████    | 752/1250 [04:11<02:50,  2.93it/s]

Training Loss: 1.421733021736145


 60%|██████    | 753/1250 [04:11<02:49,  2.93it/s]

Training Loss: 1.1629449129104614


 60%|██████    | 754/1250 [04:12<02:48,  2.94it/s]

Training Loss: 1.1668305397033691


 60%|██████    | 755/1250 [04:12<02:47,  2.96it/s]

Training Loss: 1.6855937242507935


 60%|██████    | 756/1250 [04:12<02:46,  2.97it/s]

Training Loss: 1.3932092189788818


 61%|██████    | 757/1250 [04:13<02:47,  2.95it/s]

Training Loss: 1.3258047103881836


 61%|██████    | 758/1250 [04:13<02:46,  2.95it/s]

Training Loss: 1.082533836364746


 61%|██████    | 759/1250 [04:13<02:46,  2.94it/s]

Training Loss: 1.3091282844543457


 61%|██████    | 760/1250 [04:14<02:46,  2.95it/s]

Training Loss: 1.6077327728271484


 61%|██████    | 761/1250 [04:14<02:45,  2.95it/s]

Training Loss: 1.280118703842163


 61%|██████    | 762/1250 [04:14<02:45,  2.94it/s]

Training Loss: 1.4705461263656616


 61%|██████    | 763/1250 [04:15<02:45,  2.95it/s]

Training Loss: 1.278768539428711


 61%|██████    | 764/1250 [04:15<02:45,  2.94it/s]

Training Loss: 0.9800715446472168


 61%|██████    | 765/1250 [04:15<02:44,  2.94it/s]

Training Loss: 1.5580028295516968


 61%|██████▏   | 766/1250 [04:16<02:44,  2.94it/s]

Training Loss: 1.0301932096481323


 61%|██████▏   | 767/1250 [04:16<02:44,  2.94it/s]

Training Loss: 1.4254518747329712


 61%|██████▏   | 768/1250 [04:16<02:44,  2.93it/s]

Training Loss: 1.5122727155685425


 62%|██████▏   | 769/1250 [04:17<02:43,  2.94it/s]

Training Loss: 1.2037461996078491


 62%|██████▏   | 770/1250 [04:17<02:43,  2.93it/s]

Training Loss: 1.191272258758545


 62%|██████▏   | 771/1250 [04:17<02:43,  2.94it/s]

Training Loss: 1.2692146301269531


 62%|██████▏   | 772/1250 [04:18<02:42,  2.95it/s]

Training Loss: 1.1369320154190063


 62%|██████▏   | 773/1250 [04:18<02:41,  2.95it/s]

Training Loss: 1.0868995189666748


 62%|██████▏   | 774/1250 [04:18<02:41,  2.94it/s]

Training Loss: 1.1168674230575562


 62%|██████▏   | 775/1250 [04:19<02:42,  2.93it/s]

Training Loss: 1.1060086488723755


 62%|██████▏   | 776/1250 [04:19<02:41,  2.94it/s]

Training Loss: 0.9015490412712097


 62%|██████▏   | 777/1250 [04:19<02:41,  2.93it/s]

Training Loss: 1.2113316059112549


 62%|██████▏   | 778/1250 [04:20<02:42,  2.91it/s]

Training Loss: 1.4227025508880615


 62%|██████▏   | 779/1250 [04:20<02:43,  2.89it/s]

Training Loss: 1.3190317153930664


 62%|██████▏   | 780/1250 [04:21<02:43,  2.87it/s]

Training Loss: 1.1223549842834473


 62%|██████▏   | 781/1250 [04:21<02:41,  2.90it/s]

Training Loss: 1.2242001295089722


 63%|██████▎   | 782/1250 [04:21<02:41,  2.90it/s]

Training Loss: 1.3287345170974731


 63%|██████▎   | 783/1250 [04:22<02:41,  2.89it/s]

Training Loss: 1.3610111474990845


 63%|██████▎   | 784/1250 [04:22<02:40,  2.91it/s]

Training Loss: 1.098536729812622


 63%|██████▎   | 785/1250 [04:22<02:38,  2.93it/s]

Training Loss: 1.4631963968276978


 63%|██████▎   | 786/1250 [04:23<02:38,  2.93it/s]

Training Loss: 1.4915555715560913


 63%|██████▎   | 787/1250 [04:23<02:38,  2.92it/s]

Training Loss: 1.532241702079773


 63%|██████▎   | 788/1250 [04:23<02:38,  2.91it/s]

Training Loss: 1.4098414182662964


 63%|██████▎   | 789/1250 [04:24<02:37,  2.92it/s]

Training Loss: 1.168285608291626


 63%|██████▎   | 790/1250 [04:24<02:37,  2.93it/s]

Training Loss: 1.2484050989151


 63%|██████▎   | 791/1250 [04:24<02:37,  2.92it/s]

Training Loss: 1.1028368473052979


 63%|██████▎   | 792/1250 [04:25<02:36,  2.93it/s]

Training Loss: 1.2695890665054321


 63%|██████▎   | 793/1250 [04:25<02:35,  2.94it/s]

Training Loss: 0.9352725744247437


 64%|██████▎   | 794/1250 [04:25<02:35,  2.94it/s]

Training Loss: 1.2150875329971313


 64%|██████▎   | 795/1250 [04:26<02:35,  2.93it/s]

Training Loss: 1.4278837442398071


 64%|██████▎   | 796/1250 [04:26<02:34,  2.93it/s]

Training Loss: 1.3848116397857666


 64%|██████▍   | 797/1250 [04:26<02:34,  2.93it/s]

Training Loss: 1.1178216934204102


 64%|██████▍   | 798/1250 [04:27<02:34,  2.93it/s]

Training Loss: 1.4049584865570068


 64%|██████▍   | 799/1250 [04:27<02:33,  2.94it/s]

Training Loss: 1.1990325450897217


 64%|██████▍   | 800/1250 [04:27<02:33,  2.93it/s]

Training Loss: 1.2357368469238281


 64%|██████▍   | 801/1250 [04:28<02:33,  2.92it/s]

Training Loss: 1.264880895614624


 64%|██████▍   | 802/1250 [04:28<02:33,  2.91it/s]

Training Loss: 1.4621375799179077


 64%|██████▍   | 803/1250 [04:28<02:33,  2.91it/s]

Training Loss: 1.2653335332870483


 64%|██████▍   | 804/1250 [04:29<02:33,  2.91it/s]

Training Loss: 1.074183464050293


 64%|██████▍   | 805/1250 [04:29<02:33,  2.91it/s]

Training Loss: 1.2470260858535767


 64%|██████▍   | 806/1250 [04:29<02:32,  2.91it/s]

Training Loss: 1.1165846586227417


 65%|██████▍   | 807/1250 [04:30<02:32,  2.91it/s]

Training Loss: 1.158902883529663


 65%|██████▍   | 808/1250 [04:30<02:31,  2.91it/s]

Training Loss: 0.7974680662155151


 65%|██████▍   | 809/1250 [04:30<02:30,  2.92it/s]

Training Loss: 1.2566416263580322


 65%|██████▍   | 810/1250 [04:31<02:30,  2.92it/s]

Training Loss: 1.7190788984298706


 65%|██████▍   | 811/1250 [04:31<02:31,  2.90it/s]

Training Loss: 0.9880226254463196


 65%|██████▍   | 812/1250 [04:32<02:31,  2.88it/s]

Training Loss: 1.1702470779418945


 65%|██████▌   | 813/1250 [04:32<02:32,  2.87it/s]

Training Loss: 1.1416438817977905


 65%|██████▌   | 814/1250 [04:32<02:31,  2.88it/s]

Training Loss: 1.2960489988327026


 65%|██████▌   | 815/1250 [04:33<02:31,  2.87it/s]

Training Loss: 1.4863694906234741


 65%|██████▌   | 816/1250 [04:33<02:30,  2.89it/s]

Training Loss: 1.2747975587844849


 65%|██████▌   | 817/1250 [04:33<02:29,  2.89it/s]

Training Loss: 1.2400518655776978


 65%|██████▌   | 818/1250 [04:34<02:28,  2.90it/s]

Training Loss: 1.4186973571777344


 66%|██████▌   | 819/1250 [04:34<02:27,  2.92it/s]

Training Loss: 1.1183983087539673


 66%|██████▌   | 820/1250 [04:34<02:26,  2.93it/s]

Training Loss: 1.320365071296692


 66%|██████▌   | 821/1250 [04:35<02:26,  2.93it/s]

Training Loss: 1.3755993843078613


 66%|██████▌   | 822/1250 [04:35<02:26,  2.93it/s]

Training Loss: 1.4219422340393066


 66%|██████▌   | 823/1250 [04:35<02:25,  2.93it/s]

Training Loss: 1.2038486003875732


 66%|██████▌   | 824/1250 [04:36<02:24,  2.94it/s]

Training Loss: 0.7862212061882019


 66%|██████▌   | 825/1250 [04:36<02:24,  2.94it/s]

Training Loss: 1.3510385751724243


 66%|██████▌   | 826/1250 [04:36<02:24,  2.93it/s]

Training Loss: 0.7672731280326843


 66%|██████▌   | 827/1250 [04:37<02:24,  2.93it/s]

Training Loss: 1.3512104749679565


 66%|██████▌   | 828/1250 [04:37<02:25,  2.91it/s]

Training Loss: 1.6464828252792358


 66%|██████▋   | 829/1250 [04:37<02:24,  2.91it/s]

Training Loss: 1.315359115600586


 66%|██████▋   | 830/1250 [04:38<02:24,  2.92it/s]

Training Loss: 1.5632318258285522


 66%|██████▋   | 831/1250 [04:38<02:23,  2.92it/s]

Training Loss: 1.090728759765625


 67%|██████▋   | 832/1250 [04:38<02:23,  2.91it/s]

Training Loss: 1.6223803758621216


 67%|██████▋   | 833/1250 [04:39<02:22,  2.92it/s]

Training Loss: 1.3924938440322876


 67%|██████▋   | 834/1250 [04:39<02:22,  2.93it/s]

Training Loss: 1.303486943244934


 67%|██████▋   | 835/1250 [04:39<02:21,  2.94it/s]

Training Loss: 1.1640211343765259


 67%|██████▋   | 836/1250 [04:40<02:21,  2.93it/s]

Training Loss: 1.3446204662322998


 67%|██████▋   | 837/1250 [04:40<02:21,  2.92it/s]

Training Loss: 1.5556682348251343


 67%|██████▋   | 838/1250 [04:40<02:20,  2.92it/s]

Training Loss: 1.2711900472640991


 67%|██████▋   | 839/1250 [04:41<02:20,  2.93it/s]

Training Loss: 1.0221812725067139


 67%|██████▋   | 840/1250 [04:41<02:20,  2.93it/s]

Training Loss: 1.10634446144104


 67%|██████▋   | 841/1250 [04:41<02:19,  2.92it/s]

Training Loss: 1.261846661567688


 67%|██████▋   | 842/1250 [04:42<02:20,  2.91it/s]

Training Loss: 1.0051189661026


 67%|██████▋   | 843/1250 [04:42<02:19,  2.92it/s]

Training Loss: 1.2729324102401733


 68%|██████▊   | 844/1250 [04:42<02:19,  2.92it/s]

Training Loss: 1.4754568338394165


 68%|██████▊   | 845/1250 [04:43<02:18,  2.92it/s]

Training Loss: 1.1527855396270752


 68%|██████▊   | 846/1250 [04:43<02:18,  2.91it/s]

Training Loss: 1.2017602920532227


 68%|██████▊   | 847/1250 [04:44<02:18,  2.92it/s]

Training Loss: 1.1066604852676392


 68%|██████▊   | 848/1250 [04:44<02:18,  2.90it/s]

Training Loss: 1.1467913389205933


 68%|██████▊   | 849/1250 [04:44<02:19,  2.87it/s]

Training Loss: 1.10862398147583


 68%|██████▊   | 850/1250 [04:45<02:18,  2.88it/s]

Training Loss: 1.6560858488082886


 68%|██████▊   | 851/1250 [04:45<02:18,  2.88it/s]

Training Loss: 1.6666009426116943


 68%|██████▊   | 852/1250 [04:45<02:17,  2.90it/s]

Training Loss: 1.1319761276245117


 68%|██████▊   | 853/1250 [04:46<02:15,  2.92it/s]

Training Loss: 1.0652916431427002


 68%|██████▊   | 854/1250 [04:46<02:15,  2.93it/s]

Training Loss: 1.271944522857666


 68%|██████▊   | 855/1250 [04:46<02:15,  2.92it/s]

Training Loss: 0.8506516218185425


 68%|██████▊   | 856/1250 [04:47<02:13,  2.94it/s]

Training Loss: 0.9193835258483887


 69%|██████▊   | 857/1250 [04:47<02:14,  2.93it/s]

Training Loss: 1.1136637926101685


 69%|██████▊   | 858/1250 [04:47<02:14,  2.92it/s]

Training Loss: 1.105064868927002


 69%|██████▊   | 859/1250 [04:48<02:12,  2.94it/s]

Training Loss: 1.283653974533081


 69%|██████▉   | 860/1250 [04:48<02:13,  2.92it/s]

Training Loss: 1.5994789600372314


 69%|██████▉   | 861/1250 [04:48<02:13,  2.91it/s]

Training Loss: 1.1471178531646729


 69%|██████▉   | 862/1250 [04:49<02:12,  2.93it/s]

Training Loss: 1.4358291625976562


 69%|██████▉   | 863/1250 [04:49<02:11,  2.93it/s]

Training Loss: 0.9545333981513977


 69%|██████▉   | 864/1250 [04:49<02:11,  2.94it/s]

Training Loss: 1.1374818086624146


 69%|██████▉   | 865/1250 [04:50<02:11,  2.94it/s]

Training Loss: 1.9839955568313599


 69%|██████▉   | 866/1250 [04:50<02:10,  2.94it/s]

Training Loss: 1.3115864992141724


 69%|██████▉   | 867/1250 [04:50<02:10,  2.93it/s]

Training Loss: 1.63600492477417


 69%|██████▉   | 868/1250 [04:51<02:10,  2.93it/s]

Training Loss: 1.199285626411438


 70%|██████▉   | 869/1250 [04:51<02:10,  2.91it/s]

Training Loss: 1.3847508430480957


 70%|██████▉   | 870/1250 [04:51<02:10,  2.91it/s]

Training Loss: 1.2850987911224365


 70%|██████▉   | 871/1250 [04:52<02:09,  2.92it/s]

Training Loss: 1.2172751426696777


 70%|██████▉   | 872/1250 [04:52<02:10,  2.89it/s]

Training Loss: 1.0037480592727661


 70%|██████▉   | 873/1250 [04:52<02:10,  2.90it/s]

Training Loss: 1.0263566970825195


 70%|██████▉   | 874/1250 [04:53<02:09,  2.91it/s]

Training Loss: 1.2000099420547485


 70%|███████   | 875/1250 [04:53<02:08,  2.91it/s]

Training Loss: 1.2479866743087769


 70%|███████   | 876/1250 [04:53<02:08,  2.91it/s]

Training Loss: 1.1361349821090698


 70%|███████   | 877/1250 [04:54<02:08,  2.90it/s]

Training Loss: 1.2435215711593628


 70%|███████   | 878/1250 [04:54<02:08,  2.90it/s]

Training Loss: 1.4902392625808716


 70%|███████   | 879/1250 [04:54<02:07,  2.90it/s]

Training Loss: 1.354397177696228


 70%|███████   | 880/1250 [04:55<02:07,  2.90it/s]

Training Loss: 1.551421046257019


 70%|███████   | 881/1250 [04:55<02:06,  2.91it/s]

Training Loss: 1.1560269594192505


 71%|███████   | 882/1250 [04:56<02:06,  2.91it/s]

Training Loss: 1.1335060596466064


 71%|███████   | 883/1250 [04:56<02:05,  2.92it/s]

Training Loss: 1.124168038368225


 71%|███████   | 884/1250 [04:56<02:05,  2.92it/s]

Training Loss: 1.2699159383773804


 71%|███████   | 885/1250 [04:57<02:06,  2.89it/s]

Training Loss: 1.4277362823486328


 71%|███████   | 886/1250 [04:57<02:05,  2.91it/s]

Training Loss: 1.3054299354553223


 71%|███████   | 887/1250 [04:57<02:04,  2.92it/s]

Training Loss: 1.2549834251403809


 71%|███████   | 888/1250 [04:58<02:03,  2.92it/s]

Training Loss: 1.0046112537384033


 71%|███████   | 889/1250 [04:58<02:03,  2.93it/s]

Training Loss: 1.4877938032150269


 71%|███████   | 890/1250 [04:58<02:02,  2.94it/s]

Training Loss: 1.0735729932785034


 71%|███████▏  | 891/1250 [04:59<02:02,  2.92it/s]

Training Loss: 1.2570966482162476


 71%|███████▏  | 892/1250 [04:59<02:02,  2.93it/s]

Training Loss: 1.1729189157485962


 71%|███████▏  | 893/1250 [04:59<02:02,  2.92it/s]

Training Loss: 1.2906233072280884


 72%|███████▏  | 894/1250 [05:00<02:01,  2.93it/s]

Training Loss: 0.9257189035415649


 72%|███████▏  | 895/1250 [05:00<02:01,  2.93it/s]

Training Loss: 1.183915138244629


 72%|███████▏  | 896/1250 [05:00<02:00,  2.94it/s]

Training Loss: 1.4517639875411987


 72%|███████▏  | 897/1250 [05:01<02:00,  2.93it/s]

Training Loss: 1.1203705072402954


 72%|███████▏  | 898/1250 [05:01<02:00,  2.93it/s]

Training Loss: 1.2398903369903564


 72%|███████▏  | 899/1250 [05:01<01:59,  2.93it/s]

Training Loss: 1.0885487794876099


 72%|███████▏  | 900/1250 [05:02<01:59,  2.93it/s]

Training Loss: 1.2243279218673706


 72%|███████▏  | 901/1250 [05:02<01:59,  2.93it/s]

Training Loss: 1.0067085027694702


 72%|███████▏  | 902/1250 [05:02<01:59,  2.92it/s]

Training Loss: 1.476171851158142


 72%|███████▏  | 903/1250 [05:03<01:58,  2.92it/s]

Training Loss: 1.2230874300003052


 72%|███████▏  | 904/1250 [05:03<01:58,  2.92it/s]

Training Loss: 1.366144061088562


 72%|███████▏  | 905/1250 [05:03<01:57,  2.93it/s]

Training Loss: 1.380244493484497


 72%|███████▏  | 906/1250 [05:04<01:56,  2.94it/s]

Training Loss: 1.2056987285614014


 73%|███████▎  | 907/1250 [05:04<01:56,  2.94it/s]

Training Loss: 1.4846217632293701


 73%|███████▎  | 908/1250 [05:04<01:56,  2.94it/s]

Training Loss: 0.9760113954544067


 73%|███████▎  | 909/1250 [05:05<01:56,  2.93it/s]

Training Loss: 1.1814184188842773


 73%|███████▎  | 910/1250 [05:05<01:55,  2.95it/s]

Training Loss: 1.4135749340057373


 73%|███████▎  | 911/1250 [05:05<01:55,  2.94it/s]

Training Loss: 1.388252854347229


 73%|███████▎  | 912/1250 [05:06<01:55,  2.93it/s]

Training Loss: 0.9223514795303345


 73%|███████▎  | 913/1250 [05:06<01:54,  2.94it/s]

Training Loss: 1.1722341775894165


 73%|███████▎  | 914/1250 [05:06<01:54,  2.94it/s]

Training Loss: 1.3762305974960327


 73%|███████▎  | 915/1250 [05:07<01:54,  2.93it/s]

Training Loss: 1.3972102403640747


 73%|███████▎  | 916/1250 [05:07<01:54,  2.92it/s]

Training Loss: 1.3301548957824707


 73%|███████▎  | 917/1250 [05:07<01:55,  2.89it/s]

Training Loss: 1.21323823928833


 73%|███████▎  | 918/1250 [05:08<01:54,  2.90it/s]

Training Loss: 1.0863882303237915


 74%|███████▎  | 919/1250 [05:08<01:53,  2.91it/s]

Training Loss: 1.0145071744918823


 74%|███████▎  | 920/1250 [05:09<01:53,  2.90it/s]

Training Loss: 1.0879225730895996


 74%|███████▎  | 921/1250 [05:09<01:53,  2.90it/s]

Training Loss: 1.3763880729675293


 74%|███████▍  | 922/1250 [05:09<01:52,  2.93it/s]

Training Loss: 1.3626580238342285


 74%|███████▍  | 923/1250 [05:10<01:51,  2.93it/s]

Training Loss: 1.305840015411377


 74%|███████▍  | 924/1250 [05:10<01:51,  2.93it/s]

Training Loss: 1.593317985534668


 74%|███████▍  | 925/1250 [05:10<01:50,  2.93it/s]

Training Loss: 1.3518688678741455


 74%|███████▍  | 926/1250 [05:11<01:50,  2.94it/s]

Training Loss: 1.2217949628829956


 74%|███████▍  | 927/1250 [05:11<01:50,  2.93it/s]

Training Loss: 0.9718545079231262


 74%|███████▍  | 928/1250 [05:11<01:50,  2.93it/s]

Training Loss: 1.3424081802368164


 74%|███████▍  | 929/1250 [05:12<01:50,  2.91it/s]

Training Loss: 1.4318957328796387


 74%|███████▍  | 930/1250 [05:12<01:49,  2.92it/s]

Training Loss: 1.4445750713348389


 74%|███████▍  | 931/1250 [05:12<01:48,  2.93it/s]

Training Loss: 1.1413367986679077


 75%|███████▍  | 932/1250 [05:13<01:48,  2.93it/s]

Training Loss: 1.1277090311050415


 75%|███████▍  | 933/1250 [05:13<01:48,  2.93it/s]

Training Loss: 1.0183403491973877


 75%|███████▍  | 934/1250 [05:13<01:48,  2.92it/s]

Training Loss: 1.061105728149414


 75%|███████▍  | 935/1250 [05:14<01:47,  2.93it/s]

Training Loss: 1.1666383743286133


 75%|███████▍  | 936/1250 [05:14<01:46,  2.94it/s]

Training Loss: 1.1980702877044678


 75%|███████▍  | 937/1250 [05:14<01:47,  2.92it/s]

Training Loss: 1.2786831855773926


 75%|███████▌  | 938/1250 [05:15<01:46,  2.93it/s]

Training Loss: 1.0699498653411865


 75%|███████▌  | 939/1250 [05:15<01:46,  2.93it/s]

Training Loss: 1.1026102304458618


 75%|███████▌  | 940/1250 [05:15<01:45,  2.94it/s]

Training Loss: 1.1327625513076782


 75%|███████▌  | 941/1250 [05:16<01:45,  2.93it/s]

Training Loss: 1.1534844636917114


 75%|███████▌  | 942/1250 [05:16<01:45,  2.93it/s]

Training Loss: 1.3618390560150146


 75%|███████▌  | 943/1250 [05:16<01:44,  2.93it/s]

Training Loss: 1.594775915145874


 76%|███████▌  | 944/1250 [05:17<01:44,  2.94it/s]

Training Loss: 1.2699671983718872


 76%|███████▌  | 945/1250 [05:17<01:43,  2.95it/s]

Training Loss: 1.3680369853973389


 76%|███████▌  | 946/1250 [05:17<01:43,  2.94it/s]

Training Loss: 1.160866141319275


 76%|███████▌  | 947/1250 [05:18<01:42,  2.94it/s]

Training Loss: 1.296684741973877


 76%|███████▌  | 948/1250 [05:18<01:42,  2.94it/s]

Training Loss: 1.329526424407959


 76%|███████▌  | 949/1250 [05:18<01:42,  2.94it/s]

Training Loss: 1.3535672426223755


 76%|███████▌  | 950/1250 [05:19<01:41,  2.94it/s]

Training Loss: 1.2752569913864136


 76%|███████▌  | 951/1250 [05:19<01:42,  2.92it/s]

Training Loss: 1.487433671951294


 76%|███████▌  | 952/1250 [05:19<01:43,  2.89it/s]

Training Loss: 1.0747044086456299


 76%|███████▌  | 953/1250 [05:20<01:43,  2.88it/s]

Training Loss: 1.1864368915557861


 76%|███████▋  | 954/1250 [05:20<01:42,  2.89it/s]

Training Loss: 1.0118752717971802


 76%|███████▋  | 955/1250 [05:20<01:41,  2.90it/s]

Training Loss: 1.4509855508804321


 76%|███████▋  | 956/1250 [05:21<01:40,  2.92it/s]

Training Loss: 1.275111198425293


 77%|███████▋  | 957/1250 [05:21<01:39,  2.94it/s]

Training Loss: 1.24666428565979


 77%|███████▋  | 958/1250 [05:21<01:39,  2.95it/s]

Training Loss: 1.5225365161895752


 77%|███████▋  | 959/1250 [05:22<01:38,  2.96it/s]

Training Loss: 0.8310540914535522


 77%|███████▋  | 960/1250 [05:22<01:38,  2.96it/s]

Training Loss: 1.036124587059021


 77%|███████▋  | 961/1250 [05:22<01:37,  2.97it/s]

Training Loss: 1.3147825002670288


 77%|███████▋  | 962/1250 [05:23<01:37,  2.97it/s]

Training Loss: 1.0616403818130493


 77%|███████▋  | 963/1250 [05:23<01:36,  2.97it/s]

Training Loss: 1.2648755311965942


 77%|███████▋  | 964/1250 [05:24<01:36,  2.97it/s]

Training Loss: 1.2096903324127197


 77%|███████▋  | 965/1250 [05:24<01:36,  2.96it/s]

Training Loss: 0.8308252096176147


 77%|███████▋  | 966/1250 [05:24<01:36,  2.95it/s]

Training Loss: 0.8326656818389893


 77%|███████▋  | 967/1250 [05:25<01:35,  2.96it/s]

Training Loss: 1.2209264039993286


 77%|███████▋  | 968/1250 [05:25<01:36,  2.93it/s]

Training Loss: 0.8277866840362549


 78%|███████▊  | 969/1250 [05:25<01:35,  2.94it/s]

Training Loss: 1.6211953163146973


 78%|███████▊  | 970/1250 [05:26<01:35,  2.94it/s]

Training Loss: 0.9503809213638306


 78%|███████▊  | 971/1250 [05:26<01:34,  2.94it/s]

Training Loss: 1.1548850536346436


 78%|███████▊  | 972/1250 [05:26<01:34,  2.94it/s]

Training Loss: 1.502039909362793


 78%|███████▊  | 973/1250 [05:27<01:35,  2.91it/s]

Training Loss: 1.19408118724823


 78%|███████▊  | 974/1250 [05:27<01:34,  2.91it/s]

Training Loss: 1.0884886980056763


 78%|███████▊  | 975/1250 [05:27<01:33,  2.93it/s]

Training Loss: 1.0419272184371948


 78%|███████▊  | 976/1250 [05:28<01:33,  2.92it/s]

Training Loss: 1.657760500907898


 78%|███████▊  | 977/1250 [05:28<01:33,  2.93it/s]

Training Loss: 1.477898359298706


 78%|███████▊  | 978/1250 [05:28<01:32,  2.94it/s]

Training Loss: 1.5072046518325806


 78%|███████▊  | 979/1250 [05:29<01:32,  2.93it/s]

Training Loss: 1.060416579246521


 78%|███████▊  | 980/1250 [05:29<01:31,  2.94it/s]

Training Loss: 0.9175224304199219


 78%|███████▊  | 981/1250 [05:29<01:31,  2.94it/s]

Training Loss: 1.2128747701644897


 79%|███████▊  | 982/1250 [05:30<01:31,  2.93it/s]

Training Loss: 1.1401242017745972


 79%|███████▊  | 983/1250 [05:30<01:30,  2.94it/s]

Training Loss: 1.0656603574752808


 79%|███████▊  | 984/1250 [05:30<01:30,  2.95it/s]

Training Loss: 1.107277512550354


 79%|███████▉  | 985/1250 [05:31<01:29,  2.95it/s]

Training Loss: 1.5489565134048462


 79%|███████▉  | 986/1250 [05:31<01:29,  2.96it/s]

Training Loss: 1.026533603668213


 79%|███████▉  | 987/1250 [05:31<01:29,  2.93it/s]

Training Loss: 1.136649250984192


 79%|███████▉  | 988/1250 [05:32<01:29,  2.92it/s]

Training Loss: 1.0406906604766846


 79%|███████▉  | 989/1250 [05:32<01:30,  2.90it/s]

Training Loss: 1.1774265766143799


 79%|███████▉  | 990/1250 [05:32<01:30,  2.88it/s]

Training Loss: 1.2331327199935913


 79%|███████▉  | 991/1250 [05:33<01:30,  2.87it/s]

Training Loss: 1.4177744388580322


 79%|███████▉  | 992/1250 [05:33<01:28,  2.90it/s]

Training Loss: 1.604827642440796


 79%|███████▉  | 993/1250 [05:33<01:28,  2.91it/s]

Training Loss: 1.1945061683654785


 80%|███████▉  | 994/1250 [05:34<01:27,  2.93it/s]

Training Loss: 1.056732177734375


 80%|███████▉  | 995/1250 [05:34<01:27,  2.93it/s]

Training Loss: 1.137384295463562


 80%|███████▉  | 996/1250 [05:34<01:26,  2.94it/s]

Training Loss: 1.2828742265701294


 80%|███████▉  | 997/1250 [05:35<01:25,  2.95it/s]

Training Loss: 1.6843907833099365


 80%|███████▉  | 998/1250 [05:35<01:25,  2.94it/s]

Training Loss: 0.910489022731781


 80%|███████▉  | 999/1250 [05:35<01:24,  2.96it/s]

Training Loss: 1.0666732788085938


 80%|████████  | 1000/1250 [05:36<01:24,  2.96it/s]

Training Loss: 0.9184361100196838


 80%|████████  | 1001/1250 [05:36<01:24,  2.95it/s]

Training Loss: 1.1942775249481201


 80%|████████  | 1002/1250 [05:36<01:24,  2.94it/s]

Training Loss: 1.4561823606491089


 80%|████████  | 1003/1250 [05:37<01:24,  2.94it/s]

Training Loss: 1.4440484046936035


 80%|████████  | 1004/1250 [05:37<01:23,  2.93it/s]

Training Loss: 1.3107669353485107


 80%|████████  | 1005/1250 [05:37<01:22,  2.95it/s]

Training Loss: 1.3294522762298584


 80%|████████  | 1006/1250 [05:38<01:23,  2.94it/s]

Training Loss: 1.46946382522583


 81%|████████  | 1007/1250 [05:38<01:22,  2.93it/s]

Training Loss: 1.1815646886825562


 81%|████████  | 1008/1250 [05:39<01:22,  2.93it/s]

Training Loss: 1.072144627571106


 81%|████████  | 1009/1250 [05:39<01:22,  2.93it/s]

Training Loss: 1.0099822282791138


 81%|████████  | 1010/1250 [05:39<01:21,  2.94it/s]

Training Loss: 1.1080188751220703


 81%|████████  | 1011/1250 [05:40<01:21,  2.93it/s]

Training Loss: 1.3287206888198853


 81%|████████  | 1012/1250 [05:40<01:21,  2.92it/s]

Training Loss: 0.917832612991333


 81%|████████  | 1013/1250 [05:40<01:20,  2.96it/s]

Training Loss: 1.3368397951126099


 81%|████████  | 1014/1250 [05:41<01:19,  2.95it/s]

Training Loss: 1.4178407192230225


 81%|████████  | 1015/1250 [05:41<01:19,  2.95it/s]

Training Loss: 1.3156713247299194


 81%|████████▏ | 1016/1250 [05:41<01:19,  2.96it/s]

Training Loss: 1.0340981483459473


 81%|████████▏ | 1017/1250 [05:42<01:19,  2.94it/s]

Training Loss: 1.2500180006027222


 81%|████████▏ | 1018/1250 [05:42<01:18,  2.95it/s]

Training Loss: 1.0924378633499146


 82%|████████▏ | 1019/1250 [05:42<01:18,  2.95it/s]

Training Loss: 0.9716974496841431


 82%|████████▏ | 1020/1250 [05:43<01:18,  2.94it/s]

Training Loss: 1.0200179815292358


 82%|████████▏ | 1021/1250 [05:43<01:18,  2.93it/s]

Training Loss: 1.1670019626617432


 82%|████████▏ | 1022/1250 [05:43<01:18,  2.89it/s]

Training Loss: 1.066253662109375


 82%|████████▏ | 1023/1250 [05:44<01:18,  2.88it/s]

Training Loss: 1.3737431764602661


 82%|████████▏ | 1024/1250 [05:44<01:18,  2.88it/s]

Training Loss: 1.0374767780303955


 82%|████████▏ | 1025/1250 [05:44<01:18,  2.85it/s]

Training Loss: 1.440342903137207


 82%|████████▏ | 1026/1250 [05:45<01:17,  2.87it/s]

Training Loss: 1.3774125576019287


 82%|████████▏ | 1027/1250 [05:45<01:17,  2.89it/s]

Training Loss: 1.145229458808899


 82%|████████▏ | 1028/1250 [05:45<01:16,  2.90it/s]

Training Loss: 1.4057737588882446


 82%|████████▏ | 1029/1250 [05:46<01:15,  2.91it/s]

Training Loss: 1.2508622407913208


 82%|████████▏ | 1030/1250 [05:46<01:15,  2.90it/s]

Training Loss: 1.3174777030944824


 82%|████████▏ | 1031/1250 [05:46<01:15,  2.91it/s]

Training Loss: 1.3325730562210083


 83%|████████▎ | 1032/1250 [05:47<01:15,  2.90it/s]

Training Loss: 1.1359349489212036


 83%|████████▎ | 1033/1250 [05:47<01:15,  2.89it/s]

Training Loss: 1.4148846864700317


 83%|████████▎ | 1034/1250 [05:47<01:14,  2.90it/s]

Training Loss: 1.2570303678512573


 83%|████████▎ | 1035/1250 [05:48<01:14,  2.90it/s]

Training Loss: 1.6437044143676758


 83%|████████▎ | 1036/1250 [05:48<01:13,  2.91it/s]

Training Loss: 1.5118013620376587


 83%|████████▎ | 1037/1250 [05:48<01:13,  2.91it/s]

Training Loss: 1.1835354566574097


 83%|████████▎ | 1038/1250 [05:49<01:12,  2.94it/s]

Training Loss: 1.0521739721298218


 83%|████████▎ | 1039/1250 [05:49<01:11,  2.93it/s]

Training Loss: 1.3771770000457764


 83%|████████▎ | 1040/1250 [05:49<01:11,  2.94it/s]

Training Loss: 1.492234468460083


 83%|████████▎ | 1041/1250 [05:50<01:10,  2.95it/s]

Training Loss: 1.349863052368164


 83%|████████▎ | 1042/1250 [05:50<01:10,  2.93it/s]

Training Loss: 1.351686716079712


 83%|████████▎ | 1043/1250 [05:50<01:10,  2.93it/s]

Training Loss: 1.4405906200408936


 84%|████████▎ | 1044/1250 [05:51<01:10,  2.92it/s]

Training Loss: 1.2962828874588013


 84%|████████▎ | 1045/1250 [05:51<01:10,  2.93it/s]

Training Loss: 0.9379775524139404


 84%|████████▎ | 1046/1250 [05:52<01:09,  2.92it/s]

Training Loss: 1.171965479850769


 84%|████████▍ | 1047/1250 [05:52<01:09,  2.91it/s]

Training Loss: 1.2548412084579468


 84%|████████▍ | 1048/1250 [05:52<01:09,  2.92it/s]

Training Loss: 1.2148687839508057


 84%|████████▍ | 1049/1250 [05:53<01:08,  2.94it/s]

Training Loss: 1.1696828603744507


 84%|████████▍ | 1050/1250 [05:53<01:07,  2.95it/s]

Training Loss: 0.9679465293884277


 84%|████████▍ | 1051/1250 [05:53<01:07,  2.94it/s]

Training Loss: 1.293077826499939


 84%|████████▍ | 1052/1250 [05:54<01:07,  2.95it/s]

Training Loss: 1.4855908155441284


 84%|████████▍ | 1053/1250 [05:54<01:06,  2.95it/s]

Training Loss: 1.3367620706558228


 84%|████████▍ | 1054/1250 [05:54<01:06,  2.93it/s]

Training Loss: 1.3149278163909912


 84%|████████▍ | 1055/1250 [05:55<01:06,  2.93it/s]

Training Loss: 0.968500018119812


 84%|████████▍ | 1056/1250 [05:55<01:06,  2.93it/s]

Training Loss: 0.7277390360832214


 85%|████████▍ | 1057/1250 [05:55<01:06,  2.91it/s]

Training Loss: 1.3574270009994507


 85%|████████▍ | 1058/1250 [05:56<01:06,  2.88it/s]

Training Loss: 0.9730265736579895


 85%|████████▍ | 1059/1250 [05:56<01:06,  2.87it/s]

Training Loss: 0.8919734358787537


 85%|████████▍ | 1060/1250 [05:56<01:06,  2.88it/s]

Training Loss: 1.0095144510269165


 85%|████████▍ | 1061/1250 [05:57<01:05,  2.89it/s]

Training Loss: 1.0243688821792603


 85%|████████▍ | 1062/1250 [05:57<01:04,  2.89it/s]

Training Loss: 1.0984693765640259


 85%|████████▌ | 1063/1250 [05:57<01:04,  2.90it/s]

Training Loss: 1.310908555984497


 85%|████████▌ | 1064/1250 [05:58<01:04,  2.90it/s]

Training Loss: 1.402083158493042


 85%|████████▌ | 1065/1250 [05:58<01:03,  2.90it/s]

Training Loss: 1.2648147344589233


 85%|████████▌ | 1066/1250 [05:58<01:03,  2.92it/s]

Training Loss: 1.2033963203430176


 85%|████████▌ | 1067/1250 [05:59<01:02,  2.93it/s]

Training Loss: 1.2700119018554688


 85%|████████▌ | 1068/1250 [05:59<01:02,  2.93it/s]

Training Loss: 1.2350391149520874


 86%|████████▌ | 1069/1250 [05:59<01:01,  2.93it/s]

Training Loss: 0.7963606119155884


 86%|████████▌ | 1070/1250 [06:00<01:01,  2.92it/s]

Training Loss: 1.135181188583374


 86%|████████▌ | 1071/1250 [06:00<01:01,  2.93it/s]

Training Loss: 1.3121763467788696


 86%|████████▌ | 1072/1250 [06:00<01:01,  2.91it/s]

Training Loss: 1.3630650043487549


 86%|████████▌ | 1073/1250 [06:01<01:00,  2.91it/s]

Training Loss: 1.358774185180664


 86%|████████▌ | 1074/1250 [06:01<01:00,  2.92it/s]

Training Loss: 0.7796110510826111


 86%|████████▌ | 1075/1250 [06:01<01:00,  2.91it/s]

Training Loss: 1.0025840997695923


 86%|████████▌ | 1076/1250 [06:02<00:59,  2.92it/s]

Training Loss: 1.2736082077026367


 86%|████████▌ | 1077/1250 [06:02<00:59,  2.90it/s]

Training Loss: 1.4033719301223755


 86%|████████▌ | 1078/1250 [06:03<00:58,  2.92it/s]

Training Loss: 1.6492469310760498


 86%|████████▋ | 1079/1250 [06:03<00:58,  2.93it/s]

Training Loss: 1.021796703338623


 86%|████████▋ | 1080/1250 [06:03<00:58,  2.93it/s]

Training Loss: 1.1625356674194336


 86%|████████▋ | 1081/1250 [06:04<00:57,  2.94it/s]

Training Loss: 1.087532877922058


 87%|████████▋ | 1082/1250 [06:04<00:57,  2.94it/s]

Training Loss: 1.1684885025024414


 87%|████████▋ | 1083/1250 [06:04<00:57,  2.93it/s]

Training Loss: 1.1539732217788696


 87%|████████▋ | 1084/1250 [06:05<00:56,  2.93it/s]

Training Loss: 1.0871862173080444


 87%|████████▋ | 1085/1250 [06:05<00:56,  2.92it/s]

Training Loss: 0.9384379982948303


 87%|████████▋ | 1086/1250 [06:05<00:56,  2.92it/s]

Training Loss: 1.2835934162139893


 87%|████████▋ | 1087/1250 [06:06<00:55,  2.94it/s]

Training Loss: 0.8610029220581055


 87%|████████▋ | 1088/1250 [06:06<00:55,  2.93it/s]

Training Loss: 1.176072359085083


 87%|████████▋ | 1089/1250 [06:06<00:55,  2.92it/s]

Training Loss: 1.6202552318572998


 87%|████████▋ | 1090/1250 [06:07<00:54,  2.93it/s]

Training Loss: 1.3087722063064575


 87%|████████▋ | 1091/1250 [06:07<00:54,  2.91it/s]

Training Loss: 1.214682698249817


 87%|████████▋ | 1092/1250 [06:07<00:54,  2.92it/s]

Training Loss: 1.4966737031936646


 87%|████████▋ | 1093/1250 [06:08<00:53,  2.91it/s]

Training Loss: 1.2297356128692627


 88%|████████▊ | 1094/1250 [06:08<00:53,  2.90it/s]

Training Loss: 1.2116998434066772


 88%|████████▊ | 1095/1250 [06:08<00:53,  2.90it/s]

Training Loss: 1.2976300716400146


 88%|████████▊ | 1096/1250 [06:09<00:53,  2.89it/s]

Training Loss: 1.1611660718917847


 88%|████████▊ | 1097/1250 [06:09<00:52,  2.90it/s]

Training Loss: 1.027031421661377


 88%|████████▊ | 1098/1250 [06:09<00:52,  2.89it/s]

Training Loss: 1.082582712173462


 88%|████████▊ | 1099/1250 [06:10<00:52,  2.88it/s]

Training Loss: 1.321988582611084


 88%|████████▊ | 1100/1250 [06:10<00:51,  2.89it/s]

Training Loss: 1.0631977319717407


 88%|████████▊ | 1101/1250 [06:10<00:51,  2.89it/s]

Training Loss: 1.235100269317627


 88%|████████▊ | 1102/1250 [06:11<00:51,  2.90it/s]

Training Loss: 0.9107661843299866


 88%|████████▊ | 1103/1250 [06:11<00:50,  2.90it/s]

Training Loss: 0.8343847990036011


 88%|████████▊ | 1104/1250 [06:11<00:50,  2.91it/s]

Training Loss: 0.887025773525238


 88%|████████▊ | 1105/1250 [06:12<00:49,  2.92it/s]

Training Loss: 0.9653361439704895


 88%|████████▊ | 1106/1250 [06:12<00:49,  2.93it/s]

Training Loss: 1.2561100721359253


 89%|████████▊ | 1107/1250 [06:12<00:48,  2.95it/s]

Training Loss: 1.404531478881836


 89%|████████▊ | 1108/1250 [06:13<00:48,  2.93it/s]

Training Loss: 1.0818822383880615


 89%|████████▊ | 1109/1250 [06:13<00:48,  2.93it/s]

Training Loss: 0.7664831876754761


 89%|████████▉ | 1110/1250 [06:13<00:47,  2.94it/s]

Training Loss: 0.9430335760116577


 89%|████████▉ | 1111/1250 [06:14<00:47,  2.94it/s]

Training Loss: 1.0583635568618774


 89%|████████▉ | 1112/1250 [06:14<00:46,  2.94it/s]

Training Loss: 1.075431227684021


 89%|████████▉ | 1113/1250 [06:14<00:46,  2.93it/s]

Training Loss: 0.8386542797088623


 89%|████████▉ | 1114/1250 [06:15<00:46,  2.94it/s]

Training Loss: 0.9210817813873291


 89%|████████▉ | 1115/1250 [06:15<00:45,  2.94it/s]

Training Loss: 0.9241062998771667


 89%|████████▉ | 1116/1250 [06:16<00:45,  2.93it/s]

Training Loss: 1.1804766654968262


 89%|████████▉ | 1117/1250 [06:16<00:45,  2.93it/s]

Training Loss: 1.822689175605774


 89%|████████▉ | 1118/1250 [06:16<00:45,  2.93it/s]

Training Loss: 1.2572438716888428


 90%|████████▉ | 1119/1250 [06:17<00:44,  2.92it/s]

Training Loss: 1.2447885274887085


 90%|████████▉ | 1120/1250 [06:17<00:44,  2.93it/s]

Training Loss: 1.0421535968780518


 90%|████████▉ | 1121/1250 [06:17<00:44,  2.93it/s]

Training Loss: 1.1777029037475586


 90%|████████▉ | 1122/1250 [06:18<00:43,  2.94it/s]

Training Loss: 1.2690783739089966


 90%|████████▉ | 1123/1250 [06:18<00:43,  2.93it/s]

Training Loss: 1.4033002853393555


 90%|████████▉ | 1124/1250 [06:18<00:43,  2.92it/s]

Training Loss: 1.2221254110336304


 90%|█████████ | 1125/1250 [06:19<00:42,  2.93it/s]

Training Loss: 0.9467998743057251


 90%|█████████ | 1126/1250 [06:19<00:42,  2.94it/s]

Training Loss: 1.6033486127853394


 90%|█████████ | 1127/1250 [06:19<00:41,  2.94it/s]

Training Loss: 1.2014355659484863


 90%|█████████ | 1128/1250 [06:20<00:41,  2.96it/s]

Training Loss: 1.191552996635437


 90%|█████████ | 1129/1250 [06:20<00:40,  2.95it/s]

Training Loss: 1.450253963470459


 90%|█████████ | 1130/1250 [06:20<00:41,  2.92it/s]

Training Loss: 1.0728598833084106


 90%|█████████ | 1131/1250 [06:21<00:41,  2.90it/s]

Training Loss: 1.1696174144744873


 91%|█████████ | 1132/1250 [06:21<00:41,  2.87it/s]

Training Loss: 1.2311384677886963


 91%|█████████ | 1133/1250 [06:21<00:40,  2.89it/s]

Training Loss: 1.422743797302246


 91%|█████████ | 1134/1250 [06:22<00:40,  2.89it/s]

Training Loss: 1.1079745292663574


 91%|█████████ | 1135/1250 [06:22<00:39,  2.91it/s]

Training Loss: 1.2631756067276


 91%|█████████ | 1136/1250 [06:22<00:39,  2.92it/s]

Training Loss: 1.4300330877304077


 91%|█████████ | 1137/1250 [06:23<00:38,  2.91it/s]

Training Loss: 1.2586700916290283


 91%|█████████ | 1138/1250 [06:23<00:38,  2.92it/s]

Training Loss: 0.8260996341705322


 91%|█████████ | 1139/1250 [06:23<00:37,  2.94it/s]

Training Loss: 1.0545709133148193


 91%|█████████ | 1140/1250 [06:24<00:37,  2.96it/s]

Training Loss: 1.5633623600006104


 91%|█████████▏| 1141/1250 [06:24<00:37,  2.93it/s]

Training Loss: 1.537796139717102


 91%|█████████▏| 1142/1250 [06:24<00:36,  2.93it/s]

Training Loss: 1.6627578735351562


 91%|█████████▏| 1143/1250 [06:25<00:36,  2.94it/s]

Training Loss: 1.1664559841156006


 92%|█████████▏| 1144/1250 [06:25<00:36,  2.94it/s]

Training Loss: 1.1049752235412598


 92%|█████████▏| 1145/1250 [06:25<00:35,  2.95it/s]

Training Loss: 1.7447091341018677


 92%|█████████▏| 1146/1250 [06:26<00:35,  2.95it/s]

Training Loss: 1.4879326820373535


 92%|█████████▏| 1147/1250 [06:26<00:35,  2.94it/s]

Training Loss: 1.7658183574676514


 92%|█████████▏| 1148/1250 [06:26<00:34,  2.93it/s]

Training Loss: 1.212457537651062


 92%|█████████▏| 1149/1250 [06:27<00:34,  2.93it/s]

Training Loss: 1.0787917375564575


 92%|█████████▏| 1150/1250 [06:27<00:34,  2.93it/s]

Training Loss: 1.0872701406478882


 92%|█████████▏| 1151/1250 [06:27<00:34,  2.91it/s]

Training Loss: 1.2988499402999878


 92%|█████████▏| 1152/1250 [06:28<00:33,  2.93it/s]

Training Loss: 0.9732469916343689


 92%|█████████▏| 1153/1250 [06:28<00:33,  2.93it/s]

Training Loss: 1.1717846393585205


 92%|█████████▏| 1154/1250 [06:28<00:32,  2.93it/s]

Training Loss: 0.9355360865592957


 92%|█████████▏| 1155/1250 [06:29<00:32,  2.94it/s]

Training Loss: 0.9769835472106934


 92%|█████████▏| 1156/1250 [06:29<00:32,  2.94it/s]

Training Loss: 1.1296321153640747


 93%|█████████▎| 1157/1250 [06:30<00:31,  2.94it/s]

Training Loss: 1.0916225910186768


 93%|█████████▎| 1158/1250 [06:30<00:31,  2.95it/s]

Training Loss: 1.2152140140533447


 93%|█████████▎| 1159/1250 [06:30<00:31,  2.93it/s]

Training Loss: 1.6041204929351807


 93%|█████████▎| 1160/1250 [06:31<00:30,  2.94it/s]

Training Loss: 1.194273829460144


 93%|█████████▎| 1161/1250 [06:31<00:30,  2.91it/s]

Training Loss: 1.217383623123169


 93%|█████████▎| 1162/1250 [06:31<00:30,  2.88it/s]

Training Loss: 1.3630032539367676


 93%|█████████▎| 1163/1250 [06:32<00:29,  2.91it/s]

Training Loss: 0.9756889343261719


 93%|█████████▎| 1164/1250 [06:32<00:30,  2.85it/s]

Training Loss: 1.3466178178787231


 93%|█████████▎| 1165/1250 [06:32<00:29,  2.87it/s]

Training Loss: 1.1108759641647339


 93%|█████████▎| 1166/1250 [06:33<00:29,  2.86it/s]

Training Loss: 1.4535099267959595


 93%|█████████▎| 1167/1250 [06:33<00:28,  2.88it/s]

Training Loss: 1.4440447092056274


 93%|█████████▎| 1168/1250 [06:33<00:28,  2.89it/s]

Training Loss: 1.4592382907867432


 94%|█████████▎| 1169/1250 [06:34<00:27,  2.92it/s]

Training Loss: 1.0356714725494385


 94%|█████████▎| 1170/1250 [06:34<00:27,  2.91it/s]

Training Loss: 1.1015613079071045


 94%|█████████▎| 1171/1250 [06:34<00:27,  2.91it/s]

Training Loss: 1.4655966758728027


 94%|█████████▍| 1172/1250 [06:35<00:26,  2.92it/s]

Training Loss: 0.9902026653289795


 94%|█████████▍| 1173/1250 [06:35<00:26,  2.93it/s]

Training Loss: 1.3365564346313477


 94%|█████████▍| 1174/1250 [06:35<00:25,  2.94it/s]

Training Loss: 1.050915241241455


 94%|█████████▍| 1175/1250 [06:36<00:25,  2.94it/s]

Training Loss: 1.2152999639511108


 94%|█████████▍| 1176/1250 [06:36<00:25,  2.94it/s]

Training Loss: 0.99932461977005


 94%|█████████▍| 1177/1250 [06:36<00:24,  2.93it/s]

Training Loss: 1.110901951789856


 94%|█████████▍| 1178/1250 [06:37<00:24,  2.93it/s]

Training Loss: 0.8682999014854431


 94%|█████████▍| 1179/1250 [06:37<00:24,  2.92it/s]

Training Loss: 1.4779577255249023


 94%|█████████▍| 1180/1250 [06:37<00:24,  2.90it/s]

Training Loss: 0.9679760932922363


 94%|█████████▍| 1181/1250 [06:38<00:23,  2.91it/s]

Training Loss: 0.8800005912780762


 95%|█████████▍| 1182/1250 [06:38<00:23,  2.90it/s]

Training Loss: 1.0551362037658691


 95%|█████████▍| 1183/1250 [06:38<00:23,  2.89it/s]

Training Loss: 1.273194432258606


 95%|█████████▍| 1184/1250 [06:39<00:22,  2.89it/s]

Training Loss: 1.4983397722244263


 95%|█████████▍| 1185/1250 [06:39<00:22,  2.90it/s]

Training Loss: 1.0329124927520752


 95%|█████████▍| 1186/1250 [06:39<00:21,  2.91it/s]

Training Loss: 1.430096983909607


 95%|█████████▍| 1187/1250 [06:40<00:21,  2.94it/s]

Training Loss: 1.278496503829956


 95%|█████████▌| 1188/1250 [06:40<00:21,  2.93it/s]

Training Loss: 1.2371597290039062


 95%|█████████▌| 1189/1250 [06:41<00:20,  2.95it/s]

Training Loss: 0.940095841884613


 95%|█████████▌| 1190/1250 [06:41<00:20,  2.94it/s]

Training Loss: 1.2835755348205566


 95%|█████████▌| 1191/1250 [06:41<00:19,  2.96it/s]

Training Loss: 1.2782493829727173


 95%|█████████▌| 1192/1250 [06:42<00:19,  2.94it/s]

Training Loss: 1.1765846014022827


 95%|█████████▌| 1193/1250 [06:42<00:19,  2.95it/s]

Training Loss: 1.0919976234436035


 96%|█████████▌| 1194/1250 [06:42<00:19,  2.95it/s]

Training Loss: 1.078258991241455


 96%|█████████▌| 1195/1250 [06:43<00:18,  2.95it/s]

Training Loss: 1.1347335577011108


 96%|█████████▌| 1196/1250 [06:43<00:18,  2.94it/s]

Training Loss: 1.1845343112945557


 96%|█████████▌| 1197/1250 [06:43<00:18,  2.93it/s]

Training Loss: 1.016280174255371


 96%|█████████▌| 1198/1250 [06:44<00:17,  2.89it/s]

Training Loss: 1.1245200634002686


 96%|█████████▌| 1199/1250 [06:44<00:17,  2.90it/s]

Training Loss: 1.2213138341903687


 96%|█████████▌| 1200/1250 [06:44<00:17,  2.90it/s]

Training Loss: 1.2747300863265991


 96%|█████████▌| 1201/1250 [06:45<00:16,  2.89it/s]

Training Loss: 0.8892884850502014


 96%|█████████▌| 1202/1250 [06:45<00:16,  2.92it/s]

Training Loss: 1.246507167816162


 96%|█████████▌| 1203/1250 [06:45<00:16,  2.92it/s]

Training Loss: 1.243906855583191


 96%|█████████▋| 1204/1250 [06:46<00:15,  2.93it/s]

Training Loss: 1.5029476881027222


 96%|█████████▋| 1205/1250 [06:46<00:15,  2.93it/s]

Training Loss: 1.0102795362472534


 96%|█████████▋| 1206/1250 [06:46<00:15,  2.92it/s]

Training Loss: 1.0902714729309082


 97%|█████████▋| 1207/1250 [06:47<00:14,  2.92it/s]

Training Loss: 1.1905009746551514


 97%|█████████▋| 1208/1250 [06:47<00:14,  2.91it/s]

Training Loss: 1.1777280569076538


 97%|█████████▋| 1209/1250 [06:47<00:14,  2.91it/s]

Training Loss: 0.9985678195953369


 97%|█████████▋| 1210/1250 [06:48<00:13,  2.92it/s]

Training Loss: 1.0989378690719604


 97%|█████████▋| 1211/1250 [06:48<00:13,  2.92it/s]

Training Loss: 1.131027340888977


 97%|█████████▋| 1212/1250 [06:48<00:13,  2.91it/s]

Training Loss: 0.9612829089164734


 97%|█████████▋| 1213/1250 [06:49<00:12,  2.91it/s]

Training Loss: 1.6478018760681152


 97%|█████████▋| 1214/1250 [06:49<00:12,  2.90it/s]

Training Loss: 0.8910325765609741


 97%|█████████▋| 1215/1250 [06:49<00:12,  2.91it/s]

Training Loss: 1.2519440650939941


 97%|█████████▋| 1216/1250 [06:50<00:11,  2.92it/s]

Training Loss: 1.1568822860717773


 97%|█████████▋| 1217/1250 [06:50<00:11,  2.92it/s]

Training Loss: 1.1497914791107178


 97%|█████████▋| 1218/1250 [06:50<00:10,  2.93it/s]

Training Loss: 1.148202657699585


 98%|█████████▊| 1219/1250 [06:51<00:10,  2.96it/s]

Training Loss: 1.1130049228668213


 98%|█████████▊| 1220/1250 [06:51<00:10,  2.95it/s]

Training Loss: 0.9311742782592773


 98%|█████████▊| 1221/1250 [06:51<00:09,  2.95it/s]

Training Loss: 1.0429246425628662


 98%|█████████▊| 1222/1250 [06:52<00:09,  2.95it/s]

Training Loss: 0.8338204026222229


 98%|█████████▊| 1223/1250 [06:52<00:09,  2.94it/s]

Training Loss: 1.387909173965454


 98%|█████████▊| 1224/1250 [06:52<00:08,  2.92it/s]

Training Loss: 1.4790794849395752


 98%|█████████▊| 1225/1250 [06:53<00:08,  2.93it/s]

Training Loss: 1.011793851852417


 98%|█████████▊| 1226/1250 [06:53<00:08,  2.92it/s]

Training Loss: 1.005881905555725


 98%|█████████▊| 1227/1250 [06:53<00:07,  2.93it/s]

Training Loss: 1.4979697465896606


 98%|█████████▊| 1228/1250 [06:54<00:07,  2.95it/s]

Training Loss: 1.1763578653335571


 98%|█████████▊| 1229/1250 [06:54<00:07,  2.93it/s]

Training Loss: 1.1039655208587646


 98%|█████████▊| 1230/1250 [06:55<00:06,  2.93it/s]

Training Loss: 1.1438450813293457


 98%|█████████▊| 1231/1250 [06:55<00:06,  2.92it/s]

Training Loss: 1.117620825767517


 99%|█████████▊| 1232/1250 [06:55<00:06,  2.88it/s]

Training Loss: 1.2909060716629028


 99%|█████████▊| 1233/1250 [06:56<00:05,  2.88it/s]

Training Loss: 0.998123049736023


 99%|█████████▊| 1234/1250 [06:56<00:05,  2.90it/s]

Training Loss: 1.324944019317627


 99%|█████████▉| 1235/1250 [06:56<00:05,  2.91it/s]

Training Loss: 1.121395230293274


 99%|█████████▉| 1236/1250 [06:57<00:04,  2.93it/s]

Training Loss: 0.8999255895614624


 99%|█████████▉| 1237/1250 [06:57<00:04,  2.92it/s]

Training Loss: 1.169169545173645


 99%|█████████▉| 1238/1250 [06:57<00:04,  2.94it/s]

Training Loss: 0.893952488899231


 99%|█████████▉| 1239/1250 [06:58<00:03,  2.95it/s]

Training Loss: 1.2125452756881714


 99%|█████████▉| 1240/1250 [06:58<00:03,  2.94it/s]

Training Loss: 1.3743743896484375


 99%|█████████▉| 1241/1250 [06:58<00:03,  2.94it/s]

Training Loss: 0.9019354581832886


 99%|█████████▉| 1242/1250 [06:59<00:02,  2.93it/s]

Training Loss: 1.086455225944519


 99%|█████████▉| 1243/1250 [06:59<00:02,  2.94it/s]

Training Loss: 1.5478413105010986


100%|█████████▉| 1244/1250 [06:59<00:02,  2.94it/s]

Training Loss: 1.1784510612487793


100%|█████████▉| 1245/1250 [07:00<00:01,  2.94it/s]

Training Loss: 1.4485974311828613


100%|█████████▉| 1246/1250 [07:00<00:01,  2.93it/s]

Training Loss: 1.4412628412246704


100%|█████████▉| 1247/1250 [07:00<00:01,  2.94it/s]

Training Loss: 1.7788310050964355


100%|█████████▉| 1248/1250 [07:01<00:00,  2.94it/s]

Training Loss: 1.0598869323730469


100%|█████████▉| 1249/1250 [07:01<00:00,  2.94it/s]

Training Loss: 1.1596072912216187


100%|██████████| 1250/1250 [07:01<00:00,  2.96it/s]

Training complete!





In [28]:
# 2. find 10 nearest words from "آزادی"

word = 'آزادی'
# Get the embedding of the word
with torch.no_grad():
    word_embedding = model(tokenizer.encode(word, return_tensors='pt').to(device))[0][0, :].cpu().numpy()

similar_words = []
# Calculate the cosine similarity
# checking all other words
for word in tqdm(tokenizer.get_vocab()):
    with torch.no_grad():
        word_embedding = model(tokenizer.encode(word, return_tensors='pt').to(device))[0][0, :].cpu().numpy()
    similar_word = 1 - cosine_similarity(word_embedding, word_embedding)
    similar_words.append((word, similar_word))

# Sort the found words by similarity
sorted_similar = sorted(similar_words, key=lambda x: x[1], reverse=True)

print(sorted_similar)
# Get the top 10 similar words
top_10_similars = sorted_similar[:10]

print(top_10_similars)

100%|██████████| 30522/30522 [05:51<00:00, 86.86it/s]

[('[unused266]', 2.384185791015625e-07), ('[unused549]', 2.384185791015625e-07), ('[unused573]', 2.384185791015625e-07), (';', 2.384185791015625e-07), ('ན', 2.384185791015625e-07), ('》', 2.384185791015625e-07), ('古', 2.384185791015625e-07), ('田', 2.384185791015625e-07), ('白', 2.384185791015625e-07), ('禾', 2.384185791015625e-07)]


##### Describe advantages and disadvantages of Contextualized embedding

#Advantages:

BERT works well for task-specific models. The state of the art model, BERT, has been trained on a large corpus, making it easier for smaller, more defined nlp tasks.

Metrics can be fine-tuned and be used immediately.

The accuracy of the model is outstanding because it is frequently updated. You can achieve this with successful fine-tuning training.

The BERT model is available and pre-trained in more than 100 languages. This can be useful for projects that are not English-based.

#Disadvantages:
The main drawbacks of using BERT and other big neural language models is the computational resources needed to train/fine-tune and make inferences.

Most of the drawbacks of BERT can be linked to its size. While training the data on a large corpus significantly helps how the computer predicts and learns, there is also another side to it. They include:

The model is large because of the training structure and corpus.

It is slow to train because it is big and there are a lot of weights to update.

It is expensive. It requires more computation because of its size, which comes at a cost.

Disadvantages:
