In [1]:
import sys
sys.path.append('../core')

In [2]:
from transformers import GPT2Tokenizer, AutoModelForCausalLM

import time
from typing import Dict, List

import numpy as np
import torch
import torch.nn.functional as F
from data_utils import format_time, save_stats
from dataloader import create_bert_dataloaders
from dataset_loader import dataset_loader
from torch.utils.data import DataLoader
from models.bert_discriminator import BERTDiscriminator, model_name
from transformers import AutoTokenizer
from util.early_stopping import EarlyStopping



In [3]:
dataset = 'aclImdb_001'
train_sentences, train_labels, _, _ = dataset_loader.load_dataset(dataset)

In [4]:
import random
positive_sentences = [sentence for sentence, label in zip(train_sentences, train_labels) if label == '1']
negative_sentences = [sentence for sentence, label in zip(train_sentences, train_labels) if label == '0']
print('Positive')
for i in range(1, 5):
    sentence = random.choice(positive_sentences)
    print(f'{i}. {sentence}')

print('Negative')
for i in range(1, 5):
    sentence = random.choice(negative_sentences)
    print(f'{i}. {sentence}')

Positive
1. A typical romp through Cheech and Chong's reality which includes drugs, singing, more drugs, cars and driving, even more drugs, Pee Wee, aliens, gasoline, laundry, stand up comedy, surprisingly more drugs and SPACE COKE !!. It is not as coherent or plausible as Up in Smoke but it still is incredibly funny, without becoming as strange as Nice Dreams. There are some classic scenes, which include the opening scene where they get some gas for their car and the drive to work. Also funny is Cheech's song (Mexican-Americans) and Chong's follow up song. Another notable scene is the welfare office scene with Jones (human noise machine), from the Police Academy series, and the old laughing man. All in all, this is a great follow up to Up in Smoke and is quite watchable when sober or not.<br /><br />-Celluloid Rehab
2. This film seems to be well remembered as the time Tom & Jerry signed a peace treaty. Things are idyllic for a time but, predictably, it goes sour. Probably the most mem

In [5]:
gpt_tokenizer = GPT2Tokenizer.from_pretrained('gpt2', padding_side='left')
generator = AutoModelForCausalLM.from_pretrained('gpt2')
gpt_tokenizer.pad_token = gpt_tokenizer.eos_token
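
Decoder-only models such as GPT-2 continue from the last position of every row, which is why the tokenizer above is created with `padding_side='left'`. The sketch below illustrates the idea in plain Python. `left_pad` is a hypothetical helper written for this note, not a `transformers` function, and the token ids are made up.

```python
from typing import List, Tuple

EOS_ID = 50256  # GPT-2's end-of-text id, reused here as the padding id


def left_pad(batch: List[List[int]], pad_id: int = EOS_ID) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad every sequence on the left to the length of the longest one.

    Returns the padded ids and a matching attention mask (0 marks padding).
    """
    width = max(len(seq) for seq in batch)
    ids = [[pad_id] * (width - len(seq)) + seq for seq in batch]
    mask = [[0] * (width - len(seq)) + [1] * len(seq) for seq in batch]
    return ids, mask


ids, mask = left_pad([[11, 12, 13], [21]])
print(ids)   # the shorter row gets its padding in front
print(mask)  # zeros line up with the padded positions
```

With left padding the final column of every row is a real prompt token, so generation starts right after the prompt for all rows in the batch.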

## Prompts

### SUBJ Prompt

In [6]:
# objetive_prompt = '''Here are 3 examples of objective movie reviews:

# 1. it is a study of dark forces lurking in the lives of teenagers today . \n
# 2. during the course of the story we also learn that his father died at age 40 ; and now , as jones approaches his 40th birthday , he suffers from " survivor\'s guilt . " \n
# 3. '''

# subjective_prompt = '''Here are 3 examples of subjective movie reviews:

# 1. few films seem so wise and knowing about the fact of age and the approach of the end . \n
# 2. fessenden continues to do interesting work , and it would be nice to see what he could make with a decent budget . but the problem with wendigo , for all its effective moments , isn't really one of resources . \n
# 3. '''

prompt = '''Here are exactly 5 movie reviews written in movie forums, some are objective and some are subjective:

1. has a shambling charm . . . a cheerfully inconsequential diversion . \n

2. as violent , profane and exploitative as the most offensive action flick you've ever seen . \n

3. she is intrigued by his knowledge of shakespeare , manner of living and the fifteen perfectly organized bags beneath his bench . \n

4. when mr . hundert tells us in his narration that 'this is a story without surprises , ' we nod in agreement . \n

5.'''

### AclImdb Prompt

In [6]:
# positive_prompt = "Here are 3 positive movie review from IMDB website written in a user post:\n 1. For a movie that gets no respect there sure are a lot of memorable quotes listed for this gem. Imagine a movie where Joe Piscopo is actually funny! Maureen Stapleton is a scene stealer. The Moroni character is an absolute scream. Watch for Alan \"The Skipper\" Hale jr. as a police Sgt\n 2. A solid, if unremarkable film. Matthau, as Einstein, was wonderful. My favorite part, and the only thing that would make me go out of my way to see this again, was the wonderful scene with the physicists playing badmitton, I loved the sweaters and the conversation while they waited for Robbins to retrieve the birdie.\n 3. "
# negative_prompt = "Here are 3 negative movie review from IMDB website written in a user post:\n 1. Wow! I remember so many awful films that loosely revolved around high school from the early 1980s. They usually had someincredibly strained plot and lots of 27 year old actors pretending to be students. As I watched this film I felt a little of the nostalgia of growing up in the 1980s. However, then I find out that this film was made in 1989? Say what! Well, the nostalgia factor ends right there, this is just bad. The plot has the city preparing to close a high school and threatening to bus all of the students to inner city high schools. Which is odd, in that the students at this school are both wealthy and abundant. In fact, the main character lives in a mansion. Makes you wonder how they cannot find money to keep this school alive, have they never heard of property taxes. Oh, but here is the kicker. The school board says that they will keep the school alive, if the students can raise $200,000. So the seniors go about doing this. Hmmm, you raise $200,000 but instead of saving that for college, you put it towards saving the high school that you are a Senior in? And why exactly would they close an overpopulated school before the year is out? And...ahh forget it, this film was stupid and made in 1989!?\n 2. What was Steven Seagal thinking? I mean firstly I love Seagal. I love all his movies up to the mid 2000s. His early stuff is some of the best in the genre. This however does not live up to its excellent name. Attack Force (with protagonist Marshall Lawson {Seagal}) would be expected to be a mindless action movie with Seagal in typical one-liner ass kicking form. However, what we get is a crime mystery, bordering on a political thriller with little or no action. Seagal is always in shadows because of his weight. I could not follow this story. There\'s people who mutate to superhumans when they take a drug. What happened in this movie. 
The dubbing of Seagal is a disgrace, a shambles and a shame. Why dub the man? The story is terrible. This got a 2/10 from me because of the scene where Seagal asks for backup despite having an army with him, and an hilarious fight scene where seagal swings his hands like a girl facing the camera! \"Revenge is a two way street\" seagal says in this movie...well forget revenge Steven, you need redemption!\n 3. "

positive_prompt = '''Here are exactly 5 movie reviews from IMDB, these are positive review written by users:

1. Meryl Streep is excellent in her nuanced and stoic performance as the infamous Lindy Chamberlain who was accused and tried for allegedly killing her own baby Azaria Chamberlain and using her alibi of ravenous dingoes as her defense. Based on the book "Evil Angels" and titled so in its Australian release, A CRY IN THE DARK is an ugly film to watch. It presents a scenario that's all too real for us in America: the witch-hunt against a person deemed an easy target.
2. *some spoilers*<br /><br />I was pleasantly surprised to find the harsh criticisms (acting, dated dialogue, unclear storyline) unfounded. Belafonte is great as a Brandoesque, menacing, swearing spirit who must earn his wings but is realistically ill-equipped from his past life to do so. He learns too late how empty his hustling, materialistic life was without love. Mostel is likewise great as an anguished man with his dying wife Fanny.
3. This is not "so bad that it is good," it is purely good! For those who don't understand why, you have the intellect of a four year old (in response to a certain comment...) Anyways, Killer Tomatoes Eat France is a parody of itself, a parody of you, and a parody of me. It is the single most genius text in cinematic history. I have it and the three prequels sitting on my DVD rack next to Herzog and Kurosawa. It embodies the recognition of absurdity and undermines all that you or me call standard.
4. Fabulous, fantastic, probably Disney's best musical adventure. I have loved this film for over 35 years because it is so imaginative, clever and fun. Even despite the silly "flying bed" scenes, the other scenes and dialog are magical and funny. Could they have picked anyone better than Angela Lansbury to play Eglantine? I cannot think of anyone more suited to the role. Remaking this classic would be as stupid as remaking Mary Poppins.
5.'''

negative_prompt = '''Here are exactly 5 movie reviews from IMDB, these are negative review written by users:

1. I am not a big fan of horror films, and have only seen a handful of them (and none of the "Halloween"s or "Friday the Thirteenth"s) - but I can appreciate a frightening horror film not because of gore. And I'm pretty sure this isn't scary.
2. Or "Marlowe At Sea". Yet another ridiculously overrated old film with Bogey. Quite talky, too. Bogey basically plays the same character as in the Marlow films; always in control of a situation, never nervous - no matter how dangerous a situation, calls women "slim" and "dames" and other such nonsense, is the only "real male" i.e. alpha male in the movie (the only other alpha male male being the head of Gestapo - but he is only a fat alpha male male), and - naturally - every attractive young woman who comes his way cannot resist his charms and wants his penis within hours of their initial introduction. The character clichés are all here.
3. This installment of Masters of Horror was terrible. Apparently, Mr. Carpenter needs to learn a thing or two about pacing and decent, plausible dialog. There were times when I literally shouted at the TV for something to happen. Maybe he thinks he building suspense, but Carpenter needs to trim back that overdone, over-simplified musical score of his (or his son's) and advance the action a little bit. How many times did the girl say, "Oh no, I can't have this baby!" and "Oh, no here it comes"?
4. Cradle of Fear<br /><br />This isn't a movie where intricate delicate little narrative nuances occupy our attention. This is not a film where the special effects are supposed to leave us slack-jacked uttering that sense of whoa. What it is though is a slice of lo-fi goth horror which leaves little to the imagination, created in the eyes of the director, Alex Chandon, as "a throwback to sleazy '70s and '80s horror".
5.'''

### Helpdesk Prompt

In [9]:
# prompt = "Here is an example of helpdesk emails texts about one of the topics General Inquiry, Human Resources, Billing and Payments, Sales and Pre-Sales, IT Support, Customer Service, Product Support, Returns and Exchanges, Service Outages and Maintenance or Technical Support: \n\n"
prompt = '''Here are 3 examples of helpdesk emails texts about one of the topics General Inquiry, Human Resources, Billing and Payments, Sales and Pre-Sales, IT Support, Customer Service, Product Support, Returns and Exchanges, Service Outages and Maintenance or Technical Support:

1. Sehr geehrtes Support-Team des Tech Online Stores,\n\nich interessiere mich für den Kauf eines MacBook Air M1 und hätte gerne detaillierte Spezifikationen sowie Informationen zu den verfügbaren Anpassungsoptionen. Könnten Sie mir bitte diese Informationen zur Verfügung stellen?\n\nVielen Dank für Ihre Unterstützung.\n\nMit freundlichen Grüßen,\n<name>
2. Le client signale des déconnexions fréquentes et des plantages lors des réunions vidéo utilisant Zoom 5.11.0. Veuillez enquêter. Merci.
3. 
'''


### Turkish Product Reviews Prompt

In [10]:
positive_prompt = '''Here are 3 examples of positive product reviews in turkish:

1. aldığıma pişman degilim açıklamada bahsedildiği gibi  içiniz rahat alabilirsiniz
2. açılırken tutukluk yapabiliyor ama sağlam, işe yarar bir ürün.
3. '''
negative_prompt = '''Here are 3 examples of negative product reviews in turkish:

1. ayakkabı tabanını gereğinden fazla yükseltiyor. eğer ayakkabı ayağınıza büyük geliyorsa tam aradığınız ürün. yarardan çok zararını gördüm diyebilirim, günlük kullandığım ayakkabıları rahat giyemiyorum artık.
2. bez siparişimiz, her zaman ertesi gün elimize ulaşıyor, bu konuda sıkıntı yok. ancak aylardır kullandığımız ve hiç sorun yaşamadığımız 5 numara prima'nın yeni içeriğini hiç beğenmedik. çocukta ileri derecede pişik yaptı. sanırım bez markasını deiğiştirme vakti geldi.
3. '''

### MTOD (th) Prompt

In [11]:
prompt = '''Here are 4 examples of sentences in Thai about "alarm", "reminder" or "weather":

1. หยุดนาฬิกาปลุก
2. ความชื้นของวันนี้คืออะไร?
3. ปิดเตือนความจำทั้งหมดสำหรับสุดสัปดาห์นี้
4. '''

## Sample Generation

In [7]:
# batch_input = [prompt]
batch_input = [positive_prompt, negative_prompt]
# batch_input = [subjective_prompt, objetive_prompt]
encoded_input = gpt_tokenizer(batch_input, return_tensors='pt', padding=True)
output = generator.generate(**encoded_input, temperature=0.9, do_sample=True, max_new_tokens=400)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [8]:
gpt_tokenizer.batch_decode(output)

['<|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|>Here are exactly 5 movie reviews from IMDB, these are positive review written by users:\n\n1. Meryl Streep is excellent in her nuanced and stoic performance as the infamous Lindy Chamberlain who was accused and tried for allegedly killing her own baby Azaria Chamberlain and using her alibi of ravenous dingoes as her defense. Based on the book "Evil Angels" and titled so in its Australian release, A CRY IN THE DARK is an ugly film to watch. It presents a scenario that\'s all too real for us in America: the witch-hunt against a person deemed an easy target.\n2. *some spoilers*<br /><br />I was pleasantly surprised to find the harsh criticisms (acting, dated dialogue, unclear storyline) unfounded. Belafonte is great as a Brandoesque, menacing, swearing spirit who must earn his wings but is realistically ill-equipped from his past life to do so. H

# Training

Parameters

In [9]:
print_each_n_step = 50
num_train_epochs = 50
noise_size = 1
batch_size = 8
epsilon = 1e-8
initial_temp = 1.0
anneal_rate = 0.95
min_temp = 0.1
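
The values `initial_temp`, `anneal_rate` and `min_temp` are declared above but not used in the cells shown here. One common way such parameters are combined is an exponential decay with a floor. The sketch below assumes that schedule; `annealed_temperature` is a hypothetical name and the per-epoch decay is an assumption, not something the notebook confirms.

```python
def annealed_temperature(
    epoch: int,
    initial_temp: float = 1.0,
    anneal_rate: float = 0.95,
    min_temp: float = 0.1,
) -> float:
    """Exponentially decay the temperature each epoch, never going below min_temp."""
    return max(min_temp, initial_temp * anneal_rate ** epoch)


# Temperature for each of the 50 training epochs configured above.
schedule = [annealed_temperature(epoch) for epoch in range(50)]
print(schedule[:3])
print(schedule[-1])
```

A value produced this way could be passed as `temperature=` to `generator.generate(...)` in place of a fixed constant, if annealing the sampling temperature is the intent.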

Device setup

In [10]:
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prefer Apple's MPS backend, then CUDA, and fall back to the CPU otherwise.
if torch.backends.mps.is_available():
    print('Using MPS backend')
    device = torch.device('mps')
elif torch.cuda.is_available():
    # Tell PyTorch to use the GPU.
    device = torch.device("cuda")
    print('There are %d GPU(s) available.' % torch.cuda.device_count())
    print('We will use the GPU:', torch.cuda.get_device_name(0))
# If not...
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")


Using MPS backend


In [11]:
labels = dataset_loader.get_labels(dataset)

train_dataloader, test_dataloader, seq_size = create_bert_dataloaders(dataset, batch_size=batch_size, device=device, tokenizer=tokenizer)

# Models
discriminator = BERTDiscriminator(1, seq_size, device, num_labels=len(labels))

print(generator)
print('generator parameters: ' + str(sum(p.numel() for p in generator.parameters() if p.requires_grad)))
print(discriminator)
print('discriminator parameters: ' + str(sum(p.numel() for p in discriminator.parameters() if p.requires_grad)))

# `device` already points at MPS, CUDA or the CPU, so one move per model is enough.
generator.to(device)
discriminator.to(device)

# Training
training_stats = []

g_vars = [v for v in generator.parameters()]
d_vars = [v for v in discriminator.parameters()]

gen_optimizer = torch.optim.AdamW(g_vars, lr=5e-5)
dis_optimizer = torch.optim.AdamW(d_vars, lr=5e-5)

early_stopping = EarlyStopping(patience=5, min_delta=0.001, verbose=True)

Using dataset aclImdb_001
GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)
generator parameters: 12443
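
The `EarlyStopping(patience=5, min_delta=0.001, verbose=True)` helper configured above comes from `util.early_stopping`, whose source is not shown in this notebook. The sketch below is one plausible reading of it, assuming the tracked score is an accuracy to be maximised. The class name `EarlyStoppingSketch` and its internals are assumptions, and the real helper may behave differently.

```python
from typing import Optional


class EarlyStoppingSketch:
    """Stop when the score has not improved by more than min_delta for `patience` calls."""

    def __init__(self, patience: int = 5, min_delta: float = 0.001) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_score: Optional[float] = None
        self.counter = 0
        self.early_stop = False

    def __call__(self, score: float) -> None:
        # The first score, or a large enough improvement, resets the counter.
        if self.best_score is None or score > self.best_score + self.min_delta:
            self.best_score = score
            self.counter = 0
            return
        # Otherwise count one more epoch without improvement.
        self.counter += 1
        if self.counter >= self.patience:
            self.early_stop = True


stopper = EarlyStoppingSketch(patience=2, min_delta=0.001)
for accuracy in [0.500, 0.500, 0.503, 0.497, 0.499]:
    stopper(accuracy)
print(stopper.best_score, stopper.early_stop)
```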

Function to generate fake examples

In [13]:
# positive_prompt = subjective_prompt
# negative_prompt = objetive_prompt
positive_prompt_size = len(positive_prompt)
negative_prompt_size = len(negative_prompt)
# prompt_size = len(prompt)
prompts = [positive_prompt, negative_prompt] * (batch_size // 2)
# prompts = [prompt] * batch_size
encoded_input = gpt_tokenizer(prompts, return_tensors='pt', padding=True)
encoded_input.to(device)

def generate_fake() -> list[str]:
    output = generator.generate(**encoded_input, temperature=0.6, do_sample=True, max_new_tokens=400)
    texts = gpt_tokenizer.batch_decode(output, skip_special_tokens=True)
    samples =[]
    # for i in range(0, len(texts)):
    #     samples.append(texts[i][prompt_size:])
    for i in range(0, len(texts), 2):
        samples.append(texts[i][positive_prompt_size:])
        samples.append(texts[i+1][negative_prompt_size:])
    return samples

generate_fake()

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.




[' This is the first film that I have not seen in a movie review. I am not a fan of reviews, so I can\'t comment on the reviews here. I am very happy with it.\n6. It has been said that the most memorable moment of the movie is when a man is shot dead by a mob of people, who will have been so horrified by the crime that they will have jumped from the car and run away. This is a very interesting scene. It is a very sad scene, and I am not sure when it started, but it was very funny at the time. The movie is, and will remain, a story about the life of a man who has been shot in the head.\n7. There is no good way to describe it. The only way to describe it is that this is a movie that should be watched. It is the most entertaining and most beautiful movie ever made. It is the first of its kind.\n8. The final part of the film is a beautiful, beautiful scene. It is a beautiful, beautiful movie.\n9. The last scene is the film that I think most people will remember. It is the last scene that i
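
The sample above keeps going after the fifth review ("6. ...", "7. ..."), so each fake example handed to the discriminator contains several generated reviews. If a single review per sample is wanted, the continuation can be cut at the next numbered item. This is a minimal sketch of that optional post-processing step. `first_continuation` is a hypothetical helper and the prompt in the usage lines is a shortened stand-in, not the notebook's prompt.

```python
import re


def first_continuation(decoded: str, prompt: str) -> str:
    """Drop the prompt and keep only the text up to the next numbered list item."""
    if not decoded.startswith(prompt):
        raise ValueError("decoded text does not start with the prompt")
    continuation = decoded[len(prompt):]
    # A new list item looks like a newline followed by digits and a period.
    match = re.search(r"\n\s*\d+\.", continuation)
    if match:
        continuation = continuation[:match.start()]
    return continuation.strip()


demo_prompt = "Reviews:\n\n1. Good.\n2."
demo_output = demo_prompt + " Loved it.\n3. Hated it."
print(first_continuation(demo_output, demo_prompt))
```

Checking `startswith` also guards against the case where decoding changes the prompt text slightly, which the length-based slice in `generate_fake` would not notice.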

In [14]:
def test(test_dataloader: DataLoader, epoch_i: int, avg_train_loss_g: float, avg_train_loss_d: float, training_time: int,
         training_stats: List[Dict]):
    """Perform test step at the end of one epoch"""

    print("")
    print("Running Test...")

    t0 = time.time()

    # Put the model in evaluation mode--the dropout layers behave differently
    # during evaluation.
    discriminator.eval()

    # Tracking variables
    total_test_loss = 0
    all_preds = []
    all_labels_ids = []

    # loss
    nll_loss = torch.nn.CrossEntropyLoss(ignore_index=-1)

    # Evaluate data for one epoch
    for text, input_mask, label, label_mask in test_dataloader:
        # Tell pytorch not to bother with constructing the compute graph during
        # the forward pass, since this is only needed for backprop (training).
        with torch.no_grad():
            _, logits, probs = discriminator(text, input_mask)
            filtered_logits = logits[:, 0:-1]
            total_test_loss += nll_loss(filtered_logits, label)

        # Accumulate the predictions and the input labels
        _, preds = torch.max(filtered_logits, 1)
        all_preds += preds.detach().cpu()
        all_labels_ids += label.detach().cpu()

    # Report the final accuracy for this validation run.
    all_preds = torch.stack(all_preds).numpy()
    all_labels_ids = torch.stack(all_labels_ids).numpy()
    test_accuracy = np.sum(all_preds == all_labels_ids) / len(all_preds)
    print("  Accuracy: {0:.3f}".format(test_accuracy))

    # Calculate the average loss over all of the batches.
    avg_test_loss = total_test_loss / len(test_dataloader)
    avg_test_loss = avg_test_loss.item()

    # Measure how long the validation run took.
    test_time = format_time(time.time() - t0)

    print("  Test Loss: {0:.3f}".format(avg_test_loss))
    print("  Test took: {:}".format(test_time))

    # Record all statistics from this epoch.
    training_stats.append({
        'epoch': epoch_i + 1,
        'Training Loss generator': avg_train_loss_g,
        'Training Loss discriminator': avg_train_loss_d,
        'Valid. Loss': avg_test_loss,
        'Valid. Accur.': test_accuracy,
        # 'Valid. F1': f1_score(all_labels_ids, all_preds),
        # 'Valid. Recall': recall_score(all_labels_ids, all_preds),
        # 'Valid. Precision': precision_score(all_labels_ids, all_preds),
        'Training Time': training_time,
        'Test Time': test_time
    })
    return test_accuracy
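
The `test` function slices the logits with `logits[:, 0:-1]` so that the extra last column, which the training loop treats as the "fake" class, never wins the prediction. The plain-Python sketch below shows the same idea on hand-written numbers. The helper names and the logits are illustrative only.

```python
from typing import List, Sequence


def predict_real_class(logits_row: Sequence[float]) -> int:
    """Argmax over every column except the last one, which stands for the fake class."""
    real_logits = logits_row[:-1]
    return max(range(len(real_logits)), key=lambda i: real_logits[i])


def accuracy(logits: List[Sequence[float]], labels: List[int]) -> float:
    """Fraction of rows whose predicted real class matches the label."""
    preds = [predict_real_class(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


# Two real classes plus one fake column; the fake logit is the largest in rows 0 and 2.
demo_logits = [[0.2, 1.5, 9.0], [2.0, -1.0, 0.1], [0.3, 0.4, 5.0]]
demo_labels = [1, 0, 0]
print(accuracy(demo_logits, demo_labels))
```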


In [15]:
for epoch_i in range(0, num_train_epochs):
    print("")
    print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, num_train_epochs))
    print('Training...')

    t0 = time.time()

    # Reset the total loss for this epoch.
    tr_g_loss = 0
    tr_d_loss = 0
    true_fakes = 0

    # Put the model into training mode.
    generator.train()
    discriminator.train()

    for step, (text, input_mask, label, label_mask) in enumerate(train_dataloader):
        # Progress update every print_each_n_step batches.
        if step % print_each_n_step == 0 and not step == 0:
            elapsed = format_time(time.time() - t0)
            print('  Batch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(train_dataloader), elapsed))


        gen_samples = generate_fake()
        encode_result = tokenizer.batch_encode_plus(gen_samples, add_special_tokens=True, max_length=seq_size, padding="max_length", truncation=True, return_tensors='pt')
        gen_rep = encode_result['input_ids'].to(device)
        gen_att_mask = encode_result['attention_mask'].to(device)

        
        # Generate the output of the discriminator for real and fake data.
        # First, concatenate the real token ids with the re-tokenized generator samples
        discriminator_input = torch.cat([text, gen_rep], dim=0)
        # Also, join the real attention mask with the fake sentences' mask
        input_mask = torch.cat([input_mask, gen_att_mask], dim=0)
        # Then, run the discriminator on the combined batch
        features, logits, probs = discriminator(discriminator_input, input_mask)

        # Finally, we separate the discriminator's output for the real and fake
        # data. Split by the actual sizes so that a smaller final batch of real
        # examples is still separated correctly from the generated ones.
        split_sizes = [text.shape[0], gen_rep.shape[0]]
        # Each chunk is a view of the original tensor
        D_real_features, D_fake_features = torch.split(features, split_sizes)

        D_real_logits, _ = torch.split(logits, split_sizes)

        D_real_probs, D_fake_probs = torch.split(probs, split_sizes)

        # Fake labels counting
        true_fakes_batch = (torch.argmax(D_fake_probs, dim=1) == len(labels)).sum().item()
        true_fakes += true_fakes_batch

        # ---------------------------------
        #  LOSS evaluation
        # ---------------------------------
        # Generator's LOSS estimation
        g_loss_d = -1 * torch.mean(torch.log(1 - D_fake_probs[:, -1] + epsilon))
        g_feat_reg = 0 * torch.mean(
            torch.pow(torch.mean(D_real_features, dim=0) - torch.mean(D_fake_features, dim=0), 2)
            )
        g_loss = g_loss_d + g_feat_reg
        # print(g_loss_d, g_feat_reg)

        # Discriminator's LOSS estimation
        logits = D_real_logits[:, 0:-1]
        log_probs = F.log_softmax(logits, dim=-1)

        # The discriminator provides an output for labeled and unlabeled real data
        # so the loss evaluated for unlabeled data is ignored (masked)
        label2one_hot = torch.nn.functional.one_hot(label, len(labels))
        per_example_loss = -torch.sum(label2one_hot * log_probs, dim=-1)
        per_example_loss = torch.masked_select(per_example_loss, label_mask)
        labeled_example_count = per_example_loss.type(torch.float32).numel()

        # It may be the case that a batch does not contain labeled examples,
        # so the "supervised loss" in this case is not evaluated
        if labeled_example_count == 0:
            D_L_Supervised = 0
        else:
            D_L_Supervised = torch.div(torch.sum(per_example_loss.to(device)), labeled_example_count)

        D_L_unsupervised1U = -1 * torch.mean(torch.log(1 - D_real_probs[:, -1] + epsilon))
        D_L_unsupervised2U = -1 * torch.mean(torch.log(D_fake_probs[:, -1] + epsilon))
        d_loss = D_L_Supervised + D_L_unsupervised1U + D_L_unsupervised2U
        # print(D_L_Supervised, D_L_unsupervised1U, D_L_unsupervised2U)

        # ---------------------------------
        #  OPTIMIZATION
        # ---------------------------------
        # Avoid gradient accumulation
        gen_optimizer.zero_grad()
        dis_optimizer.zero_grad()

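        # Calculate weight updates.
        # retain_graph=True is needed on the first call because d_loss reuses the
        # same discriminator forward pass. Note that generate() samples discrete
        # token ids that are decoded and re-tokenized, so no gradient from g_loss
        # reaches the generator's parameters here.
        g_loss.backward(retain_graph=True)
        d_loss.backward()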

        # Apply modifications
        gen_optimizer.step()
        dis_optimizer.step()

        # Save the losses to print them later
        tr_g_loss += g_loss.item()
        tr_d_loss += d_loss.item()


    # Calculate the average loss over all of the batches.
    avg_train_loss_g = tr_g_loss / len(train_dataloader)
    avg_train_loss_d = tr_d_loss / len(train_dataloader)

    # Measure how long this epoch took.
    training_time = format_time(time.time() - t0)

    print("")
    print("  Average training loss generetor: {0:.3f}".format(avg_train_loss_g))
    print("  Average training loss discriminator: {0:.3f}".format(avg_train_loss_d))
    print("  Training epoch took: {:}".format(training_time))
    print("  Fakes correct discriminared: {}".format(true_fakes))

    print("Saving the models...............................")
    # Saving the model
    torch.save(generator, '../models/generator')
    torch.save(discriminator, '../models/discriminator')

    test_accuracy = test(
        test_dataloader, epoch_i,
        avg_train_loss_g, avg_train_loss_d, training_time, training_stats
    )
    training_stats[-1]['True fakes'] = true_fakes

    # save_stats(training_stats, trial)

    # check early stopping
    early_stopping(test_accuracy)
    if early_stopping.early_stop:
        print('early stopping. Training Stopped')
        break
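
For reference, the losses computed in the loop above can be written compactly. This summary assumes that `probs` is a softmax over the k real classes plus one extra "fake" class in the last position, which is how the loop indexes it. The discriminator's source is not shown here, so treat the notation as a reading of the code rather than a definition. B_L denotes the labeled real examples in the batch, z_{1:k}(x) the first k logits, and ε the `epsilon` constant.

```latex
\begin{aligned}
\mathcal{L}_{G} &= -\,\mathbb{E}_{x \sim G}\big[\log\big(1 - p(\text{fake} \mid x) + \epsilon\big)\big] \\
\mathcal{L}_{\text{sup}} &= -\frac{1}{|B_L|}\sum_{(x, y) \in B_L} \log \operatorname{softmax}\big(z_{1:k}(x)\big)_{y} \\
\mathcal{L}_{U_1} &= -\,\mathbb{E}_{x \sim \text{real}}\big[\log\big(1 - p(\text{fake} \mid x) + \epsilon\big)\big] \\
\mathcal{L}_{U_2} &= -\,\mathbb{E}_{x \sim G}\big[\log\big(p(\text{fake} \mid x) + \epsilon\big)\big] \\
\mathcal{L}_{D} &= \mathcal{L}_{\text{sup}} + \mathcal{L}_{U_1} + \mathcal{L}_{U_2}
\end{aligned}
```

The feature-matching term `g_feat_reg` is multiplied by 0 in the code, so it does not contribute to the generator loss and is omitted here. The supervised term is set to 0 when a batch contains no labeled examples.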


Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generetor: 0.530
  Average training loss discriminator: 1.948
  Training epoch took: 0:12:34
  Fakes correct discriminared: 164
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.500
  Test Loss: 0.696
  Test took: 0:02:30
Initial score set at 0.500040

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generetor: 0.620
  Average training loss discriminator: 1.652
  Training epoch took: 0:11:59
  Fakes correct discriminared: 231
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.500
  Test Loss: 0.694
  Test took: 0:02:28

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generetor: 0.644
  Average training loss discriminator: 1.592
  Training epoch took: 0:11:41
  Fakes correct discriminared: 235
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.503
  Test Loss: 0.695
  Test took: 0:02:24
Improvement found: 0.503080 (previous best: 0.500040)

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.672
  Average training loss discriminator: 1.475
  Training epoch took: 0:11:21
  Fakes correctly discriminated: 242
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.497
  Test Loss: 0.698
  Test took: 0:02:25

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.605
  Average training loss discriminator: 1.648
  Training epoch took: 0:11:20
  Fakes correctly discriminated: 236
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.641
  Test Loss: 0.664
  Test took: 0:02:24
Improvement found: 0.641120 (previous best: 0.503080)

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.673
  Average training loss discriminator: 1.400
  Training epoch took: 0:11:18
  Fakes correctly discriminated: 243
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.545
  Test Loss: 0.706
  Test took: 0:02:25

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.676
  Average training loss discriminator: 1.291
  Training epoch took: 0:11:19
  Fakes correctly discriminated: 243
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.534
  Test Loss: 0.760
  Test took: 0:02:24

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.670
  Average training loss discriminator: 1.216
  Training epoch took: 0:11:14
  Fakes correctly discriminated: 242
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.636
  Test Loss: 0.713
  Test took: 0:02:24

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.653
  Average training loss discriminator: 1.099
  Training epoch took: 0:11:17
  Fakes correctly discriminated: 235
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.619
  Test Loss: 0.997
  Test took: 0:02:24

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.691
  Average training loss discriminator: 1.208
  Training epoch took: 0:11:14
  Fakes correctly discriminated: 247
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.644
  Test Loss: 0.728
  Test took: 0:02:24
Improvement found: 0.644040 (previous best: 0.641120)

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.683
  Average training loss discriminator: 0.887
  Training epoch took: 0:11:14
  Fakes correctly discriminated: 244
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.656
  Test Loss: 0.980
  Test took: 0:02:24
Improvement found: 0.655960 (previous best: 0.644040)

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.688
  Average training loss discriminator: 0.815
  Training epoch took: 0:11:14
  Fakes correctly discriminated: 246
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.596
  Test Loss: 1.351
  Test took: 0:02:24

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.674
  Average training loss discriminator: 0.822
  Training epoch took: 0:11:14
  Fakes correctly discriminated: 245
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.669
  Test Loss: 1.196
  Test took: 0:02:24
Improvement found: 0.668920 (previous best: 0.655960)

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.678
  Average training loss discriminator: 0.823
  Training epoch took: 0:11:18
  Fakes correctly discriminated: 245
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.662
  Test Loss: 1.298
  Test took: 0:02:24

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.647
  Average training loss discriminator: 1.072
  Training epoch took: 0:11:16
  Fakes correctly discriminated: 233
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.558
  Test Loss: 0.905
  Test took: 0:02:24

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.639
  Average training loss discriminator: 0.984
  Training epoch took: 0:11:16
  Fakes correctly discriminated: 202
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.574
  Test Loss: 1.388
  Test took: 0:02:24

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.681
  Average training loss discriminator: 0.847
  Training epoch took: 0:11:15
  Fakes correctly discriminated: 209
Saving the models...............................

Running Test...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Accuracy: 0.657
  Test Loss: 1.396
  Test took: 0:02:25

Training...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


  Average training loss generator: 0.686
  Average training loss discriminator: 0.745
  Training epoch took: 0:11:16
  Fakes correctly discriminated: 246
Saving the models...............................

Running Test...
  Accuracy: 0.659
  Test Loss: 1.634
  Test took: 0:02:24
Early stopping triggered after 5 epochs with no improvement.
Early stopping: training stopped.
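
The stopping behaviour in the log above (an "Improvement found" line whenever test accuracy beats the previous best, then a stop after 5 epochs with no improvement) can be reproduced with a small patience counter. The sketch below is only an illustration: `AccuracyEarlyStopping` is a hypothetical stand-in, not the `EarlyStopping` class imported from `util.early_stopping`, whose implementation is not shown here. The accuracy list is copied from the rounded values printed in this part of the log, and a patience of 5 is assumed from the final log message.

```python
from typing import Optional


class AccuracyEarlyStopping:
    """Hypothetical stand-in for the notebook's EarlyStopping helper."""

    def __init__(self, patience: int = 5) -> None:
        self.patience = patience
        self.best: Optional[float] = None
        self.epochs_without_improvement = 0

    def step(self, accuracy: float) -> bool:
        """Record one test accuracy; return True when training should stop."""
        if self.best is None or accuracy > self.best:
            # New best: remember it and reset the patience counter.
            self.best = accuracy
            self.epochs_without_improvement = 0
            return False
        self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Test accuracies as rounded in the log above, in order of appearance.
accuracies = [0.500, 0.503, 0.497, 0.641, 0.545, 0.534, 0.636, 0.619,
              0.644, 0.656, 0.596, 0.669, 0.662, 0.558, 0.574, 0.657, 0.659]

stopper = AccuracyEarlyStopping(patience=5)
stopped_at = None
for position, acc in enumerate(accuracies, start=1):
    if stopper.step(acc):
        stopped_at = position  # position within this list, not a global epoch number
        break

print(f"best accuracy: {stopper.best}, stopped at entry: {stopped_at}")
```

With these values the counter resets at each improvement (0.503, 0.641, 0.644, 0.656, 0.669), and the five entries after 0.669 never exceed it, which matches the point where the log reports early stopping.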
