# Part 1: Using GPT-2

## Testing with top-k sampling (answers question 1 and part of question 2)

In [None]:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load the pretrained GPT-2 tokenizer and model.
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
# GPT-2 has no padding token, so reuse the end-of-text token for padding.
tokenizer.pad_token_id = tokenizer.eos_token_id
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Run on the GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Read three prompts interactively and sample a continuation for each.
for i in range(3):
    text = input("Prompt: > ")
    print("Prompt >", text)
    encoded_text = tokenizer(text, return_tensors='pt')
    encoded_text = encoded_text.to(device)
    # top_k=20: sample each new token from the 20 most likely candidates only.
    response = model.generate(**encoded_text, max_new_tokens=200, do_sample=True, top_k=20)
    response_text = tokenizer.decode(response[0], skip_special_tokens=True)
    print(response_text)
    print("=======================================")


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Prompt > Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife.
Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife. After he had made some changes, the Indians began to grow and multiply. The great land was given to us as "the country that gave birth to the man's body," and the Indians had to be removed from the lands to be placed in other lands. At last the Indians were able to take possession of the land and the land was taken, and we, as a whole, have a right to a presentation of this same soil.

We do not want to have our presentations to be a part of it, because, as we know, we have been in that territory many and many years, and the Indians have a lot more land, and they must give us a presentation before we can do that again. We have had no right, in any part of this country ever to have a part whatever in our presentation

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Prompt > To be or not to be: that is the question.
To be or not to be: that is the question.

But to me, this was not about this. This was about the fact — it was something that was part of that whole debate. And now here's the thing, if you're a feminist or a humanist and you've been doing this long enough, you'll realize that, I think, the question that you have to ask when you say "feminism" or "humanism" is whether or not that's what you really think of as feminism — and to me, feminism as a concept, as a concept as it relates to things that, I think, are more or less the same concept. And I have to say, this is one of those things that's a pretty simple and easy answer and that's the sort of question that makes me think in that way. But at the same time, that's just something that I've been doing and that's the sort of thing that I think is going to make feminists, and I will say this, if it's


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Prompt > In a galaxy far, far away,
In a galaxy far, far away, in a galaxy far, away, in a galaxy far, the world is a place where you can go to the right place in the right time and place of the wrong time. And in all of that time, if I'm on the wrong side, what can I do to change it?

That's where we're going. You're going from a time where you're sitting in a room with a television and you're talking and you're thinking about things and you're looking for the right thing that has to do with the season and the people you want to find. The season is already done. We're at a point where we're able to get things done. So there's no question about that.

But let's look at this season. What kind of season is there in the universe where there is this idea that there's an entire universe that's in a perfect place in which you can have a family without having your spouse be on the wrong side of something?
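
To make the effect of `top_k=20` concrete, here is a minimal pure-Python sketch of top-k filtering on a toy next-token distribution. The vocabulary, the logit values, and the helper name `top_k_filter` are made up for illustration; this is not the code path `generate` runs internally, only the idea behind it: keep the k highest-scoring tokens, renormalize, and sample from what is left.

```python
import math
import random

def top_k_filter(logits, k):
    """Return {token: prob} restricted to the k highest-logit tokens."""
    # Keep the k tokens with the largest logits.
    kept = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Softmax over the surviving tokens only, so probabilities sum to 1.
    max_logit = max(value for _, value in kept)
    exps = {tok: math.exp(value - max_logit) for tok, value in kept}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logits for the token after "In a galaxy far, far away,"
toy_logits = {" the": 3.0, " in": 2.5, " a": 2.0, " there": 1.0, " banana": -2.0}

probs = top_k_filter(toy_logits, k=3)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
next_token = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
print(sorted(probs))  # only the three most likely tokens remain
print(next_token)
```

Unlikely tokens such as " banana" can never be drawn, which keeps the samples above locally fluent, while the randomness among the surviving tokens lets the text drift away from the prompt's topic.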


## Creating the model for easier testing:

In [None]:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
tokenizer.pad_token_id = tokenizer.eos_token_id
model = GPT2LMHeadModel.from_pretrained('gpt2')

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

## Prompts used for testing:

In [None]:
prompts = [
    "Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife.",
    "In a galaxy far, far away,",
]

## Testing greedy approach

In [None]:
for text in prompts:
    print("Prompt >", text)
    encoded_text = tokenizer(text, return_tensors='pt')
    encoded_text = encoded_text.to(device)
    # do_sample=False with a single beam is greedy decoding: always take the most likely next token.
    response = model.generate(**encoded_text, max_new_tokens=200, do_sample=False)
    response_text = tokenizer.decode(response[0], skip_special_tokens=True)
    print(response_text)
    print("=======================================")


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Prompt > Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife. The two of them were very close, and they were very good friends.

The next day, the two of them went to the house of the farmer, and there they met the two of them, and they were very happy. They were very happy, and they were very happy.

The next day, the two of them went to the house of the farmer, and there they met the two of them, and they were very happy. They were very happy, and they were very happy.

The next day, the two of them went to the house of the farmer, and there they met the two of them, and they were very happy. They were very happy, and they were very happy.

The next day, the two of them went to the house of the farmer, and there they met the two of them, and they were very happy. They were very happy, and they were very happy.

The next day, the two of
Prompt > In a galaxy far, far away,
In a galaxy far, far away, the galaxy is a 
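
The repeating paragraph in the greedy output is a typical failure mode. A minimal sketch of why it happens, assuming a toy model that looks only at the previous token (GPT-2 conditions on the whole context, so this is an analogy, and `toy_scores` is invented): because greedy decoding is deterministic, once it returns to a state it has already visited it must repeat the same continuation.

```python
def greedy_decode(next_token_scores, start, steps):
    """Always pick the highest-scoring successor of the last token."""
    tokens = [start]
    for _ in range(steps):
        candidates = next_token_scores[tokens[-1]]
        tokens.append(max(candidates, key=candidates.get))
    return tokens

# Hypothetical bigram scores; a real model conditions on the whole context.
toy_scores = {
    "they": {"were": 0.9, "went": 0.1},
    "were": {"very": 0.8, "happy": 0.2},
    "very": {"happy": 0.7, "good": 0.3},
    "happy": {"and": 0.6, ".": 0.4},
    "and": {"they": 0.9, "then": 0.1},
}

out = greedy_decode(toy_scores, "they", steps=12)
print(" ".join(out))
```

Sampling-based strategies break such loops by occasionally picking a lower-ranked token.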

## Testing with Beam Search

In [None]:
for text in prompts:
    print("Prompt >", text)
    encoded_text = tokenizer(text, return_tensors='pt')
    encoded_text = encoded_text.to(device)
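    # num_beams=5 keeps the five highest-scoring partial sequences at every step;
    # early_stopping=True ends the search as soon as five finished candidates exist.
    response = model.generate(**encoded_text, max_new_tokens=200, num_beams=5, early_stopping=True)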
    response_text = tokenizer.decode(response[0], skip_special_tokens=True)
    print(response_text)
    print("=======================================")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Prompt > Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife.

In the spring of 1848, the family moved to a small town on the western side of the Missouri River, near the Missouri River.

In the spring of 1848, the family moved to a small town on the western side of the Missouri River, near the Missouri River.

In the spring of 1848, the family moved to a small town on the western side of the Missouri River, near the Missouri River.

In the spring of 1848, the family moved to a small town on the western side of the Missouri River, near the Missouri River.

In the spring of 1848, the family moved to a small town on the western side of the Missouri River, near the Missouri River.

In the spring of 1848, the family moved to a small town on the western side of the Missouri River, near the Missouri River.

In the spring of 1848, the family moved to a small town on the western side
Prompt > In a galaxy far, far away,
In a

### Blocking repeated n-grams in beam search (`no_repeat_ngram_size=2`) -> better results

In [None]:
for text in prompts:
    print("Prompt >", text)
    encoded_text = tokenizer(text, return_tensors='pt')
    encoded_text = encoded_text.to(device)
    # no_repeat_ngram_size=2: no pair of consecutive tokens may occur twice in the sequence.
    response = model.generate(**encoded_text, max_new_tokens=200, num_beams=5, no_repeat_ngram_size=2, early_stopping=True)
    response_text = tokenizer.decode(response[0], skip_special_tokens=True)
    print(response_text)
    print("=======================================")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Prompt > Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife.

In the spring of 1848, the family moved to a small town on the outskirts of Kansas City, Missouri, where they lived for a few years. In 1849, they moved back to their old home and moved into their new home, which is now the home of their great-great-grandmother, Mrs. Elizabeth Dorothy. They have lived there for many years, but have not been able to find a home for themselves, so they have been forced to move into a new house. The house was built in 1851 and has been in use since that time. It is a beautiful house with a large garden, a well-maintained yard, an open-air swimming pool and a fire pit. There are two bedrooms, one for the children and the other for their mother and father, as well as a living room, dining room and dining-room, all of which are in good condition. A large fireplace is located on one side of this house,
Prompt > In a galaxy far,
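
The idea behind `no_repeat_ngram_size` can be sketched in a few lines. This is an illustrative reimplementation on whitespace-split words, not the library's code, which works on token ids for each beam. The helper name `banned_next_tokens` is an assumption. Given the text generated so far, it returns every token that would complete an n-gram that has already occurred; the decoder then excludes those tokens from the next step.

```python
def banned_next_tokens(generated, ngram_size):
    """Tokens that would complete an n-gram already present in `generated`."""
    if len(generated) < ngram_size:
        return set()
    # The last (n - 1) tokens are the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - (ngram_size - 1):])
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

history = "In the spring of 1848 , the family moved to the".split()
# "the spring" and "the family" were already used, so neither word may follow "the".
print(banned_next_tokens(history, ngram_size=2))
```

With `ngram_size=2` the constraint is strict: a sentence such as "In the spring of 1848, the family moved" cannot be emitted twice, which is why the loop seen with plain beam search disappears.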

## Testing with top-p

In [None]:
for text in prompts:
    print("Prompt >", text)
    encoded_text = tokenizer(text, return_tensors='pt')
    encoded_text = encoded_text.to(device)
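    # top_p=0.95: sample from the smallest set of tokens whose cumulative probability reaches 0.95.
    # Note: generate() defaults to top_k=50, so pass top_k=0 to test nucleus sampling on its own.
    response = model.generate(**encoded_text, max_new_tokens=200, do_sample=True, top_p=0.95)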
    response_text = tokenizer.decode(response[0], skip_special_tokens=True)
    print(response_text)
    print("=======================================")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Prompt > Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife. They married in 1479, to whom they have continued to live ever since. The two children were: George a son of Samuel, who died of heart disease in 1477; and Alice a daughter of Samuel, who died of disease in 1484. In consequence, they live well, have a common life, have a good home, and live in peace.

The eldest daughter, a daughter of Edward, died in 1492; the son is buried in the old Church of England in Leicester, which she lived in until the mid-1630s; his widow, Esther, died in 1650.

In 1744, Charles Dandridge became a member of the Continental Parliament. The Continental Parliament was elected in 1745. It was formed with the support of a majority in Congress; the president was John Adams, who was appointed by George Washington, a member of the Continental Parliament in 1776, and John Jay, a member of Congress until 1801. When Jay
Prompt > In a gala
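
A minimal sketch of top-p (nucleus) filtering, assuming two invented toy distributions and a hypothetical helper `top_p_filter`. Unlike top-k, the number of tokens kept is not fixed: it depends on how peaked the distribution is.

```python
def top_p_filter(probs, p):
    """Smallest set of most-likely tokens whose total probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1 again.
    return {token: prob / cumulative for token, prob in kept.items()}

peaked = {" the": 0.90, " a": 0.06, " an": 0.03, " banana": 0.01}
flat = {" the": 0.30, " a": 0.25, " an": 0.25, " banana": 0.20}

print(len(top_p_filter(peaked, 0.95)))  # few candidates when the model is confident
print(len(top_p_filter(flat, 0.95)))    # many candidates when it is not
```

When the model is unsure, a cutoff as high as 0.95 leaves almost every candidate in play, which may help explain why the sample above wanders between unrelated dates and people.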

## Testing with top-p and top-k

In [None]:
for text in prompts:
    print("Prompt >", text)
    encoded_text = tokenizer(text, return_tensors='pt')
    encoded_text = encoded_text.to(device)
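    # Both filters are active: keep the 20 most likely tokens, then apply the top_p=0.95 cutoff within them.
    response = model.generate(**encoded_text, max_new_tokens=200, do_sample=True, top_p=0.95, top_k=20)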
    response_text = tokenizer.decode(response[0], skip_special_tokens=True)
    print(response_text)
    print("=======================================")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Prompt > Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Dorothy lived in the midst of the great Kansas prairies, with Uncle Henry, who was a farmer, and Aunt Em, who was the farmer's wife. On the same day, she and her father took a boat to the town, where they were found by a man from the same company. The young woman and the young man who accompanied them were killed in the collision with a herd of cattle, and they were buried at the cemetery. They were buried in a large grave at Woburn, a little north of Wichita, and the remains of Mrs. Ortega are said to be there. They were buried in a large mound on a hillside near the town. There was an old brick church built in 1837. The church of Mrs. Ortega was found in the vicinity of the burial place of her mother. Another church in which the family is said to have lived is the Methodist Church, at Kuehlman, Kansas City. The oldest surviving church is located in the churchyard of the church where they lived. It is said that they were baptized at the stake and that they were buried near the spot wh

# Part 2: Model Fitting

## Training the model
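
The cell that follows builds its training examples by sliding a window of `nlines` script lines over the file, one line at a time. The sketch below mirrors that loop on a toy list so the resulting windows can be inspected; `sliding_windows` and `toy_script` are names introduced here for illustration and do not appear in the notebook.

```python
def sliding_windows(lines, nlines=8, min_nlines=3):
    """Mirror the windowing loop from the training cell on any list of lines."""
    total = len(lines)
    windows = []
    for i in range(total - nlines):
        # Windows near the end are cut short at total - min_nlines.
        range_end = min(total - min_nlines, i + nlines)
        windows.append("\n".join(lines[i:range_end]))
    return windows

toy_script = [f"line {n}" for n in range(12)]  # stand-in for star_wars.txt
windows = sliding_windows(toy_script)
print(len(windows))
print(windows[0].count("\n") + 1, windows[-1].count("\n") + 1)
```

Two consequences follow from this construction: consecutive examples share all but one line, so the model sees most lines several times per epoch, and the last `min_nlines` lines of the file never appear in any example.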

In [None]:
import os
import time
import datetime
import torch
from torch.utils.data import Dataset, DataLoader, RandomSampler
from transformers import GPT2LMHeadModel, GPT2Tokenizer, GPT2Config
from transformers import get_linear_schedule_with_warmup
from torch.optim import AdamW
torch.manual_seed(42)

class GPT2Dataset(Dataset):
    def __init__(self, txt_list, tokenizer, gpt2_type="gpt2", max_length=768):
        self.tokenizer = tokenizer
        self.input_ids = []
        self.attn_masks = []
        for txt in txt_list:
            encodings_dict = tokenizer('<|startoftext|>' + txt + '<|endoftext|>', truncation=True, max_length=max_length, padding="max_length")
            self.input_ids.append(torch.tensor(encodings_dict['input_ids']))
            self.attn_masks.append(torch.tensor(encodings_dict['attention_mask']))
    def __len__(self):
        return len(self.input_ids)
    def __getitem__(self, idx):
        return self.input_ids[idx], self.attn_masks[idx]

# If the file is not in the same directory, replace the following with the path
filename = 'star_wars.txt'
with open(filename) as file:
    starwars = [line.rstrip() for line in file]

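# Build overlapping training examples: each one joins up to `nlines`
# consecutive script lines, advancing one line at a time.
nlines = 8
min_nlines = 3
l = len(starwars)
starwars_seq = []
for i in range(l-nlines):
    # Never read into the last `min_nlines` lines of the file.
    range_end = min(l-min_nlines,i+nlines)
    interaction = '\n'.join(starwars[i:range_end])
    starwars_seq.append(interaction)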

batch_size = 2

output_dir = './model_save/'
if os.path.exists(output_dir):
    tokenizer = GPT2Tokenizer.from_pretrained(output_dir)
    configuration = GPT2Config.from_pretrained(output_dir)
    model = GPT2LMHeadModel.from_pretrained(output_dir, config=configuration)
    print("Loaded pretrained model.")
else:
    # See https://huggingface.co/docs/transformers/main_classes/tokenizer
    tokenizer = GPT2Tokenizer.from_pretrained('gpt2', bos_token='<|startoftext|>', eos_token='<|endoftext|>', pad_token='<|pad|>')
    configuration = GPT2Config.from_pretrained('gpt2', output_hidden_states=False)
    model = GPT2LMHeadModel.from_pretrained("gpt2", config=configuration)

model.resize_token_embeddings(len(tokenizer))
dataset = GPT2Dataset(starwars_seq, tokenizer, max_length=768)
dataloader = DataLoader(dataset, sampler=RandomSampler(dataset), batch_size=batch_size)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

epochs = 2
learning_rate = 5e-4
warmup_steps = 100  # a step count, so use an integer
epsilon = 1e-8
sample_every = 100
optimizer = AdamW(model.parameters(), lr=learning_rate, eps=epsilon)
total_steps = len(dataloader) * epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps)
total_t0 = time.time()
training_stats = []
model = model.to(device)
def format_time(elapsed):
    return str(datetime.timedelta(seconds=int(round((elapsed)))))

# Train the model
model.train()
for epoch_i in range(0, epochs):
    print("")
    print(f'======== Epoch {epoch_i + 1} / {epochs} ========')
    print('Training...')
    t0 = time.time()
    total_train_loss = 0
    for step, batch in enumerate(dataloader):
        b_input_ids = batch[0].to(device)
        # Labels are the inputs themselves; the model shifts them internally for next-token prediction.
        b_labels = batch[0].to(device)
        b_masks = batch[1].to(device)
        model.zero_grad()
        outputs = model(b_input_ids, labels=b_labels, attention_mask=b_masks, token_type_ids=None)
        loss = outputs[0]
        batch_loss = loss.item()
        total_train_loss += batch_loss
        # Get sample every x batches.
        if step % sample_every == 0 and not step == 0:
            elapsed = format_time(time.time() - t0)
            print(f' Batch {step:>5,} of ' + f'{len(dataloader):>5,}. Loss: {batch_loss:>5,}.' + f'Elapsed: {elapsed}.')
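            model.eval()
            # No prompt is passed, so generation starts from the BOS token id stored in the model configuration.
            # pad_token_id and attention_mask are not passed either, which triggers the warnings in the output.
            sample_outputs = model.generate(do_sample=True, top_k=50, max_length=768, top_p=0.95, num_return_sequences=1)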
            for i, sample_output in enumerate(sample_outputs):
                sample_output_dec = tokenizer.decode(sample_output, skip_special_tokens=True)
                print(f"{i}: {sample_output_dec}")
            model.train()
        loss.backward()
        optimizer.step()
        scheduler.step()
    # Calculate the average loss over all of the batches.
    avg_train_loss = total_train_loss / len(dataloader)
    # Measure how long this epoch took.
    training_time = format_time(time.time() - t0)
    print("")
    print(f" Average training loss: {avg_train_loss:0.2f}")
    print(f" Training epoch took: {training_time}")

print("")
print("Training complete!")
timediff = format_time(time.time()-total_t0)
print(f"Total training took {timediff} (h:mm:ss)")
# Saving best-practices: if you use default names for the model, you can reload
# it using from_pretrained()
# Create output directory if needed
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
print(f"Saving model to {output_dir}")
# Save a trained model, configuration and tokenizer using `save_pretrained()`.
# They can then be reloaded using `from_pretrained()`
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
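
One detail of the loop above is worth a closer look: `b_labels` is a plain copy of `b_input_ids`, so the loss is also computed on the padding positions (the attention mask hides padding from attention, not from the loss). Every example is padded to 768 tokens, so padding probably dominates the average, which may be why the logged losses are so small. A common remedy, sketched here on plain lists with made-up token ids, is to set the labels of padded positions to -100, the value the Hugging Face loss ignores.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

def mask_padding_labels(input_ids, attention_mask):
    """Copy input_ids, replacing padded positions with IGNORE_INDEX."""
    return [
        [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]

# Hypothetical batch of two sequences padded to length 6; the ids are example
# values only (50258 stands in for the added '<|pad|>' token).
batch_ids = [[50257, 11, 12, 50256, 50258, 50258],
             [50257, 21, 22, 23, 24, 50256]]
batch_mask = [[1, 1, 1, 1, 0, 0],
              [1, 1, 1, 1, 1, 1]]

labels = mask_padding_labels(batch_ids, batch_mask)
print(labels[0])
```

With tensors, the equivalent would be `b_labels = b_input_ids.clone()` followed by `b_labels[b_masks == 0] = -100`. This is a suggested variation, not what the run logged below used.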

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loaded pretrained model.

Training...


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


 Batch   100 of 2,753. Loss: 0.04650292173027992.Elapsed: 0:00:47.
0: (very cool) Are you sure?
It has been rather a long time. Do you think he'll be around long?
No, I'd just be a little while longer.
What's he's doing? He's going to pull us all apart.
Go get him!
We've got to find him. Open the back door, isn't it?
I would much rather have gone with Master Luke. I'm sure it's safe for droids.
What's your doing?


 Batch   200 of 2,753. Loss: 0.11145934462547302.Elapsed: 0:01:37.
0: We ran into Count Dooku.
I'm so worried about you.
I'm not upset?
You worry about something?
I worry about you.
You worry about something.


 Batch   300 of 2,753. Loss: 0.17811453342437744.Elapsed: 0:02:26.
0: Then we will find out.
I promise.
Bongo du bongu!
Goodie!
Are you...
Let's see.
It's not bad.


 Batch   400 of 2,753. Loss: 0.0762481540441513.Elapsed: 0:03:15.
0: I heard a rumor they are going to banish all droids.
I have not seen one of these since I was prospecting on Subterrel beyond the Outer Rim!
Do you know where it all lies?
I would much rather avoid any conflict.
Master Jedi, may I suggest that the Senator be placed under the protection of your graces.
Do you know where it came from?


 Batch   500 of 2,753. Loss: 0.08602133393287659.Elapsed: 0:04:05.
0: Not fit? Why would anyone think that?
They say his mind has become fogged by the influence of a certain female Senator.
That's ridiculous. Who?!?
(slylylylylylylylylylylyly-looking) No! It's not me!
(looking down) No! Why are you here?
Oh, I'm here!


 Batch   600 of 2,753. Loss: 0.03201024606823921.Elapsed: 0:04:56.
0: It couldn't happen here. You said it yourself. The Empire won't bother with this rock.
Things always change.
I wish I was going... Are you going to be around long?
No, I'm leaving in the morning...
Then I guess I won't see you.
Maybe someday... I'll keep a lookout.
Well, I'll be at the Academy next season... after that who knows.


 Batch   700 of 2,753. Loss: 0.05538511648774147.Elapsed: 0:05:47.
0: (nods, gasping) Ben... Ben.
Luke... Ben... Ben.
Hang on, kid.


 Batch   800 of 2,753. Loss: 0.09699788689613342.Elapsed: 0:06:37.
0: I don't think so.
Well, I suppose I shouldn't expect much resistance.
I'm not going to be a problem for you, my young apprentice.
You have been a good apprentice. You are much wiser than Iam.
All right, old Ben. How about you?
I don't know.
You're just not in time for dinner. I hope you have a great honor. I'm just a boy.


 Batch   900 of 2,753. Loss: 0.05032370984554291.Elapsed: 0:07:27.
0: Not this time. Something's not right because now I can't see. Wait. Wait! Oh, my! what have you done? I'm backwards, you stupid furball. Only an overgrown mophead like you would be stupid enough...
I feel terrible.
Why are they doing this?
They never even asked me any questions.
Lando.
Get help!


 Batch 1,000 of 2,753. Loss: 0.06580036878585815.Elapsed: 0:08:18.
0: Not this time. There's too much at stake. We need help. Odd Ball, do you copy?
"This time I won't let these visions come true
I won't let these visions come true
Death is a natural part of life
 Rejoice for those around you who transform into the Force. Mourn them, do not. Miss them, do not. Attachment leads to jealousy and pride. The shadow of greed, that is.


 Batch 1,100 of 2,753. Loss: 0.03251463547348976.Elapsed: 0:09:09.
0: Hush! Not so loud! (arriving) With the blast shields down, I can't even see.
Move! Come on, Artoo! Quickly, Artoo.
Get in gear, Artoo. There's a lot of this.
(into comlink) Artoo, where are you? We're going to attack the ship right before you can.
(into comlink) Artoo, where are you? I'm going to cut across the axis and try and draw their fire.
(into comlink) Artoo, where are you? Oh, this is no time for heroics. Come on, Artoo. Oh!


 Batch 1,200 of 2,753. Loss: 0.048692647367715836.Elapsed: 0:10:01.
0: (very little) No. They are doing their job so we can do ours. Head for the Command Ship!
Missiles! Pull up!
They overshot us...
They've coming around!
All right, Arfour. No, no, no. Nothing too fancy.


 Batch 1,300 of 2,753. Loss: 0.0441356785595417.Elapsed: 0:10:51.
0: No! Forget not, this has been a day long remembered. It has seen the end of Kenobi and it will soon see the end of the Rebellion.
(over loudspeaker) All flight trooper, man your stations.
So... you got your reward and you're just leaving then?
That's right, yeah! I got some old debts I've got to pay off with this stuff. Even if I didn't, you don't think I'd be fool enough to stick around here, do you? Why don't you come with us? You're pretty good in a fight. I could use you.
You could use you.
(getting angry) Come on! Why don't you take a look around? You know what's about to happen, what they're up against. They could use a little more time.
What is it?
Nothing.


 Batch 1,400 of 2,753. Loss: 0.07412653416395187.Elapsed: 0:11:43.
0: You turned her against me fair and square.
You have done well, Anakin. She is far too trusting.
I have advised you over the years, my little friend. Do you feel your power growing?
Yes, My Master.
Now, Lord Sidious. Do you have any idea who was behind it?
Our intelligence points to disgruntled spice miners, on the moons of Naboo. We also know that relations between the Council and the Chancellor are stressed.
I don't trust the Council... I can assure you they will do what it takes to share in it.


 Batch 1,500 of 2,753. Loss: 0.054016754031181335.Elapsed: 0:12:34.
0: No, Annie. You're safe. What?
I'm fine. You had a nightmare.
But you don't look so bad to me. In fact, you look strong enough to pull the ears off a Gundark.
Thanks to you.
That's two you owe me, junior.
Well your Worship, looks like you managed to keep me around for a little while longer.
(haughtily) I had nothing to do with it. General Rieekan thinks it's dangerous for any ships to leave the system until we've activated the energy shield.


 Batch 1,600 of 2,753. Loss: 0.025049585849046707.Elapsed: 0:13:26.
0: You're gravely mistaken. You won't convert me as you did my father.
Oh, no, my young Jedi. You will find that it is you who are mistaken...about a great many things.
His lightsaber.
Ah, yes, a Jedi's weapon. Much like your father's. By now you must know your father can never be turned from the dark side. So will it be with you.
You're wrong. Soon I'll be dead...and you with me.
Perhaps you refer to the imminent attack of your Rebel fleet.
Yes...I assure you we are quite safe from your friends here.


 Batch 1,700 of 2,753. Loss: 0.034302882850170135.Elapsed: 0:14:17.
0: This scheme of yours has failed, Lord Sidious. The blockade is finished. We dare not go against these Jedi.
Viceroy, is the planet secure?
Yes, my Lord, we have taken over the last pockets of primitive life forms. We have been without an interpreter since our master got angry with our last protocol droid and disintegrated him.
That's impossible, there's no one to talk to us, only Master.
How would I explain this?
You will learn from Yoda, the Jedi Master who instructed me. Your thoughts will become a Jedi must be grounded around you.


 Batch 1,800 of 2,753. Loss: 0.03992130234837532.Elapsed: 0:15:09.
0: I can't believe that.
I couldn't!
The ship is almost finished. Two or Three more things and we're in great shape.
The sooner the better. Something's wrong here. No one has seen or knows anything about Threepio. He's been gone too long to have gotten lost.
Relax. I'll talk to Lando and see what I can find out.
I don't trust him, either. But he is my friend. Besides, we'll soon be gone.
And then you're as good as gone, aren't you?


 Batch 1,900 of 2,753. Loss: 0.04904940351843834.Elapsed: 0:16:00.
0: Oh, Anakin, I'm afraid.
Have faith, my love. Everything here is magical.
You could look into the glass and see the water. The way it ripples and moves. It looked so real... but it wasn’t.
Sometimes, when you believe something to be real, it becomes real. Real enough, anyway...
I used to think if you looked too deeply into glass, you would lose yourself.
I think it's true...


 Batch 2,000 of 2,753. Loss: 0.03614240139722824.Elapsed: 0:16:51.
0: (laughing) I knew it! They were sent to force a settlement, eh.
Distract them. I will contact Lord Sidious.
Are you brain-dead? I'm not going in there with two Jedi.Send a droid in. I want every part of this ship checked!
Yes, sir.
Is there anything I might do to help?
Well, not unless you can alter time, speed up the harvest, or teleport me off this rock!
I don't think I can help. I've got to go back.


 Batch 2,100 of 2,753. Loss: 0.032931532710790634.Elapsed: 0:17:42.
0: It's too late!
(a whisper) Luke, help me take this mask off.
But you'll die.
Nothing can stop that now. Just for once... let me look on you with my own eyes.
(very weak) Now...go, my son. Leave me.
No. You're coming with me. I can't leave you here. I've got to save you.
You already have, Luke. You were right about me. Tell your sister...you were right.


 Batch 2,200 of 2,753. Loss: 0.029774853959679604.Elapsed: 0:18:33.
0: Master Qui-Gon? Is Anakin all right?
Anakin! Anakin! There he is. He's still alive. Get a medical capsule, immediately.
Yes sir. Right away.
Failed to stop the Sith Lord, I have. Still much to learn, there is...
(V.O.) Patience. You will have time. I did not. When I became one with the Force I made a great discovery. With my training, you will be able to merge with the Force at will. Your physical self will fade away, but you will still retain your consciousness. You will become more powerful than any Sith.
Eternal consciousness.
(V.O.) You will become more powerful than any Sith.


 Batch 2,300 of 2,753. Loss: 0.028805602341890335.Elapsed: 0:19:25.
0: (in Huttese subtitled) I told you not to admit him.
I must be allowed to speak.
(in Huttese subtitled) You must be allowed to speak.
(in Huttese subtitled) You hear me, baby?
You will not let me down.
(continuing) I'm not allowed to speak.


 Batch 2,400 of 2,753. Loss: 0.024500321596860886.Elapsed: 0:20:16.
0: Master.
...Luke Skywalker, Jedi Knight.
(in Huttese subtitled) I told you not to admit him.
I must go, Master.
Then you must go to the Sanctuary Moon and wait for him.
(in Huttese subtitled) He must come in?
We have powerful friends. You're gonna regret this...


 Batch 2,500 of 2,753. Loss: 0.02461281418800354.Elapsed: 0:21:06.
0: You may be on the Council, but... they refused to accept me as a Jedi Master.
Patience. In time, they will recognize your skills.
They still treat me as if I were a Padawan learner... they fear my power, that's the problem.
Anakin...
Sometimes, when you believe something to be real, it becomes real. Real enough, anyway...
I used to bull's- eye womp rats in my T-sixteen back home. They're not much bigger than two meters.
Man your ships! And may the Force be with you!


 Batch 2,600 of 2,753. Loss: 0.029153993353247643.Elapsed: 0:21:58.
0: Hey, ol' buddy!
Hey, Dex.


 Batch 2,700 of 2,753. Loss: 0.029739856719970703.Elapsed: 0:22:48.
0: The walls are moving! Don't just stand there. Try to brace it with something.
Wait a minute!
Threepio! Come in, Threepio! Threepio! Threepio!
Get to the top!
I can't
Where could he be? Threepio! Threepio!
Get him off of there!

 Average training loss: 0.06
 Training epoch took: 0:23:15

Training...


 Batch   100 of 2,753. Loss: 0.015602454543113708.Elapsed: 0:00:50.
0: I don't think they'll melt us down.
Don't shoot! Don't shoot! Will this never end?
Luke, tell Owen that if he gets a translator to be sure it speaks Bocce.
It looks like we don't have much of a choice, but I'll remind him.
I have no need for a protocol droid.
(quickly) Sir -- not in an environment such as this -- that's why I've also been programmed for over thirty secondary functions that...


 Batch   200 of 2,753. Loss: 0.026772694662213326.Elapsed: 0:01:41.
0: The attempt on my life has left me scarred and deformed, but I assure you my resolve has never been stronger.
The war is over. (applause) The Separatists have been defeated, (applause) and the Jedi rebellion has been foiled. We stand on the threshold of a new beginning.
Well, this is the moment we discover if he intends to return the Republic to a democracy.
In order to ensure our security and continuing stability, the Republic will be reorganized into the first Galactic Empire, for a safe and secure society which I assure you will last for ten thousand years.
(continuing) An empire that will continue to be ruled by this august body, and a sovereign ruler chosen for life...
(continuing) An empire ruled by the majority... Ruled by a new constitution...


 Batch   300 of 2,753. Loss: 0.019943032413721085.Elapsed: 0:02:33.
0: Why not? They seem to have a box of old coverings here.
Oh? How observant of you, Miss Padme. Of course, I'm just not mechanically minded... if you see what I mean.
Let's see, if we put this... here...
Ooooh! That's tickles.
You'll have to be quiet, THREEPIO. Hold still, please.
Mom... Mom... Mom... Mom...
Annie...? Is it you?


 Batch   400 of 2,753. Loss: 0.018832625821232796.Elapsed: 0:03:24.
0: They're sealing this section off.
Six droids coming our way!


 Batch   500 of 2,753. Loss: 0.03156033530831337.Elapsed: 0:04:14.
0: I'm sorry, My Master.
Remember, the war is a diversion. The Gungans will not easily be swayed, and we cannot use our power to help her.
I'm I'm sorry, Viceroy Your trade boycott of our planet has ended.
I was never aware of such a failure.
I have word that the chancellor's ambassadors are with you now and that you have been commanded to reach settlement.
I know nothing of any ambassadors. You must be mistaken.
Beware, Viceroy the Federation is going too far this time.


 Batch   600 of 2,753. Loss: 0.02156873792409897.Elapsed: 0:05:05.
0: What have we been up to?
Our best troops have reached the swamp planet of Dantooine. They found the remains of a Rebel base, but they estimate that it has been deserted for some time. They are now conducting an extensive search of the surrounding systems.
She lied! She lied to us!
I told you she would never consciously betray the Rebellion.
Terminate her... immediately!
Stand by, Chewie, here we go. Cut in the sublight engines.
What the...? Aw, we've come out of hyperspace into a meteor shower. Some kind of asteroid collision. It's not on any of the charts.


 Batch   700 of 2,753. Loss: 0.025849102064967155.Elapsed: 0:05:57.
0: There you go.
Well, wait a minute. Where'd she go? Bring her back! Play back the entire message.
What message? The one you're carrying inside your rusty innards!
Luke? Luke! Come to dinner!
All right, I'll be right there, Aunt Beru.
I'm sorry, sir, but he appears to have picked up a slight flutter.
Well, see what you can do with him. I'll be right back.


 Batch   800 of 2,753. Loss: 0.022461092099547386.Elapsed: 0:06:48.
0: The hull is burning up!
Time to abandon ship.
All the escape pods have been launched.
Grievous. Can you fly a cruiser like this?
You mean, do I know how to land what's left of this thing?
Well?
Under the circumstances, I'd say the ability to pilot this thing is irrelevant. Strap yourselves in.


 Batch   900 of 2,753. Loss: 0.021298952400684357.Elapsed: 0:07:39.
0: (continuing) Don't worry, this guy's gonna kill himself any minute now!
What are you doing? He's gonna blast me!
Right - this isn't working.
That was too close!
Clear that!
What??
Why didn't the Council have to go after us?


 Batch 1,000 of 2,753. Loss: 0.01695195585489273.Elapsed: 0:08:30.
0: That's impossible, even for a computer.
It's not impossible. I used to bull's- eye womp rats in my T-sixteen back home. They're not much bigger than two meters.
Man your ships! And may the Force be with you!
Orbiting the planet at maximum velocity. The moon with the Rebel base will be in range in thirty minutes.
This will be a day long remembered. It has seen the end of Kenobi and it will soon see the end of the Rebellion.
(over loudspeaker) All flight trooper, man your stations. All flight troops, man your stations.
So... you got your reward and you're just leaving then?


 Batch 1,100 of 2,753. Loss: 0.020138956606388092.Elapsed: 0:09:22.
0: It's Luke! Chewie!
Luke, Luke. Come on!
Blast it! Wedge where are you?
Thanks, Wedge.
(over speaker) Good shooting, Wedge!
(over speaker) Red Leader...
This is Gold Leader. We're starting out attack run.


 Batch 1,200 of 2,753. Loss: 0.018539827316999435.Elapsed: 0:10:12.
0: We are not going to exceed our mandate, my young Padawan learner.
I meant in the interest of protecting her, Master, of course.
We are not going through this exercise again, Anakin. You will pay attention to my lead.
Why?
What??!!
Why else do you think we were assigned to her, if not to find the killer? Protection is a job for local security... not Jedi. It's overkill, Master. Investigation is implied in our mandate.
We will do as the Council has instructed, and you will learn your place, young one.


 Batch 1,300 of 2,753. Loss: 0.023060251027345657.Elapsed: 0:11:04.
0: It's not my fault.
Sir, we just lost the main rear deflector shield. One more direct hit on the back quarter and we're done for.
Turn her around.
I said turn her around! I'm going to put all power in the front shield.
You're going to attack them?!
Sir, the odds of surviving a direct assault on an Imperial Star Destroyer...
Shut up!
They're moving to attack position! Shields up!


 Batch 1,400 of 2,753. Loss: 0.023187140002846718.Elapsed: 0:11:55.
0: Why bother? As a practical matter, the Senate no longer exists.
The constitution is in shreds. Amendment after amendment... executive directives, sometimes a dozen in one day.
We can't let a thousand years of democracy disappear without a fight.
What are you suggesting?
I apologize. I didn't mean to sound like a Separatist.
We are not Separatists trying to leave the Republic. We are loyalists, trying to preserve democracy in the Republic.
It has become increasingly clear to many of us that the Chancellor has become an enemy of democracy.


 Batch 1,500 of 2,753. Loss: 0.013721153140068054.Elapsed: 0:12:46.
0: Don't make me kill you.
Anakin, my allegiance is to the Republic... to democracy.
If you're not with me, you're my enemy.
Only a Sith Lord deals in absolutes. I will do what I must.
You will try.
I hear a new apprentice, you have. Emperor, or should I call you Darth Sidious.
Master Yoda, you survived.


 Batch 1,600 of 2,753. Loss: 0.015580800361931324.Elapsed: 0:13:37.
0: We have a plan which should immobilize the Droid Army. We will send what pilots we have to knock out the Droid control ship which is orbiting the planet. If we can get past their rayshields, we can sever communication and their droids will be helpless.
A well-conceived plan. However, there's great risk. The weapons on your fighters may not penetrate the shields on the control ship.
And there's an even bigger danger. If the Vicroy escapes, Your Highness, he will return with another droid army.
That is why we must not fail to get to the Viceroy. Everything depends on it.
beeps
she is more foolish than I thought.
We are sending all available troops to meet this army of hers assembling near the swamp. It appears to be made up of primitives. We do not expect much resistance.


 Batch 1,700 of 2,753. Loss: 0.018080538138747215.Elapsed: 0:14:29.
0: And those control the pitch?
You catch on pretty quick.
The moment we land the Federation will arrest you, and force you to sign the treaty.
I agree I'm not sure what you hope to accomplish by this.


 Batch 1,800 of 2,753. Loss: 0.022563977167010307.Elapsed: 0:15:19.
0: It is so wonderful, Annie. You have brought hope to those who have none. I'm so very proud of you
Ah, gee enough of this
You! You swindled me! You knew the boy was going to win! Somehow you knew it! I lost everything.
Whenever you gamble, my friend, eventually you'll lose. Bring the parts to the main hanger. I'll come by your shop later so you can release the boy.
You can't have him! It wasn't a fair bet!
Would you like to discuss it with the Hutts I'm sure they can settle this.


 Batch 1,900 of 2,753. Loss: 0.015994051471352577.Elapsed: 0:16:11.
0: They are here! Something's happening... I'm not the Jedi I should be. I am one of the most powerful Jedi, but I'm not satisfied... I want more, and I know I shouldn't.
You expect too much of yourself.
I have found a way to save you.
Save me?
From my nightmares.
Is that what's bothering you?
I won't lose you, Padme.
I'm not going to die in childbirth, Annie. I promise you.


 Batch 2,000 of 2,753. Loss: 0.017246760427951813.Elapsed: 0:17:02.
0: I'm here, Mom. I'm looking for Shmi Skywalker.
Annie?? Little Annie?? Naaaah!!
"(continuing
(continuing) You sure sprouted Weehoo! A Jedi! Waddya know? Hey, maybe you couldda help wit some daedbeats who owe...
My mother...
Oh, yeah. Shmi... she's not mine no more. I sold her.
Sold her...


 Batch 2,100 of 2,753. Loss: 0.02063031867146492.Elapsed: 0:17:53.
0: I can't do it, Mom. I just can't.
Annie
Will I ever see you again?
What does your heart tell you?
I hope so yes I guess.
Then we will see each other again.


 Batch 2,200 of 2,753. Loss: 0.015308852307498455.Elapsed: 0:18:44.
0: Well, we'd better go inside.
Master Lars - Master Owen! Somebody to see you!
I'm Anakin Skywalker. I'm here looking for my mother.
Owen Lars... I guess I'm your step-brother. (they shake hands) This is my girlfriend, Beru.
Hello.
I'm Padme.
I had a feeling you might show up some day.


 Batch 2,300 of 2,753. Loss: 0.021161263808608055.Elapsed: 0:19:35.
0: Well, it's only a few guards. This shouldn't be too much trouble.
Well, it only takes one to sound the alarm.
(with self-confident grin) Then we'll do it real quiet-like.
Oh! Oh, my. Uh, Princess Leia!
Quiet.
I'm afraid our furry companion has gone and done something rather rash.
Oh, no.


 Batch 2,400 of 2,753. Loss: 0.015514140017330647.Elapsed: 0:20:26.
0: Anakin, there's no time. We must get off the ship before it's too late.
He seems to be all right. No broken bones, breathing's all right.
Leave him, or we'll never make it.
His fate will be the same as ours.
Prepare for attack.
All batteries fire! Fire!
The elevator's not working, (into his comlink) ARTOO...


 Batch 2,500 of 2,753. Loss: 0.01591670885682106.Elapsed: 0:21:17.
0: (quickly) Sir -- not in an environment such as this -- that's why I've also been programmed for over thirty secondary functions that...
What I really need is a droid that understands the binary language of moisture vaporators.
Vaporators! Sir -- My first job was programming binary load lifter... very similar to your vaporators. You could say...
Do you speak Bocce?
Of course I can, sir. It's like a second language for me... I'm as fluent in Bocce...
All right shut up! (turning to Jawa) I'll take this one.
Shutting up, sir.


 Batch 2,600 of 2,753. Loss: 0.016239209100604057.Elapsed: 0:22:09.
0: (over comlink) Droid of some kind. I didn't hit it that hard. It must have had a self-destruct.
(into comlink) An Imperial probe droid.
(over comlink) It's a good bet the Empire knows we're here.
We'd better start the evacuation.
Admiral.
Yes, Captain
I think we've got something, sir. The report is only a fragment from a probe droid in the Hoth system, but it's the best lead we've had.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


 Batch 2,700 of 2,753. Loss: 0.018260296434164047.Elapsed: 0:23:00.
0: (over comlink) Ahh, good luck.
(into comlink) I see you shortly, Rogue Two. We're going to regroup with the others.
(into comlink) Just hang on, Dack. We're going to get a lot of firing earlier.
(into comlink) Take care, you two. May the Force be with you.
Ow!
Command station, this is ST 321. Code Clearance Blue. We're starting our approach. Deactivate the security shield.
The security deflector shield will be deactivated when we have confirmation of your code transmission. Stand by... You are clear to proceed.

 Average training loss: 0.02
 Training epoch took: 0:23:28

Training complete!
Total training took 0:46:42 (h:mm:ss)
Saving model to ./model_save/


('./model_save/tokenizer_config.json',
 './model_save/special_tokens_map.json',
 './model_save/vocab.json',
 './model_save/merges.txt',
 './model_save/added_tokens.json')

## Testing the model (question 1)

In [14]:
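text = "To be or not to be: that is the question."
print("Prompt >", text)
# Tokenize the prompt and move the tensors to the same device as the model.
encoded_text = tokenizer(text, return_tensors='pt')
encoded_text = encoded_text.to(device)
# Sample up to 200 new tokens: top_k keeps the 20 most likely tokens at each step,
# and top_p further restricts sampling to a 95% cumulative-probability nucleus.
response = model.generate(**encoded_text, max_new_tokens=200, do_sample=True, top_p=0.95, top_k=20)
# Decode the first (and only) returned sequence, dropping special tokens.
response_text = tokenizer.decode(response[0], skip_special_tokens=True)
print(response_text)
print("=======================================")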

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Prompt > To be or not to be: that is the question.
To be or not to be: that is the question. What good is a question of precedure if you start down the south passage and speak of the Rebellion?
I know the precedure of the law. I feel confident we can overcome it
I must speak with the Jei Council immediately, Your Honor. The situation has become more complicated.
Ani, come on.
Da queen's a bein grossly nice, mesa tinks. Pitty hot.
the Republic is not what it once was. The Senate is full of greedy, squabbling delegates who are only looking out for themselves and their home sytems. There is no interest in the common good no civility, only politics its disgusting. I must be frank, Your Majesty, there is little chance the Senate will act on the invasion.
Chancellor Valorum seems to think there is hope.
If I may say so, Your Majesty, the Chancellor has little real power he is mired down by baseless accusations of corruption. A manufactured scandal surrounds him
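
The generation call in this section combines `top_k=20` with `top_p=0.95`. As a rough illustration of what those two settings do to a next-token distribution, here is a minimal pure-Python sketch. It is not the Hugging Face implementation: the helper names `top_k_top_p_filter` and `sample`, and the toy logits, are hypothetical and exist only for this example.

```python
import math
import random

def top_k_top_p_filter(logits, top_k=20, top_p=0.95):
    """Return {token_id: probability} after top-k, then top-p (nucleus) filtering."""
    # Top-k: keep only the k highest-scoring token ids, best first.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = ranked[:top_k]

    # Softmax over the kept logits (shifted by the max for numerical stability).
    max_logit = max(logits[i] for i in kept)
    weights = [math.exp(logits[i] - max_logit) for i in kept]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    nucleus = []
    cumulative = 0.0
    for token_id, p in zip(kept, probs):
        nucleus.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(p for _, p in nucleus)
    return {token_id: p / norm for token_id, p in nucleus}

def sample(filtered, rng=random):
    """Draw one token id from the filtered distribution."""
    ids = list(filtered)
    return rng.choices(ids, weights=[filtered[i] for i in ids], k=1)[0]

# Toy example: five candidate tokens, keep the top 3, then an 80% nucleus.
toy_logits = [2.0, 1.0, 0.0, -1.0, -5.0]
filtered = top_k_top_p_filter(toy_logits, top_k=3, top_p=0.8)
print(filtered)
print("sampled token id:", sample(filtered))
```

Lower values of either setting cut off more of the tail, which tends to make samples more conservative; higher values let rarer tokens through.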


## Loading the model

In [29]:
import os
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer, GPT2Config

output_dir = './model_save/'
if os.path.exists(output_dir):
    tokenizer = GPT2Tokenizer.from_pretrained(output_dir)
    configuration = GPT2Config.from_pretrained(output_dir)
    model = GPT2LMHeadModel.from_pretrained(output_dir, config=configuration)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50259, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50259, bias=False)
)

## Testing the model (with Star Wars lines, question 2)

In [14]:
sw_prompts = ['the force', 'the dark side']
for text in sw_prompts:
    print("Prompt >", text)
    encoded_text = tokenizer(text, return_tensors='pt')
    encoded_text = encoded_text.to(device)
    for i in range(2):
        response = model.generate(**encoded_text, max_new_tokens=200, do_sample=True, top_k=20, top_p=0.95)
        response_text = tokenizer.decode(response[0], skip_special_tokens=True)
        print(response_text)
        print("=======================================")


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Prompt > the force


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


the force is with us, my Master.
Welcome home, Lord Tyranus. You have done well.
I bring you good news, my Lord. The war has begun.
Excellent. (smiling) Everything is going as planned.
Where is your apprentice?
On his way back to Naboo. He is escorting Senator Amidala home.
(continuing) I must admit without the clones, it would not have been a victory.
Victory? Victory, you say?


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


the force grows dark, Anakin, and we are all affected by it. Be wary of your feelings.
Anakin, this afternoon the Senate is going to call on me to take direct control of the Jedi Council.
The Jedi will no longer report to the Senate?
They will report to me... personally. The Senate is too unfocused to conduct a war. This will bring a quick end to things.
I agree, but the Jedi Council may not see it that way.
There are times when we must all endure adjustments to the constitution in the name of security.
With all due respect, sir, the Council is in no mood for more constitutional amendments.
Thank you, my friend, but in this case I have no choice... this war must be won.
Prompt > the dark side


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


the dark side of the Force is a pathway to many abilities some consider to be unnatural.
What happened to him?
He became so powerful... the only thing he was afraid of was losing his power, which eventually, of course, he did. Unfortunately, he taught his apprentice everything he knew, then his apprentice killed him in his sleep. (smiles) Plagueis never saw it coming. It's ironic he could save others from death, but not himself.
Is it possible to learn this power?
Not from a Jedi.
(holo) Palpatine thinks General Grievous is on Utapau. We have had no reports of this from our agents.
(holo) How could the Chancellor have come by this information and we know nothing about it? We have had contact with Baron Papanoida and he said no one was there.
the dark side of the Force is a pathway to many abilities some consider to be unnatural.
What happened to him?
He became so powerful... the only thing he was afraid of was losing his power, which eventually, of course, he did. Unfortunately, he tau

## Counting model parameters

In [31]:
# Get all of the model's parameters as a list of tuples.
params = list(model.named_parameters())

print('The GPT-2 model has {:} different named parameters.\n'.format(len(params)))
print('==== Embedding Layer ====\n')
for p in params[0:2]:
    print("{:<55} {:>12}".format(p[0], str(tuple(p[1].size()))))
print('\n==== First Transformer ====\n')
for p in params[2:14]:
    print("{:<55} {:>12}".format(p[0], str(tuple(p[1].size()))))
print('\n==== Output Layer ====\n')
for p in params[-2:]:
    print("{:<55} {:>12}".format(p[0], str(tuple(p[1].size()))))



The GPT-2 model has 148 different named parameters.

==== Embedding Layer ====

transformer.wte.weight                                  (50259, 768)
transformer.wpe.weight                                   (1024, 768)

==== First Transformer ====

transformer.h.0.ln_1.weight                                   (768,)
transformer.h.0.ln_1.bias                                     (768,)
transformer.h.0.attn.c_attn.weight                       (768, 2304)
transformer.h.0.attn.c_attn.bias                             (2304,)
transformer.h.0.attn.c_proj.weight                        (768, 768)
transformer.h.0.attn.c_proj.bias                              (768,)
transformer.h.0.ln_2.weight                                   (768,)
transformer.h.0.ln_2.bias                                     (768,)
transformer.h.0.mlp.c_fc.weight                          (768, 3072)
transformer.h.0.mlp.c_fc.bias                                (3072,)
transformer.h.0.mlp.c_proj.weight                        (3
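
The listing above gives the shape of each tensor but not the totals. The following sketch recomputes them by hand from the printed shapes. It relies on a few assumptions: the MLP output projection is taken to mirror `c_fc` (3072 to 768, as in the standard GPT-2 small layout), `lm_head` is taken to share its weight with `wte` (so it is not counted twice), and the function name `gpt2_param_counts` is hypothetical.

```python
def gpt2_param_counts(vocab_size=50259, n_positions=1024, d_model=768, n_layers=12):
    """Hand-count parameters for a GPT-2-small-shaped model (assumed layout)."""
    d_ff = 4 * d_model  # 3072, matching the c_fc shape printed above

    # Token and position embeddings (wte, wpe).
    embeddings = vocab_size * d_model + n_positions * d_model

    # One LayerNorm holds a weight and a bias vector.
    layer_norm = 2 * d_model

    # Attention: fused QKV projection (c_attn) plus output projection (c_proj).
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)

    # MLP: expansion (c_fc) plus assumed contraction (c_proj), each with a bias.
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)

    # Each block has two LayerNorms, one attention module, and one MLP.
    per_block = 2 * layer_norm + attention + mlp

    # Final LayerNorm (ln_f); lm_head is assumed tied to wte and adds nothing.
    total = embeddings + n_layers * per_block + layer_norm

    # Named tensors: 2 embeddings + 12 per block + 2 for ln_f.
    named_tensors = 2 + n_layers * 12 + 2

    return {
        "embeddings": embeddings,
        "per_block": per_block,
        "total": total,
        "named_tensors": named_tensors,
    }

counts = gpt2_param_counts()
for name, value in counts.items():
    print(f"{name:<15} {value:>12,}")
```

Under these assumptions the named-tensor count comes out to 148, matching the figure printed by the cell. With the model loaded, `sum(p.numel() for p in model.parameters())` gives the total directly and can be used to check the hand count.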

# Bonus

In [28]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data
import math
import copy

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super(MultiHeadAttention, self).__init__()
        # Ensure that the model dimension (d_model) is divisible by the number of heads
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        
        # Initialize dimensions
        self.d_model = d_model # Model's dimension
        self.num_heads = num_heads # Number of attention heads
        self.d_k = d_model // num_heads # Dimension of each head's key, query, and value
        
        # Linear layers for transforming inputs
        self.W_q = nn.Linear(d_model, d_model) # Query transformation
        self.W_k = nn.Linear(d_model, d_model) # Key transformation
        self.W_v = nn.Linear(d_model, d_model) # Value transformation
        self.W_o = nn.Linear(d_model, d_model) # Output transformation
        
    def scaled_dot_product_attention(self, Q, K, V, mask=None):
        # Calculate attention scores
        attn_scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(self.d_k)
        
        # Apply mask if provided (useful for preventing attention to certain parts like padding)
        if mask is not None:
            attn_scores = attn_scores.masked_fill(mask == 0, -1e9)
        
        # Softmax is applied to obtain attention probabilities
        attn_probs = torch.softmax(attn_scores, dim=-1)
        
        # Multiply by values to obtain the final output
        output = torch.matmul(attn_probs, V)
        return output
        
    def split_heads(self, x):
        # Reshape the input to have num_heads for multi-head attention
        batch_size, seq_length, d_model = x.size()
        return x.view(batch_size, seq_length, self.num_heads, self.d_k).transpose(1, 2)
        
    def combine_heads(self, x):
        # Combine the multiple heads back to original shape
        batch_size, _, seq_length, d_k = x.size()
        return x.transpose(1, 2).contiguous().view(batch_size, seq_length, self.d_model)
        
    def forward(self, Q, K, V, mask=None):
        # Apply linear transformations and split heads
        Q = self.split_heads(self.W_q(Q))
        K = self.split_heads(self.W_k(K))
        V = self.split_heads(self.W_v(V))
        
        # Perform scaled dot-product attention
        attn_output = self.scaled_dot_product_attention(Q, K, V, mask)
        
        # Combine heads and apply output transformation
        output = self.W_o(self.combine_heads(attn_output))
        return output

class PositionWiseFeedForward(nn.Module):
    def __init__(self, d_model, d_ff):
        super(PositionWiseFeedForward, self).__init__()
        self.fc1 = nn.Linear(d_model, d_ff)
        self.fc2 = nn.Linear(d_ff, d_model)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_seq_length):
        super(PositionalEncoding, self).__init__()
        
        pe = torch.zeros(max_seq_length, d_model)
        position = torch.arange(0, max_seq_length, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model))
        
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        
        self.register_buffer('pe', pe.unsqueeze(0))
        
    def forward(self, x):
        return x + self.pe[:, :x.size(1)]


class EncoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super(EncoderLayer, self).__init__()
        self.self_attn = MultiHeadAttention(d_model, num_heads)
        self.feed_forward = PositionWiseFeedForward(d_model, d_ff)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, x, mask):
        attn_output = self.self_attn(x, x, x, mask)
        x = self.norm1(x + self.dropout(attn_output))
        ff_output = self.feed_forward(x)
        x = self.norm2(x + self.dropout(ff_output))
        return x


class DecoderLayer(nn.Module):
    def __init__(self, d_model, num_heads, d_ff, dropout):
        super(DecoderLayer, self).__init__()
        self.self_attn = MultiHeadAttention(d_model, num_heads)
        self.cross_attn = MultiHeadAttention(d_model, num_heads)
        self.feed_forward = PositionWiseFeedForward(d_model, d_ff)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, x, enc_output, src_mask, tgt_mask):
        attn_output = self.self_attn(x, x, x, tgt_mask)
        x = self.norm1(x + self.dropout(attn_output))
        attn_output = self.cross_attn(x, enc_output, enc_output, src_mask)
        x = self.norm2(x + self.dropout(attn_output))
        ff_output = self.feed_forward(x)
        x = self.norm3(x + self.dropout(ff_output))
        return x

class Transformer(nn.Module):
    def __init__(self, src_vocab_size, tgt_vocab_size, d_model, num_heads, num_layers, d_ff, max_seq_length, dropout):
        super(Transformer, self).__init__()
        self.encoder_embedding = nn.Embedding(src_vocab_size, d_model)
        self.decoder_embedding = nn.Embedding(tgt_vocab_size, d_model)
        self.positional_encoding = PositionalEncoding(d_model, max_seq_length)

        self.encoder_layers = nn.ModuleList([EncoderLayer(d_model, num_heads, d_ff, dropout) for _ in range(num_layers)])
        self.decoder_layers = nn.ModuleList([DecoderLayer(d_model, num_heads, d_ff, dropout) for _ in range(num_layers)])

        self.fc = nn.Linear(d_model, tgt_vocab_size)
        self.dropout = nn.Dropout(dropout)

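    def generate_mask(self, src, tgt):
        # Padding masks: True wherever the token id is not 0 (the padding id).
        src_mask = (src != 0).unsqueeze(1).unsqueeze(2)  # (batch, 1, 1, src_len)
        tgt_mask = (tgt != 0).unsqueeze(1).unsqueeze(3)  # (batch, 1, tgt_len, 1)
        seq_length = tgt.size(1)
        # Causal ("no peek") mask: position i may attend only to positions <= i.
        # It is created on tgt's device so the `&` below also works when the data is on a GPU.
        nopeak_mask = (1 - torch.triu(torch.ones(1, seq_length, seq_length, device=tgt.device), diagonal=1)).bool()
        tgt_mask = tgt_mask & nopeak_mask
        return src_mask, tgt_mask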

    def forward(self, src, tgt):
        src_mask, tgt_mask = self.generate_mask(src, tgt)
        src_embedded = self.dropout(self.positional_encoding(self.encoder_embedding(src)))
        tgt_embedded = self.dropout(self.positional_encoding(self.decoder_embedding(tgt)))

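        enc_output = src_embedded
        for enc_layer in self.encoder_layers:
            # src_mask is not applied here (None is passed): the random sample data
            # contains no padding tokens (id 0), so the mask would be all True anyway.
            enc_output = enc_layer(enc_output, None)

        dec_output = tgt_embedded
        for dec_layer in self.decoder_layers:
            # Cross-attention over the encoder output is likewise unmasked (None);
            # only the target mask (padding + causal) is applied in self-attention.
            dec_output = dec_layer(dec_output, enc_output, None, tgt_mask)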

        output = self.fc(dec_output)
        return output

src_vocab_size = 5000
tgt_vocab_size = 5000
d_model = 512
num_heads = 8
num_layers = 6
d_ff = 2048
max_seq_length = 100
dropout = 0.1

# Generate random sample data
src_data = torch.randint(1, src_vocab_size, (64, max_seq_length))  # (batch_size, seq_length)
tgt_data = torch.randint(1, tgt_vocab_size, (64, max_seq_length))  # (batch_size, seq_length)
transformer = Transformer(src_vocab_size, tgt_vocab_size, d_model, num_heads, num_layers, d_ff, max_seq_length, dropout)
criterion = nn.CrossEntropyLoss(ignore_index=0)
optimizer = optim.Adam(transformer.parameters(), lr=0.0001, betas=(0.9, 0.98), eps=1e-9)

transformer.train()

for epoch in range(5):
    optimizer.zero_grad()
    output = transformer(src_data, tgt_data[:, :-1])
    # print(output)
    loss = criterion(output.contiguous().view(-1, tgt_vocab_size), tgt_data[:, 1:].contiguous().view(-1))
    loss.backward()
    optimizer.step()
    print(f"Epoch: {epoch+1}, Loss: {loss.item()}")

transformer.eval()

# Generate random sample validation data
val_src_data = torch.randint(1, src_vocab_size, (64, max_seq_length))  # (batch_size, seq_length)
val_tgt_data = torch.randint(1, tgt_vocab_size, (64, max_seq_length))  # (batch_size, seq_length)

with torch.no_grad():

    val_output = transformer(val_src_data, val_tgt_data[:, :-1])
    val_loss = criterion(val_output.contiguous().view(-1, tgt_vocab_size), val_tgt_data[:, 1:].contiguous().view(-1))
    print(f"Validation Loss: {val_loss.item()}")



Epoch: 1, Loss: 8.688618659973145
Epoch: 2, Loss: 8.552766799926758
Epoch: 3, Loss: 8.483601570129395
Epoch: 4, Loss: 8.427410125732422
Epoch: 5, Loss: 8.37089729309082
Validation Loss: 8.656967163085938
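
The losses above all sit near 8.5. A useful reference point, assuming the targets are uniformly random token ids (which is what `torch.randint` produces here), is the cross-entropy of a model that simply guesses uniformly over the vocabulary. The short sketch below computes that baseline; the helper name `uniform_cross_entropy` is hypothetical.

```python
import math

def uniform_cross_entropy(num_classes):
    """Cross-entropy (in nats) of a uniform prediction over num_classes classes."""
    return math.log(num_classes)

# tgt_vocab_size is 5000 in the cell above.
baseline = uniform_cross_entropy(5000)
print(f"Uniform-guess baseline: {baseline:.3f}")  # → 8.517
```

Since the targets carry no learnable pattern, a training loss that dips below this baseline most likely reflects the model starting to memorize the single fixed batch, while the validation loss on fresh random data stays at or above it.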
