# CS394 Module 1

### Load a pre-trained GPT-2 model

In [5]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained GPT-2 model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

# Set pad token
tokenizer.pad_token = tokenizer.eos_token


In [14]:
import random
import numpy as np
import torch

# Function to set seeds for reproducibility
def set_seed(seed_value=42):
    random.seed(seed_value)
    
    np.random.seed(seed_value)
    
    torch.manual_seed(seed_value)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed_value)

In [15]:
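def autocomplete(prompt, max_length=50, temperature=0.7, top_k=50, top_p=0.9, do_sample=True):
    # Note: max_length counts the prompt tokens plus the newly generated tokens
    # Encode the prompt; the tokenizer also returns the attention mask
    inputs = tokenizer(prompt, return_tensors="pt")

    # Greedy decoding does not use the sampling parameters, so clear them
    if not do_sample:
        temperature = None
        top_k = None
        top_p = None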
    
    # Generate continuation
    with torch.no_grad():
        output = model.generate(
            inputs['input_ids'],
            attention_mask=inputs['attention_mask'],
            max_length=max_length,
            temperature=temperature,
            top_k=top_k,
            top_p=top_p,
            do_sample=do_sample,
            pad_token_id=tokenizer.eos_token_id
        )
    
    # Decode and return the generated text
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    return generated_text
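
The `top_k` and `top_p` arguments above restrict which tokens can be sampled at each step. As a rough illustration of the idea, here is a minimal pure-Python sketch on a made-up five-token distribution. The function name `top_k_top_p_filter` and the toy probabilities are assumptions for illustration only; the library's own implementation works on logits and may differ in detail.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.9):
    """Return {token_index: renormalized probability} for the tokens that survive."""
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most probable tokens, then renormalize.
    ranked = ranked[:top_k]
    mass = sum(probs[i] for i in ranked)
    renormalized = {i: probs[i] / mass for i in ranked}

    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += renormalized[i]
        if cumulative >= top_p:
            break

    # Renormalize once more so the surviving probabilities sum to 1.
    kept_mass = sum(renormalized[i] for i in kept)
    return {i: renormalized[i] / kept_mass for i in kept}


# Hypothetical next-token distribution over a five-token vocabulary.
toy_probs = [0.5, 0.3, 0.1, 0.06, 0.04]
filtered = top_k_top_p_filter(toy_probs, top_k=4, top_p=0.85)
print(filtered)
```

With these toy numbers only the three most probable tokens survive, so the unlikely tail can never be sampled.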

In [16]:
set_seed(42)

### Create 3 different story starters in different genres/styles.

In [24]:
prompts = [
    "When you leave because you are weary of me, I shall let you go gently, without a word.",
    "The train passed through the endless tunnel, and when it finally saw the light, it was the snow country.",
    "In an era where the need for machines to become human had vanished, a man was standing on a cliff."
]

for prompt in prompts:
    print("-" * 50)
    print(f"\nPrompt: {prompt} \n")
    completion = autocomplete(prompt, max_length=100)
    print(f"[Output]")
    print(completion)
    print("-" * 50)

--------------------------------------------------

Prompt: When you leave because you are weary of me, I shall let you go gently, without a word. 

[Output]
When you leave because you are weary of me, I shall let you go gently, without a word. And you shall not return to me till you have taken your rest. I will make you a covenant with your heart. And I will give you my heart. And I will give you my heart. And you shall be mine and my soul. And I will give you my heart. And I will give you my heart. And I will give you my heart. And I will give you my
--------------------------------------------------
--------------------------------------------------

Prompt: The train passed through the endless tunnel, and when it finally saw the light, it was the snow country. 

[Output]
The train passed through the endless tunnel, and when it finally saw the light, it was the snow country.

The train reached the end of the tunnel and the snow was falling heavily, and the train began to slow down.


### Then adjust for the following

#### Greedy decoding vs. sampling

In [25]:
print("Greedy Decoding Results:")
for prompt in prompts:
    print("-" * 50)
    print(f"\nPrompt: {prompt} \n")
    completion = autocomplete(prompt, max_length=100, do_sample=False)
    print(f"[Output]")
    print(completion)
    print("-" * 50)
print("=" * 50)
print("-" * 50)
print("Sampling Decoding Results:")
for prompt in prompts:
    print("-" * 50)
    print(f"\nPrompt: {prompt} \n")
    completion = autocomplete(prompt, max_length=100, do_sample=True)
    print(f"[Output]")
    print(completion)
    print("-" * 50)    

Greedy Decoding Results:
--------------------------------------------------

Prompt: When you leave because you are weary of me, I shall let you go gently, without a word. 

[Output]
When you leave because you are weary of me, I shall let you go gently, without a word. But if you leave because you are weary of me, I shall let you go with a shout, without a word. But if you leave because you are weary of me, I shall let you go with a shout, without a word. But if you leave because you are weary of me, I shall let you go with a shout, without a word. But if you leave because you are
--------------------------------------------------
--------------------------------------------------

Prompt: The train passed through the endless tunnel, and when it finally saw the light, it was the snow country. 

[Output]
The train passed through the endless tunnel, and when it finally saw the light, it was the snow country.

"I'm going to go home now."

"I'm going to go home now."

"I'm going to go home

It looks like our friend heading to the snow country really wants to go home.

In [26]:
set_seed(42*42) # Change seed for different sampling results

In [27]:
print("Seed Changed for Different Sampling Results\n")
print("Greedy Decoding Results:")
for prompt in prompts:
    print("-" * 50)
    print(f"\nPrompt: {prompt} \n")
    completion = autocomplete(prompt, max_length=100, do_sample=False)
    print(f"[Output]")
    print(completion)
    print("-" * 50)
print("=" * 50)
print("-" * 50)
print("Sampling Decoding Results:")
for prompt in prompts:
    print("-" * 50)
    print(f"\nPrompt: {prompt} \n")
    completion = autocomplete(prompt, max_length=100, do_sample=True)
    print(f"[Output]")
    print(completion)
    print("-" * 50)    

Seed Changed for Different Sampling Results

Greedy Decoding Results:
--------------------------------------------------

Prompt: When you leave because you are weary of me, I shall let you go gently, without a word. 

[Output]
When you leave because you are weary of me, I shall let you go gently, without a word. But if you leave because you are weary of me, I shall let you go with a shout, without a word. But if you leave because you are weary of me, I shall let you go with a shout, without a word. But if you leave because you are weary of me, I shall let you go with a shout, without a word. But if you leave because you are
--------------------------------------------------
--------------------------------------------------

Prompt: The train passed through the endless tunnel, and when it finally saw the light, it was the snow country. 

[Output]
The train passed through the endless tunnel, and when it finally saw the light, it was the snow country.

"I'm going to go home now."

"I'm 

With greedy decoding (no sampling), the model consistently wants to go home even when the seed changes, whereas with sampling the output changes whenever the seed changes.
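
The reason the seed only matters for sampling can be shown without the model. The sketch below uses a made-up three-token distribution (`toy_probs`) and two hypothetical helpers, `greedy_pick` and `sample_pick`; it is a toy illustration of the idea, not what `model.generate` does internally.

```python
import random

# Hypothetical next-token probabilities for illustration.
toy_probs = {"home": 0.5, "north": 0.3, "away": 0.2}


def greedy_pick(probs):
    # Greedy decoding: always take the most probable token. No randomness is involved.
    return max(probs, key=probs.get)


def sample_pick(probs, seed, n=5):
    # Sampling: draw tokens in proportion to their probability,
    # using a random generator initialized from the seed.
    rng = random.Random(seed)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=n)


greedy_result = greedy_pick(toy_probs)
same_seed_1 = sample_pick(toy_probs, seed=42)
same_seed_2 = sample_pick(toy_probs, seed=42)
other_seed = sample_pick(toy_probs, seed=42 * 42)

print("greedy:", greedy_result)
print("seed 42:", same_seed_1)
print("seed 42 again:", same_seed_2)
print("seed 42*42:", other_seed)
```

The greedy pick never consults the random generator, so no seed can change it. The sampled draws repeat exactly when the seed is reused and are free to differ under a new seed.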

#### Different temperature values

In [28]:
set_seed(42) # Reset seed back to original

print("Low Temperature Results:")
for prompt in prompts:
    print("-" * 50)
    print(f"\nPrompt: {prompt} \n")
    completion = autocomplete(prompt, max_length=100, temperature=0.1)
    print(f"[Output]")
    print(completion)
    print("-" * 50)
print("=" * 50)
print("-" * 50)
print("High Temperature Results:")
for prompt in prompts:
    print("-" * 50)
    print(f"\nPrompt: {prompt} \n")
    completion = autocomplete(prompt, max_length=100, temperature=2.0) # Let's burn it all down
    print(f"[Output]")
    print(completion)
    print("-" * 50)    

Low Temperature Results:
--------------------------------------------------

Prompt: When you leave because you are weary of me, I shall let you go gently, without a word. 

[Output]
When you leave because you are weary of me, I shall let you go gently, without a word. I shall let you go gently, without a word. I shall let you go gently, without a word. I shall let you go gently, without a word. I shall let you go gently, without a word. I shall let you go gently, without a word. I shall let you go gently, without a word. I shall let you go gently, without a word. I shall
--------------------------------------------------
--------------------------------------------------

Prompt: The train passed through the endless tunnel, and when it finally saw the light, it was the snow country. 

[Output]
The train passed through the endless tunnel, and when it finally saw the light, it was the snow country.

"I'm sorry, but I can't go on."

"I'm sorry, but I can't go on."

"I'm sorry, but I can'

The low-temperature friend is really good at apologizing; you really should apologize once your head has cooled off. The high-temperature friend is so hot that I feel like my brain is going to melt.
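
To see why temperature has this effect, it helps to look at how it reshapes the next-token distribution: the logits are divided by the temperature before the softmax. The sketch below is a minimal pure-Python version with made-up logits (`toy_logits`) and a hypothetical helper name; it only illustrates the scaling step.

```python
import math


def softmax_with_temperature(logits, temperature):
    # Divide the logits by the temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens.
toy_logits = [2.0, 1.0, 0.0]

cold = softmax_with_temperature(toy_logits, 0.1)     # low temperature
default = softmax_with_temperature(toy_logits, 0.7)  # the default used in autocomplete
hot = softmax_with_temperature(toy_logits, 2.0)      # high temperature

for name, dist in [("0.1", cold), ("0.7", default), ("2.0", hot)]:
    print(name, [round(p, 3) for p in dist])
```

At a temperature of 0.1 almost all of the probability mass sits on the top token, which is why the output looks nearly greedy. At 2.0 the distribution flattens, so unlikely tokens get picked far more often.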

In [37]:
set_seed(42*42) # Change seed to compare temperature results across seeds

print("Low Temperature Results:")
for prompt in prompts:
    print("-" * 50)
    print(f"\nPrompt: {prompt} \n")
    completion = autocomplete(prompt, max_length=100, temperature=0.1)
    print(f"[Output]")
    print(completion)
    print("-" * 50)
print("=" * 50)
print("-" * 50)
print("High Temperature Results:")
for prompt in prompts:
    print("-" * 50)
    print(f"\nPrompt: {prompt} \n")
    completion = autocomplete(prompt, max_length=100, temperature=2.0) # Let's burn it all down
    print(f"[Output]")
    print(completion)
    print("-" * 50)    

Low Temperature Results:
--------------------------------------------------

Prompt: When you leave because you are weary of me, I shall let you go gently, without a word. 

[Output]
When you leave because you are weary of me, I shall let you go gently, without a word. I shall not be angry with you, but I shall be kind to you. I shall not be angry with you, but I shall be kind to you. I shall not be angry with you, but I shall be kind to you. I shall not be angry with you, but I shall be kind to you. I shall not be angry with you, but I shall be kind to you
--------------------------------------------------
--------------------------------------------------

Prompt: The train passed through the endless tunnel, and when it finally saw the light, it was the snow country. 

[Output]
The train passed through the endless tunnel, and when it finally saw the light, it was the snow country.

"I'm going to go home now," said the girl.

"I'm going to go home now," said the boy.

"I'm going to go

I can't die like this

#### How the opening sentence shapes the continuation

In [38]:
set_seed(42*42) # Re-apply the alternate seed (42*42)

In [31]:
short_prompts = [
    "When you leave because",
    "The train passed",
    "In an era"
]
long_prompts = prompts

In [None]:

print("Short Prompt Results:")
for prompt in short_prompts:
    print("-" * 50)
    print(f"\nPrompt: {prompt} \n")
    completion = autocomplete(prompt, max_length=100)
    print(f"[Output]")
    print(completion)
    print("-" * 50)
print("=" * 50)
print("-" * 50)
print("Long Prompt Results:")
for prompt in long_prompts:
    print("-" * 50)
    print(f"\nPrompt: {prompt} \n")
    completion = autocomplete(prompt, max_length=100)
    print(f"[Output]")
    print(completion)
    print("-" * 50)

Short Prompt Results:
--------------------------------------------------

Prompt: When you leave because 

[Output]
When you leave because of the strike, you're not going to get paid," he said. "You're not going to be able to afford a meal or anything."

"It's going to be very difficult for them to make ends meet," he said.

He said some of the workers are still working without pay.

"They're going to have to take whatever they can," he said.

The workers say the administration failed to provide them with adequate notice of
--------------------------------------------------
--------------------------------------------------

Prompt: The train passed 

[Output]
The train passed by the apartment building where the victims were killed, and a police car was seen in the area, according to police.

Police have not released any details about the alleged suspects, but said they had been interviewed by the FBI.
--------------------------------------------------
---------------------------------

Oregon mentioned

#### English and Korean (Just for fun)

In [34]:
korean_prompt = [
    "나보기가 역겨워 가실 때에는 말 없이 고이보내 드리오리다",
    "기차가 끝없는 터널을 지나 마침내 빛을 보니, 설국이였다.",
    "기계가 인간이 될필요가 없어져 버린 시대에 한 남자가 절벽위에 서있었다."
] # Basically the same prompts but in Korean

In [36]:
print("Korean Prompt Results:")
for prompt in korean_prompt:
    print("-" * 50)
    print(f"\nPrompt: {prompt} \n")
    completion = autocomplete(prompt, max_length=200)
    print(f"[Output]")
    print(completion)
    print("-" * 50)

Korean Prompt Results:
--------------------------------------------------

Prompt: 나보기가 역겨워 가실 때에는 말 없이 고이보내 드리오리다 

[Output]
나보기가 역겨워 가실 때에는 말 없이 고이보내 드리오리다.

행리오리다: 유정리에 반 정았이 드리오리다.

행리오리다: 유정리에 반 정았이 드리오리다.

아명이 정았 �
--------------------------------------------------
--------------------------------------------------

Prompt: 기차가 끝없는 터널을 지나 마침내 빛을 보니, 설국이였다. 

[Output]
기차가 끝없는 터널을 지나 마침내 빛을 보니, 설국이였다.

기차가 지나 마침내 보니 내지말 마침내 빛을 보니, 보니 설국이였다. 설국이였다.

포마 말사 지나 지나도 내도 설도
--------------------------------------------------
--------------------------------------------------

Prompt: 기계가 인간이 될필요가 없어져 버린 시대에 한 남자가 절벽위에 서있었다. 

[Output]
기계가 인간이 될필요가 없어져 버린 시대에 한 남자가 절벽위에 서있었다. 아만 것성기 사람지기에 시대에 있었어져 서있가 없어져 버린 시대에 한 남자가 절벽위에 서�
--------------------------------------------------


Roughly translated, the outputs say:
- Yeglas half fredis give willingly
- fourma horsebuy pass pass but I too snow too
- Aman it[censored] personlosed era being standwasga meaningless

Yes, they are just meaningless garbage.

## Observations

First, greedy decoding, which always picks the highest-probability token, appeared to produce coherent sentences, although it quickly fell into repeating loops. It felt lacking in creativity, since I got identical results even with different seeds. Sampling, on the other hand, had some oddities (the whole output was strange, including grammatically incorrect parts), but its creativity stood out: it produced diverse results when the seed changed.

Also, when I lowered the temperature during sampling, it produced coherent sentences similar to the greedy approach. Unlike greedy decoding, changing the seed did yield different content, but my impression was that the structure stayed similar. When I raised the temperature to an extreme, the output became a list of "brainrot" words, yet I could still confirm that the content changed significantly depending on the seed.

Furthermore, when I removed most of the prompt and kept only the first few words, the continuation naturally headed in a different direction from the original, longer prompt. I also confirmed that the model could not produce meaningful continuations for Korean prompts at all, likely because GPT-2's training data was overwhelmingly English.
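
One more detail about the Korean outputs: GPT-2's tokenizer is a byte-level BPE, and each Hangul syllable takes three bytes in UTF-8. If generation stops at `max_length` partway through a character, decoding the incomplete bytes gives the replacement character `�`, which may explain the `�` at the end of two of the Korean outputs above. The sketch below shows this with plain Python strings only; it does not call the tokenizer, and the cut point is chosen by hand for illustration.

```python
text = "설국"

# Each Hangul syllable is encoded as three bytes in UTF-8.
raw = text.encode("utf-8")
bytes_per_char = [len(ch.encode("utf-8")) for ch in text]
print(bytes_per_char, len(raw))

# Simulate a sequence that was cut off before the last character was complete.
truncated = raw[:-1].decode("utf-8", errors="replace")
print(truncated)
```

The first syllable decodes normally, while the incomplete second one is replaced, matching the pattern seen in the truncated outputs.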