# HW4
Madusanka Madiligama



## Homework

1. Load in a generative model using the HuggingFace pipeline and generate text using a batch of prompts.
  * Play with generative parameters such as temperature, max_new_tokens, and the model itself and explain the effect on the legibility of the model response. Try at least 4 different parameter/model combinations.
  * Models that can be used include:
    * `google/gemma-2-2b-it`
    * `microsoft/Phi-3-mini-4k-instruct`
    * `meta-llama/Llama-3.2-1B`
    * Any model from this list: [Text-generation models](https://huggingface.co/models?pipeline_tag=text-generation)
    * `gpt2` if having trouble loading these models in
  * This guide should help! [Text-generation strategies](https://huggingface.co/docs/transformers/en/generation_strategies)
2. Load in 2 models of different parameter size (e.g. GPT2, meta-llama/Llama-2-7b-chat-hf, or distilbert/distilgpt2) and analyze the BertViz for each. How do the attention mechanisms change depending on model size?
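
Task 1 asks how `temperature` affects the output. As a minimal sketch of the idea (not the transformers implementation), the snippet below divides a set of next-token scores by the temperature before applying softmax. The scores in `example_logits` are made up for illustration, and `softmax_with_temperature` is a hypothetical helper name.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token scores for four candidate tokens.
example_logits = [4.0, 2.0, 1.0, 0.5]

for t in (0.5, 1.0, 5.0):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs])
```

A low temperature concentrates probability on the top-scoring token, while a high temperature such as 5.0 flattens the distribution, so unlikely tokens are sampled more often and the text tends to become less legible.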

In [2]:
# Proxy settings for the ALCF Sophia Jupyter notebook.
# Comment out this section when running elsewhere.
import os
os.environ["HTTP_PROXY"]="proxy.alcf.anl.gov:3128"
os.environ["HTTPS_PROXY"]="proxy.alcf.anl.gov:3128"
os.environ["http_proxy"]="proxy.alcf.anl.gov:3128"
os.environ["https_proxy"]="proxy.alcf.anl.gov:3128"
os.environ["ftp_proxy"]="proxy.alcf.anl.gov:3128"

In [15]:
# STEP 0 : Installations and imports
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig
from transformers import pipeline
import torch
import torch.nn.functional as F
generator = pipeline("text-generation", model="gpt2")

In [16]:
from transformers import AutoModelForCausalLM, GenerationConfig
# STEP 1 : Set up the prompt
input_text = "Write a concise paragraph about"

# STEP 2 : Load the pretrained model.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
print(config)

#STEP 3 : Load the tokenizer and tokenize the input text
tokenizer  =  AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer(input_text, return_tensors="pt")["input_ids"]
print(input_ids)

# STEP 5 : Perform inference
outputs = model(input_ids)
result = outputs.logits
print(result)

# STEP 6 :  Interpret the output.
probabilities = F.softmax(result, dim=-1)
print(probabilities)
predicted_class = torch.argmax(probabilities, dim=-1).item()
labels = ["NEGATIVE", "POSITIVE"]
out_string = "[{'label': '" + str(labels[predicted_class]) + "', 'score': " + str(probabilities[0][predicted_class].tolist()) + "}]"
print(out_string)
generator(input_text, max_length=40, num_return_sequences=5)

DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased-finetuned-sst-2-english",
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "finetuning_task": "sst-2",
  "hidden_dim": 3072,
  "id2label": {
    "0": "NEGATIVE",
    "1": "POSITIVE"
  },
  "initializer_range": 0.02,
  "label2id": {
    "NEGATIVE": 0,
    "POSITIVE": 1
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.44.0",
  "vocab_size": 30522
}

tensor([[  101,  4339,  1037,  9530, 18380, 20423,  2055,   102]])
tensor([[-3.0075,  3.0523]], grad_fn=<AddmmBackward0>)
tensor([[0.0023, 0.9977]], grad_fn=<SoftmaxBackward0>)
[{'label': 'POSITIVE', 'score': 0.9976703524589539}]


[{'generated_text': "Write a concise paragraph about it and use it as a resource for others. You can even just write some simple HTML that will get read easily.\n\nI've written all sorts of basic tools,"},
 {'generated_text': "Write a concise paragraph about what's going on but please take the time to read and understand that before any of this is actually a discussion.\n\nWhat is the significance of this proposal? As I"},
 {'generated_text': 'Write a concise paragraph about a topic.\n\nIf only half the list should be blank, add a copy of the relevant section to get the next page where those pages are displayed at:\n\n'},
 {'generated_text': 'Write a concise paragraph about how their program works, and whether your paper is suitable.\n\nThe following paragraphs describe some steps needed to create a simple and effective blog entry template for you at https://'},
 {'generated_text': "Write a concise paragraph about what it is you want to accomplish and why.\n\nStep One: Set a Goal, Not 

In [19]:
# Try 1: microsoft/Phi-3-mini-4k-instruct, temperature=5.0, max_new_tokens=50

# STEP 1 : Set up the prompt
input_text = "Write a concise paragraph about"

# STEP 2 : Load the pretrained model.
# model_name = "gpt2"
# model_name ="google/gemma-2-2b-it"
# model_name ="gemma-2-2b-it"

model_name ="microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
print(config)

generation_config = GenerationConfig(
    max_new_tokens=50, do_sample=True, top_k=50, eos_token_id=model.config.eos_token_id,temperature=5.0)

#STEP 3 : Load the tokenizer and tokenize the input text
tokenizer  =  AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer(input_text, return_tensors="pt")["input_ids"]
print(input_ids)

# STEP 5 : Perform inference
outputs = model(input_ids)
result = outputs.logits
print(result)

# STEP 6 :  Interpret the output.
probabilities = F.softmax(result, dim=-1)
print(probabilities)
predicted_class = torch.argmax(probabilities, dim=-1).item()
labels = ["NEGATIVE", "POSITIVE"]
out_string = "[{'label': '" + str(labels[predicted_class]) + "', 'score': " + str(probabilities[0][predicted_class].tolist()) + "}]"
print(out_string)
generator(input_text, max_length=20, num_return_sequences=5)

Phi3Config {
  "_name_or_path": "microsoft/Phi-3-mini-4k-instruct",
  "architectures": [
    "Phi3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "microsoft/Phi-3-mini-4k-instruct--configuration_phi3.Phi3Config",
    "AutoModelForCausalLM": "microsoft/Phi-3-mini-4k-instruct--modeling_phi3.Phi3ForCausalLM"
  },
  "bos_token_id": 1,
  "embd_pdrop": 0.0,
  "eos_token_id": 32000,
  "hidden_act": "silu",
  "hidden_size": 3072,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 4096,
  "model_type": "phi3",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "original_max_position_embeddings": 4096,
  "pad_token_id": 32000,
  "resid_pdrop": 0.0,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "sliding_window": 2047,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.44.0",
  "use_cache": true,
  "v

tensor([[14350,   263,  3022,   895, 14880,  1048]])
tensor([[ 2.3804, -3.0365]], grad_fn=<IndexBackward0>)
tensor([[0.9956, 0.0044]], grad_fn=<SoftmaxBackward0>)
[{'label': 'NEGATIVE', 'score': 0.9955784678459167}]


[{'generated_text': "Write a concise paragraph about how I feel about this book by clicking here but I'm sure you've"},
 {'generated_text': 'Write a concise paragraph about those who may be affected by this new law. Include a description of your'},
 {'generated_text': 'Write a concise paragraph about your work.\n\n4. Give some examples to explain your problem without'},
 {'generated_text': "Write a concise paragraph about his own thoughts on the story and its aftermath. You don't have to"},
 {'generated_text': "Write a concise paragraph about how you'll start making decisions about your lives. The first step is to"}]

In [9]:
# Try 2: gpt2, temperature=5.0, max_new_tokens=100

# STEP 1 : Set up the prompt
input_text = "Write a concise paragraph about"

# STEP 2 : Load the pretrained model.
model_name = "gpt2"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
print(config)

generation_config = GenerationConfig(
    max_new_tokens=100, do_sample=True, top_k=50, eos_token_id=model.config.eos_token_id,temperature=5.0)

#STEP 3 : Load the tokenizer and tokenize the input text
tokenizer  =  AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer(input_text, return_tensors="pt")["input_ids"]
print(input_ids)

# STEP 5 : Perform inference
outputs = model(input_ids)
result = outputs.logits
print(result)

# STEP 6 :  Interpret the output.
probabilities = F.softmax(result, dim=-1)
print(probabilities)
predicted_class = torch.argmax(probabilities, dim=-1).item()
labels = ["NEGATIVE", "POSITIVE"]
out_string = "[{'label': '" + str(labels[predicted_class]) + "', 'score': " + str(probabilities[0][predicted_class].tolist()) + "}]"
print(out_string)
generator(input_text, max_length=20, num_return_sequences=5)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


GPT2Config {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.44.0",
  "use_cache": true,
  "vocab_size": 50257
}

tensor([[16594,   257, 35327,  7322,   546]])


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


tensor([[-4.5621,  0.3874]], grad_fn=<IndexBackward0>)
tensor([[0.0070, 0.9930]], grad_fn=<SoftmaxBackward0>)
[{'label': 'POSITIVE', 'score': 0.9929624199867249}]


[{'generated_text': 'Write a concise paragraph about which you think they should be working on an issue before giving any concrete decisions'},
 {'generated_text': 'Write a concise paragraph about something you do. Read a detailed explanation with tips and pointers. Find something'},
 {'generated_text': 'Write a concise paragraph about this case.\n\nTo check this case from your sources simply click on'},
 {'generated_text': 'Write a concise paragraph about a particular topic of interest and an abstract list of topics, or describe an'},
 {'generated_text': 'Write a concise paragraph about what you want to talk about for the last two paragraphs so that listeners are'}]

In [20]:

# Try 3: gpt2, temperature=3.0, top_k=100, max_new_tokens=100

# STEP 1 : Set up the prompt
input_text = "Write a concise paragraph about"

# STEP 2 : Load the pretrained model.
model_name = "gpt2"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
print(config)

generation_config = GenerationConfig(
    max_new_tokens=100, do_sample=True, top_k=100, eos_token_id=model.config.eos_token_id,temperature=3.0)

#STEP 3 : Load the tokenizer and tokenize the input text
tokenizer  =  AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer(input_text, return_tensors="pt")["input_ids"]
print(input_ids)

# STEP 5 : Perform inference
outputs = model(input_ids)
result = outputs.logits
print(result)

# STEP 6 :  Interpret the output.
probabilities = F.softmax(result, dim=-1)
print(probabilities)
predicted_class = torch.argmax(probabilities, dim=-1).item()
labels = ["NEGATIVE", "POSITIVE"]
out_string = "[{'label': '" + str(labels[predicted_class]) + "', 'score': " + str(probabilities[0][predicted_class].tolist()) + "}]"
print(out_string)
generator(input_text, max_length=20, num_return_sequences=5)

GPT2Config {
  "_name_or_path": "gpt2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.44.0",
  "use_cache": true,
  "vocab_size": 50257
}

tensor([[16594,   257, 35327,  7322,   546]])




tensor([[-1.0068, -2.8262]], grad_fn=<IndexBackward0>)
tensor([[0.8605, 0.1395]], grad_fn=<SoftmaxBackward0>)
[{'label': 'NEGATIVE', 'score': 0.8604958057403564}]


[{'generated_text': 'Write a concise paragraph about a given question, followed by an answer or question answer along with any additional'},
 {'generated_text': 'Write a concise paragraph about what one must do when you hear such a word. The reader knows whether'},
 {'generated_text': 'Write a concise paragraph about an individual or group. Be respectful to your audience, their ideas, and'},
 {'generated_text': 'Write a concise paragraph about the process that went into the "mismatch" for the original ('},
 {'generated_text': "Write a concise paragraph about some of the issues that affect our readers' lives:\n\nIt's"}]
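
Try 3 sets `top_k=100` where the other tries use 50. As a rough illustration of what that parameter controls, the sketch below keeps only the `k` highest-scoring candidates and samples among them. It is a simplified stand-in, not the transformers implementation; `top_k_sample` and the scores in `toy_logits` are hypothetical.

```python
import math
import random

def top_k_sample(logits, k, temperature=1.0, rng=random):
    """Sample one token index from the k highest-scoring candidates."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Keep the indices of the k largest logits.
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    # choices() normalizes the weights internally.
    return rng.choices(kept, weights=weights, k=1)[0]

# Made-up scores for five candidate tokens.
toy_logits = [3.0, 2.5, 0.1, -1.0, -4.0]
rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [top_k_sample(toy_logits, k=2, temperature=3.0, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))
```

With `k=2`, only the two best candidates can ever be drawn, no matter how high the temperature is. A larger `top_k` widens that pool, which increases variety and also the chance of picking an ill-fitting token.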

In [21]:
# Try 4: microsoft/Phi-3-mini-4k-instruct, temperature=5.0, max_new_tokens=100

# STEP 1 : Set up the prompt
input_text = "Write a concise paragraph about"

# STEP 2 : Load the pretrained model.
model_name ="microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
print(config)

generation_config = GenerationConfig(
    max_new_tokens=100, do_sample=True, top_k=50, eos_token_id=model.config.eos_token_id,temperature=5.0)

#STEP 3 : Load the tokenizer and tokenize the input text
tokenizer  =  AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer(input_text, return_tensors="pt")["input_ids"]
print(input_ids)

# STEP 5 : Perform inference
outputs = model(input_ids)
result = outputs.logits
print(result)

# STEP 6 :  Interpret the output.
probabilities = F.softmax(result, dim=-1)
print(probabilities)
predicted_class = torch.argmax(probabilities, dim=-1).item()
labels = ["NEGATIVE", "POSITIVE"]
out_string = "[{'label': '" + str(labels[predicted_class]) + "', 'score': " + str(probabilities[0][predicted_class].tolist()) + "}]"
print(out_string)
generator(input_text, max_length=20, num_return_sequences=5)

Phi3Config {
  "_name_or_path": "microsoft/Phi-3-mini-4k-instruct",
  "architectures": [
    "Phi3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "auto_map": {
    "AutoConfig": "microsoft/Phi-3-mini-4k-instruct--configuration_phi3.Phi3Config",
    "AutoModelForCausalLM": "microsoft/Phi-3-mini-4k-instruct--modeling_phi3.Phi3ForCausalLM"
  },
  "bos_token_id": 1,
  "embd_pdrop": 0.0,
  "eos_token_id": 32000,
  "hidden_act": "silu",
  "hidden_size": 3072,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 4096,
  "model_type": "phi3",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "original_max_position_embeddings": 4096,
  "pad_token_id": 32000,
  "resid_pdrop": 0.0,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "sliding_window": 2047,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.44.0",
  "use_cache": true,
  "v

[{'generated_text': "Write a concise paragraph about how this project has been successful.\n\nPlease don't pass anything to"},
 {'generated_text': 'Write a concise paragraph about the most common problems about web pages, and create a comprehensive problem description with'},
 {'generated_text': "Write a concise paragraph about a specific task or a task you have been doing. That's not required"},
 {'generated_text': 'Write a concise paragraph about how this work started.\n\nMy idea for the paper was two fold'},
 {'generated_text': 'Write a concise paragraph about what the book means. It will also make the reader think about what other'}]
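
For comparing parameter and model combinations side by side, one option is to pass the sampling parameters directly to the text-generation pipeline call, so that they apply to the generated text. The sketch below is one possible arrangement under that assumption. `build_settings` and `run_sweep` are hypothetical helper names, and the temperature value 0.7 is an added illustration that does not appear in the tries above.

```python
from itertools import product

def build_settings(temperatures, max_new_tokens_options, top_k=50):
    """Return one dict of generation keyword arguments per combination."""
    return [
        {"do_sample": True, "top_k": top_k, "temperature": t, "max_new_tokens": n}
        for t, n in product(temperatures, max_new_tokens_options)
    ]

def run_sweep(model_name, prompts, settings):
    """Generate text for a list of prompts under each setting."""
    # Imported lazily so build_settings works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_name)
    results = []
    for kwargs in settings:
        outputs = generator(prompts, **kwargs)  # a list of prompts is accepted directly
        results.append((kwargs, outputs))
    return results

settings = build_settings(temperatures=[0.7, 5.0], max_new_tokens_options=[50, 100])
for s in settings:
    print(s)

# Example call (downloads the model on first use):
# results = run_sweep("gpt2", ["Write a concise paragraph about"], settings)
```

Because `model_name` is an argument of `run_sweep`, the same settings can be run against `gpt2` and `microsoft/Phi-3-mini-4k-instruct` and the outputs compared per setting.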

In [13]:
from transformers import AutoTokenizer, AutoModel, utils, AutoModelForCausalLM

from bertviz import model_view
utils.logging.set_verbosity_error()  # Suppress standard warnings

model_name = 'openai-community/gpt2'
input_text = "This review article is about"
model = AutoModelForCausalLM.from_pretrained(model_name, output_attentions=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer.encode(input_text, return_tensors='pt')  # Tokenize input text
outputs = model(inputs)  # Run model
attention = outputs[-1]  # Retrieve attention from model outputs
tokens = tokenizer.convert_ids_to_tokens(inputs[0])  # Convert input ids to token strings
model_view(attention, tokens)  # Display model view
generator(input_text, max_length=20, num_return_sequences=5)

[BertViz model view for openai-community/gpt2: interactive attention visualization, not rendered here]

[{'generated_text': 'This review article is about our methodology. Please let us know in the comments if you have any questions'},
 {'generated_text': "This review article is about the novel's initial plot development. I won't share any of the material"},
 {'generated_text': "This review article is about a specific type of dog's temperament and why the types and varieties described may"},
 {'generated_text': 'This review article is about the history of the United States in the 1780s and 1810s'},
 {'generated_text': 'This review article is about the relationship between human and animal intelligence. I am going to focus on one'}]

In [14]:
from transformers import AutoTokenizer, AutoModel, utils, AutoModelForCausalLM

from bertviz import model_view
utils.logging.set_verbosity_error()  # Suppress standard warnings

model_name = 'distilbert/distilgpt2'
input_text = "This review article is about"
model = AutoModelForCausalLM.from_pretrained(model_name, output_attentions=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer.encode(input_text, return_tensors='pt')  # Tokenize input text
outputs = model(inputs)  # Run model
attention = outputs[-1]  # Retrieve attention from model outputs
tokens = tokenizer.convert_ids_to_tokens(inputs[0])  # Convert input ids to token strings
model_view(attention, tokens)  # Display model view
generator(input_text, max_length=20, num_return_sequences=5)

[BertViz model view for distilbert/distilgpt2: interactive attention visualization, not rendered here]

[{'generated_text': 'This review article is about the system, for how the game system compares without reference to others, in'},
 {'generated_text': 'This review article is about a game at launch that was called "Project Veritas".\n\nP'},
 {'generated_text': 'This review article is about the effects of a particular combination of pesticides on bees, and is intended solely'},
 {'generated_text': "This review article is about the topic of using the DSS system to determine if you'd like a"},
 {'generated_text': 'This review article is about the book on the topic that I will use here but am unaware that it'}]

The attention patterns differ noticeably between openai-community/gpt2 and distilbert/distilgpt2. The two models share the GPT-2 architecture and tokenizer, but GPT-2 has 12 transformer layers (`n_layer` in the config printed above) while DistilGPT2, a distilled version of it, has 6, so its model view contains half as many layers of attention heads. In the BertViz model view, the larger GPT-2 shows more varied attention patterns, with weight spread across more tokens, which may help it capture more nuanced contextual relationships. The smaller DistilGPT2 shows simpler, more concentrated patterns. It trades some of that depth for efficiency, which may limit its contextual awareness in text generation tasks.
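
One hedged way to put a number on "spread out" versus "concentrated" attention is the entropy of each attention row: higher entropy means the weight is distributed over more tokens. The sketch below uses small made-up matrices; `row_entropy` and `mean_attention_entropy` are hypothetical helpers. With the `attention` tuple from the BertViz cells, a single head could be passed in as `attention[layer][0, head].tolist()`, assuming the usual (batch, heads, seq, seq) layout.

```python
import math

def row_entropy(weights):
    """Shannon entropy (in nats) of one attention row that sums to 1."""
    return -sum(w * math.log(w) for w in weights if w > 0)

def mean_attention_entropy(attention_matrix):
    """Average row entropy over all query positions of one head."""
    rows = [row_entropy(row) for row in attention_matrix]
    return sum(rows) / len(rows)

# Toy 3x3 causal attention matrices (each row sums to 1); values are made up.
focused = [[1.0, 0.0, 0.0],
           [0.9, 0.1, 0.0],
           [0.8, 0.1, 0.1]]
spread = [[1.0, 0.0, 0.0],
          [0.5, 0.5, 0.0],
          [1 / 3, 1 / 3, 1 / 3]]

print(mean_attention_entropy(focused))
print(mean_attention_entropy(spread))
```

Since both models use the same tokenizer and the same input text, the per-head averages are computed over identical token positions and can be compared layer by layer.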