## Homework

1. Load in a generative model using the HuggingFace pipeline and generate text using a batch of prompts.
  * Play with generative parameters such as temperature, max_new_tokens, and the model itself and explain the effect on the legibility of the model response. Try at least 4 different parameter/model combinations.
  * Models that can be used include:
    * `google/gemma-2-2b-it`
    * `microsoft/Phi-3-mini-4k-instruct`
    * `meta-llama/Llama-3.2-1B`
    * Any model from this list: [Text-generation models](https://huggingface.co/models?pipeline_tag=text-generation)
    * `gpt2` if you have trouble loading these models
  * This guide should help! [Text-generation strategies](https://huggingface.co/docs/transformers/en/generation_strategies)
2. Load in 2 models of different parameter size (e.g. GPT2, meta-llama/Llama-2-7b-chat-hf, or distilbert/distilgpt2) and analyze the BertViz visualization for each. How do the attention mechanisms change depending on model size?

In [1]:
pwd

'/lus/eagle/projects/CSTEELML/fbhuiyan/jupyter_notebooks/ai-science-training-series/04_intro_to_llms'

In [6]:
from transformers import AutoTokenizer, AutoModel, utils, AutoModelForCausalLM

In [2]:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [11]:
from transformers import pipeline

def generate_text(model, tokenizer, input_text, temperature, task='text-generation',
                  num_return_sequences=1, max_new_tokens=20):
    # Wrap the already-loaded model and tokenizer in a pipeline on the selected device.
    generator = pipeline(task, model=model, device=device, tokenizer=tokenizer)
    # Note: do_sample is not set here, so temperature only takes effect when the
    # model's own generation config enables sampling.
    return generator(input_text, max_new_tokens=max_new_tokens,
                     num_return_sequences=num_return_sequences, temperature=temperature)

In [4]:
import os
from huggingface_hub import login
# Read the access token from the environment instead of hard-coding it in the notebook.
hf_token = os.environ["HF_TOKEN"]
login(token=hf_token, add_to_git_credential=True)

Token is valid (permission: write).
Cannot authenticate through git-credential as no helper is defined on your machine.
You might have to re-authenticate when pushing to the Hugging Face Hub.
Run the following command in your terminal in case you want to set the 'store' credential helper as default.

git config --global credential.helper store

Read https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage for more details.
Token has not been saved to git credential helper.
Your token has been saved to /home/fbhuiyan/.cache/huggingface/token
Login successful


In [15]:
# Model 1

model_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model_1 = AutoModelForCausalLM.from_pretrained(model_id, cache_dir='../')

In [16]:
prompts = ['Hey man, look, I ',
           'Get your lazy ',
           'Who do you think is going to win ',
           'I watched the worldcup final ',
           "I'm sorry, I am feeling ",
           'Hey, you over there, ']

model = model_1
for text in prompts:
    print(generate_text(model=model, input_text=text, tokenizer=tokenizer, temperature=0.5)[0]['generated_text'])
    print('\n\n')

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Hey man, look, I  just bought a new  iPhone 5s and I want to use it with my Mac





Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Get your lazy 3D printer ready, because it’s time to add a little color to your life. This is





Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Who do you think is going to win 2012?
I think it is going to be Romney. I think he is the only candidate who





Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


I watched the worldcup final 2014 between Brazil and Germany on Saturday. It was a fantastic game. I had the pleasure to





Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


I'm sorry, I am feeling 100% better today, but I'm still not 100%. I'm hoping to be back to



Hey, you over there, 1,000,000,000 people have already voted for you. You’re a popular guy.






In [8]:
# Model 2

model_id_2 = "microsoft/Phi-3-mini-4k-instruct"
tokenizer_2 = AutoTokenizer.from_pretrained(model_id_2)
model_2 = AutoModelForCausalLM.from_pretrained(model_id_2, cache_dir='../')

In [14]:
prompts = ['Hey man, look, I ',
           'Get your lazy ',
           'Who do you think is going to win ',
           'I watched the worldcup final ',
           "I'm sorry, I am feeling ",
           'Hey, you over there, ']

model = model_2
for text in prompts:
    print(generate_text(model=model, input_text=text, tokenizer=tokenizer_2, temperature=0.5)[0]['generated_text'])
    print('\n\n')



Hey man, look, I 


I'm in a bit of a pickle here. I've got this



Get your lazy 10-minute workout in.

**Exercise 1:**




Who do you think is going to win 2024 presidential election?

I'm sorry, but I can't



I watched the worldcup final 2018 between France and Croatia. I was in the middle of a long flight from



I'm sorry, I am feeling 😞. I'm not sure what to do.

I'm sorry



Hey, you over there, 

I need a CMake script for a C++ project. It's gotta handle





### Play with temperature
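As background for the runs below, here is a minimal sketch of what temperature does to a next-token distribution when sampling is enabled: the logits are divided by the temperature before the softmax, so low values sharpen the distribution and higher values flatten it. The function name `softmax_with_temperature` and the `toy_logits` values are made up for illustration and are not taken from either model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token scores, for illustration only.
toy_logits = [2.0, 1.0, 0.5]

for t in (0.1, 0.5, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all of the probability mass sits on the top-scoring token, which is why low-temperature samples tend to be more repetitive and predictable.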

In [12]:
prompts = ['Hey man, look, I ',
           'Get your lazy ',
           'Who do you think is going to win ',
           'I watched the worldcup final ',
           "I'm sorry, I am feeling ",
           'Hey, you over there, ']

model = model_2
for text in prompts:
    print(generate_text(model=model, input_text=text, tokenizer=tokenizer_2, temperature=0.1)[0]['generated_text'])
    print('\n\n')

Hey man, look, I 


I'm in a bit of a pickle here. I've got this



Get your lazy 10-minute workout in.

**Exercise 1:**




Who do you think is going to win 2024 presidential election?

I'm sorry, but I can't



I watched the worldcup final 2018 between France and Croatia. I was in the middle of a long flight from



I'm sorry, I am feeling 😞. I'm not sure what to do.

I'm sorry



Hey, you over there, 

I need a CMake script for a C++ project. It's gotta handle





In [13]:
prompts = ['Hey man, look, I ',
           'Get your lazy ',
           'Who do you think is going to win ',
           'I watched the worldcup final ',
           "I'm sorry, I am feeling ",
           'Hey, you over there, ']

model = model_2
for text in prompts:
    print(generate_text(model=model, input_text=text, tokenizer=tokenizer_2, temperature=1.0)[0]['generated_text'])
    print('\n\n')

Hey man, look, I 


I'm in a bit of a pickle here. I've got this



Get your lazy 10-minute workout in.

**Exercise 1:**




Who do you think is going to win 2024 presidential election?

I'm sorry, but I can't



I watched the worldcup final 2018 between France and Croatia. I was in the middle of a long flight from



I'm sorry, I am feeling 😞. I'm not sure what to do.

I'm sorry



Hey, you over there, 

I need a CMake script for a C++ project. It's gotta handle
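
The Phi-3 runs at temperatures 0.1, 0.5, and 1.0 return the same text. That is what one would expect if generation is greedy (no sampling), because dividing the logits by a positive temperature never changes which token scores highest. The sketch below illustrates this with made-up logits; `greedy_pick` and `toy_logits` are hypothetical names, not part of the notebook. If greedy decoding is indeed the cause here, passing `do_sample=True` in the generation call should let temperature take effect.

```python
def greedy_pick(logits, temperature):
    """Return the index of the highest-scoring token after temperature scaling."""
    scaled = [x / temperature for x in logits]
    return max(range(len(scaled)), key=lambda i: scaled[i])

# Hypothetical next-token scores, for illustration only.
toy_logits = [0.3, 2.1, 1.7, -0.5]

# The greedy choice is the same index at every temperature.
picks = {t: greedy_pick(toy_logits, t) for t in (0.1, 0.5, 1.0)}
print(picks)
```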





In [17]:
model = model_1
for text in prompts:
    print(generate_text(model=model, input_text=text, tokenizer=tokenizer, temperature= 0.1)[0]['generated_text'])
    print('\n\n')

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Hey man, look, I  know you’re a big fan of the show, but I’m not sure you’re going to





Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Get your lazy 2x4s and your 2x4s and your 2x4s and your





Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Who do you think is going to win 2016?
I think it will be Hillary. She has the money, the experience, and the





Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


I watched the worldcup final 2014 between Germany and Argentina. I was very happy to see Germany win the worldcup. I





Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


I'm sorry, I am feeling icky today. I'm not sure why. I'm not sick, I'm not tired, I



Hey, you over there,  I’m the guy who’s been in the business of helping people for over 20 years.





In [18]:
model = model_1
for text in prompts:
    print(generate_text(model=model, input_text=text, tokenizer=tokenizer, temperature= 1.0)[0]['generated_text'])
    print('\n\n')

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Hey man, look, I 100% understand the appeal of a "gimmick" type device. A good gimmick can





Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Get your lazy 10k legs moving to the beats! Bring your workout buddies for some fun cardio fun! We’ll





Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Who do you think is going to win 2016?
Discussion in 'Political Debate & Discussion' started by Tiffen, Apr 17





Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


I watched the worldcup final 8 hours ago and I have to say I was really disappointed with Portugal. They never seemed to find





Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


I'm sorry, I am feeling icky
A while back, I wrote about the need for a strong voice when we're called to



Hey, you over there,  you got a little something for everyone. You're looking for that little something extra to spruce





In [48]:
!pip install bertviz

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Defaulting to user installation because normal site-packages is not writeable
Collecting bertviz
  Downloading bertviz-1.4.0-py3-none-any.whl.metadata (19 kB)
Collecting sentencepiece (from bertviz)
  Downloading sentencepiece-0.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.7 kB)
Downloading bertviz-1.4.0-py3-none-any.whl (157 kB)
Downloading sentencepiece-0.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Installing collected packages: sentencepiece, bertviz
Successfully installed bertviz-1.4.0 sentencepiece-0.2.0


In [58]:
from transformers import AutoTokenizer, AutoModel, utils, AutoModelForCausalLM

from bertviz import model_view
utils.logging.set_verbosity_error()  # Suppress standard warnings

model_name = "meta-llama/Llama-3.2-1B"
input_text = "Lets go to the "
model = AutoModelForCausalLM.from_pretrained(model_name, output_attentions=True, cache_dir='./')
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer.encode(input_text, return_tensors='pt')  # Tokenize input text
outputs = model(inputs)  # Run model
attention = outputs[-1]  # Retrieve attention from model outputs
tokens = tokenizer.convert_ids_to_tokens(inputs[0])  # Convert input ids to token strings
model_view(attention, tokens)  # Display model view

<IPython.core.display.Javascript object>

In [59]:
from transformers import AutoTokenizer, AutoModel, utils, AutoModelForCausalLM

from bertviz import model_view
utils.logging.set_verbosity_error()  # Suppress standard warnings

model_name = "microsoft/Phi-3-mini-4k-instruct"
input_text = "Lets go to the "
model = AutoModelForCausalLM.from_pretrained(model_name, output_attentions=True, cache_dir='./')
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer.encode(input_text, return_tensors='pt')  # Tokenize input text
outputs = model(inputs)  # Run model
attention = outputs[-1]  # Retrieve attention from model outputs
tokens = tokenizer.convert_ids_to_tokens(inputs[0])  # Convert input ids to token strings
model_view(attention, tokens)  # Display model view

<IPython.core.display.Javascript object>

In the BertViz model view, the larger `microsoft/Phi-3-mini-4k-instruct` model appears to have considerably more layers, and therefore more attention heads in total, than the `meta-llama/Llama-3.2-1B` model.
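
To back this visual impression with numbers, one option is to count layers and heads directly from the attention output. The sketch below assumes the usual Hugging Face layout for `output_attentions=True`: a tuple with one tensor per layer, each shaped `(batch, num_heads, seq_len, seq_len)`. The helper `summarize_attention` and the `FakeTensor` stand-in are hypothetical and exist only so the example runs without downloading a model.

```python
from collections import namedtuple

def summarize_attention(attention):
    """Return (num_layers, num_heads, seq_len) for a tuple of per-layer attention tensors.

    Each entry is expected to expose a .shape of (batch, num_heads, seq_len, seq_len).
    """
    num_layers = len(attention)
    _, num_heads, seq_len, _ = attention[0].shape
    return num_layers, num_heads, seq_len

# Stand-in for a tensor so the sketch runs on its own; in the notebook,
# pass the `attention` variable from the BertViz cells instead.
FakeTensor = namedtuple("FakeTensor", ["shape"])
toy_attention = tuple(FakeTensor(shape=(1, 4, 6, 6)) for _ in range(3))

print(summarize_attention(toy_attention))
```

Calling `summarize_attention(attention)` after each of the two BertViz cells would give the layer and head counts for each model side by side.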