**HOMEWORK DESCRIPTION:**


Load a generative model with the HuggingFace pipeline and generate text from a batch of prompts.
Experiment with generation parameters such as temperature and max_new_tokens, as well as the model itself, and explain the effect of each change on the legibility of the model's responses. Try at least 4 different parameter/model combinations.
Models that can be used include:
- google/gemma-2-2b-it
- microsoft/Phi-3-mini-4k-instruct
- meta-llama/Llama-3.2-1B
- Any model from this list: Text-generation models
- gpt2, if you have trouble loading the models above
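
To build intuition for the temperature parameter before running any model, the sketch below applies temperature scaling to a small set of made-up logits. The function name `softmax_with_temperature` and the logit values are illustrative assumptions, not part of the assignment or of the transformers API; the idea is only that dividing logits by a temperature below 1 sharpens the distribution, while a temperature above 1 flattens it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens
logits = [2.0, 1.0, 0.1]

for t in (0.2, 0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At low temperature nearly all probability mass lands on the top-scoring token, so sampled text tends to be repetitive but coherent; at higher temperature the less likely tokens are drawn more often, which can reduce legibility.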

In [None]:
# Install the required library if you haven't already
# !pip install transformers

from transformers import pipeline

# Load the text generation pipeline with GPT-2 or other available models
generator_gpt2 = pipeline("text-generation", model="gpt2")
generator_phi = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")  # Alternative model

# Define a batch of prompts
prompts = [
    "Once upon a time in a distant galaxy,",
    "The key to a successful life is",
    "In the year 2050, humanity will",
    "The secret to happiness lies"
]

# Experiment with different parameter settings.
# Temperature only takes effect when sampling is enabled, so do_sample=True is set explicitly.
combinations = [
    {"generator": generator_gpt2, "do_sample": True, "temperature": 0.7, "max_new_tokens": 50},
    {"generator": generator_gpt2, "do_sample": True, "temperature": 0.2, "max_new_tokens": 100},
    {"generator": generator_gpt2, "do_sample": True, "temperature": 0.5, "max_new_tokens": 30},
    {"generator": generator_phi, "do_sample": True, "temperature": 1.0, "max_new_tokens": 75}  # Using a different model
]

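# Generate outputs for each combination
results = {}
for i, combo in enumerate(combinations):
    # Copy the settings so that pop() does not mutate the original dict
    params = dict(combo)
    generator = params.pop("generator")
    # Pass the whole list of prompts in one call; the pipeline returns one list of candidates per prompt
    outputs = generator(prompts, **params)
    results[f"Combination {i+1}"] = [out[0]["generated_text"] for out in outputs]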

# Print the results for comparison
for combination, outputs in results.items():
    print(f"\n{combination}:")
    for i, output in enumerate(outputs):
        print(f"Prompt {i+1}: {output}")


**HOMEWORK DESCRIPTION:**

Load 2 models of different parameter sizes (e.g. GPT2, meta-llama/Llama-2-7b-chat-hf, or distilbert/distilgpt2) and analyze the BertViz visualization for each. How do the attention mechanisms change depending on model size?
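
As a rough guide to what the visualizations will contain, the sketch below counts how many attention maps each suggested model produces. The layer and head counts are hard-coded from the models' published configurations as an assumption (nothing is downloaded here), and the helper names are hypothetical; once a model is loaded, the values can be checked against `model.config.num_hidden_layers` and `model.config.num_attention_heads`.

```python
# Layer and head counts taken from each model's published config (assumed, not loaded here)
ARCHITECTURES = {
    "distilbert/distilgpt2": {"layers": 6, "heads": 12},
    "gpt2": {"layers": 12, "heads": 12},
    "meta-llama/Llama-2-7b-chat-hf": {"layers": 32, "heads": 32},
}

def attention_map_count(layers, heads):
    """Each layer has one attention map per head."""
    return layers * heads

def attention_tensor_shape(heads, seq_len, batch=1):
    """Shape of one layer's attention tensor: (batch, heads, seq_len, seq_len)."""
    return (batch, heads, seq_len, seq_len)

for name, arch in ARCHITECTURES.items():
    count = attention_map_count(arch["layers"], arch["heads"])
    shape = attention_tensor_shape(arch["heads"], seq_len=10)
    print(f"{name}: {count} attention maps, per-layer shape {shape}")
```

With these numbers, GPT-2 yields 144 maps and Llama-2-7B yields 1024, so the BertViz model view for the larger model is a much denser grid and usually needs to be inspected a few layers at a time.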

In [None]:
!pip install transformers bertviz

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from bertviz import model_view

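# Load smaller model (e.g., GPT-2)
# If the attentions come back as None on a recent transformers release, passing attn_implementation="eager" may help
tokenizer_gpt2 = AutoTokenizer.from_pretrained("gpt2")
model_gpt2 = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)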

# Load larger model (e.g., LLaMA 2)
# Note: this is a gated model, so it requires accepting the license on Hugging Face and logging in first
tokenizer_llama = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model_llama = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", output_attentions=True)

# Define a sample input text
text = "The universe is vast and full of mysteries."

# Tokenize input
inputs_gpt2 = tokenizer_gpt2(text, return_tensors="pt")
inputs_llama = tokenizer_llama(text, return_tensors="pt")

# Run a forward pass and visualize attention with BertViz for the smaller model
print("Visualizing GPT-2 attention:")
outputs_gpt2 = model_gpt2(**inputs_gpt2)  # unpack input_ids and attention_mask as keyword arguments
attention_gpt2 = outputs_gpt2.attentions  # tuple with one (batch, heads, seq, seq) tensor per layer
tokens_gpt2 = tokenizer_gpt2.convert_ids_to_tokens(inputs_gpt2["input_ids"][0])
model_view(attention_gpt2, tokens_gpt2)

# Run a forward pass and visualize attention with BertViz for the larger model
print("\nVisualizing LLaMA-2 attention:")
outputs_llama = model_llama(**inputs_llama)  # unpack input_ids and attention_mask as keyword arguments
attention_llama = outputs_llama.attentions  # tuple with one (batch, heads, seq, seq) tensor per layer
tokens_llama = tokenizer_llama.convert_ids_to_tokens(inputs_llama["input_ids"][0])
model_view(attention_llama, tokens_llama)
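
One possible way to put a number on what the BertViz views show is to measure how spread out each head's attention is. The sketch below computes Shannon entropy over attention rows, using plain Python lists and two made-up 3-token causal attention matrices; the helper names and the example matrices are illustrative assumptions, not output from either model. In practice, one head's matrix could be taken from a loaded model with something like `attention_gpt2[layer][0, head].tolist()`.

```python
import math

def row_entropy(weights):
    """Shannon entropy (in nats) of one attention row; the weights should sum to 1."""
    return -sum(w * math.log(w) for w in weights if w > 0)

def mean_head_entropy(attention):
    """Average row entropy over one head's (seq_len x seq_len) attention matrix."""
    return sum(row_entropy(row) for row in attention) / len(attention)

# Hypothetical causal attention matrices for a 3-token input
focused = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.05, 0.05, 0.9],
]
diffuse = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
]

print("focused head:", round(mean_head_entropy(focused), 3))
print("diffuse head:", round(mean_head_entropy(diffuse), 3))
```

A lower value means the head concentrates on a few tokens, while a higher value means attention is spread more evenly. Comparing the average across layers for the two models is one hedged way to support the written analysis.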