# Prompt Engineering 101 - Part I.
## The Cognitive Engine

---

### *The Mechanics of Alien Intelligence*

## 1. It Reads Numbers, Not Words (Tokenization)
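AI does not see "Apple". It sees a token ID such as `[2034]` (an illustrative number; the actual ID depends on the tokenizer).
* **The Math Glitch:** With many tokenizers, `9.11` splits into `[9, ., 11]` and `9.9` into `[9, ., 9]`, so a model can wrongly judge `9.11` to be the larger number. It is predicting text patterns, not calculating values.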
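
One way to picture this failure mode is a comparison that treats the digits after the dot as a separate whole number, the way version numbers are read. The sketch below is an analogy only, not a description of what happens inside a model; both function names are invented for illustration.

```python
def compare_as_text_chunks(a: str, b: str) -> str:
    """Naive comparison: treat the part after the dot as its own whole number."""
    a_int, a_frac = a.split(".")
    b_int, b_frac = b.split(".")
    # Compares (9, 11) with (9, 9). Since 11 > 9, "9.11" wins. This is the mistake.
    return a if (int(a_int), int(a_frac)) > (int(b_int), int(b_frac)) else b


def compare_as_numbers(a: str, b: str) -> str:
    """Correct comparison on the actual numeric values."""
    return a if float(a) > float(b) else b


print(compare_as_text_chunks("9.11", "9.9"))  # 9.11 (wrong)
print(compare_as_numbers("9.11", "9.9"))      # 9.9  (right)
```

The chunk-based rule is perfectly sensible for software versions (release 9.11 comes after 9.9), which is one reason the pattern is common in text.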

## 2. Meaning is Geometry (Embeddings)
Words are coordinates in a high-dimensional map.
* **Semantic Proximity:** "King" is close to "Queen". "Paris" is close to "France".
* **The Chameleon Effect:** A word's location shifts based on context. "Bank" (river) and "Bank" (money) are mathematically distinct concepts to a modern AI.
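
"Close" on this map is usually measured with cosine similarity, the cosine of the angle between two vectors. Here is a minimal sketch with hand-made 3-dimensional vectors; the numbers are invented for illustration, and real embeddings have hundreds of dimensions learned from data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical coordinates: [royalty, food, place]. Not taken from any real model.
toy_vectors = {
    "King":  [0.9, 0.1, 0.2],
    "Queen": [0.8, 0.2, 0.2],
    "Apple": [0.1, 0.9, 0.1],
}

king_queen = cosine_similarity(toy_vectors["King"], toy_vectors["Queen"])
king_apple = cosine_similarity(toy_vectors["King"], toy_vectors["Apple"])

print(f"King vs Queen: {king_queen:.2f}")
print(f"King vs Apple: {king_apple:.2f}")
```

With these toy numbers, "King" scores far higher against "Queen" than against "Apple", which is the kind of gap the bullet above describes.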

## 3. The Stochastic Parrot (Probabilities)
The AI predicts the next word based on statistical likelihood.
* **Bias:** If the internet associates "Doctor" with "He", the AI will too, unless steered.
* **Hallucination:** When the AI doesn't know the answer, it still produces the most *statistically plausible* sounding continuation, even if it is false.

## 4. You Are The Pilot (Parameters)
* **Temperature:** Controls randomness.
    * **Low (0.0 - 0.3):** Factual, robotic, consistent. (Use for: Data extraction).
    * **High (0.8 - 1.5):** Creative, chaotic, diverse. (Use for: Brainstorming).
* **Seed:** A number that fixes the starting point of the random number generator. Same Seed + same prompt + same settings (on the same model and software setup) = the same result on every run.
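
The seed idea can be sketched without any model at all. Below, Python's standard `random` module stands in for the model's sampler, and the `vocabulary` list and `sample_words` helper are hypothetical names used only for this illustration.

```python
import random

# Hypothetical word list standing in for a model's vocabulary.
vocabulary = ["oats", "eggs", "toast", "fruit", "yogurt"]


def sample_words(seed, count=5):
    """Draw `count` random words using a generator started from `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.choice(vocabulary) for _ in range(count)]


run_a = sample_words(seed=42)
run_b = sample_words(seed=42)
run_c = sample_words(seed=43)

print("Same seed, same draws:", run_a == run_b)
print("Seed 42:", run_a)
print("Seed 43:", run_c)
```

The draws are still random in the sense that you cannot guess them in advance, but restarting from the same seed replays the same sequence.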

---

In [None]:
# @title 🛠️ Step 0: Laboratory Setup
# We are installing the 'transformers' library to access the AI models
# and visualization tools to see what's happening inside.

!pip install transformers torch scipy matplotlib seaborn scikit-learn --quiet

import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer, BertModel, BertTokenizer
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.decomposition import PCA

# We load TWO brains today to compare them:
# 1. GPT-2 (The Writer) - Good at generating text.
# 2. BERT (The Reader) - Good at understanding meaning.

print("Loading the 'Writer' (GPT-2)...")
gpt_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt_model = GPT2LMHeadModel.from_pretrained("gpt2")
gpt_model.eval()

# GPT-2 doesn't have a default pad token, so we use the EOS token.
# This silences the "attention mask" warnings.
gpt_tokenizer.pad_token = gpt_tokenizer.eos_token
gpt_model.config.pad_token_id = gpt_tokenizer.eos_token_id

print("Loading the 'Reader' (BERT)...")
bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)
bert_model.eval()

print("✅ Setup Complete! Ready to explore.")

---

## **Topic 1: Tokenization (The Input)**

**Concept:** The AI does not read English (or any language). It reads numbers. "Tokenization" is the process of translating text into a sequence of integers.

**Analogy:** Think of this like Morse Code. The AI doesn't hear the "beep," it processes the pattern.
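
To make the text-to-numbers step concrete, here is a toy tokenizer. It is a simplified sketch: it uses greedy longest-match against a tiny hand-written vocabulary, whereas GPT-2 uses a learned byte-pair encoding with roughly 50,000 entries. The names `toy_vocab`, `toy_encode`, and `toy_decode` are hypothetical.

```python
# Hypothetical vocabulary. Note the leading spaces: " quick" and "quick" would be different tokens.
toy_vocab = {"The": 0, " quick": 1, " fox": 2, "9": 3, ".": 4, "11": 5, "1": 6}


def toy_encode(text):
    """Greedy longest-match: repeatedly take the longest vocabulary entry that fits."""
    ids = []
    pos = 0
    while pos < len(text):
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in toy_vocab:
                ids.append(toy_vocab[piece])
                pos = end
                break
        else:
            # No vocabulary entry starts at this position.
            raise ValueError(f"No token covers {text[pos]!r}")
    return ids


def toy_decode(ids):
    """Map IDs back to their text chunks and glue them together."""
    id_to_piece = {i: piece for piece, i in toy_vocab.items()}
    return "".join(id_to_piece[i] for i in ids)


print(toy_encode("The quick fox"))     # [0, 1, 2]
print(toy_encode("9.11"))              # [3, 4, 5]
print(toy_decode(toy_encode("9.11")))  # 9.11
```

Everything the model does afterwards operates on the ID list, never on the original characters.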

In [None]:
# @title 🔢 Topic 1: Seeing the Matrix (Tokenization)
input_text = "The quick brown fox jumps over the lazy dog." # @param {type:"string"}

# 1. Convert Text to Tokens (Numbers)
input_ids = gpt_tokenizer.encode(input_text)

# 2. Convert Tokens back to Text chunks (to see how it splits words)
tokens = [gpt_tokenizer.decode([x]) for x in input_ids]

# Visualization
print(f"Original Text:  {input_text}")
print(f"-"*40)
print(f"{'TOKEN (The Chunk)':<20} | {'ID (The Number)':<10}")
print(f"-"*40)

for t, i in zip(tokens, input_ids):
    # We use repr() to show spaces clearly
    print(f"{repr(t):<20} | {i:<10}")

print(f"-"*40)
print(f"Total Tokens: {len(input_ids)}")

**Discussion Point:** Notice how common words like "The" map to a single ID, while a longer or rarer word (like "unimaginable") may be split into multiple tokens.  
**Constraint:** API providers bill by the *token*, not the word.
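
The billing arithmetic is simple to sketch. Both the token count and the price below are hypothetical placeholders, not real rates from any provider; substitute the numbers from your provider's pricing page.

```python
def estimate_cost(token_count, price_per_1k_tokens):
    """Cost = (tokens / 1000) * price per thousand tokens."""
    return token_count / 1000 * price_per_1k_tokens


# Hypothetical values for illustration only.
prompt_tokens = 1200
price_per_1k = 0.002  # currency units per 1,000 tokens

print(f"Estimated cost: {estimate_cost(prompt_tokens, price_per_1k):.4f}")
```

Because a rare word can split into several tokens, two prompts with the same word count can cost different amounts.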

In [None]:
# @title 🧮 Topic 1.5: The Math Glitch (Why LLMs struggle with numbers)
# We often assume AI is a computer, so it must be good at math.
# Let's see how it actually "reads" numbers.

# CASE A: A Simple Number
num_simple = "100"
# CASE B: A Decimal that confuses tokenizers
num_confusing = "9.11"
# CASE C: A slightly larger number
num_comparison = "9.9"

def show_tokens(text):
    ids = gpt_tokenizer.encode(text)
    tokens = [gpt_tokenizer.decode([x]) for x in ids]
    print(f"Input: '{text}' -> Tokens: {tokens}  (IDs: {ids})")

print("--- HOW AI READS NUMBERS ---")
show_tokens(num_simple)
show_tokens(num_confusing)
show_tokens(num_comparison)

print("\n--- THE 'GLITCH' EXPLAINED ---")
print("1. Compare the token lists printed above for '9.9' and '9.11'.")
print("2. The digits after the decimal point arrive as separate chunks (e.g. '9' vs '11'), not as one numeric value.")
print("3. To an AI predicting text, '11' after '9.' can look 'bigger' than '9', as it would in a version number.")
print("4. This is why LLMs sometimes say 9.11 is larger than 9.9. They are doing text completion, not math.")

---

## **Topic 2: Embeddings (The Meaning)**

**Concept:** Once text is a number, the AI looks up that number in a massive multi-dimensional map. Words that mean similar things are "close" to each other on this map.

**Analogy:** Imagine a library. "King" and "Queen" are on the same shelf. "Apple" and "Orange" are in the food section. "Apple" and "iPhone" might be in the tech section.

In [None]:
# @title 🗺️ Topic 2: The Map of Meaning (Embeddings)

# Let's pick 6 words to see how the AI groups them
words = ["King", "Queen", "Apple", "Orange", "Computer", "Phone"]
# Keep only the first token of each word (a word may split into several tokens)
word_ids = [gpt_tokenizer.encode(w)[0] for w in words]

# Extract the "embeddings" (The vector coordinates)
# The model sees these words as 768-dimensional coordinates.
# We will squash them down to 2 dimensions so we can plot them on a screen.
embeddings = gpt_model.transformer.wte.weight[word_ids].detach().numpy()

# Use PCA to reduce 768 dimensions to 2 (X and Y axis)
pca = PCA(n_components=2)
reduced_embeddings = pca.fit_transform(embeddings)

# Plotting
plt.figure(figsize=(10, 6))
plt.scatter(reduced_embeddings[:, 0], reduced_embeddings[:, 1], s=100, c='blue')

for i, word in enumerate(words):
    plt.annotate(word, (reduced_embeddings[i, 0]+0.02, reduced_embeddings[i, 1]+0.02), fontsize=14)

plt.title("How the AI 'Sees' Concepts in Space")
plt.xlabel("Abstract Dimension 1")
plt.ylabel("Abstract Dimension 2")
plt.grid(True, linestyle='--', alpha=0.5)
plt.show()

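**Discussion Point:** Look at the chart. Related words such as "King" and "Queen", or "Apple" and "Orange", often land near each other even though the model was never told these categories. With only six points squashed into two dimensions the picture is rough, but this learned proximity is what people call "Semantic Understanding."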

---

## **Topic 3: Attention (The Context)**

**Concept:** Words change meaning based on neighbors. "Bank" means something different in "River bank" vs "Bank deposit." The **Attention Mechanism** allows tokens to "talk" to each other to resolve this ambiguity.

**Analogy:** When you read a sentence, your eyes jump back and forth to connect "He" to "John." The AI does this mathematically.

*Note: Visualizing raw attention weights from GPT-2 is complex. This visualization is a conceptual simulation to demonstrate the **effect** of attention.*
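
For reference, the calculation behind real attention weights is short: a dot product between one token's query vector and every token's key vector, scaled by the square root of the vector size, then passed through softmax. The sketch below uses invented 2-dimensional vectors (a real model learns these and uses many attention heads), so the resulting weights are illustrative only.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights for one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Hypothetical vectors, invented for illustration.
tokens = ["The", "bank", "of", "the", "river"]
keys = [[0.1, 0.0], [1.0, 1.0], [0.0, 0.1], [0.1, 0.0], [1.5, 2.0]]
query_for_bank = [1.0, 1.0]

weights = attention_weights(query_for_bank, keys)
for token, weight in zip(tokens, weights):
    print(f"{token:<6} {weight:.2f}")
```

With these toy vectors, "river" receives the largest weight and "bank" the second largest, mirroring the pattern the simulated heatmap is meant to convey.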

In [None]:
# @title 👁️ Topic 3: Context Matters (Attention Visualization)

sentence = "The bank of the river."
# We will simulate how the word "bank" pays attention to other words
words = sentence.split()
attention_scores = [0.1, 0.9, 0.1, 0.05, 0.8] # "Bank" focuses heavily on "Bank" and "River"

# Plotting a heatmap
plt.figure(figsize=(10, 2))
sns.heatmap([attention_scores], annot=[words], fmt="", cmap="Blues", cbar=False,
            xticklabels=False, yticklabels=False, square=True, linewidths=1)
plt.title("Visualizing Attention: When processing 'bank', where does the AI look?")
plt.show()

print("\nExplanation:")
print("Darker Blue = The AI is paying more attention to this word to understand the context.")
print("In this case, 'River' helps the AI understand that 'Bank' refers to nature, not money.")

## **The Chameleon Word (Contextual Embeddings)**

**Concept:** 
- **Static Embedding (The Dictionary)**: When the word "Apple" first enters the model, it is just a generic concept. It holds all meanings at once (Fruit, Tech Company, Record Label).
- **Contextualized Embedding (The Understanding)**: As the token passes through the model's layers (interacting with other words via Attention), its vector physically moves in the high-dimensional space. By the final layer, "Apple" (in a pie context) has moved close to "Food", while "Apple" (in an iPhone context) has moved close to "Technology".

**The Demo:** We will take the word **"bank"** and place it in two different sentences. We will then measure the similarity of that same word to the anchor concepts **"water"** and **"money"** to see whether it has "changed sides."
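
Before running a real model, the "changing sides" idea can be sketched with toy numbers. The `blend` helper below is a hypothetical stand-in for what the attention layers do (mixing a word's vector with its context), and every vector is invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def blend(word_vec, context_vec, context_weight=0.5):
    """Mix a word's static vector with a context vector (a crude stand-in for attention)."""
    return [(1 - context_weight) * w + context_weight * c
            for w, c in zip(word_vec, context_vec)]


# Hypothetical 2-d space: axis 0 = "nature", axis 1 = "finance".
static_word = [0.5, 0.5]  # an ambiguous word sits between both meanings
anchor_nature, anchor_finance = [1.0, 0.0], [0.0, 1.0]
nature_context, finance_context = [0.9, 0.1], [0.1, 0.9]

word_in_nature = blend(static_word, nature_context)
word_in_finance = blend(static_word, finance_context)

print(f"Static word:      nature={cosine(static_word, anchor_nature):.2f}  "
      f"finance={cosine(static_word, anchor_finance):.2f}")
print(f"In nature text:   nature={cosine(word_in_nature, anchor_nature):.2f}  "
      f"finance={cosine(word_in_nature, anchor_finance):.2f}")
print(f"In finance text:  nature={cosine(word_in_finance, anchor_nature):.2f}  "
      f"finance={cosine(word_in_finance, anchor_finance):.2f}")
```

The static vector is equally similar to both anchors, while each blended vector leans toward the anchor that matches its context. The real experiment measures the same kind of shift using vectors produced by a trained model.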

In [None]:
# @title 🦎 Topic 3.5: The Chameleon Word (Contextual Embeddings)
# We will look at how the word "bank" changes its "location" in meaning-space
# depending on the sentence it lives in.
# The experiment below uses BERT; a GPT-2 helper is included so you can compare
# how a model's purpose changes its behaviour.

# Two sentences with the same word but different meanings
sent_nature = "The boat floated by the river bank"
sent_finance = "I went to the bank to deposit money"

# Anchor concepts to measure against (the "North Stars" of meaning):
# is "bank" closer to "water" or to "money"?
anchor_word_nature = "water"
anchor_word_finance = "money"

# --- HELPER FUNCTIONS ---
                                          
def get_word_index(tokenizer, text, word):
    """ Helper to find where the word is in the token list """
    # Encode with special tokens (if any)
    input_ids = tokenizer.encode(text, return_tensors='pt')
    tokens = [tokenizer.decode([t]).strip().lower() for t in input_ids[0]]
    
    # Find the index
    try:
        # We search for the word. Note: Tokenizers are tricky (e.g., " bank" vs "bank")
        # This is a simplified search for demonstration
        idx = tokens.index(word.lower())
    except ValueError:
        # Fallback: if exact match fails, look for substring
        # (e.g. GPT2 might tokenize " bank" with a space)
        idx = -1
        for i, t in enumerate(tokens):
            if word.lower() in t:
                idx = i
                break
    return idx, input_ids

def get_gpt2_embedding(sentence, word):
    # 1. Find the word index
    idx, input_ids = get_word_index(gpt_tokenizer, sentence, word)
    
    if idx == -1: return None # Word not found
    
    # 2. Run the model
    with torch.no_grad():
        outputs = gpt_model(input_ids, output_hidden_states=True)
        
    # 3. Get the vector
    # GPT-2: We use the last layer. 
    # Even though it's the "Writer", we want to see what it thinks "bank" is 
    # at that specific moment in the sequence.
    hidden_states = outputs.hidden_states[-1]
    return hidden_states[0, idx, :] 

def get_bert_embedding(sentence, word):
    # 1. Find the word index
    idx, input_ids = get_word_index(bert_tokenizer, sentence, word)
    
    if idx == -1: return None
    
    # 2. Run the model
    with torch.no_grad():
        outputs = bert_model(input_ids)
        
    # 3. Get the vector
    # BERT: Use 2nd to last layer for best semantic representation
    return outputs.hidden_states[-2][0, idx, :]

# --- RUN THE EXPERIMENT ---

# 1. Get Anchor vectors (Baseline meanings)
# We embed them in simple contexts to get a clean read
vec_bert_water = get_bert_embedding("water is clear", anchor_word_nature)
vec_bert_money  = get_bert_embedding("money is green", anchor_word_finance)

# 2. Get Target Word vectors (The word "Bank")
vec_bert_nature = get_bert_embedding(sent_nature, "bank")
vec_bert_finance = get_bert_embedding(sent_finance, "bank")

# 3. Calculate Similarities (Cosine Similarity)
# 1.0 = Identical, 0.0 = Unrelated
cos = torch.nn.CosineSimilarity(dim=0)

# BERT Results
bert_nat_w = cos(vec_bert_nature, vec_bert_water).item()
bert_nat_c = cos(vec_bert_nature, vec_bert_money).item()
bert_fin_w = cos(vec_bert_finance, vec_bert_water).item()
bert_fin_c = cos(vec_bert_finance, vec_bert_money).item()

# --- VISUALIZATION ---
fig, ax = plt.subplots(figsize=(14, 6))

# Plot: BERT (Reader)
x = np.arange(2)
width = 0.35

rects1 = ax.bar(x - width/2, [bert_nat_w, bert_fin_w], width, label='Similarity to Water', color='blue')
rects2 = ax.bar(x + width/2, [bert_nat_c, bert_fin_c], width, label='Similarity to Money', color='green')

ax.set_ylabel('Similarity')
ax.set_title('BERT (Reader)\nBidirectional reading creates strong context')
ax.set_xticks(x)
ax.set_xticklabels(['Bank in RIVER sentence', 'Bank in MONEY sentence'])
ax.legend()
ax.set_ylim(0, 1.0)

# Add labels
def autolabel(rects, ax):
    for rect in rects:
        height = rect.get_height()
        ax.annotate(f'{height:.2f}',
                    xy=(rect.get_x() + rect.get_width() / 2, height),
                    xytext=(0, 3), textcoords="offset points",
                    ha='center', va='bottom')

autolabel(rects1, ax)
autolabel(rects2, ax)

plt.show()

print("\n--- ANALYSIS ---")
print("1. In the River sentence (Left Pair), check whether the Blue Bar (Water) is higher.")
print("2. In the Money sentence (Right Pair), check whether the Green Bar (Money) is higher.")
print("If both hold, the same word 'bank' received a different vector in each context.")

---

## **Topic 4: The Prediction Machine (Probabilities)**

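**Concept:** The AI is not creative in the human sense; it is probabilistic. It calculates the % chance of *every single token in its vocabulary* coming next, then chooses one: the most likely token under greedy decoding, or a random draw weighted by those chances when sampling.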

**Analogy:** Autocomplete on steroids.
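
The step from raw scores to percentages is the softmax function. Here is a minimal sketch on four made-up candidate tokens; the scores (called logits) are hypothetical, whereas a real model produces one score per vocabulary entry.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the next token after some prompt.
candidate_logits = {" dog": 3.0, " cat": 2.5, " fish": 1.0, " rock": -1.0}

probabilities = dict(zip(candidate_logits, softmax(list(candidate_logits.values()))))
ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)

for token, prob in ranked:
    print(f"{token!r:<8} {prob:.1%}")

greedy_choice = ranked[0][0]  # greedy decoding takes the top-ranked token
print("Greedy pick:", repr(greedy_choice))
```

Note that even the lowest-scoring candidate keeps a small non-zero probability, which is why sampling can occasionally produce surprising tokens.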

In [None]:
# @title 🎲 Topic 4: Predicting the Next Token

# Type a prompt here
prompt_text = "The most popular pet in the world is the" # @param {type:"string"}

# 1. Process the input
inputs = gpt_tokenizer(prompt_text, return_tensors="pt")

# 2. Ask the model for predictions
with torch.no_grad():
    outputs = gpt_model(**inputs)
    next_token_logits = outputs.logits[0, -1, :] # Get the scores for the last word
    probs = F.softmax(next_token_logits, dim=-1) # Convert scores to percentages

# 3. Get the Top 5 candidates
top_k = 5
top_probs, top_indices = torch.topk(probs, top_k)

# Visualization
candidates = [gpt_tokenizer.decode([idx]) for idx in top_indices]
scores = top_probs.numpy() * 100

plt.figure(figsize=(10, 5))
sns.barplot(x=scores, y=candidates, hue=candidates, palette="viridis", legend=False)
plt.xlabel("Probability (%)")
plt.title(f"What comes after: '{prompt_text}'?")
plt.show()

print("The AI Ranking:")
for c, s in zip(candidates, scores):
    print(f"Token: {repr(c):<15} | Probability: {s:.2f}%")

In [None]:
# @title ⚖️ Topic 4.5: The Mirror (Visualizing Bias)
# LLMs reflect the bias of their training data (the internet).
# Let's see what the model thinks is the most likely pronoun for different jobs.

# Try changing "nurse" to "doctor" or "engineer" or "teacher".
profession_prompt = "The nurse called the patient because" # @param {type:"string"}

inputs = gpt_tokenizer(profession_prompt, return_tensors="pt")

with torch.no_grad():
    outputs = gpt_model(**inputs)
    # Get probabilities for the NEXT word
    probs = F.softmax(outputs.logits[0, -1, :], dim=-1)

# Let's specifically check the probability of "he" vs "she"
id_he = gpt_tokenizer.encode(" he")[0]
id_she = gpt_tokenizer.encode(" she")[0]

prob_he = probs[id_he].item() * 100
prob_she = probs[id_she].item() * 100

plt.figure(figsize=(8, 4))
sns.barplot(x=['he', 'she'], y=[prob_he, prob_she], hue=['he', 'she'], palette=['lightblue', 'pink'], legend=False)
plt.title(f"Probability of Pronouns after: '{profession_prompt}'")
plt.ylabel("Probability (%)")
plt.ylim(0, max(prob_he, prob_she) + 5)
plt.show()

print(f"Probability of 'he':  {prob_he:.2f}%")
print(f"Probability of 'she': {prob_she:.2f}%")
print(f"Bias Factor: The model is {max(prob_he, prob_she)/min(prob_he, prob_she):.1f}x more likely to choose one over the other.")

---

## **Topic 5: Autoregression (The Loop)**

**Concept:** How does it write a whole essay? It predicts one word, adds it to the list, reads the *new* list, predicts the next word, and repeats. This is **Autoregression**.

**Analogy:** Laying down train tracks while riding the train.
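
The loop itself fits in a few lines. In this sketch a hypothetical lookup table plays the role of the model, and it only looks at the last word; a real model conditions on the entire sequence so far.

```python
# Hypothetical "model": maps the last word to its most likely next word.
next_word_table = {
    "Once": "upon", "upon": "a", "a": "time",
    "time": "there", "there": "was", "was": "a",
}


def generate(start_words, steps):
    """Predict a word, append it, and feed the longer sequence back in."""
    words = list(start_words)
    for _ in range(steps):
        next_word = next_word_table.get(words[-1])
        if next_word is None:  # the toy model has no prediction, so stop
            break
        words.append(next_word)
    return words


story = generate(["Once"], steps=6)
print(" ".join(story))  # Once upon a time there was a
```

If you raise `steps`, this toy starts cycling ("a time there was a time ..."). Always picking the single most likely continuation can trap real models in similar repetition.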

In [None]:
# @title 🔄 Topic 5: The Loop (Autoregression in Action)

start_prompt = "Once upon a time," # @param {type:"string"}
steps_to_generate = 10

current_text = start_prompt
print(f"Starting Text: {current_text}\n")

input_ids = gpt_tokenizer.encode(current_text, return_tensors="pt")

for i in range(steps_to_generate):
    # 1. Run the model
    with torch.no_grad():
        outputs = gpt_model(input_ids)
        next_token_logits = outputs.logits[0, -1, :]

    # 2. Pick the single best winner (Greedy decoding)
    next_token_id = torch.argmax(next_token_logits).item()
    next_token = gpt_tokenizer.decode([next_token_id])

    # 3. Update the text
    current_text += next_token
    input_ids = torch.cat([input_ids, torch.tensor([[next_token_id]])], dim=1)

    print(f"Step {i+1}: AI chose {repr(next_token)} -> Current Story: {current_text}")

In [None]:
# @title 🤖 Topic 5.5: Base Model Behavior (Why it doesn't answer you)
# Modern AI (ChatGPT) is "Instruction Tuned". Old AI (GPT-2) is a "Base Model".
# Base models don't answer questions; they just add more text.
# Let's see what happens if we treat GPT-2 like ChatGPT.

question_prompt = "Q: What is the capital of France?\nA:"

#input_ids = gpt_tokenizer.encode(question_prompt, return_tensors="pt")

inputs = gpt_tokenizer(question_prompt, return_tensors="pt")
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]

output = gpt_model.generate(
    input_ids,
    attention_mask=attention_mask,
    max_length=30,
    do_sample=True,
    temperature=0.7,
    pad_token_id=gpt_tokenizer.eos_token_id
)

generated_text = gpt_tokenizer.decode(output[0], skip_special_tokens=True)

print(f"--- PROMPT ---\n{question_prompt}")
print(f"\n--- GPT-2 RESPONSE ---")
print(generated_text)
print("\n--- ANALYSIS ---")
print("Did it answer 'Paris'? Or did it generate another Question?")
print("Base models often think they are writing a list of questions.")
print("To fix this, we need 'Instruction Tuning' (often combined with RLHF), which we will discuss in the lecture.")

---

## **Topic 6: Hallucination & Temperature (The Risks)**

**Concept:** "Hallucination" happens when the probabilistic "winner" is factually wrong, but statistically likely. We can control the AI's "creativity" using a setting called **Temperature**.

* **Low Temperature (0.1):** Safe, robotic, repetitive.
* **High Temperature (1.5):** Creative, chaotic, prone to nonsense.
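
Mechanically, temperature divides the raw scores before they are converted to probabilities. Here is a minimal sketch with three hypothetical scores; `softmax_with_temperature` is an illustrative helper, not a library function.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide scores by the temperature, then apply softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [3.0, 2.0, 1.0]  # hypothetical scores for three candidate tokens

for temperature in (0.1, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(f"T={temperature}: {[round(p, 3) for p in probs]}")
```

At a low temperature almost all probability piles onto the top candidate, so sampling behaves nearly like always picking the winner. At a high temperature the distribution flattens and unlikely tokens are drawn more often.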

In [None]:
# @title 🌡️ Topic 6: Controlling Chaos (Temperature)

prompt = "The secret to happiness is" # @param {type:"string"}
temperature = 1.5 # @param {type:"slider", min:0.1, max:2.0, step:0.1}

# Encode
inputs = gpt_tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]

# Generate with Temperature
output = gpt_model.generate(
    input_ids,
    attention_mask=attention_mask,
    max_length=50,
    do_sample=True, # Allow it to pick non-top options
    temperature=temperature,
    top_k=50,
    pad_token_id=gpt_tokenizer.eos_token_id
)

generated_text = gpt_tokenizer.decode(output[0], skip_special_tokens=True)

print(f"--- GENERATION (Temp: {temperature}) ---")
print(generated_text)
print("-" * 30)

if temperature < 0.5:
    print("Analysis: Low temp. The output is likely very standard and safe.")
elif temperature > 1.2:
    print("Analysis: High temp. Did you see any weird words or grammar mistakes? That is the AI taking 'risks'.")

---

In [None]:
# @title 🎲 Topic 7: The Seed (Controlling Randomness)
# In business, we often don't want randomness; we want consistency.
# By setting a "Seed", we force the random number generator to start at the same place.

prompt = "A quick recipe for a healthy breakfast is"
seed_number = 42 # @param {type:"integer"}

def generate_with_seed(seed):
    # Set the seed for reproducibility
    torch.manual_seed(seed)
    inputs = gpt_tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"]
    attention_mask = inputs["attention_mask"]
    
    output = gpt_model.generate(
        input_ids, 
        attention_mask=attention_mask,
        max_length=40, 
        do_sample=True, 
        temperature=1.0, # Sampling at temperature 1.0 is random, but the seed locks it!
        pad_token_id=gpt_tokenizer.eos_token_id
    )
    return gpt_tokenizer.decode(output[0], skip_special_tokens=True)

print(f"--- ATTEMPT 1 (Seed {seed_number}) ---")
print(generate_with_seed(seed_number))

print(f"\n--- ATTEMPT 2 (Same Seed {seed_number}) ---")
print(generate_with_seed(seed_number))

print(f"\n--- ATTEMPT 3 (Different Seed {seed_number + 1}) ---")
print(generate_with_seed(seed_number + 1))

print("\n--- LESSON ---")
print("If you control the Seed, you control the Chaos.")
print("Even with sampling enabled, the same seed (on the same setup) produces the EXACT same text.")

---

In [None]:
# @title 🚦 Topic 8: The "Lie Detector" (Visualizing Confidence)
# We can ask the model: "How sure were you about that word?"
# Green = Confident. Red = Guessing.
# Note: confidence is the token's probability, not a check of factual accuracy.

import html
from IPython.display import HTML, display

def colorize_text(words, confidences):
    html_str = ""
    for word, conf in zip(words, confidences):
        # Color: Green (High Conf) to Red (Low Conf)
        # We use HSL for easy color scaling
        # Hue 120 (Green) -> Hue 0 (Red)
        hue = conf * 120 
        # Escape the token so characters like '<' or '&' cannot break the markup
        safe_word = html.escape(word)
        html_str += f'<span style="background-color: hsl({hue}, 80%, 80%); padding: 2px; border-radius: 4px; margin: 1px;">{safe_word}</span>'
    return html_str

prompt = "The fastest animal on earth is the"
inputs = gpt_tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]

# Generate
with torch.no_grad():
    output = gpt_model.generate(
        input_ids, 
        attention_mask=attention_mask,
        max_length=20, 
        output_scores=True, 
        return_dict_in_generate=True,
        pad_token_id=gpt_tokenizer.eos_token_id
    )

# Extract tokens and scores
generated_ids = output.sequences[0]
scores = output.scores # Tuple of scores for each step

# Calculate probabilities for the GENERATED tokens
confidences = []
generated_words = []

# Skip the input prompt, look only at new words
input_len = input_ids.shape[1]
new_tokens = generated_ids[input_len:]

for i, token_id in enumerate(new_tokens):
    # Get the logits for this step
    step_logits = scores[i]
    # Convert to probability (0-1)
    probs = F.softmax(step_logits, dim=-1)
    # Get the probability of the token that was actually chosen
    token_prob = probs[0, token_id].item()
    
    confidences.append(token_prob)
    generated_words.append(gpt_tokenizer.decode([token_id]))

# Display
print(f"Prompt: {prompt}...")
display(HTML(colorize_text(generated_words, confidences)))

print("\n--- ANALYSIS ---")
print("GREEN words: The AI is sure (Common phrases, facts).")
print("RED/ORANGE words: The AI is guessing (Names, specific numbers, creative choices).")
print("If a Fact is RED, verify it! (A GREEN fact can still be wrong.)")

---

In [None]:
# @title 🧠 Topic 9: The Context Window (The Memory Budget)
# AI has a limited "Context Window". If you exceed it, the earliest text no longer fits and is lost.
# This is a conceptual visualization of how space fills up.

# GPT-4 Turbo / GPT-4o context: ~128,000 tokens
# One Harry Potter book: ~100,000 tokens (rough estimate)
# This cell visualizes the "Budget"

tokens_in_harry_potter = 100000
tokens_in_contract = 5000
tokens_in_email = 200

model_context_limit = 128000 

# Create simple bar chart
labels = ['GPT-4 Memory Limit', 'Harry Potter Book', 'Business Contract']
values = [model_context_limit, tokens_in_harry_potter, tokens_in_contract]
colors = ['lightgray', 'red', 'blue']

plt.figure(figsize=(10, 4))
bars = plt.barh(labels, values, color=colors)

# Add line for limit
plt.axvline(x=model_context_limit, color='black', linestyle='--', label='Context Limit')
plt.legend()  # show the 'Context Limit' label

plt.title("Visualizing the 'Memory Budget' (Context Window)")
plt.xlabel("Tokens")
plt.show()

print(f"Can GPT-4 read one Harry Potter book? {'Yes' if tokens_in_harry_potter < model_context_limit else 'No'}")
print(f"Can it read TWO Harry Potter books at once? {'Yes' if tokens_in_harry_potter * 2 < model_context_limit else 'No (It would forget the beginning)'}")

---

# The Major Model Families (2024-2025)

| Model Family | The Personality | Best Use Case | Weakness |
| :--- | :--- | :--- | :--- |
| **GPT-4o (OpenAI)** | **The Jack of All Trades** | • Logic & Reasoning<br>• Complex Instruction Following<br>• Data Analysis | Can be "lazy" or overly verbose. Tone often feels robotic. |
| **Claude 3.5 (Anthropic)** | **The Writer** | • **Humanities & Writing**<br>• Nuanced summarization<br>• Coding & Web Dev | Stricter safety refusals ("I can't do that"). |
| **Gemini 1.5 (Google)** | **The Analyst** | • **Massive Context** (Reading 10 books at once)<br>• Video/Audio processing<br>• Google Workspace integration | Sometimes prone to "sycophancy" (agreeing with you even if you are wrong). |
| **Llama 3 (Meta)** | **The Open Option** | • Privacy (can run locally)<br>• Cost-efficiency<br>• No subscription required (if local) | Requires hardware setup (unless used via cloud). |

### Summary Recommendation
* **Writing an Essay?** Use **Claude**.
* **Analyzing 50 PDFs?** Use **Gemini**.
* **Solving a Logic Puzzle?** Use **GPT-4o**.