# LLMs Under the Hood: Understanding Hyperparameters and Embeddings

Welcome to the second practical session of Day 2! In this notebook, we'll explore the inner workings of Large Language Models, focusing on generation hyperparameters such as temperature and on embeddings. By the end of this session, you'll have a deeper understanding of how these components affect model behavior in financial applications.

## Learning Objectives

- Understand key LLM hyperparameters and their effects
- Explore embedding spaces and their role in semantic understanding
- Experiment with temperature, top-k, and top-p sampling
- Visualize embeddings for financial terminology
- Learn to optimize parameters for different financial tasks

## 1. Setup

First, let's import the necessary libraries. We'll be using the Hugging Face Transformers library to interact with pre-trained models.

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import pandas as pd
from sklearn.manifold import TSNE
import time
import os
from dotenv import load_dotenv
from IPython.display import display, HTML

# Load environment variables for API keys (if needed)
load_dotenv()

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

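# Check which models are available locally and their sizes
# Note: recent versions of the Hugging Face libraries cache models under
# ~/.cache/huggingface/hub; older transformers releases used .../huggingface/transformers
print("Checking for locally available models...")
transformers_cache = os.path.join(os.path.expanduser("~"), ".cache", "huggingface", "hub")
if os.path.exists(transformers_cache):
    print("Models in cache:")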
    for model_dir in os.listdir(transformers_cache):
        model_path = os.path.join(transformers_cache, model_dir)
        if os.path.isdir(model_path):
            size_bytes = sum(os.path.getsize(os.path.join(dirpath, filename)) 
                        for dirpath, _, filenames in os.walk(model_path) 
                        for filename in filenames)
            size_mb = size_bytes / (1024 * 1024)
            print(f"  - {model_dir}: {size_mb:.2f} MB")
else:
    print("No models found in local cache.")

## 2. Loading a Small LLM

For this practical session, we'll use a smaller language model that can run on most hardware. GPT-2 is a good option as it's relatively small but still capable of generating coherent text.

In [None]:
# Load a pre-trained GPT-2 model and tokenizer
model_name = "gpt2"  # Can be changed to "gpt2-medium" or "gpt2-large" if you have more GPU memory
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name).to(device)

# Set padding token to be the same as the EOS token
tokenizer.pad_token = tokenizer.eos_token

# Print model information
print(f"Model: {model_name}")
print(f"Vocabulary size: {len(tokenizer)}")
print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"Layers: {len(model.transformer.h)}")
print(f"Hidden size: {model.config.hidden_size}")
print(f"Max position embeddings: {model.config.n_positions}")

## 3. Understanding Generation Hyperparameters

LLMs use several hyperparameters to control text generation. Let's explore the most important ones:

- **Temperature**: Controls randomness in generation
- **Top-k**: Limits the selection to the k most probable next tokens
- **Top-p (nucleus sampling)**: Dynamically limits the selection to the smallest set of tokens whose cumulative probability exceeds p
- **Repetition penalty**: Discourages repetition of the same tokens
- **Max length**: Maximum total sequence length in tokens (in `generate`, `max_length` counts the prompt tokens as well as the newly generated ones)
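
To make the definitions above concrete, here is a minimal NumPy sketch of temperature scaling, top-k filtering, and top-p filtering applied to a handful of made-up logits. The token list, the logit values, and the helper names (`softmax`, `apply_temperature`, `top_k_filter`, `top_p_filter`) are assumptions for illustration only; library implementations differ in details such as tie handling and how the threshold token is treated.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def apply_temperature(logits, temperature):
    """Divide logits by the temperature before applying softmax."""
    return logits / temperature

def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest, renormalize."""
    keep = np.argsort(probs)[::-1][:k]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    # Number of tokens needed for the cumulative mass to reach p
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

# Hypothetical next-token candidates and logits (illustrative values only)
toy_tokens = ["rose", "fell", "rallied", "stalled", "banana"]
toy_logits = np.array([2.0, 1.5, 1.0, 0.2, -2.0])

for t in (0.2, 1.0, 1.5):
    probs = softmax(apply_temperature(toy_logits, t))
    print(f"T={t}: " + ", ".join(f"{tok}={pr:.3f}" for tok, pr in zip(toy_tokens, probs)))

base_probs = softmax(toy_logits)
print("top-k (k=2):", np.round(top_k_filter(base_probs, 2), 3))
print("top-p (p=0.8):", np.round(top_p_filter(base_probs, 0.8), 3))
```

Lower temperatures concentrate probability on the most likely token, while higher ones flatten the distribution. Top-k keeps a fixed number of candidates, whereas top-p keeps a variable number that depends on how peaked the distribution is.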

Let's experiment with these parameters and see how they affect generation in financial contexts.

In [None]:
def generate_text(prompt, temperature=1.0, top_k=50, top_p=1.0, 
                 repetition_penalty=1.0, max_length=100, num_return_sequences=1):
    """
    Generate text using a language model with specified parameters.
    
    Parameters:
    - prompt: The starting text prompt
    - temperature: Controls randomness (higher = more random)
    - top_k: Number of highest probability tokens to consider
    - top_p: Cumulative probability threshold for nucleus sampling
    - repetition_penalty: Penalty for repeating tokens
    - max_length: Maximum total length in tokens (prompt plus generated tokens)
    - num_return_sequences: Number of different sequences to generate
    
    Returns:
    - Generated text sequences
    """
    # Encode the prompt
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    
    # Record start time
    start_time = time.time()
    
    # Generate text
    outputs = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],  # explicit mask, since pad and EOS share one token id
        max_length=max_length,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        repetition_penalty=repetition_penalty,
        do_sample=True,
        num_return_sequences=num_return_sequences,
        pad_token_id=tokenizer.eos_token_id
    )
    
    # Record end time
    end_time = time.time()
    generation_time = end_time - start_time
    
    # Decode the outputs
    generated_texts = []
    for output in outputs:
        generated_text = tokenizer.decode(output, skip_special_tokens=True)
        generated_texts.append(generated_text)
    
    return generated_texts, generation_time

# Define financial prompts for testing
financial_prompts = [
    "The stock market today experienced",
    "Analysts predict that interest rates will",
    "The company's quarterly earnings report showed",
    "Investors should consider the following factors before",
    "The Federal Reserve announced that"
]

# Let's test with different temperatures
temperatures = [0.2, 0.7, 1.5]
test_prompt = financial_prompts[0]

print(f"Testing temperature effects with prompt: '{test_prompt}'")
for temp in temperatures:
    print(f"\nTemperature = {temp}:")
    texts, time_taken = generate_text(test_prompt, temperature=temp, max_length=50)
    for i, text in enumerate(texts):
        print(f"  Generated text {i+1}: {text}")
    print(f"  Generation time: {time_taken:.2f} seconds")

## 4. Hyperparameter Comparison

Let's create a systematic comparison of how different hyperparameter values affect text generation for financial content.

In [None]:
def compare_hyperparameters(prompt, param_name, param_values, fixed_params=None):
    """
    Compare the effect of varying a single hyperparameter.
    """
    if fixed_params is None:
        fixed_params = {}
    
    results = []
    for value in param_values:
        params = {**fixed_params, param_name: value}
        texts, gen_time = generate_text(prompt, **params, max_length=50)
        results.append({
            'value': value,
            'text': texts[0],
            'time': gen_time
        })
    
    return results

# Fixed parameters for comparison
fixed_params = {
    'temperature': 0.7,
    'top_k': 50,
    'top_p': 0.9,
    'repetition_penalty': 1.1
}

# Compare top-k values
top_k_values = [5, 20, 50]
top_k_results = compare_hyperparameters(
    "The investment strategy should focus on",
    'top_k',
    top_k_values,
    {k: v for k, v in fixed_params.items() if k != 'top_k'}
)

# Compare top-p values
top_p_values = [0.5, 0.8, 0.95]
top_p_results = compare_hyperparameters(
    "The investment strategy should focus on",
    'top_p',
    top_p_values,
    {k: v for k, v in fixed_params.items() if k != 'top_p'}
)

# Compare repetition penalties
rep_penalty_values = [1.0, 1.2, 1.5]
rep_penalty_results = compare_hyperparameters(
    "The future outlook for the company is promising because",
    'repetition_penalty',
    rep_penalty_values,
    {k: v for k, v in fixed_params.items() if k != 'repetition_penalty'}
)

# Display the results in a table format
def display_results(results, param_name):
    df = pd.DataFrame(results)
    df.columns = [param_name, 'Generated Text', 'Time (s)']
    return df

print("Top-k comparison:")
display(display_results(top_k_results, 'Top-k'))

print("\nTop-p comparison:")
display(display_results(top_p_results, 'Top-p'))

print("\nRepetition penalty comparison:")
display(display_results(rep_penalty_results, 'Repetition Penalty'))
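
The repetition penalty compared above can also be illustrated on toy logits. The sketch below follows the rule that, to my understanding, is commonly used for this parameter: logits of tokens that already appear in the sequence are divided by the penalty when positive and multiplied by it when negative, so a penalty above 1 always lowers their score. The function name `apply_repetition_penalty` and all values are hypothetical.

```python
import numpy as np

def apply_repetition_penalty(logits, previous_token_ids, penalty):
    """Lower the logits of tokens that already appear in the sequence.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so a penalty > 1 always reduces the token's score.
    """
    adjusted = logits.astype(float).copy()
    for token_id in set(previous_token_ids):
        if adjusted[token_id] > 0:
            adjusted[token_id] /= penalty
        else:
            adjusted[token_id] *= penalty
    return adjusted

# Made-up logits for a 4-token vocabulary; tokens 0 and 2 were already generated
toy_logits = np.array([3.0, 1.0, -1.0, 0.5])
already_generated = [0, 2]

for penalty in (1.0, 1.2, 1.5):
    print(penalty, apply_repetition_penalty(toy_logits, already_generated, penalty))
```

A penalty of 1.0 leaves the logits unchanged, which is why 1.0 serves as the "no penalty" baseline in the comparison.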

## 5. Recommended Hyperparameters for Financial Tasks

Different financial tasks require different hyperparameter settings. Let's define some recommended configurations for common financial use cases.

In [None]:
# Define hyperparameter profiles for different financial tasks
task_profiles = {
    'market_analysis': {
        'temperature': 0.7,
        'top_k': 40,
        'top_p': 0.9,
        'repetition_penalty': 1.1,
        'description': 'Balanced creativity and coherence for analyzing market conditions'
    },
    'risk_assessment': {
        'temperature': 0.4,
        'top_k': 20,
        'top_p': 0.85,
        'repetition_penalty': 1.2,
        'description': 'Lower randomness for more conservative and focused risk analysis'
    },
    'creative_strategy': {
        'temperature': 0.9,
        'top_k': 50,
        'top_p': 0.95,
        'repetition_penalty': 1.05,
        'description': 'Higher creativity for innovative investment strategies'
    },
    'regulatory_compliance': {
        'temperature': 0.3,
        'top_k': 10,
        'top_p': 0.8,
        'repetition_penalty': 1.3,
        'description': 'Minimal randomness for precise regulatory language'
    }
}

# Create a table of recommended profiles
profile_data = []
for task, profile in task_profiles.items():
    profile_data.append({
        'Task': task,
        'Temperature': profile['temperature'],
        'Top-k': profile['top_k'],
        'Top-p': profile['top_p'],
        'Repetition Penalty': profile['repetition_penalty'],
        'Description': profile['description']
    })

profile_df = pd.DataFrame(profile_data)
display(profile_df)

# Test a few of these profiles on financial prompts
test_prompt = "The company's quarterly financial results indicate"

print("Testing task-specific hyperparameter profiles:")
for task, profile in list(task_profiles.items())[:2]:  # Test first two profiles
    print(f"\n{task.replace('_', ' ').title()} Profile:")
    params = {k: v for k, v in profile.items() if k != 'description'}
    texts, _ = generate_text(test_prompt, **params, max_length=75)
    print(f"  {texts[0]}")

## 6. Understanding Word Embeddings

Embeddings are dense vector representations of words or tokens that capture semantic meaning. They are a fundamental component of all modern LLMs. Let's explore how embeddings work and visualize them for financial terms.

In [None]:
# Load a model suitable for embeddings
embedding_model_name = "distilbert-base-uncased"  # Smaller model for embeddings
embedding_tokenizer = AutoTokenizer.from_pretrained(embedding_model_name)
embedding_model = AutoModel.from_pretrained(embedding_model_name).to(device)

def get_embeddings(texts):
    """Get embeddings for a list of texts."""
    # Tokenize inputs
    inputs = embedding_tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=128).to(device)
    
    # Get model outputs
    with torch.no_grad():
        outputs = embedding_model(**inputs)
    
    # Use CLS token embeddings (first token of each sequence)
    embeddings = outputs.last_hidden_state[:, 0, :].cpu().numpy()
    
    return embeddings

# Define financial terms to visualize
financial_terms = [
    # Market terms
    "bull market", "bear market", "volatility", "liquidity", "market cap",
    # Asset classes
    "stocks", "bonds", "commodities", "cryptocurrency", "real estate",
    # Financial metrics
    "earnings", "revenue", "dividend", "P/E ratio", "cash flow",
    # Banking terms
    "interest rate", "loan", "deposit", "mortgage", "credit",
    # Investment terms
    "portfolio", "diversification", "hedge", "risk", "return"
]

# Get embeddings for financial terms
financial_embeddings = get_embeddings(financial_terms)

# Use t-SNE to reduce dimensions for visualization
tsne = TSNE(n_components=2, random_state=42, perplexity=5)
reduced_embeddings = tsne.fit_transform(financial_embeddings)

# Create a DataFrame for plotting
embedding_df = pd.DataFrame({
    'term': financial_terms,
    'x': reduced_embeddings[:, 0],
    'y': reduced_embeddings[:, 1]
})

# Define categories for coloring
categories = {
    'Market': ["bull market", "bear market", "volatility", "liquidity", "market cap"],
    'Assets': ["stocks", "bonds", "commodities", "cryptocurrency", "real estate"],
    'Metrics': ["earnings", "revenue", "dividend", "P/E ratio", "cash flow"],
    'Banking': ["interest rate", "loan", "deposit", "mortgage", "credit"],
    'Investment': ["portfolio", "diversification", "hedge", "risk", "return"]
}

# Add category column
embedding_df['category'] = 'Other'
for category, terms in categories.items():
    embedding_df.loc[embedding_df['term'].isin(terms), 'category'] = category

# Plot the embeddings
plt.figure(figsize=(12, 8))
sns.scatterplot(data=embedding_df, x='x', y='y', hue='category', s=100)

# Add labels to each point
for idx, row in embedding_df.iterrows():
    plt.text(row['x']+0.02, row['y']+0.02, row['term'], fontsize=9)

plt.title('2D t-SNE Visualization of Financial Term Embeddings')
plt.xlabel('t-SNE Dimension 1')
plt.ylabel('t-SNE Dimension 2')
plt.tight_layout()
plt.show()
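
The `get_embeddings` function above takes the vector of the first token as the representation of each term. A common alternative is mean pooling, which averages all token vectors while ignoring padding. The sketch below shows the idea on a made-up array; `mean_pool` and the toy values are assumptions, not part of the notebook's pipeline.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions.

    token_embeddings: array of shape (batch, seq_len, hidden)
    attention_mask:   array of shape (batch, seq_len), 1 = real token, 0 = padding
    """
    mask = attention_mask[:, :, None].astype(float)   # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)     # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)     # avoid division by zero
    return summed / counts

# Toy batch: 2 sequences, 3 token positions, hidden size 2 (made-up numbers)
toy_hidden = np.array([
    [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]],   # third position is padding
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
])
toy_mask = np.array([
    [1, 1, 0],
    [1, 1, 1],
])

print(mean_pool(toy_hidden, toy_mask))
```

With the notebook's model, the inputs would roughly correspond to `outputs.last_hidden_state` and `inputs["attention_mask"]` converted to NumPy arrays. Whether mean pooling separates financial terms better than the first-token vector is something to check empirically.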

## 7. Embeddings Similarity for Financial Terms

Let's explore how embeddings capture semantic relationships between financial terms. We'll calculate similarities between terms to see which ones the model considers most related.
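
As a quick reference for the similarity measure, cosine similarity is the dot product of two vectors divided by the product of their norms. The following sketch uses hand-checkable 2-D vectors (made up for illustration; real embeddings have hundreds of dimensions), and the helper name `cosine_similarity` is an assumption.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up vector pairs chosen so the result is easy to verify by hand
v_same_direction = (np.array([1.0, 2.0]), np.array([2.0, 4.0]))
v_orthogonal = (np.array([1.0, 0.0]), np.array([0.0, 3.0]))
v_opposite = (np.array([1.0, 1.0]), np.array([-1.0, -1.0]))

print("same direction:", cosine_similarity(*v_same_direction))
print("orthogonal:    ", cosine_similarity(*v_orthogonal))
print("opposite:      ", cosine_similarity(*v_opposite))
```

The measure ignores vector length and ranges from -1 to 1, so values can be negative in principle even if they rarely are for embeddings from the same model.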

In [None]:
def calculate_similarity(embeddings):
    """Calculate cosine similarity between all pairs of embeddings."""
    # Normalize embeddings
    normalized = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Calculate similarity matrix
    similarity_matrix = np.dot(normalized, normalized.T)
    return similarity_matrix

# Calculate similarity matrix for financial terms
similarity_matrix = calculate_similarity(financial_embeddings)

# Create a DataFrame for the similarity matrix
similarity_df = pd.DataFrame(similarity_matrix, index=financial_terms, columns=financial_terms)

# Plot the similarity matrix as a heatmap
plt.figure(figsize=(14, 12))
sns.heatmap(similarity_df, annot=False, cmap='viridis', vmin=0, vmax=1)
plt.title('Cosine Similarity Between Financial Terms')
plt.tight_layout()
plt.show()

# Find most similar pairs
similarities = []
for i in range(len(financial_terms)):
    for j in range(i+1, len(financial_terms)):
        similarities.append({
            'term1': financial_terms[i],
            'term2': financial_terms[j],
            'similarity': similarity_matrix[i, j]
        })

# Sort by similarity (descending)
similarities.sort(key=lambda x: x['similarity'], reverse=True)

# Display top 10 most similar pairs
top_similarities = pd.DataFrame(similarities[:10])
print("Top 10 most similar financial term pairs:")
display(top_similarities)

# Find most dissimilar pairs
similarities.sort(key=lambda x: x['similarity'])
bottom_similarities = pd.DataFrame(similarities[:10])
print("\nTop 10 most dissimilar financial term pairs:")
display(bottom_similarities)

## 8. Prompting Strategies Based on Embeddings

We can use our understanding of embeddings to create more effective prompts for financial tasks. Let's explore a few strategies.

In [None]:
def find_related_terms(term, similarity_matrix, terms, n=5):
    """Find the terms most related to a given term, using a precomputed similarity matrix."""
    # Get index of the term
    term_idx = terms.index(term)
    
    # Get similarities to the term (row of the similarity matrix, excluding the term itself)
    similarities = [(terms[i], similarity_matrix[term_idx, i]) for i in range(len(terms)) if i != term_idx]
    
    # Sort by similarity (descending)
    similarities.sort(key=lambda x: x[1], reverse=True)
    
    # Return top n most similar terms
    return similarities[:n]

# Find terms related to "risk"
risk_related = find_related_terms("risk", similarity_matrix, financial_terms)
print("Terms most related to 'risk':")
for term, similarity in risk_related:
    print(f"  {term}: {similarity:.4f}")

# Create a prompt enrichment function using related terms
def enrich_prompt(base_prompt, target_concept, related_terms, n_terms=3):
    """Enrich a prompt by incorporating related terms to guide the model."""
    terms_to_use = [term for term, _ in related_terms[:n_terms]]
    terms_str = ", ".join(terms_to_use)
    
    enriched_prompt = f"{base_prompt} Consider aspects like {target_concept}, {terms_str}."
    return enriched_prompt

# Create a standard prompt
standard_prompt = "Analyze the company's financial health."

# Create an enriched prompt focusing on risk
risk_enriched_prompt = enrich_prompt(standard_prompt, "risk", risk_related)

# Compare the outputs
print("\nStandard prompt:")
print(f"  '{standard_prompt}'")
standard_output, _ = generate_text(standard_prompt, temperature=0.7, max_length=100)
print(f"  Output: '{standard_output[0]}'")

print("\nEnriched prompt (risk-focused):")
print(f"  '{risk_enriched_prompt}'")
enriched_output, _ = generate_text(risk_enriched_prompt, temperature=0.7, max_length=100)
print(f"  Output: '{enriched_output[0]}'")

## 9. Token-level Probabilities and Confidence

Let's examine the token-level probabilities that the model assigns during generation. This gives insight into how confident the model is about its predictions.

In [None]:
def generate_with_probabilities(prompt, temperature=1.0, max_length=50):
    """Generate text and return the probabilities for each generated token."""
    # Tokenize the prompt
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    
    # Store generated tokens and their probabilities
    generated_tokens = []
    token_probs = []
    
    # Start with the input sequence
    current_input = input_ids
    
    # Generate one token at a time
    for _ in range(max_length - len(input_ids[0])):
        with torch.no_grad():
            # Get model outputs
            outputs = model(current_input)
            
            # Get logits (unnormalized log probabilities) for the last position
            logits = outputs.logits[:, -1, :]
            
            # Apply temperature
            logits = logits / temperature
            
            # Convert to probabilities with softmax
            probs = torch.nn.functional.softmax(logits, dim=-1)
            
            # Sample from the probability distribution
            next_token = torch.multinomial(probs, 1)
            
            # Get the probability of the selected token under the temperature-scaled distribution
            token_prob = probs[0, next_token.item()].item()
            
            # Add to results
            generated_tokens.append(next_token.item())
            token_probs.append(token_prob)
            
            # Update the input sequence
            current_input = torch.cat([current_input, next_token], dim=1)
            
            # Check if we've generated an end-of-sequence token
            if next_token.item() == tokenizer.eos_token_id:
                break
    
    # Convert tokens back to text
    generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    generated_tokens_text = [tokenizer.decode([token]) for token in generated_tokens]
    
    return generated_text, generated_tokens_text, token_probs

# Test with a financial prompt
financial_prompt = "The Federal Reserve's decision to raise interest rates will affect"
generated_text, tokens, probs = generate_with_probabilities(financial_prompt, temperature=0.8, max_length=30)

# Display the results
print(f"Prompt: '{financial_prompt}'")
print(f"Generated text: '{generated_text}'")

# Create a DataFrame to visualize token probabilities
token_df = pd.DataFrame({
    'Token': tokens,
    'Probability': probs
})

# Sort by probability to see most/least confident predictions
token_df_sorted = token_df.sort_values('Probability', ascending=False)

print("\nTokens sorted by model confidence (probability):")
display(token_df_sorted)

# Plot the token probabilities
plt.figure(figsize=(12, 6))
plt.bar(range(len(probs)), probs)
plt.xticks(range(len(probs)), tokens, rotation=90)
plt.xlabel('Generated Token')
plt.ylabel('Probability')
plt.title('Model Confidence for Each Generated Token')
plt.tight_layout()
plt.show()

# Plot a heatmap of the text with confidence levels
def plot_text_confidence(text, tokens, probs):
    """Plot the generated text with color-coded confidence levels."""
    from html import escape  # escape characters such as "<" or "&" in tokens

    # Create HTML with background color based on confidence
    html_parts = []
    for token, prob in zip(tokens, probs):
        # Map probability to a color (red to green)
        r = int(255 * (1 - prob))
        g = int(255 * prob)
        b = 0
        color = f"#{r:02x}{g:02x}{b:02x}"
        
        # Add token with background color (escaped so it renders literally)
        html_parts.append(f'<span style="background-color: {color}; padding: 2px; margin: 1px; color: white;">{escape(token)}</span>')
    
    # Combine all parts
    html = ''.join(html_parts)
    
    # Display the HTML
    display(HTML(f"<p>Generated text with confidence highlighting (green = high confidence, red = low confidence):</p><p>{html}</p>"))

# Display the text with confidence highlighting
plot_text_confidence(generated_text, tokens, probs)
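
Per-token probabilities like the ones plotted above can also be condensed into a few sequence-level numbers. The sketch below computes the mean log-probability, the corresponding perplexity, and the lowest single-token probability. The helper `summarize_token_probs` and the example values are hypothetical. Note also that probabilities collected after temperature scaling describe the sampling distribution, not the model's unscaled output.

```python
import math

def summarize_token_probs(token_probs):
    """Return mean log-probability, perplexity, and the weakest token probability."""
    log_probs = [math.log(p) for p in token_probs]
    mean_log_prob = sum(log_probs) / len(log_probs)
    return {
        "mean_log_prob": mean_log_prob,
        "perplexity": math.exp(-mean_log_prob),  # inverse geometric mean of the probabilities
        "min_prob": min(token_probs),
    }

# Hypothetical per-token probabilities (illustrative values, not model output)
example_probs = [0.5, 0.25, 0.5, 0.25]
print(summarize_token_probs(example_probs))
```

In the notebook, the `probs` list returned by `generate_with_probabilities` could be passed in place of `example_probs`.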

## 10. Hyperparameter Recommendations for Financial Applications

Drawing on the experiments above, here are starting-point hyperparameter settings for different financial applications; tune them for your own model and task.

### Recommended Hyperparameter Settings

| Task | Description | Temperature | Top-k | Top-p | Repetition Penalty |
|------|-------------|------------|-------|-------|-------------------|
| **Financial Reports** | Formal, factual summaries | 0.3-0.4 | 10-20 | 0.8 | 1.2-1.3 |
| **Market Analysis** | Balanced analysis of trends | 0.6-0.7 | 30-50 | 0.9 | 1.1 |
| **Investment Strategies** | Creative but grounded ideas | 0.7-0.9 | 40-50 | 0.92 | 1.05-1.1 |
| **Risk Assessment** | Conservative, careful evaluation | 0.3-0.5 | 20-30 | 0.85 | 1.2 |
| **Regulatory Compliance** | Precise, formal language | 0.2-0.3 | 5-15 | 0.7-0.8 | 1.3 |
| **Client Communications** | Clear, accessible language | 0.5-0.6 | 30-40 | 0.9 | 1.15 |

### General Guidelines

1. **Lower temperature** (0.2-0.5): Use for tasks requiring accuracy, consistency, and reliability
2. **Medium temperature** (0.5-0.8): Use for balanced analysis and general content
3. **Higher temperature** (0.8-1.0): Use for creative ideation and exploring possibilities
4. **Top-k and Top-p**: Start with top-p = 0.9 and adjust based on content needs
5. **Repetition penalty**: Increase for longer texts to reduce repeated phrases and looping output

### Task-specific Considerations

- **For numerical analysis**: Use lower temperature and top-p to reduce sampling variability (this makes output more consistent but does not by itself guarantee correct numbers)
- **For market forecasts**: Balance between creativity and coherence
- **For compliance documents**: Prioritize precision over creativity
- **For investment brainstorming**: Allow higher creativity with safeguards
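
When turning the guidelines above into code, a small sanity check helps catch settings that fall outside sensible ranges before they reach the model. This is a minimal sketch: `validate_sampling_params` is a hypothetical helper, and the accepted ranges are assumptions chosen to match how these parameters are described in this notebook.

```python
def validate_sampling_params(temperature, top_k, top_p, repetition_penalty):
    """Return the settings as a dict after basic range checks (hypothetical helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    if not isinstance(top_k, int) or top_k < 1:
        raise ValueError("top_k must be a positive integer")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in the interval (0, 1]")
    if repetition_penalty <= 0:
        raise ValueError("repetition_penalty must be > 0")
    return {
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
        "repetition_penalty": repetition_penalty,
    }

# Example: the regulatory-compliance style settings used earlier in this notebook
compliance_params = validate_sampling_params(
    temperature=0.3, top_k=10, top_p=0.8, repetition_penalty=1.3
)
print(compliance_params)
```

The returned dictionary can be unpacked into the notebook's `generate_text` function, for example `generate_text(prompt, **compliance_params)`.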

## 11. Conclusion

In this practical session, we've explored the inner workings of LLMs, focusing on:

1. **Hyperparameters** like temperature, top-k, and top-p, and how they affect text generation
2. **Embeddings** and how they capture semantic relationships between financial terms
3. **Confidence and probability** at the token level, showing how models make predictions
4. **Prompt engineering strategies** based on understanding model mechanics
5. **Task-specific optimizations** for different financial applications

These insights will help you develop more effective financial applications using LLMs by:
- Tuning parameters appropriately for different tasks
- Creating prompts that leverage the model's semantic understanding
- Interpreting model outputs with an understanding of confidence levels
- Balancing creativity and accuracy based on your specific needs

In the next practical session, we'll explore memory buffers, caching, and automated function calling to build more sophisticated LLM applications for finance.