# HuggingFace Setup and Local Model Usage

In this notebook, we'll set up HuggingFace and learn how to work with language models locally. HuggingFace is one of the most widely used platforms for sharing and using pre-trained models.

## Learning Objectives
- Set up the HuggingFace transformers library
- Understand the HuggingFace ecosystem
- Load and use models locally
- Practice with different model types
- Understand model sizes and hardware requirements
- Learn caching and optimization strategies

## 1. HuggingFace Ecosystem Overview

### Key Components:
- **🤗 Hub**: Repository for models, datasets, and demos
- **Transformers**: Library for using pre-trained models
- **Datasets**: Library for accessing and processing datasets
- **Tokenizers**: Fast tokenization library
- **Accelerate**: Distributed training and inference
- **Gradio**: Quick UI creation for ML models

### Popular Models for Finance:
- **FinBERT**: Financial sentiment analysis
- **BloombergGPT**: Financial domain language model (proprietary; its weights are not published on the Hub)
- **RoBERTa**: Robust text understanding
- **GPT-2/GPT-Neo**: Text generation
- **T5**: Text-to-text transfer transformer

## 2. Environment Setup and Authentication

Let's start by setting up our HuggingFace environment:

In [None]:
# Import essential libraries
import os
import torch
from transformers import (
    AutoTokenizer, AutoModel, AutoModelForSequenceClassification,
    AutoModelForCausalLM, pipeline, set_seed
)
import warnings
warnings.filterwarnings('ignore')

# Set seed for reproducibility
set_seed(42)

print("🤗 HuggingFace Transformers Setup")
print(f"Transformers version: {__import__('transformers').__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

# Check available device
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"🚀 Using GPU: {torch.cuda.get_device_name(0)}")
elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
    device = torch.device("mps")
    print("🍎 Using Apple MPS")
else:
    device = torch.device("cpu")
    print("💻 Using CPU")

print(f"Device: {device}")
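
The device check above follows a fixed priority: CUDA GPU first, then Apple MPS, then CPU. As a minimal sketch, the same decision can be written as a small function that returns a device string. The name `choose_device` is hypothetical, and the availability flags are passed in as plain booleans so the logic can be tried without any hardware.

```python
def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device string in priority order: CUDA GPU, Apple MPS, then CPU."""
    if cuda_available:
        return "cuda:0"
    if mps_available:
        return "mps"
    return "cpu"

# Each combination of availability flags and the device it selects.
for flags in [(True, False), (False, True), (False, False)]:
    print(flags, "->", choose_device(*flags))
```

In the notebook, the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and the resulting string can be passed as the `device` argument of `pipeline(...)`.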

## 3. HuggingFace Authentication (Optional)

Some models require authentication. Let's set up the HuggingFace token:

In [None]:
# Load environment variables for HuggingFace token
from dotenv import load_dotenv

# Load .env file if it exists
if os.path.exists('.env'):
    load_dotenv()
    print("✅ Loaded .env file")
else:
    print("⚠️ No .env file found")
    print("   Create one from .env.example if you need API access")

# Check for HuggingFace token
hf_token = os.getenv('HUGGINGFACE_API_KEY')

if hf_token:
    print("🔑 HuggingFace token found")
    # Login to HuggingFace Hub (optional but recommended)
    try:
        from huggingface_hub import login
        login(token=hf_token)
        print("✅ Successfully logged in to HuggingFace Hub")
    except Exception as e:
        print(f"⚠️ Could not login to HuggingFace Hub: {e}")
        print("   This is fine for public models")
else:
    print("🔓 No HuggingFace token - using public models only")
    print("   To access private or gated models, add HUGGINGFACE_API_KEY to .env")

print("\n📝 HuggingFace token info:")
print("   • Get token: https://huggingface.co/settings/tokens")
print("   • Token types: Read, Write, or Fine-grained")
print("   • Required for: Private and gated models, uploading models, some datasets")
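
The cell above reads the token from `HUGGINGFACE_API_KEY`, a name chosen by this notebook. The `huggingface_hub` library itself looks for `HF_TOKEN` by default, so a lookup that checks both can be convenient. The sketch below is one way to do that; the helper names `resolve_hf_token` and `mask_token` and the lookup order are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Candidate variable names, checked in order. HUGGINGFACE_API_KEY is this
# notebook's own convention; HF_TOKEN is the name huggingface_hub reads.
TOKEN_ENV_VARS = ("HUGGINGFACE_API_KEY", "HF_TOKEN")

def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty token found, or None (hypothetical helper)."""
    env = os.environ if env is None else env
    for name in TOKEN_ENV_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return None

def mask_token(token: str) -> str:
    """Keep only the first 4 characters so the token is safe to print."""
    return token[:4] + "*" * max(len(token) - 4, 0)

# Example with an explicit mapping instead of the real environment.
token = resolve_hf_token({"HF_TOKEN": "hf_example123"})
print(mask_token(token) if token else "No token found")
```

Calling `resolve_hf_token()` with no argument reads the real environment, and the result can be passed to `login(token=...)` as in the cell above.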

## 4. Understanding Model Sizes and Requirements

Let's explore different model sizes and their requirements:

In [None]:
# Model size reference for financial applications
# Parameter counts in the comments are approximate.
model_info = {
    "Small Models (< 500M params)": {
        "examples": ["distilbert-base-uncased", "bert-base-uncased", "roberta-base"],  # ~66M, ~110M, ~125M
        "memory": "~2GB RAM",
        "speed": "Fast",
        "use_case": "Classification and most financial NLP tasks"
    },
    "Medium Models (500M - 2B params)": {
        "examples": ["t5-large", "microsoft/DialoGPT-large", "gpt2-xl"],  # ~0.8B, ~0.8B, ~1.5B
        "memory": "~4-8GB RAM",
        "speed": "Medium",
        "use_case": "Text generation, summarization"
    },
    "Large Models (2B - 7B params)": {
        "examples": ["EleutherAI/gpt-neo-2.7B", "EleutherAI/gpt-j-6b"],  # ~2.7B, ~6B
        "memory": "~16-32GB RAM",
        "speed": "Slow",
        "use_case": "Text generation, complex reasoning"
    },
    "Extra Large Models (7B+ params)": {
        "examples": ["EleutherAI/gpt-neox-20b"],  # ~20B
        "memory": "~32GB+ RAM or GPU",
        "speed": "Very slow on CPU",
        "use_case": "Advanced generation, research"
    }
}

print("📊 Model Size Guide for Financial Applications\n")
for category, info in model_info.items():
    print(f"🔹 {category}")
    print(f"   Examples: {', '.join(info['examples'])}")
    print(f"   Memory: {info['memory']}")
    print(f"   Speed: {info['speed']}")
    print(f"   Best for: {info['use_case']}\n")

# Check available memory
import psutil
available_memory = psutil.virtual_memory().available / (1024**3)  # GB
total_memory = psutil.virtual_memory().total / (1024**3)  # GB

print(f"💾 Your System:")
print(f"   Total RAM: {total_memory:.1f} GB")
print(f"   Available RAM: {available_memory:.1f} GB")

# Recommendations based on available memory
if available_memory < 4:
    print("\n💡 Recommendation: Use the smallest models (DistilBERT, ALBERT)")
elif available_memory < 16:
    print("\n💡 Recommendation: Base-size encoders work well (BERT, RoBERTa)")
else:
    print("\n💡 Recommendation: You can run larger models (GPT-2 XL, T5-large)")
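
The memory figures in the guide are rough. A back-of-the-envelope estimate for the weights alone is the parameter count multiplied by the bytes per parameter at a given precision. The sketch below applies that arithmetic; the helper name is hypothetical, the parameter counts are rounded approximations, and activations and framework overhead are not included, so real usage will be higher.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_weight_memory_gb(n_params: float, dtype: str = "float32") -> float:
    """Approximate memory for model weights only, in GiB (hypothetical helper)."""
    return n_params * BYTES_PER_PARAM[dtype] / (1024 ** 3)

# Approximate parameter counts (assumed, rounded figures).
approx_params = {
    "distilbert-base-uncased": 66e6,
    "bert-base-uncased": 110e6,
    "gpt2-xl": 1.5e9,
    "EleutherAI/gpt-neo-2.7B": 2.7e9,
}

for name, n_params in approx_params.items():
    sizes = ", ".join(
        f"{dtype}: {estimate_weight_memory_gb(n_params, dtype):.2f} GB"
        for dtype in BYTES_PER_PARAM
    )
    print(f"{name:26s} {sizes}")
```

Comparing the estimate with the available RAM printed above gives a quick first check on whether a model is worth trying to load.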

## 5. Loading Your First Model: Financial Sentiment Analysis

Let's start with a practical example - financial sentiment analysis:

In [None]:
# Load a financial sentiment analysis model
print("📈 Loading Financial Sentiment Analysis Model...\n")

# Using a financial domain-specific model
model_name = "ProsusAI/finbert"

try:
    # Load tokenizer and model
    print(f"🔄 Loading {model_name}...")
    
    # Using pipeline for easy inference
    finbert = pipeline(
        "sentiment-analysis",
        model=model_name,
        tokenizer=model_name,
        device=0 if torch.cuda.is_available() else -1
    )
    
    print("✅ FinBERT loaded successfully!")
    
    # Test with financial texts
    financial_texts = [
        "The company reported strong quarterly earnings with revenue up 15%",
        "Stock prices fell dramatically after the disappointing earnings report",
        "The market remained stable with mixed trading volumes",
        "Apple announced a new product launch that exceeded expectations",
        "The Federal Reserve raised interest rates by 0.5%"
    ]
    
    print("\n🔍 Analyzing Financial Sentiment:")
    print("="*60)
    
    for i, text in enumerate(financial_texts, 1):
        result = finbert(text)
        sentiment = result[0]['label']
        confidence = result[0]['score']
        
        print(f"\n{i}. Text: {text}")
        print(f"   Sentiment: {sentiment} (confidence: {confidence:.3f})")
        
        # Add emoji for visualization
        if sentiment == 'positive':
            emoji = "📈 🟢"
        elif sentiment == 'negative':
            emoji = "📉 🔴"
        else:
            emoji = "➡️ 🟡"
        
        print(f"   {emoji}")
    
    print("\n✨ FinBERT Analysis Complete!")
    
except Exception as e:
    print(f"❌ Error loading FinBERT: {e}")
    print("\n🔄 Falling back to general sentiment model...")
    
    # Fallback to a smaller, general sentiment model
    try:
        general_sentiment = pipeline(
            "sentiment-analysis",
            model="cardiffnlp/twitter-roberta-base-sentiment-latest",
            device=0 if torch.cuda.is_available() else -1
        )
        
        print("✅ General sentiment model loaded!")
        
        # Test with one example
        test_text = "The company reported strong quarterly earnings"
        result = general_sentiment(test_text)
        print(f"\nTest: {test_text}")
        print(f"Result: {result}")
        
    except Exception as e2:
        print(f"❌ Fallback also failed: {e2}")
        print("   This might be due to network issues or insufficient memory")
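
The `label` and `score` fields returned by a classification pipeline come from a softmax over the model's raw output scores (logits), followed by picking the most probable class. The sketch below reproduces that final step in plain Python. The logits and the label order are invented for illustration and are not taken from FinBERT's configuration.

```python
import math
from typing import Dict, List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def to_prediction(logits: Sequence[float], labels: Sequence[str]) -> Dict[str, object]:
    """Pick the highest-probability class, mirroring the pipeline's output shape."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}

# Hypothetical label order and logits; a real model defines its labels in its config.
example_labels = ["positive", "negative", "neutral"]
example_logits = [2.1, -0.7, 0.3]

print(to_prediction(example_logits, example_labels))
```

This is why the confidence values printed above always fall between 0 and 1: they are probabilities over the model's label set, not calibrated measures of correctness.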

## 6. Working with Different Model Types

Let's explore different types of models commonly used in finance:

In [None]:
# Demonstrate different model types
print("🎯 Exploring Different Model Types for Finance\n")

# 1. Classification Model (Already done above with FinBERT)
print("1️⃣ Classification Models:")
print("   ✅ FinBERT (financial sentiment) - demonstrated above")
print("   • Use case: Sentiment analysis, document classification")
print("   • Output: Probability scores for predefined classes\n")

# 2. Feature Extraction Model
print("2️⃣ Feature Extraction Models:")
try:
    # Load a lightweight model for feature extraction
    feature_model_name = "sentence-transformers/all-MiniLM-L6-v2"
    feature_extractor = pipeline(
        "feature-extraction",
        model=feature_model_name,
        device=0 if torch.cuda.is_available() else -1
    )
    
    # Extract features from financial text
    financial_text = "The Federal Reserve announced an interest rate cut"
    features = feature_extractor(financial_text)
    
    print(f"   ✅ Loaded: {feature_model_name}")
    print(f"   • Text: {financial_text}")
    print(f"   • Feature shape: {len(features[0])} tokens x {len(features[0][0])} dimensions")
    print(f"   • Use case: Document similarity, semantic search, clustering\n")
    
except Exception as e:
    print(f"   ❌ Could not load feature extraction model: {e}\n")

# 3. Text Generation Model (Lightweight)
print("3️⃣ Text Generation Models:")
try:
    # Use a small GPT-2 model for demonstration
    generator = pipeline(
        "text-generation",
        model="gpt2",
        device=0 if torch.cuda.is_available() else -1,
        pad_token_id=50256  # GPT-2 doesn't have a pad token
    )
    
    # Generate financial text
    prompt = "The stock market today"
    generated = generator(
        prompt,
        max_length=50,
        num_return_sequences=1,
        temperature=0.7,
        do_sample=True,
        pad_token_id=50256
    )
    
    print(f"   ✅ Loaded: GPT-2")
    print(f"   • Prompt: {prompt}")
    print(f"   • Generated: {generated[0]['generated_text']}")
    print(f"   • Use case: Report generation, content creation\n")
    
except Exception as e:
    print(f"   ❌ Could not load text generation model: {e}\n")

# 4. Question Answering Model
print("4️⃣ Question Answering Models:")
try:
    qa_pipeline = pipeline(
        "question-answering",
        model="distilbert-base-cased-distilled-squad",
        device=0 if torch.cuda.is_available() else -1
    )
    
    # Financial Q&A example
    context = """
    Apple Inc. reported quarterly revenue of $81.4 billion, up 8% year over year. 
    The company's iPhone sales were particularly strong, contributing $51.3 billion 
    to total revenue. Services revenue reached $19.2 billion, marking a 12% increase.
    """
    
    question = "What was Apple's total quarterly revenue?"
    
    answer = qa_pipeline(question=question, context=context)
    
    print(f"   ✅ Loaded: DistilBERT QA")
    print(f"   • Question: {question}")
    print(f"   • Answer: {answer['answer']} (confidence: {answer['score']:.3f})")
    print(f"   • Use case: Document Q&A, information extraction\n")
    
except Exception as e:
    print(f"   ❌ Could not load QA model: {e}\n")

print("🎉 Model exploration complete!")
print("\n💡 Key takeaways:")
print("   • Different models for different tasks")
print("   • Start with smaller models for testing")
print("   • Use pipelines for quick prototyping")
print("   • Consider domain-specific models for finance")
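
The feature-extraction pipeline returns one vector per token, not one per text. For document similarity or semantic search, a common approach is to average the token vectors (mean pooling) and compare the results with cosine similarity. The sketch below shows both steps on tiny made-up vectors that stand in for real model output; the function names are hypothetical.

```python
import math
from typing import List, Sequence

def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into a single fixed-size text embedding."""
    n_tokens = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n_tokens for i in range(dim)]

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional token vectors standing in for pipeline output
# (real all-MiniLM-L6-v2 vectors have 384 dimensions).
text_a_tokens = [[1.0, 0.0, 1.0], [3.0, 0.0, 1.0]]
text_b_tokens = [[2.0, 0.0, 1.0]]
text_c_tokens = [[0.0, 5.0, 0.0]]

emb_a, emb_b, emb_c = (mean_pool(t) for t in (text_a_tokens, text_b_tokens, text_c_tokens))
print(f"a vs b: {cosine_similarity(emb_a, emb_b):.3f}")
print(f"a vs c: {cosine_similarity(emb_a, emb_c):.3f}")
```

With the pipeline above, the token vectors for a single text would be `features[0]`, so `mean_pool(features[0])` yields one embedding per text. Libraries such as sentence-transformers also account for padding when pooling batches, which this sketch leaves out.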

## 7. Model Caching and Optimization

Understanding how models are cached and optimized is important for efficient development:

In [None]:
# Understanding model caching and optimization
import transformers
from pathlib import Path

print("💾 Model Caching and Optimization\n")

# 1. Check cache directory
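# TRANSFORMERS_CACHE is deprecated in recent transformers releases;
# prefer the hub cache constant and fall back for older installs.
try:
    from huggingface_hub.constants import HF_HUB_CACHE as cache_dir
except ImportError:
    cache_dir = transformers.utils.hub.TRANSFORMERS_CACHE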
print(f"🗂️ HuggingFace Cache Directory:")
print(f"   Location: {cache_dir}")

# Check cache size
try:
    cache_path = Path(cache_dir)
    if cache_path.exists():
        cache_size = sum(f.stat().st_size for f in cache_path.rglob('*') if f.is_file())
        cache_size_gb = cache_size / (1024**3)
        print(f"   Size: {cache_size_gb:.2f} GB")
        
        # List some cached models (hub cache folders are named 'models--<org>--<name>')
        cached_models = [d.name for d in cache_path.iterdir()
                         if d.is_dir() and d.name.startswith("models--")]
        if cached_models:
            print(f"   Cached models: {len(cached_models)} items")
            print(f"   First 3: {cached_models[:3]}..." if len(cached_models) > 3 else f"   All: {cached_models}")
    else:
        print("   Cache directory not found (no models cached yet)")
except Exception as e:
    print(f"   Could not analyze cache: {e}")

print("\n⚡ Optimization Strategies:")

# 2. Model quantization example
print("\n1️⃣ Model Quantization (Reducing Memory):")
print("   • 8-bit quantization: ~50% memory reduction vs. float16 (~75% vs. float32)")
print("   • 4-bit quantization: ~75% memory reduction vs. float16")
print("   • Trade-off: Slight accuracy loss for much lower memory")

# 3. Efficient model loading
print("\n2️⃣ Efficient Loading Strategies:")
strategies = {
    "device_map='auto'": "Automatically distribute model across available devices",
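    "torch_dtype=torch.float16": "Use half precision (~50% less memory than float32)",
    "low_cpu_mem_usage=True": "Reduce peak CPU RAM during loading",
    "load_in_8bit=True": "Load model in 8-bit precision (needs bitsandbytes; newer releases prefer quantization_config=BitsAndBytesConfig(load_in_8bit=True))",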
    "offload_folder='./offload'": "Offload weights to disk when needed"
}

for strategy, description in strategies.items():
    print(f"   • {strategy}: {description}")

# 4. Practical example of optimized loading
print("\n3️⃣ Optimized Loading Example:")
print("```python")
print("# Memory-efficient model loading")
print("model = AutoModelForCausalLM.from_pretrained(")
print("    'microsoft/DialoGPT-large',")
print("    torch_dtype=torch.float16,  # Half precision")
print("    device_map='auto',          # Auto device placement")
print("    low_cpu_mem_usage=True,     # Lower peak RAM while loading")
print(")")
print("```")

print("\n🔧 Cache Management:")
print("   • Models are automatically cached on first download")
print("   • Subsequent loads are much faster")
print("   • Clear cache if disk space is limited")
print("   • Use offline mode when internet is unavailable")

print("\n📊 Memory Usage Tips:")
memory_tips = [
    "Monitor GPU/RAM usage during inference",
    "Use smaller batch sizes if memory is limited",
    "Clear model from memory when switching models",
    "Consider model distillation for production",
    "Use gradient checkpointing for training"
]

for i, tip in enumerate(memory_tips, 1):
    print(f"   {i}. {tip}")

# 5. Check current memory usage
if torch.cuda.is_available():
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / (1024**3)
    gpu_allocated = torch.cuda.memory_allocated() / (1024**3)
    gpu_reserved = torch.cuda.memory_reserved() / (1024**3)
    
    print(f"\n🖥️ GPU Memory Status:")
    print(f"   Total: {gpu_memory:.1f} GB")
    print(f"   Allocated: {gpu_allocated:.2f} GB")
    print(f"   Reserved: {gpu_reserved:.2f} GB")
    print(f"   Available: {gpu_memory - gpu_reserved:.1f} GB")

print("\n💡 Next steps:")
print("   • Experiment with different model sizes")
print("   • Test optimization strategies")
print("   • Monitor performance vs. accuracy trade-offs")
print("   • Set up API connections for larger models")
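
To see which models are already cached, it helps to know that the hub cache stores each model in a folder named `models--<org>--<name>`. The sketch below maps those folder names back to repository ids. It runs against a throwaway directory that imitates the layout, so no real cache is touched; the helper name is hypothetical, and a repository name that itself contains `--` would be mis-split.

```python
import tempfile
from pathlib import Path
from typing import List

def list_cached_models(cache_dir: Path) -> List[str]:
    """Map folders like 'models--ProsusAI--finbert' back to repo ids."""
    prefix = "models--"
    repo_ids = []
    for entry in sorted(cache_dir.iterdir()):
        if entry.is_dir() and entry.name.startswith(prefix):
            repo_ids.append(entry.name[len(prefix):].replace("--", "/"))
    return repo_ids

# Demonstrate on a temporary directory that mimics the cache layout.
with tempfile.TemporaryDirectory() as tmp:
    fake_cache = Path(tmp)
    for folder in ("models--ProsusAI--finbert", "models--gpt2", ".locks"):
        (fake_cache / folder).mkdir()
    demo_models = list_cached_models(fake_cache)

print(demo_models)
```

For a fuller report that includes sizes and revisions, `huggingface_hub` provides `scan_cache_dir()`. Once a model is cached, passing `local_files_only=True` to `from_pretrained` loads it without contacting the Hub.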

## 8. Practical Exercise: Financial Text Analysis

Let's put it all together with a comprehensive financial text analysis:

In [None]:
# Comprehensive financial text analysis pipeline
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

print("📊 Financial Text Analysis Pipeline\n")

# Sample financial news/reports
financial_texts = [
    "Apple Inc. reported record quarterly revenue of $123.9 billion, exceeding analyst expectations.",
    "Tesla stock plummeted 15% following disappointing delivery numbers for Q3.",
    "The Federal Reserve maintained interest rates at current levels, signaling a cautious approach.",
    "Microsoft Azure cloud services showed strong growth with 50% year-over-year increase.",
    "Oil prices surged to $85 per barrel amid supply chain disruptions in the Middle East.",
    "The unemployment rate fell to 3.5%, indicating a robust labor market recovery.",
    "Cryptocurrency markets experienced volatility with Bitcoin dropping below $30,000.",
    "Inflation concerns continue to weigh on consumer sentiment and spending patterns.",
    "Goldman Sachs upgraded its outlook for emerging markets in the upcoming quarter.",
    "Tech IPOs showed mixed results with some companies struggling to meet initial valuations."
]

# Create DataFrame
df = pd.DataFrame({
    'text': financial_texts,
    'id': range(1, len(financial_texts) + 1)
})

print(f"📈 Analyzing {len(financial_texts)} financial texts...\n")

# Try to load sentiment model (with fallback)
sentiment_model = None
try:
    # Try financial-specific model first
    sentiment_model = pipeline(
        "sentiment-analysis",
        model="ProsusAI/finbert",
        device=0 if torch.cuda.is_available() else -1
    )
    model_name = "FinBERT"
    print("✅ Using FinBERT for sentiment analysis")
except Exception:
    try:
        # Fallback to general model
        sentiment_model = pipeline(
            "sentiment-analysis",
            model="cardiffnlp/twitter-roberta-base-sentiment-latest",
            device=0 if torch.cuda.is_available() else -1
        )
        model_name = "RoBERTa"
        print("✅ Using RoBERTa for sentiment analysis")
    except Exception:
        print("❌ Could not load sentiment model")

# Perform sentiment analysis
if sentiment_model:
    print("\n🔍 Performing sentiment analysis...")
    
    sentiments = []
    confidences = []
    
    for text in df['text']:
        try:
            result = sentiment_model(text)
            sentiment = result[0]['label'].lower()
            confidence = result[0]['score']
            
            # Normalize sentiment labels
            if 'pos' in sentiment or sentiment == 'positive':
                sentiment = 'positive'
            elif 'neg' in sentiment or sentiment == 'negative':
                sentiment = 'negative'
            else:
                sentiment = 'neutral'
            
            sentiments.append(sentiment)
            confidences.append(confidence)
        except Exception as e:
            print(f"   Error processing text: {e}")
            sentiments.append('neutral')
            confidences.append(0.5)
    
    # Add results to DataFrame
    df['sentiment'] = sentiments
    df['confidence'] = confidences
    
    # Display results
    print("\n📋 Analysis Results:")
    print("="*80)
    
    for idx, row in df.iterrows():
        # Add emoji for sentiment
        if row['sentiment'] == 'positive':
            emoji = "📈"
        elif row['sentiment'] == 'negative':
            emoji = "📉"
        else:
            emoji = "➡️"
        
        print(f"\n{row['id']:2d}. {row['text'][:60]}...")
        print(f"    {emoji} {row['sentiment'].title()} (confidence: {row['confidence']:.3f})")
    
    # Create visualization
    plt.figure(figsize=(12, 6))
    
    # Subplot 1: Sentiment distribution
    plt.subplot(1, 2, 1)
    sentiment_counts = df['sentiment'].value_counts()
    colors = {'positive': 'green', 'negative': 'red', 'neutral': 'orange'}
    bars = plt.bar(sentiment_counts.index, sentiment_counts.values, 
                   color=[colors.get(x, 'blue') for x in sentiment_counts.index])
    plt.title(f'Sentiment Distribution\n({model_name} Analysis)', fontsize=12)
    plt.ylabel('Number of Texts')
    plt.xlabel('Sentiment')
    
    # Add value labels on bars
    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2., height + 0.1,
                f'{int(height)}', ha='center', va='bottom')
    
    # Subplot 2: Confidence distribution
    plt.subplot(1, 2, 2)
    plt.hist(df['confidence'], bins=10, alpha=0.7, color='skyblue', edgecolor='black')
    plt.title('Confidence Score Distribution', fontsize=12)
    plt.xlabel('Confidence Score')
    plt.ylabel('Frequency')
    plt.axvline(df['confidence'].mean(), color='red', linestyle='--', 
                label=f'Mean: {df["confidence"].mean():.3f}')
    plt.legend()
    
    plt.tight_layout()
    plt.savefig('financial_sentiment_analysis.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    # Summary statistics
    print("\n📊 Summary Statistics:")
    print(f"   Total texts analyzed: {len(df)}")
    print(f"   Positive sentiment: {sentiment_counts.get('positive', 0)} ({sentiment_counts.get('positive', 0)/len(df)*100:.1f}%)")
    print(f"   Negative sentiment: {sentiment_counts.get('negative', 0)} ({sentiment_counts.get('negative', 0)/len(df)*100:.1f}%)")
    print(f"   Neutral sentiment: {sentiment_counts.get('neutral', 0)} ({sentiment_counts.get('neutral', 0)/len(df)*100:.1f}%)")
    print(f"   Average confidence: {df['confidence'].mean():.3f}")
    print(f"   Model used: {model_name}")
    
    # Save results
    df.to_csv('financial_sentiment_results.csv', index=False)
    print("\n💾 Results saved to 'financial_sentiment_results.csv'")
    
else:
    print("❌ Sentiment analysis skipped due to model loading issues")

print("\n🎉 Financial text analysis complete!")
print("\n💡 This demonstrates:")
print("   • Loading and using pre-trained models")
print("   • Processing multiple texts efficiently")
print("   • Handling errors and fallbacks gracefully")
print("   • Visualizing and saving results")
print("   • Real-world application to financial data")
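
The loop above sends one text at a time to the model. For larger collections, grouping texts into batches usually reduces overhead. The sketch below shows the batching logic with a stand-in classifier so it runs without downloading a model; `analyze_in_batches`, `chunked`, and `keyword_classifier` are hypothetical names, and the stand-in is a trivial keyword rule, not a real sentiment model.

```python
from typing import Callable, Dict, Iterator, List, Sequence

Prediction = Dict[str, object]

def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def analyze_in_batches(
    texts: Sequence[str],
    classify: Callable[[List[str]], List[Prediction]],
    batch_size: int = 4,
) -> List[Prediction]:
    """Run `classify` on batches of texts and return results in input order."""
    results: List[Prediction] = []
    for batch in chunked(texts, batch_size):
        results.extend(classify(list(batch)))
    return results

def keyword_classifier(batch: List[str]) -> List[Prediction]:
    """Stand-in for a model: a keyword rule with a placeholder score."""
    return [
        {"label": "positive" if "strong" in text else "neutral", "score": 1.0}
        for text in batch
    ]

sample_texts = ["strong growth", "rates unchanged", "strong earnings"]
for text, pred in zip(sample_texts, analyze_in_batches(sample_texts, keyword_classifier, 2)):
    print(f"{text!r}: {pred['label']}")
```

A loaded pipeline can be passed as `classify`, since calling it with a list of strings returns a list of result dictionaries. Pipelines also accept a `batch_size` argument directly, for example `sentiment_model(list(df['text']), batch_size=8)`.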

## 9. Summary and Next Steps

### ✅ What We've Accomplished:

1. **HuggingFace Setup**: Imported and configured the transformers library
2. **Authentication**: Set up HuggingFace Hub access (optional)
3. **Model Understanding**: Learned about different model sizes and requirements
4. **Practical Application**: Used financial sentiment analysis models
5. **Model Types**: Explored classification, generation, and Q&A models
6. **Optimization**: Understood caching and memory management
7. **End-to-End Pipeline**: Built a complete financial text analysis workflow

### 🔧 Key Technical Skills:

- Using HuggingFace `pipeline()` for quick prototyping
- Loading models with memory optimization
- Understanding model caching and storage
- Error handling and fallback strategies
- Processing multiple texts efficiently
- Visualizing and saving results

### 📈 Financial Applications:

- **Sentiment Analysis**: News, earnings calls, social media
- **Document Classification**: Research reports, regulatory filings
- **Information Extraction**: Key metrics from financial documents
- **Text Generation**: Report summaries, investment insights
- **Question Answering**: Automated analysis of financial documents

### 🚀 Next Steps:

1. **API Integration**: Connect to cloud-based LLMs (OpenAI, DeepSeek)
2. **Advanced Techniques**: Fine-tuning, custom models
3. **Production Deployment**: Scaling and optimization
4. **Domain Adaptation**: Financial-specific model training
5. **Integration**: Combining local models with API-based models

### 🛠️ Troubleshooting Tips:

- **Memory Issues**: Use smaller models, enable quantization
- **Network Problems**: Work offline with cached models
- **Performance**: Use GPU if available, optimize batch sizes
- **Accuracy**: Try domain-specific models (FinBERT vs. general BERT)

### 📚 Additional Resources:

- [HuggingFace Course](https://huggingface.co/course/)
- [Transformers Documentation](https://huggingface.co/docs/transformers/)
- [Financial NLP Papers](https://paperswithcode.com/task/financial-sentiment-analysis)
- [Model Hub](https://huggingface.co/models?pipeline_tag=text-classification&sort=downloads)

**You're now ready to work with local LLMs and integrate them with API-based solutions!**