# Modern LLM Fiber Bundle Analysis

This notebook demonstrates the fiber bundle hypothesis test framework applied to modern transformer language models and large-scale text datasets.

## Features Demonstrated

1. **Modern LLM Support**: BERT, RoBERTa, DeBERTa, GPT-2, Llama, Sentence Transformers
2. **Large-Scale Datasets**: Wikipedia, HuggingFace datasets, Common Crawl
3. **Scalable Processing**: Memory management, checkpointing, distributed computing
4. **Advanced Analysis**: Multi-model comparison, token-level analysis
5. **Rich Visualizations**: Interactive plots and comprehensive analysis


In [None]:
# Setup and imports
import sys
import os
from pathlib import Path
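import warnings

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

# Add the repository's src directory to the import path
# (assumes the notebook is run from a subdirectory of the repo root)
sys.path.insert(0, str(Path.cwd().parent / 'src'))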

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from tqdm.auto import tqdm

# Set plotting style (the 'seaborn-v0_8' style name requires matplotlib >= 3.6)
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("🚀 Modern LLM Fiber Bundle Analysis Notebook")
print("=" * 50)


## 🦙 LLaMA-3.2-1B Integration

The framework integrates Meta's **LLaMA-3.2-1B** model for fiber bundle analysis. This integration provides:

- **1B parameters** - small relative to the 7B+ LLaMA variants
- **2048-dimensional embeddings** - the model's hidden-state size
- **Multi-domain analysis** - IMDB, Amazon, Rotten Tomatoes, SST2
- **57% rejection rate** - the fiber bundle hypothesis is rejected in a majority of tests, indicating frequent violations
- **Advanced visualizations** - 3D plots and interactive analysis
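
To make the rejection rate listed above concrete, here is a minimal sketch of how such a figure can be computed from per-test outcomes. The `rejection_rate` helper is hypothetical and not part of `fiber_bundle_test`, and the outcomes are synthetic values chosen for illustration; they are unrelated to the 57% result.

```python
def rejection_rate(rejected):
    """Return the fraction of tests in which the hypothesis was rejected.

    `rejected` is a sequence of booleans, one per test (True = rejected).
    Hypothetical helper for illustration only.
    """
    outcomes = [bool(r) for r in rejected]
    if not outcomes:
        raise ValueError("need at least one test outcome")
    return sum(outcomes) / len(outcomes)


# Synthetic outcomes for five hypothetical tests
outcomes = [True, False, True, True, False]
rate = rejection_rate(outcomes)
print(f"Rejection rate: {rate:.1%}")  # prints "Rejection rate: 60.0%"
```

A rate like this only says how often the hypothesis was rejected. What counts as a single test (a token, a sentence, a dataset) depends on how the framework is configured.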

Let's explore the capabilities...


In [None]:
# Test LLaMA-3.2-1B integration
from fiber_bundle_test.embeddings.modern_llms import ModernLLMExtractor

# Show available LLaMA models (case-insensitive match on the alias)
models = ModernLLMExtractor.list_popular_models()
llama_models = {k: v for k, v in models.items() if 'llama' in k.lower()}

print("🦙 Available LLaMA Models:")
print("-" * 40)
for alias, full_name in llama_models.items():
    # Label the two Llama 3.2 sizes explicitly; anything else is treated as 7B or larger
    if "3.2-1B" in full_name:
        size = "1B"
    elif "3.2-3B" in full_name:
        size = "3B"
    else:
        size = "7B+"
    print(f"{alias:<12} -> {full_name:<35} ({size})")

# Plain strings: no f-prefix needed because nothing is interpolated
print("\n✨ Key Features:")
print("  • LLaMA-3.2-1B: Most efficient option (1B parameters)")
print("  • Lower memory requirements than larger models")
print("  • Recent architecture improvements")
print("  • Well suited to research and experimentation")
