# FinBERT Modular Profiling Notebook

This notebook provides a modular framework for profiling the baseline FinBERT model.

**Device Support:**
- **CUDA (NVIDIA GPUs)**: Full profiling with CPU and CUDA time tracking
- **MPS (Apple Silicon)**: CPU time profiling only (GPU execution not separately tracked)
- **CPU**: Standard CPU profiling


## Quick Start Guide

### How to Use This Modular Notebook

1. **Run Setup Cells (1-8)**: Execute all cells from the beginning through "Setup Paths" to load utilities and configuration.

2. **Configure Your Experiment (Cell 7)**: 
   - Set `SELECTED_VARIANT` to currently available model variants: `'baseline'` or `'fp16'`
   - Set `TRAIN_NEW_MODEL = True` if you want to train from scratch, or `False` to use existing model
   - Adjust `USE_GPU` and other settings as needed

3. **Optional: Train Model (Cell 9)**:
   - Only runs if `TRAIN_NEW_MODEL = True`
   - Trains with profiling on first epoch, then continues full training

4. **Load Model Variant (Cell 10)**:
   - Automatically loads the selected model variant (baseline or fp16)

5. **Profile Inference (Cell 11)**:
   - Runs profiled inference on the test text
   - Displays performance metrics and predictions

### Available Model Variants

- **baseline**: Standard FinBERT model (FP32, ~438MB)
- **fp16_int8**: INT8 post-training FP16 model (3-4x smaller, GPU-accelerated)
  - Requires: GPU with CUDA support
  - Benefits: Reduced memory footprint, faster inference
  - Trade-off: Slight accuracy reduction (~1-3%)

### Example Workflow

```python
# 1. Change configuration
SELECTED_VARIANT = 'fp16'  # or 'baseline'
TRAIN_NEW_MODEL = False

# 2. Run cells 8-11 to profile the model
```

### How to add a new model variant



## 1. Setup: Imports and Constants


In [1]:
# Core imports
from pathlib import Path
import shutil
import os
import logging
import sys
import time
from collections import defaultdict
sys.path.append('..')

# NLP & ML
from textblob import TextBlob
from pprint import pprint
from sklearn.metrics import classification_report
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import nltk

# FinBERT
from finbert.finbert import *
import finbert.utils as tools

# PyTorch
import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity
import torch.ao.quantization

# Data processing
import numpy as np
import pandas as pd

# Notebook utilities
%load_ext autoreload
%autoreload 2

# Global configuration
project_dir = Path.cwd().parent
pd.set_option('max_colwidth', None)

# Constants
LABEL_LIST = ['positive', 'negative', 'neutral']
LABEL_DICT = {0: 'positive', 1: 'negative', 2: 'neutral'}
BASE_TOKENIZER = 'bert-base-uncased'

print("✓ Imports loaded successfully")
print(f"✓ Project directory: {project_dir}")
print(f"✓ PyTorch version: {torch.__version__}")
print(f"✓ CUDA available: {torch.cuda.is_available()}")


✓ Imports loaded successfully
✓ Project directory: /home/tfs2123/finBERT
✓ PyTorch version: 2.9.1+cu128
✓ CUDA available: True


In [2]:
logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s -   %(message)s',
                    datefmt = '%m/%d/%Y %H:%M:%S',
                    level = logging.ERROR)


## 2. Helper Utilities


## 5.5 Enhanced Metrics Collection

Functions for comprehensive model comparison with statistical rigor.


In [3]:
def profile_inference_enhanced(model, text, variant_name="unknown", use_gpu=False, gpu_name='cuda:0', 
                               batch_size=5, num_runs=10, warmup_runs=2):
    """
    Enhanced inference profiling with detailed metrics for research comparison.
    
    Args:
        model: Model to profile
        text: Text to analyze
        variant_name: Name of the model variant
        use_gpu: Whether to use GPU
        gpu_name: GPU device name
        batch_size: Batch size for inference
        num_runs: Number of timing runs for statistical analysis
        warmup_runs: Number of warmup runs to discard
    
    Returns:
        results_df: DataFrame with predictions
        metrics: Dictionary with comprehensive performance metrics
    """
    from nltk.tokenize import sent_tokenize
    from finbert.utils import InputExample, convert_examples_to_features, softmax, chunks, get_device
    import numpy as np
    
    setup_nltk_data()
    model.eval()
    tokenizer = AutoTokenizer.from_pretrained(BASE_TOKENIZER)
    
    # Device selection
    if use_gpu:
        device = get_device(no_cuda=False)
        if device.type == "cuda" and gpu_name.startswith("cuda:"):
            device = torch.device(gpu_name)
    else:
        device = torch.device("cpu")
    
    print_device_info(device)
    
    # Check if model is fp16
    is_fp16 = hasattr(model, 'is_loaded_in_8bit') and model.is_loaded_in_8bit
    is_fp16 = is_fp16 or (hasattr(model, 'is_loaded_in_4bit') and model.is_loaded_in_4bit)
    
    # Move model if not fp16
    if not is_fp16:
        model = model.to(device)
    else:
        print(f"✓ Model already fp16 and placed on device")
    
    # Tokenize sentences
    sentences = sent_tokenize(text)
    examples = [InputExample(str(i), sentence) for i, sentence in enumerate(sentences)]
    features = convert_examples_to_features(examples, LABEL_LIST, 64, tokenizer)
    
    # Prepare tensors
    all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long).to(device)
    all_attention_mask = torch.tensor([f.attention_mask for f in features], dtype=torch.long).to(device)
    all_token_type_ids = torch.tensor([f.token_type_ids for f in features], dtype=torch.long).to(device)
    
    # Count actual tokens (non-padding)
    total_tokens = all_attention_mask.sum().item()
    
    # Reset peak memory stats
    if device.type == "cuda":
        torch.cuda.reset_peak_memory_stats(device)
    
    # Warmup runs
    print(f"\nRunning {warmup_runs} warmup iterations...")
    with torch.no_grad():
        for _ in range(warmup_runs):
            _ = model(input_ids=all_input_ids, attention_mask=all_attention_mask, 
                     token_type_ids=all_token_type_ids)[0]
    
    # Timed runs
    print(f"Running {num_runs} timed iterations...")
    inference_times = []
    
    with torch.no_grad():
        for run in range(num_runs):
            if device.type == "cuda":
                torch.cuda.synchronize(device)
            
            start_time = time.time()
            logits = model(input_ids=all_input_ids, attention_mask=all_attention_mask, 
                          token_type_ids=all_token_type_ids)[0]
            
            if device.type == "cuda":
                torch.cuda.synchronize(device)
            
            elapsed = time.time() - start_time
            inference_times.append(elapsed)
    
    # Get final logits for predictions
    with torch.no_grad():
        logits = model(input_ids=all_input_ids, attention_mask=all_attention_mask, 
                      token_type_ids=all_token_type_ids)[0]
        logits_np = softmax(np.array(logits.cpu()))
    
    # Memory metrics
    peak_memory_mb = 0
    if device.type == "cuda":
        peak_memory_mb = torch.cuda.max_memory_allocated(device) / 1024**2
    
    # Calculate statistics
    inference_times_ms = np.array(inference_times) * 1000
    
    metrics = {
        'variant': variant_name,
        'device': str(device),
        'is_fp16': is_fp16,
        'total_sentences': len(sentences),
        'total_tokens': total_tokens,
        'avg_tokens_per_sentence': total_tokens / len(sentences),
        
        # Latency metrics (milliseconds)
        'latency_mean_ms': float(np.mean(inference_times_ms)),
        'latency_std_ms': float(np.std(inference_times_ms)),
        'latency_min_ms': float(np.min(inference_times_ms)),
        'latency_max_ms': float(np.max(inference_times_ms)),
        'latency_p50_ms': float(np.percentile(inference_times_ms, 50)),
        'latency_p95_ms': float(np.percentile(inference_times_ms, 95)),
        'latency_p99_ms': float(np.percentile(inference_times_ms, 99)),
        
        # Throughput metrics
        'throughput_tokens_per_sec': total_tokens / np.mean(inference_times),
        'throughput_samples_per_sec': len(sentences) / np.mean(inference_times),
        'time_per_sentence_ms': (np.mean(inference_times) * 1000) / len(sentences),
        'time_per_token_ms': (np.mean(inference_times) * 1000) / total_tokens,
        
        # Memory metrics
        'peak_memory_mb': peak_memory_mb,
    }
    
    # Create results dataframe
    sentiment_score = pd.Series(logits_np[:, 0] - logits_np[:, 1])
    predictions = np.squeeze(np.argmax(logits_np, axis=1))
    
    results_df = pd.DataFrame({
        'sentence': sentences,
        'logit': list(logits_np),
        'prediction': [LABEL_DICT[p] for p in predictions],
        'sentiment_score': sentiment_score
    })
    
    print(f"\n{'='*80}")
    print(f"Enhanced Profiling Results - {variant_name}")
    print(f"{'='*80}")
    print(f"Latency (mean ± std): {metrics['latency_mean_ms']:.2f} ± {metrics['latency_std_ms']:.2f} ms")
    print(f"Latency (P50/P95/P99): {metrics['latency_p50_ms']:.2f} / {metrics['latency_p95_ms']:.2f} / {metrics['latency_p99_ms']:.2f} ms")
    print(f"Throughput: {metrics['throughput_tokens_per_sec']:.2f} tokens/sec, {metrics['throughput_samples_per_sec']:.2f} samples/sec")
    print(f"Peak GPU Memory: {metrics['peak_memory_mb']:.2f} MB")
    print(f"{'='*80}\n")
    
    return results_df, metrics


def evaluate_model_accuracy(model, finbert, test_data, variant_name="unknown"):
    """
    Evaluate model accuracy on test set.
    
    Args:
        model: Model to evaluate
        finbert: FinBert instance for evaluation utilities
        test_data: Test dataset
        variant_name: Name of the model variant
    
    Returns:
        Dictionary with accuracy metrics
    """
    from sklearn.metrics import precision_recall_fscore_support, accuracy_score
    
    print(f"\nEvaluating {variant_name} on test set...")
    
    # Run evaluation
    results = finbert.evaluate(examples=test_data, model=model)
    results['prediction'] = results.predictions.apply(lambda x: np.argmax(x, axis=0))
    
    # Calculate metrics
    y_true = results['labels']
    y_pred = results['prediction']
    
    accuracy = accuracy_score(y_true, y_pred)
    precision, recall, f1, support = precision_recall_fscore_support(
        y_true, y_pred, average=None, labels=[0, 1, 2]
    )
    precision_weighted, recall_weighted, f1_weighted, _ = precision_recall_fscore_support(
        y_true, y_pred, average='weighted'
    )
    
    # Calculate loss - handle GPU tensors and convert to CPU
    from torch.nn import CrossEntropyLoss
    try:
        # Convert predictions to numpy array first, then to tensor on CPU
        predictions_array = np.array([p for p in results['predictions']])
        labels_array = np.array(list(results['labels']))
        
        # Create tensors on CPU
        predictions_tensor = torch.tensor(predictions_array, dtype=torch.float32)
        labels_tensor = torch.tensor(labels_array, dtype=torch.long)
        
        # Ensure class weights are on CPU
        class_weights_cpu = finbert.class_weights.cpu() if finbert.class_weights.is_cuda else finbert.class_weights
        
        cs = CrossEntropyLoss(weight=class_weights_cpu)
        loss = cs(predictions_tensor, labels_tensor)
    except Exception as e:
        print(f"Warning: Could not compute loss ({e}). Using fallback calculation.")
        # Fallback: compute loss manually
        predictions_array = np.array([p for p in results['predictions']])
        labels_array = np.array(list(results['labels']))
        
        # Simple cross-entropy: -log(p_correct)
        correct_probs = predictions_array[np.arange(len(labels_array)), labels_array]
        loss = -np.mean(np.log(correct_probs + 1e-10))
        loss = torch.tensor(loss)
    
    metrics = {
        'variant': variant_name,
        'accuracy': float(accuracy),
        'loss': float(loss.item()),
        'precision_weighted': float(precision_weighted),
        'recall_weighted': float(recall_weighted),
        'f1_weighted': float(f1_weighted),
        
        # Per-class metrics
        'precision_positive': float(precision[0]),
        'precision_negative': float(precision[1]),
        'precision_neutral': float(precision[2]),
        
        'recall_positive': float(recall[0]),
        'recall_negative': float(recall[1]),
        'recall_neutral': float(recall[2]),
        
        'f1_positive': float(f1[0]),
        'f1_negative': float(f1[1]),
        'f1_neutral': float(f1[2]),
        
        'support_positive': int(support[0]),
        'support_negative': int(support[1]),
        'support_neutral': int(support[2]),
    }
    
    print(f"\n{'='*80}")
    print(f"Accuracy Metrics - {variant_name}")
    print(f"{'='*80}")
    print(f"Accuracy: {metrics['accuracy']:.4f}")
    print(f"Loss: {metrics['loss']:.4f}")
    print(f"Weighted F1: {metrics['f1_weighted']:.4f}")
    print(f"\nPer-class F1 scores:")
    print(f"  Positive: {metrics['f1_positive']:.4f}")
    print(f"  Negative: {metrics['f1_negative']:.4f}")
    print(f"  Neutral: {metrics['f1_neutral']:.4f}")
    print(f"{'='*80}\n")
    
    return metrics


print("✓ Enhanced metrics collection functions loaded")


✓ Enhanced metrics collection functions loaded


## 5.6 Comprehensive Comparison Experiment

Run full comparison between baseline and FP16 models.


In [4]:
def run_full_comparison(model_path, test_text, use_gpu=True, gpu_name='cuda:0', num_runs=10):
    """
    Run comprehensive comparison between baseline and FP16 models.
    
    Args:
        model_path: Path to trained model
        test_text: Text for inference profiling
        use_gpu: Whether to use GPU
        gpu_name: GPU device name
        num_runs: Number of timing iterations
    
    Returns:
        Dictionary with all comparison results
    """
    
    results = {
        'baseline': {},
        'fp16': {},
        'comparison': {}
    }
    
    print("\n" + "="*80)
    print("COMPREHENSIVE MODEL COMPARISON EXPERIMENT")
    print("="*80 + "\n")
    
    # =========================================================================
    # BASELINE MODEL
    # =========================================================================
    print("\n" + "="*80)
    print("PHASE 1: BASELINE MODEL (FP32)")
    print("="*80 + "\n")
    
    # Load baseline model
    print("Loading baseline model...")
    baseline_model = AutoModelForSequenceClassification.from_pretrained(
        model_path, cache_dir=None, num_labels=3
    )
    baseline_variant = MODEL_VARIANTS['baseline']
    baseline_variant.model = baseline_model
    baseline_variant.size_mb = get_model_size_mb(baseline_model)
    
    print(f"✓ Baseline model loaded")
    print(f"✓ Model size: {baseline_variant.size_mb:.2f} MB")
    
    # Run inference profiling
    print("\n--- Baseline Inference Profiling ---")
    baseline_results_df, baseline_perf = profile_inference_enhanced(
        baseline_model, test_text, variant_name='baseline',
        use_gpu=use_gpu, gpu_name=gpu_name, num_runs=num_runs
    )
    
    # Add model size to metrics
    baseline_perf['model_size_mb'] = baseline_variant.size_mb
    results['baseline']['performance'] = baseline_perf
    results['baseline']['predictions'] = baseline_results_df
    
    # Setup FinBERT for evaluation
    print("\n--- Baseline Accuracy Evaluation ---")
    config = create_finbert_config(model_path.parent, model_path.parent.parent / 'data' / 'sentiment_data', 
                                   baseline_model, num_train_epochs=4)
    finbert = FinBert(config)
    finbert.base_model = 'bert-base-uncased'
    finbert.prepare_model(label_list=LABEL_LIST)
    
    # Load test data
    test_data = finbert.get_data('test')
    
    # Run accuracy evaluation
    baseline_accuracy = evaluate_model_accuracy(baseline_model, finbert, test_data, variant_name='baseline')
    results['baseline']['accuracy'] = baseline_accuracy
    
    # Clean up
    del baseline_model
    if use_gpu and torch.cuda.is_available():
        torch.cuda.empty_cache()
    
    # =========================================================================
    # QUANTIZED MODEL
    # =========================================================================
    print("\n" + "="*80)
    print("PHASE 2: QUANTIZED MODEL (INT8)")
    print("="*80 + "\n")
    
    # Load FP16 model
    print("Loading FP16 model...")
    fp16_model, fp16_variant = load_model_variant('fp16', model_path)
    
    # Run inference profiling
    print("\n--- Quantized Inference Profiling ---")
    fp16_results_df, fp16_perf = profile_inference_enhanced(
        fp16_model, test_text, variant_name='fp16',
        use_gpu=use_gpu, gpu_name=gpu_name, num_runs=num_runs
    )
    
    # Add model size to metrics
    fp16_perf['model_size_mb'] = fp16_variant.size_mb
    results['fp16']['performance'] = fp16_perf
    results['fp16']['predictions'] = fp16_results_df
    
    # Run accuracy evaluation
    print("\n--- Quantized Accuracy Evaluation ---")
    fp16_accuracy = evaluate_model_accuracy(fp16_model, finbert, test_data, variant_name='fp16')
    results['fp16']['accuracy'] = fp16_accuracy
    
    # =========================================================================
    # COMPARISON ANALYSIS
    # =========================================================================
    print("\n" + "="*80)
    print("PHASE 3: COMPARISON ANALYSIS")
    print("="*80 + "\n")
    
    # Calculate speedup and compression ratios
    speedup_latency = baseline_perf['latency_mean_ms'] / fp16_perf['latency_mean_ms']
    speedup_throughput = fp16_perf['throughput_tokens_per_sec'] / baseline_perf['throughput_tokens_per_sec']
    compression_ratio = baseline_perf['model_size_mb'] / fp16_perf['model_size_mb']
    accuracy_delta = fp16_accuracy['accuracy'] - baseline_accuracy['accuracy']
    f1_delta = fp16_accuracy['f1_weighted'] - baseline_accuracy['f1_weighted']
    
    results['comparison'] = {
        'speedup_latency': speedup_latency,
        'speedup_throughput': speedup_throughput,
        'compression_ratio': compression_ratio,
        'accuracy_delta': accuracy_delta,
        'accuracy_delta_pct': accuracy_delta * 100,
        'f1_delta': f1_delta,
        'f1_delta_pct': f1_delta * 100,
        'memory_reduction_mb': baseline_perf['peak_memory_mb'] - fp16_perf['peak_memory_mb'],
        'memory_reduction_pct': ((baseline_perf['peak_memory_mb'] - fp16_perf['peak_memory_mb']) / 
                                 baseline_perf['peak_memory_mb'] * 100) if baseline_perf['peak_memory_mb'] > 0 else 0,
    }
    
    print(f"Speedup (latency): {speedup_latency:.2f}x")
    print(f"Speedup (throughput): {speedup_throughput:.2f}x")
    print(f"Compression ratio: {compression_ratio:.2f}x")
    print(f"Accuracy delta: {accuracy_delta:+.4f} ({accuracy_delta*100:+.2f}%)")
    print(f"F1 delta: {f1_delta:+.4f} ({f1_delta*100:+.2f}%)")
    print(f"Peak memory reduction: {results['comparison']['memory_reduction_mb']:.2f} MB ({results['comparison']['memory_reduction_pct']:.2f}%)")
    
    print("\n" + "="*80)
    print("COMPARISON EXPERIMENT COMPLETE")
    print("="*80 + "\n")
    
    return results


print("✓ Comprehensive comparison function loaded")


✓ Comprehensive comparison function loaded


---

# COMPREHENSIVE COMPARISON EXPERIMENT

Execute full baseline vs fp16 comparison for research analysis.

---

## Experiment Execution

Run comprehensive comparison with the trained model.


In [5]:
# Run the comprehensive comparison experiment
# This cell will execute after paths are set up in the workflow section below

print("Ready to run comprehensive comparison after workflow setup...")


Ready to run comprehensive comparison after workflow setup...


In [6]:
def get_model_size_mb(model):
    """Calculate model size in MB"""
    param_size = 0
    for param in model.parameters():
        param_size += param.nelement() * param.element_size()
    buffer_size = 0
    for buffer in model.buffers():
        buffer_size += buffer.nelement() * buffer.element_size()
    
    size_mb = (param_size + buffer_size) / 1024**2
    return size_mb


def get_profiler_activities(device):
    """Get appropriate profiler activities based on device"""
    activities = [ProfilerActivity.CPU]
    if device.type == "cuda":
        activities.append(ProfilerActivity.CUDA)
    return activities


def print_profiler_results(prof, device, title="Profiling Results"):
    """Pretty print profiler results"""
    print(f"\n{'='*80}")
    print(f"{title}")
    print(f"{'='*80}\n")
    
    print("By CPU Time:")
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))
    
    if device.type == "cuda":
        print("\nBy CUDA Time:")
        print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
    
    print(f"\n{'='*80}\n")


def setup_nltk_data():
    """Download necessary NLTK data"""
    try:
        nltk.download('punkt', quiet=True)
        nltk.download('punkt_tab', quiet=True)
    except:
        pass


def print_device_info(device):
    """Print device information"""
    print(f"\n{'='*80}")
    print(f"Device: {device}")
    if device.type == "cuda":
        print(f"GPU Name: {torch.cuda.get_device_name(device)}")
        print(f"GPU Memory: {torch.cuda.get_device_properties(device).total_memory / 1024**3:.1f} GB")
    elif device.type == "mps":
        print("Note: MPS profiling shows CPU time only. Actual GPU execution time not separately tracked.")
    print(f"{'='*80}\n")


def quantize_int8_model(model, calibration_loader=None, device='cuda'):
    """
    Apply INT8 post-training quantization using TorchAO for GPU inference.
    
    Args:
        model: The model to quantize
        calibration_loader: Optional data loader for calibration (not used for dynamic quantization)
        device: Device to run quantization on
    
    Returns:
        FP16 model ready for GPU inference
    """
    try:
        from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight
        
        # Move model to GPU for quantization
        model = model.to(device)
        model.eval()
        
        # Apply dynamic INT8 activation with INT4 weight quantization
        # This provides good compression with minimal accuracy loss
        quantize_(model, int8_dynamic_activation_int4_weight())
        
        print("✓ Applied TorchAO INT8 dynamic quantization")
        return model
        
    except ImportError:
        print("⚠ TorchAO not available, falling back to torch.ao.quantization")
        # Fallback to standard PyTorch dynamic quantization (CPU-optimized)
        fp16_model = torch.ao.quantization.quantize_dynamic(
            model.cpu(),
            {torch.nn.Linear},
            dtype=torch.qint8
        )
        print("✓ Applied torch.ao dynamic quantization (CPU-optimized)")
        return fp16_model.to(device)


print("✓ Helper utilities loaded")


✓ Helper utilities loaded


## 3. Data and Training Utilities


In [7]:
def setup_paths(project_dir):
    """Setup and return paths for model and data"""
    cl_path = project_dir / 'models' / 'sentiment'
    cl_data_path = project_dir / 'data' / 'sentiment_data'
    return cl_path, cl_data_path


def create_finbert_config(cl_path, cl_data_path, bert_model, **kwargs):
    """Create FinBERT configuration with defaults"""
    defaults = {
        'data_dir': cl_data_path,
        'bert_model': bert_model,
        'num_train_epochs': 6,
        'model_dir': cl_path,
        'max_seq_length': 48,
        'train_batch_size': 32,
        'learning_rate': 2e-5,
        'output_mode': 'classification',
        'warm_up_proportion': 0.2,
        'local_rank': -1,
        'discriminate': True,
        'gradual_unfreeze': True
    }
    defaults.update(kwargs)
    return Config(**defaults)


def initialize_finbert(config, base_model='bert-base-uncased', use_profiling=True):
    """Initialize FinBERT instance with configuration"""
    if use_profiling:
        finbert = ProfiledFinBert(config)
    else:
        finbert = FinBert(config)
    
    finbert.base_model = base_model
    finbert.config.discriminate = True
    finbert.config.gradual_unfreeze = True
    finbert.prepare_model(label_list=LABEL_LIST)
    
    return finbert


def train_model(finbert, train_data, model):
    """Train the model and return the trained model"""
    trained_model = finbert.train(train_examples=train_data, model=model)
    return trained_model


def evaluate_model(finbert, test_data, model):
    """Evaluate model on test data"""
    results = finbert.evaluate(examples=test_data, model=model)
    results['prediction'] = results.predictions.apply(lambda x: np.argmax(x, axis=0))
    return results


def generate_classification_report(results, finbert, cols=['labels', 'prediction', 'predictions']):
    """Generate and print classification report"""
    from torch.nn import CrossEntropyLoss
    
    cs = CrossEntropyLoss(weight=finbert.class_weights)
    loss = cs(torch.tensor(list(results[cols[2]])), torch.tensor(list(results[cols[0]])))
    
    accuracy = (results[cols[0]] == results[cols[1]]).sum() / results.shape[0]
    
    print(f"Loss: {loss:.2f}")
    print(f"Accuracy: {accuracy:.2f}")
    print("\nClassification Report:")
    print(classification_report(results[cols[0]], results[cols[1]]))
    
    return {'loss': loss.item(), 'accuracy': accuracy}


print("✓ Data and training utilities loaded")


✓ Data and training utilities loaded


## 4. Model Variant Registry

This section defines model variants for profiling, including baseline and optimized models.

### Half-Precision (FP16)

The notebook supports FP16 half-precision loading for GPU acceleration:

- **Method**: Native PyTorch `torch.float16` loading
- **Benefits**: ~2x memory reduction, ~2-3x faster inference on Tensor Cores (T4/V100/A100)
- **Trade-offs**: Minimal accuracy impact, requires GPU support


In [8]:
class ModelVariant:
    """Base class for model variants"""
    def __init__(self, name, description):
        self.name = name
        self.description = description
        self.model = None
        self.size_mb = None
        self.use_amp = False # Default to False
    
    def load_model(self, model_path):
        """Load model - to be implemented by subclasses"""
        raise NotImplementedError
    
    def get_info(self):
        """Get variant information"""
        return {
            'name': self.name,
            'description': self.description,
            'size_mb': self.size_mb
        }


class BaselineVariant(ModelVariant):
    """Standard FinBERT model without any optimization"""
    def __init__(self):
        super().__init__(
            name="baseline",
            description="Standard FinBERT model (FP32)"
        )
    
    def load_model(self, model_path):
        """Load baseline model"""
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_path, cache_dir=None, num_labels=3
        )
        self.size_mb = get_model_size_mb(self.model)
        return self.model


from transformers import AutoModelForSequenceClassification

class FP16Variant(ModelVariant):
    """Half-Precision (FP16) FinBERT model"""
    def __init__(self):
        super().__init__(
            name="fp16",
            description="FinBERT in Half-Precision (FP16) for GPU acceleration"
        )
    
    def load_model(self, model_path):
        """Load model in FP16"""
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_path,
            cache_dir=None,
            num_labels=3,
            torch_dtype=torch.float16
        )
        self.size_mb = get_model_size_mb(self.model)
        return self.model


class AMPVariant(ModelVariant):
    """FinBERT with Automatic Mixed Precision (AMP)"""
    def __init__(self):
        super().__init__(
            name="amp",
            description="FinBERT with Automatic Mixed Precision (AMP) - Optimal for Training & Inference"
        )
        self.use_amp = True
    
    def load_model(self, model_path):
        """Load baseline model (FP32) but configured for AMP"""
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_path,
            cache_dir=None,
            num_labels=3
        )
        self.size_mb = get_model_size_mb(self.model)
        return self.model



# Registry of available variants
MODEL_VARIANTS = {
    'baseline': BaselineVariant(),
    'fp16': FP16Variant(),
    'amp': AMPVariant()
}


def list_available_variants():
    """List all available model variants"""
    print("Available Model Variants:")
    print("=" * 80)
    for name, variant in MODEL_VARIANTS.items():
        print(f"\n{name}:")
        print(f"  Description: {variant.description}")
    print("\n" + "=" * 80)


def load_model_variant(variant_name, model_path):
    """Load a specific model variant"""
    if variant_name not in MODEL_VARIANTS:
        raise ValueError(f"Unknown variant: {variant_name}. Available: {list(MODEL_VARIANTS.keys())}")
    
    variant = MODEL_VARIANTS[variant_name]
    print(f"\nLoading model variant: {variant.name}")
    print(f"Description: {variant.description}")
    
    model = variant.load_model(model_path)
    
    print(f"✓ Model loaded successfully")
    print(f"✓ Model size: {variant.size_mb:.2f} MB")
    
    return model, variant


print("✓ Model variant registry loaded")
list_available_variants()


✓ Model variant registry loaded
Available Model Variants:

baseline:
  Description: Standard FinBERT model (FP32)

fp16:
  Description: FinBERT in Half-Precision (FP16) for GPU acceleration

amp:
  Description: FinBERT with Automatic Mixed Precision (AMP) - Optimal for Training & Inference



## 5. Profiling API

Generic profiling functions that work with any model variant.


In [9]:
def profile_inference(model, text, variant_name="unknown", use_gpu=False, gpu_name='cuda:0', batch_size=5, use_amp=False):
    """
    Profile model inference performance.
    
    Args:
        model: Model to profile
        text: Text to analyze
        variant_name: Name of the model variant
        use_gpu: Whether to use GPU
        gpu_name: GPU device name
        batch_size: Batch size for inference
        use_amp: Whether to use Automatic Mixed Precision (AMP)
    
    Returns:
        results_df: DataFrame with predictions
        metrics: Dictionary with performance metrics
    """
    from nltk.tokenize import sent_tokenize
    from finbert.utils import InputExample, convert_examples_to_features, softmax, chunks, get_device
    import torch.cuda.amp
    
    # Setup NLTK
    setup_nltk_data()
    
    model.eval()
    tokenizer = AutoTokenizer.from_pretrained(BASE_TOKENIZER)
    
    # Device selection
    if use_gpu:
        device = get_device(no_cuda=False)
        if device.type == "cuda" and gpu_name.startswith("cuda:"):
            device = torch.device(gpu_name)
    else:
        device = torch.device("cpu")
    
    print_device_info(device)
    
    # Check if model is already on device (e.g., BitsAndBytes FP16 models)
    is_fp16 = hasattr(model, 'is_loaded_in_8bit') and model.is_loaded_in_8bit
    is_fp16 = is_fp16 or (hasattr(model, 'is_loaded_in_4bit') and model.is_loaded_in_4bit)
    
    # Only move model if it's not already fp16 and placed
    if not is_fp16:
        model = model.to(device)
    else:
        print(f"✓ Model already fp16 and placed on device (skipping .to() call)")
        # For FP16 models, get the actual device from model
        if hasattr(model, 'device'):
            device = model.device
        elif hasattr(model, 'hf_device_map'):
            # BitsAndBytes models have device_map
            device = torch.device('cuda:0')  # Usually on cuda:0
    
    result = pd.DataFrame(columns=['sentence', 'logit', 'prediction', 'sentiment_score'])
    
    # Setup profiler
    activities = get_profiler_activities(device)
    
    total_inference_time = 0
    
    with profile(
        activities=activities,
        record_shapes=True,
        profile_memory=True,
        with_stack=False
    ) as prof:
        
        with record_function("sentence_tokenization"):
            sentences = sent_tokenize(text)
        
        for batch in chunks(sentences, batch_size):
            with record_function("create_examples"):
                examples = [InputExample(str(i), sentence) for i, sentence in enumerate(batch)]
            
            with record_function("convert_to_features"):
                features = convert_examples_to_features(examples, LABEL_LIST, 64, tokenizer)
            
            with record_function("prepare_tensors"):
                all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long).to(device)
                all_attention_mask = torch.tensor([f.attention_mask for f in features], dtype=torch.long).to(device)
                all_token_type_ids = torch.tensor([f.token_type_ids for f in features], dtype=torch.long).to(device)
            
            with torch.no_grad():
                # Remove the model_to_device profiling section for FP16 models
                if not is_fp16:
                    with record_function("model_to_device"):
                        model = model.to(device)
                
                with record_function("inference_forward"):
                    start_time = time.time()
                    with torch.cuda.amp.autocast(enabled=use_amp):
                         logits = model(input_ids=all_input_ids, attention_mask=all_attention_mask, token_type_ids=all_token_type_ids)[0]
                    total_inference_time += time.time() - start_time
                
                with record_function("postprocess_results"):
                    logits = softmax(np.array(logits.cpu()))
                    sentiment_score = pd.Series(logits[:, 0] - logits[:, 1])
                    predictions = np.squeeze(np.argmax(logits, axis=1))
                    
                    batch_result = {
                        'sentence': batch,
                        'logit': list(logits),
                        'prediction': predictions,
                        'sentiment_score': sentiment_score
                    }
                    
                    batch_result = pd.DataFrame(batch_result)
                    result = pd.concat([result, batch_result], ignore_index=True)
    
    # Print profiler results
    print_profiler_results(prof, device, title=f"Inference Profiling - {variant_name}")
    
    result['prediction'] = result.prediction.apply(lambda x: LABEL_DICT[x])
    
    metrics = {
        'variant': variant_name,
        'total_sentences': len(sentences),
        'inference_time_ms': total_inference_time * 1000,
        'time_per_sentence_ms': (total_inference_time * 1000) / len(sentences),
        'device': str(device),
        'is_fp16': is_fp16,
        'use_amp': use_amp
    }
    
    print(f"\nInference Summary:")
    print(f"  Total sentences: {metrics['total_sentences']}")
    print(f"  Total inference time: {metrics['inference_time_ms']:.2f} ms")
    print(f"  Time per sentence: {metrics['time_per_sentence_ms']:.2f} ms")
    if is_fp16:
        print(f"  ✓ FP16 model profiled successfully")
    if use_amp:
        print(f"  ✓ AMP enabled")
    
    return result, metrics

## 6. ProfiledFinBert Class (for Training)


Extended FinBert class with profiling instrumentation for training.


In [10]:
class ProfiledFinBert(FinBert):
    """Extended FinBert class with profiling instrumentation.
    
    Note: GPU-specific profiling (ProfilerActivity.CUDA) only works with NVIDIA CUDA devices.
    For MPS (Apple Silicon), only CPU profiling is available, though actual computation runs on GPU.
    """
    
    def __init__(self, config):
        super().__init__(config)
        self.profile_results = {}
    
    def train(self, train_examples, model):
        """
        Trains the model with profiling instrumentation.
        """
        validation_examples = self.get_data('validation')
        global_step = 0
        self.validation_losses = []
        
        # Training
        train_dataloader = self.get_loader(train_examples, 'train')
        model.train()
        step_number = len(train_dataloader)
        
        # Setup profiler - CUDA profiling only works with NVIDIA GPUs, not MPS
        activities = [ProfilerActivity.CPU]
        if self.device.type == "cuda":
            activities.append(ProfilerActivity.CUDA)
        
        print("\\n" + "="*80)
        print("Starting Profiled Training")
        print(f"Device: {self.device}")
        print(f"Profiling activities: {activities}")
        if self.device.type == "mps":
            print("Note: MPS profiling shows CPU time only. Actual GPU execution time not separately tracked.")
        print("="*80 + "\\n")
        
        i = 0
        
        with profile(
            activities=activities,
            record_shapes=True,
            profile_memory=True,
            with_stack=False
        ) as prof:
            
            for epoch in trange(int(self.config.num_train_epochs), desc="Epoch"):
                model.train()
                tr_loss = 0
                nb_tr_examples, nb_tr_steps = 0, 0
                
                for step, batch in enumerate(tqdm(train_dataloader, desc='Iteration')):
                    
                    # Gradual unfreezing logic
                    if (self.config.gradual_unfreeze and i == 0):
                        for param in model.bert.parameters():
                            param.requires_grad = False
                    
                    if (step % (step_number // 3)) == 0:
                        i += 1
                    
                    if (self.config.gradual_unfreeze and i > 1 and i < self.config.encoder_no):
                        for k in range(i - 1):
                            try:
                                for param in model.bert.encoder.layer[self.config.encoder_no - 1 - k].parameters():
                                    param.requires_grad = True
                            except:
                                pass
                    
                    if (self.config.gradual_unfreeze and i > self.config.encoder_no + 1):
                        for param in model.bert.embeddings.parameters():
                            param.requires_grad = True
                    
                    # Data loading profiling
                    with record_function("data_transfer"):
                        batch = tuple(t.to(self.device) for t in batch)
                        input_ids, attention_mask, token_type_ids, label_ids, agree_ids = batch
                    
                    # Forward pass profiling
                    with record_function("forward_pass"):
                        logits = model(input_ids, attention_mask, token_type_ids)[0]
                    
                    # Loss calculation profiling
                    with record_function("loss_calculation"):
                        weights = self.class_weights.to(self.device)
                        if self.config.output_mode == "classification":
                            loss_fct = CrossEntropyLoss(weight=weights)
                            loss = loss_fct(logits.view(-1, self.num_labels), label_ids.view(-1))
                        elif self.config.output_mode == "regression":
                            loss_fct = MSELoss()
                            loss = loss_fct(logits.view(-1), label_ids.view(-1))
                        
                        if self.config.gradient_accumulation_steps > 1:
                            loss = loss / self.config.gradient_accumulation_steps
                    
                    # Backward pass profiling
                    with record_function("backward_pass"):
                        loss.backward()
                    
                    tr_loss += loss.item()
                    nb_tr_examples += input_ids.size(0)
                    nb_tr_steps += 1
                    
                    # Optimizer step profiling
                    if (step + 1) % self.config.gradient_accumulation_steps == 0:
                        with record_function("optimizer_step"):
                            if self.config.fp16:
                                lr_this_step = self.config.learning_rate * warmup_linear(
                                    global_step / self.num_train_optimization_steps, self.config.warm_up_proportion)
                                for param_group in self.optimizer.param_groups:
                                    param_group['lr'] = lr_this_step
                            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
                            self.optimizer.step()
                            self.scheduler.step()
                            self.optimizer.zero_grad()
                            global_step += 1
                    
                    # Only profile first epoch to save time
                    if epoch == 0 and step >= 20:
                        break
                
                # Break after first epoch for profiling
                if epoch == 0:
                    print("\\n" + "="*80)
                    print("Profiling complete for first epoch (20 steps)")
                    print("Continuing full training without profiling...")
                    print("="*80 + "\\n")
                    break
        
        # Print profiler results
        print("\\n" + "="*80)
        print("PROFILING RESULTS - Training")
        print("="*80 + "\\n")
        
        print("\\nBy CPU Time:")
        print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))
        
        if self.device.type == "cuda":
            print("\\nBy CUDA Time:")
            print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
        
        print("\\n" + "="*80 + "\\n")
        
        # Store results
        self.profile_results['training'] = prof.key_averages()
        
        # Continue with full training without profiling
        for epoch in trange(int(self.config.num_train_epochs), desc="Epoch"):
            model.train()
            tr_loss = 0
            nb_tr_examples, nb_tr_steps = 0, 0
            
            for step, batch in enumerate(tqdm(train_dataloader, desc='Iteration')):
                
                if (self.config.gradual_unfreeze and i == 0):
                    for param in model.bert.parameters():
                        param.requires_grad = False
                
                if (step % (step_number // 3)) == 0:
                    i += 1
                
                if (self.config.gradual_unfreeze and i > 1 and i < self.config.encoder_no):
                    for k in range(i - 1):
                        try:
                            for param in model.bert.encoder.layer[self.config.encoder_no - 1 - k].parameters():
                                param.requires_grad = True
                        except:
                            pass
                
                if (self.config.gradual_unfreeze and i > self.config.encoder_no + 1):
                    for param in model.bert.embeddings.parameters():
                        param.requires_grad = True
                
                batch = tuple(t.to(self.device) for t in batch)
                input_ids, attention_mask, token_type_ids, label_ids, agree_ids = batch
                
                logits = model(input_ids, attention_mask, token_type_ids)[0]
                weights = self.class_weights.to(self.device)
                
                if self.config.output_mode == "classification":
                    loss_fct = CrossEntropyLoss(weight=weights)
                    loss = loss_fct(logits.view(-1, self.num_labels), label_ids.view(-1))
                elif self.config.output_mode == "regression":
                    loss_fct = MSELoss()
                    loss = loss_fct(logits.view(-1), label_ids.view(-1))
                
                if self.config.gradient_accumulation_steps > 1:
                    loss = loss / self.config.gradient_accumulation_steps
                else:
                    loss.backward()
                
                tr_loss += loss.item()
                nb_tr_examples += input_ids.size(0)
                nb_tr_steps += 1
                
                if (step + 1) % self.config.gradient_accumulation_steps == 0:
                    if self.config.fp16:
                        lr_this_step = self.config.learning_rate * warmup_linear(
                            global_step / self.num_train_optimization_steps, self.config.warm_up_proportion)
                        for param_group in self.optimizer.param_groups:
                            param_group['lr'] = lr_this_step
                    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
                    self.optimizer.step()
                    self.scheduler.step()
                    self.optimizer.zero_grad()
                    global_step += 1
            
            # Validation
            validation_loader = self.get_loader(validation_examples, phase='eval')
            model.eval()
            
            valid_loss, valid_accuracy = 0, 0
            nb_valid_steps, nb_valid_examples = 0, 0
            
            for input_ids, attention_mask, token_type_ids, label_ids, agree_ids in tqdm(validation_loader, desc="Validating"):
                input_ids = input_ids.to(self.device)
                attention_mask = attention_mask.to(self.device)
                token_type_ids = token_type_ids.to(self.device)
                label_ids = label_ids.to(self.device)
                agree_ids = agree_ids.to(self.device)
                
                with torch.no_grad():
                    logits = model(input_ids, attention_mask, token_type_ids)[0]
                    
                    if self.config.output_mode == "classification":
                        loss_fct = CrossEntropyLoss(weight=weights)
                        tmp_valid_loss = loss_fct(logits.view(-1, self.num_labels), label_ids.view(-1))
                    elif self.config.output_mode == "regression":
                        loss_fct = MSELoss()
                        tmp_valid_loss = loss_fct(logits.view(-1), label_ids.view(-1))
                    
                    valid_loss += tmp_valid_loss.mean().item()
                    nb_valid_steps += 1
            
            valid_loss = valid_loss / nb_valid_steps
            self.validation_losses.append(valid_loss)
            print("Validation losses: {}".format(self.validation_losses))
            
            if valid_loss == min(self.validation_losses):
                try:
                    os.remove(self.config.model_dir / ('temporary' + str(best_model)))
                except:
                    print('No best model found')
                torch.save({'epoch': str(epoch), 'state_dict': model.state_dict()},
                           self.config.model_dir / ('temporary' + str(epoch)))
                best_model = epoch
        
        # Save the trained model
        checkpoint = torch.load(self.config.model_dir / ('temporary' + str(best_model)))
        model.load_state_dict(checkpoint['state_dict'])
        model_to_save = model.module if hasattr(model, 'module') else model
        output_model_file = os.path.join(self.config.model_dir, WEIGHTS_NAME)
        torch.save(model_to_save.state_dict(), output_model_file)
        output_config_file = os.path.join(self.config.model_dir, CONFIG_NAME)
        with open(output_config_file, 'w') as f:
            f.write(model_to_save.config.to_json_string())
        os.remove(self.config.model_dir / ('temporary' + str(best_model)))
        
        return model

print("Profiling loaded")

Profiling loaded


---

# WORKFLOW SECTION

Below are the workflow cells that demonstrate how to use the modular system.

---

## 7. Configuration: Select Model Variant and Settings


In [11]:
# =============================================================================
# CONFIGURATION CELL - Modify these settings to change workflow behavior
# =============================================================================

# Choose which model variant to use
# Current Options: 'baseline', 'fp16'
SELECTED_VARIANT = 'amp'

# Whether to train a new model (True) or load existing model (False)
TRAIN_NEW_MODEL = True  # Set to False if you already have a trained model

# Number of training epochs (if training)
NUM_EPOCHS = 4

# GPU settings
USE_GPU = True
GPU_NAME = 'cuda:0'

# Test text for inference profiling
TEST_TEXT = """Later that day Apple said it was revising down its earnings expectations in \
the fourth quarter of 2018, largely because of lower sales and signs of economic weakness in China. \
The news rapidly infected financial markets. Apple's share price fell by around 7% in after-hours \
trading and the decline was extended to more than 10% when the market opened. The dollar fell \
by 3.7% against the yen in a matter of minutes after the announcement, before rapidly recovering \
some ground. Asian stockmarkets closed down on January 3rd and European ones opened lower. \
Yields on government bonds fell as investors fled to the traditional haven in a market storm."""

print(f"✓ Configuration loaded")
print(f"  Selected variant: {SELECTED_VARIANT}")
print(f"  Training mode: {'Train new model' if TRAIN_NEW_MODEL else 'Load existing model'}")
print(f"  GPU enabled: {USE_GPU}")


✓ Configuration loaded
  Selected variant: amp
  Training mode: Train new model
  GPU enabled: True


## 8. Setup Paths


In [12]:
# Setup paths using the utility function
cl_path, cl_data_path = setup_paths(project_dir)

print(f"✓ Paths configured:")
print(f"  Model path: {cl_path}")
print(f"  Data path: {cl_data_path}")


✓ Paths configured:
  Model path: /home/tfs2123/finBERT/models/sentiment
  Data path: /home/tfs2123/finBERT/data/sentiment_data


## 9. Training Workflow (Optional - Only if TRAIN_NEW_MODEL = True)

Skip this section if you already have a trained model.


In [13]:
if TRAIN_NEW_MODEL:
    print("="*80)
    print("TRAINING NEW MODEL")
    print("="*80)
    
    # Clean the model path
    try:
        shutil.rmtree(cl_path)
        print("✓ Cleaned previous model directory")
    except:
        print("✓ No previous model directory to clean")
    
    # Create base BERT model
    bertmodel = AutoModelForSequenceClassification.from_pretrained(
        'bert-base-uncased', cache_dir=None, num_labels=3
    )
    
    # Create configuration
    config = create_finbert_config(
        cl_path, cl_data_path, bertmodel, 
        num_train_epochs=NUM_EPOCHS
    )
    
    # Initialize FinBERT
    finbert = initialize_finbert(config, use_profiling=True)
    
    # Get training data
    print("\n✓ Loading training data...")
    train_data = finbert.get_data('train')
    
    # Create model
    print("✓ Creating model...")
    model = finbert.create_the_model()
    
    # Train with profiling
    print("\n✓ Starting training with profiling...")
    trained_model = train_model(finbert, train_data, model)
    
    print("\n" + "="*80)
    print("TRAINING COMPLETE")
    print("="*80)
else:
    print("Skipping training - will load existing model")


TRAINING NEW MODEL
✓ Cleaned previous model directory


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
12/11/2025 02:23:34 - INFO - finbert.finbert -   device: cuda n_gpu: 1, distributed training: False, 16-bits training: False



✓ Loading training data...
✓ Creating model...


12/11/2025 02:23:35 - INFO - finbert.utils -   *** Example ***
12/11/2025 02:23:35 - INFO - finbert.utils -   guid: train-1
12/11/2025 02:23:35 - INFO - finbert.utils -   tokens: [CLS] after the reporting period , bio ##tie north american licensing partner so ##max ##on pharmaceuticals announced positive results with na ##lm ##efe ##ne in a pilot phase 2 clinical trial for smoking ce ##ssa ##tion [SEP]
12/11/2025 02:23:35 - INFO - finbert.utils -   input_ids: 101 2044 1996 7316 2558 1010 16012 9515 2167 2137 13202 4256 2061 17848 2239 24797 2623 3893 3463 2007 6583 13728 27235 2638 1999 1037 4405 4403 1016 6612 3979 2005 9422 8292 11488 3508 102 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:23:35 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:23:35 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/


✓ Starting training with profiling...


12/11/2025 02:23:36 - INFO - finbert.finbert -   ***** Loading data *****
12/11/2025 02:23:36 - INFO - finbert.finbert -     Num examples = 3488
12/11/2025 02:23:36 - INFO - finbert.finbert -     Batch size = 32
12/11/2025 02:23:36 - INFO - finbert.finbert -     Num steps = 48


Starting Profiled Training
Device: cuda
Profiling activities: [<ProfilerActivity.CPU: 0>, <ProfilerActivity.CUDA: 2>]


Iteration:  18%|█▊        | 20/109 [00:02<00:08,  9.91it/s]
Epoch:   0%|          | 0/4 [00:02<?, ?it/s]


Profiling complete for first epoch (20 steps)
Continuing full training without profiling...
PROFILING RESULTS - Training
\nBy CPU Time:
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg       CPU Mem  Self CPU Mem      CUDA Mem  Self CUDA Mem    # of Calls  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                       loss_calculation         0.32%       6.439ms        56.20%   

Iteration: 100%|██████████| 109/109 [00:11<00:00,  9.76it/s]
12/11/2025 02:23:59 - INFO - finbert.utils -   *** Example ***
12/11/2025 02:23:59 - INFO - finbert.utils -   guid: validation-1
12/11/2025 02:23:59 - INFO - finbert.utils -   tokens: [CLS] our in - depth expertise extends to the fields of energy , industry , urban & mobility and water & environment [SEP]
12/11/2025 02:23:59 - INFO - finbert.utils -   input_ids: 101 2256 1999 1011 5995 11532 8908 2000 1996 4249 1997 2943 1010 3068 1010 3923 1004 12969 1998 2300 1004 4044 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:23:59 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:23:59 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:23:59 - INFO - finbert.utils -   label: neutral (id = 2)
12/11/2025 02:23:59 

Validation losses: [0.8864706066938547]
No best model found


Iteration: 100%|██████████| 109/109 [00:17<00:00,  6.37it/s]
12/11/2025 02:24:17 - INFO - finbert.utils -   *** Example ***
12/11/2025 02:24:17 - INFO - finbert.utils -   guid: validation-1
12/11/2025 02:24:17 - INFO - finbert.utils -   tokens: [CLS] our in - depth expertise extends to the fields of energy , industry , urban & mobility and water & environment [SEP]
12/11/2025 02:24:17 - INFO - finbert.utils -   input_ids: 101 2256 1999 1011 5995 11532 8908 2000 1996 4249 1997 2943 1010 3068 1010 3923 1004 12969 1998 2300 1004 4044 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:24:17 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:24:17 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:24:17 - INFO - finbert.utils -   label: neutral (id = 2)
12/11/2025 02:24:17 

Validation losses: [0.8864706066938547, 0.5778215527534485]


Iteration: 100%|██████████| 109/109 [00:23<00:00,  4.73it/s]
12/11/2025 02:24:42 - INFO - finbert.utils -   *** Example ***
12/11/2025 02:24:42 - INFO - finbert.utils -   guid: validation-1
12/11/2025 02:24:42 - INFO - finbert.utils -   tokens: [CLS] our in - depth expertise extends to the fields of energy , industry , urban & mobility and water & environment [SEP]
12/11/2025 02:24:42 - INFO - finbert.utils -   input_ids: 101 2256 1999 1011 5995 11532 8908 2000 1996 4249 1997 2943 1010 3068 1010 3923 1004 12969 1998 2300 1004 4044 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:24:42 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:24:42 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:24:42 - INFO - finbert.utils -   label: neutral (id = 2)
12/11/2025 02:24:42 

Validation losses: [0.8864706066938547, 0.5778215527534485, 0.4897470955665295]


Iteration: 100%|██████████| 109/109 [00:26<00:00,  4.08it/s]
12/11/2025 02:25:11 - INFO - finbert.utils -   *** Example ***
12/11/2025 02:25:11 - INFO - finbert.utils -   guid: validation-1
12/11/2025 02:25:11 - INFO - finbert.utils -   tokens: [CLS] our in - depth expertise extends to the fields of energy , industry , urban & mobility and water & environment [SEP]
12/11/2025 02:25:11 - INFO - finbert.utils -   input_ids: 101 2256 1999 1011 5995 11532 8908 2000 1996 4249 1997 2943 1010 3068 1010 3923 1004 12969 1998 2300 1004 4044 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:25:11 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:25:11 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:25:11 - INFO - finbert.utils -   label: neutral (id = 2)
12/11/2025 02:25:11 

Validation losses: [0.8864706066938547, 0.5778215527534485, 0.4897470955665295, 0.47151784942700314]


Epoch: 100%|██████████| 4/4 [01:24<00:00, 21.24s/it]



TRAINING COMPLETE


## 10. Load Model Variant

This cell loads the selected model variant using the modular registry.


In [14]:
# Load the selected model variant
model, variant = load_model_variant(SELECTED_VARIANT, cl_path)

print(f"\n✓ Ready for inference profiling with {SELECTED_VARIANT} variant")



Loading model variant: amp
Description: FinBERT with Automatic Mixed Precision (AMP) - Optimal for Training & Inference


✓ Model loaded successfully
✓ Model size: 417.66 MB

✓ Ready for inference profiling with amp variant


## 11. Profile Inference

Run inference profiling on the selected model variant.

### Profiling with Selected Variant


In [15]:
# Run profiled inference with the selected variant
results_df, metrics = profile_inference(
    model=model,
    text=TEST_TEXT,
    variant_name=SELECTED_VARIANT,
    use_gpu=USE_GPU,
    gpu_name=GPU_NAME
)

print("\n" + "="*80)
print("PREDICTION RESULTS")
print("="*80)
print(results_df[['sentence', 'prediction', 'sentiment_score']].to_string(index=False))
print("="*80)


12/11/2025 02:25:15 - INFO - finbert.utils -   *** Example ***
12/11/2025 02:25:15 - INFO - finbert.utils -   guid: 0
12/11/2025 02:25:15 - INFO - finbert.utils -   tokens: [CLS] later that day apple said it was rev ##ising down its earnings expectations in the fourth quarter of 2018 , largely because of lower sales and signs of economic weakness in china . [SEP]
12/11/2025 02:25:15 - INFO - finbert.utils -   input_ids: 101 2101 2008 2154 6207 2056 2009 2001 7065 9355 2091 2049 16565 10908 1999 1996 2959 4284 1997 2760 1010 4321 2138 1997 2896 4341 1998 5751 1997 3171 11251 1999 2859 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:25:15 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:25:15 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 


Device: cuda:0
GPU Name: Tesla T4
GPU Memory: 14.6 GB



  with torch.cuda.amp.autocast(enabled=use_amp):
12/11/2025 02:25:15 - INFO - finbert.utils -   *** Example ***
12/11/2025 02:25:15 - INFO - finbert.utils -   guid: 0
12/11/2025 02:25:15 - INFO - finbert.utils -   tokens: [CLS] yields on government bonds fell as investors fled to the traditional haven in a market storm . [SEP]
12/11/2025 02:25:15 - INFO - finbert.utils -   input_ids: 101 16189 2006 2231 9547 3062 2004 9387 6783 2000 1996 3151 4033 1999 1037 3006 4040 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:25:15 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:25:15 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12/11/2025 02:25:15 - INFO - finbert.utils -   


Inference Profiling - amp

By CPU Time:
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg       CPU Mem  Self CPU Mem      CUDA Mem  Self CUDA Mem    # of Calls  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                      inference_forward        10.54%      11.166ms        43.75%      46.352ms      23.176ms       0.000us         0.00%      34.363ms      17.181ms           0 B