# FinBERT LoRA Hyperparameter Sweep

This notebook extends the standard FinBERT sweep to tune **LoRA-specific** hyperparameters:
- LoRA rank (`lora_r`)
- LoRA alpha (`lora_alpha`) 
- LoRA dropout (`lora_dropout`)
- Target modules for LoRA adaptation
- Higher learning rates suitable for LoRA training

## Key Differences from Standard Sweep
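- `discriminate=False` and `gradual_unfreeze=False` are fixed, because both schemes manipulate the pretrained BERT layers (per-layer learning rates, layer-by-layer unfreezing), which LoRA keeps frozen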
- Learning rate range is higher (1e-4 to 1e-3 vs 1e-5 to 5e-5)
- Additional LoRA-specific hyperparameters in sweep space

## Prerequisites
```bash
pip install wandb peft
wandb login
```


## 1. Setup and Imports


In [None]:
from pathlib import Path
import shutil
import os
import logging
import sys
import numpy as np
import pandas as pd
from tqdm import tqdm, trange
sys.path.append('..')

from sklearn.metrics import classification_report, f1_score
from transformers import AutoModelForSequenceClassification
import torch
from torch.nn import CrossEntropyLoss, MSELoss

# Provides Config and FinBert, used below
from finbert.finbert import *
import finbert.utils as tools

# Weights & Biases
import wandb

%load_ext autoreload
%autoreload 2

project_dir = Path.cwd().parent
pd.set_option('max_colwidth', None)

logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s -   %(message)s',
                    datefmt = '%m/%d/%Y %H:%M:%S',
                    level = logging.ERROR)

print("Imports loaded successfully")
print(f"Project directory: {project_dir}")


## 2. Configuration

Set up paths and W&B project name.


In [None]:
# Paths
cl_path = project_dir/'models'/'sentiment_lora'
cl_data_path = project_dir/'data'/'sentiment_data'

# W&B Configuration
WANDB_PROJECT = "finbert-lora-hyperparameter-sweep"
WANDB_ENTITY = None  # Set your W&B entity/username if needed

print(f"Model path: {cl_path}")
print(f"Data path: {cl_data_path}")
print(f"W&B Project: {WANDB_PROJECT}")


## 3. Define LoRA Sweep Configuration

This defines the hyperparameter search space specifically for LoRA fine-tuning.

### Key LoRA Parameters:
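- **lora_r**: Rank of the low-rank matrices. Higher rank means more trainable parameters and more capacity
- **lora_alpha**: Scaling factor. The low-rank update is multiplied by `lora_alpha / lora_r`, so `alpha = 2*r` doubles it and `alpha = r` leaves it unscaled; both are common choices
- **lora_dropout**: Dropout applied to the input of the LoRA branch
- **lora_target_modules**: Which linear layers (matched by module name) receive LoRA adapters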

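To make the `lora_r` / `lora_alpha` trade-off concrete, the sketch below counts the parameters LoRA adds and computes the `alpha / r` scale used by the original LoRA formulation and PEFT's default. The helper names are hypothetical, and it assumes BERT-base dimensions (hidden size 768, 12 encoder layers) with square targets, which holds for `query`, `key` and `value`. For example, `r=8` on `['query', 'value']` adds 294,912 parameters, under 0.3% of BERT-base's roughly 110M weights. The FFN layers matched by `'dense'` are 768→3072 and 3072→768, so each of those adds `r * 3840` instead.

```python
# Hypothetical back-of-the-envelope helpers for the LoRA hyperparameters.
# Assumes BERT-base dimensions (hidden size 768, 12 encoder layers) and that
# each targeted module is an nn.Linear; LoRA itself adds no bias terms.


def lora_params_per_linear(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r) to one Linear."""
    return r * in_features + out_features * r


def lora_scaling(lora_alpha: float, r: int) -> float:
    """The low-rank update B @ A is multiplied by alpha / r."""
    return lora_alpha / r


def square_targets_params(r: int, targets_per_layer: int,
                          hidden: int = 768, n_layers: int = 12) -> int:
    """Total LoRA parameters when every target is hidden x hidden, as BERT's
    query, key and value projections are."""
    return lora_params_per_linear(hidden, hidden, r) * targets_per_layer * n_layers


if __name__ == "__main__":
    for r, alpha in [(4, 8), (8, 16), (16, 32), (32, 64)]:
        # ['query', 'value'] -> 2 targeted modules in each encoder layer
        n = square_targets_params(r, targets_per_layer=2)
        print(f"r={r:2d} alpha={alpha:2d} scale={lora_scaling(alpha, r):.1f} params={n:,}")
```

With `alpha = 2*r` every row keeps a scale of 2.0, so doubling `r` doubles the adapter size without changing how strongly the update is applied.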

In [None]:
sweep_config = {
    'method': 'bayes',  # 'grid', 'random', or 'bayes'
    'metric': {
        'name': 'val_loss',
        'goal': 'minimize'
    },
    'parameters': {
        # LoRA-specific parameters
        'lora_r': {
            'values': [4, 8, 16, 32]
        },
        'lora_alpha': {
            'values': [8, 16, 32, 64]
        },
        'lora_dropout': {
            'distribution': 'uniform',
            'min': 0.0,
            'max': 0.2
        },
        'lora_target_modules': {
            'values': [
                ['query', 'value'],           # Standard: Q and V matrices
                ['query', 'key', 'value'],    # All attention matrices
                ['query', 'value', 'dense'],  # Q, V + every module named 'dense' (attention output, FFN, pooler)
            ]
        },
        # Higher learning rates for LoRA (key difference from full fine-tuning)
        'learning_rate': {
            'distribution': 'log_uniform_values',
            'min': 1e-4,
            'max': 1e-3
        },
        # Standard training parameters
        'num_train_epochs': {
            'values': [5, 8, 10, 15]
        },
        'train_batch_size': {
            'values': [16, 32, 64]
        },
        'warm_up_proportion': {
            'distribution': 'uniform',
            'min': 0.1,
            'max': 0.3
        },
        'max_seq_length': {
            'values': [48, 64, 96]
        }
    }
}

print("LoRA Sweep configuration created")
print(f"  Method: {sweep_config['method']}")
print(f"  Optimization metric: {sweep_config['metric']['name']}")
print(f"  LoRA parameters being tuned:")
print(f"    - lora_r: {sweep_config['parameters']['lora_r']['values']}")
print(f"    - lora_alpha: {sweep_config['parameters']['lora_alpha']['values']}")
print(f"    - lora_dropout: [{sweep_config['parameters']['lora_dropout']['min']}, {sweep_config['parameters']['lora_dropout']['max']}]")

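The sweep space mixes three kinds of parameter specs: categorical `values`, `uniform`, and `log_uniform_values`, where `log(v)` is uniform between `log(min)` and `log(max)`. The sketch below is a hypothetical local sampler that illustrates those semantics under random search; it is not W&B's implementation, and the `bayes` method used above proposes values from a surrogate model rather than sampling blindly. It also makes one property of this config visible: `lora_r` and `lora_alpha` are drawn independently, so the effective scale `alpha / r` ranges from 0.25 (8/32) to 16 (64/4).

```python
import math
import random

# Hypothetical local sampler: illustrates what each W&B distribution name
# means under random search. It is NOT W&B's implementation.


def sample_parameter(spec: dict, rng: random.Random):
    """Draw one value from a single W&B-style parameter spec."""
    if "values" in spec:
        # Categorical: every listed value is equally likely
        return rng.choice(spec["values"])
    dist = spec.get("distribution")
    if dist == "uniform":
        return rng.uniform(spec["min"], spec["max"])
    if dist == "log_uniform_values":
        # log(v) is uniform on [log(min), log(max)], so equal multiplicative
        # ranges, e.g. [1e-4, 2e-4] and [5e-4, 1e-3], are equally likely
        lo, hi = math.log(spec["min"]), math.log(spec["max"])
        return math.exp(rng.uniform(lo, hi))
    raise ValueError(f"unsupported parameter spec: {spec}")


def sample_config(sweep_config: dict, seed: int = 0) -> dict:
    """Draw one full configuration from sweep_config['parameters']."""
    rng = random.Random(seed)
    return {name: sample_parameter(spec, rng)
            for name, spec in sweep_config["parameters"].items()}


if __name__ == "__main__":
    example_space = {"parameters": {
        "lora_r": {"values": [4, 8, 16, 32]},
        "lora_alpha": {"values": [8, 16, 32, 64]},
        "lora_dropout": {"distribution": "uniform", "min": 0.0, "max": 0.2},
        "learning_rate": {"distribution": "log_uniform_values",
                          "min": 1e-4, "max": 1e-3},
    }}
    for seed in range(3):
        cfg = sample_config(example_space, seed)
        # r and alpha are drawn independently, so alpha / r varies widely
        print(cfg, "alpha/r =", cfg["lora_alpha"] / cfg["lora_r"])
```

If a fixed ratio is preferred, one option is to sweep a ratio parameter and derive `lora_alpha = ratio * lora_r` inside the training function; that is a design alternative, not something this notebook does.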

## 4. Training Function with LoRA and W&B Integration

This wraps the LoRA-enabled training code with W&B logging.


In [None]:
def train_with_lora_config(config=None):
    """
    Training function that W&B will call with different LoRA hyperparameters.
    """
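    # Initialize W&B run (project/entity also take effect when called outside the sweep agent)
    with wandb.init(config=config, project=WANDB_PROJECT, entity=WANDB_ENTITY):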
        # Get hyperparameters from W&B
        config = wandb.config
        
        print(f"\n{'='*80}")
        print(f"Starting LoRA training run with config:")
        print(f"  LoRA r: {config.lora_r}")
        print(f"  LoRA alpha: {config.lora_alpha}")
        print(f"  LoRA dropout: {config.lora_dropout:.4f}")
        print(f"  LoRA target modules: {config.lora_target_modules}")
        print(f"  Learning rate: {config.learning_rate:.6f}")
        print(f"  Epochs: {config.num_train_epochs}")
        print(f"  Batch size: {config.train_batch_size}")
        print(f"  Warmup: {config.warm_up_proportion:.4f}")
        print(f"  Max seq length: {config.max_seq_length}")
        print(f"{'='*80}\n")
        
        # Remove any leftover model directory for this run ID
        model_path = project_dir / 'models' / 'lora_sweep' / f'sweep_{wandb.run.id}'
        shutil.rmtree(model_path, ignore_errors=True)
        
        # Create BERT model
        bertmodel = AutoModelForSequenceClassification.from_pretrained(
            'bert-base-uncased', cache_dir=None, num_labels=3
        )
        
        # Create FinBERT config with LoRA hyperparameters from sweep
        finbert_config = Config(
            data_dir=cl_data_path,
            bert_model=bertmodel,
            num_train_epochs=config.num_train_epochs,
            model_dir=model_path,
            max_seq_length=config.max_seq_length,
            train_batch_size=config.train_batch_size,
            learning_rate=config.learning_rate,
            output_mode='classification',
            warm_up_proportion=config.warm_up_proportion,
            local_rank=-1,
            # LoRA settings
            use_lora=True,
            lora_r=config.lora_r,
            lora_alpha=config.lora_alpha,
            lora_dropout=config.lora_dropout,
            lora_target_modules=tuple(config.lora_target_modules),
            # These must be OFF for LoRA
            discriminate=False,
            gradual_unfreeze=False
        )
        
        # Initialize FinBERT
        finbert = FinBert(finbert_config)
        finbert.base_model = 'bert-base-uncased'
        finbert.prepare_model(label_list=['positive', 'negative', 'neutral'])
        
        # Load data
        train_data = finbert.get_data('train')
        test_data = finbert.get_data('test')
        
        # Create model with LoRA
        model = finbert.create_the_model()
        
        # Log trainable parameters info
        total_params = sum(p.numel() for p in model.parameters())
        trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
        wandb.log({
            'total_params': total_params,
            'trainable_params': trainable_params,
            'trainable_percent': 100 * trainable_params / total_params
        })
        
        # Train with W&B logging
        trained_model = train_lora_with_wandb_logging(
            finbert, train_data, model, test_data
        )
        
        # Final evaluation
        results = finbert.evaluate(examples=test_data, model=trained_model)
        results['prediction'] = results.predictions.apply(lambda x: np.argmax(x, axis=0))
        
        # Calculate metrics
        metrics = calculate_metrics(results, finbert)
        
        # Log final metrics to W&B
        wandb.log({
            'final_test_loss': metrics['loss'],
            'final_test_accuracy': metrics['accuracy'],
            'final_f1_positive': metrics['f1_positive'],
            'final_f1_negative': metrics['f1_negative'],
            'final_f1_neutral': metrics['f1_neutral'],
            'final_f1_macro': metrics['f1_macro']
        })
        
        print(f"\n{'='*80}")
        print(f"Final Results:")
        print(f"  Test Loss: {metrics['loss']:.4f}")
        print(f"  Test Accuracy: {metrics['accuracy']:.4f}")
        print(f"  Macro F1: {metrics['f1_macro']:.4f}")
        print(f"  Trainable params: {trainable_params:,} ({100*trainable_params/total_params:.2f}%)")
        print(f"{'='*80}\n")
        
        return metrics


print("Main training function defined")


In [None]:
def train_lora_with_wandb_logging(finbert, train_data, model, test_data):
    """
    Modified training loop with W&B logging for LoRA models.
    Note: gradual_unfreeze is disabled for LoRA training.
    """
    validation_examples = finbert.get_data('validation')
    global_step = 0
    finbert.validation_losses = []
    
    train_dataloader = finbert.get_loader(train_data, 'train') 
    model.train()
    step_number = len(train_dataloader)
    
    best_val_loss = float('inf')
    best_model_epoch = 0
    
    for epoch in trange(int(finbert.config.num_train_epochs), desc="Epoch"):
        model.train()
        tr_loss = 0
        nb_tr_examples, nb_tr_steps = 0, 0
        
        for step, batch in enumerate(tqdm(train_dataloader, desc='Iteration')):
            batch = tuple(t.to(finbert.device) for t in batch)
            input_ids, attention_mask, token_type_ids, label_ids, agree_ids = batch
            
            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_mask,
                token_type_ids=token_type_ids
            )
            logits = outputs.logits
            weights = finbert.class_weights.to(finbert.device)
            
            if finbert.config.output_mode == "classification":
                loss_fct = CrossEntropyLoss(weight=weights)
                loss = loss_fct(logits.view(-1, finbert.num_labels), label_ids.view(-1))
            elif finbert.config.output_mode == "regression":
                loss_fct = MSELoss()
                loss = loss_fct(logits.view(-1), label_ids.view(-1))
            
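            if finbert.config.gradient_accumulation_steps > 1:
                # Scale so the accumulated gradient averages over micro-batches
                loss = loss / finbert.config.gradient_accumulation_steps
            # Backpropagate on every micro-batch; the optimizer steps below
            loss.backward()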
            
            tr_loss += loss.item()
            nb_tr_examples += input_ids.size(0)
            nb_tr_steps += 1
            
            if (step + 1) % finbert.config.gradient_accumulation_steps == 0:
                torch.nn.utils.clip_grad_norm_(
                    (p for p in model.parameters() if p.requires_grad), 1.0
                )
                finbert.optimizer.step()
                finbert.scheduler.step()
                finbert.optimizer.zero_grad()
                global_step += 1
                
                # Log to W&B every 10 optimizer steps
                if global_step % 10 == 0:
                    wandb.log({
                        'train_loss': tr_loss / nb_tr_steps,
                        'learning_rate': finbert.optimizer.param_groups[0]['lr'],
                        'epoch': epoch,
                        'step': global_step
                    })
        
        # Validation at end of epoch
        validation_loader = finbert.get_loader(validation_examples, 'eval')
        model.eval()
        
        valid_loss, valid_accuracy = 0, 0
        nb_valid_steps, nb_valid_examples = 0, 0
        
        for input_ids, attention_mask, token_type_ids, label_ids, agree_ids in tqdm(validation_loader, desc="Validating"):
            input_ids = input_ids.to(finbert.device)
            attention_mask = attention_mask.to(finbert.device)
            token_type_ids = token_type_ids.to(finbert.device)
            label_ids = label_ids.to(finbert.device)
            agree_ids = agree_ids.to(finbert.device)
            
            with torch.no_grad():
                outputs = model(
                    input_ids=input_ids,
                    attention_mask=attention_mask,
                    token_type_ids=token_type_ids
                )
                logits = outputs.logits
                
                if finbert.config.output_mode == "classification":
                    loss_fct = CrossEntropyLoss(weight=weights)
                    tmp_valid_loss = loss_fct(logits.view(-1, finbert.num_labels), label_ids.view(-1))
                elif finbert.config.output_mode == "regression":
                    loss_fct = MSELoss()
                    tmp_valid_loss = loss_fct(logits.view(-1), label_ids.view(-1))
                
                valid_loss += tmp_valid_loss.mean().item()
                nb_valid_steps += 1
        
        valid_loss = valid_loss / nb_valid_steps
        finbert.validation_losses.append(valid_loss)
        
        # Log validation metrics to W&B
        wandb.log({
            'val_loss': valid_loss,
            'epoch': epoch,
            'best_val_loss': min(finbert.validation_losses)
        })
        
        print(f"Epoch {epoch}: Validation loss = {valid_loss:.4f}")
        
        # Save best model
        if valid_loss == min(finbert.validation_losses):
            # Remove the previous best checkpoint (absent on the first epoch)
            (finbert.config.model_dir / ('temporary' + str(best_model_epoch))).unlink(missing_ok=True)
            torch.save({'epoch': str(epoch), 'state_dict': model.state_dict()},
                       finbert.config.model_dir / ('temporary' + str(epoch)))
            best_model_epoch = epoch
            best_val_loss = valid_loss
    
    # Load best model
    checkpoint = torch.load(finbert.config.model_dir / ('temporary' + str(best_model_epoch)),
                            map_location=finbert.device)
    model.load_state_dict(checkpoint['state_dict'])
    
    # Save final model
    model_to_save = model.module if hasattr(model, 'module') else model
    output_model_file = os.path.join(finbert.config.model_dir, 'pytorch_model.bin')
    torch.save(model_to_save.state_dict(), output_model_file)
    
    # Clean up temporary files
    (finbert.config.model_dir / ('temporary' + str(best_model_epoch))).unlink(missing_ok=True)
    
    return model


def calculate_metrics(results, finbert):
    """
    Calculate comprehensive metrics for evaluation.
    """
    cs = CrossEntropyLoss(weight=finbert.class_weights)
    loss = cs(
        torch.tensor(list(results['predictions'])),
        torch.tensor(list(results['labels']))
    )
    
    accuracy = (results['labels'] == results['prediction']).sum() / results.shape[0]
    
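    # Per-class F1 in label_list order (0=positive, 1=negative, 2=neutral);
    # passing labels keeps the indices fixed even if a class is missing
    f1_scores = f1_score(results['labels'], results['prediction'], labels=[0, 1, 2], average=None)
    f1_macro = f1_score(results['labels'], results['prediction'], labels=[0, 1, 2], average='macro')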
    
    return {
        'loss': loss.item(),
        'accuracy': accuracy,
        'f1_positive': f1_scores[0],
        'f1_negative': f1_scores[1],
        'f1_neutral': f1_scores[2],
        'f1_macro': f1_macro
    }


print("Helper functions defined")

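The `gradient_accumulation_steps` logic in `train_lora_with_wandb_logging` relies on one property: if each micro-batch loss is divided by `k` and `backward()` runs on every micro-batch, the summed gradient equals the full-batch gradient, so a single `optimizer.step()` every `k` micro-batches behaves like one large batch. If `backward()` is skipped for scaled losses, no gradient accumulates at all. The hypothetical sketch below checks the property by hand on a scalar least-squares model, with no autograd involved.

```python
# Hypothetical demo: why gradient accumulation must call backward() on every
# micro-batch. Model: scalar linear fit y ~ w * x with mean-squared error,
# whose gradient is computed by hand instead of with autograd.


def mse_grad(w: float, xs: list, ys: list) -> float:
    """Gradient of mean((w*x - y)**2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: list, ys: list, accumulation_steps: int) -> float:
    """Split the batch into equal micro-batches, scale each micro-batch loss by
    1/accumulation_steps, and sum the resulting gradients, mirroring what
    repeated loss.backward() calls do to .grad before one optimizer.step()."""
    if len(xs) % accumulation_steps:
        raise ValueError("batch must split into equal micro-batches")
    size = len(xs) // accumulation_steps
    grad = 0.0
    for k in range(accumulation_steps):
        chunk = slice(k * size, (k + 1) * size)
        grad += mse_grad(w, xs[chunk], ys[chunk]) / accumulation_steps
    return grad


if __name__ == "__main__":
    xs, ys, w = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 9.0], 1.5
    print(mse_grad(w, xs, ys))             # full batch of 4 -> -8.0
    print(accumulated_grad(w, xs, ys, 2))  # 2 micro-batches of 2 -> -8.0
```

The equality is exact only for equal-sized micro-batches and a plain mean loss. A weighted `CrossEntropyLoss`, as used here, normalizes by each batch's sum of class weights, so accumulated gradients only approximate the full-batch gradient.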

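`calculate_metrics` reports a class-weighted cross-entropy and per-class plus macro F1. For checking those numbers by hand on small inputs, the sketch below restates both in plain Python; the function names are hypothetical. The loss follows PyTorch's documented mean reduction for `CrossEntropyLoss(weight=...)`, which divides by the sum of the targets' class weights rather than by the number of examples, and `macro_f1` is the unweighted mean of per-class F1, which is what scikit-learn's `average='macro'` computes.

```python
import math

# Hypothetical pure-Python restatement of the quantities calculate_metrics
# reports, for checking them by hand on small inputs.


def weighted_cross_entropy(logits, labels, weights):
    """Weighted mean of per-example NLL: sum(w[y] * nll) / sum(w[y])."""
    numerator = denominator = 0.0
    for z, y in zip(logits, labels):
        m = max(z)  # subtract the max logit for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        nll = log_sum_exp - z[y]  # -log softmax(z)[y]
        numerator += weights[y] * nll
        denominator += weights[y]
    return numerator / denominator


def per_class_f1(labels, preds, classes):
    """F1 = 2TP / (2TP + FP + FN) for each class; 0.0 when undefined."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(labels, preds))
        fp = sum(t != c and p == c for t, p in zip(labels, preds))
        fn = sum(t == c and p != c for t, p in zip(labels, preds))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(labels, preds, classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(labels, preds, classes)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    labels = [0, 0, 1, 2, 2, 2]  # following label_list: 0=positive, 1=negative, 2=neutral
    preds = [0, 1, 1, 2, 2, 0]
    print(per_class_f1(labels, preds, [0, 1, 2]))  # -> [0.5, 0.666..., 0.8]
    print(macro_f1(labels, preds, [0, 1, 2]))
```

Because the weighted loss is a weighted mean, upweighting a rare class changes which examples dominate the loss but does not inflate its overall magnitude.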
## 5. Initialize and Run LoRA Sweep

This will start the hyperparameter sweep. W&B will automatically try different LoRA configurations.


In [None]:
# Initialize the sweep
sweep_id = wandb.sweep(
    sweep_config, 
    project=WANDB_PROJECT,
    entity=WANDB_ENTITY
)

print(f"LoRA Sweep initialized with ID: {sweep_id}")
print(f"View at: https://wandb.ai/{WANDB_ENTITY or 'your-username'}/{WANDB_PROJECT}/sweeps/{sweep_id}")


In [None]:
# Run the sweep
# count: number of runs to execute (increase for more thorough search)
wandb.agent(sweep_id, function=train_with_lora_config, count=15)

print("\n" + "="*80)
print("LORA SWEEP COMPLETED")
print("="*80)
print(f"View results at: https://wandb.ai/{WANDB_ENTITY or 'your-username'}/{WANDB_PROJECT}/sweeps/{sweep_id}")


## 6. Analyze Results

After the sweep completes, examine the best configuration from the W&B dashboard.

Key metrics to compare:
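- **val_loss**: Primary optimization target (W&B reads it from the run summary, which by default holds the last logged value; `best_val_loss` tracks the minimum across epochs)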
- **final_test_accuracy**: Test set performance
- **final_f1_macro**: Balanced F1 across classes
- **trainable_params**: Parameter efficiency

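Instead of copying values from the dashboard by hand, the winning configuration can be picked programmatically. The sketch below is a hypothetical helper that accepts any iterable of run-like objects exposing a `.config` mapping and a subscriptable `.summary`; it is assumed, not verified here, that runs from the W&B public API fit that shape. It defaults to `best_val_loss` because the training loop logs the running minimum under that key, so its last (summary) value matches the checkpoint that was actually restored.

```python
from types import SimpleNamespace

# Hypothetical helper for picking the winning configuration programmatically.
# Works on any run-like objects with a `.config` mapping and a subscriptable
# `.summary`; W&B public-API runs are assumed to fit that shape.


def best_run_config(runs, metric="best_val_loss", goal="minimize"):
    """Return (config, value) for the run with the best summary `metric`."""
    scored = []
    for run in runs:
        try:
            value = run.summary[metric]
        except KeyError:
            continue  # e.g. a crashed run that never logged the metric
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            scored.append((value, run))
    if not scored:
        raise ValueError(f"no run logged a numeric {metric!r}")
    pick = min if goal == "minimize" else max
    value, run = pick(scored, key=lambda item: item[0])
    # Drop W&B-internal entries such as '_wandb'
    config = {k: v for k, v in run.config.items() if not k.startswith("_")}
    return config, value


if __name__ == "__main__":
    fake_runs = [
        SimpleNamespace(config={"lora_r": 8, "_wandb": {}}, summary={"best_val_loss": 0.42}),
        SimpleNamespace(config={"lora_r": 16}, summary={"best_val_loss": 0.38}),
        SimpleNamespace(config={"lora_r": 4}, summary={}),  # never logged the metric
    ]
    print(best_run_config(fake_runs))  # -> ({'lora_r': 16}, 0.38)
```

Against a real sweep this would be called roughly as `best_run_config(wandb.Api().sweep(f"{entity}/{WANDB_PROJECT}/{sweep_id}").runs)`, where `entity` is your W&B username or team; the public API's `Sweep.best_run()` is an alternative that ranks runs by the sweep's own metric.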

In [None]:
# Once you've identified the best config from the W&B dashboard, record it here:
# Example (update with your actual best values from the sweep)
best_lora_config = {
    # LoRA parameters
    'lora_r': 8,
    'lora_alpha': 16,
    'lora_dropout': 0.1,
    'lora_target_modules': ['query', 'value'],
    # Training parameters
    'learning_rate': 5e-4,
    'num_train_epochs': 10,
    'train_batch_size': 32,
    'warm_up_proportion': 0.2,
    'max_seq_length': 64,
}

print("Best LoRA configuration (update after sweep):")
for k, v in best_lora_config.items():
    print(f"  {k}: {v}")


## 7. Retrain with Best Config (Optional)

Use the best configuration to train a final model.


In [None]:
# Uncomment to retrain with best config
# train_with_lora_config(config=best_lora_config)
