# SMS Claim Extraction - Training on Colab

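This notebook trains and evaluates the four approaches in the claim extraction study: entity-based NER, claim-based NER, hybrid NER + LLM (inference only), and contrastive learning.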

## Setup Instructions

**Before running:**
1. Replace `YOUR_PROJECT_ID` with your GCP project ID
2. Set `GCS_BUCKET_NAME` (currently `pleng_deposit_sms`) to your GCS bucket name
3. Make sure your GCS bucket exists and you have write permissions

**What this notebook does:**
- Uses K-Fold Cross-Validation (5 folds) for robust training
- Keeps test set BLIND until final evaluation
- Saves all checkpoints to Google Cloud Storage
- Avoids Google Drive storage limits by keeping all outputs in GCS
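
The split strategy in the list above can be sketched as follows. This is a minimal standard-library illustration, not the logic inside `train_kfold.py`; the function name `split_with_blind_test`, the 20% test fraction, and the seed are assumptions made for the example. scikit-learn's `KFold` provides the same fold logic.

```python
import random

def split_with_blind_test(n_samples, n_folds=5, test_fraction=0.2, seed=42):
    """Return (folds, test_idx); folds is a list of (train_idx, val_idx) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Carve off the blind test set first; it never enters cross-validation.
    n_test = int(n_samples * test_fraction)
    test_idx, dev_idx = indices[:n_test], indices[n_test:]

    # Split the remaining indices into n_folds nearly equal validation chunks.
    chunks = [dev_idx[i::n_folds] for i in range(n_folds)]
    folds = []
    for k in range(n_folds):
        val_idx = chunks[k]
        train_idx = [i for j, chunk in enumerate(chunks) if j != k for i in chunk]
        folds.append((train_idx, val_idx))
    return folds, test_idx

folds, test_idx = split_with_blind_test(n_samples=100)
print(f"{len(folds)} folds, {len(test_idx)} blind test samples")
```

Each fold trains on four chunks and validates on the fifth, while `test_idx` is touched only once, at final evaluation.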

In [None]:
# Clone repository
!git clone https://github.com/iamdiluxedbutcooler/sms-claim-check.git
%cd sms-claim-check

In [None]:
# Install dependencies (including GCS support)
!pip install -q transformers datasets torch scikit-learn pandas numpy seaborn matplotlib openai python-dotenv evaluate accelerate sentencepiece seqeval google-cloud-storage

## Update Code (if needed)

Run this cell ONLY if you need to pull the latest code updates. It first tries to back up `experiments` to Google Drive, which works only if Drive is mounted at `/content/drive`.

In [None]:
# Back up experiments before updating code (requires Drive mounted at /content/drive)
!cp -r experiments /content/drive/MyDrive/sms-claim-check/backup_experiments_$(date +%Y%m%d_%H%M%S) 2>/dev/null || echo "Backup skipped: no experiments yet, or Drive is not mounted"

# Pull latest code
!git pull origin main

# IMPORTANT: Restart runtime after pulling to reload modules
print("\n[WARNING] After pulling, go to Runtime > Restart runtime to reload updated code!")
print("Then continue from where you left off.")

In [None]:
# QUICK FIX: Reload modules without restarting runtime
import sys

# Remove cached `src` modules so the next import re-executes them from disk
modules_to_reload = [m for m in sys.modules if m == 'src' or m.startswith('src.')]
for module in modules_to_reload:
    del sys.modules[module]

# Re-import
import src.models
import src.data

print("[OK] Modules reloaded! Continue training.")

## Setup Google Cloud Storage

Authenticate and configure GCS bucket for saving checkpoints and results.

In [None]:
# Authenticate with Google Cloud
from google.colab import auth
auth.authenticate_user()

# Configure GCS
import os
os.environ['GCLOUD_PROJECT'] = 'YOUR_PROJECT_ID' # Replace with your GCP project ID

# Test GCS connection
from google.cloud import storage
client = storage.Client()

# Set your bucket name (IMPORTANT: Replace with your actual bucket name)
GCS_BUCKET_NAME = 'pleng_deposit_sms' # Replace with your bucket name
bucket = client.bucket(GCS_BUCKET_NAME)

print(f" Authenticated and connected to GCS bucket: {GCS_BUCKET_NAME}")
print(f" Project: {os.environ['GCLOUD_PROJECT']}")
print(" User: plengnaps@gmail.com")

# Test permissions
try:
    list(bucket.list_blobs(max_results=1))
    print(" Permissions OK - bucket is accessible!")
except Exception as e:
    print(f" Permission Error: {e}")
    print("\n FIX: Grant 'Storage Object Admin' role to plengnaps@gmail.com")
    print(f" Run: gsutil iam ch user:plengnaps@gmail.com:objectAdmin gs://{GCS_BUCKET_NAME}")

print("\nBucket ready for checkpoints and results!")

## Mount GCS Bucket as Local Filesystem

This mounts your GCS bucket so training writes **directly** to GCS, not local disk.

In [None]:
# Install gcsfuse
!echo "deb https://packages.cloud.google.com/apt gcsfuse-$(lsb_release -c -s) main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
!sudo apt-get update
!sudo apt-get install -y gcsfuse

# Create mount point
!mkdir -p /content/gcs_bucket

# Mount GCS bucket
!gcsfuse --implicit-dirs {GCS_BUCKET_NAME} /content/gcs_bucket

print(f" GCS bucket mounted at: /content/gcs_bucket")
print(f" All training outputs will save directly to: gs://{GCS_BUCKET_NAME}")
print(f"\n Training will now write directly to GCS - no local storage used!")
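
Before starting a long training run, it can help to confirm that the mount point really accepts writes. The helper below is a sketch; `is_writable_dir` is a hypothetical name and is not part of the repository.

```python
import os
import tempfile
import uuid

def is_writable_dir(path):
    """Return True if a small probe file can be created and removed in `path`."""
    if not os.path.isdir(path):
        return False
    probe = os.path.join(path, f".write_probe_{uuid.uuid4().hex}")
    try:
        with open(probe, "w") as f:
            f.write("ok")
        os.remove(probe)
        return True
    except OSError:
        return False

# Demonstrated on a temporary directory so the snippet runs anywhere;
# in Colab, pass the mount point from the cell above: "/content/gcs_bucket".
with tempfile.TemporaryDirectory() as tmp:
    print(f"is mount: {os.path.ismount(tmp)}, writable: {is_writable_dir(tmp)}")
```

For a successful gcsfuse mount, both `os.path.ismount` and the probe should report `True` on `/content/gcs_bucket`.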

## Update Configs to Save to GCS Mount

Redirect all experiment outputs to the mounted GCS bucket.

In [None]:
# Update all config files to use GCS mount path
import yaml
from pathlib import Path

configs = [
 'configs/entity_ner.yaml',
 'configs/claim_ner.yaml',
 'configs/contrastive.yaml',
 'configs/hybrid_llm.yaml',
 'configs/hybrid_claim_llm.yaml'
]

for config_file in configs:
    with open(config_file, 'r') as f:
        config = yaml.safe_load(f)

    # Update output_dir to use GCS mount (rstrip guards against a trailing slash)
    approach_name = config['output_config']['output_dir'].rstrip('/').split('/')[-1]
    config['output_config']['output_dir'] = f'/content/gcs_bucket/experiments/{approach_name}'

    with open(config_file, 'w') as f:
        yaml.dump(config, f, default_flow_style=False, sort_keys=False)

    print(f" Updated {config_file} -> /content/gcs_bucket/experiments/{approach_name}")

print(f"\n All configs now save directly to: gs://{GCS_BUCKET_NAME}/experiments/")
print(" Checkpoints during training will write directly to GCS!")
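
The path rewrite in the cell above keeps only the last component of `output_dir` and re-roots it under the mount. A minimal sketch of that rule as a standalone function follows; `redirect_output_dir` and `GCS_MOUNT_ROOT` are hypothetical names introduced for illustration.

```python
from pathlib import PurePosixPath

GCS_MOUNT_ROOT = "/content/gcs_bucket/experiments"  # mount path used in this notebook

def redirect_output_dir(output_dir, mount_root=GCS_MOUNT_ROOT):
    """Keep only the last path component and re-root it under the GCS mount."""
    approach_name = PurePosixPath(output_dir).name
    if not approach_name:
        raise ValueError(f"Cannot derive an approach name from {output_dir!r}")
    return str(PurePosixPath(mount_root) / approach_name)

print(redirect_output_dir("experiments/approach1_entity_ner"))
print(redirect_output_dir("experiments/approach2_claim_ner/"))
```

Using `PurePosixPath.name` rather than `split('/')[-1]` tolerates a trailing slash, and the explicit error makes an empty `output_dir` fail loudly instead of writing to the experiments root.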

In [None]:
# Check GPU
import torch
print(f"GPU Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU Name: {torch.cuda.get_device_name(0)}")

## Optional: Download Checkpoint from GCS

If you need to resume training or load a previous checkpoint:

In [None]:
def download_from_gcs(gcs_folder_path, local_folder):
    """Download folder from GCS to local"""
    client = storage.Client()
    bucket = client.bucket(GCS_BUCKET_NAME)

    # List all blobs with prefix
    blobs = bucket.list_blobs(prefix=gcs_folder_path)

    file_count = 0
    for blob in blobs:
        # Skip "directory" placeholder objects (names ending in '/')
        if blob.name.endswith('/'):
            continue

        # Create local path
        relative_path = blob.name[len(gcs_folder_path):].lstrip('/')
        local_path = Path(local_folder) / relative_path

        # Create parent directories
        local_path.parent.mkdir(parents=True, exist_ok=True)

        # Download file
        blob.download_to_filename(str(local_path))
        file_count += 1

    print(f'[DOWNLOADED] {file_count} files from gs://{GCS_BUCKET_NAME}/{gcs_folder_path} -> {local_folder}')

# Example: Download latest checkpoint for approach 1
# download_from_gcs('checkpoints/approach1_entity_ner_latest', 'experiments/approach1_entity_ner')

print(" Download function ready. Uncomment example above to use.")
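
When resuming, you usually want the most recent checkpoint inside a downloaded run directory. The sketch below assumes checkpoint folders follow the `checkpoint-<step>` naming that Hugging Face `Trainer` uses; whether this repository's training script saves checkpoints that way is an assumption, and `find_latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path

CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")

def find_latest_checkpoint(run_dir):
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    run_path = Path(run_dir)
    if not run_path.is_dir():
        return None
    best_step, best_path = -1, None
    for child in run_path.iterdir():
        match = CHECKPOINT_RE.match(child.name)
        # Compare steps numerically so checkpoint-1000 beats checkpoint-500.
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path

# Example (path is illustrative):
# latest = find_latest_checkpoint('experiments/approach1_entity_ner/fold_1')
```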

In [None]:
# GCS helper functions (optional - for manual sync if needed)
from pathlib import Path
from google.cloud import storage
from datetime import datetime

def list_gcs_experiments():
    """List all experiments in GCS bucket"""
    client = storage.Client()
    bucket = client.bucket(GCS_BUCKET_NAME)

    blobs = bucket.list_blobs(prefix='experiments/')
    folders = set()
    for blob in blobs:
        parts = blob.name.split('/')
        # parts[1] is empty for the 'experiments/' placeholder object itself
        if len(parts) >= 2 and parts[1]:
            folders.add(parts[1])

    print(f"Experiments in gs://{GCS_BUCKET_NAME}/experiments/:")
    for folder in sorted(folders):
        print(f" - {folder}")

def download_experiment_results(approach_name, local_path='./results'):
    """Download specific experiment results from GCS"""
    client = storage.Client()
    bucket = client.bucket(GCS_BUCKET_NAME)

    blobs = bucket.list_blobs(prefix=f'experiments/{approach_name}/')

    file_count = 0
    for blob in blobs:
        # Skip "directory" placeholder objects (names ending in '/')
        if blob.name.endswith('/'):
            continue
        relative_path = blob.name[len('experiments/'):]
        local_file = Path(local_path) / relative_path
        local_file.parent.mkdir(parents=True, exist_ok=True)
        blob.download_to_filename(str(local_file))
        file_count += 1

    print(f" Downloaded {file_count} files to {local_path}")

print(' GCS helper functions loaded!')
print(' - list_gcs_experiments() - List all experiments')
print(' - download_experiment_results(approach_name) - Download results')

## Approach 1: Entity-based NER

In [None]:
!python train_kfold.py --config configs/entity_ner.yaml --n_folds 5

# Already saved to GCS automatically! Check: gs://{GCS_BUCKET_NAME}/experiments/approach1_entity_ner/

## Approach 2: Claim-based NER

In [None]:
!python train_kfold.py --config configs/claim_ner.yaml --n_folds 5

# Already saved to GCS automatically!

## Download Results from GCS and Run Final Evaluation

Now that training is complete, let's download the results and run comprehensive evaluation.

In [None]:
# Download ONLY the final trained models from GCS (not all checkpoints!)
import os
from pathlib import Path

print(" Downloading Approach 1 (Entity-NER) final models...")
# Download each fold's model directory separately
for fold in range(1, 6):
    print(f" Downloading fold_{fold}/model...")
    # Create the fold directory first
    !mkdir -p /content/results/approach1_entity_ner/fold_{fold}
    !gsutil -m cp -r gs://{GCS_BUCKET_NAME}/experiments/approach1_entity_ner/fold_{fold}/model /content/results/approach1_entity_ner/fold_{fold}/

# Download kfold_results.json
!gsutil cp gs://{GCS_BUCKET_NAME}/experiments/approach1_entity_ner/kfold_results.json /content/results/approach1_entity_ner/

print("\n Downloading Approach 2 (Claim-NER) final models...")
# Download each fold's model directory separately
for fold in range(1, 6):
    print(f" Downloading fold_{fold}/model...")
    # Create the fold directory first
    !mkdir -p /content/results/approach2_claim_ner/fold_{fold}
    !gsutil -m cp -r gs://{GCS_BUCKET_NAME}/experiments/approach2_claim_ner/fold_{fold}/model /content/results/approach2_claim_ner/fold_{fold}/

# Download kfold_results.json
!gsutil cp gs://{GCS_BUCKET_NAME}/experiments/approach2_claim_ner/kfold_results.json /content/results/approach2_claim_ner/

print("\n Downloaded ONLY final models (no checkpoints)")
print(f"\nTotal size:")
!du -sh /content/results/

### View K-Fold Results Summary

Check the cross-validation results and test performance for each approach.

In [None]:
import json
import pandas as pd

def load_kfold_results(approach_path):
    """Load K-fold results from a trained approach, or return None if missing"""
    results_file = Path(approach_path) / 'kfold_results.json'

    if not results_file.exists():
        print(f" Results not found: {results_file}")
        return None

    with open(results_file, 'r') as f:
        results = json.load(f)

    return results

# Load results for both approaches
print("="*70)
print("APPROACH 1: Entity-NER Results")
print("="*70)
entity_results = load_kfold_results('/content/results/approach1_entity_ner')
if entity_results:
    print(f"\n Cross-Validation Results ({entity_results['n_folds']} folds):")
    cv_avg = entity_results['cv_validation_metrics']['average']
    for metric, value in cv_avg.items():
        print(f" {metric}: {value:.4f}")

    print(f"\n Test Set Results (BLIND):")
    test_avg = entity_results['test_metrics']['average']
    for metric, value in test_avg.items():
        print(f" {metric}: {value:.4f}")

print("\n" + "="*70)
print("APPROACH 2: Claim-NER Results")
print("="*70)
claim_results = load_kfold_results('/content/results/approach2_claim_ner')
if claim_results:
    print(f"\n Cross-Validation Results ({claim_results['n_folds']} folds):")
    cv_avg = claim_results['cv_validation_metrics']['average']
    for metric, value in cv_avg.items():
        print(f" {metric}: {value:.4f}")

    print(f"\n Test Set Results (BLIND):")
    test_avg = claim_results['test_metrics']['average']
    for metric, value in test_avg.items():
        print(f" {metric}: {value:.4f}")
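
The cells in this notebook read keys such as `eval_f1_mean` and `eval_f1_std` from the `average` block and raw values from `per_fold`. The sketch below shows one plausible way such averages could be derived from per-fold metrics. The schema is inferred from the keys used here, and the use of the population standard deviation (the default of `np.std`) is an assumption; `train_kfold.py` may compute them differently.

```python
from statistics import mean, pstdev

def average_fold_metrics(per_fold):
    """Collapse a list of per-fold metric dicts into <name>_mean / <name>_std entries."""
    if not per_fold:
        return {}
    summary = {}
    numeric_keys = [k for k, v in per_fold[0].items() if isinstance(v, (int, float))]
    for key in numeric_keys:
        values = [fold[key] for fold in per_fold if key in fold]
        summary[f"{key}_mean"] = mean(values)
        summary[f"{key}_std"] = pstdev(values)  # population std, like np.std
    return summary

# Illustrative numbers only, not results from this project.
per_fold = [{"eval_f1": 0.80}, {"eval_f1": 0.60}, {"eval_f1": 0.70}]
summary = average_fold_metrics(per_fold)
print(f"{summary['eval_f1_mean']:.4f} ± {summary['eval_f1_std']:.4f}")
```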

### Generate Comparison Visualizations

Create plots comparing the two approaches across all metrics.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (14, 10)

# Extract metrics for comparison
approaches = ['Entity-NER', 'Claim-NER']
results_list = [entity_results, claim_results]

# Metrics to compare (with eval_ prefix!)
metrics_to_plot = ['eval_f1_mean', 'eval_precision_mean', 'eval_recall_mean', 'eval_accuracy_mean']
metric_labels = ['F1', 'Precision', 'Recall', 'Accuracy']

# Create figure with subplots
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Model Comparison: Entity-NER vs Claim-NER', fontsize=16, fontweight='bold')

for idx, (metric, label) in enumerate(zip(metrics_to_plot, metric_labels)):
    ax = axes[idx // 2, idx % 2]

    # Extract CV and Test scores
    cv_scores = []
    test_scores = []

    for result in results_list:
        if result:
            cv_scores.append(result['cv_validation_metrics']['average'].get(metric, 0))
            test_scores.append(result['test_metrics']['average'].get(metric, 0))
        else:
            cv_scores.append(0)
            test_scores.append(0)

    # Plot grouped bars
    x = np.arange(len(approaches))
    width = 0.35

    bars1 = ax.bar(x - width/2, cv_scores, width, label='Cross-Validation', alpha=0.8, color='steelblue')
    bars2 = ax.bar(x + width/2, test_scores, width, label='Test (Blind)', alpha=0.8, color='coral')

    # Customize
    ax.set_ylabel(label, fontsize=12)
    ax.set_title(f'{label} Comparison', fontsize=13, fontweight='bold')
    ax.set_xticks(x)
    ax.set_xticklabels(approaches)
    ax.legend()
    ax.set_ylim([0, 1.0])
    ax.grid(axis='y', alpha=0.3)

    # Add value labels on bars
    for bars in [bars1, bars2]:
        for bar in bars:
            height = bar.get_height()
            ax.text(bar.get_x() + bar.get_width()/2., height,
                    f'{height:.3f}',
                    ha='center', va='bottom', fontsize=10)

plt.tight_layout()
plt.savefig('/content/model_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

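print(" Comparison plot saved to /content/model_comparison.png")

# Warn only when F1, precision, and recall really are all zero for every loaded approach
prf_metrics = ['eval_f1_mean', 'eval_precision_mean', 'eval_recall_mean']
loaded = [r['cv_validation_metrics']['average'] for r in results_list if r]
if loaded and all(avg.get(m, 0) == 0 for avg in loaded for m in prf_metrics):
    print("\n WARNING: F1, Precision, and Recall are all 0.0!")
    print(" This suggests the models may not have trained properly.")
    print(" Common causes:")
    print(" - Label mismatch during training")
    print(" - All predictions are 'O' (Outside) class")
    print(" - Training data preprocessing issue")
    print(" Check the training logs for errors!")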
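
One preprocessing issue that commonly produces zero F1 in token classification is a mismatch between word-level labels and sub-word tokens. The sketch below shows the usual alignment rule: label only the first sub-token of each word and mask everything else with `-100`. It assumes `word_ids` has the shape returned by a Hugging Face fast tokenizer's `word_ids()` (one entry per token, `None` for special tokens); whether this repository's preprocessor follows the same rule is not confirmed here.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def align_labels_with_tokens(word_labels, word_ids):
    """Map word-level label ids onto sub-tokens, labelling only each word's first sub-token."""
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)          # special token such as <s> or </s>
        elif word_id != previous_word:
            aligned.append(word_labels[word_id])  # first sub-token of a word
        else:
            aligned.append(IGNORE_INDEX)          # continuation sub-token
        previous_word = word_id
    return aligned

# Three words; the second is split into two sub-tokens.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 1, 2]
print(align_labels_with_tokens(word_labels, word_ids))  # [-100, 0, 1, -100, 2, -100]
```

If the aligned list and the token list differ in length, or labels land on the wrong tokens, the model can end up learning to predict only `O`.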


### K-Fold Variance Analysis

Visualize the variance across folds to assess model stability.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
fig.suptitle('Cross-Validation Stability Analysis', fontsize=16, fontweight='bold')

for idx, (approach_name, result) in enumerate([('Entity-NER', entity_results), ('Claim-NER', claim_results)]):
    if not result:
        continue

    ax = axes[idx]

    # Extract per-fold metrics
    per_fold = result['cv_validation_metrics']['per_fold']

    # Extract F1 scores for each fold (with eval_ prefix!)
    fold_f1s = []
    for fold_metrics in per_fold:
        # Find F1 metric (could be 'eval_f1', 'f1', etc.)
        f1_key = [k for k in fold_metrics.keys() if 'f1' in k.lower()][0]
        fold_f1s.append(fold_metrics[f1_key])

    # Plot
    folds = [f'Fold {i+1}' for i in range(len(fold_f1s))]
    bars = ax.bar(folds, fold_f1s, alpha=0.7, color='steelblue')

    # Add mean line
    mean_f1 = np.mean(fold_f1s)
    ax.axhline(y=mean_f1, color='red', linestyle='--', linewidth=2, label=f'Mean: {mean_f1:.4f}')

    # Add std shading
    std_f1 = np.std(fold_f1s)
    ax.fill_between(range(len(folds)), mean_f1 - std_f1, mean_f1 + std_f1,
                    alpha=0.2, color='red', label=f'Std: ±{std_f1:.4f}')

    # Customize
    ax.set_title(f'{approach_name} - F1 Score per Fold', fontsize=13, fontweight='bold')
    ax.set_ylabel('F1 Score', fontsize=12)
    ax.set_xlabel('Fold', fontsize=12)
    ax.set_ylim([0, 1.0])
    ax.legend()
    ax.grid(axis='y', alpha=0.3)

    # Add value labels
    for bar, value in zip(bars, fold_f1s):
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height,
                f'{value:.4f}',
                ha='center', va='bottom', fontsize=10)

plt.tight_layout()
plt.savefig('/content/kfold_variance.png', dpi=300, bbox_inches='tight')
plt.show()

print(" K-Fold variance plot saved to /content/kfold_variance.png")

# Check if all F1 scores are zero
if entity_results and all(fold.get('eval_f1', 0) == 0 for fold in entity_results['cv_validation_metrics']['per_fold']):
    print("\n WARNING: All Entity-NER F1 scores are 0.0 - model not learning!")
if claim_results and all(fold.get('eval_f1', 0) == 0 for fold in claim_results['cv_validation_metrics']['per_fold']):
    print(" WARNING: All Claim-NER F1 scores are 0.0 - model not learning!")


### Generate Results Summary Table

Create a clean comparison table for your research paper.

In [None]:
# Create comprehensive results table
results_data = []

for approach_name, result in [('Entity-NER', entity_results), ('Claim-NER', claim_results)]:
    if not result:
        continue

    cv_metrics = result['cv_validation_metrics']['average']
    test_metrics = result['test_metrics']['average']

    # Extract key metrics (with eval_ prefix!)
    row = {
        'Approach': approach_name,
        'CV F1': f"{cv_metrics.get('eval_f1_mean', 0):.4f}±{cv_metrics.get('eval_f1_std', 0):.4f}",
        'CV Precision': f"{cv_metrics.get('eval_precision_mean', 0):.4f}±{cv_metrics.get('eval_precision_std', 0):.4f}",
        'CV Recall': f"{cv_metrics.get('eval_recall_mean', 0):.4f}±{cv_metrics.get('eval_recall_std', 0):.4f}",
        'CV Accuracy': f"{cv_metrics.get('eval_accuracy_mean', 0):.4f}±{cv_metrics.get('eval_accuracy_std', 0):.4f}",
        'Test F1': f"{test_metrics.get('eval_f1_mean', 0):.4f}±{test_metrics.get('eval_f1_std', 0):.4f}",
        'Test Precision': f"{test_metrics.get('eval_precision_mean', 0):.4f}±{test_metrics.get('eval_precision_std', 0):.4f}",
        'Test Recall': f"{test_metrics.get('eval_recall_mean', 0):.4f}±{test_metrics.get('eval_recall_std', 0):.4f}",
        'Test Accuracy': f"{test_metrics.get('eval_accuracy_mean', 0):.4f}±{test_metrics.get('eval_accuracy_std', 0):.4f}",
    }
    results_data.append(row)

# Create DataFrame
df_results = pd.DataFrame(results_data)

print("="*120)
print("FINAL RESULTS SUMMARY")
print("="*120)
print(df_results.to_string(index=False))
print("="*120)

# Save to CSV
df_results.to_csv('/content/results_summary.csv', index=False)
print("\n Results summary saved to /content/results_summary.csv")

# Also create a LaTeX table for papers
latex_table = df_results.to_latex(index=False, escape=False)
with open('/content/results_summary.tex', 'w') as f:
    f.write(latex_table)
print(" LaTeX table saved to /content/results_summary.tex")

# Warning if metrics are all zero
# Skip the warning when no results were loaded: all() of an empty list is True
if results_data and all(row['CV F1'].startswith('0.0000') for row in results_data):
    print("\n" + "="*120)
    print(" CRITICAL WARNING: All F1 scores are 0.0!")
    print("="*120)
    print("\nPossible issues:")
    print(" 1. Label mismatch - check if label encoding matches during train/eval")
    print(" 2. All predictions are 'O' class - model only predicting 'Outside'")
    print(" 3. Tokenization issue - labels not aligned with tokens")
    print(" 4. Check training logs in GCS for actual errors")
    print(" 5. Verify preprocessor outputs match model expectations")
    print("\nNext steps:")
    print(" - Check GCS logs: gs://" + GCS_BUCKET_NAME + "/experiments/approach*/fold_*/")
    print(" - Run inference test (next cell) to see actual predictions")
    print(" - Verify training completed without errors")
    print("="*120)
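
To check the "all predictions are 'O'" hypothesis directly, you can count how often each tag is predicted. The sketch below assumes predictions are available as lists of tag strings, one list per message; the tag name `B-BRAND` is a hypothetical example and not taken from this project's label set.

```python
from collections import Counter

def label_distribution(predicted_sequences):
    """Return each predicted tag's share of all predicted tokens."""
    counts = Counter(tag for seq in predicted_sequences for tag in seq)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}

# Illustrative predictions only.
predictions = [["O", "O", "B-BRAND", "O"], ["O", "O", "O", "O"]]
dist = label_distribution(predictions)
print(f"Share of 'O' predictions: {dist.get('O', 0.0):.2%}")
```

A share of exactly 100% for `O` points at a collapsed model, whereas a mix of tags with zero F1 points more toward a label-mapping or alignment problem.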


### Run Inference on Sample Messages

Test the trained models on some example SMS messages.

In [None]:
# Load best models and test on sample messages
from src.models import EntityNERModel, ClaimNERModel

# Sample phishing messages for testing
test_messages = [
 "URGENT: Your Amazon package #AB123 is delayed. Click bit.ly/pkg123 to reschedule within 24h or it will be returned.",
 "Your PayPal account has been locked due to suspicious activity. Call 1-800-FAKE-NUM immediately to verify your identity.",
 "WINNER! You've won $5000 in our lottery. Reply YES with your bank details to claim your prize now!",
 "Hi mom, this is my new number. Can you send me $200? My bank card is blocked and I need it urgently.",
]

print("="*80)
print("TESTING TRAINED MODELS ON SAMPLE MESSAGES")
print("="*80)

# Test Entity-NER model
print("\n" + "="*80)
print("APPROACH 1: Entity-NER Model")
print("="*80)

entity_model = EntityNERModel({'model_name': 'roberta-base'})
# Load best model from Fold 1 (you can choose any fold)
entity_model.load('/content/results/approach1_entity_ner/fold_1/model')

for i, msg in enumerate(test_messages, 1):
    print(f"\n Message {i}:")
    print(f" {msg}")

    result = entity_model.predict(msg)
    print(f"\n Extracted Entities:")
    for entity in result['entities']:
        print(f" • {entity['label']}: '{entity['text']}'")

# Test Claim-NER model
print("\n\n" + "="*80)
print("APPROACH 2: Claim-NER Model")
print("="*80)

claim_model = ClaimNERModel({'model_name': 'roberta-base'})
claim_model.load('/content/results/approach2_claim_ner/fold_1/model')

for i, msg in enumerate(test_messages, 1):
    print(f"\n Message {i}:")
    print(f" {msg}")

    result = claim_model.predict(msg)
    print(f"\n Extracted Claims:")
    if result['claims']:
        for claim in result['claims']:
            print(f" • {claim}")
    else:
        print(f" (No claims detected)")

print("\n" + "="*80)
print(" Inference testing complete!")
print("="*80)
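
If the NER models emit BIO tags (an assumption, suggested by the `O` class mentioned earlier and the `seqeval` dependency), turning token-level tags into entity spans could look like the sketch below. This is not necessarily how `predict` builds its `entities` list, and the tag names are hypothetical.

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (label, text) spans."""
    spans = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, tag_label = tag.partition("-")
        if prefix == "I" and tag_label == label:
            words.append(token)  # continue the open span
            continue
        if words:
            spans.append((label, " ".join(words)))  # close the open span
        if prefix in ("B", "I"):
            label, words = tag_label, [token]  # start a new span (a stray I- tag starts one too)
        else:
            label, words = None, []
    if words:
        spans.append((label, " ".join(words)))
    return spans

tokens = ["Your", "PayPal", "account", "is", "locked", "call", "1-800-FAKE-NUM"]
tags = ["O", "B-BRAND", "O", "O", "O", "O", "B-PHONE"]
print(decode_bio(tokens, tags))  # [('BRAND', 'PayPal'), ('PHONE', '1-800-FAKE-NUM')]
```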

### Upload Plots and Results Back to GCS

Save all generated visualizations and summaries back to your GCS bucket.

In [None]:
# Upload all generated visualizations and summaries to GCS
!mkdir -p /content/final_results

# Copy all generated files
!cp /content/model_comparison.png /content/final_results/
!cp /content/kfold_variance.png /content/final_results/
!cp /content/results_summary.csv /content/final_results/
!cp /content/results_summary.tex /content/final_results/

# Upload to GCS
print(" Uploading results to GCS...")
!gsutil -m cp -r /content/final_results gs://{GCS_BUCKET_NAME}/analysis/

print("\n All results uploaded to GCS!")
print(f"\nView results at:")
print(f" https://console.cloud.google.com/storage/browser/{GCS_BUCKET_NAME}/analysis/final_results/")
print(f"\nDownload with:")
print(f" gsutil -m cp -r gs://{GCS_BUCKET_NAME}/analysis/final_results ./local_results/")

### Download Everything Locally (Optional)

Create a zip file with all results for easy download.

In [None]:
# Create a comprehensive zip file with everything
!mkdir -p /content/complete_results
!cp -r /content/results/* /content/complete_results/
!cp -r /content/final_results/* /content/complete_results/

# Zip it all
!cd /content && zip -r complete_results.zip complete_results/

print(" Created complete_results.zip")
print("\nContents:")
!zipinfo -1 /content/complete_results.zip | head -20

# Download to your local machine
from google.colab import files
print("\n Downloading zip file to your computer...")
files.download('/content/complete_results.zip')

print("\n" + "="*80)
print(" ALL DONE!")
print("="*80)
print("\nYou now have:")
print(" K-Fold cross-validation results")
print(" Test set evaluation (blind)")
print(" Comparison plots")
print(" Variance analysis")
print(" Results summary table (CSV + LaTeX)")
print(" Sample inference examples")
print(" Everything backed up to GCS")
print("\nNext steps:")
print(" 1. Review the plots and tables")
print(" 2. Write up your research findings")
print(" 3. If needed, train Approach 4 (Contrastive) using same process")
print("="*80)

## Approach 4: Contrastive Learning

In [None]:
!python train_kfold.py --config configs/contrastive.yaml --n_folds 5

# Already saved to GCS automatically!

## Approach 3a: Hybrid Entity + LLM (Inference Only)

In [None]:
# Set OpenAI API key
import os
os.environ['OPENAI_API_KEY'] = 'YOUR_API_KEY_HERE' # Replace with your key

!python inference.py --config configs/hybrid_llm.yaml --model /content/gcs_bucket/experiments/approach1_entity_ner/best_model

# Results saved to GCS automatically!
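
Pasting the key into a cell leaves it in the saved notebook. A sketch of an alternative that prompts for the key at run time follows; `ensure_api_key` is a hypothetical helper, not part of the repository. Commands started with `!` inherit the notebook's environment, so `inference.py` would see the variable.

```python
import os
from getpass import getpass

def ensure_api_key(var_name="OPENAI_API_KEY", placeholder="YOUR_API_KEY_HERE"):
    """Prompt for the key only if the variable is unset or still holds the placeholder."""
    current = os.environ.get(var_name, "")
    if not current or current == placeholder:
        # getpass hides the typed value, so it never appears in the cell output.
        os.environ[var_name] = getpass(f"Enter {var_name}: ")
    return os.environ[var_name]

# Call once before running inference.py:
# ensure_api_key()
```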

## Approach 3b: Hybrid Claim + LLM (Inference Only)

In [None]:
!python inference.py --config configs/hybrid_claim_llm.yaml --model /content/gcs_bucket/experiments/approach2_claim_ner/best_model

# Results saved to GCS automatically!

## Compare Results

In [None]:
!python scripts/compare_models.py

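# NOTE: save_all_results() is not defined anywhere in this notebook (it appears to be
# left over from a Drive-based workflow), so calling it raises NameError.
# Define the helper first if you still need it.
# save_all_results()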

## Download Results

In [None]:
# List all experiments in GCS
list_gcs_experiments()

# Optional: Download results locally and zip
print("\nDownloading all results from GCS...")
download_experiment_results('approach1_entity_ner', './results')
download_experiment_results('approach2_claim_ner', './results')
download_experiment_results('approach4_contrastive', './results')

!zip -r results.zip results/
from google.colab import files
files.download('results.zip')

print(f'\n[COMPLETE] All experiments saved to GCS!')
print(f'View in console: https://console.cloud.google.com/storage/browser/{GCS_BUCKET_NAME}/experiments/')
print(f'\nTo view results directly: gsutil ls -r gs://{GCS_BUCKET_NAME}/experiments/')