# DeBERTa + Qwen Mega-Ensemble - Strategy A

## Overview
This notebook combines the best elements from the two highest-performing approaches:
- **Experiment 7** (DeBERTa ensemble - 0.917 AUC)
- **Experiment 1** (Qwen ensemble - 0.916 AUC)

## Strategy: Mega-Ensemble
Instead of picking one approach, we create a **mega-ensemble** of 6 models:

**From Experiment 7 (DeBERTa approach):**
1. DeBERTa v3 base
2. DistilRoBERTa
3. DeBERTa AUC variant

**From Experiment 1 (Qwen approach):**
4. Qwen 0.5B with LoRA
5. Qwen 14B with LoRA

**Shared (both use this):**
6. Qwen3 Embeddings with semantic search

**Total: 6 distinct model predictions**

## Ensemble Strategy
We'll use a **two-stage ensemble**:

### Stage 1: Within-Method Ensembles
- **DeBERTa ensemble**: Blend DeBERTa v3, DistilRoBERTa, DeBERTa AUC (weights from Exp 7)
- **Qwen ensemble**: Blend Qwen 0.5B, Qwen 14B, Qwen3 Embeddings (weights from Exp 1)
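
The DeBERTa-only weights used in Stage 1 of this notebook (0.625, 0.125, 0.25) come from dropping the two Qwen terms from Experiment 7's original blend and rescaling what remains so it sums to 1.0. A minimal sketch of that arithmetic, where `renormalize` is a hypothetical helper name and the dictionary keys are illustrative labels:

```python
def renormalize(weights, keep):
    """Keep only the named weights and rescale them to sum to 1.0."""
    kept = {name: weights[name] for name in keep}
    total = sum(kept.values())
    return {name: w / total for name, w in kept.items()}

# Original Experiment 7 blend, as quoted in this notebook's Stage 1 comments
exp7_weights = {
    "deberta": 0.5,
    "distilroberta": 0.1,
    "deberta_auc": 0.2,
    "qwen3": 0.1,
    "qwen14b": 0.1,
}

deberta_only = renormalize(exp7_weights, ["deberta", "distilroberta", "deberta_auc"])
print(deberta_only)
```

Rescaling this way preserves the relative importance of the three DeBERTa-side models (5 : 1 : 2).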

### Stage 2: Meta-Ensemble
- Blend the two ensemble outputs with a weighted average of their rank-normalized scores (weights are hand-set, not tuned)
- Expected best weights: 55% DeBERTa (slightly better) + 45% Qwen
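
To see roughly how much each of the six models contributes to the final score, the Stage 1 and Stage 2 weights can be multiplied through. This is only an approximation: the notebook rank-normalizes each Stage 1 blend again before Stage 2, which is a non-linear step, so the true influence of each model will differ somewhat from these products. The dictionary keys are illustrative labels.

```python
# Stage 1 weights used in this notebook
deberta_stage1 = {"deberta": 0.625, "distilroberta": 0.125, "deberta_auc": 0.25}
qwen_stage1 = {"qwen_05b": 0.5, "qwen3_embeddings": 0.3, "qwen_14b": 0.2}

# Stage 2 weights
deberta_weight, qwen_weight = 0.55, 0.45

# Approximate per-model weight = stage 2 weight * stage 1 weight
effective = {}
for name, w in deberta_stage1.items():
    effective[name] = deberta_weight * w
for name, w in qwen_stage1.items():
    effective[name] = qwen_weight * w

# Print from largest to smallest contribution
for name, w in sorted(effective.items(), key=lambda kv: -kv[1]):
    print(f"{name:18s} {w:.4f}")
print(f"{'total':18s} {sum(effective.values()):.4f}")
```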

## Expected Performance
- **Target**: 0.918-0.920 AUC (an estimate, not yet measured)
- **Rationale**: The two approaches use different model families (encoder classifiers vs. LoRA-tuned LLMs), so blending them should capture complementary patterns
- **Risk**: Low (both base approaches are proven)
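
Whether the two approaches are actually complementary can be checked before blending by measuring how similarly they rank the test rows. The sketch below computes a Spearman correlation as the Pearson correlation of ranks; `rank_correlation` is a hypothetical helper and the two small series are made-up stand-ins for the Stage 1 outputs. A value very close to 1.0 would suggest the blend has little to gain.

```python
import pandas as pd

def rank_correlation(a, b):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return a.rank(method="average").corr(b.rank(method="average"))

# Made-up scores standing in for the two ensembles' predictions
ens_a = pd.Series([0.10, 0.80, 0.35, 0.60, 0.95])
ens_b = pd.Series([0.20, 0.70, 0.50, 0.40, 0.90])

print(f"Spearman correlation: {rank_correlation(ens_a, ens_b):.3f}")
```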

## Acknowledgments

This work combines insights from:

**Experiment 7: DeBERTa Large 2epochs 1hr (0.917 AUC)**
- Author: [itahiro](https://www.kaggle.com/itahiro)
- Notebook: https://www.kaggle.com/code/itahiro/deberta-large-2epochs-1hr
- Contribution: DeBERTa ensemble architecture, URL semantic extraction

**Experiment 1: Qwen Multi-Model Ensemble (0.916 AUC)**
- Internal experiments with Qwen 0.5B, 14B, and embeddings
- Contribution: LoRA fine-tuning approach, semantic search methodology

## Implementation Note

Since both Experiment 7 and Experiment 1 notebooks already contain complete implementations for their respective approaches, this notebook focuses on the **meta-ensemble** stage.

### Prerequisites:
1. Run Experiment 7 notebook to generate:
   - `submission_deberta.csv`
   - `submission_distilroberta.csv`
   - `submission_debertaauc.csv`
   - `submission_qwen3.csv` (from semantic search)
   - `submission_qwen14b.csv`

2. Run Experiment 1 notebook components to generate:
   - `submission_qwen05b.csv` (Qwen 0.5B with LoRA)

### This Notebook:
Combines all 6 predictions using a two-stage ensemble approach.
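
Because the blend adds prediction columns position by position, every file needs the same `row_id` values in the same order. A minimal sketch of a pre-flight check, assuming each file has the `row_id` and `rule_violation` columns used in the cells below; `check_alignment` is a hypothetical helper and the demo frames are made up.

```python
import pandas as pd

def check_alignment(frames):
    """Raise ValueError if any frame's columns or row_id order differ from the first."""
    names = list(frames)
    reference = frames[names[0]]
    for name in names:
        df = frames[name]
        missing = {"row_id", "rule_violation"} - set(df.columns)
        if missing:
            raise ValueError(f"{name}: missing columns {sorted(missing)}")
        if df["rule_violation"].isna().any():
            raise ValueError(f"{name}: contains NaN predictions")
        # Compare by position, ignoring the index labels
        same_order = df["row_id"].reset_index(drop=True).equals(
            reference["row_id"].reset_index(drop=True)
        )
        if not same_order:
            raise ValueError(f"{name}: row_id order differs from {names[0]}")

# Made-up frames for illustration
a = pd.DataFrame({"row_id": [1, 2, 3], "rule_violation": [0.1, 0.7, 0.4]})
b = pd.DataFrame({"row_id": [1, 2, 3], "rule_violation": [0.2, 0.6, 0.5]})
check_alignment({"a": a, "b": b})  # passes silently
```

If the check fails, one option is to sort each frame by `row_id` (or merge on it) before blending.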

In [None]:
import pandas as pd
import numpy as np

# Load all model predictions
print("Loading model predictions...")

# DeBERTa-based models (from Experiment 7)
deberta = pd.read_csv('submission_deberta.csv')
distilroberta = pd.read_csv('submission_distilroberta.csv')
deberta_auc = pd.read_csv('submission_debertaauc.csv')

# Qwen-based models
qwen_05b = pd.read_csv('submission_qwen05b.csv')  # Qwen 0.5B with LoRA (from Experiment 1)
qwen_14b = pd.read_csv('submission_qwen14b.csv')  # generated by the Experiment 7 run (see prerequisites)

# Shared semantic search (used by both)
qwen3_embeddings = pd.read_csv('submission_qwen3.csv')

print("All predictions loaded successfully!")
print(f"Number of test samples: {len(deberta)}")

In [None]:
# Stage 1: Create within-method ensembles
print("\n=== Stage 1: Within-Method Ensembles ===")

# Rank normalization function
def rank_normalize(series):
    return series.rank(method='average') / (len(series) + 1)

# DeBERTa Ensemble (using Experiment 7 weights: 0.5, 0.1, 0.2 for main components)
# Adjusted to exclude Qwen models: divide each weight by 0.8 so they sum to 1.0
# Original: 0.5 DeBERTa + 0.1 DistilRoBERTa + 0.2 DeBERTa AUC (+ 0.1 Qwen3 + 0.1 Qwen14B)
# New (DeBERTa only): 0.625 DeBERTa + 0.125 DistilRoBERTa + 0.25 DeBERTa AUC
r_deberta = rank_normalize(deberta['rule_violation'])
r_distilroberta = rank_normalize(distilroberta['rule_violation'])
r_deberta_auc = rank_normalize(deberta_auc['rule_violation'])

deberta_ensemble = 0.625 * r_deberta + 0.125 * r_distilroberta + 0.25 * r_deberta_auc

print("DeBERTa Ensemble created (3 models)")
print("  Weights: [0.625, 0.125, 0.25]")
print(f"  Mean: {deberta_ensemble.mean():.4f}, Std: {deberta_ensemble.std():.4f}")

# Qwen Ensemble (using Experiment 1 approach)
# Typical weights: 0.5 for 0.5B, 0.3 for embeddings, 0.2 for 14B
r_qwen_05b = rank_normalize(qwen_05b['rule_violation'])
r_qwen_14b = rank_normalize(qwen_14b['rule_violation'])
r_qwen3 = rank_normalize(qwen3_embeddings['rule_violation'])

qwen_ensemble = 0.5 * r_qwen_05b + 0.3 * r_qwen3 + 0.2 * r_qwen_14b

print("\nQwen Ensemble created (3 models)")
print("  Weights: [0.5, 0.3, 0.2]")
print(f"  Mean: {qwen_ensemble.mean():.4f}, Std: {qwen_ensemble.std():.4f}")
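
Rank normalization is what lets models with very different score scales be averaged. A small worked example using the same `rank_normalize` definition as in the cell above (repeated here so the snippet runs on its own); the input values are made up.

```python
import pandas as pd

def rank_normalize(series):
    # Average rank for ties, scaled into the open interval (0, 1)
    return series.rank(method='average') / (len(series) + 1)

# Two made-up models that agree on ordering but use different score scales
model_a = pd.Series([0.01, 0.02, 0.03, 0.90])
model_b = pd.Series([0.40, 0.55, 0.60, 0.65])

print(rank_normalize(model_a).tolist())
print(rank_normalize(model_b).tolist())
```

Both models produce identical normalized values because only the ordering matters, which is also all that AUC depends on.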

In [None]:
# Stage 2: Meta-Ensemble
print("\n=== Stage 2: Meta-Ensemble ===")

# Experiment 7 (DeBERTa) scored 0.917, Experiment 1 (Qwen) scored 0.916
# Give slightly more weight to DeBERTa: 55% vs 45%
deberta_weight = 0.55
qwen_weight = 0.45

# Rank normalize the ensemble predictions
r_deberta_ens = rank_normalize(deberta_ensemble)
r_qwen_ens = rank_normalize(qwen_ensemble)

# Final mega-ensemble
mega_ensemble = deberta_weight * r_deberta_ens + qwen_weight * r_qwen_ens

print("Meta-Ensemble Weights:")
print(f"  DeBERTa approach: {deberta_weight:.2f} (Exp 7 - 0.917 AUC)")
print(f"  Qwen approach: {qwen_weight:.2f} (Exp 1 - 0.916 AUC)")
print("\nFinal Prediction Statistics:")
print(f"  Mean: {mega_ensemble.mean():.4f}")
print(f"  Std: {mega_ensemble.std():.4f}")
print(f"  Min: {mega_ensemble.min():.4f}")
print(f"  Max: {mega_ensemble.max():.4f}")

In [None]:
# Create final submission
submission = pd.DataFrame({
    'row_id': deberta['row_id'],
    'rule_violation': mega_ensemble
})

submission.to_csv('/kaggle/working/submission.csv', index=False)

print("\n=== Submission Created ===")
print("Saved to: /kaggle/working/submission.csv")
print("\nFirst 10 rows:")
print(submission.head(10))

print("\n=== Expected Performance ===")
print("Target: 0.918-0.920 AUC")
print("Rationale: Combining two proven approaches (0.917 + 0.916) with complementary strengths")
print("Risk Level: Low (both base methods are validated)")
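
Before uploading, it can help to sanity-check the submission frame. A minimal sketch, assuming the `row_id` and `rule_violation` columns used above; `validate_submission` is a hypothetical helper and the demo frame is made up.

```python
import pandas as pd

def validate_submission(df, expected_rows=None):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if list(df.columns) != ["row_id", "rule_violation"]:
        problems.append(f"unexpected columns: {list(df.columns)}")
        return problems
    if df["row_id"].duplicated().any():
        problems.append("duplicate row_id values")
    if df["rule_violation"].isna().any():
        problems.append("NaN predictions")
    # Rank-normalized blends from this notebook should stay within [0, 1]
    if not df["rule_violation"].between(0, 1).all():
        problems.append("predictions outside [0, 1]")
    if expected_rows is not None and len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(df)}")
    return problems

demo = pd.DataFrame({"row_id": [10, 11, 12], "rule_violation": [0.25, 0.50, 0.75]})
print(validate_submission(demo, expected_rows=3))
```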

## Alternative Ensemble Weights to Try

If the 55/45 split doesn't work well, here are alternatives to test:

1. **Equal weight (50/50)**: Treats both methods equally
2. **Conservative (60/40)**: More weight to DeBERTa since it scored slightly better
3. **Aggressive (70/30)**: Heavily favor the better method

You can modify the `deberta_weight` and `qwen_weight` variables in the meta-ensemble cell above to test these alternatives.
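
If a labeled validation set is available (an assumption; this notebook only loads test predictions), the alternatives can be compared offline rather than by spending submissions. The sketch below computes AUC from ranks using the Mann-Whitney U relationship; `auc_from_scores`, `sweep_weights`, and the toy data are hypothetical.

```python
import pandas as pd

def rank_normalize(series):
    return series.rank(method="average") / (len(series) + 1)

def auc_from_scores(labels, scores):
    """AUC via the Mann-Whitney U statistic (ties get average ranks)."""
    ranks = scores.rank(method="average")
    n_pos = int((labels == 1).sum())
    n_neg = int((labels == 0).sum())
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def sweep_weights(labels, r_deberta_ens, r_qwen_ens, deberta_weights):
    """Return {deberta_weight: validation AUC} for each candidate split."""
    results = {}
    for w in deberta_weights:
        blend = w * r_deberta_ens + (1 - w) * r_qwen_ens
        results[w] = auc_from_scores(labels, blend)
    return results

# Toy validation data (made up)
labels = pd.Series([0, 0, 1, 1, 0, 1])
deberta_val = rank_normalize(pd.Series([0.2, 0.4, 0.7, 0.6, 0.5, 0.9]))
qwen_val = rank_normalize(pd.Series([0.1, 0.6, 0.5, 0.8, 0.3, 0.7]))

results = sweep_weights(labels, deberta_val, qwen_val, [0.5, 0.55, 0.6, 0.7])
for w, auc in results.items():
    print(f"DeBERTa {w:.2f} / Qwen {1 - w:.2f}: AUC = {auc:.4f}")
```

With the two approaches scoring this close to each other, gaps between splits may be within noise, so a larger validation set or cross-validation would give a more reliable comparison.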

## Summary

This mega-ensemble notebook:
- Combines 6 distinct model predictions
- Uses a two-stage ensemble approach
- Leverages diversity from both DeBERTa and Qwen methods
- Expected to achieve 0.918-0.920 AUC

The key insight: Instead of choosing between two good approaches, combine them to capture different patterns in the data.