# 🏥 Heart Disease Prediction - V20 Ensemble

## **Advanced 4-Submission Blend (A+B+C+D)**

### Strategy:
- **Load 4 previous submissions:** A (V16), B (V17), C, D
- **Rank-normalize all four submissions**
- **Create two groups:** (A+B) and (C+D)
- **Blend groups with multiple ratios:** CD×(0.5-0.9) + AB×(0.1-0.5)
- **Main submission:** 70% CD + 30% AB (best ratio)
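
The strategy above can be sketched end to end on toy arrays. This is a minimal illustration rather than the notebook's exact code: the helper name `two_group_blend`, the parameter `w_cd`, and the toy values are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def rank_norm(preds):
    """Map predictions to percentile ranks in (0, 1]; ties share their average rank."""
    return pd.Series(preds).rank(pct=True).to_numpy()


def two_group_blend(a, b, c, d, w_cd=0.7):
    """Hypothetical helper: average A+B and C+D in rank space, then mix the two groups."""
    ab = 0.5 * rank_norm(a) + 0.5 * rank_norm(b)
    cd = 0.5 * rank_norm(c) + 0.5 * rank_norm(d)
    return w_cd * cd + (1.0 - w_cd) * ab


# Toy predictions for four rows (made-up numbers, for illustration only)
a = np.array([0.10, 0.80, 0.40, 0.95])
b = np.array([0.20, 0.70, 0.30, 0.90])
c = np.array([0.05, 0.60, 0.65, 0.99])
d = np.array([0.15, 0.55, 0.75, 0.85])

blend = two_group_blend(a, b, c, d, w_cd=0.7)
print(blend)
```

With `w_cd=0.7`, each of C and D ends up contributing 35% of the final rank and each of A and B 15%, because every group is an equal-weight average of its two members.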

### Results:
- **Kaggle Score:** 0.95406 ⭐ Best yet!
- **Rank:** 101/3993 (top 2.5%) 🏆
- **Status:** BREAKTHROUGH!

## Step 1: Load Submissions

In [None]:
import pandas as pd
import numpy as np

TARGET = 'Heart Disease'

# Load all 4 submissions
sub_a = pd.read_csv('A.csv')  # V16: 0.95359
sub_b = pd.read_csv('B.csv')  # V17: 0.95360
sub_c = pd.read_csv('C.csv')  # Blend variant
sub_d = pd.read_csv('D.csv')  # Blend variant

print(f"A range: [{sub_a[TARGET].min():.4f}, {sub_a[TARGET].max():.4f}]")
print(f"B range: [{sub_b[TARGET].min():.4f}, {sub_b[TARGET].max():.4f}]")
print(f"C range: [{sub_c[TARGET].min():.4f}, {sub_c[TARGET].max():.4f}]")
print(f"D range: [{sub_d[TARGET].min():.4f}, {sub_d[TARGET].max():.4f}]")
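
The blending cells further down reuse `sub_a['id']` for every output file, which is only valid if all four files list the same ids in the same order. A defensive check along these lines may be worth running first; the helper name `check_aligned` and the toy frames are assumptions for illustration.

```python
import pandas as pd


def check_aligned(frames, id_col="id"):
    """Hypothetical helper: True if every frame has the same ids in the same order."""
    reference = frames[0][id_col].reset_index(drop=True)
    return all(
        reference.equals(f[id_col].reset_index(drop=True)) for f in frames[1:]
    )


# Toy frames standing in for A.csv ... D.csv (made-up values)
good_1 = pd.DataFrame({"id": [1, 2, 3], "Heart Disease": [0.1, 0.5, 0.9]})
good_2 = pd.DataFrame({"id": [1, 2, 3], "Heart Disease": [0.2, 0.4, 0.8]})
shuffled = pd.DataFrame({"id": [3, 1, 2], "Heart Disease": [0.8, 0.2, 0.4]})

print(check_aligned([good_1, good_2]))    # True
print(check_aligned([good_1, shuffled]))  # False
```

If the check fails on the real files, sorting each frame by `id` and resetting the index before blending would be one way to restore alignment.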

## Step 2: Rank Normalize & Create Groups

In [None]:
# Rank-normalize all four submissions
def rank_norm(preds):
    """Return percentile ranks in (0, 1]; tied values share their average rank."""
    return pd.Series(preds).rank(pct=True).values

a_rn = rank_norm(sub_a[TARGET].values)
b_rn = rank_norm(sub_b[TARGET].values)
c_rn = rank_norm(sub_c[TARGET].values)
d_rn = rank_norm(sub_d[TARGET].values)

print("\nRank normalized all submissions")

# Average of A + B
ab = 0.5 * a_rn + 0.5 * b_rn
print("AB (A+B average) created")

# Average of C + D
cd = 0.5 * c_rn + 0.5 * d_rn
print("CD (C+D average) created")
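
To see what rank normalization does, the sketch below applies the same `rank(pct=True)` call to two toy prediction vectors on very different scales. The values are invented for the example.

```python
import pandas as pd

# Two models that order the rows identically but use different score scales
narrow = pd.Series([0.48, 0.52, 0.50, 0.52])   # scores bunched near 0.5
wide = pd.Series([0.01, 0.99, 0.40, 0.99])     # scores spread over [0, 1]

narrow_rn = narrow.rank(pct=True)
wide_rn = wide.rank(pct=True)

print(narrow_rn.tolist())  # [0.25, 0.875, 0.5, 0.875]
print(wide_rn.tolist())    # [0.25, 0.875, 0.5, 0.875]
```

Both vectors map to the same percentile ranks, so neither model dominates an average just because its raw scores are more spread out. The two tied rows share the average of ranks 3 and 4, which is the default `method='average'` behavior of `rank`.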

## Step 3: Test Multiple Blend Ratios (CD vs AB)

In [None]:
# Test 5 different blend ratios: CD×% + AB×%
blend_options = {
    'blend_cd90_ab10': 0.9 * cd + 0.1 * ab,  # 90% CD, 10% AB
    'blend_cd80_ab20': 0.8 * cd + 0.2 * ab,  # 80% CD, 20% AB
    'blend_cd70_ab30': 0.7 * cd + 0.3 * ab,  # 70% CD, 30% AB ← Best
    'blend_cd60_ab40': 0.6 * cd + 0.4 * ab,  # 60% CD, 40% AB
    'blend_cd50_ab50': 0.5 * cd + 0.5 * ab,  # 50% CD, 50% AB
}

print("\nSaving blend options:")
for name, preds in blend_options.items():
    sub = pd.DataFrame({
        'id': sub_a['id'],
        'Heart Disease': np.clip(preds, 0, 1)
    })
    sub.to_csv(f'{name}.csv', index=False)
    print(f"  ✓ {name}.csv")
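
Because the five candidates differ only in their weights, their orderings can be very close to one another. One way to gauge how much a ratio actually changes the ordering is a Spearman correlation matrix across candidates. The sketch below uses random toy data in place of the real `cd` and `ab`; the seed, noise level, and sample size are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy stand-ins for the rank-space group averages (not the real `cd` and `ab`)
base = rng.random(1000)
cd_toy = pd.Series(base + 0.05 * rng.standard_normal(1000)).rank(pct=True)
ab_toy = pd.Series(base + 0.05 * rng.standard_normal(1000)).rank(pct=True)

# One column per blend ratio, named like the files saved above
candidates = pd.DataFrame({
    f"cd{int(round(w * 100))}_ab{int(round((1 - w) * 100))}": w * cd_toy + (1 - w) * ab_toy
    for w in (0.9, 0.8, 0.7, 0.6, 0.5)
})

# Pairwise Spearman correlation: values near 1 mean nearly identical orderings
corr = candidates.corr(method="spearman")
print(corr.round(4))
```

If every off-diagonal value is very close to 1, the candidates are nearly interchangeable in rank terms, and differences in leaderboard score between them are likely to be small.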

## Step 4: Main Submission (70/30 CD+AB)

In [None]:
# Save 70/30 as main submission (best ratio)
final = pd.DataFrame({
    'id': sub_a['id'],
    'Heart Disease': np.clip(0.7 * cd + 0.3 * ab, 0, 1)
})
final.to_csv('submission.csv', index=False)

print("\n" + "="*50)
print("SUBMISSION READY")
print("="*50)
print("\nMain: submission.csv (identical to blend_cd70_ab30.csv: 70% CD + 30% AB)")
print("\nFirst 10 predictions:")
print(final.head(10))
print(f"\nRange: [{final['Heart Disease'].min():.6f}, {final['Heart Disease'].max():.6f}]")
print("\n⭐ Expected Score: 0.95406")
print("⭐ Expected Rank: 101/3993 (2.5%)")