# CCC2 - Combined Collaborative Approach (Ensemble)

## Strategy
- **Ensemble**: Combine CCA2 and CCB2 scores with a weighted sum
  - CCA2 score: Connection score (user-item embedding dot product, min-max normalized)
  - CCB2 score: Rating prediction (normalized to [0, 1])
  - final_score = α * CCA_score_norm + β * CCB_score_norm
- **Recommend**: if final_score >= threshold

## Key Features:
- ✅ Combines both perspectives simultaneously
- ✅ More flexible than Two-Stage (no hard filtering)
- ✅ Adjustable weights (α, β) for different strategies
- ✅ Single threshold for final decision

## Hyperparameters:
- `ALPHA`: Weight for CCA score (connection)
- `BETA`: Weight for CCB score (rating quality)
- `ENSEMBLE_THRESHOLD`: Final score threshold for recommendation
- `GOOD_RATING_THRESHOLD`: 4.0 (ground truth definition)

## Weight Experiments:
- (α=0.3, β=0.7): Emphasizes CCB → rating quality focus
- (α=0.5, β=0.5): Equal weights → balanced approach
- (α=0.7, β=0.3): Emphasizes CCA → connection focus
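
To see how the weights shift a decision, the sketch below applies the three weight pairs to one pair of normalized scores. The values `cca_norm = 0.333` and `ccb_norm = 0.999` come from the user 15 / item 66934 row of the sample output in Section 5; the helper name `ensemble_score` is illustrative and not part of the notebook.

```python
# Illustrative helper (not defined in the notebook): weighted sum of two normalized scores.
def ensemble_score(cca_norm, ccb_norm, alpha, beta):
    return alpha * cca_norm + beta * ccb_norm

# Normalized scores from the user 15 / item 66934 row of the sample output.
cca_norm, ccb_norm = 0.333, 0.999

scores = {}
for alpha, beta, name in [(0.3, 0.7, "CCB focus"), (0.5, 0.5, "Balanced"), (0.7, 0.3, "CCA focus")]:
    scores[name] = ensemble_score(cca_norm, ccb_norm, alpha, beta)
    print(f"{name:<10} alpha={alpha} beta={beta} final_score={scores[name]:.3f}")
```

For this pair (weak connection score, high predicted rating) the final score stays above 0.5 in all three settings, but the margin shrinks as α grows.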

## Expected Performance:
- Higher Recall than CCC1 (less conservative)
- AUC-ROC target: 0.93+
- F1 target: 0.80+

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections import defaultdict
import warnings
warnings.filterwarnings('ignore')
import time

import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score, precision_score, recall_score, f1_score

SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)

# Device setup (CUDA > MPS > CPU)
if torch.cuda.is_available():
    device = torch.device('cuda')
    print(f'Device: {device} ({torch.cuda.get_device_name()})')
elif torch.backends.mps.is_available():
    device = torch.device('mps')
    print(f'Device: {device}')
else:
    device = torch.device('cpu')
    print(f'Device: {device}')

Device: mps


## 1. Data Preprocessing

### CCC2 Strategy:
- Same as CCC1 (CCB-style split)
- Positive: Rating >= 4
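- Train on all interactions rated below 4 plus about 70% of each user's positives; the remaining positives are held out for validation and test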

In [2]:
df = pd.read_csv('../data/train.csv')

print(f"Total interactions: {len(df):,}")
print(f"Unique users: {df['user'].nunique()}")
print(f"Unique items: {df['item'].nunique()}")

# Define good purchases
GOOD_RATING_THRESHOLD = 4.0
n_good_purchases = (df['rating'] >= GOOD_RATING_THRESHOLD).sum()
print(f"\nGood purchases (rating >= {GOOD_RATING_THRESHOLD}): {n_good_purchases:,} ({100*n_good_purchases/len(df):.1f}%)")

# ID mapping
user2idx = {u: i for i, u in enumerate(sorted(df['user'].unique()))}
item2idx = {it: i for i, it in enumerate(sorted(df['item'].unique()))}
idx2user = {i: u for u, i in user2idx.items()}
idx2item = {i: it for it, i in item2idx.items()}

n_users, n_items = len(user2idx), len(item2idx)

df['user_idx'] = df['user'].map(user2idx)
df['item_idx'] = df['item'].map(item2idx)

print(f"\nUsers: {n_users}, Items: {n_items}")

Total interactions: 105,139
Unique users: 668
Unique items: 10321

Good purchases (rating >= 4.0): 51,830 (49.3%)

Users: 668, Items: 10321


In [3]:
# Compute K for each user
user_interaction_count = df.groupby('user_idx').size().to_dict()

MAX_K = 100

def get_k_for_user(count):
    if count <= 10:
        return 2
    k = max(2, int(count * 0.2))
    return min(k, MAX_K)

user_k = {u: get_k_for_user(c) for u, c in user_interaction_count.items()}
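
The K rule above gives users with at most 10 interactions the minimum K of 2, and otherwise takes 20% of the interaction count, capped at `MAX_K`. The sketch below repeats the function so it runs on its own; the interaction counts are illustrative and not taken from the dataset.

```python
MAX_K = 100  # same cap as in the cell above

def get_k_for_user(count):
    # Users with at most 10 interactions get the minimum K of 2.
    if count <= 10:
        return 2
    # Otherwise K is 20% of the interaction count (truncated), kept within [2, MAX_K].
    k = max(2, int(count * 0.2))
    return min(k, MAX_K)

# Illustrative interaction counts (not taken from the dataset).
for count in (5, 11, 53, 1000):
    print(count, get_k_for_user(count))
```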

In [4]:
# Train/Val/Test Split (same as CCC1)
train_data, val_data, test_data = [], [], []

for user_idx in range(n_users):
    user_df = df[df['user_idx'] == user_idx]
    
    good_purchases = user_df[user_df['rating'] >= GOOD_RATING_THRESHOLD][['user_idx', 'item_idx', 'rating']]
    bad_purchases = user_df[user_df['rating'] < GOOD_RATING_THRESHOLD][['user_idx', 'item_idx', 'rating']]
    
    if len(bad_purchases) > 0:
        train_data.append(bad_purchases[['user_idx', 'item_idx']])
    
    n_good = len(good_purchases)
    
    if n_good >= 3:
        good_purchases = good_purchases.sample(frac=1, random_state=SEED).reset_index(drop=True)
        train_end = int(0.7 * n_good)
        val_end = train_end + int(0.15 * n_good)
        
        train_end = max(1, train_end)
        val_end = max(train_end + 1, val_end)
        
        train_data.append(good_purchases.iloc[:train_end][['user_idx', 'item_idx']])
        val_data.append(good_purchases.iloc[train_end:val_end][['user_idx', 'item_idx']])
        test_data.append(good_purchases.iloc[val_end:][['user_idx', 'item_idx']])
    elif n_good == 2:
        good_purchases = good_purchases.sample(frac=1, random_state=SEED).reset_index(drop=True)
        train_data.append(good_purchases.iloc[:1][['user_idx', 'item_idx']])
        val_data.append(good_purchases.iloc[1:][['user_idx', 'item_idx']])
    elif n_good == 1:
        train_data.append(good_purchases[['user_idx', 'item_idx']])

train_df = pd.concat(train_data, ignore_index=True)
val_df = pd.concat(val_data, ignore_index=True) if val_data else pd.DataFrame(columns=['user_idx', 'item_idx'])
test_df = pd.concat(test_data, ignore_index=True) if test_data else pd.DataFrame(columns=['user_idx', 'item_idx'])

print(f"Train: {len(train_df):,}")
print(f"Val: {len(val_df):,}")
print(f"Test: {len(test_df):,}")

Train: 89,294
Val: 7,480
Test: 8,365
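
The per-user arithmetic of the `n_good >= 3` branch can be checked in isolation. The sketch below is a minimal reproduction of that branch only; `split_sizes` is a hypothetical helper name and the values of `n_good` are illustrative.

```python
def split_sizes(n_good):
    """Return (train, val, test) counts for a user with n_good >= 3 positives."""
    train_end = int(0.7 * n_good)
    val_end = train_end + int(0.15 * n_good)
    # Guards from the split cell: at least one train item and one validation item.
    train_end = max(1, train_end)
    val_end = max(train_end + 1, val_end)
    return train_end, val_end - train_end, n_good - val_end

# Illustrative numbers of positives per user.
for n in (3, 12, 25):
    print(n, split_sizes(n))
```

Because `int()` truncates both fractions, the remainder lands in the test split, which is consistent with the test set being larger than the validation set in the output above. A user with exactly 3 positives contributes no test item.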


In [5]:
# User train items
user_train_items = defaultdict(set)
for u, i in zip(train_df['user_idx'].values, train_df['item_idx'].values):
    user_train_items[int(u)].add(int(i))

## 2. Load Pretrained Models

In [6]:
# Model definitions

class LightGCN(nn.Module):
    def __init__(self, n_users, n_items, emb_dim=64, n_layers=2):
        super().__init__()
        self.n_users = n_users
        self.n_items = n_items
        self.emb_dim = emb_dim
        self.n_layers = n_layers
        
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        nn.init.xavier_uniform_(self.user_emb.weight)
        nn.init.xavier_uniform_(self.item_emb.weight)
    
    def forward(self, edge_index, edge_weight):
        all_emb = torch.cat([self.user_emb.weight, self.item_emb.weight], dim=0)
        embs = [all_emb]
        
        for _ in range(self.n_layers):
            row, col = edge_index
            messages = all_emb[col] * edge_weight.unsqueeze(1)
            all_emb = torch.zeros_like(all_emb).scatter_add(
                0, row.unsqueeze(1).expand(-1, self.emb_dim), messages
            )
            embs.append(all_emb)
        
        final_emb = torch.mean(torch.stack(embs), dim=0)
        return final_emb[:self.n_users], final_emb[self.n_users:]


class LightGCN_with_Rating(nn.Module):
    def __init__(self, n_users, n_items, emb_dim=64, n_layers=2):
        super().__init__()
        self.n_users = n_users
        self.n_items = n_items
        self.emb_dim = emb_dim
        self.n_layers = n_layers
        
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        nn.init.xavier_uniform_(self.user_emb.weight)
        nn.init.xavier_uniform_(self.item_emb.weight)
        
        self.rating_mlp = nn.Sequential(
            nn.Linear(emb_dim, 32),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(32, 1)
        )
    
    def forward(self, edge_index, edge_weight):
        all_emb = torch.cat([self.user_emb.weight, self.item_emb.weight], dim=0)
        embs = [all_emb]
        
        for _ in range(self.n_layers):
            row, col = edge_index
            messages = all_emb[col] * edge_weight.unsqueeze(1)
            all_emb = torch.zeros_like(all_emb).scatter_add(
                0, row.unsqueeze(1).expand(-1, self.emb_dim), messages
            )
            embs.append(all_emb)
        
        final_emb = torch.mean(torch.stack(embs), dim=0)
        return final_emb[:self.n_users], final_emb[self.n_users:]
    
    def predict_rating(self, user_idx, item_idx, edge_index, edge_weight):
        u_emb, i_emb = self.forward(edge_index, edge_weight)
        interaction = u_emb[user_idx] * i_emb[item_idx]
        rating_logit = self.rating_mlp(interaction).squeeze(-1)
        predicted_rating = torch.sigmoid(rating_logit) * 4.5 + 0.5
        return predicted_rating


print("Model classes defined")

Model classes defined


In [7]:
# Build graphs

def build_unweighted_graph():
    users = train_df['user_idx'].values
    items = train_df['item_idx'].values
    
    edge_u2i = np.array([users, items + n_users])
    edge_i2u = np.array([items + n_users, users])
    edge_index = torch.LongTensor(np.concatenate([edge_u2i, edge_i2u], axis=1))
    
    num_nodes = n_users + n_items
    deg = torch.zeros(num_nodes).scatter_add(0, edge_index[0], torch.ones(edge_index.shape[1]))
    deg_inv_sqrt = deg.pow(-0.5)
    deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
    
    edge_weight = deg_inv_sqrt[edge_index[0]] * deg_inv_sqrt[edge_index[1]]
    
    return edge_index.to(device), edge_weight.to(device)


def build_rating_weighted_graph():
    users = train_df['user_idx'].values
    items = train_df['item_idx'].values
    
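    # Look up each edge's rating with one merge instead of scanning df once per edge
    rating_lookup = df.drop_duplicates(['user_idx', 'item_idx'])[['user_idx', 'item_idx', 'rating']]
    ratings = train_df[['user_idx', 'item_idx']].merge(
        rating_lookup, on=['user_idx', 'item_idx'], how='left'
    )['rating'].fillna(3).values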
    
    rating_factors = 0.4 + 0.15 * ratings
    
    edge_u2i = np.array([users, items + n_users])
    edge_i2u = np.array([items + n_users, users])
    edge_index = torch.LongTensor(np.concatenate([edge_u2i, edge_i2u], axis=1))
    
    rating_factors_both = np.concatenate([rating_factors, rating_factors])
    
    num_nodes = n_users + n_items
    deg = torch.zeros(num_nodes).scatter_add(0, edge_index[0], torch.ones(edge_index.shape[1]))
    deg_inv_sqrt = deg.pow(-0.5)
    deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
    
    base_weight = deg_inv_sqrt[edge_index[0]] * deg_inv_sqrt[edge_index[1]]
    rating_weight = torch.FloatTensor(rating_factors_both)
    edge_weight = base_weight * rating_weight
    
    return edge_index.to(device), edge_weight.to(device)


cca_edge_index, cca_edge_weight = build_unweighted_graph()
ccb_edge_index, ccb_edge_weight = build_rating_weighted_graph()

print(f"Graphs built")

Graphs built
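
Both graph builders use the symmetric degree normalization `1 / sqrt(deg(u) * deg(i))` as the base edge weight, and the rating-weighted graph multiplies it by `0.4 + 0.15 * rating`. The sketch below works this out in plain Python on a toy graph; the three edges and their ratings are made up for illustration.

```python
import math
from collections import Counter

# Toy training edges (user, item, rating); illustrative values, not from train.csv.
edges = [("u0", "i0", 5.0), ("u0", "i1", 3.0), ("u1", "i1", 0.5)]

# Node degree = number of training edges touching the node.
deg = Counter()
for user, item, _ in edges:
    deg[user] += 1
    deg[item] += 1

weights = {}
for user, item, rating in edges:
    base = 1.0 / math.sqrt(deg[user] * deg[item])  # unweighted (CCA) edge weight
    rating_factor = 0.4 + 0.15 * rating             # rating scaling used for the CCB graph
    weights[(user, item)] = (base, base * rating_factor)
    print(user, item, f"base={base:.4f}", f"rating_weighted={base * rating_factor:.4f}")
```

The rating factor ranges from 0.475 for a rating of 0.5 to 1.15 for a rating of 5.0, so highly rated edges pass more signal during propagation than poorly rated ones.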


In [8]:
# Load models

EMB_DIM = 32
N_LAYERS = 2

cca_model = LightGCN(n_users, n_items, EMB_DIM, N_LAYERS).to(device)
cca_model.load_state_dict(torch.load('../cc_models/cca2_best.pt', map_location=device))
cca_model.eval()

ccb_model = LightGCN_with_Rating(n_users, n_items, EMB_DIM, N_LAYERS).to(device)
ccb_model.load_state_dict(torch.load('../cc_models/ccb2_best.pt', map_location=device))
ccb_model.eval()

print("Models loaded")

Models loaded


## 3. Score Normalization

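To combine CCA and CCB scores, both are rescaled to a common [0, 1] range. The CCB range is fixed by the model's output (0.5 to 5.0), while the CCA range is estimated from 1,000 randomly sampled user-item pairs, so normalized CCA scores for other pairs can fall slightly outside [0, 1]: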

In [9]:
# Get all embeddings for score range calculation
print("Calculating score ranges for normalization...")

with torch.no_grad():
    cca_u_emb, cca_i_emb = cca_model(cca_edge_index, cca_edge_weight)

# Sample 1000 user-item pairs to estimate score ranges
sample_size = 1000
sample_users = np.random.choice(n_users, sample_size)
sample_items = np.random.choice(n_items, sample_size)

cca_scores = []
ccb_ratings = []

for u, i in zip(sample_users, sample_items):
    with torch.no_grad():
        cca_score = (cca_u_emb[u] * cca_i_emb[i]).sum().item()
        cca_scores.append(cca_score)
        
        u_t = torch.tensor([u], dtype=torch.long).to(device)
        i_t = torch.tensor([i], dtype=torch.long).to(device)
        ccb_rating = ccb_model.predict_rating(u_t, i_t, ccb_edge_index, ccb_edge_weight).item()
        ccb_ratings.append(ccb_rating)

CCA_MIN = np.min(cca_scores)
CCA_MAX = np.max(cca_scores)
CCB_MIN = 0.5  # Known range: predict_rating outputs sigmoid * 4.5 + 0.5
CCB_MAX = 5.0  # Known range: upper bound of the same mapping

print(f"\nCCA score range: [{CCA_MIN:.4f}, {CCA_MAX:.4f}]")
print(f"CCB rating range: [{CCB_MIN:.4f}, {CCB_MAX:.4f}]")


def normalize_cca_score(score):
    """Normalize CCA score to [0, 1]"""
    if CCA_MAX == CCA_MIN:
        return 0.5
    return (score - CCA_MIN) / (CCA_MAX - CCA_MIN)


def normalize_ccb_rating(rating):
    """Normalize CCB rating to [0, 1]"""
    return (rating - CCB_MIN) / (CCB_MAX - CCB_MIN)


print("\nNormalization functions ready!")

Calculating score ranges for normalization...

CCA score range: [-0.6239, 3.0902]
CCB rating range: [0.5000, 5.0000]

Normalization functions ready!
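
As a worked check, the sketch below applies min-max scaling with the ranges printed above to the first `sample1.csv` row shown in Section 5 (CCA score 1.801, CCB rating 3.24). The `min_max` and `clip01` helpers are illustrative; in particular, clamping to [0, 1] is an assumption and not something the notebook does.

```python
# Ranges printed by the cell above.
CCA_MIN, CCA_MAX = -0.6239, 3.0902
CCB_MIN, CCB_MAX = 0.5, 5.0

def min_max(value, lo, hi):
    """Min-max scaling; returns 0.5 when the range is degenerate."""
    if hi == lo:
        return 0.5
    return (value - lo) / (hi - lo)

def clip01(x):
    """Optional guard (an assumption, not in the notebook): clamp to [0, 1]."""
    return min(1.0, max(0.0, x))

cca_norm = min_max(1.801, CCA_MIN, CCA_MAX)  # CCA score of the first sample1.csv row
ccb_norm = min_max(3.24, CCB_MIN, CCB_MAX)   # CCB rating of the same row
outside = min_max(3.5, CCA_MIN, CCA_MAX)     # hypothetical score above the sampled maximum

print(f"{cca_norm:.3f} {ccb_norm:.3f} {outside:.3f} {clip01(outside):.3f}")
```

The first two values match the `CCA_n` and `CCB_n` columns of that row (0.653 and 0.609). The third shows that a score above the sampled maximum normalizes to more than 1 unless it is clamped.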


## 4. Ensemble Prediction

Combine CCA and CCB scores with a weighted sum:

In [10]:
# Ensemble hyperparameters
ALPHA = 0.5  # Weight for CCA (connection)
BETA = 0.5   # Weight for CCB (rating)
ENSEMBLE_THRESHOLD = 0.5  # Threshold for final score

print(f"Ensemble Hyperparameters:")
print(f"  α (CCA weight): {ALPHA}")
print(f"  β (CCB weight): {BETA}")
print(f"  Threshold: {ENSEMBLE_THRESHOLD}")
print(f"\nStrategy: final_score = {ALPHA} * CCA_norm + {BETA} * CCB_norm")

Ensemble Hyperparameters:
  α (CCA weight): 0.5
  β (CCB weight): 0.5
  Threshold: 0.5

Strategy: final_score = 0.5 * CCA_norm + 0.5 * CCB_norm


In [11]:
def predict_ensemble(test_input_df, alpha=0.5, beta=0.5, threshold=0.5, verbose=True, show_details=False):
    """
    ★ CCC2: Ensemble Recommendation
    
    Combine CCA and CCB scores with weighted sum.
    final_score = α * normalize(CCA_score) + β * normalize(CCB_rating)
    
    Args:
        test_input_df: Test data (user, item columns)
        alpha: Weight for CCA score
        beta: Weight for CCB rating
        threshold: Final score threshold for recommendation
        verbose: Print AGENTS.md format
        show_details: Show individual scores
    
    Returns:
        results_df: DataFrame with recommendations
    """
    cca_model.eval()
    ccb_model.eval()
    
    # Get embeddings
    with torch.no_grad():
        cca_u_emb, cca_i_emb = cca_model(cca_edge_index, cca_edge_weight)
    
    results = []
    stats = {'total_o': 0, 'total_items': 0}

    for _, row in test_input_df.iterrows():
        user = row['user']
        item = row['item']
        stats['total_items'] += 1
        
        # Unknown user/item → X
        if user not in user2idx or item not in item2idx:
            results.append({
                'user': user,
                'item': item,
                'recommend': 'X',
                'reason': 'unknown'
            })
            continue
        
        user_idx = user2idx[user]
        item_idx = item2idx[item]
        
        # Train item → X
        if item_idx in user_train_items[user_idx]:
            results.append({
                'user': user,
                'item': item,
                'recommend': 'X',
                'reason': 'in_train'
            })
            continue
        
        # Calculate CCA score
        with torch.no_grad():
            cca_score = (cca_u_emb[user_idx] * cca_i_emb[item_idx]).sum().item()
        
        # Calculate CCB rating
        with torch.no_grad():
            u_t = torch.tensor([user_idx], dtype=torch.long).to(device)
            i_t = torch.tensor([item_idx], dtype=torch.long).to(device)
            ccb_rating = ccb_model.predict_rating(u_t, i_t, ccb_edge_index, ccb_edge_weight).item()
        
        # Normalize scores
        cca_norm = normalize_cca_score(cca_score)
        ccb_norm = normalize_ccb_rating(ccb_rating)
        
        # Ensemble score
        final_score = alpha * cca_norm + beta * ccb_norm
        
        # Decision
        if final_score >= threshold:
            recommend = 'O'
            stats['total_o'] += 1
        else:
            recommend = 'X'
        
        results.append({
            'user': user,
            'item': item,
            'recommend': recommend,
            'cca_score': cca_score,
            'ccb_rating': ccb_rating,
            'cca_norm': cca_norm,
            'ccb_norm': ccb_norm,
            'final_score': final_score
        })

    results_df = pd.DataFrame(results)
    
    # Print results
    if verbose:
        print("=" * 80)
        if show_details:
            print(f"{'user':<8} {'item':<8} {'CCA':<8} {'CCB':<8} {'CCA_n':<8} {'CCB_n':<8} {'Final':<8} {'Rec':<4}")
            for _, r in results_df.iterrows():
                if pd.notna(r.get('cca_score')):  # rows skipped as unknown/in_train have no scores
                    print(f"{r['user']:<8} {r['item']:<8} {r['cca_score']:<8.3f} {r['ccb_rating']:<8.2f} "
                          f"{r['cca_norm']:<8.3f} {r['ccb_norm']:<8.3f} {r['final_score']:<8.3f} {r['recommend']:<4}")
                else:
                    print(f"{r['user']:<8} {r['item']:<8} {'N/A':<8} {'N/A':<8} {'N/A':<8} {'N/A':<8} {'N/A':<8} {r['recommend']:<4}")
        else:
            print(f"{'user':<10} {'item':<10} {'recommend':<10}")
            for _, r in results_df.iterrows():
                print(f"{r['user']:<10} {r['item']:<10} {r['recommend']:<10}")
        print("=" * 80)
        print(f"Total recommends = {stats['total_o']}/{stats['total_items']}")
        print(f"Not recommend = {stats['total_items'] - stats['total_o']}/{stats['total_items']}")
        print()

    return results_df


print("Ensemble prediction function ready!")

Ensemble prediction function ready!


## 5. Sample Prediction Test

In [12]:
# Test with sample1.csv
sample1 = pd.read_csv('../data/sample1.csv')

print("Sample1.csv Test (CCC2 - Ensemble):")
print(f"α={ALPHA}, β={BETA}, threshold={ENSEMBLE_THRESHOLD}")
print()
predictions1 = predict_ensemble(sample1, ALPHA, BETA, ENSEMBLE_THRESHOLD, verbose=True, show_details=True)

Sample1.csv Test (CCC2 - Ensemble):
α=0.5, β=0.5, threshold=0.5

user     item     CCA      CCB      CCA_n    CCB_n    Final    Rec 
109      3745     1.801    3.24     0.653    0.609    0.631    O   
88       4447     1.672    3.21     0.618    0.602    0.610    O   
71       4306     1.856    4.73     0.668    0.939    0.804    O   
66       1747     1.675    3.45     0.619    0.656    0.638    O   
15       66934    0.612    4.99     0.333    0.999    0.666    O   
Total recommends = 5/5
Not recommend = 0/5



In [13]:
# Test with sample2.csv
sample2 = pd.read_csv('../data/sample2.csv')

print("Sample2.csv Test (CCC2 - Ensemble):")
print(f"α={ALPHA}, β={BETA}, threshold={ENSEMBLE_THRESHOLD}")
print()
predictions2 = predict_ensemble(sample2, ALPHA, BETA, ENSEMBLE_THRESHOLD, verbose=True, show_details=True)

Sample2.csv Test (CCC2 - Ensemble):
α=0.5, β=0.5, threshold=0.5

user     item     CCA      CCB      CCA_n    CCB_n    Final    Rec 
109.0    3745.0   1.801    3.24     0.653    0.609    0.631    O   
88.0     4447.0   1.672    3.21     0.618    0.602    0.610    O   
71.0     4306.0   1.856    4.73     0.668    0.939    0.804    O   
66.0     1747.0   1.675    3.45     0.619    0.656    0.638    O   
15.0     66934.0  0.612    4.99     0.333    0.999    0.666    O   
Total recommends = 5/5
Not recommend = 0/5



## 6. Validation Evaluation

In [14]:
# Convert val_df to test format
val_test_df = val_df.copy()
val_test_df['user'] = val_test_df['user_idx'].map(idx2user)
val_test_df['item'] = val_test_df['item_idx'].map(idx2item)
val_test_df = val_test_df[['user', 'item']]

print(f"Evaluating on validation set: {len(val_test_df)} samples")

val_predictions = predict_ensemble(val_test_df, ALPHA, BETA, ENSEMBLE_THRESHOLD, verbose=False)

# All items in val are positive (rating >= 4), so precision is 1.0 whenever anything is recommended and accuracy equals recall
val_labels = np.ones(len(val_predictions))
val_preds = (val_predictions['recommend'] == 'O').astype(int).values

val_acc = (val_preds == val_labels).mean()
val_prec = precision_score(val_labels, val_preds, zero_division=0)
val_rec = recall_score(val_labels, val_preds, zero_division=0)
val_f1 = f1_score(val_labels, val_preds, zero_division=0)

print(f"\nValidation Performance (CCC2 - Ensemble):")
print(f"  Accuracy: {val_acc:.4f}")
print(f"  Precision: {val_prec:.4f}")
print(f"  Recall: {val_rec:.4f}")
print(f"  F1 Score: {val_f1:.4f}")
print(f"  O ratio: {val_preds.mean()*100:.1f}%")

Evaluating on validation set: 7480 samples

Validation Performance (CCC2 - Ensemble):
  Accuracy: 0.7876
  Precision: 1.0000
  Recall: 0.7876
  F1 Score: 0.8812
  O ratio: 78.8%
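
Because the validation and test sets contain only positives, there are no false positives, so precision is fixed at 1.0 and F1 reduces to `2R / (1 + R)` for recall `R`. The sketch below reproduces the reported F1 values from the reported recalls; the function name is illustrative.

```python
def f1_when_all_labels_positive(recall):
    """F1 with precision fixed at 1.0, which holds when every label is positive."""
    precision = 1.0
    return 2 * precision * recall / (precision + recall)

val_f1_check = f1_when_all_labels_positive(0.7876)   # validation recall reported above
test_f1_check = f1_when_all_labels_positive(0.7811)  # test recall reported in Section 7
print(f"{val_f1_check:.4f} {test_f1_check:.4f}")  # prints "0.8812 0.8771"
```

These figures therefore measure recall on held-out positives only. The AUC-ROC evaluation in Section 8 adds sampled negatives and gives a more informative picture of ranking quality.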


## 7. Test Evaluation

In [15]:
# Convert test_df to test format
test_test_df = test_df.copy()
test_test_df['user'] = test_test_df['user_idx'].map(idx2user)
test_test_df['item'] = test_test_df['item_idx'].map(idx2item)
test_test_df = test_test_df[['user', 'item']]

print(f"Evaluating on test set: {len(test_test_df)} samples")

test_predictions = predict_ensemble(test_test_df, ALPHA, BETA, ENSEMBLE_THRESHOLD, verbose=False)

test_labels = np.ones(len(test_predictions))
test_preds = (test_predictions['recommend'] == 'O').astype(int).values

test_acc = (test_preds == test_labels).mean()
test_prec = precision_score(test_labels, test_preds, zero_division=0)
test_rec = recall_score(test_labels, test_preds, zero_division=0)
test_f1 = f1_score(test_labels, test_preds, zero_division=0)

print(f"\nTest Performance (CCC2 - Ensemble):")
print(f"  Accuracy: {test_acc:.4f}")
print(f"  Precision: {test_prec:.4f}")
print(f"  Recall: {test_rec:.4f}")
print(f"  F1 Score: {test_f1:.4f}")
print(f"  O ratio: {test_preds.mean()*100:.1f}%")

Evaluating on test set: 8365 samples

Test Performance (CCC2 - Ensemble):
  Accuracy: 0.7811
  Precision: 1.0000
  Recall: 0.7811
  F1 Score: 0.8771
  O ratio: 78.1%


## 8. AUC-ROC Evaluation

In [16]:
print("Calculating AUC-ROC with negative samples...")

# Positive and negative samples
val_pos_users = val_df['user_idx'].values
val_pos_items = val_df['item_idx'].values

val_test_edges = set()
for u, i in zip(val_df['user_idx'].values, val_df['item_idx'].values):
    val_test_edges.add((int(u), int(i)))
for u, i in zip(test_df['user_idx'].values, test_df['item_idx'].values):
    val_test_edges.add((int(u), int(i)))

n_neg = len(val_df)
neg_users, neg_items = [], []
attempts = 0
max_attempts = n_neg * 100

while len(neg_users) < n_neg and attempts < max_attempts:
    u = np.random.randint(0, n_users)
    i = np.random.randint(0, n_items)
    attempts += 1
    
    if i not in user_train_items[u] and (u, i) not in val_test_edges:
        neg_users.append(u)
        neg_items.append(i)

# Score positive samples
pos_scores = []
with torch.no_grad():
    cca_u_emb, cca_i_emb = cca_model(cca_edge_index, cca_edge_weight)

for u_idx, i_idx in zip(val_pos_users, val_pos_items):
    with torch.no_grad():
        cca_score = (cca_u_emb[u_idx] * cca_i_emb[i_idx]).sum().item()
        u_t = torch.tensor([u_idx], dtype=torch.long).to(device)
        i_t = torch.tensor([i_idx], dtype=torch.long).to(device)
        ccb_rating = ccb_model.predict_rating(u_t, i_t, ccb_edge_index, ccb_edge_weight).item()
    
    cca_norm = normalize_cca_score(cca_score)
    ccb_norm = normalize_ccb_rating(ccb_rating)
    final_score = ALPHA * cca_norm + BETA * ccb_norm
    pos_scores.append(final_score)

# Score negative samples
neg_scores = []
for u_idx, i_idx in zip(neg_users, neg_items):
    with torch.no_grad():
        cca_score = (cca_u_emb[u_idx] * cca_i_emb[i_idx]).sum().item()
        u_t = torch.tensor([u_idx], dtype=torch.long).to(device)
        i_t = torch.tensor([i_idx], dtype=torch.long).to(device)
        ccb_rating = ccb_model.predict_rating(u_t, i_t, ccb_edge_index, ccb_edge_weight).item()
    
    cca_norm = normalize_cca_score(cca_score)
    ccb_norm = normalize_ccb_rating(ccb_rating)
    final_score = ALPHA * cca_norm + BETA * ccb_norm
    neg_scores.append(final_score)

# AUC-ROC
all_scores = np.concatenate([pos_scores, neg_scores])
all_labels = np.concatenate([np.ones(len(pos_scores)), np.zeros(len(neg_scores))])

val_auc = roc_auc_score(all_labels, all_scores)

print(f"\nValidation AUC-ROC: {val_auc:.4f}")
print(f"  Positive scores: mean={np.mean(pos_scores):.4f}, std={np.std(pos_scores):.4f}")
print(f"  Negative scores: mean={np.mean(neg_scores):.4f}, std={np.std(neg_scores):.4f}")

Calculating AUC-ROC with negative samples...

Validation AUC-ROC: 0.9366
  Positive scores: mean=0.6099, std=0.1618
  Negative scores: mean=0.2630, std=0.1249
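
AUC-ROC can be read as the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counted as half. The sketch below computes it by brute force on toy scores to make that reading concrete; `pairwise_auc` is a hypothetical helper, and `roc_auc_score` remains the practical choice for real data.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy ensemble scores (illustrative, not from the notebook).
toy_pos = [0.8, 0.6, 0.4]
toy_neg = [0.5, 0.3, 0.1]
print(f"{pairwise_auc(toy_pos, toy_neg):.4f}")  # 8 of 9 pairs are ranked correctly
```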


## 9. Weight Experiments

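Compare different α, β combinations on the first 1,000 validation positives and the first 1,000 sampled negatives, classifying with a fixed threshold of 0.5: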

In [17]:
# Experiment with different weights
weight_configs = [
    (0.3, 0.7, "CCB focus"),
    (0.5, 0.5, "Balanced"),
    (0.7, 0.3, "CCA focus")
]

print("Weight Experiment Results:")
print("=" * 80)
print(f"{'Config':<15} {'α':<6} {'β':<6} {'AUC':<8} {'Prec':<8} {'Rec':<8} {'F1':<8}")
print("=" * 80)

for alpha, beta, name in weight_configs:
    # Recalculate scores with new weights
    pos_scores_exp = []
    for u_idx, i_idx in zip(val_pos_users[:1000], val_pos_items[:1000]):  # Sample for speed
        with torch.no_grad():
            cca_score = (cca_u_emb[u_idx] * cca_i_emb[i_idx]).sum().item()
            u_t = torch.tensor([u_idx], dtype=torch.long).to(device)
            i_t = torch.tensor([i_idx], dtype=torch.long).to(device)
            ccb_rating = ccb_model.predict_rating(u_t, i_t, ccb_edge_index, ccb_edge_weight).item()
        
        cca_norm = normalize_cca_score(cca_score)
        ccb_norm = normalize_ccb_rating(ccb_rating)
        final_score = alpha * cca_norm + beta * ccb_norm
        pos_scores_exp.append(final_score)
    
    neg_scores_exp = []
    for u_idx, i_idx in zip(neg_users[:1000], neg_items[:1000]):
        with torch.no_grad():
            cca_score = (cca_u_emb[u_idx] * cca_i_emb[i_idx]).sum().item()
            u_t = torch.tensor([u_idx], dtype=torch.long).to(device)
            i_t = torch.tensor([i_idx], dtype=torch.long).to(device)
            ccb_rating = ccb_model.predict_rating(u_t, i_t, ccb_edge_index, ccb_edge_weight).item()
        
        cca_norm = normalize_cca_score(cca_score)
        ccb_norm = normalize_ccb_rating(ccb_rating)
        final_score = alpha * cca_norm + beta * ccb_norm
        neg_scores_exp.append(final_score)
    
    all_scores_exp = np.concatenate([pos_scores_exp, neg_scores_exp])
    all_labels_exp = np.concatenate([np.ones(len(pos_scores_exp)), np.zeros(len(neg_scores_exp))])
    
    auc_exp = roc_auc_score(all_labels_exp, all_scores_exp)
    
    # Use threshold = 0.5 for classification
    preds_exp = (all_scores_exp > 0.5).astype(int)
    prec_exp = precision_score(all_labels_exp, preds_exp, zero_division=0)
    rec_exp = recall_score(all_labels_exp, preds_exp, zero_division=0)
    f1_exp = f1_score(all_labels_exp, preds_exp, zero_division=0)
    
    print(f"{name:<15} {alpha:<6.1f} {beta:<6.1f} {auc_exp:<8.4f} {prec_exp:<8.4f} {rec_exp:<8.4f} {f1_exp:<8.4f}")

print("=" * 80)

Weight Experiment Results:
Config          α      β      AUC      Prec     Rec      F1      
CCB focus       0.3    0.7    0.9363   0.8763   0.8570   0.8665  
Balanced        0.5    0.5    0.9536   0.9238   0.8360   0.8777  
CCA focus       0.7    0.3    0.9633   0.9538   0.7840   0.8606  
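
The precision, recall, and F1 columns depend on the classification threshold as well as on the weights. The sketch below computes the three metrics from positive and negative score lists at several thresholds, using the same strict `>` comparison as the experiment cell; the helper name and the toy scores are illustrative.

```python
def metrics_at_threshold(pos_scores, neg_scores, threshold=0.5):
    """Precision, recall, F1 when scores strictly above the threshold are recommended."""
    tp = sum(s > threshold for s in pos_scores)
    fp = sum(s > threshold for s in neg_scores)
    fn = len(pos_scores) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy ensemble scores (illustrative, not from the notebook).
toy_pos = [0.8, 0.6, 0.4]
toy_neg = [0.55, 0.3, 0.1]
for thr in (0.3, 0.5, 0.7):
    p, r, f = metrics_at_threshold(toy_pos, toy_neg, thr)
    print(f"threshold={thr} precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Raising the threshold trades recall for precision, the same trade-off the table above shows when moving from the CCB-focused to the CCA-focused weights.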


## 10. Final Summary

In [18]:
print("="*70)
print("CCC2 - Combined Collaborative Approach (Ensemble)")
print("="*70)

print(f"\nStrategy:")
print(f"  Ensemble: final_score = α * CCA_norm + β * CCB_norm")
print(f"  Current weights: α={ALPHA}, β={BETA}")
print(f"  Threshold: {ENSEMBLE_THRESHOLD}")

print(f"\nModels:")
print(f"  CCA2: {sum(p.numel() for p in cca_model.parameters()):,} parameters")
print(f"  CCB2: {sum(p.numel() for p in ccb_model.parameters()):,} parameters")

print(f"\nValidation Performance:")
print(f"  AUC-ROC: {val_auc:.4f}")
print(f"  Accuracy: {val_acc:.4f}")
print(f"  Precision: {val_prec:.4f}")
print(f"  Recall: {val_rec:.4f}")
print(f"  F1 Score: {val_f1:.4f}")

print(f"\nTest Performance:")
print(f"  Accuracy: {test_acc:.4f}")
print(f"  Precision: {test_prec:.4f}")
print(f"  Recall: {test_rec:.4f}")
print(f"  F1 Score: {test_f1:.4f}")

print(f"\nReady for comparison with CCA2, CCB2, CCC1, and CCC3!")

CCC2 - Combined Collaborative Approach (Ensemble)

Strategy:
  Ensemble: final_score = α * CCA_norm + β * CCB_norm
  Current weights: α=0.5, β=0.5
  Threshold: 0.5

Models:
  CCA2: 351,648 parameters
  CCB2: 352,737 parameters

Validation Performance:
  AUC-ROC: 0.9366
  Accuracy: 0.7876
  Precision: 1.0000
  Recall: 0.7876
  F1 Score: 0.8812

Test Performance:
  Accuracy: 0.7811
  Precision: 1.0000
  Recall: 0.7811
  F1 Score: 0.8771

Ready for comparison with CCA2, CCB2, CCC1, and CCC3!
