# Network Latency Prediction: Vertical Partitioning Approach

This notebook demonstrates the vertical partitioning strategy for network latency prediction. We'll split features into infrastructure-related (Model A) and user behavior-related (Model B) subsets, train specialized models, and fuse their predictions.

## Table of Contents
1. [Data Loading and Exploration](#data-loading)
2. [Data Preprocessing](#preprocessing)
3. [Vertical Feature Partitioning](#vertical-partitioning)
4. [Model Training](#model-training)
5. [Model Fusion](#model-fusion)
6. [Performance Evaluation](#evaluation)
7. [Results Analysis](#analysis)
8. [Conclusions](#conclusions)

## Setup and Imports

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import warnings
warnings.filterwarnings('ignore')

# Import project modules
from data_processing.data_loader import DataLoader
from data_processing.feature_engineering import FeatureEngineer
from models.infrastructure_model import InfrastructureModel
from models.user_behavior_model import UserBehaviorModel
from models.fusion_model import FusionModel
from models.monolithic_model import MonolithicModel
from evaluation.model_evaluator import ModelEvaluator

# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Configuration
DATA_FILE = 'test_data.xlsx'
TEST_SIZE = 0.2
RANDOM_STATE = 42

print("Setup complete!")

<a id="data-loading"></a>
## 1. Data Loading and Exploration

Let's start by loading our network latency dataset and exploring its structure.

In [None]:
# Initialize data loader
data_loader = DataLoader()

# Load the dataset
print("Loading dataset...")
raw_data = data_loader.load_dataset(DATA_FILE)

print(f"Dataset loaded successfully!")
print(f"Shape: {raw_data.shape}")
print(f"Columns: {list(raw_data.columns)}")

In [None]:
# Display basic information about the dataset
print("Dataset Info:")
raw_data.info()  # info() prints its report directly and returns None
print("\nFirst 5 rows:")
raw_data.head()

In [None]:
# Statistical summary
print("Statistical Summary:")
raw_data.describe()

In [None]:
# Check for missing values
print("Missing Values:")
missing_values = raw_data.isnull().sum()

if missing_values.sum() == 0:
    print("No missing values found!")
else:
    print(missing_values[missing_values > 0])

### Data Visualization

In [None]:
# Create visualizations for data exploration
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
fig.suptitle('Network Latency Dataset - Exploratory Data Analysis', fontsize=16)

# Target variable distribution
axes[0, 0].hist(raw_data['Latency (ms)'], bins=30, alpha=0.7, color='skyblue')
axes[0, 0].set_title('Latency Distribution')
axes[0, 0].set_xlabel('Latency (ms)')
axes[0, 0].set_ylabel('Frequency')

# Signal Strength distribution
axes[0, 1].hist(raw_data['Signal Strength (dBm)'], bins=30, alpha=0.7, color='lightgreen')
axes[0, 1].set_title('Signal Strength Distribution')
axes[0, 1].set_xlabel('Signal Strength (dBm)')
axes[0, 1].set_ylabel('Frequency')

# Network Traffic distribution
axes[0, 2].hist(raw_data['Network Traffic (MB)'], bins=30, alpha=0.7, color='salmon')
axes[0, 2].set_title('Network Traffic Distribution')
axes[0, 2].set_xlabel('Network Traffic (MB)')
axes[0, 2].set_ylabel('Frequency')

# User Count distribution
axes[1, 0].hist(raw_data['User Count'], bins=30, alpha=0.7, color='gold')
axes[1, 0].set_title('User Count Distribution')
axes[1, 0].set_xlabel('User Count')
axes[1, 0].set_ylabel('Frequency')

# Device Type distribution
device_counts = raw_data['Device Type'].value_counts()
axes[1, 1].bar(device_counts.index, device_counts.values, alpha=0.7, color='plum')
axes[1, 1].set_title('Device Type Distribution')
axes[1, 1].set_xlabel('Device Type')
axes[1, 1].set_ylabel('Count')
axes[1, 1].tick_params(axis='x', rotation=45)

# Location Type distribution
location_counts = raw_data['Location Type'].value_counts()
axes[1, 2].bar(location_counts.index, location_counts.values, alpha=0.7, color='lightcoral')
axes[1, 2].set_title('Location Type Distribution')
axes[1, 2].set_xlabel('Location Type')
axes[1, 2].set_ylabel('Count')

plt.tight_layout()
plt.show()

In [None]:
# Correlation analysis
# Select only numeric columns for correlation
numeric_cols = ['Signal Strength (dBm)', 'Network Traffic (MB)', 'Latency (ms)', 'User Count']
correlation_matrix = raw_data[numeric_cols].corr()

plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, 
            square=True, linewidths=0.5)
plt.title('Feature Correlation Matrix')
plt.tight_layout()
plt.show()

print("Correlation with Latency:")
latency_corr = correlation_matrix['Latency (ms)'].sort_values(key=abs, ascending=False)
for feature, corr in latency_corr.items():
    if feature != 'Latency (ms)':
        print(f"{feature}: {corr:.3f}")

<a id="preprocessing"></a>
## 2. Data Preprocessing

Now let's preprocess the data to prepare it for model training.

In [None]:
# Validate data structure
print("Validating data structure...")
is_valid = data_loader.validate_data(raw_data)

if not is_valid:
    validation_errors = data_loader.get_validation_errors()
    print("Validation issues found:")
    for error in validation_errors:
        print(f"  - {error}")
else:
    print("Data validation passed!")

In [None]:
# Preprocess the data
print("Preprocessing data...")
processed_data = data_loader.preprocess_data(raw_data)

print(f"Original data shape: {raw_data.shape}")
print(f"Processed data shape: {processed_data.shape}")
print(f"Rows removed during preprocessing: {len(raw_data) - len(processed_data)}")

# Display processed data summary
print("\nProcessed data summary:")
processed_data.describe()

In [None]:
# Split data into training and testing sets
print("Splitting data into train/test sets...")
train_data, test_data = train_test_split(
    processed_data, 
    test_size=TEST_SIZE, 
    random_state=RANDOM_STATE
)

print(f"Training set shape: {train_data.shape}")
print(f"Testing set shape: {test_data.shape}")
print(f"Training set size: {len(train_data)} samples")
print(f"Testing set size: {len(test_data)} samples")

<a id="vertical-partitioning"></a>
## 3. Vertical Feature Partitioning

In vertical partitioning, we split features into two groups:
- **Model A (Infrastructure)**: Signal Strength, Network Traffic
- **Model B (User Behavior)**: User Count, Device Type
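
The split described above amounts to selecting two column subsets that share the same rows and the same target. The following is a minimal sketch of that idea, not the actual `FeatureEngineer.split_vertical_features` implementation (which is not shown in this notebook). The helper name `split_vertical` and the toy values are hypothetical.

```python
import pandas as pd

TARGET = "Latency (ms)"
INFRA_COLS = ["Signal Strength (dBm)", "Network Traffic (MB)"]
USER_COLS = ["User Count", "Device Type"]

def split_vertical(df, include_target=True):
    """Hypothetical helper: return (infrastructure_df, user_behavior_df).

    Both outputs keep every row of df and differ only in their columns.
    """
    extra = [TARGET] if include_target else []
    return df[INFRA_COLS + extra].copy(), df[USER_COLS + extra].copy()

# Toy data with made-up values, used only to illustrate the split
toy = pd.DataFrame({
    "Signal Strength (dBm)": [-70, -85, -60],
    "Network Traffic (MB)": [120.0, 300.5, 80.2],
    "User Count": [12, 40, 5],
    "Device Type": ["Phone", "Laptop", "Tablet"],
    "Latency (ms)": [35.0, 90.0, 20.0],
})

infra_df, user_df = split_vertical(toy)
print(list(infra_df.columns))
print(list(user_df.columns))
```

Because both subsets keep the original index, predictions from the two models can later be aligned row by row.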

In [None]:
# Initialize feature engineer
feature_engineer = FeatureEngineer()

# Split features vertically for training data
print("Performing vertical feature partitioning...")
train_infra, train_user = feature_engineer.split_vertical_features(
    train_data, include_target=True
)

# Split features vertically for testing data
test_infra, test_user = feature_engineer.split_vertical_features(
    test_data, include_target=True
)

print("Vertical partitioning completed!")
print(f"\nInfrastructure features (Model A):")
print(f"  Training shape: {train_infra.shape}")
print(f"  Testing shape: {test_infra.shape}")
print(f"  Columns: {list(train_infra.columns)}")

print(f"\nUser Behavior features (Model B):")
print(f"  Training shape: {train_user.shape}")
print(f"  Testing shape: {test_user.shape}")
print(f"  Columns: {list(train_user.columns)}")

In [None]:
# Visualize the feature partitioning
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Vertical Feature Partitioning Visualization', fontsize=16)

# Infrastructure features vs Latency
axes[0, 0].scatter(train_infra['Signal Strength (dBm)'], train_infra['Latency (ms)'], 
                   alpha=0.6, color='blue')
axes[0, 0].set_title('Signal Strength vs Latency')
axes[0, 0].set_xlabel('Signal Strength (dBm)')
axes[0, 0].set_ylabel('Latency (ms)')

axes[0, 1].scatter(train_infra['Network Traffic (MB)'], train_infra['Latency (ms)'], 
                   alpha=0.6, color='green')
axes[0, 1].set_title('Network Traffic vs Latency')
axes[0, 1].set_xlabel('Network Traffic (MB)')
axes[0, 1].set_ylabel('Latency (ms)')

# User behavior features vs Latency
axes[1, 0].scatter(train_user['User Count'], train_user['Latency (ms)'], 
                   alpha=0.6, color='red')
axes[1, 0].set_title('User Count vs Latency')
axes[1, 0].set_xlabel('User Count')
axes[1, 0].set_ylabel('Latency (ms)')

# Device Type vs Latency (boxplot)
device_types = train_user['Device Type'].unique()
device_latencies = [train_user[train_user['Device Type'] == dt]['Latency (ms)'].values 
                   for dt in device_types]
axes[1, 1].boxplot(device_latencies, labels=device_types)
axes[1, 1].set_title('Device Type vs Latency')
axes[1, 1].set_xlabel('Device Type')
axes[1, 1].set_ylabel('Latency (ms)')
axes[1, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

<a id="model-training"></a>
## 4. Model Training

Now we'll train our specialized models:
- **Model A**: Infrastructure Model (Signal Strength + Network Traffic)
- **Model B**: User Behavior Model (User Count + Device Type)
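
The Model A cells in this section call `prepare_infrastructure_features` with `fit_scaler=True` on the training data and `fit_scaler=False` on the test data. A plausible reading, stated here as an assumption since the `FeatureEngineer` internals are not shown, is that scaling statistics are learned from the training set only and then reused for the test set. The NumPy sketch below illustrates that pattern with hypothetical helper names and synthetic data.

```python
import numpy as np

def fit_standardizer(X):
    """Learn per-column mean and standard deviation from training data."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return mean, std

def apply_standardizer(X, mean, std):
    """Scale any split using statistics learned elsewhere."""
    return (X - mean) / std

# Synthetic stand-ins for signal strength and network traffic
rng = np.random.default_rng(42)
X_train = rng.normal(loc=[-75.0, 200.0], scale=[10.0, 50.0], size=(80, 2))
X_test = rng.normal(loc=[-75.0, 200.0], scale=[10.0, 50.0], size=(20, 2))

mean, std = fit_standardizer(X_train)                       # fit on train only
X_train_scaled = apply_standardizer(X_train, mean, std)
X_test_scaled = apply_standardizer(X_test, mean, std)       # reuse train statistics
```

Fitting on the training split alone keeps information about the test distribution out of the model inputs.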

### 4.1 Training Model A (Infrastructure Model)

In [None]:
# Prepare features for Model A (Infrastructure)
print("Training Model A (Infrastructure Model)...")
X_train_infra, y_train_infra = feature_engineer.prepare_infrastructure_features(
    train_infra, fit_scaler=True
)
X_test_infra, y_test_infra = feature_engineer.prepare_infrastructure_features(
    test_infra, fit_scaler=False
)

print(f"Infrastructure training features shape: {X_train_infra.shape}")
print(f"Infrastructure testing features shape: {X_test_infra.shape}")

# Train Infrastructure Model
infrastructure_model = InfrastructureModel(random_state=RANDOM_STATE)
infrastructure_model.train(X_train_infra, y_train_infra)

print("Model A training completed!")

In [None]:
# Evaluate Model A
pred_infra_train = infrastructure_model.predict(X_train_infra)
pred_infra_test = infrastructure_model.predict(X_test_infra)

# Calculate metrics
evaluator = ModelEvaluator()
infra_train_metrics = evaluator.calculate_metrics(y_train_infra, pred_infra_train)
infra_test_metrics = evaluator.calculate_metrics(y_test_infra, pred_infra_test)

print("Model A (Infrastructure) Performance:")
print(f"Training - MAE: {infra_train_metrics['MAE']:.3f}, RMSE: {infra_train_metrics['RMSE']:.3f}, R²: {infra_train_metrics['R2_Score']:.3f}")
print(f"Testing  - MAE: {infra_test_metrics['MAE']:.3f}, RMSE: {infra_test_metrics['RMSE']:.3f}, R²: {infra_test_metrics['R2_Score']:.3f}")

### 4.2 Training Model B (User Behavior Model)
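
Model B combines a numeric column (`User Count`) with a categorical one (`Device Type`). Whether `UserBehaviorModel` encodes the categorical column internally is not shown in this notebook. If it does not, one common option is one-hot encoding with categories learned from the training split. The sketch below assumes that approach; the helper names and sample values are hypothetical.

```python
import numpy as np

def fit_categories(values):
    """Learn the sorted set of category labels from training data."""
    return np.unique(values)

def one_hot(values, categories):
    """Encode labels as 0/1 columns; labels unseen in training become all zeros."""
    values = np.asarray(values)
    return (values[:, None] == categories[None, :]).astype(float)

# Made-up sample values for illustration
train_devices = np.array(["Phone", "Laptop", "Phone", "Tablet"])
train_counts = np.array([12, 40, 5, 33], dtype=float)
test_devices = np.array(["Laptop", "Watch"])  # "Watch" never appears in training
test_counts = np.array([20, 7], dtype=float)

categories = fit_categories(train_devices)
X_train_user_demo = np.column_stack([train_counts, one_hot(train_devices, categories)])
X_test_user_demo = np.column_stack([test_counts, one_hot(test_devices, categories)])

print(categories)
print(X_test_user_demo)
```

The resulting matrices are fully numeric, so they can be passed to any regressor that expects floating-point input.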

In [None]:
# Prepare features for Model B (User Behavior)
print("Training Model B (User Behavior Model)...")

# Prepare user behavior features - use raw features for UserBehaviorModel.
# Selecting both columns from the DataFrame keeps each value's original type;
# stacking a numeric array with a text array in NumPy would force a single dtype.
user_feature_cols = ['User Count', 'Device Type']
X_train_user = train_user[user_feature_cols].to_numpy()
y_train_user = train_user['Latency (ms)'].values

X_test_user = test_user[user_feature_cols].to_numpy()
y_test_user = test_user['Latency (ms)'].values

print(f"User behavior training features shape: {X_train_user.shape}")
print(f"User behavior testing features shape: {X_test_user.shape}")

# Train User Behavior Model
user_behavior_model = UserBehaviorModel(random_state=RANDOM_STATE)
user_behavior_model.train(X_train_user, y_train_user)

print("Model B training completed!")

In [None]:
# Evaluate Model B
pred_user_train = user_behavior_model.predict(X_train_user)
pred_user_test = user_behavior_model.predict(X_test_user)

# Calculate metrics
user_train_metrics = evaluator.calculate_metrics(y_train_user, pred_user_train)
user_test_metrics = evaluator.calculate_metrics(y_test_user, pred_user_test)

print("Model B (User Behavior) Performance:")
print(f"Training - MAE: {user_train_metrics['MAE']:.3f}, RMSE: {user_train_metrics['RMSE']:.3f}, R²: {user_train_metrics['R2_Score']:.3f}")
print(f"Testing  - MAE: {user_test_metrics['MAE']:.3f}, RMSE: {user_test_metrics['RMSE']:.3f}, R²: {user_test_metrics['R2_Score']:.3f}")

### 4.3 Individual Model Predictions Visualization

In [None]:
# Visualize individual model predictions
fig, axes = plt.subplots(1, 2, figsize=(15, 6))
fig.suptitle('Individual Model Predictions vs Actual Values', fontsize=16)

# Model A predictions
axes[0].scatter(y_test_infra, pred_infra_test, alpha=0.6, color='blue')
axes[0].plot([y_test_infra.min(), y_test_infra.max()], 
             [y_test_infra.min(), y_test_infra.max()], 'r--', lw=2)
axes[0].set_title(f'Model A (Infrastructure)\nR² = {infra_test_metrics["R2_Score"]:.3f}')
axes[0].set_xlabel('Actual Latency (ms)')
axes[0].set_ylabel('Predicted Latency (ms)')
axes[0].grid(True, alpha=0.3)

# Model B predictions
axes[1].scatter(y_test_user, pred_user_test, alpha=0.6, color='green')
axes[1].plot([y_test_user.min(), y_test_user.max()], 
             [y_test_user.min(), y_test_user.max()], 'r--', lw=2)
axes[1].set_title(f'Model B (User Behavior)\nR² = {user_test_metrics["R2_Score"]:.3f}')
axes[1].set_xlabel('Actual Latency (ms)')
axes[1].set_ylabel('Predicted Latency (ms)')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

<a id="model-fusion"></a>
## 5. Model Fusion

Now we'll combine the predictions from both models using a fusion strategy.
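
The cell below configures `FusionModel` with `fusion_strategy='weighted_average'`. How that class derives its weights is not shown here. One simple scheme, used in this sketch purely as an assumption, weights each model in proportion to the inverse of its mean squared error. The function names and synthetic data are hypothetical.

```python
import numpy as np

def inverse_error_weights(y_true, pred_a, pred_b):
    """Weights proportional to 1/MSE, normalized to sum to 1.

    Assumes both MSE values are strictly positive.
    """
    mse_a = np.mean((y_true - pred_a) ** 2)
    mse_b = np.mean((y_true - pred_b) ** 2)
    inv = np.array([1.0 / mse_a, 1.0 / mse_b])
    return inv / inv.sum()

def fuse(pred_a, pred_b, weights):
    """Weighted average of the two models' predictions."""
    return weights[0] * pred_a + weights[1] * pred_b

# Synthetic example: model A is less noisy than model B
rng = np.random.default_rng(0)
y = rng.uniform(10, 100, size=200)
pred_a = y + rng.normal(0, 5, size=200)
pred_b = y + rng.normal(0, 15, size=200)

w = inverse_error_weights(y, pred_a, pred_b)
fused = fuse(pred_a, pred_b, w)
print(f"weight A: {w[0]:.3f}, weight B: {w[1]:.3f}")
```

In practice the weights are better estimated on held-out data than on the data the base models were trained on, since training error tends to flatter the more flexible model.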

In [None]:
# Prepare combined features for fusion model
def prepare_fusion_features(data):
    """Return the four fusion input columns as a 2D array, in a fixed order."""
    required_cols = ['Signal Strength (dBm)', 'Network Traffic (MB)', 'User Count', 'Device Type']
    missing = [col for col in required_cols if col not in data.columns]
    if missing:
        raise KeyError(f"Missing required columns: {missing}")
    # Column selection keeps numeric values numeric even if Device Type is text
    return data[required_cols].to_numpy()

print("Training Fusion Model...")
X_train_combined = prepare_fusion_features(train_data)
y_train_combined = train_data['Latency (ms)'].values

X_test_combined = prepare_fusion_features(test_data)
y_test_combined = test_data['Latency (ms)'].values

print(f"Combined training features shape: {X_train_combined.shape}")
print(f"Combined testing features shape: {X_test_combined.shape}")

# Train fusion model
fusion_model = FusionModel(
    fusion_strategy='weighted_average',
    random_state=RANDOM_STATE
)
fusion_model.train(X_train_combined, y_train_combined)

print("Fusion model training completed!")

# Get fusion weights
weights = fusion_model.get_fusion_weights()
if weights:
    print(f"\nFusion weights:")
    print(f"  Infrastructure Model: {weights['infrastructure']:.3f}")
    print(f"  User Behavior Model: {weights['user_behavior']:.3f}")

In [None]:
# Evaluate Fusion Model
pred_fusion_train = fusion_model.predict(X_train_combined)
pred_fusion_test = fusion_model.predict(X_test_combined)

# Calculate metrics
fusion_train_metrics = evaluator.calculate_metrics(y_train_combined, pred_fusion_train)
fusion_test_metrics = evaluator.calculate_metrics(y_test_combined, pred_fusion_test)

print("Fusion Model Performance:")
print(f"Training - MAE: {fusion_train_metrics['MAE']:.3f}, RMSE: {fusion_train_metrics['RMSE']:.3f}, R²: {fusion_train_metrics['R2_Score']:.3f}")
print(f"Testing  - MAE: {fusion_test_metrics['MAE']:.3f}, RMSE: {fusion_test_metrics['RMSE']:.3f}, R²: {fusion_test_metrics['R2_Score']:.3f}")

### 5.1 Baseline Monolithic Model

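Let's train a baseline model on all features together for comparison. This baseline receives every column except `Tower ID` and the target, so it may see features (for example `Location Type`) that neither specialized model uses. Keep that in mind when comparing it with the fusion model.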

In [None]:
# Train baseline monolithic model
print("Training Baseline Monolithic Model...")

X_train_mono = train_data.drop(columns=['Tower ID', 'Latency (ms)'])
y_train_mono = train_data['Latency (ms)'].values

X_test_mono = test_data.drop(columns=['Tower ID', 'Latency (ms)'])
y_test_mono = test_data['Latency (ms)'].values

print(f"Monolithic training features shape: {X_train_mono.shape}")
print(f"Monolithic testing features shape: {X_test_mono.shape}")

# Train monolithic model
monolithic_model = MonolithicModel(random_state=RANDOM_STATE)
monolithic_model.train(X_train_mono, y_train_mono)

print("Monolithic model training completed!")

In [None]:
# Evaluate Monolithic Model
pred_mono_train = monolithic_model.predict(X_train_mono)
pred_mono_test = monolithic_model.predict(X_test_mono)

# Calculate metrics
mono_train_metrics = evaluator.calculate_metrics(y_train_mono, pred_mono_train)
mono_test_metrics = evaluator.calculate_metrics(y_test_mono, pred_mono_test)

print("Monolithic Model Performance:")
print(f"Training - MAE: {mono_train_metrics['MAE']:.3f}, RMSE: {mono_train_metrics['RMSE']:.3f}, R²: {mono_train_metrics['R2_Score']:.3f}")
print(f"Testing  - MAE: {mono_test_metrics['MAE']:.3f}, RMSE: {mono_test_metrics['RMSE']:.3f}, R²: {mono_test_metrics['R2_Score']:.3f}")

<a id="evaluation"></a>
## 6. Performance Evaluation

Let's compare all models and analyze their performance.
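
The comparison relies on the keys `MAE`, `RMSE`, `R2_Score`, and optionally `MAPE` returned by `ModelEvaluator.calculate_metrics`. That implementation is not shown, so the sketch below gives the standard definitions of these metrics in NumPy under a hypothetical function name. It assumes the target is not constant (otherwise R² is undefined) and skips zero targets when computing MAPE.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Standard regression metrics; a stand-in for the project's evaluator."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)                          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    nonzero = y_true != 0                              # MAPE is undefined at zero
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "R2_Score": float(1.0 - ss_res / ss_tot),
        "MAPE": float(np.mean(np.abs(err[nonzero] / y_true[nonzero])) * 100),
    }

demo = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
print(demo)
```

MAE and RMSE are in milliseconds here, while R² is unitless and can be negative when a model does worse than predicting the mean.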

In [None]:
# Create comprehensive performance comparison
results = {
    'Infrastructure Model (A)': infra_test_metrics,
    'User Behavior Model (B)': user_test_metrics,
    'Fusion Model': fusion_test_metrics,
    'Monolithic Model': mono_test_metrics
}

# Create performance comparison table
comparison_data = []
for model_name, metrics in results.items():
    comparison_data.append({
        'Model': model_name,
        'MAE': metrics['MAE'],
        'RMSE': metrics['RMSE'],
        'R²': metrics['R2_Score'],
        'MAPE': metrics.get('MAPE', np.nan)  # NaN, not 0, when MAPE is unavailable
    })

comparison_df = pd.DataFrame(comparison_data)
comparison_df = comparison_df.sort_values('R²', ascending=False).reset_index(drop=True)

print("Model Performance Comparison:")
print(comparison_df.to_string(index=False, float_format='%.4f'))

In [None]:
# Visualize model performance comparison
fig, axes = plt.subplots(1, 3, figsize=(18, 6))
fig.suptitle('Model Performance Comparison', fontsize=16)

models = comparison_df['Model']
colors = ['skyblue', 'lightgreen', 'salmon', 'gold']

# MAE comparison
axes[0].bar(models, comparison_df['MAE'], color=colors, alpha=0.7)
axes[0].set_title('Mean Absolute Error (MAE)')
axes[0].set_ylabel('MAE')
axes[0].tick_params(axis='x', rotation=45)
axes[0].grid(True, alpha=0.3)

# RMSE comparison
axes[1].bar(models, comparison_df['RMSE'], color=colors, alpha=0.7)
axes[1].set_title('Root Mean Square Error (RMSE)')
axes[1].set_ylabel('RMSE')
axes[1].tick_params(axis='x', rotation=45)
axes[1].grid(True, alpha=0.3)

# R² comparison
axes[2].bar(models, comparison_df['R²'], color=colors, alpha=0.7)
axes[2].set_title('R² Score')
axes[2].set_ylabel('R² Score')
axes[2].tick_params(axis='x', rotation=45)
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Prediction comparison visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Prediction vs Actual Comparison for All Models', fontsize=16)

predictions = {
    'Infrastructure Model (A)': pred_infra_test,
    'User Behavior Model (B)': pred_user_test,
    'Fusion Model': pred_fusion_test,
    'Monolithic Model': pred_mono_test
}

y_actual = y_test_combined
colors = ['blue', 'green', 'red', 'orange']

for i, (model_name, pred) in enumerate(predictions.items()):
    row, col = i // 2, i % 2
    r2 = results[model_name]['R2_Score']
    
    axes[row, col].scatter(y_actual, pred, alpha=0.6, color=colors[i])
    axes[row, col].plot([y_actual.min(), y_actual.max()], 
                        [y_actual.min(), y_actual.max()], 'r--', lw=2)
    axes[row, col].set_title(f'{model_name}\nR² = {r2:.3f}')
    axes[row, col].set_xlabel('Actual Latency (ms)')
    axes[row, col].set_ylabel('Predicted Latency (ms)')
    axes[row, col].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

<a id="analysis"></a>
## 7. Results Analysis

Let's analyze the results and provide insights about the vertical partitioning approach.

In [None]:
# Generate detailed analysis
best_model = comparison_df.iloc[0]
print(f"VERTICAL PARTITIONING ANALYSIS")
print("=" * 50)
print(f"\nBest Performing Model: {best_model['Model']}")
print(f"  - R² Score: {best_model['R²']:.4f}")
print(f"  - MAE: {best_model['MAE']:.4f}")
print(f"  - RMSE: {best_model['RMSE']:.4f}")

# Performance interpretation
best_r2 = best_model['R²']
if best_r2 > 0.8:
    interpretation = "Excellent model performance achieved (R² > 0.8)"
elif best_r2 > 0.6:
    interpretation = "Good model performance achieved (R² > 0.6)"
elif best_r2 > 0.4:
    interpretation = "Moderate model performance achieved (R² > 0.4)"
else:
    interpretation = "Model performance needs improvement (R² ≤ 0.4)"

print(f"\nPerformance Interpretation: {interpretation}")

# Compare fusion vs individual models
fusion_r2 = results['Fusion Model']['R2_Score']
infra_r2 = results['Infrastructure Model (A)']['R2_Score']
user_r2 = results['User Behavior Model (B)']['R2_Score']
mono_r2 = results['Monolithic Model']['R2_Score']

print(f"\nVertical Partitioning Analysis:")
if fusion_r2 > max(infra_r2, user_r2):
    print("✓ Fusion model outperforms individual models")
    print("  This indicates successful feature complementarity in vertical partitioning")
else:
    print("⚠ Fusion does not outperform the best individual model")
    print("  This suggests potential overfitting or suboptimal fusion strategy")

# Report the absolute R² gap; a relative change is only meaningful when the baseline R² is positive
r2_gap = fusion_r2 - mono_r2
if r2_gap > 0:
    print("✓ Vertical partitioning approach outperforms monolithic model")
    print(f"  R² gain: {r2_gap:.4f}")
else:
    print("⚠ Vertical partitioning does not outperform the monolithic model")
    print(f"  R² gap: {abs(r2_gap):.4f}")
if mono_r2 > 0:
    print(f"  Relative change vs. monolithic R²: {r2_gap / mono_r2 * 100:+.2f}%")

# Feature importance analysis
print(f"\nFeature Group Analysis:")
print(f"  Infrastructure Features (Model A) R²: {infra_r2:.4f}")
print(f"  User Behavior Features (Model B) R²: {user_r2:.4f}")

if infra_r2 > user_r2:
    print("  → Infrastructure features are more predictive of network latency")
else:
    print("  → User behavior features are more predictive of network latency")

# Fusion weights analysis
if weights:
    print(f"\nFusion Strategy Analysis:")
    print(f"  Infrastructure weight: {weights['infrastructure']:.3f}")
    print(f"  User behavior weight: {weights['user_behavior']:.3f}")
    
    if weights['infrastructure'] > weights['user_behavior']:
        print("  → Fusion model relies more heavily on infrastructure features")
    else:
        print("  → Fusion model relies more heavily on user behavior features")

In [None]:
# Error analysis
print("\nERROR ANALYSIS")
print("=" * 30)

# Calculate residuals for best model
if best_model['Model'] == 'Fusion Model':
    residuals = y_test_combined - pred_fusion_test
elif best_model['Model'] == 'Monolithic Model':
    residuals = y_test_mono - pred_mono_test
elif best_model['Model'] == 'Infrastructure Model (A)':
    residuals = y_test_infra - pred_infra_test
else:
    residuals = y_test_user - pred_user_test

print(f"Residual Statistics for {best_model['Model']}:")
print(f"  Mean: {np.mean(residuals):.4f}")
print(f"  Std: {np.std(residuals):.4f}")
print(f"  Min: {np.min(residuals):.4f}")
print(f"  Max: {np.max(residuals):.4f}")

# Residual plot
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(range(len(residuals)), residuals, alpha=0.6)
plt.axhline(y=0, color='r', linestyle='--')
plt.title(f'Residuals Plot - {best_model["Model"]}')
plt.xlabel('Sample Index')
plt.ylabel('Residuals')
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.hist(residuals, bins=20, alpha=0.7, color='skyblue')
plt.title('Residuals Distribution')
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

<a id="conclusions"></a>
## 8. Conclusions

### Key Findings:

1. **Vertical Partitioning Effectiveness**: The features split cleanly into two groups (infrastructure vs. user behavior). Whether that split helps prediction is decided by the fusion-versus-monolithic comparison in Section 6, not by the split itself.

2. **Model Performance**: The test-set R² values printed in Section 7 show which feature group is more predictive of latency on this dataset.

3. **Fusion Strategy**: The fusion model combines both specialized models with a weighted average. It adds value only if its test R² exceeds that of the better individual model.

4. **Comparison with Baseline**: The monolithic model is the reference point. If it matches or beats the fusion model, feature specialization has not improved accuracy on this data.

### Recommendations:

- **Feature Engineering**: Consider additional feature transformations or interactions
- **Fusion Strategies**: Experiment with different fusion approaches (stacking, meta-learning)
- **Model Selection**: Evaluate different algorithms for each specialized model
- **Hyperparameter Tuning**: Optimize model parameters for better performance
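
As an illustration of the stacking recommendation above, the sketch below fits a linear meta-learner on the two base models' predictions using least squares. It is one possible approach under stated assumptions, not part of the project's `FusionModel`. The function names and synthetic data are hypothetical, and the base predictions are assumed to come from a held-out validation split.

```python
import numpy as np

def fit_stacker(pred_a, pred_b, y_true):
    """Least-squares fit of y ≈ b0 + b1 * pred_a + b2 * pred_b."""
    design = np.column_stack([np.ones_like(pred_a), pred_a, pred_b])
    coef, *_ = np.linalg.lstsq(design, y_true, rcond=None)
    return coef

def apply_stacker(coef, pred_a, pred_b):
    """Combine base predictions with the learned intercept and weights."""
    return coef[0] + coef[1] * pred_a + coef[2] * pred_b

# Synthetic validation-set predictions from two base models
rng = np.random.default_rng(1)
y_val = rng.uniform(10, 100, size=300)
val_a = y_val + rng.normal(0, 5, size=300)
val_b = y_val + rng.normal(0, 15, size=300)

coef = fit_stacker(val_a, val_b, y_val)
stacked = apply_stacker(coef, val_a, val_b)
print("intercept and weights:", np.round(coef, 3))
```

Unlike a plain weighted average, the learned coefficients need not sum to one and the intercept can absorb a constant bias shared by the base models. Fitting the meta-learner on predictions made for the base models' own training rows would leak information, which is why held-out or out-of-fold predictions are assumed.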

### Next Steps:

1. Implement horizontal partitioning approach for geographical-based modeling
2. Compare vertical vs. horizontal partitioning strategies
3. Deploy the best-performing model for production use
4. Monitor model performance over time and retrain as needed

In [None]:
# Save results for future reference
print("Saving results...")

# Save comparison table
comparison_df.to_csv('vertical_partitioning_notebook_results.csv', index=False)

# Save detailed results
with open('vertical_partitioning_notebook_summary.txt', 'w', encoding='utf-8') as f:  # explicit encoding for "R²"
    f.write("VERTICAL PARTITIONING NOTEBOOK RESULTS\n")
    f.write("=" * 50 + "\n\n")
    f.write(f"Best Performing Model: {best_model['Model']}\n")
    f.write(f"R² Score: {best_model['R²']:.4f}\n")
    f.write(f"MAE: {best_model['MAE']:.4f}\n")
    f.write(f"RMSE: {best_model['RMSE']:.4f}\n\n")
    f.write("DETAILED RESULTS:\n")
    f.write(comparison_df.to_string(index=False))

print("Results saved to:")
print("  - vertical_partitioning_notebook_results.csv")
print("  - vertical_partitioning_notebook_summary.txt")
print("\nNotebook execution completed successfully!")