# Framework Comparison Analysis: TFF vs Flower

## Technical Factors Evaluation

This notebook compares TensorFlow Federated (TFF) and Flower on two technical factors:

### 1. Communication Cost
- Total bytes sent/received
- Per-round communication overhead
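
As a rough illustration of where these byte counts come from, the sketch below estimates per-round traffic under a simplifying assumption: every client downloads and uploads the full model once per round. The function name, model shape, client count, and round count are all hypothetical. Serialization overhead and compression are ignored, so this is not how either framework reports its own numbers.

```python
import numpy as np

def estimate_round_bytes(weights, num_clients):
    """Bytes moved in one round if every client downloads and uploads the full model."""
    model_bytes = sum(w.nbytes for w in weights)
    # One download (server -> client) plus one upload (client -> server) per client.
    return 2 * model_bytes * num_clients

# Hypothetical two-layer model: 20 input features, 16 hidden units, 1 output.
weights = [
    np.zeros((20, 16), dtype=np.float32),
    np.zeros(16, dtype=np.float32),
    np.zeros((16, 1), dtype=np.float32),
    np.zeros(1, dtype=np.float32),
]

per_round = estimate_round_bytes(weights, num_clients=10)
total = per_round * 5  # five rounds, also an illustrative number
print(f"Per round: {per_round / 1e3:.2f} KB, total: {total / 1e6:.3f} MB")
```

An estimate like this can be a useful sanity check on the `total_bytes` and `per_round_bytes` values loaded later in the notebook.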

### 2. Security (Poisoning Attack Robustness)
- **Model Poisoning**: Malicious clients add +100 to their gradients
- **Data Poisoning**: Malicious clients flip their labels (buggy ↔ non-buggy)
- **Metrics**: F1 Score, Accuracy, Precision, Recall, Loss (before/after)
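
The two attacks can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in the TFF or Flower experiment notebooks: the function names are hypothetical, and the label encoding (1 = buggy, 0 = non-buggy) is an assumption.

```python
import numpy as np

def poison_gradients(gradients, offset=100.0):
    """Model poisoning: add a constant offset to every gradient array."""
    return [g + offset for g in gradients]

def flip_labels(labels):
    """Data poisoning: swap binary labels (assumed encoding: 1 = buggy, 0 = non-buggy)."""
    return 1 - np.asarray(labels)

# Toy gradients and labels for one malicious client.
honest_grads = [np.array([0.1, -0.2]), np.array([0.05])]
poisoned_grads = poison_gradients(honest_grads)
flipped = flip_labels([0, 1, 1, 0])

print(poisoned_grads[0], flipped)
```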

## 1. Setup

In [None]:
# Install required packages
!pip install matplotlib numpy pandas scipy -q
print("Setup complete!")

In [None]:
# Import libraries
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from typing import Dict, List, Any
from dataclasses import dataclass, asdict

# Set plotting style (the 'seaborn-v0_8-*' style names need matplotlib >= 3.6)
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

## 2. Upload Results Files

Upload the results from TFF and Flower experiments:
- `tff_results.json` (from notebook 01)
- `flower_results.json` (from notebook 02)

In [None]:
from google.colab import files

print("Please upload tff_results.json and flower_results.json")
print("(You can select both files at once)")
uploaded = files.upload()

print(f"\nUploaded {len(uploaded)} file(s): {list(uploaded.keys())}")

In [None]:
# Load results
with open('tff_results.json', 'r') as f:
    tff_results = json.load(f)

with open('flower_results.json', 'r') as f:
    flower_results = json.load(f)

print("TFF Results loaded:")
print(f"  - Framework: {tff_results['framework']}")
print(f"  - Experiments: {list(tff_results['experiments'].keys())}")

print("\nFlower Results loaded:")
print(f"  - Framework: {flower_results['framework']}")
print(f"  - Experiments: {list(flower_results['experiments'].keys())}")
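
The later cells index into these dictionaries with fixed keys, so a missing key surfaces as a `KeyError` deep in the analysis. A small pre-check, sketched below, can report gaps up front. The key names are the ones this notebook reads. The helper `missing_keys` and the toy dictionary are hypothetical.

```python
def missing_keys(results):
    """Return the dotted paths of expected keys that are absent from a results dict."""
    top_level = ("framework", "experiments", "baseline_metrics",
                 "communication", "poisoning_analysis")
    missing = [k for k in top_level if k not in results]
    for metric in ("accuracy", "f1_score", "precision", "recall", "loss"):
        if metric not in results.get("baseline_metrics", {}):
            missing.append(f"baseline_metrics.{metric}")
    for field in ("total_bytes", "per_round_bytes"):
        if field not in results.get("communication", {}):
            missing.append(f"communication.{field}")
    for attack in ("model_poisoning", "data_poisoning"):
        if attack not in results.get("poisoning_analysis", {}):
            missing.append(f"poisoning_analysis.{attack}")
    return missing

# Toy dict standing in for a loaded results file (values are made up).
example = {
    "framework": "Flower",
    "experiments": {},
    "baseline_metrics": {"accuracy": 0.8, "f1_score": 0.7, "precision": 0.7, "recall": 0.7},
    "communication": {"total_bytes": 1000, "per_round_bytes": 100},
    "poisoning_analysis": {"model_poisoning": {}},
}
print(missing_keys(example))
```

Calling `missing_keys(tff_results)` and `missing_keys(flower_results)` right after loading would then list any absent fields before the comparison cells run.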

## 3. Baseline Metrics Comparison

In [None]:
# Extract baseline metrics
tff_baseline = tff_results['baseline_metrics']
flower_baseline = flower_results['baseline_metrics']

print("="*70)
print("BASELINE METRICS COMPARISON (Before Any Attacks)")
print("="*70)
print(f"\n{'Metric':<20} {'TFF':>15} {'Flower':>15} {'Difference':>15}")
print("-"*65)

metrics = ['accuracy', 'f1_score', 'precision', 'recall', 'loss']
for metric in metrics:
    tff_val = tff_baseline[metric]
    flower_val = flower_baseline[metric]
    diff = flower_val - tff_val
    print(f"{metric:<20} {tff_val:>15.4f} {flower_val:>15.4f} {diff:>+15.4f}")

# Determine which is better for each metric
print("\n" + "-"*65)
print("Summary:")
better_acc = "Flower" if flower_baseline['accuracy'] > tff_baseline['accuracy'] else "TFF"
better_f1 = "Flower" if flower_baseline['f1_score'] > tff_baseline['f1_score'] else "TFF"
print(f"  Better Accuracy: {better_acc}")
print(f"  Better F1 Score: {better_f1}")
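
The same comparison can be held in a pandas DataFrame, which makes it easier to export or extend. The sketch below uses made-up numbers in place of `tff_baseline` and `flower_baseline`. The helper `better` is hypothetical. It treats loss as lower-is-better and reports ties explicitly.

```python
import pandas as pd

# Made-up numbers standing in for tff_baseline and flower_baseline.
tff_example = {"accuracy": 0.80, "f1_score": 0.70, "loss": 0.50}
flower_example = {"accuracy": 0.82, "f1_score": 0.68, "loss": 0.45}

def better(metric, tff_val, flower_val):
    """Name the framework with the better value; loss is lower-is-better."""
    if tff_val == flower_val:
        return "Tie"
    flower_wins = flower_val < tff_val if metric == "loss" else flower_val > tff_val
    return "Flower" if flower_wins else "TFF"

table = pd.DataFrame({"TFF": tff_example, "Flower": flower_example})
table["Difference"] = table["Flower"] - table["TFF"]
table["Better"] = [better(m, row["TFF"], row["Flower"]) for m, row in table.iterrows()]
print(table)
```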

## 4. Communication Cost Analysis

In [None]:
# Extract communication data
tff_comm = tff_results['communication']
flower_comm = flower_results['communication']

print("="*70)
print("COMMUNICATION COST COMPARISON")
print("="*70)
print(f"\n{'Metric':<30} {'TFF':>18} {'Flower':>18}")
print("-"*70)
print(f"{'Total Communication (MB)':<30} {tff_comm['total_bytes']/1e6:>18.2f} {flower_comm['total_bytes']/1e6:>18.2f}")
print(f"{'Per Round (KB)':<30} {tff_comm['per_round_bytes']/1e3:>18.2f} {flower_comm['per_round_bytes']/1e3:>18.2f}")

# Relative difference, expressed as a share of TFF's total
comm_diff_pct = ((tff_comm['total_bytes'] - flower_comm['total_bytes']) / tff_comm['total_bytes']) * 100
print(f"\n→ Flower uses {abs(comm_diff_pct):.1f}% {'less' if comm_diff_pct > 0 else 'more'} bandwidth than TFF")
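
A percentage difference depends on which total serves as the reference. The toy numbers below (not experiment results) show that "Flower uses X% less than TFF" and "TFF uses Y% more than Flower" give different figures for the same pair of totals. The helper `pct_less` is hypothetical.

```python
def pct_less(value, reference):
    """How much smaller value is than reference, as a percentage of reference."""
    return (reference - value) / reference * 100

tff_mb, flower_mb = 10.0, 8.0  # made-up totals in MB

flower_vs_tff = pct_less(flower_mb, tff_mb)    # reference is TFF
tff_vs_flower = -pct_less(tff_mb, flower_mb)   # reference is Flower

print(f"Flower uses {flower_vs_tff:.1f}% less than TFF")
print(f"TFF uses {tff_vs_flower:.1f}% more than Flower")
```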

In [None]:
# Visualize communication cost
fig, ax = plt.subplots(figsize=(10, 5))

frameworks = ['TFF', 'Flower']
total_comm = [tff_comm['total_bytes']/1e6, flower_comm['total_bytes']/1e6]

bars = ax.bar(frameworks, total_comm, color=['#2196F3', '#4CAF50'], width=0.5)
ax.set_ylabel('Total Communication (MB)')
ax.set_title('Communication Cost Comparison: TFF vs Flower')

# Add value labels (bar_label pads in points, so it works at any data scale; requires matplotlib >= 3.4)
ax.bar_label(bars, labels=[f'{val:.2f} MB' for val in total_comm], padding=3, fontsize=12)

plt.tight_layout()
plt.savefig('communication_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

## 5. Security Analysis: Model Poisoning (Gradient +100)

In [None]:
# Extract model poisoning results
tff_model_poison = tff_results['poisoning_analysis']['model_poisoning']
flower_model_poison = flower_results['poisoning_analysis']['model_poisoning']

print("="*80)
print("SECURITY ANALYSIS: MODEL POISONING (Gradient +100)")
print("="*80)

# Create comparison table
print("\n" + "-"*80)
print("TFF - Model Poisoning Results")
print("-"*80)
print(f"{'Malicious %':<15} {'Accuracy':>12} {'F1 Score':>12} {'Precision':>12} {'Recall':>12} {'Loss':>12}")
print("-"*80)
print(f"{'Baseline':<15} {tff_baseline['accuracy']:>12.4f} {tff_baseline['f1_score']:>12.4f} {tff_baseline['precision']:>12.4f} {tff_baseline['recall']:>12.4f} {tff_baseline['loss']:>12.4f}")

for pct, data in sorted(tff_model_poison.items()):
    m = data['metrics_after']
    print(f"{pct:<15} {m['accuracy']:>12.4f} {m['f1_score']:>12.4f} {m['precision']:>12.4f} {m['recall']:>12.4f} {m['loss']:>12.4f}")

print("\n" + "-"*80)
print("Flower - Model Poisoning Results")
print("-"*80)
print(f"{'Malicious %':<15} {'Accuracy':>12} {'F1 Score':>12} {'Precision':>12} {'Recall':>12} {'Loss':>12}")
print("-"*80)
print(f"{'Baseline':<15} {flower_baseline['accuracy']:>12.4f} {flower_baseline['f1_score']:>12.4f} {flower_baseline['precision']:>12.4f} {flower_baseline['recall']:>12.4f} {flower_baseline['loss']:>12.4f}")

for pct, data in sorted(flower_model_poison.items()):
    m = data['metrics_after']
    print(f"{pct:<15} {m['accuracy']:>12.4f} {m['f1_score']:>12.4f} {m['precision']:>12.4f} {m['recall']:>12.4f} {m['loss']:>12.4f}")
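
The tables above rely on `sorted()` over string keys, which orders them lexicographically. If the keys follow the `'20pct'` pattern used later in this notebook and include single-digit percentages (an assumption, since only `'20pct'` appears here), a numeric sort key avoids rows appearing out of order. The helper `pct_value` is hypothetical.

```python
import re

def pct_value(key):
    """Extract the leading number from a key such as '20pct' (assumed key format)."""
    match = re.match(r"\d+(?:\.\d+)?", key)
    if match is None:
        raise ValueError(f"No leading number in key: {key!r}")
    return float(match.group())

keys = ["20pct", "5pct", "10pct"]
print(sorted(keys))                 # lexicographic: ['10pct', '20pct', '5pct']
print(sorted(keys, key=pct_value))  # numeric: ['5pct', '10pct', '20pct']
```

For dictionary items, the equivalent call would be `sorted(d.items(), key=lambda kv: pct_value(kv[0]))`.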

In [None]:
# Model Poisoning: Metric Drops Comparison
print("\n" + "="*80)
print("MODEL POISONING: METRIC DROPS FROM BASELINE")
print("="*80)

print(f"\n{'Malicious %':<15} {'TFF Acc Drop':>15} {'Flower Acc Drop':>15} {'TFF F1 Drop':>15} {'Flower F1 Drop':>15}")
print("-"*80)

for pct in sorted(tff_model_poison.keys()):
    tff_acc_drop = tff_model_poison[pct]['accuracy_drop']
    flower_acc_drop = flower_model_poison[pct]['accuracy_drop']
    tff_f1_drop = tff_model_poison[pct]['f1_drop']
    flower_f1_drop = flower_model_poison[pct]['f1_drop']
    print(f"{pct:<15} {tff_acc_drop:>+15.4f} {flower_acc_drop:>+15.4f} {tff_f1_drop:>+15.4f} {flower_f1_drop:>+15.4f}")

In [None]:
# Visualize Model Poisoning Impact
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

percentages = sorted(tff_model_poison.keys())
x_labels = ['Baseline'] + [p for p in percentages]
x = np.arange(len(x_labels))
width = 0.35

# Accuracy comparison
tff_acc = [tff_baseline['accuracy']] + [tff_model_poison[p]['metrics_after']['accuracy'] for p in percentages]
flower_acc = [flower_baseline['accuracy']] + [flower_model_poison[p]['metrics_after']['accuracy'] for p in percentages]

axes[0].bar(x - width/2, tff_acc, width, label='TFF', color='#2196F3')
axes[0].bar(x + width/2, flower_acc, width, label='Flower', color='#4CAF50')
axes[0].set_ylabel('Accuracy')
axes[0].set_title('Model Poisoning: Accuracy Impact')
axes[0].set_xticks(x)
axes[0].set_xticklabels(x_labels)
axes[0].legend()
axes[0].set_ylim(0, 1)

# F1 Score comparison
tff_f1 = [tff_baseline['f1_score']] + [tff_model_poison[p]['metrics_after']['f1_score'] for p in percentages]
flower_f1 = [flower_baseline['f1_score']] + [flower_model_poison[p]['metrics_after']['f1_score'] for p in percentages]

axes[1].bar(x - width/2, tff_f1, width, label='TFF', color='#2196F3')
axes[1].bar(x + width/2, flower_f1, width, label='Flower', color='#4CAF50')
axes[1].set_ylabel('F1 Score')
axes[1].set_title('Model Poisoning: F1 Score Impact')
axes[1].set_xticks(x)
axes[1].set_xticklabels(x_labels)
axes[1].legend()
axes[1].set_ylim(0, 1)

plt.suptitle('Security Analysis: Model Poisoning (Gradient +100)', fontsize=14)
plt.tight_layout()
plt.savefig('model_poisoning_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

## 6. Security Analysis: Data Poisoning (Label Flipping)

In [None]:
# Extract data poisoning results
tff_data_poison = tff_results['poisoning_analysis']['data_poisoning']
flower_data_poison = flower_results['poisoning_analysis']['data_poisoning']

print("="*80)
print("SECURITY ANALYSIS: DATA POISONING (Label Flipping)")
print("="*80)

print("\n" + "-"*80)
print("TFF - Data Poisoning Results")
print("-"*80)
print(f"{'Malicious %':<15} {'Accuracy':>12} {'F1 Score':>12} {'Precision':>12} {'Recall':>12} {'Loss':>12}")
print("-"*80)
print(f"{'Baseline':<15} {tff_baseline['accuracy']:>12.4f} {tff_baseline['f1_score']:>12.4f} {tff_baseline['precision']:>12.4f} {tff_baseline['recall']:>12.4f} {tff_baseline['loss']:>12.4f}")

for pct, data in sorted(tff_data_poison.items()):
    m = data['metrics_after']
    print(f"{pct:<15} {m['accuracy']:>12.4f} {m['f1_score']:>12.4f} {m['precision']:>12.4f} {m['recall']:>12.4f} {m['loss']:>12.4f}")

print("\n" + "-"*80)
print("Flower - Data Poisoning Results")
print("-"*80)
print(f"{'Malicious %':<15} {'Accuracy':>12} {'F1 Score':>12} {'Precision':>12} {'Recall':>12} {'Loss':>12}")
print("-"*80)
print(f"{'Baseline':<15} {flower_baseline['accuracy']:>12.4f} {flower_baseline['f1_score']:>12.4f} {flower_baseline['precision']:>12.4f} {flower_baseline['recall']:>12.4f} {flower_baseline['loss']:>12.4f}")

for pct, data in sorted(flower_data_poison.items()):
    m = data['metrics_after']
    print(f"{pct:<15} {m['accuracy']:>12.4f} {m['f1_score']:>12.4f} {m['precision']:>12.4f} {m['recall']:>12.4f} {m['loss']:>12.4f}")

In [None]:
# Data Poisoning: Metric Drops Comparison
print("\n" + "="*80)
print("DATA POISONING: METRIC DROPS FROM BASELINE")
print("="*80)

print(f"\n{'Malicious %':<15} {'TFF Acc Drop':>15} {'Flower Acc Drop':>15} {'TFF F1 Drop':>15} {'Flower F1 Drop':>15}")
print("-"*80)

for pct in sorted(tff_data_poison.keys()):
    tff_acc_drop = tff_data_poison[pct]['accuracy_drop']
    flower_acc_drop = flower_data_poison[pct]['accuracy_drop']
    tff_f1_drop = tff_data_poison[pct]['f1_drop']
    flower_f1_drop = flower_data_poison[pct]['f1_drop']
    print(f"{pct:<15} {tff_acc_drop:>+15.4f} {flower_acc_drop:>+15.4f} {tff_f1_drop:>+15.4f} {flower_f1_drop:>+15.4f}")

In [None]:
# Visualize Data Poisoning Impact
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

percentages = sorted(tff_data_poison.keys())
x_labels = ['Baseline'] + [p for p in percentages]
x = np.arange(len(x_labels))
width = 0.35

# Accuracy comparison
tff_acc = [tff_baseline['accuracy']] + [tff_data_poison[p]['metrics_after']['accuracy'] for p in percentages]
flower_acc = [flower_baseline['accuracy']] + [flower_data_poison[p]['metrics_after']['accuracy'] for p in percentages]

axes[0].bar(x - width/2, tff_acc, width, label='TFF', color='#2196F3')
axes[0].bar(x + width/2, flower_acc, width, label='Flower', color='#4CAF50')
axes[0].set_ylabel('Accuracy')
axes[0].set_title('Data Poisoning: Accuracy Impact')
axes[0].set_xticks(x)
axes[0].set_xticklabels(x_labels)
axes[0].legend()
axes[0].set_ylim(0, 1)

# F1 Score comparison
tff_f1 = [tff_baseline['f1_score']] + [tff_data_poison[p]['metrics_after']['f1_score'] for p in percentages]
flower_f1 = [flower_baseline['f1_score']] + [flower_data_poison[p]['metrics_after']['f1_score'] for p in percentages]

axes[1].bar(x - width/2, tff_f1, width, label='TFF', color='#2196F3')
axes[1].bar(x + width/2, flower_f1, width, label='Flower', color='#4CAF50')
axes[1].set_ylabel('F1 Score')
axes[1].set_title('Data Poisoning: F1 Score Impact')
axes[1].set_xticks(x)
axes[1].set_xticklabels(x_labels)
axes[1].legend()
axes[1].set_ylim(0, 1)

plt.suptitle('Security Analysis: Data Poisoning (Label Flipping)', fontsize=14)
plt.tight_layout()
plt.savefig('data_poisoning_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

## 7. Comprehensive Security Comparison

In [None]:
# Calculate average robustness scores
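def calculate_robustness(poison_results, baseline):
    """Return max(0, 1 - average drop), averaged over accuracy and F1 across all attack levels.

    Assumes each drop is stored as baseline minus post-attack value, so a
    positive drop means the metric got worse. `baseline` is unused and kept
    only so existing calls keep working.
    """
    acc_drops = [data['accuracy_drop'] for data in poison_results.values()]
    f1_drops = [data['f1_drop'] for data in poison_results.values()]
    avg_drop = (np.mean(acc_drops) + np.mean(f1_drops)) / 2
    return max(0, 1 - avg_drop)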

# TFF robustness
tff_model_robust = calculate_robustness(tff_model_poison, tff_baseline)
tff_data_robust = calculate_robustness(tff_data_poison, tff_baseline)
tff_overall_robust = (tff_model_robust + tff_data_robust) / 2

# Flower robustness
flower_model_robust = calculate_robustness(flower_model_poison, flower_baseline)
flower_data_robust = calculate_robustness(flower_data_poison, flower_baseline)
flower_overall_robust = (flower_model_robust + flower_data_robust) / 2

print("="*70)
print("SECURITY ROBUSTNESS SCORES (Higher is Better)")
print("="*70)
print(f"\n{'Metric':<30} {'TFF':>15} {'Flower':>15}")
print("-"*60)
print(f"{'Model Poisoning Robustness':<30} {tff_model_robust:>15.4f} {flower_model_robust:>15.4f}")
print(f"{'Data Poisoning Robustness':<30} {tff_data_robust:>15.4f} {flower_data_robust:>15.4f}")
print(f"{'Overall Robustness':<30} {tff_overall_robust:>15.4f} {flower_overall_robust:>15.4f}")

more_robust = "TFF" if tff_overall_robust > flower_overall_robust else "Flower"
print(f"\n→ {more_robust} is more robust against poisoning attacks")
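
A worked example with toy numbers may help in reading these scores. It assumes a drop is stored as baseline minus post-attack value, which the results files do not confirm. The function below mirrors the robustness formula in a standalone form. Note that the score is floored at 0 but not capped at 1, so a metric that improves under attack pushes it above 1.

```python
import numpy as np

def robustness(poison_results):
    """1 minus the mean of the average accuracy drop and the average F1 drop, floored at 0."""
    acc = np.mean([d["accuracy_drop"] for d in poison_results.values()])
    f1 = np.mean([d["f1_drop"] for d in poison_results.values()])
    return max(0.0, 1.0 - (acc + f1) / 2)

# Toy drops (not experiment results): metrics get worse as more clients turn malicious.
degraded = {
    "10pct": {"accuracy_drop": 0.10, "f1_drop": 0.20},
    "20pct": {"accuracy_drop": 0.30, "f1_drop": 0.40},
}
# Toy case where the metrics improve slightly, giving negative drops.
improved = {"10pct": {"accuracy_drop": -0.10, "f1_drop": -0.10}}

score_degraded = robustness(degraded)   # mean drops 0.20 and 0.30, so 1 - 0.25
score_improved = robustness(improved)   # 1 - (-0.10), above 1 because there is no upper cap
print(f"{score_degraded:.2f} {score_improved:.2f}")
```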

In [None]:
# Comprehensive visualization: All metrics at 20% malicious
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Use the 20% malicious-client results (assumes a '20pct' key exists in every poisoning dict)
pct_key = '20pct'

metrics = ['accuracy', 'f1_score', 'precision', 'recall']
titles = ['Accuracy', 'F1 Score', 'Precision', 'Recall']

for ax, metric, title in zip(axes.flat, metrics, titles):
    # Data for grouped bar chart
    scenarios = ['Baseline', 'Model\nPoisoning', 'Data\nPoisoning']
    x = np.arange(len(scenarios))
    width = 0.35
    
    tff_vals = [
        tff_baseline[metric],
        tff_model_poison[pct_key]['metrics_after'][metric],
        tff_data_poison[pct_key]['metrics_after'][metric]
    ]
    flower_vals = [
        flower_baseline[metric],
        flower_model_poison[pct_key]['metrics_after'][metric],
        flower_data_poison[pct_key]['metrics_after'][metric]
    ]
    
    bars1 = ax.bar(x - width/2, tff_vals, width, label='TFF', color='#2196F3')
    bars2 = ax.bar(x + width/2, flower_vals, width, label='Flower', color='#4CAF50')
    
    ax.set_ylabel(title)
    ax.set_title(f'{title} at 20% Malicious Clients')
    ax.set_xticks(x)
    ax.set_xticklabels(scenarios)
    ax.legend()
    ax.set_ylim(0, 1.1)  # headroom so value labels above tall bars are not clipped
    
    # Add value labels
    for bar in bars1:
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
                f'{bar.get_height():.2f}', ha='center', va='bottom', fontsize=9)
    for bar in bars2:
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
                f'{bar.get_height():.2f}', ha='center', va='bottom', fontsize=9)

plt.suptitle('Security Analysis: All Metrics at 20% Malicious Clients', fontsize=14)
plt.tight_layout()
plt.savefig('security_comprehensive.png', dpi=150, bbox_inches='tight')
plt.show()

## 8. Summary Report

In [None]:
# Generate comprehensive summary
print("\n" + "="*80)
print("TECHNICAL FACTORS COMPARISON: FINAL REPORT")
print("TensorFlow Federated vs Flower for Bug Prediction")
print("="*80)

print("\n" + "-"*80)
print("1. COMMUNICATION COST")
print("-"*80)
print(f"   TFF Total:    {tff_comm['total_bytes']/1e6:.2f} MB")
print(f"   Flower Total: {flower_comm['total_bytes']/1e6:.2f} MB")
comm_winner = "Flower" if flower_comm['total_bytes'] < tff_comm['total_bytes'] else "TFF"
print(f"   Winner: {comm_winner}")

print("\n" + "-"*80)
print("2. SECURITY: MODEL POISONING (Gradient +100)")
print("-"*80)
print(f"   TFF Robustness Score:    {tff_model_robust:.4f}")
print(f"   Flower Robustness Score: {flower_model_robust:.4f}")
model_winner = "TFF" if tff_model_robust > flower_model_robust else "Flower"
print(f"   Winner: {model_winner}")

print("\n" + "-"*80)
print("3. SECURITY: DATA POISONING (Label Flipping)")
print("-"*80)
print(f"   TFF Robustness Score:    {tff_data_robust:.4f}")
print(f"   Flower Robustness Score: {flower_data_robust:.4f}")
data_winner = "TFF" if tff_data_robust > flower_data_robust else "Flower"
print(f"   Winner: {data_winner}")

print("\n" + "-"*80)
print("4. OVERALL SECURITY ROBUSTNESS")
print("-"*80)
print(f"   TFF Overall:    {tff_overall_robust:.4f}")
print(f"   Flower Overall: {flower_overall_robust:.4f}")
overall_winner = "TFF" if tff_overall_robust > flower_overall_robust else "Flower"
print(f"   Winner: {overall_winner}")

print("\n" + "="*80)
print("FINAL SUMMARY")
print("="*80)
print(f"\n   Communication Cost Winner:     {comm_winner}")
print(f"   Model Poisoning Robustness:    {model_winner}")
print(f"   Data Poisoning Robustness:     {data_winner}")
print(f"   Overall Security Winner:       {overall_winner}")

print("\n" + "-"*80)
print("Key Takeaways (general guidance; confirm against the scores above):")
print("-"*80)
print("   • Model poisoning (+100 to gradients) can significantly degrade both frameworks")
print("   • Data poisoning (label flipping) can severely reduce training quality")
print("   • Both frameworks need Byzantine-robust aggregation for production use")
print("   • Consider gradient clipping and anomaly detection as defenses")
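
To make the defenses named above concrete, the sketch below compares three ways of aggregating client updates when one client adds +100: an unweighted mean, a coordinate-wise median (one simple Byzantine-robust rule), and a mean after L2-norm clipping. The update values and the clipping threshold of 1.0 are toy assumptions, and the code does not use either framework's aggregation API.

```python
import numpy as np

def clip_by_norm(update, max_norm):
    """Scale an update down so its L2 norm does not exceed max_norm."""
    norm = np.linalg.norm(update)
    return update if norm <= max_norm else update * (max_norm / norm)

# Four honest client updates and one poisoned with +100 (toy values).
honest = [
    np.array([0.1, -0.2]),
    np.array([0.2, -0.1]),
    np.array([0.0, -0.3]),
    np.array([0.1, -0.2]),
]
updates = np.stack(honest + [honest[0] + 100.0])

mean_agg = updates.mean(axis=0)          # unweighted mean: dominated by the outlier
median_agg = np.median(updates, axis=0)  # coordinate-wise median: ignores the outlier
clipped_agg = np.stack([clip_by_norm(u, 1.0) for u in updates]).mean(axis=0)

print("mean:   ", mean_agg)
print("median: ", median_agg)
print("clipped:", clipped_agg)
```

In this toy case the median recovers values close to the honest updates, while clipping bounds the damage without removing it, which is one reason the two are often combined with anomaly detection.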

## 9. Export Results

In [None]:
# Prepare comprehensive results
comparison_results = {
    'technical_factors': {
        'communication_cost': {
            'tff': tff_comm,
            'flower': flower_comm,
            'winner': comm_winner
        },
        'security': {
            'model_poisoning': {
                'attack_description': 'Add +100 to gradients from malicious clients',
                'tff_results': tff_model_poison,
                'flower_results': flower_model_poison,
                'tff_robustness': tff_model_robust,
                'flower_robustness': flower_model_robust,
                'winner': model_winner
            },
            'data_poisoning': {
                'attack_description': 'Flip labels (buggy <-> non-buggy) for malicious clients',
                'tff_results': tff_data_poison,
                'flower_results': flower_data_poison,
                'tff_robustness': tff_data_robust,
                'flower_robustness': flower_data_robust,
                'winner': data_winner
            },
            'overall_robustness': {
                'tff': tff_overall_robust,
                'flower': flower_overall_robust,
                'winner': overall_winner
            }
        }
    },
    'baseline_comparison': {
        'tff': tff_baseline,
        'flower': flower_baseline
    },
    'summary': {
        'communication_winner': comm_winner,
        'model_poisoning_winner': model_winner,
        'data_poisoning_winner': data_winner,
        'overall_security_winner': overall_winner
    }
}

# Save to JSON
with open('comparison_results.json', 'w') as f:
    json.dump(comparison_results, f, indent=2)

print("Results saved to comparison_results.json")

In [None]:
# Download all results
from google.colab import files

files.download('comparison_results.json')
files.download('communication_comparison.png')
files.download('model_poisoning_comparison.png')
files.download('data_poisoning_comparison.png')
files.download('security_comprehensive.png')

print("\nAll files downloaded:")
print("  - comparison_results.json")
print("  - communication_comparison.png")
print("  - model_poisoning_comparison.png")
print("  - data_poisoning_comparison.png")
print("  - security_comprehensive.png")

## Summary

This notebook compared TFF and Flower on two technical factors:

### 1. Communication Cost
- Measured total bytes sent/received during training
- Compared per-round communication overhead

### 2. Security (Poisoning Attacks)
- **Model Poisoning**: Added +100 to the gradients of malicious clients
- **Data Poisoning**: Flipped the labels of malicious clients (buggy ↔ non-buggy)
- Measured impact on: Accuracy, F1, Precision, Recall, Loss
- Calculated robustness scores for each framework

### Output Files
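- `comparison_results.json`: Complete comparison data
- `communication_comparison.png`: Total communication cost per framework
- `model_poisoning_comparison.png`: Accuracy and F1 under model poisoning
- `data_poisoning_comparison.png`: Accuracy and F1 under data poisoning
- `security_comprehensive.png`: All metrics at 20% malicious clients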